EVALUATING SCIENCE AND SCIENTISTS
An East-West Dialogue on Research Evaluation in Post-Communist Europe

Edited by
Mark S. Frankel and Jane Cave
Central European University Press BUDAPEST
First published in 1997 by
Central European University Press
Október 6 u. 12
H-1051 Budapest
Hungary

© American Association for the Advancement of Science 1997

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission of the publisher.

ISBN: 978-1-85866-079-0 cloth
ISBN: 978-963-386-566-8 PDF
Distributed by Oxford University Press, Walton Street, Oxford OX2 6DP
Order books from Oxford University Press, Saxon Way West, Corby, Northamptonshire NN18 9ES
CONTENTS

List of Contributors
Acknowledgments

Introduction
Mark S. Frankel and Jane Cave

Part I. Evaluating Science and Scientists: The Political and Economic Context

1. The Political Context of Science Priority-Setting in the United States
   Albert H. Teich
2. The Political and Economic Context of Research Evaluation in Eastern Europe
   Gyorgy Darvas
3. Factors Affecting the Acceptance of Evaluation Results
   Ben R. Martin

Part II. Peer Review: Self-Regulation and Accountability in Science

4. Peer Review in Science and Science Policy
   Edward J. Hackett
5. A Polish Perspective on Peer Review
   Adam Lomnicki
6. The Limits of Peer Review: The Case of Hungary
   Katalin Hangos
7. The Evaluation of Research Institutes in Hungary
   Peter Zilahy and Istvan Lang
8. Peer Review in the Czech Republic
   Miroslava Vrbova
9. Peer Review in Poland: Practical Solutions and Possible Improvements
   Julita Jablecka

Part III. Quantitative Techniques for Evaluating Performance

10. Quantitative Techniques in Evaluation in Western Europe
    Terttu Luukkonen
11. The Evaluation of Scientists and Their Work
    Andrzej Ziabicki
12. Scientometric Methods in the Evaluation of Research Groups in Hungary
    Peter Vinkler
13. Measuring and Evaluating Scientific Performance in the Czech Republic
    Jaroslav Koutecky
14. Institutionalizing Evaluation of Research and Education in the Slovak Republic
    Jozef Tino
15. Formal Evaluation Methods: Their Utility and Limitations
    C. le Pair
16. Evaluation of Research and Development Programs by Technology Indicators
    Hariolf Grupp
17. Quantitative Citation Data as Indicators in Science Evaluations: A Primer on Their Appropriate Use
    Alfred Welljams-Dorof
18. Ethical and Political Aspects of Using and Interpreting Quantitative Indicators
    Gunnar Sivertsen

Index
CONTRIBUTORS
Jane Cave is a freelance researcher and translator.
Gyorgy Darvas is a Senior Researcher at the Institute for Research Organization of the Hungarian Academy of Sciences.
Mark S. Frankel is Director of the Scientific Freedom, Responsibility and Law Program of the American Association for the Advancement of Science.
Hariolf Grupp is Deputy Director of the Fraunhofer Institute for Systems and Innovation Research, Karlsruhe.
Edward J. Hackett is an Associate Professor in the Department of Science and Policy Studies, Rensselaer Polytechnic Institute.
Katalin M. Hangos is Deputy Director of the Computer and Automation Institute of the Hungarian Academy of Sciences.
Julita Jablecka is a Research Fellow at the Center for Research on Science Policy and Higher Education, University of Warsaw.
Jaroslav Koutecky is Chairman of the Grant Agency of the Czech Academy of Sciences.
Istvan Lang is a Member of the Hungarian Academy of Sciences.
C. le Pair is Director of the Technology Branch of the Netherlands Research Council, Utrecht.
Adam Lomnicki is a Professor at the Institute of Environmental Biology, Jagiellonian University.
Terttu Luukkonen is a Senior Researcher at the Technical Research Centre of Finland.
Ben R. Martin is a Senior Fellow of the Science Policy Research Unit, University of Sussex.
Gunnar Sivertsen is a Research Fellow in the Department of Scandinavian Studies and Comparative Literature, University of Oslo.
Albert H. Teich is Director of Science and Policy Programs of the American Association for the Advancement of Science.
Jozef Tino is Vice-President of the Slovak Academy of Sciences.
Peter Vinkler is Scientific Secretary of the Central Research Institute for Chemistry at the Hungarian Academy of Sciences.
Miroslava Vrbova is Vice-Rector of the Czech Technical University, Prague.
Alfred Welljams-Dorof is Associate Publisher of The Scientist.
Andrzej Ziabicki is a Professor of Polymer Physics at the Institute of Fundamental Technological Research of the Polish Academy of Sciences.
Peter Zilahy is Head of the Secretary-General's Office of the Hungarian Academy of Sciences.
ACKNOWLEDGMENTS
This book owes a debt to many people and organizations. It has its origins in 'Evaluating Science and Scientists', a workshop that brought together scientists, government officials, research managers, and scholars of science and technology from 21 countries to assess the use of peer review and quantitative techniques in evaluating science and scientists in East-Central Europe, Western Europe and the United States. We are grateful to the three co-sponsors of that workshop (convened in Pultusk, Poland, in October 1993): the American Association for the Advancement of Science (AAAS), the Polish Society for the Advancement of Science and the Arts and its president at that time, Zbigniew Grabowski, and the State Committee for Scientific Research in Poland and its former chair, Witold Karczewski.

We also acknowledge the financial support of the US National Science Foundation. Two of its program managers, Bonnie Thompson of the Division of International Programs, and Rachelle Hollander of the Ethics and Values Studies Program, were especially encouraging during the early stages of planning the workshop. Funding was also provided by the State Committee for Scientific Research in Poland and by Poland's Stefan Batory Foundation, whose vice-president, Andrzej Ziabicki, was instrumental in helping us secure additional funding to support the travel of several of the workshop participants from Central and Eastern Europe. We are also grateful to Kamla Butaney at AAAS for her assistance with the production of the manuscript.

Finally, we owe a considerable intellectual debt to the authors of the thoughtful and informative writings that make up the substance of this
volume, and to others participating in the original workshop whose insights and reflective comments helped to sharpen the essays included here.

May 1996
Mark S. Frankel
Jane Cave
INTRODUCTION
Mark S. Frankel and Jane Cave
The political and economic transformation of Eastern Europe has had a major impact on science and scientists in the region. The end of the Cold War has brought with it a need to rethink science and technology policy, as well as new opportunities for international collaboration. The shift to a market economy has led to reforms in research funding and management, together with drastic cuts in funding levels everywhere. And the transition from single-party rule to parliamentary democracy has expanded the boundaries of professional autonomy and brought the scientific communities of the region new opportunities to engage in self-regulation.

One of the key issues currently facing East European scientists and science policy makers is research evaluation. What criteria should be applied - and who should apply them - in assessing the quality of research? What criteria should be applied and what mechanisms should be used to determine which research proposals receive funding, which articles get published, and which researchers and teachers get appointed and promoted?

There are several reasons why evaluation has emerged as a major issue in the transition period. First, the centralized funding and management typical of communist science administration is widely viewed as having resulted in much poor-quality research. Research quality was also adversely affected by international isolation, which shielded both individual researchers and entire fields of research from the scrutiny of outsiders. Not surprisingly, then, both scientists and policy makers are interested in taking stock of what they have inherited from the previous regime.

There is, of course, a strong economic incentive to do so. The early years
of market reform have led to deep cuts in government spending in all budget categories, including research and development. At the same time, industrial enterprises, struggling to adapt to the market, no longer have the resources they once had to commission applied research projects. In many countries of Eastern Europe, the desire to 'save the best' from the budget ax has led to the evaluation of the national research network.

Evaluation is also at the center of efforts to construct a new institutional framework for research and development. While the scope and pace of institutional reforms implemented to date vary from one country to another, a common denominator is the search for a system that will both foster high-quality research and allocate scarce resources with maximum feasible efficiency. Across the region, there is widespread agreement among reformists that the new system of research funding and management should be one in which competitive grants are awarded to individual research projects on the basis of peer review.

The need to take stock of the past and to reform the mechanisms of research evaluation and funding has given rise to considerable debate over the criteria to be used in evaluation. The discussion has focused, in particular, on the use of quantitative, or bibliometric, indicators in assessing overall research performance at the national and institutional level, and ways in which such indicators might be combined with peer review to evaluate the performance of individual researchers and research teams.

The introduction of new funding and evaluation mechanisms in Eastern Europe is taking place in difficult circumstances. Across the region, scientists are confronting not only changes in the rules of the game but also drastic cuts in the amount of research funding available. As a result, evaluation has become a life-and-death matter not only for individual scientists, but for research institutes and even entire fields of research.

Peer review has long been recognized as a critical element in advancing science. Peer review reflects the principle that scientific claims be accepted or rejected on the basis of merit, and reinforces the notion that scientists are in the best position to determine whether a scientific field is ripe for exploitation, whether the work is technically sound, and whether researchers have the requisite credentials to do the research. It is a mechanism of evaluation that governs access not only to research funds but also to publication and professional status, all three of which are crucial in determining individual careers and the composition of research institutions. In the larger social context, peer review serves both as 'a mechanism of scientific self-regulation that preserves the autonomy of science and as a symbol of professional accountability that ensures democratic control of science'.1

Peer review constitutes a radical break with the centralized funding and management practices typical of communist science administration.
Throughout Eastern Europe, postwar governments adopted, despite differences in culture and historical traditions, the Soviet tripartite system in which universities concentrated on teaching, basic research took place in institutes of the Academy of Sciences, and applied research was conducted in institutes belonging to industrial ministries and other government agencies. (These divisions were not, of course, absolute.) They also adopted the Soviet system of funding research largely through block grants to research institutes, where powerful scientist-administrators distributed funds to individual research teams. Such a system left plenty of room for decisions based on favoritism and political connections and, critics charged, allowed funds to be used in the support of much poor-quality research while creating patterns of dependence that eroded standards of professional conduct and critical debate.2

By contrast, in a system based on peer review, research funds are allocated in open competition, according to criteria that are more or less clearly defined, and following systematic procedures that are the same for all. Unlike the hierarchical systems to be found under communism, peer review systems accord working scientists a major role in determining which research projects should receive funding. The introduction of peer review in Eastern Europe is not, therefore, just an administrative matter but is part of a broader transformation of the social structure of science in the region.

At the most practical level, the introduction of peer review has involved the creation of new funding agencies. In 1990, in what was then still Czechoslovakia, the Czechoslovak Academy of Sciences and the Slovak Academy each set up a Grant Agency to allocate research funds on a competitive basis to Academy researchers. Subsequently, the scope of these agencies expanded to include research conducted in universities and other research institutes. In 1991, Poland established the Committee for Scientific Research, charged with allocating government funds for both basic and applied research. In the same year, the Scientific Research Foundation, originally established in 1986 to support research within the Hungarian Academy of Sciences, was transformed into an independent and nation-wide institution. New funding agencies have also been set up in Bulgaria, the Baltic states and Russia. All these new agencies have constructed their own systems of peer review panels, made up of working scientists, to evaluate research proposals and make recommendations regarding funding.

It should be noted, however, that the transition to competitive grants is a gradual one, and research institutes continue to receive a sizeable percentage of their funding in the form of block grants to cover overheads and a proportion of salaries. These block grants are also distributed by the new funding agencies and, in most cases, the level of funding depends on an evaluation of institute research performance based in some part at least on quantitative indicators.
The introduction of peer review mechanisms presents the scientific communities of Eastern Europe with a number of complex issues. Peer review is not without its shortcomings. Some critics in the United States argue that the changing scale and organization of science have increased the need to bring values other than merit to bear on the allocation of resources to science, such as equity considerations in the geographical and institutional distribution of funds.3 Others fault the peer review system for failing to safeguard adequately against bias and conflict of interest.4 Still others lament the lack of accountability in a system shrouded in anonymity.5

Similar criticism can be heard in Eastern Europe. Of course, some of this simply reflects the vested interests of those who have lost out under the new system - groups within the Academies of Sciences and ministries who have been deprived of control over resource allocation, and researchers who have found themselves unable to compete under the new rules of the game. But supporters of the new system, too, have expressed a number of concerns. In particular, they argue that the small size of the scientific communities in many countries of the region makes anonymity virtually impossible, and that this, together with drastic cuts in funding levels - which have led to increased competition for resources - undermines all efforts to safeguard against bias. The participation of foreign scientists in the review process is seen as one way of counteracting bias, though this presupposes that reviewers and those being reviewed share a common language, which is by no means always the case. East European researchers are also concerned about the possible theft of ideas if reviewers from wealthier countries with larger research budgets are involved in the review process.

Concerns about bias are heightened by the fact that, for the time being at least, the new funding agencies constitute the sole source of government support for research. Some critics fear that, given their de facto monopoly over resource allocation, the new agencies may simply perpetuate the kind of bureaucratic politics that influenced research funding under the old regime.

Concerns regarding bias in the funding process have fueled interest in the use of quantitative, or bibliometric, techniques to assess performance, whether of individual researchers or entire research institutes. Now that the Cold War has ended, science policy officials in Eastern Europe are also interested in such techniques as a possible way of measuring their country's research performance relative to that of other countries. Bibliometric techniques involve the counting of papers and citations in the scientific literature which are then used to generate indicators of scientific output and the impact of this output on the broader scientific community. While publication counts provide only an approximate measure of research productivity (since publications vary widely in terms of their importance and value), it has been demonstrated that there is a correlation between citation counts and other performance measures, such as professional awards, peer
assessments and Nobel Prizes.6 And in small countries, where it may be difficult to assemble a disinterested committee of reviewers in a highly specialized field, the option of using quantitative measures can be very appealing.

Nevertheless, the construction, interpretation and use of such indicators are by no means unproblematic. There is the added cost involved in recruiting the expertise and constructing the databases needed to collect and analyze the indicators. Citation indices are heavily biased toward English-language literature, thus placing scientists from non-English-speaking countries at a disadvantage. Depending on the rigor applied by a journal in its review process, citations in certain journals will have more value than others; yet they each count as a single citation. Hence, citations may indicate some measure of productivity, but tell us little about the quality of the work. Finally, there are concerns that increasing reliance on quantitative indicators to judge performance may lead some scientists to adjust their publishing behavior, perhaps leading to more short-term studies and fragmented publications at the expense of more longitudinal studies and detailed reports. A greater emphasis on publication and citation counts may increase the pressures to publish and, concomitantly, the temptation to cut ethical corners in research.7

In Eastern Europe, the use of bibliometric indicators as a guide to past performance is further complicated by the nature of the communist regime. Many researchers were prevented, for political reasons, from publishing abroad and participating in international conferences. Consequently, their work is relatively unknown in the West, and bibliometric indicators are of little use in making international comparisons. The role of political factors in shaping professional careers and publication patterns under communism also renders problematic any attempt to construct valid bibliometric indicators within individual countries of Eastern Europe.

This volume examines current efforts to reform research funding and evaluation in Eastern Europe, and provides an overview of Western experience and scholarly research related to the use of peer review and quantitative evaluation techniques. Contributors include scientists and science administrators involved in the R&D reform process in Poland, Hungary, and the Czech and Slovak Republics, as well as American and West European research evaluators and scholars who have studied efforts to evaluate science in the West. In the chapters that follow, they discuss how peer review and quantitative techniques are used in their own countries, and how varying economic and political conditions affect the allocation of resources from country to country. They examine the political and economic contexts in which resources for science are allocated, the underlying assumptions of peer review and the use of quantitative indicators, the reliability of those methods, their weaknesses and possible remedies, and their consequences
for science and other social institutions. Also presented are new ideas for practical steps to minimize the technical, administrative and ethical problems raised by the use of peer review and quantitative techniques in evaluating science and scientists.

This collection constitutes, we believe, a timely and practical resource for policy makers, research managers, university administrators, and scientists/engineers - East and West - involved in the design and management of science evaluation systems. It also provides valuable materials for scholars interested in the evaluation of science, or the relationship between science and government, and for all those concerned with the process of political and economic reform in Eastern Europe.
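The bibliometric indicators discussed above come down, at their simplest, to counts of papers and of the citations those papers receive. As a minimal illustrative sketch - with invented researcher names and citation figures, and a hypothetical helper function bibliometric_indicators - the following Python fragment shows how publication counts, total citations, and citations per paper might be tallied from such records.

    # Illustrative only: toy bibliometric indicators computed from invented records.
    # A real evaluation would draw on a citation database and, as noted above,
    # would have to confront field and language biases that raw counts ignore.
    from collections import defaultdict

    # Each record: (researcher, paper identifier, citations received to date).
    toy_records = [
        ("Researcher A", "paper-1", 12),
        ("Researcher A", "paper-2", 3),
        ("Researcher A", "paper-3", 0),
        ("Researcher B", "paper-4", 45),
        ("Researcher B", "paper-5", 2),
    ]

    def bibliometric_indicators(records):
        """Return {researcher: (papers, total citations, citations per paper)}."""
        papers = defaultdict(int)
        citations = defaultdict(int)
        for researcher, _paper, cites in records:
            papers[researcher] += 1
            citations[researcher] += cites
        return {r: (papers[r], citations[r], citations[r] / papers[r]) for r in papers}

    for name, (n_papers, n_cites, per_paper) in bibliometric_indicators(toy_records).items():
        print(f"{name}: {n_papers} papers, {n_cites} citations, {per_paper:.1f} citations per paper")

Note that each citation counts equally here, whatever the rigor of the citing journal - exactly the limitation raised above - which is why such counts are usually treated as one input among several rather than as a measure of quality.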
Notes and References

1. Daryl E. Chubin and Edward J. Hackett, Peerless Science: Peer Review and US Science Policy (Albany, NY: State University of New York Press, 1990), p. 4.
2. See Jane Cave and Mark S. Frankel, Breaking From the Past: Setting New Ground Rules for Scientific Freedom and Responsibility in East-Central Europe and the Russian Federation (Washington, DC: AAAS, 1992).
3. Mark S. Frankel, 'An Ethical Framework for Allocating Resources in Science', SRA Journal, Vol. XIII, No. 2 (Fall 1991), pp. 47-52.
4. Eliot Marshall, 'NSF Deals with Conflicts Every Day', Science, Vol. 257 (31 July 1992), p. 624.
5. See Chubin and Hackett, op. cit.
6. See J. R. Cole and S. Cole, Social Stratification in Science (Chicago: University of Chicago Press, 1973).
7. Panel on Scientific Responsibility and the Conduct of Research, Responsible Science: Ensuring the Integrity of the Research Process, Vol. I (Washington, DC: National Academy Press, 1992).
PART I
EVALUATING SCIENCE AND SCIENTISTS: POLITICAL AND ECONOMIC CONTEXT
Any attempt to evaluate science and scientists must take into account the political and economic context in which research - basic and applied - is conducted. Science does not take place in a vacuum; it is a socially embedded activity which reflects the culture, traditions and resources of the country and institutions in which it takes place. The first three chapters in this collection provide an overview of the setting in which priorities for research and development are established and how evaluation efforts contribute to, and are affected by, deliberations over science and technology policy.

The end of the Cold War and increasing economic constraints in both the East and West have put severe pressures on policy makers and the scientific community with respect to setting research priorities. In both the East and West, scientists and policy makers must increasingly deal with the relationship between priority setting and research evaluation. Albert Teich discusses how these issues are being addressed in the United States, where the allocation of resources 'is understood better in terms of history, tradition, and the clout of various constituencies than in terms of the relative importance to the nation of different areas of research'. Teich reviews several proposed new approaches to setting research priorities and the assumptions that underlie them, including the role that the scientific community is expected to play in setting priorities.

Policy making in the countries of Eastern Europe is complicated by an unstable political environment and the preoccupation of both policy makers and the public with short-term problems. Gyorgy Darvas describes how the
collapse of communism has led to a drastic decline in the priority accorded to science. Throughout the region, government expenditures on science have been cut and the number of people employed in research and development has fallen. If science is to gain public support, argues Darvas, it must justify itself to a public that is preoccupied with more immediate issues. Objective methods of evaluating science are, he claims, critical in this effort.

Ben Martin discusses the opportunities and challenges presented by formal efforts to evaluate science. He stresses the need to design valid and reliable indicators that are able to 'capture' the outcomes of research and their impact, and to shape the evaluation in a way that maximizes its value to different audiences. Martin observes that techniques for evaluating science are 'still as much an art as a science', and suggests that non-technical factors are often as important as the technical aspects of the evaluation in determining its ultimate acceptance and use. As a consequence, he emphasizes the importance of ensuring, as much as possible, that evaluation efforts are sensitive to the political and economic conditions that prevail in any particular country.
1
THE POLITICAL CONTEXT OF SCIENCE PRIORITY-SETTING IN THE UNITED STATES
Albert H. Teich
The rationale for government support of research for the past fifty years in the United States has been the contribution of science and technology to military security and national prestige in the Cold War environment, coupled with a sense (taken mainly on faith) that a strong research community will more than pay for itself in long-run economic and social (such as health) benefits.1 This rationale, laid out in Vannevar Bush's landmark 1945 report, Science: The Endless Frontier, is increasingly being called into question.2 While it has not yet been completely discarded, it is no longer taken for granted to the extent that it used to be. As economics comes to dominate the national agenda to an ever-greater extent, the so-called 'social contract' underlying science funding is being rewritten, and the terms of the new structure are not yet clear.

Central to the ferment in US science policy are calls for changes in the way priorities for federal funding of basic research are set. One of the drives behind the discussions of priority-setting has been the simultaneous appearance in the budget of several 'megaprojects' that place large claims on R&D resources. Another has been a growing sense that the nation needs to do better in exploiting its basic research for national goals, especially economic competitiveness.

The mechanism through which the United States allocates resources for basic research at the macro-level is the federal budget process. This process is neither rational nor systematic. It is, indeed, unsystematic, confusing and, in many respects, irrational. Its major strength, however, is that it works and - at least up to now - has helped to foster a uniquely vigorous and creative research enterprise.
An observer whose information comes strictly from the organizational chart of the US government might conclude that the National Science Foundation (NSF) is the nation's principal funder of basic research. It is not. In fact, it is not even in second place. In fiscal year 1994, NSF, the only federal agency whose central mission is support of basic research, was third among federal agencies in terms of the size of its basic research expenditures, behind the National Institutes of Health (NIH) and the National Aeronautics and Space Administration (NASA).

Basic research is supported by more than 15 federal agencies and departments, although the six largest agencies are responsible for more than 95 per cent of the money. In most of these agencies, however, except for NSF, basic research is only a small part of the agency's R&D effort, and an even smaller part of the agency's total mission.

While the design of the US system for supporting basic research is generally attributed to the ideas of Vannevar Bush, in fact the existing structure is a far cry from that envisioned in Science: The Endless Frontier. Bush recommended establishing a National Research Foundation that would be the nation's primary funder of basic research. A foundation such as Bush proposed would have been similar in operation to the research councils in other nations. It might have served as a vehicle for systematically allocating resources among various areas of basic research. Under such a system, government policy makers could make a decision on the level of expenditure for basic research; then, presumably, groups of experts could recommend priorities among different fields and projects and the agency could allocate funds accordingly. Criteria such as those proposed in 1963 by Alvin Weinberg could be applied to these allocation decisions, and an appropriate balance could be achieved that would advance science and serve national goals.3

This, however, is not the system that we have in the United States. The United States has no single budget for basic research that gets divided up among different projects and disciplines. Basic research is included within the budgets of various federal agencies and departments along with the other programs that those entities finance. And those entities compete against one another, first for a share of the President's proposed budget that goes to Congress, then for shares of the congressional appropriations. Thus the agency basic research programs are generally budget items which are separate from one another and which are added up after the fact, rather than parts of a whole that is divided among the agencies.

The US budget process has two major phases. In the first phase, the President's budget is prepared and coordinated by the Office of Management and Budget (OMB). The programs within each agency compete against one another under ceilings placed on the agency, and the agencies compete against one another under an overall budget ceiling imposed by the
President. Within this context, there are relatively few opportunities for comparing research programs of different agencies directly against one another and making priority decisions among them.

Despite the amount of time and effort that goes into its preparation, the President's budget is only a proposal, subject to review and approval by Congress. This review and approval takes place at several levels in both houses. Most important are the appropriations bills, which actually authorize the expenditure of federal funds. In this arcane realm of congressional activity, the entire federal budget is divided among 13 separate pieces of legislation, each of which is the domain of a subcommittee of the Appropriations Committee. The principles of this division, which are the same on both the House and Senate sides, can only be understood in terms of congressional history and tradition, and appear, to most observers, to be completely random.

Appropriations for NSF and NASA are contained within the Veterans Affairs, Housing and Urban Development, and Independent Agencies bill. This means that in the congressional process, NSF and NASA compete against one another, but also against housing, veterans, and environmental programs (as well as such miscellaneous bodies as the American Battle Monuments Commission). The National Institutes of Health contend with health services, labor, and social welfare programs. The Department of Energy's research (including the super collider) is pitted against other DOE programs, and river, harbor and dam construction projects. Thus, in Congress, as in the Executive Branch, NSF, NIH and DOE do not face off directly against one another for funds, and none of the civilian research programs competes directly with Defense Department research.

Put simply, research is not a trade-off area within the budget, and research programs do not compete against one another for shares of a single, limited pie. In the view of many scientists, this is an advantage and has actually led to more funding for basic research than might otherwise be the case. The disadvantage of the process, however, is that federal funding for basic research seems to lack a coherent vision and set of priorities. Resources are allocated in a manner that is understood better in terms of history, tradition and the political clout of various constituencies than in terms of the relative importance to the nation of different areas of research. And therein lies the source of a growing chorus of criticisms and calls for change.

Concerns about priority-setting in research seem to have three major thrusts. Critics claim:

• that the system fails to give sufficient priority to research supporting key national goals (such as economic competitiveness);
• that the system does not provide for the appropriate balance between different areas of research;
• that the process for choosing among different research initiatives is flawed.4

The term 'priorities' is used (and misused) in at least two different ways in discussions of research budgets and funding. One sense is political: a program is designated as 'high priority' as a means of advocating it and enhancing its visibility and importance. In general, however, this designation does not imply a formal ranking nor does it usually mean that other programs should be sacrificed to assure funding for it. The so-called 'war on drugs' has been a federal budget priority in this sense through at least the past three administrations.

The other sense in which the word 'priority' is used does suggest a more formal ranking within a trade-off area. If programs or budget items compete under a given ceiling and there is (as is always the case) not enough money to go around, then setting priorities is one way (but not the only one) to allocate funds.

The concern that the existing priority-setting system does not provide for an appropriate balance between different areas of research is one that has been widely expressed. The issue is generally discussed in terms of the relative funding levels of different programs or fields. Underlying the concept of balance - which is generally used as a form of argument by those who feel their favorite programs are underfunded in comparison to others - is the notion that budget dollars are a direct measure of priority. In one sense, of course, they are. Money is by definition a means of placing value on various things and facilitating their comparison.

Nevertheless, there is a fundamental problem with the idea of balance measured this way. Simply put, some things cost more than others. Experimental physics, especially at high energies, is inherently more expensive than theoretical physics. Studying the surface of Venus with a planetary probe costs more than studying the properties of a chemical in a laboratory. The funding levels of these programs do not necessarily reflect their priorities. The meaning of balance among such diverse programs, measured in dollars, is not self-evident.

This is not to imply that the costs of programs are irrelevant. Obviously, they are an essential factor in decision making. But it also does not mean that just because a program or a goal is 'high priority' it should receive more money at the expense of lower priority programs. Balance is a matter of funding levels relative to the cost of achieving goals, not funding levels per se.

The third main criticism of basic research priorities has to do with the process of choice. Critics who claim that the process for choosing among different research initiatives is flawed, however, are making an implicit assumption that the choice should be among those initiatives - that is, they
should be traded off against one another. 'We cannot afford to build the space station and the super collider and pursue the human genome project all at the same time' is a common complaint that has been heard for the past several years. It may be true that the United States cannot afford all of these expensive programs in its current budget situation, but the implication that we need to choose among them is a fallacy:

Each of these projects is directed at entirely different objectives, and should be considered on its merits in competition with projects of all sorts in the budget, not just with other R&D projects up for consideration. If there are to be priority trade-offs, they should be with other programs having the same general objectives.5

There is no reason and no need to regard these initiatives, simply because they are all thought of (not necessarily correctly) as 'big science', as a trade-off area in the budget. In fact, there is good reason not to do so. The fact that the federal budget process generally treats them independently might well be seen as a virtue of the process rather than a flaw.

In response to dissatisfaction with the current system, US scientific leaders have, over the years, advanced several proposals for new ways of setting priorities in research. In general, these proposals suggest criteria for comparing research initiatives and levels of support for different fields. They include mechanisms for assembling groups of experts to apply the criteria, and some also recommend ways (generally vague) in which the experts' conclusions are to be transmitted to the administration and Congress and incorporated into the budget process.6

These proposals - which I will not dwell on in detail - share a number of assumptions, all of which are open to some question. First, they all seem to be based on the notion that the scientific community should (and can) take the lead in approaching the problem of priority-setting for research in a unified and systematic way, through deliberations of committees of experts. This assumes a degree of organization, communal spirit and rational behavior that is not apparent in the scientific community - at least in the United States. Beyond this, they assume that policy makers in Congress and the Executive Branch will respond to recommendations from these groups in a rational and consistent manner. This is somewhat akin to suggesting that the research enterprise should be made a trade-off area. Not only am I dubious about the wisdom of this notion, but the idea that the byzantine form and structure of the federal budget process could be overhauled to suit the needs of scientific researchers is, I am afraid, naive in the extreme.

Finally, the proposals reflect a central focus on overall funding for the
research enterprise as the key question in science policy. This point has come under attack recently on several fronts, most prominently the House of Representatives Committee on Science and its former chairman, Congressman George Brown.

Several years ago, Brown convened a 'Task Force on the Health of Research' to look into some of the issues underlying science policy discussions. In its report, the Task Force criticized current discussions of priority-setting. It noted that the idea of choosing among disciplines is an inherently unproductive one, and suggested instead that those who would set priorities for science approach the problem from the perspective of societal or policy goals. Such an approach would require analysts and policy makers to ask two types of questions:

1. For a given national goal (say, energy independence; expansion of the knowledge base), what research is most necessary?
2. What mechanisms for administering, performing and evaluating research create the optimal pathways from research to goal attainment?7

This seemingly sensible notion has stirred considerable controversy in the scientific community. What has been raising the scientists' blood pressure is a fear that Congress is becoming impatient with the productivity of the research system, and that even the friends of science in the political world may be willing to sacrifice some degree of traditional scientific autonomy in favor of a more explicit goal orientation. This fear has been reinforced by some hostility to scientific programs elsewhere in Congress and by pressures from several quarters to move NSF and NIH, traditional bastions of basic research, more directly into activities supporting industry and international economic competitiveness - that is, into 'strategic' rather than 'curiosity-driven' research.

Discussions about priority-setting in US science are not likely to move beyond the level of political rhetoric and academic debate if they continue to focus on the problem of choosing among disciplines and scientific initiatives. An approach that focuses on national goals and develops research priorities in relation to these goals is more consistent with American political traditions and would appear to have more potential for yielding useful results. Nevertheless, adopting a goal-orientated strategy may not be all that easy, particularly if scientists regard the notion of strategic research as a threat to scientific freedom. Are curiosity-driven research and strategic research compatible? I would argue that the two are not mutually exclusive. It is possible to allocate resources strategically and at the same time allow researchers to choose their problems and research approaches according to traditional scientific criteria.

A number of developments have taken place in regard to coordination and
priority-setting among federal research programs during the past several years. Under the Bush Administration (1989-93), a process known as the 'FCCSET process' (for Federal Coordinating Council for Science, Engineering and Technology) was designed and put into place. This process gave special budgetary attention to a number of R&D areas that were regarded as 'strategic' - potentially important to US economic competitiveness and other national goals - for example, advanced materials and processing, multidisciplinary research on the environment, advanced manufacturing, and biotechnology.

When President Clinton took office in 1993, his science advisory staff, headed by John Gibbons, built on this process in developing the National Science and Technology Council (NSTC), to which the President gave a significant budgetary role. NSTC extended the idea of strategic priorities to the overall budget for R&D. It has used an extensive set of committees and subcommittees devoted to specific functional areas (such as transportation, energy and environment, health, fundamental research, etc.) to prepare budgetary guidance for all of the R&D efforts of the executive branch agencies. The idea is to set priorities among R&D programs across agencies and to coordinate programs serving related national goals.

Most recently, in late 1995, a committee of the National Academy of Sciences produced a report calling for even further-reaching changes in the process of allocating funds for federal science and technology programs.8 Among other recommendations, the report suggests that the President submit, in his annual federal budget, a comprehensive budget for science and technology, including government-wide coordination and areas of increased and decreased emphasis. It also suggested that Congress adapt its own fragmented budget processes to the needs of this unique style of budget submission. While Academy reports often carry considerable weight in US science policy, the extent to which this committee's recommendations will be followed is not yet clear.

None of these coordination and priority-setting initiatives is intended to diminish the level of federal funding for curiosity-driven research. Indeed, most are intended to protect it while rechanneling curiosity and providing different levels of opportunity for researchers with interests consistent with the areas of emphasis. Most federal basic research programs, especially at NSF, are made up of numbers of individual investigator-initiated research projects, selected by scientists according to their interests. Taken together, however, these projects may constitute a strategic program by advancing knowledge in an area of special importance to a national goal. The point is that, except in cases where a government agency sponsoring research has a clear expectation of a specific product or outcome in mind, it cannot effectively set the direction of individual projects and it should not seek to do so. Strategy is most effectively applied at higher levels of aggregation, in
choosing among areas to support. Criteria should be applied at the program level and should reflect, as has long been recognized in the scientific community, a balance between scientific opportunity, social relevance, the quality of researchers in the field, and cross-disciplinary impact. There are no hard and fast rules, but it is worth noting that, despite the frequent appearance of controversy among scientists of different disciplines (especially about 'big science' projects), consensus about which fields are really important to invest in tends to develop fairly readily in the scientific community.

The difficulties of setting priorities in US science and technology are highlighted by the political impasse that arose between the Administration and Congress in 1995, following the 'Republican Revolution', in which control of both houses of Congress passed to the Republican party for the first time in 40 years. Congressional Republicans have a set of national priorities very different to that of the Clinton Administration, and the unwillingness of either side to compromise has produced a lengthy budget stalemate in which budgets for science and technology agencies (like those of many other agencies) have been bounced around like footballs. While, by early 1996, it looked as if the principal agencies funding basic research would do reasonably well budgetarily (at least by comparison to other parts of the federal government) for the time being, the process by which their funding levels and their programs are being determined seems even more unsystematic and irrational than in the past. In the larger scheme of US politics and government, science and technology are relatively minor players and the way their priorities are determined can often be affected by much larger political forces outside the control of the science and technology community or even of the leaders of congressional science and technology committees or executive branch science and technology agencies. This is a cost - but in a larger sense, a virtue - of a democratic political system.

Notes and References

1. This chapter is adapted from a paper originally prepared for 'Grant-Giving and Grant Management Procedures and Processes in Comparative Perspective: An International Symposium', Australian Academy of Science, Canberra, Australia, 25-26 July 1993.
2. Vannevar Bush, Science: The Endless Frontier (Washington, DC: US Government Printing Office, 1945; reprinted, 1990).
3. Alvin M. Weinberg, 'Criteria for Scientific Choice', Minerva, Vol. I, No. 2 (Winter 1963), pp. 159-71.
4. Willis H. Shapley, The Budget Process and R&D (Washington, DC: Carnegie Commission on Science, Technology, and Government, 1992), p. 38. The discussion in this section owes much to Shapley's perceptive analysis.
5. Shapley, Budget Process, p. 44.
6. Frank Press, 'The Dilemma of the Golden Age', address to the 125th Annual Meeting
of the National Academy of Sciences, 26 April 1988; National Academy of Sciences, Federal Science and Technology Budget Priorities: New Perspectives and Procedures (Washington, DC: National Academy Press, 1988); John A. Dutton and Lawson Crowe, 'Setting Priorities among Scientific Initiatives', American Scientist, Vol. 76 (November-December 1988), pp. 599-603; US Congress, Committee on Science, Space, and Technology, US House of Representatives, Setting Priorities in Science and Technology (Washington, DC: US Government Printing Office, 1989). See also National Academy of Sciences, Committee on Science, Engineering, and Public Policy, Science, Technology, and the Federal Government: National Goals for a New Era (Washington, DC: National Academy Press, 1993).
7. US Congress, Committee on Science, Space, and Technology, House of Representatives, Report of the Task Force on the Health of Research, 102nd Congress, 2nd session (Washington, DC: US Government Printing Office, 1992), p. 10.
8. Committee on Criteria for Federal Support of Research and Development, Allocating Federal Funds for Science and Technology (Washington, DC: National Academy Press, 1995).
2
THE POLITICAL AND ECONOMIC CONTEXT OF RESEARCH EVALUATION IN EASTERN EUROPE
Gyorgy Darvas
The evaluation of science and the evaluation of scientists are two different things. The evaluation of science involves an assessment of the whole scientific establishment of a given country, including its science policy and research institutions, and its legislation, as well as the country's overall contribution to worldwide science. The evaluation of scientists involves an assessment of individual (or team) performance. The two forms of evaluation are not totally independent of each other, but their methods and goals may diverge.

The first kind of evaluation is concerned with the degree to which governments and national agencies effectively manage science, while the second is concerned with the effectiveness of research units and individual researchers functioning under a particular administration. While the latter evaluation can (also) be carried out within a country or a single institution, the first necessarily involves international comparison. On the other hand, it can be said that an objective measure of the effectiveness of a country's science is the extent to which the administration ensures conditions for fruitful and 'effective' work on the part of its scientists. In this sense, international standards can be used: which country's scientific performance is rated more highly? At the same time, the assessment of what are the proper conditions for effective scientific work differs from country to country and depends partly on the general political and social conditions, as well as living standards and working conditions, in the country concerned.
The Political Environment in which Science is Evaluated in Eastern Europe

In 'post-1990' Eastern Europe, there are three main political tendencies: social-democratic or modern socialist, liberal, and state-capitalist. (This last is represented by conservative national or christian-democratic forces.) The first and the third can be placed along a classical left-right axis, but they more or less agree on the importance of a strong role for the state. The liberal tendency opposes these along a perpendicular axis, striving for minimal state intervention.

In the different countries of Eastern Europe, political parties representing all three forces have held power since the collapse of the previous regimes. At the same time, it should be noted that the overall political situation remains unstable. Voters have not established any long-term preferences. They have no experience of different policies, and their choice of one party or another is determined largely by chance factors. In many cases, they are simply voting for change. There are frequent shifts in public opinion, and there is widespread disappointment with certain policies. The parties themselves are still seeking their place in the political spectrum. Science policy does not play a decisive role in obtaining votes.

In such circumstances, science, by which I mean its representatives, is seeking its place in the new political system. Given the lack of priority accorded to science policy and the deterioration of research conditions, scientists are trying to find their own solutions. Seeing little prospect of assistance from the state, and starting from the internal interests of science itself, scientists are striving to acquire more autonomy and are interested, in the long run, in liberal policies. In the short run, until real market conditions have been established, science needs strong state involvement and support. I am not concerned here with assessing the extent to which this movement has been successful, but I want to underline the fact that this is the main reason why we now need consistent self-evaluation of science.

Within the East European scientific community, the demand for an objective evaluation of science and the presentation of the results of this evaluation to the public has never been as strong as it is now. Why? Because science has to justify itself.

Science in Eastern Europe was never rich. Nevertheless, although neither total nor per capita expenditure was particularly high, science was relatively better off than other sectors of the economy. The percentage of gross domestic product allocated to R&D was higher than in the industrially developed countries of the West. While the proportion of students in the 18-23 age group was lower than in the developed countries, the proportion of the population employed as university staff was not lower. Moreover, many independent researchers worked in research institutes outside the universities. Scientists and engineers had greater opportunities to be
included in the international division of labor than did any other social group. This relative advantage had a positive impact in terms of relatively higher effectiveness: indices of scientific performance were higher than those in the economy. During the 1980s, Hungary, for example, occupied between 24th and 27th places in the ranking of world economies in terms of GDP per capita, while it ranked from 16th to 18th in terms of scientific publications and citations.

During the economic crisis that arose following the political changes of the early 1990s, the new governments found themselves obliged to reduce the scope of the central budget and to reduce expenditure on certain budgetary items. The easiest solution was to cut expenditure on those items where government spending was above the international average. The victim (along with social policy) is science. In all countries of the region, there has been a decline in both government expenditure on science and the number of people working in R&D. In Hungary, for example, the total number of people employed in R&D declined by 31 per cent (from 35,069 to 24,192) between 1988 and 1992. In 1988, R&D expenditure amounted to 2.28 per cent of GDP; by 1992, this figure had dropped to 1.13 per cent.1 It seems that the newly formed governments in the region did not learn from the experience of OECD countries, which shows that an increase in GDP during the early 1980s was preceded (by 3-4 years) by an increase in government R&D expenditure (with a correlation of about +0.3).2

Many scientists argue that maintaining the earlier level of research and development expenditure offers the greatest prospect of overcoming the economic crisis; that society will be repaid for higher expenditure by higher performance; that it is not the case that the number of university staff is too high, but that student enrollment is too low; and that we should not engage in a process of 'leveling downwards'. If we previously had a scientific establishment that was relatively more highly developed than that of the West, we should retain this advantage while developing other sectors to the same level so as to ensure the subsequent improvement of 'per researcher' indices, and so on. This argument needs proof. Proof could be provided by evaluation. In order to present an argument involving examples taken from the developed countries, we need to adopt their methods. This is how the scientific community itself initiated the process of self-evaluation.

The reduction in government funding for science during recent years has coincided with a crisis in industrial enterprises, which are no longer able to place orders for contract research with R&D units. Research and development are thus more dependent on the state budget than they were under the previous regime. In debates on the budget, some members of parliament in Hungary have demanded that the government throw a lifebelt to science, at least temporarily. Unfortunately, pro-science rhetoric is not always matched
by pro-science votes when the time comes to allocate dwindling state resources. The parliamentary and public debates have been similar to, if not exact repetitions of, the debate over the famous Vannevar Bush report to the US Congress after World War II: should society give a blank check to science, or should science be required to render a strict account of how effectively it has used public funds? The answers are not as unanimous as one might expect, given the outcome of the debate in the world's leading scientific power nearly half a century ago. In Hungary, for example, there is a strong public demand for science to account for its use of funding.

While Hungarian political parties have very different programs, they do not differ essentially in their handling of science policy. None of them gives priority to this sphere. Science policy is not a means to win votes at the next election. The Hungarian parliament, for example, has no specialized committee for science, which is one of several areas assigned to a committee that deals with broader cultural issues. Although science could provide solutions to long-term social problems, policy makers concentrate on immediate, short-term problems. No one on the political stage plans further ahead than a single parliamentary term. This approach coincides with the demands of the public, which is impatient and interested, above all, in solving short-term problems; science policy is beyond the horizon of the broader public. Therefore, while parliament has discussed and passed a substantial amount of legislation, the main principles and goals of national science policy were adopted by the government in 1993, on the recommendation of its Science Policy Committee and without parliamentary debate.

The government's science policy document sets out, among other things, the five basic priority areas of scientific research. These priorities were discussed with representatives of the main political parties and with a relatively small group of specialists. Participants in these discussions did not, however, include such a broad spectrum of society or of the scientific community as was usually the case under previous governments. The opinion of the broader community is now supposedly represented by democratically elected deputies. Formally, this is the case, but in practice, fewer people were actually consulted before the decisions were made. I can add that the picture is gloomy, despite the fact that several former researchers are members of the new political elite. Those social scientists who played a major role in preparing the ground for subsequent political change are rarely consulted by the new policy makers. On the other hand, the opinion of the international scientific community has been heard and taken into account to a much greater extent than ever before. It should be noted that in Hungary and elsewhere in Eastern Europe, reports compiled by OECD experts have had a major influence on science and technology policy legislation. In the case of Hungary, the OECD noted:
The scientific community has a long tradition of assessing the quality of research results and of people and institutions who produce them. Evaluation of individual performance is intrinsic to science. Its practice, mainly through peer review, is now widespread and widely accepted. This method relies heavily on international and independent referees and now incorporates such quantitative indicators as citation and co-citation analysis.

The report gave examples of the evaluation procedures used in several countries of Western Europe and went on to state:

These schemes can serve as models for establishing a well-structured and efficient framework for science evaluation in Hungary. For the time being, the priorities should be put on evaluation of R&D institutes and units.3

In 1992, World Bank experts formulated recommendations for national science policy in Hungary at the invitation of the State Committee for Technological Development. After an evaluation which the Hungarian Academy of Sciences conducted of its research institutes (and similar evaluations of other R&D networks carried out by several ministries), there emerged the need to involve foreign experts and organizations in the evaluation process. (Peter Zilahy and Istvan Lang discuss these evaluations in Chapter 7 of this volume.) Although several international surveys and studies of Hungarian R&D had already been carried out, in 1992 the president of the Hungarian Academy of Sciences decided to invite the leading officials of the International Council of Scientific Unions to take part in an objective evaluation of basic research.
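The figures cited earlier in this section lend themselves to simple arithmetic checks. The sketch below is purely illustrative: the staff figures are those quoted above, but the annual growth series, the three-year lag and the home-made correlation routine are hypothetical assumptions made for the example, not the OECD data underlying note 2.

```python
# Illustrative check of two figures discussed above. The staff numbers are
# those quoted in the text; the growth series are hypothetical stand-ins
# for the OECD data referred to in note 2.

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# 1. Decline in Hungarian R&D employment, 1988-1992 (figures from the text).
staff_1988, staff_1992 = 35_069, 24_192
decline = 1 - staff_1992 / staff_1988
print(f"decline in R&D employment: {decline:.0%}")   # roughly 31 per cent

# 2. Lagged correlation: R&D spending growth leading GDP growth by ~3 years.
rd_growth  = [4.0, 3.5, 2.0, 1.0, 2.5, 3.0, 1.5, 2.0, 2.5, 3.0]  # hypothetical
gdp_growth = [2.0, 1.8, 2.2, 2.6, 2.1, 1.5, 1.2, 1.9, 2.3, 2.7]  # hypothetical
lag = 3
r = pearson(rd_growth[:-lag], gdp_growth[lag:])
print(f"correlation of R&D growth with GDP growth {lag} years later: {r:+.2f}")
```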
The Environment of the Evaluation of Scientists

Competition for financial support of research began to spread in Eastern Europe during the 1980s. Researchers had to get used to the fact that their parent organization would no longer be able to provide all the resources needed for their work and that they would have to apply elsewhere for funding. Applications were evaluated and 'measured' during the process. At the same time, the scientific community had to adapt itself to the requirements of evaluation. The methods of evaluation were new for this community, but since the early 1980s all the methods used in West European countries have been adopted in Hungary and elsewhere in Eastern Europe. The number of funding agencies and foundations, both domestic and foreign, has increased, and researchers have also acquired opportunities to apply for support abroad. Recognizing the increasing importance of research evaluation, in January 1987 the Hungarian government's Science Policy
Committee discussed a detailed report prepared by the Hungarian Academy of Sciences on the state of the art of evaluation. A permanent working commission, headed by the Secretary General of the Academy, has regularly reviewed the status of evaluation since that time. I should emphasize that the use of quantitative methods in evaluating research goes back to the mid-1970s in Hungarian universities and basic research institutes. The first scientific research unit (headed by Tibor Braun, also editor of Scientometrics) on bibliometrics and the analysis of quantitative research information in Eastern Europe was formed within the Central Library of the Hungarian Academy of Sciences in the mid-1970s.

National research foundations have been established in most countries of Eastern Europe. In Hungary, the first grant-awarding bodies, established in the mid-1980s, were the National Scientific Research Fund (OTKA), initiated by the Hungarian Academy of Sciences, and the Soros Foundation. OTKA is a public body, while the Soros Foundation is a private agency with its own criteria. With the establishment of OTKA, there arose the need to develop methods of evaluation. Since OTKA is a nationwide research fund, in principle, the whole scientific community is involved. I do not want now to give a detailed description of the experience of the OTKA evaluation process, since this is discussed elsewhere in this volume. I would, though, like to mention some problems that characterize the current period in Hungary, and, to some extent, the other countries of Eastern Europe. Let me cite here the opinion of one analyst, which is nevertheless very characteristic:

OTKA has largely failed to manage the selection of research projects, although this was its main objective. There is a good reason for this. OTKA was introduced in an effort to compensate for a drastic budget cut in 1983. Since that time, the threat of cuts has always been present, and in the last three years, drastic cuts have been made. As a result, grants have come to constitute not just a source of additional support for research projects but a source of basic funding, necessary for the survival of research institutes. Although the department head is supposed to carry out a preliminary quality control of the applications submitted by his subordinates, he is anxious to pursue any possibility of gaining overhead for his institute. At the same time, a small country has a small scientific community. The resulting interdependence and informal relations make it very difficult to engage in objective evaluation. Everyone knows everyone else in the peer review system. Decisions are taken in disciplinary committees where the pattern of sharing resources within the institute, characteristic of the previous regime, sometimes reasserts itself. Year by year, no one is deprived of resources and no one improves his position. Nearly every applicant gets a small sum for living expenses (for himself and his institute), which is not enough to carry
out the planned project. In this way, OTKA performs an equalizing rather than a selection function. The grant system has thus also strengthened the atomization process brought about by declining funds and growing self-financing.4

Problems of this kind can be attributed, in part, to the lack of long-term experience with evaluation procedures. The lack of established ethics governing participation and practice in the evaluation process can also be regarded as an inevitable phenomenon at this stage. But some special problems arise from the fact that we have a small scientific community. In the relatively small nation states of Eastern Europe, as noted above, the members of a given scientific community all know each other. In most cases, anonymous evaluation is impossible, and the influence of lobbies or interest groups cannot be eliminated or counterbalanced. Most of those who are evaluating grant applications are themselves applying for funding in the same round. This has led the evaluation process in two directions. First, there is a tendency to give greater weight to the 'more objective' quantitative methods, such as scientometric methods, than to 'subjective' peer review; second, there is a trend towards involving international experts in the evaluation process. The first approach cannot solve the problem. At the same time, there are limits to the applicability of quantitative methods: they cannot be used on their own without reference to other methods, and their applicability varies considerably in different disciplines. As far as the internationalization of the evaluation process is concerned, this is limited by the low level of knowledge of foreign languages on the part of researchers in the region, and by the preference for the use of national languages in scientific life. One can cite excellent examples of the use of these and other methods in Hungary and elsewhere, but the fact is that, regardless of how many methods are used, the scientific community continues to debate the relevance of given methods in certain fields of science, and the incommensurability of the same methods in different disciplines. Criticism of evaluation methods did not arise among R&D managers, but among scientists themselves, who have voiced doubts regarding the 'objectivity' of the evaluation. The evaluation of interdisciplinary proposals presents particular problems that have yet to be resolved.

Since applications from different fields of science or disciplines cannot be compared, the research funds available for distribution by individual review panels must be allocated in advance. Decisions concerning the allocation of these large sums are not taken at the level of individual research applications. They involve questions of priorities, not evaluation. Priority-setting is quite different from the evaluation of individual research proposals, and demands different methods (see Chapter 7 for an account of the different
methods used at different levels). As I see it, less emphasis has been given to the methodology of this problem than to evaluation. This medium-level priority-setting differs somewhat from nationwide priority-setting. It depends on how the scientific community and the relevant level of research management evaluate the performance of the whole community of a given discipline or scientific field, and on how much priority is given to the future development of that field. While the evaluation of current performance gives priority to scientific considerations, decisions regarding future development involve political criteria. As this process takes place on the second (that is, medium) level of the management of science, these two aspects are naturally combined. (As described in Chapter 7, at the medium level, ex-ante evaluation is confined to the cost-benefit method, which is relevant only in the case of applied research but not that of fundamental research.) I cannot estimate the relative weight accorded to scientific criteria and political factors in such decision making. These decisions are made in much narrower milieux and by less developed methodology than is the case with the evaluation of specific grant applications.

Another problem, it seems to me, is that there are contradictory elements in the establishment of evaluation criteria. These contradictions open up space for the most subjective judgments. One cannot simultaneously give priority to children, retirees and the working population; or to men and women. These are simplified examples, and the formulation of contradictory criteria in the statutes of individual foundations is more differentiated. For example, applicants are expected to submit proposals for new topics, but they have to prove their former productivity (probably not in the same field); priority is given to young researchers and to those having the highest citation indices, etc. (It should be noted that OTKA is currently considering the OECD suggestion that it establish a special fund for young scientists.)

There are no objective criteria to measure the effectiveness of the evaluation process itself. Each system can be compared with its own earlier state. In this respect, all the competitive systems in Hungary, for example, have made large steps forward. Who can tell what would be optimal? No optimal systems exist, even in the West. There are only better ones. In Eastern Europe, we can aim to follow the good examples and adapt them to our own specific conditions.

As I see it, the evaluation methods used in Hungary do not distinguish sufficiently between the procedures appropriate to different types of research - for example, fundamental and applied research. Many criticisms advanced by researchers can be reduced to this problem. Let me recall here the opinion of the Nobel Prize-winning Canadian chemist John Polanyi, voiced at a World Bank conference on science and technology financing at Beijing in October 1992:
Researchers in basic science are able to make great scientific discoveries if, and only if, they have freedom to choose the object of their research. Nature is resistant to our interest, and gives responses to our questions if we put them in a clever way. Therefore we cannot leave the choice of our research themes to governments or science committees, who decide on the basis of lists of national scientific priorities. Does the right of free choice of research topics mean that each scientist is guaranteed a 'meal ticket'? Not at all, because scientists must satisfy high performance requirements. Nevertheless, these requirements must be the requirements of the basic sciences and their fulfillment must be judged by appropriate persons. . . . Basic and applied research must be distinguished, and controlled in different ways. Here one can face a danger. This danger is to accept an elegant but incorrect sequence of thinking. Policy, quite correctly, strives to gain the sympathy of people. Applied research can easily gain the sympathy of people, because it can directly influence their lives. Thus, naturally, the political leadership emphasizes the necessity of better applied research to improve economic competitiveness. Basic research is then justified as being a precondition of applied research. The incorrect and dangerous conclusion is that basic sciences should be made dependent on the needs of applied research. . . . As there is a time lag of one to three decades between a fundamental result and the new technology originating from it, we are deluding ourselves if we believe that we can choose those basic researches that will lead to the desired applied research results.5

In setting R&D priorities, policy makers in Eastern Europe should bear in mind Polanyi's cautionary remarks on the relationship between basic and applied research. While it is reasonable, both scientifically and politically, to link research to national policy goals, it would be short-sighted to invest in applied research at the expense of basic research. Experience demonstrates that it is impossible to predict which fields of science will lead to new technologies, and history provides many examples of basic research that produced practical applications in areas unrelated to the original work. There is no simple formula to determine the distribution of funds for basic and applied research, but it is clear that investment in the generation of new knowledge - basic research - is a prerequisite for a strong scientific infrastructure that can support technological development in the service of national goals.
Notes and References

1 Gyorgy Darvas et al., 'Transformation of the Science and Technological Development System in Hungary', in R. Mayntz, U. Schimank and P. Weingart (eds.), Transformation mittel- und osteuropaeischer Wissenschaftssysteme (Länderberichte) (Opladen: Leske und Budrich, 1995), pp. 853-976.
2 Author's calculations based on data published in OECD Science and Technology Indicators. No. 2: R&D, Invention and Competitiveness (OECD, Paris, 1986), and OECD economic indicators cited in Kozgazdasagi Szemle, 1987.
3 OECD Report on Science and Technology in Hungary, DSTI/STP/92.5, pp. 3-34.
4 Katalin Balazs, unpublished manuscript, 1993.
5 OTKA Hirlevel, 1993, No. 2, pp. 1-2 (in Hungarian).
3

FACTORS AFFECTING THE ACCEPTANCE OF EVALUATION RESULTS

Ben R. Martin
The field of research evaluation is still in its infancy. It has certainly yet to achieve the status of a 'science,' and there must be doubts as to whether it ever will, given the nature of the field. Whether the findings of a research evaluation are accepted depends partly on technical factors - for example, the rigor of the methodology and the reproducibility of the results - factors upon which the conventional scientific approach can be brought to bear. However, as this chapter1 will demonstrate, non-technical factors are also important. They include the political context in which the evaluation is carried out; the relationship between the evaluator, those being evaluated and (where relevant) the agency commissioning the evaluation; and how the evaluation results are presented and disseminated. Much 'craft' knowledge is involved in determining how best to deal with these factors. Such knowledge is often largely tacit; much of it cannot be readily formalized, linked to theoretical models or set down in standard textbooks - at least not at present. There is therefore little literature on the subject. (Among the few researchers to have addressed the issue are Montigny2 and Luukkonen and Stahle.3)

In what follows, I shall examine the importance of various factors in determining the acceptance of evaluation results by looking at three evaluations carried out by John Irvine and myself. However, before that, I will first briefly consider two issues central to research evaluation: how to construct valid and reliable indicators; and how to ensure that evaluations yield results relevant to science policy.

The three evaluations examined here are (i) the original 'big science' evaluation at SPRU (the Science Policy Research Unit at the University of
Sussex); (ii) the assessment of NTNF, the Norwegian Council for Applied Research; and (iii) an ill-fated project on the European Community's steel research program. All three projects were conducted during the late 1970s and early 1980s. At that stage, it was becoming clear to scientists as well as policy makers that a more systematic approach to monitoring and evaluating research was called for. There were several reasons. One was the emergence of funding constraints following a substantial reduction in the growth rate of national science budgets in many industrialized countries compared with previous decades. With this came the recognition that, in order to find the resources to support new areas of research or young scientists, it might be necessary to cut back first on existing research commitments. Secondly, problems were beginning to appear with the traditional peer review system for allocating resources among scientists. The system had generally worked well during the 1950s and 1960s when funds were growing rapidly, but it was now proving less effective. This was especially true of 'big science', where it was becoming increasingly difficult to find neutral peers.4 Furthermore, peer review sometimes exhibited a tendency towards the reproduction of past priorities, to the disadvantage of smaller, newer or interdisciplinary research fields.5 Lastly, at the end of the 1970s, one began to hear calls from politicians for increased public accountability wherever government expenditure was involved. In the case of research, accountability to one's scientific peers was no longer sufficient.6

Research evaluations are one possible response to these emerging problems. The aim of such evaluations is to determine, in as systematic and transparent a manner as possible, which research activities have been conducted more successfully than others.7 The results of these evaluations can then be fed into the peer review process which, I would argue, should remain at the heart of decision making in science.8 Armed with this information, policy makers and funding agencies will, one hopes,9 make more effective decisions on the distribution of resources among competing alternatives than they would in the absence of evaluations.
How to Construct Valid and Reliable Indicators for Evaluation

In order to ensure that the indicators chosen for an evaluation are valid, the starting point for any research evaluation is to identify the main type of output from the research being assessed and the primary audience for that output.10 One must then attempt to devise indicators to 'capture' that output and the impact of the output on the audience.11 For example, for basic research, the main output is contributions to scientific knowledge, generally encapsulated in a journal article. The primary audience for such outputs is the scientific community. In this case, the number of articles published in
international learned journals gives some indication of the volume of output from a basic science group. As for the impact of the work on the scientific community, this is generally acknowledged in the references listed in subsequent literature. Hence, the number of citations to the publications of a basic research group should give an approximate indication of their impact on the scientific community. However, for applied science or engineering, researchers may see their main output as a new product or process and their principal audience as companies. In such circumstances, numbers of publications and citations would be largely irrelevant.

If we turn to the question of the reliability of indicators, we should note first that there are no perfectly reliable measures of research performance. At best, there may be a number of imperfect or 'partial' indicators, each capturing a different aspect of research performance (though with varying degrees of success). For example, while the number of publications produced by a group gives some idea of the total volume of output and the citation total an approximate indication of the impact, the number of highly cited papers relates to the group's success in making major advances or 'discoveries', and the number of citations per paper is a size-adjusted measure of the average impact of the group's papers.

Because there is no perfect measure of research performance, one cannot assess an individual research unit (such as a department, laboratory or group) in isolation. However, one can make comparisons between research units, though with the proviso that one can only legitimately compare 'like with like'. Furthermore, since all indicators are imperfect, the most effective approach will often involve using a range of them and seeing if the results are consistent or if they 'converge'. For example, for basic research, one might use a combination of several bibliometric indicators and peer evaluation data. For applied research (where bibliometric indicators are often inappropriate), a better approach might be to combine peer review with 'customer review' - indicators based on the assessments of potential customers for the research results (for example, in industry). However, in both cases, such approaches work best when applied to groups of researchers (as opposed to individuals) and when those groups are roughly 'matched' - that is, when they work in the same field, are similar in size and funding levels, publish in the same set of journals, and so on.
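To make the notion of 'partial' indicators concrete, the sketch below computes the four measures just described from per-paper citation counts for two roughly matched groups. It is a minimal illustration under assumed inputs (the citation counts and the threshold for a 'highly cited' paper are invented for the example), not a description of the indicators actually compiled in the SPRU studies.

```python
# Minimal sketch: four 'partial' indicators for a research group, computed
# from per-paper citation counts. The data and the 'highly cited' threshold
# are hypothetical illustrations.

def partial_indicators(citations_per_paper, highly_cited_threshold=25):
    papers = len(citations_per_paper)            # volume of output
    total_citations = sum(citations_per_paper)   # overall impact
    highly_cited = sum(1 for c in citations_per_paper
                       if c >= highly_cited_threshold)  # major advances
    avg_impact = (total_citations / papers) if papers else 0.0  # size-adjusted
    return {
        "papers": papers,
        "total_citations": total_citations,
        "highly_cited_papers": highly_cited,
        "citations_per_paper": round(avg_impact, 2),
    }

# Two roughly 'matched' groups in the same field (illustrative data).
group_a = [3, 0, 12, 45, 7, 2, 30, 5]
group_b = [1, 4, 9, 2, 0, 6]

for name, group in [("Group A", group_a), ("Group B", group_b)]:
    print(name, partial_indicators(group))
```

Even in this toy form, the chapter's point stands: each number captures a different aspect of performance, and none is meaningful except in comparison with a matched group.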
Ensuring That Evaluations Yield Results Relevant to Science Policy

Before devising a suitable methodology for an evaluation, one must first consider who is the most important audience for the results. If the prime purpose of the evaluation is to inform science policy making, then the
conditions to be met by the evaluation methodology and any indicators are rather different than when, say, sociologists of science are the end-users. One can identify a number of 'boundary conditions' on evaluations where the results are to be used for policy purposes:12

1 The approach adopted must be capable of producing information that addresses specific problems facing science policy makers and funding agencies (often concerning the distribution of resources).
2 The techniques should be sufficiently flexible to allow one to focus on any field of research. This requires being able to pinpoint reasonably accurately the boundaries of a given field.
3 It is often desirable to be able to draw international comparisons. Any data used should therefore be relatively free from cross-national bias, or, if biases exist, their approximate magnitude should be known so that the effects can be allowed for.
4 The data should permit significant trends to be identified. Data extending over several years are required to distinguish genuine trends from random fluctuations.
5 One should be able to disaggregate the indicators to focus on individual institutes, departments or groups, because these are often the unit of analysis for funding decisions.
6 The approach must be relatively inexpensive - the costs of monitoring and evaluating a given field should be much less than expenditure on research in that field.
7 The approach must be capable of being used routinely if it is to provide research performance data on a regular basis to supplement the peer review process.
8 The adopted approach must yield results in a publicly accessible and understandable form so that they can be validated by scientists and fed into the policy making process.

The above boundary conditions helped to determine the evaluation methodology in the three SPRU evaluation studies described in the following sections.
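These conditions lend themselves to being applied as an explicit checklist against any proposed evaluation design. The sketch below is one hypothetical way of doing so; the field names, the example figures and the use of the 1 per cent cost rule of thumb mentioned later in this chapter are assumptions made for illustration, not part of the SPRU methodology itself.

```python
# Hypothetical checklist loosely following the eight boundary conditions above.
proposed_design = {
    "addresses_policy_question": True,    # 1. speaks to a concrete funding problem
    "field_boundaries_defined": True,     # 2. field can be pinpointed
    "cross_national_bias_known": False,   # 3. bias not yet estimated
    "years_of_data": 8,                   # 4. enough years to separate trends
    "unit_of_analysis": "group",          # 5. disaggregated to the funding unit
    "evaluation_cost": 50_000,            # 6. cost of the exercise
    "field_research_budget": 10_000_000,  #    spend on the field being assessed
    "repeatable_routinely": True,         # 7. can be run regularly
    "publicly_reportable": True,          # 8. results can be published and checked
}

def check_boundary_conditions(d):
    issues = []
    if not d["addresses_policy_question"]:
        issues.append("does not address a specific policy problem")
    if not d["field_boundaries_defined"]:
        issues.append("field boundaries not pinpointed")
    if not d["cross_national_bias_known"]:
        issues.append("cross-national bias not estimated")
    if d["years_of_data"] < 5:
        issues.append("too few years to separate trends from fluctuations")
    if d["unit_of_analysis"] not in ("institute", "department", "group"):
        issues.append("indicators not disaggregated to the funding unit")
    if d["evaluation_cost"] > 0.01 * d["field_research_budget"]:
        issues.append("evaluation costs more than ~1% of the research assessed")
    if not d["repeatable_routinely"]:
        issues.append("cannot be used routinely")
    if not d["publicly_reportable"]:
        issues.append("results not publicly accessible")
    return issues

print(check_boundary_conditions(proposed_design) or "all boundary conditions met")
```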
The 'Big Science' Project

Work on research evaluation began at the Science Policy Research Unit (SPRU) in 1978. At that stage, concern over science policy was beginning to surface in Britain, not least because of the financial difficulties faced by the government. In order to determine what priority should be given to scientific spending compared with other government responsibilities such as education
and health, it was desirable to establish whether research funds were being well spent and what benefits were accruing. Since a key feature of Britain's expenditure on research was a heavy concentration on a few large scientific laboratories,13 the obvious starting point for any evaluation was 'big science'.

The objective of the first SPRU evaluation was to assess the scientific, technological and educational outputs from five 'big science' centers in the UK. Because the centers were all engaged in research of a very basic nature (radio astronomy, optical astronomy and particle physics), most effort was devoted to assessing the scientific outputs, and it is this aspect of the evaluation that we shall focus on here.14 The methodology involved the combined use of several bibliometric indicators together with extensive peer evaluation based on detailed interviews.

The initial response of scientists to our study and to the notion of evaluation was somewhat contradictory. According to some, it was impossible to evaluate basic science - the end-product is too intangible and diffuse. Others argued that evaluation is certainly possible, but that only insiders (that is, scientific peers) could evaluate research. In their view, outsiders (and especially social scientists) could not tell them anything that they did not already know about research performance in their area. Undaunted, we ignored these warnings that our task was doomed! However, they did alert us to the need to devote considerable effort to convincing scientists and others why evaluations are required and that they are certainly not impossible.

What were the reasons for that early opposition? Partly underlying them was the ideology that science should be 'free' - free from external pressures of all forms. Society should leave it entirely to scientists to decide on the distribution of funds and to monitor how scientists use those resources. For some holding this view, the whole notion of evaluation is misguided because science is an activity characterized by an essential serendipity where the great bulk of experiments and theoretical hypotheses are doomed to 'fail'. These 'failures' are not just unpredictable but also essential to scientific progress. A second reason for the doubts of at least some scientists was that prior work on research evaluation had caused considerable controversy.15 One task that we faced was to distance our study from earlier attempts at evaluation, demonstrating how the problems that had beset previous methodological approaches could be overcome.

To do this, we adopted a twofold approach. The first part involved not just developing the necessary methodological tools for evaluation but also clarifying the fundamental concepts involved. In our first substantial publication on assessing basic research,16 we devoted almost half the paper to this task. Secondly, we recognized the importance of interviewing large numbers of scientists (approximately 200 in this particular study), not only to elicit
information from them (such as their assessment of the performance of the main laboratories in their field), but also to discuss their objections to evaluation, to remove any misunderstandings and perhaps to answer some of their criticisms.

From our interviews and other discussions, it was clear that the results of this study would prove highly controversial. How then could we ensure the greatest level of acceptance? One step was to adopt the policy of sending successive drafts of our paper to a gradually widening circle of people, thereby steadily improving the paper while not immediately exposing it to the most hostile critics. We thus had a chance to identify weaknesses in the analysis and likely lines of attack, and were able to prepare our defenses to forestall the most likely criticisms.

After much correspondence and rewriting, we submitted the manuscript to a journal. The next we knew, we were being threatened with legal action by a certain laboratory director! Later, we discovered that his first response had been to contact the Vice-Chancellor of Sussex University, asking him to get us to withdraw the paper. To his credit, our Vice-Chancellor refused on principle to meet this request; no pressure was brought to bear on us and indeed we only learnt of the incident later. Next, the publishers of the journal to which the paper had been submitted received a solicitor's letter. This stated, somewhat mysteriously, that, although the journal was published in The Netherlands, it was still subject to British libel laws. However, it gave no indication what might be the possible grounds for a libel action. The publishers were astonished. They sought advice from a leading libel barrister, who requested a few minor changes to the wording.17 Eventually, the article was published, three years after it had been accepted,18 and to our relief no libel action was mounted.

What was the wider response to this first evaluation? There were several forms of reaction. From a few senior scientists, there was outright hostility. For them, evaluations were a threat to the very autonomy of science - an autonomy vital to its health. A more common response, however, was skepticism. This took two mutually contradictory forms. Some focused on the methodology, claiming this was inappropriate and the conclusions therefore unsound. Others were more concerned with the results. According to this view, the methodology might be acceptable but the results told the scientific community nothing that they did not already know - in which case, why bother with all the effort of compiling bibliometric indicators and conducting large numbers of interviews?

Outside the 'big science' community, there was another reaction, one of surprise: surprise that an evaluation of a very basic area of science had proved possible; and even greater surprise at some of the results. For example, the laboratory where the director was so upset (let us call it Center A) had always been in the public eye.19 Our results indicated that, although
Center A had been a world-leader during the 1950s and 1960s, it had since been overtaken by several overseas laboratories. In contrast, the other British laboratory in the field (Center B) was still among the three world leaders.20 Even so, Center A continued to be perceived by virtually everyone outside the field as at least the equal of Center B. It had 50 per cent more staff than Center B and received 50 per cent more funding. Yet the indicators together with the results of our interviews with scientists in the field suggested that Center A was now well behind Center B in terms of scientific contributions over the 1970s.21 This came as a surprise to policy makers at the relevant Research Council22 and even to scientists in adjacent fields.

The early reactions to our results, while not completely unanticipated, were certainly more intense than expected. With hindsight, we had perhaps not fully appreciated the wider political context in which the study was carried out. In view of the deep-rooted skepticism, we clearly had to devote much effort to disseminating the results and discussing them with critics. This involved giving a large number of conference papers, seminar presentations and more popular lectures. The results were also published in various journals aimed at different audiences. In addition, we tried to ensure wide coverage of the findings in the scientific and popular press. The result was that many scientists as well as policy makers gradually began to see that quantitative evaluations could play a useful role in supplementing conventional peer review procedures.23 The lesson is that, if the conclusions of an evaluation are to be accepted, one needs to go to some lengths to ensure that the results reach all the interested parties.

The NTNF Study

The 'big science' project revealed a great deal about factors governing the acceptance of evaluation results, such as the relationship between evaluators and those being assessed, and that between evaluation and wider policy issues. We were therefore better prepared for our next evaluation, which dealt with the mechanisms used in Norway for supporting industrial R&D and in particular the Royal Norwegian Council for Applied Research, NTNF.

NTNF is the main organization responsible for the support of applied R&D in Norway. Since its establishment 30 years earlier, it had had the same director. During this period, one strength had been its great flexibility. Whenever a new technological need arose, a new institute or committee was set up. By 1980, there were some 30 institutes (a large number for a country with Norway's population) and a similar number of NTNF committees. With a new director taking over, questions were being asked about whether NTNF had become too cumbersome. In addition, there were worries about the
longer-term competitiveness of Norwegian industry, with a recognition that greater efforts were required to ensure that local firms harnessed the potential of new technology. These concerns led to the setting up of the 'Thulin' Royal Commission to evaluate government policy for supporting industrial R&D. The time-scale for the Commission's inquiry was quite short and SPRU was commissioned to produce a report within five months.

Since we could not look at all industrially related research, we focused on two fields: mechanical engineering, a more traditional research area; and electrical/electronic engineering, a newer and rapidly developing field. In this way, we hoped that our findings might be generalizable to technology and industry as a whole. We also concentrated on research in just four institutes, these being carefully chosen to reflect different ways of organizing applied research and interacting with industry. They included the two largest NTNF institutes (one a free-standing institute with little interaction with the nearby university, the other an integral part of a technical university), a smaller regional research institute (whose primary task was to meet the needs of local industry) and the Norwegian Defence Research Establishment.

Given our experiences with the 'big science' project, the first step was to explore and map out the political context of the evaluation, identifying the main actors and their respective interests. This proved fairly easy, not least because of the great openness encountered in Norway.24 We were thus quickly able to determine the principal issues on which the evaluation should focus and the questions to ask in interviews. The methodology we adopted involved a combination of peer review (that is, what scientists thought of the work in which they and rival institutions were engaged) and 'customer review' (that is, the views of actual or potential 'customers' for the research results, in particular those working in industry). We also analyzed relevant statistical material (for example, on the distribution of grants).25 However, the main task was to carry out some 200 interviews, half with scientists at the four institutes and half with industrial research managers.

The Thulin Commission seemed well pleased with our findings,26 many of which they incorporated in their report.27 This in turn formed the basis of a White Paper, which eventually brought about some of the changes we had proposed. This was a gratifying impact for a commissioned study.28 What were the reasons for the high level of acceptance?

A very important factor was the extensive interviewing.29 When we planned the interviews, one aim had been to identify and trade on our comparative advantages. In persuading scientists and industrialists to give up their time to be interviewed, we had presented this as a unique opportunity to put their views, and any criticisms, to a high-level committee without
any risk that their responses might subsequently be used against them. Many leapt at the opportunity. Furthermore, by promising to treat interviewees' comments confidentially (they were analyzed statistically, and where any quotes were used, there was no attribution to named individuals), we found we were more likely to obtain people's 'real' views than glib 'public relations' answers.

Another comparative advantage was our status as independent outsiders. If the study had been carried out by a Norwegian group, they would doubtless have been seen as representing some vested interest. For example, if the evaluation had been conducted by a university group, any recommendation of a larger role for universities would have been greeted with suspicion. Likewise, if an NTNF institute had been responsible, their suggestions concerning the institutes might have been seen as special pleading. SPRU, by contrast, had no obvious axe to grind, and it was therefore harder to dismiss our conclusions as tainted by vested interest. However, the notion of acting as independent experts should not be taken too far. In the report, we stressed that the conclusions were not so much our own as a synthesis of views expressed to us. We merely acted as a 'midwife' for criticisms and proposals for change that were already latent among those interviewed. Each point was also backed up with illustrative quotations to demonstrate that the idea had not originated with us.30

The final factor aiding the acceptance of the evaluation results was that Norway is very small in population terms. During the study, we were able to talk to a large proportion of the key actors. This afforded us the chance to try out different views and possible changes and to obtain feedback. The interviews also set people talking; in certain cases, they may have had a catalytic role in reshaping people's views. Consequently, the report's recommendations came as no great surprise, though some were quite radical (for example, the idea of separating the institutes from NTNF). The groundwork for acceptance had already been laid. This highlights the crucial importance of interviews in evaluations. It can be dangerous for evaluators to rely solely on indicators or even questionnaires (where there is less scope for feedback and follow-up questions). Such an approach may lead evaluators to misunderstand the complexities of the situation, and certainly the audience will be far less prepared for the eventual conclusions. Interviews encourage a sense of participating in discussions on key issues, with the result that those involved are more likely to be committed to the results.31

Although our report was generally well received, there was a problem with one institute. From the start, we had sensed a reluctance to take part in the study. Whereas another laboratory had made available every facility, this institute had left us to fend for ourselves.32 The institute had been very successful in the 1950s and 1960s (for example, it was central in the introduction of CAD/CAM to shipbuilding). However, since that time, its
problems had been growing. One could detect a process of 'institutional ageing' - not just the ageing of the staff (many of whom had been recruited during the 1950s and worked in fields that were no longer so topical), but also the onset of a certain intellectual stagnation as the institute lost its sense of direction and the creative urge to do new research. Furthermore, the institute was no longer very good at recognizing the true needs of client firms.33 All this had gone largely unnoticed by senior management at the institute, though many of the research staff were quite critical in their discussions with us.

Our report duly noted the criticisms of the institute's management put to us by staff and by companies. Not surprisingly, this met with a hostile reaction from the institute's director. He argued that we totally lacked the expertise needed to evaluate his institute and refused to accept that the criticisms had any foundation. However, the story had a happy ending! Two years later, we attended a conference in Helsinki. Among the participants was the same institute director. When we arrived at the conference reception, who should walk over to us but the director. To our surprise, he was smiling. We learnt that he had eventually come to see that we had no reason to be biased against his institute so the criticisms had to be taken seriously. He then had to persuade his staff that changes were essential and had used our report for this purpose. There had since been an upsurge in the institute's fortunes, to such an extent that at the end of the year it had made a profit on contract income and was able to award staff a Christmas bonus, an event previously unheard of among Norwegian institutes.
The Evaluation of the ECSC Steel Research Program

By 1980, the European Commission had accumulated some 50 major research programs. Each supposedly had a fixed lifetime. However, they tended to be routinely renewed with little in the way of serious appraisal. Consequently, some had by then been operating for over 20 years. It was recognized that a more systematic procedure was needed to evaluate each program as it approached the end of its term, before deciding whether it should be extended.

In response to the growing concern, the European Commission set up an R&D Evaluation Unit to conduct assessments of Community research programs. By 1982, six evaluations had been completed. The approach was to convene a committee of experts in the field concerned. They met once a month or so to hold hearings with a few senior researchers and officials involved in the program. All six investigations led to remarkably similar reports: each concluded that the research program in question was essentially fine; all that it needed was more money. In the Evaluation Unit and
elsewhere, there was unease that this approach was failing to get to grips with the problems known to confront Community research programs. It was felt that a more critical approach was called for, making use of independent evaluators.

The next program to be assessed was the European Coal and Steel Community (ECSC) steel research program. The usual committee of experts was assembled to conduct official hearings with the heads of steel research institutes and the R&D directors of steel producers. In addition, an SPRU team was asked to participate. Our task was to seek the views of researchers and companies 'at the ground level'.34

Initially, my colleague and I knew little about steel research or the steel industry. Our first priority was to find half a dozen 'tame' experts to brief us - people with whom it did not matter if we displayed our ignorance, but who could point to the key issues to be addressed. These preliminary discussions also gave an opportunity to map out the wider political context of the steel research program and of the evaluation exercise. The methodology we then adopted was similar to that used in the NTNF study, involving extensive interviews with steel researchers (that is, peer review) and industrialists ('customer review'), together with an analysis of statistical and other information (for instance, on projects funded through the program).35

Our report pointed to several major achievements of the steel research program but also to some shortcomings. One was that the structure of the program still reflected the structure of the steel industry 10 or 20 years earlier. The program had been operating since the 1950s, a time when there were many medium-sized steel producers in Europe, most without their own R&D laboratories. Therefore, it made sense for countries to have a national metallurgical research center conducting R&D on behalf of all the companies in the sector. At that stage, it was only appropriate that a majority of the ECSC funds should go to these centers. By 1983, in most European countries there were just one or two big steel producers and all had their own R&D laboratories. Yet in Belgium, France and Italy, up to 90 per cent of ECSC funds continued to be channeled to the national research center. Steel producers in these countries justifiably felt aggrieved. We duly reported this and suggested that the research program might adjust to the changed industrial structure.36

Another controversial suggestion was that the role played by the principle of juste retour should be reduced. The ECSC steel research program was funded by a levy on steel production. Decisions as to which research projects to support were then made by a committee supposedly on the basis of the merits of individual project proposals. We were therefore surprised to discover, when we compared the percentage contribution of each country (raised through the levy) with what each country got back from the program, that the national percentage shares agreed almost exactly! Some forthright
probing of interviewees elicited the explanation. The day before the program committee officially convened to go through the proposals and decide which should be funded, national representatives assembled informally and agreed what share of the 'pot' each country should get and hence which projects should receive support in order to arrive at the desired end-result. When they met with EC officials the following day, the decisions had been largely prepared.

Initially, our report37 received a favorable response from the panel of experts, the EC R&D Evaluation Unit and steel producers. Later, however, critics brought pressure to bear on officials responsible for the steel research program. Here, an institutional problem became apparent: the steel program and the R&D Evaluation Unit reported to the same Directorate. Criticisms of our report went up the chain of command, then down to the Evaluation Unit which was 'leaned on' to tone down or reject our findings. Our report was gradually buried and the committee of experts also failed to produce a report acceptable to the Commission.38

Two main lessons can be drawn from this saga. First, unlike the NTNF assessment, there was little recognition - prior to the evaluation being commissioned - that changes were needed. Without that prior commitment among those responsible for the steel research program, the chances of our findings having any appreciable impact were always slim.

A second lesson concerns the autonomy and institutional location of evaluators in relation to those being evaluated. The closer that the body responsible for initiating an evaluation is to the policy making and implementation process, the more likely the results are to be accepted and acted upon. Yet at the same time, too close proximity is likely to bring implicit or even explicit political pressures to ensure that the evaluation comes up with the 'right' conclusions. In this example, the EC R&D Evaluation Unit proved to be too close to the steel research program. Lacking sufficient institutional autonomy, it eventually acceded to demands that the evaluation results be suppressed. Conversely, the SPRU team was sufficiently remote that its findings could be safely ignored by the Commission.39

Conclusions

From the above examples, a number of conclusions can be reached about factors governing the acceptance of evaluation results. First, any evaluation is likely to yield results that are subject to political pressures from vested interest groups keen to use the findings to their advantage. Whether the evaluation conclusions eventually have any impact may ultimately depend less on the technical rigor of the exercise than on the weight of institutional forces lined up in support or opposition. Indeed, unless there are strong
interest groups with a prior commitment to assessment and a willingness to contemplate changes, an evaluation is unlikely to have any major influence. Hence, the essential first step in an evaluation must be to map out the wider political context, identifying the key actors and their respective interests. This is not to say that the evaluation should then be 'compromised' in terms of merely telling people what they want to hear, but rather that much thought must be given to presenting the findings so as to maximize the chances that they will be seen as relevant40 and that they will receive support from at least some of the interested parties. Where opposition is inevitable from one or more groups, efforts should be made to anticipate the likely form of their objections and to respond to them in advance.41

A second conclusion is the importance of involving as fully as possible in the evaluation all those being assessed, be they scientists, policy makers or any other stakeholders.42 Widespread interviewing is often essential to achieve this. Evaluations that rely predominantly on indicators or postal questionnaires are at a severe disadvantage here. Furthermore, interviews provide an opportunity to check or even challenge the responses of interviewees where further elaboration or substantiation is required. The quality of the information thus obtained should therefore be appreciably higher than that from questionnaires.

Third, the evaluation methodology must ultimately be acceptable to those being assessed. This is partly a question of credibility, of ensuring that internationally recognized experts are involved in the evaluation.43 There are, however, other considerations. As noted earlier, the methodology cannot be too complex because neither those being evaluated nor the potential users of evaluation results will usually have the time or patience to follow all the intricacies. Likewise, it should not be too expensive, or scientists will argue that the money would have been better spent on them. (We have argued elsewhere that, as a rule of thumb, evaluations should not cost more than 1 per cent of the research being assessed.44) The evaluation must not take up too much of the evaluee's time. It should also give results within a reasonable period. If the evaluation takes too long, the findings are likely to be out of date and policy interests will almost certainly have moved on by the time the work is completed.

A fourth conclusion is that the results must be simply and succinctly expressed. This means avoiding the traditional guarded style of the academic (on the one hand, this; on the other hand, something entirely different). The audience must know exactly what findings have been reached so that they can either agree or disagree. Conclusions that are so hedged around with reservations that no one can work out what they really mean are useless for policy purposes. Likewise, brevity is of the essence, especially where the intended audience consists of senior policy makers; they are generally too busy to read anything longer than an executive summary,
though they need to be reassured that it is backed up by a more detailed report.

Next, for any evaluation, there are several different audiences who may be interested in the results - scientists in the area, policy makers, other science policy researchers, even the general public. For different audiences, one needs different mechanisms for ensuring that the evaluation findings are disseminated as effectively as possible. This will probably entail publishing the 'same' results in different journals, some of a more scholarly nature, others with a semi-popular orientation.

In addition, it must be recognized that any good evaluation is likely to prove controversial. It is just conceivable that the research (or group of scientists) one is evaluating represents perfection and that the resources made available could not under any circumstances have been used more effectively. In that unlikely event, the science policy researcher should produce a report that highlights all the successful features and the reasons for the success. In most circumstances, however, there will be some strengths but also certain weaknesses or areas where, at least with the benefit of hindsight, performance could have been improved. A balanced and fair appraisal of those strengths and weaknesses is vital. Furthermore, in analyzing the defects, the aim should be not so much to criticize those involved (who did not have the benefit of all the information now available). Rather, the intention should be to identify constructive lessons for the future, lessons that may help to ensure that the limited resources available for science will be employed as effectively as possible.

Finally, the institutional location and degree of autonomy of the evaluators is crucial. As the above examples illustrate, the results of any evaluation will be subject to political pressures as different interested parties attempt to use or undermine them for their own purposes. Evaluators must have sufficient independence to ensure that they are not constrained by such pressures. At the same time, however, if the results are to be used, the evaluators must not be too remote from those responsible for the policies being assessed. In other words, the evaluators need to be close - but not too close - to the policy makers if their efforts are to be effective and their results accepted. Achieving the appropriate balance here is one of the most difficult tasks facing those embarking upon evaluations of science and scientists.
Notes and References

1 I am grateful to the Economic and Social Research Council for support of my work on research evaluation through the earlier programs on 'Academic Research Performance Indicators' and 'The Interface between Corporate R&D and Academic Research', and currently through the ESRC Center for Science, Technology, Energy
and Environment Policy (STEEP). I would also like to thank Diana Hicks, Phoebe Isard, Terttu Luukkonen, Geoff Oldham and Keith Pavitt for helpful comments on an earlier draft presented at the SPRU International Conference on Science and Technology Policy Evaluation, London, 2-3 October 1991.
2 P. Montigny, 'The Cassandra Paradox Relating to the Effectiveness of Evaluation and of Forecasting Methods', in D. Evered and S. Harnett (eds.), The Evaluation of Scientific Research (Chichester: Wiley, 1989), pp. 247-64. Montigny gives particular importance to the competence, legitimacy and authority of the evaluators.
3 See T. Luukkonen and B. Stahle, 'Quality Evaluations in the Management of Basic Research', Research Policy, Vol. 19 (1990), pp. 357-68. The authors attempt to analyze the utilization and acceptance of research evaluation data by drawing upon a conceptual framework established in the field of 'evaluation research' (that is, studies of the utilization of research information in decision making).
4 J. Irvine and B. R. Martin, 'What Direction for Basic Scientific Research?', in M. Gibbons, P. Gummett and B. M. Udgaonkar (eds.), Science and Technology Policy in the 1980s and Beyond (London: Longman, 1984), pp. 67-98.
5 However, as regards the problem of how to determine the allocation of resources between fields, research evaluations offer little more help than peer review. Because the exact approach will depend on the nature of the field being assessed, only comparisons within that field can be legitimately made. At best, one can construct relative performance indicators - relative, that is, to the average for the field. Even so, relative performance in a specific field is not the only criterion for deciding where to invest money.
6 A more detailed analysis of the reasons why research evaluations are needed can be found in Irvine and Martin, 'What Direction for Basic Scientific Research?', op. cit.
7 This represents a view of evaluation as seen from a science policy perspective. Others may, of course, regard the purposes of evaluation very differently. For example, those being assessed may see the primary role of evaluation as providing more job satisfaction, better management, more conducive working conditions and so on. For them, the criteria for judging what constitutes 'success' in an evaluation may consequently be quite different. Likewise, those in the science studies community may have a different view of the function of evaluations.
8 In other words, I do not advocate that evaluations or performance indicators should replace the peer review system, merely that the results should be fed into that system to enable it to arrive at better informed and more systematic decisions.
9 The assumption here that more information results in better decisions is, however, virtually impossible to prove. In addition, it must be stressed that evaluations deal with past performance while policy makers are concerned with future performance. Nevertheless, past 'track record' is one of the most important factors to be considered when determining which research groups to fund. For an example of how this might be done, see B. R. Martin and J. Irvine, 'CERN: Past Performance and Future Prospects - III - CERN and the Future of World High-Energy Physics', Research Policy, Vol. 13 (1984), pp. 311-42.
10 For a more detailed discussion of the issues raised in this section, see B. R. Martin and J.
Irvine, 'Assessing Basic Research: Some Partial Indicators of Scientific Progress in Radio Astronomy', Research Policy, Vol. 12 (1983), pp. 61-90. In particular, this introduces the concept of 'partial indicators' and sets out the methodology of 'converging partial indicators'.
11 In some cases, however, no indicators may be appropriate and an entirely qualitative approach must be adopted.
12 This list comes from D. Crouch, J. Irvine and B. R. Martin, 'Bibliometric Analysis for Science Policy: An Evaluation of the United Kingdom's Research Performance in
Ocean Currents and Protein Crystallography', Scientometrics, Vol. 9 (1986), pp. 239-67 (see p. 241).
13 In 1981-2, for example, 65 per cent of the expenditure on science by the Science Research Council was devoted to just seven laboratories. See Irvine and Martin, 'What Direction for Basic Scientific Research?', op. cit., pp. 74-5.
14 The evaluation of the technological and economic outputs from radio astronomy laboratories is described in B. R. Martin and J. Irvine, 'Spin-Off from Basic Science: The Case of Radio Astronomy', Physics in Technology, Vol. 12 (1981), pp. 204-12.
15 For example, the work by the Cole brothers using publication and citation analysis - see J. R. Cole and S. Cole, 'Citation Analysis', Science, Vol. 183 (1974), pp. 32-3, and 'The Ortega Hypothesis', Science, Vol. 178 (1972), pp. 368-75. Although the Coles' work focused on the United States, it was widely reported in Britain, with New Scientist (19 November 1981, p. 483) coming up with the memorable headline 'God does play dice - with scientists' grants!'
16 Martin and Irvine, 'CERN: Past Performance and Future Prospects', op. cit.
17 You are permitted to say that one laboratory is less good than other centers, but not that it is poor. In other words, relative statements are allowed but not absolute ones.
18 Hence, the paper is dated 1983 even though it was actually accepted for publication in 1980.
19 He had published several popular books and numerous articles and had appeared frequently on radio and television.
20 At least at the time of our study. Since then, it too has apparently begun to slip. See J. Irvine, B. R. Martin, J. Abraham and T. Peacock, 'Assessing Basic Research: Reappraisal and Update of an Evaluation of Four Radio Astronomy Observatories', Research Policy, Vol. 16 (1987), pp. 213-27.
21 One defense offered by Center A was that during the period analyzed, it had been heavily involved in the construction of a major new facility. Once this was complete, it argued, the research output would improve considerably. Our response was that we had looked at performance over a reasonably long period (ten years), long enough to allow for the effects of this 'instrumentation' cycle. Throughout that time, Center B's research performance had been markedly better, even though it too had been involved in constructing new equipment. Furthermore, a later analysis showed no evidence of any pronounced improvement in Center A's output over subsequent years (see Irvine et al., ibid.).
22 This is one example of how, under certain conditions, peer review can break down. In a big science where there are perhaps only two or three laboratories operating, those laboratories may decide it is in their joint interests to maintain a united front to the outside world. They may take it in turns to put in proposals for major new facilities, with each proposal receiving the backing of the other laboratories (regardless of what they might actually think of the scientific merits of that project). Our interviews suggest that, especially in big science, the 'public' views expressed by scientists in the peer review process may on occasions diverge quite appreciably from their 'real' views.
23 Important here was the realization that publication and citation data could be used to support scientists' views on the relative health of British science, an issue on which there had previously been little more than anecdotal evidence. Bibliometric data have since played a key part in the debate over whether British science is in relative decline.
24 This was in refreshing contrast to the cloak of secrecy that pervaded science policy in Britain at the time. An amusing example of this concerned another of the 'big science' areas we had examined. Among the scientists interviewed was one who had been a member a few years earlier of a Science Research Council (SRC) committee
set up to make recommendations on future British research facilities in the field. After the committee had drafted its report (which contained some controversial recommendations) and submitted it to SRC, this person had heard nothing more. A few months later, he approached SRC for a copy of the committee's report, only to be told that it was unavailable because it was secret!
25 There was little use of bibliometric indicators for mechanical engineering; such indicators are largely irrelevant, since researchers in this field do not regard articles in learned journals as their primary output.
26 J. Irvine, B. R. Martin and M. Schwarz, with K. Pavitt and R. Rothwell, Government Support for Industrial Research in Norway: An SPRU Report (Oslo: Universitetsforlaget, 1981), Norwegian Official Publication NOU 1981: 30B.
27 Thulin Commission, Forskning, Teknisk Utvikling og Industriell Innovasjon (Oslo: Universitetsforlaget, 1981), Norwegian Official Publication NOU 1981: 30A.
28 For an analysis of the impact of this evaluation, see T. Olsen, 'The SPRU Report on Industrial Research in Norway', in B. Stahle (ed.), Evaluation of Research: Nordic Experiences (Copenhagen: Nordic Science Policy Council, 1987), FPR publication No. 5, pp. 203-209.
29 See Olsen, 'The SPRU Report on Industrial Research in Norway', ibid., p. 209.
30 Afterwards, we learned that the large number of anonymous quotations that appeared in the report served the added purpose of providing a bit of 'spice' as readers attempted to guess who had said what!
31 A similar conclusion holds for research foresight; attempts to look into the future of research in order to arrive at priorities are more likely to prove successful if all those who will be affected by the results have been fully involved and are consequently committed to the findings. See B. R. Martin and J. Irvine, Research Foresight: Priority-Setting in Science (London: Pinter Publishers, 1989), p. 351.
32 While the first institute generously lent us two offices in which to conduct the interviews and to make telephone calls, in the second we had to use a telephone in the corridor in order to make appointments to see staff. This told us something about that institute's ability to deal with visitors, an important attribute in an organization reliant upon earning a large volume of contract research income.
33 An interesting example was a commissioned project to construct an electronic wheelchair. A company that for many years had manufactured seats for trains and trams recognized that it had to develop more technologically sophisticated products if it was to survive. It hit upon the idea of using its expertise in manufacturing seats to produce an electronically controlled wheelchair for the disabled. Since it knew nothing about electronics, it approached the institute for help. Researchers at the institute duly produced a sophisticated electronic control system that enabled the chair to do virtually anything. According to the researchers, this project had been a great success. However, as far as the company was concerned, it was a disaster: it was too complex to manufacture and too expensive to sell commercially.
34 One EC official suggested that our role was 'to rock the boat, although not too violently', just in case we antagonized the industry and thousands of steel workers descended on Brussels to start banging on the doors of the Commission!
35 For a field like steel research, bibliometric indicators are of little relevance.
36 One reason why the three national centers still obtained most of their countries' funds from the ECSC program was that in each of the three countries the national representative on the committee responsible for the allocation of project grants was the center director. Not surprisingly, the centers were far from enthusiastic about the suggestion in our report that a higher proportion of funds should go to steel producers, and subsequently expressed their views to EC officials.
37 J. Irvine and B. R. Martin, The Economic and Social Impact of the ECSC Steel
Research Programme: An SPRU Evaluation (Brighton: SPRU, University of Sussex, 1983, mimeo).
38 The Commission even withdrew the permission given to SPRU at the start of the project to publish an academic journal article. Although we prepared a draft paper describing the methodology and summarizing the findings (we were careful not to betray any commercially confidential information) and sent it to the EC for comment, the message came back that under no circumstances were we to publish it. Furthermore, if we ignored this instruction, it was implied that SPRU would never again receive funds from the EC. Regretfully, we backed down. According to Commission officials now, this study did not take place and our report never existed - a rewriting of history that smacks more of (the old) Eastern Europe than the West. A little glasnost here would perhaps not come amiss!
39 There have been several changes since, and the R&D Evaluation Unit now has greater autonomy.
40 See the discussion of 'relevance' in Luukkonen and Stahle, 'Quality Evaluations in the Management of Basic Research', op. cit., pp. 365-6.
41 In the words of a famous Welsh rugby coach, 'Get your retaliation in first!'
42 Cf. Luukkonen and Stahle and their discussion of 'communication' (op. cit., p. 366).
43 Cf. the discussion of 'credibility' in ibid., pp. 366-7.
44 See Crouch et al., 'Bibliometric Analysis for Science Policy', op. cit. It is significant that the US Congress has mandated the National Institutes of Health to devote 1 per cent of their budget to monitoring and evaluating the research they support.
PART II

PEER REVIEW: SELF-REGULATION AND ACCOUNTABILITY IN SCIENCE
Part II consists of six chapters on the subject of peer review: five from Eastern Europe and one from the United States.

Edward Hackett argues that peer review is much more than a set of practices and principles for allocating rewards and resources. Too often, he says, analysts focus exclusively on the mechanics of peer review and produce erroneous diagnoses of perceived malfunctions in the system because they fail to understand the multiple functions of, and demands made on, peer review. Hackett argues that our understanding of how particular peer review systems function would be enhanced by further research on such issues as the ways in which stakeholders - decision makers, politicians, scientists - actually differ in their definitions and demands of peer review, and the relationship between a given configuration of values within a peer review system and selection decisions and project outcomes.

Adam Lomnicki opens the East European contribution to this Part with a scathing critique of research evaluation as practiced under communist rule in Poland, and a skeptical view of the changes that have been introduced to date. Despite the introduction of competitive grants in Poland, the majority of research support is still allocated via block grants to research institutes. The competition within institutes for increasingly scarce resources continues to exert a dampening effect on critical debate, while reforms intended to democratize institute decision making have simply increased the power of a large number of middle-aged and mediocre scientists who have no interest in promoting excellence, writes Lomnicki. He offers a
number of recommendations to improve the standard of research evaluation in Poland.

Katalin Hangos discusses the use of peer review in Hungary, including a brief account of recent changes in evaluation procedures governing the award of higher degrees in Hungary, and the peer review process within her own research institute. She describes the 1992 evaluation of the research institutes of the Hungarian Academy of Sciences, and argues that the evaluation yielded assessments that were uniformly positive and thus of little use to the leadership. She concludes by summarizing what she views as major weaknesses in the main peer review systems functioning in Hungary: the failure to involve foreigners and the consequent lack of real anonymity among reviewers; the absence of any mechanism to evaluate reviewers themselves; and the tendency among decision makers to prefer quantitative, rather than narrative, assessments.

Peter Zilahy and Istvan Lang also focus on the evaluation carried out by the Hungarian Academy, as well as a subsequent evaluation conducted by the International Council of Scientific Unions (ICSU). Their view of the first evaluation differs somewhat from that of Hangos. While they agree that the results were in some ways disappointing, they argue that the reason for this was that reviewers were anxious to shield research institutes from cuts in funding and personnel, and thus produced highly positive assessments. The ICSU evaluation was intended in part to provide an assessment of the Academy's own evaluation exercise, and Zilahy and Lang outline the practical recommendations that emerged from the ICSU study.

Miroslava Vrbova recounts her experience as both a grant applicant and a reviewer of grant applications. While one problem with peer review in the Czech Republic, according to Vrbova, is that anonymity is virtually impossible, the main problem is that the shortage of money means that many research proposals highly assessed by reviewers cannot be funded. When decision makers have to choose which projects to fund, and particularly when they have to choose between projects in different fields, Vrbova argues that the peer review system is of little use, and decisions are based on fairly ad hoc criteria.

Julita Jablecka draws on empirical research to analyze the functioning of Poland's new funding agency, the Committee for Scientific Research (CSR). Any funding agency, observes Jablecka, is a complex structure in which individual subsystems of peer review function independently of each other, and she suggests that within any peer review system, there is a division between two sets of norms that derive from the culture of science and the culture of administration. Her paper focuses on one subsystem of the CSR, disciplinary sections or review panels. She examines their internal operations and their relations with the CSR committees that make the final funding decisions. She discusses the impact on the CSR of various external
factors, such as the shortage of resources, the absence of a developed private sector to fund applied research, and the fact that the CSR is the sole source of all government research funding, and proposes several ways in which project evaluation and selection within the CSR could be improved.
4

PEER REVIEW IN SCIENCE AND SCIENCE POLICY

Edward J. Hackett
An East European economist once observed that 'you don't try to cross over a chasm in two leaps', meaning that the social and economic transformations of those nations might best be accomplished in a single bound.1 Adhering to this principle, Jeffrey Sachs and others have recommended that East European nations adopt Western practices in a paroxysm of change.2 I'm not certain that social policy should be guided by aphorisms, however clever they may be. Nor am I convinced that orderly social change bears much resemblance to 'leaping over a chasm'. But if for the moment we accept the chasm-leaping metaphor, if not the advice it embodies, then I would urge the science policy community of Eastern European nations to look before they leap into Western practices of science evaluation and peer review. My intent in this chapter is to provide information useful in deciding whether and in what direction to leap, and perhaps even to suggest ways of crossing the chasm more prudently.

'Peer review' is an umbrella term for a family of selection and oversight practices. Included are the familiar mechanisms of grant proposal review at agencies such as the National Science Foundation and National Institutes of Health in the United States, manuscript review at scholarly journals, and expert review of decisions (say, in hospitals). While there are strong family resemblances, any close study will immediately reveal consequential differences, even across programs within a single agency (or journals within a single field of science). Since the mechanics are generally familiar but also complicated and idiosyncratic, I will not describe them here.3 Too often, discussions of peer review focus narrowly on technical matters such as
inter-rater agreement, conflicts of interest, and normalization of raters' scores to achieve comparability across panels. Analysts mesmerized by the mechanics of peer review (and allied questions about the efficiency, fairness or reliability of those practices) have often produced narrow empirical studies and skewed critiques. As a corrective to this tendency, I wish to discuss three recent examples of such studies, then offer a framework for thinking more deeply about the merits and purposes of peer review.

In his recent book Impure Science,4 Robert Bell is intent upon exposing the politicized nature of contemporary US science. In this effort he devotes a sizable chunk of his book to telling two stories about peer review. One tells how Jon Kalb, an American archeologist in Ethiopia, was denied National Science Foundation research support because a conspiracy of colleagues working on similar topics used the peer review system to unfairly scuttle his grant application (for example, by rumoring that he worked for the US Central Intelligence Agency). The second tells how a Buffalo, New York, university was awarded funds to construct an earthquake engineering center because a conspiracy of colleagues working on similar topics used the peer review system to unfairly promote its grant application. Left hanging is the critical question of how a scientist's closest colleagues/competitors decide whether to help or harm a proposal's chances for support. Kalb was purportedly undone by reviewers working on very similar topics whereas, Bell says, the chief competitor of the Buffalo center (a California consortium) was disadvantaged because no scientists from the West Coast were on the review panel. If familiarity can cause both favoritism and its opposite, then it is not a very helpful explanatory variable. At best we may conclude that science is politicized and that the peer review system can be a conduit of bias.

Peter Abrams, in an empirical article published in Social Studies of Science,5 explores this puzzle: past scientific performance (indexed by counts of publications and citations) is strongly correlated with future scientific performance, but past performance is not correlated with reviewers' ratings of grant proposals. Since it is therefore unlikely that proposal ratings are correlated with subsequent scientific performance - indeed, you could readily calculate the maximum possible correlation - he concludes that NSF should abandon the practice of awarding research support on the basis of 'hurried evaluations by relatively poorly qualified judges of documents (proposals) whose quality has been shown to correlate poorly with the quality of published work'.6 (A piece of his analysis calls into question the scientific credentials of his own panel.) In the place of peer review Abrams recommends a system that awards grants according to 'some measure of past research achievements' that would 'require considerable research' to design but would probably include a moving average of peer ratings of prior work, citation and publication counts, with handicaps for older investigators,
younger investigators, field-switchers, women, minorities and geographic diversity.7

Abrams's analysis of the shortcomings of NSF peer review is generally convincing, though he places unwarranted faith in publication and citation counts as measures of scientific quality, and his remedy is no better founded than the disease it would cure. First, the practice of using publication and citation counts to evaluate the work of individual investigators is shaky; such indicators are more stable and valid when applied to larger collectivities (such as research teams, specialties, disciplines). Second, peer judgment of papers is subject to many of the same distorting forces as peer review of proposals: referees' evaluations of manuscripts are not always reliable and valid,8 and editorial decisions can be influenced by author characteristics.9 Whatever concreteness of judgment is gained by having a completed piece of work in hand may be lost when the paper's results interact with referees' preferences and preconceptions. In contrast, a proposal can be open-ended enough to allow reviewers to project their preferred outcomes onto the proposed experiments. Finally, there is an attempt by Abrams to overload peer review with the task of ensuring fair treatment of young and old, black and white, men and women, and that those located on the coasts and in the heartland all get their due. By the time all these adjustments were applied, the unwieldy system that resulted would appear arbitrary and illegitimate.

David Faust's contribution to the peer review literature is a synthetic paper that reviews and criticizes empirical studies of reliability and bias in peer review, undertakes a social and philosophical examination of the logical empiricist principles that motivate such studies, and proposes a pluralistic perspective on peer review.10 I'll say nothing here about his review and critique of empirical work but will turn to his model of pluralism because it provides a jumping off point for my own concerns.

Faust contends that a 'classical empiricist' philosophy of science is responsible for the belief that manuscripts and proposals can be evaluated according to standard, objective criteria (which in turn should give rise to reliable and unbiased judgments). Drawing on work in the history, philosophy and social study of science, Faust demonstrates the 'descriptive inadequacy' of this view. In its place he would substitute a pluralistic view, thus recognizing that scientists employ different criteria to judge the quality of a completed piece of work or the prospects for a proposed project.

I substantially agree with Faust's position, but I find his view of pluralism insufficiently social and his commitment to the idea somewhat inconsistent. Faust's view is insufficiently social because there is widespread agreement among analysts that consensus in science is low - this seems to be what Faust means by pluralism - but very sharp disagreement about whether there are social patterns to the distribution of consensus; that is, whether certain fields or research areas (or the research front in contrast to the core) differ in
consensus. Thus the mere recognition of disagreement without an effort to identify its systematic dimensions does not take us very far.11

Faust's commitment is inconsistent because Faust believes there are 'groups of researchers/referees who should agree with each other' because they are members of a research community.12 But if science is inherently pluralistic, as Faust seems to contend, and if the level of consensus is low (especially at the research front or in certain fields at certain times), then no meaningful subdivision will yield such a group of agreeing scientists. To the contrary, scientists are likely to disagree, and the problem for science and science policy is to make good use of this disagreement, not to treat it as aberrant or embarrassing (or to devise mechanisms for eliminating it). Indeed, if Cole is correct that consensus is always low at the research front, then no method for identifying research communities, however 'sophisticated', will uncover consensual communities.13 Instead, high consensus would characterize non-research communities, because scientists who do no research - who teach, say, using similar textbooks at liberal arts colleges - would be much more likely to agree than scientists engaged in creating and criticizing new knowledge. This is a very important point for those concerned with peer review policy because much energy has been expended on methods for 'homogenizing' peer ratings, whereas we may be better served by strategies for making better use of the inherent and valuable variability in those ratings.

Similarly, to assert that 'the persuasiveness of knowledge claims, and not the dictates of one group over others, must determine scientific choice, action, and belief' seems somewhat optimistic in light of actor-network theory and its inclusive model of how scientific knowledge claims are advanced and supported.14 While I am not an avid fan of actor-network theory, it makes strong and consequential claims that deserve attention. In this case, the persuasiveness of knowledge claims and their effect on scientific choice, action and belief probably depend in part on characteristics of the groups in contention. Our theories of peer review should take this into account.

The pluralistic nature of peer review is revealed in its varied stakeholders, purposes, and the values it is intended to serve. Disagreements may arise because different stakeholders are enacting different values in their decisions. Similarly, in discussions and analyses of peer review, misunderstandings may arise when participants disagree about their implicit definitions and performance criteria. Any theory about the place of peer review in contemporary science should begin with an understanding of these and proceed by explaining how one or another configuration comes to ascendance in any particular instance.

These papers on peer review, not unlike others I might have chosen, are generally well intentioned, clearly argued and adequately supported by the
evidence they marshal (though some are a bit selective in their use of evidence). But they share an ad hoc quality: none proposes a model of the peer review system, identifies its stakeholders, describes its purposes and delineates the inconsistent values it serves. As a contribution to this endeavor, let me conclude by sketching the elements of such a model and suggesting some of the consequences of taking a more comprehensive view of peer review.
What is Peer Review?

Too often, peer review is viewed only as a mechanism, a set of practices and principles for allocating rewards and resources. Faust argues, and I agree, that it is a dire oversimplification to reduce peer review to a measuring and allocation system that should be reliable and unbiased. But his focus on variation among individual scientists, which he calls pluralism, and his concern to devise a system that can measure the 'true' worth of a paper or proposal, do not sufficiently capture the complexity of peer review, its diverse stakeholders, and the inconsistent demands they place on it. For example, as varied as psychologists and their 'epistemic values' might be - some may favor observational studies, while others insist on experiments; some are clinicians, others rat-runners - they are only one sort of scientist, and scientists are but one of several stakeholders in the peer review system. A more comprehensive view of the pluralism of science would recognize that decision makers and politicians, various publics and scientists from other fields, as well as the full range of scientists within a field (with their varied substantive and epistemic tastes) are all stakeholders in the peer review system, each emphasizing a different facet of peer review and each giving greater emphasis to one or another desirable property of the system.

A partial list of the functions of peer review, ordered roughly from the concrete to the abstract, might include the following:

1 A mechanism for improving research, both proposals and manuscripts, by providing expert advice from scientists to scientists.
2 A forum for establishing priority claims and determining research priorities within research areas.
3 A counterweight to the drive for originality in science, organizing and focusing scientists' skepticism about new knowledge claims to ensure proper recognition of prior work; in brief, it is a locus for enacting Kuhn's 'essential tension' between tradition and innovation.15
4 A procedure for allocating the scarce resources of research support, journal space, and recognition.
5 A communication system that circulates scientists' research plans, increases the receptivity of others to the forthcoming results, and provides reassurance and confidence to the proposing scientist.
6 An entry point for non-scientific considerations to influence science in a limited and controlled fashion.
7 A quality control system for assuring non-expert 'users' of published material that the work meets professional standards.
8 An assertion of professional authority and autonomy that keeps the laity at bay.
9 A symbolic expression of the ideal of community participation in knowledge production that contributes to a sense of collective purpose and self-determination - a fountain of legitimacy for scientific work.
10 A ceremony or ritual that affirms public trust in experts (and experts' trust in each other).

If peer review in fact serves these diverse purposes for a range of stakeholders, then it is quite distorting to concentrate research solely on its flaws and virtues as a measurement and allocation system. Overemphasizing some purposes at the expense of others may have the unintended consequence of unbalancing the system.
Desiderata

Matters grow more complicated when we consider not only the stakeholders and purposes of peer review but also the values or demands placed on it. These 'desiderata', which are as diverse as the stakeholders and purposes of peer review, can be expressed as a set of five value pairs, with each pair representing two inconsistent but desirable properties. Importantly, there are tensions across value pairs as well as within pairs.

Effectiveness-efficiency

This is the principal value tension. On the one hand, peer review is supposed to be effective in two senses: (a) to do a good job selecting the 'best' proposals and manuscripts while excluding (or improving) unworthy ones; and (b) to contribute to the goals of science and society through the selected proposals and manuscripts. That is, peer review is expected to be an effective selection procedure and to promote effectiveness through the work selected. Within the domain of effectiveness there is a further tension between sensitivity and selectivity (or, to statisticians, Type I and Type II error): a selection system that generously includes every paper or proposal with some
merit is quite likely to include a few duds, yet a system that assuredly excludes all dubious material will probably discard some good work as well. Peer review is often damned both for the weak work admitted and for the excellent work excluded, but it is seldom recognized that these are inconsistent demands.

On the other hand, the peer review system is expected to be efficient, costing little to operate and conserving resources through shrewd allocations (for example, by avoiding redundancy). But increases in efficiency are often attained at the expense of effectiveness. For example, a peer review system that is efficient in its own operations might provide only cursory examination of papers and proposals (thus placing modest demands on reviewers' and referees' time), yet such a system is unlikely to be very effective as a selection tool - unless, of course, peer judgments are no more accurate than chance, or all material is equal in quality and the review process itself has no effect on subsequent quality. A peer review system designed to promote efficiency through the work it supports would do well to fund only the single 'best' project of a given type or publish the 'one correct' solution to a problem. But it may be more effective (and expeditious) to support and publish complementary works. Plainly, both efficiency and effectiveness, both sensitivity and selectivity, cannot simultaneously be maximized.

There are several additional value tensions, less prominent than the foregoing but equally worthy of attention and equally suggestive for research and policy.

Accountability-autonomy

Peer review makes scientists accountable to their peers: they must explain their plans and expectations in a proposal, their methods, results and conclusions in a paper; and their past performance may enter into reviewers' evaluations. But at the same time that peer review scrutinizes scientists' work, it also serves as a mechanism of professional self-regulation that affords scientists a degree of autonomy from scrutiny by the public at large.

Responsive-inertial

Peer review is expected to be responsive to new ideas and new national needs, bestowing the authority of expert judgment upon novel results or ideas and, reciprocally, translating social needs into scientific priorities. But at the same time peer review is an inertial or conservative force - a flywheel - imparting stability and continuity to the scientific enterprise by testing innovative ideas against the body of received knowledge. This is the 'essential tension' between tradition and originality, played out somewhat openly
in the peer review system and, through preparation for that review, internalized by the scientist writing the paper or proposal.

Meritocratic-fair

Peer review is expected to be meritocratic, judging proposals, manuscripts and scientists according to strict criteria of merit. But peer review is also expected to follow societal norms of fairness, which may give lesser emphasis to merit. For example, merit may conflict with fairness when scientists' characteristics (including prestige, influence, ethnicity, age, geographic location and sex), political considerations, or national needs and priorities enter into a decision. The legitimacy of the peer review system depends on these values. An inscrutable but effective system will probably not work for very long.

Reliable-valid

As a measuring system, peer review is asked to be reliable and valid, meaning that it measures quality with little random error and with little systematic error (or bias). But in practice reliability and validity are hard to achieve at the same time. Narrow, rigid, quantifiable criteria may contribute to reliability (because they can be applied again and again by different raters with quite consistent results), but these criteria may not accurately reflect the 'true' worth of a piece of work - that is, they may have low validity. Introducing an author's (or proposer's) personal characteristics (such as prestige, track record and the like) or the reviewer's 'hunches' about the value of a piece of work may increase validity at the cost of reliability.
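To make the random-error side of this concrete, the sketch below is a purely illustrative toy model, with hypothetical numbers not drawn from any study cited in this chapter. It treats each reviewer's score as a proposal's 'true' worth plus noise, and shows how averaging over more reviewers raises both panel-to-panel agreement (reliability) and agreement with the underlying worth - while multiplying the reviewing effort, the efficiency cost noted above. It does not capture the deeper tension described here, in which narrow criteria can buy reliability at the price of validity.

```python
import random
import statistics

def correlation(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def panel_average(true_worth, n_reviewers, noise_sd, rng):
    """Each proposal's score: mean of n independent, noisy reviewer judgments."""
    return [
        statistics.mean(w + rng.gauss(0, noise_sd) for _ in range(n_reviewers))
        for w in true_worth
    ]

rng = random.Random(0)
true_worth = [rng.gauss(0, 1) for _ in range(200)]  # hypothetical 'true' merit

for n in (1, 3, 8):
    panel_a = panel_average(true_worth, n, noise_sd=1.0, rng=rng)
    panel_b = panel_average(true_worth, n, noise_sd=1.0, rng=rng)
    reliability = correlation(panel_a, panel_b)   # agreement between two panels
    validity = correlation(panel_a, true_worth)   # agreement with 'true' worth
    print(f"{n} reviewer(s) per proposal: "
          f"reliability ~ {reliability:.2f}, validity ~ {validity:.2f}")
```

The design choice here is deliberately crude - a single noise term stands in for all the distorting forces discussed above - so the sketch should be read only as a reminder that gains in measurement quality are purchased with reviewers' time, not as a model of any actual agency's panels.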
New Questions for Research

The value of this framework for thinking about peer review resides chiefly in its usefulness as a guide to research and policy. Among the questions raised are the following:

1 How do stakeholders differ in their definitions and demands of peer review? How are these differences expressed and resolved? Of special interest is the exercise of power in determining the 'true' purposes of peer review. For example, members of Congress might argue that the grants peer review system is insufficiently selective, funding too much mediocre work. Scientists, offering counterinstances of good ideas gone unsupported, might find the system somewhat insensitive to quality. No number of cases in point mustered by either side can resolve this disagreement. One must either make
a decision in principle or must agree on outcome criteria and test the consequences of various levels of sensitivity and selectivity through an experiment. I've merely asserted that different stakeholders have different conceptions of peer review and different values. That assertion remains to be empirically examined. (A toy sketch of such an experiment follows this list.)

2 Are there peer review systems - different agencies or programs within an agency - that enact different values? With what consequences for selection decisions and project outcomes?

3 Would different sorts of research or different research areas benefit from a peer review system that embodied a different configuration of values? For example, are problems characterized by urgency and uncertainty better addressed by a peer review system that sacrifices efficiency for effectiveness? Are predictable problems with low urgency better addressed by a single project? Too often this trade-off is not recognized, let alone examined.

4 Are there performance differences among scientists under different levels of accountability and autonomy? The Pew Scholars program and the Howard Hughes Medical Institute Investigators program select 'good scientists', affording them the freedom to choose their own research projects. While this is an affirmation of their ability, it fails to provide specific feedback and approval of research plans. In contrast, a traditional NIH (National Institutes of Health) grant, when awarded after the rigors of peer review, bears the approval of a panel of scientists - an imprimatur that brands the project 'do-able' and that may elicit greater effort, perseverance and commitment from the investigator.

5 What are the symbolic merits of peer review? In what ways does it embody values important for the working of science, regardless of its practical utility as a selection mechanism? Trial by a jury of one's peers is a cherished right in the United States and elsewhere, yet it is also an inefficient fiction. Jurors are seldom true 'peers' of the accused - similar in social characteristics, background and experience - and they are seldom well prepared for the legal and technical aspects of their task. But their presence has an effect on the workings of the court, on the nature of arguments and evidence marshaled by the parties involved in the proceeding, and on our view of the legitimacy of the legal system. In a similar fashion, the peer review system in science embodies certain values and influences the thinking of scientists in ways that are probably not totally dysfunctional.
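As a purely hypothetical sketch of the kind of experiment invoked in questions 1 and 3 (invented numbers, no real proposal or funding data), the following compares a generous, 'sensitive' score cutoff with a strict, 'selective' one on two outcome criteria: the share of genuinely good work that gets funded, and the share of funded work that is genuinely good.

```python
import random

rng = random.Random(1)

# Hypothetical proposals: latent quality plus an imperfect peer-review score.
proposals = []
for _ in range(500):
    quality = rng.gauss(0, 1)
    score = quality + rng.gauss(0, 1)   # noisy peer judgment of that quality
    proposals.append((quality, score))

def outcomes(score_cutoff, good_cutoff=1.0):
    """Fund everything scoring at or above the cutoff; compare with 'truly good' work."""
    funded = [(q, s) for q, s in proposals if s >= score_cutoff]
    good = [(q, s) for q, s in proposals if q >= good_cutoff]
    good_and_funded = [(q, s) for q, s in proposals
                       if s >= score_cutoff and q >= good_cutoff]
    sensitivity = len(good_and_funded) / max(len(good), 1)     # good work captured
    selectivity = len(good_and_funded) / max(len(funded), 1)   # funded work that is good
    return len(funded), sensitivity, selectivity

for label, cutoff in (("generous (sensitive)", 0.0), ("strict (selective)", 2.0)):
    n_funded, sens, sel = outcomes(cutoff)
    print(f"{label:22s} cutoff funds {n_funded:3d} proposals: "
          f"{sens:.0%} of good work funded, {sel:.0%} of awards are good")
```

The point of the toy is only that the two criteria move in opposite directions as the cutoff shifts; which balance is 'right' remains the political and value question discussed above.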
Notes and References

1 Portions of this chapter appeared in a paper presented at the annual meeting of the Society for Social Studies of Science, Gothenburg, Sweden, August 1992.
2 Amitai Etzioni, 'Eastern Europe: The Wealth of Lessons'. Paper presented at a conference on 'Interdisciplinary Approaches to the Study of Economic Problems',
Stockholm, 1991. Reprinted by the Socio-Economic Project, Georgetown University.
3 See Daryl E. Chubin and Edward J. Hackett, Peerless Science: Peer Review and US Science Policy (Albany, NY: State University of New York Press, 1990).
4 Robert Bell, Impure Science: Fraud, Compromise, and Political Influence in Scientific Research (New York: Wiley, 1992).
5 Peter A. Abrams, 'The Predictive Ability of Peer Review of Grant Proposals: The Case of Ecology and the US National Science Foundation', Social Studies of Science, Vol. 21 (1991), pp. 111-32.
6 Abrams, ibid., p. 128.
7 Ibid., pp. 126-7.
8 See Chubin and Hackett, Peerless Science, op. cit., chapter 4; Dominic V. Cicchetti, 'The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation', The Behavioral and Brain Sciences, Vol. 14 (1991), pp. 119-34; and David Faust, 'Research on the Process of Journal Review Re-Viewed', Social Epistemology, forthcoming 1997.
9 Von Bakanic, Clark McPhail and Rita Simon, 'The Manuscript Review and Decision-Making Process', American Sociological Review, Vol. 52 (1987), pp. 631-42.
10 See David Faust, 'Research on the Process of Journal Review Re-Viewed', op. cit.
11 The matters at issue in the exchange are too intricate and technical to summarize here. For a quick immersion in the dispute about consensus that makes specific reference to the review process, see Lowell Hargens, 'Scholarly Consensus and Rejection Rates', American Sociological Review, Vol. 53 (1988), pp. 139-51; Lowell Hargens and Jerald Herting, 'Neglected Considerations in the Analysis of Agreement among Journal Referees', Scientometrics, Vol. 19 (1991), pp. 91-106; Stephen Cole, Gary Simon and Jonathan R. Cole, 'Do Journal Rejection Rates Index Consensus?' American Sociological Review, Vol. 53 (1988), pp. 152-6.
12 Faust, op. cit., p. 28.
13 Stephen Cole, 'The Hierarchy of the Sciences?' American Journal of Sociology, Vol. 89 (1983), pp. 111-39.
14 Michel Callon, John Law and Arie Rip (eds.), Mapping the Dynamics of Science and Technology (London: Macmillan, 1986); Bruno Latour, Science in Action (Milton Keynes: Open University Press, 1987); John Law, 'The Anatomy of a Socio-Technical Struggle', in B. Elliott (ed.), Technology and Social Process (Edinburgh: Edinburgh University Press, 1988), pp. 44-69.
15 Thomas Kuhn, 'The Essential Tension: Tradition and Innovation in Scientific Research', in Kuhn, The Essential Tension: Selected Studies in Scientific Tradition and Change (Chicago: University of Chicago Press, 1977), pp. 225-39.
5

A POLISH PERSPECTIVE ON PEER REVIEW

Adam Lomnicki
In order to understand peer review mechanisms one has to take into account some peculiarities of science: scientists evaluate themselves, police themselves, but are unable to finance themselves. In other areas of human activity, a buyer controls the quality of the products that he or she buys. In science, the control of quality is left to the producers themselves. This raises the question of why scientists are usually honest and do their best to make products of the highest quality.
Why Are Scientists Trustworthy?

It is generally believed that scientists, unlike many other professionals, exhibit high moral standards and that they are an exceptional group of people for whom the search for truth is more important than personal career or profits. Close analysis - for example, that made by Hull1 - does not confirm this idealistic view. Consider English-speaking countries, where the largest scientific communities are to be found and where, it is generally agreed, most of the best-quality science is done. In these countries, the personal success of a scientist, including financial success, depends on his or her personal contribution to the accepted body of scientific knowledge. That contribution can be evaluated by the opinions expressed by other scientists or by the frequency with which they cite his or her papers. This is confirmed by the strong correlation between these two measures: other scientists' opinions and citation indices.2
Hull convincingly argues that scientists' behavior is based not on high moral standards but on their self-interest in increasing their own individual contribution to science. It pays to cooperate with others because their knowledge can be valuable to us. It pays to work among bright and clever colleagues because it increases our own chances of making an important discovery. It pays to read what others have written in order to avoid carrying out research that has already been done and to avoid mistakes arising from failure to take account of recent advances in the field. It does not pay to avoid citing important papers of others because we would be considered ignorant of advances in our field, and our manuscripts would not be accepted for publication. And most importantly: it does not pay to cook up data and cheat because, if this is discovered, all of our other contributions, including those we make in the future, will be ignored. They will be ignored not because scientists detest cheaters, but because denouncing a fraud is in everybody's personal interest.

Two phenomena seem to confirm that scientists' behavior is determined not by high moral standards but by their own personal interests. First, unethical behavior, such as appropriating the data and concepts of our assistants and students, is never condemned as strongly as is fraud and the fabrication of data in science. Theft of others' scientific contributions evokes moral indignation on the part of those whose work was appropriated, but to a third party it usually does not matter who was the real author of the contribution, as long as this contribution is valid and important. When asked, scientists condemn theft in science, but their condemnation and ostracism are not as great as in the case of fraud.3

Second, the authors of poor papers are usually not named and condemned but simply ignored. Scientists are not much concerned that there are others who do very bad science, as long as this does not interfere with their own work. Scientists criticize others only if they are asked to do so or if the paper concerned was published in such a high-quality and well-known journal that it pays to criticize it in order to show their own competence.

To conclude, scientists do not have to be moral creatures, and their behavior does not need to be altruistic, in order for them to do good science. The behavior of scientists that allows for cooperation among them, high quality of research, and progress in science, is stable in the sense of an evolutionarily stable strategy in biology: anyone who does not behave like the others loses out. The attempts to develop a moral code for scientists, which are undertaken from time to time, would be futile if the scientific community did not function according to the rules described above. A moral code and moral principles cannot be introduced unless science is managed in such a way that scientists are rewarded for behavior that fosters the development of science and punished for behavior that hinders it.

These comments relate to pure science. In the arts and humanities, the
fields of study are often small and local, so the scholars working in these fields do not form one worldwide scientific community. In the arts and humanities, the perception of knowledge by the general public is more important than it is in science; therefore the mechanisms that function here can be quite different. On the other hand, in technology and applied sciences, financial agencies know what they are paying for: new technology that sooner or later should yield a calculable profit. Since technological secrets and patents are more important here than individual contributions to accepted knowledge, scientists are evaluated not according to their contributions but by the profits directly resulting from their activities. I do not know how the mechanisms of peer review function either in the arts and humanities or in applied science, and I will not discuss these questions here.
Why Does It Pay to Write Trustworthy Peer Reviews in English-Speaking Countries?

I will consider here three kinds of peer reviews: manuscripts of papers and books submitted for publication; grant proposals; and recommendations and opinions concerning professorships and other positions in science. The peer review system has recently been criticized, but I will leave this critique to my American and West European colleagues. However, in order to present the peer review system from my perspective, I have to recall my own experience in writing reviews for, or receiving reviews from, English-speaking countries. (My comments about English-speaking countries are also true for Scandinavian and other countries that have adopted American and British standards in science.) Without this experience, I would not know what is wrong with the peer review system in Poland. While one may consider my view too idealistic, I want to present it as a point for comparison.

In these countries it really does not pay to write a deceitful review, but it does pay to write frankly and honestly. Why? The key factor is the size of the scientific community, the fact that I am one of many research workers within the same or similar fields. Because of this, not only are the names of reviewers confidential, but the person who is the subject of the review usually has only a small chance to retaliate. Therefore, it is safe to write frank reviews. I can also expect that there will be a second reviewer and that, if we differ in our opinions, a third one will be called in. If my review is poor, dishonest or biased, I will be considered ignorant and will not be asked to write any more reviews. Consequently, I will lose a good way to learn what is going on in my field at least a year before the reviewed manuscript is published or several years before the results of the proposal
are completed and published. I also take into account the fact that the editors and other persons asking for a review are important and influential, so the acceptance of my manuscripts and my grant proposals may one day depend on them. Their main problem is a surplus of manuscripts, grant proposals and applications, and therefore they expect me to be both critical and discriminating. I can gain their respect and goodwill by fulfilling their expectations.

A problem arises when the author of a poor manuscript cites my own publications plentifully. What should I do in such a case? Write a frank review and lose citations, or push a manuscript for publication in spite of its low quality? Taking into account the fact that there are also other reviewers and that the editor has some knowledge of the subject, I can be only slightly biased. If the manuscript is really poor, it will not be published even with a positive review, so by writing a biased review I can lose both citations and my reputation.

Reviewing grant proposals poses a different problem. Usually, we receive proposals from within our own field, which is the field we consider the most important. We also know that there is strong competition, so that only proposals that are graded excellent or almost excellent will be funded. For that reason, there is a tendency among reviewers to award proposals high marks. On the other hand, here too the existence of other reviewers makes exaggeration impossible.

If the reviews are not anonymous, as is sometimes the case when a position or a manuscript of a book is concerned, there is a tendency among reviewers to be less sharp. However, if a book or a candidate for a position is really poor, one does not lose anything by writing frankly, as long as the author of the book or the candidate in question lives far away, and this is usually the case in countries with a large scientific community.

For a visitor from Poland, the most astonishing behavior of American and British scientists is their tendency to hire bright young men and women and to get rid of mediocre old colleagues. I think this is because of the external funding of research by grants, which means that there is no competition for resources within university departments and other scientific institutions. An outstanding scientist competes with others for laboratory and office space only, but he or she is able to attract so much funding that it will be possible to acquire more space. The profits attributable to the presence of an outstanding colleague, even one who is not good-natured, are enormous. To generate important discoveries, we need good partners for discussion and cooperation, with a similar interest but of different academic background. An outstanding scientist raises the prestige of an entire department and attracts funds, good students and interesting visitors. It is in the selfish interest of everybody to hire such a person.

For the reasons presented above, I do not need to be a moral creature
when writing peer reviews for abroad. But I do have to invoke my moral instinct when writing such reviews in Poland. Why?
Why Does It Not Pay to Write Frank Reviews in Poland?

First, in Poland there are only relatively few scientists in each field, and we therefore know each other quite well. Moreover, peer reviews are only written by professors and other senior scientists. Young scientists, who are usually very active and fault-finding, almost never serve as reviewers. Thus the group of reviewers is even smaller than it might be. Even if the name of the reviewer is supposed to be kept confidential, it is usually very easy to find out who wrote a review. Writing frank and critical reviews is the best way to make enemies among those who sooner or later will review a manuscript or grant proposal of mine, my friends or my students. To do so is obviously self-destructive behavior.

One has to take into account the fact that writing critical and frank reviews for Polish journals does not make too much sense either. Many journals suffer from a shortage rather than an excess of manuscripts, so no more than 10 per cent of manuscripts are rejected. The editors want to avoid making enemies for themselves. Therefore, when they have too many manuscripts and are faced with the choice between rejecting a manuscript or publishing it in several years, they tend to choose the latter alternative.

Second, we still suffer from the legacy of the totalitarian system, not only in the economy and in administration but also in science. For a long time, the communist ruling group tried to control all areas of human activity. Nothing was left to the free market or clearly defined rules of law. The best example of this attitude is what Joseph Goebbels once said: 'I decide who is a Jew here.' What our rulers really feared was an independent person with high authority. Although the scientific establishment was independent to a certain degree, this was a controlled independence. The ruling group could refuse to issue passports or allocate funds, and it controlled all senior positions in science and education. As a result of interaction between rulers and scientists, a hierarchical system was established in science. In this system, scientific achievements were important, as was the ability to be obedient and to remain on good terms with the rulers. It paid to exhibit both a moderate opportunism and a strong sense of solidarity with one's fellow scientists. In such a scientific community there was room neither for strong competition nor for rapid promotion of young scientists, but the status of senior scientists was high and their position secure. The only breach in this system was introduced by young researchers who returned to Poland after a lengthy stay abroad. If they had been successful abroad, their status was also relatively high at
home. Nevertheless, the criteria used to evaluate scientists were never clearly developed, and the ruling group, as well as - I am afraid - the scientific establishment, did everything to prevent the application of any objective criteria. Peer reviews and an elaborate system of titles and degrees were the only criteria in use.

With the end of communism, the ruling group was replaced by the self-government of scientists. The basic feature of this system is solidarity among scientists; there is still no room for competition or for any objective evaluation. A good American, British or Scandinavian scientist is self-confident. He knows that there will always be a university or some other institution where he can find a job and research funds. He can allow himself to express his views freely, including views on matters outside his field of study. In Poland, nobody seems to be really confident and secure. Relatively low wages and housing difficulties are major reasons why scientists spend all their lives in the same city and often in the same institution. A large proportion of science funding is still distributed not as individual grants but as block grants to support scientific institutions. An outstanding scientist with whom we have to share limited resources is therefore a disaster. Since universities and other institutions are autonomous, the best way to secure good conditions for research and a quiet life is to be on good terms with one's colleagues. This can be achieved by avoiding frank discussion, criticism and the objective evaluation of others, and by arranging for all available funds to be distributed equally among everybody. What I have said here concerns senior scientists. The younger ones have very little decision-making power, but as members of trade unions they can fight for job security and against high standards.

In certain respects, the situation now is worse than under the communist regime. A university rector or the director of some other scientific institution who was not elected but appointed to his position with Party approval was sometimes more free and independent than he is now. If he wanted, he was able to promote outstanding scientists and to eliminate mediocre ones. Now scientific institutions are self-governed by senior scientists, who decide who will be promoted and who will receive funds to carry out research. Theoretically, the democratic system should function better than the authoritarian one, but in science this is not necessarily the case. With the end of communism, confidential recommendations for professorships have disappeared in Poland.4 This has almost eliminated unfavorable recommendations, and consequently anybody who is of the right age and maintains good relations with his or her colleagues can become a professor.

Third, another legacy of the old system is the large number of mediocre scientists who publish only in local journals with a circulation of less than 500, or sometimes even 100, copies. These scientists do not know, and refuse to accept, that their research is far removed from the mainstream of
science. Their attitude used to fit very well with communist isolationism, with the idea that we are surrounded by a hostile world and have to develop our own science independently of others. People with such an attitude towards science did not disappear with the end of communism. Ironically, they are still influential within the scientific establishment. At a meeting of the Central Qualifying Commission, the body that ratifies all professorships, I was astonished to hear one of its members claim that scientists should not be evaluated by the quality of the journals in which they publish - and nobody in the room contradicted him.

We must also keep in mind that in Poland, as in many other European countries, there is an older legacy of the nineteenth-century German universities, a tradition that has not been completely eliminated and to which many scholars would like to return. This tradition of university teaching and science management does not allow for the kind of efficiency in doing science that one sees today in American and British universities.

Taking all the above into account, one has to be an absent-minded idealist, with no respect for one's own interests, to write honest, frank and critical reviews in Poland. Most reviews are positive, and reviewers do their best to show that a manuscript or a grant proposal is excellent and that a given candidate is the best one for the job. A friend of mine who serves on one of the panels of the Committee for Scientific Research estimates that only one third of reviews of grant proposals are critical and honest. From my own experience, I think that only 1 to 5 per cent of reviews for professorships are negative. Those who try to be honest simply refuse to write a review if the work or the candidate is poor. Others do strange things: they write a very critical review but conclude at the end that the manuscript should be published, the grant proposal accepted, or the Ph.D. or professorship conferred. Sometimes, when the poor quality of a manuscript or a grant proposal is obvious to everybody, an honest review is given, but this is not always the case. A wise man writes a review as he would an obituary: either favorably or not at all.5
Who Are the Losers in the Present System?

The lack of clear and objective criteria for evaluating scientists and the failure of the peer review system are not problems for senior scientists. Provided they maintain very high standards and are internationally recognized, they will also enjoy high status within the country. They do not need objective criteria. The real beneficiaries of the system are mediocre persons old enough to hold a high position and who are therefore members of the self-governing bodies of scientific institutions. This system is stable and immune to change.
The real victims of the present system are young scientists. They perceive science as a game without any rules, in which individual contributions to scientific knowledge are of minor importance. The most important thing is to be polite and obedient to those with the power to decide one's future. Those young scientists who were successful abroad and have become accustomed to strong competition on the basis of clear evaluation criteria try to go abroad again. They do so partly because of low salaries and the lack of housing, but I think that a major reason is that they see no goo