
CHANGING EDUCATIONAL ASSESSMENT International perspectives and trends Edited by Patricia Broadfoot, Roger Murphy and Harry Torrance

ROUTLEDGE LIBRARY EDITIONS: EDUCATION


CHANGING EDUCATIONAL ASSESSMENT International perspectives and trends

Edited by PATRICIA BROADFOOT, ROGER MURPHY AND HARRY TORRANCE

Volume 36

First published in 1990 This edition first published in 2012 by Routledge 2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN Simultaneously published in the USA and Canada by Routledge 711 Third Avenue, New York, NY 10017 Routledge is an imprint of the Taylor & Francis Group, an informa business © 1990 British Comparative and International Educational Society All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library ISBN 13: 978-0-415-61517-4 (Set) eISBN 13: 978-0-203-81617-2 (Set) ISBN 13: 978-0-415-67538-3 (Volume 36) eISBN 13: 978-0-203-80869-6 (Volume 36) Publisher’s Note The publisher has gone to great lengths to ensure the quality of this reprint but points out that some imperfections in the original copies may be apparent. Disclaimer The publisher has made every effort to trace copyright holders and would welcome correspondence from those they have been unable to trace.

Changing educational assessment International perspectives and trends

Edited by Patricia Broadfoot, Roger Murphy and Harry Torrance for the British Comparative and International Education Society (BCIES)

ROUTLEDGE

London and New York

First published 1990 by Routledge 11 New Fetter Lane, London EC4P 4EE Simultaneously published in the USA and Canada by Routledge a division of Routledge, Chapman and Hall, Inc. 29 West 35th Street, New York, NY 10001 © 1990 British Comparative and International Educational Society Printed and bound in Great Britain by Biddles Ltd, Guildford and King’s Lynn All rights reserved. No part of this book may be reprinted or reproduced or utilized in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. British Library Cataloguing in Publication Data Changing educational assessment : international perspectives and trends. 1. Education. Assessment I. Broadfoot, Patricia II. Murphy, Roger III. Torrance, Harry IV. British Comparative and International Education Society 379.154 ISBN 0-415-05293-9 Library of Congress Cataloging in Publication Data 0-415-05293-9

Contents

List of illustrations  vii
List of contributors  ix
Introduction  1

I  Comparative perspectives on educational assessment

1  The role of assessment, re-examined in international context
   Angela Little  9
2  Trends in the assessment of teaching and learning: educational and methodological perspectives
   Carol Anne Dwyer  23
3  Reshaping the standards agenda: from an Australian's perspective of curriculum and assessment
   David Cohen  32
4  National assessment: a comparison of English and American trends
   Caroline Gipps  53
5  Possibilities and limitations in cross-national comparisons of educational achievement
   Les McLean  65

II  Comparative perspectives on public examinations

6  Trade-offs in examination policies: an international comparative perspective
   Harold J. Noah and Max A. Eckstein  84
7  Examination systems in Africa
   Thomas Kellaghan  98
8  The introduction of continuous assessment systems at secondary level in developing countries
   David Pennycuick  106
9  Exam questions: a consideration of consequences of reforms to examining and assessment in Great Britain and New Zealand
   Tony McNaughton  119
10  Bring your grandmother to the examination: Te Reo O Te Tai Tokerau Project, New Zealand
    Paul Rosanowski  136
11  The GCSE: promise vs. reality
    Desmond L. Nuttall  143

III  Selection, certification and the accreditation of competence

12  University entrance examinations in China: a quiet revolution
    Keith Lewin and Wang Lu  153
13  Learning motivation and work: a Malaysian perspective
    Jasbir Sarjit Singh, T. Marimuthu and Hena Mukherjee  177
14  Assessment, certification and the needs of young people: from badges of failure towards signs of success
    Penelope Weston  199
15  Beyond commissions and competencies: European approaches to assessment in information technology
    Alison Wolf  207

Index  224

Illustrations

Tables
1  The role of assessment by level of analysis  14
2  The role of assessment by seven levels of analysis  18
3  Inter-scale correlations  187
4  Scale means values  189
5  Relationship between learning motivation, work orientation and work behaviour  193
6  Relationship between nature of task, organizational orientation, work orientation and work behaviour  194

Figures
1  Map of assessment functions  114
2  Relationship between intrinsic and extrinsic learning motivation scales  186
3  Relationship between learning motivation, work orientation and work behaviour  192
4  Alternative tracks in the French educational system  216

Contributors

Dr David Cohen was formerly Associate Professor of Education at Macquarie University and is now a freelance educational consultant based in Melbourne.

Dr Carol A. Dwyer is Senior Development Leader at the Educational Testing Service, Princeton, New Jersey.

Professor Max A. Eckstein is Professor of Education at Queen's College of the City University of New York, and Research Associate at the Institute of Philosophy and Politics of Education, Teachers College, Columbia University, New York.

Dr Caroline Gipps is Lecturer in Education at the Curriculum Studies Department at the University of London Institute of Education.

Thomas Kellaghan is Director of the Educational Research Centre at St Patrick's College, Dublin.

Dr Keith Lewin is Reader in Education at the University of Sussex.

Professor Angela Little is Professor of Education (in developing countries) at the Department of International and Comparative Education, University of London Institute of Education.

Professor Leslie D. McLean is Executive Head of Sponsored Research and Professor at the Department of Measurement, Evaluation and Computer Applications at the Ontario Institute for Studies in Education, Toronto.

Professor Tony McNaughton is Emeritus Professor of Education at the University of Auckland, New Zealand.

Professor Harold J. Noah is Gardner Cowles Professor Emeritus at Teachers College, Columbia University, New York and Professor at the University of Buffalo, State University of New York.

Professor Desmond L. Nuttall is Director of the Centre for Educational Research at the London School of Economics.

David Pennycuick is Lecturer in Education and Chairperson of the Centre for International Education at the University of Sussex.

Dr Paul Rosanowski was involved in evaluating the ‘mother-tongue’ assessment projects based at the University of Auckland.

Dr Jasbir S. Singh is Chief Project Officer at the Education Programme, Human Resource Development Group at the Commonwealth Secretariat, London and was formerly Professor of Sociological Studies in Education at the University of Malaya, Kuala Lumpur.

Wang Lu is Lecturer in Education at the Institute of Foreign Education, Beijing Normal University, Beijing.

Dr Penelope Weston is Deputy Head of the Department of Evaluation and Policy Studies at the National Foundation for Educational Research, Slough.

Dr Alison Wolf is Senior Research Officer and Research Lecturer at the Department of Mathematics, Statistics and Computing, University of London Institute of Education.

Introduction

All the chapters in this book were originally presented as papers at the twenty-third annual conference of the British Comparative and International Education Society held at the University of Bristol in September 1988. The choice of ‘International Perspectives on the Changing Purposes of Educational Assessment’ as the theme of that conference represented a significant reflection of the rapidly growing interest in assessment policy issues that has characterized recent years. Less than a decade ago, scholarly debate about assessment was largely confined to the technical domain of psychometrics and the search for ever more accurate measures of learning achievement. In the last few years, however, the development of quite new approaches to educational assessment and the increasing use of public assessment by governments as a powerful means of influencing educational practice have combined to encourage the emergence of an increasingly substantial literature in the field of assessment policy analysis. It is now rightly recognized that the ‘high stakes’ which tend to be associated with assessment — whether this is in terms of its role in the allocation of individual life chances or in terms of judgements about institutional quality — make it a peculiarly effective means of exerting leverage on the education system. At the most general level this leverage is associated with concern to monitor standards within and between countries to provide reassurance about the effectiveness of the educational system. Where such monitoring reveals areas of concern, more closely targeted assessment policies are likely to be forthcoming aimed at bringing about an improvement. Examples in this respect might be the ‘Teachers Test’ and the minimum competency testing of students now being used in some parts of the United States. Apart from the institution of specific assessments of this kind, more general accountability and, hence, leverage on

institutions within the education system is provided for by the publication of the results of other kinds of assessment such as public examinations. In what in many countries is increasingly becoming a market economy in educational provision in which individual institutions compete to attract clients, assessment information is one of the main currencies of that competition. Small wonder, then, that governments around the world are increasingly turning to assessment policy as a means of bringing about other desired changes in the system. A change in emphasis in the content or skills examined in a particular public examination paper, for example, can be a highly effective means of curriculum development, as the current controversy over ‘Measurement-Driven Instruction’ (MDI) illustrates. By the same token, the imposition of greater national homogeneity in the content of such syllabuses can in turn lead to a greater homogeneity of curriculum provision more generally. While assessment may be a useful source of leverage on the system as a whole, its impact on individual pupils and their attitude to education continues to be a source of grave concern in many countries. The key role that assessment plays in selection makes the public legitimacy and reliability of such procedures an overriding concern and a frequent barrier to the introduction of a range of classroom-based assessment techniques. Little can be done to remove the well-recognized shortcomings of many public examinations in countries where it is this kind of assessment that is to determine the small proportion of a given age-group who will be allowed to continue their studies. In countries where educational opportunity does not need to be so severely rationed, it is still frequently used to support a hierarchy of higher and lower status institutions, with the result that legitimacy and perceived reliability retain their critical importance. Only where the more pressing problem is a shortage of suitably qualified candidates rather than the need to select, are such concerns typically giving way to a greater emphasis on validity and assessment techniques that encourage the development of new skills and competencies among young people. The chapters presented in this volume offer a range of international analyses of such issues. In each case they demonstrate the tension between the social, economic or political purpose which the assessment has been designed to serve and the effects these imperatives have on the process of education itself when translated into specific assessment techniques. In Part I we offer a collection of papers that together provide a conceptual framework for the international

issues in assessment that the book as a whole addresses. New developments in assessment thinking and practice are set in the context of a more theoretical analysis of the role and scope of assessment. The contributions of Part I employ a range of international perspectives to explore some of the most promising and some of the most worrying among current developments in assessment. Many of these issues are addressed in more detail in Part II of the book, which provides a more explicit international focus on the role of public examinations and how traditional priorities and procedures in this area of practice may be changing. The papers in Part III reflect these tensions between the old and the new in even starker contrast. The overriding imperatives of selection in developing countries are set against the more novel emphasis on accrediting absolute rather than relative competencies, which is now increasingly characteristic of many developed, especially European, countries. Taken together this collection offers a topical and virtually unique study of one of the most profound elements affecting all education systems today. The global perspective which the book provides underlines the communality of the issues facing teachers and students, policy-makers and politicians throughout the world as they struggle to reconcile issues of equity and national development, educational imperatives and finite state resources. In the light of the analyses presented there can no longer be any doubt about the significance of assessment as a reflection of these tensions. It is also to be hoped, however, that a greater understanding of these issues will serve to point the way towards some more effective resolution of the problems of assessment in contemporary education systems.


I Comparative perspectives on educational assessment

Taken together the papers in this section map out the range of issues that must be addressed in any international study of educational assessment. Not only do we have the conventional questions of who is to be assessed, where, when, how, and for what purpose in any particular context; we must also examine how the answers to these questions differ from one national context to another, to suggest why this should be so and to consider the effects and implications of such differences. To this end, Angela Little’s Presidential Address to the Society, which constitutes the opening chapter of this book, provides us with an invaluable conceptual framework and a series of fundamental principles on which to base such studies. She reminds us that assessment can be both facilitative and inhibitive to educational objectives at any one of several levels. Thus, together with her exhortation to rigour and a respect for context in the way we conduct any such comparative study, this provides an important substantive and methodological framework for the book as a whole. Carol Dwyer’s comprehensive review of new developments in assessment identifies a growing concern to promote greater validity in assessment. Identifying a number of novel contexts in which assessment is now being widely applied — such as the assessment of teachers — she argues the need for a more sophisticated understanding of the interrelationship between different assessment purposes if these are to be harnessed for maximum educational benefit. David Cohen’s hard-hitting denunciation of the standards programme in Australia urges us to resist the reductionism of much current standards testing by reasserting our confidence in what schools have achieved across a much broader range of objectives, and by pointing to the inability of assessment to portray adequately some of the most important of these achievements. The last two papers in the section are specifically concerned with cross-national comparisons of standards. Caroline Gipps compares the American and English initiatives

in this respect in the light of their implications for educational practice in both systems. She sounds a strong warning of the dangers inherent in any such project where the potential diagnostic benefit for teachers is replaced by the use of such results for accountability and overt competition as is already the case in some parts of the United States. Les McLean sounds a rather similar note of warning through his detailed case-study of one of the International Association for the Evaluation of Educational Achievement (IEA) studies. In arguing that policy-makers and tests may in some cases be working from a very different conception of the subject than current professional orthodoxies, he raises the question of the validity and utility of such studies where testing and teaching definitions of competence do not match. Taken together, these papers provide both a comprehensive map of the potential for international studies of assessment and clear testimony to their significance.


1 The role of assessment, re-examined in international context Angela Little

It is appropriate — if unusual — to begin a chapter on the role of assessment with a short test which will enable readers to test their knowledge about assessment and examinations. The answer to my first question will be fairly easy for those steeped in the history of examinations and assessment in England and Wales.

1 In which year did the report of the Consultative Committee on Examinations in Secondary Schools in England and Wales provide a summary of the good and bad effects of examinations on pupils which included ‘examinations incite him to get his knowledge into reproducible form and to lessen the risk of vagueness’ on the one hand, and ‘by setting a premium on the power of merely reproducing other people's ideas and other people’s methods of presentment (examinations divert) energy from the creative process’ on the other?
The answer to that one is 1911.

2 In which year and in which country did a memorandum submitted to the Morgan Committee point out that
in consequence of the affiliation of an educational institution known as The Academy to the Calcutta University the whole curriculum of study is sacrificed to the subjects required for the University Entrance examination, the effect of which is felt even in the lower school, in which it was intended that a practical commercial education should be imparted, without the necessity of advancement to the upper school. But the real fact is that for the small percentage of boys who wish to pass up into the upper school, the whole of the pupils are obliged to prepare for the curriculum of the university; and thus the object of both schools is to a certain extent frustrated? (Quoted in Jayasuriya 1979)
The answer to that one is 1867 and the country Sri Lanka, then Ceylon.


3 In which country and in which subject did the following multiple choice item appear in the National University Entrance Examination in 1987? The candidates were asked to choose one or more correct answers.
The relationship between the brain and consciousness is
(a) consciousness is a function of the brain. The brain is the biological mechanism of the consciousness.
(b) consciousness is a reflection of objective things. The content of consciousness is not determined by the brain.
(c) consciousness relies on the brain. The brain is the source of consciousness.
(d) consciousness is the product of the brain. The brain can produce consciousness itself.
The answer is the People’s Republic of China and the subject politics. And in case you are wondering about the correct answer to the question it is, of course, (b)!

4 In which country did the following item appear in an examination for teacher trainees?
A child was asked to work out the following question. 'A water pump takes 2 hours and 55 minutes to fill a tank. If it starts pumping water into the empty tank at 10.50 p.m. at what time will the tank be full?’ The child chose the answer 1.05 a.m.
Question: (i) What error did the child make? (ii) How would you show the child that the answer is not 1.05 a.m.?
It is an item from an exam for teacher trainees in the mid-seventies in Kenya.

5 Finally, in which country would you be able to read the following advertisement in a Sunday newspaper?
Parents seek suitable partner of sober habits for their daughter, aged 26, pleasant appearance, gemmologist, Diploma London, highly talented in flower arranging and doll making. Large dowry.
The answer to this one is the same as question number two, Sri Lanka.
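For readers who want to check item 4 for themselves, here is a minimal sketch, in Python, of the clock arithmetic involved; the account of the child's error given in the comments is a plausible inference rather than something stated in the original item.

```python
from datetime import datetime, timedelta

start = datetime(2000, 1, 1, 22, 50)            # 10.50 p.m.; the date itself is arbitrary
full = start + timedelta(hours=2, minutes=55)   # ordinary clock arithmetic
print(full.strftime("%I.%M %p"))                # 01.45 AM, i.e. the tank is full at 1.45 a.m.

# A plausible account of the child's answer of 1.05 a.m.: the times are added
# as if they were decimal numbers, so that 100 'minutes' make an hour.
print(f"{10.50 + 2.55:.2f}")                    # 13.05, read back as 1.05 a.m.
```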

The answer to the first question was fairly easy. Widely quoted in the English literature on examinations, the 1911 Report of the Consultative Committee on Examinations has provided a framework for debate about the pros and cons of testing in the teaching-learning situation in England for over seventy years. The backwash of assessment on the process of learning referred to sometimes as ‘putting the cart before the horse’ or ‘letting the tail wag the dog’ has preoccupied educationalists for a very long time. The answer to the second item, 1867, in Ceylon, may have been more surprising. The backwash effects of University Entrance Examinations on secondary school curricula were well articulated in some of the British colonies over 120 years ago; and this phenomenon of examination backwash had been considered by some to have constituted a problem many centuries earlier in Imperial China when examinations were introduced to select Imperial civil servants. In his account of The Schooling of China, John Cleverley notes how, in its earliest form, the examination syllabus included the singing of odes, tested the practical skills of archery and horsemanship, and allowed some individual interpretations of Confucian ideas. But later on it emphasized literary skills, leading to complaints that officials were being selected in accordance with their ability to speak and write rather than to act and apply their understanding of Confucian ideas to government. China was the answer to the third question about the 'correct' relationship between the brain and consciousness. The classification of this question as a politics item in the University Entrance Examination might seem a little strange, but in the Chinese politics syllabus the understanding of the correct relationship between the brain and consciousness is subclassified as philosophy, which includes units on Marxist epistemology and differences between idealism and materialism. In their chapter in Part III, Wang Lu and Keith Lewin present a detailed analysis of the 1987 politics, physics and history papers in China. The fourth item, from a teacher training college exam, may also have surprised those familiar with teacher training in Britain where curriculum units on the theory and practice of assessment feature relatively little. In Kenya assessment and item writing techniques were introduced to college of education curricula in the mid 1970s as part of a policy package designed to use examinations to promote rather than to inhibit learning in the classroom. Widespread reforms in the quality of the standard seven school-leaving exams were introduced in the mid 1970s. These reforms were reinforced through training in assessment for pre-service teachers, assessments of the acquisition of assessment skills, in-service

training for experienced teachers and the production of ‘backwash’ documents for teachers and teacher educators which analysed examination performance and gave advice on helping students to overcome error. In this example assessment is being used extensively to promote pedagogy and improved quality in learning. But in tandem with their pedagogy-promoting role, all the examples presented so far have been of assessments used simultaneously for opening up for some, and closing off for others, opportunities in a wider social, political, and economic system — opportunities for jobs, for income, for status, for prestige, for power and, in the case of the girl whose parents placed an advertisement in the local English press in Sri Lanka, for a good marriage. That advertisement also drew attention to the importance in some societies of qualifications and diplomas which have not only a national currency but also an international one. The girl's parents obviously thought it important to note in the advertisement that their daughter held a London diploma. The importance of London diplomas may be illustrated by the example of an optician's shop in Colombo. Prominently displayed on a shelf in the shop were three large picture frames. The first framed a photograph of the optician receiving a certificate of participation in a course of instruction in the theory and practice of modern methods of prescribing and fitting contact lenses. The second framed the certificate itself titled ‘The London Course of Optometry’ signed by the course director, replete with his qualifications FBDA (MD) FSMC D.Opth. The third framed a photograph of around forty course participants standing on the steps of the University of London Union in Malet Street. The words ‘University of London Union’ are displayed prominently. The fact that neither the course nor the certificate were from either the University of London or the University of London's students’ union was immaterial. The young woman with her diploma and the optician with his certificate of course participation are using their qualifications from abroad to enhance their opportunities at home, but others are using their overseas qualifications to enhance opportunities at home and abroad. Take the case of the Colombo Christian parents seeking a wife for their son. Respectable and well-to-do Sinhalese Christian parents (father professional) seek a pretty, fair, tall, healthy, English-educated and non-working partner, preferably a Methodist, Anglican or Baptist with religious background,

between 18-20, with means for their only son, 24 years, British qualified electronics engineering technician (teetotaller and non-smoker) returning to Sri Lanka next month and hoping to proceed to Australia for employment in 1989, and owning house and property and business premises worth nearly 20 lakhs, for immediate engagement and early marriage. Caste immaterial. The examples I have provided have underlined many of the roles played by assessment found in different countries of the world. I have chosen to select my illustrations mainly from developing countries for three reasons. First, because I think that the international and comparative literature on assessment already contains a wealth of material from developed and industrialized countries, and although there is a literature on and from developing countries it has yet to be well integrated with the former. Second, in many developing countries the experience of assessments used for occupational and educational selection has been longer, the innovations made in assessment often more radical, and the importance of assessment for life-chances greater, than in many industrialized countries. Third, in some aspects of assessment, developing country experience is running ahead of, or side by side with, experience in a country like Britain. Let me try to build an initial framework for the analysis of the role of assessment, in a national, international and comparative context. The initial framework is simple and has only two dimensions — the first is the level of analysis; the second is the type of role played by assessment. The level of analysis enables us to distinguish for whom the role of assessment is played out (Table 1). I shall distinguish initially between two levels of analysis: (i) the individual student and (ii) the social group. The type of role played by assessment can be classified in many ways. In his chapter in Part II David Pennycuick distinguishes between the formative and summative role of assessment. I have distinguished instead between the facilitating and the inhibiting role of assessment — for example between the way assessment promotes or facilitates learning and the way assessment prevents or inhibits learning. Much of the current literature, like many of the chapters in this book, is pitched at the level of individual student learning or individual teacher teaching. Assessment at this level is valued positively for the impact it has on learning motivation, for the cognitive impact it has on the reinforcement of learning objectives and for the access it provides to

the ‘good life’ and the opportunities denied to the children of a previous generation. On the inhibitory and preventive side, assessment has been argued to alienate the learner from the process and enjoyment of learning; to inhibit the development of the concept of the ‘use value’ of knowledge, emphasizing instead the concept of exchange value; to reduce in those who fail feelings of self-esteem, and to decelerate motivation for achievement. At the second level of analysis, the social group, we note the opposition between propositions about the role of assessment in certifying competency for membership of educational and occupational groups and those which stress the role of assessment in social control and in the legitimation of inequalities of income, power, prestige and status between groups.

Table 1 The role of assessment by level of analysis

Individual student
  Facilitative: motivates learning; reinforces learning goals; opens access to ‘good life’
  Inhibitive: alienates learner from process and enjoyment of learning; inhibits concept of use-value; failure reduces self-esteem

Social group
  Facilitative: certifies competency and qualification for membership of educational and occupational groups
  Inhibitive: prevents greater equality and legitimates inequalities of income, power, prestige, status

Now this distinction between the facilitative and inhibitive, or promoting and preventing, role of assessment may, at first glance, be thought to be purely semantic. With the inclusion of a negative here and there, inhibitive statements can be transformed into facilitative statements. But usually embedded within most people's thinking about assessment is an evaluation made by the writer or researcher of that which assessment promotes or prevents. In the case of facilitative propositions, assessment promotes goals which are valued positively, either implicitly or explicitly, while in inhibitive propositions assessment prevents the achievement of goals

valued positively by the researcher. Let me extend and improve this simple framework. In so doing I shall expand the number of levels of analysis and consider some relations between the levels of analysis. I begin with a consideration of the number of levels of analysis. In 1984, Keith Lewin and I presented a paper considering reforms of the secondary school assessment system in Sri Lanka between 1972 and 1982 (Lewin and Little 1984). We considered the role of assessment in society and attempted to explain why assessment reforms had come about, first in 1972 and again in 1977. In 1972 the government abolished the earlier system of O- and A-level examining and replaced it by examinations intended to support curriculum objectives more in line with the learning needs of the majority of students. The new system tested students one year earlier than the previous system and effectively delinked itself from nationally and internationally equivalent qualifications. The new system was implemented for five years before mounting criticism and a change of government led to its abolition and a return to a system named once again GCE O- and A-level. In our analysis we considered the educational factors which precipitated change in 1972 — for example the dissatisfaction with an academic curriculum, and the restrictive range of assessed learning objectives. But in our search for explanations we were compelled to look much further afield. We needed to understand the political context in which the reforms took place. Two years prior to the reform, in 1970, a socialist populist coalition came to power. Its statements of educational policy noted the dependence of the country on foreign expertise and foreign aid and the need to develop indigenous skills and resources. The type of education then on offer was criticized for the premium placed on examinations and the low value attached to the development of skills necessary for economic development. In February 1971 there was a hint in official circles that the O- and A-level might be abolished; just two months later, in April 1971, a youth insurgency swept the country. The insurrection was apparently organized by disaffected youth frustrated by the speed with which government was implementing its election promises to reduce unemployment and improve equity. According to one commentator an estimated 85 per cent of those detained after the insurrection had the GCE O-level qualification. The insurgency focused attention on the education system as a fomenter of unrest and muted those who had previously opposed reform. After civil order had been re-established, the embryo proposals for reform were revitalized and resulted in

a restructuring of the education and assessment system more far-reaching than would have been possible had the insurrection not occurred. What I am highlighting here is that national political arena in which educational assessment plays out its role. But the changes which took place five years later in 1977 were to highlight yet another level of analysis important for an understanding of the role of assessment. Like the earlier period, national political considerations were important. The United National Party swept to power with an overwhelming majority in 1977. A party which drew its support traditionally from the business community and the conservative establishment, it was committed to a package of economic reforms designed to open up the national economy to international market forces and to encourage a return to educational standards that had international comparability. Under its auspices the powerful, though small, lobby interested in reestablishing access to internationally recognized qualifications made its voice heard. This interest was expressed alongside educational objections to the former system. The new Minister of Education noted parental and educational objections to the earlier system and reassured them that the new A-level would include provision for students who wished to sit the London external A-level examination. At the same time the Deputy Minister of Education was to argue that ‘people are perturbed over the present system of examination, which is only accepted locally. This system will be changed so that our qualifications will be accepted the world over.’ (Lewin and Little 1984:77). He later went on to justify the decision by reminding the political opposition that the previous Prime Minister had offered the opportunity of access to foreign education only to her own children and promised ‘whether it be London A-level examinations or the Russian A-level we will not shut the door for our children’ (op. cit.:78). This concern with legitimating membership of an international and global society was to find favour both with the elite who had traditionally enjoyed access to a British and Commonwealth society and with a more rural constituency increasingly attracted to skilled and unskilled jobs in the Middle East. The role of assessment in facilitating international mobility contrasted starkly with the inhibitive role accorded it by politicians and senior officials in the earlier regime. As the Secretary of the Ministry of Education in the earlier period was to note:

Of all the great deficiencies created in our societies by colonial rule nothing is so pervading as a lack of moral courage and strength to think beyond the intellectualism imposed on us by the imperial powers . . . The preoccupation of us all with what occurs in developed societies has stifled our intellectuals, thinkers and innovators in education. (Curriculum Development Centre 1975)

Others would refer to this stifling as ‘cultural alienation’ or the ‘colonization’ or ‘crippling’ of minds. But whether crippling or creating, the significance of the international context for some developing countries is clear. Whereas the interest of people in Britain in what goes on in the rest of the world in matters of assessment and education has, until very recently, been almost non-existent, interest in theory, policy and practice in the field of assessment in other countries is intense in many developing societies. Having illustrated the importance of the level of analysis for thinking about the role of assessment, we can now extend Table 1 and increase the number of levels of analysis to seven. The seven levels listed in Table 2 distinguish (i) the individual student from (ii) the individual student in relation to his or her family, (iii) the individual student in relation to his or her teacher, (iv) social, economic or political groups of which the student or his or her parents may be a member, (v) national society, (vi) regional society and (vii) international society. The table also includes a number of propositions about the role of assessment at different levels. At the international level, for example, are facilitative propositions of the kind ‘assessment promotes global mobility’, or inhibitive propositions of the kind ‘assessment inhibits equality between poor and rich countries; assessment legitimates the power of international elites and assessment encourages continued “dependence” leading to “colonized” or “crippled” minds and the inhibition of autonomous national development'. But what is the relationship between these different levels? By presenting these propositions in a vertical column one can imagine a series of levels which ‘frame’ the analysis for the preceding level. That is not to suggest that these higher-order frames determine processes at a lower level. Instead they may be thought of as conditioning or setting the range of possibilities for the preceding level. I will illustrate the idea of framing and context by reference to studies on educational and occupational aspiration.

Table 2 The role of assessment by seven levels of analysis

Individual student
  Facilitative: motivates learning; reinforces learning goals; opens access to ‘good life’
  Inhibitive: alienates learner from process and enjoyment of learning; inhibits the development of the ‘use value’ of knowledge; failure reduces self-esteem

Individual student and family
  Facilitative: confirms high status of family; confers new status on family
  Inhibitive: inhibits social relations, leading to shame, social disgrace, suicide, murder

Individual student and teacher
  Facilitative: sets the boundaries for legitimate knowledge; defines relations between teacher and student; provides teacher and student with feedback on performance
  Inhibitive: restricts learning to that which is assessed

Social, political, economic groups
  Facilitative: reinforces/creates group identities; assists lower social groups to achieve social mobility
  Inhibitive: inhibits greater equality; legitimates inequalities of income, prestige, status

National society
  Facilitative: certifies competency and qualifies for educational/occupational group membership; reinforces national unity; promotes economic growth; enables comparability between schools and facilitates accountability
  Inhibitive: inhibits mass interests in capitalist society

Regional society
  Facilitative: promotes mobility of persons between countries in same region

International society
  Facilitative: promotes global mobility
  Inhibitive: inhibits equality between poor and rich countries; legitimates power of international elites and leads to continued dependence

A number of studies have shown that the level of educational and occupational aspiration of an individual student has an impact on academic achievement as measured by examination results. Samples of students’ responses from industrialized countries typically display a wide range of both educational and occupational aspirations. During the 1960s, studies conducted in some African countries noted the extremely high and narrow range of aspirations of students in African schools. In 1966, for example, David Koff found that 81 per cent of a sample of Kenyan primary school students expected to enter a secondary school at a time when only 10 per cent of the sample could expect to do so (Koff 1966). In 1971 Meg Peil wrote of the ‘insatiable demand’ for further education by secondary school students in Ghana, while Clignet and Foster reported in 1966 that 90 per cent of their secondary school sample interviewed in the Ivory Coast wished to proceed to full-time further education. In the early seventies similar findings were being reported from Latin America and Asia. Policy-makers and academics alike began to talk of inflated aspirations and expectations, and of the realism and unrealism of educated youth, especially of the unemployed educated youth. The realization of such aspirations was, of course, mediated by assessment and qualification systems, reinforcing the very high value placed on assessment by student and teacher alike. In the classroom this led to a highly developed sense of what constituted the examination syllabus and to a definition of studying and knowledge which was bounded by that which was important for assessment purposes. This point is well illustrated by Hugh Hawes in his book Curriculum and Reality in African Primary Schools (1980) in which he quotes a primary school student who, when asked ‘what subjects do you study at school?’, replied ‘Maths 1, Maths 2, Language 1 and Language 2’. These were titles, not of the subjects in the school timetable, but of the examination papers he was due to sit at the end of the year. The point of these examples is to suggest that the context in which classrooms and learning are structured will to some extent condition the type and range of responses to learning stimuli and assessment practices and the definitions of what constitutes learning and assessment in the first place. Similarly, to understand the ‘realism’ or otherwise of students’ educational and occupational expectations, we need to understand why the expectations are as they are and to understand the rationality which underpins them (Little 1980). It is not enough to dismiss the expectations as unrealistic and to try to act on the situation by persuading individual students

through career and education counselling to revise them downwards. The understanding which informs action should go beyond the individual to examine, among other things, patterns of national income distribution, the life chances of those without high levels of examination success, channels of social mobility and historical relations between education, assessment and economic and social success. Is the framing idea useful only within the facilitative and inhibitive columns of Table 2 or can it be applied across columns? Some might wish to stress the overriding facilitative role of assessment at each and every level of analysis, while others would stress the inhibitive. But there is in my view no logical necessity for facilitative propositions at one level to imply facilitative propositions at another. An analysis pitched at the individual level which emphasizes the positive role of assessment on children’s academic performance and an increase in their job chances does not imply, for example, that lower social groups will achieve social mobility through a mean increase in performance. If jobs remain in short supply then it is likely that the qualifications for jobs will rise, the examination stakes increase and lower social groups will fail to improve their position en masse (although a few individuals may do so). Similarly, propositions pitched at the national level which emphasize the facilitative role of examinations for the occupational selection system do not imply that within that same system examinations are promoting the development of competence at the individual level. At the international level propositions about the delinking of national assessment systems from international ones do not imply that the educational professionals responsible for developing the technical aspects of the indigenous and independent system will delink their professional alliances from the international educational community — a point that became clear in our analysis of the assessment reforms in Sri Lanka. Thus, what is 'good' for society may not be 'good' for the individual, and vice versa; and what is 'good' for international society may not be 'good' for national society, and vice versa. So far, then, I have extended the framework for analysis to include relations between nations as a level of analysis which can in turn frame or set the context for developments at a lower level. I have also referred to national political concerns with social unrest and economic liberalization which can create the conditions for change in assessment practices with their associated implications for individual life chances. The lower levels of the framework showed how the individual

student might be a member of a student body with shared educational and occupational aspirations — a situation explicable in terms of historical precedent and the structure of the economy, and leading, arguably, to a classroom environment somewhat different from that which we find in schools in Britain. Although this framework extends the very simple one presented at the beginning of this chapter, it is still far from complete. The dimension of time is obviously important and is one which, apart from its relevance to strictly historical accounts of developments in assessment systems, has tended to be ignored by research in this area. Jasbir Sarjit Singh's account in Chapter 13 of this book of the Malaysian findings from the SLOG project — a six-country study of student motivation — demonstrates a concern with the implications of types of learning orientation developed in school for motivation over time, in this case for motivation in the work place (SLOG 1987). In conclusion, what does this framework imply for comparative and international studies of assessment? There are at least five implications. The first is to stress the importance of a multi-levelled context and to avoid the temptation to lift aspects of assessment out of context too quickly in order to effect speedy comparisons between countries, if the comparison is intended to do more than simply provide interesting information about how things work in other countries. The second is to recognize that good, contextual, single-country studies provide the necessary foundation for good, comparative country studies. The third is the distinction between an international and a comparative study of assessment. An international study need not necessarily be comparative. Studies which examine relations between one country and another are inter-national but involve no comparison in the strict sense until a second set of studies which also examine relations between countries is available. The fourth is to distinguish clearly the level of analysis and, thus, for whom or what assessment is playing out its role, and what precisely assessment is thought to be promoting or preventing, crippling or creating, facilitating or inhibiting. The fifth is to acknowledge that a particular form of assessment can be broadly facilitative at one level of analysis but broadly inhibitive at another. Finally, I hope that my use of research, examples and illustrations mainly from developing countries will have underlined the latter’s significance and the importance to our mutual development of a rather more integrated international

and comparative literature on assessment than we have at present.

References

Cleverley, J. (1985) The Schooling of China, New South Wales: George Allen & Unwin.
Clignet, R. and Foster, P. (1966) The Fortunate Few: A Study of Secondary Schools and Students in the Ivory Coast, Northwestern University Press.
Curriculum Development Centre (1975) Bulletin of the Curriculum Development Centre, 1, 1, April.
Hawes, H. (1980) Curriculum and Reality in African Primary Schools, London: Longman.
Jayasuriya, J. E. (1979) Educational Policies and Progress During British Rule in Ceylon (Sri Lanka) 1796-1948, Sri Lanka: Associated Educational Publishers.
Koff, D. (1966) ‘Education and employment: Perspectives of Kenya primary pupils’, in J. R. Sheffield (ed.) Education, Employment and Rural Development, Nairobi: East African.
Lewin, K. M. and Little, A. W. (1984) ‘Examination reform and educational change in Sri Lanka 1972-1982: Modernisation or dependent underdevelopment?’ in K. Watson (ed.) Dependence and Interdependence in Education: International Perspectives, Beckenham, Kent: Croom Helm.
Little, A. W. (1980) ‘The logic of students’ employment expectations’, Institute of Development Studies Bulletin, II, 2: 20-7.
Peil, M. (1971) ‘Education as an influence on aspirations and expectations’, paper presented at Conference on Urban Unemployment in Africa, Institute of Development Studies, Sussex.
SLOG (1987) ‘Why do students learn? A six country study of student motivation’, Institute of Development Studies, Research Report, Rr 17, Sussex.


2 Trends in the assessment of teaching and learning: educational and methodological perspectives
Carol Anne Dwyer

This chapter draws on trends in the United States and elsewhere. I will also be mixing technical (primarily measurement) material with educational and social policy material, drawing on both technical and policy-related literature in the field of assessment, to argue that the United States is experiencing an increased use of assessment due primarily to social and political forces, including extensive demographic changes. At the same time it is also experiencing changes in the character of the assessments that are being conducted with both teachers and learners. These changes in the character of assessments are the result of both technical and social forces. I believe that both of these kinds of change represent long-term trends generalizable to other nations. Issues related to increased use of assessment may not so generalize.

Trends in the technical and scientific character of assessments

A paradigm shift

A fundamental paradigm shift has occurred in the social sciences in the United States and elsewhere. This shift could be characterized in a number of ways, for example as a shift from emphasis on prediction and control to an emphasis on meaning and understanding. In assessment, specifically, we are seeing a shift from mathematical and statistical models to educational and psychological models to guide the formation of assessments and the interpretation of their results. Logical positivism or ‘dustbowl empiricism’ is no longer seen as an appropriate paradigm for assessment — the days when assessment people could say ‘if it predicts it must be okay’ are gone for ever. We are experiencing a renewed sensitivity to the context of an assessment, and to the importance of

understanding and theory in guiding our use of assessment. With this paradigm shift has come an increased influence of cognitive approaches to psychology in place of behavioural approaches, which has had ramifications for education and educational assessment as well. We must now commit ourselves to understanding what it is we are trying to assess before we design our assessments. In recent years this shift in perspective has been exemplified dramatically by changing views about the place of validity. Replacing the old model of validity as a concept neatly divided into three parts (content, predictive and construct), we are now coming to view validity as an attempt to construct meaning from data, and from a network of inferences. We now begin with the identification of a theoretically interpretable target of some important and complex behaviour. We then attempt to decide how inferences should be made about that complex target, using pieces of more readily available information. In this model the primacy of the target behaviour itself is evident, and thus the importance accorded to validation, particularly in its more modern formulations. More narrowly empirical concerns such as reliability, while still important to interpretations of assessment data, can no longer be seen as the end objective of measurement. With this paradigm shift there is an increased sensitivity to the limitations of multiple-choice testing and an increased tolerance for the complexities and ambiguities that necessarily accompany more direct and contextually meaningful forms of assessment. Interest (and participation) in this paradigm shift is not restricted to psychologists and psychometricians. Many sectors of the public and many educational professionals recognize this shift, at least intuitively, even if they are not able to explicate it in technical terms — or have no interest in doing so. Even in the United States, that bastion of multiple-choice assessment, there is growing disaffection with traditional forms of assessment. With this also comes an increased interest in complex ways of reporting assessment results, including verbal as well as numerical schemes. It still remains to be seen, however, whether the public and educational professionals will experience an increase in tolerance for ambiguity in the interpretation of assessment results.


Trends in the assessment of teaching and learning Technology and assessment The role of technology in assessment trends should not be underestimated. Technology has a strong tendency to drive the shape of large-scale assessments and the policies governing their use. The pervasiveness of multiple-choice tests in the United States was a direct result of the availability and cost effectiveness of optical scanners and their fit within our educational system. Technology’s power to influence extends beyond the obvious practical aspects — it also affects how people think about assessment. The explosion in personal computer power and availability appears today to be changing the economics of large-scale assessments and therefore the form of these assessments, but it also changes people's expectations about what can and should be assessed. Simply put, more complex stimuli and more complex responses can be captured and analysed with the help of the computer. Today the problems with computer use pertain more to the personal motivation and skills of those who use them than to the availability of the machines themselves. Personal computers are widely available in schools in the United States today and this availability continues to increase dramatically. The manufacturers of computers remain eager to have their products used in schools. It seems, however, that the world is still divided into two kinds of people — computer users and computer avoiders. At this time, many of our teachers are still in the latter category, and the use of computers for instruction and for assessment linked to instruction will be limited by this during the tenure of these teachers. New forms of assessment Our options for changing assessment are expanding rapidly now because of advances in technology, as well as our scientific and philosophical understanding of human cognition. The current expansion of personal computer use will remove many practical limitations to what we can test and how we can test it, but will also bring with it substantial challenges to our intellect and judgement. When practical limitations are suddenly removed, the intellectual and value shortcomings of our testing activities will be rudely exposed. We need to step back from the inviting possibilities presented by our new technologies and consider at a very fundamental level what our assessment goals really are. Given a wider range of practical 25

Changing Educational Assessment possibilities we are clearly headed towards two interrelated goals for assessment: (1) assessing more complex samples of behaviour and (2) making more realistic approximations to that actual behaviour during the testing process. This latter goal suggests the possibility of assessments unobtrusively embedded in instruction. It is interesting to note that the same possibility seems also to be hinted at in the recent work in England of the Task Group on Education Assessment and Testing (TGAT 1983), the Report on National Curriculum. There is also a related trend in test development towards direct assessment of skills — that is, closer approximations in assessment to the skills we are actually attempting to measure (in validity terms, approximating the ultimate rather than intermediate criteria). In the United States, as in Great Britain, there are already extensive applications of direct assessment in the area of writing. American psychometricians, of course, have long resisted the use of essays to assess other subject matter for large-scale applications, a reluctance which is not traditionally shared by their British counterparts. Interestingly, the new developments I have been describing may bring the theoretical and operational goals of our two nations’ assessment practices closer together. Gains in efficiency and in the meaningfulness of assessment results are currently being made possible through the use of technology and psychometric research. It seems certain that the future holds more direct assessment of various kinds of performance including such covert performances as reasoning, with little sacrifice in our desire to have fair, reliable and cost-effective assessments. We have already had some promising experiences with direct assessment in structuring judgements to yield logical and defensible assessment of complex behaviour such as artistic performances. Within the next decade we should be able to do this much more efficiently and to expand the underlying concepts to other teaching and learning situations. Assessment and demographic trends Certain demographic trends, which are not restricted solely to the United States, are also changing the assessment picture. It seems virtually certain that the United States is soon to face serious teacher shortages, at least in some areas and in some disciplines. Many of our experienced teachers will be retiring over the next ten years and the age cohort that will be graduating from college during these years and becoming our beginning teachers is a much smaller cohort than the cohort of 26

Trends in the assessment of teaching and learning pupils who will need to be taught by them. These developments have potentially very significant effects on the assessment of both teachers and students. Paradoxically, our experience with teacher shortages leads us to wish to raise standards. In such an environment, we are also inclined to call for an elevation in the status of teachers, including a rise in their salary levels. We are also facing changes in immigration patterns, particularly in the increased immigration of Asian minorities. These groups represent a number of educational challenges for us. In the short term, the challenge is one of assimilating them into our existing school structures while making provisions for their transitional language difficulties. In the longer term, we expect many of these groups to follow in their cultural traditions and excel in educational enterprises. This suggests that additional attention needs to be given to the interrelationships of all of America’s minority and majority groups, particularly in the face of the persistent and widening socioeconomic gap between our Black and Hispanic minority groups and the majority population. This gap impacts on schools and their assessment policies directly, but perhaps more importantly fuels public demands for school accountability. A more constructive aspect of these changes, however, is the promise of continued interest in issues of test fairness which remain a critically important social and technical issue for us. Trends in the control of assessment Assessing individual students In the United States, control over the assessment of individual students has typically been exercised at the state and local levels. Functionally, it is the state which sets the great preponderance of assessment policy in the United States. This tradition was strengthened by the National Commission on Excellence in Education’s 1983 publication A Nation at Risk and other reports critical of American education. These publications set a national mood for educational reform in which assessment plays a central role. Forty-four of the fifty states now have state-mandated elementary and secondary school tests for students, which are typically linked to state curriculum goals. Forty-five of the fifty states now have new coursework requirements. The great preponderance of both testing and coursework requirements focus on reading, language and mathematics. A small minority also include 27


writing, reasoning and citizenship. Only six states now require assessment in science and social studies. Many groups and individuals are highly critical of such state assessments. Among their concerns are the generally perceived low level of skills required by such assessments. Local school agencies and teachers both resist state assessments on educational grounds. In addition to the low level of skills required, they point to a tendency for such assessments to contribute to fragmentation and oversimplification of curricula. Partly as a result of these concerns, state assessments are moving from a focus on basic skills assessments to assessments of higher order skills. Change in this area is slow, however, as both state and local educational systems are inherently conservative. With respect to the assessment of individual students, there continues to be very strong opposition to national assessment. The support for such assessment stems from very abstract concerns not necessarily appealing to a broad spectrum of groups or individuals, and the practical problems associated with such an assessment are very significant.

Assessing groups of students

The assessment of groups of students is done both nationally and at state level. State level assessments of groups of students are carried out simply by aggregating individual-level assessments of students. National assessment, however, is carried out on a sampling basis and there is no attempt to report the achievement of individuals, students or schools. There is currently some interest in reporting national assessment results by state, but this is a relatively new development. National assessment in the United States is intended to be truly national, but there seems to be no explicit or implicit link between the assessment and the notion of a national curriculum. In fact, the national assessment still generates widespread concerns over federal government control of the curriculum, which, as I have mentioned, is largely a state rather than a local prerogative.

Assessment and accountability

Public pressure for school accountability remains high and takes a variety of forms, but assessment is a favourite remedy.

Trends in the assessment of teaching and learning In a typical cycle, public concern is expressed to state legislatures who then make educational policy through enacting specific legislation. Seldom does such legislation result from the input of professional educators themselves. This process is, of course, not confined to education. The result is that testing in the United States is experiencing an unprecedented boom period which may perhaps be linked to our conservative administration and its interest in educational and economic accountability, particularly at the federal level. The strong interest in school accountability could also be simply a manifestation of a more widespread social concern. Assessment of teachers A more recent assessment development concerns the assessment of teachers themselves. Beginning with the calls for educational reform in the reports of 1983, states began to show increasing interest in the qualifications of teachers. This interest took the form of creating new standards for beginning teachers, including standards for assessment. Forty-eight out of fifty states now have such assessment standards in place. An examination developed by the Educational Testing Service — the National Teachers Examination (NTE) — is the most widely used form of assessment. Utilizing multiple-choice questions, it covers a limited number of basic skills, subjectmatter content, and some forms of pedagogical skill. The majority of states’ current interest in teacher assessment is restricted to the licensing of beginning teachers, although some have shown an interest in the regulation of experienced teachers. A few states have tried to increase their control through periodic re-licensing and through creation of schemes of differentiated teacher status and pay based on assessment. In the remainder of states the regulation of experienced teachers is viewed as a responsibility of local school districts who must make hiring and retention decisions about these teachers, as well as later tenure decisions. There is no corresponding national movement to regulate beginning teachers, perhaps because of the strong tradition of states holding primary responsibility for licensing to protect the public from harm. It is very interesting, however, that national control, although non-governmental in nature, is being asserted for the ‘certification’ of highly experienced teachers. This type of certification, analogous to medical board specialities, is envisioned as a voluntary assessment for the purpose of 29

Changing Educational Assessment demonstrating one’s expert mastery of the subject matter and pedagogical skills related to that subject matter. Such moves may be seen as part of a widespread desire on the part of both the public and teachers alike to ‘reprofessionalize’ the career of teaching. Also contributing to this move towards reprofessionalizing teaching are the two teacher organizations — the National Educational Association and the American Federation of Teachers — which have both professional and economic concerns for their members. There is currently a serious move to restructure teacher assessment in the United Sates. Interestingly, this movement has been led by educators themselves, and has resulted in the formation of the National Board for Professional Teaching Standards. This group, which has been in existence for just a year, is committed to developing model assessments for experienced teachers and is looking to the educational research community to provide innovative prototypes for such assessments. The Educational Testing Service (ETS) has also committed itself to replacing the National Teacher Examinations by 1992 with a system of assessment for beginning teachers that does not rely on multiple-choice testing. In the United States, as well as in Great Britain, it seems that there is a willingness to consider some very fundamental changes in assessment, including its relationship with instruction. As the potential for new forms of assessment grows, our expectations for it will continue to increase. Among these expectations will almost certainly be a desire to see instruction and assessment more closely linked, which I believe would be to the ultimate benefit of both. Judgements made by teachers about their students are assessment. They are data that need to be captured and evaluated against such important criteria as fairness, accuracy and reliability. Combining the values attached to such judgements (which tend to be high in validity because they are close in form to the criterion of student performance) with other forms of assessment, which might have a weaker relationship to the behaviour of interest but greater reliability and efficiency, may result in a more satisfactory assessment from both educational and technical points of view. The National Curriculum Assessment proposals of the Task Group on Assessment and Testing contain some interesting elements of this idea of combining assessment methods, but the specific form clearly needs further development before we can know if this potential can actually be realized. I see similarities as well in the British and American battles for control of curriculum. Both are engaged in a form 30

Trends in the assessment of teaching and learning of governmental struggle, with a fundamental question being argued: when decisions must be made about what and how to teach, how close to those being educated should the decisionmakers be? When such decisions are made locally, the results are likely to be closer to optimal for individuals in a local unit. When such decisions are made at a great remove, one can argue that the emphasis on standardization ultimately benefits the group as a whole. The control issue can also be characterized as a struggle between governmental forces (at whatever level) and educators. The enactment of national or state standards tends to remove a degree of control over the substance of the decisions from the ‘experts’, the professional educators, as the standards are developed and implemented. This process is a complex one that I think has not been deeply considered at a policy level. As we learn more about teaching and learning, we are increasingly aware of our own ignorance and the complexity of the problems we face. We are therefore less likely to be content to settle for simple solutions, including simple forms of assessment, and increasingly inclined to accept the complexity of the interrelationship between instruction and assessment. I believe that now may be the time when we can begin to put that knowledge into practice.


3 Reshaping the standards agenda: from an Australian’s perspective of curriculum and assessment David Cohen Lower taxes and standardization in the Republic of Ruinacea Once upon a time, a new government came to power in the Republic of Ruinacea. It was determined to eradicate societal ills, but also was committed to staying in power by whatever pragmatic policies were found effective. Their most popular policy was that of ‘keeping taxes down’, and this became the major platform of their hip pocket economy. The obvious solution to this dual challenge — eradication of ills and lowering of taxes — was standardization. New laws were legislated. • ‘No more toothaches’, the government told dentists —‘just pull the teeth out. This will be a quick and surefire remedy, at little cost to the taxpayer.’ • ‘Bridges have become too expensive’, they told the architects and engineers. ‘In future, you must use our new modular multipurpose bridges, all made to the same design but differing only in length. These identical bridges must be built over all rivers, regardless of bridge width, anticipated traffic flow, or other environmental conditions.’ • Nationalized medicine now required all doctors to prescribe Theoaminophenate B for all coughs and sore throats, regardless of the origin of the condition. ‘We must eradicate the wasteful expenses associated with diversity’, the government spokespersons said. Protests from the professionals were ignored. ‘You are destroying the hallmarks of professional decision-making, said the professionals. ‘We must fight to retain individual diagnoses and courses of action for the unique situations confronting each of our clients.’ ‘Patently absurd nonsense’, replied the government. ‘We have a mandate from the people. You have had 32

Res heaping the standards agenda much too long to find inexpensive remedies, and the people are fed up with waiting.’ And so the new laws were passed. These new laws also called for new statistics to be gathered. These statistics showed a dramatic decrease in the frequency of toothaches, sharp increases in the rates of building new bridges, and big savings in the healthcare bill resulting from the elimination of throatrelated illness. In fact, in the latest international study, Ruinacea had risen from thirty-eighth to twelfth on the international index of professional parsimony. Politicians proudly proclaimed the remarkable improvements in ‘standards’ according to the indices of toothache, bridgebuilding and throat soreness. But it soon transpired that the products instead had become damaged, bland, homogeneous. Toothless, soremouthed people shared their sad stories. Bridge closures and even collapses dominated news bulletins. Deaths from throat cancer doubled in two years. The population was outraged as attempts by the government to obscure the facts were sabotaged by leaked information from senior civil servants, and the re-election of the government was predictably averted. Intrusion of politics into decision-making This scenario depicts the loss of CQntrol of the agendas by the professionals of Ruinacea as they capitulated to naked political power. But, alas, the similarities with the realities in so-called developed nations are all too apparent. Extravagantly unrealistic promises are presented as seductive lures for unwitting or unwary members of the public. Witness a recent (1988) (electorally unsuccessful!) slogan in Australia: T A X E S DOWN. STANDARDS UP. Life will be better’. What a contradiction of terms! With the maintenance and upgrading of standards of health, education and welfare requiring government financial support, the lowering of taxes can hardly raise standards! Yet with education now a potential political vote-winner, such claims are readily transferred by politicians to debates about educational assessments and standards. The blatantly political agendas being imposed upon schools by governments in the USA, UK and Australia have already eroded soundly based educational criteria. The educational ideas of teachers and professional educators are being replaced by the expedient and narrowly conceived tools developed hurriedly and imperfectly by a small group of psychometricians prepared to prostitute their educational ideals in response 33

Changing Educational Assessment to the chase for disappearing education funding and political expediency. I shall return to this theme later. In his contribution to this book, Canadian Professor Les McLean starts with the thesis that cross-national comparisons have no more inherent limitations than other studies. It is my thesis that these other studies themselves in relation to assessment have very severe limitations, and that cross-national comparisons share these limitations. Second, I will argue that educators have too easily, and at their peril, capitulated to political decision-makers. There are indeed curriculum ideals; and assessment and its statistical and psychometric supports exist to be the servants of these ideals. Further, the current status of assessment is primitive and quite inadequate to reflect the ideals which we espouse concerning curriculum, even if we, like the New Zealanders Professor Tony McNaughton refers to in his chapter, pledge to conduct assessment ‘in an educated way’. Problems of definitions First, however, I wish to address a basic problem that plagues many educational debates, namely the problem of definitions. There are generally many alternative and equally acceptable definitions of educational terms. For example, in a major national survey about curriculum decision-making in Australian schools, the 600 respondents had 600 differing perceptions of the meaning of ‘curriculum’ (Cohen and Harrison 1982:195205). There are no ‘right’ or ‘wrong’ answers about such definitions: there is rather a wide range of acceptable definitions. In the case of curriculum, one dimension of the range might extend, for example, from the production of a set of documents (representing curriculum intentions) at one end of the spectrum, to the teacher-student classroom interactions in a classroom (as a curriculum in action) at the other end of the spectrum. Now, the same breadth of definitions arises in a consideration of terms like ‘assessment’ and ‘standards’, and again — even amongst acclaimed authorities — there is little consensus about meanings and definitions. Consider the definition of ‘assessment’. I have recently searched the literature and also asked a number of colleagues. As a result, I have located several alternatives but, I think, incomplete or confusing definitions of ‘assessment’. Often the term assessment is not clearly delineated from testing, measurement and evaluation. Choppin (1985) says that what assessment, measurement and 34

Reshaping the standards agenda evaluation have in common is ‘testing’, although many would consider that evaluation does not need to include testing. Two other recent definitions of assessment are as follows: A general term encompassing all methods of judging performance, rate of progress, or difficulties experienced by a pupil. (Primary School Pupil Assessment Project 1988) The term assessment should be reserved for application to people. It covers activities included in grading (formal and non-formal), examining, certifying, and so on. Student achievement on a particular course may be assessed. An applicant’s aptitude for a particular job may be assessed. Note that the large monitoring programme within the United States, the National Assessment of Educational Progress (NAEP), is not an assessment within this definition. Although individual students are given tests, no interest is attached to their individual results. Data are aggregates before analysis, interpretation, and reporting (e.g. ‘performance levels of 17-year-olds are declining’, or ‘The gap in standards between the South Eastern states and the Far West is decreasing’). NAEP is an example of evaluation rather than assessment. (Choppin 1985) These quotations do little to relieve confusion about the definition of assessment. This is particularly worrying in view of the fact that assessment is a central concern of both the educational and political agendas at present. Even among experts there may exist widely differing perceptions of the meaning of assessment, as this book illustrates. Some limitations of assessment Before turning to address the ‘standards’ agenda, I want to refer to four aspects of assessment which I believe seriously limit the potential contribution of assessment to this debate about standards. Large-scale assessment versus school-level change The introduction of large-scale (i.e. statewide and national) assessment schemes constitutes the infiltration of ‘scientific 35

Changing Educational Assessment management’ and the ‘cult of efficiency’ into education. These mass production trends deny or even contradict recent evidence from several recent reports of successful US corporations which demonstrate that the most effective (even most profitable) enterprises are those imbued with strong interpersonal components (c.f. Peters and Waterman 1982; Iacocca 1984). The trends towards large-scale schemes likewise ignore educational research evidence of two major varieties: (1) the individual school is the locus of change (as both Tyler [1987] and Goodlad [1975:255] have shown); and small is generally beautiful in education; and (2) involvement of teachers in decision-making breeds commitment to implementing those decisions (c.f. Spring 1985). Lack of evidence of impact of assessment on learning ‘The policy of mandated testing as a means of school improvement remains untested’. (Stake, BERA oral presentation 2.9.88) There is no evidence that the introduction of assessment schemes has any positive effects upon educational quality or standards, nor even that assessment is an effective or constructive instrument of change in education. One wonders upon what evidence the Australian ex-Shadow Minister on Education (Mr J. Carlton) based his publicly and confidently expressed view that ‘Freedom from assessment might feel good at first, but it leads to deterioration in performance and personal insecurity over time.’ (Carlton 1988:5). In fact, interesting recent speculation is that assessment may actually have an adverse effect on learning. This is based upon research in England concerning the fourth and fifth years of secondary schooling, which showed that many students were too busy writing during lessons to learn, and they intended to learn later, but that their pre-exam homework of ‘learning the notes’ often resulted in short-term memorization without understanding (Elliott, 1985:108-9). In his analysis of this research, Elliott commented ‘If standards refer not simply to the quantity of content learned, but also the quality of thinking involved in that learning, then the public examination system, rather than the teaching profession, could be responsible for the lowering of standards.’ (Elliott, 1985:111) (my italics).


Assessment lacks comprehensiveness

The prime criterion of sound assessment remains 'validity'. Even with the application of every known technical and psychometric advance, and with every trick available internationally for the improvement of assessment, validity is the inescapably crucial criterion. We further know that there are no strategies known or invented which can assess with high levels of validity, in practicable situations and on a large scale, the most valued of human capacities and abilities which include critical and creative thinking, and such attributes as feelings, values, attitudes, and higher-order cognitive, affective and psychomotor abilities. I would go so far as to argue that the portrayal of the overall educational progress of learners by current assessment strategies is totally inadequate, reflecting at best about 10 per cent of learner development. Indeed Komoski has argued elsewhere that only 5 per cent of the learning objectives in the USA are assessed by the currently-mandated state tests in that nation! (reported by Stake 1988).

Subject-dominated curriculum

The commonest framework for assessment of standards is embedded within the subject-dominated straitjacket mistakenly labelled by many education bureaucrats as 'the curriculum'. Thus, standards are purportedly reflected by the accumulated and averaged scores from tests derived from a set of disintegrated subject compartments, including mathematics and/or reading, sometimes supplemented by such subject areas as science, foreign languages and social sciences. Consequently, in assessments which purport to reflect educational standards, the overall curriculum as experienced by the learners is generally not even considered. Portrayal of progress by current assessment strategies is severely constrained by subject and domain boundaries. Assessment corrupts curriculum Thus, in summary, the portrayal of the educational progress of learners by current assessment strategies and the improvement of learning is severely limited by: limited school-level and teacher involvement when large-scale assessments are conducted; the lack of demonstrable positive impacts of 37

Changing Educational Assessment assessment upon learning; the lack of comprehensiveness of current assessment strategies; the continued existence of subject-dominated curricula with their boundaries/domains; the lack of contextual reference, and the lack of negotiation by adults (as determiners) with the learner/consumers. These limitations clearly imply that, far from providing an accurate portrayal of educational standards, currently used assessment strategies constitute a corruption of education, curriculum and of student learning processes. Curriculum ideals for excellence Recent personal observation in universities and schools in the USA, Canada, England, Sweden and Israel, and a lifetime personal concern for the quality of education in schools, have confirmed a number of curriculum principles which underpin excellence in schooling. These same criteria are also vital in defining the nature of educational standards. I would designate the following as my three highest-rating curriculum criteria to achieve excellence in schooling. (1) A broadly-based balanced curriculum which promotes diversified objectives and learning experiences. Thus the school develops in each student academic, intellectual, social, emotional, physical and cultural talents, and still acknowledges the basic importance of language and numeracy. I noted with interest that, in England, Kenneth Baker agreed with this, as the school curriculum satisfies the Education Reform Act 1988 if it is a balanced and broadly based curriculum which (a) promotes the spiritual, moral, cultural, mental and physical development of pupils at the school and of society; and (b) prepares such pupils for the opportunities, responsibilities and experiences of adult life (ERA 1988:1). (2) Professional judgements and decisions about curriculum which reflect the recognition of necessary differences in curriculum between schools, classroom and individual students, and these differences are generally positive and represent the strength for continuous and meaningful school renewal. Thus the curriculum, resource materials and evaluation strategies will be tailored to the particular context in which they are used. For me, the ultimate ideal is the individualized (or, better, personalized) curriculum which incorporates largegroup, small-group and individual activities, negotiated between teacher and learner and resulting in a personalized curriculum contract for each student. National curriculum is thus almost a contradiction of terms. 38

(3) A curriculum which contributes to making the schools attractive, educationally stimulating, flexible and enjoyable places for students to be. This will be reflected in records which show high levels of school attendance and increased school retention rates. I reiterate that it is these above three criteria which are the bases of 'standards' against which I believe educational progress can be assessed. How then do major national and international studies 'measure up' to these ideals?

Current large-scale studies of 'standards'

Judgements about educational standards often purport to be global statements about the levels of educational achievement across a school, state or nation. Large-scale studies involving such broad judgements have several deficiencies which make the related assessments unreliable and lacking in validity. These large-scale studies mask the huge variations between individual students as well as between classrooms and schools, and report their results so as to treat as a single entity ('standards') what is a very wide spectrum of educational progress. On the other hand, individual results are increasingly being portrayed as 'profiles', i.e. tracing the shape of student learning over a wide variety of performances in a wide variety of fields. At least three countries have seemingly ignored these and other limitations of large-scale studies. These countries have presented data as purported evidence in attempts described as monitoring of educational standards nationally. These countries are: USA with its National Assessment of Educational Progress (NAEP); England and Wales with their Assessment of Performance Unit (APU), and Australia with its Australian Studies of School Performance (ASSP). In addition, the International Study of Educational Achievement (IEA) has sought to provide cross-national comparisons, and now has increased its representation from twenty to nearly forty nations in some of its activities. Some features of these four studies are summarized below.

National Assessment of Educational Progress (NAEP)

USA, 1969+. (Adapted from Livingstone 1985:4789.)

Stage 1: Educational objectives designed by committees of laypersons/educators in ten subject areas: reading, writing, literature, mathematics, science, social sciences, music, art, citizenship, career and occupational development.
Stage 2: Achievement on knowledge, skills and other outcomes assessed on a variety of test instruments (e.g. skill in playing a musical instrument).
Stage 3: National samples totalling 30,000 students, at three age levels (9-, 13-, 17-year-olds), tested in two or three subject areas each year.
Stage 4: Reaction of panels (laypersons, administrators and educators) sought in analysis and interpretation of meanings of statewide public information.
Stage 5: Half of items made public and used as 'benchmark' for later comparisons. Half of items kept confidential and readministered in follow-up studies four to five years later.

Assessment of Performance Unit (APU)

United Kingdom, 1974+. (Adapted from Livingstone 1985:4790.)

Assessment strategies were eventually devised in three broad curriculum areas: linguistic, mathematical and scientific. Personal and social, aesthetic and physical assessments were originally envisaged but could not be operationalized. These strategies went beyond pencil-and-paper tests to include interviews, observations, and assessment of practical performances. National samples of 12,000 students at the ages of 11, 13 and 15 years were tested in one area each year.

Australian Studies of Student Performance (ASSP)

Australia, 1976. (Power 1982.)

Sponsored by the Australian Education Council. Limited to the testing of 'basic skills' (reading, writing and numeration) using standardized (half-hour, pencil-and-paper) tests developed by the Australian Council for Educational Research. 1,000 students in each state and 500 in each territory were sampled, half each from groups of 10- and 14-year-olds.

The difficulties intrinsic to large-scale assessment programmes are further exemplified by changes in the nature of the school population and curriculum emphases in recent years in Australia. School retention rates to years 11 and 12 (the final two years of secondary education) continue to rise dramatically. Between 1982 and 1984, the percentage of students starting secondary school four to six years earlier enrolled in year 12 rose from 36 per cent to 45 per cent, while the corresponding percentages for year 11 rose from 57 per cent to 66 per cent (Brewster et al. 1985:21). The Australian study was curtailed in 1980 following coordinated opposition. (Interestingly, a Victorian Ministerial Committee established in 1977 had likewise been jettisoned in 1979 after opposition from teacher organisations and alleged lack of support by the Education Department of Victoria for its 'Essential Skills Assessment Project', ESAP.)

International Association for the Evaluation of Educational Achievement (IEA)

Eighteen nations, 1964+. (Adapted from Livingstone 1985:4799, IEA 1970.)

Tests developed on an international basis in seven subject areas: mathematics, science, reading comprehension, literature, French as a foreign language, English as a foreign language, civic education. Also used were attitudes scales and questionnaires. The above descriptions provide a brief summary of some key features of the assessment and sampling procedures used in NAEP, APU, ASSP and IEA. Each of the above projects has been constrained educationally by the inbuilt problems arising from their large-scale administration. With notable exceptions, such large-scale assesssment programmes dictate the domination of items requiring written responses to printed stimulus materials, and, further, they limit evaluation mostly to elements of the curriculum which are common throughout the nations concerned. These constraints have largely mitigated against efforts to broaden the bases of evaluation and have emphasized readily-measurable, atomistic curriculum elements. The resulting distortions have been the cause of considerable concern to a specially appointed USA committee: Even the best of current efforts within NAEP only provide a view of children’s command of basic academic knowledge and skills in mathematics, reading, and writing . . . they represent only a portion of the goals of elementary and secondary schooling . . . 41

Changing Educational Assessment The Academy Committee is concerned that the narrowness of NAEP may have a distorted impact on our schools . . . At root here is a fundamental dilemma. Those personal qualities that we hold dear — resilience and courage in the face of stress, a sense of craft in our work, a commitment to justice and caring in our social relationships, a dedication to advancing the public good in our communal life — are exceedingly difficult to assess. And so, unfortunately, we are apt to measure what we can, and eventually come to value what is measured over what is left unmeasured . . . The language of academic achievement tests has become the primary rhetoric of schooling . . . We repeat that what is assessed tends to become what the community values. Thus, it seems critical that the assessment direct attention towards the fullness of the human experience. (Report of the Study Group, 1987) With such constraints, how can any politician or government honestly claim that national assessment will validly portray educational standards? This is especially of concern when placed in the context that underlying the standards debate are the designated ideals of educational excellence and the curriculum principles embraced by those ideals. As discussed above, these ideals and principles dictate: a broadly diversified curriculum; positive differences between curricula, and an attractive and stimulating curriculum. Clearly, all is not well at present as far as strategies used in large-scale assessments as the bases for educational standards are concerned. But that is not all. Reshaping the agenda: the nature of standards, and processes In the reshaping of the standards agenda, there are two important aspects of that agenda which, in my view, are equally important to one another. First, there is the nature of the standards themselves. What do we mean by standards? (This implies the need for a redefinition both at the professional and public levels.) This includes the question of: what are the criteria for standards?; what are the most effective ways of defining standards?, and how can we best communicate about standards —especially amongst educators, and with the public. Second, there are the processes by which standards are developed, negotiated, understood. Who should be involved in deciding upon the questions concerning the nature of 42

Reshaping the standards agenda standards? I shall now consider these two aspects (nature and processes) in turn. The nature of standards Changes over time (including the instability of criteria and assessment strategies) Elsewhere, I have described the changing perceptions historically of educational ‘standards’ (Cohen 1989). To take the example of ‘literacy’, prior to 1900 the signing of one’s name was regarded as evidence of being literate. With the relatively recent arrival of the ‘whole language’ approach to education, there has been a further re-evaluation of literacy, and there is certainly now a much wider interpretation of assesssment of language abilities than simply to assess performance in areas such as grammar and spelling. Accompanying these changes in curriculum emphases, one would expect to find that the criterion of achievement in language had shifted substantially from the traditional areas, such as ‘the number of words correctly spelled in a twenty-word test’, to a set of criteria which might include ‘the diversity/variety of different words used in telling a tale or writing a one-page story’. Such a shift would be totally reasonable and would better reflect recent trends in whole language teaching, thereby intrinsically improving the level of validity of assessment. Of course this rather complicates assessment — a price we must pay for progress. At the same time, it serves to remind us, as Weitz (1961) points out, ‘In many instances we use criteria which are expedient. In these cases, certain behavioural measures are found readily available (or more available than others we would prefer to have), so we use them.’ (Weitz 1961:228). Too often, the readily available criteria and/or instruments for assessment dictate the strategies used in education, both by teachers and indeed by researchers. Put another way, ‘give a child a hammer, and suddenly everything will need hammering’. It is crucial to realize that the selection of different criteria can have a highly significant impact on the outcomes of evaluation. Equally, with changes in criteria, even if at some stage in history there are convenient so-called benchmarks, educational advances will soon invalidate the use of these benchmarks for future assessment. The very concept of timefixed benchmarks needs reconsideration. 43

Changing Educational Assessment Changes in curriculum emphases Changing content, changing teaching methods, changing emphases in curriculum objectives . . . these are but a few factors which make it imperative to change assessment strategies or even criteria. Numerous reports on secondary education have referred to the problems of disintegrating the school curriculum into subject areas. Following a national survey of curricula in Australian secondary schools, it was reported that: a major constraint upon any implementation of SBCD was the impact of compartmentalisation within the school. This is the product of school organisation as subject departments, in which the curriculum is disintegrated into discrete and non-communicating subject disciplines. Each of these subject areas created their own jealously-guarded territory and resources for which they competed aggressively. This competition related to resources, including time, staff, space, and funds. (Cohen and Harrison 1982:265-6) In the reshaping of the standards agenda, one might hope to introduce common cross-curriculum integrating themes and processes such as investigating, communicating, expressing, critical thinking, creative thinking and negotiation. In this respect, it was interesting to hear the British government's Secretary of State for Education, Kenneth Baker, agreeing with the late Lawrence Stenhouse, and confessing ‘a concern that his Bill would do little to cater for creative learning’, and further to note with Elliott (1983:13) that ‘GERBIL assumes that in setting targets the -task groups will be establishing educational standards’. Contrast this with Stenhouse’s view that the quality of educational achievements is manifested in unpredictable and diverse performances. Contexts affect standards Recent issues of Phi Delta Kappa (February 1987, January 1988, April 1988, June 1988) have included articles which debate the presentation of findings and scores from the NAEP so-called ‘Reading Proficiency Scale’. The article by McLean and Goldstein was described as ‘a vitriolic attack on the design and interpretation of the NAEP’ (Stenner et aL 1988:765). At the root of the debate lie questions about whether reading 44

Reshaping the standards agenda comprehension is unidimensional and therefore reading proficiency can be represented as a single factor, as NAEP has claimed; or whether ‘people tend to exhibit different performances in different contexts, since interest, motivation, intention, and the like all play a role’ (McLean and Goldstein 1988:371-2). Commonsense, coupled with personal experience, confirms for me that undoubtedly context affects performance, and therefore assessments must take contexts into account. This is one of the biggest concerns with large-scale assessments. For example, is it not the case that IEA violates the basic tenets of sound and valid assessments, in that it ignores the contextual factors which are culturally inbuilt into educational systems? Even taking two so-called developed nations like the USA and Australia, how can comparisons validly be made between performances of secondary school students between the USA, which manages to retain more than 90 per cent of its students to complete secondary schooling (year 12), and Australia, where, for the first time in 1986, schools managed to retain more than 50 per cent of entering students to complete year 12? And what of the tragically high price paid by Japanese students? While maintaining their high IEA status, they also suffer high suicide levels in the same age group. Also, what about the state or national data incorporating assessments which purport to compare the relative achievements between different schools, school districts or regions (including, in Australia, divisive arguments about relative assessments from government and non-government schools)? Such data usually ignore the crucial contextual data (e.g. socio-economic levels) about the populations which they are being used to depict. The 1988 report of the UK government’s Task Group on Assesssment and Testing for the National Curriculum was cognisant of such contextual factors; but it wanted rather dishonestly to convey to the public that they should have ‘confidence in the measurement precision’. Tts solution is that the aggregate school results should be published along with a “general report for the area . . . to indicate the nature of socioeconomic and other influences which are known to affect schools”.’ (Goldstein and Cuttance 1988:198). There certainly are lies, damn lies and statistics! Two contextual explanations for apparent declines in results from assessment are increased retention rates in schools, and the fact that schools have increasingly catered for a more diverse ethnic population — surely factors which should outweigh public displeasure at changes in results. 45

Changing Educational Assessment Perceptions about standards can be improved by better communication between educators and the public One of the greatest problems about the standards agenda results from the paucity of definition within educational groups. It is important that shared perceptions of definitions form the bases of discussions. Then, one can ask, with whom do we communicate? Perhaps we are communicating moreand-more about less-and-less to fewer and fewer! Where have been our voices of protest? On this question Jacoby, in his 1987 book entitled The Last Intellectuals, had this to say: Younger intellectuals no longer need or want a larger public — they are almost exclusively professors — campuses are their homes, colleagues their audience, monographs and specialist journals their media. Unlike past intellectuals they situate themselves within fields and disciplines for good reasons — their jobs, advancement of salary depend on the evaluation of specialists and this dependence affects the issues . . . and the language employed. Independent intellectuals who wrote for the educated reader are dying out, to be sure often they wrote for small periodicals. Academics today write the professional journals that create societies, gathering in annual conferences to compare notes; they constitute their own universe. A famous sociologist means famous to other sociologists and not to anyone else. Recent research concerning how the press treats education (Cohen 1987) highlighted the concentration of the press upon the polarization of issues and upon what can only be described as priority of prominence and of column space for destructive attacks upon education. These attacks featured negative headings and coverage and allegations of ‘declining standards’, with primary news definers being prominent non-educators high in authority positions but totally lacking any evidence of professional credibility in the field of education. On the other hand, these same newspapers also exhibited a willingness to carry education-related items which conveyed human interest stories about accomplishments of individual students or teachers. There is a great need for educators to provide the press with news releases which capitalize upon positive newsworthy slants. We must also explain to the public that assessment is highly complex and its language is often ambiguous. Likewise the public must come to realize that some aspects of assessment and its strategies are more widely 46

Reshaping the standards agenda accessible to professionals than to the lay public, just as other valued professions share their technical vocabularies, symbols and strategies. At the same time, educators generally, and particularly those responsible for assessment, have an obligation to make much clearer to the public generally, and to politicians in particular, that current assessment strategies are totally inadequate to reflect the breadth and depth of the education each student receives. The public should be better informed that: (1) assessment focuses upon products of learning but rarely upon processes of learning; (2) current strategies assess limited aspects of academic learning but largely neglect such vital areas as social development (how well do students get on with others?), emotional development (do students feel good about themselves?), cultural development (contact with ‘the arts’), and physical development (e.g. how fit are the students?); (3) current assessment strategies provide some information about certain types of learning progress in some subject areas, but largely neglect crucial areas such as divergent and creative thinking, as well as critical thinking and problem solving; (4) current assessment strategies can cause irreparable lifetime damage to individual students, especially to their self-images, by providing negative feedback, and (5) many (if not most) important educational outcomes are not able to be measured, weighed or counted, and in fact assessments represent probably at best 10 per cent of the total impact of learning by students. Process by which standards are shaped A major lesson which seems not to have been learnt in the USA and the UK is that changes in educational standards cannot be legislated, nor can they result from political edict. The blind faith of non-educators in mass-production strategies — prescriptive, mechanistic, teacher-proof models and materials, imposed teacher in-service agendas — are based upon the totally false premise that identical sets of solutions can be effected with the wave of a wand, and, furthermore, that they will solve a diversity of needs and problems. In fact, improvements are likely to result from the encouragement of school-level participatory activity. The empowerment of individual schools to be involved in school improvement has meant the acceptance of shared responsibility and the freedom to tailor a curriculum to meet the needs of the school and its studies. Those familiar with the findings of 47

Changing Educational Assessment the great Eight Year Study in the USA of the 1930s hardly need to be reminded of the value of this curriculum freedom. In that study, thirty secondary schools were given curriculum freedom, and they then developed thirty quite distinctive curriculum patterns. More emphases became placed upon problem solving and clear thinking, and upon future careers. Rigid subject boundaries were broken down and schools addressed the problems of how best to prepare students for life. At university level, nearly 1,500 students from these autonomous schools were compared with (otherwise similar) students educated in traditional high schools. The students from autonomous schools achieved higher grades, had higher levels of curiosity, were more actively concerned about world affairs, were more active in extra curricular activities, were more precise, systematic and objective in their thinking, showed greater powers of leadership, made better choices of careers, participated more and enjoyed more aesthetic activities, and were more socially active. Note that the ‘standards’ game which relies upon standardized or widescale product-oriented testing, effectively suppresses nearly all these indices of superiority! There have also been significant examples where federal agencies have provided small grants directly to stimulate individual schools in their innovative curriculum activities. Two such major Australian programmes have been the School Innovations Programme, and the Human Rights Curriculum Project school grants. Each of these provided seed money of around $1,000 to schools which proposed innovative curriculum activities. This represented yet again the wellknown, but often ignored finding of many educational research studies, namely that: the focus should return to the school as the unit of learning and as the unit of improvement; and that where this occurs, then the improvements in morale, job satisfaction and teacher effectiveness are likely to enhance overall educational standards. Involvement in decision-making breeds commitment to the implementation of these decisions. Curriculum development and teacher development go hand-in-hand, as the late Professor Lawrence Stenhouse established. In conclusion: a four-part action plan It is probably not an exaggeration to suggest that real education internationally is in danger of extinction and of replacement by work-related training. In an effort to reverse 48

this trend, I commend to you a four-point action plan as follows.

Mobilize professional associations

We need to call upon professional educational associations to denounce and condemn publicly recent political actions in curriculum and assessment which contained negative pressures towards homogenising education into a mediocre pulp.

High standards

We need to remind the world that, at the present time, more people are being educated for more years and to higher levels than in any previous era in history, but that, despite the progress made with assessment strategies, no assessment schemes will ever be able to reflect fully many aspects of educational development.

School public relations programmes

We need to encourage individual schools to proclaim proudly and publicly the very positive advances which they have made and continue to make in the all-round education of their students, many aspects of which defy measuring, counting and weighing.

Advancement but limitations in assessment

We need to publicize widely recent efforts to improve and diversify assessment strategies for schools, including the use of profiles and portfolios of student work; but at the same time, we need also to publicize the limitations of assessment to portray adequately: the richness of the accomplishments and talents of our individual students, and the diversity of outcomes in relation to social, emotional, cultural, physical and higher intellectual talents. These are unprecedented accomplishments which we as educators have cause to celebrate. Their dissemination to the media and to the public is an urgent and vital task if we are to defend the real key to educational quality. At last, the media have great news to broadcast and publish about assessment and curriculum. Let the public share our celebrations!

assessment and curriculum. Let the public share our celebrations!

References

Brewster, D. et al. (1985) 'Leaving school and going where?', Education News, 19, 4 (July 1985): 19-23.
Carlton, J. (1988) 'The education debate continued . . . The virtues of honest measurement, in focus', Weekend Australian (August 6-7 1988): 5.
Choppin, B. (1985) 'Assessment', in T. Husen and N. Postlethwaite (Eds in Chief) International Encyclopedia of Education: Research Studies, Oxford: Pergamon Press.
Cohen, D. (December 1987) 'Raising the curriculum issues: what are they and how are they created?', Australian Educational Researcher, 14, 3:2154.
Cohen, D. and Harrison, M. (1982) Curriculum Action Project: A Report of Curriculum Decision-Making in Australian Secondary Schools, Sydney: Macquarie University.
Cohen, D. (1989) 'Evolution of the concepts of literacy and numeracy', in G. Tickell and S. Marginson (eds) (forthcoming) Raising the Standards.
Education Reform Act (1988) London: HMSO.
Elliott, J. (1985) 'Teaching for understanding and teaching for assessment: A review of teachers' research with special reference to its policy implications', Chapter 13, in D. Ebbutt and J. Elliott (eds) Issues in Teaching for Understanding, York: Longman for SCDC Publications.
Elliott, J. (1988) 'Education in the Shadow of G.E.R.B.I.L.', The Lawrence Stenhouse Memorial Lecture, BERA Annual Conference, University of East Anglia, September 1988, mimeo.
Goldstein, H. and Cuttance, P. (April-June 1988) 'A note on national assessment and school comparisons', J. Ed. Policy, 3, 2: 17-20.
Goodlad, J. I. and Klein, M. F. (1975) The Conventional and the Alternative in Education, California: McCutchan.
Iacocca, L. (with Novak, W.) (1984) Iacocca: An Autobiography, New York: Bantam Books.
Jacoby, R. (1987) The Last Intellectuals, New York: Basic Books.
McLean, L. and Goldstein, H. (1988) 'The US National Assessments in Reading: Reading too much into the findings', Phi Delta Kappan, 69, 5 (January 1988): 369-72.
National Commission on Excellence in Education (1983) A Nation at Risk, Washington DC: US Government Printing Office.
Peters, T. J. and Waterman, R. H. (Jr) (1982) In Search of Excellence, New York: Harper & Row.
Power, C. et al. (1982) National Assessment in Australia: An Evaluation of the Australian Studies in Student Performance Project (ERDC Report No. 35).

Primary School Pupil Assessment Project (1988) University of Southampton, 'Glossary of some assessment terms', mimeo.
Report of the Study Group, Lamar, A., Chairman (1987) The Nation's Report Card: Improving the Assessment of Student Achievement, Cambridge, Massachusetts: National Academy of Education.
Spring, J. (1985) 'Book review: The school and the economic system', Curriculum Inquiry, 15, 2: 223-7.
Stake, R. (1988) 'Oral presentation at the Invitational Seminar on Assessment', University of East Anglia, Norwich, September 1988.
Stenner et al. (1988) 'Most comprehension tests do measure reading comprehension: A response to McLean and Goldstein', Phi Delta Kappan, 69, 10 (June 1988): 765-7.
Task Group on Assessment and Testing (1987) A Report, London: DES.
Tyler, R. W. (1987) Personal interview with writer, San Francisco, California.
Weitz, J. (1961) 'Criteria for criteria', American Psychologist, 16: 228-31.


Joys and Sorrows
Reflections by Pablo Casals

And what do we teach our children in school? We teach them that two and two make four and that Paris is the capital of France. When will we also teach them what they are? We should say to each of them: Do you know what you are? You are a marvel! You are unique. In all the world there is no other child exactly like you. In the millions of years that have passed there has never been another child like you. And look at your body — what a wonder it is! Your legs, your arms, your cunning fingers, the way you move! You may become a Shakespeare, a Michelangelo, a Beethoven. You have the capacity for anything. Yes, you are a marvel! And when you grow up, can you then harm another who is, like you, a marvel? You must cherish one another. You must work . . . we all must work . . . to make this world worthy of its children.

4 National assessment: a comparison of English and American trends

Caroline Gipps

Introduction

National assessment has been in operation in England and Wales since 1975 and in the USA since 1969. In its early days, the English model — the Assessment of Performance Unit (APU) — was modelled on the American National Assessment of Educational Progress (NAEP) with a key role to monitor standards. Around 1983, however, the APU began to change its focus and to analyse its data for professional development purposes. Currently it is moving towards supporting the assessment of the new national curriculum. By 1993 it will be rolled up into the national curriculum assessment programme and will disappear. With the emphasis that this latter programme has on competition and comparison, accountability and the market place, national assessment in the UK will have gone well away from its 'professional' role in the 1980s and moved back to a 'standards' and accountability model. NAEP, too, is now to be involved in testing at state level to allow state-by-state comparisons (and, ultimately, school comparisons with national data). In both countries we can see an attempt to move towards control over education by external sources using an accountability-through-testing model, with national assessment underpinning the process. In this chapter I want to expand on the current trends in national assessment in these two countries, to draw the comparisons and to comment on the relative influence at political and professional levels of the two systems.

The assessment of performance unit

The APU is a unit within the Department of Education and Science (DES) which supervises the national assessment of

Changing Educational Assessment performance in maths, language, science, modern languages and design for technology. Although the APU was set up at a time of concern over the education of minority children, and has as one of its tasks to identify ‘underachievement’, in reality its main task, as far as the DES was concerned in the early 1970s, was to operate as an indicator of educational ‘standards’ and to give ministers information on whether, and by how much, these were rising or falling. 1979-83 In the growing atmosphere of accountability in the late 1970s, when it became clear that the APU was intending to monitor standards, there was considerable concern that the APU was intended as an instrument to force accountability on schools and therefore teachers. The APU assessment programme, though ostensibly concerned with children’s standards, was interpreted as potentially dealing with teachers’ competencies. Our evaluation (Gipps and Goldstein 1983) showed that this fear was largely unfounded, likewise that the APU was a Trojan horse to bring in centralized control over the curriculum seemed unlikely. Briefly, the decisions to go for light sampling, matrix sampling and anonymity of pupils and schools, together with the research team’s attempts to manage curriculum backwash, and their style of reporting which meant that the reports were rarely read, defused concerns about its role in teacher evaluation and control of the curriculum. The APU has had only limited success in monitoring changes in performance on tests over time due to a major technical problem. That is, changes large enough to be meaningful will be detected only over a number of years, at least four or five, and any serious monitoring of performance would go on over a longer period than that. For example, the NFER Reading Surveys ran from 1948-72. The problem is that the same test used over this sort of period becomes dated. Curriculum and teaching change, thus the test becomes harder and standards will seem to fall. To make the test ‘fair’ it is necessary to update it, but then one cannot compare the results on the modified version of the test with the results on the original form because it is not a true comparison. The problem here is that various statistical techniques are needed to calculate comparable difficulty levels and there is no consensus on which of them is satisfactory. It seems 54

National assessment that ‘absolute’ measures of change over time are difficult if not impossible to obtain and, at most, we can only hope to measure ‘relative’ — between group — changes. (Goldstein 1983) In the early 1980s the APU had to drop the controversial Rasch technique of analysing difficulty levels of test items and admit that it could not comment on trends over time. This issue has dogged the APU’s work for several years (see Nuttall 1986), and is by no means a uniquely British problem. The National Assessment of Educational Progress has also grappled with it and relies on using ‘item response models’ which are elaborated versions of the Rasch technique. Both NAEP and the APU use a subset of common items between occasions to try to equate the score scales for change purposes. The problem with this method is that the core of items which is used regularly cannot provide a wholly representative sample of the items used in any particular survey and so the information thus provided on changes in performance over time is inevitably limited. Underachievement What the APU is doing, rather than making comments about overall levels of performance or standards, is comparing the performance of sub-groups at points in time, for example boys and girls, regions of the country, and children with different levels of provision of science equipment, laboratory accommodation, etc. This has been one of the main activities of the APU over the last five to six years, and could be described as looking at ‘relatively low performance’, which is the unit's working definition of underachievement. 1983-7 Each of the three major survey areas — maths, language and science — completed an initial round of five annual surveys in 1982, 1983 and 1984 respectively. After this initial phase the teams were commissioned to survey only every five years and to spend time in between surveys on dissemination and on making a more detailed analysis of their findings, for example in relation to background factors. During this time surveys in modern (foreign) language and design for technology took place. 55

Dissemination involves writing publications for teachers. Over the year 1985-6 the teams wrote eight booklets (for example, Decimals: Assessment at ages 11 and 15; The Assessment of Writing: Pupils aged 11 and 15; and Planning Scientific Investigations at age 11). These are aimed specifically at teachers in order to get across messages about particular areas of the curriculum. The implications for teaching are clearly and concisely stated. As the government paper Better Schools put it:

The findings of the APU surveys of the pupil population as a whole provide much information about the obstacles to good performance which individual pupils face in the subject areas surveyed. This information has great professional value for classroom practice and is being made available to teachers through a series of booklets dealing with specific aspects of learning. (Para 82, DES 1985)

There is no doubt that the findings from the maths, language and science teams include a tremendous amount of information for teaching, whether it is about children's errors in maths, children's misconceptions in science, or the linking of reading, writing and oracy skills in language. The APU is committed to using the implications of the survey findings to help teachers. Clearly, it would be a waste not to do so. In modern languages, for example, there is a research project to exploit the survey findings and their lessons for teachers. This involves working with teacher groups on the assessment processes and the findings. One outcome will be the production of training materials for teachers. The APU also had an input to the GCSE, the new 16+ public exam, the General Certificate of Secondary Education. The curriculum models adopted in languages and science rely heavily on the frameworks used by the APU (see Gipps 1987). The emphasis on practical and oral assessment in GCSE, too, owes much to the pioneering work in assessment in these areas by the APU.

1988

This has been a year of tremendous change in education in the UK. The Education Reform Bill (ERB) which became law in the summer of this year has at the heart of its reforms for schools a national curriculum enforced by national assessments.

National assessment I shall refer to this as national curriculum assessment in order to distinguish it from the national assessment carried out by the APU. In brief, from 1991 all children of 7, 11, 14 (and 16 in non-GCSE subjects) will be assessed on attainment targets (what children should know, understand and be able to do) which form a key part of the national curriculum. These results are to be available to parents in detailed form, and to be publicly available in a summarized form. Ultimately, these data will be analysed centrally in much the same way as the APU data are now. This is currently a little-known fact in England and none of the questions about whether the data will be anonymous, who will control their analysis and use etc., has yet been asked. A question which clearly has been asked at the DES is — do we need the APU if we are going to have national curriculum assessment data on all children? And the answer to this is — no. The APU will disappear in the mid 1990s because, the argument goes, the new assessment will provide enough information on standards. The work of the APU that is currently in hand will continue in a modified form: subjects in which there are surveys still to come — science, foreign language and design and technology — will test at ages 11 and 14 rather than 11,13 and 15 as previously, and the assessments will ‘take account’ of the national curriculum attainment targets as they become finalized. There will also be an APU survey of 7-year-olds in 1991. But why continue with this testing programme at all if the DES feels that the new assessments will provide all the information that they need? The answer is, to provide a baseline of data against which the national curriculum can be assessed in the future. (Of course, the extent to which the APU data can be used for this depends on the overlap between the APU curriculum frameworks and the national curriculum. We have no idea yet how great this overlap will be, though it is inconceivable that the extensive APU data on what children on various ages can do will not be used to help determine the attainment targets.) The other reason is that the APU data will be far more comprehensive than the new assessments for the first few years. From 1992 the work of the APU will be progressively absorbed into the two new bodies set up to oversee the national curriculum and assessment — the National Curriculum Council and the Schools Examination and Assessment Council. In the meantime, the APU is embarking on a major programme of dissemination and publication of its findings and messages. 57

Once the APU becomes absorbed into the national curriculum assessment programme it is unlikely that there will be the same emphasis on disseminating messages for teachers and in-depth analyses of the data. The new national assessment data will be used to make league tables of schools (and school districts) in a way which has not happened in this country before. Data will also be available on classes, thus providing a measure of teacher performance (without taking account of the ability or previous attainment of the children). So, national assessment in this country will have gone from monitoring of standards, via detailed research and dissemination of information on pupil performance to help teachers, to accountability. Its level of success at the first of these was minimal, and at the second limited (although the actual findings were of great value). It remains to be seen how effective it is in its accountability role.

The national assessment of educational progress

1969-83

NAEP, based at the Education Commission of the States, conducted its first survey in 1969. It monitors achievement in terms of what students know and can do across the country. It does so by drawing a representative sample of students at three age levels — 9, 13 and 17 — in a variety of curriculum areas and compares performance over a period of years. NAEP also collects information on the backgrounds and practices of teachers and on the characteristics of schools. It attempts to relate achievement to these factors and can break down its information by sex, by race or ethnic background, by type of school attended, by the amount of training of teachers, etc. During the period 1969-83 much effort went into test development and design, and analysis of the data. Like the APU at this time, the reports of its findings were little read and NAEP was far from a household name, despite America's keen commitment to testing.

1983-8

In 1983 NAEP was transferred to the Educational Testing Service (ETS), a commercial test development agency. In its application for the five-year grant to run NAEP, ETS set a dual objective: to improve it as a useful measure of educational

achievement and to promote its use. Again, like the APU, dissemination was seen to be important: what use is a national monitoring system if the nation doesn't know about it? At its inception there had, as in Britain, been fears that NAEP might lead to a national curriculum. By 1983 these fears had lessened and ETS was able to set up a new design which collected and analysed more information. The main features of this design are:

• Regular assessment cycles, spaced so that the same cohorts of students can be monitored at three points during their schooling.
• Information by grade level, as well as age.
• Sampling and assessment techniques that permit NAEP to (1) create achievement scales in each subject area (in the past, NAEP was limited to reporting only the results of individual exercises or averaging the percentage of correct answers); and (2) collect unprecedented amounts of student background and attitude information, permitting examination of many environmental factors and their links to student achievement.
• Relating achievement information to the contexts of learning: home environment, teacher practices and school characteristics, as well as information about the students themselves.
• Providing some practical suggestions, where the findings and analysis warrant them, as to how achievement can be improved, although NAEP's central mission remains measurement of educational progress.
• Devoting as much emphasis to disseminating the results as gathering the data.

(ETS 1987:6)

The assessments covered reading, writing and literacy for young adults in 1984; reading, maths, science, computer competence, US history and literature (for 17-year-olds) in 1986; and reading, writing, US history, citizenship and geography in 1988. The assessments are mostly of the multiple-choice type, but funding has been received from the National Science Foundation (NSF) to develop practical ‘hands on’ tasks to assess higher-order skills in maths and science. NAEP maintains that the concepts measured and the innovative approaches used in these tasks are equally suitable for teaching, so it has published a Manual for Teaching and Assessing Higher-Order Thinking in Science and Mathematics. 59

Changing Educational Assessment Another development has been the reading scale which gives each group of students a single numerical level of proficiency (as with a reading quotient or age). This scale goes from 0 to 500 with five levels of proficiency: rudimentary (150), basic (200), intermediate (250), adept (300) and advanced (350). For each level there are descriptions of what students ‘know and can do’ as with the British attainment targets. The scale has been developed using item response theory. This scale has been criticized (see e.g. McLean and Goldstein 1988) both on statistical grounds and for the simplified view of the reading task which it assumes. NAEP, however, is likely to continue using it since it not only allows reduction of detailed data to a single figure, which makes communication easier, but it also enables NAEP to make comparisons of performance over time. The other major development has been the use of NAEP assessments by states. By 1987 thirteen states were involved, covering around one-third of the USA's school children. The states either analyse NAEP data for their state to provide data at state level, or, where the NAEP sample drawn is not statistically representative, they carry out extra testing using NAEP items. These state assessments permit comparison among states and between state and national data, which NAEP itself is not allowed to do. In the spring of 1988 a Bill was passed by the Senate to expand NAEP and remove the ban on state-by-state and district-by-district comparisons of NAEP scores. Against fears that this would result in rank ordering of states, and the consequent attempts to boost test scores with the undesirable consequences that this could bring, the Bill was modified before going to the House of Representatives. NAEP will be expanded, so that there are larger assessments of reading in 1990 and maths in 1992. State-by-state comparisons will be allowed, but will not be mandatory. States can decide whether or not to be part of the expanded testing and, once testing is completed, whether or not to have their state’s scores reported separately (Fair Test Examiner 1988). The test reform lobby won a range of modifications and safeguards on the grounds that a ‘national exam’ would be narrow and limited, use only multiple-choice techniques and ultimately drive the education system. It would also facilitate central, federal, control over state education systems: There is an influential minority that wants to use tests to help centralise educational decision making. A narrowly 60

designed federal test, however, is not in the best interest of kids, of school reform or of assessing education.' (Fair Test Examiner 1988)

A comparison

We can see points along the way where the design and direction of the two national assessment schemes have touched. The aim of monitoring standards, or measuring educational progress, is common to both, and is the enduring political raison d'etre, the reason for the considerable funding which both have received. Although the NAEP has dallied with producing practical advice for teachers, it has never done this to the same extent as the APU. The APU was almost hijacked by the 'professional', i.e. using the information for teachers, movement in the early 1980s. Such was the concern to appear to be a professionally valid activity, and such was the confusion over the statistical issues in measuring changes in performance over time, that not only was this allowed to happen, it was actually encouraged. Around 1983, the task for both systems was dissemination: for the APU, to get teachers and teacher trainers to read the material that was aimed at them; for NAEP, to make educators, parents and the public aware of the NAEP findings in the Nation's Report Card. For both systems this was a central requirement: national assessment without an audience does not represent value for money, particularly if, politically, it is supposed to have an accountability, standards-monitoring role. For even the APU, despite its mining-the-data-for-messages-for-teachers emphasis, had not abandoned commenting on standards. However, as Wood and Power point out (Wood and Power 1984), neither of these national assessments, nor the abandoned Australian one, tell us very much about standards. Standards in education are hard to define, partisan in the setting, and have a complex relationship to testing (Gipps 1988). Wood and Power (op. cit.) maintain that national assessments promote a view of standards which is narrowing and limiting, emphasizing as they do selected, minimal accomplishments. Husen argues that low standards (the political fuel for national assessment) are not the most serious problem with public schooling: standards of average performance can easily be raised by making a system more selective. The real problem he sees is the emergence of a new

Changing Educational Assessment educational underclass, encouraged by the promise of equality but defeated by the workings of meritocracy (Husen 1983). The standards, testing, public education debate is, of course, a deeply political one, with differing educational philosophies at the core; the role of national assessment is determined by the prevailing political ideology with regard to education. Thus, both national assessments are currently being steered back at the standards path. NAEP is developing new techniques and scales, like the reading scale, to do this. The reporting of a single score is something the APU teams have by and large avoided on the grounds that there is far more useful information to be gained in looking at performance in different sets of skills or areas of the curriculum. Of course the problem is that this sort of detailed information is much more difficult to digest and to handle; what many politicians and members of the public would like is a single figure — like the old reading quotient. NAEP does produce for each child a single comparable score on a test (called a plausible value) which, because not all the children do the same questions on a test, requires a fairly complex statistical manipulation of the data incorporating race, sex, region, parental education and other background variables. The APU's reluctance to make this sort of statistical manipulation to produce a single figure has been, I would argue, though commendable, its undoing. It has given politicians little hard and fast information of the kind they want: it can say ‘x per cent of 15-year-olds can add two four digit figures’, but cannot say ‘reading standards have gone up by x points since 1985’. So the APU, which had a greater potential to influence teachers and teacher educators than NAEP, is to be rolled up into the most comprehensive centralized assessment system in the world, linked as it will be to a centralized curriculum. It is arguable whether the UK national assessment is to ensure that the national curriculum is taught (our teachers are notoriously resistant to new developments) or whether the assessment is the key feature of the new developments and the national curriculum's role is to legitimate it. What is the case is that politicians and educationists in America, Australia and Germany (at least) will be watching to see whether our state education system is indeed turned around in the way demanded by the current political ideology. Of course, one of the developments will be increased competition which will always push test scores up (for the more able). The other provisions in the ERB will lead to more selectivity at school level, while the assessments will lead to more differentiation 62

within schools. Following Husen's argument, average standards can be expected to rise. But, it must be asked, at what cost to the less able student, the ethnic minority student, the student with special needs, the disadvantaged? And at what cost, too, to our justifiably famous primary education: what of those primary schools which practise that most difficult, yet rewarding form of education for young children based on a child-centred, active-learning philosophy? On both sides of the Atlantic the pressure is on, not just to have information about national standards, but to make comparisons and so to allow competition and comparison to force standards up. State and district comparison is an avowed aim for the British system. That this approach has a backwash effect on schooling is not the politician's problem. The concern for Britain is that our national assessment, which offered so much for teachers, but was relatively little used by them, will be transmuted into a powerfully controlling system with profound effects on teachers', as well as children's, lives.

Acknowledgements

I am grateful to officials of the APU for continued access and to Professor Harvey Goldstein who commented on a draft of this paper.

References

DES (1985) Better Schools, London: HMSO.
ETS (1987) Profiling American Education, NAEP Report 18-GIY, Educational Testing Service.
Fair Test Examiner (1988) 2, 2: 11.
Gipps, C. (1987) 'The APU: from Trojan Horse to Angel of Light', Curriculum, 8, 1.
Gipps, C. (1988) 'The debate over standards and the uses of testing', British Journal of Educational Studies, 1.
Gipps, C. and Goldstein, H. (1983) Monitoring Children: An Evaluation of the Assessment of Performance Unit, London: Heinemann Educational Books.
Goldstein, H. (1983) 'Measuring changes in educational attainment over time: problems and possibilities', Journal of Educational Measurement, 20, 4, Winter.
Husen, T. (1983) 'Are standards in US schools really lagging behind those in other countries?', Phi Delta Kappan (March 1983).
McLean, L. and Goldstein, H. (1988) 'The US National Assessments in Reading: Reading too much into the findings', Phi Delta Kappan, 69, 5 (January 1988): 369-72.

reading: Reading too much into the findings’, Phi Delta Kappan, (January 1988). Nuttall, D. L. (1986) ‘Problems in the measurement of change’ in D. L. Nuttall (ed.) Assessing Educational Achievement, Lewes: The Falmer Press. Wood, R. and Power, C. (1984) 'Have national assessments made us any wiser about “standards”?’, Comparative Education, 20, 3.


5 Possibilities and limitations in cross-national comparisons of educational achievement

Les McLean

The thesis of this chapter will be that cross-national studies of educational achievement have no more limitations than similar national or local studies, but that they do have more possibilities. The focus is on comparisons of achievement, but such studies lead us into curriculum and structure and methods and social class, and more. First, the history of comparative achievement studies will be reviewed, looking at the questions asked and the methods employed. A current example will be examined in some detail by way of arriving at tentative generalizations about both possibilities and limitations.

The modern history

There have no doubt been comparisons as long as there have been schools, but the systematic study of achievement cross-nationally is essentially a history of the IEA — the International Association for the Evaluation of Educational Achievement. The IEA came into being with the First International Mathematics Study in 1962, launched from the University of Chicago and brought to fruition under the direction of Torsten Husen and colleagues in Stockholm. The office of the IEA was then created to organize the six-subject survey (reading, literature, science, French as a second language, English as a second language and civic education), the results of which appeared more than a decade later (Walker 1976). The Second International Mathematics Study was launched soon after, followed by a second science study, research on classroom environments and written composition. As of this writing, the international reports on the latter studies were beginning to appear. An 'Interim Report' has been published on science (IEA 1988), and the international writing tasks and scoring scales were published from the study of written composition (Gorman, Purves and Degenhart 1988). Three studies are

Changing Educational Assessment under way — of pre-primary children, reading literacy and computers in education, and organization of a new study. ‘Teachers and learning social values and morality’ has been approved. In his foreword to the report on the six-subject survey (Walker 1976), Torsten Husen wrote that IE A studies contributed to better understanding of education systems in three main areas: (1) the links between social and economic class and instruction, (2) the structure of the common school system and (3) factors accounting for variation in achievement. This way of framing the questions relies for its answers on complex multivariate analyses of quantitative data, and the first two studies devoted large resources to this end. IEA statistical analyses employed the latest techniques and used state of the art computer programs of the day. The leading statistical consultants visited Stockholm and often worked there for periods of time. If technology were sufficient to extract insights from the numbers, then the studies would have indeed yielded the contributions Husen saw for them. Technology is not sufficient, of course, and insights from the first studies have been few. The problems lie in the complex social and political process that is education, a process that does not yet yield to quantitative description via the constructed variables of multivariate analysis. In contrast, the Second International Mathematics Study was planned from the beginning as a comparative study of mathematics education, in which achievement comparisons were expected to play an important but relatively minor part. The emphasis was to be on a thorough curriculum analysis, exhaustive survey of teaching methods, careful documentation of content taught (opportunity to learn) and achievement measures tied to the curriculum. All this was supplemented by documentation of school organization, student and teacher demographic statistics, and surveys of student and teacher attitudes. Anything that could be asked on a questionnaire form was asked, and considerable creativity emerged in finding ways to document teaching methods. Textbooks were analysed and proportions of age group in school charted up to the end of secondary school. Two populations were studied — ‘students in the grade where the modal number were 13 years of age’ and ‘mathematics specialists in the normal last year of secondary school’. With all this complexity it is perhaps not surprising that the international reports are taking a long time to appear. It is even less surprising, though for quite different reasons, that the most interest has been expressed in the 66

Cross-national comparisons achievement comparisons, many of which have been published with national reports (e.g. McKnight et al. 1987; McLean, Wolfe and Wahlstrom 1987). Worldwide interest in country ‘league tables’ continues unabated, also fed recently by the interim report from the Second International Science Study (IEA 1988). Apart from those who actually work in schools, there is relatively little interest in the finer points of schooling — what is actually taught, how the students are treated, what teachers think about themselves and about their teaching, what attitudes students have as they proceed through the hoops and the like. There is, on the other hand, currently an amazing faith and trust in student test scores as the single valid indicator of how well schools are performing. It would be foolish in the extreme for comparative educators to ignore this faith. It would be unprofessional and incompetent to cater only for it. Torsten Husen, who started his surveys of young people’s abilities in the mid 1940s, with wartime studies on the ‘reserve of talent’ in Sweden, wrote in a book on the relation of educational research to educational policy that researchers and policy-makers lived in two different cultures, with a considerable tension between them (Husen 1984). The book treats this tension from several perspectives, including practical, ethical and political. Experience with the early IEA studies has added a political dimension to the educational one. We will return to these tensions. A case study: IEA reading literacy In September 1988, the IEA General Assembly gave final approval for their newest survey — reading literacy. It is firmly in the IEA tradition, viz. ‘A central purpose of this study is to develop widely applicable measures to make possible accurate descriptions of literacy of particular groups or nations’ (Elley 1988). Literacy measures are not to be confined to reading but are to include ‘minimal numeracy’ and ‘scientific literacy’ (op. cit.). Writing is to be a national option (organized and co-ordinated by IEA, but not required of all participants). Among the outcomes intended from the study are: (1) sets of tests suitable for measuring literacy in all countries at two age levels. These tests, and the accompanying questionnaires, once validated, could be used in many contexts (emphasis added, for later discussion); (2) national data on four levels of performance for several literacy dimensions for each country; (3) comparative data across countries of literacy 67

Changing Educational Assessment competence on each dimension on the international scale (emphasis added); (4) data on relationships between various school, teacher or parental literacy competencies/practices for each nation and across nations (i.e. which strategies are associated with higher literacy); (5) comparative data across countries of literacy practices, and (6) relationships between literacy and economic indicators. The study proposal met with general enthusiasm, and about thirty countries immediately indicated that they would participate. The approach is exactly what policy-makers at the highest levels in Ministries or Departments of Education believe they need — test scores that can be compared across countries and related to other characteristics. The promise to relate literacy to economic indicators added sweet decoration to an already attractive cake. There is no tension between the team proposing the study and the policy-makers. Tensions do emerge, however, between the IEA team and language researchers around the world. Language researchers have varying concepts of literacy and corresponding pedagogies that rule out tests that can be applied across countries. Here is the way one researcher put it in a review of three recent books on literacy (Langer 1988:42): Although there is widespread agreement that definitions of literacy have changed across time, less attention has been paid to the plethora of meanings of the term in current research and practice. In the language and literature of today, literacy is used interchangeably to denote a skill, a state, and an action, each use stemming from a different set of research questions and a different tradition in research and instruction. Another reviewer (four books) wrote: ‘shared messages do emerge, chief among them that literacy has less to do with overt acts of reading and writing than it does with underlying postures toward language’ (Brandt 1985:128-9). C. Gordon Wells’ current working definition of being literate is ‘to have the disposition to exploit the potential of texts in appropriate ways to empower action and thinking in both inter- and intrapersonal contexts’ (seminar presentation, OISE, 9 November 1988). Moreover, the IEA study encourages continuation of a pedagogy that literacy and curriculum researchers are working strenuously to change (see e.g. Goodman and Watson 1977). The very idea of an international literacy scale is anathema to this large and vocal group. (The argument against a single 68

reading scale is summarized in McLean and Goldstein 1988, with reference to the reading scale created by the US National Assessment of Educational Progress. David Olson, author of three books on literacy that have become citation classics, remarked, ‘Here’s what I think of literacy. Individuals only getting a score would be quite useless’ [personal communication, January 1988].) When the IEA study was first mooted in the writer's jurisdiction, a high-profile literacy researcher wrote to the Deputy Minister to argue that the study should not be funded. The proposal was substantially modified to attempt to meet some of the objections, but in the end the Ministry of Education decided not to fund it. We will never know what effect the letter had on the decision. Current language researchers hold that reading is a unitary process of meaning-making (Wells 1986), and that it is dysfunctional to try to separate sub-processes from the whole. ‘. . . trying to teach the husks of reading and writing — without their seeds in consciousness and their roots in life — can only baffle students and exhaust teachers’ (Brandt 1985:131). The IEA team proposes to emphasize reading tasks that ‘will measure competence in three relatively independent dimensions, viz.: decoding, location of information, and comprehension’. Researchers are stressing the necessity to engage students with substantial passages of meaningful text — with discourse rather than discrete points. The IEA team will present ‘words, sentences, documents, and prose’. As a practical matter, words and sentences will dominate, with multiple-choice questions posed about documents and short sections of continuous prose. Such an approach will accurately reflect much, if not most, of current teaching practice around the world, and hence an argument can be made for it on those grounds. Needless to say, such an argument does not appeal to those working to change present practice. In short, some have taken Torsten Husen’s warnings to heart and have begun to design cross-national comparisons to produce the results policy-makers seek. Since the policymakers provide the money for their projects, such adaptation is prudent and practical. The rationale offered goes something like this: ‘Better to approximate school outcomes, however roughly, and be listened to than to provide accurate portrayals and be ignored — especially if the accurate portrayals take years to prepare and whole books to describe’. In the case of the reading literacy project, however, this has meant setting the study aside from the broad mainstream of current scholarship.


Possibilities

One of the possibilities, often overlooked, is that cross-national studies can document and compare the tensions among policy-makers, researchers and practitioners in the various countries. The main focus of most studies is elsewhere, usually on student achievement, so the information necessary to compare tensions is not always collected, but the more we seek policy implications the more we should consider such comparisons. Three examples should suffice to illustrate the possibilities presented by the tensions between groups: length of school terms, number of topics to be covered and ability grouping. Cross-national comparisons have more general possibilities, because one can regard every country as a case study and every variable as an outcome.

Semester or full-year terms?

In North America, the majority of secondary schools now offer two terms (semesters) per year, meaning that courses begin in September and end in December or January; after a short break (often at Christmas) a new term begins, with new courses and credits. Classes are typically twice as long (eighty minutes, for example) and there are half as many of them. The system is popular because it offers greater flexibility to the students. Some teachers like it and some do not. Because it is popular with students, policy-makers like it. Little research had been done before the Second International Mathematics Study came along, offering comparisons among hundreds of half-year and full-year classes as an offshoot of the study. Researchers found in one province in Canada that achievement in upper secondary mathematics classes was higher in full-year than in half-year classes, just as the trend towards half-year classes was accelerating (Raphael, Wahlstrom and McLean 1986). The researchers' report was attacked by some practitioners as not definitive (which was correct, of course), but in another province the result was cited as a key element in a decision not to permit a general shift to half-year organization. Other countries do not use the same organization or terminology in a consistent way, so the international data did not yield clean comparisons. A salient feature of half-year classes is the double period, however, and data on period length were collected internationally. No consistent pattern favouring double or single periods was found, throwing doubt

Cross-national comparisons on that aspect as the probable cause of the achievement differences observed in one province. In a subsequent international study (Connelly 1987) and in local research (McLean 1987a, 1987b), the difference did not appear in science classes, nor is there widespread tension over the issue. We are left with a single result, which appears more tentative in the context of the series of studies. As some researchers have written, The problem with generalisations is that there are no generalisations’ (Lincoln and Guba 1978). Number of topics to be covered Should teachers give a small amount of time to many topics or more time to a few? It has become a commonplace that outcomes are directly related to time spent teaching, so the question is a pertinent one. Teachers in Canada complain frequently that there are too many topics to cover adequately in a school year, but policy-makers continue to insist on broad coverage. This aspect was made a central feature of the US report on the Second International Mathematics Study, The Underachieving Curriculum (McKnight et al. 1987). The US researchers compared eight countries (including Canada, but not England and Wales) and found that high achievement was attained where fewer topics were covered each year, and lower achievement where many were. Moreover, systems in which an attempt was made to cover many topics per year also reported re-teaching year after year, with little added achievement to show for it. The researchers’ evidence supports the practitioners in this case and is opposed to the policy-makers. Ability grouping Ability grouping, sometimes called setting, is popular with practitioners. Teachers find it easier, and most believe more productive, to work with relatively homogeneous classes rather than classes reflecting the whole range of abilities in the school population. In Ontario, students are divided into basic, general and advanced streams when they enter grade 9 (about age 14), and few change streams. Ability grouping is institutionalized, and it is very effective. Achievement differences between general and advanced classes within a grade (grade 9 or 10, for example) are larger than the differences between grade 7 and grade 10 (McLean 1982). In Japan, however, secondary school students are not 71

Changing Educational Assessment grouped by ability. The proportion of variance in achievement between school and classrooms is very small — half that of Ontario and a fifth that of New Zealand and the USA (McLean, Wolfe and Wahlstrom 1987). Average achievement is highest in Japan, however, and low in New Zealand and the USA. Ontario is in the middle. In short, the researchers’ evidence is against the preferences of practitioners and policymakers in most countries, though not in Japan. The crossnational comparisons force us to rethink some strongly held views. Case studies and outcomes Countries are natural case studies, in which all variables can be considered outcome variables. Demographic variables, for example, such as the social class composition of schools, are the result of a mixture of social factors and policy decisions. School composition might be considered a predictor variable in another type of analysis, but the point is that causes and effects are not uniquely defined when one considers the broad policy domain at the level of countries. Even teachers’ level of education and number of years of experience, usually taken as given, hence causal (predictor) variables, are outcomes of policy decisions over many years. It is most useful to begin any cross-national surveys with case studies of the individual countries prepared by those countries. In the second mathematics study, for example, each national centre first prepared an elaborate description of their education system, complete with a ‘system chart’ that portrayed the proportion of age groups in the various types of schools, from age 5 to the normal end of secondary school. This chart proved exceptionally useful in later stages of the data analysis. Many mathematics educators felt the study could have ended with the country case studies (which included elaborate curriculum and textbook analyses, among other things) and been worth the effort. Policy-makers, however, especially at the top levels, found the case studies far too voluminous and largely irrelevant to their outcome variable — student achievement. Here, the policy-makers’ and practitioners’ interests converge, in that both want achievement data. The policy-makers are happy when at least tentative explanations can be offered for differences that emerge — explanations that often come from the case studies.


Cross-national comparisons Limitations Three main limitations of cross-national comparisons of educational achievement will be discussed: the impossibility of rich description, the irresistible lure of simple answers, and the difficulties of clear communication. Only the first is a practical limitation; the other two are rooted in human frailty. Rich descriptions not possible In principle, it is possible to have rich descriptions of countries, but the very concept of rich description makes them impossible to share. Some will question the word 'impossible', but any weaker word fails to capture the barrier — language. In order to appreciate richness, you have to have enough knowledge of the context to recognize nuance and understand fine detail. The barrier is high indeed between languages (consider a rich description translated to English from Finnish), but it is formidable even when the language is the same. French in Quebec, Canada, is different in subtle ways from that in France, and the French in Arcadian New Brunswick (that’s also in Canada) is more different still. Rich description of schools in Scotland is fascinating, but much of the richness is lost on the writer, a Scot many generations removed. The result of this barrier is that we are reduced to examination of gross features of schooling — time averages, attitudes smoothed over hundreds of students or teachers. Such gross comparisons can be useful, as we found with the distribution of teaching time over mathematics topics, but the yield is small in relation to the effort. Cross-national study teams have to choose their targets carefully and concentrate their resources — not an easy task. The IEA reading literacy team has identified‘literacy activities’, for example, as an international option. Some countries thought this should have been a central feature of the study, but the difficulties are obvious. The solution proposed is to use questionnaires, ‘primarilybecause of ease of administration and the need for comparable data across nations’. The criteria for such questionnaires are daunting:‘ A concept of literacy must be presented briefly; specific behaviours rather than "attitudes" or "preferences" should be elicited; quantification of literacy activity must be easy to draw; and comprehensiveness of format, content, and purpose should be maximised’. The effort may never get past the first criterion — a concept of literacy 73

acceptable to all countries.

The lure of simple answers

For the most part, policy-makers are pragmatists; they know the limits of their influence. Schooling is too complex to control from the centre, so they reach for the few levers accessible to them. The most attractive in recent years has been the common achievement test, done to death in the USA and now rolling into place with the national curriculum in the UK. The common test is the simple answer syndrome taken to the extreme — hold everyone to a common standard and everyone will fall into line. Cross-national achievement league tables are a natural extension of the concept. But schooling is too complex to control from the centre; it is even too complex to understand from the centre. Cross-national studies have documented this in thousands of ways, but the lure of the simple answer is still too strong. Always the pressure is too great to resist, and the diverse results are squeezed, Procrustes-like, into a few simple tables. All the caveats are skimmed over, the qualifications watered down or removed and the generalizations produced. The present writer is as guilty as any (McLean, Wolfe and Wahlstrom 1987). Perhaps we should rejoice. The Second International Mathematics Study has already had a profound impact on several countries, for example the USA and New Zealand. The results have attracted interest everywhere, and the full international reports had not yet appeared as this was written. Maybe there is a moral and ethical edge along which we can walk without losing our souls; but should we be reading Machiavelli, or Faust?

The difficulty of clear communication

Some of our reports are dreadful. They are wordy, obscure and generally poorly written. They richly deserve the obscurity to which they are consigned. The problem is that a talent for good writing is not always the companion of a talent for good research. We can get away with writing for our friends, so long as we wish to communicate only with our friends. The problem arises when we wish to get our message across to a wider audience that includes busy policy-makers. Some researchers are just learning the value of professional editors who know nothing about your subject but know when they do

not understand your writing. Graphic artists can help us as well. The bad news is that we have to spend as much time preparing the reports (note plural) as on the analysis (and more time on the analysis). The report on the six-subject survey (Walker 1976) illustrates the limitations. It is well written for an audience of academics but is difficult for others to crack. There are thousands of other examples, including widely unread reports by the present author. A counter-example is the US report referred to earlier (McKnight et al. 1987). It has a short catchy title, The Underachieving Curriculum, lots of graphs and a format with plenty of white space and key quotations on facing pages for emphasis. It has recovered its production costs many times over. Too seldom, however, do we have the time and money to produce such works. An example that shows one can communicate on a smaller budget is the interim report on the Second International Science Study (IEA 1988), but that required superhuman efforts that are also in short supply (T. N. Postlethwaite, personal communication).

Closing comments

This chapter began with the assertion that cross-national comparisons of educational achievement had no more limitations than any other studies, but that they had more possibilities. It being impossible to discuss all limitations or all possibilities, this assertion is difficult to defend and still maintain some balance in the argument. It therefore bears this repetition, at least to remind readers of the author's position and intent. The case study would appear to emphasize limitations more than possibilities, but it was chosen because it brings out a major point not addressed at all in many studies, national or cross-national. The reading literacy study brings into sharp focus the role of the underlying theory of the subject matter, and the implications this has for the definition of achievement. In this particular case, theories have recently evolved that put much of current practice in conflict with new thinking. What does a cross-national researcher do in such a case? The IEA reading literacy team has opted for concepts from the recent past, and in this they have lots of company — the US National Assessment of Educational Progress, for example. By so doing they have gained the attention of policy-makers and the enthusiastic participation of more countries than any other study until now. Some colleagues experienced in cross-

national studies, as directors of a cross-Canada version of an IEA survey, have argued that every cross-national study is a policy study, so the IEA reading literacy team is probably taking the only step open to them (Connelly et al.). We should watch to see whether the tensions with language researchers and teachers are strong enough to reduce the value of the study. The message that is coming through from all this experience is that if human frailty can be substantially overcome and numbers produced, then policy-makers will avidly await cross-national comparisons. If enough descriptions of the context can be obtained, there might even be implications for administration and teaching practice. The usefulness of these comparisons, however, will be linked directly to the validity of the numbers compared. Validity has been judged by comparison to current content and teacher opinion, a strategy that worked in mathematics, and in science to some extent. It was finessed in the study of written composition and is a source of potential conflict in the study of reading literacy. We can take some solace in the finding that the frontier of methodology in cross-national studies is exactly the frontier of our understanding of the subject matter we propose to study.

References

Brandt, D. (1985) 'Versions of literacy', College English, 47, 2: 128-38.
Connelly, F. M. (1987) Ontario Science Education Report Card: Canadian National Comparisons, Toronto: Queen's Printer for Ontario.
Connelly, F. M., Croker, R. and Kass, M., unpublished manuscript concerning methodological issues in the Second International Science Study. To appear in Comparative Education Review.
Elley, W. B. (1988) 'Reading literacy: an international study', a research proposal (IEA/RL/01).
Goodman, Y. M. and Watson, D. J. (1977) 'A reading program to live with: Focus on comprehension', Language Arts, 54, 8: 868-79.
Gorman, T. P., Purves, A. C. and Degenhart, R. E. (1988) The IEA Study of Written Composition. Vol. I: The International Writing Tasks and Scoring Scales, Oxford: Pergamon Press.
Husen, T. (1984) 'Issues and their background', in T. Husen and M. Kogan (eds) Educational Research and Policy: How do they relate?, Oxford: Pergamon Press.
IEA (1988) Science Achievement in Seventeen Countries - A Preliminary Report, Oxford: Pergamon Press.
Langer, J. A. (1988) 'The state of research on literacy', Educational Researcher, 17, 3: 42-6.


Lincoln, Y. and Guba, E. G. (1978) Naturalistic Inquiry, Beverly Hills, California: Sage.
McKnight, C. C., Crosswhite, F. J., Dossey, J. A., Kifer, E., Swafford, J. O., Travers, K. J. and Cooney, T. J. (1987) The Underachieving Curriculum: Assessing US School Mathematics from an International Perspective, Champaign, IL: Stipes Publishing Company.
McLean, L. (1982) Report of the 1981 Field Trials in English and Mathematics: Intermediate Division, Toronto: Queen's Printer.
McLean, L. (1987a) Teaching and Learning Chemistry in Ontario Grade 12 and 13 Classrooms: Teachers, Students, Content, Methods, Attitudes and Achievement, Toronto: Ministry of Education, Ontario.
McLean, L. (1987b) Teaching and Learning Physics in Ontario Secondary Schools: Three Classic Wu Li Dances, Toronto: Ministry of Education, Ontario.
McLean, L. D. and Goldstein, H. (1988) 'The US national assessments in reading: Reading too much into the findings', Phi Delta Kappan, 69, 5:369-72.
McLean, L., Wolfe, R. and Wahlstrom, M. (1987) Learning About Teaching from Comparative Studies - Ontario Mathematics in International Perspective, Toronto: Queen's Printer for Ontario.
Raphael, D., Wahlstrom, M. W. and McLean, L. D. (1986) 'Debunking the semestering myth', Canadian Journal of Education, 11, 1:36-52.
Walker, D. A. (1976) The IEA Six Subject Survey: An Empirical Study of Education in Twenty-one Countries, Stockholm: Almqvist and Wiksell International (also published in New York by Wiley, Halsted Press).
Wells, C. G. (1986) The Meaning Makers: Children Learning Language and Using Language to Learn, Portsmouth, NH: Heinemann.


II Comparative perspectives on public examinations

Public examinations play a dominant role in many education systems throughout the world, and this section of the book looks specifically at the nature of that role in a variety of settings. Although individuals working within education systems which are dominated by public examinations are often aware of their restricting influence, much can be gained from standing back and taking an international and comparative perspective on their role. Noah and Eckstein's chapter sets up an excellent framework for the section by comparing the secondary school leaving examination systems in eight nations. The major thrust of this chapter is an attempt to analyse the nature of the trade-offs and compromises that have occurred in each of these settings to arrive at the system which they now have. These authors go on to identify four universal dilemmas of examination policy, which relate to the potential for narrow selection exams to cater for a wide range of pupils, the clash between the demand for uniformity and the need for diversity, the backwash effects of testing on teaching and the clash between the autonomy of teachers and the use of examination results to judge school and teacher effectiveness. These are themes that are picked up again in Tom Kellaghan's chapter which discusses the examination systems in five African countries. Again, he looks at some of the trade-offs between the potentially damaging influences which examinations can have in a developing country and the social and educational pressures which press for their creation and continuation. Kellaghan also looks at the very many different functions that examinations can perform, and the varying degrees to which this is in fact happening in a variety of contexts within the continent of Africa. David Pennycuick's chapter deals with the more specific issue of the introduction of continuous assessment systems at secondary level in developing countries. This has been a widespread trend which was illustrated through Tom

Kellaghan's data in the previous chapter. David Pennycuick explores the way in which continuous assessment has been introduced either alongside or in place of public examinations in a wide range of developing countries. He also raises questions about why its introduction has been so popular, and whether there is evidence to confirm the benefits that have been thought to accrue from it. The next two chapters were written by New Zealanders, although the first, by Tony McNaughton, is in the form of a comparative analysis of reforms in examinations and assessment in Great Britain and New Zealand. He looks at some of the issues that have been prominent in both countries, such as the attempted move towards more criterion-referencing and the widely expressed faith in examinations as the ultimate way of arbitrating about educational standards. McNaughton's analysis is full of warnings about the way in which assessment may be subverted for very narrow ends, and its formative, educative role be neglected and forgotten. He appears to see little in the current reforms in New Zealand, England and Wales, and Scotland to give him much optimism in this respect. Paul Rosanowski's chapter is much more specifically about a single innovative assessment project, which attempts to harness the role of assessment into the educational and cultural context within which it is to be used. His notion of culturally-sensitive assessment, which was developed in the context of the assessment of oral Maori in the New Zealand School Certificate examination, is well worth exploring in relation to many of the cultural contexts in which inappropriate assessments have been imposed. The marrying of assessment to the context and purpose for which it is needed is a hard challenge, in the face of a reality whereby assessment has tended to subvert cultures and destroy the whole framework of educational settings upon which it has frequently been imposed. The section is rounded off with a post-mortem on the first year of the GCSE examination, in England and Wales, by Desmond Nuttall. Clearly the relevance of the GCSE reforms is of significance to a much wider population than those contained within the British Isles. Britain has played a major role in exporting public examinations to developing countries around the world, and any major reform in its own examination system is bound to be watched with interest. The message of the chapter, however, is that GCSE has not turned out to be the major reform that many anticipated, and in some ways it has had a retrogressive and divisive effect on the education system which it was intended to complement. Desmond Nuttall ends this section with the prediction that the

GCSE examination will be extinct before the end of the century, which provides the tantalizing prospect of the nation that gave much of the world public examinations trying to do without them, at least at the end of compulsory schooling.


6 Trade-offs in examination policies: an international comparative perspective Harold J. Noah and Max A. Eckstein

This chapter reports on recent changes in examination policy and practice in eight contemporary national systems of secondary school leaving examinations (China, England and Wales, Federal Republic of Germany, France, Japan, Sweden, USA and USSR). It refers to some aspects of a larger study of secondary school examinations in those eight countries that Max Eckstein and I have begun this year, with support (gratefully acknowledged) from the Spencer Foundation in Chicago, by identifying significant trade-offs present in current policies and practices. Controversy over examination policies is commonplace in the contemporary world. It has been exemplified in China's abandonment of secondary school and university entrance examinations during the Cultural Revolution, and their reinstatement ten years ago; in the disputes over the form and purposes of the Bac in France, disputes that from time to time threaten to undermine the very continuance of government; in the concerns expressed in Japan that whatever the benefits its ‘examination hell’ might bring in the way of stimulating student and teacher efforts, they are being bought at the price of severe tension placed on young people and their families, and in the current rolling debates in England over the institution of the GCSE, the changes proposed for A-level examinations, and the introduction of periodic national assessments of pupils’ progress throughout their school careers. Argument ranges over the entire spectrum of matters associated with examinations — from narrowly technical problems of examination procedures, through questions of broad educational significance, all the way to issues that touch fundamental ideological concerns and political preferences. Not that any particular question about examinations can be neatly categorized under a single such heading. Even the most arcane technical question can, and not infrequently does, carry with it implications for education as a whole, and even 84

Trade-offs in examination policies political choices; and these, in turn, can quickly involve specialized psychometric problems. The specific terms of debate vary significantly from nation to nation and from decade to decade, but some recur. None of the policy problems is easy to solve, and some are so difficult that they might as well even be called dilemmas. Each nation’s system of examinations may be regarded as representing a set of provisional compromises among competing values. While seeking to increase perceived benefits in one direction, a nation almost inevitably gives up some benefit or exacerbates some problem in another direction. It is in that sense, therefore, that we view extant examination systems as configurations of trade-offs. For example, consider the characteristic of examination uniformity. Uniform examinations across the entire nation facilitate comparability and even-handedness of treatment as between different groups. But uniformity exacts its price: regional and local interests may feel slighted, the centre’s purposes are likely to be served at the expense of the peripheries’, and opportunities to adjust the examination to recognize the different needs of regions or groups at different stages of school development are inevitably reduced. Alternatively, consider the extent to which options are permitted. A large measure of optionality brings the clear benefit of adapting the examination to the subject preferences and aptitudes of individual candidates. But optionality inevitably weakens the sense of a national curriculum and a national culture. A credential based on a familiar standard set of compulsory subjects is easy for employers and admissions officers to interpret; they can be puzzled indeed by the complex regulations and weighting schemes used to equate the essentially non-equatable assortments of examination subjects offered by candidates. Finally, consider the choices for the format of the examination. Oral examinations were once quite common at the end of secondary school, because they offered an opportunity for assessment based on interaction between the examiners and the candidate, and thus permitted examiners to shape standard questions to individual candidates. Nowadays, oral examinations are rare, mostly because the cost is considered too high, but also for fear of loss of objectivity and comparability across candidates. Precisely in order to gain such objectivity, a few nations have turned to multiple-choice, machine-scorable examinations, which also have the significant benefit of costing very little per additional candidate, once the substantial initial expenses of constructing 85

Changing Educational Assessment and pre-testing have been met. Yet many believe that in the end these benefits come at too high a price, encouraging styles of teaching and learning that they would prefer to avoid. These and other trade-offs can be illustrated from the experience of the eight countries in our study. The United States More than most countries, the United States has embraced the device of machine-scoreable examinations, usually in the form of collections of multiple-choice items. This has been done largely because the commitment to widening the clientele served by examinations has been so strong, and the resulting numbers of candidates so large. Large numbers of candidates in turn made it economical to invest considerable resources in formulating, pre-testing and revising a very large bank of items, from which actual question papers could be constructed. The option of retaining the traditional extended-answer type of examination was rejected, for reasons of cost and complexity of organizing a graded system that would be seen to be equitable. The choice has exacted some substantial educational costs: the development of written language skills among the student population is not a high priority; careful construction of an answer gives way to learning test-taking tricks and the tactics of guessing; in practice, short item questions tend to emphasize recall-type learning, rather than analysis and problem-solving. These drawbacks are widely conceded, but the price has been paid and the trade-off has been made relatively willingly and uncomplainingly in order to secure both the important political value of a more accessible and objective examination system, as well as the ability to deal reasonably inexpensively with the consequent flood of candidates. A second noteworthy feature of the United States' examination scene is the rejection of the slightest hint of a centralized system of examinations in the hands of the national government. Nor, indeed, do most of the fifty states offer a secondary school leaving examination or university selection/entrance examination. Instead, the job is left to what are essentially private organizations, such as the Educational Testing Service and the College Entrance Examination Board. Although these organizations do provide a certain coherence to an educational system that would otherwise be exceptionally fragmented, their non-public status has nevertheless helped maintain the states’ rights and even parochial bases of 86

Trade-offs in examination policies American education and it has done little to help raise general educational standards in less advantaged parts of the country. Japan At the other end of the spectrum of control and coherence of education in general and examinations in particular, Japan, until a few years ago, operated a very economical system of selection for higher education entrance, on the basis of a single, nation-wide, standardized examination. In view of the extreme importance of the decisions that were being made on the basis of this single examination, the quantity of resources spent on providing it was remarkably low. But in order to improve control over the make-up of their entering classes, the colleges and universities instituted a second level of examinations, set by each institution, thus effectively transforming the National Examination into a preliminary qualifying examination. This device has enabled the institutions for higher education (IHEs) in Japan to retain some measure of control over their student recruitment, but the trade-off has been the significantly higher resource costs that are now involved in selection for post-secondary education in Japan. A large share of these resource costs is borne by candidates and their families, who invest time and funds in one-on-one coaching, after-school schools (the famous juku), and the expenses of travel to distant cities to sit for the second-level examinations. Apart from these tangible resource costs, there are important intangible costs arising from intense competition for places in the best universities and the resulting academic and psychological pressure on candidates. Indeed, the competition is so intense and the pressures are so great during the secondary school period that the universities complain that students arrived burned-out, determined to make up for their lost youth, and unwilling to continue to study hard. The contrast with the United States is sharp. There, the complaint is about the lack of challenge that many high schools offer to their students, and the shock that college freshmen can receive when confronted for the first time with major demands upon their time and intellect. Of the eight countries in our study, Japan and the United States are the only two to have adopted a virtually exclusive multiple-choice machine-scorable format for the university entrance examinations. The Japanese appear to have been persuaded, along with the Americans, that such tests are more 87

objective, provide higher levels of comparability across candidates and are generally more efficient to administer by examiners who are facing hundreds of thousands, if not millions, of candidates. Perhaps more than in the United States, the Japanese have paid a heavy price for these benefits, producing tests that require candidates to memorize vast quantities of 'facts', and that downplay originality and flexibility of thought.

France

Over the past four decades, France has placed increasing reliance on the school system (as distinct from employers) to provide its generally buoyant economy with trained labour. As a consequence, more young people are carried further in school to a degree that would have been very difficult to predict from the France of the 1950s. Moreover, as youth unemployment became ever more worrisome, the schools have been pressed to tailor their curricula and organization to the desires of employers — a development seen in many other countries, too. But one consequence of the vastly increased numbers and new types of candidates finishing a full secondary education has been recognition of the limitations of the academically oriented Baccalaureat examination of, say, 1950, in the changed circumstances. The French solution to this problem has been to retain the Bac in form, but to furnish it with substantial new content. In a major effort of educational adaptation, a Bac that had been organized on a narrow, humanities and mathematics oriented basis and was easy to comprehend, has become an extraordinarily differentiated and complex examination system, with a host of series, lignes, and options (thirty-eight in 1988, compared to just four before 1950). Although in some respects the French system has now moved some distance towards the English specialized model, this should not be exaggerated, as the Bac retains a large core of general education subjects required of all candidates. Nevertheless, one can no longer speak of a single nationally comparable examination administered to all candidates. Instead, a strongly demarcated hierarchy of prestige has emerged, with the mathematical options at the head, and the vocational options forming the tail. We should also note that, in spite of the persistence of a common core of subjects, the highly differentiated Bac has provoked fear that France is dissipating her intellectual patrimoine. Whether this is a benefit or a loss is, we suppose,

a matter of taste, but taken simply as a matter of fact rather than of values, French culture générale has become a little less générale. The changes made in the Bac represent bold and on the whole successful moves, but they have been achieved at some price. The most obvious has been a loss of comparability across candidates, who take widely different assortments of subjects, different papers in nominally the same subject, with different weights given to the results, depending on the particular option. In addition, the limited devolution of administrative authority from Paris extends also to the administration of the Bac. Each of the twenty-three academies selects its own assortment of questions from the centrally approved list, and has some latitude to set its own standards of grading. Some academies have even acquired reputations for their relatively lenient standards. An examination system that began with a strong commitment to strict comparability across the entire country (including even overseas departments and dependencies) has had to yield up that important value. Last, as has occurred in so many other countries, the opening of broader access to the Bac has produced a flood of students for the French universities. Yet the university system has been held on an exceptionally tight budget rein by successive administrations. The result has been the development of scandalously poor conditions of study for many students, especially those in the universities situated in and around Paris. Competition for entrance into the better-provided areas of the universities, and particularly into the grandes écoles, has intensified to the point that the Bac has become a kind of first level qualifying examination, with the decisive examination being either the concours, or the examination for admission to a grande école. This represents a devaluation of the Bac, perhaps an inevitable cost of its democratization and, in particular, its extension to the vocational tracks.

Federal Republic of Germany

The expansion of secondary education in Germany took place a full fifteen years later than in France and, as with the Bac, the Abitur examination has been significantly altered to cope with the increased numbers and the changes in the kinds of candidates. However, the Abitur was less radically altered than the Bac, perhaps because the need in Germany to secure some measure of agreement among eleven quite differently oriented

Länder governments always tends to put a brake on change. In addition, while the Bac is marked by a generally high degree of central direction, a determining characteristic of the Abitur is its school-based control. In matters of format, both France and Germany (in distinct contrast to the United States and Japan) have retained a traditional approach, relying on extended answers to questions, even to the extent of maintaining a certain reliance on oral examinations. Especially in Germany, the combination of local control and written and oral examinations raises questions about the extent to which grading standards are kept consistent even within a given Land, let alone across the eleven Länder. Because Germany makes less effort compared even with France to ensure such standardization, an important element of chance and arbitrariness has developed. Since 1979, the demands made on candidates for the Abitur have been reduced. In particular, they have been permitted to offer selected subjects at lower levels of difficulty. The changes have encouraged a vast expansion of the number of young people completing their secondary education with the Abitur, but again, as in France, certain costs have been paid for this advance. First, and most dramatic, has been the need to introduce restrictions on the right of the Abitur holder to enroll in any university faculty of their choice, the so-called numerus clausus. The lines of study affected are those that carry high social prestige and/or offer the opportunity of high earnings in the future (and which are also extremely expensive for the state to provide), for example medicine and dentistry. Since the legal entitlement to a university place was probably the most cherished aspect of the Abitur credential, the price paid for expansion has been high. In addition, admission to the faculties and departments under numerus clausus is determined via a highly complex points system, that takes into account not only the marks received in the Abitur (with differential weightings for different subjects for candidates choosing different sets of major and minor subjects), but also school grades. Even so, because competition is so intense for admission to some faculties, tiny fractions of a percentage point in the final placement can become critical in the admissions process. Another cost of the changed system and the introduction of numerus clausus is that some candidates with the highest scores are entering the most favoured and prestigious faculties, even though their interests, aptitude or previous educational specialization may lie in other directions. They are simply


reluctant to 'waste' high standing in the admissions process on a place in a less prestigious line of study. This clearly is not a welcome development, especially in such fields as medicine. These problems have caused the Council of Ministers of Education to reconsider the changes made in the Abitur. Last autumn decisions were taken that amounted to restoring some of the older standards and regulations, especially limiting candidates' freedom to select subjects at lower levels of difficulty.

People's Republic of China

Examinations for entrance to higher education and higher level technical training in the secondary technical schools serve primarily to control access to severely limited resources, already strained by a shortage of well-prepared teachers, inadequate buildings and equipment and out-of-date libraries. In the current drive to modernize the Chinese economy and the armed forces, the pendulum of Chinese higher education admission policy has once again swung to an extreme position. Ideological faith and socialist 'good works' now count for little; nor is peasant origin any longer so helpful. Instead, admission depends on success in the examinations at the end of senior secondary school. 'Expertness' is valued more highly than 'redness', at least for the time being. This policy of placing student ability above political orthodoxy as the major criterion for advancement is a political risk the current administration has chosen to run. More even than in France, the basic stance of the examination system in China is one of rigid central control and uniformity of administration and content, although some devolution of authority has been made to Beijing, Shanghai and Tientsin. Partly this characteristic builds upon traditional state practices, but partly too it is based upon a desire to select on a strictly equitable basis the best and the brightest of Chinese youth for university education. The number of candidates is exceptionally large (in 1988, 2.7 million prepared for the national college admission tests), of whom only a quarter will be accepted for study. Overall, about 2 per cent of first-graders go on to higher education. The combination of intense competition and virtually nationwide uniformity of the examination leads to pressures on students that are every bit as severe as in Japan. The Chinese authorities have introduced substantial elements of multiple-choice and short-answer questions into

what had previously been a traditional extended-answer type of examination. They have not yet moved to machine-scoreable formats. Given the large numbers they are presently dealing with, the costs of grading and administration must be burdensome. As the number of candidates increases in future years, the temptation to move to machine-scoreable tests and the attraction of the low marginal costs associated with such formats must grow even greater. At least one authority on examination policy in the World Bank has predicted that the time is not far off when the pressure of numbers on the Chinese examiners will become irresistible, and a vast school population of multiple-choice, machine-scored examinees will be added to those of the United States and Japan. If so, there is distinct danger that the changed format will reinforce the already strong emphasis in Chinese schools on rote learning and the recall of 'facts'.

Soviet Union

As in so many other respects, the Soviet Union provides a sharp contrast to China. Though there is significant influence exerted from Moscow, each of the fifteen republics is responsible for setting the content and standards of the secondary school examinations for the leaving certificate, the attestat zrelosti. Schools work within the republic guidelines, but in turn enjoy a good deal of local discretion. The teachers who have prepared the students dominate the process of setting the questions and evaluating the responses. Paradoxically, in a society and a school system that are in most respects characterized by substantial central direction, the school completion examination is not. Thus, the Soviet Union has settled for a curious compromise between the rhetoric of centralized planning and the practice of local discretion. The trade-off for such local discretion is a substantial loss of comparability of marks, and this has led the VUZy (universities and technical institutes) to insist on applicants sitting for special, institutionally set and graded entrance examinations, very much along Japanese lines. As in Japan, the examinations are highly competitive and can impose substantial travel costs on students. There is virtually no coordination among the VUZy concerning the dates on which they will hold their examinations, examination syllabi are idiosyncratic, and grading formulas, cut-off points, and so forth, confidential. Glasnost has much work yet to do in this corner of Soviet life!

A consequence for many Soviet young people has been to turn higher education admission into a process incorporating large elements of game-theory, almost textbook examples of decision-making under conditions of imperfect knowledge and uncertainty. Apart from the persistent reports of discrimination against certain ethnic and religious groups, influence peddling and corruption, the system appears to lack important elements of overall fairness and objectivity.

Sweden

In 1971 Sweden introduced a new form of upper secondary school, designed to continue the education of a very large fraction of the post-compulsory age-group, and integrating three formerly separate types of schools (gymnasium, vocational school and continuation school). The various study courses of those schools were largely imported unchanged into the new integrated school, represented by twenty-seven so-called 'lines' of two or three years' duration. Some of these lines are highly academic (especially the natural sciences line); others are more vocationally oriented; yet others are 'general'. In the mid-1970s, Sweden discarded a limited but usable final secondary school examination system in order to reduce the strain on pupils, produce more valid and reliable predictors of university success and (it was hoped) correct socio-educational inequities in assessment. In place of the final examination, the Swedes installed a combination of marks gained during regular classroom and home work and in nationally set tests administered at intervals during the school career. Meanwhile, in 1977 a major reform of the higher education system revamped admissions criteria. Work experience and age (maturity) were given strong weight in the admission decision, and completing a less demanding upper secondary school line was not by itself sufficient to be considered for admission to higher education. Additional study and credentials were demanded. Abandonment of final examinations was also motivated by the desire to improve the diagnostic and predictive value of tests of individual student achievement and to give teachers national benchmarks against which to set their own pedagogical efforts. The Swedes have been willing (and able) to incur rather heavy costs to achieve these goals. The new system requires time-consuming collaboration among teachers in a given school, and across schools in a region. Exceptionally detailed

record-keeping is required, and the Swedish National Board of Education is charged with the responsibility of preparing and standardizing the tests given in the basic school subjects at various points along the school ladder. Although Sweden may have abandoned its final secondary school examinations, there has been no abandonment of tests and examinations, in general. Indeed, one might well argue that there is now more examining and evaluation based on tests and examinations than ever before. In 1987 it was announced that examinations and testing in the upper secondary school will be complemented by an assessment programme for the compulsory school years, to begin with the 1988-9 school year and to take in successive grades. All grades will be covered and reported on at three-year intervals. This so-called 'national evaluation to give a holistic picture of school activities' will not confine itself to the academic side of the school, but will include data on the social and home environments of the pupils, their health, their social and emotional development and their attitudes. What is being proposed is a massive national enterprise, carried out to an exceptionally thorough degree, and demanding the expenditure of very significant resources, both human and material. External examinations have thus been replaced by continuous assessment, nationally planned but locally executed. In all of this, the authorities are seeking to ensure that the decentralization of school administration in Sweden will not lead to unacceptable degrees of inequality of provision. Regular evaluation of the entire Swedish school system is supposed to provide the data on which any necessary remedial actions can be based. Such a thorough-going system of sustained scrutiny raises a question for Sweden which has current relevance in many of the states of the United States, and which may soon become important in England: how much attention by way of tests and examination can a school system stand before it becomes over-routinized and over-preoccupied with frequent, probing testing? Is there a danger of turning education into mere instruction? Our own view is that the Swedes are indeed risking the payment of a very high price for their commitment to constructing a comprehensively detailed data-base of the performance of their schools, their teachers and their school children.


England and Wales

There have been many recent developments both within secondary school examinations — the GCSE, A and AS levels — and in relation to the contentious matter of regular in-school national curriculum assessments. Parallels with Sweden are striking, though the two countries have arrived at their present policies starting from distinctly different traditions of educational administration. Without going into the details, the events of the last couple of years represent an abrupt acceleration of what has otherwise been a glacially slow process of transferring authority over the schools from local to central government. In the interest of establishing national standards, voluntarism and localism are being forced to give way. Ever since Robert Lowe, Kay-Shuttleworth and payment-by-results in the mid-nineteenth century, it has been clear that examinations and testing could be used effectively in England as a lever to change the way in which the schools operate. Since the demise of the General School Certificate and its associated London Matric. regulations, it has not been necessary for a given student to take any particular subject or to follow any particular syllabus within that subject, except insofar as they wanted to take an examination, or the school demanded it. The cost in terms of lack of coherence in school curricula and indefiniteness of expectations of what the schools should be doing has now been judged by the government to be too great to be supported any longer. The new structure of examinations is intended to help implement what amounts to a national curriculum, though that new structure is not likely to be accompanied by any change from the traditional extended-answer format. Perhaps it is worthwhile noting that a much greater effort has been made in England (at least within each examining board) than in either France or Germany to ensure standardization of grading criteria. For this reason, some of the more serious doubts about the fairness of marking that are voiced in the latter countries have been absent here. The changes already implemented or foreseen for the end-of-secondary school examinations in England should be seen as part of a proposed comprehensive assessment procedure throughout the nation and for the mass of the school population. This last will be a major innovation in the English context, and will be one that is likely to come, at least initially, with a high price tag attached in terms of professional morale. Morale among teachers and head teachers in

England has already fallen to levels not seen since the establishment of the state education system after 1870. Some will argue that comprehensive assessment procedures are necessary, because only if it can be publicly demonstrated that the schools are returning value for money will the teaching profession be accorded the respect, appreciation and material rewards that it deserves. And, it is argued, a recovery of such respect is the precondition for a major upturn of morale in the profession. This justification for what has become known as 'accountability' is heard on all sides of the United States, and is very strongly sounded in Sweden. However, it fails to explain how more intense scrutiny of the work of teachers is going to help them achieve that essential characteristic of respected professional status — a large degree of personal autonomy in deciding how professional practice shall be carried out. For this reason, it will be advisable in England to take careful note of Swedish experience. Because they are very far ahead along the road of examining, testing and evaluation that is currently being charted in England, any drawbacks (or remedies) the Swedes discover en route should be quite instructive here, too.

Conclusion

In this chapter we have sought to show how important characteristics of examination policies and practices in each of eight countries can be usefully viewed as compromises, or trade-offs, between desirable alternatives. Such trade-offs are inherent in trying to negotiate four contemporary and well-nigh universal dilemmas of examination policy. Each dilemma has been created in large part by the pressures of universal secondary education, and the consequent expansion of higher education.

Dilemma 1

A traditional examination that was used in many countries to select from a small secondary school elite an even smaller elite to enter higher education has been recognized to be inadequate to cope with vastly greater numbers and different types of candidates. Yet it has proved difficult to widen access to the examination without some devaluation of the credential gained.

Dilemma 2

Examination uniformity is desired in order to promote comparability of marks, as well as to help create or conserve a common national culture. Yet, reconciliation of these desiderata both with demands for diversity to meet special sub-national needs and with the increasing diversity of the secondary school curriculum has not been easy to accomplish.

Dilemma 3

Because of the powerful influence that examinations have on the way teachers teach and students learn, it has been difficult to make use of the new technology of testing without adversely affecting the purposes and styles of schooling.

Dilemma 4

As secondary school coverage has grown, education becomes an ever more critical national concern and its costs come to represent a significant charge on tax revenues. Governments see the schools as part of the struggle for international competitiveness, and they are sensitive to demands that taxpayers should receive 'value for money'. Hence the emphasis on 'accountability'. But governments' readiness to use examinations as a device to monitor and change education tends to undermine the professional autonomy and status of school personnel. Yet strengthening professional autonomy and status may well be a necessary condition for successful operation of an effective school system in the contemporary world. While none of the eight countries examined here represents these trade-offs with equal force, taken together they demonstrate their importance in any consideration of examination policies and practices.


7 Examination systems in Africa Thomas Kellaghan

The information on which this chapter is based was obtained in visits which I and a colleague (Vincent Greaney) made to the countries concerned in July of this year (1988). The visits can be broadly described as a reconnaissance exercise to identify problems in examination systems and in particular the options that may be available to use examinations to improve the general quality of education in the countries operating these systems. This is an area of interest for the World Bank which for some time has been looking at ways of improving the efficiency of schools and the quality of education in developing countries (see Fuller 1985; Heyneman 1985, 1987; Mingat and Psacharopoulos 1985; World Bank 1987). The next step in our exercise will involve a meeting with the education ministries of the countries which we visited. At the meeting, they will have the opportunity of reacting to a report based on our visits and of exploring any further steps they may wish to take, if indeed they decide to take any steps at all. If the interest of the countries survives to this point, arrangements can be made by them to negotiate possible support programmes from the World Bank. The economic, social and educational contexts Five countries in Anglophone Africa were involved in the study: Ethiopia, Lesotho, Malawi, Swaziland and Zambia. The countries range in population from less than a million (Swaziland) to over forty million (Ethiopia). All can be considered poor; annual per capita GNP ranges from $110 (in Ethiopia) to $790 (in Swaziland). Even these figures may underestimate the extent of poverty in the countries as the figures reflect only the monetary assessment of activities related to the modern sector of the economy (which in many cases relies heavily on foreign capital) and so throw little light on the 98

Examination systems in Africa standard of living of the majority of the population who in all the countries live in rural areas and are engaged in the traditional sector of the economy, which is mainly agriculture. (The percentage engaged in agriculture ranges from 73 per cent in Zambia to 86 per cent in Lesotho [World Bank 1987].) The educational systems are similar in a number of ways. First, all have expanded rapidly in recent years. This is partly because of population growth but is also due to a commitment in the countries following independence to provide education both for economic and social reasons. An indication of the extent of growth involved can be seen in the fact that over the last twenty years, taking Africa as a whole, over fifty million new pupils enrolled in school; however, it is expected that in the next twenty years, almost 110 million new students will become eligible for enrolment (Windham 1986). Second, today there are children in all five countries who do not go to school at all. It is difficult to get accurate figures relating to this situation. However, enrolment ratios which are available reveal that the number of primary school children as a percentage of the appropriate age group varies widely for the five countries — from 38 per cent (in Ethiopia) to over 100 per cent (in Lesotho, Swaziland) (World Bank 1987). The countries also vary, though less sharply, in the percentage of primary school children who are female — from 38 per cent (again in Ethiopia) to 58 per cent (in Lesotho) (World Bank 1987). Third, despite differences in overall enrolment figures, all countries possess educational systems which are markedly pyramidal — broad at the bottom and narrowing right from the second grade onwards. Thus, secondary enrolment ratios range between only 4 per cent (Malawi) and 19 per cent (Lesotho) with one exception which is considerably higher (43 per cent in Swaziland) (World Bank 1987). Further, tertiary enrolment ratios vary between 0.5 per cent and 3.0 per cent. In all countries except Lesotho female participation decreases as one goes through the educational system (World Bank 1987). There is almost a perfect relationship between per capita GNP and participation rate. Economic development is a more potent indicator than, for example, political commitment. Thus, countries with free education and the goal of universal education (at least at the primary level) have lower participation rates than a country (Swaziland) with high GNP, even though education in the country is not compulsory and students have to pay tuition fees at all levels. Fourth, the combination of expanding numbers in the educational system with slow economic growth, aggravated in 99

the first instance by the oil crisis in the 1970s and then more recently by increasing levels of debt repayment obligations, means that the choice for policy-makers for at least the remainder of this century would seem to be between increased efficiency in the use of existing resources and the acceptance of declining standards of access, equity and academic achievement (Windham 1986). It is in this context that the World Bank has seen a possible role for examination reform.

The system of examinations

The system of examinations in all five countries was heavily influenced by traditional British examination procedures, and there have also been varying degrees of American influence. The examinations were formal, terminal, subject-based and external to the school, being administered by either the Ministry of Education or an Examinations Council closely related to the Ministry. In the examinations, the emphasis was on written work rather than on the assessment of practical, oral or course work. In general terms this is still the situation, though some aspects of all systems have undergone change in the last twenty years or so. For example, essay-type written examinations have been replaced in part or in whole in some countries (Lesotho in part, Ethiopia in whole) by multiple-choice tests. However, in some countries (Lesotho, Swaziland), the most important examination — that taken at the end of secondary schooling — is still actually administered from Britain (Cambridge Overseas School Certificate). External examinations exist in all five countries at the end of primary schooling, at the end of the junior cycle of secondary schooling and at the end of the senior cycle of secondary schooling — a situation, of course, that at one time existed in several European countries. Furthermore, in some areas in one country (Zambia) there is an external examination at the end of grade 4, because of a shortage of places in the higher primary grades in some rural areas. Given what I have said about the structure of the educational system, it is hardly surprising that examinations play a crucial role in selecting the students who will obtain the decreasing number of places that are available as one proceeds through the system. Thus, admission to junior secondary school is based on performance in the examination at the end of primary schooling; admission to senior secondary school is based on performance in the examination at the end of the junior secondary cycle, and admission to third-level education

Examination systems in Africa on performance in the school-leaving examination. Examinations serve functions other than selection. They are also used for certification and employers may take account of students’ educational achievements, as indicated by the certificates they have obtained. However, while the lowerlevel certificates were of some value in the past, they have lost or are losing their currency value in today's limited job market to higher examination certificates. A further function which examinations have relates to accountability. Schools (and teachers) with good examination results are regarded as ‘good’ schools (and ‘good’ teachers). In some countries, the accountability function is magnified by the publication of the examination results of schools. Given the importance of examinations, it is not surprising to find strong state control of the examination system. However, control varies somewhat from country to country. In some cases, the Ministry of Education administers the system; in others, the Ministry has devolved authority to an Examinations Council with which, however, it maintains close contact in its governance and in sharing accommodation and facilities. Within the same country, the same body may not be responsible for all examinations. There is also variation in the amount of control exercised by countries in the decisions which are made on the basis of examination performance. In some countries decisions to admit pupils are made by individual secondary schools (in Lesotho, Swaziland) on the basis of examination results, while in others such decisions are made by a state agency (e.g. in Zambia). As well as representing state interests, the examination systems in all countries, to a greater or lesser extent, allow room for other interests, particularly those involved in the educational enterprise. Thus, teachers may be involved in syllabus committees, in the setting of papers, in supervision and in examining and other forms of assessment. In general, it would seem that the examinations are perceived as fair and impartial and serve to legitimate the allocation of scarce educational benefits. At the same time, it would also seem that they can have serious negative effects on the educational process, and these have been documented to some extent. The observations that are made are similar to those made elsewhere in the world for the last hundred years or so — that subjects, topics and skills, if not covered by the examination, even though specified in the curriculum, will be ignored in classroom teaching. In a government report in Lesotho (1982:94), it was claimed that many problems with curriculum and instruction seem to stem from: 101

the inordinate emphasis given to the preparation for terminal examinations which undermines the attainment of certain objectives that are critical to the country's economic development . . . The JC (Junior Certificate) examination heavily emphasises the accumulation of factual knowledge and neglects general reasoning skills and problem-solving activities.

The pressure of examinations and the importance of doing well in them may also contribute to the high rate of repetition of classes and of examinations which is a feature of some of the educational systems.

Changes in the examination system

All countries, to varying degrees, have in recent years actually implemented changes in their examination systems and/or are considering the implementation of changes. These changes relate to the techniques of assessment, a broadening of the scope of assessment and, in turn and related to this, a shift in responsibility for assessment. The main choice in techniques relates to the introduction of multiple-choice tests to replace free-response ones. Ethiopian examinations are now exclusively multiple-choice in format, raising questions about the possible effects of this on the teaching of writing skills in schools. Some systems (Ethiopia, Malawi) have considerably broadened the scope of the examination by including school-based assessments as part of a terminal examination. This has increased the involvement of teachers in the assessment process, though these kinds of assessment are accompanied by strict monitoring procedures. Lesotho and Swaziland are seriously considering the possibility of using teachers' continuous assessments of students as part of the certification process. It is not clear what precisely the impetus for these changes has been. Some of the impetus seems to have come from people who have studied abroad, either in the United States or in Britain. The continuing ties with British examining boards in some countries may also have contributed to the change, as no doubt has the influence of donor bodies such as USAid. At one level, the changes represent an attempt to improve efficiency by applying new technology; for example, machine-scored multiple-choice items rather than person-scored essays are used to reduce scoring costs. The changes may, however, represent more fundamental shifts in the control and role of

Examination systems in Africa examinations. The use of continuous assessment grades, for example, involves a shift in the control of examinations and of the selection process, which is based on it. Some of the other changes may herald a shift in emphasis in the role of examinations from one of selection to one of certification as well as an attempt to replace assessment-led curricula by curricula-led assessment. Attempts to provide a more adequate sampling of curricula in examinations (as is being done in the multiple-choice tests) suggest, in addition to a concern for curricular validity, a concern with the education of children rather than just the ones that are likely to be selected to continue in the educational system. This seems particularly so when detailed feedback on examination performance is provided to schools rather than just the grades of students. The procedure of pre-testing the items used in multiple-choice tests, as is being done in Lesotho, might also be seen as an attempt to provide curricular validity. However, this may not be the effect of the procedure. It would seem that the selection of items of intermediate levels of difficulty for the final test is more likely to testify to instructional validity than to curricular validity, since only items which a given proportion of pupils (about half) get right will be retained in the test. Thus, only items which pupils have had the opportunity of learning (presumably in school) are contained in the test. If schools consistently ignore aspects of the curriculum, items relating to those aspects will not find their way into the test, creating a situation in which certain curriculum areas are not tested. A further problem that can arise from pre-testing items is that the use of items with high indices of discrimination assumes a unidimensionality in achievement which may be more appropriate to a selection function than to a certification function. Conclusion From this brief overview, it is clear that Africa has not escaped the winds of change in assessment which formed the theme of this book. It is equally clear that a certain lack of coordination and uncertainty surrounds the changes. Probably no one would disagree with the view that a change in the examination system that would improve the quality, without increasing the cost, of education would be desirable. However, it is not clear that the changes that have been taking place have been directed towards this objective in any purposeful way. Neither is it clear, given the enormous 103

Changing Educational Assessment pressures arising out of the structure of the system to use examinations primarily for selection purposes, that attempts to achieve the objective of improving school quality will be successful. Insofar as any country can achieve consensus in its aims for education, it would seem important that developing countries, such as the ones which were the object of this study, should attempt to reach agreement and establish priorities in values pertaining to education. For example, is the function of the educational system primarily to select a small cohort of students for higher levels of education, or is it to provide basic skills for the total population, or is it to do both, and if it is to do both, what kinds of balance should be struck between different functions? The rhetoric in most countries would suggest that the provision of basic skills for the total population is most important; present examination practice suggests that selection is most important. And, related to this issue, what emphasis is to be given to elementary education as against second-level or even tertiary education? These questions are not easily answered as they involve a variety of issues related to the economic and social development of nations. But the answers one gives to them, explicitly or implicitly, will seriously affect how resources are distributed throughout the system as well as the kind of assessment system considered appropriate. I do not wish to imply that an ideal assessment system will neatly fall into place once the values and aims of an educational system have been agreed, insofar as these can be agreed. As Noah and Eckstein have argued in the previous chapter, no examination system is without its defects, and the design of any system will inevitably involve a series of trade-offs. Even if greater emphasis is to be placed in educational policy on the education of all children, it is unlikely that the selective function of examinations, even at a relatively early stage in the educational systems of developing countries, can be abandoned. If there has to be selection, it has been argued (for a long time in European countries and currently in developing countries [see Heyneman 1985]) that examinations are a more equitable way of doing the job than other procedures that have been tried. Whether or not examination results are the fairest way of distributing educational benefits is open to debate. What does not seem to be a matter of debate is that countries will continue to experiment with their examinations systems in the belief that examinations are effective devices on which to base administrative decisions —a readily recognized view in Europe 104

Examination systems in Africa and one that is becoming popular in the United States. As experiments proceed, it will be of interest to observe the values which underlie them and whether the examination system is used to resist change in the educational and social systems or to promote it. The fate of proposed examination reform is of particular interest since the changes in assessment parallel changes in the Western world but are being implemented in educational systems that differ considerably from Western systems in their stage of development. We shall have to wait to see how the educational systems respond to such changes in assessment. We shall also have to wait to see whether developing countries will be more successful than countries with longer traditions of formal education in mitigating the influences of examinations as instruments of selection. References Fuller, B. (1985) Raising School Quality in Developing Countries. Discussion Paper, Report No. EDT 7, Washington, DC: World Bank. Heyneman, S. P. (1985) Investing in Education: A Quarter Century of World Bank Experience, Seminar Paper No. 30, Washington, DC: World Bank. Heyneman, S. P. (1987) ‘Uses of examinations in developing countries: Selection, research, and education sector management’, International Journal of Educational Development, 7:251-63. Lesotho (1982) The Education Sector Survey. Report of the Task Force, Maseru: Government of Lesotho. Mingat, A. and Psacharopoulos, G. (1985) Education Costs and Financing in Africa: Some Facts and Possible Lines of Action. Discussion Paper, Report No. EDT 13, Washington, DC: World Bank. Windham, D. M. (1986) Internal Efficiency and the African School. Discussion Paper, Report No. EDT 47, Washington, DC: World Bank. World Bank (1987) Education in Sub-Saharan Africa. Policies for Adjustment, Revitalization, and Expansion, Washington, DC: World Bank.


8 The introduction of continuous assessment systems at secondary level in developing countries David Pennycuick Moves towards continuous assessment There is a significant international trend towards continuous assessment (CA). Many developing countries, with a variety of political ideologies, have introduced CA to operate in parallel with external examinations at secondary level. CA results may be reported separately, or CA may form a component of students’ final results; few countries have moved as far as the Australian State of Queensland in replacing external examinations by CA. But in some developing countries (e.g. Tanzania, Papua New Guinea) CA systems have been in operation for over a decade. Others (e.g. Sri Lanka, Swaziland) are in the process of introducing CA. What are their reasons for doing so? In the first section of the chapter I shall summarize the situation in a range of developing countries of differing size, wealth and geographical location. Subsequent sections are concerned with the aims of CA systems, with the assessment functions such systems may perform and with the problems that are likely to be experienced in their implementation. In Sri Lanka, the introduction of continuous assessment coincides with the revision of school curricula for years 1-11. In year 11, pupils study nine subjects leading to Sri Lankan GCE (O-level) certification. A continuous assessment system for years 9-11 is to be implemented from 1988 (it was introduced in year 10 in 1987), together with national examinations in six of the subjects. The examination and CA results are to be reported separately, using nine-point scales. The main reason for the introduction of a CA component is to enable a wider range of educational objectives to be assessed, including affective objectives which are to be assessed through assignments and group work. This continuous assessment scheme was developed keeping 106

Continuous assessment systems in view the need for using it as a means of achieving the objectives of teaching-learning. Therefore assessment was considered an integral part of teaching-learning and not as a separate activity. (Sri Lanka 1987) The intention is for CA to be utilized in improving teachinglearning in schools. According to the Sri Lankan Department of Examinations, CA will stress the following features (adapted from Sri Lanka 1987): (1) closeness to the pupil and to the learning event; (2) openness, with expected learning outcomes, the scheme of assessment, the marking scheme and the marks being made known to the pupils and parents; (3) wider participation by teachers, parents and pupils. The role of teachers is emphasized: This recognition of the competency of the principals and teachers will result in a further enhancement of their responsibility and trust-worthiness.’; (4) integration of assessment with teaching-learning, with feedback to improve the latter; (5) wider scope for the realization of educational objectives; (6) effective feedback and meaningful remedial action; (7) scope for mastery learning; (8) the continuous element, and (9) CA is directed towards reaching achievement standards. ‘Because the assessment is based on open criteria, pupils are able to plan their studies by setting achievement targets, for improving their standards.’ The Sri Lankan scheme is an ambitious one, with over 700,000 pupils involved, and about 30,000 teachers who need in-service training in CA.* The introduction of CA in Tanzania can be traced back to the Musoma resolution of 1974, which stated the necessity of getting rid of the ‘ambush’ type of examination, and reducing the excessive emphasis placed on written examinations (TANU 1974, quoted in Njabili 1987). CA was implemented in 1976, and the CA score contributes 50 per cent of the total weighting in students' final results. Implementation is the responsibility of teachers in the schools, and a statistical moderation system is in use. ‘The correlation coefficients between continuous assessment marks and the final examination marks have, in general, been positive and high in Tanzania’ (Njabili 1987). In her paper, Njabili gives details of the operation of CA in Tanzania, and argues that The main purpose of having a continuous assessment scheme as an integral component of assessment procedures in the Tanzanian education system is to eliminate/minimise the element of risk associated with a single examination, 107

Changing Educational Assessment and to give a valid indication of student achievement, because it is felt that no student who works conscientiously should fail. (op. cit.)
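The mechanics of the statistical moderation used in Tanzania are not spelled out in this chapter, but the underlying arithmetic of one common form of moderation can be illustrated. The short Python sketch below uses invented marks for a single school: the teacher-awarded CA marks are rescaled so that their mean and spread match the same candidates' external examination marks, the two components are then combined with the equal (50 per cent) weighting reported for Tanzania, and a correlation coefficient of the kind Njabili cites is computed. It is offered only as a minimal sketch of linear moderation under these assumptions, not as a description of the Tanzanian procedure itself.

# Illustrative sketch only: linear statistical moderation of school-based
# CA marks against external examination marks for one school's candidates.
# All marks are invented; the scaling rule is a common textbook form,
# not the procedure actually used in Tanzania.

from statistics import correlation, mean, pstdev

ca_marks = [55, 62, 70, 48, 81, 66, 59, 74]     # teacher-awarded CA marks
exam_marks = [49, 58, 72, 40, 85, 60, 55, 70]   # external examination marks

# Rescale the CA marks so their mean and spread match the exam marks.
ca_mean, ca_sd = mean(ca_marks), pstdev(ca_marks)
ex_mean, ex_sd = mean(exam_marks), pstdev(exam_marks)
moderated_ca = [ex_mean + (m - ca_mean) * ex_sd / ca_sd for m in ca_marks]

# Combine the moderated CA marks and the exam marks with equal weighting.
final_marks = [0.5 * ca + 0.5 * ex for ca, ex in zip(moderated_ca, exam_marks)]

# Correlation (Pearson's r) between the raw CA marks and the exam marks.
r = correlation(ca_marks, exam_marks)

for ca, ex, total in zip(moderated_ca, exam_marks, final_marks):
    print(f"moderated CA {ca:5.1f}   exam {ex:3d}   final {total:5.1f}")
print(f"correlation between CA and examination marks: r = {r:.2f}")

In practice the choice of reference statistic, and the treatment of very small or unrepresentative candidate groups, raise exactly the comparability problems discussed later in this chapter.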

She also refers to a range of other functions of CA including the monitoring of progress by both secondary school students and teachers, motivation and curriculum evaluation. Another country with a long-standing commitment to CA is Papua New Guinea, which has used two systems since Independence in 1975. Until 1981, grade 10 results in the four core subjects of English, mathematics, science and social science were based on continuous assessment over grades 9 and 10, moderated by a reference test (the ‘mid-year rating examination’ — MYRE). Each school’s allocation of distinctions, credits, upper passes and passes in each core subject was determined by the MYRE results. These were then awarded to individual students according to internal rank orders. In noncore subjects allocation of grades was according to fixed national percentages. MYRE was designed to be skills-based rather than content-based, and this system had some attractive features, but was abandoned for several reasons: (1) it became increasingly difficult to design new syllabus-independent test items; (2) there was a tendency for schools to concentrate on preparing students for MYRE in the first half of the grade 10 year, and for motivation to drop off significantly in the second half of the year after the external examinations had been held, and (3) it was increasingly felt that students' external examination results should count directly towards their final results. From 1982, syllabus-based examinations in core subjects have been held towards the end of the grade 10 year. CA and examination marks have each carried 50 per cent of the weighting for students' final results, with statistical moderation based on the examination marks. A similar system has been used for grade 12 results. The Papua New Guinea Department of Education issues detailed instructions to schools for the conduct of continuous assessment, and schools have also been supplied with resource materials from which they can generate their own classroom tests. Since assessment and examinations are the responsibility of the Department (there is no separate examination board), and since teachers are involved in the preparation of test items for examination banks, it is possible to maintain very close links between curriculum and assessment. The Seychelles is a small developing country which has 108

introduced CA in the National Youth Service (NYS) as part of a move away from the selective function of education. The NYS is a nation-wide centralized and uniform secondary education system, which offers years 10 and 11 of education to all those in the appropriate age groups. CA is objectives-based, is administered by subject teachers and may take the form of ongoing observation, folios of work, oral or written tests, or ‘homework’. ‘Continuous assessment should consider the student’s achievement in both cognitive and psychomotor domains. And in addition to that the teacher should make comments about the way the student is behaving during the learning time.’ (Seychelles 1987) There are also termly common tests in most subjects, whose results are reported separately. CA has both formative and summative aims:

• to know the performances achieved by the students in the various fields of learning in which they are involved.
• to appreciate particular knowledge and skills acquired by the students individually or in groups.
• to identify strengths and weaknesses of the teaching/learning process.
• to generate an information device for guidance and counselling.
• to give to the students a feedback about their attainments vis-a-vis different learning targets.
• to place first-year students in second-year channels.
• to provide information to the ‘Committee for Consideration of Students’ Post-NYS Choices’ and to other decision makers using students’ records of achievement. (Seychelles 1987)

Cumulative student profiles are used formatively throughout the NYS. In Nigeria (a very large country), the 1977 National Policy on Education laid strong emphasis on CA. This advocacy of continuous assessment arose from the belief that it would: (a) give the teacher greater involvement in the overall assessment of his or her pupils; (b) provide a more valid assessment of the child’s overall ability and performance; (c) enable teachers to be more flexible and innovative in their instruction; 109

Changing Educational Assessment (d) provide a basis for more effective guidance of the child; (e) provide a basis for the teacher to improve his or her instructional methods; and (f) reduce examination malpractices. (Nigeria 1985) The National Policy advocated a rigorous training programme (both pre-service and in-service) for teachers who have to implement CA, and a detailed and comprehensive handbook (Nigeria 1985) was produced by a National Steering Committee. Each school was required to set up its own CA committee. In the handbook, two major problems were identified: comparability of standards, and record-keeping and the continuity of records. The handbook is interesting for its stress first on record-keeping and reporting, and second on assessment of achievement in the affective domain. CA is seen not only as a process of grading students at various stages of their school programme, but also ‘as a possible device for the nation to have objective data about the level of achievement of students at different levels of the educational system and a systematic accumulation of evidence on the standard attained within the system’ (Nigeria 1985). An evaluation of the Nigerian CA system in operation is given by Nwakoby (1987). Finally, it always seems helpful to regard England as a developing country since it shares some common problems (e.g. educated unemployment, concern about educational standards) with the other countries considered in this chapter. The introduction of GCSE, with its major coursework element is well documented, and I have selected just two quotations for inclusion here. The main aim of coursework assessment is to make what is important measurable, rather than of making what is measurable important. (SEC 1985, quoted in Horton 1986) The provision for coursework assessment is one of the most welcome features of the GCSE criteria, since it allows teachers a considerable degree of freedom to develop their own strategies for meeting the objectives of syllabuses, and also overcomes many of the disadvantages of timed written papers as a way of allowing students to demonstrate their abilities in the subject. (MacLure 1986) 110

Continuous assessment systems The aims of CA In all the countries considered, the introduction of CA has had two major effects. First, assessment leading to secondary students’ final results has been spread over a period of between one and three years. Second, a substantial element of that assessment has become school-based. Although there are differences of emphasis, the reasons for introducing CA appear in most cases to fall within the following broad aims, which are interrelated. To enhance validity of assessment It is argued that one-off, formal examinations are not a good test of pupil achievement. For example: [Coursework] allows candidates who do not perform well under examination conditions to demonstrate their true ability in a more relaxed atmosphere. Coursework can also be used to assess those skills that cannot be measured or assessed in a written examination. (Mkandawire 1984) Although in some cases continuous assessment may consist merely of a series of written tests, it is a general aim of CA to assess and report a wider range of student achievement. Thus CA may include a wide variety of styles, e.g. projects, essays, oral tests, practical tests, portfolios, assignments, interviews, questionnaires and teacher observation. CA is also intended to cover a much wider range of skills than traditional written examinations. These skills may span cognitive, affective and psychomotor domains, and in the case of the cognitive domain emphasize higher-order skills, e.g. of application and analysis. It is felt that the validity of student results is increased by gathering assessment over a substantial period of time, and by maximizing the range of educational objectives which are assessed. The stress on the assessment of affective objectives in several countries is particularly interesting. To integrate curriculum, pedagogy and assessment Changes in what is assessed are likely to be associated with changes in what is valued, and the concept of assessmentlinked (if not assessment-led) curriculum development appears 111

to be gaining ground. It may be that the introduction of CA is related to the wish to place more emphasis on relevant education. Certainly CA can be argued to reduce undesirable backwash effects of external examinations. The introduction of CA may also be related to concern about the quality of educational provision. A key feature in all the countries considered is the responsibility of teachers for continuous assessment of their own pupils, and their involvement in both the planning and implementation of CA. The introduction of CA provides considerable opportunities for in-service education and training (INSET). Another key feature of a CA system is feedback of assessment data — about individual students and about curricular effectiveness. This is associated with increasing openness and clarity about the objectives to be assessed and the results obtained.

To serve a broader range of assessment functions, and in particular to emphasize formative functions

It is my contention that professional educators should always bear in mind the question ‘What is assessment for?’ and their answers to it. The current shift of emphasis away from summative functions appears to be of great importance, at any rate within the world of education itself. Nevertheless it would be a mistake to conclude that assessments are no longer designed to discriminate between candidates! The next section of the chapter is devoted to consideration of the full range of functions that may be performed by assessment systems in general (and by continuous assessment in particular).

Functions of assessment

Frith and Macintosh (1984) classify assessment functions under six headings: diagnosis, evaluation, guidance, prediction, selection and grading. A more extensive classification is as follows, although there is some overlap between categories. The list is not claimed to be exhaustive, and some headings could be subdivided.

1 Certification and qualification
2 Selection and social control (see Broadfoot 1984)
3 Clear recording and reporting of attainment
4 Prediction
5 Measurement of individual differences (psychometrics)
6 Student/pupil motivation (whether teaching-learning structures are competitive, co-operative or individualistic)
7 Monitoring student progress and feedback to students on that progress
8 Diagnosis and remediation of individual difficulties
9 Guidance
10 Curriculum evaluation
11 Feedback on teaching and organization effectiveness
12 Teacher motivation and teacher appraisal
13 Curriculum control
14 Evidence for accountability and/or distribution of resources
15 Maintaining or raising standards.

It would be interesting to compare exams and school-based assessment for success at these functions. Certainly many of them are cited in decision-makers’ rationales for the introduction of CA. In practice some functions are given greater emphasis than others. This is not just a matter of perceived importance, but is also partly because some functions (e.g. diagnosis and remediation of individual difficulties) are hard for the class teacher to apply, and partly because some (e.g. the selective and motivational functions) may be in conflict. The ‘map’ of assessment functions (Figure 1) is intended to clarify the classification and to stress that there are many assessment functions other than selection. Two dimensions are used in the model: (1) formative/summative, and (2) functions which can/cannot be applied to individual students. The model has been found to be a valuable teaching aid for post-graduate courses. It is not intended to be prescriptive, and readers might like to alter the contents of the boxes or even redefine the dimensions. The recording/reporting function is seen as central (since all assessment functions depend on it), and other functions fall in one of the four quadrants, with varying degrees of certainty. To clarify the formative/summative distinction, formative functions are largely internal to the school while summative functions have more external significance. The formative functions fit more closely with the concept of assessment as an integral part of the teaching and learning process rather than as something ‘bolted-on’; they may be seen as more ‘educational’ than the summative functions, many of which are more ‘political’ in nature.

Figure 1 Map of assessment functions

[Figure 1 is a two-by-two map. The horizontal dimension runs from formative (left) to summative (right); the vertical dimension runs from functions applicable to individual students (top) to functions applicable to groups (bottom). ‘Clear recording and reporting of student attainment’ occupies the centre of the map. The formative/individual quadrant contains student motivation; monitoring, feedback and guidance; and diagnosis and remediation. The summative/individual quadrant contains selection and social control; certification and qualification; and prediction. The formative/group quadrant contains feedback on teaching; teacher motivation; and curriculum evaluation. The summative/group quadrant contains curriculum control; accountability; and standards.]
Continuous assessment systems The second dimension of the map draws attention to the questions of ‘Whom is the assessment for?’, and ‘Who is interested in the assessment data reported?’. It might be argued that reliability of assessment is more important in the upper half of the diagram (which is concerned with assessment functions which can be applied to individual students), than in the lower half (where the assessment functions are more applicable to groups of students). The introduction of continuous assessment involves a shift of emphasis from right to left in the map. Problems of CA Continuous assessment is not without its problems, and countries considering the introduction of CA would be well advised to weigh up the pros and cons. The problems are both technical and practical, and some are more easily soluble than others. I shall restrict the discussion to a fairly brief summary. There is little published evaluation of CA schemes in developing countries. One most interesting example is given by Nwakoby (1987) who studied the operation of CA in a Nigerian state. She highlighted major problem areas as being: (1) inadequate conceptualization; (2) doubtful validity and (3) inadequate structural and administrative support. Clearly, problems of communication in the introduction of any educational innovation may be greater in a large country than in smaller developing countries, but Nwakoby’s research points to the need for clarity and for an adequate infrastructure. In all cases, public acceptability of CA is a significant consideration. Specific problems which may affect the implementation of CA schemes include: 1 Teachers may lack experience of, and expertise in, CA. In particular the quality of many classroom tests may be low, tending to negate gains in validity of assessment made possible by the introduction of CA. Possible solutions are the provision of adequate INSET support, and/or the construction of item banks. 2 Teacher workload may be substantially increased by CA. There is evidence from England (Pennycuick and Murphy 1988) that teachers are prepared to make the necessary effort if they perceive the benefits to themselves and to their pupils of an innovatory assessment system. Nevertheless, schemes should be designed to take account of pressure on teachers (for example in avoiding excessive demands of record-keeping and 115

Changing Educational Assessment reporting). Again, INSET can help. 3 If CA includes project work, there may be overload on pupils undertaking projects in several subjects simultaneously. Also, pupils from relatively wealthy backgrounds may be at an advantage in that they have greater access to resources needed for such work. 4 Administration of CA within the school may not be straightforward. For example, consideration needs to be made of what to do when pupils are absent for CA tests, or when a pupil transfers from one school to another, as well as how to deal with normal aggregation and weighting of marks. The Nigerian suggestion of establishing an assessment committee within each school appears to be a good one. Clear instructions and documentation from the national body responsible for CA is another aid to satisfactory administration. 5 There are several possible sources of unreliability in school-based assessment (see, for example, Mkandawire 1984). They include: administrative mistakes; teacher or assessor bias, conscious or unconscious (e.g. the ‘halo’ effect) and doubtful originality of the work (i.e. collusion or cheating). It requires constant vigilance to minimize these factors, but it should be remembered that there may also be unreliability in external examinations. It is usually argued that reliability is increased by assessing on multiple occasions, and it is not clear whether assessment by an external marker is necessarily more or less reliable than assessment by an internal marker. 7 Finally, there is the major issue of comparability, between classes within schools and between schools. Methods of ensuring comparability usually involve some form of accreditation and/or moderation. Three forms of moderation may be distinguished: statistical, visitation and consensus. In the context of most developing countries some type of statistical moderation is probably the cheapest and easiest to apply, but there is the danger if an external examination is used as the reference test that the examination (and its negative backwash effects) will continue to dominate the secondary school curriculum. Conclusions The two main issues are: (1) whether the introduction of CA in a developing country is desirable, and, if so, (2) whether it should (a) replace external examinations, (b) operate in parallel with, but separate from, external examinations or (c) form a component of students’ final results, together with external 116

Continuous assessment systems examination results. Both issues are dependent on local conditions and attitudes, in particular on what the priorities are, in terms of assessment functions, and on whether the problems can be overcome. However, it seems clear that the introduction of CA is likely to be most successful in countries with high levels of infrastructure, authority and consensus (see Havelock and Huberman 1977), and where there is adequate planning, adequate resources and adequate INSET to support the innovation. I recognize that this is easier said than achieved! Nevertheless, I hope that this chapter will be helpful to decision-makers by giving examples of CA in a range of countries, and by drawing attention to a set of salient factors to be taken into account. Acknowledgements I am grateful to Keith Lewin, Angela Little, Angus Ross and Graham Sims who commented on a first draft of this paper, and to my MA students at the University of Sussex with whom I have discussed the map of assessment functions presented here. References Broadfoot, P. (ed.) (1984) Selection, Certification and Control, Lewes: The Falmer Press. Frith, D. S. and Macintosh, H. G. (1984) A Teacher’s Guide to Assessment, Cheltenham: Stanley Thornes. Havelock, R. G. and Huberman, A. M. (1977) Solving Educational Problems, UNESCO. Horton, T. (ed.) (1986) GCSE: Examining the New System, London: Harper & Row. MacLure, M. (1986) ‘English’, in T. Horton (ed. 1986) GCSE: Examining the New System, London: Harper & Row. Mkandawire, D. S. J. (1984) ‘Assessment of subjects with a practical component in the Malawi School Certificate Examination’. Paper presented at the tenth international conference of the International Association for Educational Assessment, Perth, Australia. Nigeria (1985) A Handbook on Continuous Assessment, Lagos: Heinemann. Njabili, A. F. (1987) ‘Continuous assessment: the Tanzanian experience’. Paper presented at a seminar on Examination Reform for Human Resource Development, at the Institute of Development Studies, at the University of Sussex.

Nwakoby, F. U. (1987) ‘The educational change process in Nigeria: an evaluation of the junior secondary school innovation in Anambra State’. Unpublished D.Phil. thesis, University of Sussex.
Pennycuick, D. B. and Murphy, R. J. L. (1988) The Impact of Graded Tests, Lewes: The Falmer Press.
SEC (1985) Coursework Assessment in GCSE, Working Paper 2, Secondary Examinations Council, London.
Seychelles (1987) ‘System of assessment of students’, National Youth Service, Mahé (mimeo).
Sri Lanka (1987) ‘Continuous assessment for the Sri Lanka GCE (O/L) examination: a brief note’, Department of Examinations, Colombo.
TANU (1974) The Musoma Resolution, Dar es Salaam: Government Printers.

Note * The scheme was clearly too ambitious since it has now been abandoned by the incoming minister. It seems that there was considerable opposition to the scheme arising from several of the problems discussed towards the end of this chapter.


9 Exam questions: a consideration of consequences of reforms to examining and assessment in Great Britain and New Zealand Tony McNaughton In contrast to the imminent, system-wide changes to assessment and testing procedures in England and Wales, New Zealand's reforms in the assessment area until 1985 have been confined to relatively minor changes in national exams in the upper secondary school. Indeed as late as 1984 the Minister of Education was determinedly setting his face against changes to a system of national examinations that had had few structural alterations since it was established in the period from 1934 to 1945. Yet within eighteen months of a change of government we have seen the removal of the University Entrance exam from form 6. There remains at that level what was formerly a low-status, wholly school-assessed Sixth Form Certificate. A year later two widely discussed reports have been published, one a curriculum review that is proposing widespread structural changes to the curriculum (Reading 1) in order to address the key issues raised during the review. The other report was to give advice on the award of leaving certificates, including arrangements for the moderation of standards and policy advice for the further development of such awards. The recommendations on these matters included such details as the replacement of subject results reported as percentage marks with achievement-related grades and descriptions, and structural changes leading to criterionreferenced rather than norm-referenced assessments. Course and between-school moderation procedures that have to date been based solely on group reference test and national exam results are recommended for replacement by various forms of consultation that calibrate teacher, student and moderator judgements. Finally, it is suggested that a viciously discriminatory device we call ‘means analysis’ be replaced by criteriarelated assessments that are group moderated by discussion rather than by statistics. These curriculum and assessment reforms originated from a profound concern about the high proportion of New Zealand 119

Changing Educational Assessment students leaving school without any formal qualification. For some critics of the examination system this concern is summarized in terms of inefficiencies and the wastage of human resources in the present system of assessment and examining. For others in New Zealand there is much more concern with the chronic inequalities of the examination system in the upper secondary school and with distortions to any assessment process where there is a glib categorizing of students for selection and placement. In the meantime some aspects of the British government's Education Reform Bill have caught the eye of our Minister of Education who, on 12 July, expressed a strong belief ‘that our education system must devise strategies to assess student progress in an educative way’. He then commented that while the assessment as proposed by the Black report (National Curriculum: Task Group on Assessment and Testing: A Report 1988) is clearly an expensive business, we ‘will be watching developments in England with interest to see what lessons we can learn’ (NZ Herald 12 July 1988). The rest of this chapter is an analysis and critical comparison of assessment and examination reforms proposed for Britain and some other countries, and for New Zealand with a view to considering the lessons in our separate experiences. Criterion-referencing: advantages In most countries that I know of where there are reforms in train for the examination and assessment systems normreferencing is criticized because of the unwarranted restrictions it places on students who aspire to improve their grades. Black’s TGAT (1988) reports that normal distributions across levels at each age (i.e. at 7, 11, 14 and 16) will indicate that the attainment targets corresponding to those levels constitute the range of targets appropriate to that age, but that as curriculum and teaching are adapted to the new system (i.e. recommended by TGAT), improvement in these distributions may be expected. Such improvements can be recorded and investigated in the proposed scheme because it is to be criterion-referenced and not tied to any limited expectations at particular ages. The beliefs teachers and administrators have about the extent to which this opportunity can be taken up — ('It is a crime against mankind to deprive children of successful learning when it is possible for virtually all to learn to a high level.') and the resources a government is prepared to 120

Exam questions allocate to such an enterprise will determine the pattern and the amount of that improvement. In England (SEC 1983-4, Sir K. Joseph’s Sheffield speech) and Scotland (SED 1977) as well as in New Zealand (Renwick 1977) concern has been expressed at the unacceptably large numbers of students who have left school at 15 (in New Zealand) and 16 without any qualifications either because they were (and in New Zealand still are) one way or another dissuaded from taking certification exams or because having taken them they fell on the wrong side of a pass/fail line. In 1988 and in subsequent years in England all 16-year-olds will be eligible to (and actively encouraged to?) sit for the final exams of GCSE. By 1990, when a series of trials of criterionrelated assessments are complete (DES 1987), it should be possible to interpret a student's performance by referring not to the performance of other students but to student achievement levels in specified domains of knowledge and skills. In other words, criterion-referenced assessments aim to specify what individual students know and can do. Recommendations in the Black report are that individual student profiles, most items on which will be calibrated (in ways and to an extent yet to be finally determined) against results from standard assessment tasks and in relation to specified criteria, will be available on a confidential basis to show what each 7-, 11-, and 14-year-old knows and can do. Likewise for classes and for schools, but available on a less restricted basis, similar trials of criterion-related assessments are also being undertaken in New Zealand at the sixth form level in selected subject areas. On the way towards that goal of having students put themselves against defined levels of achievement rather than against one another, many other advantages are claimed. There are alleged (SED 1986a) advantages for curriculum and professional development generally as well as for ensuring valid and reliable assessments, in having individual teachers and subject working parties teasing out and refining criteria which apply to the key skill and knowledge elements in a subject. Once these criteria are spelled out and trialled, and provided they continue to accentuate achievement rather than lack of it, and are subject to careful and continued scrutiny and modification, they undoubtedly can have important educational advantages over grades which are not criterionrelated and over aggregated marks. Grades or marks may summarize but they do not explain the nature of specific competencies (SED 1986b). Pre-specified criteria ought also to make examining and 121

Changing Educational Assessment testing processes more valid and reliable insofar as they provide precise guidance for examiners and assessment task developers. Concerns that have been expressed that prespecified criteria lend themselves more to an instructional mode than to an educational/interactive mode of teaching will be considered in the next section. If each element of a subject (or across more than one subject) is assessed by grade-related criteria or by levels of achievement, a profile of achievements can be built up for the use of students, teachers and parents. Such achievement profiles will at best be matters of some celebration and motivation to all students and at least offer a reasonable alternative to the roughly 40 to 50 per cent of students sitting School Certificate subjects who have nothing to celebrate and nothing to strive for under a system tied to a norm-referenced system. Criterion-referencing: disadvantages Research in Scotland (Brown and Munn 1985), the experience of various working parties in England (SEC Annual Reports), in Scotland (SED Reports), in Australia (e.g. Queensland's Board of Secondary School Studies: Assessment Unit Discussion Papers), in New Zealand (DoE 1987) and in the USA (Chittenden 1984) all suggest that criterion-related assessment is not necessarily a panacea for all of the problems of normreferenced assessment and examining. Brown and Munn’s 1985 research reports on the changes to assessment procedures following Dunning (SED 1977) concluded that teachers’ enthusiasm for criterion-referenced curriculum development and assessment declined sharply when their professional judgements on criteria were superseded by SED standardized lists of pre-specified criteria. Trial teachers in three research reports preferred to be treated as professionals whose well-discussed judgements about assessment criteria and in-group moderation of assessments were exercised and respected. Brown and Munn 1985 concluded that criteria are best worked out by teachers and modified by practice and reflection on that practice if they are to be used in thoughtful, reflective, professionally appropriate and creative ways. Colin Power (1985) (Australia) argues in a similar vein by favouring the development of teacher connoisseurs over that of teacher technicians. (The British Southern Examining Group (SEG) is also exploring teacher accreditation for 122

Exam questions school-based curriculum and assessment developments.) Lawrence Stenhouse (1981 and 1983) argues that where the focus of teaching is upon knowledge rather than on skills of ‘mere information’ it may well be appropriate for teachers at the outset to consider and discuss criteria that can apply to an area of knowledge, but that teachers’judgements and assessments ought not to be confined to those that have been prespecified. Stenhouse argues that it is more appropriate for teachers to use their considered judgements and to seek to improve the quality of those judgements than it is to have minds set by a pre-specified list of criteria and especially one that has been prepared by someone else. Then there is the question of the efficacy of individual testing of any kind if its use is solely for selection or certification or even for estimating overall achievement (or meeting a pre-specified criterion) in a subject. Both Chittenden (in discussing science education evaluation, 1984) and Duckworth (in seeking to broaden a student’s use of his or her ideas on a topic, 1979) question the practicality of a teacher accurately measuring current levels of ability in school subjects. For Duckworth the only diagnosis that is really necessary is to observe what in fact the children do during their learning. This is not a diagnosis of notions; it is an appreciation of the variety of ideas children have about the situation, and the depth to which they pursue their ideas. (Duckworth 1979) If, as Duckworth avers, diagnosis too often focuses on problems or what the student seems not to know rather than on achievements, it (diagnosis) could by-pass instead of capitalize on achievements, and seek then to remediate where it would be better to seek to utilize and develop that which they already have. Educative assessment The principal purpose of assessment on the Chittenden (1984) model is to promote the intellectual development and the understanding of those who participate in the assessment — mainly students and teachers. It is not an interruption to teaching (Black makes the same point) or even a preparatory diagnostic exercise. Assessment is instead seen (e.g. Duckworth 1979 and 1986; Stenhouse 1981) as an integral part of teaching. 123

Changing Educational Assessment In Piaget's clinical method (Piaget 1976; Duckworth 1979 and 1986) the very act of assessment together with the special kind of relationship that can develop between a genuinely enquiring teacher and student (see also Freire’s teacher-as-student and student-as-teacher exchanges) has been shown to have educative effects in regard to the development of understanding and of cognitive abilities. Black’s TGAT report (para. 23) gives precedence to formative over summative assessment at each age level before 16 in order that teachers can recognize positive achievements and discuss them, and then plan the appropriate next steps. The importance of the educative components in formative assessment will depend on the extent to which teachers are relative free agents and able to explore with students the nature of their understanding of important ideas instead of having to plot a course to a particular attainment target that has been laid on the system by an outside authority. In similar vein TGAT report preference for the term ‘assessment tasks’ over ‘tests’, and the number of examples of assessment tasks (mostly from the Assessment and Performance Unit) underscores their seriousness about the importance of formative assessment. Yet, in the end the ‘guidance’ teachers are given through externally developed standard assessment tasks with standard marking schedules will tend to dominate assessment procedures if the external test results take precedence over teacher assessments. As with criterion-referenced assessments there are in educative assessment no prior assumptions of fixed abilities or fixed distributions of results, but unlike in criterion-referencing there are no pre-specified behaviours to be achieved. There is instead and in the first instance a careful identification of worthwhile tasks (Stenhouse 1981; Wood 1986; Power 1985; Duckworth 1986) that are both suitable and important for both students and teachers to address. The ways in which they are addressed (Humanities Project 1970; Duckworth 1986) will vary but, as Stenhouse has argued and Piaget has demonstrated, there is a strong case for neutrally chaired discussion and dialogue in the style of the Humanities Project, in Piaget's clinical method, in Freiran teacher-as-student and studentas-teacher interchanges, and the style of marae-based discussions being at the heart of truly educative assessments. As in diagnostic assessment (Black and Dockrell 1984) educative assessment provides information to improve teaching and learning, but unlike in some diagnostic testing, the results of educative assessment are intended to be fed back to the teachers and the students for individual and group 124

Exam questions contemplation and comparison rather than just for remediation. Educative assessment that has the development of understanding as an important goal is not a first step of a process that aims at attaining some pre-specified objective. It is instead an invitation to dialogue, to group discussion, negotiation and inquiry, the outcomes of which are unpredictable (Stenhouse 1981). Any judgements about those outcomes may be better done by teachers using implicit criteria that arise from extensive study of a discipline or curriculum area (like Colin Power's connoisseurs) than by reference to hand-medown criteria. In principle, both diagnostic assessment and educative assessment have little if any use for grades and predictions in relation to student performance. A test or some other assessment exercise that was intended to be educative would aim to change student performance (Drever and Simpson in Brown and Munn 1985) and hence to subvert predictions. A truly educative test or assessment task cannot end with a measure of the attainment of a particular criterion or a level of mastery since its function is to change behaviour. Any measure which is a step towards an as yet unknown possibility can therefore only have passing interest for teachers. Standards If assessment of individual performance is integrated with teaching (a firm recommendation from the TGAT) then there is the question of how the teachers and the public will know about the overall standards that are being attained. This problem is the reason for the elaborate procedures for the moderation and the standardization of assessment task results that are outlined in the TGAT report. It is also a matter that has led to New Zealand’s Minister of Education being concerned about the lack of evidence available to parents about levels of ‘efficiency and effectiveness’ in classroom programmes. In the Southern Examining Group in England, studies of teacher accreditation (Southern Examining Group 1986) aim to increase the autonomy of teachers within the framework of group moderation processes at school and groups of schools’ levels. The professional development of teachers is the key to this proposal (see also TGAT para. 17) and associated with it is the general purpose of GCSE exam reform — namely the raising of the level of achievement in students of all abilities. 125
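A simple sketch may help to show how overall standards can be monitored at group level without attaching high-stakes results to every pupil. The Python fragment below, using wholly invented scores, estimates a cohort's mean performance on a common assessment task from a modest random sample and attaches a 95 per cent confidence interval; this is the basic arithmetic behind the light-sampling survey strategies discussed in the next paragraph. The sample size, the scores and the normal-approximation interval are assumptions made for illustration only, not a description of any agency's actual design.

# Illustrative sketch only: estimating cohort standards from a light sample
# rather than from blanket testing. Scores and sample size are invented.

import random
from statistics import mean, stdev

random.seed(1)

# Suppose each pupil in an age cohort would score out of 40 on a common task.
cohort = [min(40, max(0, random.gauss(24, 6))) for _ in range(20000)]

sample = random.sample(cohort, 300)        # only the sampled pupils are assessed
m, s = mean(sample), stdev(sample)
standard_error = s / len(sample) ** 0.5
ci_low, ci_high = m - 1.96 * standard_error, m + 1.96 * standard_error

print(f"estimated cohort mean: {m:.1f} (95% interval {ci_low:.1f} to {ci_high:.1f})")
print(f"full-cohort mean, for comparison only: {mean(cohort):.1f}")

Repeated at intervals of a few years, estimates of this kind provide the bench-mark information for communities, teachers and school systems described below, while avoiding the inefficiencies and inequities of testing every individual.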

Changing Educational Assessment Chittenden’s (1984) proposal for checking standards in science education in the USA cites the work of the Assessment and Performance Unit in the UK and the Ontario Assessment Instrument Pool in Canada as examples of group measurement strategies and sampling techniques to enhance the validity of assessments and course evaluations based on them. At the same time importance is given in each of these projects to teacher contribution to major parts of the testing process, i.e. to professional development in concert with testing and evaluation. The results of such projects at three- to five-year intervals provide bench-mark information for the community, teachers and school systems that is related to performance in their own communities. Standards are then available for group comparisons while avoiding the inefficiencies and inequities of individual testing. The most recent proposals for the reform of national testing in Scotland, England and Wales, and New Zealand may go some way to meet critics of the inefficiency and wastage in present systems, but if this then leads to an emphasis on prespecified objectives, to the ultimate dominance of external exams and tests in the moderation of standards, and to centrally controlled procedures and criteria, there is a risk that equity considerations across schools, regions, social and ethnic groups will come off second best. Both the crucially important formative and educative (after Duckworth et al.) aspects of assessment could then too easily be set aside while a prior emphasis is given to notions of reliability and to the questionable authority of national test scores. In such a case teachers might put their own professional skills to better use by judging for themselves and by consultation with colleagues any differences between students’ standards of performance and their own concepts of worthwhile performance while at the same time calibrating judgements with data from APUstyle in-depth surveys. If resources from local authorities and central government were deployed to compensate as needed for socio-economic and other causes of discrepancies then we might see the slogan of equality of opportunity replaced by equality of high-level performance (see Renwick 1977; Bloom 1982; SEC 1983-4). The following material is excerpted from DES (NZ) 1987. The national common curriculum: principles The Committee identified fifteen principles as basic to the curriculum of every school in New Zealand: 1 The curriculum shall be common to all schools. 126

It will maintain consistency between schools but permit flexibility and diversity. It will protect students from the disadvantage of cutting themselves off too early from vital areas of learning, or failing to discover talents and abilities through lack of exposure to learning that can reveal them.
2 The curriculum shall be designed so that it is accessible to every student. The curriculum will be equally accessible to all students regardless of race and colour, cultural background, social background, gender, religious beliefs, age, physical and intellectual characteristics, or geographic location. Some students will require special provisions to enable them to take up learning opportunities.
3 The curriculum shall be non-racist. The curriculum will honour the promises of the Treaty of Waitangi to the Maori people on Maori language and culture. It will recognise and respond to the aspirations of all people belonging to the different cultures which make up New Zealand society.
4 The curriculum shall be non-sexist. Learning shall not be limited by gender. It should encompass and take realistic account of women's experience, culture, and attitudes as well as those of men.
5 The curriculum shall be designed so that all students enjoy significant success. Students will be extended, and challenged to strive for their personal best performance; however, no students will be set learning tasks they cannot be expected to accomplish.
6 The curriculum shall reflect the fact that education is a continuous and lifelong process. Learning must build on what has already been learned, and prepare for the learning that is to come. Learning how to learn is an essential outcome of school programmes.
7 The curriculum shall be whole. Connections and relationships between the aspects of learning must be clear to students. Teaching and learning should not be fragmented by artificial divisions of school organisation, time-tabling, or subject boundaries.
8 The curriculum shall be balanced. Learning must be broad and general rather than narrowly vocational. There must be balance in the value given to knowledge, skills, attitudes and values; and in the status given to particular areas of knowledge.
9 The curriculum for every student shall be of the highest quality. So that every student can develop fully as an individual and as a member of the community, all schools must strive constantly to provide teaching, programmes, and materials of the highest quality.
10 The curriculum shall be planned. The planning must ensure that all aspects of the school's curriculum, including organisation and everyday practices, are consistent with the aims each school will develop. The evaluation of learning must be an integral part of curriculum planning.
11 The curriculum shall be co-operatively designed. Decisions about the curriculum will be shared by people representative of the many groups who make up each school and its community, including students, parents, whanau, and teachers. Provision shall be made for people affected by decisions to participate in making these decisions.
12 The curriculum shall be responsive. Each school must continually review its curriculum to make sure it is responding to the needs of communities and cultures, to the needs of New Zealand society, to new understandings of how people learn, and to the changing needs of individual learners.
13 The curriculum shall be inclusive. All students should feel part of an education system which has been designed with their active involvement — it should be learner-friendly. The curriculum will take account of the needs and experiences of all students, including their background knowledge and existing ideas, and the diverse character of the community.
14 The curriculum shall be enabling. Students will be empowered to take increasing responsibility for their own learning; and be involved with the teacher in setting their own goals, organising their own studies and activities, and evaluating their own learning and achievements.
15 The curriculum shall provide learning that is enjoyable for all students. Effective learning is satisfying. It can be challenging

Exam questions and disturbing. It can also excite and stimulate. It can be fun. The national common curriculum: aspects of learning The curriculum must provide for learning in three equally important aspects: knowledge, skills, and attitudes and values. Knowledge The knowledge basic to the New Zealand curriculum is knowledge which: helps students to understand and be confident in their own culture and in the culture of Aotearoa/New Zealand, and to be sensitive to that of others. This knowledge will build students’ self-confidence and self-esteem through understanding of their own culture and heritage. It will enable them to participate effectively in New Zealand society and understand New Zealand’s heritage and past, its place in the Pacific, and its relationship with the other societies of the world. It will recognise Maori people as tangata whenua of Aotearoa. It will acknowledge the diverse cultural make-up of New Zealand, and provide for the sharing of the heritage of its people. It will provide opportunities for appreciating and responding to the ideas and perceptions of people from other times and places. develops students’ confidence and ability in mathematics. This will help students solve measuring and arithmetical problems arising out of everyday living, and recognise patterns and relationships whether in number or shape. develops students’ creative and expressive confidence and ability. This will enable students to express themselves creatively through artistic, practical, and physical activities, and to appreciate the creativity of others. develops students’ confidence in handling the day-to-day practicalities of their own lives. This will contribute to practical and technical learning which students can apply to their own well-being and to their role as consumers and providers. develops students’ understanding of how individuals and groups relate to each other and work together in social, political, and economic ways. This will contribute to understanding the structures which make up societies, the processes which bring about 129

Changing Educational Assessment change in society, and the means of participation in those changes. It will provide insights into conflict and its resolutions. develops students’ understanding of the physical, biological, and technological world, and how people interact with and influence their environment. This will help students understand scientific concepts, the use of technology, and the implications for society and the individual. develops students’ understanding of their own and others’ growth. This will help develop concepts and practices relating to physical activities, health, safety, and survival, and the responsibilities of individuals for the health, safety, and survival of themselves and others. develops students’ confidence and ability in language. Language is fundamental to learning, communication, personal and cultural identity, and relationships. This also recognises the importance of learning being enabled to develop and maintain their first language. These areas of knowledge are inter-related, and contribute equally to the wholeness of knowledge. Skills The skills which make it possible for students to learn and to apply that knowledge to their lives, and which are basic to the New Zealand curriculum, are those which enable: thinking Skills of locating and acquiring knowledge; constructing new ideas, clarifying existing ideas, linking new and old ideas, changing existing ideas; reflection and evaluation; analysis, logical organisation, and presentation of thought; listening; associating, reasoning; creative use of intuition, problem solving. expressing Skills of creating and expressing ideas and responding to the ideas of others in a variety of styles and forms. relating to others Skills of relating to other people in ways which recognise and appreciate the diverse personal qualities and ways of life, and the contribution made by different cultures and individuals; carrying out practical activities and studies Skills for learning through carrying out practical tasks, and applying creative ideas to solve practical problems. These groups of skills are inter-related. They are intended as a checklist for planning, rather than a method 130

Exam questions of organising teaching. Any student's activities are likely to involve elements from several of these skill groups. Attitudes and values Cultural background and cultural values strongly influence how individuals treat people, situations, experiences, and information. Children learn attitudes and values from their family, from the community, from society and, increasingly, through the media. No learning is values free. Some values are clearly understood and supported, and can provide a base for curriculum planning. Others are not as widely supported and are likely to be controversial. This will need to be carefully considered. In a diverse society, different groups hold different beliefs and values. Such differences must be respected, and common ground established at the local level. For example, views on religious attitudes and values, and on religious instruction in schools, vary. The Committee supports the present legal position set out in section 77 of the Education Act 1964. This states that for primary schools ‘ . . the teaching shall be entirely of a secular character’. The Committee also supports section 78 of the Education Act 1964, which allows for the closure of primary schools for religious instruction. This section sets out the authority of the school committee in making decisions about religious instruction, and no change is proposed. Teaching about the religious attitudes, values, and beliefs that people hold is already part of the curriculum of some schools. Some of the desirable attitudes and values the community mentioned most frequently in their responses were a sense of fairness; concern for truth; honesty; selfrespect and self-esteem; self-discipline; respect for other people and for their cultures, beliefs, opinions, and property; responsibility for one's own actions, trust in other people, aroha (love); mansakitanga (hospitality); wairua (spirituality); tolerance; and adaptability. Students receive powerful messages about attitudes and values from the school. They perceive the kinds of knowledge and skills which are seen to be important. From the way the school is organised they see who has the power, who participates in decision-making, and how students are grouped. They learn attitudes and values from things like school rules (what they are, and who decides what is important) and from the quality of relationships between people in the school (student-teacher/supp131

Even everyday practices such as offering hospitality, catering, or cleaning reinforce sexist attitudes when they are seen to be carried out only by females. Each school must regularly review those values it wishes to develop, and the ways it will foster them.

Inter-relationship of the aspects of learning

The school curriculum has not always recognised that knowledge, skills, attitudes, and values are interdependent and inter-related. Traditionally, schools have emphasised knowledge along with some planning for skills, leaving attitudes and values largely to chance. Academic subjects have been seen to have higher status than more practical ones. The Committee believes that practical studies and activities are important and warrant an equal status; and that skills and attitudes are as important in learning as knowledge, and need to be planned for. The Committee is firmly of the view that children and young people must acquire the basic skills, knowledge, and attitudes and values that enable them to go on learning, to function effectively in everyday life, both now and in the future, and to respect themselves and live and work with others. The language skills of listening, speaking, reading, and writing are important basics, as are the skills for solving arithmetical and measuring problems. Also basic are self-esteem, the ability to meet learning challenges, the skills and knowledge needed to look after themselves and others, and the social skills of getting on with and working with others.

The Committee wishes to encourage schools to organise their programmes in ways that best suit their own students. Some schools may choose to continue, for a time, an organisation based on the present subjects; others will choose to organise their programmes around themes or experience-based activities; yet others will integrate some or all of their programmes into broad areas such as humanities, technology, or social and cultural activities. The new curriculum design enables schools to provide for constructive use of time through artistic, physical, technical, and practical activities. All of these contribute to working life, encourage initiative, and can lead to entrepreneurial skills. These activities must be provided throughout the curriculum and developed gradually throughout schooling.

The curriculum design will enable schools to accommodate, should they wish to do so, new developments that arise from time to time. As they arise, schools will need to examine each development and to determine how it can be fitted into the curriculum design and how it embodies the curriculum principles and fulfils the aspects of learning. Peace studies is one such current new development: it contributes to a number of the aspects of learning and can meet the requirements of the principles; therefore it has strong claims for being incorporated into the school curriculum. Other new developments should be looked at in a similar way before schools decide whether or not to include them and, if so, how. New developments being discussed at present include trade union education, consumer education, and human rights education. Others will undoubtedly arise in the future.

Helping students prepare for working life (known as transition education) warrants special mention. Responses to this review said that the present practice of add-on courses for those students most at risk is inappropriate. Skills and knowledge to equip all young people to find and keep jobs, live independently, cope with change, and make the best use of their leisure time must be built into the curriculum. Once included, their presence should be visible; students, parents, and employers should be told what students are being taught, and why.

The curriculum design does not require any single form of school organisation, so the Committee has not referred to traditional subject titles or suggested time allocations. Whichever approach they follow, schools must provide a coherent programme which emphasises connections between the aspects of learning. These connections must be clearly seen by students; therefore programmes must be devised so that the relationships are stressed from day to day and topic to topic. It is these connections and inter-relationships which are sometimes missing in the current school curriculum, where subjects have often been taught in isolation with consequent repetition or omission of content. Schools will need to reorganise their curriculum to ensure that learners are aware of the connections between knowledge and skills, and the attitudes and values they carry. Learners must not be left to work out for themselves the interdependence of the aspects of learning: they must be helped to see and experience the relationships.


10 Bring your grandmother to the examination: Te Reo O Te Tai Tokerau Project, New Zealand

Paul Rosanowski

Tihi mauri ora. Tena kotou, tena kotou, tena tatou katoa. The owners of the Maori language and the workers on the project would have me greet you in an appropriate way in Maori. The Maori culture is rich with nga whakatauaki (proverbs). I selected this one from the literature of the project, because it puts an action focus on the contents of this chapter. It is:

Ko to timatanga o te kauri rangatira
Ko te takano nohinohi.

When translated, the whakatauaki tells us that from a small acorn a great oak can grow.

This chapter is about an innovative assessment project. It is also about harmony. By this I mean the harmony achieved when assessment practice is appropriately set in a culturally-alive, social context. The project was developed in the context of a national School Certificate examination which is nationally moderated but locally controlled, and my account is presented as an observer for the School Certificate Examination Board. Appropriateness is another theme of this account, and my conception of this theme bore heavily upon me when I began to write this chapter. In order to make a point, I share the burden with you by recalling the weight of the responsibility, and the challenge of the task, that any non-Maori should experience when writing or speaking about an intensely Maori project like this one. Let me also recall that I have received nga tautoko (permission) for this exercise from those equipped to give it. This is important not only because it is culturally appropriate. There is another dimension.

Without this dimension you might see this project only as an academically interesting variation on the well-known, and well-developed, Oral Proficiency Interview (OPI) technique for oral assessment which has been in use for approximately fifty years in different forms. It is, however, much more in the New Zealand context of School Certificate oral Maori. The project is loaded with sensitivities, cultural connotations and even emotions, so complex and ancient, tapu, and strong that we can touch on only a few here. It is worth remembering, also, that even with the best will in the world, we can understand only a smattering of the affective dimensions of the project, principally because we do not live (or live in) the cultural context ourselves. None of these considerations should, of course, prevent anyone from appreciating that which is good.

What makes this project unique, and good, is the successful efforts made to produce a culturally-sensitive style of assessment. This style respects Maori traditions and conceptions of knowledge, learning and assessment in a manner which is politically appropriate. The manner is appropriate for me, anyway, because it empowers the owners of the language — the people of the Tai Tokerau — to have a part in the assessment events themselves as well as a very significant say in the lexical and grammatical content of the Maori language used. The content is worth mentioning here because the tribal, dialectal forms which characterize the geographical area of Northland are the ones used by the assessors and the candidates. It was the people of the Tai Tokerau (Northland) who helped decide what dialectal forms of the Maori language should be used. The preservation of the Tai Tokerau dialect, partly through the instrument of the School Certificate examination, is one of the extra-educational dimensions to this exercise. In passing, also note that one of the reasons why other tribal groups have expressed interest in the project is the strong desire to preserve and expand the Maori language in general and their tribal forms in particular.

The title of this chapter, 'Bring your grandmother to the examination', was chosen to leave you with an image that might be more difficult to forget. It was also chosen to catch your eye and interest. Nevertheless, it is an image carefully chosen to illustrate two culturally different ways of viewing the world. In our case the part of the world under examination is the assessment of the Maori language world. Europeans, whether they live in Europe or New Zealand, do not usually associate grandmothers, or grandfathers for that matter, with a formal assessment event. The usual pakeha (non-Maori) reaction to a title like this one is to see it as being catchy, puzzling and maybe amusing.

But there is another view. To our Maori colleagues, grandparents — indeed all old people — are taonga to be treasured as the repositories of tikanga Maori (Maori traditions and more . . .). Besides, the old people are the essence of whanaungatanga (familyness) — another cultural theme. Other foundations of Maoritanga also inherent in the grandparent image are manaakitanga and whanaungatanga (helping, hospitality, family). These themes are all present within the project along with arohatanga (a loving concern for others). All of these cultural mores make it appropriate to take your grandparent to your examination, especially when it concerns one of the central strands of all — te reo (the language). Part of the significance of the project is the fact that the Tai Tokerau people are supportive of the Department of Education's initiative to aid the Maori people in their considerable efforts to increase the mana (prestige) of their language. This is a part of the project's political dimension mentioned earlier.

A description of the project should begin by noting the special importance of the community consultation process which preceded the establishment of the project. Kaumatua and kuia (male and female elders, often grandparents) were consulted. This consultation was essential for two reasons. First, for the project to be harmoniously received by the Maori community, and consequently successful, the support of the old people was an imperative. Second, the preservation of the Tai Tokerau dialect and lexicon is important to the people of the far north. The elders are the arbiters and repositories of their dialectal forms of the Maori language. Local community involvement was, then, a notable feature of the project. This involvement has two dimensions to it: social and educational.

The Tai Tokerau Project is part of a comparatively rapid development by the New Zealand Department of Education in the area of the assessment of oral Maori for the New Zealand School Certificate. Generally speaking, School Certificate is taken by 15- to 16-year-olds in their third year of secondary schooling. The project took the form of a one-year trial conducted with all of Northland's secondary schools. There were about fifteen schools and approximately 200 School Certificate candidates involved. The project was set up to investigate the feasibility of teacher-based assessment of oral Maori and to look into alternative forms of inter-school moderation.

The approaches to moderation that the Department traditionally uses were thought to be inappropriate given the small number of candidates who were often located in isolated schools where their teachers had little opportunity to develop their assessment techniques, let alone become knowledgeable about national standards. Their training hui (meetings) for teachers, and the appointment of a Departmentally-endorsed, national moderator, produced satisfactory between-school moderation of attainment standards. A holistic approach to assessment was taken, and will continue to be taken from 1988 on, in terms of levels or grades defined according to agreed criteria. The procedure involved was to compare candidate speech samples, in an Oral Proficiency Interview format, with grade descriptions in order to find the closest match. At the heart of the project was a teacher-training programme. The importance of the 'back-wash' effect into the area of teacher development must be stressed as one of the project's strengths. The training hui focused upon the conduct of the candidate interviews. The reliability of the teachers' assessments was evaluated by comparing the teachers' gradings with those of a different set of School Certificate examiners.

In 1988 the project comes of age as a 'new assessment scheme in Northland' (The Education Gazette, 1 February 1988). From 1988 on there will be at least four assessment events in oral Maori in Northland. Two will be by the candidate's class teacher, while the remaining two will be by a member of the regional team of teacher connoisseurs. The four assessment events take the form of oral assessment interviews. Each interview is structured into a logical format and is conducted in five successive phases within a specified time frame. Within this time frame candidates have the opportunity to communicate what they think and feel to another human to whom they have chosen to talk. The interviews emphasize spoken communication but they can also allow candidates to demonstrate their understanding of deep Maori cultural values and situations which permeate all utterances in Maori.

In spite of making the above claim, we must also observe that the oral interview is a pakeha (European) format. Nevertheless, the evaluators and the teachers seem to agree that the Tai Tokerau version of the interview, despite its restrictions, allows a trained teacher-assessor to assess a candidate's knowledge, and use, of several culturally-appropriate means of communication within a supportive and human context. For instance, to Maori people the use of the shrug, the raised eyebrow, the discreet cough, eye contact, and the use of, and reaction to, silence are all extra-linguistic considerations or variables that influence the process of communication. They must be seen through Maori cultural lenses.

This assessment format allows this to happen in a unique way. The interview allows the student to use culturally-appropriate verbal and non-verbal communication to talk to their assessor in a natural setting as they move through the five phases of greeting their assessor (whakatau), talking about their family, home and school, speaking in Maori on prepared topics, responding to pictures and so on. The natural setting for these structured interviews is created, in part, by the presence of music which may be used to support oral performances. Usually a guitar is used. It may be played by the candidate or a supporter. Supporters, who are sometimes grandparents, offer their support as morale boosters by being there with the candidate while their interview is taking place. They are often there the whole day. It is important to understand that, as a generalization, Maori people say they feel more at home in the company of others. A one-to-one interview in an otherwise empty room would put an ethnic Maori candidate at an immediate disadvantage even before the assessment event began.

The day begins with an ancient powhiri (welcoming ceremony) carried out according to local kawa (customary way of doing things). Different tribes do things differently, but the first choice for venues is always marae (traditional meeting places) with a wharenui, a big meeting house with no internal walls. The powhiri involves the visiting assessors who are called on, as a group, to the marae by the karanga singer(s). By the time the powhiri is over, with its speeches and songs completed, the candidates have had time to meet the visiting teacher-assessors individually through the hongi (nose press). The moderator and the rest of the visitors are also met in this way. This welcoming event is culturally essential and flexible in its purpose. An educational event fits easily into this ancient, cultural framework of greeting and meeting.

The powhiri in this context also serves a most significant educational purpose. The powhiri, the marae venue and the supporters all help ameliorate much of the menace that is usually part of the examination process. This menace manifests itself in 'examination nerves' and in a consequent drop-off from optimum performance. There was plenty of evidence to show that many candidates were nervous but there is no doubt that they would have been more nervous in a traditional examination room. The wharenui, or hall, is set up into interviewing bays. At a marae venue the hall is always furnished with covered mattresses. The assessors spread out around the walls.

I understand that sometimes kaumatua (elders) sit alongside the assessors and so lend silent support to the interviewer. This practice helps bond the school-community relationship which, as we have seen, is an important element of this educational development. The exercise is orderly and organized, professional and purposeful. Most important of all, the assessments are carried out with a human dimension which effectively reduces candidate stress and apprehension. Most of the explanation of this reduction lies in the open nature of the proceedings, which are held in familiar physical conditions and a harmonious cultural framework and context. The results are valid examinations of competence in oral Maori language.

Finally, it is important to note a special consideration present in a situation when most candidates are learning Maori, their ethnic language, as a second language. The overwhelming majority of Maori speakers, both Maori and pakeha, acquire English as a first language then learn Maori in a school environment. The teacher-assessors are bi-lingual, which involves another dimension. So, besides benefiting the candidates the project has increased bi-lingual teacher involvement and hence increased Maori control over their language. It has had a very important 'wash-back' effect into teacher training of a kind which reflects the oral tradition of Maori language and culture. The project's holistic view of assessment encompasses much more than the Maori language itself. An action dimension to the project lies in the expectation that the education community will see the Tai Tokerau Project, endorsed this year into permanent policy and practice, as a timatanga (a beginning). Let me recall the proverb we began with:

Ko te timatatanga o te kauri rangitira
Ko te takano nohinohi.

From small beginnings great kauri trees grow.

No reira, Ka hoki atu, kai kotou rangitira ma, tena kotou, tena kotou, kia ora tatou katoa.

Acknowledgements

James Marshall and Michael Peters of the University of Auckland have produced an evaluation of this project. I wish to acknowledge their work and that of the teacher connoisseurs with whom they collaborated.


11 The GCSE: promise vs. reality

Desmond L. Nuttall

In the summer of 1988 we saw the first GCSE examinations, based on courses that started in September 1986. It has been an examination very long in the making, with a most elaborate research and development programme carried out during the early 1970s. This was followed by a recommendation to government in 1976 that a common system of examining should be introduced to replace the dual system of CSE and GCE O-level that we had had since the mid-1960s. It was 1984 before the government finally made up its mind to introduce the GCSE. The original aim was to create a comprehensive examination for a comprehensive educational system, to reduce divisiveness and to reduce the difficult choices that had to be made, mainly by teachers, about which examination, CSE or O-level, a young person should enter.

When I looked at the emerging plans for a common system in an article that was published in 1984 (Nuttall 1984), I came to the conclusion that 'there is every danger that the common system now being created will be divisive, bureaucratic, retrogressive and obsolescent — almost exactly the opposite of the common system as desired by its proponents of the late 1960s and early 1970s'. When I re-examined those conclusions in a paper at the BERA Annual Conference in 1985 (Nuttall 1985), I saw no reason to change them. My paper was reported in the educational press and had an immediate effect: a research contract that I was negotiating with the Secondary Examinations Council was cancelled, I was removed as a consultant to the SEC, and I was taken off all but one of their committees. It is therefore with some trepidation that I re-examine those conclusions yet again. On this occasion, I do not claim that I am providing anything like a comprehensive evaluation of even the first year of the GCSE; it is clearly too early — we haven't got all the facts and figures yet.

Nor am I saying much about the history of the preparation for the GCSE and the associated in-service work, though I would commend the research of Hilary Radnor (1987). I am also not going to look at the media reaction which, itself, makes a fascinating study. Headlines varied from 'GCSE is hailed as a leap forward for education' to 'Half-baked: Now do the honourable thing Minister and resign'. These two quotations indicate the very varied expectations people had. Finally, I am not going to elaborate on the theme of GCSE as a dummy-run for the whole national curriculum and national curriculum assessment movement, leading to political control of the curriculum, and the power of the assessment-led curriculum.

The features of GCSE

In my view, GCSE has three distinctive features. First of all, it is governed by an elaborate set of national criteria. It is the first time, I think, in the history of English education since the war that such detail was specified by government of the objectives of learning, the content and the types of assessment device that had to be used for all courses of study in twenty of the major subjects of the curriculum. Second, the GCSE stipulated that all examinations should differentiate. That word has come to have a very specialized meaning within the context of assessment: that is, that the examination system, though distinguishing between seven different grades of performance, should at the same time differentiate between students in a manner that allows every student to demonstrate in positive terms what they know, understand and can do. The third important feature was the stipulation that every course and examination should contain an element of work done during the course as well as a set-piece examination at the end of a course. Indeed, many subjects do not require an examination at the end of a course at all — the award can simply be based on continuous assessment of work done during the course. The coursework was introduced primarily to allow objectives that are difficult to assess in timed examinations to be assessed, particularly practical work, oral work and so forth.

Evaluating the GCSE

First, is the GCSE examination divisive? The difficult concept of differentiation means that, in many subjects, there have to be different papers for children of different attainments.

To allow even the lowest-attaining candidates to show their achievements positively requires a set of tasks on which they are likely to succeed; if we are to have enough of such tasks it means that it is difficult within the same examination paper to design tasks that will also challenge those destined to get the highest grades. So in many subjects, maths and sciences in particular, there are papers pitched at different levels. The easier assessment route may lead to the possibility of reaching only a grade D or a grade C. Conversely, the hardest route may allow the candidate to reach grade A, but may permit only a grade D as the lowest grade; if you fail to get that, you get nothing rather than a grade E as a consolation prize. Caroline Gipps' paper, given at the British Educational Research Association (Gipps 1988), showed some particular concerns in the minds of both pupils and teachers about this problem of grade limitation. Her work also shows that, in many cases, the decision as to which set of papers (the harder or the easier track) should be taken could be made quite late in the day; most decisions about entry could thus be made rather later than they were between CSE and O-level. Clearly we need more evidence about the effect on teaching and learning of differentiation strategies.

Of course, many pupils did not take the GCSE examination at all, or took examinations in a very restricted range of subjects, giving the lie to the rhetoric that said that the GCSE examination was going to be an examination for all our 16-year-olds. There is still a divisiveness between those who take public examinations and those who do not — something which I consider to be a much more serious educational and social problem. I refer to other aspects of divisiveness below.

Second, is it bureaucratic? To that question the answer has to be a resounding 'yes'. Syllabus approval takes place now at two levels: first of all in the examining group, and second at what was until recently the Secondary Examinations Council (that has now become the Schools Examination and Assessment Council). This approval has to take place to ensure that the syllabuses comply with the detailed national criteria, and syllabus approval has now become quite a legalistic process. Those syllabuses under Mode 3 (that is, syllabuses designed by teachers within their own schools to meet the local context and local needs) do survive, but in very small numbers compared with the position under GCE and CSE. They, too, now have to be vetted, not only by the examining group but also by the Secondary Examinations Council, and many are being rejected, or at least sent back for revision.

Hilary Radnor (1988) shows how it is still possible to create a Mode 3 scheme under GCSE that is liberating, and that indeed perhaps goes further than most GCSEs to meet these fundamental objectives of criterion-referencing, student involvement and differentiation. I understand that the syllabus has not yet proceeded to the Secondary Examinations Council, and therefore has not yet been approved. Advice has been given that no syllabus that has not yet been approved should be used for 1990 because it could be rejected and thus jeopardize the prospects of young people.

The bureaucracy is even worse when you try to go for an exemption from the national criteria. The national criteria themselves do allow the possibility of exemption of a scheme for the purposes of curriculum experimentation. There are no figures that I can get hold of, but there have probably been fewer than five exemptions from the national criteria so far. I have been associated with the piloting of one of those. It had to go from the Secondary Examinations Council to the Department of Education and Science, yet another layer in the bureaucracy. We had to wait eight months for the DES to make up its mind and, when the decision came, it was hedged around with several stiff conditions. There is under way, at the moment, a review of the national criteria, to seek more consistency between subjects, and to iron out some of the bugs. Even if that review were to be finished today, it would be most unlikely that it could affect examinations taken before 1991 and possibly before 1992. My fears that the national criteria would be difficult to change and curriculum development therefore slow moving have, I think, been realized.

Another kind of bureaucracy has occurred through the nature of coursework. The criteria for assessment and for the design of tasks have been spelt out in great detail. There is a lot of evidence that teachers have over-assessed in their anxiety not to put children at risk in the first year of a new examination, and there is also evidence that children have overworked. The form-filling and the moderation systems are more bureaucratic than those that have preceded them, and it is clear that, of all the changes that will be made on the basis of the experience of the first two years of GCSE, changes to the coursework will be the first. Nevertheless, I must make it clear that there is increasing evidence that the GCSE courses have stimulated more imaginative teaching and learning, including a move away from teaching that relies on dictating notes, and examination papers that require the regurgitation of those notes.

The requirement that there should be more practical and oral work has certainly begun to affect classroom practice, as is made clear in the preliminary evaluation by Her Majesty's Inspectorate (HMI 1988). There can be positive effects of an assessment-led curriculum. What I am not so sure about is whether these new emphases within the curriculum will reduce divisiveness. There have been many fears expressed that coursework requirements have put the middle-class child at an advantage; there are plenty of ways in which that child can be supported from the home in doing homework and coursework for their examinations. In the ILEA we are monitoring group differences carefully. We are just collecting the statistics at the moment to look at whether the pattern of differences, first between the sexes, has changed with the introduction of GCSE. Then we shall be looking at the pattern of differences between ethnic groups, something that we have been investigating over the last three years in relation to O-level and CSE. Finally, we would want to look at the pattern of differences between social classes. That is more problematic. Inner London has many economic and social problems; for example, a large proportion of children come from one-parent families and many come from families where one or both parents (if there are two parents) are unemployed. The conventional Registrar-General's classification of social class is almost meaningless in that situation.

The GCSE appears retrogressive (with honourable exceptions such as the particular Mode 3 scheme described by Hilary Radnor 1988), since it does seem that the curriculum has become more monolithic, more subject-based and more traditional in its specification under GCSE than it was capable of being under O-level and CSE. It has proved difficult for the GCSE national criteria to accommodate new developments, for example, to the modular curriculum and to some of the cross-curricular initiatives. Moreover, after the publication of the reports of the Task Group on Assessment and Testing (DES 1987, 1988), the assessment model that we see in GCSE is also looking dated.

Is the GCSE obsolescent? Based on the experience of other countries, it is increasingly difficult to answer the question of why we in the UK, almost uniquely, need an elaborate examination system at 16+. It is increasingly irrelevant given the changing structure of education and training and the fact that few young people now enter the labour market directly at 16+, but take part in some form of further education or training.

No doubt the 16+ system has some selective sifting and sorting function but most countries of the world seem to manage without that same function. Where they do have that differentiation, they do not need such an elaborate system. Furthermore, we have records of achievement on the horizon as government policy; why do we need such a very elaborate examining system? But it is now, above all, the prospect of our national curriculum assessments that will mean that GCSE will have to change. National curriculum assessments are designed to be for all pupils, not just for a large proportion of the age group. As I have indicated, the model of assessment does look more progressive, more based on a developmental model of education than the courses that we see under GCSE. National curriculum assessments will report on more dimensions — for example three, four or five different profile components within each subject — thus conveying more information than the GCSE grade. I do not think that we shall have national assessments at 16+ until 1995 or 1996. With that in mind, unprecedented as it is for an English examination to last for less than twenty years, I would predict that the last GCSE examination will be in 1998.

References

DES (1987) Task Group on Assessment and Testing: A Report, London: DES.
DES (1988) Task Group on Assessment and Testing: Three Supplementary Reports, London: DES.
Gipps, C. (1988) 'The experience of differentiation in GCSE', paper presented at the British Educational Research Association Conference, Bristol.
HMI (1988) The General Certificate of Secondary Education: an Interim Report on the Introduction of the New Examination in England and Wales, London: DES.
Nuttall, D. (1984) 'Doomsday or new dawn? The prospects for a common system of examining at 16+', in P. Broadfoot (ed.) Selection, Certification and Control, London: Falmer Press.
Nuttall, D. (1985) 'Evaluating progress towards the GCSE', paper presented at the British Educational Research Association Conference, Sheffield.
Radnor, H. (1987) The Impact of the Introduction of GCSE at LEA and School Level, Slough: NFER.
Radnor, H. (1988) 'Coursework: how if not when?', paper presented at the CEDAR conference, Coventry.


III Selection, certification and the accreditation of competence

The chapters in this final section of the book refocus attention on the interface, in discussions of assessment, between individual life chances and systemic purpose. The first chapter, by Lewin and Lu, describes recent changes in entry to higher education in China which have made the process more competitive and meritocratic. The drive for economic efficiency and the need to get the 'best brains' into key roles are identified as crucial factors in this policy change. The skills tested by the revived system of competitive examinations are very narrow, however, and Lewin and Lu suggest that this may not be the best educational preparation for producing flexible and creative graduates.

Singh, Marimuthu and Mukherjee pursue similar themes in their discussion of school-to-work transition in Malaya. They review international evidence about how the instrumental pursuit of examination passes (which may benefit individuals through access to higher-paid jobs) can so narrow the educational experience of such individuals that they are unable to work flexibly and creatively when they occupy challenging high-status roles (to the detriment of overall economic performance). The need for selection mechanisms which are perceived to be fair and can be operationalized on a large scale leads to the traditional unseen end-of-course examination which in turn leads to a very narrow curriculum. While accepting this broad relationship, Singh, Marimuthu and Mukherjee's data reveal considerable complexity in the interaction of school and the labour market in Malaya, and they suggest that the pursuit of narrow work roles has as much to do with work cultures as school experience. Similarly, in passing, we can note from recent events in China that whatever else the new competitive examination system has produced, it has not been dull bureaucrats.

Nevertheless, how a broader range of skills and understandings can be identified and fostered through school experience remains a key international issue and one which the chapters in this volume reflect. The final two chapters offer a European perspective on moves towards identifying and reporting a wider range of educational achievements and establishing absolute rather than relative standards of attainment or competence. Penelope Weston's chapter takes up discussion of the relevance of curriculum and assessment to pupil interests and the need to create forms of assessment which respect pupils rather than render them as the subjects of an alien and alienating process. She describes a collaborative effort on the part of a number of European countries to look at how assessment processes contribute to disaffection with schooling and identifies a number of principles which wider forms of assessment should embody.

Finally, Alison Wolf looks at how vocational examinations are changing in England, France and Germany in order to accommodate developments in information technology. She points out that methods of assessment in the vocational field have usually had a more practical focus than traditional academic examinations, but demonstrates that the form of practical test can vary from having tasks defined centrally (a practical examination, in effect) to having tasks designed and conducted by teachers and/or supervisors in the classroom or workplace. This variety depends on the way each country has traditionally approached vocational education (centrally or locally) and in turn carries consequences for the responsiveness of the system in times of rapid change. Her discussion of local flexibility and responsiveness versus nationally prescribed standards returns the discussion to that of the 'trade-offs' which were identified in earlier sections of the book. It is an appropriate issue on which to conclude this review of developing international practice in the field of assessment.

At last, assessment experts and, increasingly, policy-makers are realizing that there is no holy grail in assessment, no one best way to measure all the processes and outcomes of education. Discussions within the 'technical' assessment community over validity and reliability are beginning to merge with discussions emanating from the wider education community over curriculum, pedagogy and the role of assessment in the learning process, such that different assessment purposes are seen to require different assessment methods. In turn, decisions about which methods to employ, and when, will depend on the changing social context in which assessment practices are located and the 'trade-offs' which policy-makers feel are appropriate at any one time.


12 University entrance examinations in China: a quiet revolution

Keith Lewin and Wang Lu

Introduction

This chapter offers a preliminary analysis of university entrance examinations in China during a period of rapid change in educational development. The 1985 reforms announced by the Central Committee of the Chinese Communist Party (Communist Party of China 1985) are now being implemented and they impinge on most aspects of the link between schools and higher education. Their introduction followed a period of rapid development in higher education in the wake of the Cultural Revolution (Huang 1984). University selection has become a key issue as more children reach the end of senior secondary school, and students are being given more choice over courses of study and field of subsequent employment. University entrance, as in other societies, is a crucial allocator of life chances. It is also a critical determinant of pedagogical practice and learning styles in the higher reaches of the school system. Here we seek to provide some brief contextual remarks for those unfamiliar with China, describe the main features of the examination and selection process, analyse the form and substance of selected instruments and develop an agenda of issues raised by the utilization of the entrance examination and its planned development. The result is a modest contribution to the debate on how much-needed reforms can be reconciled with current practice.

Context

The university entrance examination system in China has undergone radical change since the Cultural Revolution.

During the 'Ten Years of Turmoil' (shi nian dong luan), the competitive entrance examination was discontinued in recognition of Mao's belief that 'education should connect with practice, intellectuals should link with workers and peasants' and that 'university students should be selected from those that have some practical working experience'. The feeling that education should not be divorced from social and economic development remains a major consideration in current policy. Because of the bias against the 'stinking ninth category' (chou lao jou) of intellectuals during the Cultural Revolution these Maoist principles were not implemented with due attention to the cumulative nature of educational development and the characteristics of curricula at different levels. Mao's thoughts were developed and applied using political ideology and class struggle as the defining parameters. As a result many intellectuals were rusticated to the countryside or assigned to factories in order to link them closely to workers and peasants with little thought for the nature of the learning that might take place. The selection system that identified promising youths for university study became one based on the sponsorship of the masses. The procedure adopted was:

the applicant applies,
the masses recommend,
the leadership approves,
the college reviews.
(Quoted in Unger 1982)

Provincial committees allocated a quota of places to factories, People's Liberation Army units, and rural communes, and the local leadership selected students. An important part of the Party had convinced itself that 'experts' had mystified specialist knowledge to deny it to the masses, and that university education was accessible to laymen if it was taught in a suitably practical, non-theoretical manner. Right thoughts were more important for students than demonstrable academic achievement. Class origin and political activism were central attributes of the selection system. This system was the product of a chaotic political movement and no proper administrative and legal system existed for its procedures and requirements. In practice the third step — the leadership approves — predominated. The selection process frequently degenerated into one where ranking Party cadres identified and sponsored students and ensured places for them. The system apparently failed to increase access to the masses as had been intended.

From an early stage there were reports that those identified for higher education were predominantly children of ranking cadres and rusticated rural youths, not those of peasant birth (Unger 1982). These selection practices severely damaged the education system. Higher education students could not reach the standards expected of them as they were starting from the low base provided by their incomplete schooling. University staff and factions in the political leadership became increasingly alarmed that the system of selection was damaging scientific, economic and military capabilities. Graduates were often technically not capable of doing the jobs that they were allocated. Research activity virtually ceased. The age structure of most professions began to display an increase in average age as fewer new recruits became available. In schools academic knowledge was devalued since it counted for little in gaining access to university. The motivation to study of a generation of children was undermined by the vagaries of the selection system and the intellectual development of many was stunted as a result.

The watershed of 1976 heralded the return to a competitive entrance examination as the main, if not the sole, criterion for university entrance. The competition for this was opened to a generation allowing over-age candidates to catch up on the backlog of students who had been excluded from access over the previous ten years. Every student reaching the end of the secondary school had the opportunity to take the new entrance examination and those that performed well were selected for further education. Political activism and class background ceased to be formally recognized as relevant. Connections (guan xi) inevitably continued to ease the path to major institutions for those fortunate enough to possess them. Nevertheless, these were of much less importance than previously since all students selected needed to score above minimum entry levels. In the 1980s, and especially since the announcement of the 1985 educational reforms, university selection has again become a key issue. International conferences have been hosted to benefit from experience elsewhere (e.g. Heyneman and Fagerlind 1988). Changes in the employment system for graduates have created momentum to look again at selection practices. It is to the current system that we now turn.

The university entrance system

The consolidated national university entrance examining system that has developed since 1977 is controlled by the National Examination Authority of China.

This is now the single body responsible for national examining and has within it five sections. These are: (1) Item Development; (2) Examination Administration; (3) Research Section; (4) International Examination Co-ordination Bureau, and (5) Office of International Administration. The main examinations it is responsible for are for university entrance, adult education entry, teacher certification, post-graduate entry, and a test for new government officials. In principle examining is free. Students paid a fee of ten yuan (two pounds) in 1988 to cover administrative costs. From next year the Authority will be responsible for the common parts of the graduate entry test (mathematics, politics and foreign language). Forty other subjects are controlled by consortia of university departments.

There are currently 1.7 million candidates each year for university entrance, divided about 50:50 between science and humanities. Adult entrance examinations cater for about the same numbers. University entrance is open to all senior secondary school (grades 10-12) students under 25 years. Candidates can retake as frequently as they like. There are about 2,500,000 senior secondary school students. As a result of the large numbers many provinces are selective about entering candidates and exclude those with little chance of success. About 600,000 are selected each year on the basis of their rank-ordered scores. This number is determined by the State Education Commission with the Ministry of Finance on the basis of plans for employment and resources allocated to the universities. About 3 per cent of the intake to universities each year are admitted without taking the examination under the 'exceptional student quota'. This dispensation has been introduced to reward excellent students and to ensure that students who perform poorly at examinations through anxiety, stress or factors beyond their control can be selected. The quota includes students who display the 'three goods' consistently in their school careers (academic achievement, health, leadership). Since the decision on admission under the quota is not confirmed until a month or so before the examination it is difficult to see how it can reduce stress to perform except at the final stage of the student's school career. It is not clear how differences in standard between schools are moderated in awarding places under the quota.

The Examinations Authority maintains a small core of staff for item preparation.

The university entrance examination is set by subject-based item-writing groups that are usually made up of one to two core staff and five to six part-time experts selected from prestigious universities. They are identified through professional networks and the recommendations of university presidents. They may be asked to write examples of test papers and these are judged for quality before item writers are invited to join a group (Interview, Examinations Authority April 1988). About sixty people a year from about twenty-five universities are involved in setting the national university entrance examination. Before 1988 the setters were incarcerated for two months in hotels and held incommunicado. Though treated to the delights of Chinese tourism after the setting task was complete, and before the taking of the examination, it has become increasingly difficult to find suitable staff willing to forego so much time. As a result, from 1988 item writers are allowed home after a month and trusted not to divulge questions. There is as yet no item bank though the Authority has plans to establish one for some parts of the examinations.

Marking is decentralized to provincial level. Moderation of the university entrance examination through a formal system to ensure comparability between provinces is not practised. It is thought to be unnecessary for the following reasons. First, most students remain within their province and standards established within the province can be applied consistently. Second, model answers are available to all examiners against which their marking can be compared. Third, institutional memories in universities can control year-to-year differences between provinces, and judgements about comparability are made on this basis by admitting institutions.

Entry to universities is on the basis of raw scores with cut-off marks drawn at an agreed level determined largely by the number of sponsored State Education Commission places available. There is no standardization of marks between subjects or between component parts of papers. Scores in science and arts subjects are simply aggregated to give a total score. The distribution of marks often deviates from a normal distribution, creating statistical reservations about the aggregation process. Experiments are under way in Guangzhou to introduce standardization and to cease to use raw scores as the basis for selection.

Students in the last cycle of secondary education in the general secondary schools follow, from choice, a curriculum that is science- or arts-based. The last term — about half a year from February to June — is left available in the school curriculum for preparation and practice for the university entrance examinations. The actual examining currently takes place over three days around 7-9 July.

The examinations are taken on the same days throughout the whole of China. Soon after the examination, standard answers for each subject paper are made available and the pass scores for entry into national key, provincial key, and ordinary universities are published. By September the results of the examinations are published and each student receives a score notification indicating the score for each subject and the aggregate score. If the student believes that there are errors in the results, on the basis of comparison with predicted scores based on past performance or because of errors in aggregation, a check can be requested.

The format of examinations

The present format is that there are seven subjects for science students and six for arts. These are:

science: mathematics, physics, chemistry, Chinese, foreign language, politics, and biology after 1981, and
arts: mathematics, Chinese, history, geography, foreign language, politics.

The papers in mathematics are different for arts and science students. Multiple-choice questions have been introduced into China only since 1983 and are included in the examinations. All questions are written except in foreign languages where there is an oral test for those applying to foreign language institutes. Most of the examination papers are timed for 120 minutes. Chinese (150 minutes) and biology (sixty minutes) are exceptions to this. Further details of the format are available in East China Normal University 1985 and Lu 1988. The examination papers are reprinted with answers in Hebei Province Press 1988. Most papers have between four and eight sections with different types of question in each. The range is as follows:

Subject        Parts
Chemistry      4
Geography      4
Biology        5
Physics        6
History        6
Politics       7
English        7
Mathematics    8

The Chinese paper is exceptional, consisting of two sections: section 1 on basic knowledge and skills and section 2 on 158

Entrance examinations in China composition. The types of items used fall into five categories: filling in the blanks; multiple choice; short answer; structured, and essay. Physics, mathematics, chemistry and biology include an additional question at the end that is scored separately and is considered in applications for prestigious universities but is not part of the aggregate score. Types of items The fives types of item that are in common use in the examination are illustrated below. Blank filling These items are present in all papers except mathematics and Chinese. In these a word, short phrase or number is required from the candidate and this is written on the question paper. The correct response can be identified from the stem of the item. Multiple choice Typically these have a stem which consists of a direct question or an incomplete statement and four or five responses from which the candidate can choose. There is no standard format for these items across the different papers (chemistry has five options, physics has four). In some papers there may be one or more correct responses to an item and the student will have to identify the correct responses and none of the incorrect ones to obtain the marks available. Short answer In this candidates have to produce an answer that is more than one sentence long. The simplest form is when a text is presented as a stimulus and an open response is required based on it.


Changing Educational Assessment Structured questions These are usually a series of questions designed around a single topic. There are only seven items of this type and six are in geography. Some are based on data (figures, maps); others develop a single theme. The elements of these structured questions are not closely linked or hierarchical in structure. They do focus several questions on the same general topic. Essay and longer questions These items require an original response from candidates that has no single identity. Mark schemes indicate the points for which marks will be awarded. Marking involves the judgement of those competent in the subject matter. The structure of papers Most of the papers have a similar overall structure with about six parts. The simplest items are at the beginning and the difficulty increases through the papers. Chinese is an exception with only two sections. Section 1 has twelve parts and twenty-five items. Its aim is to assess students’ basic knowledge and skills in phonetics, grammar, words and phrases, reading and comprehension. Most of the items are multiple choice or short answer. Section 2 assesses writing ability and has two essay-type items. One requests the candidates to write a newsletter on the basis of information provided; in the other the candidates have to write a manifesto and argue its validity. The reading comprehension texts are quite long (the longest exceeds 1,000 words) and the essay tasks are time consuming. The Chinese examination is therefore given an extended time of 150 minutes. Few of the questions are based directly on any classroom activities. In particular, experimental questions do not form part of the science papers except in the case of a small number which elicit experimental precautions. This reflects the reality that relatively few schools teach much experimental science. English examining is heavily concentrated on multipleschoice questions and filling in the blanks, as is chemistry. Geography makes more use of the full range of items. Mathematics and Chinese award most marks to essay/longer questions, unlike other subject papers. In English the most 160

Entrance examinations in China marks are available for multiple-choice questions; indeed, across all the papers multiple-choice items are the most popular and carry more of the marks than any other category. Filling in the blanks and multiple-choice questions are numerically the most common and make up 82 per cent of all items. Together they account for nearly 55 per cent of the marks available. The great majority of both types of questions operate as recall questions, filling in the blanks almost entirely so. Guessing corrections are not used. Many of the shortanswer questions are also heavily dependent on recall. The major part of the overall aggregate score is therefore available to those with good memories and inspired guessing patterns. This structural imbalance between recall items and the marks available for the more creative, expressive and analytical aspects of the papers has implications for the attributes of the high-scoring candidates selected and for the teaching and learning patterns that the form of the examination encourages. The marking of questions occurs according to guidelines that are published after the examination. For the closed questions this simply indicates the correct responses. For the more open questions the points which must be included to score marks are listed. The history, physics and politics papers We have decided to analyse in more detail three question papers to give more insight into the quality of the university entrance examination. History was chosen as an arts subject; physics was chosen as a science subject; politics was added as indication of a type of examining not normally incorporated into university entrance in the UK. History There are six parts to the history paper: (1) filling in blanks (twenty-three questions); (2) multiple choice (sixteen questions); (3) matching (two questions); (4) map reading (three questions;) (5) listing names, etc. (eight questions), and (6) longer essay type (three questions). Altogether there are fifty-seven items. The history paper questions can be classified into Chinese history and world history. About 33 per cent of items are world history and the types of item are evenly split between the two topics. The range of questions asked suggests a view 161

Changing Educational Assessment of history as a discipline that is concerned with dynastic events, major battles, chronological perspectives on development, dates of major treaties, and key events in the development of the Communist Party. There are no questions which indicate interests in social history, local history, historical methodology or the integration of historical perspectives with those from other disciplines. One lone multiple-choice question is related to the history of science and this simply asks for the names of two scientists. Nearly all of the items in the first five parts of the paper are testing the candidate's memory of events, times, places and people. Some examples illustrate this type: In the year of BC Qi Yuan called his barons for meeting in . He then became the first powerful chief of the Princes of the Spring and Autumn period. In March 1919 Mao Zedong set up the Central Peasant Movement Institute in and issued . The years for the beginning and the end of the 100 year war between Britain and France were a. b. c. d.

a. 1346-1453   b. 1337-1453   c. 1337-1429   d. 1337-1436

The self defence principle ‘we will not attack unless we are attacked, if we are attacked we will certainly counterattack’ was proposed by the Chinese Communist Party at the time of a. the KMT besieging the Jin Gangshan revolutionary base. b. the KMT launching its fifth anti revolutionary encircling and suppressing movement c. the KMT launched the first Climax of the anti Communist Party movement d. during the civil revolutionary war. It is only the last three questions in the paper that offer an opportunity to candidates to express their own ideas. Thus the questions include one on the relationships between Parliament and the monarchy in England: Briefly state the reasons for the opposition between the British Parliament and the Stuart monarchy and the historical facts of the struggle between them. In 1689, what kinds of rights did the British Parliament obtain and what was their significance? 162

Entrance examinations in China However, the marking scheme for these types of questions indicates only that in them reference to particular points will be rewarded, not the quality of argument or originality of interpretation. The question above has nine marks available, allocated as follows: King’s belief in the Divine Right (1) Opposition to the Divine Right by Parliament and demands for its limitation (1) Conflict between King and Parliament over levying taxes and forming or abolishing laws (1) Suspension of the Parliament by the King (1) Parliament reconvened to raise taxes for a military campaign and declared taxation by the monarch alone illegal; the King tried to imprison the opposition leaders and the confrontation resulted in the King precipitating the civil war (1) The civil war lasted from 1642-48 and Charles the First was guillotined in 1649 (1) The Bill of Rights was passed in 1689 and this prevented the monarch from making or abolishing laws (0.5), disallowed the collecting taxation or maintaining of a standing army (0.5), ensured that Parliament had to meet regularly (0.5), prohibited the monarch interfering in elections and provided for Parliamentary privilege (0.5). The Bill of Rights provided a legal guarantee restricting the power of the monarchy and provided the basis for a constitutional monarchy (1). Items on the history paper were classified using Bloom’s taxonomy (Bloom 1981) since it is widely familiar inside and outside China. Recall of knowledge completely dominates testing in this subject. There are no examples of questions which involve the presentation of data (historical accounts of events, economic indicators, official records) which could be the basis of analysis by the candidates. Evidence is usually not required by the questions asked. The impression the paper gives is that there is little room for analysis and judgement in the study of history by students and that they are required to absorb and repeat the analysis of others. Physics The physics paper has seven parts: (1) filling in the blanks (eight questions); (2) multiple choice (ten questions); (3) short 163


answer (three questions); (4–6) longer answers, and (7) optional question. There are twenty-four compulsory questions altogether. The distribution of these items according to topic is heavily biased towards mechanics and electricity. Two-thirds of all the items are on these topics, leaving about 13 per cent on atomic physics, 8 per cent on optics and 4 per cent on changes of state. There are no items concerned with some topics in physics, e.g. electronics, solid state physics, nuclear physics, and little treatment of others, e.g. thermodynamics, properties of matter. Though this partly reflects the traditional bias of the physics curriculum towards classical mechanics (Lewin 1987) the absence of questions on heat and thermodynamics seems puzzling. If it is a result of their treatment earlier in the secondary school, the result is to leave an unbalanced diet for university entrance. Because of the possibility of multiple correct responses the rubric for questions can become quite complex. Thus for physics section 3 we have: There are forty marks available for this section. Four points are available for each multiple choice item. For items 1-9 there may be more than one correct response. If all the correct choices are made four points will be given. If some correct responses are made and no incorrect ones two points will be given. No points are given for no choice or ones with any incorrect responses. For item 10 there are three correct answers. One correct response gets one point. Two points for two correct responses and four points for three correct responses. No points are given for no choice or ones with any incorrect responses. When Bloom’s taxonomy is applied to the items the result is that, unlike most other subjects, the filling-in-the-blanks questions do often require higher levels of cognitive performance than recall. So also do the multiple-choice items. This is closely related to the frequency with which mathematical demands are made on candidates to calculate parameters since this involves the application of previously acquired principles and procedures. Compared to chemistry, physics has a much greater emphasis above the level of recall. Compared to British papers there is less variety in type of questions and far fewer based on experimentation or real world problems (Crellin, Orton and Tawney 1979). There are few items of the recall type illustrated below. 164

In sunshine colour fringes can be seen on an oil film floating on the surface of water. This is the result of the interaction of two waves of light. The fringes are caused by a process known as ________. When a small round opaque board is irradiated by parallel monochromatic light, a bright spot can be found in the centre of the shadow of the board on a screen placed behind the board. This is a result of the ________ of the light.

There are many more of the type illustrated by this multiple-choice item:

A student uses an unequal arm scale to weigh the mass m of object A. In the first instance A is put on the right hand tray of the balance. When the scale is in balance the left hand tray has a weight of m1 placed on it. A is then placed on the left hand tray. When the scale is in balance the mass of the weight on the right tray is m2. The mass m of the object equals
a. m = √(m1m2)
b. m = (m1 + m2)/2
c. m = m1m2/(m1 + m2)
d. m cannot be weighed because the scale has unequal arms.

Though higher-level cognitive performance is demanded by these items than those in the history paper this performance is mostly of a particular kind. It involves the identification of one or more algorithms appropriate to the problem in hand and the application of them to work through a numerical problem. There are no significant opportunities for expression of scientific thinking in the paper; there seem to be no items designed to demonstrate creative problem-solving by posing novel problems; estimation and order-of-magnitude calculations do not appear; very limited use is made of data-based questions where interpretation is required; no sequential questions are included which develop reasoning in a structured way, and there is little reference to experimentation. There are three questions that ask for the recall of experimental procedures that have been omitted from a description of an experiment but these do not go far to assess skills associated with the design, development and interpretation of experiments (see above, e.g. the section on short-answer questions). It is also noticeable that few of the questions take real world problems likely to be familiar to students. The typical question is abstract and based on a laboratory environment. For example:

Changing Educational Assessment Connect a 10v 2.0W (pure resistor ‘A’) to a power supply whose e.m.f. and internal resistance do not change. The actual power consumed by resistor ‘A’ is 2.0W. Another resistor of 10v 5.0W (‘B’) is connected to the power supply. Is it possible that the actual power consumed by ‘B’ is less than 2.0W? If it is impossible state your reasons. If it is possible please give the conditions for ‘B’ to consume less than 2.0W power (given that resistance does not change with temperature). The final optional question in the paper is available to candidates who wish to impress the examiners. It is a projectiles problem: A big gun bombards a target at the same level. Both gun and target are at sea level. If the elevation of the gun is Ql, the point of impact is at distance dl from the gun and is in front of the target. If the elevation of the gun is changed to Q2 the point of impact becomes d2 beyond the target. According to this information calculate the elevation for the gun to hit the target given that the speed of the shell leaving the gun is constant and the resistance of the air is ignored. Again this question demands the identification of the appropriate equations of motion (the algorithms), and the application of algebraic and trigonometrical functions. It is basically a calculation using applied mathematics and not requiring much physical analysis. There is no evidence from this paper as to what kind of science education policy is being promoted to contribute to national development goals. There does seem a need for some careful analysis of the attributes that China needs amongst her scientists so that they can perform adequately and enhance the modernization drive rather than hinder it (Lewin 1988b). Politics The politics paper consists of seven parts: (1) filling in the blanks (sixteen questions); (2) multiple choice (twenty questions), and (3–7) all essay questions. Altogether there are forty-one questions. Politics has three themes — current affairs, philosophy, politics/economics. All the current affairs questions are filling in the blanks. The other themes have about two-thirds of the 166

Entrance examinations in China items in multiple-choice form and the remaining third split between filling in the blanks and essay writing. The current affairs questions are mostly drawn from recent events reported in the press, on television and radio. They are mostly assessed using items that require the recall of specific information, e.g.: The ‘Government Working Report’ of the Fifth Plenary Session of the Sixth National People’s Congress says that the disturbances created by a few students at the end of last year which spread to several cities is mainly the result of . They include information thought important for national selfrespect: In September 1986 at the Tenth Asian Games Chinese athletes won gold medals. and ongoing disputes: To our surprise the judicial organs of Japan recently accepted and heard the lawsuit against the Chinese state owned property (Guang Hualiao) proposed by Taiwan in the name of the ‘Republic of China’. The essence of the problem is that Japan wants to create or in the form of a judicial judgement. The philosophical aspects of the paper are heavily influenced by Marxist/Leninist/Mao Zedong thought. Dialectical materialism is a scientific world outlook and method because a. It discovers the most general laws of nature, human development, and of intellectual understanding b. Its perception of the world is material as well as dialectical c. It is a scientific system of knowledge about the objective world developed from the impetus of practice d. It inherits the achievements of history of philosophy and generalises new ideas from the natural sciences and the process of proletarian struggle. In our country, since the exploiting class no longer exists class conflict no longer produces major contradictions. As the socialist transformation of the ownership of the means 167

Changing Educational Assessment of production has basically been completed the role of the law of class struggle has changed. These facts imply that a. Social laws may change b. Social laws may have different features at different times c. There have to be conditions for social laws to exist and operate d. The existence and workings of social laws reflects the will of the people. Economic questions require knowledge of definitions of terms or simple reasoning, as in the item below. If the exchange value of one sack of rice is two sheep then a. when the labour productivity of producing rice increases once the exchange value should be one sack of rice = one sheep b. when the labour productivity of feeding sheep increases once the exchange value should be one sack of rice = four sheep c. when the labour productivities of producing rice and sheep both increases once the exchange value should be one sack of rice = two sheep d. when the labour productivity of producing rice increases once, the labour productivity of feeding sheep increases twice, the exchange value should be one sack of rice = six sheep. As with the multiple-choice questions on philosophy it may have come as a relief to candidates to learn that more than one of the answers given are correct! The reasons for this unusual pattern of item with varying numbers of correct responses between items is not clear. It can be seen as confusing to candidates and will make statistical analysis of the performance more complicated; it is not obvious what the benefits are. The free-response questions which make up the final four parts of the paper are challenging and appear to require explanation and argument in the response, e.g. Why should young students take part in social practice as well as study Marxism and scientific knowledge? Please use the principle of the dialectical relationship between theory and practice to develop an analysis of this. According to the statistics of the International Labour 168

Entrance examinations in China Office in 1973 there were 8.270 million unemployed in the US, France, the U K , West Germany, Italy, Japan, and four other major capitalist countries. In 1975 at the peak of the economic crisis unemployment totalled 14.40 million. In the US unemployment reached 7.830 million in 1975 which was double its level in 1965. There has been some decrease in later years but it is still over 6 million. Please use the theory of surplus value and the cyclic nature of the capitalist economic crisis to explain these facts. The mark scheme for questions of this kind provides for ten points to be awarded. It details these as follows: In order to get as much surplus as possible in the process of increasing production capitalists adopt new technology to increase labour productivity. The result is that the capitalists’ need for growth in the labour force is lower than the rate of growth of capital, and many workers’ jobs are replaced by the introduction of machinery. As a result the supply of labour begins to exceed the demand created by capitalists (4). The capitalists take advantage of the large numbers of unemployed workers to strengthen their exploitation of the employed workers (3). In crisis, many enterprises drop in production or even close down so that unemployment increases sharply. In the period of recovery production increases and employment also increases (3). When classified according to Bloom’s taxonomy the distribution of items illustrates the extent to which the first section of filling in the blanks is dominated by recall questions. The multiple-choice questions seem to require careful thought to discriminate between alternatives, and doing this successfully needs more than simple recall. The last section demands expression and the use of a method of argument which suggests higher cognitive demands if the marking scheme is applied in a way that reflects this. The politics paper is currently compulsory for all students and they must pass it to be admitted to university. There has been a debate over whether the examination should be reformed, omitted or included in the proposed school leaving examination and not necessarily in the unified entrance examination (Liu 1984; Ding 1984). The examination has 'two interests' (Ding 1984) — to select outstanding students and to enhance the teaching of politics in the schools. The former 169

Changing Educational Assessment has been problematic since some science and engineering students have found the examination too difficult, though they are exceptional in science subjects; the latter has presumed that political education in schools should reflect the balance of the material in the examination. The content of the examination has given little attention to any systems of politics apart from those of the Chinese government, has not yet changed substantially to reflect the broader acceptance of market economics in China since 1978, and retains a strong normative character and may therefore need updating. Commentary China is well known as the country that first developed an examination system. University entrance examinations have a very high level of prestige based on the acceptance of examinations as a method of selection for more than 2000 years. The examination is almost certainly the largest of its kind, involving more than 1.5 million candidates each year (East China Normal University 1985). It is a key determinant of future opportunities for the majority of the candidates that take it. Inevitably the content and the form of the examination exert a powerful impact on teaching and learning in the schools, as examinations do in other Asian countries (Oxenham 1984; Dore 1976). In Chinese the examination system is called ‘the baton’ (zi hui bang) as it directs the teaching and learning at which it points. The baton can be wielded with beneficial and detrimental effects. The focus of the examination on fundamental knowledge and skills has helped to overcome some of the anti-intellectual legacy of the Cultural Revolution as it has encouraged students to respect and understand scientific knowledge and the value of research. There are, however, some severe disadvantages to the examination as it now functions. These include problems with orientation (too academic, too little association with the practical world), content (large numbers of recall-based items) and format (all instruments are written, no practical work is required and no credit is given for school-based work). The problems that arise influence the secondary curriculum in a number of ways, discussed below. The school curriculum is organized strictly according to the subject disciplines defined by the examinable subjects. In the senior secondary schools science students drop history and geography and arts students drop physics and chemistry. This creates some anomalies, not least that geography students in 170

Entrance examinations in China university are predominantly from the science stream. There is little recognition in the curriculum of any links between cognate subjects and, until recently, few connections were made to the life world of students in selecting teaching material. Teaching and learning in schools is dominated by traditional pedagogical techniques which depend heavily on chalk and talk. This is a fairly universal phenomenon in arts subjects; in science demonstration lessons are not uncommon, especially in key point schools that are well equipped. Even in these schools class practicals involving students working in small groups are a rarity (Lewin and Little 1987; Lewin 1987). Much teaching takes place following national textbooks page by page and teachers repeat the material in the books. The principal activities of students in the classroom are listening, taking down notes and reading the textbook. Active involvement, designing, exploring, problem-solving, collecting evidence and experimentation are rare events (Lewin, Little and Burrell 1988; Eraut, Lewin and Little 1987). Since there is no practical component to the university entrance examination there is little incentive for students to take activity-based work very seriously. For the schools the lack of examinability of a topic inevitably means that it will not be a significant part of the school curriculum. The schools concentrate on what is required — that is academic, mental, memory-based and written activities. Options outside the mainstream arts and science programmes are virtually unknown except in schools that are technically and vocationally orientated. In the general secondary school system those with special artistic, musical or mechanical and practical skills have little school-based opportunity to develop their talents except in after-school clubs, where these exist. The unified university entrance examination acts to restrict the development of regional and local curricula. All schools are effectively obliged to follow similar curricula, nationally prescribed, in areas as different as the coastal provinces and the autonomous regions. They also, of necessity, have to proceed through them at approximately the same pace. Decentralization of the curriculum is currently encouraged (Lewin 1988a; Lewin and Xu 1988). It is unlikely to have many tangible benefits unless the unified entrance examination is linked less closely to the school curriculum and some provision for school-based or regionally designed examining is introduced. Though the examination’s form and content can be seen as inhibiting curriculum development this is only one side of the 171

Changing Educational Assessment problem. Before the entrance examination came to prominence few curricula were specified in ways that gave more than vague guidance at a general level about learning objectives. Syllabuses were and are predominantly lists of content and information to be absorbed with no emphasis on skills to be acquired. Examinations conducted in the image of these syllabuses naturally reflect these orientations. There are, therefore, epistemological and pedagogical traditions that currently favour a particular style of teaching, learning and assessment. If change in these is desired the question then becomes whether the unified entrance examination should try to anticipate and encourage these, or whether it should remain conservatively following curriculum development that may take place. The dominant criterion to evaluate school performance, and the only one publicly available, is the number of students from a year group entering universities and colleges. This puts a premium on academic activity, the most able students and examination practice. Those students whose chances of success are low (the great majority who fail to remain in school long enough to get to the university entrance examination) experience relative neglect and suffer motivational problems as a result. This has caused serious problems in rural and urban areas and has resulted in increasing dropouts (Wang 1987). In 1984 of all the junior secondary students in Fuzhou, the capital of Fujian Province, only 78 per cent remained three years later. The rate of dropout is even higher in rural areas. In one county, Yongtai, rates as high as 50 per cent dropout have been reported (Tang 1988). As peri-urban incomes have increased in the wake of economic reforms, it has become less economically attractive to remain in school for those with access to income from small businesses. This may explain increased dropouts in some areas (Lewin and Xu 1988). The unified examination has also contributed to the escalation of examination pressure on children in lower grades. There is intense competition between key point schools for places in prestigious universities. Entrance examinations for these schools have become more and more demanding. Backwash from these has led to entrance tests being held for feeder schools. In some prominent primary schools closely associated with key point secondary schools, entrance tests have been reintroduced to select those children with the most likelihood of progressing to university level. Current research at Hangzhou University has led researchers to the conclusion that 'teachers concentrate on that which is examined and 172

Entrance examinations in China ignore the rest and students are closely orientated to the demands of the exams’ (Lewin, Little and Burrell 1988). The distortions that this pressure creates on the curriculum (frequent testing is common, recall-dominated teaching and learning the norm) and the pressure on children to work long hours practising questions and completing homework are increasing. Private tuition is still a relatively uncommon phenomenon in China compared to most other Asian countries. The signs are that this will not remain the case much longer. The sea of items (ti hai zhan shu) that is the colloquialism for senior secondary education is on a rising tide that threatens to swamp broadly based educational goals. In contrast to this view of increasing competition and pressure for access some argue that it will become less attractive to take the unified entry examination in future. The government is to stop guaranteeing jobs to graduates and allow a free labour market to operate for most graduates. Students training to become teachers are likely to be excluded from these proposals. These developments are seen by some to increase the risk that a graduate may not find employment for some time and thus diminish the benefits of being a graduate and investing so much time in studying. The entry examination is difficult for most students and only a minority succeed in gaining access. Rewards for graduates currently are not much greater than for non-graduates. Indeed average earnings of graduates are often below those of small businessmen, tradesmen, vegetable growers, etc. These arguments serve to emphasize the critical importance of understanding effective demand and its impact on the examining system and teaching and learning (Lewin 1985). A structural reform that has been proposed recently is that a school leaving examination should be introduced as a qualifying examination for university entrance. This already exists in parts of Jiangsu and in Shanghai. The idea is that those qualified to take the entrance examination should concentrate on only four subjects in future. The qualifying examination would be held in late May and the university entrance examination in early July. In the proposed system students will take the qualifying examination in stages. Senior secondary First year history Second year chemistry, biology, geography, Chinese Third year physics, mathematics, foreignlanguage, politics 173

Those who qualify on the basis of this will then take three compulsory subjects — Chinese, foreign language and mathematics. These will be complemented by options in the subjects in which students will specialize. The qualifying examination under this system will be organized at the provincial level by the Teaching Research Group of the Provincial Bureau of Education. Though the operation of the unified entrance examination is the same everywhere, standards for entry do vary from place to place. This means that it is easier to enter universities if a student is born in one city rather than in another. This may be seen as inegalitarian. Earlier, universities were financed on the basis of reputation and not student numbers, which led to wide variations in the unit of resource per student. In future it is planned that the numbers of students will determine the baseline income. Some provinces have many more places available than others and can therefore enrol larger proportions of qualified candidates. Quota systems are used to ensure the participation of members of national minorities but they are not used to equalize opportunities for access amongst the majority community. Some thought may therefore need to be given to how access ratios can be harmonized more in the future to equalize opportunities.

Our final observations on the examination concern aspects of its form and content. As far as we know no substantial research has been undertaken on the attributes of different types of items and their performance with a view to improving their reliability and validity. We note that such research might usefully concern itself with several features of the examinations.

• The format of items varies widely within and between papers: what is the rationale for this? What benefits are presumed to follow from this practice? Are candidates confused by it? Does it prevent some types of item analysis from being useful?
• Is it possible to assess a wider range of skills than those currently tested? Why are such large numbers of items assessing recall? How can the proportion of recall items in some subjects be reduced? In what ways may practical work or school-based work be considered if this is thought desirable?
• Inspection of the papers suggests that there may be considerable variation in difficulty between papers in different subjects: is this intentional? How does it affect aggregation of raw scores? Which papers contribute most to the variance of the aggregate score? (A standard decomposition of that variance is noted below.)
• Through what mechanism are examination papers linked to the school curriculum? How are they planned to reinforce or reflect the curriculum aims and objectives devised by groups other than the setters of the examination questions?
• What is the predictive validity of the examination as a selection device? Does this justify its importance in selection? Is the examination equitable?
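On the question of which papers drive the aggregate, one general statistical identity (not something discussed by the examination authorities, and noted here only to make the point concrete) shows why papers with widely differing mark distributions weigh unequally in a raw total. If the aggregate is the simple sum of unstandardized subject scores, T = X_1 + X_2 + ... + X_k, then

    Var(T) = Σ_i Var(X_i) + 2 Σ_{i<j} Cov(X_i, X_j)

so a paper with a wide spread of marks, or one strongly correlated with the others, contributes more to the variance of the total, and hence to the final ranking, than its nominal maximum mark suggests. This is the usual argument for standardizing subject scores before aggregation, of the kind reported earlier to be under trial.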

From the description and analysis offered here it is clear that major changes have taken place in the system of university entrance since the mid 1970s. This quiet revolution has established a unified national examination which selects students on the basis of competitive performance. Though there must be many reservations about the way this system operates and its effects on the teaching and learning in the schools, its development represents a major achievement and almost certainly a more efficient allocation mechanism than that which it replaced. We hope that the observations and ideas advanced here can contribute in a small way to the further improvement of the system and its more beneficial impact on schools.

References

Bloom, B. S. (1981) Evaluation to Improve Student Learning, New York: McGraw Hill.
Communist Party of China (1985) ‘Decision of the Central Committee on Educational Reform’.
Crellin, J. R., Orton, R. J. J. and Tawney, D. A. (1979) ‘Present day school physics syllabi’, Reports and Progress in School Physics, 24:4 et seq., London: Institute of Physics.
Ding, Er (1984) ‘Some views on reforming the politics examination in the College Entrance Examinations’, Higher Education Front, 9, 1984.
Dore, R. P. D. (1976) The Diploma Disease, London: Unwin Education.
East China Normal University (1985) ‘Educational measurement and an evaluation of the examinations’, Examination Centre of East China Normal University, Shanghai Higher Education Press.
Eraut, M., Lewin, K. M. and Little, A. W. (1987) ‘Teaching and learning in Chinese schools: Training research and practice’, China Research Report, No. 5, Institute of Development Studies at the University of Sussex.
Herbei Province Press (1988) Items and Answers: the 1987 University Entrance Examination.
Heyneman, S. P. and Fagerlind, I. ‘University examinations and standardised testing: Principles, experience and policy options’, World Bank Technical Papers, No. 78, Washington, D.C.: World Bank.
Huang, Shiqi (1984) ‘On some vital issues in the development and reform of higher education in the People’s Republic of China’, Conference Paper, World Congress of Comparative Education, July, Paris.
Lewin, K. M. (1985) ‘Quality in question: A new agenda for curriculum reform in developing countries’, Comparative Education, 21, 2.
Lewin, K. M. (1987) ‘Science education in China: Transition and change in the 1980s’, Comparative Education Review, August, 32, 2.
Lewin, K. M. (1988a) ‘Educational reform: Quality, access and equity’, in R. Benewick and P. Wingrove (eds) Reforming the Revolution, London: Macmillan.
Lewin, K. M. (1988b) ‘Educational planning for scientific and technological development’, International Encyclopaedia of Education, Oxford: Pergamon Press.
Lewin, K. M. and Little, A. W. (1987) ‘Science education and teacher training’, China Research Report, No. 4, Institute of Development Studies at the University of Sussex.
Lewin, K. M. and Xu, Hui (1988) Rethinking Revolution: Reflections of China’s 1985 Educational Reforms, University of Sussex.
Lewin, K. M., Little, A. W. and Burrell, D. (1988) ‘The implementation of the 1985 policy reforms in China’, China Research Report, Institute of Development Studies at the University of Sussex.
Liu, Sibei (1984) ‘The politics examination in college enrolment should be reformed but not omitted’, Higher Education Front, 12, 1984.
Lu, Zhen (1988) ‘A brief introduction to the system of higher school enrolment examinations in China’, in S. P. Heyneman and I. Fagerlind, University Examinations and Standardised Testing: Principles, Experience and Policy Options, World Bank Technical Papers No. 78, Washington, D.C.: World Bank.
Oxenham, J. (ed.) (1984) Education versus Qualifications, London: Unwin Education.
Tang, Qinhii (1988) ‘Dropouts today, new illiterate tomorrow’, People’s Daily, Overseas Edition (23 June).
Unger, J. (1982) ‘Education under Mao: Class and competition in Canton schools 1960-80’, Columbia University.
Wang, Lu (1987) ‘Preliminary study of the evaluation of student learning’, Development of Foreign Education, No. 3, 1987.


13 Learning motivation and work: a Malaysian perspective

Jasbir Sarjit Singh, T. Marimuthu and Hena Mukherjee

Introduction The slow pace of socio-economic development of Third World countries has often been blamed upon their education systems. Initially, the lack of an educated manpower was considered a major drawback and theories of human capital formation strongly recommended heavy investment in schooling as a prerequisite for economic development. Despite the heavy outlay of expenditure and the creation of a large pool of educated manpower, development has still eluded most of these countries. Attention has naturally shifted to the educative process itself in the search for more effective links between education and development, with the general contention that the answer lies not in increasing the mere numbers at different levels of education but in the quality of manpower, that is in its increased capacity for productivity. It has been argued that the majority of workers, despite their education, fall short on productivity. Working within a tight bureaucracy, they display a lack of initiative, are not prepared to be venturesome or take risks while being largely preoccupied with maintenance of the status quo. It is posited that these negative non-productive and anti-development work behaviour strategies have been acquired through a schooling experience which concentrates largely on preparing a credentialled manpower in response to the demands of the labour market which has increasingly used educational qualifications as a criterion of selection. The thesis advocated as The Diploma Disease made three basic assumptions pertaining to the relationship between education and the labour market: 1

Qualifications represent the primary recruitment criteria into the labour market; those with better qualifications stand a better chance of entering prestigious occupations.
2 Since qualifications are the prerequisites for employment, the acquisition of qualifications assumes primacy in the schooling process; all learning is motivated by the desire to obtain good credentials.
3 Learning orientation or motivation in school has a long term effect on work motivation, work behaviour and consequently productivity of workers.
(Dore, R. 1976)

Productivity, thus, is as much, if not more, the outcome of the reasons why a person learns as the content or skills that he acquires. Why individuals learn (learning motivation) determines why they work (work orientation) and how effectively they work (work behaviour). Those who have indulged in learning merely as qualification earning, reduce the learning process to being ‘ritualistic, tedious, suffused with anxiety and boredom, destructive of curiosity and imagination, in short anti-educational’ (op. cit. IX:8). On the whole they extend into work similar attitudes and behaviour patterns — driven by an overriding desire for the material advantages, their interest not extending beyond fulfilling minimal demands of the job. Since the publication of The Diploma Disease in 1976, empirical studies have focused on the ill effects of examination-dominated education systems. Attention has especially been concentrated on the education systems of developing countries which inherited many of the school and examination features of western societies. Among the studies a six-nation study involving Great Britain, Japan, India, Nigeria, Sri Lanka and Malaysia attempted to operationalize some of the assumptions of The Diploma Disease to test them in national contexts which espoused different degrees of examination and assessment dominance. The study focused on the basic distinction between schooling which is only for qualification — a mere process of certification — and the process of education which has mastery of knowledge as its object. The primary objectives were to: (1) delineate the complex nature of learning motivations; (2) demonstrate the dominance of examination motivation in less developed societies, and (3) assess the long-term links between learning motivation and work motivation/behaviour. This chapter will, through an examination of crucial findings from Malaysia, throw light on the breadth of learning motivations as well as their impact on work behaviour in Malaysia specifically and in developing countries generally. 178

Learning motivation and work Examination dominance in Malaysia Malaysia typifies the major maladies of the diploma disease — a centralized examination system which moderates entry into the occupational structure, ensuring high income and prestigious jobs for the successful while relegating the rest into low-paying traditional-sector work. Since the beginning of formal schooling in Malaysia examinations have played a crucial role in accreditation and selection. Malaysia inherited an examination structure from the British through the Cambridge School Certificate examinations which influenced the curricula and teaching and learning methods in the schools. With independence a centralized examination system was introduced with public examinations at key levels restricting opportunities to move upwards in the system. Despite a general expansion in educational opportunities since independence, the greatest expansion has taken place in primary and lower secondary education, resulting in an educational pyramid which is broad at the lower levels and thereafter narrows steeply at the post-secondary and tertiary levels. A restrictive opportunity structure at the upper levels places excessive pressure on performance at the major exit points into the labour market. The three major examinations — the Lower Certificate of Education, the Malaysian Certificate of Education and the Higher School Certificate — represent major stress points in the passage from school to work. The education system historically has been linked with the labour market through entry to most (and certainly the best) modern jobs — depending on a minimum level of schooling. The higher the jobs rank ‘in terms of responsibility, emoluments and prospects, the higher the level of schooling deemed necessary’ (Dore and Oxenham 1984). The correspondence between level of schooling, job status and income is very closely knit in Malaysia; an increment in schooling is rewarded with an increment in occupational status. A number of research studies have confirmed the positive correlation between educational qualification, occupation and income (Wilson 1972; Hirschman 1974; Singh). The three variables — education, occupation and income — were demonstrated to be closely intermeshed in Malaysian society. This has placed a very high premium on educational credentials in the movement of individuals into employment. Both individuals and government invest heavily in schooling as an insurance against poverty and a facilitator for individual and group upward social mobility. 179

Changing Educational Assessment Educational expansion has had direct relevance for the labour market. An increasingly educated manpower has slowly entered the labour force in larger numbers than could be absorbed, leading to a greater level of unemployment among the educated. By the 1980s the impact of educational expansion has been felt in the presence of a significant group of school leavers as well as diploma- and degree-holders who are facing difficulties being absorbed into the labour force at the level they anticipated. Over the next few years this phenomenon is expected to intensify the chase for credentials while pressurizing the schooling process to deliver the desired examination results. Ample evidence exists pertaining to the overwhelming, almost obsessional, concern with examinations and superior performance in them both in school and at home. Teachers constantly refer to examinations while teaching, additional classes are held before public examinations, and a great deal of publicity is given to these examinations as well as pupils’ results. In school the total teaching-learning processes are geared generally to the examination syllabuses and more specifically to the skills demanded in these examinations which place a heavy reliance on memory, recall, reproduction, drill, model answers and knowing what the examiners want or expect. Keith Lewin has demonstrated how the ‘examination tail wags the educational dog’, frustrating educational reform, as in the case of the Malaysian Integrated Science. The negative effects of multiple-choice examinations ‘discouraged the development of powers of expression and language fluency’ and discouraged understanding by rewarding powers of recognition and recall (Lewin 1984). The pressure to do well is felt from all quarters — parents, teachers, the school and community. The major activity at home entails homework and school-related activities. Parents watch or monitor grades in school and treat all examination results with great concern. The assessment-dominated home climate is further reinforced by private tuition after school. Quite clearly the focus is on passing examinations well and obtaining the right credentials, rather than the actual educative process. The majority of those who pass seek safe and secure employment with the public sector and become a part of the huge government bureaucracy where length of service, loyalty and implementing policies are rewarded rather than creative, innovative and risk-taking behaviour. It seems that the school system prepares individuals to make a smooth transition from examination success to dedicated service in the government sector. 180

Learning motivation and work Research questions In view of the general assumptions of the diploma disease and the specific situation in Malaysia pertaining to the role of examinations in accreditation for and selection into the labour market, this chapter will address itself to three basic questions. 1

What are the dominant recruitment criteria in the labour market? Does the labour market transmit other ‘signals’ to future entrants into the job market than the value of credentials?
2 How does examination/assessment motivation balance with other learning motivations? Is assessment orientation dominant vis-a-vis other learning orientations?
3 Is there evidence of strong links between learning orientations and work behaviour?

For the first of these questions data from a study on higher education and employment will be used (Aziz et al. 1987), while selected data from the Student Learning Orientation Group (SLOG) research will be presented in the case of the other two questions. 1 Selection criteria Recent studies of recruitment criteria in Malaysia, while affirming the supremacy of qualifications, suggest the presence of other criteria in the selection process. In a study of university graduates’ perceptions of the importance of certain selection criteria, academic record and performance at interviews shared an almost equal rating. Past relevant experience and performance in aptitude tests were considered moderately important. Success in sports and cultural activities and other ascriptive criteria such as sex and race seemed extremely important only to small groups of graduates (Don 1987). The employers themselves indicated the presence of a number of criteria that influenced their selection of workers. While a sizeable number of employers regarded academic record to be a very important recruitment criterion, they actually gave greater weight to past relevant work experience. Nearly a third of the employers considered past experience to be very important, making it perhaps the most important criterion. This is followed by academic record, performance at interviews and then sex (Aziz et al. 1987). 181

Changing Educational Assessment Both students’ and employers’ views suggest that the labour market emits a number of different messages to which jobseekers respond. Qualifications undoubtedly rank foremost in sifting individuals into the occupational hierarchy, but other criteria clearly play a role. Employers are concerned that they recruit those who possess not only the right credentials but also the proper work experience or related training to fit into their organizations. Emphasis is placed equally on affective criteria and aptitudes. Malaysian data suggest that while qualifications may provide the initial consideration for entry into a job, employers seriously attempt to tap other qualities through elaborate interviews and aptitude-testing techniques. The acquisition of the additional qualities are well understood by students and undergraduates who constitute the future entrants into the labour market. In their preparation for work these are not entirely neglected, albeit considered of lesser significance than obtaining a good academic record. Furthermore, the labour market, even the modern sector, is not a homogeneous market; a marked distinction exists between the demands of the public and private sector. Examination success and good credentials are definitely more prized in the public than private sector. Interviews with Malaysian employers revealed important differences between the two sectors with regard to the criteria valued in selecting employees. Employers in the private sector were generally more demanding in the preferred work behaviour and orientations of their workers; a greater emphasis was placed on skills not clearly attested to by mere qualifications. Prominently, communicative skills, interpersonal skills, initiative in seeking and assuming new areas of responsibility, an ability and willingness to learn, to question ideas put to them, and a keenness to keep abreast with development in relevant fields — these are the characteristics on which the private sector placed a premium. The study of graduates highlighted some interesting differences between the two sectors (Hock 1987). A glance through the cognitive criteria revealed that the public sector establishments tended to place far greater importance on academic record than the private sector establishments. Another criterion which the public sector deemed important was proficiency in Bahasa Malay, On the other hand, the public sector establishments did not accord much importance to achievement in aptitude tests or to letters of recommendation. In the private sector we find a rather different picture. The proportion of establishments which considered academic record very important is significantly less than that in the 182

Learning motivation and work public sector. However, the proportion of establishments which considered proficiency in English important is significantly more than that in the public sector. Private sector establishments also placed relatively great importance on past relevant experience and achievement in aptitude tests, but less importance on proficiency in Bahasa Malay, Turning to the affective criteria, both the public and private sector establishments placed considerable importance on them. Almost similar proportions of the public and private sector establishments viewed performance at interviews and personality traits in the same way. This finding seems to point to the pivotal role of affective behavioural traits emphasized by Bowles and Gintis (1976) and Blaug (1985). As for the ascriptive criteria, they do not appear to be particularly important to the majority of the employers. However, the proportion of private sector establishments which considered gender very important is significantly greater than that for the public sector. Private sector establishments, operating in a competitive environment, were more likely to use gender as a handy indicator of differences iri marginal productivity resulting from differences in training and labour force attachment. There is no significant difference between the proportions of public and private sector establishments which considered ethnic origin important or very important. However, in the private sector, ethnic origin seems less important as a recruitment criterion than gender and family background. Research findings generally point to the presence of multiple recruitment criteria enjoying different degrees of importance. While there is little doubt that qualifications remain the predominant selection criterion, it is clear that individuals who are seeking jobs are sensitive to a wider range of criteria, fully cognisant that the possession of additional qualities gives them the edge over others in their search for work. Qualifications in conjunction with other criteria valued by employers seem to be the best assurance for work. It thus seems unlikely that students will completely ignore other criteria and be motivated purely by the desire to get good qualifications, especially if they are looking at the private sector for employment, in which case they must display qualities other than mere qualification. 2 Learning motivations During 1984, as part of the six-nation study on learning 183

Changing Educational Assessment orientations (SLOG 1987), data were collected from 900 secondary school students — 400 from form four and 500 from form six. The students were presented with a pool (127) of items from sixteen a priori learning motivation scales. Through the use of item analysis, content inspection and factor analysis, the data revealed six major learning motivations which were labelled following close content analysis of the group of items that cohered. The six learning motivations that emerged are described with the intent to capture the dominant learning motivations that prevail among Malaysian secondary school children. Interest and personal development This scale concentrates on the interest in learning dimension as well as the positive outcomes of learning and school examinations in terms of non-cognitive personal development. Items in this scale describe learning in the classroom either as an experience that is really interesting, exciting and gripping, or highlight negative feelings about examinations as being really boring, of little interest, and forcing students into a state of inability to cope with the situation. School learning and examinations are viewed generally as facilitating learning and, more specifically, as helping to organize student thoughts as well as develop their character. Thus the learning process and examinations are seen both as providing an enjoyable experience and promoting character and personal development. Extrinsic job This scale determines a very specific extrinsic job orientation. It is directed towards assessing the extent to which Malaysian children study and sit for examinations with a view to obtaining good qualifications and lucrative employment. Examinations are seen as the means of acquiring qualifications which will ensure that financial benefits and advancement are achieved. Items related to this orientation focus on the relationship between learning, job qualifications and favourable employment opportunities. Peer group This scale measures learning motivation that is governed by 184

Learning motivation and work significant peers in the lives of students. Items in this scale, however, refer to persons outside the family circle, suggesting that non-family members play an important role in moulding learning motivations. Malaysian children are clearly concerned about what others think about them. Among those whose opinions and views matter are friends, classmates and teachers. There seems to be a distinct feeling that learning is influenced by these groups of people and one must study to keep ‘face’ with them. Co-operation and competition Co-operation and competition appear as two poles of one dimension in learning. To some, examinations do not encourage much co-operation but rather encourage them to be selfish and aggressive. To others, however, the learning situation affords excellent opportunities for sharing and being unselfish. That this dimension has emerged in Malaysia indicates that this aspect of learning situation is prominent in the Malaysian context. It is thus probable that while some students thrive in the competitive situation that prevails, others enjoy the sharing and exchange of knowledge in the classroom. Parental pressure This scale clearly attempts to gauge the motivation derived from parents and family members, particularly the mother. Items describe the effect that pressure from family members has on the effort made in class. Pressure from the mother is singled out as a factor that might cause a student to worry about test scores and school reports. Generally, this scale gauges the pressure felt by students from parents who are likely to play a significant role in determining their learning motivations. Assessment dominance This scale appears to capture the overall assessment dominance that may be present in Malaysian schools. The thrust of nearly all the items is the importance of passing examinations. In this case, focus is on achievement in examinations with all energies directed merely to passing the examination well. All other 185

Figure 2 Relationship between intrinsic and extrinsic learning motivation scales
[Path diagram, not reproduced here: it links parental pressure, co-operation and competition, peer group pressure, interest (intrinsic), job (extrinsic) and assessment dominance (extrinsic); the inter-scale correlations it displays are those reported in Table 3.]

orientations are relegated to secondary importance. Positive non-cognitive outcomes that emerge from the process of learning are excluded. Examinations are viewed as important in themselves and for the ease with which examination success will lead to successful subsequent employment.

Table 3 Inter-scale correlations

                                          IPO     EJO     PEGO    COCO    PAP     ADOM
Interest and personal development (IPO)   1.00
Extrinsic job (EJO)                        0.18    1.00
Peer group (PEGO)                          0.33    0.49    1.00
Co-operation and competition (COCO)        0.29    0.08    0.28    1.00
Parental pressure (PAP)                    0.25    0.34    0.19    0.14    1.00
Assessment dominance (ADOM)               -0.09    0.71    0.35   -0.10    0.21    1.00
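The scale construction and correlation analysis described in this section can be illustrated in code. The fragment below is a minimal, hypothetical sketch (in Python, using pandas) of how scale scores might be formed from item responses and how an inter-scale correlation matrix of the kind shown in Table 3 might be computed; the item names and the assignment of items to scales are invented for illustration and are not those of the SLOG instrument.

import numpy as np
import pandas as pd

# Hypothetical assignment of questionnaire items to the six a priori scales.
scales = {
    'IPO':  ['item01', 'item02', 'item03'],   # interest and personal development
    'EJO':  ['item04', 'item05'],             # extrinsic job
    'PEGO': ['item06', 'item07'],             # peer group
    'COCO': ['item08', 'item09'],             # co-operation and competition
    'PAP':  ['item10', 'item11'],             # parental pressure
    'ADOM': ['item12', 'item13'],             # assessment dominance
}

def scale_scores(responses):
    # Average the items belonging to each scale (item scores run from 0 to 4).
    return pd.DataFrame({name: responses[items].mean(axis=1)
                         for name, items in scales.items()})

def interscale_correlations(responses):
    # Pearson correlations between scale scores, as reported in Table 3.
    return scale_scores(responses).corr(method='pearson').round(2)

# Example with fabricated data for 900 respondents and thirteen items scored 0-4:
responses = pd.DataFrame(np.random.randint(0, 5, size=(900, 13)),
                         columns=['item%02d' % i for i in range(1, 14)])
print(interscale_correlations(responses))
print(scale_scores(responses).mean().round(2))   # scale means, cf. Table 4

The same scale-score frame also yields scale means of the kind reported in Table 4 directly.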

Dominant orientation On the whole the scales measure discrete dimensions. The item-scale correlations are very high and the scale reliabilities tolerably high, suggesting that these scales are cohesive. From the pattern of correlations (Table 3 and Figure 2) it is clear that Malaysian students are motivated to learn by a number of motivations that complement each other. For instance, those who are motivated by interest learning are also high on co-operation and competition orientation, while those high on examination learning motivation are also high on job motivation. Students motivated by interest learning, examination success and job reward are also at the same time motivated by parental and peer pressure. However, there are some clearly distinct motivations — examination motivation is not correlated with interest learning motivation. While each is correlated with other common motivations, interest and examination learning motivations are slightly negatively correlated. Generally, two clear groups of orientations emerge. The first group represents intrinsic psychological motivations, akin to interest, personal character development, co-operation and competition. Opposed to this is a group of motivations that focus on the extrinsic motivations for learning. The assessment dominance motivation correlates very highly with the extrinsic 187

Educational Assessment: International Trends job reward, which in turn is highly correlated with other external pressure motivations. The profile of learning orientation showing how the respondents were distributed among three major learning orientations — assessment, interest, and significant others — clearly demonstrated that very few students were high or low on a single motivation. Generally, nearly 50 per cent of respondents were either high or low on all three orientations. Only 4.1 per cent were high on assessment alone, 9.6 per cent high on interest alone and 12.8 per cent high on significant others alone. The rest were concurrently high or low on two or more motivations, showing that learning motivation is a complex phenomenon with students capable of displaying multiple motivations (SLOG 1987). The findings, therefore, suggest that learning for examination and qualifications takes place side by side with learning for genuine interest. Most students are motivated by a number of correlated factors — drive to acquire qualification, please their parents, compete with their friends. Those who are motivated more by interest motivation are also at the same time concerned about parental and peer pressure coupled with a sense of competition. Although interest learning is less highly correlated with some of the parental and peer pressures than extrinsic job reward motivation, the distinction between pure interest and pure extrinsic rewards is blurred. It can be concluded that these orientations coexist and that most students are driven by a number of motivations simultaneously. The question now is whether the study has supported the initial hypothesis that much of the learning in schools is motivated by the extrinsic rewards of a good job and better income. A glance at the mean scores of the sample of students on these scales does suggest that students weighted heavily towards the extrinsic motivations (Table 4). Scores are higher on nearly all the extrinsic motivation scales — parental pressure (3.41), extrinsic job (3.15), peer group (3.07) — than intrinsic scales of interest and personal development (2.90) and co-operation and competition (2.90). However, it should be noted that the difference is not overwhelmingly large. The data would seem to suggest that both groups of motivations are fairly high and well balanced. Similar conclusions are reached from the factor analysis, taking stock of the variance explained by each factor. The data produced six factors of which the first, the interest and personal development, explained 34 per cent of the variance, 188

Table 4 Scale mean values (scale maxima = 4.00, minima = 0.00)

Scale                                  Total    SLOG6    SLOG4
Interest and personal development       2.90     2.86     2.95
Extrinsic job                           3.15     3.16     3.14
Peer group                              3.07     3.07     3.08
Co-operation and competition            2.90     2.93     2.87
Parental pressure                       3.41     3.35     3.48
Assessment dominance                    2.79     2.79     2.81

indicating that the dominant learning orientation for Malaysian students is interest and personal development orientation closely followed by the instrumental factor of job orientation, which explained 22 per cent of the variance. If the two groups of factors are taken together, then, it was found that the extrinsic scales of job, peer group, parental pressure and assessment dominance explained 40.5 per cent of the variance, while the intrinsic group of interest, personal development and competition explained 40.2 per cent of the variance. There is thus a good balance between the two groups of motivation scales in terms of the motivations they explain. In spite of the highly examination-oriented milieu in Malaysia the responses of the youths indicate the presence of a high degree of learning orientation based on their interest and personal development. One could attempt to explain this rather unexpected finding in several ways. The sample for this study constituted form 4 and form 6 (lower) students, both of whom have just finished their selection examinations. The fact that they were in these classes bears testimony to their effort and success in their respective examinations. Having slogged for these examinations, they were taking it easy in these non-examination classes. These classes are sometimes locally labelled by students as ‘honeymoon’ years because they are not under the pressure of public examinations. It is possible, therefore, that their responses were conditioned by their immediate classroom circumstances rather than the educational environment as a whole. The most plausible explanation must be the response set. The students were providing answers that they thought would be desirable in terms of the expressed national educational 189

Educational Assessment: International Trends aims. These include ideals such as the balanced development of the individual, the understanding of one's cultural heritage, the nurturing of an effective producer and consumer, the acceptance of democratic ideals and the awareness of one’s rights and responsibilities as a citizen. It is possible that they were responding to what ‘ought’ to be the situation, rather than what is. These are mainly the expressive aims of education which are celebrated. There is much public discussion about the expressive aims of education, in terms of national unity and integration in the country. The instrumental aim of preparing youth for a job or for a vocation is not stated explicitly, though many a time ministers have made pronouncements about gearing the education system to meet the manpower needs of the economy. However, this utilitarian aim of education being related to the economy remains a hidden agenda, albeit a powerful one. Therefore the students were providing the ‘right’ answers to what education and their learning motivations should be. The Malaysian students perceive education to have both the expressive and instrumental aims, and these perceptions have been shown by their major learning orientations. 3 Relationship between learning motivation and work behaviour In a study of 100 workers in the clerical, supervisory and managerial levels a modest attempt was made to establish links between learning motivation and work behaviour. Data were gathered from respondents on their learning experience (retrospectively), their work motivations and work strategies, as well as their perception of the organizational orientation of their employers and the specific nature of the tasks they performed. In the final analysis all the work dimensions were reduced to a dichotomy representing on the one hand motives and actions that were self-directed or meaningful and on the other hand those that were externally directed and reproductive. The former represented work that was innovative, productive and self-satisfying, while in the latter the locus of control was outside individuals who largely conformed to expected norms. Meaningful work was demanding and stimulating with deep involvement from the workers who set their own goals and standards, enjoyed playing with ideas, were prepared to take some risks but often did more than was expected of them. Reproductive work constituted dull and routine work directed from above with rules laid down, set 190

Learning motivation and work procedures and established ways of doing things, and workers were inclined to play safe, go along doing merely what was stipulated. Links were sought between the following sets of variables — learning motivation (examination and interest learning), work orientation (material reward and self-fulfilment) and work behaviour (meaningful and reproductive). It was hypothesized that examination learning motivation in school would be correlated with material reward motivation and reproductive work behaviour, while interest learning motivation would be correlated with self-fulfilment motivation and meaningful work behaviour. Table 5 and Figure 3 summarize the relationships among these variables, from which the following points may be concluded. Examination and interest learning motivations are significantly correlated, though not very highly, suggesting again that even in the adult population the two motivations are not completely isolated. Similarly, a great deal of overlap is revealed between the two work orientations (r = 0.5), clearly indicating that individuals are motivated both by the desire for material rewards as well as for self-fulfilment. Strikingly, however, interest learning is positively correlated with self-fulfilment work orientation (r = 0.27) and meaningful work behaviour (r = 0.32) but negatively correlated with reproductive work behaviour (r = -0.24). On the other hand, no strong links emerged between examination learning and a desire for material rewards or reproductive behaviour. Thus the long-term effects of interest learning on work are revealed, demonstrating that those who acquire a deep interest and involvement in their school learning transfer into work congruent work attitudes and strategies. It is, however, significant that there was no evidence of long-term effects of an examination orientation — there being no clear association with material rewards and dull, routine, ritualistic work. Since the thesis initially assumed a direct and positive link between examination-motivated learning and reproductive work, the absence of this correlation needs to be highlighted and interpreted. At this stage of the research, without further corroboration, it can only be suggested that examination domination appears not to have a serious damaging effect on work. Those who recalled studying largely to pass examinations and other external pressures were found to be motivated in their work both by extrinsic factors — job reward, promotion prospects and increased pay — or by the need for self-fulfilment and recognition —challenge in work, sense of purpose and feeling of accomplishment. 191

Figure 3 Relationship between learning motivation, work orientation and work behaviour
[Path diagram, not reproduced here: it links interest learning and examination learning to the self-fulfilment and material-reward work orientations and to meaningful and reproductive work behaviour; the correlations it displays are those reported in Table 5.]

Table 5 Relationship between learning orientation, work orientation and work behaviour

                                        Learning orientation    Work orientation        Work behaviour
                                        Assessment   Interest   Material    Self-       Meaningful     Reproductive and
                                                                 rewards     fulfilment  and creative   externally directed
Assessment
Interest                                  0.20x
Material rewards                         -0.00        0.07
Self-fulfilment                          -0.01        0.27x      0.50xx
Meaningful and creative                  -0.01        0.32xx     0.34xx      0.56xx
Reproductive and externally directed      0.08       -0.24xx     0.24x       0.13        0.16

Note: x P < 0.05; xx P < 0.01; xxx P < 0.001

Table 6 Relationship between nature of task, organizational orientation and work orientation
[Table only partly legible in this reprint: it reports the correlations of the self-fulfilment and material-rewards work orientations with meaningful and reproductive tasks and organizational climates; the coefficients range from 0.22 to 0.61 and all are significant. Note: x P < 0.05; xx P < 0.01; xxx P < 0.001]
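The significance footnotes attached to Tables 5 and 6 follow the usual convention of flagging correlation coefficients by p-value. The fragment below is a small, hypothetical sketch (again in Python) of how such flags might be produced with scipy; the variable names are illustrative and do not correspond to the actual coding of the study's worker data.

from scipy.stats import pearsonr

def flag(p):
    # Map a p-value to the x / xx / xxx convention used in Tables 5 and 6.
    if p < 0.001:
        return 'xxx'
    if p < 0.01:
        return 'xx'
    if p < 0.05:
        return 'x'
    return ''

def correlation_with_flag(df, x, y):
    # Return a string such as '0.32xx' for the correlation between columns x and y.
    r, p = pearsonr(df[x], df[y])
    return '%.2f%s' % (r, flag(p))

# Hypothetical usage on a data frame describing 100 workers:
# correlation_with_flag(workers, 'interest_learning', 'meaningful_work')
# correlation_with_flag(workers, 'assessment_learning', 'reproductive_work')

With a sample of around 100, as in the study of workers reported here, a correlation of roughly 0.2 or more is needed before even the single-x (P < 0.05) flag appears, which is worth bearing in mind when reading the weaker coefficients in Table 5.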

Learning motivation and work The absence of this positive correlation raised further questions pertaining to intermediary variables. It was now hypothesized that in the long term work behaviour was more likely to be mediated and modified by work situation characteristics than by learning orientation in a distant school past. For this purpose two aspects of the work situation were identified — the organizational orientation or climate of the work place and the nature of the specific task. Both were considered dichotomously as encouraging meaningful or reproductive behaviour. It was argued that, regardless of learning motivation, those who worked in an organizational climate or at a task that was meaningful would exhibit greater meaningful behaviour. Table 6 shows the close relationship between work place, nature of job and work behaviour. Work behaviour is very highly correlated with organizational orientation and nature of task; organizations which promote or provide an environment for creative and meaningful work solicit a response that is creative, while organizations that are bureaucratic constantly lead to reproductive work behaviour. The data generally suggest that there is a greater positive link between self-fulfilment work orientation and the movement into work which is meaningful and satisfying than between material motivation and reproductive work. Those who are self-directed show a greater tendency than others to engage in more meaningful tasks and perhaps seek out organizations that facilitate such behaviour. Such self-motivation is positively correlated with interest learning orientation, and the advantages of learning for interest become apparent. However, the study does not suggest that the examinationmotivated student is truly condemned to boring work — salvation seems possible as the job situation and possibly other intervening variables set the tone for creativity and productivity. Conclusion Bearing in mind the difficulties of research that attempts to delineate and measure motivations, together with the more specific problems of response to such items (especially retrospectively), we would interpret these results very cautiously. Caution is especially called for since some of the findings partially conflict with conventional wisdom pertaining to the ill-effects of examination dominance on work-place behaviour. 195

Changing Educational Assessment Nevertheless, the learning orientation scales have shown stability over time, while the work scales have high reliability and content validity supported by interviews and factor analysis. With a fair degree of confidence some preliminary conclusions may be attempted. Cumulatively, the picture that emerges seems to lead us to some modification of the initial thesis. First, students receive multiple messages from the labour market. While the need for qualifications and the paper chase is perhaps the clearest and loudest message, employers are also signalling other values which, if acquired, will provide an edge in seeking suitable employment. For those with their sights set particularly on private sector work, schooling must be used to acquire and demonstrate qualities not evident in mere qualifications. Second, students in Malaysia display a multiplicity of learning motivations which are not completely exclusive of each other, suggesting that most students are motivated by a spectrum of motivations. Although learning for examinations is a dominant motivation it does not exclude learning for interest or in response to pressure from significant others in their lives. Only a very small proportion are shown to be single minded in their learning. Broadly speaking, learning motivations grouped into two — those that were inner directed by psychological drives and motives and those that were externally directed by examinations, qualifications, and parental as well as peer pressure. The degree of inter-correlations among some of the intrinsic and extrinsic motivations strengthens the view that a generalized group of factors explains high or low learning motivation. Third, while a clear relationship has emerged between interest learning motivation and work orientation for selffulfilment and self-directed meaningful work behaviour, no clear relationship has been depicted between examination learning motivation and material or job reward orientation and ritualistic, reproductive work behaviour. Work behaviour was more clearly and strongly correlated with the nature of task and the organizational climate of the work place. It seems that while learning, motivated by a genuine interest, is an asset in the successful movement into creative and productive work, learning motivated by examinations may not be as much a handicap as imagined. Examination-motivated individuals have as much a chance of ending up in a reproductive job as in a creative job, depending largely on the kind of support and environment that the work situation provides. The paper raises some questions about the established 196

perceptions of the dominance of credentials in selection and the determination of learning motivations to the long-term detriment of work attitudes and behaviour. While not denying the predominance of these criteria in the movement of persons from school to work, the study provides some evidence for a somewhat more complex and also more hopeful picture of education, the learning process and work in a developing country than that portrayed by the bulk of the existing literature.

Acknowledgements
In the preparation of this chapter we are deeply indebted to the International Student Learning Orientation Group (SLOG). The sections on learning motivations and on the relationship between learning motivation and work behaviour are abstracted from Chapter 7 (111-34) on Malaysia in SLOG (1987).

References
Aziz, U. A. et al. (eds) (1987) University Education and Employment in Malaysia, Paris: IIEP Research Report No. 66.
Blaug, M. (1985) 'Where are we now in the economics of education?', Economics of Education Review, 4, 1.
Bowles, S. and Gintis, H. (1976) Schooling in Capitalist America, London: Routledge & Kegan Paul.
Don, F. H. (1987) 'The transition from university education to work', in U. A. Aziz et al. (eds) University Education and Employment in Malaysia, Paris: IIEP Research Report No. 66, 142-4.
Dore, R. (1976) The Diploma Disease, London: Allen & Unwin.
Dore, R. and Oxenham, J. (1984) 'Educational reform and selection for employment - an overview', in J. Oxenham (ed.) Education Versus Qualifications?, London: Allen & Unwin, 9.
Hirschman, C. (1974) Ethnic and Social Stratification in Peninsular Malaysia, Washington, D.C.: ASA Rose Monograph Series.
Hock, L. K. (1987) 'Expectations and experiences of employers', in U. A. Aziz et al. (eds) University Education and Employment in Malaysia, Paris: IIEP Research Report No. 66, 189-91.
Lewin, K. (1984) 'Selection and curriculum reform', in J. Oxenham (ed.) Education Versus Qualifications?, London: Allen & Unwin, 115, 117.
Singh, J. S. (date not known) 'Education and social mobility in Malaysia: a case study of Petaling Jaya', Ph.D. dissertation, Kuala Lumpur: University of Malaya.
SLOG (1987) Why Do Students Learn? A Six-Country Study of Student Motivation, IDS Research Report Rr 17.


Wilson, A.B. (1972) ‘General education and unemployment in West Malaysia’, Journal Pendidikan (Journal of Educational Research), Faculty of Education, University of Malaya, 3:42-8.


14 Assessment, certification and the needs of young people: from badges of failure towards signs of success Penelope Weston For large numbers of young people approaching the end of compulsory education, the whole system of school grades, marks, examinations and certificates still seems either irrelevant to their lives or a reminder of the millstone of failure which may be the most memorable outcome of their schooling. It may be the failure to gain worthwhile qualifications which rankles most, or the memory of many uncomfortable, deadening occasions when once more it was made clear that, relative to their peers, they were not up to the mark. How and why has assessment come to play this negative role for many pupils in our schooling system? And is it just liberal idealism to think that assessment could play a positive, constructive role in the learning process for all pupils? It was this kind of concern about the purpose and role of assessment in compulsory schooling which prompted a transnational inquiry, carried out under the auspices of the European Community in 1987-8 (NFER: Weston and Evans 1988). The inquiry had its origins in the Community’s Action Programme Transition from Education to Adult and Working Life (IFAPLAN, 1988), which supported projects in all the member states between 1983 and 1987. The focus of the projects varied, within and between countries, but many sought to develop or enhance curriculum and pedagogy for pupils in the later years of compulsory schooling who were at risk of failure (or had even dropped out of normal schooling) — many of them definable in French terms as jeunes en difficulté. What became noticeable as the Action Programme developed was that only a few projects saw the need to reform the process of assessment and accreditation as well as curriculum and pedagogy. In England, for example, the two Action Programme projects (which were also part of the English Lower Attaining Pupils Programme)* worked on alternative accreditation structures and sought to move towards criterion-referenced assessment procedures which would 199

Changing Educational Assessment record success of many kinds and seek to involve pupils more actively in the assessment process. The same was true of the school project. In many other projects, however, in other parts of the Community, new courses and learning strategies were either totally outside the normal assessment framework, or it was generally assumed that the new programme could operate without too much difficulty within this framework. Why was this? How true, in fact, was it, in practice, that traditional processes, assumptions and structures for assessment and accreditation were part of the problem that some pupils faced at school? How different were conditions and procedures in the various member states? Indeed, how widespread was failure, and how did the rate and/or definition differ from one country to another? What kind of innovations in assessment were being tried, and with what results? How were teachers, pupils and policy-makers affected by changes in the approach to assessment and accreditation? These were some of the questions taken up by the Commission and considered in a proposal to investigate the role played by assessment in the educational careers of less successful pupils, and to identify, if possible, more promising strategies which might be operating in the different member states. It was decided to carry out five brief national studies of assessment practice, as this affected less successful pupils (a target population which itself had to be defined in each country). Officials from the member states had earlier agreed that there were important issues to consider about the role of assessment in education. Now these issues, as they affected some schooling systems, could be more closely identified, and then reviewed in the context of a short conference for representatives of all the member states. This would create the opportunity to review current practice and perhaps suggest some ways forward to the European Commission. The National Foundation for Educational Research (NFER) undertook to co-ordinate the inquiry and convene the conference. The national studies were carried out during 1987-8 in four member states (France, Germany, Ireland and the United Kingdom). These studies considered practices in assessment and certification as they affect less successful pupils during secondary schooling and examined new initiatives in assessment which seek to remotivate pupils and provide opportunities for them to demonstrate and record their achievements. For the United Kingdom, two separate reports were prepared on Scotland, and England and Wales, since each has its own education system, and both have been undergoing change. The German report concentrated on developments in 200

two of the Länder, North Rhine-Westphalia and Bavaria. The conference took place in England, at Brighton, in May 1988, and was attended by representatives of eleven member states and of the Commission.

Reviewing the problem in a European context

It was clear from the five national studies that definitions of failure, and the proportion of pupils who could be seen as 'at risk', differed considerably between states. In some countries the practice of redoublement (repeating a school year) identified the pupils whom the school thought were failing, or there might be selection for different types of school, with varying consequences for pupils' own self-image of failure. Failure to complete secondary schooling was a more severe criterion, and one which applied to a substantial proportion of an age group in one or two countries. In systems with a heavy emphasis on public examinations at the end of compulsory schooling, failure has generally been defined in terms of examination performance. Across the five national studies, the proportion of pupils thought to be at risk on some combination of these criteria ranged from 10 to 40 per cent. It was felt important to consider the consequences of current practice for a broad population of pupils, rather than just the school dropouts or those with severe learning difficulties. This approach was endorsed by the conference participants; although outright failure rates varied markedly across the Community, all recognized that assessment had discouraging effects for a considerable number of pupils. It was also recognized that maintaining 'standards' for an elite sometimes seemed to imply condemning a proportion of pupils to relative failure. Evidence from the national studies was used to analyse some of the factors which contributed to the negative influence of current assessment practice. These can be summarized as follows.

Frame of reference

The prevalence of norm-referenced grading schemes by definition consigns half of each group to relative failure, however much progress an individual has made, or whatever level of achievement has been reached.

Changing Educational Assessment Purpose Much assessment is essentially summative, recording past performance but usually providing little guidance about how to improve achievement. Range of achievement Formal grades or examinations are often restricted to cognitive/linguistic achievement, through written performance. The problem is compounded by the aggregation of evidence about different aspects of achievement, or even of a number of subjects in an aggregate grade, in which modest success in some areas is lost in an overall ‘below average’ result. Participation Although there are marked differences between countries in their use of teachers and/or external examiners for formal assessment and accreditation, it is common practice for pupils to play little or no active role in assessment. For the less successful, poor grades are a negative judgement handed down by those in authority, with little or no opportunity for dialogue. How can assessment become more constructive? The national studies gave examples of how efforts had been made to work on one or more of these factors. It was important to recognize that there were some constructive aspects of long-established practice in one country which might be taken for granted but which offered a way forward for other systems with a different tradition. For example, in France and Germany class reviews (conseils de classe; Klassenkonferenzen) ensure that each pupil’s progress across all subjects is regularly reviewed by all the teachers involved. The UK public examination tradition, on the other hand, has established expectations about common standards of performance for pupils from all schools. Changes which had been introduced as a matter of policy covered a wide range. Some seemed designed primarily to ease pupils’ way through the existing system rather than to challenge it. For example, French pupils who are thought to 202

be at risk of repeating a year at the beginning of their secondary schooling may now be allowed to spread the two-year course over three years. In Ireland changes to the examination system have allowed for the introduction of more practical and oral assessment, although this has not been widely adopted.
There were also a number of initiatives of varying scale and impact which directly challenged the existing process of assessment in the classroom, and its relationship to curriculum and pedagogy. Within special projects for lower-attaining pupils, where groups of pupils were effectively removed from the mainstream curriculum for all or part of the week, usually in their last few years of schooling, it was possible to 'change the rules' quite markedly. Examples were given of the introduction of criterion-referenced assessment, often within a modular framework of some kind, and sometimes offering cumulative accreditation through a unit credit scheme. Profiling of various kinds had been adopted, giving pupils an opportunity to learn how to review their own performance and identify appropriate learning targets. Some schemes provided for the assessment of a far wider range of achievements, with a more imaginative use of different forms of valid evidence. It was pointed out that some of these developments were intended for all pupils; it was also noted that there was a risk in introducing new approaches within the context of only 'lower attainers' courses, since the process (and any novel forms of accreditation that were awarded as a result) might unintentionally become a new kind of badge of failure for the recipients.
At the conference, the review of these innovations and others from the wider range of member states present led to a remarkable degree of consensus about the new directions for assessment that participants wished to endorse. Agreement was strongest in relation to the conditions that were needed for constructive assessment, and for the changes of direction needed within the classroom process. For example, one group put forward a whole series of conditions:

1 Pupils need to be treated as individuals, with mutual respect between teachers and pupils.
2 There must be opportunities for all to experience 'success' (i.e. success in a wide variety of achievement, at many levels).
3 Somehow there needs to be a change from the traditional school ethos.
4 Pupils need a 'practical', 'active' approach to learning.
5 Learning should be relevant to what pupils perceive to be their needs and long-term purposes.
6 Pupils need to understand what they are learning and to share in judging whether they are succeeding.
7 All this should apply throughout schooling, not just as a rescue operation for older pupils (14+).

There was support for several of the 'new directions' identified from the national studies:
There should be more emphasis on formative feedback. It was widely felt that pupils needed more information on their progress during the process of learning, throughout their schooling. This feedback should cover their learning skills and how they went about the task, as well as the content of the course.
Assessment should cover more aspects of achievement. Despite heavy emphasis on writing and formal classwork in many courses, it is clear that there are moves in most countries to widen the aspects of attainment that pupils are encouraged to develop. There was evidence from several countries of the motivating effect of assessing practical capabilities, even when teachers' expertise and experience in this area was limited.
Learning targets should be more clearly specified. It was generally felt that the pupils with whom we were particularly concerned, and indeed all pupils, would benefit greatly from a clearer understanding of what they were expected to do, in all their school activities. Goals needed to be specified for individuals, and it was important that the goals were realistic: that is, targets for each pupil which they could be expected to reach.
Assessment should focus on positive achievements. Many teachers had become accustomed, by their own experience and training, to look first for errors and shortcomings when they assessed pupils' work, in order to point out what needed to be corrected. It was easy to fail to acknowledge the evidence of positive achievement, especially when there were many errors. The idea of pupil self-assessment or any kind of direct share in assessment was seen as problematic by some participants.
It was widely recognized that changes in the process of assessment had to be seen as part of a much wider programme of innovation, embracing curriculum and pedagogy as well as assessment: the three elements were inextricably related.
It was rather more difficult to reach a consensus on changes in certification, where national traditions differed and

There was support for several of the ‘new directions’ identified from the national studies: There should be more emphasis on formative feedback. It was widely felt that pupils needed more information on their progress during the process of learning, throughout their schooling. This feedback should cover their learning skills and how they went about the task, as well as the content of the course. Assessment should cover more aspects of achievement. Despite heavy emphasis on writing and formal classwork in many courses, it is clear that there are moves in most countries to widen the aspects of attainment that pupils are encouraged to develop. There was evidence from several countries of the motivating effect of assessing practical capabilities, even when teachers' expertise and experience in this area was limited. Learning targets should be more clearly specified. It was generally felt that the pupils with whom we were particularly concerned — and indeed all pupils — would benefit greatly from a clearer understanding of what they were expected to do, in all their school activities. Goals needed to be specified for individuals, and it was important that the goals were realistic — that is, targets for each pupil which they could be expected to reach. Assessment should focus on positive achievements. Many teachers had become accustomed, by their own experience and training, to look first for errors and shortcomings when they assessed pupils' work, in order to point out what needed to be corrected. It was easy to fail to acknowledge the evidence of positive achievement, especially when there were many errors. The idea of pupil self-assessment or any kind of direct share in assessment was seen as problematic by some participants. It was widely recognized that changes in the process of assessment had to be seen as part of a much wider programme of innovation, embracing curriculum and pedagogy as well as assessment: the three elements were inextricably related. It was rather more difficult to reach a consensus on changes in certification, where national traditions differed and 204

Assessment certification were seen as difficult to change. Nevertheless, there was considerable interest in the idea of a record of achievement and in schemes which allowed pupils to accumulate evidence of achievement over time. Some implications of the inquiry Some implications of the inquiry’s findings for teachers, curriculum planners and educational managers were also considered at the conference. Teachers All the groups recognized that many of the changes which they were advocating made heavy demands on teachers, requiring them to change not only their assessment procedures but also their relationships with pupils, and their underlying assumptions about attainment and the learning process. Whatever strategies were adopted, a major investment of time and expertise would be required to influence the majority of teachers. Some member states were already considering ambitious programmes of regular ‘sabbatical’ periods for all teachers to enable them to continue to update their professional competence in a rapidly changing world. Curriculum planners New approaches in assessment would make sense only as part of a broader programme of change, which would call for a very different kind of experience than that offered in the traditional classroom, and a more flexible use of teachers' and pupils' time. Managers The difficulty of bringing about changes in established accreditation structures presents particularly difficult changes at all levels of the educational system. It was agreed at the conference that there was a need to exchange information about assessment and to build a European Community network of education professionals with a wide appreciation of assessment issues. It was also felt that 205

much could be gained from transnational co-operation on teachers' professional development. This could take various forms: for example, an exchange of ideas about how best to promote staff training, in initial and in-service courses, and the identification and description of significant examples of good practice. There was thought to be a need for more extensive and practical opportunities to learn from good classroom practice within and across member states.
It is a tribute to the interest and commitment of the conference delegates that it was possible to have such fruitful discussions and to identify many common issues and priorities. We were left in no doubt about the conviction of all who took part that this was an aspect of educational policy and practice that warranted further work, at national and transnational level.

References
Weston, P. and Evans, A. (1988) Assessment, Certification and the Needs of Young People: A European Inquiry and Conference Report, NFER.
IFAPLAN (1988) Transition Education for the 90s, Brussels: IFAPLAN.

Note
* The Lower Attaining Pupils Programme consisted of seventeen LEA pilot projects designed to provide more effective education for less successful pupils in their last two years of schooling; it was intended for a broad range of pupils, up to 40 per cent of the year group. It was directed by the Department of Education and Science and funded from the Urban Programme, from 1983 to 1989.

206

15 Beyond commissions and competencies: European approaches to assessment in information technology Alison Wolf Discussions of national assessment systems tend to be conducted in terms of the two major ‘functions’ of assessment: selection and accreditation of learning. This, in turn, implies that there is a more or less tight fit between institutions and underlying social structures, common to modern industrial societies. Systems may vary in the degree to which they emphasize selection via formal assessment, and the point in people’s lives at which they do so; just as they may vary in how far assessment ever approaches the transparency of idealized criterion-referencing. However, this analysis implies the nature and development of any given system of assessment can be analysed in terms of these social imperatives. This approach to analysing assessment systems comes from a tradition of sociological analysis (e.g. Halsey, Floud and Anderson 1961) which emphasizes general features of social structures and the relationships between them. In doing so, it tends to produce an over-neat picture of how educational institutions actually develop and operate. In this chapter I want to compare the way in which some of the major industrialized countries carry out assessment and certification in the ‘vocational’ sector of education, and, in particular, the way in which they have dealt with the dramatic technological and work-place changes associated with computers and ‘information technology’ (IT). I will be looking specifically at pregraduate level qualifications, although many of the points made also apply to national differences in the planning and provision of degree courses. What this comparison confirms is that there is a rather tenuous link at best between supposed labour market requirements and the behaviour of the educational assessment system. 'Accreditation of skills' is clearly not the main reason why formal systems of vocational assessment and certification develop, any more than is the case with general educational examinations. At the same time, major differences between 207

Changing Educational Assessment systems are evident which can be explained only in terms of particular, ‘local’ factors exerting a strong influence on the dynamics of assessment. In the most general organizational terms, there are only a few ways in which a nation state can organize the assessment and certification of vocational education courses. It can leave the whole process to the market place; it can set up licensing requirements and guidelines within which independent agencies operate; or it can carry out the activities itself in a more or less centralized way. Most countries operate with a mix of these approaches which is heavily weighted to the latter end: notably the major countries of mainland Europe. However, both the USA and Japan belong predominantly to the first category. This is especially true of the US, where post-high-school education has, in the last decade, become increasingly ‘vocationalized’, with the growth of two-year community colleges and a general shift away from liberal arts. American colleges (and the private non-degree-granting vocational schools which enrol 1.5 million students) provide their own diplomas and degrees without there being any significant checks on comparability of content or standards, or central guidance on manpower and training ‘needs’. (There are state licensing examinations for many trades and professions which must be passed before one can practice. Passing in one state does not enable you to practice in any other state: you must take their exams as well. These are, of course, as much a device to restrict entry as a way to maintain standards, but they do create a certain amount of uniformity in vocational courses. However, there are no such licensing arrangements in most of the IT-related fields.) Moreover, credits can be transferred between colleges of enormously different types and prestige with considerable ease, in spite of the fact that standards are known to be very different. Within Europe, the major contrasts are between the semimarket place of the English and the far more centralized and regulated approach to assessment taken by most other countries. In this chapter I will be discussing in some detail the current practices of the English, the French and the Germans; the Dutch and Italians also operate a system comparable to that of France. In discussing the differences between these systems, however, we need to bear in mind that the Americans apparently operate quite satisfactorily without any central direction of assessment and certification in ITrelated areas, or any state licensing of practitioners. 208

Beyond commission and competencies Commissions and competencies: the English system in transition The English system of vocational assessment and accreditation is highly, though decreasingly, decentralized. There exists a large number of different certification bodies. Some deal with a wide range of occupations — others with one particular occupation, in which they represent an official or unofficial monopoly not unlike the medieval guilds. As such, they are independent and self-policing, and governed only by general statutory law. However, the majority of candidates for vocational qualifications at the pre-university level are assessed by organizations which exist as examining and accrediting bodies and are not 'owned' by any one occupational group. Of these, by far the largest are the Business and Technician Education Council (BTEC), the City and Guilds of London Institute (CGLI) and the Royal Society of Arts (RSA), although others have an important stake in particular occupational fields. Although there is a rough division between them in the areas and levels they service, the three are fundamentally and, indeed, increasingly in competition. Thus far, the situation echoes that of the competing examination boards or groups that traditionally provide school-level assessment and accreditation in England and Wales. However, vocational assessment bodies have less official or legal recognition than do the examination groups. Only BTEC is a government creation. The certificates issued by the others command recognition and a market place return only because and insofar as they are recognized by employers and/or used by colleges of further education and private training establishments. This structure means that new qualifications, and new assessment methods can be introduced extremely easily. City and Guilds and RSA (like the school examination boards) take direct responsibility for assessment and accreditation, but not for course content. If they perceive a market for a new qualification, because of changes in the labour market or in the way vocational training is being organized, they can devise a syllabus and announce that the new award is open for entries. BTEC operates slightly differently, in that it approves institutions to run (and accredit) courses leading to BTEC awards. However, the basic principle is the same. A new set of guidelines can be issued, and institutions (of further and higher education) invited to submit proposals for courses in 209

line with these. If BTEC has judged the market correctly, then submissions will follow, courses will be approved, and a new qualification will appear. Although development costs may in some cases be heavy, a period of two years from the first glimmer of an idea to the offering of a fully documented award is standard.
The consequences of this market approach to assessment and accreditation are especially striking in a new and very rapidly changing area such as information technology. (The term is used in the current broad sense to include both those who are largely concerned with the development and upkeep of equipment [hardware and software] and the actual use of the equipment for other ends.) A recent review of current qualifications in IT (excluding those predominantly concerned with engineering) listed no fewer than thirty-seven on offer to candidates currently in education. Moreover, the flexibility and responsiveness which this system encourages also extends to methods of assessment. Current IT awards in England and Wales use assessment methods which range from conventional, centrally set and marked examinations, through centrally prepared assignments which can be delivered locally (on candidates' own machines), to full reliance on profiling and teacher assessment (Buckingham 1987).
However, closer examination of current awards underlines the fact that this responsiveness is not simply a matter of technological and labour market 'pull'. The forces to which the accrediting agencies respond often have rather little to do with employers' demand or need for externally trained and assessed workers, and a lot to do with individual competition for status and with general government policies. The ways in which competition between individuals (and efforts by established groups to protect and improve their own position) creates a qualification spiral have been analysed beautifully elsewhere, and I do not want to rehearse again familiar, albeit powerful, arguments. (See especially Dore (1976), Collins (1979), and Boudon (1982).) Psacharopoulos (1986) emphasizes how important it is, in fuelling this process, that governments subsidize education so heavily, thus distorting the costs of alternative paths which individuals face.
Less familiar, perhaps, are details of the ways in which government policies affect both the level and the mix of assessment and certification options: and here again, IT provides a vivid example. During the 1980s the British government has, through the Manpower Services Commission (now the Training Agency) and the Youth Training Scheme, transformed the structure of post-compulsory education and training. Although YTS (and

Beyond commission and competencies its predecessor the Youth Opportunity Programme) were originally a crisis response to rising youth unemployment, YTS was also conceived as a way of responding to another perceived crisis. Government policy-makers were convinced that young people were receiving inadequate training, and were not reaching high enough skill levels compared, particularly, to our major competitors — usually listed as Japan, Germany and the US (National Economic Development Office 1984). YTS was one prong of the strategy for counter attack. The other was a complete reworking of assessment and certification of vocational skills. When governments start worrying about skill shortages, information technology is always the first area to which they turn. It is consequently no surprise that YTS schemes are obliged to provide all trainees with ‘computer literacy’ training. This tends in many cases to be provided by specialist firms who subcontract; in addition, Further Education colleges have also developed modules which can be taken by students from almost any occupational background. Many of the trainers involved looked for outside guidance on course content, as well as a way of motivating trainees, and the accrediting agencies responded fast. They developed qualifications which trainees could obtain after short courses — typically with assessment by the course tutor using preprepared assignments. By no means all YTS trainees aim for an external qualification in conjunction with their ‘computer literacy’ training, but enough do for entrants at this level to swamp the numbers taking more advanced qualifications. The two market leaders are CGLI's modular ‘Information Technology’ and RSA’s 'Computer Literacy and Information Technology' (CLAIT) certificate. The compulsory training on YTS is about ten days, which is not generally quite enough time to complete these qualifications. However, many trainees can supplement the work — predominantly word-processing — elsewhere in their training. (In addition, RSA and CGLI both increasingly provide IT modules as part of other entry level qualifications, using material from CLAIT and 726). Each of the market leaders registers about 30,000 candidates a year, which is five times higher than the total number of candidates enrolled by BTEC National and Higher National awards in Computing and Information Systems. Another important source of business is the ITeC programme — Information Technology Centres established as a separate part of YTS specifically to provide training with an IT focus. The ITeCs once again reflect government worries 211

Changing Educational Assessment about national competitiveness, but they also have been concerned to recruit from non-traditional groups, including school-leavers of low academic achievement. In order to ensure that trainees left with a formal qualification, ITeCs actually approached City and Guilds (bearing the gift of government funds!) to provide relatively easy, modular assessments and certificates that would fit their clientele and roll-on-roll-off pattern of training. City and Guilds naturally obliged. Explicit policies on vocational certification further affect the mix of certificates available. All trainees on government programmes — youth and adult — must henceforth have the chance to obtain ‘recognized qualifications’. At the same time, in every industry, ‘lead bodies’ are being encouraged, and funded, to develop new specifications of the occupational competencies required for given jobs or groups of jobs. These will then be enshrined in reworked vocational certificates based entirely on performance criteria: that is, a supposedly complete system of criterion-referenced testing will be established, based on work-place ‘standards’. Such qualifications will be accredited, if satisfactory, as ‘National Vocational Qualifications’ (NVQs) — the responsibility of the new National Council for Vocational Qualifications. In the case of IT, these policies have given yet more encouragement to accrediting agencies to develop new formal qualifications at a fairly low level — qualifications which can reasonably be obtained in the course of a limited training. Similarly, the industry now has its own lead body, which is busily defining standards, and has just given City and Guilds a contract to embody those already decided in a new set of qualifications which are intended to meet NVQ requirements. These will apply, particularly, to areas of the industry where up to now there have been no or few formal qualifications at all — a point to which I will return in the conclusion. Viewed in the light of these policies, the current list of IT qualifications becomes, clearly, a response to educational and governmental initiatives as much as to market forces — especially at the ‘introductory’ end. It also indicates that (of course) there is a downside to our earlier picture of a flexible and responsive, because competitive, system; because, in a modern state, government policy and funding remain so crucial to the accrediting agencies that there is an overwhelming tendency for them to compete in promising to deliver whatever the government decides it wants — whether this makes sense in educational or labour market terms or not, and indeed, whether or not it is really feasible. Similarly, because 212

Beyond commission and competencies the agencies are competing for custom, they have a strong incentive to keep their other ultimate customers — the candidates — happy. The less an award develops in response to employer demand and work-place practices, the less reason there is to worry overmuch about maintaining a ‘credible’ standard, and the more reason to give the customer the certificate they are paying for. ‘Rational’ state systems: some European examples In contrast to England and Wales, almost all the other European countries (including Scotland) operate with a centralized system, in which vocationally-related education is more fully integrated with core academic schooling, and both syllabuses and assessment are co-ordinated and/or administered centrally by government officials. This means that routes of progression are far more obvious to students and employers alike, and that people have a much clearer picture of the relative content and the presumed difficulty —and status —of different qualifications. At one level, therefore, such systems appear far more ‘rational’ than the English, and also fairer, in the sense that standards are likely to be more uniform across examining sites. However, as the IT case makes clear, the associated costs are also large. The more centralized the system, the more bureaucratic it becomes, and the less able to respond quickly to technological change. Similarly, the more firmly embedded in central governmental institutions the relevant institutions are, the more they are obliged, in a modern democratic state, to represent and consult with all the relevant interested parties. This, in turn, means that within these institutions one finds reflected the general state of relations between those concerned. If these are good, then, at least in theory, decisions are likely to be implemented effectively all the way down the line. If not, then tensions which derive from a wide range of other factors will in turn be reflected in discussions about vocational assessment. France The French system provides a very clear contrast with that of England and Wales, in that the bulk of vocational (or, as it is known, ‘technological’) education takes place entirely within the basic system of schooling. This is specifically the case for 213

initial vocational training — that is, training for young people — which itself comprises almost 90 per cent of those studying for any kind of vocational diploma (CEDEFOP 1984:231). As part of the general ‘initial education system’ this is administered at national level by the Ministry of National Education. Its decisions about tuition, diplomas and staff are valid throughout the country; vocational teaching staff, like their colleagues, have the status of civil servants, and the state is the sole authority competent to issue formal qualifications — all of which are, in turn, recognized for collective bargaining purposes.

Figure 4 summarizes the divisions within French education which take place from age 16 on. Up to this point, French pupils all follow a general education of a more or less common type, which takes place in the collèges. (This pattern dates from the Haby reforms of 1975; prior to these, there was an earlier separation between first-cycle vocational streams and general/academic classes.) At the end of this ‘stage 1’ of the secondary level, there is a separation into different routes through secondary stage 2, which covers the age group 16-19, with compulsory education finishing at 16. Some students enter ‘short-cycle’ technical education which takes place in the LPs (lycées professionnels, or vocational education schools). A very few go into apprenticeships, but these are not a major part of vocational training for French young people. The rest of the students go into ‘longer-cycle’ options which can themselves be either technical or general. Longer-cycle technical education and second-cycle general/academic education take place in the lycées d’enseignement général et technologique.

The distinction between short- and long-cycle entrants is a crucial one, for although it is technically possible to cross over, less than 10 per cent of the pupils from short-cycle programmes ever do so. The division within long-cycle programmes is less important, since both general and technical studies can lead to the Baccalauréat, which in turn guarantees university entrance. Among students in long-cycle technical streams, the technical Baccalauréat has in fact been gaining rapidly in popularity, while fewer and fewer pupils elect to take the occupation-specific brevet de technicien (BT). By contrast, in short-cycle schools, the choice is between the highly occupation-specific CAP (certificat d’aptitude professionnelle) and the rather broader BEP (brevet d’études professionnelles). Students can also get the CEP (certificat d’études professionnelles) after only one year. In addition, a new Bacc. professionnel is now being introduced, which can

be taken after the CAP or BEP (or after the end of apprenticeship) and provides a possible route on into higher education. The new Bacc. is still very small, but is an acknowledgement of the cul-de-sac nature of the existing certificates and the growing unpopularity of the vocational streams.

What all these qualifications have in common is their creation and administration by centralized authorities. The Ministry of National Education’s responsibility for the whole initial education system means that it lays down not only the content of the different Baccalauréat series, but also the nature and content of all the different short-cycle vocational courses. Thus, the 300 or so national CAPs, and the different apprenticeships, are all agreed upon and specified by the same Ministry. (All ‘technical’ diplomas involve consultation with Occupational Consultative Committees [CPCs]. After discussions/decisions about setting up/discontinuing diplomas, the Ministry translates CPC deliberations into instructions on curriculum, hours per subject, etc. There is thus no direct link between CPCs and either employers or deliverers of training [Jallade 1987].) Moreover, all the vocational diplomas (and indeed the general Bacc.) are assessed in the same centralized way: by a single set of external examinations at the end of the course. In contrast to the UK (or Germany), even practical tasks are set externally.

The centralized nature of French vocational education, and the emphasis on diplomas awarded through final examinations, make the system very slow to adapt to technological change. In the case of information technology, the result is that there are effectively no secondary-level vocational courses leading to diplomas in which IT plays a central or organizing role. In a recent report on IT assessment and certification, Catherine Agulhon (1988) concludes that four to five years is the minimum period needed, within the French system, to establish and ratify a new certificate. The Ministry has been committed, for the past four or five years, to defining new certificates and courses in emerging areas, and, in response to slow progress, has adopted two palliatives: interim alterations to existing certificates, such as office worker, and, in a few cases, authorization of course completion certificates which can, once the diploma is ratified, be converted. However, in the case of a field such as IT, a planning and certification process which takes at least four to five years before the regulations reach the schools reminds one of nothing so much as the labours of Sisyphus. By the time one has reached the peak of ratification, it will be time to start all over again in the face of changing circumstance.

Figure 4 Alternative tracks in the French educational system (under the Ministry of National Education): Baccalauréat (university entrance qualification); Technical Baccalauréat (BTn) and Technician Certificate (BT) (technical school); Vocational Baccalauréat; CAP/BEP/CEP (vocational school); apprenticeship, preceded by a pre-apprenticeship year; age 14: end of first-cycle secondary education.

Moreover, as Agulhon emphasizes, the French system of progression — in education and work — means that ministerial palliatives are inherently unsatisfactory. A properly ratified diploma, she stresses, remains the ‘passport to employment’, and there can be only a limited future for courses and training which do not provide it.

There are, currently, quite a number of full-time courses with an IT emphasis, being funded by a range of bodies (such as the EEC), and aimed at young unemployed people, almost all with poor academic records. Although these courses and programmes experiment with various new methods of instruction, and emphasize work experience rather than copying the usual pattern of full-time schooling, the structure of French society means that all are concerned to secure for their pupils the chance to sit for a recognized diploma. Of course, this does not mean that the bulk of French workers, or even French students, receive no training in IT. What it does mean is that almost all such training is uncertificated, and that most of it takes place outside full-time, mainstream vocational schooling. Young people may take modules as an extension of their main studies; but, most commonly, training — whether for computer-assisted operation of machinery, use of word-processors, or whatever — is given to employees through courses directed to the particular need of the work place.

French experience indicates that it is very difficult to assess and certificate rapidly changing skills using a centralized system which emphasizes formal diplomas and external examinations. Whether or not this actually matters is a question to which we return briefly in the conclusion. However, first it is interesting to compare French experience with that of an apparently equally regulated and centralized system — that of the Federal Republic of Germany.

Germany

The Federal Republic of Germany shares with France a system of vocational education and assessment which is regulated throughout by law. However, unlike France, there is no single, national ministry with overall responsibility. Instead, the famous ‘dual system’ of training (on and off the job), entered by the bulk of school-leavers, involves not only two separate places of learning, but also a dual system of responsible bodies. This is because West Germany is a federal republic, in

which education is a state (Land) responsibility. Training, on the other hand, is regulated by federal law, and is therefore governed by regulations which are uniform for the whole country. Young people who leave full-time education for the dual system are therefore a state responsibility during the one or two days a week they attend vocational school, while the agreements which regulate the time they spend in a firm, on practical training, are a federal concern. (Although the dual system is the major route to vocational qualifications, a fairly large number of students also attend full-time training schools (Berufsfachschulen) which lead to similar forms of certification. Further qualifications can be obtained at Fachschulen; while vocational further education colleges and trade colleges provide more general forms of technical education which can be used as an alternative route to university education. A training contract between trainee and firm must include provision for attending a separate vocational school.)

In discussing the reaction of the German system to technological change, perhaps the most important factor is the number of interested bodies who must be involved. Compared to the French, for whom most vocational training falls within ‘initial education’, the German dual system must encompass a large number of so-called ‘competent bodies’ at state and federal level. First of all, an occupation must be recognized as a ‘training occupation’ (an occupation requiring training) by the state. An index of these is published every year, and the number has, in fact, decreased from 603 in 1971 to around 450 in 1988. Every recognized skilled occupation is then governed by a relevant Training Ordinance, issued by the federal government, specifically by the relevant minister for the trade concerned in agreement with the Federal Minister for Education and Science. (The development work is carried out by the BIBB — the Federal Institute for Vocational Training.) These same ministers can also abrogate recognition in the case of obsolete trades. These regulations provide the objectives and content of applied training to be provided in firms, including the duration of training, an outline training plan, and the criteria for the examinations involved. It is on the basis of these that young people obtain their skilled worker’s or assistant’s certificate, which in turn is recognized for wage bargaining purposes within a firm. The process of issuing and updating these regulations involves institutionalized consultation under law, and so too does their implementation. Regulations agreed upon by the

competent federal ministries are implemented by the ‘competent bodies’ at local level, as specified by law, with additional input from the state ministries concerned. These local bodies, which regulate and control training within the firm, vary according to the occupation, but are themselves organized and recognized ‘Chambers’ — whether of Industry or Commerce, Crafts, Agriculture, or whatever. While these Chambers bear some resemblance to the UK’s Chambers of Commerce, they have a far stronger legal status.

The most visible single responsibility of the ‘competent bodies’ is the conduct of examinations. These comprise: intermediate examinations; final (qualifying) examinations; master craftsman’s examinations; and further training examinations. Thus, examinations are set and marked locally, at all levels. Guidelines on content are, of course, set out in the relevant ordinances, and examinations consist of both a skills examination (work tests and/or an examination set piece) and a test of theoretical knowledge, itself usually part-oral and part-written. To carry out the task of implementing training contracts, and conducting final examinations, the competent bodies appoint Vocational Training Committees. These too have their regulated composition, namely six employers’ representatives nominated by the Chamber itself, six employees’ representatives nominated by trade unions, and six vocational school teachers appointed by the state (which, of course, runs the schools itself) and who serve in a purely advisory capacity. Finally, at the level of the firm itself, there is a works council where employer and employee representatives are jointly responsible for implementing on-the-job training (Münch 1982).

A system which requires so much consultation works, essentially, only as well as the general relationship between the bodies concerned allows. At times of economic growth and relative social harmony, German employer and employee groups co-operate well: witness the long list of studies of German industrial relations which contrasted German experience with that of the UK. However, when there is general tension between the two, this will, in turn, be expressed in the various consultative committees and bodies on which both sides sit. In recent years this has been very obvious within the context of vocational training. Tensions from the system as a whole find expression here, too. The employers press for autonomy and minimal legal requirements, while the trade union representatives press for more integration of the on-job elements in the dual system with those carried out by the publicly-run education systems.

One major difference between the French and the German systems is in the importance attached to the final examinations. In France, the final examination decides whether or not one obtains the all-important diploma. In Germany, by contrast, the final examination is not viewed by the employers as very important at all — and almost everyone passes. This difference can in large part be ascribed to the very different role of employers in the training process. A German employer knows a great deal about the training course a young person has followed, and a great deal about the progress, during training, of the young people in his or her firm. It is from among them that new permanent workers will largely be hired. In entering the labour market generally, people are judged partly by the reputation of the firm which took them on and trained them, and partly by what is, in effect, a ‘profile’ of their performance in the previous years. Foreign observers often ask how comparability of standards is maintained in the decentralized German system, and the answer seems to be that, in the strict sense, it is not, and that, to most employers, it doesn’t matter. By contrast, the French (and, to a lesser degree, the British) employer is buying blind, and the ‘product’ must be guaranteed.

In the case of IT, the Germans have been experiencing predictable difficulties in updating their training ordinances. The prolonged consultative process, and the need to produce detailed regulations, make it virtually impossible to produce anything sensible relating to fast-changing skills; so of the 149 regulations for Master Craftsman examinations published in 1987, only one related to specialized employment in the IT field. However, this is not perceived as quite the problem that it is in France. The government, in conjunction with the employers’ groups, consistently presses for training specifications which leave some areas free of regulation, in which firms can address recent technological change and innovations. The BIBB, itself reflecting the current position of the government and the employers (Jallade 1987:83), also tends to emphasize the importance of the training rather than the final examination; so that, if IT is covered during training but not by the examination regulations, this is not necessarily a great cause of concern. As Richard Koch of the BIBB points out in a recent note, there are a number of occupations, such as printing, where the system’s failure to make any mention of IT-related skills verges on the ridiculous. The situation is also a source of friction with the unions, who want to reduce rather than increase employers’ discretion in the delivery of training (and

criteria for hiring). It also further weakens the role played by the final examinations conducted by the Chambers. These tend to be traditional in form (and repetitive in content), and, to the degree that they do not deal with new technology and processes, become progressively less important in the labour market, even though they have formal state recognition.

Current German estimates are that by 1990 a third of the labour force will be in jobs making significant use of IT equipment and processes. Those already using such equipment have almost all learned their skills on the job. Only a third have attended any sort of course, and most of these were very short, with no formal certification. While the authorities will no doubt continue to grapple with the problems of incorporating IT into training ordinances and examinations, to date the main implication of the new technology for assessment and certification has been a further weakening of the importance of formal examinations for vocational certification.

Conclusion

The preceding comparisons will have made it obvious how differently assessment and certification systems respond to major developments in technology and the labour market. They will also, hopefully, have generated the query: Does it matter? Of the major European countries, the English have gone furthest in developing pre-graduate level qualifications in IT-related occupations, and are able to incorporate developments most quickly into some sort of certificated course. It is not at all obvious that, as a result, they have adopted IT equipment faster, or exploited it more efficiently. Moreover, the area of ‘bespoke’ software development, in which they are world leaders, is also the area of the industry where, up until now, there have been no formal qualifications on offer. By contrast, in Japan, almost all IT-related training takes place within the firm. The growing, state-recognized ‘Special Training Schools’ do not deal with anything except standard office practices (including word-processing), and the small Technical College sector has almost no courses on electronics/information technology because there is no demand (National Economic Development Office 1984:50).

In the long term, the responsiveness of the assessment system probably matters to the degree that training is itself assessment-led. Thus, in Japan, where there are very few vocational certificates, there is nonetheless very detailed guidance available to firms on training issues and approaches.

It is this, and firms’ own analyses of training needs, which drive training courses. In France, by contrast, the content of training is completely determined by the content of the crucial final exams; in Germany it is less so. If current government policies in the UK are successful, then post-compulsory education here will also become increasingly certification-led.

It should also be noted that technological change of the type we have been experiencing over the last decade is unusual in that it involves large numbers of people acquiring new, distinctive and observable skills. The market place generates a demand for very large numbers of people with these skills (though not with them alone), and ranking and selection are, in this context, almost irrelevant. Employers consequently provide, and employees seek, specific training experience, and what is striking about most jobs involving IT is how satisfactory short uncertificated courses and on-job learning have proven to be. Millions of French and German workers have acquired the necessary expertise, without certification, irrespective of the impasses reached by education bureaucrats. This, in turn, suggests that the certification of acquired competences takes, in most cases of vocational assessment, a decidedly second place to selection. In vocational areas, just as in more general education, assessment and certification processes are well developed when, and because, diplomas and credits are of major importance in determining life-chances. The cynics who argue that the particular skills required for most jobs can be learned in six months maximum may well be right. (Their argument is further strengthened by the declining popularity of secondary-level vocational schools across the world. This is a sign not that we need ever-higher levels of general education, but that more individuals can afford to stay in the education system longer, and are determined not to be pushed off the pyramid a moment sooner than necessary.) We can no doubt expect a boom in students with IT certificates when, and if, they are on offer across Europe, but not for reasons which have anything much to do with national economic imperatives.

Bibliography

Agulhon, C. (1988) ‘L’évaluation dans les programmes français d’Eurotechnet’, Brussels: paper presented to a meeting of EEC experts.
Baars, W. S. (1979) Descriptions of the Vocational Training Systems: Netherlands, Berlin: CEDEFOP.
Berg, I. (1970) Education and Jobs: the Great Training Robbery, New York: Praeger.


Boudon, R. (1982) The Unintended Consequences of Social Action, London: Macmillan.
Buckingham, K. (1987) Review of Vocational Qualifications, Information Technology Lead Body.
CEDEFOP (1984) Comparative Study of the Vocational Training Systems in the Member States of the European Community, Berlin: CEDEFOP.
Collins, R. (1979) The Credential Society, New York: Academic Press.
Dore, R. (1976) The Diploma Disease: Education, Qualification and Development, London: Unwin.
Halsey, A. H., Floud, J. and Anderson, C. A. (eds) (1961) Education, Economy and Society, New York: Free Press.
Hurn, C. J. (1983) ‘The vocationalisation of American education’, European Journal of Education, 18, 1.
Jallade, J-P. (1987) Les Politiques de Formation Professionnelle des Jeunes à l’Étranger, Paris: European Institute of Education and Social Policy.
Mandon, N. (1988) ‘Les nouvelles technologies d’information et les emplois de bureau: Comparaisons Européennes’, Collection des Études, No. 37, Paris: CEREQ.
Münch, J. (1982) Vocational Training in the Federal Republic of Germany, Berlin: CEDEFOP.
National Economic Development Office (1984) Competence and Competition: Training and Education in the Federal Republic of Germany, the United States and Japan, London: NEDO.
Psacharopoulos, G. (1986) ‘Links between education and the labour market: a broader perspective’, European Journal of Education, 21, 4: 409-15.
Steedman, H. (1986) ‘Vocational training in France and Britain: Office work’, Discussion paper No. 114, London: National Institute of Economic and Social Research.
Steedman, H. (1988) ‘Vocational training in France and Britain: Mechanical and electrical craftsmen’, Discussion paper No. 130, London: National Institute of Economic and Social Research.
Wolf, A. (1988) Assessment and Certification of Information Technology Training in England and Wales, Paris: European Institute of Education and Social Policy.


Index

ability grouping 71 Abitur (Germany) 89-91 accessibility in national curriculum of New Zealand 127 accountability: in Anglophone Africa 101; and assessment 28-9; in England 96; and examinations 97 achievement: monitoring in United States 58; positive and assessment research 204; range of in assessment research 202, see also educational achievement administration of continuous assessment 116 affective criteria of employment in Malaysia 183 Africa: examination systems in 98-105 Agulhon, C. 215, 217 American Federation of Teachers 30 Anderson, C.A. 207 ascriptive criteria of employment in Malaysia 183 aspirations in African schools 19 224

assessment: and accountability 28-9; and achievement 202, 204; classified 112-13; constructive 202-5; corrupting curriculum 378; definitions 34-5; demographic trends and 26-7; dominance in Malaysia 185-6, 187, 189; embedded in instructions 26, 37; in European context 201-2; frames of reference of 201; functions 112-15; impact of learning on 36; in information technology 207-23; level of analysis of 14-15, 1819; limitations of 35-7; in Maori language 136-42; map of functions 114; new forms of 25-6; of practical capabilities 204; purpose 202; reform of in New Zealand 119-35; research into 199-206; role of in international context 9-22; trends in United States 2331, see also continuous assessment; educative assessment; national assessment Assessment of Performance Unit (APU): compared

Index with NAEP61-3; future of 57-8; and GCSE 56-7; monitoring of progress 53-5; and underachievement 55-8 Assessment of Performance Unit (APU) and studies of standards 40, 126 attestat zrelosti (Soviet Union) 92 attitudes and values in national curriculum of New Zealand 131-2 Australia: and changes in curriculum emphasis 44; and continuous assessment 106; curriculum studies 48; and standards agenda 3251; studies of standards 39, 40-1 Australia Studies of Student Performance (ASSP) 40-1 Australian Council for Educational Research 40 Azaz, U.A. 181 Baccalaureat (France) 88-9; and technical education in France 214-15 Baker, Kenneth 38, 44 Black, H. 120, 123, 124 blank filling in examinations in China 159, 161 Blaug, M. 183 Bloom, B.S. 126, 164, 169 Boudon, R. 210 Bowles, S. 183 Brandt, D. 68-9 brevet de technicien (BT — France) 214 brevet d’etudes professionnelles (BEP — France) 214-15 Brewster, D. 41

British Educational Research Association 143, 145 Broadfoot, P. 112 Brown, S. 122, 125 Buckingham, K. 210 bureaucracy and GCSE 145, 146-7 Burrell, D. 171, 173 Business and Technical Education Council (BTEC — England and Wales) 209-11 Canada: ability grouping 71; achievement in mathematics 70; rich descriptions in crossnational studies 73; topic coverage 71 Carlton, J. 36 Casals, Pablo 52 CEDEFOP 214 central control of examinations: in China 91; in England 95 certificate d’aptitudes professionnelles (CAP — France) 214-15 certificate d’etudes professionnelles (CEP — France) 214 certification 205; of assessment 199-206; in information technology in Germany 221; of teachers 29-30 China 10-11; examinations 84, 91-2; university examinations in 153-76 Chittenden, E.A. 122-3, 126 Choppin, B. 34-5 City and Guilds of London Institute (CGLI) 209, 211 Cleverley, J. 11 Clignet, R. 19 225

Index co-operation and competition in Malaysia 185, 187, 189 Cohen, D. 32-51 College Entrance Examination Board (USA) 86 Collins, R. 210 communication, clear, difficulty of in crossnational studies 74-5 Communist Party of China 153 Community's Action Programme: Transition from Education to Adult and Working Life 199200; implications of inquiry 205-6; review of problem 201-2 comparability in examinations in France 89 competition: in China 91; and co-operation in Malaysia 185, 187, 189; and examinations in China 172-3; in Japan 87; in Soviet Union 92 competitive entrance system to university in China 155 competitiveness of certification, England and Wales 212-13 comprehensiveness of assessment 37 Computer Literacy and Information Technology (CLAIT) 211 computers: literacy see information technology; in schools 25 Connelly, F.M. 71, 76 consultation: in assessment in Maori language 138; in 226

German system of vocational training 219 continuous assessment: administration of 116; aims of 111-12; in Anglophone Africa 102-3; functions of 112-15; and GCSE 144; moves towards 106-10; problems 115-16; in Sweden 94; systems in developing countries 10618 control of assessment: trends in 27-9 coursework: assessment see continuous assessment; and GCSE 144, 146 credentials, importance of in Malaysia 180 Crelin, J.R. 164 criterion-referenced assessment 199, 203; advantages 120-2; disadvantages 122-3 cross-national studies: case studies of 72; of educational achievement 65-77; limitations 73-5; possibilities in 70-2; and rich descriptions 73-4; simple answers, lure of 73 Cultural Revolution (China) 154 culturally-sensitive assessment in Maori language 137 curriculum: changes in emphasis 44; corrupted by assessment 37-8; definitions 34-5; ideals for excellence 38-9; integrated and continuous assessment 111-12; planners and NFER inquiry findings 205; subject-dominated 37 Curriculum Development

Index Centre Bulletin 17 Cuttance, P. 45 decision-making: intrusion of politics into 33-4 Degenhart, R.E. 65 demographic trends and assessment 26-7 Department of Education and Science: Better Schools 53, 56-7; The National Curriculum 121; Task Group on Assessment and Testing 120, 147 Department of Education and Science (DES): and Assessment of Performance Unit 53-4, 56 Department of Education and Science (NZ) Curriculum Review 12632 Department of the Environment The National Curriculum 122 diagnostic assessment 124 differentiation: and GCSE 144-5 Ding, Er 169 ‘The Diploma Disease’ in Malaysia 177-8 divisiveness and GCSE 145, 147 Dockrell, B. 124 Don, F.H. 181 Dore, R.P.D. 170, 178-9, 210 dropouts: in China 172 Duckworth, E. 123-4, 126 Dwyer, Carol Anne 23-31 East China Normal University 158, 170 Eckstein, M.A. 84-97, 104

economic context of examination system in Africa 98-100 educational qualifications: in Malaysia 177-8 educational system: expansion of in Africa 99100 education: and press 46-7 Education Reform Bill (1988) 38, 62; and national curriculum 56-7 educational achievement: cross-national comparisons of 65-77 educational context of examination system in Africa 98-100 Educational Testing Service (ETS) 30, 86; monitoring of achievement 58-9; Profiling American Education 58-9 educative assessment 123-5; tasks in 124; understanding development of 125 Elley, W.B. 67 Elliott, J. 36 employment: criteria for in Malaysia 181-3 England and Wales: and Assessment of Performance Unit 53-8; competitiveness of certification 212-13; and continuous assessment 110; examinations 84, 95-6; information technology, assessment in 209-13; national assessment in 538; research in assessment 199, 200; studies of standards 39, 40; vocational sector assessment 209 enrolment ratios in 227

Index Anglophone Africa 99 entrance examinations in Soviet Union 92 entrance system to university in China 155-8 Eraut, M. 171 essays in examinations in China 160 Ethiopia, examination system in 98-105 ETS see Educational Testing Service Europe: assessment in information technology 207-23; assessment research in 199-206 Evans, A. 199 examinations: Africa, systems in 98-105; central control of 91, 95; in China, reform of 1734; concern with in Malaysia 180; dilemmas in 96-7; external in Anglophone Africa 100; format of 85; Malaysia, systems in 179; options within 85; oral see oral examinations; papers in China, structure of 16070; reform of in New Zealand 119-35; standardized in Japan 87; to university in China 153-76; trade-offs in policies 84-97; uniformity of 85; university see university examinations expectations in African schools 19 extrinsic job orientations in Malaysia 184, 187, 189 facilitative role of assessment 14, 18, 20 228

Fagerlind, I. 155 failure, definition of in assessment research 201 Fair Test Examiner 60-1 family and assessment, role of 14-15, 18-19 Federal Institute for Vocational Training (BIBB — Germany) 218, 220 feedback, formative, in assessment 204 female participation in education in Anglophone Africa 93 final examinations in Sweden 93 First International Mathematics Study 65 Floud, J. 207 formative assessment 124 France: alternative tracks in education 216; assessment in information technology 213-17; examinations 84, 88-9; final examinations in IT 220; research in assessment 199, 200, 202; time needed to establish new courses 215 free-response questions in politics papers in China 168 Freire, P. 124 Frith, D.S. 112 full-year terms in American schools 70-1 Fuller, B. 98 General Certificate in Secondary Education (GCSE) 95; and Assessment of Performance Unit (APU) 56-7; and continuous assessment 110; and criterion-referenced

Index assessment 121; evaluating 144-8; features of 144; and national criteria 144; need for 147-8; promise vs. reality 143-8; syllabus approval in 145 Germany: assessment in information technology 217-21; course up-dating in IT 220; examinations 89-91; final examinations in IT 220; research in assessment 200, 202; vocational sector, assessment in 217 Ghana 19 Gintis, H. 183 Gipps, C. 53-64, 145 goals for assessment 26 Goldstein, H. 44-5, 54-5, 60, 69 Goodlad, J.I. 36 Goodman, Y.M. 68 Gorman, T.P. 65 graduates and qualifications in Malaysia 182-3 Greaney, Vincent 98 Guba, E.G. 71 Halsey, A.H. 207 Harrison, M. 34, 44 Havelock, R.G. 117 Hawes, H. 19 Heyneman, S.P. 98, 104, 155 Hirschman, C. 179 history papers in examinations in China 161-3 HMI(1988) 147 Hock, L.K. 182 Horton, T. 110 Huang, Shiqi 153 Huberman, A.M. 117 Human Rights Curriculum

Project (Australia) 48 Humanities Curriculum Project 124 Husen, T. 62-3, 65-6, 67, 69 Iacocca, L. 36 immigration and teacher shortages in United States 27 In-Service Training (INSET): and continuous assessment 115-16 information technology, assessment in 207-23; in England and Wales 209-13; in France 213-17; in Germany 217-21 Information Technology Centres (ITeCs) 211-12 inhibitive role of assessment 14, 18, 20 instructions: assessment embedded in 26, 37 interest and personal development in education in Malaysia 184, 187, 189 International Association for the Evaluation of Educational Achievement (IEA)41 International Association for the Evaluation of Educational Achievement: history of 65-7; and reading literacy study 679, 75-6 Ireland: assessment research in 200, 203 Ivory Coast 19 Jacoby, R. 46 Jallade, J.-P. 215, 220 Japan 45; ability grouping 71-2; examinations 84, 87-8 229

Index Jaysuriya, J.E. 9 Joseph, Sir K. 121-2 Kelleghan, T. 98-105 Kenya 10-11; aspirations of students 19 knowledge: in national curriculum of New Zealand 129-30 Koch, R. 220 Koff, D. 19 labour market: and educational qualifications in Malaysia 179-80; and qualifications in Malaysia 182 Langer, J.A. 68 large-scale assessment and school-level change 35-6 learning: and examinations in China 171; and impact of assessment 36; interrelationships in national curriculum of New Zealand 132-3; motivations and work behaviour 190-5; orientation in Malaysia 187-90; targets and assessment research 204; and work in Malaysia 177-98 Lesotho: Education Sector Survey 101; examination system in 98-105 Lewin, K.M. 11, 15-16, 153-76, 180 licensing of teachers in United States 29 Lincoln, Y. 71 literacy: and nature of standards 43, see also reading literacy Little, A.W. 9-22, 171, 173 Liu, Sibei 169 230

Lowe, Robert 95 lower-attaining pupils and assessment research 203 Lu, Zhen 158 machine-scoreable examinations: in China 92; multiple-choice, in Anglophone Africa 102; in United States 86 Macintosh, H.G. 112 McLean, L. 34, 44-5, 65-77 MacLure, M. 110 McNaughton, T. 34, 119-35 McNight, C.C. 67, 71, 75 Malawi: examination system in 98-105 Malaysia 177-98; examination dominance 179-80; learning motivations in 183-90; research into recruitment criteria 181-95; selection criteria by employers 1813; work behaviour and learning motivations 190-5 managers, school, and NFER inquiry findings 205-6 Manpower Services Commission 210 Mao Zedong 154, 167 Maori people: assessment in language of 136-42; in national curriculum of New Zealand 127, 129 Marimuthu, T. 177-98 market approach to assessment in information technology 210-11, 212 marking of examinations in China 157 mathematics: comparative studies of education 65-6 measurement, definitions of 34-5

Index ‘mid-year rating examination’ (MYRE — Papua New Guinea) 108 Mingat, A. 98 minority groups and teacher shortages in United States 27 Mkandawire, D.S.J. lll 116 moderation: approach to in assessment in Maori language 138-9; of examinations in China 157 Mukherjee, H. 177-98 multiple-choice examinations: in Anglophone Africa 1023; in China 91-2; in Japan 87, in United States 86 multiple-choice questions: in examinations in China 158, 159, 160-1; in physics papers in China 165; in politics papers in China 168-9 Munch, J. 219 Munn, P. 122, 125 Murphy, R.J.L. 115 music and assessment in Maori language 140 Musoma resolution (1974) 107 national assessment: trends in 53-64; in United States 28 National Assessment of Educational Progress (NAEP) 35, 53, 75; compared with APU 6 1 3; and national assessment 58-61; and reading standards 44-5; and studies of standards

39-40, 41 National Board for Professional Teaching Standards (USA) 30 National Commission on Excellence in Education (USA) 27 national criteria and GCSE 144 National Curriculum Council 57 national curriculum (England and Wales) 95; and Assessment of Performance Unit 56-7; assessments 30 national curriculum of New Zealand: learning, aspects of 129-33; principles 126-9 National Economic Development Office (NEDO)211, 221 National Educational Association (USA) 30 national evaluation in Sweden 94 National Examination Authority (China) 155-7 National Foundation for Educational Research (NFER) 199-200 National Policy on Education (Nigeria) 10910 National Science Foundation (NSF) 59 National Teacher Examinations (USA) 30 National Vocational Qualifications (NVQ) 212 National Youth Service (Seychelles) 109 New Zealand: ability grouping 72; assessment in Maori language 136-42; 231

Index criterion-referencing of assessment 120-3; national curriculum of 126-33; reform of examining and assessment 119-35; students without qualifications 119-20, 121 New Zealand School Certificate 136-7, 138 Nigeria: and continuous assessment 109-10, 115; Handbook on Continuous Assessment 109-10 Njabili, A.F. 107-8 Noah, H.J. 84-97, 104 numerus clausus (Germany) 90 Nuttall, D.L. 55, 143-8 Nwakoby, F.U. 110, 115 Olson, D. 69 Ontario Assessment Instrument Pool 126 optical scanners 25 oral examinations 85; in Germany 90; in Ireland 203 Oral Proficiency Interview (OPI) 137 Orton, R.J.J. 164 Oxenham, J. 170, 179 Papua New Guinea: and continuous assessment 106, 108 parental pressure in Malaysia 185, 187, 189 peer group motivation in Malaysia 184-5, 187, 189 Peil, M. 19 Pennycuick, D. 13, 106-18 perceptions about standards, improved 46-7 performance: affected by context 45 232

personal development in education in Malaysia 184, 187, 189 Peters, T.J. 36 physics papers in examinations in China 163-6 Piaget, J. 124 politics: intrusion into decision-making 33-4; papers in examinations in China 166-70 Postlethwaite, T.N. 75 Power, C. 40, 61, 122, 124-5 practical capabilities see vocational sector Primary School Pupil Assessment Project 35 productivity and education in Malaysia 177-8 professional associations and standards 49 Psacharopoulos, G. 98, 210 psychology and trends in assessment 24 public relations programmes and standards 49 pupils: lower-attaining and assessment research 203; self-assessment 204-5 Purves, A.C. 65 quota system for university in China 154, 174 racism in national curriculum of New Zealand 127 Radnor, H. 144, 145, 147 Raphael, D. 70 reading: literacy, IEA study of 67-9, 75-6; and standards 44-5 realism of students expectations 19-20

Index recruitment: criteria on in Malaysia 181 Renwick, L. 121, 126 rich descriptions in crossnational studies 73-4 Rosanowski, P. 136-42 Royal Society of Arts (RSA) 209, 211 School Innovations Programme (Australia) 48 school-based assessments in Anglophone Africa 102 school-based control (Germany) 90 schools: accountability in United States 28-9; curriculum and examinations in China 170-1; performance and examinations in China 172; and political agendas 33-4; public relations programmes and standards 49 Schools Examination and Assessment Council 57, 145 Scotland: Annual Report (SED) 121-2; assessment research in 200; vocational sector, assessment in 213 Second International Mathematics Study 65-6, 70-1, 74 Second International Science Study 67, 75 Secondary Examinations Council: Annual Report 121, 126; Coursework Assessment in GCSE 110 secondary schools: assessment system, Sri Lanka 15-17, 19; enrolment ratios in

Anglophone Africa 99 selection: criteria of and educational qualifications 177-8; by employers in Malaysia, criteria of 1813; by examination in Anglophone Africa 103-4; system for university entrance in China 154-5 semesters in American schools 70-1 setting see ability grouping sexism in national curriculum of New Zealand 127 Seychelles: and continuous assessment 108-9; System of Assessment Students 109 short answers in examinations in China 159, 161 Shuttleworth, Kay 95 simple answers, lure of in cross-national studies 74 Singh, J.S. 21, 177-98 skills: direct assessment of 26; in national curriculum of New Zealand 130-1; state assessment in United States 28 SLOG see study on learning orientations social context of examination system in Africa 98-100 social groups and assessment, role of 18 society and assessment, role of 18 Southern Examining Group 122, 125 Soviet Union: examinations in 92-3 spoken communication, assessment of in Maori

Index language 139 Spring, J. 36 Sri Lanka 9-11; and continuous assessment 106-7; Continuous Assessment for the Sri Lanka GCE 107; secondary school assessment system 15-17, 19 Stake, R. 36-7 standards: affected by context 44-5; agenda 3251; definitions 34-5; and examinations in China 174; monitoring of 61-3; and national curriculum of New Zealand 125-33; nature of 43; perceptions about, improved 46-7; processes for shaping 478; studies of 39-42 state control of examinations: in Anglophone Africa 101; in United States 86 states: assessment by in USA 27-8, 60; licensing examinations by (USA) 208; vocational assessment by in Germany 218 Stenhouse, Lawrence 44, 48, 123-5 structured questions in examinations in China 160 students: and assessment, role of 14-15, 18-19; groups of, assessing 28; individual, assessing 278; profiles of 121; without qualifications in New Zealand 119-20, 121 study on learning orientations (SLOG) 21, 234

181, 188; learning motivations study 183-4 subject-dominated curriculum 37 summative assessment 124 supporters, and assessment in Maori language 140 Swaziland: and continuous assessment 106; examination system in 98105 Sweden, examinations in 93-4 syllabus approval and GCSE 145 Tang, Qinhii 172 TANU The Musoma Resolution 107 Tanzania and continuous assessment 106-7 Task Group on Assessment and Testing 26, 30, 45, 120, 124-5 Tawney, D.A. 164 teachers: accreditation 125, assessment of 29-31; beginning, assessment of 29; China, examinations in 171; and computers in schools 25; and concern with examinations in Malaysia 180; and continuous assessment 115; and criterion-referencing of assessment 122-3; England, examinations in 95-6; and examinations 97; and GCSE 146-7; and NFER inquiry findings 205; participation and assessment research 202; and political agendas 33-4; role of assessment 14-15, 18-19; shortages in United States 26-7; Soviet Union,

Index examinations in 92; and standards in national curriculum of New Zealand 125; Sweden, examinations in 93-4; training in assessment in Maori language 139 technical and scientific character of assessment 23-7 technology and assessment 25 testing, definitions of 34-5 Third World, socioeconomic development 177 topic coverage in high schools 71 Training Commission 210 training in vocational sector in Germany 218-19 Tyler, R.W. 36 underachievement in England and Wales 55-8 unemployment and examinations in France 88 Unger, J. 154-5 uniformity of examinations in China 91, 97 United Kingdom see England and Wales United States: ability grouping 72; comprehensiveness of assessment 37; control of assessment 27-9; Eight Year Study (1930s) 48; examinations 86-7; national assessment in 53-64; studies of standards 39-40, 41-2; teachers, assessment of 29-31; technical and scientific character of

assessment 23-7; vocational sector, assessment in 208 university examinations in China 153-76; commentary on 170-5; context of 1535; entrance system 155-8; format of 158-60; papers, structure of 160-70 university places: legal entitlement to in Germany 90 validity of assessment 24, 37; and continuous assessment 111 vocational certification (England and Wales) 21112 vocational sector, assessment in: in England and Wales 209; France 215; Germany 217; of practical capabilities 204; Scotland 213; in United States 207-8 Wahlstrom, M. 67, 70, 72, 74 Walker, D.A. 65-6, 75 Wang, Lu 11, 153-76 Watson, D.J. 68 Weitz, J. 43 Wells, C.G. 68, 69 Weston, Penelope 199-206 Wilson, A.B. 179 Windham, D.M. 99, 100 Wolf, Alison 207-23 Wolfe, R. 67, 72, 74 Wood, R. 61, 124 work: behaviour and learning motivations in Malaysia 190-5; and learning motivation in Malaysia 177-98; orientation and learning 235

Index motivations in Malaysia 192-4 World Bank Education in Sub-Saharan Africa 98-9 writing: IEA study of 67-9 Xu, Hui 171-2


Youth Opportunities Programme (YOP) 211 Youth Training Scheme (YTS) 210-11 Zambia: examination system in 98-105