The Philosophical Limitations of Educational Assessment: Implications for Academic Selection 3031470206, 9783031470202



English Pages [165]


Table of contents :
Acknowledgements
About This Book
Contents
About the Author
List of Abbreviations
List of Figures
Chapter 1: Introduction
References
Chapter 2: Philosophical Tensions Associated with Educational Assessment
Introduction
Validity and Reliability in Educational Assessment
Existing Philosophical Perspectives on Validity and Reliability in Educational Assessment
Tensions Associated with One-off High Stakes Tests
Summary
References
Chapter 3: Implications of Educational Assessment Critique for Public Examinations
Introduction
Validity and Reliability of Public Examination Results
Covid-19: A Missed Opportunity for Reform of Public Examinations?
Summary
References
Chapter 4: Historical Evolution of Academic Selection
Introduction
Francis Galton and the Eugenics Movement
Cyril Burt’s Contribution to Academic Selection Policies
Controversy Surrounding Cyril Burt’s Work
Academic Selection in Northern Ireland
Academic Selection in Other International Contexts
Summary
References
Chapter 5: Consequences of Academic Selection
Introduction
Effectiveness of Grammar Schools
Academic Achievement at Post-primary Level
Students’ Trajectories Beyond Post-primary Education
International Evidence
Methodological Issues in Grammar School Effectiveness Research
Summary Comments
Social Composition of Schools, Social Mobility, and Educational Equity
Social Composition of Schools
Social Mobility
Educational Equity
Other Consequences of Academic Selection
Students’ Socio-emotional Outcomes
Curriculum Delivery
Social Integration and Cohesion
Summary
References
Chapter 6: Ethics of Academic Selection
Introduction
Perspectives on Educational Justice and Their Implications for Academic Selection
Epistemic Injustice and Epistemic Disadvantage
Epistemic Injustice
Epistemic Disadvantage
To What Extent Does Academic Selection Cause Epistemic Harm to Young People?
Summary
References
Chapter 7: Conclusion: Implications for Policy and Practice
Introduction
Towards a New Paradigm for Educational Assessment: Implications for High Stakes Public Examinations
The Future of Academic Selection
Concluding Remarks
References
Index

The Philosophical Limitations of Educational Assessment Implications for Academic Selection

Ian Cantley

Ian Cantley School of Social Sciences, Education and Social Work Queen’s University Belfast Belfast, UK

ISBN 978-3-031-47020-2    ISBN 978-3-031-47021-9 (eBook) https://doi.org/10.1007/978-3-031-47021-9 © The Editor(s) (if applicable) and The Author(s), under exclusive licence to Springer Nature Switzerland AG 2024 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Cover illustration: Melisa Hasan This Palgrave Macmillan imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland Paper in this product is recyclable.

Acknowledgements

This book was written during a period of sabbatical leave in the second semester of the 2022–2023 academic year. Although I have been working on areas allied to the philosophy of educational assessment for several years, I would have been unable to write this book without the dedicated time and space created by the sabbatical, which allowed me to focus on the research underpinning the arguments presented in the book. I am therefore extremely grateful to the School of Social Sciences, Education and Social Work at Queen’s University Belfast for granting me the sabbatical, and for providing the necessary resources and funding to cover my teaching and administrative duties when I was on sabbatical. I am particularly indebted to those colleagues who assisted with the PGCE Mathematics programme during the sabbatical. I would also like to thank the various individuals who read and offered constructive comments on early drafts of the manuscript. Finally, I wish to thank my family and friends for all their encouragement and support throughout the writing of the book.


About This Book

I use philosophical analysis to argue that there are tensions associated with using the results of high stakes tests to predict students’ future potential. The implications of these issues for the interpretation of test scores in general are then elucidated before I consider their connotations for academic selection. After a brief overview of the history of academic selection in the United Kingdom, and a review of evidence pertaining to its consequences, I suggest that the practice of using the results of contemporary high stakes tests to make important decisions about students incurs logical and moral problems that a conscientious educator cannot ignore. The gravity of the moral transgression depends on the purpose and significance of the test and, in the case of high stakes tests used for academic selection purposes, I argue that not only can the moral wrong be highly significant, but better solutions are within reach.



About the Author

Ian Cantley is Senior Lecturer in Education at Queen’s University Belfast, Northern Ireland. His research interests are in mathematics education and the mathematical and philosophical foundations of educational measurement models. He has published numerous articles in leading international journals on both the philosophy of education and mathematics education. His work is particularly concerned with the theoretical assumptions that underpin contemporary approaches to educational assessment, methods for improving students’ mathematical learning experiences at school, and gender equity issues in mathematics. Ian’s teaching responsibilities include contributions to the PGCE initial teacher education programme and taught masters’ programmes in education, and he supervises masters’ and doctoral-level dissertations on various aspects of education.


List of Abbreviations

AQE    Association for Quality Education
FSM    Free school meals
GCE    General Certificate of Education
GCSE   General Certificate of Secondary Education
IQ     Intelligence quotient
PISA   Programme for International Student Assessment
PPTC   Post-Primary Transfer Consortium
RCT    Randomised controlled trial
RDD    Regression discontinuity design
SEAG   Schools’ Entrance Assessment Group
SEN    Special educational needs
SES    Socioeconomic status


List of Figures

Fig. 2.1  Possible interpretations of the word “cube”
Fig. 2.2  L-shaped sequence of dots


CHAPTER 1

Introduction

Abstract  In this book, I explore the controversial world of high stakes educational testing by critically evaluating some of the philosophical assumptions upon which it is based and examining the potential ethical implications of any weaknesses in contemporary approaches to high stakes testing. Initially, I critically appraise the philosophical underpinnings of educational assessment and high stakes tests in general, before focusing on the implications of my analysis for one particular use of high stakes tests: academic selection for post-primary education. Whilst I draw extensively upon evidence pertaining to Northern Ireland, and the United Kingdom more generally, my analysis is likely to be relevant to other education systems around the world. This chapter provides important contextual information that is pertinent to the material addressed in the book.

Keywords  Academic selection • Educational assessment • High stakes tests • Reliability • Validity

In today’s world, academic success is more important than ever. With the increasing emphasis on obtaining educational credentials and the growing competition for jobs, students are under a lot of pressure to perform well at school. Throughout their school careers, the extent of students’ learning is assessed in a variety of ways, from informal monitoring of their

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 I. Cantley, The Philosophical Limitations of Educational Assessment, https://doi.org/10.1007/978-3-031-47021-9_1


progress by teachers to timed tests taken under examination conditions. The results obtained in some of these tests have significant import in determining the students’ future educational and vocational options, and such tests are commonly referred to as high stakes tests. While proponents of high stakes testing argue that these tests constitute a fair and objective way to measure academic potential, critics warn that they can exacerbate inequality and lead to a narrow approach to education that is inappropriately focused on preparation for tests and examinations. In this book, I explore the controversial world of high stakes testing by critically evaluating some of the philosophical assumptions upon which it is based and examining the potential ethical implications of any weaknesses in contemporary approaches to high stakes testing. Initially, I critically appraise the philosophical underpinnings of educational assessment and high stakes tests in general, before focusing on the implications of my analysis for one particular use of high stakes tests: academic selection for post-primary education. Whilst I draw extensively upon evidence pertaining to Northern Ireland, and the United Kingdom more generally, my analysis is likely to be relevant to other education systems around the world. To assure their fairness, accuracy, and trustworthiness, developers of high stakes tests are expected to ensure the tests are both valid and reliable. Validity is viewed as a fundamental requirement for high stakes tests since it is associated with the degree to which the inferences made from test scores about students’ capabilities are warranted. Validity may be defined as “an overall evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions on the basis of test scores” (Messick, 1995, p. 741). 
It indicates the credence that can be given to inferences about students’ capabilities based on test performance. Reliability, on the other hand, refers to the dependability or consistency of the results provided by a test, or the extent to which the test would yield the same or highly similar results if repeated under similar conditions. Reliability is therefore an essential aspect of validity since unreliable scores cannot support valid inferences about students’ capabilities, at least at the individual student level (Cizek, 2009). However, various philosophers of education have offered critical perspectives on validity and reliability of high stakes tests, including the possible tensions between the two concepts in the context of tests of students’ skills and capabilities (Davis, 1995, 1998). Considerations of validity and reliability have important implications for those who condone the use of high stakes tests to select students for


educational or vocational opportunities, and they are an important focus of my analysis in the current work.

Most education systems around the world utilise different forms of selection to allocate places to students based on their performance in such things as formal academic examinations, aptitude tests or interviews. The use of such selection mechanisms is claimed to offer equitable, impartial, and meritocratic approaches to the allocation of student places when there is competition for access to scarce educational opportunities, such as higher education courses or vocational training programmes (Kellaghan & Greaney, 2020). However, in some countries, such as Northern Ireland, primary school students are selected based on academic capability to attend different types of post-primary schools. In the case of Northern Ireland, students can take a high stakes test of their academic capability, commonly referred to as a “transfer test”, towards the end of their primary school career, usually at 10 or 11 years of age. Students’ performance in this test is used to determine their eligibility for admission to academically oriented grammar schools, and those students who attain the highest scores in the test can choose to attend either a grammar school or a non-selective post-primary school. In contrast, students who attain lower scores in the test, or who do not sit it, are denied admission to a grammar school, and are normally compelled to attend a non-selective post-primary school. Grammar schools are usually viewed as being synonymous with high performance and academic success, while non-selective schools are often deemed to cater for students with less academic prowess (Brown et al., 2021). Therefore, this leads to a situation whereby post-primary students in Northern Ireland are segregated into different types of schools according to their academic capability, as measured by a high stakes test.
The current system of academic selection in Northern Ireland can be traced to the Education Act (Northern Ireland) 1947, which followed on from the 1944 Education Act in England and Wales and led to free post-primary education for all children. As in England and Wales, grammar schools had existed in Northern Ireland prior to 1947 (1944 in England and Wales), but students were admitted to them on a fee-paying basis rather than on the basis of academic capability (Gardner, 2016). However, the 1947 Education Act led to a situation where students gained access to grammar schools based on their performance in a test of their academic capability, and a system of non-selective post-primary schools was introduced to educate those who did not secure a grammar school place. A third option, for vocational education in technical schools/colleges, was


also introduced as a consequence of the 1947 Education Act, but it did not thrive and eventually disappeared (Gallagher, 2021). Standardised testing of cognitive capabilities originated around 2200 B.C. in China, where candidates for Chinese civil service positions were given tests of their capabilities in such diverse domains as music, archery, writing and arithmetic (Miyazaki, 1981). However, in Europe and America, standardised testing of mental capabilities did not begin until the nineteenth century. In the latter part of the nineteenth century, psychologists in both Europe and America independently sought mechanisms for measuring individual differences in mental capability. The English polymath Sir Francis Galton and the American psychologist James McKeen Cattell were pioneers of the use of intelligence tests to quantify mental capabilities. Galton held the view that mental capabilities are largely inherited, and he is the founding father of eugenics, which is associated with the study of methods for improving the human race by increasing the incidence of desirable heritable characteristics, such as high levels of mental capability. Galton’s enthralment with eugenics was also embraced by his disciple Sir Cyril Burt, who advocated for the widespread use of intelligence tests to classify and label schoolchildren from an early age (Chitty, 2013). When Alfred Binet devised and published the first intelligence quotient (IQ) test in France in 1905, it was solely intended to be used to identify those students who required additional support with their learning, rather than as a mechanism for measuring and ranking all children based on their intelligence. 
Indeed, Binet even cautioned against using IQ as a general measure of intelligence when he claimed: “The scale, properly speaking, does not permit the measure of intelligence, because intellectual qualities are not superposable, and therefore cannot be measured as linear surfaces are measured” (cited in Gould, 1996, p. 181). Binet considered intelligence to be too complex and multi-faceted to be captured by a single number. Nevertheless, Cyril Burt worked relentlessly to ensure that his own ideas pertaining to the innateness and measurability of intelligence were incorporated into British government policy, and his work was instrumental in preparing the ground for the grammar school academic selection process that was heralded by the 1944 and 1947 Education Acts. Burt’s controversial ideas pertaining to fixed capability based on inheritance have been discredited, and the policy of academic selection using tests of cognitive capabilities was gradually eroded in much of the United Kingdom during the 1960s and 1970s, with a transition to mixed-ability


post-primary education in comprehensive schools (Chitty, 2009). However, academic selection persists in Northern Ireland and some parts of England, despite the evidence contravening its underpinning philosophy that intelligence is innate, immutable, and measurable at a young age.

Proponents of academic selection posit that the use of academic tests to select students for grammar schools offers a fair and meritocratic method for educating young people, which is unfettered by social class, and consequently acts as a vehicle for social mobility (Gallagher, 2021). According to Gorard and Siddiqui (2018), those who favour academic selection argue that students generally perform better at grammar schools than at non-selective schools, with the most socioeconomically disadvantaged students faring better at grammar schools than if they had attended non-selective schools. Thus, advocates of academic selection contend that it reduces the poverty attainment gap and helps to promote social mobility. However, in an analysis of the full 2015 GCSE (public examinations taken at approximately 16 years of age) cohort of students in England, Gorard and Siddiqui (2018) found no evidence to support this claim. Furthermore, several tensions pertaining to academic selection have been noted in the literature. Based on research conducted in Northern Ireland, Gardner and Cowan (2005) suggested that the tests used to operationalise academic selection in that era may have had deficiencies in their psychometric properties which compromised their validity. They highlighted that these shortcomings meant the candidate ranking system used in the Northern Ireland selection tests may have potentially misclassified up to two-thirds of the candidates by up to three grades.
Moreover, the assumption that academic selection tests can be used to measure ability is placed under tension because ability is not a unidimensional psychological construct that can be adequately described by a single score on a test. Gardner’s (1983) theory of multiple intelligences supports the viewpoint that ability is a multidimensional rather than a unidimensional construct. Indeed, even if a test could be constructed to provide a robust measure of an appropriate construct, the construct may change over time rather than remaining fixed (Moffitt et al., 1993). Students who fail to secure a grammar school place based on their performance in a selection test may experience a loss of confidence and self-esteem associated with a sense of failure, thus requiring intervention from post-primary teachers to rebuild confidence (Gallagher & Smith, 2000). Preparation for academic selection tests in primary schools also has the potential to disrupt teaching and learning in the primary phase, which may negatively impact upon the holistic


development of students and exacerbate learning continuity issues during the transition to post-primary education (Gallagher & Smith, 2000). A major drawback of academic selection is that it is socially divisive, and potentially compounds the effects of educational disadvantage. For example, Gorard et al. (2003) and Jerrim and Sims (2019) demonstrated that grammar school students generally tend to be elite both socially and academically. However, within selective education systems, this leads to a large number of non-selective schools with low ability students from socially disadvantaged backgrounds, which compounds the educational disadvantages associated with both factors (Gallagher & Smith, 2000).

Whilst there is a significant body of empirical evidence pertaining to the deleterious consequences of academic selection, there is rather less by way of fundamental theoretical and philosophical analyses of the tensions associated with such practices. One of the aims of the current book is to address this imbalance in the literature, with the overall goal of highlighting philosophical tensions associated with educational assessment in general and presenting a novel philosophical critique of academic selection. The specific objectives of the book are:

(i) To highlight some philosophical tensions associated with educational assessment, and to outline the implications of these tensions for high stakes public examinations.
(ii) To give a historical overview of the evolution of the use of testing as a technology for effecting academic selection, with a focus on the United Kingdom, and Northern Ireland in particular.
(iii) To critically review the empirical evidence pertaining to the consequences of academic selection.
(iv) To offer a novel philosophical critique of the ethics of academic selection, which draws upon aspects of Ludwig Wittgenstein’s later philosophy and the concepts of epistemic injustice and epistemic disadvantage from social epistemology.
(v) To make recommendations for policy and practice in relation to high stakes testing in general and academic selection.

Chapters 2 and 3 pertain to the philosophical limitations of educational assessment in general. After a review of some existing philosophical perspectives on the validity and reliability of educational assessment, Chap. 2 draws upon Ludwig Wittgenstein’s analysis of rule following in his later philosophy to argue that there are conceptual difficulties pertaining to validity and reliability in the context of educational assessment. In


particular, I articulate a philosophical rationale for the difficulties associated with reliably assessing higher level cognitive skills, such as extended writing tasks. The implications of my critique of educational assessment for the results of high stakes public examinations are then elucidated in Chap. 3. Beyond Chap. 3, I focus more specifically on academic selection. Chapter 4 provides a historical overview of the evolution of selection tests as a mechanism for operationalising academic selection in different education systems around the world. The chapter uses the United Kingdom, with a particular focus on Northern Ireland, as an in-depth case study, but the relevance to other international contexts is also articulated. Chapter 5 draws upon relevant empirical research evidence to provide a critical analysis of the consequences of academic selection. Chapter 6 initially reviews some relevant philosophical perspectives on the ethics of academic selection. This is followed by a summary of Miranda Fricker’s and David Coady’s work on epistemic injustice, which refers to the way in which an individual can be wronged in their capacity as a knower, and consequently be treated unfairly in relation to access to epistemic goods such as education. The concept of epistemic disadvantage, which refers to the intellectual or moral harms that can occur when an individual’s exclusion from knowledge exchanges is warranted, is also introduced. In the remainder of Chap. 6, I analyse the extent to which various aspects of academic selection, including the tensions associated with high stakes tests that I articulate in Chaps. 2 and 3, can potentially lead to either epistemic disadvantage or epistemic injustice for some students. Chapter 7 summarises the arguments and articulates the policy and practice implications of the analysis presented in the earlier chapters. 
Initially, several recommendations are made to address the critique of educational assessment that is advanced in the book, for example, in relation to the results of high stakes public examinations, before focusing on the implications for academic selection. More specifically, in the concluding chapter, I argue that contemporary approaches to high stakes tests could potentially be improved by a transition to a more distributed mode of continuous assessment that attaches greater importance to formative assessment of students’ learning as a vehicle for improving their future educational outcomes. I also make the point that it is problematic to use the results from any form of academic achievement assessment in isolation to predict future outcomes. This is because such assessments tend to place limited focus on wider capabilities, such as intrapersonal and interpersonal skills, which also have an important bearing on future outcomes (Heckman & Kautz, 2012; Parts et al., 2013; Stal &


Paliwoda-Pękosz, 2019). In relation to academic selection, I argue that those education systems which currently condone the practice should abandon it in favour of more socially just approaches to educating children and young people. I advocate the replacement of academic selection by all-ability comprehensive post-primary education, allied with the use of socially differentiating selection schemes based on what I term “academic banding”. I also argue in favour of mixed-ability teaching where possible within the context of comprehensive education. Furthermore, I suggest that, in the longer term, due consideration should be given to the potential of information and communications technology solutions to deliver personalised curricula and learning experiences (including associated assessment arrangements) for young people. Recent advances in artificial intelligence may pave the way to this solution rather than the alternative option of laying down specific pathways within which young people must fit.

References

Brown, M., Donnelly, C., Shevlin, P., Skerritt, C., McNamara, G., & O’Hara, J. (2021). The rise and fall and rise of academic selection: The case of Northern Ireland. Irish Studies in International Affairs, 32(2), 477–498. https://doi.org/10.1353/isia.2021.0060
Chitty, C. (2009). Eugenics, race and intelligence in education. Continuum.
Chitty, C. (2013). The educational legacy of Francis Galton. History of Education, 42(3), 350–364. https://doi.org/10.1080/0046760x.2013.795619
Cizek, G. J. (2009). Reliability and validity of information about student achievement: Comparing large-scale and classroom testing contexts. Theory Into Practice, 48(1), 63–71. https://doi.org/10.1080/00405840802577627
Davis, A. (1995). Criterion-referenced assessment and the development of knowledge and understanding. Journal of Philosophy of Education, 29(1), 3–21. https://doi.org/10.1111/j.1467-9752.1995.tb00337.x
Davis, A. (1998). The limits of educational assessment. Blackwell.
Gallagher, T. (2021). A problem of policy paralysis: A response to “the rise and fall and rise of academic selection: The case of Northern Ireland” by Martin Brown et al. Irish Studies in International Affairs, 32(2), 503–506. https://doi.org/10.1353/isia.2021.0067
Gallagher, T., & Smith, A. (2000). The effects of the selective system of secondary education in Northern Ireland: Main report. Department of Education for Northern Ireland. https://www.education-ni.gov.uk/sites/default/files/publications/de/gallagherandsmith-mainreport.pdf
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. Basic Books.
Gardner, J. (2016). Education in Northern Ireland since the Good Friday Agreement: Kabuki theatre meets danse macabre. Oxford Review of Education, 42(3), 346–361. https://doi.org/10.1080/03054985.2016.1184869
Gardner, J., & Cowan, P. (2005). The fallibility of high stakes “11-plus” testing in Northern Ireland. Assessment in Education: Principles, Policy & Practice, 12(2), 145–165. https://doi.org/10.1080/09695940500143837
Gorard, S., & Siddiqui, N. (2018). Grammar schools in England: A new analysis of social segregation and academic outcomes. British Journal of Sociology of Education, 39(7), 909–924. https://doi.org/10.1080/01425692.2018.1443432
Gorard, S., Taylor, C., & Fitz, J. (2003). Schools, markets and choice policies. Routledge Falmer.
Gould, S. J. (1996). The mismeasure of man. W. W. Norton.
Heckman, J. J., & Kautz, T. (2012). Hard evidence on soft skills. Labour Economics, 19(4), 451–464. https://doi.org/10.1016/j.labeco.2012.05.014
Jerrim, J., & Sims, S. (2019). Why do so few low- and middle-income children attend a grammar school? New evidence from the Millennium Cohort Study. British Educational Research Journal, 45(3), 425–457. https://doi.org/10.1002/berj.3502
Kellaghan, T., & Greaney, V. (2020). Public examinations examined. World Bank. https://openknowledge.worldbank.org/bitstream/handle/10986/32352/9781464814181.pdf?sequence=2&isAllowed=y
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749. https://doi.org/10.1037/0003-066x.50.9.741
Miyazaki, I. (1981). China’s examination hell: The civil service examinations of Imperial China. Yale University Press.
Moffitt, T. E., Caspi, A., Harkness, A. R., & Silva, P. A. (1993). The natural history of change to intellectual performance: Who changes? How much? Is it meaningful? Journal of Child Psychology and Psychiatry, 34(4), 455–506. https://doi.org/10.1111/j.1469-7610.1993.tb01031.x
Parts, V., Teichmann, M., & Rüütmann, T. (2013). Would engineers need non-technical skills or non-technical competences or both? International Journal of Engineering Pedagogy, 3(2), 14–19. https://doi.org/10.3991/ijep.v3i2.2405
Stal, J., & Paliwoda-Pękosz, G. (2019). Fostering development of soft skills in ICT curricula: A case of a transition economy. Information Technology for Development, 25(2), 250–274. https://doi.org/10.1080/02681102.2018.1454879

CHAPTER 2

Philosophical Tensions Associated with Educational Assessment

Abstract In this chapter, I review some existing philosophical perspectives on educational assessment, with a focus on critically evaluating the views of various philosophers of education on the concepts of validity and reliability. I then draw upon Ludwig Wittgenstein’s analysis of rule following in his later philosophy to argue that there are philosophical tensions associated with using one-off high stakes tests to assess students’ academic capabilities and to predict future outcomes. I also make the point that these issues are exacerbated by the difficulties pertaining to the reliable assessment of higher level cognitive skills. Finally, I suggest that some of these philosophical tensions could potentially be alleviated by replacing one-off high stakes tests with a temporally extended system of continuous assessment, which could offer the prospect of fairer inferences being made about students’ academic capabilities. However, I also caution against relying solely on the results of assessments of academic capabilities to predict students’ future outcomes.
Keywords Educational assessment • High stakes tests • Reliability • Validity • Wittgensteinian critique

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 I. Cantley, The Philosophical Limitations of Educational Assessment, https://doi.org/10.1007/978-3-031-47021-9_2


Introduction
There are several tensions associated with educational assessment and what can be successfully measured using common assessment methods. From a philosophical perspective, assessment raises questions about what we can come to know about the minds of other people, and the approaches that can be used to garner such knowledge. For example, common sense may dictate that an individual’s behaviour, such as what they write in response to an item on a test, is determined by the contents of their mind. Surely then, if the individual’s mind is populated by specific mental entities which are causally related to their test-taking behaviour, it is simply a matter of constructing appropriate and robust assessment instruments to reveal the contents of their mental realm? This is an issue that has exercised various philosophers of education and, alas, it transpires that the reality is more complex than this enticing model of educational assessment suggests. In addition to various philosophical analyses, educational assessment has been the subject of a substantial body of empirical research over many years. This research has investigated a multitude of issues pertaining to assessment including, for example, the consequences of high stakes examinations. Such examinations have traditionally been viewed as a fair and meritocratic approach to assessing students’ capabilities, and a vehicle for opening up opportunities to those who may be disadvantaged by an alternative assessment approach. However, some empirical research has revealed that high stakes examinations tend to distort the teaching and learning process in classrooms through such things as teachers “teaching to the test”, and to potentially compromise the quality of students’ learning, so that the measurement process (adversely) influences the attribute that is being measured (Kellaghan & Greaney, 2020).
Overall, educational assessment has been the subject of significantly more empirical research than philosophical analyses. This chapter seeks to address the relative imbalance in the literature by contributing some philosophical insights into assessment. Initially, the concepts of validity and reliability in educational assessment are reviewed before some existing philosophical investigations into validity and reliability are summarised. Following this, a novel philosophical critique of educational assessment is presented, which draws upon Wittgenstein’s analysis of rule-following in his later philosophy and problematises contemporary approaches to high stakes educational assessment.


Validity and Reliability in Educational Assessment
The validity of a test refers to the extent to which the inferences made from test scores about test takers’ capabilities are warranted. Traditionally, in the context of educational assessment, validity was defined as the degree to which a test measures the attribute it was devised to measure. However, a myriad of different ways of conceptualising validity has been theorised and discussed in the assessment literature, culminating in the contemporary view that validity is a property of the inferences made about students’ capabilities based on assessment outcomes. One of the most prominent conceptualisations of validity is attributable to Samuel Messick, who subscribed to the contemporary definition of validity supported by most psychometricians (Messick, 1989, 1995). More specifically, Messick (1989, 1995) argued that validity should always be assessed using multiple sources of evidence, based on both scientific and ethical considerations. He favoured a unified theory of construct validity that incorporates the meaning of test scores based on evidence of the technical fidelity of tests. Such evidence permits conclusions to be drawn about how well a test has been constructed and the inferences that can be drawn from a test score about the construct(s) being tested, including the degree to which the test samples the relevant domain of interest. An evaluation of this depends upon the judgements of potential test users and experts in the domain being tested. Messick also stressed that consideration should be given to the extent to which a test score can be used to make predictions and/or decisions about test-takers, including how well the test score predicts future performance and the extent to which it correlates with other existing measures.
Messick’s focus on the ethical dimension of testing in his unified conception of construct validity is reflected in the importance he attaches to the social consequences of interpreting and using test scores to make judgements about those who take the test. Messick (1995) argued that an evaluation of social consequences should be based on evidence of both short and long-term positive and negative consequences, with a view to minimising the adverse consequences. Positive consequences, for example, may include the possibility of accessing improved educational opportunities, while negative consequences could relate to biased test scores or unfair use of the test outcomes in making judgements about the test-takers. More importantly, Messick argued that social consequences were part of validity only to the extent
that adverse social consequences could be traced to inadequacies in the assessment. Despite its widespread influence, several scholars have challenged Messick’s conceptualisation of validity. For example, Cizek (2012) argued that Messick’s notion of validity is incoherent because it attempts to encapsulate both the meaning of a test score and its use into a single concept. Cizek (2012) averred that such an integration of scientific and ethical dimensions was problematic because of their mutual incompatibility. Consequently, inconsistencies have emerged between theoretical conceptualisations of validity and validation practices. Nevertheless, it is important to note that the dominant contemporary conceptualisation of validity presupposes that it is possible to abstract a test score away from the measuring instrument, that is, the test paper, and use it to make inferences about the test-taker. The reliability of an educational assessment pertains to its consistency in producing similar outcomes in the same, or not significantly different, circumstances. If, for example, a test produces inconsistent results when administered under broadly similar conditions, it would make little sense to consider the accuracy of the test scores. Although the test may yield accurate scores on some occasions, it would be misleading to argue that the test possesses a high level of validity. Therefore, it would be indefensible to stake a claim to validity in the absence of reliability for a given educational assessment, thus demonstrating that the concepts of validity and reliability are intimately entwined. Cizek (2009) outlines various methods for assessing the reliability of a test. 
These include administering the same test on two separate occasions, which is referred to as the test-retest procedure, or administering parallel forms of the test that have the same basic structure and level of difficulty, but slight variations in the actual test items (such as identical arithmetical problems but with different numerical values). To obviate the need for separate test administrations, the reliability of a test is often judged by considering its internal consistency, whereby a statistical calculation is performed to quantify the correlations between students’ scores in groups of related test items. It is evident that the twin concepts of validity and reliability play pivotal roles in the educational assessment arena. It is therefore instructive to consider a number of philosophical perspectives on these concepts, and relevant literature in this area is critically reviewed in the next section.
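
The two approaches to estimating reliability described above can be illustrated numerically. The following Python sketch is offered tentatively rather than as a definitive psychometric procedure. It assumes that test-retest reliability is summarised by the Pearson correlation between scores from two sittings, and that internal consistency is summarised by Cronbach’s alpha, a statistic that the discussion above does not name but which is one common choice. The function names and all of the scores are hypothetical and have been invented purely for illustration.

```python
from statistics import mean, variance


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of item marks per student."""
    k = len(item_scores[0])                          # number of items
    items = list(zip(*item_scores))                  # regroup marks by item
    item_var = sum(variance(col) for col in items)   # sum of item variances
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var / total_var)


# Hypothetical total scores for five students on two sittings of one test.
first_sitting = [12, 15, 9, 18, 14]
second_sitting = [13, 14, 10, 17, 15]
test_retest_r = pearson(first_sitting, second_sitting)

# Hypothetical marks (0 or 1) on four related items for the same students.
responses = [
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
alpha = cronbach_alpha(responses)

print(f"test-retest correlation: {test_retest_r:.3f}")
print(f"Cronbach's alpha: {alpha:.3f}")
```

Values of either statistic closer to 1 would indicate more consistent scores. The sketch says nothing, of course, about whether the inferences drawn from such scores are warranted, which is the question of validity.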


Existing Philosophical Perspectives on Validity and Reliability in Educational Assessment
One of the most notable philosophical critiques of educational assessment to have appeared in the literature is that of Andrew Davis. In a detailed analysis of the philosophical foundations of assessment, Davis (1998) argues that tests cannot reveal the detailed information about students’ knowledge that is assumed by proponents of high stakes testing. After a rigorous analysis, where he draws on philosophical ideas pertaining to the nature of knowledge, Davis (1998) casts doubt on the premises upon which the traditional notion of validity in educational assessment was based, that is, that validity is the extent to which a test measures the construct it was designed to measure:

[A]spirations to “discover” in detail the contents of a pupil’s mind are based on an illusion. The illusion is that behind behaviour are minds populated by specific and identifiable beliefs, giving rise to the idea that if only we could probe effectively enough we could find out what is there. The reality is a more complex and elusive situation, in which interpretations are made of the mind-states of others. These interpretations require many assumptions which it would be difficult to make wholly explicit. (p. 78)

However, the aspirations implicit in this quotation are at variance with modern conceptualisations of validity as relating to properties of the inferences that can be made from test scores, rather than the extent to which a test measures what it was designed to measure. Initially, Davis (1998) invokes arguments pertaining to holism to reason that students’ learning cannot be construed as discrete entities that can be measured by a test. Davis introduces the concept of property holism by explaining that properties of objects are interconnected with other properties of those objects so that, for example, if an object has shape, it must also have size, and vice versa. This leads to the conclusion that properties occur in interconnected groups. Davis proceeds to caution against the temptation to refer to discrete properties in the same way as discrete physical objects can be referred to. He argues that, when a reference to discrete properties is attempted, this really signifies different characteristics of a more universal scenario. The thrust of his subsequent arguments amounts to the conclusion that, whilst a test may give crude insights into what a student has learned, the
insights cannot be sufficiently detailed to comprehensively summarise the extent of the student’s learning because of its interconnections to a rich web of other knowledge and understanding. In a similar vein, Davis (1998) uses detailed philosophical analysis to reason that, although intentional states are attributed to a person based on their speech and behaviour, those states should not be construed as discrete entities that can be identified. Davis (1998) cautions that an individual’s behaviour needs to be interpreted to make inferences about their intentional states and, although certain interpretations are more applicable than the alternatives, such interpretations are only weakly correlated with corresponding cognitive states. In Davis’s view, the observed behaviour could align with a multitude of different interpretations. Furthermore, he posits that the criteria for assessing the appropriateness of a particular interpretation are predicated on the consensus opinion of the relevant adult community pertaining to the way they interpret other people’s behaviour within their particular cultural context. In respect of educational assessment, this leads to the sceptical conclusion that, irrespective of how a student responds to a test item, for example, there is inherent uncertainty about their associated cognitive state. In other words, the attribution of a particular level of knowledge to the student based on their performance in a test will never be objectively correct, which, if true, would appear to strike a hammer blow to educational assessment. However, most contemporary psychometricians would dismiss these concerns because they do not undermine the possibility of making appropriate inferences about an individual based on their overall test performance. 
Davis (1998) also problematises the assessment of higher level cognitive abilities, such as critical evaluation skills, by arguing that the identification of such mental attributes is predicated upon the capacity of assessors to make judgements about the similarity of students’ performances in different contexts. Davis argues that such judgements require a “theory” regarding what constitutes similarity, which he deems to be problematic in the case of cognitive performance. Unlike physical properties, such as the malleability of a metal for example, which possess underlying characteristics that explain their tendencies to act in a certain way in a range of different contexts, Davis posits that no such characteristics exist to permit similarity judgements about cognitive performances in different contexts. According to Davis (1998), actions are inextricably linked to features of the social context in which they occur. Therefore, in a similar manner to his dismissal of specific mental entities underlying units of knowledge, Davis (1998) implies that the absence of an adequate theory of similarity for cognitive performances across different social contexts undermines the plausibility of specific mental constructs underpinning higher level cognitive abilities or skills. Davis’s arguments in this respect are somewhat unconvincing, although I offer some philosophical support for his thesis later in the current chapter.

Davis (1998) argues that it is impossible to construct tests with sufficient validity and reliability to assess rich knowledge and skills, that is, the level of understanding that is required to use and apply the knowledge and skills rather than simply recalling them. Davis invokes Wittgenstein’s (2009) analysis of rule-following to reason that, due to the multiplicity of interpretations an assessor may attach to the requirements of a particular test item, and what constitutes a credit-worthy answer, a consistent approach to assessment mandates the tight prescription of appropriate responses to the item. Davis (1998) concludes that such prescription has negative connotations for the design of valid tests of rich knowledge since, to ensure consistency in marking, the assessment of student achievements will tend to be skewed towards basic propositional (or thin) knowledge. Whilst this may be the case in some disciplines, it is less applicable to some aspects of mathematics, where students can be set novel closed problems that they have never encountered before. Such problems have unique correct answers, which, at least in theory, could be reliably judged as either correct or incorrect. Nevertheless, I generally share Davis’s concerns, but I contend that there are philosophical tensions associated with the construction of valid tests of basic propositional or “thin” knowledge (“knowing that”) as well as valid tests of “rich” knowledge (“knowing how”), as illustrated by my arguments later in this chapter.
Davis’s critique of the theoretical basis of educational assessment has elicited a mixed response from other philosophers of education. For example, White (1999) concurs with Davis’s concerns about the incompatibility of validity and reliability in the assessment of rich knowledge and understanding. However, this suggests that he does not appreciate that reliability is an essential aspect of validity by virtue of the fact that unreliable scores cannot support valid inferences about students’ capabilities. Furthermore, White (1999) is of the view that it is impossible to test rich knowledge and understanding by breaking it up into a series of assessments of corresponding elements of thin knowledge. He legitimises his support for this sensible position by drawing parallels with Passmore’s (1980) claim that an open capacity is not replaceable by a set of closed
capacities. To make robust judgements about rich knowledge and skills, White (1999) posits that it is necessary to have some personal acquaintance with learners, to ensure an appropriate level of understanding of their mental schema, including the logical interconnections between different constituent elements of the schema, in addition to a good comprehension of how the learners operate in general. Based on this somewhat bizarre stance, White (1999) concludes that a system of teacher monitoring of student performance would be a preferable option to high stakes tests, but he provides limited details in relation to how the validity and reliability of teachers’ assessment judgements could be assured. The system of teacher assessment envisaged by White (1999) is strongly criticised by Gingell and Winch (2000). They quite correctly argue that, in asserting the necessity to have personal knowledge of students to assess their performance robustly, White (1999) is confusing personal and professional knowledge, and adopting a naïve view of teachers’ roles, while simultaneously ignoring the need for objectivity in assessment judgements. According to Gingell and Winch (2000), a system of teacher assessment would be fraught with potential difficulties pertaining to teacher bias, which relevant in-service training would be unlikely to ameliorate, and conflicts of interest in situations where teachers’ pay is linked to student performance. These concerns are well founded since several empirical research studies have demonstrated the possibility of inherent student gender, ethnicity, socioeconomic background, special educational needs status, and personality trait biases in teacher assessments of student performance (Johnson, 2013; Ready & Wright, 2011; Reeves et  al., 2001; Wyatt-Smith et al., 2010). 
Gingell and Winch (2000) also dispute Davis’s (1998) conception of rich knowledge, and they suggest it appears to be based on an amalgam of different positions, including adults’ understanding of concepts, the plasticity of cognitive abilities, and critiques of naïve empiricism. They proceed to argue that, although a test may assess partial understanding of a particular domain, which is relevant to the stage of development of the students taking the test, this does not negate the value of the assessment. It may not be practical or desirable to assess the type of rich knowledge envisaged by Davis. For example, when teachers are assessing the ability of early-stage primary school children to add numbers, it would not be sensible, or even possible, to test the children’s ability to add all possible types of numbers, including fractions, in all possible practical contexts. Nevertheless, useful age-specific information can be gleaned about the children’s grasp of the concept of addition by, for example, asking them to calculate sums of single- or double-digit positive integers. Winch and Gingell (1996) therefore aver that, whilst tests of rich knowledge, in the sense implied by Davis, may not be feasible, it is nevertheless possible to formulate high stakes tests with sufficient validity and reliability for educational accountability purposes.

Several other scholars, such as Curren (2004, 2006) and Elgin (2004), have argued that Davis’s critique of the philosophical foundations of educational assessment is fraught with difficulties that undermine the plausibility of his concerns about the viability of high stakes testing of the most significant aspects of learning. For example, Curren (2006) contends that the thrust of Davis’s arguments would undermine not only high stakes testing, but also every aspect of human life that entails construing people as individuals with “specific, identifiable mental contents” (p. 19). Given that it is possible to become acquainted with the beliefs of others, Curren (2006) contends that Davis’s sceptical arguments to the contrary are fundamentally flawed and thus do not pose a threat to the high stakes testing enterprise. Furthermore, Curren (2006) draws upon the work of Donald Davidson and Daniel Dennett to make the point that, although Davis rejects the possibility of objectively assessing mental constructs underpinning students’ test-taking behaviour, cognitive states can in fact be attributed to an individual without any knowledge of their physical basis. Despite these challenges to Davis’s ideas, he remains resolute in his misgivings about educational assessment, as illustrated by the arguments he put forward in Educational Assessment on Trial (Davis et al., 2015), which includes updated contributions to the assessment debate by both Davis and Winch, together with an introduction and conclusion by Gerard Lum.
In this book, Davis reiterates his concerns about the possibility of contemporary approaches to assessment being able to measure students’ rich knowledge and problem-solving skills accurately and reliably, and he therefore reasons that it is problematic to use assessment outcomes for school accountability purposes. In response to Davis, Winch emphasises the importance of educational institutions being held accountable for the quality of their provision, and he intimates that accountability measures need to include some form of assessment of educational outcomes. Winch argues that all educational assessments have associated margins of error, but they should not be dismissed out of hand simply because they are inexact. However, he acknowledges that permitting teachers to undertake formative, classroom-based assessments of students’ knowledge and skills may provide a more robust basis for gauging students’ capabilities than contemporary high stakes tests. Winch also challenges the status quo in the educational assessment arena by claiming that greater attention needs to be given to the actual purpose of the assessment (Davis et al., 2015). This is an extremely important point that appears to have been overlooked in policymaking circles since it is highly unlikely that a given assessment instrument would be able to adequately capture all possible dimensions of an educational endeavour. For example, it is unlikely that a single educational measure would be suitable for accurately and reliably summarising the extent of students’ capabilities, while simultaneously furnishing robust information that could be used for school accountability purposes, yet such dual purposes of assessment are implicit in the policies of many educational administrations.

It is evident that Davis’s ideas about the limits of educational assessment are, at best, considered to be controversial and, at worst, fundamentally flawed. However, I share Davis’s concerns and, in fact, believe they should be further strengthened to highlight the philosophical tensions associated with constructing valid and reliable tests of both basic propositional knowledge and rich knowledge. Accordingly, in the following section, I use relevant insights from Wittgenstein’s later philosophy to problematise educational assessment practices, and to argue that there are conceptual difficulties associated with contemporary approaches to high stakes educational testing. Initially, the arguments in the next section use the rule-following aspect of Wittgenstein’s later philosophy to demonstrate the tensions associated with using a hypothetical high stakes test to measure students’ understanding of mathematical concepts, but the analysis could be generalised to other disciplines, as illustrated by the later literacy-related examples.

Tensions Associated with One-off High Stakes Tests
Wittgenstein’s (2009) work on rule following implies that philosophical conundrums arise if a student’s academic capability is considered to be determined by a finite mental entity, like an image or a formula, for example. This, in turn, leads to further dilemmas if a one-off high stakes test is considered to provide an appropriate method for assessing academic capability, as outlined below. Bruner (1996) intimated that learning entails developing the ability to use rules to “go beyond the information given” (p. 129). However, Wittgenstein (2009) contends that, if a student’s
capacity to follow a rule originates from a finite mental entity, then a paradoxical situation arises. According to Wittgenstein, if a mental entity such as an image or formula governs the student’s capacity to obey the rule, any action by the student could be considered to either accord with or contravene the rule if an appropriate interpretation is attached to the entity:

This was our paradox: no course of action could be determined by a rule, because every course of action can be brought into accord with the rule. The answer was: if every course of action can be brought into accord with the rule, then it can also be brought into conflict with it. And so there would be neither accord nor conflict here. (Wittgenstein, 2009, §201)

For example, suppose that an item, M1, on a high stakes mathematics test asks a student to calculate the value of x² when x = 3. Clearly, a conventional interpretation of the formula could be adopted by the student, that is, replace x with 3, then multiply 3 by itself, and they could state the answer is “9”. Instead, however, the student might interpret the formula in any number of unorthodox ways, and proffer answers like 6 (by finding the product of 3 and 2), 5 (by finding the sum of 3 and 2) or 8 (by calculating 2 × 2 × 2), to name but a few.

To illustrate Wittgenstein’s paradox of interpretations further, consider an item, M2, which requires students to identify all the cubes in a diagram that includes both cubes and triangular prisms. It is tempting to suggest that a student’s capacity to correctly answer such a question is determined by a mental image of a cube, with which they can compare each shape in the diagram and identify those shapes that coincide with the mental image as cubes. To give this enticing causal model every opportunity of succeeding, Wittgenstein assumes the student has in mind a perfect skeleton outline of a cube with no irrelevant features. Surely, this explains the student’s capacity to discriminate cubes from triangular prisms? In other words, this mental image determines the student’s future categorisation skills with respect to cubes. However, there is a difficulty with the notion of a mental image forcing an application or applications on the student, and it turns out that the image cannot logically compel the student; the mental cube cannot be the source of the student’s categorisation skills:

Well, suppose that a picture does come before your mind when you hear the word “cube”, say the drawing of a cube. In what way can this picture fit or fail to fit a use of the word “cube”?—Perhaps you say: “It’s quite simple; if
that picture occurs to me and I point to a triangular prism for instance, and say it is a cube, then this use of the word doesn’t fit the picture.”—But doesn’t it fit? I have purposely so chosen the example that it is quite easy to imagine a method of projection according to which the picture does fit after all. The picture of the cube did indeed suggest a certain use to us, but it was possible for me to use it differently. (Wittgenstein, 2009, §139)

Unfortunately, an unambiguous use of the word “cube” cannot be read off the image. The mental image of the cube cannot determine the student’s choices because, as it turns out, there are projections of a cube that can be fitted onto a triangular prism. “For when I reflect on matters, I see that it is quite easy to imagine another method of projecting the picture, e.g. one by which it fits a triangular prism” (McGinn, 1997, p. 84). The student might introspect their mental image of the skeleton cube and attach a non-orthodox interpretation to it. This (albeit highly unusual) student could interpret the mental image of the cube as defining a solid angle of π/2 steradians (see Fig. 2.1). Since the triangular prism can be similarly construed (see Fig. 2.1) then, on this interpretation, a triangular prism is in accord with the rule represented by the student’s mental image of the cube. The student might have introspected the mental cube and focused on the solid angle between the planes ABCD, ADHE and ABFE (see Fig. 2.1 [left]). This solid angle is π/2 steradians. The triangular prism (see Fig. 2.1 [right]) contains precisely the same solid angle between planes ABCD, ADFE and ABE. Once again, there exists an interpretation of the student’s mental image whereby a triangular prism accords with it. Therefore, a reductive explanation proves unsustainable: the mental image by itself cannot determine a unique use. There exists an interpretation whereby the triangular prism is in accord with the student’s meaning of the word “cube”. As with the simple algebraic substitution example considered above, the notion of a mental entity (a formula or a skeleton cube) forcing a use on an individual is shown to be empty.

Fig. 2.1 Possible interpretations of the word “cube” (left: cube with vertices A to H; right: triangular prism with vertices A to F)

Bloom’s (1956) taxonomy of educational objectives articulates a framework for categorising cognitive skills that reflects increasing complexity and ability to use higher order thinking skills, ranging from the lower level skills of knowledge recall and comprehension, to the higher level skills of analysis, synthesis, and evaluation. The items M1 and M2 referenced above are clearly examples of items that assess lower level skills. However, the same difficulties persist, and are further compounded, if one considers a more cognitively demanding task which assesses higher level skills. Consider, for example, an open-ended task, M3, which asks students to investigate the sequence of dots shown in Fig. 2.2. Students may approach the task in a myriad of different ways. For example, they may simply extend the sequence by drawing a few further figures in the sequence, or they could count the number of dots to form a numerical sequence, and then predict the next few terms of the numerical sequence. Furthermore, some students may endeavour to find a formula for the number of dots in the nth term of the numerical sequence, and some may attempt to justify the formula with reference to the geometry of the figures.
At a more sophisticated level, some may endeavour to extend the problem to investigate a three-dimensional sequence consisting of L-shaped figures comprised of dots, or they may investigate similar sequences of dots for letters of the alphabet other than L. At each stage, a given student could attach any number of unconventional (incorrect) interpretations to the rule(s) they are attempting to follow at that stage, or they could adopt the conventional (correct) interpretation of the rule(s).

Fig. 2.2  L-shaped sequence of dots

The failure of a rule (e.g., formula or mental image) to offer guidance to the student on its application cannot be circumvented by suggesting that the rule-follower must have the capacity to interpret the rule in the correct manner. Unfortunately, an infinite regress ensues because any interpretation of the rule could itself be variously interpreted:

If it [the rule] requires interpretation, that could be done in lots of ways. So how do I tell which interpretation is correct? Does that, for instance, call for a further rule—a rule for determining the correct interpretation of the original—and if so, why does it not raise the same difficulty again, thereby generating a regress? (Wright, 2001, p. 163)

Furthermore, this conundrum is not averted by positing the existence of a Platonic object in the mind of the student that does not require interpretation, but somehow grants access to all possible rule applications. Wittgenstein rejected mathematical Platonism as a plausible explanation of rule-following behaviour: “The mathematician is an inventor, not a discoverer” (Wittgenstein, 1978, I, §168). Wittgenstein’s work implies that, in advance of a student offering a response to a test item, such as the elementary algebra item, M1, the cube identification item, M2, or a particular step within the sequence item, M3, criteria do not exist for ascertaining if the relevant rule has been applied in the correct manner. This is equivalent to saying that the student is both correct and incorrect before they proffer an answer to the question. According to Wittgenstein, it is impossible to privately (i.e., mentally) follow a rule since criteria do not exist for its correct use in the mental realm. Rather, Wittgenstein posits that correct or incorrect uses of a particular rule are determined by well-established practices: “Following a rule” is a practice. And to think one is following a rule is not to follow a rule. And that’s why it’s not possible to follow a rule “privately”; otherwise, thinking one was following a rule would be the same thing as following it. (Wittgenstein, 2009, §202)

For example, in the elementary algebra item, M1, it is after the student gives the answer “9”, or an alternative response, that the relevant criteria
for the mathematical practices of substituting a numerical value into an algebraic expression, and squaring a number, are employed to judge the student’s answer to be correct or incorrect. The answer “9” would be deemed right, whereas any option other than “9” would be considered wrong since it would fail to adhere to conventional mathematical practices. After the student answers the test item, they transition from having indeterminate capability relative to the item (since they could be construed as being right and wrong) to having definite capability relative to the item (since they are now deemed to be right or wrong). An analogous argument applies in respect of the cube identification item, M2, and each step attempted by the student in the sequence item, M3. Prior to the student answering M2 or offering an answer to an attempted step in M3, their capability relative to the item is indeterminate, but it becomes definite when an answer is given. However, the difficulties associated with assessing a higher level cognitive task such as the sequence item, M3, are further compounded by the multiplicity of interpretations an assessor may attach to the requirements of the task, or a student’s response to a particular stage of it (Wittgenstein, 2009), in relation to what the assessor deems to constitute a credit-worthy answer. Given that any mark scheme necessarily consists of a finite number of acceptable answers, and that different assessors may attach different interpretations to those answers and/or students’ actual responses, it will be impossible to guarantee a consistent approach to assessment by different assessors, or perhaps even by any given assessor, thus compromising the reliability of assessment judgements. Goodman (1972) highlighted the ambiguities associated with assessing the similarity of two actions, which in the context of item M3, could be the answer suggested in a mark scheme and a given student’s answer. 
Goodman stressed that it is inappropriate for similarity judgements to be predicated solely on subjective or idiosyncratic factors but, rather, they should be grounded in a robust conceptual framework that incorporates objective criteria or standards. However, such a framework is unlikely to exist for an open-ended test item such as M3, not least because of the multiplicity of different approaches that students may take to the item. Clearly, this would raise difficulties in relation to ensuring consistency in the marking of the item between different assessors, or by the same assessor on different occasions. This directly aligns with the difficulties associated with reliably assessing rich knowledge and skills that were raised by Davis (1998).

Now consider an item, L1, on a high stakes literacy test, which asks candidates to explain the meaning of the word that is underlined in the following sentence: “The car is in pristine condition.” A candidate could interpret the underlined word in the conventional manner, and write the answer “immaculate”, for example, which would be deemed to be correct by the examiner. Instead, the candidate could attach any one of a number of unconventional interpretations to the word and offer responses such as “it is sparkly”, “it is expensive”, or an infinite number of other possibilities, which would be considered to be incorrect. By applying the same logic as for the exemplar mathematics test items considered above, a candidate would have indeterminate capability with respect to the item L1 before they respond to it. However, the capability in question would become definite when a response is given. A more cognitively demanding item, L2, which requires students to analyse a poem, for example, would suffer from similar difficulties to the sequence item, M3, which was considered previously. The correctness, or otherwise, of each statement or claim made by a student in response to L2 would be judged by invoking relevant disciplinary practices, with the student transitioning from indeterminate to determinate capability with respect to a given statement or claim after it has been offered. Any mark scheme for L2 can only include a finite number of acceptable points, and different assessors may attach different interpretations to those points and/or the points made by students. The item L2 also suffers from the same affliction as item M3 in relation to the difficulties associated with assessing the similarity of a student’s answer and the acceptable answers identified in the mark scheme. Therefore, it will be impossible to guarantee a consistent approach to assessment of the item between different assessors, and potentially even by any individual assessor. 
This could therefore undermine the reliability of assessment outcomes in a similar fashion to the mathematical example on sequences, M3.

It could be argued that Wittgenstein’s views are aligned with behaviourism because of his focus on behavioural criteria, rather than mental states, to adjudge the correctness or otherwise of the application of a rule. However, Wittgenstein denies that he is a behaviourist: “Aren’t you nevertheless a behaviourist in disguise? Aren’t you nevertheless basically saying that everything except human behaviour is a fiction?—If I speak of a fiction, then it is of a grammatical fiction” (Wittgenstein, 2009, §307). Rather than rejecting that mental states and processes exist, Wittgenstein makes the point that inner, mental states are non-separable from outer,
public behaviour: “An ‘inner process’ stands in need of outward criteria” (Wittgenstein, 2009, §580). Wittgenstein’s contention that an entity in the mental realm, such as a formula or image, cannot compel a student to give a particular response to a question demonstrates his rejection of a direct causal relationship between the student’s inner, mental states and their public behaviour. Instead, the following quotation demonstrates that Wittgenstein’s views lie between the extreme positions of cognitivism and behaviourism: “It’s [a mental state] not a Something, but not a Nothing either! The conclusion was only that a Nothing would render the same service as a Something about which nothing could be said” (Wittgenstein, 2009, §304). Here, Wittgenstein is denying that, when a student follows a rule, they are guided by an object in the mental realm (a “something”) that is masked by behaviour, or the possibility of rule-following being entirely characterised by public behaviour, thus meaning that inner mental states equate to a “nothing”. Indeed, Wittgenstein suggests that any quest to discover latent determinants of the public behaviour corresponding to private, mental states is futile: Now we try to get hold of the mental process of understanding, which seems to be hidden behind those coarser, and therefore more readily visible, concomitant phenomena. But it doesn’t work; or, more correctly, it does not get as far as a real attempt. For even supposing I had found something that happened in all those cases of understanding, why should that be the understanding? … And if I say it is hidden—then how do I know what I have to look for? I am in a muddle. (Wittgenstein, 2009, §153)

The fact that a student’s capability with respect to a given question (e.g., M1, M2, M3, L1 or L2 alluded to previously) only becomes definite when the question is answered by the student suggests that the examiner is both a participant and an observer in the assessment process. In the process of choosing particular questions for the test paper, and omitting others, the examiner is defining the capability assessed by the test paper. Therefore, the test is not measuring innate attributes of the examinees, but those attributes relative to the domain from which the questions that feature on the test paper are selected. Interestingly, this resonates with the views of Richardson (1999), who argues that IQ scores obtained from IQ tests do not describe innate attributes of people, but rather the tests actually create the observed attributes:

In all these ways, then, we find that the IQ-testing movement is not merely describing properties of people: rather, the IQ test has largely created them. (p. 40)

As in the construction of the tests themselves, you don’t get what you see: you get what you want to see. (p. 45)

These concerns could be dismissed because student performance on individual test items is not particularly informative, but their aggregate performance on multiple items is likely to give a good indication of their academic capability. However, summing individual item scores to yield total scores is problematic as it presupposes that all items are assessing a common underlying construct. The Wittgensteinian reasoning presented above suggests that a well-defined construct of this type may not actually exist. It was established earlier that whether a student’s answer to a given test item is correct or incorrect is determined by judging how it compares with the answer dictated by the applicable practice(s), and that the student’s capability relative to the item is uncertain prior to an answer being proffered. Thus, the measured level of the student’s capability is influenced by the act of measuring in the sense that the measurement is a unified characteristic of the student and the measuring instrument, that is, the test items. Therefore, it appears that the capability of the student cannot be considered to exist as a thing-in-itself but can only be meaningfully construed relative to the domain from which the test items were drawn at the time when the test was taken (Cantley, 2015, 2017, 2019, 2023). Clearly, the arguments I have advanced in this section may lead to the accusation of radical scepticism in relation to the minds of others, or even in relation to self-knowledge. However, I reject such an accusation for the reasons I will now explain. On a social level, if an individual knows little about a given topic, and consistently demonstrates their ignorance over a prolonged period, but they demonstrate some familiarity with the topic on a single isolated occasion, it is highly unlikely that the relevant epistemic community would perceive the individual to be knowledgeable in the topic area. 
By the same token, if an individual has extensive knowledge of the topic, and consistently demonstrates their knowledge over an extended time interval, but they make erroneous claims on one particular occasion, it is unlikely that the relevant epistemic community would denigrate them and come to view them as an ignoramus. In such instances, the decisions of the epistemic community would usually be informed by the temporally
extended performance of the individual in the relevant area, rather than the individual’s performance on a single occasion. However, in the case of high stakes tests, judgements are made about examinees based on one-off performances, rather than temporally extended performances. Whilst I have argued that measurements of capabilities are non-separable from the relevant measuring instruments (i.e., test papers), my arguments pertain to discrete tests taken at a particular point in time. To address the philosophical tensions which I have highlighted in contemporary approaches to high stakes testing, I suggest that it may be preferable to introduce a system of continuous assessment rather than one-off high stakes tests at a particular point in time. Such a system may admit the possibility of consistent patterns emerging in test performance and could also furnish teachers with formative assessment information to guide their subsequent pedagogical approaches in a manner that could enhance students’ learning. I believe an approach of this type could potentially offer the prospect of more robust inferences being made from assessment outcomes about students’ academic capabilities. Therefore, an important aspect of my critique of testing regimes pertains to their limitations in relation to making fair judgements about examinees’ capabilities based on their performance in a one-off test at a particular point in time. There is the possibility of students under-performing in a one-off test if, for a myriad of reasons, they simply have a “bad day”. Likewise, it is conceivable that a student could over-perform in a one-off test, in the sense that their performance is better than their usual standard. 
In the case of test items that assess higher level cognitive skills, such as open-ended mathematical problem-solving or extended writing tasks, I have also demonstrated that the reliability of assessment judgements by different assessors could be compromised by the difficulties associated with the necessarily finite nature of the options that can be included within mark schemes and differences in markers’ professional judgement (Sherwood, 2022). It is important to note that such difficulties would persist even if a system of continuous assessment using conventional tests were adopted. I revisit this aspect of assessment in the concluding chapter of the book, where I suggest a possible solution to the dilemma, and more fully articulate my proposed solutions to the limitations of contemporary approaches to high stakes testing.

The ideas in this section have been developed with reference to hypothetical high stakes tests to measure students’ mathematical and literacy skills, but the analysis could be generalised to other disciplines, including the sciences, humanities, and creative arts. I established earlier that the standard of an answer to a test item is determined by drawing comparisons between the answer and well-established disciplinary practices. Such practices are just as relevant for other subjects as they are for the mathematical and literacy examples discussed above. For example, there are established practices pertaining to such things as composing a good piece of music, which are invoked to make judgements about the standard of a student’s response to a composition task in music. Furthermore, difficulties similar to those outlined in the context of mathematics and literacy exist in relation to reliably assessing students’ responses to tasks that assess higher level cognitive skills within other disciplines.

It is important to highlight that, although the results of academic achievement tests, whatever their format, may well correlate with future academic achievement and other outcomes, such as earning potential, there are other factors that impact on future outcomes. Most notably, softer skills such as intrapersonal and interpersonal competence have a bearing on longer-term outcomes such as employment prospects (Heckman & Kautz, 2012; Parts et al., 2013; Stal & Paliwoda-Pękosz, 2019). However, academic achievement tests tend to attach less priority to assessing soft skills and to focus on the assessment of academic competence (Heckman & Kautz, 2012). Therefore, caution needs to be exercised in relation to relying solely on the results of high stakes tests of academic achievement to predict future outcomes.

Summary

In the present chapter, I have reviewed some existing philosophical perspectives on educational assessment, particularly in relation to the work of Andrew Davis and various responses to Davis’s work. Furthermore, I have strengthened Davis’s critique of assessment by arguing that there are conceptual difficulties associated with assessing both lower and higher level cognitive skills using one-off tests. I have suggested that a temporally extended system of continuous assessment could potentially permit more warranted inferences to be made about students’ academic capabilities than is possible using a one-off high stakes test. However, I have cautioned against relying solely on the results of assessments of academic capabilities to predict students’ future outcomes. In the next chapter, I consider the implications of the critique of educational assessment that I have advanced in this chapter for the results of high stakes public examinations of the type that currently feature in the education systems of the United Kingdom and numerous other countries.

References

Bloom, B. S. (1956). Taxonomy of educational objectives handbook 1: Cognitive domain. Longmans.
Bruner, J. (1996). The culture of education. Harvard University Press.
Cantley, I. (2015). How secure is a Newtonian paradigm for psychological and educational measurement? Theory & Psychology, 25(1), 117–138. https://doi.org/10.1177/0959354314561141
Cantley, I. (2017). A quantum measurement paradigm for educational predicates: Implications for validity in educational measurement. Educational Philosophy and Theory, 49(4), 405–421. https://doi.org/10.1080/00131857.2015.1048668
Cantley, I. (2019). PISA and policy-borrowing: A philosophical perspective on their interplay in mathematics education. Educational Philosophy and Theory, 51(12), 1200–1215. https://doi.org/10.1080/00131857.2018.1523005
Cantley, I. (2023). Replicable quantitative psychological and educational research: Possibility or pipe dream? Educational Philosophy and Theory, 55(1), 111–121. https://doi.org/10.1080/00131857.2022.2090926
Cizek, G. J. (2009). Reliability and validity of information about student achievement: Comparing large-scale and classroom testing contexts. Theory Into Practice, 48(1), 63–71. https://doi.org/10.1080/00405840802577627
Cizek, G. J. (2012). Defining and distinguishing validity: Interpretations of score meaning and justifications of test use. Psychological Methods, 17(1), 31–43. https://doi.org/10.1037/a0026975
Curren, R. R. (2004). Educational measurement and knowledge of other minds. Theory and Research in Education, 2(3), 235–253. https://doi.org/10.1177/1477878504046517
Curren, R. (2006). Connected learning and the foundations of psychometrics: A rejoinder. Journal of Philosophy of Education, 40(1), 17–29. https://doi.org/10.1111/j.1467-9752.2006.00491.x
Davis, A. (1998). The limits of educational assessment. Blackwell.
Davis, A., Winch, C., & Lum, G. (2015). Educational assessment on trial. Bloomsbury Academic.
Elgin, C. Z. (2004). High stakes. Theory and Research in Education, 2(3), 271–281. https://doi.org/10.1177/1477878504046522
Gingell, J., & Winch, C. (2000). Curiouser and curiouser: Davis, White and assessment. Journal of Philosophy of Education, 34(4), 687–695. https://doi.org/10.1111/1467-9752.00202
Goodman, N. (1972). Seven strictures on similarity. In N. Goodman (Ed.), Problems and projects (pp. 437–447). Bobbs-Merrill Co.
Heckman, J. J., & Kautz, T. (2012). Hard evidence on soft skills. Labour Economics, 19(4), 451–464. https://doi.org/10.1016/j.labeco.2012.05.014
Johnson, S. (2013). On the reliability of high-stakes teacher assessment. Research Papers in Education, 28(1), 91–105. https://doi.org/10.1080/02671522.2012.754229
Kellaghan, T., & Greaney, V. (2020). Public examinations examined. World Bank. https://openknowledge.worldbank.org/bitstream/handle/10986/32352/9781464814181.pdf?sequence=2&isAllowed=y
McGinn, M. (1997). Routledge philosophy guidebook to Wittgenstein and the philosophical investigations. Routledge.
Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18(2), 5–11. https://doi.org/10.3102/0013189x018002005
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749. https://doi.org/10.1037/0003-066x.50.9.741
Parts, V., Teichmann, M., & Rüütmann, T. (2013). Would engineers need non-technical skills or non-technical competences or both? International Journal of Engineering Pedagogy, 3(2), 14–19. https://doi.org/10.3991/ijep.v3i2.2405
Passmore, J. (1980). The philosophy of teaching. Duckworth.
Ready, D. D., & Wright, D. L. (2011). Accuracy and inaccuracy in teachers’ perceptions of young children’s cognitive abilities: The role of child background and classroom context. American Educational Research Journal, 48(2), 335–360. https://doi.org/10.3102/0002831210374874
Reeves, D. J., Boyle, W. F., & Christie, T. (2001). The relationship between teacher assessments and pupil attainments in standard test tasks at key stage 2, 1996-98. British Educational Research Journal, 27(2), 141–160. https://doi.org/10.1080/01411920120037108
Richardson, K. (1999). The making of intelligence. Weidenfeld & Nicolson.
Sherwood, D. (2022). Making the mark. Canbury Press.
Stal, J., & Paliwoda-Pękosz, G. (2019). Fostering development of soft skills in ICT curricula: A case of a transition economy. Information Technology for Development, 25(2), 250–274. https://doi.org/10.1080/02681102.2018.1454879
White, J. (1999). Thinking about assessment. Journal of Philosophy of Education, 33(2), 201–211. https://doi.org/10.1111/1467-9752.00131
Winch, C., & Gingell, J. (1996). Educational assessment: Reply to Andrew Davis. Journal of Philosophy of Education, 30(3), 377–388. https://doi.org/10.1111/j.1467-9752.1996.tb00407.x
Wittgenstein, L. (1978). Remarks on the foundations of mathematics (3rd ed., G. H. von Wright, R. Rhees, & G. E. M. Anscombe, Eds.; G. E. M. Anscombe, Trans.). Blackwell.
Wittgenstein, L. (2009). Philosophical investigations: The German text with an English translation (4th ed., P. M. S. Hacker & J. Schulte, Eds.; G. E. M. Anscombe, P. M. S. Hacker, & J. Schulte, Trans.). Wiley-Blackwell.
Wright, C. (2001). Rails to infinity. Harvard University Press.
Wyatt-Smith, C., Klenowski, V., & Gunn, S. (2010). The centrality of teachers’ judgement practice in assessment: A study of standards in moderation. Assessment in Education: Principles, Policy & Practice, 17(1), 59–75. https://doi.org/10.1080/09695940903565610

CHAPTER 3

Implications of Educational Assessment Critique for Public Examinations

Abstract  In formal education, the end of the school year usually heralds high stakes examinations, and the results of these significantly influence subsequent educational or vocational pathways for many students around the world. Such examinations have traditionally been viewed as an equitable, impartial, and meritocratic approach to assessing students’ capabilities, and they are considered to offer a means of accessing opportunities to those who may be disadvantaged by any other form of assessment. Despite the purported meritocratic benefits of high stakes public examinations, some researchers have noted issues pertaining to the inferences that can be drawn from their results. In this chapter, I consider potential ways in which the critique of educational assessment that I articulated in Chap. 2 may help to explain some of these issues.

Keywords  Accuracy • Public examination results • Reliability

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
I. Cantley, The Philosophical Limitations of Educational Assessment, https://doi.org/10.1007/978-3-031-47021-9_3

Introduction

In formal education, the end of the school year usually heralds high stakes examinations, and the results of these significantly influence subsequent educational or vocational pathways for many students around the world. Such examinations have traditionally been viewed as an equitable,
impartial, and meritocratic approach to assessing students’ capabilities, and they are considered to offer a means of accessing opportunities to those who may be disadvantaged by any other form of assessment (Kellaghan & Greaney, 2020). Despite the purported meritocratic benefits of high stakes public examinations, some researchers have noted issues pertaining to the inferences that can be drawn from their results. For example, according to Lavy et al. (2016), “[o]ur analysis highlights that high stakes exams provide measures of student quality that may be imprecise and misleading” (p. 1). Indeed, Coe et al. (2008) estimated that academic selection tests for grammar schools could lead to 22% of students being mis-allocated to different school types, with 11% incorrectly denied grammar school places and 11% wrongly admitted to grammar schools. In 2018, the government body responsible for regulating public examinations in England, Ofqual, undertook an analysis of the reliability of public examination grades awarded by four different examination boards in the summer 2017 examination series (Ofqual, 2018). In this analysis, Ofqual calculated the probability of a candidate being awarded what they termed the “definitive grade” in a range of subjects, which they defined as the grade that would be awarded if the candidate’s examination script had been marked by a senior examiner. Grade reliabilities for each subject were determined by calculating the percentage of grades awarded based on marking of examination scripts by an ordinary examiner that coincided with the definitive grades. The resulting grade reliabilities varied by subject and ranged from 96% for mathematics to just 52% for English language and literature. Between these extremes, estimated grade reliabilities included 85% for biology, 65% for geography and 56% for history (Ofqual, 2018; Sherwood, 2022). 
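The grade reliability calculation described above can be sketched in a few lines of code. The following is only a minimal illustration, assuming that reliability is simply the percentage of grades awarded by ordinary examiners that coincide with the definitive grades. The function name `grade_reliability` and the sample grades are hypothetical, invented for this example; they are not Ofqual data, nor a reproduction of Ofqual's actual procedure.

```python
def grade_reliability(awarded, definitive):
    """Return the percentage of awarded grades that match the definitive grades.

    `awarded` holds the grades given by ordinary examiners and `definitive`
    the grades a senior examiner would have given for the same scripts.
    Both names are hypothetical and used for illustration only.
    """
    if len(awarded) != len(definitive):
        raise ValueError("grade lists must be the same length")
    if not awarded:
        raise ValueError("at least one script is required")
    # Count the scripts for which the two grades coincide.
    matches = sum(a == d for a, d in zip(awarded, definitive))
    return 100.0 * matches / len(awarded)


# Invented grades for eight scripts (not real examination data).
awarded = ["A", "B", "B", "C", "A", "D", "C", "B"]
definitive = ["A", "B", "C", "C", "A", "D", "B", "B"]

reliability = grade_reliability(awarded, definitive)
print(f"Grade reliability: {reliability:.0f}%")
```

In this invented sample, six of the eight awarded grades coincide with the definitive grades, so two grades in eight (one in four) differ from the definitive grade.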
Although Ofqual (2018) did not quantify the overall grade reliability across all subjects, Sherwood (2022) estimated this to be approximately 75%. The range of grade reliabilities presented by Ofqual (2018), coupled with Sherwood’s (2022) estimate of overall grade reliability, raises serious concerns about the processes involved in high stakes testing of students’ capabilities for certification purposes in public examinations. It is extremely problematic that, on average, across all disciplines, and at all levels, around one in every four grades awarded is deemed to be incorrect.

Aspects of Wittgenstein’s writings on rule following were invoked in Chap. 2 to problematise contemporary high stakes educational assessment approaches and highlight tensions associated with the validity and reliability of assessment outcomes. In particular, the philosophical tensions
associated with reliably assessing higher level cognitive skills were articulated. The current chapter addresses the implications of the critique of educational assessment that I propounded in Chap. 2 for the robustness of high stakes examination results, with a view to explaining the lack of reliability in public examination grades alluded to above and other purported inaccuracies in assessment judgements.

Validity and Reliability of Public Examination Results

In Chap. 2, the concepts of validity and reliability were introduced, and their importance in relation to the credibility of educational measurement was emphasised. The assessment literature places considerable emphasis on validity and reliability in the context of standardised tests of various psychological traits such as IQ. The items on standardised tests are chosen to appropriately discriminate between candidates who possess different levels of the trait being measured, with the ultimate objective of rank ordering the candidates. Tests of this type, which permit a candidate’s test result to be compared to the results of others who take the same test, are referred to as norm-referenced tests. In such tests, it is common practice to omit items that do not sufficiently differentiate between candidates, such as very straightforward or very demanding items, even though those items do assess the relevant trait.

Public examinations, which are high stakes tests of academic attainment administered at the end of a course of study for certification purposes, and often to assess the suitability of candidates for either vocational or further educational opportunities, tend to attach greater importance to criterion-referenced modes of assessment. Criterion-referenced tests focus on assessing the extent to which candidates have mastered pre-specified, curriculum-aligned assessment criteria rather than rank ordering and comparing the candidates. However, in practice, at least some element of norm-referencing features in public examination practices to ensure some degree of consistency of results profiles between different examination series. It is noteworthy that public examination results are generally less likely than the results of standardised tests to be subjected to high levels of sophisticated statistical analysis to ensure their validity and reliability (Kellaghan & Greaney, 2020). This is perplexing since the former are likely to have greater import than the latter for students’ future careers in many jurisdictions around the world.

Nevertheless, some researchers have critically engaged with validity and reliability-related issues in the context of public examinations. Some scholars have alluded to the threats to the validity of public examinations that are engendered when the assessments employed do not sufficiently align with, or underrepresent, the content/skill domain being examined (Kellaghan & Greaney, 2020). Several skills and capabilities that curricula aim to develop may not be amenable to measurement by written examinations that are taken under strictly controlled conditions and within restrictive timeframes. Consequently, candidate performance in an examination may not give an accurate representation of their mastery of the curriculum objectives the examination was designed to assess. As Black (1996) argues, when a curriculum is specified in detail, a very significant amount of time would be required to accurately assess it. The underrepresentation of the content/skill domain being examined is particularly pertinent in the context of written examinations taken simultaneously by large groups of candidates under similar conditions. Davis (1998) noted that this form of high stakes testing is beneficial for improving the reliability of assessment judgements, but it simultaneously reduces the extent to which all relevant content and skills can be assessed, thus compromising the validity of the assessment. On the other hand, any attempt to improve validity by extending the scope of the examinations to, for example, assess rich knowledge and skills, such as open-ended problem-solving capabilities, may lead to corresponding reductions in the reliability of assessment outcomes, as alluded to in Chap. 2. A further consequence of these restrictions in the target content domains assessed by high stakes public examinations is so-called teaching to the test, whereby learning and teaching are mainly targeted at the content and skills that are anticipated to be tested in the examinations. 
This in turn leads to a situation where candidates’ examination performance does not accurately reflect the extent of their competence in the intended curriculum objectives simply because they have not been given opportunities to acquire the relevant knowledge and skills (Kane, 2001), thus further compromising the validity of the examinations. It is unsurprising, therefore, that some studies have found limited correlation between candidates’ performance in public examinations used for the selection of university students and the standard of the university degrees they ultimately obtain, or their performance upon graduation from their courses (European Parliament, 2014). Such findings fuel concerns about the predictive validity of high stakes public examinations, thus

3  IMPLICATIONS OF EDUCATIONAL ASSESSMENT CRITIQUE FOR PUBLIC… 

leading to misgivings about the extent to which the assessment methods used in such examinations accurately measure potential. Notwithstanding these concerns about the validity of the examinations themselves, Lavy et al. (2016) also posit that, because high stakes examination results represent only a snapshot of candidates’ knowledge and skills at a particular point in time, they are vulnerable to underperformance caused by circumstances beyond candidates’ control. These include illness, sleep deprivation, and occasions when students simply have a “bad day” (p. 1). Although public examinations are based on the premise that all candidates receive the same treatment, most examination authorities have arrangements in place to permit, under certain circumstances, variations of examination procedures for candidates with disabilities and/or special educational needs, such as those with hearing or sight impairments, or specific learning difficulties (Kellaghan & Greaney, 2020). Subject to approval by the relevant examination authority, reasonable adjustments may be made to the examination arrangements to allow such candidates to demonstrate their capabilities under slightly different conditions from the majority of candidates, while still ensuring the integrity of the examination process. The adjustments, which may include exemptions from some examination components, additional time to complete examinations, or enlarged question papers, are intended to ameliorate the impact of a candidate’s disabilities or needs on their performance, while simultaneously ensuring they are not given an unfair advantage.
In addition to reasonable adjustments for candidates with disabilities or special educational needs, most examination authorities also have special consideration protocols in place, whereby concessions can be granted to candidates who have been temporarily impacted by illness, an accident, or some other circumstances beyond their control, such as a bereavement (Ofqual, 2022). The adverse circumstances usually need to have arisen either immediately before or during the examination and require authentication using appropriate evidence. The support available for candidates with verified unfavourable circumstances includes the award of extra marks to compensate for underperformance due to their situation, or the award of a qualification even if all components of the relevant examination have not been completed. Despite the widespread existence of reasonable adjustments and special consideration protocols, I am sympathetic to Lavy et al.’s concerns about candidates having a “bad day” for a myriad of reasons, not all of which can

be adequately dealt with through reasonable adjustments or special consideration applications. For example, a candidate could misconstrue what is required in response to some questions due to the way in which they are framed, or they may run out of time to complete the examination. In such cases, it is conceivable that a situation may arise where the candidate’s performance in the examination does not accurately reflect their capability level. I am also cognisant of the restricted content domain that can be assessed in a single written examination. Accordingly, I shall now explicate how the critique of educational assessment propounded in Chap. 2 has implications for contemporary approaches to high stakes public examinations. Aspects of Wittgenstein’s later philosophy pertaining to rule following were used in Chap. 2 to argue that capabilities are relational rather than innate attributes of students. The point was made that the score a student obtains in a high stakes test, which is taken at a particular point in time, and necessarily assesses just a subset of the content associated with a qualification specification, is a measure of the student’s capability relative to the test paper utilised for the assessment. Given that the criteria of correctness for responses to test items are governed by established external practices, a student’s score on the test is a joint property of the student and the test paper, rather than an innate characteristic of the student. Thus, there are philosophical tensions associated with abstracting the total score away from the test paper and using it to make inferences about the student’s capability as a thing-in-itself. Given these tensions, it is entirely possible that some students could be treated unfairly in the qualification awarding process. 
Such a situation would manifest itself in the event of a discrepancy between the score awarded to a student in the high stakes test and the score that would accurately reflect the student’s capability level, as determined by a more robust procedure than a one-off high stakes test. I construe one-off high stakes tests to include those tests/examinations that consist of separate components or papers taken at slightly different times. The discrepancy may be associated with the student being awarded either a lower or a higher score than they deserve. The problem is further exacerbated in situations where the results of the high stakes examinations are reported as grades rather than scores, for example, the General Certificate of Secondary Education (GCSE) examinations, which are taken by students towards the end of their compulsory post-primary education, at approximately 16 years of age, in the United Kingdom. Similar difficulties afflict the General Certificate of Education

(GCE) Advanced Level (A-level) school-leaving examinations in the United Kingdom, which are usually taken at around 18 years of age, the results of which are key determinants of higher education admission. Given that, in such examinations, cut scores are used to delineate different grades, it is feasible that grading practices predicated upon the results of high stakes tests may have even greater potential to lead to the unfair treatment of some students. Since the score obtained by a student determines their grade, a score that is not reflective of their capability level could conceivably lead to the award of an inappropriate grade. The student could be awarded either a lower or higher grade than they deserve and, depending upon the cut scores for the various grade boundaries, the erroneously awarded grade could misrepresent the student’s capability level, perhaps by more than one grade level. A common criticism of some high stakes public examinations is that they tend to prioritise the assessment of lower level cognitive skills, such as the recall of factual knowledge, rather than higher level skills that require the analysis, synthesis and evaluation of information (Kellaghan & Greaney, 2020). For example, some examinations may prioritise assessing the capacity of candidates to recall, and apply, basic facts and principles, but fail to test their ability to make inferences or to devise a strategy for solving an open-ended problem. As outlined in Chap. 2, Bloom’s (1956) taxonomy of educational objectives articulates a framework for cognitive skills that reflects increasing complexity and ability to use higher order thinking skills, ranging from the lower level skills of knowledge recall and comprehension, to the higher level skills of analysis, synthesis, and evaluation. 
Some scholars, such as Davis (1998), have stressed that the focus of high stakes public examinations on lower level cognitive skills improves the reliability of the assessments at the expense of their validity. However, by taking the instrumental view of education, it appears reasonable to suggest that modern education systems should aim to foster the acquisition and development of higher level cognitive skills to prepare students for employment in competitive industrial economies. If this is accepted, it would seem logical that public qualifications should seek to give some indication of the standard of candidates’ higher level cognitive skills. Davis (1998) has alluded to the tensions associated with reliably quantifying candidates’ levels of mastery of such skills and, as I have articulated in Chap. 2, I share his concerns in this area. It is instructive to return to the grade reliability statistics that I alluded to in the introductory section of the current chapter, and to consider the

reasons for the apparent lack of reliability. In referring to “marking consistency metrics” in its title, Ofqual (2018) seems to tacitly imply that the low levels of grade reliability are somehow attributable to the quality of marking of candidates’ scripts. In other words, if the quality of marking were higher, if examiners were better trained, and if examination boards had more stringent quality control procedures, the grade reliability estimates would be much higher. In January 2018, at an event organised by the Education Policy Institute (EPI) to specifically discuss public examination grade (un)reliability, an Ofqual representative explicitly referenced quality of marking in the context of grade reliability: Ofqual explained that there are a range of reasons for variation in marking— these could be categorised into four types: procedural error (e.g. not marking all the pages of an answer); attentional error (concentration lapses by examiners); inferential uncertainty (insufficient evidence provided by the candidate for the examiner to reach a definitive judgement) and definitional uncertainty (there is a range of legitimate marks allowed by the mark scheme because of a lack of tight definition of the construct to be rewarded). (EPI, 2018)

The final reason for variations in marking standards proffered by Ofqual in the previous quotation could be construed as an implied criticism of examination boards for failing to tightly define their mark schemes. However, I contend there is a more profound reason for the “definitional uncertainty”, one that means such uncertainty would endure even for the most tightly defined mark scheme, where the examination scripts are marked to an extremely high standard by very experienced examiners. Public examination papers frequently invite candidates to communicate their knowledge and skills by giving extended written answers to open-ended questions. Indeed, even in subjects such as mathematics, some questions require candidates to explain their mathematical reasoning. The philosophical dilemmas associated with reliably assessing such higher level cognitive skills were addressed in Chap. 2. More specifically, it was emphasised that the multiple interpretations an examiner may attach to the requirements of a particular examination question, or a candidate’s response to it (Wittgenstein, 2009), in relation to what the examiner deems to constitute a credit-worthy answer, have the potential to lead to inconsistencies in the marking of the question. Any mark scheme necessarily consists of a finite number of acceptable answers, and different

examiners may attach different interpretations to the requirements of the question, the answers included in the mark scheme and/or candidates’ actual responses. In addition, it would be virtually impossible to ensure the mark scheme contains an appropriately robust conceptual framework with objective criteria or standards for assessing the similarity of candidates’ answers to answers suggested in the mark scheme. It will therefore be impossible to guarantee a totally consistent approach to assessment of the question by different examiners, or perhaps even by any given examiner, thus potentially compromising the reliability of marks awarded to the question. Furthermore, even if the mark scheme for the question contains substantial detail and lists many credit-worthy answers, with elaborate further explanations pertaining to each possible answer, definitional certainty will be unattainable. This is because of the multiple interpretations an examiner may attach to the further explanations and/or a given candidate’s response to the question. Consequently, a particular candidate’s response to a question that assesses higher level cognitive skills does not have an associated unique, correct mark, m, say. Rather, the candidate’s response to the question could theoretically be awarded any mark within a range, from m  −  δ to m + δ, say, with each mark in this range representing the expert opinion of a very experienced examiner. None of the marks within the range is more correct than any other mark within the range. Such a situation can arise even when the examiners marking the question do not make any mistakes, and when the examination board has stringent quality control procedures. Clearly, this inherent lack of certainty in the mark to be awarded to a particular question is further compounded when an examination paper contains multiple questions that assess higher level cognitive skills. 
This, in turn, may lead to a situation whereby a candidate could be awarded any total mark for the paper within an even wider range of marks than for any individual question, say, M − ∆ to M + ∆, where ∆ > δ, even in the absence of marking mistakes and against the backdrop of stringent quality control processes. Sherwood (2022) refers to the existence of such an admissible range of marks for a candidate as “fuzziness”, and he posits it is not associated with incorrect marking, but rather is an inherent subject-related characteristic associated with the assessment of higher level cognitive skills. Sherwood draws parallels between the legitimate different assessments of such skills and other contexts where professional judgement is intimately entwined with the assessment process:

In the Olympic Games, for example, the winner of the high jump is unambiguously identified as the person who is able to clear the highest bar, a bar knocked down by all the other athletes. In many sports, however, such as diving and figure skating, expert judges award marks according to their opinion as regards the smoothness of the diver’s entry into the water, or the quality of the skater’s jump. And different judges have different opinions. Not wildly so, but different none the less. (Sherwood, 2022, p. 151)

Marking of an open-ended assessment task in an examination paper, such as an essay, for example, also entails expert judgement, and can thus legitimately lead to the award of a range of admissible marks. The fact that a candidate could be legitimately awarded a range of different total marks for a given level of performance in the examination has significant implications for the award of grades. In a grading system, each grade spans a range of possible total marks, and is delimited from lower and higher grades by grade boundaries. For example, if the lower boundary for a grade B is 60 and the lower boundary for a grade A is 70, then a candidate P who obtains a total mark between 60 and 69 inclusive, such as 68, would be awarded a grade B, while a candidate Q who obtains a total mark of 71, for example, would be awarded a grade A. However, as noted above, if the examination included questions assessing higher level cognitive skills, both P and Q could be legitimately awarded any total mark within a range of admissible values. Suppose, for example, the legitimate range for P is 68 ± 3, while the legitimate range for Q is 71 ± 3. In other words, P could have been legitimately awarded any total mark between 65 and 71 inclusive, while Q could have been legitimately awarded any total mark between 68 and 74 inclusive. If, in fact, P had been awarded a total mark of 71 and Q had been awarded a total mark of 68, then P would have been awarded a grade A, while Q would have been awarded a grade B. Under such circumstances, the grades awarded to P and Q would be interchanged, and so the initially awarded grades would have been unreliable. For the grade awarded to a given candidate to be reliable, the grade corresponding to each mark within the range of admissible marks must be the same, which is clearly not the case in the example considered above. 
This type of situation could arise if different examiners had marked the examination scripts of P and Q, or perhaps even if the same examiner had marked their scripts on different days, and is symptomatic of the fuzziness in educational assessment that Sherwood (2022) alluded to, and for which I have offered a philosophical rationale.
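
The reliability condition illustrated by candidates P and Q can be expressed as a short check. The following Python fragment is only a minimal sketch under stated assumptions: the function names are hypothetical, the “below B” band is a placeholder, and every integer mark within the admissible range is assumed to be equally legitimate. Only the grade boundaries of 60 and 70, the marks of 68 and 71, and the range of ± 3 are taken from the example in the text.

```python
from bisect import bisect_right

# Lower grade boundaries in ascending order. The B and A boundaries are those
# used in the worked example; the "below B" band is an illustrative placeholder.
BOUNDARIES = [(0, "below B"), (60, "B"), (70, "A")]


def grade_for(mark: int) -> str:
    """Return the grade whose lower boundary is the highest one not exceeding mark."""
    lower_bounds = [lower for lower, _ in BOUNDARIES]
    index = bisect_right(lower_bounds, mark) - 1
    if index < 0:
        raise ValueError(f"mark {mark} lies below the lowest boundary")
    return BOUNDARIES[index][1]


def admissible_grades(mark: int, fuzziness: int) -> set[str]:
    """Collect every grade reachable from an integer mark in [mark - fuzziness, mark + fuzziness]."""
    return {grade_for(m) for m in range(mark - fuzziness, mark + fuzziness + 1)}


def is_grade_reliable(mark: int, fuzziness: int) -> bool:
    """A grade is reliable only if every admissible mark maps to the same grade."""
    return len(admissible_grades(mark, fuzziness)) == 1


if __name__ == "__main__":
    # Candidates P and Q from the example, each with an assumed range of +/- 3 marks.
    for name, mark in (("P", 68), ("Q", 71)):
        print(name, grade_for(mark), sorted(admissible_grades(mark, 3)),
              is_grade_reliable(mark, 3))
```

Under these assumptions, both P and Q have admissible ranges that straddle the A/B boundary, so neither grade satisfies the reliability condition, whereas a candidate whose whole range falls between 60 and 69 would receive a reliable grade B.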

The inherent fuzziness in the grading of examination candidates could explain the worrying statistics considered in the introductory section of this chapter: that grade reliability for public examinations in England is approximately 75%, with one in every four grades awarded being incorrect. As I mentioned in the introductory section, grade reliability varies by discipline and, in the statistics cited above, ranges from 52% for English language and literature to 96% for mathematics, with subjects such as geography having a grade reliability of 65%. This variation in grade reliability is likely to reflect the inherent degree of fuzziness associated with each subject, which, in turn, is likely to be governed by the modes of assessment used for the subject. Those subjects that make more extensive use of open-ended questions to assess higher order cognitive skills, by requiring candidates to communicate their knowledge and skills in the form of extended written answers or essays, are more likely to have higher levels of fuzziness, and correspondingly lower grade reliabilities, than subjects that do not embrace such modes of assessment. However, it is noteworthy that, even in subjects such as mathematics, more open-ended questions do occasionally feature and, while grade reliability for mathematics is much higher than for disciplines such as English or geography, it is not 100%. Perhaps then, an alternative approach is required to assess higher level cognitive skills more reliably. This is an aspect of high stakes assessment in the context of public examinations that I will return to in the concluding chapter of the current book. Although they originated in China, public examinations became an increasingly common feature of many education systems around the world during the nineteenth and twentieth centuries.
A system of public examinations is often purported to provide a necessary competitive influence on students and teachers that will motivate them to achieve high educational standards, which, in turn, will help to ensure national competitiveness. Furthermore, it is frequently posited that the results of public examinations provide a fair, meritocratic, and equitable means of selecting candidates for educational or vocational opportunities. However, the claims of fairness, meritocracy and equity have been disputed by numerous scholars on multiple grounds, including, for example, performance differentials on the basis of socioeconomic status (Kellaghan, 2015). I have added to the critical perspectives on the purported fair, meritocratic and equitable characterisations of public examinations by challenging their heavy reliance on high stakes assessments at a particular point in time, potentially using

unreliable assessment approaches, particularly in the context of higher level cognitive skills. Although public examinations take a variety of formats in different jurisdictions including, for example, multiple choice tests, essay-type examinations, and elements of teacher-assessed coursework, the model under which they operate has remained very similar in many countries over the years. For example, in England, despite inevitable changes in their content and focus, and a foray into modular assessment during the early part of the twenty-first century, public examinations retain many of the features that characterised such examinations in the nineteenth century. Candidates study a prescribed syllabus over a given period, after which they sit high stakes summative tests, the results of which are used for certification and selection purposes. Indeed, despite variations to normal examination practices during the Covid-19 pandemic, which I outline in the following section, countries such as England have transitioned back to their pre-pandemic public examination arrangements.

Covid-19: A Missed Opportunity for Reform of Public Examinations?

In many international contexts, the Covid-19 pandemic forced a transition to remote learning for most students, and bans on social gatherings, for extended periods of time, which led to knock-on consequences for high stakes public examinations. There were three common approaches to dealing with such examinations during the pandemic: cancellation, postponement, or continuing with them albeit in a modified format such as using online assessment approaches (World Bank, 2020). In England, for example, as in some other nations, the public examinations scheduled to take place in 2020 and 2021 were cancelled due to the pandemic. Consequently, in England, an alternative certification model for qualifications, predicated on teacher-assessed grades, was used in 2020 and 2021 (Ofqual, 2020a, 2021). In 2020, students taking qualifications that would normally have been assessed using high stakes examinations were graded using a combination of teacher-assessed grades and grades calculated by a statistical standardisation process. The standardisation process was designed to ensure the results profile for a given school was broadly comparable to the corresponding school results profiles in previous years (Ofqual, 2020a, 2020b). There was considerable controversy surrounding

the use of a statistical algorithm in the 2020 grading process, which led to it being dispensed with in 2021, when grades were solely based on teacher assessments (Ofqual, 2021). In determining students’ grades, teachers were instructed to draw upon a range of relevant evidence of student attainment, such as the results of mock examinations, class tests, class work, homework, and other assignments completed in class or at home. Protocols were put in place to try to ensure appropriate procedures were employed in the determination of teacher-assessed grades, such as training of school personnel in good assessment practices, a requirement for schools to have their assessment policies approved by awarding bodies, and internal and external moderation of grades. A number of scholars have argued that, since teachers work with students for extended periods, involving numerous opportunities to interact with and observe their performance on a wide range of activities, they will have a more comprehensive view of students’ capabilities than high stakes test results can provide (Johnson, 2013). Consequently, it has been posited that teacher assessment has the potential to improve the validity (Harlen, 2007) and reliability (Wiliam, 2003) of assessment outcomes compared to what is possible using high stakes tests. However, despite the purported advantages of teacher assessment, and the protocols that were put in place to ensure accuracy and consistency, there are several difficulties associated with the approaches adopted in the pandemic-induced teacher-assessed grading of public examinations in England. Firstly, it is conceivable that a school could have opted to base teacher-assessed grades in a particular subject on the results of what effectively amounted to high stakes tests, possibly incorporating questions provided by the relevant awarding body (Ofqual, 2021).
The use of such tests for grading purposes was problematised in the previous section, where it was argued that there are philosophical tensions associated with the use of one-off high stakes tests for student assessment. However, a number of other issues have the potential to compromise the validity and/or reliability of teacher assessments. For example, given that students were only assessed on what they had been taught (Ofqual, 2021), different schools could have operationalised their teacher assessments of a particular subject in different ways, with some giving greater emphasis to certain aspects of the subject than others did; for example, the assessment of numerical work may have been prioritised over algebra in mathematics. Furthermore, the lack of consistency in the evidence base used by different schools presented further challenges to assuring the validity and reliability of teacher-assessed

grades. For example, school A may have based students’ mathematics grades on the results of tests taken under highly controlled conditions, while school B awarded mathematics grades primarily based on students’ performance in class work and homework. It is thus possible that a given student could have been awarded different teacher-assessed grades for mathematics in the two schools. Several other biases in teacher assessment have been noted in the literature, and these could have a bearing on the validity and reliability of teacher-assessed grades, particularly in those subjects where there is a subjective element to assessing student achievement. Teachers have been shown to be either consciously or unconsciously influenced by irrelevant factors when assessing students (Johnson, 2013). For example, in a US-based study of the accuracy of teachers’ assessments of young children’s literacy skills, Ready and Wright (2011) reported that teachers tend to overestimate girls’ literacy skills relative to those of boys, and the literacy skills of white compared to Hispanic children. Furthermore, Ready and Wright (2011) found that teachers underestimated the literacy skills of children from lower socioeconomic backgrounds relative to those of their more affluent peers. Worryingly, Reeves et al. (2001) reported that, in their study, British teachers underestimated the performance of upper primary school special educational needs (SEN) students in English, mathematics and science. Finally, Wyatt-Smith et al. (2010) noted teachers’ tendencies to base their assessment decisions on irrelevant student personality traits such as behaviour and effort. These examples demonstrate the possibility of inherent student gender, ethnicity, socioeconomic background, SEN status, and personality trait biases in teacher assessments. Such biases could potentially lead to deleterious consequences for the validity and/or reliability of teacher-assessed grades in public examinations. 
In tacit acknowledgement of the fairness issues associated with relying solely on teacher assessment for the grading of public examinations, many countries, including England, have transitioned back to their pre-pandemic approaches to awarding public qualifications, that is, on the basis of high stakes examinations (Department for Education, 2022). However, given the tensions that I have articulated in relation to high stakes examinations, I suggest that a significant opportunity has been missed to review contemporary approaches to public examinations. Accordingly, I believe it is imperative for policymakers to undertake a review of their examination

systems to ensure they are fit-for-purpose, rather than presuming that pre-pandemic approaches to determining the results of public examinations were infallible. To this end, possible reforms to public examinations are suggested in the concluding chapter of this book.

Summary

In this chapter, I have considered potential implications of the critique of educational assessment that I articulated in Chap. 2. More specifically, I have explained how the tensions associated with educational assessment may come to bear on the results of high stakes public examinations. The problematic nature of using assessments at a single point in time, and the difficulties associated with reliably assessing higher order cognitive skills, are important aspects of my critique of educational assessment. In the final chapter of the book, I suggest possible approaches to making educational assessment approaches more robust, which do not rely on snapshots of students’ performance at a single point in time, and which also acknowledge the seemingly intractable problems associated with accurately and reliably assessing higher level cognitive skills. In the meantime, Chaps. 4–6 of the book focus exclusively on the very real tensions associated with one particular use of high stakes educational assessment, namely, academic selection. The next chapter gives a historical overview of the evolution of academic selection in the United Kingdom, and Northern Ireland in particular, although it also briefly addresses the relevance of academic selection to other international contexts.

References

Black, P. (1996). Commentary. In E. D. Britton & S. A. Raizen (Eds.), Examining the examinations: An international comparison of science and mathematics examinations for college-bound students (pp. 19–21). Kluwer.
Bloom, B. S. (1956). Taxonomy of educational objectives handbook 1: Cognitive domain. Longmans.
Coe, R., Jones, K., Searle, J., Kokotsaki, D., Kosnin, A. M., & Skinner, P. (2008). Evidence on the effects of selective educational systems: A report for the Sutton Trust. CEM Centre, Durham University. https://www.suttontrust.com/wp-content/uploads/2019/12/SuttonTrustFullReportFinal-1.pdf
Davis, A. (1998). The limits of educational assessment. Blackwell.
Department for Education. (2022). Subject content and assessment arrangements in the academic year 2022 to 2023. Department for Education. https://www.gov.uk/government/publications/subject-content-and-assessment-arrangements-2022-to-2023/subject-content-and-assessment-arrangements-in-the-academic-year-2022-to-2023
EPI. (2018). Round-table discussion: “Making the grade? Exam accuracy and its implications”. Education Policy Institute. https://epi.org.uk/publications-and-research/exam-accuracy-implications/
European Parliament. (2014). Higher education entrance qualifications and exams in Europe: A comparison. European Parliament. https://www.europarl.europa.eu/RegData/etudes/etudes/join/2014/529057/IPOL-CULT_ET(2014)529057_EN.pdf
Harlen, W. (2007). Assessment of learning. Sage.
Johnson, S. (2013). On the reliability of high-stakes teacher assessment. Research Papers in Education, 28(1), 91–105. https://doi.org/10.1080/02671522.2012.754229
Kane, M. T. (2001). So much remains the same: Conception and status of validation in setting standards. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 53–88). Lawrence Erlbaum.
Kellaghan, T. (2015). Family and schooling. In J. D. Wright (Ed.), International encyclopedia of the social & behavioral sciences (2nd ed., pp. 751–757). Elsevier. https://doi.org/10.1016/B978-0-08-097086-8.92005-1
Kellaghan, T., & Greaney, V. (2020). Public examinations examined. World Bank. https://openknowledge.worldbank.org/bitstream/handle/10986/32352/9781464814181.pdf?sequence=2&isAllowed=y
Lavy, V., Ebenstein, A., & Roth, S. (2016). The long term economic consequences of having a bad day: How high-stakes exams mismeasure potential (Global Perspectives Series: Paper 8). University of Warwick Centre for Competitive Advantage in the Global Economy (CAGE). https://www.smf.co.uk/wp-content/uploads/2016/06/SMF-CAGE-How-high-stakes-exams-mismeasure-potential-Embargoed-0001-290616-Font-WEB-FINAL.pdf
Ofqual. (2018). Marking consistency metrics: An update. Ofqual. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/759207/Marking_consistency_metrics_-_an_update_-_FINAL64492.pdf
Ofqual. (2020a). Changes to awarding of GCSE, AS and A level, extended project qualification and advanced extension award in maths–Summer 2020. Ofqual. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/911568/6676_Changes_to_awarding_GCSE__AS__A_level__EPQ__AEA_-_summer_2020.pdf
Ofqual. (2020b). Requirements for the calculation of results in summer 2020. Ofqual. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/910614/6674_Requirements_for_the_calculation_of_results_in_summer_2020_inc._Annex_G.pdf
Ofqual. (2021). Student guide to awarding: Summer 2021. Ofqual. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1009462/21-6817-1_Student_guide_to_awarding_in_summer_2021_20210808_1528_-_accessible.pdf
Ofqual. (2022). Guide for schools and colleges 2022: GCSEs, AS and A levels (Special consideration). Ofqual. https://www.gov.uk/guidance/regulating-gcses-as-and-a-levels-guide-for-schools-and-colleges-2022/special-consideration
Ready, D. D., & Wright, D. L. (2011). Accuracy and inaccuracy in teachers’ perceptions of young children’s cognitive abilities: The role of child background and classroom context. American Educational Research Journal, 48(2), 335–360. https://doi.org/10.3102/0002831210374874
Reeves, D. J., Boyle, W. F., & Christie, T. (2001). The relationship between teacher assessments and pupil attainments in standard test tasks at key stage 2, 1996-98. British Educational Research Journal, 27(2), 141–160. https://doi.org/10.1080/01411920120037108
Sherwood, D. (2022). Making the mark. Canbury Press.
Wiliam, D. (2003). National curriculum assessment: How to make it better. Research Papers in Education, 18(2), 129–136. https://doi.org/10.1080/0267152032000081896
Wittgenstein, L. (2009). Philosophical investigations: The German text with an English translation (Edited by P. M. S. Hacker & J. Schulte; translated by G. E. M. Anscombe, P. M. S. Hacker & J. Schulte; 4th edition). Wiley-Blackwell.
World Bank. (2020). High-stakes school exams during Covid-19 (Coronavirus): What is the best approach? World Bank. https://blogs.worldbank.org/education/high-stakes-school-exams-during-covid-19-coronavirus-what-best-approach
Wyatt-Smith, C., Klenowski, V., & Gunn, S. (2010). The centrality of teachers’ judgement practice in assessment: A study of standards in moderation. Assessment in Education: Principles, Policy & Practice, 17(1), 59–75.
https:// doi.org/10.1080/09695940903565610

CHAPTER 4

Historical Evolution of Academic Selection

Abstract  This chapter outlines the historical developments that led to the introduction of the grammar school entrance tests in the United Kingdom, with a particular focus on the contributions of Francis Galton and Cyril Burt, both of whom espoused theories that ultimately legitimised the use of such tests for academic selection purposes. Of particular note is the impact of Burt’s controversial ideas pertaining to innate, immutable intelligence that can be measured accurately and reliably at a tender age. Although the evidence contravening these discredited views has led to the abandonment of academic selection in most of Great Britain, it is problematic that it has not led to a similar outcome in Northern Ireland and some parts of England, where academic selection continues to flourish. The impact of Galton’s and Burt’s theories on British education policy is critically evaluated before the historical evolution of academic selection in Northern Ireland, from 1947 to the present day, is summarised. The chapter concludes with an overview of academic selection in several other international contexts.

Keywords  Academic selection • Cyril Burt • Eugenics • Francis Galton • Northern Ireland

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 I. Cantley, The Philosophical Limitations of Educational Assessment, https://doi.org/10.1007/978-3-031-47021-9_4

Introduction

Before the 1947 Education Act (Northern Ireland), most Northern Irish children attended elementary schools until the age of 14, and very few attended academically focused grammar schools. Up until 1947, most grammar school students were admitted on a fee-paying basis, and many were from the middle and upper classes. In Great Britain, political pressures in the early part of the twentieth century had come to bear on this class-based education policy, and there were calls for the provision of secondary education for all students rather than just the elite few. This culminated in the 1944 Education Act and the follow-up 1947 Education Act (Northern Ireland). The 1947 Act made provision for a secondary education system consisting of grammar, secondary intermediate and technical schools. However, technical schools failed to flourish and were eventually phased out (Sutherland, 1990). As in Great Britain, all children in Northern Ireland were entitled to take an entrance examination to gain admission to a grammar school, and those who passed the examination were entitled to free places. This approach was purported to offer equality of opportunity and to enable sufficiently capable students from poorer backgrounds to obtain a grammar school education. Interestingly, grammar school students were always prepared for public examinations, but secondary intermediate students were not entered for such examinations until the 1970s (Gardner, 2016). Therefore, at least until the 1970s, a grammar school education considerably increased the life chances of those who managed to obtain one.

This chapter outlines the historical developments that led to the introduction of the grammar school entrance tests, with a particular focus on the contributions of Francis Galton and Cyril Burt, both of whom espoused theories that ultimately legitimised the use of such tests for academic selection purposes. The impact of these theories on British education policy is critically evaluated before the historical evolution of academic selection in Northern Ireland, from 1947 to the present day, is summarised. The chapter concludes with an overview of academic selection in several other international contexts.

Francis Galton and the Eugenics Movement

Francis Galton was an eminent English polymath in the Victorian era. In his long life (1822–1911), he made important contributions to such diverse fields as geography, meteorology, psychology, statistics, and human
heredity. Galton was a half-cousin of the renowned English naturalist and biologist Charles Darwin, who was 13 years older than him. Darwin famously developed a theory of evolution based on natural selection, which postulated that, for any species, individuals who are better suited to the environment have a higher probability of surviving and reproducing to pass their heritable traits to future generations, while those who are less suited have a lower likelihood of surviving or reproducing. Galton’s relationship to Darwin is particularly noteworthy given the views Galton ultimately developed on the importance of inherited characteristics (Gillham, 2001).

The majority of Galton’s research and publications were in two main areas: geography and human heredity. Up until approximately 1860, Galton was mainly involved in exploration and studying geography, authoring works such as Tropical South Africa and The Art of Travel, which were published in 1853 and 1855, respectively. In November 1859, Charles Darwin published his highly influential work On the Origin of Species, and this significantly influenced the remainder of Galton’s career. On the Origin of Species articulated Darwin’s theories pertaining to how life had developed and diversified over a period of 3.5 billion years. However, it did not focus specifically on human evolution, which was addressed in his 1871 book, The Descent of Man. Galton was quick to appreciate the implications of the ideas set out in On the Origin of Species, as he pointed out in his autobiography:

I … devoured its contents and assimilated them as fast as they were devoured, a fact which may probably be ascribed to an hereditary bent of mind that both its illustrious author and myself have inherited from our common grandfather, Dr. Erasmus Darwin. (Galton, 1908, p. 288)

He proceeded to claim, “I was encouraged by the new views to pursue many inquiries which had long interested me, and which clustered round the central topics of Heredity and the possible improvement of the Human Race” (Galton, 1908, p. 288). Consequently, after Darwin’s publication of On the Origin of Species, Galton devoted a substantial proportion of his life to studying possible approaches to improving the human race to increase the proportion of individuals with desirable hereditary traits. The dominant themes of Galton’s work in the latter half of his career can be traced to two articles, which were published in the June and August 1865 editions of a prestigious journal of the era, Macmillan’s Magazine.

In these articles, Galton attempted to prove that Darwin’s “natural selection” theory was just as applicable to human beings as it was to domestic animals and plants. He also advocated the use of selective breeding to promote the development of a human race with superior mental abilities, while discouraging the procreation of those with less desirable characteristics. IQ tests had not been invented at the time, and Galton made use of pedigree analysis (an approach to studying inheritance in humans, which permits prediction of the way in which a particular trait is propagated to future generations), twin studies and anthropometric data to investigate what he termed “talent and character”. Galton devised new statistical methods for analysing the data collected in his investigations. Indeed, he believed that desirable physical traits were associated with higher levels of mental capability, despite having no methods at his disposal to investigate the relationship directly (Gillham, 2001).

Galton employed pedigree analysis in his book Hereditary Genius, which was published in 1869, to demonstrate the inheritance of “talent and character”. His intentions were made clear at the beginning of Hereditary Genius:

I propose to show in this book that a man’s natural abilities are derived by inheritance, under exactly the same limitations as are the form and physical features of the whole organic world. Consequently, as it is easy, notwithstanding those limitations, to obtain by careful selection a permanent breed of dogs or horses gifted with the peculiar powers of running, or of doing anything else, so it would be quite practicable to produce a highly-gifted race of men by judicious marriages during several consecutive generations. (Galton, 1869, p. 1)

Galton was particularly concerned with verifying the inheritability of mental abilities in Hereditary Genius. In his research, Galton studied 300 highly esteemed families and noted that they contained almost 1000 “distinguished men”, which he suggested greatly exceeded the incidence of distinguished men in the general population, estimated by him to be one distinguished man per 4000. In addition, Galton reported that an individual who was closer in kinship to a distinguished man would have a greater probability of also being distinguished (Galton, 1869). Furthermore, Galton was dismissive of the potential role of nurture in the genesis of talent, and posited that nature was the dominant factor. Galton was convinced that the average age of marriage was an important factor in
determining the virility of human beings. He believed that, since those who married young tended to produce more offspring who were alive simultaneously, it would be prudent to implement a policy of delaying the age of marriage for the intellectually weak and accelerating it for those with greater intellectual prowess (Galton, 1869). He was convinced that such an approach, coupled with encouraging “the vigorous classes” to have large families, would maximise the pool of natural ability in the population.

Galton’s controversial ideas received a mixed response, and several critiques of them were published in the leading journals of the era. Nevertheless, he persisted with this line of work, which laid the foundations for eugenics: a set of beliefs and practices aimed at improving the genetic calibre of human beings. Galton is a founding father of eugenics, having first used the term “eugenics” in Inquiries into Human Faculty and its Development, which was published in 1883, to characterise the cultivation of the human race:

That is, with questions bearing on what is termed in Greek, eugenes namely, good in stock, hereditarily endowed with noble qualities. This, and the allied words, eugeneia, etc., are equally applicable to men, brutes, and plants. We greatly want a brief word to express the science of improving stock, which is by no means confined to questions of judicious mating, but which, especially in the case of man, takes cognisance of all influences that tend in however remote a degree to give to the more suitable races or strains of blood a better chance of prevailing speedily over the less suitable than they otherwise would have had. The word eugenics would sufficiently express the idea; it is at least a neater word and a more generalised one than viriculture which I once ventured to use. (Galton, 1883, p. 17)

Galton added further clarity to the definition of eugenics in his autobiography, where he defined it as “the study of agencies under social control that may improve or impair the racial qualities of future generations, either physically or mentally” (Galton, 1908, p. 321). Galton’s controversial ideas pertaining to eugenics significantly influenced his followers and had considerable impact even after his death. Galton’s work had a direct bearing on the intelligence testing movement that evolved in Britain during the course of the twentieth century, and particularly on the contributions of Cyril Burt, whose career and influence on academic selection policies are summarised in the next section.

Cyril Burt’s Contribution to Academic Selection Policies

Cyril Burt exerted a substantial influence on policies pertaining to academic selection in Britain during the twentieth century, and he worked relentlessly to make sure the notion of innate, immutable intelligence became entrenched in mainstream thinking within the public domain (Chitty, 2013). Interestingly, there was a personal connection between Burt and Galton, as the young Burt encountered Galton while accompanying his father, a family doctor, on his rounds during the 1890s. Burt was very impressed by Galton and described him as “one of the most distinguished-looking people I have ever known” (Burt, 1962, p. 1). Galton’s work inspired Burt to pursue scientific approaches to studying phenomena, and to embrace measurement and quantitative methods. Burt passionately embraced eugenic principles and contributed several articles on the inheritability of mental traits to The Eugenics Review, an academic journal established in 1909 to advance the eugenics research agenda.

In his own work, Galton did not refer explicitly to “intelligence”, but he alluded to the less clearly defined concept of “natural ability”, which he described as follows:

By natural ability, I mean those qualities of intellect and disposition, which urge and qualify a man to perform acts that lead to reputation. I do not mean capacity without zeal, nor zeal without capacity, nor even a combination of both of them, without an adequate power of doing a great deal of very laborious work. But I mean a nature which, when left to itself, will, urged by an inherent stimulus, climb the path that leads to eminence, and has strength to reach the summit—one which, if hindered or thwarted, will fret and strive until the hindrance is overcome, and it is again free to follow its labour-loving instinct. It is almost a contradiction in terms, to doubt that such men will generally become eminent. On the other hand, there is plenty of evidence in this volume, to show that few have won high reputations, without possessing these peculiar gifts. It follows that the men who achieve eminence, and those who are naturally capable, are, to a large extent, identical. (Galton, 1869, pp. 37–38)

However, Burt concurred with Galton’s conception of mental ability in the sense that both men construed it to have innately governed limits, and that it varies distinctly between individuals. Like Galton, Burt also viewed mental ability as an intellectual property that characterised cognitive
functioning, and he believed that innate ability was a general rather than a specific attribute. In other words, both Burt and Galton posited that an individual’s specific ability in a particular knowledge domain was always underpinned by a more significant generic ability (Chitty, 2013; Gould, 1996). As an illustration of his conviction in this regard, Burt contended that a man could not be a mathematician unless he possessed specific mathematical aptitude, but that great mathematicians must necessarily also possess high levels of general ability (White, 2006).

The study of human intelligence and ability was a central theme of Burt’s work throughout his long career. In one of his early articles, Burt claimed: “mental inheritance … not only moulds the character of individuals; it rules the destiny of nations” (Burt, 1912, p. 200). Burt was resolute in his belief that intelligence was an inherited trait: “Burt’s belief in the innateness of intelligence was for him almost an article of faith, which he was prepared to defend against all opposition, rather than a tentative hypothesis to be refuted, if possible, by empirical tests” (Hearnshaw, 1979, p. 49). Furthermore, he believed that it was straightforward to measure intelligence accurately, as articulated in the following quotation, which also communicates his definition of intelligence:

By the term “intelligence”, the psychologist understands inborn, all-round intellectual ability. It is inherited, or at least innate, not due to teaching or training; it is intellectual, not emotional or moral, and remains uninfluenced by industry or zeal; it is general, not specific, that is to say it is not limited to any particular kind of work, but enters into all we do or say or think. Of all our mental qualities, it is the most far-reaching; fortunately, it can be measured with accuracy and ease. (Burt, 1933, pp. 28–29)

Burt’s views on the nature of intelligence, and his work on its measurement, significantly influenced the British government’s 1938 Spens Report on the future of secondary education. Burt provided the authors of the report with evidence pertaining to children’s intellectual development, in which he supported the notion of innate general intelligence that was unlikely to change during the teenage years and could be accurately and reliably measured at 11 years of age (Chitty, 2009, 2013). The Spens Report paved the way for the 1944/1947 Education Acts that led to the introduction of the tripartite system of grammar, technical and secondary intermediate schools, and the introduction of the so-called 11+ test, to be taken in the final year at primary school, to select children for a grammar
school education. Despite having considered the option, the Spens Report rejected all-ability comprehensive schools at secondary level in favour of the tripartite system (Chitty, 2009).

During the early 1950s, several criticisms of the use of 11+ tests for academic selection purposes emerged. For example, Simon (1953) and Heim (1954) both raised strong objections to the use of intelligence tests for academic selection purposes, arguing that many questions on such tests had inherent cultural and social biases, and that intelligence is non-separable from other mental attributes. Further criticisms of the 11+ tests emerged during the remainder of the 1950s and in the 1960s, which highlighted that the academic selection process was deeply flawed. For example, Simon (1955) provided evidence of non-grammar school students who had failed the 11+ test proceeding to achieve high standards in subsequent public examinations that were better than those obtained by many grammar school students. Simon (1955) argued that such examples highlighted the deeply flawed nature of 11+ selection, and he concluded that “we must provide opportunities and worthwhile objectives, not for the few at the expense of the many, but for the youth of the country as a whole” (p. 70). Furthermore, Pedley (1963) directed criticism at the view that innate ability could be measured using intelligence tests, arguing that inborn ability was inseparable from what children had learned. Pedley made the point that traits such as ability were intimately entwined with environmental factors, which conferred an unfair advantage on children from appropriately supportive backgrounds in the selection process. It also became apparent during the 1950s that grammar schools were mainly attended by children from middle class backgrounds, so that the post-primary education system was segregated along class lines.
Research demonstrated that academic selection led to a direct relationship between educational opportunities and social class, which favoured middle class children and discriminated against those from working class backgrounds (Floud et al., 1956).

Despite the growing concerns about academic selection, Cyril Burt defended the notion that children possess innate mental abilities that can be accurately and reliably measured using intelligence tests. Accordingly, he continued to be strongly supportive of the use of the 11+ examination to allocate children to different types of post-primary school:

Since individuals differ so much in [innate, general, intellectual ability] … it is essential, in the interests of the children themselves and of the nation as a whole, that those who possess the highest ability—the cleverest of the clever—should be identified as accurately as possible. Of the methods hitherto tried out the so-called 11+ examination has proved to be by far the most trustworthy. (Burt, 1959, p. 117)

Throughout his life, Burt maintained his unwavering commitment to the view that mental abilities were largely inherited traits, and that the future intellectual potential of children could be reliably predicted by their performance in the 11+ examination (Burt, 1969). Despite the views of Burt and his supporters, the mounting criticism led to the widespread abandonment of 11+ selection tests and the introduction, from the mid-1960s to the mid-1970s, of all-ability comprehensive schools to replace the bipartite system of secondary and grammar schools in most of Great Britain. However, the selective system remained intact in Northern Ireland despite the various critiques that led to its demise in Great Britain.

Controversy Surrounding Cyril Burt’s Work

After Burt’s death in 1971, considerable controversy emerged about his competence and the authenticity of his theories about intelligence, with several scholars making scathing criticisms of his work. Burt’s thesis that intelligence is genetically determined came under particular scrutiny by Kamin (1974), who reviewed various studies that Burt claimed confirmed the heritability of intelligence. Kamin (1974) argued that Burt’s claims about the genetic basis of intelligence were unwarranted, and that there was insufficient evidence to dismiss the impact of environmental factors on intellectual ability. In Burt’s research into identical twins who were separated at birth, and brought up in different environments, he claimed to have found strong, positive correlations between the IQs of the twins in later life, from which he concluded that intellectual prowess was significantly determined by genetics rather than environmental influences (e.g., Burt, 1966). Kamin (1974) criticised Burt’s research in this area on several grounds, including the lack of precision around the methodology employed in the studies and conflicting statements in some of the publications that emanated from the research. However, the most serious concern raised by Kamin (1974) pertained to the surprisingly high levels of similarity between some of the correlation coefficients in the twin studies for different sample sizes, which he suggested were consistent with fraudulent research practices and data fabrication. Kamin (1974, p. 57) noted that “the correlations were usually reported to three decimal places. They
were astonishingly stable, seeming scarcely to fluctuate as the sample size was changed.” From a mathematical perspective, I concur with Kamin’s concerns about the authenticity of some of the correlations reported in Burt’s work. For example, the addition of an extra 20 pairs of data values to a sample of 131 had no impact on the reported correlation coefficients (to three decimal places) between different measures of school attainment (Burt, 1955, 1966; Kamin, 1974, p. 58). As Kamin intimated, it is highly improbable that correlation coefficients for different sample sizes in situations such as this would be identical when rounded to three decimal places.

Gillie (1977) also highlighted some serious concerns about Burt’s work on the heritability of intelligence. These included the fact that Burt frequently estimated the IQ scores of parents he interviewed as part of his research, but subsequently treated the estimates as accurate scientific data. Gillie also alleged that two supposed collaborators of Burt’s, who were named as co-authors of research articles, may not actually have existed, with Burt fabricating and using their names despite having undertaken the research solely by himself. In relation to Kamin’s (1974) exposure of the surprising levels of agreement, to a high degree of precision, between the IQ correlation coefficients for different samples in the twin studies, Gillie (1977) suggested that Burt could only have achieved such results by working backwards to make the observed data consistent with his desired findings. Furthermore, Gillie (1977) also accused Burt of fabricating data to support his preferred genetic theories, thereby creating the false impression that conclusive scientific evidence for their validity existed.

These highly problematic accusations about Burt’s professional endeavours cast considerable doubt on his credibility and integrity. This is further corroborated by the views of Professor Leslie Hearnshaw, who was Burt’s official biographer. Hearnshaw (1979) reluctantly conceded that many of the accusations levelled against Burt were correct, but he intimated that this mainly affected Burt’s work after the Second World War. Hearnshaw (1979) appeared to be rather more reserved in his assessment of the authenticity of Burt’s earlier work, as illustrated by his reference to Burt’s “many scholarly and practical achievements” (p. 259). These damning indictments of the validity of the research that underpinned the introduction of academic selection in Britain are particularly worrying since children are still selected on the basis of academic ability to attend grammar schools in Northern Ireland, and some other jurisdictions internationally, despite such discredited practices having been largely abandoned in other
parts of the United Kingdom. The following section summarises the evolution of academic selection in Northern Ireland from the 1947 Education Act to the present day.
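
Kamin’s statistical objection, discussed earlier in this section, can be illustrated with a small simulation. The sketch below is not a reconstruction of Burt’s data. It assumes, purely for illustration, that pairs of scores follow a bivariate normal distribution with a hypothetical population correlation of 0.8, and the function names (pearson_r, draw_pair, share_unchanged) are invented here. It repeatedly draws 131 pairs, adds a further 20, and records how often the sample correlation coefficient is identical to three decimal places before and after the addition.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def draw_pair(rho, rng):
    """One (x, y) pair from a standard bivariate normal with correlation rho."""
    x = rng.gauss(0.0, 1.0)
    y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    return x, y


def share_unchanged(rho=0.8, n_start=131, n_extra=20, trials=2000, seed=1):
    """Share of simulated studies whose r agrees to 3 d.p. after adding pairs.

    rho, trials and seed are illustrative assumptions, not values from Burt.
    """
    rng = random.Random(seed)
    unchanged = 0
    for _ in range(trials):
        pairs = [draw_pair(rho, rng) for _ in range(n_start + n_extra)]
        xs, ys = zip(*pairs)
        # Correlation on the first 131 pairs, then on all 151 pairs.
        r_before = round(pearson_r(xs[:n_start], ys[:n_start]), 3)
        r_after = round(pearson_r(xs, ys), 3)
        if r_before == r_after:
            unchanged += 1
    return unchanged / trials


if __name__ == "__main__":
    print(f"Share of runs with identical r (3 d.p.): {share_unchanged():.3f}")
```

Under these assumptions, the two rounded coefficients would be expected to coincide in only a small minority of runs, which is the intuition behind Kamin’s concern. The exact share depends on the assumed correlation and distribution, so the figure printed should be read as indicative only.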

Academic Selection in Northern Ireland

Academic selection for post-primary education has been a characteristic of the Northern Ireland educational landscape from 1948 to the current era, and the test used to effect selection has been variously known as the qualifying examination, the selection procedure, the 11+ or the transfer test at different stages (Brown et al., 2022). These tests are taken during the final year of primary education, usually at 10 or 11 years of age, and their results determine the type of post-primary school students can attend, grammar or non-selective. Only a very small number of comprehensive schools operate in certain areas of Northern Ireland. However, it is important to note that a different system of selection is used within the Craigavon area of Northern Ireland. This region, which accounts for a small proportion of the Northern Ireland population, operates a delayed system of selection at 14 years. In the Craigavon area, students transfer automatically from primary schools to junior high schools at age 11 years and, after a delayed selection procedure at age 14 years, some students are selected for grammar schools.

The selection procedure at 10 or 11 years in the rest of the province has taken different formats over the years. Initially, from 1948 to 1965, the so-called qualifying examination consisted of tests in English and arithmetic, both based on a specified syllabus, together with an intelligence test. In 1966, a selection procedure consisting of verbal reasoning rather than curriculum-aligned tests replaced the qualifying examination, and this was in place until 1977 (Sutherland, 1990). The selection procedure was scrutinised by the Advisory Council for Education in Northern Ireland during the early 1970s, which recommended in its 1973 Burges Report that academic selection should be abandoned (Gardner, 2016; Sutherland, 1990). In response to the Burges Report, an “alternative transfer procedure”, based on non-attributable tests and primary school principal nominations, was employed to select children for grammar schools, although this only lasted from 1978 to 1980 inclusive. This system was fraught with difficulties as some principals exceeded their allocated quota of grammar school places and, consequently, attributable testing returned in 1981 (Sutherland, 1990). Government-regulated academic selection continued for more
than a quarter of a century, up until 2008, although the selection tests were largely curriculum-aligned during this period, and tested English, mathematics and, for a proportion of the time, science and technology. In addition, different grading systems were used at various stages during this period. Up until the 1980s, boys and girls were treated differently in the transfer procedure. In that era, girls were considered to mature earlier than boys, but boys were purported to catch up towards the end of post-­ primary education. This led to the use of a quota system, whereby girls and boys required different scores to pass the transfer test, with girls requiring a higher score than boys to secure a grammar school place. This unfair quota system, which was also a feature of the educational landscape in England, was successfully challenged in the Northern Ireland High Court in June 1988. The parents of several girls who had been unsuccessful in obtaining non-fee-paying grammar school places, despite their primary school principals considering them to be more capable than some of the boys who were successful in the transfer procedure, won a legal challenge to the separate treatment of girls and boys (Sutherland, 1990). Consequently, the two genders were no longer considered as separate populations in subsequent state-regulated transfer tests. Significant changes for Northern Ireland schools were heralded by 1989 Education Reform (NI) Order which, like the 1988 Education Reform Act in England and Wales, introduced a competitive dimension to schooling. The introduction of a system of open enrolment permitted schools to admit students up to a maximum agreed admission number (Gallagher & Smith, 2000). Although grammar schools could, in principle, refuse to admit a student if it was considered the student was not suited to their academic-focused curriculum, most grammar schools admitted students up to their maximum permitted admission number. 
This has led to a substantial increase in the percentage of post-primary students who are admitted to grammar schools over time, from 29% in 1984 to 43% in 2021 (Department of Education Northern Ireland [DENI], 2022; Gallagher & Smith, 2000). A consequent decrease in the proportion of post-primary students enrolled in non-selective secondary schools, from 71% in 1984 to 57% in 2021, is an obvious corollary to this. Following the introduction of a devolved Northern Ireland Assembly as a consequence of the 1998 Good Friday Agreement, the inaugural minister for education commissioned a review body to consult and bring forward recommendations on the future organisation of post-primary education in Northern Ireland. The so-called Burns Report (Burns, 2001) that emerged from the review recommended the cessation of the transfer tests and the abolition of academic selection for primary to post-primary school transition. Burns (2001) also recommended the introduction of a system of formative assessment in primary schools to furnish meaningful information on educational attainment to students, their parents/guardians and teachers, coupled with collaborative groups of different types of post-primary schools known as collegiates. The public consultation following the publication of the Burns Report failed to yield a consensus view on the future of academic selection, and a further review group was established by the relevant United Kingdom direct rule minister during a period when the Northern Ireland Assembly was suspended between 2002 and 2007 (McMurray, 2020). The remit of this group was to take account of the responses to the consultation on the Burns Report, including the diversity of views on academic selection, and to provide advice on options in relation to future arrangements for post-primary education. The report of this second review body, known as the Costello Report, recommended the termination of academic selection. It also advocated primary to post-primary transition based on parental and student choice, but informed by formative assessment in primary schools, and the introduction of measures aimed at broadening the post-primary curriculum and encouraging greater collaboration between post-primary schools (Costello, 2004). Intense political debate on the future of academic selection has ensued since the publication of the Burns Report, with widening gaps between the views of pro-academic selection unionist parties and anti-academic selection nationalist parties (Gallagher, 2021).
However, since the relevant United Kingdom direct rule minister had accepted the recommendations of the Costello Report in principle, and the education minister of the newly reconstituted 2007 Northern Ireland Assembly could not achieve consensus to ratify the termination of academic selection, the last government-regulated transfer test took place in 2008 (Gallagher, 2021; McMurray, 2020). In view of the termination of the government-regulated transfer test, the Northern Ireland Department of Education published guidance on transfer from primary to post-primary school, which advocated that academic criteria should not feature in post-primary schools’ admissions policies. This guidance also made it clear that primary schools should deliver the statutory Northern Ireland curriculum and should refrain from preparing students for any unregulated transfer tests that may emerge (Perry, 2016). During the St Andrews Talks, which led to the re-establishment of the Northern Ireland Assembly in 2007, an agreement was reached with unionist politicians that would effectively permit grammar schools to employ academic selection to admit students if they so desired. Consequently, two unregulated transfer tests emerged that were administered by two consortia: the Association for Quality Education (AQE) and the Post-Primary Transfer Consortium (PPTC). These were used for academic selection purposes from the demise of the state-regulated transfer test in 2008 up to 2022, with the AQE test being used mainly by Protestant grammar schools and the PPTC test being favoured by the majority of Catholic grammar schools, although some schools accepted either test (McMurray, 2020). Both the AQE and PPTC tests were based on the English and Mathematics curricula followed in primary schools, but the PPTC test consisted of multiple-choice questions, while the AQE test employed constructed response items. Although primary schools were initially counselled to refrain from preparing students for these unregulated tests under a nationalist Northern Ireland education minister, this position was reversed by a unionist education minister in 2016, who endorsed primary school support for the unregulated transfer procedure (McMurray, 2020). Worryingly, the fact that these unregulated tests were administered by private organisations meant there was a void in relation to the provision of data for public scrutiny and accountability purposes, despite preparation for the tests being condoned by the education minister. The existence of two distinct tests presented challenges for those students who opted to sit both tests, and there were various unsuccessful attempts at brokering an agreement on a single transfer test to replace the AQE and PPTC tests.
However, a breakthrough came in early 2022, when the Schools’ Entrance Assessment Group (SEAG) was established to run a single transfer test, based on the upper-primary school curricula in English and Mathematics, to replace the dual AQE/PPTC system from autumn 2023 (Bain, 2022). It therefore looks likely that unregulated transfer tests will continue to be used for academic selection purposes in Northern Ireland, and the lack of political consensus on the future of academic selection, which is divided on a sectarian basis (Pivotal, 2022), with unionists favouring selection and nationalists opposing it, is set to endure. Indeed, the SEAG assessment model is predicated on two 60-minute tests consisting of a combination of multiple choice and free response questions, but results will be provided for those candidates who sit only one of the two tests (SEAG, n.d.). Worryingly, the SEAG assessment model provides a less secure basis for classifying candidates than the AQE model which the SEAG test partially supersedes, where candidates’ performance was assessed using the best two test scores from three 60-minute tests. These developments are particularly problematic against the backdrop of such an approach to segregating young people by academic ability at a tender age being based on the discredited theories of Cyril Burt. The current book brings further theoretical and philosophical insights to bear on the highly contentious issue of academic selection, and problematises the ethical basis upon which selection is predicated.

Academic Selection in Other International Contexts

The 1944 Education Act led to the widespread use of academic selection to select students for grammar schools throughout the United Kingdom. Although academic selection persists in Northern Ireland to the present day, it has been largely discontinued in Wales and Scotland in favour of all-ability post-primary education in comprehensive schools. The situation is slightly different in the state-funded school sector in England. Most post-primary students in England attend all-ability comprehensive schools, but academic selection still features for a small minority of students in those geographical areas that opted to retain grammar schools when decisions pertaining to reorganisation of post-primary education were devolved to local education authorities. In 1964, there were 1298 grammar schools in England and Wales, but this number plummeted during the remainder of the 1960s and in the 1970s as government policy promoted the move to non-selective comprehensive schools (Danechi, 2020). Whilst there was a modest increase in the number of grammar schools in England during the early 1990s, the School Standards and Framework Act 1998 prohibited the establishment of new maintained grammar schools, and there are currently just 163 grammar schools, educating around 5% of state-sponsored post-primary students in England (Danechi, 2020). These state-sponsored grammar schools co-exist alongside several independent (fee-paying) schools that are free to select students on the basis of their own admissions policies, which usually include academic criteria. More recently, however, the UK government announced plans to increase the prevalence of academic selection in England, making £50 million available to expand the number of selective school places (Department for Education, 2018). Although less priority has been given to this initiative since the 2019 General Election, Liz Truss, who was appointed as UK Prime Minister in September 2022, expressed her wish to see “more grammar schools in every area” (Francis, 2022). Although academic selection at around 11 years of age for admission to government-funded schools does not generally feature in other Anglophone countries such as Australia, Canada, and the USA, it is used in a number of European countries. For example, Germany retains academic selection, referred to as tracking, for selection into different types of secondary schools, although the exact structure of the education system varies by federal state. After attending all-ability primary schools (Grundschule), students are selected, from 10 years of age, for three different types of secondary school based on teachers’ recommendations, which take cognisance of both students’ prior academic performance and parental wishes. The tripartite system of secondary school tracks is hierarchically organised into lower (Hauptschule), intermediate (Realschule) and upper (Gymnasium) secondary schools. However, in most federal states, mixed-track comprehensive schools (Gesamtschulen) have also been introduced, and there have been moves to merge lower and intermediate secondary schools into a single school type (Schulen mit mehreren Bildungsgängen), although upper secondary schools (Gymnasien) have remained a feature of the German educational landscape (Henninges et al., 2019). Therefore, although operationalised in a different manner to Northern Ireland and England, academic selection is still a notable feature of the German education system. Like Germany, Austria retains an early academic selection/tracking system, whereby students are selected, at 10 years of age, for different types of secondary school according to their ability (OECD, 2020). Several other international jurisdictions also have highly selective schools within their education systems including, for example, Singapore, which operates a system of academic selection at 12 years of age (OECD, 2020).
Nevertheless, relatively few jurisdictions around the world use a narrowly focused system of early academic selection similar to that used in Northern Ireland.

Summary

The current chapter has given an overview of the history associated with the introduction of academic selection in the United Kingdom, including the influential roles played by the eugenic theories of Francis Galton and Cyril Burt. Of particular note is the impact of Burt’s controversial ideas pertaining to innate, immutable intelligence that can be accurately and reliably measured at a tender age. Although the evidence contravening these discredited views has led to the abandonment of academic selection in most of Great Britain, it is problematic that it has not led to a similar outcome in Northern Ireland and some parts of England, where academic selection continues to flourish. A significant volume of empirical research has been conducted into the consequences of academic selection, both in the United Kingdom (including Northern Ireland) and in other international contexts. Whilst this chapter has given a brief history of academic selection in Northern Ireland, and a brief synopsis of how selection operates in selected other jurisdictions, the next chapter focuses on critically reviewing the extant empirical research on the consequences of academic selection.

References

Bain, M. (2022, March 24). New single transfer test for P7 pupils in Northern Ireland to begin in November 2023. Belfast Telegraph. https://www.belfasttelegraph.co.uk/news/education/new-single-transfer-test-for-p7-pupils-in-northern-ireland-to-begin-in-november-2023-41482737.html
Brown, M., Skerritt, C., Roulston, S., Milliken, M., McNamara, G., & O’Hara, J. (2022). The evolution of academic selection in Northern Ireland. In B. Walsh (Ed.), Education policy in Ireland since 1922 (pp. 371–399). Palgrave Macmillan. https://doi.org/10.1007/978-3-030-91775-3_12
Burns, G. (2001). Report of the review body on post-primary education (Burns report). Department of Education for Northern Ireland. https://www.education-ni.gov.uk/publications/report-review-body-post-primary-education-burns-report
Burt, C. (1912). The inheritance of mental characters. The Eugenics Review, 4(2), 168–200. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2986829/pdf/eugenrev00374-0054.pdf
Burt, C. (Ed.). (1933). How the mind works. Allen & Unwin.
Burt, C. (1955). The evidence for the concept of intelligence. British Journal of Educational Psychology, 25(3), 158–177. https://doi.org/10.1111/j.2044-8279.1955.tb03305.x
Burt, C. (1959). The examination at eleven plus. British Journal of Educational Studies, 7(2), 99–117. https://doi.org/10.2307/3118498
Burt, C. (1962). Francis Galton and his contributions to psychology. British Journal of Statistical Psychology, 15(1), 1–49. https://doi.org/10.1111/j.2044-8317.1962.tb00081.x
Burt, C. (1966). The genetic determination of differences in intelligence: A study of monozygotic twins reared together and apart. British Journal of Psychology, 57(1–2), 137–153. https://doi.org/10.1111/j.2044-8295.1966.tb01014.x
Burt, C. (1969). The mental differences between children. In C. B. Cox & A. E. Dyson (Eds.), Black paper two: The crisis in education (pp. 16–25). The Critical Quarterly Society. https://www.jstor.org/stable/41553800
Chitty, C. (2009). Eugenics, race and intelligence in education. Continuum.
Chitty, C. (2013). The educational legacy of Francis Galton. History of Education, 42(3), 350–364. https://doi.org/10.1080/0046760x.2013.795619
Costello, S. (2004). Future post-primary arrangements in Northern Ireland: Advice from the post-primary review working group (Costello report). Department of Education for Northern Ireland. https://www.education-ni.gov.uk/publications/costello-report-full
Danechi, S. (2020). Briefing paper number 1398: Grammar school statistics. House of Commons Library. https://researchbriefings.files.parliament.uk/documents/SN01398/SN01398.pdf
DENI. (2022). Annual enrolments at schools and in funded pre-school education in Northern Ireland 2021–22. DENI. https://www.education-ni.gov.uk/publications/annual-enrolments-schools-and-funded-pre-school-education-northern-ireland-2021-22
Department for Education. (2018). Drive to create more good school places for families. Department for Education. https://www.gov.uk/government/news/drive-to-create-more-good-school-places-for-families
Floud, J. E., Halsey, A. H., & Martin, F. M. (1956). Social class and educational opportunity. Heinemann.
Francis, P. (2022, September 5). Will grammar school ban be lifted when Liz Truss or Rishi Sunak becomes new prime minister? KentOnline. https://www.kentonline.co.uk/kent/news/what-will-new-pm-mean-for-grammar-school-ban-273006/
Gallagher, T. (2021). Governance and leadership in education policy making and school development in a divided society. School Leadership & Management, 41(1–2), 132–151. https://doi.org/10.1080/13632434.2021.1887116
Gallagher, T., & Smith, A. (2000). The effects of the selective system of secondary education in Northern Ireland: Main report. Department of Education for Northern Ireland. https://www.education-ni.gov.uk/sites/default/files/publications/de/gallagherandsmith-mainreport.pdf
Galton, F. (1869). Hereditary genius: An inquiry into its laws and consequences. Macmillan.
Galton, F. (1883). Inquiries into human faculty and its development. Macmillan.
Galton, F. (1908). Memories of my life (2nd ed.). Methuen & Co.
Gardner, J. (2016). Education in Northern Ireland since the Good Friday Agreement: Kabuki theatre meets danse macabre. Oxford Review of Education, 42(3), 346–361. https://doi.org/10.1080/03054985.2016.1184869
Gillham, N. W. (2001). A life of Sir Francis Galton: From African exploration to the birth of eugenics. Oxford University Press.
Gillie, O. (1977). Did Sir Cyril Burt fake his research on heritability of intelligence? Part I. The Phi Delta Kappan, 58(6), 469–471. https://www.jstor.org/stable/20298643
Gould, S. J. (1996). The mismeasure of man. W. W. Norton.
Hearnshaw, L. S. (1979). Cyril Burt: Psychologist. Hodder & Stoughton.
Heim, A. (1954). The appraisal of intelligence. Methuen.
Henninges, M., Traini, C., & Kleinert, C. (2019). LIfBi working paper no. 83: Tracking and sorting in the German educational system. Leibniz Institute for Educational Trajectories. https://www.neps-data.de/Portals/0/Working%20Papers/WP_LXXXIII.pdf
Kamin, L. J. (1974). The science and politics of IQ. Penguin.
McMurray, S. (2020). Research and information service briefing paper: Academic selection. Northern Ireland Assembly. http://www.niassembly.gov.uk/globalassets/documents/committees/2017-2022/education/post-primary-transfer-survey/academic-selection-briefing-paper-niar-209-2020.pdf
OECD. (2020). PISA 2018 results (volume V): Effective policies, successful schools. OECD Publishing. https://doi.org/10.1787/ca768d40-en
Pedley, R. (1963). The comprehensive school. Penguin.
Perry, C. (2016). Research and information service briefing paper: Academic selection–A brief overview. Northern Ireland Assembly. http://www.niassembly.gov.uk/globalassets/documents/raise/publications/2016-2021/2016/education/4816.pdf
Pivotal. (2022). Impacts of academic selection in Northern Ireland–Literature review for independent review of education. Pivotal. https://www.independentreviewofeducation.org.uk/key-documents/academic-selection-literature-review-pivotal
SEAG. (n.d.). Frequently asked questions: Answers to the most common questions about the entrance assessment. Retrieved on May 2, 2023, from https://seagni.co.uk/guidance-for-parents/faqs
Simon, B. (1953). Intelligence testing and the comprehensive school. Lawrence & Wishart.
Simon, B. (1955). The common secondary school. Lawrence & Wishart.
Sutherland, A. E. (1990). Selection in Northern Ireland: From 1947 Act to 1989 Order. Research Papers in Education, 5(1), 29–48. https://doi.org/10.1080/0267152900050103
White, J. (2006). Intelligence, destiny and education: The ideological roots of intelligence testing. Routledge.

CHAPTER 5

Consequences of Academic Selection

Abstract  In this chapter, I critically review relevant literature on the consequences of academic selection that has been published since the beginning of the twenty-first century, with the aim of assessing the overall benefits, and potential drawbacks, of academically selective education systems. Initially, I focus on evidence pertaining to the capacity of academic selection to improve educational outcomes for students. I then proceed to evaluate the impact of selection on the social composition of schools and the capacity of selective education systems to foster equality of educational opportunities for all students. I assess the evidence pertaining to the potential of academic selection to promote social mobility, which is one of the commonly purported benefits of selection: it allows bright children from poorer backgrounds to avail of educational and vocational opportunities that would not have been accessible to them otherwise. Finally, I consider evidence pertaining to other knock-on consequences of academic selection, such as its implications for learning and teaching in schools, and the effects of failure to obtain a place at an academically selective school on students’ future socio-emotional outcomes.

Keywords  Advantages of academic selection • Disadvantages of academic selection • Empirical evidence

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 I. Cantley, The Philosophical Limitations of Educational Assessment, https://doi.org/10.1007/978-3-031-47021-9_5

Introduction

Many studies have been conducted into the consequences of academic selection. Although a significant proportion of this work relates to the United Kingdom context, some researchers have investigated the consequences of academic selection beyond the confines of the United Kingdom. This body of research has addressed a range of issues, from the effectiveness of academically selective schools, and the overall effectiveness of education systems that include such schools, to unintended consequences and social justice implications of academic selection. In this chapter, I critically review relevant literature on the consequences of academic selection that has been published since the beginning of the twenty-first century, with the aim of assessing the overall benefits, and potential drawbacks, of academically selective education systems. My dominant concern in the current chapter is to engage critically with literature that reports on empirical research, rather than theoretical analyses or policy critiques relevant to the academic selection debate. Initially, I focus on evidence pertaining to the capacity of academic selection to improve educational outcomes for students. I then proceed to evaluate the impact of selection on the social composition of schools and the capacity of selective education systems to foster equality of educational opportunities for all students. In particular, I assess the evidence pertaining to the potential of academic selection to promote social mobility, which is one of the commonly purported benefits of selection: it allows bright children from poorer backgrounds to avail of educational and vocational opportunities that would not have been accessible to them otherwise.
Finally, I consider evidence pertaining to other knock-on consequences of academic selection, such as its implications for learning and teaching in schools, and the effects of failure to obtain a place at an academically selective school on students’ future socio-emotional outcomes.

Effectiveness of Grammar Schools

Academic Achievement at Post-primary Level

Despite the purported academic benefits attributed to grammar schools, there is a lack of consensus in the literature regarding the relative merits of selective and non-selective education systems in terms of their capacity to improve the educational outcomes of students. There is very clear evidence that grammar school students perform significantly better in raw attainment terms than students who attend non-selective schools, as demonstrated by their superior results in public examinations (Gallagher & Smith, 2000; Andrews et al., 2016). However, raw public examination results do not provide a fair indication of the effectiveness of a school because of the influence of both school and student level characteristics on examination performance. Differences in examination performance may be largely attributable to students’ prior attainment levels, or other attributes of students that are beyond the control of schools, such as socioeconomic status (SES), rather than the type of school attended (Gorard & See, 2013). Students’ eligibility for free school meals (FSM) is often taken to be a proxy for their SES since those from lower socioeconomic backgrounds are more likely to be eligible for FSM. In addition, differences in students’ educational growth may be impacted by school characteristics, such as whether the school is co-educational or single-sex. A fairer approach to gauging school effectiveness entails using value-added models to measure the growth in students’ academic achievement while also controlling for other contextual variables, at either the student or the school level, or both. Such an approach enables the impact of the school on academic performance to be more accurately quantified since it permits students’ progress to be compared on a like-for-like basis, while mitigating the effects of student and/or school characteristics. Several studies into the effectiveness of grammar schools have been conducted that do not control for prior attainment at school level. Such studies generally report that academically selective grammar schools have a positive effect on student achievement.
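The contrast drawn above between raw attainment gaps and value-added estimates can be illustrated with a minimal sketch on simulated data. Everything in it is an assumption made purely for illustration and is not drawn from any of the studies cited: the function names, the effect sizes, the selection rule, and the use of a single-level ordinary least squares regression rather than the multi-level models used in much of the literature. The simulated cohort is built so that grammar school attendance has no true effect, yet grammar students still show a large raw advantage because they were selected on prior attainment.

```python
import numpy as np


def simulate_cohort(n_students=4000, true_school_effect=0.0, seed=0):
    """Simulate students whose grammar school place depends on prior attainment.

    All coefficients are hypothetical values chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    prior = rng.normal(0.0, 1.0, n_students)  # standardised prior attainment
    # Noisy selection: higher prior attainers are more likely to be admitted.
    grammar = (prior + rng.normal(0.0, 0.5, n_students) > 0).astype(float)
    # FSM eligibility (SES proxy) is more likely for lower prior attainers.
    fsm = (rng.random(n_students) < 1.0 / (1.0 + np.exp(prior))).astype(float)
    outcome = (0.8 * prior - 0.3 * fsm + true_school_effect * grammar
               + rng.normal(0.0, 0.5, n_students))
    return prior, fsm, grammar, outcome


def raw_gap(grammar, outcome):
    """Difference in mean outcome between grammar and non-grammar students."""
    return outcome[grammar == 1].mean() - outcome[grammar == 0].mean()


def value_added_estimate(prior, fsm, grammar, outcome):
    """OLS coefficient on grammar attendance, controlling for prior attainment and FSM."""
    design = np.column_stack([np.ones_like(prior), grammar, prior, fsm])
    coefficients, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefficients[1]


if __name__ == "__main__":
    prior, fsm, grammar, outcome = simulate_cohort(true_school_effect=0.0)
    print("Raw attainment gap:   ", round(raw_gap(grammar, outcome), 3))
    print("Value-added estimate: ",
          round(value_added_estimate(prior, fsm, grammar, outcome), 3))
```

Under these assumptions the raw gap is large while the adjusted estimate sits close to zero, which is the pattern the value-added argument predicts when intake differences drive examination results. The sketch ignores the nesting of students within schools and any school-level covariates, so it should not be read as a template for the analyses reviewed in this chapter.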
For example, based on an analysis of data collected from 1784 students from eight grammar schools and 17 non-selective schools in Northern Ireland, Shuttleworth and Daly (2000) reported that grammar schools had a significant positive effect on student achievement. According to Shuttleworth and Daly (2000), grammar school students performed significantly better than comparable students from non-selective schools in General Certificate of Secondary Education (GCSE) examinations, which are taken by students towards the end of their compulsory post-primary education, at approximately 16 years of age, in the United Kingdom. Grammar school students achieved 16 GCSE grades more than comparable students from non-selective schools did. Although Shuttleworth and Daly’s (2000) work used multi-level modelling to account for the hierarchical structure of the dataset, since students were nested within schools, and it controlled for several student-level variables, it did not control for prior attainment at the school level. Levačić and Marsh (2007) conducted a similar study, albeit on a larger scale, that involved the analysis of data for more than 330,000 students from 20 local education authorities in England, 10 of which were totally selective and 10 partially selective. Levačić and Marsh’s (2007) analysis included data derived from students attending grammar schools, secondary modern schools (non-selective post-primary schools attended by students who either failed to secure a grammar school place, or who did not sit the relevant transfer test) and all-ability comprehensive schools. Based on multi-level modelling that controlled for several student-level and school-level variables, but not prior attainment at school level, they found that grammar school students on average obtained an additional 5.5 GCSE grades relative to comparable students attending comprehensive schools, while secondary modern students obtained on average one GCSE grade less than comparable students who attended comprehensives. Again, the significant grammar school effect is noteworthy, although the achievement gain associated with grammar school attendance was substantially lower than that reported by Shuttleworth and Daly (2000) in a Northern Irish context. Harris and Rose (2013) conducted a smaller-scale investigation than Levačić and Marsh (2007) that focused on Buckinghamshire, a local education authority in England that operates a fully selective education system. Harris and Rose conducted a logistic regression analysis of GCSE performance data, with school type as the predictor variable, for matched samples of grammar and secondary modern students who were academically borderline with respect to their prior attainment (at the end of primary education), thus indicating they were potentially capable of obtaining a grammar school place.
Harris and Rose concluded that grammar school students were more likely than their matched secondary modern counterparts to attain five or more GCSE grades A*–C. They also performed an analysis comparing Buckinghamshire grammar school students to similar comprehensive school students from the neighbouring local education authority of Oxfordshire, which does not operate a selective education system. In addition, they compared the performance of Buckinghamshire secondary modern students with similar students from Oxfordshire who attended comprehensive schools. Based on this comparative analysis, Harris and Rose (2013) concluded that Buckinghamshire’s grammar schools increased the probability of borderline students obtaining 5+ GCSEs at grades A*–C by four or five percentage points relative to Oxfordshire’s comprehensive schools, while secondary modern schools decreased the probability of success by one to three percentage points relative to Oxfordshire’s comprehensive schools. This suggests that early academic selection leads to a bipolar profile of academic achievement, with generally high achieving grammar schools, and a separate, if wider, profile of outcomes in secondary schools. In other words, grammar schools do provide access to relatively high performance for those who gain admission to them and stay there, but they enhance the level of inequality of outcomes in the student population as a whole. However, it is important to note that Harris and Rose’s (2013) conclusion in this regard is contradicted by Coe et al.’s (2008) finding that grammar schools do not result in collateral damage to the performance of students in non-grammar schools. Nevertheless, evidence from Northern Ireland, where academic selection is in widespread use, lends support to Harris and Rose’s (2013) claim pertaining to the reduced performance of students in non-grammar schools that is inextricably linked to the academic success of those who attend grammar schools. The Northern Ireland context is somewhat different from the English context because of the prevalence of grammar schools in the post-primary educational landscape. In addition, the system of open enrolment that was introduced following the 1989 Education Reform (NI) Order has led to a situation where grammar schools admit a much larger proportion of post-primary students than was the norm prior to 1989. For example, the proportion of post-primary students admitted to grammar schools increased from 29% in 1984 to 43% in 2021 (DENI, 2022; Gallagher & Smith, 2000), and the corresponding proportion of students admitted to non-selective secondary schools decreased from 71% to 57% over the same timeframe.
This has also led to an associated decrease in the ability profile of students attending non-selective secondary schools as grammar schools “cream off” the more academically talented students who would have been admitted to non-selective schools prior to the introduction of open enrolment (Byrne & Gallagher, 2004; Gallagher & Smith, 2000). Consequently, this has led to a strong bipolar distribution of outcomes for students between high-performing grammar schools and lower-performing non-selective secondary schools (Gallagher, 2021; Gallagher & Smith, 2000). As alluded to previously, studies that do not control for school-level prior attainment generally report that grammar school students make greater academic progress than students in non-selective schools do.

However, the findings of Gorard and Siddiqui’s (2018) study, which did not include prior attainment at the school level as a covariate in the model, cast doubt on the existence of a positive grammar school effect on academic achievement. Based on regression analyses relating the GCSE performance of the complete 2015 cohort of GCSE students in England to various student-level and school-level variables (excluding prior attainment at school level), Gorard and Siddiqui concluded that grammar schools have no advantage over other schools in relation to improving students’ educational outcomes. However, it is noteworthy that Gorard and Siddiqui controlled for students’ socioeconomic background by including the total number of years students had been eligible for FSM as an independent variable in their regression models, rather than the more usual dichotomous indicator of FSM eligibility (yes/no). This may have more reliably quantified SES and controlled for unmeasured differences between grammar school students and non-grammar school students. It has been suggested that controlling for school-level prior attainment in value-added models may help to eliminate the effects of unmeasured differences between students that are not adequately captured in the student-level variables or other school-level variables, thus removing measurement errors which could potentially bias effectiveness estimates in favour of more privileged schools (Coe et al., 2008; Perry, 2019). However, there is a possibility that controlling for school-level prior attainment could factor out genuine differences between grammar schools and non-selective schools, thus leading to underestimates of grammar school effectiveness.
Nevertheless, in Coe et al.’s (2008) extremely thorough research into grammar school effectiveness, attempts were made to investigate the consequences of including school-level variables (including prior attainment), in addition to student-level variables, as covariates in various value-added models. Based on several statistical models, Coe et al. (2008) concluded that grammar school attendance leads on average to a gain of between zero and three-quarters of a grade per subject at GCSE level. However, when school-level compositional variables were included in the models, which they favoured based on advice from the school effectiveness literature, Coe et al.’s estimates of the grammar school advantage were at the lower end of this range. Furthermore, Coe et al. (2008) noted that, in their analyses where grammar school students appeared to make greater progress, the same students were already making greater academic progress during their primary school careers. This may indicate the existence of unaccounted-for differences between students from grammar and
non-selective schools, which limits the confidence that can be placed in the existence of a positive grammar school effect in relation to academic achievement. It is also important to note that Coe et al. (2008) stressed that the conclusions of any investigation into grammar school effectiveness are strongly influenced by a range of factors, including the availability of high-quality data and various methodological decisions pertaining to the way in which the data are analysed. On the latter point, Lu (2023) reinforced the importance of carefully considering the suitability of different research designs and statistical models when undertaking studies into grammar school effectiveness. Lu (2023) argued that the choice of outcome variables, explanatory variables, and different types of regression analysis (ordinary least squares regression or multi-level modelling) could have a substantial impact on conclusions regarding the grammar school effect on educational outcomes. Acknowledging that unstable estimates of effect sizes are a common issue in school effectiveness research, Lu (2023) suggested that studies into grammar school effectiveness might be even more sensitive to choices of statistical models. Lu hypothesised that differences between grammar and non-selective school intakes may mean there is a greater likelihood of choices in relation to outcome measures, explanatory variables and regression models confounding the conclusions that can be drawn from grammar school effectiveness research, with different choices leading to different conclusions. Indeed, she suggested that such issues could help to explain the inconsistent evidence regarding grammar school effectiveness.
Some research has found that grammar schools are of particular benefit to certain groups of students, but there is a lack of consistency in the reported findings.
For example, Levačić and Marsh (2007) concluded that grammar school attendance leads to the greatest attainment-related benefits for average ability students, but that the grammar school attainment advantage was lower for more capable students. However, Galindo-Rueda and Vignoles (2005) reported that high ability students, particularly girls, gained most from a grammar school education in terms of educational outcomes, while Atkinson et al. (2006) suggested that socioeconomically disadvantaged students stand to derive most benefit from attending a grammar school. The latter finding is also supported by the research of Maurin and McNally (2007), thus suggesting that grammar school attendance is particularly advantageous for students from less affluent backgrounds. However, the lack of consensus in relation to whom a grammar
school education favours to the greatest extent is problematic and suggests that drawing generalisations from the research may be unjustified. In addition, some studies have investigated the issue of whether selective education systems are more effective overall than comprehensive systems, but again the evidence is inconsistent. For example, Atkinson et al. (2006) and Andrews et al. (2016) both conclude that there is no appreciable difference between the two systems. However, Marks (2000) found in favour of selective systems, while Jesson (2001) concluded that comprehensive systems lead to the optimal overall educational outcomes for all students. This lack of consensus makes it difficult to discern which system is most beneficial overall for all students, and thus drawing generalisable conclusions from the extant research would be ill-advised.

Students’ Trajectories Beyond Post-primary Education

Several studies have investigated the impact of academic selection on students’ later trajectories, after they have left school, rather than just focusing on the impact of selection on academic achievement towards the end of post-primary education. In a similar vein to academic achievement, the research evidence is inconclusive about the effects of academic selection on students’ trajectories beyond school. Based on evidence from Northern Ireland, Gallagher and Smith (2000) reported that grammar school students were more likely to pursue A-level study post-16 and to aspire to entering higher education than their counterparts in non-selective schools were. Non-selective students, by contrast, were more likely to pursue other options, such as vocational training, and very few aspired to enter higher education. Similarly, Clark (2010) concluded that grammar school students were more likely to pursue higher education studies than their counterparts in other school sectors were.
However, based on an analysis of data from national records in England, Crawford (2014) posited that grammar school attendance tends not to influence students’ progression to higher education in the long term. Crawford (2014) found that state grammar school students were over 40% more likely to go to university and over 30% more likely to go to a prestigious university than students who attended non-selective state schools. Nevertheless, when students’ prior attainment was controlled for, the grammar school advantage reduced to under 4% for progression to higher education, and under 1% for entry to a prestigious university. Interestingly, Crawford (2014) noted that grammar school students were less likely to
complete their degrees and less likely to graduate with a high degree class than those from non-selective schools. Therefore, when students’ prior attainment is taken into account, there is little difference between higher education participation rates of grammar and non-selective school students, and the latter appear to be more likely to successfully complete their degree courses. Lu (2021) also conducted research on a large sample of student data from national records in England, and she reported that grammar school students were significantly more likely to participate in higher education than their counterparts from comprehensive schools were. She noted, however, that the grammar school advantage reduced substantially when pre-existing differences between students and/or schools were accounted for. Furthermore, Lu (2021) suggested there was limited evidence of a favourable grammar school effect in relation to attending a prestigious university. Although Crawford (2014) and Lu (2021) found limited advantage for grammar school students in relation to higher education participation and outcomes, Sullivan et al. (2018) reported that there was a residual positive grammar school effect, after controlling for student and school-level variables, in relation to highest qualification attained by age 42. Nevertheless, in general, the research pertaining to the impact of academic selection on students’ trajectories beyond post-primary education places substantially greater emphasis on their progression to, and performance in, higher education than on their longer-term outcomes. Rather less attention has been given to how academic selection influences individuals’ longer-term personal autonomy, health, happiness, and social responsibility, which are arguably more important outcomes of education than academic achievement.
International Evidence

In a similar manner to the United Kingdom evidence base, the international evidence pertaining to the effectiveness of academically selective schools is divided. For example, based on research conducted in Chile, Araya and Dussaillant (2019) found that selective schools led to significant gains in student attainment in both language and mathematics. Furthermore, beneficial longer-term effects of selective schooling were reported by Estrada and Gignoux (2017), who concluded that students who attended selective high schools in Mexico City were more likely to have greater future earning potential. However, in a Chinese context, Anderson et al. (2016) reported that students who attended the most
selective high schools demonstrated no greater academic achievement gains than those who attended less selective high schools. Terrin and Triventi (2022) performed a meta-analysis of international evidence pertaining to the effects of tracking (ability grouping), in the form of both between-school tracking (academic selection) and within-school tracking (within-school ability grouping), on post-primary students’ overall learning achievement. In this analysis, which incorporated the results from 53 studies published between 2000 and 2021, Terrin and Triventi (2022) found that tracking has no statistically significant effect on either overall student achievement or subject-specific achievement (at the 5% level). However, Terrin and Triventi (2022) also reported that, relative to within-school tracking (within-school ability grouping), between-school tracking (academic selection) leads to a statistically significant positive effect on overall student achievement (at the 0.1% level), with an effect size of 0.023. This implies that academic selection can increase overall student achievement to a greater extent than within-school ability grouping, but the associated effect size is very small. Nevertheless, there are some issues associated with Terrin and Triventi’s work that may restrict its utility. Firstly, it is noteworthy that the meta-analysis synthesised results from a diverse range of studies that adopted different methodological approaches, and were conducted in different contexts, so the results may not be directly comparable. Secondly, it is important to note that Terrin and Triventi’s analysis of the effects of tracking compared not only the difference between tracked and comprehensive systems, but also the difference between tracked and less-tracked systems, which may mask the true effects of tracking when the counterfactual comprises just comprehensive systems.
Thirdly, several of the synthesised studies utilised international comparative tests of student achievement, such as those that feature in the Programme for International Student Assessment (PISA), which curtails the analysis of student achievement to the age at which the PISA study is conducted, approximately 15 years old. However, tracking does not commence until this age in some countries, and therefore the effects of tracking may have had insufficient time to emerge. Finally, Terrin and Triventi’s work considered only school-level achievement and did not investigate the impact of tracking on student performance beyond post-primary level, such as in higher education or in the labour market. Therefore, it may be unwise to draw generalisable inferences from Terrin and Triventi’s meta-analysis.
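That a very small pooled effect can nonetheless be statistically significant follows from the arithmetic of pooling, which the following sketch illustrates. It assumes a simple fixed-effect, inverse-variance model, which is not necessarily the model Terrin and Triventi used, and the four effect sizes and standard errors are hypothetical values chosen for illustration rather than figures from their meta-analysis.

```python
import math


def pool_fixed_effect(effects, std_errors):
    """Inverse-variance weighted mean of study effects and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


def two_sided_p_value(estimate, std_error):
    """Two-sided p-value from a normal approximation to estimate / std_error."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical study-level results (NOT taken from any cited study).
HYPOTHETICAL_EFFECTS = [0.030, 0.020, 0.025, 0.018]
HYPOTHETICAL_STD_ERRORS = [0.012, 0.010, 0.015, 0.011]

if __name__ == "__main__":
    pooled, pooled_se = pool_fixed_effect(HYPOTHETICAL_EFFECTS, HYPOTHETICAL_STD_ERRORS)
    print("pooled effect:", round(pooled, 4))
    print("pooled standard error:", round(pooled_se, 4))
    print("two-sided p-value:", two_sided_p_value(pooled, pooled_se))
```

Because pooling shrinks the standard error, the combined estimate in this example clears the 0.1% threshold although none of the individual hypothetical studies does. Statistical significance of this kind says nothing about whether an effect of that magnitude is educationally meaningful.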
Methodological Issues in Grammar School Effectiveness Research

Most of the research that has investigated the effectiveness of grammar schools relative to non-selective schools has relied on regression models to relate student outcomes to various student-level or school-level explanatory variables. However, such an approach is essentially predicated on correlational analyses, and therefore cannot conclusively establish causal relationships between grammar school attendance and student outcomes. There may be unmeasured characteristics of students and/or schools that have not been incorporated into the models that serve as confounding variables. The failure to account for these confounding variables could potentially lead to erroneous conclusions about grammar school effectiveness. Of course, the definitive approach to establishing whether a causal relationship exists between grammar school attendance and student outcomes would be to conduct a randomised controlled trial (RCT), whereby students would be randomly allocated to either grammar or non-selective schools, and the relative change in student outcomes over a fixed time interval compared. Clearly, an RCT is untenable in this context, but Lu (2020a) investigated the merits of using a regression discontinuity design (RDD) to assess grammar school effectiveness. In an RDD, students scoring close to the cut score in the grammar school admission test are allocated to treatment or control groups based on whether they have attained the pass score in the test, thus securing a grammar school place (treatment), or otherwise (control). Since students in both groups are similar, an RDD reduces the likelihood of baseline differences between the groups. Based on her research, Lu (2020a) concluded that grammar school attendance led to a positive effect on student attainment, but she questioned the robustness of the observed effect because of the lack of availability of some of the data required to perform the analysis.
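The logic of an RDD can be sketched with simulated data. The example below is a minimal illustration under stated assumptions, not a reconstruction of Lu’s (2020a) analysis: the test-score distribution, the cut score of 110, the bandwidth of 5 points, the sharp admission rule, and the built-in effect of 0.5 are all hypothetical. It fits a separate straight line to students just below and just above the cut score and takes the gap between the two lines at the cut score as the estimated effect, alongside a naive comparison of all grammar and non-grammar students.

```python
import random
import statistics


def simulate_students(n=20_000, cut_score=110.0, true_effect=0.5, seed=0):
    """Return (test_score, attends_grammar, outcome) for hypothetical students."""
    rng = random.Random(seed)
    students = []
    for _ in range(n):
        score = rng.gauss(100, 15)               # hypothetical admission test score
        attends = score >= cut_score             # sharp rule: pass score secures a place
        outcome = 0.05 * score + (true_effect if attends else 0.0) + rng.gauss(0, 1)
        students.append((score, attends, outcome))
    return students


def fit_line(points):
    """Least-squares line through (x, y) points; returns (intercept, slope)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rdd_estimate(students, cut_score=110.0, bandwidth=5.0):
    """Gap at the cut score between lines fitted to borderline students on each side."""
    below = [(s, y) for s, _, y in students if cut_score - bandwidth <= s < cut_score]
    above = [(s, y) for s, _, y in students if cut_score <= s <= cut_score + bandwidth]
    a_below, b_below = fit_line(below)
    a_above, b_above = fit_line(above)
    return (a_above + b_above * cut_score) - (a_below + b_below * cut_score)


def naive_gap(students):
    """Difference in mean outcome between all grammar and all non-grammar students."""
    grammar = [y for _, attends, y in students if attends]
    other = [y for _, attends, y in students if not attends]
    return statistics.fmean(grammar) - statistics.fmean(other)


if __name__ == "__main__":
    cohort = simulate_students()
    print("naive gap:", round(naive_gap(cohort), 3))
    print("RDD estimate at the cut score:", round(rdd_estimate(cohort), 3))
```

In this simulation the naive gap overstates the built-in effect because higher-scoring students would have done better in any case, whereas the estimate at the cut score lies close to it. The estimate uses only students within the chosen bandwidth, so it describes students near the borderline rather than the cohort as a whole.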
However, it is important to note that an RDD only considers the impact of academic selection on borderline students and gives no indication of its impact on non-borderline students. A further important issue that compromises the validity and reliability of grammar school effectiveness research is that many studies in this area may have relied upon assessments of students’ cognitive capabilities derived from high stakes tests taken at specific points in time. Of course, such assessments are subject to the Wittgensteinian critique of educational assessment that I advanced in Chap. 2. In other words, scores in these assessments may not reliably reflect students’ capability levels, thus
compromising the robustness of the research methodology. Indeed, this may be an important explanatory factor in relation to the apparent lack of consensus in the grammar school effectiveness research literature. The fact that some studies report a positive grammar school effect, while others conclude that a grammar school education leads to no significant academic achievement gains may, at least to some extent, be attributable to the dubious philosophical foundations of the educational measurements employed in the studies.

Summary Comments

In this section, I have reviewed a range of evidence pertaining to the effectiveness of academically selective education systems in terms of their capacity to improve educational outcomes for students, both in the short term and in later life. Unfortunately, because of the vast array of research purposes, contexts, methods, and data analysis approaches that feature in the literature, and the inherent weaknesses of many of the studies, it would be inappropriate to attempt to synthesise the results to arrive at a net grammar school effect for different educational outcomes. Nevertheless, I contend that the evidence I have reviewed demonstrates there is a distinct lack of consensus in the literature about the potential of selective education systems to improve educational outcomes for students, although more studies appear to report some form of positive grammar school effect rather than any alternative conclusion. In the following section, I switch my focus to assessing the impact of academic selection on the social composition of schools and the extent to which selective education systems help to promote social mobility and equity in education.

Social Composition of Schools, Social Mobility, and Educational Equity

Social Composition of Schools

One of the most frequently purported benefits of academic selection is its capacity to promote social mobility by enabling capable students to attend grammar schools irrespective of their socioeconomic background. However, the evidence supporting the potential of academic selection to enhance social mobility is disputable because of the limited proportion of
socioeconomically disadvantaged students who gain admission to grammar schools (Andrews et al., 2016; Jerrim & Sims, 2019; Levačić & Marsh, 2007; Lu, 2020b). For example, during the 2021–2022 academic year, just 13.4% of grammar school students in Northern Ireland were eligible for FSM, compared to an FSM eligibility rate of 36.1% for students attending non-selective post-primary schools (DENI, 2022). The under-representation of socioeconomically disadvantaged students in grammar schools is also apparent in England, with Andrews et al. (2016) reporting an FSM eligibility rate of 2.5% for grammar school students, compared to a corresponding figure of 13.2% for all state-funded post-primary students. It is important to note, however, that different criteria are used to assess FSM eligibility in Northern Ireland and England. Nevertheless, these statistics are worrying and cast doubt on the potential of grammar schools to promote social mobility. After all, socioeconomically disadvantaged students will be unable to benefit from any positive effects that may be associated with a grammar school education unless they secure a place at a grammar school. In a logistic regression analysis of data pertaining to a large sample of students from England, Lu (2020b) found that, for those with comparable prior attainment levels, FSM eligible students were less likely to attend grammar schools than their non-FSM eligible peers were. However, there is evidence to suggest that socioeconomically disadvantaged students are less likely to take academic selection tests than their more affluent peers, and that those who do take the tests have a lower pass rate than children from more socially advantaged backgrounds (Allen & Bartley, 2017). Some research has indicated that parents from higher socioeconomic backgrounds are more likely to value academic achievement, thus incentivising them to seek admission to academically selective schools for their children (Leroux, 2015; Reay, 2004).
Therefore, it is conceivable that parents of children from disadvantaged backgrounds may be less likely to encourage their children to sit the academic selection tests that are necessary to gain entry to a grammar school. There is also empirical evidence to suggest that peer groups can exert an influence on school choice (Morris & Perry, 2017), and thus it is possible that, even if they are considered capable of securing entry to a grammar school, children from more deprived backgrounds may receive parental support to attend the same (non-selective) schools as their peers. Interestingly, Lu’s (2020b) analysis also revealed that students with special educational needs (SEN) were less likely to be educated in
grammar schools than those without SEN, and that some students from ethnic minority backgrounds had a greater probability of gaining admission to grammar schools than those from other backgrounds. Whilst these findings broadly resonate with those reported by other scholars, some of the conclusions relating to minority ethnic students contradict the results of previously published research (Andrews et al., 2016; Cribb et al., 2014). However, Lu (2020b) also reported that academic attainment towards the end of primary education is the dominant predictor of grammar school attendance, and that students’ personal backgrounds, such as their FSM eligibility, SEN status or cultural background, have comparatively less influence on grammar school admission. Lu (2020b) concluded that the differences in grammar school admission rates between students from different social backgrounds largely reflected the differentials in their earlier academic attainment, rather than a biased selection procedure. Whilst grammar school admission is determined by academic performance, academic attainment has been shown to be positively correlated with children’s social backgrounds (Rasbash et al., 2010). In effect, this suggests that grammar schools effect social selection as well as academic selection, which further fuels misgivings about their capacity to promote social mobility. Moreover, this type of social stratification is undesirable since students who attend more socially diverse schools are likely to have increased civic awareness and tolerance of individuals from different social backgrounds (Morris & Perry, 2017). However, it is important to acknowledge that social selectivity is not just a characteristic of grammar schools (Cullinane et al., 2017). Coe et al. (2008) confirmed that social selectivity is also apparent within the non-selective sector, with some of the most socially selective schools in England being comprehensives. 
This is unsurprising given that more affluent families will be able to afford houses within the catchment areas of top-performing comprehensive schools (Cullinane et al., 2017). Coe et al. (2008) concluded that some comprehensive schools might be facilitating social selection, with academic selection as a by-product, while grammar schools facilitate academic selection with social selection as a by-product. This suggests that simply abolishing academic selection, and transitioning to a fully comprehensive system, may not be sufficient to alleviate social selectivity in post-primary education. In addition, it is likely that schools would need to de-prioritise the proximity of home and school locations in their admissions criteria. In a more in-depth analysis, Jerrim and Sims (2019) took up the challenge of more thoroughly investigating why a disproportionately low
number of children from socioeconomically disadvantaged backgrounds are admitted to grammar schools. Jerrim and Sims (2019) conducted an analysis of longitudinal data from a representative sample of students in England and Northern Ireland, who participated in the Millennium Cohort Study, to investigate the extent of, and reasons for, social selectivity in grammar schools. Jerrim and Sims (2019) found that, in Northern Ireland, children from families in the top quartile of household income were 33% more likely to attend a grammar school than children from families in the bottom quartile who had equivalent levels of prior attainment. In England, the corresponding figure was found to be lower, at 20%. Jerrim and Sims (2019) found private tutoring to be a major contributory factor to the differential grammar school attendance rates for children from high- and low-income families in England. This resonated with their further finding that private tutoring tends to be more prevalent in areas where academic selection persists in England, is used to a greater extent by high-income families and is more likely to be in subjects addressed by grammar school admission tests. However, although private tutoring was found to be equally popular in Northern Ireland, it had a less influential explanatory role in accounting for the differences in grammar school attendance between children from more and less affluent backgrounds (Jerrim & Sims, 2019). This may be attributable to the fact that some primary schools explicitly devote time to preparing students for admissions tests in Northern Ireland, thus reducing, but not eliminating, the impact of tutoring.
Furthermore, the influence of private tutoring on the different rates of grammar attendance is unsurprising since Ireson and Rushforth (2011) confirmed that English children from higher socioeconomic backgrounds are more likely to be tutored to help them to make successful transitions within the education system. Cribb et al. (2013) specifically highlighted the fact that children from more affluent backgrounds are frequently coached to pass grammar school admission tests. Indeed, the widespread use of tutoring to assist students in gaining admission to academically selective schools has been reported in other international contexts. For example, Exley (2020) noted the role of selective schooling in fuelling the demand for private tutoring in South Korea. Differences in parental preferences for different school types, with more affluent families favouring schools with good examination results, have also been found to have a bearing on the differential grammar school attendance rates between children from high- and low-income families in Northern Ireland (Jerrim & Sims, 2019). Interestingly, Jerrim and Sims
(2019) observed that differences in parental school preferences had a less pronounced role in accounting for the grammar school attendance differential in England. They surmised that the greater influence of parental school choices in accounting for the grammar school attendance differential in Northern Ireland might be attributable to the totally selective education system in Northern Ireland, whereby students who fail to gain a grammar school place are generally compelled to attend a secondary school with other lower-attaining students. In England, however, the partially selective education system ensures that students who do not secure a grammar school place have the fallback option of attending an all-ability comprehensive school. There have been initiatives to address the critiques pertaining to the low grammar school attendance rates of students from socially disadvantaged backgrounds in England, such as quota systems to ensure that a certain proportion of FSM-eligible students are admitted to grammar schools. However, there is limited evidence of beneficial effects associated with these initiatives, and there have been political and practical challenges in relation to exactly which groups should be given priority (Lu, 2020b). Therefore, such initiatives are unlikely to deal effectively with the under-representation of socioeconomically disadvantaged students in grammar schools.

Social Mobility

Notwithstanding the evidence cited above confirming that grammar schools admit a disproportionately low number of students from socially disadvantaged backgrounds, Mansfield (2019) argued that grammar schools have the capacity to promote social mobility by facilitating improved admission rates to prestigious higher education institutions. Mansfield’s (2019) analysis compared the performance of students from selective and non-selective areas of England in relation to their progression to higher education.
He found that, at the whole system level, students from selective regions only had a marginally higher probability of progressing to higher education than their counterparts from non-selective regions. However, Mansfield (2019) pointed out that students from selective areas were significantly more likely to gain access to prestigious universities, including institutions such as Oxford and Cambridge, than those from non-selective areas were, and he claimed that this trend was apparent across all five quintiles of social disadvantage. In fact, Mansfield (2019)
posited that a student from the most socioeconomically disadvantaged quintile had more than twice the probability of gaining admission to Oxbridge if they resided in a selective instead of a non-selective region. Mansfield (2019) also made the point that study at highly prestigious universities is strongly correlated with higher earning potential upon graduation and progression to the upper echelons of professional careers. Furthermore, Mansfield (2019) claimed that 45% of grammar school students in England come from families with less than the median income. Consequently, he concluded that grammar schools have a pivotal role to play in promoting social mobility, since they grant students from lower income families comparable access to prestigious higher education institutions to those from more affluent backgrounds who are educated in independent, fee-paying schools. Mansfield (2019) suggested that his findings were more representative of the potential of grammar schools to promote social mobility than previous research had reported because he took account of a more representative measure of social disadvantage, while previous studies focused too narrowly on FSM-eligible students. Indeed, Mansfield (2019) insinuated that personal ideologies of researchers may have contaminated the grammar school debate, and he claimed:

Particular manifestations of this include the way the debate on social mobility has been narrowed to focus on those eligible for Free School Meals and a failure to consider the contribution that grammar schools play on enabling social mobility among less advantaged groups who are proportionately represented at grammar schools, such as pupils below median income. (p. 49)

In view of his conclusions about the positive role of grammar schools in promoting social mobility, Mansfield (2019) recommended the expansion of the grammar school sector in England, with a particular focus on increasing their prevalence in socially disadvantaged areas. Despite Mansfield’s (2019) glowing endorsement of the positive grammar school effect on social mobility, there are weaknesses in his analysis, which severely limit the credence that can be given to his conclusions, including his rather curious conception of social mobility. Furthermore, Mansfield (2019) fails to consider the impact of academic selection on those who do not gain admission to grammar schools within selective areas, and who are compelled to attend non-selective secondary schools that are non-comprehensives. This serious omission means that Mansfield’s (2019) work fails to reflect the overall impact of academic selection on
social mobility and focuses on the upward mobility of those who are educated in grammar schools. What if those who fail to gain admission to a grammar school experience downward mobility, to such an extent that the aggregate impact on social mobility is negligible or even negative? Mansfield (2019) is silent on issues such as this. In addition, Mansfield (2019) appears to conflate correlation with causality when gauging the impact of grammar schools on progression to prestigious universities and social mobility. Grammar schools are generally concentrated in more affluent areas of England, such as the south-east, and there may be other differences between these areas and those that embraced the comprehensive model of post-primary education, which impact on progression to higher education. For example, there may be important unmeasured differences between the respective student populations in these areas, other than grammar/comprehensive school education, which help to explain the differential in prestigious higher education progression rates between selective and non-selective areas. Finally, as Dickson and Macmillan (2020) pointed out, significant issues with the underlying statistical analysis invalidate Mansfield’s (2019) claim that 45% of grammar school students in England come from families with less than the median income, and the actual figure is likely to be considerably less than this. This is mainly attributable to the incompleteness of the Department for Education data relating to household income utilised by Mansfield in his analysis, which excluded substantial numbers of high income families. This, in turn, led to an overestimate of the proportion of students from lower socioeconomic backgrounds who secure grammar school places. It is also noteworthy, however, that several other researchers have come to very different conclusions from Mansfield regarding the potential of selective education systems to promote social mobility. 
In an English context, Burgess et al. (2017) investigated the potential of academic selection to promote social mobility by analysing grammar school admission rates for students from the entire range of SES, at different percentiles of the SES distribution, rather than the more usual FSM eligibility binary indicator of SES. They reported that admission rates to grammar schools are highly skewed, with respective grammar school admission rates of 6%, 23%, 51% and 79% at the 10th, 50th, 90th and 99th percentiles of SES. In addition, Burgess et al. (2017) compared the relative likelihoods of attendance at a grammar school in selective areas and a comprehensive school with a similar calibre of student intake in matched non-selective areas. They concluded:

5  CONSEQUENCES OF ACADEMIC SELECTION 

It is clear that in comparison to access to grammar schools in selective areas, the relative chances of attending a comparable school in a non-selective area is far more evenly spread across the distribution of SES, with those from more deprived families only slightly less likely to attend comparable-intake schools than those from the 70th percentile. (Burgess et al., 2017, p. 15)

Burgess et al. (2017) noted that, although students at the upper end of the SES distribution are more likely to gain admission to high calibre intake comprehensive schools in non-selective areas, the relative likelihood of attending such schools is less pronounced than for grammar schools in selective areas. Burgess et al. (2017) also considered the higher education participation of grammar school students relative to those who narrowly failed to gain grammar school places in selective regions of England. They found that grammar school students have a greater probability of pursuing higher education, and of attending a prestigious university, than students who narrowly failed to obtain a grammar school place in selective areas. In addition, they reported that students who narrowly missed out on obtaining a grammar school place within selective regions have a significantly lower probability of higher education participation, and of gaining admission to a prestigious university, than their counterparts from non-selective regions, with differentials of 3% and 8% respectively. This suggests that academic selection operates to the detriment of high-attaining students who just narrowly fail to obtain a grammar school place, which therefore raises important ethical questions about selective education, while also furnishing information on some of the consequences of academic selection that Mansfield (2019) failed to address. Burgess et al.’s (2017) findings cast doubt on the validity of Mansfield’s (2019) claim that academic selection promotes social mobility, a claim that has also been critically evaluated by Buscha et al. (2021). Based on an analysis of census data and linked administrative datasets pertaining to academic selectivity in different areas of England, Buscha et  al. (2021) concluded that there is scant evidence to suggest academic selection has either a positive or a negative impact on social mobility. 
Buscha et al.’s findings are more convincing than those of Mansfield (2019) because, unlike Mansfield, they investigated the effects of academic selection on all children rather than just those who attended grammar schools. Furthermore, Buscha et al.’s (2021) analysis took account of the regional and temporal variations in selective schooling in England over several
years, which facilitated the estimation of more robust social mobility effects than were reported by Mansfield (2019). However, a notable corollary to Buscha et al.’s (2021) work is that comprehensive schools do not promote social mobility either, although this is less problematic since, unlike for academically selective schools, social mobility is not a commonly purported benefit of the comprehensive system. If academic selection were truly a vehicle for promoting social mobility, and creating a more meritocratic society, then it should lead to a situation where academically capable children of lower SES experience greater upward mobility because of the opportunities afforded to them by selection. Correspondingly, less academically talented children from higher socioeconomic backgrounds would be expected to experience greater downward mobility since academic selection purportedly rewards inherent talent and effort rather than social privilege. Of course, there are other factors influencing the extent of social mobility that is feasible within a given society, such as changes to the relative proportions of low- and high-status professions over time. However, if academic selection had a significant bearing on the relative life chances of children from lower socioeconomic backgrounds, this should be reflected to some extent in higher rates of social mobility. Yet, the more robust available evidence suggests that the levels of social mobility have remained stubbornly low in relation to both selective and comprehensive education systems, and generally indicates that the ultimate social standing of an individual is strongly conditioned by the socioeconomic circumstances into which they were born (Buscha et al., 2021). Hurn (1993) invoked status competition theory to analyse why supposedly meritocratic education systems, such as those that embrace academic selection, fail to promote social mobility. 
Status competition theory posits that competition between different groups for high-status jobs and desirable positions in society has precipitated a rapid expansion of the prevalence of educational credentials. However, the relative possession of credentials in comparison to other groups determines the chances of a given group securing desirable outcomes. Even if academic selection leads to children from lower socioeconomic backgrounds gaining more educational credentials than they would have achieved within a comprehensive system, their more socially advantaged peers have access to greater resources to augment their credentials and restore their competitive advantage. This, in turn, means that the relative life chances of students
from lower socioeconomic backgrounds will tend to remain reasonably constant irrespective of the system within which they were educated. Therefore, it appears that schools, either selective or comprehensive, have limited impact on reducing the disadvantages experienced by students from lower socioeconomic backgrounds, and promoting their social mobility, to assist in creating a society with greater opportunities for all citizens. However, although schools cannot compensate for social background, scepticism regarding the potential of any type of schooling to mitigate the effects of SES on future mobility does not mean that academic selection should be condoned. Whilst selection may not promote social mobility, there is evidence to suggest that selective regimes lead to greater achievement differentials between socially advantaged and disadvantaged students compared to comprehensive systems (Van de Werfhorst & Mijs, 2010). The resulting attainment inequalities between advantaged and disadvantaged students in favour of the former have the potential to compromise the life chances of the latter. Evidence pertaining to this consequence of academic selection is reviewed in the next sub-section.

Educational Equity

It has already been established that a limited proportion of socioeconomically disadvantaged students secure grammar school places, and that most grammar school students come from more socially advantaged backgrounds (Andrews et al., 2016; Jerrim & Sims, 2019; Levačić & Marsh, 2007; Lu, 2020b). If, as the evidence suggests, grammar schools are more effective than other state schools in relation to promoting academic achievement (Coe et al., 2008; Levačić & Marsh, 2007), academic selection will further widen the achievement gulf between students from higher and lower socioeconomic backgrounds. 
Grammar school students, who mainly come from more socially advantaged backgrounds, will achieve better examination results than those who are compelled to attend non-selective secondary schools. However, students from lower SES groups, many of whom may have limited parental support, and consequently may be underperforming by the end of primary education, will lag further behind their grammar school counterparts. The net effect of this is that academic achievement by the end of post-primary education is likely to be positively correlated with social background, a fact that is borne out by international evidence pertaining to equity issues in selective versus non-selective education systems (Van de Werfhorst & Mijs, 2010). This is a
classic illustration of the so-called Matthew effect, where those from advantaged backgrounds accumulate further advantage as time progresses, while the initially disadvantaged become even more disadvantaged with time. Numerous researchers have used evidence from different international contexts to investigate how academic selection influences the strength of the relationship between social origin and student achievement (Terrin & Triventi, 2022; Van de Werfhorst & Mijs, 2010). For example, based on a cross-national analysis of reading and mathematical literacy scores from 29 countries in PISA 2003, Horn (2009) found that academic selection increases achievement inequalities by social class, with an earlier age of selection being associated with greater achievement differentials. In contrast, comprehensive education systems were generally found to reduce achievement differentials, and to promote greater equality in student attainment. In a similar vein, by analysing reading test scores from PISA 2009, Bol and Van de Werfhorst (2013) concluded that, in more selective education systems, the variation in student achievement is more strongly correlated with social background. They noted that the effect remained statistically significant after controlling for a range of factors such as wealth and government educational spending. In their meta-analysis pertaining to the effects of tracking, Terrin and Triventi (2022) reported that tracking leads to a statistically significant positive effect, at the 0.1% level, on educational inequality, both in terms of the spread of student achievement and inequality of opportunities (as gauged by the strength of the relationship between students’ social backgrounds and their achievement levels). In other words, tracking leads to greater variation in student achievement, and greater inequalities of opportunity, so that students’ socioeconomic origins are inextricably linked to their achievement levels. 
However, between-school tracking (academic selection) was found to increase inequality to a lesser extent than within-school tracking (within-school ability grouping), but it is noteworthy that, for all forms of tracking, its negative effect on educational equality is similar across different subject areas. Terrin and Triventi’s (2022) meta-analysis has a number of limitations, such as the inclusion of several studies predicated upon PISA tests, which are taken at a similar age to when tracking occurs in some countries, thus potentially masking the true effects of tracking on educational inequality. Nevertheless, their conclusions align with those of multiple other scholars, thus giving confidence in their accuracy.

Academic selection could exacerbate social inequality in students’ achievement levels through various mechanisms. For example, if high-achieving students are clustered together in grammar schools, and lower achieving students are grouped together in non-selective secondary schools, then the latter will be unable to benefit from the positive peer group influence of higher-achieving students. This reduced access to positive peer group role models may lead some students to disengage and to achieve at lower levels than if they had been educated alongside higher-attaining peers (Betts & Shkolnik, 2000). The presence of higher-achieving students may contribute to the creation of a positive classroom environment that is more conducive to learning, and where teachers are able to focus more on learning and teaching, potentially employing a restricted range of pedagogical approaches aimed at securing good examination results. A further mechanism by which academic selection could compound social inequality in student achievement is related to differences in teacher expectations and curriculum delivery between grammar and non-grammar schools. Grammar school teachers may have higher expectations of their students (Kelly & Carbonaro, 2012), and teach the curriculum in a manner that optimises academic achievement (Betts & Grogger, 2003). Indeed, better qualified and/or experienced teachers may opt to work in the grammar sector (Brown et al., 2021) because of a desire to teach what they perceive to be more academically gifted students (Brunello & Checchi, 2007). Such personal choices by teachers may boost the achievement of grammar school students at the expense of those who do not attend grammar schools, thus increasing the grammar versus non-grammar performance gap. 
Irrespective of the underpinning mechanisms, there is conclusive evidence to indicate that grammar schools “maintain social order between social strata and facilitate inequality of opportunity” (Brown et al., 2021, p. 488). I consider this to be an important consequence of academic selection that could have a potentially deleterious impact on the life chances of many children and young people, and which therefore cannot be ignored. In the following section, I consider some further consequences of academic selection, including its impact on students’ socio-emotional outcomes, its implications for curriculum delivery, and its effects on social integration and cohesion.

Other Consequences of Academic Selection

Students’ Socio-emotional Outcomes

A significant volume of the research into grammar school effectiveness has focused on the impact of grammar schools on students’ later educational achievement, but some research has also demonstrated that academic selection can have an impact on students’ non-cognitive skills and socio-emotional outcomes. Failure to secure a grammar school place has been linked to a detrimental effect on students’ self-esteem, self-confidence, and general emotional wellbeing (Ahmavaara & Houston, 2007; Gallagher & Smith, 2000). In a Northern Irish context, teachers in non-selective secondary schools reported that their new intake often commenced post-primary education with a sense of failure and diminished self-esteem, and that significant effort had to be invested in re-establishing the students’ sense of self-worth (Gallagher & Smith, 2000). Gallagher and Smith (2000) also found that grammar and secondary school students were acutely aware of the more socially favourable standing attributed to grammar schools, despite the fact secondary students generally held positive views of the schools they attended. In addition, both groups of students highlighted the detrimental impact of selection on friendship groups, although some secondary students perceived a sense of superiority amongst their former friends who attended grammar schools. Drawing on analyses of qualitative data derived from interviews with principals and members of school senior management teams, Byrne and Gallagher (2004) reported that grammar school staff were primarily concerned with academic issues surrounding the transition to post-primary education. However, secondary school staff appeared to attach greater importance to induction and pastoral care as vehicles for re-establishing some students’ self-esteem and self-confidence following negative experiences of the selection procedure. 
Worryingly, secondary school staff referred to the emotional fallout from selection, encompassing issues such as students’ feelings of resentment and rejection, the reticence of some students to speak in class, reduced student motivation levels, negative student attitudes towards their work, students’ low academic targets, and, in some cases, challenging behaviour (Byrne & Gallagher, 2004). More recently, Jerrim and Sims (2020) investigated the effect of academic selection on students’ socio-emotional outcomes such as
self-confidence, self-esteem, and future aspirations. Jerrim and Sims (2020) used logistic regression models, together with propensity score matching, to match a sample of grammar school students from the Millennium Cohort Study in England and Northern Ireland to non-grammar school students based on a range of variables likely to influence socio-emotional outcomes. They concluded that, in both jurisdictions, there was little difference between grammar and non-grammar students for the majority of outcomes considered, such as in relation to engagement with school work, wellbeing or self-esteem. However, grammar school students from Northern Ireland reported lower academic self-concepts than non-selective students, despite there being no appreciable differences in this outcome between grammar and non-selective students from England. The lower levels of academic self-concept amongst Northern Irish grammar school students may be attributable to the fact they are comparing their own performance to that of their high-attaining grammar school peers, and consequently erroneously concluding they are underachieving. It is noteworthy that the findings of Jerrim and Sims (2020) largely contradict those reported by Gallagher and Smith (2000) and Byrne and Gallagher (2004) in relation to principals’ and teachers’ perspectives on how academic selection negatively impacts socio-emotional outcomes for students who do not secure grammar school places in Northern Ireland. However, Gallagher and Smith (2000) also noted that, in quantitative research into students’ socio-emotional outcomes, there were limited differences between grammar school students and their non-grammar school counterparts. This suggests that the negative perceptions of teaching staff regarding the socio-emotional consequences of failure to gain admission to a grammar school may not have resonated with the reality of how students experienced their transition to post-primary education. 
Therefore, although the research of Jerrim and Sims (2020) was based on a relatively small sample of students, which could potentially limit the generalisability of their findings, the conclusions appear to be in broad agreement with the limited previous quantitative research into the impact of academic selection on students’ socio-emotional outcomes. Furthermore, the limited differences between the socio-emotional outcomes of grammar and non-grammar students suggest that any academic achievement gain associated with grammar school attendance is unlikely to be explained by a positive grammar school effect on engagement with school work, and must be linked to other aspects of a grammar school education.

Curriculum Delivery

In a Northern Ireland context, there is evidence from school inspection reports to suggest that curriculum delivery in the later stages of primary education is distorted by preparation for academic selection tests (Gallagher & Smith, 2000). Teaching is too narrowly focused on preparing for the tests, with the result that students do not gain exposure to the envisaged range of learning experiences that are implicit in the statutory Northern Ireland curriculum. Primary schools are placed under considerable pressure to prioritise the subject areas that feature in the selection tests, despite the fact a minority of students will benefit from taking the tests and ultimately gain admission to a grammar school. Interviews with post-primary teachers in both grammar and non-grammar schools also revealed a belief that primary schools “teach to the test” in the latter stages of primary education (Gallagher & Smith, 2000). This led to narrow curriculum coverage and necessitated a “fresh start” approach at post-primary level to ensure that all students had the prerequisite knowledge and skills to progress their learning. Such a “fresh start” approach to curriculum delivery at post-primary level is problematic since it can lead to student disengagement and impede subsequent academic progress (Cantley et al., 2021; O’Meara et al., 2020; Prendergast et al., 2019). Primary schools were initially counselled against preparing students for the unregulated transfer tests that were introduced in Northern Ireland after the demise of the state-regulated transfer test in 2008. However, this position was reversed as a result of a change in political control of education from a nationalist to a unionist Education Minister, who endorsed primary school support for academic selection test preparation in 2016 (McMurray, 2020). 
It is therefore unsurprising that recent research has once again revealed the negative impact of preparation for academic selection tests on the upper primary curriculum in Northern Ireland (Brown et al., 2021). Brown et al. (2021) reported that primary teachers tend to focus on English and Mathematics during the penultimate and final years of primary education because they are the only curriculum areas that are assessed in the selection tests. Such an exclusive focus on these subjects is undesirable since it leads to other disciplines being relegated to a lower level of priority, which could potentially damage students’ holistic education and be to the detriment of their future educational and vocational options.

Social Integration and Cohesion

It has been argued that children and young people who attend schools with a greater social mix tend to demonstrate greater civic awareness and respect for those from different backgrounds (Morris & Perry, 2017). Therefore, the separation of students at a tender age according to their notional academic capabilities into different types of schools where, as highlighted earlier in the current chapter, the separation largely mirrors students’ socioeconomic backgrounds, is likely to have negative implications for social integration and cohesion. This important issue has received limited attention in the literature, but Hughes and Loader (2022) offer some insights into the potential links between academic selection and social cohesion in a Northern Ireland context. Hughes and Loader (2022) make the point that Northern Ireland has a highly segregated education system, which is divided by both religion and social class. The historical evolution of the school estate in Northern Ireland means that, although some children and young people may self-identify as being neither Protestant nor Catholic, most students attend schools that admit those from either mainly Protestant or mainly Catholic backgrounds. The religious segregation in the education system reflects more general demarcations between the two main religious traditions in Northern Ireland that may have helped to fuel the political conflict in the jurisdiction. A small minority of students, approximately 7%, are enrolled in integrated schools, which seek to foster an ethos of social inclusion by educating those from both Catholic and Protestant backgrounds, and other backgrounds, within the same environment (Gardner, 2016). 
The integrated sector has been successful in promoting positive intergroup relationships between students from different traditions in Northern Ireland, but there appears to be limited potential for further growth of the sector (Gardner, 2016; Hughes & Loader, 2022). In recognition of the limited overall impact at a systemic level of integrated education and other initiatives aimed at promoting reconciliation and fostering greater degrees of social cohesion, a shared education project was inaugurated in 2007. Shared education entails sustained collaboration between schools from different sectors involving curriculum-aligned experiences for students from different religious and cultural traditions to work collegially (Gallagher, 2016). In a similar vein to integrated education, the shared education initiative has successfully promoted positive relationships
between those from different backgrounds, and it may be enhancing social cohesion more generally (Hughes & Loader, 2023). It has already been established in the current chapter that academic selection is socially divisive since students from lower socioeconomic groups are underrepresented in grammar schools, and the majority of those enrolled in the grammar sector come from more privileged backgrounds. The interaction of this socioeconomic stratification of post-primary education with the widespread religious segregation of schooling in Northern Ireland poses considerable challenges for a post-conflict society. Although the shared education initiative has had beneficial effects in terms of promoting reconciliation and peacebuilding, further progress has been hampered by the continued existence of academic selection, which “serves to perpetuate class and group divisions within and between school sectors, and across wider society in Northern Ireland” (Hughes & Loader, 2022, p. 3). Hughes and Loader (2022) draw on social cohesion theory to argue that, in addition to perpetuating inequalities in educational outcomes, academic selection may constitute an impediment to reconciliation initiatives in education and thereby undermine attempts to improve social cohesion. The unique Northern Ireland context considered by Hughes and Loader (2022) demonstrates the way academic selection can interact with other pre-existing forms of social division to detract from initiatives aimed at promoting higher levels of social integration and cohesion. Interestingly, Henderson (2020) also demonstrates how the system level divisions in educational provision in Northern Ireland, in relation to both religion and academic selection, constrain school choice options at post-primary level in a manner that potentially compromises children’s education rights, as prescribed in the United Nations Convention on the Rights of the Child.

Summary

It is often claimed that academic selection promotes social mobility in the sense that it offers a mechanism through which bright children from less affluent backgrounds can avail of educational and vocational opportunities that would not have been accessible to them otherwise. Selective education systems purport to offer children from such backgrounds the opportunity to attend academically oriented grammar schools, and the opportunity to benefit from their high academic standards, provided they perform sufficiently well in a selection test taken towards the end of
primary education. However, there have been numerous critiques of academic selection, which challenge the benefits espoused by its proponents. In this chapter, I have critically reviewed a range of evidence relating to the consequences of academic selection, including the capacity of selective systems to improve educational outcomes for students, both in the short and long terms. I have also critically assessed evidence pertaining to the social composition of grammar schools, and the extent to which selective systems promote social mobility and foster equitable educational outcomes for all students. Finally, I have analysed various other consequences of academic selection that are alluded to in the extant literature, including its impact on students’ socio-emotional outcomes, its influence on the delivery of the primary school curriculum, and how it can interact with other pre-existing forms of social division to impact on social cohesion. The individual studies within this body of research were conducted for a wide range of purposes, and in different contexts, utilising different methods, and data analysis approaches, and many of them had inherent shortcomings that detracted from the reported findings. Accordingly, this meant it would be problematic to attempt a synthesis of the results to arrive at convincing generalisable conclusions. Nevertheless, some important points emerged from the evidence review. Despite a lack of consensus regarding the effect of academic selection on students’ educational outcomes, the literature appears to point to a positive grammar school effect on students’ school level achievement, with some evidence of a beneficial effect on later outcomes in relation to higher education participation. Of course, if post-selection academic performance is measured by the same types of tests that are used for academic selection, then it is hardly surprising that grammar school students tend to do better than others on that kind of test. 
This may merely attest to a predictable training effect, but not necessarily to a difference in capability between grammar school students and their non-grammar school peers, or any positive grammar school effect on attainment. Whilst it has been claimed that grammar schools do not cause collateral damage to the academic achievement of students in non-grammar schools (Coe et al., 2008), some studies have disputed these claims, most notably in Northern Ireland, where academic selection remains almost ubiquitous. Despite the claims advanced by advocates of academic selection, it is noteworthy that there appears to be scant robust evidence to suggest that grammar schools do indeed promote social mobility. This is unsurprising since it has been well established in the literature that students from lower
socioeconomic groups are underrepresented in grammar schools, and the majority of those who attend grammar schools come from more socially advantaged backgrounds. Furthermore, there is convincing evidence to suggest that selective education systems lead to greater levels of inequity in academic achievement than comprehensive systems, with an apparent strong social gradient in educational outcomes in the case of the former. Quantitative evidence reveals that there is little difference between grammar and non-grammar students in relation to socio-emotional outcomes, such as engagement with school work, wellbeing, and self-esteem. However, academic selection tends to interfere with the delivery of primary school curricula as teachers attach greater priority to preparing students for selection tests in the latter stages of primary education, which may negatively impact upon students’ post-primary learning experiences. In the unique context offered by Northern Ireland, research has demonstrated that academic selection can interact with other pre-existing forms of social division, such as religious segregation, to detract from initiatives aimed at promoting higher levels of social integration and cohesion. Finally, it is important to note that the delayed system of academic selection at 14 years of age that operates within the Craigavon area of Northern Ireland has not been found to offer a better overall alternative to selection at 11 years (Alexander et al., 1998). In the next chapter, I offer a novel philosophical critique of the ethics of academic selection. More specifically, I draw upon Miranda Fricker’s and David Coady’s work on epistemic injustice and the related concept of epistemic disadvantage, evidence pertaining to the consequences of selection, and the critique of educational assessment that I propounded in Chap. 2, to argue that academic selection has the potential to cause epistemic harm to some children and young people.

References

Ahmavaara, A., & Houston, D. M. (2007). The effects of selective schooling and self-concept on adolescents' academic aspiration: An examination of Dweck's self-theory. British Journal of Educational Psychology, 77(3), 613–632. https://doi.org/10.1348/000709906x120132
Alexander, J., Daly, P., Gallagher, A., Gray, C., & Sutherland, A. (1998). An evaluation of the Craigavon two-tier system: Research report no. 12. Department of Education for Northern Ireland. https://www.education-ni.gov.uk/sites/default/files/publications/de/craigavon-evaluation-1998.pdf

Allen, R., & Bartley, J. (2017). The role of the eleven-plus test papers and appeals in producing social inequalities in access to grammar schools. National Institute Economic Review, 240, R30–R41. https://doi.org/10.1177/002795011724000112
Anderson, K., Gong, X., Hong, K., & Zhang, X. (2016). Do selective high schools improve student achievement? Effects of exam schools in China. China Economic Review, 40, 121–134. https://doi.org/10.1016/j.chieco.2016.06.002
Andrews, J., Hutchinson, J., & Johnes, R. (2016). Grammar schools and social mobility. Education Policy Institute. https://epi.org.uk/publications-and-research/grammar-schools-social-mobility/
Araya, P., & Dussaillant, F. (2019). Does attending a selective secondary school improve student performance? Evidence from the Bicentenario schools in Chile. School Effectiveness and School Improvement, 31(3), 426–444. https://doi.org/10.1080/09243453.2019.1697299
Atkinson, A., Gregg, P., & McConnell, B. (2006). The result of 11 plus selection: An investigation into opportunities and outcomes for pupils in selective LEAs. Centre for Market and Public Organisation. http://www.bristol.ac.uk/media-library/sites/cmpo/migrated/documents/wp150.pdf
Betts, J. R., & Grogger, J. (2003). The impact of grading standards on student achievement, educational attainment, and entry-level earnings. Economics of Education Review, 22(4), 343–352. https://doi.org/10.1016/s0272-7757(02)00059-6
Betts, J. R., & Shkolnik, J. L. (2000). The effects of ability grouping on student achievement and resource allocation in secondary schools. Economics of Education Review, 19(1), 1–15. https://doi.org/10.1016/s0272-7757(98)00044-2
Bol, T., & Van de Werfhorst, H. G. (2013). Educational systems and the trade-off between labor market allocation and equality of educational opportunity. Comparative Education Review, 57(2), 285–308. https://doi.org/10.1086/669122
Brown, M., Donnelly, C., Shevlin, P., Skerritt, C., McNamara, G., & O'Hara, J. (2021). The rise and fall and rise of academic selection: The case of Northern Ireland. Irish Studies in International Affairs, 32(2), 477–498. https://doi.org/10.1353/isia.2021.0060
Brunello, G., & Checchi, D. (2007). Does school tracking affect equality of opportunity? New international evidence. Economic Policy, 22(52), 782–861. https://doi.org/10.1111/j.1468-0327.2007.00189.x
Burgess, S., Crawford, C., & Macmillan, L. (2017). Assessing the role of grammar schools in promoting social mobility (Working Paper No. 17-09). Department of Quantitative Social Science, UCL Institute of Education. http://repec.ioe.ac.uk/REPEc/pdf/qsswp1709.pdf

Buscha, F., Gorman, E., & Sturgis, P. (2021). Selective schooling has not promoted social mobility in England (Discussion Paper No. 14640). IZA Institute of Labor Economics. https://docs.iza.org/dp14640.pdf
Byrne, G., & Gallagher, T. (2004). Systemic factors in school improvement. Research Papers in Education, 19(2), 161–183. https://doi.org/10.1080/02671520410001695416
Cantley, I., O’Meara, N., Prendergast, M., Harbison, L., & O’Hara, C. (2021). Framework for analysing continuity in students’ learning experiences during primary to secondary transition in mathematics. Irish Educational Studies, 40(1), 37–49. https://doi.org/10.1080/03323315.2020.1779108
Clark, D. (2010). Selective schools and academic achievement. The B.E. Journal of Economic Analysis & Policy, 10(1), 1–40. https://doi.org/10.2202/1935-1682.1917
Coe, R., Jones, K., Searle, J., Kokotsaki, D., Kosnin, A. M., & Skinner, P. (2008). Evidence on the effects of selective educational systems: A report for the Sutton trust. CEM Centre, Durham University. https://www.suttontrust.com/wp-content/uploads/2019/12/SuttonTrustFullReportFinal-1.pdf
Crawford, C. (2014). The link between secondary school characteristics and university participation and outcomes. Institute for Fiscal Studies. https://ifs.org.uk/publications/7235
Cribb, J., Jesson, D., Sibieta, L., Skipp, A., & Vignoles, A. (2014). Poor grammar: Entry into grammar schools for disadvantaged pupils in England. The Sutton Trust. https://www.suttontrust.com/wp-content/uploads/2013/11/PoorGrammar2013.pdf
Cullinane, C., Hillary, J., Andrade, J., & Stephen, M. (2017). Selective comprehensives 2017: Admissions to high-attaining non-selective schools for disadvantaged pupils. The Sutton Trust. https://www.suttontrust.com/wp-content/uploads/2019/12/Selective-Comprehensives-2017.pdf
DENI. (2022). Annual enrolments at schools and in funded pre-school education in Northern Ireland 2021–22. DENI. https://www.education-ni.gov.uk/publications/annual-enrolments-schools-and-funded-pre-school-education-northern-ireland-2021-22
Dickson, M., & Macmillan, L. (2020). A methodological critique. In J. Furlong & I. Lunt (Eds.), Social mobility and higher education: Are grammar schools the answer? (occasional paper 22) (pp. 15–23). Higher Education Policy Institute. https://www.hepi.ac.uk/wp-content/uploads/2020/01/Social-Mobility-and-Higher-Education-Are-grammar-schools-the-answer.pdf
Estrada, R., & Gignoux, J. (2017). Benefits to elite schools and the expected returns to education: Evidence from Mexico City. European Economic Review, 95, 168–194. https://doi.org/10.1016/j.euroecorev.2017.03.007
Exley, S. (2020). Selective schooling and its relationship to private tutoring: The case of South Korea. Comparative Education, 56(2), 218–235. https://doi.org/10.1080/03050068.2019.1687230

Galindo-Rueda, F., & Vignoles, A. (2005). The heterogeneous effect of selection in secondary schools: Understanding the changing role of ability. Centre for the Economics of Education. http://eprints.lse.ac.uk/19440/1/The_Heterogeneous_Effect_of_Selection_in_Secondary_Schools_Understanding_the_Changing_Role_of_Ability.pdf
Gallagher, T. (2016). Shared education in Northern Ireland: School collaboration in divided societies. Oxford Review of Education, 42(3), 362–375. https://doi.org/10.1080/03054985.2016.1184868
Gallagher, T. (2021). Governance and leadership in education policy making and school development in a divided society. School Leadership & Management, 41(1–2), 132–151. https://doi.org/10.1080/13632434.2021.1887116
Gallagher, T., & Smith, A. (2000). The effects of the selective system of secondary education in Northern Ireland: Main report. Department of Education for Northern Ireland. https://www.education-ni.gov.uk/sites/default/files/publications/de/gallagherandsmith-mainreport.pdf
Gardner, J. (2016). Education in Northern Ireland since the good Friday agreement: Kabuki theatre meets danse macabre. Oxford Review of Education, 42(3), 346–361. https://doi.org/10.1080/03054985.2016.1184869
Gorard, S., & See, B. H. (2013). Overcoming disadvantage in education. Routledge.
Gorard, S., & Siddiqui, N. (2018). Grammar schools in England: A new analysis of social segregation and academic outcomes. British Journal of Sociology of Education, 39(7), 909–924. https://doi.org/10.1080/01425692.2018.1443432
Harris, R., & Rose, S. (2013). Who benefits from grammar schools? A case study of Buckinghamshire, England. Oxford Review of Education, 39(2), 151–171. https://doi.org/10.1080/03054985.2013.776955
Henderson, L. (2020). Children’s education rights at the transition to secondary education: School choice in Northern Ireland. British Educational Research Journal, 46(5), 1131–1151. https://doi.org/10.1002/berj.3620
Horn, D. (2009). Age of selection counts: A cross-country analysis of educational institutions. Educational Research and Evaluation, 15(4), 343–366. https://doi.org/10.1080/13803610903087011
Hughes, J., & Loader, R. (2022). Is academic selection in Northern Ireland a barrier to social cohesion? Research Papers in Education. Advance online publication. https://doi.org/10.1080/02671522.2022.2135016
Hughes, J., & Loader, R. (2023). Shared education: A case study in social cohesion. Research Papers in Education, 38(3), 305–327. https://doi.org/10.1080/02671522.2021.1961303
Hurn, C. (1993). The limits and possibilities of schooling: An introduction to the sociology of education (3rd ed.). Allyn and Bacon.

Ireson, J., & Rushforth, K. (2011). Private tutoring at transition points in the English education system: Its nature, extent and purpose. Research Papers in Education, 26(1), 1–19. https://doi.org/10.1080/02671520903191170
Jerrim, J., & Sims, S. (2019). Why do so few low- and middle-income children attend a grammar school? New evidence from the millennium cohort study. British Educational Research Journal, 45(3), 425–457. https://doi.org/10.1002/berj.3502
Jerrim, J., & Sims, S. (2020). The association between attending a grammar school and children’s socio-emotional outcomes. New evidence from the millennium cohort study. British Journal of Educational Studies, 68(1), 25–42. https://doi.org/10.1080/00071005.2018.1518513
Jesson, D. (2001). Selective systems of education – Blueprint for lower standards? Education Review, 15(1), 8–14. https://educationpublishing.com/wp-content/uploads/2019/06/Education_Review_Vol.15_No.1.pdf
Kelly, S., & Carbonaro, W. (2012). Curriculum tracking and teacher expectations: Evidence from discrepant course taking models. Social Psychology of Education, 15(3), 271–294. https://doi.org/10.1007/s11218-012-9182-6
Leroux, G. (2015). Choosing to succeed: Do parents pick the right schools? Social Market Foundation. https://www.smf.co.uk/wp-content/uploads/2015/01/Social-Market-FoundationPublication_SMF-Briefing_Choosing-to-succeed_Do-parents-pick-the-right-schools_160114WEB.pdf
Levačić, R., & Marsh, A. J. (2007). Secondary modern schools: Are their pupils disadvantaged? British Educational Research Journal, 33(2), 155–178. https://doi.org/10.1080/01411920701208209
Lu, B. (2020a). How can we evaluate the effectiveness of grammar schools in England? A regression discontinuity approach. British Educational Research Journal, 46(2), 339–363. https://doi.org/10.1002/berj.3581
Lu, B. (2020b). Selection on attainment? Local authorities, pupil backgrounds, attainment and grammar school opportunities. Educational Review, 72(1), 68–87. https://doi.org/10.1080/00131911.2018.1483893
Lu, B. (2021). Does attending academically selective schools increase higher education participation rates? Cambridge Journal of Education, 51(4), 467–489. https://doi.org/10.1080/0305764x.2020.1863914
Lu, B. (2023). Understanding the unsettled evidence of the effectiveness of selective education in the value-added approach. British Journal of Educational Studies, 71(2), 213–231. https://doi.org/10.1080/00071005.2022.2045898
Mansfield, I. (2019). The impact of selective secondary education on progression to higher education (occasional paper 19). Higher Education Policy Institute. https://www.hepi.ac.uk/wp-content/uploads/2019/01/HEPI-Occasional-Paper-19-as-published-Screen.pdf

Marks, J. (2000). The betrayed generations: Standards in British schools 1950–2000. Centre for Policy Studies. https://cps.org.uk/wp-content/uploads/2021/07/111028112306-BetrayedGenerations.pdf
Maurin, E., & McNally, S. (2007). Educational effects of widening access to the academic track: A natural experiment. Centre for the Economics of Education. https://cep.lse.ac.uk/pubs/download/CEE/ceedp85.pdf
McMurray, S. (2020). Research and information service briefing paper: Academic selection. Northern Ireland Assembly. http://www.niassembly.gov.uk/globalassets/documents/committees/2017-2022/education/post-primary-transfer-survey/academic-selection-briefing-paper-niar-209-2020.pdf
Morris, R., & Perry, T. (2017). Reframing the English grammar schools debate. Educational Review, 69(1), 1–24. https://doi.org/10.1080/00131911.2016.1184132
O’Meara, N., Prendergast, M., Cantley, I., Harbison, L., & O’Hara, C. (2020). Teachers’ self-perceptions of mathematical knowledge for teaching at the transition between primary and post-primary school. International Journal of Mathematical Education in Science and Technology, 51(4), 497–519. https://doi.org/10.1080/0020739x.2019.1589004
Perry, T. (2019). “Phantom” compositional effects in English school value-added measures: The consequences of random baseline measurement error. Research Papers in Education, 34(2), 239–262. https://doi.org/10.1080/02671522.2018.1424926
Prendergast, M., O'Meara, N., O'Hara, C., Harbison, L., & Cantley, I. (2019). Bridging the primary to secondary school mathematics divide: Teachers’ perspectives. Issues in Educational Research, 29(1), 243–260. http://www.iier.org.au/iier29/prendergast.pdf
Rasbash, J., Leckie, G., Pillinger, R., & Jenkins, J. (2010). Children's educational progress: Partitioning family, school and area effects. Journal of the Royal Statistical Society: Series A (Statistics in Society), 173(3), 657–682. https://doi.org/10.1111/j.1467-985x.2010.00642.x
Reay, D. (2004). Exclusivity, exclusion, and social class in urban education markets in the United Kingdom. Urban Education, 39(5), 537–560. https://doi.org/10.1177/0042085904266925
Shuttleworth, I., & Daly, P. (2000). The pattern of performance at GCSE: Research paper SEL3.1. Department of Education for Northern Ireland. https://www.education-ni.gov.uk/sites/default/files/publications/de/gallagherandsmith-patternofperfatgcse.pdf
Sullivan, A., Parsons, S., Green, F., Wiggins, R. D., Ploubidis, G., & Huynh, T. (2018). Educational attainment in the short and long term: Was there an advantage to attending faith, private, and selective schools for pupils in the 1980s? Oxford Review of Education, 44(6), 806–822. https://doi.org/10.1080/03054985.2018.1481378

Terrin, É., & Triventi, M. (2022). The effect of school tracking on student achievement and inequality: A meta-analysis. Review of Educational Research, 93(2), 236–274. https://doi.org/10.3102/00346543221100850
Van de Werfhorst, H. G., & Mijs, J. J. B. (2010). Achievement inequality and the institutional structure of educational systems: A comparative perspective. Annual Review of Sociology, 36(1), 407–428. https://doi.org/10.1146/annurev.soc.012809.102538

CHAPTER 6

Ethics of Academic Selection

Abstract  In this chapter, I critically review various perspectives on educational justice, and their implications for academic selection. I argue that most of the extant literature on educational justice and academic selection focuses on selection by any means, and rather less attention has been accorded to the educational justice implications of academic selection predicated on high stakes tests. To address this gap in the literature, I demonstrate how some of the consequences of selection (see Chap. 5), together with the philosophical limitations of educational assessment (articulated in Chap. 2), mean that academic selection via high stakes tests can lead to the harmful treatment of some students. More specifically, I argue that academic selection by any means has the potential to precipitate epistemic harms for some young people in the form of epistemic disadvantage or epistemic injustice.

Keywords  Academic selection • Educational justice • Epistemic disadvantage • Epistemic injustice

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
I. Cantley, The Philosophical Limitations of Educational Assessment, https://doi.org/10.1007/978-3-031-47021-9_6

Introduction

Ethics is the branch of philosophy that is concerned with moral judgements about what is good or bad in relation to aspects of human behaviour. Warnock (1998) cautions that ethical decision-making is a complex issue that is guided by general principles, and entails making judgements about what constitutes virtuous human behaviour (or otherwise) at a particular point in time, but she asserts that standards for judging behaviour vary with time. Warnock briefly highlights the interconnections between ethics and religion and proceeds to highlight the relevance of ethics to various aspects of human life, including education. In particular, she emphasises the relevance of ethical judgements to education by illuminating the pivotal role they play in ensuring that students have equality of opportunity in accessing educational resources and experiences.

Given its important role in influencing decisions about the admission of students to different school types, it is unsurprising that there are important ethical issues associated with academic selection. In this chapter, my approach to ethical issues allied to academic selection is largely predicated on the just distribution of educational opportunities rather than any alternative ethical approach, such as one that prioritises human rights. I consider such an approach to be warranted given the important role of education in preparing students for employment in competitive industrial economies.

Elwood (2013) made helpful contributions to elucidating some of the potential ethical issues associated with academic selection in a Northern Ireland context by invoking Messick’s (1989, 1995) conception of test validity, which incorporates an ethical dimension through the priority he attaches to consequences of using test scores to make decisions about students. According to Elwood (2013), there were potential equality of opportunity dilemmas associated with the unregulated transfer tests that were used for academic selection purposes in Northern Ireland between 2009 and 2022.
Elwood indicated that the different types of assessment employed in the two unregulated transfer tests (multiple choice questions in PPTC tests versus constructed response questions in AQE tests) may have advantaged different sub-groups of candidates who took these tests. More specifically, Elwood (2013) surmised that the multiple choice format of the PPTC tests may have been more advantageous for boys than girls, while the opposite may have been true for the constructed response AQE test format, thus leading to issues of fairness and equity in grammar school admission decisions that were based on the scores obtained in these tests. Elwood also argued that such potential ethical dilemmas were further compounded by the lack of evidence pertaining to the comparability of the two tests in relation to their difficulty levels, which she implied could also compromise fairness and equity in the academic selection process. Although these are very real concerns that, if shown to be true, would strike an ethical hammer blow to the academic selection process that operated in Northern Ireland between 2009 and 2022, the dearth of publicly available data for scrutiny and analysis means that it is difficult to test Elwood’s claims.

On the ethical front, the philosophical literature on educational justice is particularly relevant to the academic selection debate. Many contributions to scholarship in this area have built on John Rawls’s work on distributive justice (Rawls, 1971), which offers insights into how social goods like education should be distributed in a fair and just society. Rawls (1971) believed that fairness is a pivotal idea in the concept of justice, and he developed a model of distributive justice in which he articulated two principles for the distribution of social goods. The first of these is the “liberty principle”, which asserts that all citizens should have equal rights to a comprehensive system of equal basic liberties. The second has two parts: the “fair equality of opportunity principle” and the “difference principle”. The fair equality of opportunity principle holds that individuals with comparable abilities and skills should have similar life chances, while the difference principle requires that any social and economic inequalities should be of maximal benefit to the least advantaged. According to Rawls (1971), these principles should be lexically ordered, so that the liberty principle takes priority over the fair equality of opportunity principle that, in turn, takes priority over the difference principle. Given the socioeconomic gradient in grammar school admissions that was highlighted in Chap. 5, academic selection appears to contravene Rawls’s fair equality of opportunity principle, but, in the current chapter, I further elaborate on the ways in which academic selection can have a detrimental impact on educational justice.
Initially, I critically review some perspectives on educational justice that take different stances on distributive justice and consider their implications for academic selection. I argue that, although such considerations give some useful insights into the ethics of academic selection, they tend to apply to selection by any means. Rather less attention has been given to the justice implications of effecting academic selection using a one-off, high stakes test of the type that is currently utilised in Northern Ireland and some other jurisdictions internationally. In the current chapter, I address this gap in the extant literature by synthesising the consequences of academic selection reviewed in Chap. 5, the critique of educational assessment articulated in Chap. 2, and insights from social epistemology on epistemic injustice (Fricker, 2007, 2013; Coady, 2010) and epistemic disadvantage (Goldstein, 2022). Epistemic injustice refers to the way in which an individual can be wronged in their capacity as a knower, and consequently be treated unfairly in relation to access to epistemic goods such as education. Epistemic disadvantage, on the other hand, refers to intellectual or moral harms that can occur when an individual’s exclusion from knowledge exchanges is warranted. To set the scene for my analysis, I outline several perspectives on educational justice, and their concomitant consequences for academic selection.

Perspectives on Educational Justice and Their Implications for Academic Selection

Harry Brighouse and Adam Swift have articulated a particularly robust defence of an egalitarian approach to educational justice whereby the capacity of all citizens to access equal educational opportunities is promoted (Brighouse, 2002; Brighouse & Swift, 2006). Brighouse and Swift (2006) consider education to be an example of a positional good, where positional goods are defined as “goods the absolute value of which, to their possessors, depends on those possessors’ place in the distribution of the good—on their relative standing with respect to the good in question” (p. 474). According to Brighouse and Swift, positional goods such as education are valued to some extent because they grant access to other goods, such as higher education or employment opportunities, and their value in relation to achieving those other goods is a function of how much an individual possesses relative to others rather than the absolute amount he or she possesses. For example, successful applications to higher education courses are generally positively correlated with candidates’ educational credentials, and those candidates with stronger credential profiles tend to have a greater probability of success. Brighouse and Swift (2006) argue in favour of equality in relation to how positional goods are distributed and condone “levelling down” in relation to such goods. In other words, they are supportive of decreasing the educational opportunities available to the more advantaged in society to secure greater equality of opportunities for everyone, claiming that levelling down enhances the absolute position of the most disadvantaged. Thus, Brighouse and Swift (2006) attempt to reason that greater equality in the distribution of positional goods like education may be preferable even on a prioritarian basis, that is, based on the overall benefit for all individuals but giving extra weight to the most disadvantaged, rather than on the intrinsic value of such equality. Brighouse and Swift (2006), unlike Rawls, thus do not condone prioritisation of the equality of opportunity principle over the difference principle.

In Chap. 5, I demonstrated that grammar school attendance is associated with positive effects on educational outcomes, and that students from lower socioeconomic groups are underrepresented in grammar schools, with most grammar school students coming from more socially advantaged backgrounds. I also highlighted that students who fail to gain grammar school places tend to achieve lower academically in non-selective secondary schools compared to similar students in comprehensive schools, while grammar school students tend to achieve higher academic standards in grammar schools compared to similar students in comprehensive schools. In the context of academic selection, the reasoning of Brighouse and Swift (2006) implies that students who would have been denied access to grammar schools under a selective regime may be better off absolutely, rather than just relatively so, in educational opportunity terms, if academic selection were prohibited. If all students were compelled to attend comprehensive schools, this would remove any educational advantage associated with a grammar school education. It would also reduce the social gradient in educational outcomes and permit those who would have failed to gain grammar school places in a selective system to compete for higher education courses or jobs on something more akin to equal terms with their erstwhile grammar school counterparts. It is therefore unsurprising that Brighouse explicitly endorses the abolition of academic selection based on his well-documented egalitarian stance (Brighouse et al., 2010).
It could be argued that “levelling down” contravenes the freedom of families and/or jurisdictions to pursue the values and priorities they attach to education (Anderson, 2007; Anderson & White, 2019; Satz, 2007), and therefore that it fails to give due regard to Rawls’s liberty principle. Satz (2007, p. 634) articulates the apparent tension between equality of opportunity and freedom: “We cannot secure the equal development of children’s potentials while permitting a world with diverse families, parents, parenting styles, geographical locations, and values.” However, in the Northern Ireland context, where academic selection is widespread, it is important to note that giving families the freedom to choose a grammar school education for their children, if they so wish, also precipitates negative consequences for those children who do not secure grammar school places. Many children in this category are compelled to attend non-selective secondary schools, and I contend that this therefore infringes their fundamental liberties. Therefore, acknowledging and accommodating the liberties of those in Northern Ireland who favour academic selection simultaneously infringes the liberties of those families whose children are denied admission to a grammar school. Furthermore, it is conceivable that “levelling down” could eradicate the societal benefits for all citizens associated with a grammar school education for some individuals, who could enrich the lives of others by virtue of their enhanced knowledge and skills (Anderson & White, 2019). Conversely, I suggest that the lives of those who are denied admission to a grammar school may be impoverished by virtue of the fact they are denied access to the educational opportunities offered by grammar schools.

In response to these challenges, Elizabeth Anderson and Debra Satz both advocate an “adequacy” rather than an “equality” standard for educational justice, so that all students are educated to an adequate standard, while the equality agenda is de-prioritised. However, this raises the question of what exactly constitutes an adequate standard of education and, more worryingly, leaves substantial leeway for the emergence of considerable degrees of inequality in educational opportunities and, in all probability, it would be to the benefit of the socially advantaged. Although Anderson suggested that an adequate standard of education would necessitate substantial “levelling up” for the most disadvantaged in society and that it would be sufficient to enable them to relate to other people on equal terms (Anderson & White, 2019), this does not sufficiently clarify the definition of adequate. In a US context, however, Satz (2007) stipulates that an adequate standard of education would ensure that all sufficiently capable individuals are able to access higher education.
Whilst I strongly endorse the importance of ensuring that students receive a sufficient level of education to equip them to function as autonomous and competent citizens, I reject the replacement of an equality standard with an adequacy standard for educational justice. I concur with Brighouse and Swift’s (2009) claim that “educational adequacy … is not a comprehensive principle to guide the distribution of educational resources” (p. 127). My main objection to the adequacy standard is the potential inequalities in educational opportunities that are enshrined within it. In contrast to the implications of Brighouse and Swift’s (2006) analysis, the educational adequacy agenda advanced by Anderson (2007) and Satz (2007) would appear to condone academic selection provided non-grammar school students receive an “adequate” standard of education.

Although Brighouse and Swift (2006) place considerable emphasis on the positional nature of education in articulating their stance on educational justice, they also acknowledge that education has non-positional benefits. More specifically, they highlight that education has important epistemic benefits as it fosters knowledge amongst its recipients. Since there are finite limits on educational opportunities, and the credentials that stem from those opportunities, the benefits of education are positional in the sense that the quality of one individual’s education can theoretically diminish or enhance the quality of another individual’s education. Consequently, education is a competitively valuable social good in recruitment exercises for opportunities that it grants access to, such as higher education courses or jobs. However, the knowledge that results from education is not constrained in this way since the knowledge level of one person is independent of that of another person; whether person X knows a truth t is unrelated to whether another person Y knows t. Therefore, the extent of one individual’s knowledge cannot increase or decrease the level of knowledge possessed by another individual, and so knowledge is non-positional. Brighouse and Swift (2006) admit that a balanced approach must be taken in relation to the distribution of the positional and non-positional benefits of education. For example, a child from a socioeconomically disadvantaged background may be better off overall if they acquire less knowledge because of levelling down, since they could compete in higher education or job applications on something more akin to a level playing field with children from more affluent backgrounds. 
Although Brighouse and Swift’s particular flavour of egalitarianism, whereby they condone levelling down as a vehicle for promoting equality of educational opportunities, holds considerable appeal for many who are interested in educational justice, alternative perspectives on educational justice have been espoused in the literature. Indeed, one such perspective, based on the principle of educational adequacy, was discussed above. However, some scholars have advocated an approach to educational justice that prioritises the non-positional, epistemic benefits of education over its competitive, positional aspects. For example, Kotzee (2013) argued that the dominant approach to educational justice should not entail levelling down but, rather, improving the epistemic benefits of education for all citizens. In Kotzee’s view, such an approach would permit priority to be accorded to the “distribution of
what arises directly and essentially from education: knowledge” (Kotzee, 2013, p. 348). He posits that, although decisions pertaining to admission to higher education courses and job offers are sometimes predicated on the knowledge that students acquire through their education, they are often influenced by extraneous factors that are not directly attributable to candidates’ levels of education. Kotzee argues that education needs to be improved for those who are knowledge impoverished to enhance their epistemic standing and to permit them to “gain a voice” (Kotzee, 2013, p. 349). In addition, he makes the point that relevant understanding of the essence and degree of oppression in the world ought to be imparted to the powerful as a means of eradicating such oppression. Academic selection does not therefore appear to be incompatible with Kotzee’s (2013) model of educational justice, although he does not explicitly endorse a link between academic selection and his justice model. However, I contend that the gravity of Kotzee’s arguments is somewhat weakened by his claim that education fundamentally distributes knowledge. This naïve conceptualisation of education is extremely contentious and conflicts with the views of scholars such as John Dewey who consider education to have much broader scope than the distribution of knowledge. Furthermore, it may be overly optimistic to presume that enhanced education of the powerful would lead to the elimination of oppression. Rawls’s model of distributive justice has been subjected to extensive critique in the literature, particularly in relation to the difference principle. According to this principle, the distribution of social goods such as education should be of greatest benefit to those who are the most disadvantaged. However, it is extremely problematic that Rawls’s framing of the difference principle gives scant attention to the extent to which the most disadvantaged warrant preferential treatment (Dworkin, 1981). 
For example, it is conceivable that a given group of individuals could be living in poverty by personal choice rather than due to factors that are beyond their control. Such a situation would manifest itself if, for example, the individuals concerned opted to forego opportunities to secure gainful employment, despite being abundantly qualified, capable, and in appropriate circumstances to do so. From a fairness perspective, it would appear to be highly inappropriate to countenance the prioritisation of such a group for preferential treatment in the distribution of social goods, since they do not seem to deserve such treatment. The failure to take appropriate account of how deserving a group is in relation to preferential treatment is a serious omission from a model of distributive justice that purports to emphasise the importance of fairness.

In the context of academic selection, it could be argued that the use of high stakes tests to select suitable candidates for a grammar school education offers a mechanism to ensure that those who are selected deserve such an accolade, irrespective of their social background. In other words, the use of high stakes tests to effect academic selection could be deemed to offer a fair and meritocratic system for deciding on the most suitable type of post-primary school for a child. After all, if the results of selection tests were valid and reliable, academic selection decisions would be predicated on objective and incontrovertible evidence pertaining to children’s academic potential. Davis (2009) has disputed this enticing picture of a robust, meritocratic system. Unlike most critics of academic selection, Davis invokes philosophical arguments to question its fairness, and he suggests that the validity of the relevant high stakes tests is instrumental in determining if the selection process is fair. Drawing on empirical evidence, however, Davis posits that the more extensive coaching received by candidates from middle class backgrounds, compared to those from working class origins, artificially inflates their test scores, and correspondingly compromises test validity and ultimately test fairness. However, irrespective of academic selection’s meritocratic basis, Davis also questions its proponents’ ethical and political motivations for wishing to promote social mobility, which is a frequently cited benefit of selection. He ponders why it is more desirable to promote the mobility of people based on socioeconomic status rather than race, gender, geographical location, or any other demographic category, given that mobility is likely to vary by category. 
However, I contend the prioritisation of mobility based on socioeconomic status is justifiable because, unlike other sociodemographic variables, socioeconomic background is a consistent and enduring predictor of low educational achievement in many educational systems (Bruckauf, 2016; Liu et al., 2022). Unlike most philosophical critiques of academic selection, Davis (2009) explicitly highlighted some of the ethical dilemmas associated with the use of high stakes tests to effect academic selection. Like Davis, I am concerned about the moral consequences of using one-off high stakes tests as a means of selecting students for post-primary education based on the supposition that the tests accurately and reliably measure academic potential. In this chapter, I argue that the tensions associated with educational assessment which I highlighted in Chap. 2, and the consequences of selection that I articulated in Chap. 5, have the potential to lead to harmful treatment of some candidates as a consequence of basing academic

selection decisions on the results of one-off high stakes tests. My arguments draw upon important notions from social epistemology, namely epistemic disadvantage, and epistemic injustice, which I introduce in the following section.

Epistemic Injustice and Epistemic Disadvantage

Epistemic Injustice

According to Fricker (2007, p. 1), epistemic injustice is a “distinctively epistemic kind of injustice”, in which an individual is wronged “specifically in their capacity as a knower”. Fricker (2007) initially introduced two distinct forms of epistemic injustice: testimonial injustice and hermeneutical injustice, which she referred to subsequently as discriminatory forms of epistemic injustice (Fricker, 2013). Testimonial injustice is deemed to have occurred when prejudice causes someone to be wronged in their capacity as a knower because of reduced credibility being accorded to their testimony, and where the prejudice is unrelated to whether the person should be granted credibility, for example, prejudice associated with sexism or racism. Therefore, a classic example of testimonial injustice occurs when a man fails to take the testimony of a woman seriously because of a prejudicial stereotype that women’s opinions tend to be more influenced by emotion than intellect. In such a scenario, the woman is treated unjustly as a knower since she is accorded reduced credibility to convey truthful and objective testimony. Testimonial injustices thus confer an unfair advantage to those who are not subject to such prejudices. It is important to note that Fricker (2007) considered the possibility of extending the concept of testimonial injustice to incorporate scenarios where a speaker’s testimony receives greater credibility than is warranted (i.e., a credibility excess), rather than curtailing it to situations where less credibility is afforded to the speaker than is deserved (i.e., a credibility deficit).
Although Fricker (2007) acknowledged that it is possible for credibility excesses to have negative consequences for a person’s epistemic character if, for example, such excesses limit their capacity to cultivate epistemic virtues such as open-mindedness, she ultimately rejected the broadening of testimonial injustice to include credibility excesses:

I do not think it would be right to characterize any of the individual moments of credibility excess that such a person receives as in itself an instance of testimonial injustice, since none of them wrongs him sufficiently in itself. (Fricker, 2007, p. 21)

Fricker (2007) emphasised that not all epistemic wrongs constitute cases of epistemic injustice, and she took great care to clarify her position regarding innocent epistemic wrongs. Fricker argued that credibility deficits can stem from things other than prejudice and can arise due to innocent mistakes that are non-culpable from both an ethical and an epistemic perspective. Thus, if testimonial injustice is construed as being associated with a credibility deficit in line with Fricker’s (2007) preferred definition of the concept, there are forms of credibility deficit that do not constitute instances of testimonial injustice. It is conceivable that a hearer could harbour a false opinion about a given speaker’s capability level or about their motives for behaving in a particular manner, thereby leading to the hearer ascribing less credibility to the speaker than is warranted. Provided the hearer’s false opinion is non-culpable both ethically and epistemically, so that it is not based, for example, on prejudice or careless epistemic judgements, the erroneous credibility ascription would be non-culpable. Therefore, this would not represent an example of testimonial injustice, but rather an unfortunate epistemic mistake. The other type of epistemic injustice identified by Fricker (2007), hermeneutical injustice, occurs when a dearth of relevant conceptual resources puts a person at an unfair disadvantage in relation to making sense of their social experiences. Such injustices unfairly advantage those who have access to a relevant conceptual framework to understand their social experiences, thereby leading to a situation where, because of their ready access to the common pool of epistemic resources by virtue of their identity, “the powerful have an unfair advantage in structuring collective social understandings” (Fricker, 2007, p. 147). 
A notable example of hermeneutical injustice, considered by Fricker (2007), pertains to a woman who is the victim of sexual harassment in the era before this concept became known, resulting in a situation where she is unable to properly understand what she has experienced or communicate her experience to other people. Although Fricker (2007) suggested that unfairness in the distribution of epistemic goods, such as education, did not in itself constitute epistemic injustice, Coady (2010) argued strongly for the opposite position. According to Coady (2010), questions about the just distribution of

epistemic goods such as education and information are warranted by virtue of their epistemic nature. Coady (2010) thus defined distributive epistemic injustice as “injustice in the distribution of the epistemic good of knowledge” (p. 112). Consequently, Fricker (2013) also extended her notion of epistemic injustice to incorporate distributive epistemic injustice, which she describes as “the unfair distribution of epistemic goods such as education or information” (p. 1318). However, in a later contribution, Coady (2017) argued that testimonial and hermeneutical injustices are both types of distributive injustice, thus unifying the discriminatory and distributive forms of epistemic injustice. According to Fricker (2003, p. 154), testimonial injustice “occurs when prejudice on the part of the hearer leads to the speaker receiving less credibility than he or she deserves”. This raises important questions about exactly what constitutes the just distribution of credibility. Fricker (2007) suggested that credibility is fundamentally different from goods such as wealth or healthcare that are suited to a distributive model of justice. She argued that there is less of a quandary in relation to the fair distribution of credibility than the fair distribution of goods such as wealth since “the hearer’s obligation is obvious: she must match the level of credibility she attributes to her interlocutor to the evidence that he is offering the truth” (Fricker, 2007, p. 19). However, Coady (2017) posits that Fricker’s (2007) stance fails to address important considerations about the means by which the hearer is obliged to garner and interpret suitable evidence pertaining to the veracity of the speaker’s testimony.
Given that there is similar ambiguity concerning the criteria upon which the just distribution of goods such as wealth should be predicated, Coady (2017) concludes that there is insufficient distinction between goods such as wealth and credibility to warrant the applicability of the distributive model of justice to the former, but not the latter. Furthermore, Fricker (2007) argued that goods which are suitable for a distributive model of justice are usually finite, potentially scarce, and generally entail competition to access them, thus leading to ethical dilemmas regarding their just distribution. Accordingly, Fricker (2007) concluded that the non-finite nature of credibility, coupled with its non-competitive nature, undermines the suitability of a distributive justice model for credibility:

Those goods best suited to the distributive model are so suited principally because they are finite and at least potentially in short supply. … Such goods
are those for which there is, or may soon be, a certain competition and that is what gives rise to the ethical puzzle about the justice of this or that particular distribution. By contrast, credibility is not generally finite in this way, and there is no analogous competitive demand to invite the distributive treatment. (pp. 19–20)

However, Coady (2017) argues, correctly in my view, that credibility has finite bounds since it would be bizarre to suggest that a given hearer could believe every testimonial utterance they encounter, and it would be even more bizarre to assign maximum credibility to the associated speakers (Medina, 2013). In addition, and contrary to Fricker’s (2007) claim, Coady (2017) notes that credibility is frequently in short supply, thus leading to competition for it, as exemplified by the fact most legal trials entail a competition for credibility between an accused individual and their accuser. For example, an unwarranted low level of credibility assigned by jurors to the testimony of an accused person is intimately entwined with the unwarranted high level of credibility they accord to the testimony of the accuser. Similarly, the jurors’ assignment of an unwarranted low level of credibility to the testimony of the accuser is intimately associated with the unwarranted high level of credibility they grant to the testimony of the accused. Coady (2017) concludes that the finite nature of credibility, coupled with the fact credibility assignments can and do entail competition, undermines Fricker’s (2007) attempt to establish a disanalogy between credibility and goods that are traditionally viewed as being suited to a distributive justice model. Accordingly, Coady (2017) asserts that testimonial injustice represents a type of distributive injustice since it pertains to an injustice in credibility distribution. Although Coady (2017) broadly supports Fricker’s (2007) contention that individual moments of credibility excess being accorded to a person do not constitute examples of testimonial injustice, he rightly bemoans the outright omission of credibility excesses from her analysis of testimonial injustice. 
I concur with Coady’s suggestion that such an omission is problematic because it is conceivable that credibility excesses could indeed be unjust if, for example, a speaker gains unwarranted privileges as a consequence of being accorded a credibility excess. Fricker (2013) categorised both testimonial injustice and hermeneutical injustice as forms of epistemic injustice of the discriminatory type, which she contrasted with the distributive form of epistemic injustice. This suggests that, as for testimonial injustice, Fricker does not view

hermeneutical injustice to be a form of distributive injustice. However, Coady (2017) argued that, in the same manner as testimonial injustice, hermeneutical injustice should be conceptualised in distributive terms. According to Fricker (2006, p. 99), hermeneutical injustice is “the injustice of having some significant area of one’s social experience obscured from collective understanding owing to persistent and wide-ranging hermeneutical marginalization”. Hermeneutical marginalisation refers to being a member of “a group which does not have access to equal participation in the generation of social meanings” (Fricker, 2013, p. 1319). Some social groups are in a privileged position in relation to the generation of social meanings, such as politicians, white people, and men, so that they could be viewed as having more “hermeneutical power” (Coady, 2017, pp. 64–65) than social groups who are in less privileged positions. Those in the less privileged groups, who have lower levels of hermeneutical power, can experience unjust epistemic harm due to their social experience being misunderstood. Coady (2017) presents a convincing argument that, in a similar manner to credibility, hermeneutical power is finite, and that competition may therefore be engendered to acquire it. Coady (2017, p. 65) uses the following example, pertaining to sexual harassment, to illustrate how imbalances in hermeneutical power can lead to hermeneutical injustice:

It is because women have not had equal hermeneutic power that the kinds of encounters we now recognize as sexual harassment were (and often still are) dismissed as harmless flirting.

According to Coady (2017), it is appropriate to view hermeneutical injustice as a type of distributive injustice since it entails unjust distribution of hermeneutical power. Thus, Coady (2017) construes both testimonial injustice and hermeneutical injustice as forms of distributive injustice, thereby dissolving Fricker’s differentiation between the distributive and discriminatory forms of epistemic injustice. Testimonial injustice entails injustice in the distribution of credibility, while hermeneutical injustice is concerned with injustice in the distribution of hermeneutical power. The disadvantage associated with epistemic injustice is harmful, both intrinsically and instrumentally. From an intrinsic perspective: “To be wronged in one’s capacity as a knower is to be wronged in a capacity essential to human value. When one is undermined or otherwise wronged in a capacity essential to human value, one suffers an intrinsic injustice”

(Fricker, 2007, p. 44). In particular, a person who suffers a testimonial injustice is wronged in their capacity to communicate knowledge truthfully and accurately, which in turn undermines their capacity to engage in an activity that is of paramount importance to human life, namely reasoning. Instrumentally, Fricker (2007) posits that the harm emanating from epistemic injustice has both practical and epistemic dimensions. There may be practical consequences if, for example, a person suffers testimonial injustice in a court, resulting in them being found guilty when they are innocent, thereby receiving a fine or perhaps even imprisonment. However, there may also be purely epistemic harms associated with epistemic injustice. For example, an individual who is the victim of an isolated case of testimonial injustice may have their confidence in their beliefs, or the rationale for them, undermined. However, if someone suffers persistent testimonial injustice, this may negatively impact their confidence in their own intellectual capabilities, thus impeding their future educational and general intellectual development.

Epistemic Disadvantage

Goldstein (2022) made further helpful contributions to clarify the differences between epistemic injustice and epistemic harm more generally. In particular, she considered the unintentional epistemic harms associated with the warranted ascription of reduced levels of credibility to some people, based on a lack of capability, for example, and the consequent exclusion of those individuals from certain epistemic practices. Goldstein (2022) stressed that such warranted exclusions are not unjust, and therefore do not constitute instances of epistemic injustice. Nevertheless, she acknowledged that the excluded individuals do experience unintentional epistemic harms, which she characterised as forms of “epistemic disadvantage” (Goldstein, 2022, p. 1862). Goldstein (2022, p. 1862) intimated that epistemic disadvantage arises when “non-deliberate, asymmetrical relations exclude person(s) from social participation, leading to an intellectual or moral harm”. According to Goldstein (2022), epistemic disadvantage and epistemic injustice both emanate from hermeneutical inequalities, which she defines as “non-deliberate, asymmetrical relations that either exclude one from social participation, or obscure one’s social experience from collective understanding” (p. 1868). Epistemic injustice in its hermeneutical form occurs when some groups are excluded from social participation in an
unwarranted manner, so that asymmetrical epistemic relations are maintained deliberately to systematically ostracise marginalised groups. Contrastingly, epistemic disadvantage arises when social participation is hampered by warranted rather than unwarranted exclusionary practices, such as on meritorious terms. However, as Goldstein (2022) notes, “when one is in a greater epistemic position … epistemic harm can occur against those who are not in an equal position” (p. 1868). Goldstein (2022) identified two conditions that I think are particularly helpful in comprehending the difference between epistemic disadvantage and epistemic injustice. Firstly, epistemic injustice entails an identity prejudice, so that some type of identity prejudice negatively influences an individual’s social experience. For example, in the case of testimonial injustice, identity prejudice on the hearer’s part can lead to a speaker’s testimony being discounted because of an unwarranted reduced level of credibility being accorded to them by the hearer. However, in a situation where the speaker’s testimony is discounted because they genuinely lack relevant competence, rather than an identity prejudice harboured by the hearer, the speaker experiences an epistemic disadvantage rather than an epistemic injustice. Secondly, epistemic disadvantage may be non-intentional and potentially originate from “circumstantial epistemic bad luck” (Fricker, 2007, p. 152). In this context, “circumstantial epistemic bad luck” may arise if, for example, the dominant, privileged social group has not acquired an appropriate conceptual framework to make sense of the situation concerned and to prevent exclusionary processes from occurring. The epistemic harm that is engendered by the exclusionary processes in this type of scenario is non-intentional, and therefore constitutes an instance of epistemic disadvantage rather than epistemic injustice. 
Such a situation would, for example, manifest itself if a medical practitioner failed to give due credence to the testimony of a patient suffering from a particular condition in an era before the condition had become properly understood by the medical profession. Clearly, the patient would experience epistemic harm because of their inability to have their account of their experiences taken seriously. However, the non-intentional failure of the medical practitioner to give credence to the patient’s testimony, by virtue of the fact that a suitable conceptual framework for making sense of the situation did not exist at the time, renders this an example of epistemic disadvantage rather than epistemic injustice. Nevertheless, epistemic injustice may be non-intentional since, for example, a hearer could accord reduced credibility to
a speaker because of an implicit bias (e.g., in sexism or racism), thus precipitating a non-intentional occurrence of testimonial injustice. A salient point to emphasise is that epistemic disadvantage and epistemic injustice are not in opposition to each other, but, rather, they can apply to different aspects of the same phenomenon. It is entirely conceivable that some facets of a particular phenomenon could give rise to epistemic disadvantage, while others lead to epistemic injustice. In the next section, I analyse the extent to which various aspects of academic selection, including the tensions associated with high stakes tests that I articulated in Chap. 2, can potentially lead to either epistemic disadvantage or epistemic injustice for some students.

To What Extent Does Academic Selection Cause Epistemic Harm to Young People?

It is well established in the literature that high stakes tests tend to privilege white, middle class norms (e.g., Knoester & Au, 2017). Therefore, contemporary high stakes tests, by their design, may permit explicit exclusion and marginalisation of groups of students whose epistemic resources are deemed to be erroneous, or irrelevant, from grammar schools (e.g., students of colour, working class students, speakers of other languages, etc.). Such students may thus be victims of epistemic injustice. A point that warrants particular consideration in this context is the role of private tutoring for admissions tests in helping to explain the social gradient in grammar school admissions, as highlighted in Chap. 5. Existing research indicates that students from higher socioeconomic groups are more likely to gain grammar school places, even if they have similar capability levels to less privileged students (Cribb et al., 2014; Ireson & Rushforth, 2011; Jerrim & Sims, 2019). A relevant conceptual framework thus exists that helps to account for the social gradient in grammar school admissions. Therefore, it seems reasonable to argue that those politicians who condone academic selection based on high stakes tests are guilty of perpetrating epistemic injustice in its hermeneutical form against some students from lower socioeconomic backgrounds. This is particularly disturbing given that the epistemic injustice is masked by the purported meritocratic basis of academic selection via high stakes tests. Superficially, such tests appear to offer objective and incontrovertible evidence pertaining to students’ future academic potential, but the reality is
that social backgrounds can and do influence test performance through, amongst other things, the capacity of more affluent families to pay for tutoring. I suggest that some politicians, who are likely to come from middle class backgrounds, turn a blind eye to this worrying situation, since academic selection perpetuates the reproduction of middle class privilege in accessing a grammar school education. In a Northern Ireland context, for example, could it be that middle class parents, who tend to have more leverage in influencing policy decisions than those from working class backgrounds, and some politicians, fear the abolition of academic selection would lead to some grammar schools becoming fee-paying institutions? Such a move would clearly be to the financial detriment of those from middle class backgrounds who aspire to a grammar school education for their offspring. It is also conceivable that there is a desire to ensure children from more privileged backgrounds have minimal opportunity to interact with, and to be potentially negatively influenced (in the perception of parents/guardians) by those from more disadvantaged backgrounds, since the majority of those from lower socioeconomic groups are likely to be in different schools. There is, however, a further way in which academic selection could lead to epistemic harm for some students. The later Wittgenstein’s analysis of rule following was used in Chap. 2 to argue that academic capabilities are relational rather than innate attributes of students. The point was made that the score a student obtains in a high stakes test, which is taken at a particular point in time, is a measure of the student’s capability relative to the test paper utilised for the assessment. 
Given that the criteria of correctness for responses to test items are governed by established external practices, a student’s score on the test is a joint property of the student and the test paper, rather than an innate characteristic of the student. Thus, it is problematic to abstract the total score away from the test paper and use it to make inferences about the student’s capability as a thing-in-itself, let alone their future potential. Indeed, as I argued in Chap. 2, capability is unlikely to be an unproblematic, singular phenomenon, and factors other than academic capability influence future outcomes. This raises philosophical tensions about measurements of capabilities derived from high stakes tests that purport to measure the said capabilities. Against the backdrop of these tensions, it is entirely possible that utilising scores or grades derived from such tests to effect academic selection has the potential to lead to intellectual or moral harms for some students.

Such a situation would manifest itself in the event of a discrepancy between the score obtained by a student in the high stakes test and the score that would accurately reflect the student’s capability level, as determined by a more robust procedure than a one-off test. In particular, the discrepancy may be associated with the student being awarded a lower or higher score than they deserve. Indeed, empirical evidence examined by Gardner and Cowan (2005) demonstrated that the candidate ranking system used in the Northern Ireland selection tests during that era may have potentially misclassified up to two-thirds of the candidates by up to three grades. A student (say student X) who is denied access to a grammar school based on a test score which under-represents their actual capability level experiences an epistemic harm, since they have been wrongly denied the opportunity to further their education in the school of their choice. Clearly, however, a student (say student Y) who gains access to a grammar school based on a higher test score than warranted for their true capability level experiences no such epistemic harm. It could be argued that student X experiences an epistemic disadvantage rather than an epistemic injustice. This is because politicians who condone academic selection, and schools that use it, do not possess a relevant conceptual framework to correctly understand the limitations of educational assessment that I have articulated in Chap. 2, and therefore student X’s exclusion from the grammar school is non-deliberate. Nevertheless, in a Northern Irish context, I dispute the hypothesised lack of familiarity of selective schools and some politicians with the limitations of high stakes educational assessment, particularly since the shortcomings of academic selection tests have been highlighted by Gardner and Cowan (2005). These shortcomings may present challenges in relation to the SEAG test to be introduced in Northern Ireland from autumn 2023. 
Although the SEAG assessment model is predicated on two 60-minute tests, results will be provided for those candidates who sit only one of the two tests (SEAG, n.d.), which may increase the likelihood of misclassification. Furthermore, politicians and selective schools have a moral obligation to be aware of the limitations of high stakes tests that have significant consequences for students’ future educational and vocational options. Therefore, I suggest that grammar schools and politicians who support academic selection are more culpable than it may initially appear. If this position is accepted, then both hypothetical students X and Y referenced above would be victims of epistemic injustice of the testimonial variety. Less credibility would be afforded to student X than warranted, that is, they would be accorded a credibility deficit, while greater credibility than deserved would be given to student Y, that is, they would be accorded a credibility excess. Whilst Fricker (2007) rejected the characterisation of any individual occurrence of credibility excess as a case of testimonial injustice, I think it would be morally wrong to deny that there is at least some element of injustice in student Y securing a grammar school place based on an unwarranted credibility excess.

Several scholars have argued that, since teachers work with students for extended periods, involving numerous opportunities to interact with and observe their performance on a wide range of activities, they will have a more comprehensive view of students’ capabilities than high stakes test results can provide (Johnson, 2013). Consequently, it has been suggested that teacher assessment has the potential to improve the validity (Harlen, 2007) and reliability (Wiliam, 2003) of assessment outcomes compared to what is possible using high stakes tests. Therefore, it could be conjectured that academic selection based on teacher assessment may address the difficulties associated with the use of formal tests that have been articulated above. However, academic selection predicated on teacher assessment would be problematic for several reasons. Firstly, it is highly likely that teachers would come under enormous pressure from parents to assess their children as being capable of a grammar school education. Secondly, such an approach to academic selection would perpetuate the fallacy that 10-year-old children can be reliably categorised as either “academic” or “non-academic”. Finally, several biases in teacher assessment have been noted in the literature, and these could have a bearing on the validity and/or reliability of teacher assessments. Teachers have been shown to be either consciously or unconsciously influenced by irrelevant factors when assessing students (Johnson, 2013).
For example, in interviews, Wyatt-Smith et al. (2010) noted teachers’ tendencies to base their assessment decisions on irrelevant student personality traits such as behaviour and effort. Therefore, whilst it could be argued that teacher assessment has the potential to circumvent the difficulties associated with high stakes tests because teachers have an in-depth knowledge of their students’ capabilities, it is potentially afflicted by issues that compromise the validity and/or reliability of assessment judgements. Because of the difficulties alluded to above, teacher assessments could be prejudiced by irrelevant factors such as parental assertiveness, or a student’s gender, socioeconomic background, or SEN status. Therefore, it is entirely plausible that a student could be accorded either a credibility deficit or a credibility excess and awarded a teacher-assessed score or grade that is lower or higher than is warranted. This would again lead to the perpetration of an epistemic injustice of the testimonial form against the student in relation to the academic selection process.

Summary

In this chapter, I have critically reviewed various perspectives on educational justice, and their implications for academic selection. These include Brighouse and Swift’s (2006) support of “levelling down”, that is, decreasing the educational opportunities available to the more socially advantaged to secure greater equality of opportunities for all citizens, which is consistent with the abolition of academic selection and a move to comprehensive post-primary education for all. As an alternative to the equality standard for educational justice advocated by Brighouse and Swift (2006), I reviewed the adequacy standard supported by Anderson (2007) and Satz (2007), which academic selection would not appear to contravene provided all non-grammar school students receive an adequate standard of education. However, the term “adequate” lacks definitional clarity in this context. I also critically appraised Kotzee’s (2013) implicit support of academic selection through his dismissal of levelling down in favour of promoting the epistemic benefits of increased educational opportunities for all, including the socially privileged.

Most of the extant literature on educational justice and academic selection focuses on selection by any means, and rather less attention has been accorded to the educational justice implications of academic selection predicated on high stakes tests. To address this gap in the literature, I demonstrated in the current chapter how some of the consequences of selection (see Chap. 5), together with the philosophical limitations of educational assessment (articulated in Chap. 2), mean that academic selection via high stakes tests can lead to the harmful or unjust treatment of some students. More specifically, I have argued that academic selection by any means has the potential to precipitate epistemic harms for some young people in the form of epistemic disadvantage, and potentially epistemic injustice.
Although the arguments in this chapter have been developed in relation to academic selection for post-primary education, similar arguments could be propounded for other forms of selection for educational opportunities based on the results of high stakes tests. For example, university entrance in many jurisdictions is also predicated on the results of high stakes examinations, and the process is likely to suffer from similar afflictions to academic selection for post-primary education. This leads to the inescapable question as to whether education in the twenty-first century must necessarily remain a scarce resource that will always require selection processes to access, regardless of the merits or faults of those processes. Given that online learning platforms offer increasing potential to deliver high quality, accessible education using a diverse range of interactive learning resources in a flexible and cost-effective manner, I suggest that the era of construing education as a scarce resource requiring selection procedures for suitable candidates may soon fade into history. Against this backdrop, the final chapter summarises my conclusions in relation to the future of both high stakes educational assessment and academic selection.

References

Anderson, E. (2007). Fair opportunity in education: A democratic equality perspective. Ethics, 117(4), 595–622. https://doi.org/10.1086/518806
Anderson, E., & White, J. (2019). Elizabeth Anderson interviewed by John White. Journal of Philosophy of Education, 53(1), 5–20. https://doi.org/10.1111/1467-9752.12336
Brighouse, H. (2002). School choice and social justice. Oxford University Press.
Brighouse, H., Howe, K. R., Tooley, J., & Haydon, G. (2010). Educational equality. Continuum International Publishing Group.
Brighouse, H., & Swift, A. (2006). Equality, priority, and positional goods. Ethics, 116(3), 471–497. https://doi.org/10.1086/500524
Brighouse, H., & Swift, A. (2009). Educational equality versus educational adequacy: A critique of Anderson and Satz. Journal of Applied Philosophy, 26(2), 117–128. https://doi.org/10.1111/j.1468-5930.2009.00438.x
Bruckauf, Z. (2016). Falling behind: Socio-demographic profiles of educationally disadvantaged youth. Evidence from PISA 2000-2012 (Innocenti Working Paper No. 2016-11). UNICEF Office of Research. https://www.unicef-irc.org/publications/837-falling-behind-socio-demographic-profiles-of-educationally-disadvantaged-youth-evidence.html
Coady, D. (2010). Two concepts of epistemic injustice. Episteme, 7(2), 101–113. https://doi.org/10.3366/epi.2010.0001
Coady, D. (2017). Epistemic injustice as distributive injustice. In I. J. Kidd, J. Medina, & G. Pohlhaus Jr. (Eds.), Routledge handbook of epistemic injustice (pp. 61–68). Routledge.
Cribb, J., Jesson, D., Sibieta, L., Skipp, A., & Vignoles, A. (2014). Poor grammar: Entry into grammar schools for disadvantaged pupils in England. The Sutton Trust. https://www.suttontrust.com/wp-content/uploads/2013/11/PoorGrammar2013.pdf
Davis, A. (2009). Examples as method? My attempts to understand assessment and fairness (in the spirit of the later Wittgenstein). Journal of Philosophy of Education, 43(3), 371–389. https://doi.org/10.1111/j.1467-9752.2009.00699.x
Dworkin, R. (1981). What is equality? Part 2: Equality of resources. Philosophy & Public Affairs, 10(4), 283–345. http://www.jstor.org/stable/2265047
Elwood, J. (2013). Educational assessment policy and practice: A matter of ethics. Assessment in Education: Principles, Policy & Practice, 20(2), 205–220. https://doi.org/10.1080/0969594x.2013.765384
Fricker, M. (2003). Epistemic injustice and a role for virtue in the politics of knowing. Metaphilosophy, 34(1–2), 154–173. https://doi.org/10.1111/1467-9973.00266
Fricker, M. (2006). Powerlessness and social interpretation. Episteme, 3(1–2), 96–108. https://doi.org/10.3366/epi.2006.3.1-2.96
Fricker, M. (2007). Epistemic injustice: Power and the ethics of knowing. Oxford University Press.
Fricker, M. (2013). Epistemic justice as a condition of political freedom? Synthese, 190(7), 1317–1332. https://doi.org/10.1007/s11229-012-0227-3
Gardner, J., & Cowan, P. (2005). The fallibility of high stakes ‘11-plus’ testing in Northern Ireland. Assessment in Education: Principles, Policy & Practice, 12(2), 145–165. https://doi.org/10.1080/09695940500143837
Goldstein, R. B. (2022). Epistemic disadvantage. Philosophia, 50(4), 1861–1878. https://doi.org/10.1007/s11406-021-00465-w
Harlen, W. (2007). Assessment of learning. Sage.
Ireson, J., & Rushforth, K. (2011). Private tutoring at transition points in the English education system: Its nature, extent and purpose. Research Papers in Education, 26(1), 1–19. https://doi.org/10.1080/02671520903191170
Jerrim, J., & Sims, S. (2019). Why do so few low- and middle-income children attend a grammar school? New evidence from the millennium cohort study. British Educational Research Journal, 45(3), 425–457. https://doi.org/10.1002/berj.3502
Johnson, S. (2013). On the reliability of high-stakes teacher assessment. Research Papers in Education, 28(1), 91–105. https://doi.org/10.1080/02671522.2012.754229
Knoester, M., & Au, W. (2017). Standardized testing and school segregation: Like tinder for fire? Race Ethnicity and Education, 20(1), 1–14. https://doi.org/10.1080/13613324.2015.1121474
Kotzee, B. (2013). Educational justice, epistemic justice, and leveling down. Educational Theory, 63(4), 331–350. https://doi.org/10.1111/edth.12027
Liu, J., Peng, P., Zhao, B., & Luo, L. (2022). Socioeconomic status and academic achievement in primary and secondary education: A meta-analytic review. Educational Psychology Review, 34(4), 2867–2896. https://doi.org/10.1007/s10648-022-09689-y
Medina, J. (2013). The epistemology of resistance: Gender and racial oppression, epistemic injustice, and resistant imaginations. Oxford University Press.
Rawls, J. (1971). A theory of justice. Harvard University Press.
Satz, D. (2007). Equality, adequacy, and education for citizenship. Ethics, 117(4), 623–648. https://doi.org/10.1086/518805
SEAG. (n.d.). Frequently asked questions: Answers to the most common questions about the entrance assessment. Retrieved on May 2, 2023, from https://seagni.co.uk/guidance-for-parents/faqs
Warnock, M. (1998). An intelligent person’s guide to ethics. Gerald Duckworth &.
Wiliam, D. (2003). National curriculum assessment: How to make it better. Research Papers in Education, 18(2), 129–136. https://doi.org/10.1080/0267152032000081896
Wyatt-Smith, C., Klenowski, V., & Gunn, S. (2010). The centrality of teachers’ judgement practice in assessment: A study of standards in moderation. Assessment in Education: Principles, Policy & Practice, 17(1), 59–75. https://doi.org/10.1080/09695940903565610

CHAPTER 7

Conclusion: Implications for Policy and Practice

Abstract In this chapter, I outline the salient aspects of a new paradigm for educational assessment that I suggest may be more robust, and which may be less afflicted by validity, reliability, and allied ethical dilemmas than contemporary models of assessment. To address some of the tensions associated with contemporary approaches to high stakes testing, I make the case for an assessment model based on a system of distributed continuous assessment, and I advocate use of the method of comparative judgement to assess higher level cognitive skills. I then go on to consider the implications of my critique of educational assessment, and the proposed new assessment paradigm, in relation to the future of academic selection for post-primary education.

Keywords Academic banding • Assessment reform • Comparative judgement • Comprehensive schools • Distributed continuous assessment • Future of academic selection

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 I. Cantley, The Philosophical Limitations of Educational Assessment, https://doi.org/10.1007/978-3-031-47021-9_7

Introduction

As I indicated in Chap. 6, the increasing prevalence of high-quality online learning may lead to a diminishing rationale in the future for construing education as a scarce resource that requires a selection procedure for
suitable candidates based on high stakes tests. In the meantime, however, I believe that policymakers should give due consideration to improving high stakes educational assessment to make it more robust than the contemporary approaches to such assessment currently utilised in Northern Ireland, the United Kingdom more generally, and numerous other jurisdictions around the world. Of course, even against the backdrop of more ubiquitous access to educational opportunities, and education becoming a more plentiful resource, high stakes educational assessments are likely to be required to assist in selection of suitable candidates for jobs and potentially those higher education courses that require high levels of academic capability. I argued in Chap. 6 that contemporary approaches to high stakes testing in the context of academic selection of students for post-primary education have the potential to perpetrate epistemic injustice against some students, or to lead to other forms of epistemic harm for certain students. I also pointed out that such moral harms are likely to be a feature of other forms of selection predicated on the results of high stakes tests, such as in the selection of suitable candidates for higher education courses. Moreover, I stressed that a transition to a system of teacher assessment is unlikely to ameliorate the difficulties associated with high stakes testing. Therefore, in this chapter, I outline the salient aspects of a new paradigm for educational assessment that I suggest may be more robust, and which may be less afflicted by validity, reliability, and allied ethical dilemmas than contemporary models of assessment. I then go on to consider the implications of my critique of educational assessment, and the proposed new assessment paradigm, in relation to the future of academic selection for post-primary education.

Towards a New Paradigm for Educational Assessment: Implications for High Stakes Public Examinations

In Chap. 2, I argued that high stakes tests taken at a particular point in time may not yield valid and reliable assessments of students’ capabilities. I also highlighted that these difficulties are likely to be exacerbated when the tests incorporate items that attempt to assess students’ higher level cognitive skills, such as extended writing assignments or open-ended problem-solving tasks. Therefore, in contemporary high stakes tests, students could be awarded scores that are not reflective of their capabilities in

a particular domain. Use of the results of these tests to make decisions about future pathways may thus lead to unwarranted decisions about the educational or vocational opportunities available to some students. In addition, I have argued that teacher assessments of students’ capabilities are also likely to lead to compromises in the validity and/or reliability of assessment judgements, resulting in similar ethical dilemmas to those posed by contemporary high stakes tests. A more robust assessment paradigm may consist of judging student performance on the basis of several assessments taken at different points over an extended time interval. I would envisage these assessments to be predominantly formative in nature in the sense that their results would be used to inform teachers’ subsequent pedagogical approaches to optimise student learning, and to inform student choice. Standardised tracking tests could be used to assess relevant student knowledge, understanding and skills on multiple occasions during a course of study. Whilst each discrete test would clearly be subject to the Wittgensteinian critique that I propounded in Chap. 2, consistency of performance over the extended time interval may engender greater confidence in the assessment outcomes than any one of the discrete tests by itself. Students’ summative scores or grades, if required, could then be derived from, say, the best six tracking test results that they had achieved during their course of study. I have proposed the inclusion of results from six tracking tests since it has been suggested that a reliable judgement about a student’s capability can only be made when the student has attempted six or more assessment tasks (Wiliam, 2003). However, other options could be considered including, for example, the incorporation of the result obtained in a final, integrative test that assesses the entire content of the domain being assessed. 
These tests could, for example, be administered via computer and consist of a combination of multiple-choice, short-answer and extended-response questions, as appropriate for the test or discipline concerned. A uniform mark scale could be used to compensate for any variation in the overall difficulty of the assessment instruments between different assessment periods. Assuming the tests were well-constructed, and accurately marked, this may provide a more secure evidence base upon which decisions pertaining to students’ attainment, and interventions to improve their attainment levels, could be predicated than is offered by current assessment arrangements in the United Kingdom. Other sources of evidence beyond tracking test results could be incorporated into the evidence base contributing to summative assessments for students with genuine, certified extenuating circumstances.
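The aggregation rule suggested above can be illustrated with a minimal sketch. The function name, the use of a simple mean, and the example marks are all illustrative assumptions: the text says only that a summative score could be derived from the best six tracking test results, and does not specify how those results would be combined or how the uniform mark scale would be constructed.

```python
from statistics import mean


def summative_score(uniform_marks, best_n=6):
    """Return the mean of the best `best_n` tracking test results.

    `uniform_marks` holds one mark per tracking test, assumed to be already
    expressed on a common (uniform) mark scale. Averaging the best results is
    an assumption made for illustration, not a rule stated in the text.
    """
    if len(uniform_marks) < best_n:
        raise ValueError(
            f"need at least {best_n} results, got {len(uniform_marks)}"
        )
    # Sort from highest to lowest and keep only the strongest performances.
    best = sorted(uniform_marks, reverse=True)[:best_n]
    return mean(best)


# Hypothetical student with eight tracking tests, including one "bad day" (35).
marks = [62, 48, 71, 66, 35, 70, 68, 74]
print(summative_score(marks))
```

Because only the six highest marks are retained, the two weakest results (48 and 35) have no effect on the outcome in this example, which is the sense in which a single poor performance would not unduly penalise the student.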

Whilst marking based on traditional mark schemes would be acceptable for multiple-choice and other short-answer questions, an alternative approach is required to deal appropriately with the marking of extended-response questions. To deal with the difficulties associated with reliably assessing higher order cognitive skills, such as extended writing and open-ended problem-solving capabilities, I suggest that due consideration should be given to using the process of comparative judgement (Pollitt, 2012). Comparative judgement entails evaluating student performance by making comparisons between two students’ responses to a given test item, and deciding which response better meets the agreed assessment criteria for the item. A large number of repetitions of this process occur, with a number of different examiners making comparative judgements about the relative merits of pairs of student responses. A statistical model is then used to create a scale and to position each candidate’s response at an appropriate point on the scale. Marks derived from this scale are posited to yield more reliable measures of student performance on a test item that assesses higher level cognitive skills than would be achievable using conventional mark schemes (Pollitt, 2012). The method of comparative judgement is predicated on Louis Thurstone’s law of comparative judgement, which he initially formulated to compare intensities of physical stimuli in psychophysics, although he also emphasised its applicability to comparisons of performance in educational contexts (Thurstone, 1927). The primary rationale for using comparative judgement to assess higher order cognitive skills is that human beings are more accomplished at making comparative judgements than they are at making absolute judgements (Jones & Inglis, 2015).
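To make the scaling step more concrete, the following sketch shows one way in which a set of pairwise judgements might be converted into a scale. The text does not name the statistical model involved, so the choice of the Bradley-Terry model (a logistic model for paired comparisons that is closely related to Thurstone's approach) is an assumption, as are the function name, the iterative fitting procedure, and the example data.

```python
import math
from collections import defaultdict


def bradley_terry(judgements, n_iter=200):
    """Estimate a quality scale from a list of (winner, loser) pairs.

    Fits a Bradley-Terry model with the standard iterative
    (minorisation-maximisation) update and returns log-strengths centred on
    zero. The sketch assumes every response wins and loses at least once.
    """
    items = sorted({script for pair in judgements for script in pair})
    wins = defaultdict(int)  # comparisons won by each response
    met = defaultdict(int)   # times each unordered pair was compared
    for winner, loser in judgements:
        wins[winner] += 1
        met[frozenset((winner, loser))] += 1

    strength = {i: 1.0 for i in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = sum(
                met[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items
                if j != i
            )
            updated[i] = wins[i] / denom if denom else strength[i]
        # Rescale so that the strengths keep a fixed overall size.
        total = sum(updated.values())
        strength = {i: v * len(items) / total for i, v in updated.items()}

    logs = {i: math.log(v) for i, v in strength.items()}
    centre = sum(logs.values()) / len(logs)
    return {i: v - centre for i, v in logs.items()}


# Hypothetical judgements on three responses: each pair is compared four times.
judgements = (
    [("A", "B")] * 3 + [("B", "A")]
    + [("B", "C")] * 3 + [("C", "B")]
    + [("A", "C")] * 3 + [("C", "A")]
)
scale = bradley_terry(judgements)
for script in sorted(scale, key=scale.get, reverse=True):
    print(script, round(scale[script], 2))
```

In this invented example, response A is preferred most often and response C least often, so the fitted scale places A above B and B above C. In an operational setting, the scale positions would then need to be converted into marks, a step that is not attempted here.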
Furthermore, it has been suggested that assessments based on comparative judgements made by a panel of experts must necessarily exhibit high levels of validity since validity is implicit in the scoring process adopted in comparative judgement (Kelly et al., 2022). Indeed, comparative judgement has already been successfully used to assess higher level cognitive skills in several different disciplines, including mathematics (Jones & Inglis, 2015), English language arts (Steedle & Ferrara, 2016), geography (Whitehouse & Pollitt, 2012), and technology and design (Kimbell, 2012). Despite the increasing interest in comparative judgement in recent years, some scholars have challenged its underpinning theoretical foundations. For example, Kelly et al. (2022) suggested that the psychological basis of comparative judgement is underdeveloped in the extant literature. They raised concerns about the lack of empirical evidence to support the

assumption that human beings are more accomplished at making comparative judgements than they are at making absolute judgements. According to Kelly et al. (2022), this claim tends to be justified with reference to Thurstone’s (1927) work, but they cast doubt on the applicability of a law that was devised to compare the intensities of short physical stimuli in psychophysics to the comparatively longer stimuli associated with educational assessment. Relatedly, Bramley (2007) noted that applications of Thurstone’s law of comparative judgement to the longer stimuli likely to be involved in assessment judgements might lead to order effects in comparisons of pairs of student responses, whereby aspects of one response could interact with the interpretation of the other response. Kelly et al. (2022) also questioned the supposition that assessment judgements predicated on comparative judgement are necessarily valid by virtue of the fact expert judges make the comparative judgements. They cautioned that this line of reasoning creates a novel conceptualisation of validity that is unique to comparative judgement, and which is not aligned with any existing validity framework. Although Kelly et al. (2022) cast doubt on the theoretical foundations of comparative judgement, their critique is considerably weakened by the empirical evidence pertaining to the efficacy of the method that has been produced by some researchers. For example, in the context of mathematical problem solving, Jones and Inglis (2015) demonstrated that comparative judgement led to acceptably high levels of both reliability and criterion validity, where criterion validity was assessed by investigating correlations between scores generated by comparative judgement and other measures of student performance. 
Similarly, Steedle and Ferrara (2016) found that comparative judgement-based assessments of English language arts essays yielded high levels of both reliability and validity (which was again quantified by correlating comparative judgement scores with other measures of students’ performance).

It could be argued that the tracking tests would be oppressive for students since it would mean they are constantly under pressure to perform optimally. However, this could be addressed by basing students’ scores on a subset of a larger number of tracking test results, thus meaning that students’ overall performance would not be negatively impacted by sub-optimal performances in some of the tests. A further concern about the proposed system may be the burden that creation and administration of the tracking tests would place on awarding bodies. In situations where factors influencing the difficulty of questions used in the tests were known,

question shells could be used to generate large collections of questions of similar difficulty with the assistance of computers. Notwithstanding the tensions alluded to above, an assessment paradigm of the type I have proposed would offer a number of advantages. Most importantly, the use of comparative judgement to mark test items that assess higher order cognitive skills, such as essay writing, may help to ensure reliability of assessment judgements. Moreover, the proposed paradigm may help to reduce the probability of some students experiencing epistemic harms as a consequence of being awarded inaccurate results in a high stakes test taken at a particular point in time. However, the proposed paradigm may also address some of the other tensions associated with contemporary high stakes testing regimes. In particular, the transition away from high stakes tests, taken solely at the end of a course of study, to a more distributed mode of continuous assessment, with a focus on formative use of assessment outcomes to improve future learning, may help to alleviate the well-documented examination stress and anxiety that are experienced by many students (Kellaghan & Greaney, 2020). For example, Adams (2015) noted that large numbers of students in England seek counselling for problems precipitated by examination-related stress. Furthermore, the use of distributed continuous assessment would help to mitigate the effects of a crisis such as that precipitated by the Covid-19 pandemic. In the event of tracking tests being cancelled in a particular year, it should still be possible to base students’ scores or grades on the results of tracking tests taken in non-disrupted years. The replacement of contemporary high stakes tests by the proposed assessment paradigm may, in my view, lead to more robust judgements being made about students’ academic capabilities. 
This, in turn, may lead to fairer and more reliable decisions being made by both students/parents and educational institutions/employers about students’ future educational or vocational options at transition points in their careers. Clearly, any move to such an assessment paradigm would need to take place over an extended period of time rather than instantaneously. I therefore suggest that an incremental move towards the new model would be appropriate, commencing with trialling the use of comparative judgement in assessing higher order cognitive skills, before exploring the other suggestions I have proffered. I envisage the proposed assessment paradigm being applicable to most high stakes tests that determine students’ future educational and

vocational options since it would not unduly penalise candidates for having a “bad day” when any individual test is taken. However, it is important to note that there are practical limits to the critique of educational assessment I propounded in Chap. 2, and the assessment paradigm I have proposed in the current chapter, when it is applied to professional education. For example, if a nursing student is expected to demonstrate the capability to accurately prepare a 1/500 dilution of a stock solution of a drug to graduate as a nurse, the assessment of this capability cannot allow for the interpretive latitude implicit in the assessment paradigm proposed above. It would be highly inappropriate to base summative assessment judgements on, for example, the student’s best six attempts at a test assessing the said capability since lives may depend on the correct interpretation of the prescription in every instance. Numerous other real-world examples of this type from medicine, engineering, jurisprudence, and other fields are conceivable, where my critique of assessment would be overridden by other moral considerations. Nevertheless, most high stakes tests do not have life or death consequences. For example, in many disciplines, suboptimal performance in a single test of knowledge/skills is highly unlikely to have negative moral consequences, and it should not therefore prevent a student from pursuing a more advanced course of study in the discipline. However, as I pointed out in Chap. 2, although the results of academic achievement tests, irrespective of their format, may correlate with future academic achievement and other outcomes, such as earning potential, there are other factors that impact on future outcomes. In particular, softer skills such as intrapersonal and interpersonal competence have a bearing on longer-term outcomes such as employment prospects (Heckman & Kautz, 2012; Parts et  al., 2013; Stal & Paliwoda-Pękosz, 2019). 
Since academic achievement tests tend to attach less priority to assessing these softer skills, and to focus on the assessment of academic competence (Heckman & Kautz, 2012), caution needs to be exercised in relation to relying solely on the results of high stakes tests of academic achievement to predict future outcomes. In the next section, I consider the implications for the future of academic selection of my proposed solution for addressing the philosophical limitations of educational assessment, and the evidence pertaining to the consequences of academic selection that I articulated in Chaps. 5 and 6.

The Future of Academic Selection

Even if a more robust form of assessment than a one-off high stakes test towards the end of primary education could be implemented, as per the assessment model I have proposed above, academic selection based on the reformed assessment paradigm would still be problematic. Whilst the potential epistemic harms, in the form of epistemic disadvantage or testimonial injustice, associated with the use of one-off high stakes selection tests would be ameliorated, the hermeneutical injustice associated with academic selection would be likely to persist. For example, it is well established that students from higher socioeconomic groups are more likely to be tutored for current grammar school selection tests, and consequently to gain grammar school places, even if they have similar capability levels to less privileged students (Cribb et al., 2014; Ireson & Rushforth, 2011; Jerrim & Sims, 2019). It is highly probable that the selection bias in favour of students from more privileged socioeconomic backgrounds in the grammar school admission stakes would persist under the new assessment paradigm. The continued use of private tutoring by some of the more affluent parents/guardians in an attempt to ensure success in the reformed selection procedure and the grammar school admission stakes is one mechanism through which this would probably occur. Therefore, some students from lower socioeconomic backgrounds could potentially still be victims of hermeneutical marginalisation and injustice under the reformed assessment paradigm. Consequently, even if a more robust grammar school selection process were introduced, the issue of fairness in relation to grammar school admission would persist, thus leading to the continued perpetration of epistemic injustice against some students.
In the light of my analysis, I contend that education systems which condone academic selection should abandon selection in its current form and re-envision their approach to the transfer of students from primary to post-primary education. This is particularly pertinent in the case of Northern Ireland and those areas of England that retain academic selection, since selection was first introduced in these jurisdictions by legislation that was significantly informed by the discredited work of the psychologist Cyril Burt. As I highlighted in Chap. 4, it is highly likely that Burt’s controversial claims pertaining to innate, immutable intelligence that can be measured accurately and reliably by a high stakes test taken at 10 or 11  years of age, were based on fraudulent research practices. Although the evidence contravening Burt’s discredited views has led to

7  CONCLUSION: IMPLICATIONS FOR POLICY AND PRACTICE 

141

the abandonment of academic selection in most of Great Britain, it is problematic that it has not led to a similar outcome in Northern Ireland and some parts of England, where academic selection continues to flourish. Furthermore, it is inappropriate to view achievement measures in any form of selection procedure as unproblematic predictors of future outcomes, particularly in life beyond formal education. Most of the extant research on academic selection has investigated its impact on later academic achievement at school, or on progression to higher education. Rather less attention has been given to how it influences individuals’ longer term personal autonomy, health, happiness, and social responsibility, which are arguably more important outcomes of education than academic achievement. Indeed, measures of academic achievement tend to attach much greater priority to linguistic and logico-mathematical attainment than other aspects of human competence, such as intrapersonal and interpersonal skills, which also have a bearing on future outcomes (Heckman & Kautz, 2012; Parts et al., 2013; Stal & Paliwoda-Pękosz, 2019). Below, I critically evaluate several options for the re-envisioning of selective education systems to embrace more inclusive and socially just approaches to educating children and young people. Firstly, the most obvious solution is to replace the bipartite system of grammar and secondary schools by all-ability comprehensive schools, which would educate all students within the same institution, irrespective of their academic capabilities. Assuming the schools were large enough to provide access to a full or sufficient curriculum, a major advantage of this would be the greater flexibility it would offer to students in terms of their future educational options since particular pathways would not be denied to them because of their early academic performance. 
However, it is highly probable that current curriculum and assessment arrangements in the United Kingdom would necessitate some form of differentiation by academic capability within comprehensive post-primary schools, which could potentially undermine the rationale for transitioning to a comprehensive system. If, for example, the school implemented a rigid system of streaming, this would lead to the segregation of students by academic capability within the school, thus potentially diminishing the benefits associated with comprehensive education and replicating some of the undesirable features of academic selection, albeit under a different guise. In the longer term, the detrimental effects of streaming could be avoided by harnessing the potential of information and communications technology to deliver personalised curricula and learning experiences

142 

I. CANTLEY

(including associated assessment arrangements) for young people. Recent advances in artificial intelligence may pave the way to this solution rather than the alternative option of laying down specific pathways within which young people must fit. However, in the interim, it would be important to mitigate the harmful effects of streaming by teaching students in mixed ability groups where possible, and to use a fluid system of streaming where mixed ability groups are not feasible, so that students could move between different streams based on their performance. Nevertheless, it is important to note that, to maximise the use of mixed ability teaching, school curriculum and assessment arrangements would need to be sufficiently flexible to allow teachers to make widespread use of differentiation by outcome rather than differentiation by task. Differentiation by outcome refers to situations where all students attempt a common task that permits a variety of student responses, and a range of outcomes emerge for different students depending upon their capability level. Differentiation by task, on the other hand, entails setting different tasks for students with different capability levels. Perhaps a move to the use of comparative judgement for the assessment of open-ended tasks, which I advocated as a potential means of improving the reliability of assessment judgements earlier in the current chapter, may help to facilitate greater weight being given to differentiation by outcome in school curricula and assessment models. A more robust method for assessing tasks that facilitate differentiation by outcome may mean that they are more commonly integrated into formal assessments, and therefore given priority in curricula, and ultimately in classroom teaching. A further potential drawback of a comprehensive model of post-primary education is the possibility of it effectively replacing academic selection by social selection.
In a system that permits parental choice, a mechanism is required to select students for those schools that are oversubscribed. If the filtering mechanism prioritises the proximity of the home and school locations, it is highly probable that more affluent families would seek to purchase houses within the catchment areas of top-performing schools (Cullinane et al., 2017). This would lead to the concentration of students from similar socioeconomic backgrounds within post-primary schools, thus undermining the rationale for transitioning from academic selection to a comprehensive model of post-primary education. To address this issue, the proximity of the home and school locations would need to be de-prioritised in school admissions criteria. Similarly, if the oversubscription criteria give priority to students who have a sibling at the school, or a


relative who attended the school in the past, this may act to preserve the social composition of the school. Indeed, when the unregulated Northern Ireland academic selection tests were cancelled in 2020 because of the Covid-19 pandemic, the demographic composition of the 2021 intake to Northern Ireland grammar schools was consistent with the composition in years that had not been impacted by the pandemic. This is unsurprising since grammar schools appeared to devise their admissions criteria for the affected cohort to preserve their social compositions (Purdy et al., 2023). A possible solution to such issues would be to enforce random allocation of students to oversubscribed schools rather than permitting schools to devise their own oversubscription criteria (Boyle, 2010). However, random allocation has some obvious undesirable implications, such as the likelihood of it leading to increased journey times for students who are not allocated places in their nearest school and the scattering of siblings and friendship groups over a potentially wide geographic area. Whilst the increased distances between home and school would be less problematic in urban areas with relatively large numbers of schools, it could be extremely challenging in jurisdictions such as Northern Ireland with large rural communities. An alternative option would be the replacement of parental choice with some form of random allocation of students to schools within particular geographic areas. The United Kingdom/Northern Ireland governments subscribe to the principle of parental choice since they consider that creating a competitive market for school places serves to improve the standard of educational provision within schools. Yet there is scant evidence to indicate that parental choice does in fact raise educational standards (Boyle, 2010). 
Indeed, it has been suggested that parents/guardians of students from lower socioeconomic groups are less likely than those from middle class backgrounds to make optimal use of the options afforded by the parental choice agenda to make appropriate choices for the longer-term benefit of the children (Tough & Brooks, 2007). Furthermore, Stone (2008) posited that random allocation of students to schools offers a uniquely just way to allocate places since it ensures impartiality by factoring out all types of rationality from the allocation process, including both virtuous and non-virtuous forms of reasoning. Non-virtuous reasons would include those that lead to allocation decisions predicated on socioeconomic status, such as parents’ ability to buy houses in school catchment areas where proximity of home and school locations features prominently in admissions criteria. However, as noted previously, random


allocation would be challenging in jurisdictions with large rural populations. A further option would be to assign post-primary schools to wide catchment areas, ensuring that each area includes several post-primary schools of diverse types, and allow parents to choose any school within the wide catchment area. The actual size of the catchment area would differ in urban or rural contexts, and random allocation within the catchment area would only be used for those schools that are oversubscribed. Such an approach would have the potential additional benefit of encouraging parents to exert pressure on politicians and policymakers to ensure that all schools in a catchment area are viewed as good schools. Indeed, in the Northern Ireland context, such a goal is meant to be the animating principle of the jurisdiction’s education policy (DENI, 2009). However, although random allocation of students to schools may be fair and just from a technical perspective, it may not be viewed in this light by the public at large because of the lack of an apparent rationale for school allocations. Any form of random allocation of students to schools is therefore likely to be fraught with difficulties. To address the seemingly intractable problems associated with promoting greater social diversity within schools, and equality of opportunity for all students irrespective of their socioeconomic background, greater promise may be offered by a system for allocating students to all-ability comprehensive schools based on “academic banding”. Parental choice would be on the agenda insofar as parents/guardians would nominate several preferred schools. Academic banding would necessitate an assessment of all students’ academic capabilities, which could be undertaken using the assessment paradigm that I outlined above during the latter stages of primary education. 
A summative score awarded to each student towards the end of their primary education would be used in the academic banding process to ensure that each all-ability comprehensive school admits students with a wide range of prior academic attainment. Given the positive correlation between academic attainment and social class, it is likely that, if a school were compelled to admit students with a range of prior attainment, this would also lead to a socially diverse intake. Under this regime, the applicant pool for a given post-primary school would be divided into several equal bands based on the score obtained in the primary school summative assessment, and an equal number of students from each band would ultimately be admitted to the school. For example, if four bands


were used, the top band would consist of those students with scores in the top quartile, the second band would consist of students scoring in the second quartile, and so on. After students had been allocated to bands, no further use would be made of the score obtained in the primary school assessment. The post-primary school would then admit one quarter of its admission number from each band. In cases of over-subscription within bands, other criteria could be applied within the band, such as the proximity of home and school, the presence of siblings in the school, etc. Although I have pointed out the potential of such criteria to lead to social segregation between schools, the academic banding process would probably serve to limit the extent of this compared to what would occur in the absence of banding. If academic banding were considered a viable option, it would need to be administered centrally rather than by individual schools, with a common number of bands and band over-subscription criteria being used for all post-primary schools. Otherwise, there would be the possibility of individual schools attempting to game the system to enhance the profile of their intake, which again could potentially lead to increased levels of social segregation between schools. As I indicated previously, maximal use should be made of mixed ability teaching within the banded post-primary schools to obviate the possibility of social segregation occurring within rather than between schools. However, as I have acknowledged, this would require the post-primary curriculum and assessment model to be sufficiently flexible to support widespread use of mixed ability teaching groups. Although the academic banding system is not flawless, and I suggest that no system would be, I think it represents a better option than contemporary approaches to academic selection. 
It would offer genuine prospects for limiting social differentiation between schools, promoting greater equality of opportunity for all students irrespective of socioeconomic background, and enhancing social cohesion.
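
The banding procedure described above can be sketched in a few lines of code. What follows is only an illustrative sketch under stated assumptions, not a specification of how a central admissions authority would implement it. The names `Applicant`, `assign_bands` and `admit` are hypothetical, as are the score scale and the use of home-to-school distance as the within-band over-subscription criterion. Bands are formed by rank so that they are of equal size, which is one possible reading of the quartile example; how tied scores at a band boundary should be handled is left open here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Applicant:
    name: str
    score: float        # summative primary-school score (hypothetical scale)
    distance_km: float  # assumed within-band over-subscription criterion


def assign_bands(applicants, n_bands=4):
    """Divide the applicant pool into n_bands equal-sized bands by score rank.

    Band 0 is the top band. With four bands, band 0 holds the top quartile
    of the applicant pool, band 1 the second quartile, and so on.
    """
    ranked = sorted(applicants, key=lambda a: a.score, reverse=True)
    bands = [[] for _ in range(n_bands)]
    for rank, applicant in enumerate(ranked):
        bands[rank * n_bands // len(ranked)].append(applicant)
    return bands


def admit(applicants, admission_number, n_bands=4):
    """Admit an equal share of the admission number from each band.

    Once applicants are in bands, the score is not used again. If a band is
    over-subscribed, places go to those living closest to the school (an
    assumed criterion; sibling priority or a lottery could be substituted).
    """
    places_per_band = admission_number // n_bands
    admitted = []
    for band in assign_bands(applicants, n_bands):
        by_proximity = sorted(band, key=lambda a: a.distance_km)
        admitted.extend(by_proximity[:places_per_band])
    return admitted
```

For instance, with eight applicants, four bands and an admission number of four, each band contains two applicants and contributes exactly one place, so the highest scorers cannot fill the school on their own. Keeping the number of bands and the within-band criterion as parameters of one shared procedure reflects the point that banding would need to be administered centrally, with common rules for all schools.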

Concluding Remarks

In this book, I have used philosophical analysis to argue that capability is a relational rather than an innate characteristic of an individual. I have argued that such a conceptualisation of capability has important implications for the validity and reliability of contemporary high stakes test results that purport to quantify academic capabilities. I contend that it is


incoherent to abstract a measurement of a capability away from a one-off high stakes test, which is taken at a particular point in time. This means that the results of such a test may not accurately represent a student’s capability level, nor yield consistent results if re-administered under similar conditions. These difficulties are significantly exacerbated when a test incorporates items that assess higher level cognitive skills such as essay writing or open-ended problem solving. Furthermore, I have highlighted that contemporary high stakes tests tend to prioritise the assessment of linguistic and logico-mathematical attainment over other aspects of human competence, such as intrapersonal and interpersonal skills, which also have a bearing on future outcomes. In addition, I have made the point that ethical issues may emerge if the results from contemporary high stakes tests are used to make decisions about students’ future educational or vocational options. Students may be disadvantaged or treated unjustly from an epistemic perspective because of their socioeconomic background or if the test score or grade does not accurately reflect their capability level. Therefore, I suggest that the practice of making decisions about students based on the results obtained in contemporary high stakes tests potentially incurs logical and moral problems that a conscientious educator cannot ignore. The gravity of the moral transgression depends on the purpose and significance of the test and, in the case of high stakes tests used for academic selection purposes, I have argued that, not only can the moral wrong be highly significant, but better solutions are within reach. To address the tensions associated with contemporary approaches to high stakes testing, I have proposed one possible assessment paradigm based on a system of distributed continuous assessment, and I have advocated use of the method of comparative judgement to assess higher level cognitive skills. 
I have also proffered suggestions for the re-envisioning of academically selective education systems to eschew academic selection in favour of an academic banding approach to admitting students to all-ability comprehensive schools at post-primary level. I suggest that, in jurisdictions such as Northern Ireland, which condone academic selection, such a move would help to ameliorate the potentially harmful consequences of making important decisions about children at the tender age of 10 or 11 years based on a one-off high stakes test of their academic achievement. I acknowledge that the proposed solutions to the issues associated with both high stakes tests and academic selection may not be perfect. However,


I hope my contribution promotes discussion and debate about the issues I have raised. These matters are too important for educationalists to bury their heads in the sand, and to persist with flawed policies that could have deleterious consequences for the life chances of many children and young people.

References

Adams, R. (2015, May 14). Surge in young people seeking help for exam stress. The Guardian. https://www.theguardian.com/education/2015/may/14/calls-to-childline-over-exam-stress-break-records
Boyle, C. (2010). Lotteries for education: Origins, experiences, lessons. Imprint Academic.
Bramley, T. (2007). Paired comparison methods. In P. Newton, J. Baird, H. Goldstein, H. Patrick, & P. Tymms (Eds.), Techniques for monitoring the comparability of examination standards (pp. 246–294). Qualifications and Curriculum Authority.
Cribb, J., Jesson, D., Sibieta, L., Skipp, A., & Vignoles, A. (2014). Poor grammar: Entry into grammar schools for disadvantaged pupils in England. The Sutton Trust. https://www.suttontrust.com/wp-content/uploads/2013/11/PoorGrammar2013.pdf
Cullinane, C., Hillary, J., Andrade, J., & Stephen, M. (2017). Selective comprehensives 2017: Admissions to high-attaining non-selective schools for disadvantaged pupils. The Sutton Trust. https://www.suttontrust.com/wp-content/uploads/2019/12/Selective-Comprehensives-2017.pdf
DENI. (2009). Every school a good school: A policy for school improvement. DENI. https://www.education-ni.gov.uk/sites/default/files/publications/de/ESAGS%20Policy%20for%20School%20Improvement%20-%20Final%20Version%2005-05-2009.pdf
Heckman, J. J., & Kautz, T. (2012). Hard evidence on soft skills. Labour Economics, 19(4), 451–464. https://doi.org/10.1016/j.labeco.2012.05.014
Ireson, J., & Rushforth, K. (2011). Private tutoring at transition points in the English education system: Its nature, extent and purpose. Research Papers in Education, 26(1), 1–19. https://doi.org/10.1080/02671520903191170
Jerrim, J., & Sims, S. (2019). Why do so few low- and middle-income children attend a grammar school? New evidence from the millennium cohort study. British Educational Research Journal, 45(3), 425–457. https://doi.org/10.1002/berj.3502
Jones, I., & Inglis, M. (2015). The problem of assessing problem solving: Can comparative judgement help? Educational Studies in Mathematics, 89(3), 337–355. https://doi.org/10.1007/s10649-015-9607-1


Kellaghan, T., & Greaney, V. (2020). Public examinations examined. World Bank. https://openknowledge.worldbank.org/bitstream/handle/10986/32352/9781464814181.pdf?sequence=2&isAllowed=y
Kelly, K. T., Richardson, M., & Isaacs, T. (2022). Critiquing the rationales for using comparative judgement: A call for clarity. Assessment in Education: Principles, Policy & Practice, 29(6), 674–688. https://doi.org/10.1080/0969594x.2022.2147901
Kimbell, R. (2012). Evolving project e-scape for national assessment. International Journal of Technology and Design Education, 22(2), 135–155. https://doi.org/10.1007/s10798-011-9190-4
Parts, V., Teichmann, M., & Rüütmann, T. (2013). Would engineers need non-technical skills or non-technical competences or both? International Journal of Engineering Pedagogy, 3(2), 14–19. https://doi.org/10.3991/ijep.v3i2.2405
Pollitt, A. (2012). The method of adaptive comparative judgement. Assessment in Education: Principles, Policy & Practice, 19(3), 281–300. https://doi.org/10.1080/0969594x.2012.665354
Purdy, N., Walsh, G., Orr, K., Millar, A., & Ballentine, M. (2023). Testing times: Northern Ireland school transfer without tests in 2021. Centre for Research in Educational Underachievement. https://www.stran.ac.uk/wp-content/uploads/2023/03/TestingTimes-Report-March-2023.pdf
Stal, J., & Paliwoda-Pękosz, G. (2019). Fostering development of soft skills in ICT curricula: A case of a transition economy. Information Technology for Development, 25(2), 250–274. https://doi.org/10.1080/02681102.2018.1454879
Steedle, J. T., & Ferrara, S. (2016). Evaluating comparative judgment as an approach to essay scoring. Applied Measurement in Education, 29(3), 211–223. https://doi.org/10.1080/08957347.2016.1171769
Stone, P. (2008). What can lotteries do for education? Theory and Research in Education, 6(3), 267–282. https://doi.org/10.1177/1477878508095583
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34(4), 273–286. https://doi.org/10.1037/h0070288
Tough, S., & Brooks, R. (2007). School admissions: Fair choice for parents and pupils. Institute for Public Policy Research. https://www.ippr.org/files/images/media/files/publication/2011/05/schooladmissions_1582.pdf
Whitehouse, C., & Pollitt, A. (2012). Using adaptive comparative judgement to obtain a highly reliable rank order in summative assessment. Centre for Education Research and Practice. https://filestore.aqa.org.uk/content/research/CERP_RP_CW_20062012_2.pdf?download
Wiliam, D. (2003). National curriculum assessment: How to make it better. Research Papers in Education, 18(2), 129–136. https://doi.org/10.1080/0267152032000081896

Index

A Academic banding (proposed system), 144–145 Academic selection, higher education and vocational training, 3, 112, 116, 129–130 Academic selection, post-­ primary schools alternative paradigm (proposed), 140–145 Austria, 68 Chile, 81 China, 81–82 Craigavon area, Northern Ireland, 63, 102 effectiveness (see Effectiveness of selective education systems) 11+ test, 59–61, 63, 127 England (see under England) and epistemic harm, 125–129 gender quotas, 64 Germany, 68 history of, 3, 54, 59–60, 63–66 impact on primary teaching, 5–6, 12, 38, 66, 98

impact on socio-emotional outcomes, 5, 96–97, 138 and liberties, 113–114 Mexico City, 81 mounting criticisms of, 60, 61 Northern Ireland (see under Northern Ireland) private tutoring/coaching, 87, 117, 125, 140 qualifying examination (Northern Ireland), 63 Singapore, 68 and social division/cohesion, 100 and social mobility, 5, 79, 84–85, 88–93, 117 and social stratification, 6, 60, 84–86, 93–95, 100, 125–126 South Korea, 87 teacher assessments for (as problematic), 128–129 teacher continuous assessment (proposal), 134–139 test validity and flawed assumptions, 5, 36, 60, 66–67

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 I. Cantley, The Philosophical Limitations of Educational Assessment, https://doi.org/10.1007/978-3-031-47021-9

149

150 

INDEX

Academic selection, post-­primary schools (cont.) transfer tests (Northern Ireland) (see Transfer tests (Northern Ireland)) Wales and Scotland, 67 See also Grammar schools Academic self-concept, 97 Accountability of schools, 19, 20 Adams, R., 138 Adequacy, as standard for educational justice, 114 Advisory Council for Education in Northern Ireland, 63 Aggregated test scores, 28, 135 Anderson, E., 114 Anderson, K., 81–82 Andrews, J., 85 AQE (Association for Quality Education) transfer tests, 66, 110 Araya, P., 81 Austria, post-primary academic selection, 68 B ‘Bad days,’ 39–40 Behaviourism, 26, 27 Bias, teacher, 18, 48, 128–129 Binet, Alfred, 4 Bloom, B. S., 23, 41 Bol, T., 94 Bramley, T., 137 Brighouse, H., 112–115, 129 Brown, M., 95, 98 Burges Report (1973), 63 Burgess, S., 90–91 Burns Report, 64–65 Burt, Sir Cyril, 4–5, 58–62, 140–141 Buscha, F., 91–92 Byrne, G., 96, 97

C Chile, academic selection, 81 China, academic selection, 81–82 Cizek, G. J., 14 Clark, D., 80 Classroom environment, and educational outcomes, 95 Coaching/private tutoring, 87, 117, 125, 140 Coady, D., 119–122 Coe, R., 36, 77–79, 86 Cognitive skills, Bloom’s taxonomy, 23, 41 Cognitive states, and observable behaviours, 16, 19, 26–27 Cognitivism, 27 Comparative judgement, 136–138, 142 Comprehensive schools catchments and social selectivity, 86, 90–91, 142–144 effectiveness vs. selective education systems, 80, 113 England, Scotland and Wales, 67, 86 introduction of (1960s-1970s), 61 Northern Ireland, 63 proposed new paradigm, 140–145 rejection by Spens report (1938), 60 Continuous assessment paradigm (proposed), 134–139 Continuous assessment, see Teacher assessments Costello Report, 65 Covid-19 pandemic, assessment during, 46–48, 143 Cowan, P., 5, 127 Craigavon area, Northern Ireland, 63, 102 Crawford, C., 80–81 Credibility deficit/excess, 118–119, 123 (see also Epistemic disadvantage; Epistemic injustice) and distributive justice, 120–121


Cribb, J., 87 Criterion-referenced testing, 37 Curren, R., 19 D Daly, P., 75–76 Darwin, Charles, 55–56 Davis, A., 15–19, 41, 117 Dickson, M., 90 Difference principle (of distributive justice), 111, 113, 116 Differentiation by outcome (not task), 142 Disabilities and special educational needs and grammar school attendance, 85–86 reasonable adjustments in examinations, 39 Discriminatory epistemic injustice, 118, 120–122 Distributive justice, 111, 113, 116, 120–121 and credibility, 120–121 distributive epistemic injustice, 119–122 (see also Epistemic injustice) and online learning, 130 See also Ethics in academic selection Domain-specific attributes, measurement and testing, 27–30 Dussaillant, F., 81 E Education Act (England and Wales) (1944), 3, 4, 59 Education Act (Northern Ireland) (1947), 3–4, 54, 59 Educational Assessment on Trial (Davis et al.), 19–20


Educational justice, 110–117 Education Policy Institute (EPI), 42 Education Reform (England and Wales) Act (1988), 64 Education Reform (Northern Ireland) Order (1989), 64, 77 Effectiveness of selective education systems vs. comprehensive systems, 80 England, 5, 76–78, 80–81 exam performance, 5, 74–80 higher education participation and outcomes, 80–81, 88–89, 91 international evidence and meta-­ analysis, 81–82, 94 longer-term outcomes, 81, 141 Northern Ireland, 75–77, 80 research design shortfalls and considerations, 75, 77, 79, 82–84, 89–90 value-added measurement, 78–81 11+ test, 59–61, 63, 127 See also Academic selection, post-primary schools Elwood, J., 110–111 England exam performance in selective/ non-selective schools, 5, 76–78, 80–81 grammar schools social composition, 85, 87, 89, 90 post-primary academic selection, 67–68 (see also Academic selection, post-primary schools) private tutoring, 87 (see also Private tutoring/coaching) English testing, see Literacy knowledge and testing Epistemic disadvantage, 111–112, 123–125, 127 Epistemic injustice, 111–112, 118–129, 140


Estrada, R., 81 Ethics in academic selection educational justice, 110–117 epistemic disadvantage, 111–112, 123–125, 127 epistemic injustice, 111–112, 118–129, 140 introduction to, 109–110 validity of tests, 13–14, 110–111 Eugenics, 4, 56–58 Exley, S., 87 F Fair equality of opportunity principle (of distributive justice), 111, 113 Ferrara, S., 137 Free school meals (FSM), 75, 78, 85, 86, 89 Fricker, M., 118–124, 128 ‘Fuzziness’ of test scores/grades, 43–45, 127 G Gallagher, T., 80, 96–98 Galton, Sir Francis, 4, 54–59 Gardner, H., 5 Gardner, J., 5, 127 GCSE and GCE ‘A’ Level examinations grade cut scores, 40–41, 44 grade reliabilities by subject (Ofqual analysis), 36, 41–42, 45 performance of selective vs. non-­ selective schools, 5, 74–80 provisions during Covid-19 pandemic, 46–48 reliability, 38 validity, 37–39 Gender quotas in post-primary academic selection, 64

Germany, post-primary academic selection, 68 Gignoux, J., 81 Gillie, O., 62 Gingell, J., 18–19 Goldstein, R. B., 124 Good Friday Agreement (1998), 64 Goodman, N., 25 Gorard, S., 5, 77–78 Grades, see Test scores/grades Grammar schools admission numbers in Northern Ireland, 64, 77 effectiveness (see Effectiveness of selective education systems) fee-paying, 3, 54 gender quotas, 64 number in England, 67–68 social composition, 6, 60, 84–90 and special educational needs, 85–86 teachers and classroom environment, 95 See also Academic selection, post-primary schools H Harris, R., 76–77 Hearnshaw, L. S., 59, 62 Henderson, L., 100 Hereditary Genius (Galton), 56–57 Hermeneutical injustice, 119–124, 140 Higher education, 3, 80–81, 88–89, 91, 112, 116, 129–130 Higher level cognitive skills testing, 17–20, 41, 136 High schools, see Comprehensive schools; Grammar schools; Non-selective post-primary schools (Northern Ireland)


High stakes tests continuous assessment alternative, 134–139 introduction to, 1–2 stress and anxiety from, 138 See also Academic selection, post-primary schools; Testing Horn, D., 94 Hughes, J., 99, 100 Hurn, C., 92 I Intelligence as ‘innate’/inherited (Galton and Burt), 56–59, 61–63 measurement (see Intelligence/ IQ tests) multiple intelligences theory (Gardner), 5 Intelligence/IQ tests, 4, 27–28, 37, 59, 61–62 Ireson, J., 87 J Jerrim, J., 86–88, 96–97 K Kamin, L. J., 61–62 Kelly, K. T., 136–137 Knowledge, education as distribution of, 115–116 Kotzee, B., 115–116, 129 L Lavy, V., 36, 39 Levačić, R., 76, 79 Levelling down, 112–115 Liberties, and academic selection, 113–114


Liberty principle (of distributive justice), 111, 113 Literacy knowledge and testing, 26, 36, 45, 48, 66, 136 Loader, R., 99, 100 Lu, B., 81, 83, 85–86 M Macmillan, L., 90 Mansfield, I., 88–92 Marsh, A. J., 76, 79 Mathematical knowledge and testing, 17–19, 21–25, 36, 42, 45, 48, 66, 136 Matthew effect, 94 McGinn, M., 22 McKeen Cattel, James, 4 Mental capability, and physical traits (Galton’s opinions on), 56 Mental images, limitations, 21–24 Mental states, see Cognitive states, and observable behaviours Mental wellbeing, impact of academic selection, 5, 96–97, 138 Messick, S., 2, 13–14 Mexico City, academic selection, 81 Mixed ability teaching (proposed strategies), 141–142 Multiple intelligences theory (Gardner), 5 Musical knowledge and testing, 30 N Non-positional benefits of education, see Positional/non-positional benefits of education Non-selective post-primary schools (Northern Ireland), 3, 64, 77, 113–114 See also Comprehensive schools


Norm-referenced testing, 37 See also Standardised testing Northern Ireland comprehensive schools, 63 Craigavon area, 63, 102 Education Act (1947), 3–4, 54, 59 exam performance in selective/ non-selective schools, 75–77, 80 grammar school admission numbers, 64, 77 grammar school social composition, 85, 87, 126, 143 history of post-primary academic selection, 63–67 (see also Academic selection, post-­ primary schools) non-selective post-primary schools, 3, 64, 77, 113–114 private tutoring, 87 qualifying examination, 63 religious segregation/integration of schooling, 99–100 transfer tests (see Transfer tests (Northern Ireland)) O Ofqual, analysis of grade reliabilities, 36, 41–42, 45 Online learning, 130, 133–134 On the Origin of Species (Darwin), 55 P Parental choices in academic banding (proposed) system, 144 and socioeconomic status (SES), 85–88, 143

Pedigree analysis (Galton’s work), 56 Pedley, R., 60 Personalised curricula, 141–142 Positional/non-positional benefits of education, 112–113, 115 Post-primary schools academic selection (see Academic selection, post-primary schools) comprehensive schools (see Comprehensive schools) grammar schools (see Grammar schools) non-selective (see Non-selective post-primary schools (Northern Ireland)) PPTC (Post-Primary Transfer Consortium) transfer tests, 66, 110 Predictive validity of high stakes public examinations, 38–39 See also Validity of tests Private tutoring/coaching, 87, 117, 125, 140 Programme for International Student Achievement (PISA), 82 Property holism (Davis), 15–16 Propositional (‘thin’) knowledge testing, 17, 20, 41 Public examinations enduring formats, 46 GCSE and GCE ‘A’ Level (see GCSE and GCE ‘A’ Level examinations) and secondary intermediate schools, 54 See also Testing Q Qualifying examination (Northern Ireland), 63


R Random school place allocation, 143–144 Rawls, J., 111, 116 Ready, D. D., 48 Reasonable adjustments and special considerations, 39, 135 Reeves, D. J., 48 Relational nature of test scores/grades, 27–28, 40, 126 Reliability of tests and comparative judgement, 137 definition and introduction to, 2–3, 14 GCSE and GCE ‘A’ Level examinations, 38 GCSE and GCE ‘A’ Level grade reliabilities by subject (Ofqual analysis), 36, 41–42, 45 relationship with validity, 2, 14, 17, 38 and subjectivity, 25–26, 29, 42–43 teacher assessments, 128–129 See also Validity of tests Religious segregation/integration of schooling (Northern Ireland), 99–100 Richardson, K., 27–28 ‘Rich’ knowledge (higher level cognition) testing, 17–20, 41, 136 Rose, S., 76–77 Rule-following, 17, 20–25, 27, 40, 126 Rushford, K., 87 S Satz, D., 113, 114 School accountability, 19, 20 School Standards and Framework Act (1998), 67


Scotland, academic selection, 67 SEAG (Schools’ Entrance Assessment Group), transfer tests, 66–67, 127 Secondary schools, see Comprehensive schools; Grammar schools; Non-selective post-primary schools (Northern Ireland) Selection, see Academic selection, higher education and vocational training; Academic selection, post-primary schools Sexual harassment, as hermeneutical injustice, 119, 122 Sherwood, D., 43–44 Shuttleworth, I., 75–76 Siddiqui, N., 5, 77–78 Simon, B., 60 Sims, S., 86–88, 96–97 Singapore, post-primary academic selection, 68 Smith, A., 80, 96–98 ‘Snapshot’ nature of high-stakes testing, 39 Social cohesion theory, 100 Social context, influence of (Davis), 16–17 Social groups, and hermeneutical power, 122 Socioeconomic status (SES) and academic achievement inequalities, 93–95 academic selection and social mobility, 5, 79, 84–85, 88–93, 117 academic selection and social stratification, 6, 60, 84–88, 93–95, 100, 125–126 controlling for in research studies, 78 free school meals (FSM), 75, 78, 85, 86, 89 ‘levelling down,’ 112–115


Socioeconomic status (SES) (cont.) and parental choices of schools, 85–88, 143 social composition of grammar schools, 6, 60, 84–90 and teacher assessment bias, 18, 48, 128 Socio-emotional impact of academic selection, 5, 96–97 Soft skills, 30, 139 South Korea, academic selection, 87 Special considerations and reasonable adjustments, 39, 135 Special educational needs, see Disabilities and special educational needs Spens Report (1938), 59–60 Standardised testing and continuous assessment (proposed paradigm), 135 history of, 4–5 IQ tests, 4, 27–28, 37, 59, 61–62 test validity and reliability, 37 See also Testing Status competition theory, 92–93 Steedle, J. T., 137 Stone, P., 143 Streaming (within schools), 82, 94, 141–142 Stress and anxiety due to high-stakes testing, 138 Sullivan, A., 81 Swift, A., 112–115, 129 T Teacher assessments for academic selection (as problematic), 128–129 and consistency, 28–29 continuous assessment (proposed paradigm), 134–139

during Covid-19 pandemic, 46–48 formative, 19–20, 135 and teacher bias, 18, 48, 128–129 and teachers’ personal vs. professional knowledge, 18 Teacher expectations, and academic outcomes, 95 Teacher preferences for grammar schools, 95 Teacher (principal) nominations for grammar school places (Northern Ireland), 63 Teaching to the test, 5–6, 12, 38, 66, 87, 98 Terrin, É., 82, 94 Testimonial injustice, 118–121, 123–125, 127–129 Testing criterion-referenced testing, 37 intelligence tests, 4, 27–28, 37, 59, 61–62 literacy, 26, 36, 45, 48, 66, 136 mathematics, 17–19, 21–25, 36, 42, 45, 48, 66, 136 music, 30 norm-referenced testing, 37 post-primary school selection (see Academic selection, post-primary schools) public examinations (see Public examinations) reliability (see Reliability of tests) standardised testing (see Standardised testing) validity (see Validity of tests) See also High stakes tests Test scores/grades aggregated, 28, 135 comparative judgement use, 136–138, 142 GCSE and GCE ‘A’ Level grade cut scores, 40–41, 44


GCSE and GCE ‘A’ Level grade reliabilities by subject (Ofqual analysis), 36, 41–42, 45 inherent uncertainty and ‘fuzziness,’ 43–45, 127 as relational, 27–28, 40, 126 ‘Thin’ (propositional) knowledge testing, 17, 20, 41 Thurstone, L., 136, 137 Tracking (within-school ability grouping), see Streaming (within schools) Transfer tests (Northern Ireland), 3, 63–65 unregulated, 65–67, 98, 110–111, 127 See also Academic selection, post-primary schools Triventi, M., 82, 94 Twin studies (Burt), 61–62 U Universities, 3, 80–81, 88–89, 91, 112, 116, 129–130 V Validity of tests and comparative judgement, 136, 137 Davis’s critique of, 15, 117


definitions, 2–3, 13, 15 ethical considerations, 13–14, 110–111 GCSE and GCE ‘A’ Level examinations, 37–39 Messick’s conceptualisation, 13–14 post-primary academic selection tests, 5, 36, 60, 66–67 predictive validity, 38–39 relationship with reliability, 2, 14, 17, 38 and ‘rich’ vs. ‘thin’ knowledge, 17–19 teacher assessments, 128–129 and underrepresentation of content/skill domains, 38 See also Reliability of tests Van de Werfhorst, H. G., 94 Vocational training, 3 W Wales, academic selection, 67 Warnock, M., 109–110 White, J., 17–18 Winch, C., 18–20 Wittgenstein, L., 17, 20–22, 24–27, 40 Wright, C., 24 Wright, D. L., 48 Wyatt-Smith, C., 48, 128