Matthew P. Bland Barak Ariel
Targeting Domestic Abuse with Police Data
Matthew P. Bland Institute of Criminology University of Cambridge Cambridge, UK
Barak Ariel Institute of Criminology University of Cambridge Cambridge, UK Institute of Criminology Faculty of Law, Hebrew University Jerusalem, Israel
ISBN 978-3-030-54842-1 ISBN 978-3-030-54843-8 (eBook) https://doi.org/10.1007/978-3-030-54843-8 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Matthew Bland For Paul Bland, who inspired me to write. Barak Ariel For the victims of domestic abuse.
Preface
Many of the most serious crimes in our society occur between intimate partners or family members. As a result, law enforcement agencies and their public, private and charitable partners have made substantial efforts to increase awareness and recording of domestic crimes. The result has been a rapid growth in recorded cases, accompanied by a thorough catalogue of responses, particularly in police forces. Yet the extent to which these responses are founded on scientific evidence is less clear. While the ‘evidence-based policing’ movement has also been growing rapidly in policing circles, it has not yet caught up with the expansion of domestic abuse policy. This book advances the evidence base around domestic abuse by examining the richest source of data we have on the subject – police records. Yes, not all domestic crimes are recorded by the police, but it is also true that there is no single source of data as rich in detail and consistency. We have assembled more than a quarter of a million records and used them to test out some of the fundamental assumptions held by domestic abuse policy makers and scholars. The results we present shine further light on a devastating crime. We hope they will provide a useful contribution to the many people involved in the fight to prevent domestic abuse. Cambridge, UK
Barak Ariel Matthew P. Bland
Acknowledgements
I am grateful for the support of Professor Lawrence Sherman and Dr Heather Strang who have enabled us to have the time and space to develop these ideas. Most importantly, I thank my wife, Lou. Without her love and support not one word would have made it to a page. Matthew P. Bland
Contents
1 Introduction
  1.1 Targeting Domestic Abuse
  References
2 Domestic Abuse in England and Wales
  2.1 How Domestic Abuse Is Defined
  2.2 Characteristics and Terms Used
  2.3 The Main Problems Facing Responders
  2.4 Police Responses to Domestic Abuse
    2.4.1 Initial Response: Mandatory Police Attendance
    2.4.2 Arrest Policy
    2.4.3 Alternatives to Prosecution
    2.4.4 Risk Assessment
    2.4.5 Advocates
    2.4.6 Multi-agency Meetings
    2.4.7 Perpetrator Management
  2.5 A Summary
  References
3 Key Questions That Police Data Might Help Us Answer
  3.1 About Police Data
  3.2 Victim Surveys
  3.3 Police Records
  3.4 What Problems to Measure?
  3.5 Summary
  References
4 The Existing Evidence
  4.1 Introduction
  4.2 Evidence on Repeat Domestic Abuse
  4.3 Evidence on Escalation
  4.4 Evidence on the Concentration of Harm
  4.5 Evidence on Serial Domestic Abuse
    4.5.1 Typologies of Domestic Batterers
    4.5.2 Serial Perpetrators
    4.5.3 Prevalence of Serial Perpetrators of Domestic Abuse
  4.6 Evidence on Forecasting Domestic Abuse
    4.6.1 Actuarial Instruments in Criminal Justice Forecasts
    4.6.2 Machine Learning Techniques
    4.6.3 Previous Use of Random Forests for Criminal Justice Forecasting
    4.6.4 Criticisms and Problems
  4.7 A Summary of the Evidence
  References
5 Measuring Harm
  5.1 What Is Harm and How Is It Measured?
    5.1.1 Review of Harm Measurement Tools
    5.1.2 Assessing Which Tool to Use
  5.2 Summary
  References
6 Repeat Domestic Abuse, Escalation and Concentration of Harm
  6.1 Important Questions on Repeat Abuse
    6.1.1 Important Questions on Escalation
    6.1.2 Important Questions on the Concentration of Harm
    6.1.3 What Data Did We Have and How Did We Analyse It?
    6.1.4 Findings
    6.1.5 Theoretical Implications
    6.1.6 Research Implications
    6.1.7 Policy Implications
  References
7 Serial Domestic Abuse
  7.1 Important Questions on Serial Abuse
  7.2 What Data Did We Have and How Did We Analyse It?
    7.2.1 Procedure: Serial Abuse
  7.3 Findings
    7.3.1 Prevalence
    7.3.2 Types of Abuse
    7.3.3 Harm
    7.3.4 Other Crimes
    7.3.5 Subclassifications of Cohorts
  7.4 Theoretical Implications
  7.5 Research Implications
  7.6 Policy Implications
  References
8 Forecasting Risk
  8.1 Important Questions on Forecasting Risk
  8.2 What Data Did We Have and How Did We Analyse It?
  8.3 Procedure: Forecasting
    8.3.1 How Random Forest Algorithms Work
    8.3.2 Model Parameters
  8.4 Findings
    8.4.1 What Proportion of All Arrestees Go on to Commit Domestic Abuse?
    8.4.2 What Proportion of Domestic Abuse Arrestees Have Prior Domestic Records?
    8.4.3 Can Antecedent Inputs Predict Future Domestic Abuse Cases to a High Degree of Accuracy?
  8.5 Which Predictors Have the Greatest Impact on Accuracy?
    8.5.1 Age First Arrested for Domestic Abuse
    8.5.2 Years Since Last Arrest for Domestic Abuse
    8.5.3 Presenting Offence Was Domestic Abuse
    8.5.4 Number of Prior Domestic Arrests
    8.5.5 Age at First Arrest for a Sexual Offence
  8.6 Theoretical Implications
  8.7 Research Implications
    8.7.1 Using Targeting Research Alongside Testing
    8.7.2 Integrating Harm Measurement Tools
  8.8 Policy Implications
    8.8.1 Political Implications
    8.8.2 Practical Implications: The Need for More Information
    8.8.3 A Working Model for Prediction
    8.8.4 Building Police Forecasting Capabilities
  References
9 Conclusions: Integrating Research into Practice
  9.1 What We Set Out to Analyse
  9.2 More Data Are Needed in This Fight
  9.3 How to Improve Our Analysis
    9.3.1 Police Records Are Not the Whole Story
  9.4 Final Conclusions
  A. Appendices
    Appendix I: Technical Information Relating to Random Forest Modelling
    Appendix II: Cambridge Crime Harm Index – Selected Values
  References
Index
About the Authors
Matthew P. Bland, PhD, is a Lecturer in Evidence Based Policing at the University of Cambridge’s Institute of Criminology. He previously worked as a crime analyst for more than 15 years. His work primarily focuses on advancing the evidence base in policing by developing police analytical capabilities and conducting research to test the efficacy of existing and emerging responses. His 2015 article with Barak Ariel, ‘Targeting escalation in reported domestic abuse: Evidence from 36,000 callouts’, was one of the first published articles to quantitatively evaluate common theories of escalating severity in domestic crimes.

Barak Ariel, PhD, is a Professor at the Hebrew University of Jerusalem and a Lecturer in Experimental Criminology at the Institute of Criminology, Cambridge University. He has been involved in dozens of research projects with police agencies around the globe, with a specific focus on crime and technology. Professor Ariel is the recipient of the Academy of Experimental Criminology Young Experimental Scholar Award and the European Society of Criminology Young Criminologist Award, and is a Fellow of the Division of Experimental Criminology. He is also a Jerry Lee Scholar at the Institute of Criminology at Cambridge University.
List of Figures
Fig. 3.1 Prevalence of domestic abuse in the last year for adults aged 16–59 years, by gender. (Reproduced from https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/bulletins/domesticabuseinenglandandwales/yearendingmarch2018)
Fig. 6.1 Number of unique victims by number of crimes recorded
Fig. 6.2 Percentage of unique victims by number of crimes recorded
Fig. 6.3 Conditional probability of victims being attributed to another crime
Fig. 6.4 Number of unique offenders by number of crimes recorded
Fig. 6.5 Percentage of unique offenders by number of crimes recorded
Fig. 6.6 Conditional probability of offender being attributed to another crime
Fig. 6.7 Mean CCHI score over first 10 incidents for victims with 5+ crimes
Fig. 6.8 Mean CCHI score over first 10 incidents for offenders with 5+ crimes
Fig. 7.1 Offender cohort frequency
Fig. 7.2 Total crime harm by cohort
Fig. 7.3 Mean CCHI per offender per cohort
Fig. 7.4 Power curve graph for cumulative proportion of crime harm by cumulative proportion of offenders
Fig. 7.5 Average non–domestic abuse CCHI by crime type and cohort
Fig. 8.1 Example of a basic decision tree
Fig. 8.2 Example of a decision tree with two decision points
Fig. 8.3 Variable importance plot for forecasting model accuracy
Fig. 8.4 Variable importance plot for forecasting model node purity
Fig. 8.5 Potential model for the operation of a domestic abuse forecasting instrument
Fig. 9.1 Mean forecasting error for different numbers of splitting variables
Fig. 9.2 Mean forecasting error for random forest model trees 1–501
Fig. 9.3 Partial response plots for age at which first arrested for a domestic crime
Fig. 9.4 Partial response plots for years since last arrested for a domestic crime. Note that the scale goes to −10 on these plots due to one erroneous record which had a misclassified date of crime and was missed in the cleaning process
Fig. 9.5 Partial response plots for presenting arrest was for a domestic crime
Fig. 9.6 Partial response plots for number of previous domestic arrests
Fig. 9.7 Partial response plots for age at first arrest for a sexual offence
List of Tables
Table 3.1 Comparison between CSEW and police–recorded domestic abuse, 2015–2018
Table 3.2 Stated aims of selected national domestic abuse strategies
Table 5.1 Sellin and Wolfgang’s (1964) severity typology
Table 5.2 Viability assessment of harm measurement tools
Table 5.3 Criteria and scales for assessing harm measurement tools
Table 5.4 Suitability assessment: Cambridge Crime Harm Index
Table 5.5 Suitability assessment: victim seriousness judgements
Table 5.6 Suitability assessment: Home Office Economic and Social Cost tool
Table 5.7 Suitability assessment: ONS Crime Severity Score
Table 5.8 Final viability assessment of harm measurement tools
Table 6.1 Research questions: repeat abuse
Table 6.2 Research questions: escalation
Table 6.3 Research questions: concentration of harm
Table 6.4 Comparison of key domestic abuse statistics in Dataset 1
Table 6.5 Comparisons of prevalence: Dataset 1
Table 6.6 Sample sizes for each category of chronological crime analysed for escalation: victims
Table 6.7 Tukey’s HSD results for CCHI means attributed to victims with a minimum of five domestic abuse events
Table 6.8 Tukey’s HSD results for CCHI means attributed to offenders with a minimum of 5 domestic abuse events
Table 6.9 Number of domestic abuse crimes in dataset attributable to highest-harm offenders and victims
Table 6.10 Demographic comparisons between ‘power few’ and non-‘power few’ victims and offenders
Table 7.1 Research questions: serial abuse
Table 7.2 Breakdown of domestic abuse outcomes
Table 7.3 Serial domestic abuse dataset statistical comparisons
Table 7.4 Selected demographic characteristics of perpetrator cohorts
Table 7.5 Breakdown of makeup of domestic abuse crime types by cohort
Table 7.6 Power few contributions of different offender cohorts
Table 7.7 Prevalence of non–domestic abuse offending among cohorts
Table 7.8 Mean CCHI of non–domestic abuse offending among cohorts
Table 7.9 Mean domestic CCHI by cohort/offending type
Table 8.1 Baseline levels for domestic abuse outcomes
Table 8.2 Profile of cases in training dataset
Table 8.3 Proportion of domestic abuse arrestees with prior arrest records (for any type of crime)
Table 8.4 Summary table for forecasting model accuracy
Table 8.5 Model performance
Table 9.1 Random forest tuning parameters
Chapter 1
Introduction
1.1 Targeting Domestic Abuse

This research explores what police-kept domestic abuse records can tell us that may assist in refining the harm reduction strategies employed by the police. Domestic abuse has emerged as a priority in policing, particularly in the last decade. There is extensive evidence that this form of crime is a matter of grave concern to public health and safety in the twenty-first century – it is widespread, expensive and a major drain on policing resources. Official statistics estimate almost two million adult victims per year, a prevalence of 6% of all adults (ONS 2017), and the most recent comprehensive assessment of financial cost (Walby 2009), albeit over a decade old, placed the cost in the billions of pounds to service providers, employers and victims. With economic inflation and rising recorded crime levels since Walby’s estimate, domestic abuse almost certainly costs the public purse even more today. Furthermore, domestic circumstances are a major factor in the most serious crimes, featuring in a third of murders in England and Wales and in more than a tenth of all crimes recorded by the police (ONS 2018). With such an array of driving factors, it should be no surprise that policing domestic abuse is a major part of law enforcement activity.

The purpose of our work here is to contribute to the evidence base for tackling domestic abuse. This evidence base has expanded greatly in recent years, as research has delved deeper into the impacts of some subcategories of domestic abuse. Substantially more is now known about the impacts of forced marriage (Watts and Zimmerman 2002), revenge pornography (Henry and Powell 2014; Bond and Tyrrell 2018) and financial abuse (Sharp-Jeffs 2015, 2017) than at any point in the past. This body of research has developed amid a burgeoning evidence-based policing (EBP) movement in England and Wales.
Led by partnerships between the police professional body, the College of Policing (CoP), the National Police Chiefs’ Council (NPCC) and academic institutions, EBP forms a central tenet of modern policing strategies. Its aim is to improve practice through the accumulation of robust
empirical evidence (Lum and Koper 2017; Neyroud and Weisburd 2014), although the nature of that empirical evidence has been the subject of much debate (see Cockbain and Knutsson 2014; Sparrow 2011; Weisburd and Neyroud 2013). Professor Lawrence W. Sherman, who first established the term ‘evidence-based policing’ (Sherman 1998), has proposed a framework for viewing EBP activities through the lenses of targeting, testing and tracking (Sherman 2013) – what Professor Sherman calls ‘the triple T’. Sherman has appraised the development of domestic abuse evidence through this framework (Sherman 2018) and it is specifically in this context that this research is positioned, building on recent findings in targeting evidence, including our own (Barnham et al. 2017; Bland and Ariel 2015; Bridger et al. 2017; Chalkley and Strang 2017; Kerr et al. 2017; Sherman and Strang 1996; Thornton 2017).

In this book we go further by realising the promise of new analytic techniques and sources applied in criminology, such as:

• The availability of ‘big data’ in the manner described by Sherman (2018) and demonstrated by previous work in other fields of criminology (Berk et al. 2009).
• The development of new harm measurement instruments such as the Crime Severity Score (ONS 2016), the Cambridge Crime Harm Index (Sherman et al. 2016) and the updated Home Office Cost of Crime Estimates (Heeks et al. 2018).
• The potential application of new machine-learning algorithms to big datasets, as has been demonstrated in several recent publications (Berk 2012; Berk et al. 2016).

Although the evidence base for domestic abuse is already comparatively rich, at least in the context of the general depth of rigorous evidence on policing activities, there has long been an ongoing demand from practising agencies to acquire information and evidence that can further shape strategy (see for example Shepherd 1998; Sherman 1992, 2018).
The impact of ‘austerity’ in the United Kingdom was to reduce the capacity of all government sectors, including those with primary responsibility for dealing with domestic abuse (Neyroud 2015). Yet at the same time, scrutiny from the national police oversight body, Her Majesty’s Inspectorate of Constabulary, Fire and Rescue Services (HMICFRS), was aimed specifically at domestic abuse and was highly critical of the police response (HMICFRS 2014a). As a consequence of this, as well as a separate critical review of crime recording practices (HMICFRS 2014b), police forces attached greater priority to identifying and responding to domestic reports. Recorded crime numbers rose steeply as a result, at a time when other sources showed a decline in the prevalence of self-reported domestic abuse in the adult population (ONS 2017). With around 20% fewer police officers than in 2010 (ONS 2019), responding to the additional demand has been a challenge for police forces. Arrests and charges have declined (Ariel and Bland 2019; ONS 2017, 2018), and the inspectorate continues to highlight deficiencies in the police response in areas such as identification of risk (HMICFRS 2017, 2019). This context provides an opportunity for well-developed targeting research to help shape domestic abuse strategies, and it is precisely this opportunity which this research aims to address.
There remain questions about the extent to which the evidence influences what the police (or other responding agencies) actually do in practice. As we will explore further, much of the current response to domestic abuse is not fundamentally driven by robust evidence, and the collection of data concerning victims and offenders remains, to this point, an under-utilised resource in the development of domestic abuse strategies. It is this area that is our target. The overarching aim of this book is to add to the existing evidence base in ways that could usefully contribute to front-line strategies and underpinning theories, by exploiting the potential of an existing resource abundant in every police force – crime data.

To this end, we gathered hundreds of thousands of anonymised domestic abuse records from police forces around the country. These records relate to crimes, arrests, offenders and victims, and they resemble the typical sorts of information to which every police agency has ready access. The records were assembled into three large datasets and analysed using a variety of statistical procedures, ranging from the very simple (rates and proportions) to the very complex (a machine learning algorithm). Each procedure was selected with a specific research aim in mind. These aims were chosen because they represent issues of high relevance to practitioners and researchers, and because there are gaps or uncertainties in what these groups know about these issues.

Throughout these issues the topic of ‘harm’ is a persistent feature. As we will explore, much of the current response to domestic abuse is geared towards the identification of ‘high-risk’ cases and subsequent action to negate that risk. It is logical to argue that ‘risk reduction’ is actually an outcome that the police service and its partners are seeking, but this leads to the inevitable next question: the risk of what?
The answer seems perfectly logical – serious harm to the victim – but this in turn raises a difficult question for a crime researcher. How does one measure harm? Harm is a subjective concept, especially among practitioners. What constitutes serious harm for one person does not necessarily do so for another. In this spirit, a number of harm measurement tools have been developed by researchers but, at the time of writing, there are no national guidelines to steer this debate in any particular direction. This is the first challenge that this research seeks to overcome: the selection of an appropriate instrument for the measurement and tracking of harm to facilitate the further exploration of police data.

However, before we assess harm measurement instruments, we must paint a picture of the world that this book is set in. Chapter 2 describes the situation with domestic abuse in England and Wales, including how it is defined, what the main features are and how the police respond. In Chap. 3 we describe how domestic abuse records are kept and what problems the records enable us to measure. In Chap. 4 we give an exposition of the existing evidence base underpinning our areas of focus, before moving on in Chap. 5 to explore our different options for measuring harm in a meaningful way. Chapters 6, 7, and 8 cycle through those areas of focus in turn. In each of these chapters we set out the specific issues we have addressed and explain how we analysed them. We show the results of our analyses and then explore what they mean for theory, future research and action by practitioners. In Chap. 9, we conclude by reflecting on how our findings might be integrated in the real world.
References
Ariel, B., & Bland, M. (2019). Is crime rising or falling? A comparison of police recorded crime and victimisation surveys. Methods of criminology and criminal justice research. Sociology of Crime, Law, and Deviance, 24, 7–31.
Barnham, L., Barnes, G. C., & Sherman, L. W. (2017). Targeting escalation of intimate partner violence: Evidence from 52,000 offenders. Cambridge Journal of Evidence-Based Policing, 1, 1–27.
Berk, R. (2012). Criminal justice forecasts of risk: A machine learning approach. New York: Springer.
Berk, R., Sherman, L., Barnes, G., Kurtz, E., & Ahlman, L. (2009). Forecasting murder within a population of probationers and parolees: A high stakes application of statistical learning. Journal of the Royal Statistical Society: Series A (Statistics in Society), 172(1), 191–211.
Berk, R. A., Sorenson, S. B., & Barnes, G. (2016). Forecasting domestic violence: A machine learning approach to help inform arraignment decisions. Journal of Empirical Legal Studies, 13(1), 94–115.
Bland, M., & Ariel, B. (2015). Targeting escalation in reported domestic abuse: Evidence from 36,000 callouts. International Criminal Justice Review, 25(1), 30–53. https://doi.org/10.1177/1057567715574382.
Bond, E., & Tyrrell, K. (2018). Understanding revenge pornography: A national survey of police officers and staff in England and Wales. Journal of Interpersonal Violence, 2018, 0886260518760011.
Bridger, E., Strang, H., Parkinson, J., & Sherman, L. W. (2017). Intimate partner homicide in England and Wales 2011–2013: Pathways to prediction from multi-agency domestic homicide reviews. Cambridge Journal of Evidence-Based Policing, 1(2–3), 93–104.
Chalkley, R., & Strang, H. (2017). Predicting domestic homicides and serious violence in Dorset: A replication of Thornton’s Thames Valley analysis. Cambridge Journal of Evidence-Based Policing, 1(2–3), 81–92.
Cockbain, E., & Knutsson, J. (Eds.). (2014). Applied police research: Challenges and opportunities. New York: Routledge.
Heeks, M., Reed, S., Tafsiri, M., & Prince, S. (2018). The economic and social costs of crime (2nd ed.). London: Home Office.
Henry, N., & Powell, A. (Eds.). (2014). Preventing sexual violence: Interdisciplinary approaches to overcoming a rape culture. Basingstoke: Springer.
Her Majesty’s Inspectorate of Constabulary, Fire and Rescue Services. (2014a). Everyone’s business: Improving the police response to domestic violence. [Online] Retrieved from https://www.justiceinspectorates.gov.uk/hmicfrs/wp-content/uploads/2014/04/improving-the-police-response-to-domestic-abuse.pdf. Accessed 15th October 2016.
Her Majesty’s Inspectorate of Constabulary, Fire and Rescue Services. (2014b). Crime recording: Making the victim count. [Online] Retrieved from https://www.justiceinspectorates.gov.uk/hmicfrs/wp-content/uploads/crime-recording-making-the-victim-count.pdf. Accessed 15th October 2016.
Her Majesty’s Inspectorate of Constabulary, Fire and Rescue Services. (2017). A progress report on the police response to domestic abuse. [Online] Retrieved from https://www.justiceinspectorates.gov.uk/hmicfrs/wp-content/uploads/progress-report-on-the-police-response-to-domestic-abuse.pdf. Accessed 9th February 2019.
Her Majesty’s Inspectorate of Constabulary, Fire and Rescue Services. (2019). A progress report on the police response to domestic abuse. [Online] Retrieved from https://www.justiceinspectorates.gov.uk/hmicfrs/publications/a-progress-report-on-the-police-response-to-domestic-abuse/. Accessed 4th March 2019.
Kerr, J., Whyte, C., & Strang, H. (2017). Targeting escalation and harm in intimate partner violence: Evidence from Northern Territory Police, Australia. Cambridge Journal of Evidence-Based Policing, 1, 1–17.
Lum, C. M., & Koper, C. S. (2017). Evidence-based policing: Translating research into practice. Oxford: Oxford University Press.
Neyroud, P. (2015). Future perspectives in policing: A crisis or a perfect storm: The trouble with public policing? In Police services (pp. 161–165). Cham: Springer.
Neyroud, P. W., & Weisburd, D. (2014). Transforming the police through science: The challenge of ownership. Policing: A Journal of Policy and Practice, 8(4), 287–293.
Office for National Statistics (ONS). (2016). Research outputs: Developing a Crime Severity Score for England and Wales using data on crimes recorded by the police. [Online] Retrieved from https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/articles/researchoutputsdevelopingacrimeseverityscoreforenglandandwalesusingdataoncrimesrecordedbythepolice/2016-11-29. Accessed 6th Mar 2019.
Office for National Statistics (ONS). (2017). Domestic abuse in England and Wales: Year ending March 2017. Statistical Bulletin. London: Office for National Statistics. [Online] Retrieved from https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/bulletins/domesticabuseinenglandandwales/yearendingmarch2017. Accessed 17th Mar 2018.
Office for National Statistics (ONS). (2018). Domestic abuse in England and Wales: Year ending March 2018. Statistical Bulletin. London: Office for National Statistics. [Online] Retrieved from https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/bulletins/domesticabuseinenglandandwales/yearendingmarch2018. Accessed 2nd Mar 2019.
Office for National Statistics (ONS). (2019). Police workforce, England and Wales: 30 September 2018. Statistical Bulletin. London: Office for National Statistics. [Online] Retrieved from https://www.gov.uk/government/statistics/police-workforce-england-and-wales-30-september-2018. Accessed 29th May 2019.
Sharp-Jeffs, N. (2015). A review of research and policy on financial abuse within intimate partner relationships. London: London Metropolitan University.
Sharp-Jeffs, N. (2017). Money matters: Research into the extent and nature of financial abuse within intimate relationships in the UK. London: Co-Operative Bank.
Shepherd, J. P. (1998). Tackling violence: Interagency procedures and injury surveillance are urgently needed. British Medical Journal, 316, 879–880.
Sherman, L. W. (1992). Policing domestic violence: Experiments and dilemmas. New York: Free Press.
Sherman, L. W. (1998). Evidence-based policing. Washington, DC: Police Foundation.
Sherman, L. W. (2013). The rise of evidence-based policing: Targeting, testing, and tracking. Crime and Justice, 42(1), 377–451.
Sherman, L. W. (2018). Evidence-based policing: Social organization of information for social control. In Crime and social organization (pp. 235–266). London: Routledge.
Sherman, L. W., & Strang, H. (1996). Policing domestic violence: The problem-solving paradigm. Paper presented at the Stockholm conference on “Problem-solving as crime prevention,” Swedish National Council on Crime Prevention.
Sherman, L., Neyroud, P. W., & Neyroud, E. (2016). The Cambridge crime harm index: Measuring total harm from crime based on sentencing guidelines. Policing: A Journal of Policy and Practice, 10(3), 171–183.
Sparrow, M. K. (2011). Governing science. Cambridge, MA: Harvard Kennedy School Program in Criminal Justice Policy and Management.
Thornton, S. (2017). Police attempts to predict domestic murder and serious assaults: Is early warning possible yet? Cambridge Journal of Evidence-Based Policing, 1, 1–17.
Walby, S. (2009). The cost of domestic violence: Up-date 2009. Lancaster: Lancaster University.
Watts, C., & Zimmerman, C. (2002). Violence against women: Global scope and magnitude. The Lancet, 359(9313), 1232–1237.
Weisburd, D., & Neyroud, P. (2013). Police science: Toward a new paradigm. Australasian Policing, 5(2), 13.
Chapter 2
Domestic Abuse in England and Wales
2.1 How Domestic Abuse Is Defined
To retain consistency with common practice in England and Wales, in this book we use the standard UK cross-government definition of domestic violence and abuse, reprinted here for clarity:
Any incident of controlling, coercive, threatening behaviour, violence or abuse between those aged 16 or over who are, or have been intimate partners or family members regardless of gender or sexuality. The abuse can encompass, but is not limited to psychological, physical, sexual, financial or emotional. (Home Office 2012)
The definition includes elements that were added in 2012 to encompass coercive and controlling behaviour and 16–17-year-old victims, following a public consultation. This replaced the previous separate definitions published by the Home Office and the Association of Chief Police Officers (now known as the National Police Chiefs’ Council). The definition change was accompanied by the introduction of a new criminal offence in relation to coercive and controlling behaviour, a concept first established by the criminologist Evan Stark (2007). Coercive and controlling behaviour is currently the only domestic abuse-specific offence in English and Welsh law; in all other cases, domestic abuse is effectively a circumstance attached to another legally defined criminal offence, for example, assault, rape or burglary. As such, accurate recording of abuse is a complex issue and separate from the usual form of official crime counting. In practice, the police is the service with primary statutory responsibility for determining and recording a criminal act, but it is by no means the only service that has contact with domestic abuse victims. Schools, hospitals, charities, doctors’ surgeries, housing providers, social workers and more all come into contact with domestic abuse cases and are required to apply the cross-government definition. Health practitioners, in particular, have a critical role in identifying and referring
victims to specialist services. But it is only the police service that is administratively required by the Office for National Statistics (ONS) to record the occurrence of domestic abuse in official records. Despite this, the cross-government definition offers the most practicable definition for research, and as we stated at the outset, police records are the only data source used in this research. Other government services such as the National Health Service provide employees with guidance on how to identify domestic violence and abuse which is consistent with the national definition, but there is no cross-agency or nationwide requirement to document domestic abuse. In these settings, guidance commonly recommends caution and the seeking of victim consent to record (see domesticviolencelondon.nhs.uk). The only other ‘regulated’ sources of domestic abuse records are the Crime Survey of England and Wales (CSEW), which comprises interviews and self-completion questionnaires, and criminal justice agency statistics, which are connected to police records (having been passed into the criminal justice system by the police service). The domestic abuse charity SafeLives maintains a national database of high-risk cases which includes some that are not in police datasets, but in all examples these datasets fall within the scope of the national cross-government definition. While this definition is most relevant to agencies in England and Wales, it is not without relevance to researchers and practitioners in other countries. There may be no internationally accepted definition of domestic abuse, but intimate partner violence is a well-known and researched subject. The principal difference between the UK cross-government definition used here and the typical description of intimate partner violence is the former’s inclusion of family members over the age of 16.
However, this form of abuse does not make up the majority of domestic abuse as recorded in England and Wales (ONS 2018), meaning intimate partner violence practitioners and researchers may still derive some broad meaning from the definition we have used in this work.
2.2 Characteristics and Terms Used
First and foremost, domestic abuse is a form of crime with several names. It is often referred to as domestic violence or intimate partner violence. It is less frequently known as wife-battering or spousal abuse, which were popular terms in the last century. None of these terms is invalid, but none represents the full spectrum of relationship circumstances in which a domestic crime can occur. Domestic abuse, as we consider it in this research, concerns personal crimes between intimate partners or family members above the age of sexual consent. These crimes are not necessarily violent. Hypothetically, any form of crime could be domestic-related, although the majority recorded by police are usually classified as violence with or without injury (ONS 2017, 2018).
Each domestic crime recorded in England and Wales has consistent components which we will refer to throughout this research. For clarity, the most commonly used are explained as follows:
Victim: the individual against whom the crime is committed. In this research, the term ‘survivor’ is also covered by use of this word. While we prefer the use of the latter term, which we believe better represents those subjected to abuse, we recognise that the former is a more widely understood term and we use it solely for this reason.
Offender: the individual who is accused of or proved to have committed the crime. In this book we make no distinction between judicial statuses, so all suspects are described as ‘offenders’, and the term ‘perpetrator’ is often used interchangeably but always with the same definition.
Dyad: the unique combination of a victim and particular offender. This research does not use dyads as an analytical unit, but reference is made to the term in the analysis of literature and discussion of implications.
Crime Classification: each case is assigned a Home Office classification based on its nature, for example burglary, assault or grievous bodily harm. Technically, any form of crime can be domestic abuse, but domestic abuse is not a crime classification in its own right.
Repeat Abuse: although some agencies specify a minimum level of occurrences, or a minimum or maximum window of time, in this research any multiple instance of domestic abuse between the same dyad or by the same victim or offender is referred to as ‘repeat abuse’.
Serial Abuse: when an offender or victim is attributed to multiple domestic abuse crimes but within multiple dyads, this research refers to these cases as ‘serial abuse’ rather than repeat abuse. There has been considerable interest in the last decade in understanding and targeting offenders of serial abuse in particular, which we will examine further in Chap. 7.
Domestic Non-Crimes: also known as domestic incidents, these are calls for service that the police attend and determine to be relating to a domestic dispute, but which do not meet any threshold or criteria for a criminal act. As such no victim or offender status is assigned (though the details of the individuals involved are logged).
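As an illustration of how the ‘repeat’ and ‘serial’ definitions above might be applied to police records, the following Python sketch labels each offender from a list of crime records. It is a minimal sketch under stated assumptions and not the procedure used in this research: the record layout (one victim identifier and one offender identifier per recorded crime) and the function name classify_offenders are hypothetical, and the ‘single’ label is introduced here only to cover offenders with one recorded crime.

```python
from collections import defaultdict


def classify_offenders(records):
    """Label each offender as 'single', 'repeat' or 'serial'.

    records: iterable of (victim_id, offender_id) pairs, one pair per
    recorded domestic crime. This layout is assumed for illustration.
    """
    victims_by_offender = defaultdict(set)
    crimes_by_offender = defaultdict(int)
    for victim_id, offender_id in records:
        victims_by_offender[offender_id].add(victim_id)
        crimes_by_offender[offender_id] += 1

    labels = {}
    for offender_id, n_crimes in crimes_by_offender.items():
        if len(victims_by_offender[offender_id]) > 1:
            # More than one victim means more than one dyad: serial abuse.
            labels[offender_id] = "serial"
        elif n_crimes > 1:
            # Several crimes within a single dyad: repeat abuse.
            labels[offender_id] = "repeat"
        else:
            labels[offender_id] = "single"
    return labels


# Hypothetical records, for illustration only.
example = [("v1", "o1"), ("v1", "o1"), ("v2", "o2"), ("v3", "o2"), ("v4", "o3")]
print(classify_offenders(example))
```

The same logic could be applied from the victim’s side by swapping the two identifiers in each pair.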
2.3 The Main Problems Facing Responders
There can be little argument that domestic abuse is a ‘just cause’ for a criminologist’s attention. Even before one considers the plethora of official statistics on prevalence, harm and cost, there is the moral heart of the matter: that people ought to be at liberty to have personal lives free from crime or abuse and it is our duty to investigate situations where this is not the case. As Stark (2007) explains, along with practitioners, researchers share the burden of formulating effective responses, and
as we have already outlined, this is precisely the spirit in which we planned this book. However, before we can establish a meaningful research plan, we must understand the nature of the problem(s) with domestic abuse to identify the areas of most pressing interest. As a first step, in this chapter we attempt to outline the nature of the domestic abuse problem in England and Wales as a frame of reference for specific areas of research. We must emphasise at the outset that defining strategic domestic abuse problems is itself a problem. It is universally acknowledged that domestic abuse is underreported (see Brimicombe 2018), which is to say that more (much more) takes place than is recorded in official records (see ONS 2018). The main official records on domestic abuse in England and Wales are those kept by the police, in the form of both recorded crimes and domestic ‘non-crime’ incidents. The numbers of official records increased substantially in response to the police inspectorate’s domestic abuse reports in 2014 and 2016 and parallel pressure on crime recording standards. Commentators also speculate that the rising media profile and service prioritisation of domestic abuse has led to more victims reporting to police (ONS 2017), but no robust evidence yet supports this hypothesis. However, although the rise in recorded non-crimes and crimes amounted to 88% between 2016 and 2018 (HMICFRS 2019), the extent to which police records reflect prevalence is still thought to be low. The charity Women’s Aid conducted a census survey of more than 12,000 users of community-based domestic abuse services and 2000 women’s refuge users in the UK and found that just 28% and 43%, respectively, were reporting to the police (Women’s Aid 2017).
These results may be limited (the survey period utilised a ‘day to count’ and ‘week to count’ structure and had an overall response rate of just over 50%), but even if extrapolation is disregarded, the emerging trends were notable in their own right. Elsewhere, the specialist UK domestic abuse charity SafeLives also eschews the use of official statistics (SafeLives 2018) and instead uses its own survey-based dataset of more than 35,000 records, claimed to be the ‘largest database of domestic abuse cases in the UK’. Despite this, both Women’s Aid and SafeLives quote the Crime Survey of England and Wales as the main source of statistics when presenting information on prevalence. Herein lies the most fundamental problem: what actually are the problems? With such disagreement about an ingredient as essential to defining problems as source data, it is perhaps unsurprising that there is ambiguity, disagreement and confusion around domestic abuse strategies. The roots of this most fundamental issue seem to be intrinsic. Domestic abuse is a form of interpersonal crime that takes place in an intimate environment, frequently a home, unseen by witnesses. The suffering and perpetrating parties have a relationship which predates and will possibly post-date the offence. Victims often live in fear of reprisals for themselves, their careers, their privacy, their children, other loved ones, or their pets. Offending is often subtle to the point of being almost intangible, as Stark (2007) eloquently demonstrated in his descriptions of coercive control. If all of these obstacles were not enough, there is still the matter of the lack of a single recording environment for a victim’s case, but this is, to a large extent, a matter of necessity. Victims should not be compelled to
report to the police, nor have their personal data shared outside of the organisation they have chosen to approach. It is not merely a question of whether a single data source could be created for domestic abuse; there is a very important question of whether one ought to be created in the first place. The problems then are perhaps best described as ‘basic’ in nature. For police, rising demand is a major problem which gives rise to a primary concern: how to provide a high-quality response to each case. Given the shrinking capacity of the police in recent years (ONS 2019), this is a dilemma without an immediately feasible solution. Instead then, the issue becomes one of how best to manage that demand. Here, there are three key aspects:
1. Triage: how can the police identify which cases most need their finite resources?
2. Differential response: what responses work most effectively with whom?
3. Prevention: how can the police (or other agencies) prevent domestic abuse from occurring in the first place, and thereby reduce demand and free capacity for response and further prevention?
Our research is concerned with two of these three aspects. We exclude any substantial consideration of differential responses, which require individual evaluations. Instead, our research questions are aimed at primarily contributing to evidence relevant to the first and third aspects of the police problem: how might police forces triage cases, and how might they prevent them? We will examine these issues in more detail in later chapters. Before doing so, it is important to consider how the police service typically responds to domestic abuse at present.
2.4 Police Responses to Domestic Abuse
To further contextualise these problems and issues, this section sets out a brief description of the current landscape for domestic abuse responses in England and Wales at a strategic level. Each description briefly reviews the evidence that does (or does not) underpin each activity. Our intent here is not to provide a systematic critique of current strategy and its supporting evidence so much as to further frame the relevance of the research questions we address in this book.
2.4.1 Initial Response: Mandatory Police Attendance
In England and Wales, the policing response to domestic abuse is characterised by the phrase ‘positive action’, which encompasses both the initial response to a call to police and the actions taken once a police officer arrives on the scene. ‘Positive action’ is derived from the ‘positive obligation’ component of the Human Rights Act and is translated in the Authorised Professional Practice for policing (College of Policing 2018) as necessitating the attendance of a police officer at the address in
every domestic abuse case. Call takers must grade domestic abuse calls for an immediate response and attending officers must use the police ‘national decision model’ to assess whether immediate action is required once they arrive. Latterly, the practice of wearing and operating a body-worn video camera has been promoted as effective in increasing the proportion of cases resulting in charges (Owens et al. 2014). HMICFRS found that the rate at which police forces comply with this policy is improving but has a long way to go (HMICFRS 2017, 2019). This is representative of the inspectorate’s wider view on the policing response and, given this verdict and recent trends in the volume of domestic abuse calls, it is fair to say that the attendance policy places a substantial demand on resources. So what evidence is this commitment based on? The College of Policing conducted a systematic review of research evidence on the effects of police attendance at domestic abuse events in 2016. The review synthesised evidence from nine studies, which were mostly from the USA and not from the recent past. It concluded that there was very little evidence that police attendance had any impact on repeat offending. The one exception to this conclusion came from Felson et al. (2005), which found no evidence of retaliatory violence in response to a report made to the police.
2.4.2 Arrest Policy
England and Wales police forces do not operate a statutory mandatory arrest policy; it is in fact impossible to statutorily compel a police officer in these countries to arrest a person. The term ‘mandatory arrest’ is associated with the move made by many US states following the Minneapolis Domestic Violence Experiment (Sherman and Berk 1984) but is often confused with the ‘positive action’ policy operated in England and Wales. Authorised Professional Practice (College of Policing 2018) states that officers must justify any decision not to arrest, as part of a suite of actions to make victims safe. These actions include other police powers, such as the issuing of a civil order or a caution, but each of these has particular policy specifications, which we consider later on in this chapter. The fact is that arrest does not take place in the majority of domestic abuse events and has declined proportionally in recent times (ONS 2018). This trend has drawn criticism from the police inspectorate and been further complicated by changes to police bail powers (HMICFRS 2019). The effects of arrest on domestic abuse recidivism are one of the best researched areas in the field (see Vigurs et al. 2016 for a summary), but the results are not uniform. Sherman and Berk’s initial randomised experiment in Minneapolis (1984) found lower levels of recidivism in cases assigned to arrest than in those assigned to ‘advice’ and was the catalyst for widespread state legislative changes to domestic violence arrest policies in the USA as well as a slew of replication studies in the form of the Spouse Assault Replication Program (SARP). However, Sherman et al. (1992) found mixed results in other sites, and Maxwell et al. (2002) found moderate effects on prevention in police records but statistically significant reductions in victim reports. All in all, a mixed bag.
Other studies also contributed to the contradictory evidence. Cho and Wilke (2010) examined the effects of arrest on repeat victimisation among males and found no deterrent effect, and both they and Felson et al. (2005) hypothesised that simple attendance by the police had as much impact on recidivism as arrest. Others have found outright detrimental effects from arrest. Iyengar (2009) compared the rate of domestic murders between US states with and without presumptive or mandatory arrest policies and found that those states with presumptive arrest policies saw a greater rise than those without. Iyengar speculated that this was due to arrests undesired by victims having a suppressing effect on future reporting and a ‘reprisal effect’ from the arrest policy, but neither theory has been tested sufficiently to establish a causal effect. Sherman and Harris (2015) were able to draw stronger causal links between arrest and increased victim mortality, albeit general and not from homicide. Following up on participants of the original Sherman and Berk Minneapolis experiment from 1984, they found that victims whose partners had been subject to arrest instead of advice had a 64% greater chance of having died in the subsequent 25 years. This was particularly evident among employed African Americans and not influenced by homicide (only three of 91 deaths were homicides). Myhill (2018) argues that the absence of definitive evidence of the positive effect of arrest on recidivism does not mean it is an inappropriate measure because it enables more comprehensive recording to shed light on patterns of coercive control. The introduction of this form of criminal offence, Myhill argues, means that dealing with domestic abuse is a distinctly different premise for police officers in England and Wales than in, say, the USA, thus negating calls by others (Sherman 2015) to explore alternatives. 
This remains the prevailing view at the time of writing, with the police inspectorate continuing to highlight falling arrest rates in police forces as a cause for concern (HMICFRS 2019).
2.4.3 Alternatives to Prosecution
2.4.3.1 Civil Orders
Police officers may apply for civil orders in cases where an arrest has not been made (although they are often sought when an arrest has been made). Such orders, known as Domestic Violence Protection Orders (DVPO), can be applied for without victim support, and are authorised or denied by a magistrate. DVPOs are preceded by notices issued by officers while the order is being prepared (DVPNs). These require the authorisation of a Superintendent and challenge the capacity of that rank. The police inspectorate reported in 2017 (HMICFRS 2017) that it was increasingly concerned with the declining use of DVPNs and DVPOs, particularly in light of an evaluation conducted in 2013, when the scheme was being piloted (Kelly et al. 2013), which found that the tactic was ‘associated with reduced levels of re-victimisation’. But examination of that evaluation shows the finding to be, at best, highly contestable. Kelly et al.’s study was a case-matched sample design which,
once filtered for prior domestic crime history, left a sample size of just 123. Among this cohort, a statistically significant reduction of one domestic crime was observed. For cases reporting for the first time, there was no effect. Despite the modesty of these findings, they were the catalyst for DVPN/Os being implemented across the country. There has been only one significant study of this tactic since. Smith (2016) conducted a case-control analysis of DVPN/Os issued in Hertfordshire and found no significant differences between the DVPN/O group and the matched control sample in domestic crime before and after the issue of the order.
2.4.3.2 Cautions
The College of Policing guidance to police forces in England and Wales is emphatic about cautions, stating that they are rarely appropriate in domestic abuse cases, and that for intimate cases a conditional caution (see footnote 2) would likely never be appropriate. Despite this, Westmarland et al. (2017) found that many forces were using out-of-court resolutions such as cautions on a regular basis. A recent Ministry of Justice evaluation of out-of-court criminal justice resolutions was inconclusive about their use in domestic abuse cases because of the absence of a counterfactual but found no significant difference in reoffending among domestic offenders in the pilot areas at a three-month review point (Ames et al. 2018). The prevailing view about cautions being unsuitable was strongly challenged by a recent experiment in Hampshire (Strang et al. 2017), which tested the effects on crime count and harm among a cohort of 154 male domestic abuse offenders who were compelled under conditional caution to attend two day-long workshops. In comparison to the randomly assigned control group, the workshop attendees were re-arrested for domestic abuse 21% less often and with 38% less harm (as measured using the Cambridge Crime Harm Index; see footnote 3). At the time of writing, several forces in the country were planning to embark on their own pilots of conditional caution workshop schemes for low-risk offenders.
2.4.3.3 Restorative Justice
Professional practice advice is equally clear that restorative justice tactics are as inappropriate in domestic abuse cases as cautions, yet here too there is emerging evidence that challenges this position. Ptacek (2017) best summarised this by highlighting the promising evidence on the efficacy in crime reduction of restorative approaches collated by Strang et al. (2017) and setting it against a lack of rigorous evidence in any direction as far as domestic abuse or intimate partner violence is
2 A conditional caution is a classification of investigative outcome in England and Wales which implies that a caution will not be issued providing specified conditions are met.
3 The Cambridge Crime Harm Index is explained in full in Chap. 4.
concerned, with control groups used only in Pennell and Burford (2000) and Mills et al. (2013). The result is that the research field neither knows whether victim-offender conferences can be effective in domestic cases, nor has any evidence to the contrary.
2.4.4 Risk Assessment Risk assessment is a central tenet of the domestic abuse policing strategy in England and Wales. With risk assessment defined as a cyclical process of estimating ‘the likelihood and nature of a risk posed by a perpetrator to a particular victim, children or others’ (College of Policing 2018), the police policy thereon is to require each attending officer to conduct a structured professional judgement exercise against a series of predetermined questions posed to the victim. This process has been in place since 2008, when the Home Office and Association of Chief Police Officers endorsed the risk assessment model known as Domestic Abuse, Stalking and Honour-Based Violence, or ‘the DASH’, championed by the charity Co-ordinated Action Against Domestic Abuse (now SafeLives). The DASH was constructed based on the ‘SPECS+’ model used in London but was not evaluated or tested in any structured way prior to its implementation (Myhill 2018). The College of Policing (2018) asserts that evidence concerning domestic violence predictors is limited but lists 10 predictors based on professional expertise. This list includes (inter alia) previous physical assault by the perpetrator, escalation, animal abuse, child abuse and suicidal tendencies, all of which are reflected in the DASH. Though the DASH emerged from Richards’ (2006) study of domestic homicide cases, it is not entirely clear how its predictor elements took shape, and the tool has been defended as preventative rather than predictive in nature (Richards et al. 2008). Regardless, the DASH has been subject to a range of critical studies in recent years. Thornton (2017) found the DASH to have low predictive validity in relation to domestic homicide and ‘near-miss’ cases, the majority of which the police had no prior contact for, and a very high false positive rate (in which high risk cases did not result in a deadly crime). 
Chalkley and Strang (2017) replicated Thornton’s work and found a false negative rate of 67%. They went on to highlight suicide and self-harm warnings on the part of male perpetrators as having high predictive validity (suicide is specifically mentioned in the DASH). Robinson (2016) also found that the DASH was not being used consistently in all police forces, and in 2017, the College of Policing undertook to review the DASH question set, subsequently amending it to place greater emphasis on coercive and controlling behaviour. Turner et al. (2019) found the tool to be underperforming: it was little better at prediction than chance, and every question was, at best, a weak predictor of future abuse. Even ignoring other studies, this paper alone casts major doubt on how fit for purpose the DASH is as a risk assessment tool and highlights the acute need for improvement.
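The error rates cited in these studies can be made concrete with a short worked sketch. The counts below are hypothetical, chosen only so that the false negative rate equals the 67% reported by Chalkley and Strang (2017); they are not taken from any of the studies cited, and the class and function names are illustrative assumptions. Note that the parenthetical description above (high-risk cases that did not result in a deadly crime) corresponds most closely to the third measure, which is distinct from the conventional false positive rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskOutcomes:
    """Hypothetical cross-tabulation of a risk grade against later serious harm."""
    high_risk_with_harm: int   # true positives
    high_risk_no_harm: int     # false positives
    not_high_with_harm: int    # false negatives
    not_high_no_harm: int      # true negatives


def false_negative_rate(c: RiskOutcomes) -> float:
    """Share of serious-harm cases that were NOT graded high risk beforehand."""
    return c.not_high_with_harm / (c.not_high_with_harm + c.high_risk_with_harm)


def false_positive_rate(c: RiskOutcomes) -> float:
    """Share of no-harm cases that were nevertheless graded high risk."""
    return c.high_risk_no_harm / (c.high_risk_no_harm + c.not_high_no_harm)


def false_alarm_share(c: RiskOutcomes) -> float:
    """Share of high-risk grades not followed by serious harm (1 - precision)."""
    return c.high_risk_no_harm / (c.high_risk_no_harm + c.high_risk_with_harm)


# Made-up counts for illustration only; not data from any cited study.
example = RiskOutcomes(
    high_risk_with_harm=33,
    high_risk_no_harm=4_967,
    not_high_with_harm=67,
    not_high_no_harm=94_933,
)

print(f"False negative rate: {false_negative_rate(example):.0%}")
print(f"False positive rate: {false_positive_rate(example):.1%}")
print(f"False alarm share among high-risk grades: {false_alarm_share(example):.1%}")
```

Because serious outcomes are rare in this made-up population, a tool can combine a modest false positive rate with a very large share of high-risk grades that are never followed by serious harm, which is why the choice of denominator matters when reading such findings.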
2.4.5 Advocates
In England and Wales, Independent Domestic Violence Advisors (IDVAs) operate independently of police forces but are sometimes located alongside domestic violence units. It is normal practice for all cases categorised as high risk by the DASH process to be assigned to an IDVA, whose role it is to support the victim by acting as an advocate and a main point of contact for police and other agencies, and by developing safety plans and options. The use of advocates such as IDVAs has been shown to have a positive impact on victim cooperation (Camacho and Alarid 2008) as well as a moderately positive impact on quality of life, but mixed effects on recidivism in relation to physical and sexual abuse (Rivas et al. 2015).
2.4.6 Multi-agency Meetings
For cases designated as high risk, it is common practice for the victim to be discussed at a meeting of agencies concerned with the issue of domestic abuse (police, probation, children’s services, housing agencies, charities, education agencies, etc.). In England and Wales, these meetings are known as Multi-Agency Risk Assessment Conferences (MARACs). There are more than 270 such meetings, dealing with almost 100,000 cases every 3 months (College of Policing 2018; SafeLives 2018). The intention of MARACs is to reduce the risk to victims and their children by facilitating information exchange between agencies and building plans of action. Two thirds of referrals to MARACs originate from police forces, and just over a quarter of cases are discussed repeatedly (SafeLives 2018). MARACs are a form of ‘co-ordinated community response’ (CCR), which have been explored in domestic abuse research in the UK and the US, although mostly in respect of processes (Klevens et al. 2008). The original evaluation of MARACs found that 40% of cases had no further police call-outs in the 12 months after their MARAC contact (Robinson and Tregidga 2005), but other than this study, the evidence base features little robust proof of the notion that MARACs or CCRs more generally fulfil their purpose. Klevens et al. (2008) compared domestic abuse rates between 10 CCR areas and 10 areas without CCRs and found no significant difference. Two quasi-experimental studies replicating Klevens et al.’s research reached the same conclusion (Post et al. 2010; Visher et al. 2008). Research conducted for the UK Home Office in 2011 concluded that evidence on the impact of MARACs on outcomes was quite weak (Steel et al. 2011), and a number of other studies in the country have echoed that concern (McLaughlin et al. 2014; Berry et al. 2014).
More recent experimental evidence has found that multi-agency perpetrator management approaches may offer some reduction in crime harm over a 2-year follow-up period (see Goosey et al. 2017). However, this single study included an imbalance in treatment intensity, which limited the precision of the findings in respect of identifying which aspect of the multi-agency approach caused the effect.
2.4.7 Perpetrator Management
In 2016, the police inspectorate explicitly called for police agencies to detail what perpetrator programmes they were operating with reference to research published by the College of Policing (HMICFRS 2017). That research, however, found no conclusive evidence about any form of perpetrator scheme, primarily due to a lack of well-designed evaluations (Vigurs et al. 2016). The only guidance available to forces in respect of perpetrator management relates to serial and repeat perpetrators, for whom the College of Policing advises that each force should have a system for active management and monitoring which makes use of existing schemes such as Integrated Offender Management and Multi-Agency Public Protection Arrangements (College of Policing 2018). In response, at the time of writing, many police forces were trialling perpetrator management programmes such as Drive (www.driveproject.org.uk) and multi-agency tasking and co-ordination (Davies and Biddle 2018).
2.5 A Summary
These descriptions of current domestic abuse practices are by no means exhaustive, but they cover the main components of the policing response. One might also write about legislative changes such as the domestic violence disclosure scheme, or specialist justice provisions such as specialist domestic violence courts, but we judge these to be less relevant to establishing context for our research, which focuses predominantly on issues pertinent to triage and prevention. It is plain that domestic abuse policy in England and Wales cannot be described as evidence based. There are a number of influencing factors here. Firstly, there is little strong evidence about domestic abuse activities. The deepest evidence base is in respect of arrest, yet even that is mixed. Other cornerstone elements of the domestic abuse response (risk assessment, advocates, MARACs) have been subject to so few high-quality studies that there is no strong evidence to speak of at all. In the face of such a void, it is perhaps not so surprising that agencies have relied on professional experience, and in some cases low-quality studies, to design their responses. The problem of domestic abuse is pressing, highly prevalent, costly and harmful. In the last two decades, scrutiny of the response to domestic abuse has grown ever sharper, culminating in a damning report of the policing response in 2014 (HMICFRS 2014). This engendered a prevailing imperative to ‘do something’ – yet little robust evidence was available to inform practice. For many domestic abuse response initiatives, once they have been piloted, roll-out seems inevitable; and once rolled out, it is unthinkable that they would ever be stopped, even to allow for control-group-based trials. The domestic abuse response community has been largely reluctant to build ‘denial-of-service’ control groups into any forms of evaluation for fear of harming victims by doing so, and this has led to circumstances in which un-evidenced
responses have been implemented en masse with no plans to rigorously test their impact on outcomes. In the few examples where this prevailing view has been overcome (see Strang et al. 2017), the results have been slow to gain traction. We argue that this status quo is not sustainable in the face of ever-increasing demand on domestic abuse services. The number of domestic abuse cases reported to police forces has increased substantially in each of the last 3 years (ONS 2016, 2017, 2018; HMICFRS 2019), and the number of cases seen by MARAC agencies has followed a similar trend (SafeLives 2018). These trends are unlikely to abate in the face of ongoing scrutiny from the police inspectorate and the ongoing advocacy of large national charities that conduct and publish their own research. It is equally improbable that the agencies responding to domestic abuse, especially the police, will have more resources to deal with the problem in the near future. In addition to the £1.6 billion cut from the policing budget between 2010 and 2017, and a further £700 million planned to be cut by 2021, the police service was given a £420 million bill by the UK Treasury for pensions shortfalls (Dodd 2018), for which only temporary central funds were supplied. This situation mirrors that faced by other government agencies: local authorities have approximately 26% less funding than in 2010 (Hulme 2017), which affects adult social care and children’s services capacity; the amount of education spending per pupil fell by 8% between 2010 and 2018 (Coughlan 2019); and by 2020, the Probation Service budget had been reduced by 40% over the course of a decade. All of these cuts have potential adverse implications for domestic abuse prevention. The continuing effects of austerity do not make the development and refinement of an evidence base for domestic abuse response less relevant; they make it more so.
There is a real and pressing need for agencies to better target their scarcer resources in order to achieve their desired outcomes of protecting victims. While it may be that this requires a change in attitude towards control-group-based evaluations, these need not be the only source of evidence upon which strategies are refined. Targeting evidence may be just as, if not more, useful to agencies in the current context, and this is where police data may have a crucial role to play.
References
Ames, A., Di Antonio, E., Hitchcock, J., Webster, S., Wong, K., Ellingworth, D., Meadows, L., MacAlonan, D., Uhrig, N., & Logue, N. (2018). Adult out of court disposal pilot evaluation – final report. London: Ministry of Justice.
Berry, V., Stanley, N., Radford, L., McCarry, M., & Larkins, C. (2014). Building effective responses: An independent review of violence against women, domestic abuse and sexual violence services in Wales. Welsh Government.
Brimicombe, A. J. (2018). Mining police-recorded offence and incident data to inform a definition of repeat domestic abuse victimization for statistical reporting. Policing: A Journal of Policy and Practice, 12(2), 150–164.
Camacho, C. M., & Alarid, L. F. (2008). The significance of the victim advocate for domestic violence victims in municipal court. Violence and Victims, 23(3), 288–300.
Chalkley, R., & Strang, H. (2017). Predicting domestic homicides and serious violence in Dorset: A replication of Thornton’s Thames Valley analysis. Cambridge Journal of Evidence-Based Policing, 1(2–3), 81–92.
Cho, H., & Wilke, D. J. (2010). Gender differences in the nature of the intimate partner violence and effects of perpetrator arrest on revictimization. Journal of Family Violence, 25(4), 393–400.
College of Policing. (2018). Domestic abuse index. [Online] Available at: https://www.app.college.police.uk/domestic-abuse-index/. Accessed 1st Mar 2019.
Coughlan, S. (2019). School spending on pupils cut by 8%, says IFS. [Online] Available at: https://www.bbc.co.uk/news/education-44794205. Accessed 7th Jan 2019.
Davies, P. A., & Biddle, P. (2018). Implementing a perpetrator-focused partnership approach to tackling domestic abuse: The opportunities and challenges of criminal justice localism. Criminology & Criminal Justice, 18(4), 468–487.
Dodd, V. (2018). England and Wales police funding rise of £970m ‘not enough’. [Online] Available at: https://www.theguardian.com/uk-news/2018/dec/13/england-and-wales-police-fundingrise-of-970m-not-enough. Accessed 14th Jan 2019.
Felson, R., Ackerman, J., & Gallagher, C. (2005). Police intervention and the repeat of domestic assault. Criminology, 43(3), 563–588.
Goosey, J., Sherman, L., & Neyroud, P. (2017). Integrated case management of repeated intimate partner violence: A randomized, controlled trial. Cambridge Journal of Evidence-Based Policing, 1(2–3), 174–189.
Her Majesty’s Inspectorate of Constabulary, Fire and Rescue Services. (2014). Everyone’s business: Improving the police response to domestic violence. [Online] Retrieved from https://www.justiceinspectorates.gov.uk/hmicfrs/wp-content/uploads/2014/04/improving-the-policeresponse-to-domestic-abuse.pdf. Accessed 15th Oct 2016.
Her Majesty’s Inspectorate of Constabulary, Fire and Rescue Services. (2017). A progress report on the police response to domestic abuse. [Online] https://www.justiceinspectorates.gov.uk/hmicfrs/wp-content/uploads/progress-report-on-the-police-response-to-domestic-abuse.pdf. Accessed 9th Feb 2019.
Her Majesty’s Inspectorate of Constabulary, Fire and Rescue Services. (2019). A progress report on the police response to domestic abuse. [Online] https://www.justiceinspectorates.gov.uk/hmicfrs/publications/a-progress-report-on-the-police-response-to-domestic-abuse/. Accessed 4th Mar 2019.
Home Office. (2012). New definition of domestic violence. Retrieved from https://www.gov.uk/government/news/new-definition-of-domestic-violence. Accessed 18th Sept 2019.
Hulme, W. (2017). Local authorities’ budgets are roughly 26% lower since 2010. [Online] https://fullfact.org/economy/local-authorities-budgets/. Accessed 26th Oct 2018.
Iyengar, R. (2009). Does the certainty of arrest reduce domestic violence? Evidence from mandatory and recommended arrest laws. Journal of Public Economics, 93(1–2), 85–98.
Kelly, L., Adler, J. R., Horvath, M. A., Lovett, J., Coulson, M., Kernohan, D., & Gray, M. (2013). Evaluation of the pilot of domestic violence protection orders (Home Office Science Research Paper 76). https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/260897/horr76.pdf
Klevens, J., Baker, C. K., Shelley, G. A., & Ingram, E. M. (2008). Exploring the links between components of coordinated community responses and their impact on contact with intimate partner violence services. Violence Against Women, 14(3), 346–358.
Maxwell, C. D., Garner, J. H., & Fagan, J. A. (2002). The preventive effects of arrest on intimate partner violence: Research, policy and theory. Criminology & Public Policy, 2(1), 51–80.
McLaughlin, H., Banks, C., Bellamy, C., Robbins, R., & Thackray, D. (2014). Domestic violence, adult social care and MARACs: Implications for practice. NHS National Institute for Health Research, Research Findings. [Online] Available from: http://www.sscr.nihr.ac.uk/PDF/Findings/RF44.pdf. Accessed 28th Nov 2017.
Mills, L. G., Barocas, B., & Ariel, B. (2013). The next generation of court-mandated domestic violence treatment: A comparison study of batterer intervention and restorative justice programs. Journal of Experimental Criminology, 9(1), 65–90.
Myhill, A. (2018). The police response to domestic violence: Risk, discretion, and the context of coercive control. Doctoral dissertation, City, University of London.
Office for National Statistics (ONS). (2016). Domestic abuse in England and Wales: Year ending March 2016. Statistical Bulletin. London: Office for National Statistics. [Online] Retrieved from https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/bulletins/domesticabuseinenglandandwales/yearendingmarch2016. Accessed 17th Mar 2018.
Office for National Statistics (ONS). (2017). Domestic abuse in England and Wales: Year ending March 2017. Statistical Bulletin. London: Office for National Statistics. [Online] Retrieved from https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/bulletins/domesticabuseinenglandandwales/yearendingmarch2017. Accessed 17th Mar 2018.
Office for National Statistics (ONS). (2018). Domestic abuse in England and Wales: Year ending March 2018. Statistical Bulletin. London: Office for National Statistics. [Online] Retrieved from https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/bulletins/domesticabuseinenglandandwales/yearendingmarch2018. Accessed 2nd Mar 2019.
Office for National Statistics (ONS). (2019). Police workforce, England and Wales: 30 September 2018. Statistical Bulletin. London: Office for National Statistics. [Online] Retrieved from https://www.gov.uk/government/statistics/police-workforce-england-and-wales-30-september-2018. Accessed 29th May 2019.
Owens, C., Mann, D., & Mckenna, R. (2014). The Essex BWV trial: The impact of BWV on criminal justice outcomes of domestic abuse incidents. London: College of Policing.
Pennell, J., & Burford, G. (2000). Family group decision making: Protecting children and women. Child Welfare, 79(2), 131–158.
Post, L. A., Klevens, J., Maxwell, M. D., Shelley, G. A., & Ingram, E. (2010). An examination of whether coordinated community responses affect intimate partner violence. Journal of Interpersonal Violence, 25(1), 75–93.
Ptacek, J. (2017). Research on restorative justice in cases of intimate partner violence. Preventing intimate partner violence: Interdisciplinary perspectives, p. 159.
Richards, L. (2006, Autumn). Homicide prevention: Findings from the multi-agency domestic violence homicide review. The Journal of Homicide and Major Incident Investigation, 2(2). ACPO: Centrex.
Richards, L., Letchford, S., & Stratton, S. (2008). Policing domestic violence. Oxford: Blackstone’s Practical Policing, Oxford University Press.
Rivas, C., Ramsay, J., Sadowski, L., Davidson, L. L., Dunne, D., Eldridge, S., Hegarty, K., Taft, A., & Feder, G. (2015). Advocacy interventions to reduce or eliminate violence and promote the physical and psychosocial well-being of women who experience intimate partner abuse. Cochrane Database of Systematic Reviews, 12, 1–202.
Robinson, A. L. (2016). What works for reducing domestic abuse: Risk-led policing and the DASH risk assessment tool. [Online] https://www.researchgate.net/profile/Amanda_Robinson5/publication/301821428_What_works_for_reducing_domestic_abuse_Risk-led_policing_and_the_DASH_risk_assessment_tool/links/5729c64208aef5d48d2ef55a/What-worksfor-reducing-domestic-abuse-Risk-led-policing-and-the-DASH-risk-assessment-tool.pdf. Accessed 3rd May 2017.
Robinson, A. L., & Tregidga, J. (2005). Domestic violence MARACs (Multi-agency risk assessment conferences) for very high-risk victims in Cardiff, Wales: Views from the victims. Cardiff University School of Social Sciences.
SafeLives. (2018). About domestic abuse. [Online] http://safelives.org.uk/policy-evidence/aboutdomestic-abuse. Accessed 4th Mar 2019.
Sherman, L. W. (2015). A tipping point for “totally evidenced policing”: Ten ideas for building an evidence-based police agency. International Criminal Justice Review, 25(1), 11–29.
Sherman, L. W., & Berk, R. A. (1984). The specific deterrent effects of arrest for domestic assault. American Sociological Review, 49(2), 261–272.
Sherman, L. W., & Harris, H. M. (2015). Increased death rates of domestic violence victims from arresting vs. warning suspects in the Milwaukee Domestic Violence Experiment (MilDVE). Journal of Experimental Criminology, 11(1), 1–20.
Sherman, L. W., Schmidt, J. D., & Rogan, D. P. (1992). Policing domestic violence: Experiments and dilemmas. New York: Free Press.
Smith, C. (2016). A case control analysis of offenders issued with domestic violence protection orders (DVPOs) in Hertfordshire: A retrospective and prospective study. M.St thesis, Cambridge.
Stark, E. (2007). Coercive control: How men entrap women in everyday life. New York: Oxford University Press.
Steel, N., Blakeborough, L., & Nicholas, S. (2011). Supporting high-risk victims of domestic violence: A review of multi-agency risk assessment conferences (MARACs). London: Home Office.
Strang, H., Sherman, L., Ariel, B., Chilton, S., Braddock, R., Rowlinson, T., Cornelius, N., Jarman, R., & Weinborn, C. (2017). Reducing the harm of intimate partner violence: Randomized controlled trial of the Hampshire constabulary CARA experiment. Cambridge Journal of Evidence-Based Policing, 1(2–3), 160–173.
Thornton, S. (2017). Police attempts to predict domestic murder and serious assaults: Is early warning possible yet? Cambridge Journal of Evidence-Based Policing, 1, 1–17.
Turner, E., Medina, J., & Brown, G. (2019). Dashing hopes? The predictive accuracy of domestic abuse risk assessment by the police. The British Journal of Criminology, 59(2), azy074.
Vigurs, C., Wire, J., Myhill, A., & Gough, D. (2016). Police initial responses to domestic abuse. London: College of Policing. Available at: http://whatworks.college.police.uk/Research/Documents/Police_initial_responses_domestic_abuse.pdf. Accessed 4th Mar 2019.
Visher, C. A., Harrell, A., Newmark, L., & Yahner, J. (2008). Reducing intimate partner violence: An evaluation of a comprehensive justice system-community collaboration. Criminology & Public Policy, 7(4), 495–523.
Westmarland, N., Johnson, K., & McGlynn, C. (2017). Under the radar: The widespread use of ‘out of court resolutions’ in policing domestic violence and abuse in the United Kingdom. The British Journal of Criminology, 58(1), 1–16.
Women’s Aid. (2017). How common is domestic abuse? [Online] https://www.womensaid.org.uk/information-support/what-is-domestic-abuse/how-common-is-domestic-abuse/. Accessed 4th Mar 2019.
Chapter 3
Key Questions That Police Data Might Help Us Answer
3.1 About Police Data
We recognise that the first challenge to this research will often concern our data source of choice. Police records are nearly always only partial because much crime is simply never reported to or discovered by a police agency. This is certainly logical for domestic abuse, a form of criminality that is often ‘subtle’ and deeply personal by nature. Let us be clear from the outset: we wholeheartedly accept that police records cannot reveal everything we need to know about domestic abuse – it is simply impossible. However, we do not accept that this limitation eliminates any possibility of useful findings being derived from police records. As we will discuss in this chapter, police records offer the single largest, consistent source of cases. They are already the driving factor in hundreds and thousands of decisions made by practitioners every day. To dismiss or ignore this dataset would be short-sighted. At the very least, a comparison of an analysis of these records with theory, current practice and existing evidence may reveal the extent to which the data source is limited. However, before we analyse these data, we need to understand more about their nature. Part of this understanding is how police records compare to other forms of measuring domestic abuse. In this chapter we compare police records with surveys, the other primary measurement of abuse. We also consider the key strategic objectives that analysis of police records might focus on. Throughout the chapter we emphasise the case for using these data as a source of potential insight.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 M. P. Bland, B. Ariel, Targeting Domestic Abuse with Police Data, https://doi.org/10.1007/978-3-030-54843-8_3
3.2 Victim Surveys
The Crime Survey for England and Wales (CSEW) is commonly referred to as the best estimate of domestic abuse prevalence (SafeLives 2018; Women’s Aid 2017). This survey is used by the Office for National Statistics (ONS) to produce the official national estimates of domestic abuse prevalence. In respect of domestic abuse, the survey itself comprises two parts: interviews and a self-completion module. Until 2018, the interview process did not match the cross-government definition of domestic abuse, particularly in respect of coercive and controlling behaviour, for which it did not include any specific questions. Accordingly, prevalence estimates were calculated using the self-completion module, which has historically had a higher reporting rate than the interviews (ONS 2018). The ONS estimated that 1.9 million adults (aged 16–59) experienced domestic abuse in the year ending March 2017, a rate of 6 in 100 adults, but given the caveats, this estimate was almost certainly below the true level. Future surveys will improve this situation, and it is anticipated that the questions introduced on coercive and controlling behaviour will increase gender asymmetry (Myhill 2015), but for now the ‘best evidence’ of prevalence indicates a broadly stable trend in the last decade (see Fig. 3.1).
The CSEW offers some points of interest to strategy-makers, for example on age or gender profile, but disaggregated data are not routinely made available to agencies to interrogate, and to date the survey has not been used to answer questions beyond the most strategic – such as ‘how much domestic abuse is there?’ Whether the survey represents the ‘best evidence’ of prevalence is also questionable. Although the anonymous self-reporting methodology potentially overcomes underreporting challenges in the broadest sense, the notion of a large gap between official records and surveys for more serious crimes, such as violent crimes resulting in severe injuries, has been challenged (Ariel and Bland 2019). With these views in mind, we propose that records kept by police forces are a potentially more promising option for deeper exploration of domestic abuse patterns.

[Figure 3.1: line chart. Vertical axis: percentage of victims once or more (0 to 15). Horizontal axis: survey year, Mar 2006 to Mar 2018. Series: Men, Women, All.]
Fig. 3.1 Prevalence of domestic abuse in the last year for adults aged 16–59 years, by gender. (Reproduced from https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/bulletins/domesticabuseinenglandandwales/yearendingmarch2018)
3.3 Police Records
As already outlined, the police service has come under considerable pressure to ‘up its collective game’ in respect of crime recording, and this appears to have led to a narrowing of the gap between ‘actual’ and recorded prevalence. Police forces recorded 1,198,094 domestic abuse events in the 12 months ending March 2018 (ONS 2018) – the smallest gap on record between police records and the CSEW. This is a large pool of record-level data on domestic abuse, and if a larger one exists, it has not yet been discovered. Size is not everything, however, and if this data source is to be a real option in defining domestic abuse issues and calibrating responses accurately, it bears closer inspection for quality (an issue we return to later in this chapter). The Home Office made domestic abuse recording mandatory for police forces in 2015, following the HMICFRS’ thematic inspection of domestic abuse, which called for an improvement in data recording and the establishment of a national domestic abuse dataset (HMICFRS 2014a). Despite this, most police agencies in England and Wales had already been recording domestic abuse under the national definition as part of their own internal processes. Domestic abuse has been recognised as a core area for policing in England and Wales for more than a decade, and most forces had already established specialist units and specific processes to tackle this crime type. The Domestic Abuse, Stalking and Honour-Based Violence (DASH) risk assessment process meant that forces could already distinguish domestic cases to some extent, albeit without audit or regulation; no agency had set or inspected police data standards in respect of domestic abuse classification until the Home Office established such standards in 2015.
Police domestic abuse data are perhaps the most obvious choice for the development of evidence for the purpose of targeting resources, yet these have no real reputation to speak of as a source for empirical research. Even though the police service introduced a national standard for crime recording in 2002, several governmental reports published in subsequent years were critical of the quality of police-recorded crime data (PASC 2014), leading the House of Commons Public Administration Select Committee (PASC) to conclude in 2014 that they were no longer a reliable source. Shortly thereafter, the UK Statistics Authority withdrew the designation of police-recorded crime statistics as a ‘National Statistic’ – effectively a term of endorsement. This led the police inspectorate to conduct a nationwide series of inspections of crime data quality, on the basis of which it overwhelmingly concluded that the police service was substantially under-recording offences, particularly those of an interpersonal nature (HMICFRS 2014b). This inspection contained a specific recommendation that police forces submit domestic abuse records to the Office for National Statistics to use in the creation of a national dataset aimed at addressing the issue of defining repeat victimisation. This national dataset was submitted from the beginning of the 2015/16 financial year, and the overall result of the inspection programme has been a sharp rise in overall levels of recorded interpersonal crimes.
Critics of police data in domestic abuse research commonly have three main complaints (Brimicombe et al. 2007; Brimicombe 2018; PASC 2014). Firstly, they argue that the data are difficult to assimilate across force boundaries, being held on numerous different systems and in numerous formats. Secondly, the data are often ‘incomplete’, which is a particular problem for domestic abuse researchers because collation relies on a process known as ‘flagging’, whereby officers mark offences once they have identified them as meeting the cross-government definition. Thirdly, and most importantly, critics point out that police-recorded domestic abuse represents only a small proportion of actual crime, rendering it inappropriate as a source from which to draw conclusions about the crime type as a whole. All of these are relevant and valid concerns, but each can be overcome. The advent of the national dataset has meant that police forces are required to keep certain fields of data, and modern data-cleaning techniques mean that more data matching is now possible than ever before. Forces have improved the consistency of their record ‘flagging’, under scrutiny from the inspectorate and in anticipation of future inspections. Among the 23 forces subjected to data quality inspections in 2017 and 2018, the mean accuracy of domestic abuse records exceeded 80%.¹ With domestic abuse now designated as a statutory data return, the national network of crime registrars and their audit work come into play, further increasing the checks and balances on these data.

Table 3.1 Comparison between CSEW and police-recorded domestic abuse, 2015–2018
                                                        2015/16     2016/17     2017/18
a. Number of victims estimated by CSEW                  2,000,000   1,913,000   2,000,000
b. Number of crimes and incidents recorded by police    1,031,120   1,068,200   1,198,094
Ratio a:b                                               1.94:1      1.79:1      1.67:1

Finally, the most important criticism: that most domestic abuse is not reported to the police.
It is difficult to find a precise measure on this, but it is commonly accepted among domestic abuse researchers that this is the case (Brimicombe 2018). The only practicable way to check the level of underreporting is to compare the Crime Survey for England and Wales (CSEW) prevalence estimates compiled by the Office for National Statistics with the level of police-recorded crime. The national dataset now provides this comparison, as shown in Table 3.1. It must be stressed that this is not a like-for-like comparison; the CSEW estimates the number of victims in a 12-month period, regardless of how many crimes they have experienced. Conversely, the police-recorded figure reflects the number of events reported to them in a 12-month period, in which victims may appear more than once. Police records also include victims over the age of 59, which is the maximum age considered by the CSEW. But these differences are reliable, which is to say they are consistent over time, and the gap in the ratio of CSEW-estimated victims to police-recorded crimes and incidents is clearly closing. While this might be explained by improving trust in the police on the part of victims once contact has been established, the fact remains that the police data pool is not insignificant by any means. Were it considered to be a sample of the overall domestic abuse population, the general confidence interval would likely be narrow, and while it is clearly not a sample in this sense (as it is not randomly drawn or stratified in any way), this notion is illustrative of the potential overall power of this dataset. Add to this the fact that the common rulebook and audit infrastructure make police records a common language understood and translatable anywhere in the country, and these data offer not just the best opportunity for quantitative criminologists to explore domestic abuse, but also the most practicable opportunity by far. In a resource-restricted environment, there is compelling reason to examine this ready-made data resource.
¹ Figures retrieved from https://www.justiceinspectorates.gov.uk/hmicfrs individual reports into police force crime recording inspections.
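The ratios in Table 3.1 can be reproduced directly from the published counts. The sketch below does so, and then uses a handful of made-up event records to illustrate why the comparison is not like for like: police figures count events, whereas the CSEW counts victims. The figures in the first part are those of Table 3.1; the event records, victim identifiers and function name in the second part are hypothetical and serve only as an illustration.

```python
from collections import Counter

# (a. CSEW estimated victims, b. police-recorded crimes and incidents), from Table 3.1.
TABLE_3_1 = {
    "2015/16": (2_000_000, 1_031_120),
    "2016/17": (1_913_000, 1_068_200),
    "2017/18": (2_000_000, 1_198_094),
}


def csew_to_police_ratio(csew_victims: int, police_events: int) -> float:
    """Ratio a:b as reported in Table 3.1, rounded to two decimal places."""
    return round(csew_victims / police_events, 2)


ratios = {year: csew_to_police_ratio(a, b) for year, (a, b) in TABLE_3_1.items()}
for year, ratio in ratios.items():
    print(f"{year}: {ratio:.2f}:1")

# Hypothetical police events, one victim identifier per recorded event.
# Real records would need matching and de-duplication before this step.
events = ["V1", "V2", "V1", "V3", "V1", "V2"]
events_per_victim = Counter(events)

n_events = len(events)                 # what a police event count measures
n_victims = len(events_per_victim)     # what a victim-based survey measures
repeat_victims = sorted(v for v, n in events_per_victim.items() if n > 1)

print(f"{n_events} events, {n_victims} distinct victims, repeat victims: {repeat_victims}")
```

In this toy example six recorded events correspond to only three distinct victims, so an event count and a victim count can diverge considerably even when they describe the same underlying cases.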
3.4 What Problems to Measure?
Thus far we have established that domestic abuse is a major concern for public and charitable agencies in England and Wales, and that though their current response is extensive, it is neither based on rigorous evidence nor rapidly accumulating that evidence. Yet with scrutiny undiminished and the volume of work still increasing, while funding for resources has reduced in recent times, the need for evidence is as acute as ever. With agencies seemingly reluctant, on ethical grounds, to engage in widespread testing of the kind that would provide clear outcome comparisons between different tactics, targeting evidence – identifying who, where and what to address – is possibly the most pressing kind of information required. If decision-makers in domestic abuse agencies cannot continue to provide an equal service to all, to whom should they provide a disproportionate service, and hope to secure the best outcomes? This is the prominent contextual challenge that frames this research, and we have argued that police data, as the richest single source of information on domestic abuse victims and perpetrators, hold the best potential to provide such evidence. First and foremost, however, we need to understand what outcome is sought by practising agencies. This perhaps appears a simpler challenge than it really is because there is no single national domestic abuse strategy, and as such, it is worth briefly examining the stated strategic outcome aims of some of the key organisational stakeholders (Table 3.2).2
2 Note the focus on outcome aims. Many strategies include output aims in their content, for instance, ‘increase service availability’, which is a means to an end (output) rather than the end itself (outcome). Table 3.2 focuses only on those aims seen as relating directly to outcomes.
Table 3.2 Stated aims of selected national domestic abuse strategies
Agency/strategy | Stated aims
HM Government Violence Against Women and Girls Strategy | Continued decreases in the prevalence of domestic violence; more victims helped into long-term independence
SafeLives | To end domestic abuse for good
Women’s Aid | Safety, freedom and independence
Ministry of Defence | Reduced prevalence and impact of domestic abuse and increase safety and wellbeing of all those affected
Modest though this selection is, it is indicative of the multitude of local strategies found on local authority, police and crime commissioner websites. There is a consistent common sentiment throughout – to make victims safer – but a striking lack of agreement on specific and measurable outcomes, ranging from the comparatively conservative ‘decrease in prevalence’ to the total and permanent cessation of domestic abuse. Our interest here is not to evaluate the merits of these strategies but rather to determine how one might develop a system of measurement, on the presumption that measurement is a prerequisite for any systematic evaluation. The obvious candidate is prevalence, which is implied or explicitly stated in most domestic abuse strategies. Easy to understand though it may be, prevalence requires survey estimates and discounts the differential nature of crimes. Prevalence could be decreased by 50%, but if the remaining crimes included an increase in homicide, rape and serious assault, it would be illogical to argue that the outcome was a successful one. Consequently, to throw light on differential patterns of harm, we need to be able to define and measure the concept of harm. Furthermore, the instrument we use for this purpose must be complementary to the source of our data (police records). We will return to this issue in due course. Our work in the chapters that follow explores what domestic abuse records kept by police forces can tell us that may assist in refining the harm reduction strategies employed by the police. There is already extensive evidence that domestic abuse is a matter of grave concern to public health and safety in the twenty-first century – it is widespread, expensive and a major drain on policing resources. Official statistics on prevalence aside, the most recent comprehensive assessment of financial cost, albeit over a decade old (Walby 2009), placed the cost in the billions of pounds to service providers, employers and victims.
With economic inflation and rising crime levels since Walby’s estimate, it is likely domestic abuse costs the public purse even more today. Furthermore, domestic abuse is a major factor in the most serious crimes – domestic circumstances feature in a third of murders in England and Wales and in more than a tenth of all crimes recorded by the police (ONS 2018). Our intent with this book is to contribute to the evidence base for tackling domestic abuse. This has greatly expanded in recent years, in which research has delved deeper into the impacts of particular subcategories of domestic abuse. Substantially more is now known about the impacts of forced marriage (Watts and Zimmerman 2002), revenge pornography (Henry and Powell 2014; Bond and Tyrrell 2018) and
financial abuse (Sharp-Jeffs 2015, 2017) than at any point in the past. This body of research has developed against a background of an emerging evidence-based policing (EBP) movement in England and Wales. Led by partnerships between the police professional body, the College of Policing (CoP), the National Police Chiefs’ Council (NPCC) and academic institutions, EBP forms a central tenet of modern policing with the aim of improving practice through the accumulation of robust empirical evidence (Lum and Koper 2017; Neyroud and Weisburd 2014). The nature of that empirical evidence has been the subject of much debate (see Cockbain and Knutsson 2014; Sparrow 2011; Weisburd and Neyroud 2013). Sherman, who first established the term ‘evidence-based policing’ (Sherman 1998), has proposed a framework for viewing EBP activities through the lenses of targeting, testing and tracking (Sherman 2013), and appraised the development of domestic abuse evidence through this framework (Sherman 2018). It is specifically in this context that this research is positioned, building on recent findings in targeting evidence (Barnham et al. 2017; Bland and Ariel 2015; Bridger et al. 2017; Chalkley and Strang 2017; Kerr et al. 2017; Sherman and Strang 1996; Thornton 2017) and attempting to realise the promise of new analytic techniques and sources applied in criminology, such as:
– The availability of ‘big data’ in the manner described by Sherman (2018) and demonstrated by previous work in other fields of criminology (Berk et al. 2009).
– The development of new harm measurement instruments such as the Crime Severity Score (ONS 2016), the Cambridge Crime Harm Index (Sherman et al. 2016) and the updated Home Office Cost of Crime Estimates (Heeks et al. 2018).
– The potential application of new machine-learning algorithms to big datasets, such as has been demonstrated in several recent publications (Berk 2012; Berk et al. 2016).
Although the evidence base for domestic abuse is already comparatively rich, at least in the context of the general depth of rigorous evidence on policing activities, there has long been an ongoing demand from practising agencies to acquire information and evidence that can further shape strategy (see for example Shepherd 1998; Sherman 1992, 2018). The impact of ‘austerity’ in the United Kingdom was to reduce the capacity of all government sectors, including those with primary responsibility for dealing with domestic abuse (Neyroud 2015). Yet at the same time, scrutiny from the national police oversight body, Her Majesty’s Inspectorate of Constabulary, Fire and Rescue Services (HMICFRS), has aimed specifically at domestic abuse and has been highly critical of the police response (HMICFRS 2014a). As a consequence of this, as well as a separate critical review of crime recording practices (HMICFRS 2014b), police forces have attached greater priority to identifying and responding to domestic reports, resulting in recorded crime numbers rising steeply at a time when other sources showed a decline in the prevalence of self-reported domestic abuse in the adult population (ONS 2017). With around 20% fewer police officers than in 2010 (ONS 2019), responding to the additional demand has been a challenge for police forces. Arrests and charges have declined (Ariel and Bland 2019; ONS 2017, 2018), and the inspectorate continues to highlight deficiencies in the police response in areas such as identification of risk (HMICFRS 2017, 2019). This context provides an opportunity for well-developed targeting research to help shape domestic abuse strategies, and it is precisely this opportunity which this research aims to address. There remain questions about the extent to which the evidence influences what the police (or other responding agencies) actually do in practice. As we will explore further, much of the current response to domestic abuse is not driven by evidence, and the collection of data concerning victims and offenders remains, to this point, an under-utilised resource in the development of domestic abuse strategies. It is this area that is our target. The overarching aim of this research is to add to the existing evidence base in ways that could usefully contribute to front-line strategies and underpinning theories, by exploiting the potential of an existing resource abundant in every police force – crime data. In this respect, we gathered hundreds of thousands of anonymised domestic abuse records from police forces around the country. These records related to crimes, arrests, offenders and victims, and all of these data resemble the typical sorts of information every police agency has ready access to. These records were assembled into three large datasets and analysed using a variety of statistical procedures, ranging from the very simple (rates and proportions) to the very complex (a machine learning algorithm). Each procedure was selected with a particular research aim in mind, and these aims were chosen because they represent issues of high relevance to practitioners and researchers, and because there are gaps or uncertainties in what these groups know about these issues. Throughout these issues the topic of ‘harm’ is a persistent feature.
As we will explore, much of the current response to domestic abuse is geared towards the identification of ‘high-risk’ cases and subsequent action to negate that risk. It is logical to argue that ‘risk reduction’ is actually an outcome that the police service and its partners are seeking, but this leads to the inevitable next question: the risk of what? The answer seems perfectly logical – serious harm to the victim – but this in turn raises a difficult question for a crime researcher. How does one measure harm? Harm is a subjective concept, particularly among practitioners; what constitutes serious harm for one person does not necessarily do so for another, and in this spirit a number of harm measurement tools have been developed. However, there are currently no national guidelines to guide this debate in any particular direction. This is the first challenge that this research seeks to overcome: the selection of an appropriate instrument for the measurement and tracking of harm to facilitate the further exploration of police data. Armed with an appropriate tool to measure and track harm, we return to the key research questions. These are organised into five principal categories: repeat abuse, serial abuse, escalation, concentration of harm and forecasting. Each category has its own distinct questions of interest which we will use the data and statistical procedures to attempt to answer. These questions are as follows:
Repeat Abuse3
1. What is the prevalence and extent of repeat victimisation of domestic abuse?
2. What is the conditional probability of further domestic abuse associated with each consecutive victimisation?
3. What is the prevalence and extent of repeat offending of domestic abuse?
4. What is the conditional probability of further domestic abuse associated with each consecutive offence?

Escalation
5. Is there evidence of escalating harm in each consecutive domestic victimisation?
6. Is there evidence of escalating harm in each consecutive domestic offence committed?

Concentration of Harm
7. What is the extent of concentration of harm among victims of domestic abuse?
8. What is the extent of concentration of harm among offenders of domestic abuse?
9. To what extent do the police have prior knowledge of the group of victims suffering the most harm?
10. To what extent do the police have prior knowledge of the group of offenders committing the most harm?

Serial Abuse4
11. What is the prevalence and extent of serial abuse among victims of domestic abuse?
12. What is the prevalence and extent of serial abuse among offenders of domestic abuse?
13. Are serial perpetrators demographically different from repeat offenders5 or single-time offenders?
14. What types of domestic abuse crime do serial perpetrators commit and how harmful are they?
15. Do serial offenders cause more domestic abuse harm than repeat or single-time domestic offenders?
16. To what extent do domestic abuse serial perpetrators commit other forms of crime, and how does this compare with repeat or single-time domestic offenders?

Forecasting
17. What proportion of all arrestees go on to commit domestic abuse within 2 years?
18. What proportion of serious domestic abuse arrestees have prior records for domestic abuse?
19. Can antecedent inputs predict future serious domestic abuse cases to a high degree of accuracy?
20. If so, which inputs have the greatest impact on accuracy?

3 Repeat abuse is defined as multiple domestic crimes regardless of the identity of the other party involved.
4 Serial abuse is defined as an offender with multiple different victims, or a victim with multiple different offenders.
5 Repeat offenders in this sense are those which offend multiple times against just one victim.
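Two of the ideas in the questions above lend themselves to a small worked illustration: the conditional probability of a further event given each consecutive event (questions 2 and 4), and the distinction between repeat and serial offenders set out in the footnotes. The following is a minimal sketch only, not the procedure used in this research. The records, identifiers and function names are hypothetical, and the calculation assumes that every person is observed for the same length of time, which real police data would not guarantee.

```python
from collections import Counter, defaultdict

# Toy (victim_id, offender_id) pairs, one per recorded event.
# These records are invented purely for illustration.
RECORDS = [
    ("v1", "o1"), ("v1", "o1"), ("v1", "o1"),
    ("v2", "o2"),
    ("v3", "o3"), ("v3", "o4"),
    ("v4", "o1"),
    ("v5", "o5"), ("v5", "o5"),
]


def conditional_repeat_probabilities(counts):
    """Return P(at least k+1 events | at least k events) for each k >= 1."""
    max_k = max(counts.values())
    probs = {}
    for k in range(1, max_k):
        at_least_k = sum(1 for c in counts.values() if c >= k)
        at_least_next = sum(1 for c in counts.values() if c >= k + 1)
        probs[k] = at_least_next / at_least_k
    return probs


def classify_offenders(records):
    """Label offenders using the footnote definitions:
    serial = more than one distinct victim,
    repeat = several events against a single victim,
    single-time = one event only."""
    events = Counter(offender for _, offender in records)
    victims = defaultdict(set)
    for victim, offender in records:
        victims[offender].add(victim)
    labels = {}
    for offender, n_events in events.items():
        if len(victims[offender]) > 1:
            labels[offender] = "serial"
        elif n_events > 1:
            labels[offender] = "repeat"
        else:
            labels[offender] = "single-time"
    return labels


victim_counts = Counter(victim for victim, _ in RECORDS)
print(conditional_repeat_probabilities(victim_counts))
print(classify_offenders(RECORDS))
```

In this toy set, three of the five victims have a second event (0.6) and one of those three has a third (about 0.33). Offender o1 is labelled serial, o5 repeat, and the rest single-time.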
3.5 Summary
Domestic abuse is measured in two main ways – public surveys and police records. It is commonly accepted that the latter source does not represent all domestic abuse, though recent trends suggest the gap between actual and recorded crimes is narrowing. Nonetheless, police records are the only large data source that identifies individuals and thus supports analytical procedures that may provide targeting insights. The largest survey in England and Wales yields data which are aggregated and anonymous and therefore cannot be used in the same way. Police datasets are also very large and cover categories of victims not reached by the official crime survey. However, whichever tool is used, there is still a clear and present need to identify an instrument capable of differentiating between levels of harm. Most domestic abuse strategies are focused on harm reduction in one form or another, yet there is no established mechanism for tracking this notion.
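The difference between counting crimes and weighting them by harm, and the related idea of concentration of harm, can be sketched with a few lines of arithmetic. Everything below is assumed for illustration: the harm weights are hypothetical round numbers and are not taken from the Crime Severity Score, the Cambridge Crime Harm Index or any other published instrument, and the crime counts and victim totals are invented.

```python
# Hypothetical harm weights per crime type (illustrative only,
# NOT values from any published harm index).
HARM_WEIGHT = {
    "common assault": 1,
    "serious assault": 50,
    "rape": 200,
    "homicide": 500,
}


def total_harm(crimes):
    """Sum the assumed harm weight of every recorded crime."""
    return sum(HARM_WEIGHT[crime] for crime in crimes)


# Invented scenario: the number of crimes halves, but the mix becomes more serious.
year_1 = ["common assault"] * 100
year_2 = (["common assault"] * 45 + ["serious assault"] * 3
          + ["rape"] * 1 + ["homicide"] * 1)

print(len(year_1), total_harm(year_1))  # crime count and harm total, year 1
print(len(year_2), total_harm(year_2))  # crime count and harm total, year 2


def harm_share_of_top(harm_by_person, fraction):
    """Share of total harm carried by the most-harmed `fraction` of people."""
    ranked = sorted(harm_by_person.values(), reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    return sum(ranked[:top_n]) / sum(ranked)


# Invented harm totals for ten victims.
HARM_BY_VICTIM = dict(zip(
    ["v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", "v10"],
    [600, 200, 100, 40, 20, 10, 10, 10, 5, 5],
))

print(harm_share_of_top(HARM_BY_VICTIM, 0.1))
print(harm_share_of_top(HARM_BY_VICTIM, 0.2))
```

Under these assumed weights the count falls from 100 to 50 while the harm total rises from 100 to 895, which is the situation a count-only measure would record as a success. In the second toy example, one victim in ten carries 60% of the total harm.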
References
Ariel, B., & Bland, M. (2019). Is crime rising or falling? A comparison of police recorded crime and victimisation surveys. Methods of Criminology and Criminal Justice Research (Sociology of Crime, Law, and Deviance), 24, 7–31.
Barnham, L., Barnes, G. C., & Sherman, L. W. (2017). Targeting escalation of intimate partner violence: Evidence from 52,000 offenders. Cambridge Journal of Evidence-Based Policing, 1, 1–27.
Berk, R. (2012). Criminal justice forecasts of risk: A machine learning approach. New York: Springer.
Berk, R., Sherman, L., Barnes, G., Kurtz, E., & Ahlman, L. (2009). Forecasting murder within a population of probationers and parolees: A high stakes application of statistical learning. Journal of the Royal Statistical Society: Series A (Statistics in Society), 172(1), 191–211.
Berk, R. A., Sorenson, S. B., & Barnes, G. (2016). Forecasting domestic violence: A machine learning approach to help inform arraignment decisions. Journal of Empirical Legal Studies, 13(1), 94–115.
Bland, M., & Ariel, B. (2015). Targeting escalation in reported domestic abuse: Evidence from 36,000 callouts. International Criminal Justice Review, 25(1), 30–53. https://doi.org/10.1177/1057567715574382.
Bond, E., & Tyrrell, K. (2018). Understanding revenge pornography: A national survey of police officers and staff in England and Wales. Journal of Interpersonal Violence, 2018, 0886260518760011.
Bridger, E., Strang, H., Parkinson, J., & Sherman, L. W. (2017). Intimate partner homicide in England and Wales 2011–2013: Pathways to prediction from multi-agency domestic homicide reviews. Cambridge Journal of Evidence-Based Policing, 1(2–3), 93–104.
Brimicombe, A. J. (2018). Mining police-recorded offence and incident data to inform a definition of repeat domestic abuse victimization for statistical reporting. Policing: A Journal of Policy and Practice, 12(2), 150–164.
Brimicombe, A. J., Brimicombe, L. C., & Li, Y. (2007). Improving geocoding rates in preparation for crime data analysis. International Journal of Police Science and Management, 9(1), 80–92.
Chalkley, R., & Strang, H. (2017). Predicting domestic homicides and serious violence in Dorset: A replication of Thornton’s Thames Valley analysis. Cambridge Journal of Evidence-Based Policing, 1(2–3), 81–92.
Cockbain, E., & Knutsson, J. (Eds.). (2014). Applied police research: Challenges and opportunities. Abingdon: Routledge.
Heeks, M., Reed, S., Tafsiri, M., & Prince, S. (2018). The economic and social costs of crime (2nd ed.). London: Home Office.
Henry, N., & Powell, A. (Eds.). (2014). Preventing sexual violence: Interdisciplinary approaches to overcoming a rape culture. Basingstoke: Springer.
Her Majesty’s Inspectorate of the Constabulary, Fire and Rescue Services. (2014a). Everyone’s business: Improving the police response to domestic violence. [Online] Retrieved from https://www.justiceinspectorates.gov.uk/hmicfrs/wp-content/uploads/2014/04/improving-the-policeresponse-to-domestic-abuse.pdf. Accessed 15th Oct 2016.
Her Majesty’s Inspectorate of the Constabulary, Fire and Rescue Services. (2014b). Crime recording: Making the victim count. [Online] Retrieved from https://www.justiceinspectorates.gov.uk/hmicfrs/wp-content/uploads/crime-recording-making-the-victim-count.pdf. Accessed 15th Oct 2016.
Her Majesty’s Inspectorate of the Constabulary, Fire and Rescue Services. (2017). A progress report on the police response to domestic abuse. [Online]. https://www.justiceinspectorates.gov.uk/hmicfrs/wp-content/uploads/progress-report-on-the-police-response-to-domesticabuse.pdf. Accessed 9th Feb 2019.
Her Majesty’s Inspectorate of the Constabulary, Fire and Rescue Services. (2019). A progress report on the police response to domestic abuse. [Online] https://www.justiceinspectorates.gov.uk/hmicfrs/publications/a-progress-report-on-the-police-response-to-domestic-abuse/. Accessed 4th Mar 2019.
Kerr, J., Whyte, C., & Strang, H. (2017). Targeting escalation and harm in intimate partner violence: Evidence from Northern Territory Police, Australia. Cambridge Journal of Evidence-Based Policing, 1, 1–17.
Lum, C. M., & Koper, C. S. (2017). Evidence-based policing: Translating research into practice. Oxford: Oxford University Press.
Myhill, A. (2015). Measuring coercive control: What can we learn from national population surveys? Violence Against Women, 21(3), 355–375.
Neyroud, P. (2015). Future Perspectives in Policing: A Crisis or a Perfect Storm: The Trouble with Public Policing? In Police Services (pp. 161–165). Springer, Cham.
Neyroud, P. W., & Weisburd, D. (2014). Transforming the police through science: The challenge of ownership. Policing: A Journal of Policy and Practice, 8(4), 287–293.
Office for National Statistics (ONS). (2016). Research outputs: Developing a Crime Severity Score for England and Wales using data on crimes recorded by the police. [Online] Retrieved from https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/articles/researchoutputsdevelopingacrimeseverityscoreforenglandandwalesusingdataoncrimesrecordedbythepolice/2016-11-29. Accessed 6th Mar 2019.
Office for National Statistics (ONS). (2017). Domestic abuse in England and Wales: Year ending March 2017. Statistical Bulletin. London: Office for National Statistics. [Online] Retrieved from https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/bulletins/domesticabuseinenglandandwales/yearendingmarch2017. Accessed 17th Mar 2018.
Office for National Statistics (ONS). (2018). Domestic abuse in England and Wales: Year ending March 2018. Statistical Bulletin. London: Office for National Statistics. [Online] Retrieved from https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/bulletins/domesticabuseinenglandandwales/yearendingmarch2018. Accessed 2nd Mar 2019.
Office for National Statistics (ONS). (2019). Police workforce, England and Wales: 30 September 2018. Statistical Bulletin. London: Office for National Statistics. [Online] Retrieved from https://www.gov.uk/government/statistics/police-workforce-england-and-wales-30-september-2018. Accessed 29th May 2019.
Public Administration Select Committee. (2014). Caught red-handed: Why we can’t count on police recorded crime statistics. [Online] https://publications.parliament.uk/pa/cm201314/cmselect/cmpubadm/760/760.pdf. Accessed 4th Mar 2019.
SafeLives. (2018). About domestic abuse. [Online] http://safelives.org.uk/policy-evidence/aboutdomestic-abuse. Accessed 4th Mar 2019.
Sharp-Jeffs, N. (2015). A review of research and policy on financial abuse within intimate partner relationships. London: London Metropolitan University.
Sharp-Jeffs, N. (2017). Money matters: Research into the extent and nature of financial abuse within intimate relationships in the UK. Manchester: Co-operative Bank.
Shepherd, J. P. (1998). Tackling violence: Interagency procedures and injury surveillance are urgently needed. British Medical Journal, 316, 879–880.
Sherman, L. W. (1992). Policing domestic violence: Experiments and dilemmas. New York: Free Press.
Sherman, L. W. (1998). Evidence-based policing. Washington, DC: Police Foundation.
Sherman, L. W. (2013). The rise of evidence-based policing: Targeting, testing, and tracking. Crime and Justice, 42(1), 377–451.
Sherman, L. W. (2018). Evidence-based policing: Social organization of information for social control. In Crime and social organization (pp. 235–266). London: Routledge.
Sherman, L. W., & Strang, H. (1996). Policing domestic violence: The problem-solving paradigm. Paper presented at the Stockholm conference on “Problem-solving as crime prevention,” Swedish National Council on Crime Prevention.
Sherman, L., Neyroud, P. W., & Neyroud, E. (2016). The Cambridge crime harm index: Measuring total harm from crime based on sentencing guidelines. Policing: A Journal of Policy and Practice, 10(3), 171–183.
Sparrow, M. K. (2011). Governing science. New perspectives in policing: Harvard Kennedy School Program in criminal justice policy and management. Harvard University.
Thornton, S. (2017). Police attempts to predict domestic murder and serious assaults: Is early warning possible yet? Cambridge Journal of Evidence-Based Policing, 1, 1–17.
Walby, S. (2009). The cost of domestic violence: Up-date 2009. Lancaster: Lancaster University.
Watts, C., & Zimmerman, C. (2002). Violence against women: Global scope and magnitude. The Lancet, 359(9313), 1232–1237.
Weisburd, D., & Neyroud, P. (2013). Police science: Toward a new paradigm. Australasian Policing, 5(2), 13.
Women’s Aid. (2017). How common is domestic abuse? [Online] https://www.womensaid.org.uk/information-support/what-is-domestic-abuse/how-common-is-domestic-abuse/. Accessed 4th Mar 2019.
Chapter 4
The Existing Evidence
4.1 Introduction
While it may not always be clear that domestic abuse responses are based on robust evidence, there is no shortage of existing research on the topic in general. Navigating this body of work can be a daunting task but it is an important one because prior studies provide the context in which we place the findings we present in this book, from stark interpretations (do our findings replicate others?) to more subtle analyses (do our findings extend or refute existing hypotheses?). In this chapter we outline the established evidence in each of the main question groups we set out in the previous chapter. We begin with a review of literature on repeat domestic abuse, which is a prerequisite for many of the phenomena this research seeks to explore. Without repeat abuse there can be no escalation, no concentration of harm, no serial offending and, logically therefore, low potential for forecasting. There follows a section on previous research into escalation, which has become a widely accepted phenomenon in domestic abuse practice with an apparently limited empirical basis. This is followed by a brief section on the concentration of harm, a subject for which there is virtually no prior research. We then present a summary of evidence on serial perpetrators, an area which has had somewhat limited coverage in domestic abuse research to date. This section of the chapter includes synopses of work on general typologies of domestic abuse offender to contextualise how serial abusers are situated within the wider landscape of criminal behaviour. We conclude with an analysis of previous research into forecasting domestic abuse, and machine learning forecasting methods in criminal justice settings in general. As we will see, this is an emerging field, but not one without empirical precedents from which we can learn.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 M. P. Bland, B. Ariel, Targeting Domestic Abuse with Police Data, https://doi.org/10.1007/978-3-030-54843-8_4
4.2 Evidence on Repeat Domestic Abuse
While in many cases a domestic abuse event ends the relationship between partners, research suggests that domestic abuse can also become a repeated phenomenon and a reflection of a wider pattern of events (Walby 2005; Stark 2007, as cited in Robinson 2016). Evidence tends to support this argument for both offenders and victims (Bland and Ariel 2015; Chambers-McClellan 2002; Feld and Straus 1990; Sherman 1992; Walby and Allen 2004). This body of research suggests that some domestic batterers and abusers find it difficult to ‘break the cycle’, while certain victims of domestic abuse are similarly ‘trapped’ in abusive relationships. The causes of this persistency are unclear though, and our ability to predict either which relationships will immediately end, or which will persist despite abuse, is not well refined. There are multiple psychological, environmental and economic factors that appear to play a part in these decisions (see for example Elisha et al. 2010; Malach-Pines 2002; Mintz 1980). However, even if we are unable to fully characterise the extent, scope, and nature of such cyclic abuse, we do know, with hindsight, that repetitive domestic abuse does exist. Our understanding of the repeat abuse phenomenon is defined by the precision and accuracy of the data we can collect. For example, much of our knowledge is grounded in public records, such as official statistics collated by police forces. Traditionally, the external validity of these data has been challenged, and a plethora of evidence suggests that domestic abuse is underreported. When compared to victims’ surveys, for example, the ‘criminological gap’ can be large (Gracia 2004; Felson and Pare 2005; Frieze and Brown 1989; Pagelow 1981), although, as we highlighted in Chap. 3, this gap may be shrinking. There are also cultural variables at play (Kasturirangan et al.
2004); some types of victims are more likely to report domestic abuse, while others are less likely to lodge a complaint against a family member (Felson and Pare 2005). The way in which the police might handle such a report is also an issue, which some have referred to as ‘secondary victimisation’ given the lack of necessary sensitivity on the part of police officers (Barnish 2004). Several studies highlight both victims and offenders separately as being party to repeated abuse. Hester (2013) found that 83% of male domestic abuse offenders repeated their offences in a 6-year follow-up period, while Smith et al. (2010) found over three quarters of domestic abuse incidents to involve repeat victims (see also Barnham et al. 2017; Kerr et al. 2017; Stark 2007; Walby 2005). Feld and Straus (1989) found similar levels of repeat cases in a family violence survey in the United States, and Walby and Allen (2004) identified high levels of repeat criminality within relationships in the 12 months preceding their survey, with females experiencing higher levels of abuse than males. Scholars tend to disagree on the number of crimes victims typically experience, however, with much conjecture surrounding precise numbers (Bland and Ariel 2015; Giles-Sims 1983; Okun 1986; Straus 1990). However, the general prevalence and overall importance of recidivism in domestic abuse is demonstrated by a large body of research (see for example, Lloyd et al. 1994). Broadly speaking, such studies consolidate the evidence that repeat offending
in domestic abuse is widespread, and many early US domestic abuse studies focused on reducing recidivism. Sherman and Berk’s (1984) trial on the impact of arrest in Minneapolis found repeat rates of between 13% and 26% within 6 months among its different cohorts. This seminal study spawned numerous replications (known as the Spousal Assault Replication Series) in other cities around the US. While the studies had mixed findings on the effect of arrest on repeat offending, they did all find widespread evidence of repeat offending (Maxwell et al. 2002). While the Minneapolis trial and its replications focused on police reports as an outcome measure, others have used victim surveys, but all have reached a similar conclusion: repeat victimisation is common, lying somewhere between 17% and 59% (see Felson et al. 2005). Other researchers have conducted longitudinal studies of criminal careers. Klein and Tobin (2008) reviewed 342 men who went before court for domestic violence crimes in Massachusetts in 1995 and 1996, studying their criminal histories until 2004. The results were again similar: 32% of these men committed further domestic abuse within a year of their index offence, and 60% did so within the full duration of the study. This pattern of higher recidivism with longer study periods is supported by Loinaz’s (2014) study of 150 Spanish males imprisoned for domestic abuse, of whom 15% committed a further domestic offence initially, rising to 66% within a year. Scandinavian studies have reported prevalence rates of between 16% and 48% for differing types of domestic offender (Svalin et al. 2017; Petterson and Strand 2017), while British studies have provided an array of supporting evidence. Hester’s aforementioned study (2013) tracked 96 domestic abuse offenders for 6 years and found 83% of male perpetrators to have reoffended in that timeframe. This study was the latest in a series of papers by Hester focusing on one police force in northern England.
Earlier research had found that half of offenders had a repeat domestic case within 3 years (Hester and Westmarland 2006) and that those perpetrators described as ‘all round offenders’, or those who committed other non-domestic forms of criminality, were more likely to reoffend (Hester and Westmarland 2006). More recently, we analysed 36,000 police records of domestic abuse in Suffolk, in the east of England, and found that 35% of suspects were linked to more than one police-recorded report (Bland and Ariel 2015). Our initial study into this field differed from its predecessors in that it incorporated police records of non-crime incidents for the first time and used a non-judicial definition of the term ‘offender’.
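The studies above report the proportion of offenders with a further domestic offence within different follow-up periods, and the proportion generally grows as the window lengthens. The sketch below shows one simple way such a figure could be computed from dated records. It is an illustration under stated assumptions, not the method of any study cited here: the cohort, dates and function name are hypothetical, and it assumes every offender was observed for at least the full window, so it makes no adjustment for censoring.

```python
from datetime import date

# Hypothetical cohort: offender -> (index offence date,
# date of first further domestic offence, or None if none recorded).
COHORT = {
    "o1": (date(2015, 1, 10), date(2015, 6, 1)),
    "o2": (date(2015, 2, 1), None),
    "o3": (date(2015, 3, 15), date(2017, 3, 1)),
    "o4": (date(2015, 4, 1), date(2016, 1, 20)),
    "o5": (date(2015, 5, 5), None),
}


def proportion_reoffending(cohort, within_days):
    """Proportion of the cohort with a further offence within `within_days`
    of the index offence (assumes full follow-up for everyone)."""
    reoffended = 0
    for index_date, next_date in cohort.values():
        if next_date is not None and (next_date - index_date).days <= within_days:
            reoffended += 1
    return reoffended / len(cohort)


print(proportion_reoffending(COHORT, 365))      # one-year window
print(proportion_reoffending(COHORT, 3 * 365))  # three-year window
```

With these invented dates, two of five offenders reoffend within a year (0.4) and three of five within three years (0.6). This mirrors the general pattern described above without reproducing any study's figures.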
4.3 Evidence on Escalation

The term ‘escalation’ commonly refers to the phenomenon of increasing chronological severity, be it generalised or linear. It is a component of the current risk assessment model used by many domestic abuse practitioners (see Chap. 2), and its roots can be found in theories developed more than three decades ago (Pagelow 1981; Walker 1979, 1984). It seems that, at least in the popular view, ‘most calls to
4 The Existing Evidence
police or survivor advocacy agencies only occur after survivors have experienced lengthy escalation’ (www.abuseandrelationships.org). Yet the evidence lays out a more complicated story, with fewer systematic observations of its existence than we might like. Key questions remain open: does abuse tend to increase in severity over time, or is ‘higher’ harm either random or circumstantial? The answers determine whether temporal patterns can be identified and, by implication, predicted. Similarly, if an escalation in severity is predictable, what functional shape characterises its growth? The epistemological and phenomenological antecedents of domestic abuse have been studied over time, yet scholars do not agree fully on the evidence of these harm pathways. One clear example is domestic homicide: the hypothesis is that harm increases over time and culminates in a moment of killing, but the link between domestic abuse and domestic homicide is far from clear. Some research suggests that persistent domestic abuse and domestic homicide do not share similar characteristics, as the offenders’ justifications for domestic homicide are often different from those offered by domestic abusers (Goussinsky and Yassour-Borochowitz 2012). If this is the case, then it could be argued that escalation in harm – from verbal abuse and controlling partnerships to physical and ultimately homicidal victimisation – is rooted in discrete aetiologies (see for instance Moffitt 1993, more broadly on differential crime growth taxonomies). Given the different views on escalation of severity, it is not surprising that there is inconsistency in the evidence. Walker (1984) and Feld and Straus (1989) used surveys which relied on victim accounts and argued that escalation does take place. Campbell et al. (2007) concluded that violence by males against their partners is the most ‘salient risk factor’ for homicide, as domestic violence precedes up to 70% of cases. 
Similar conclusions were demonstrated by Crawford and Gartner (1992) as well as Stout (1993). However, Feld and Straus (1989) compared only two temporal data points, therefore lacking the necessary sensitivity and variation over time to demonstrate a developmental function. On the other hand, Chambers-McLellan (2002) found perhaps the clearest evidence of escalation among 19,686 residential domestic abuse cases in Georgia, USA. This study concluded that crime severity increased by 0.07 on the Conflict Tactics Scale (a 0–18-point scale of severity) but had clear limitations, using a timeframe of only 12 months, with notable sample exclusions and an unequally weighted measurement instrument. Other researchers have failed to replicate the extent of these findings (Piquero et al. 2006, though this study used a truncated scale of severity measurement), leading Dutton and Kerry (2002) to conclude that domestic homicide does not necessarily follow escalation of violence. Given that homicide is relatively rare, this is not surprising, and the findings do not rule out the presence of escalation in more serious cases other than homicide. The problem then is classifying ‘high harm’ or serious cases other than homicide. Other evidence to support at least partial escalation can be found. Johnson (2006) identified that just over three quarters of ‘intimate terrorism’ cases indicated that violence became more severe over time. A similar proportion of participants in Andersen et al.’s (2003) study gave a similar indication.
The lack of homogeneity across results is at least partially explained by the absence of a reliable measure of crime severity. In 2015 we attempted to address this by introducing the use of CCHI as a measurement instrument (see Bland and Ariel 2015; Sherman et al. 2016), which, we argued at the time, provided us with a more robust way to assess harm over time (in Chap. 5, we re-evaluate this argument). Yet this approach did not lead to observation of statistically significant patterns of escalation among domestic abuse dyads in Suffolk, England, with five or more reported incidents in a 5-year period. The CCHI was also recently used to investigate escalation in domestic abuse cases by Barnham et al. (2017) in an analysis of 52,296 perpetrators of intimate partner violence in the Thames Valley police jurisdiction, and by Kerr et al. (2017), who analysed more than 60,000 records from the Australian Northern Territory. Neither study found evidence of escalation other than among Aboriginal offenders with three or more intimate partner incidents in a 4-year period (Kerr et al. 2017). From a policy perspective, though, escalation of violence has been assumed to be a risk factor for domestic homicide for some time (Campbell 1995); however, a substantial number of abusive relationships are not known to the police or other social services (Aldarondo and Mederos 2002). Therefore, it remains an open question whether the ‘writing was indeed on the wall’ of the police station and whether violence – and specifically domestic homicide – could have been prevented by predicting future harm based on past harm reported to the police.
4.4 Evidence on the Concentration of Harm

Research concerning the extent to which harm in domestic abuse cases is concentrated is limited to just two studies. As already noted, in our 2015 paper we used the Cambridge Crime Harm Index to analyse harm patterns (Bland and Ariel 2015). We concluded that less than 2% of all dyads in Suffolk, England accounted for 80% of cumulative harm, and that half of these dyads had no prior record of domestic abuse. In a partial replication of Bland and Ariel’s study, Barnham et al. (2017) found that 3% of domestic abuse offenders in the Thames Valley police jurisdiction accounted for 90% of cumulative harm. However, these two papers offer the full extent of quantified analyses on the distribution of harm among domestic abuse offenders. Although we are not in entirely uncharted waters here, they are not well sailed.
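A concentration figure of the kind quoted above (a small share of dyads accounting for most cumulative harm) can be illustrated with a short calculation. The sketch below is not the method used in either study; it simply ranks units by harm and counts how many are needed to reach a target share. The function name and the harm scores are hypothetical and chosen only for illustration.

```python
def smallest_share_for_harm(harm_scores, target=0.80):
    """Return the smallest fraction of units (e.g. dyads) whose combined
    harm reaches at least `target` of the total harm.

    `harm_scores` is a list of non-negative harm totals, one per unit.
    """
    total = sum(harm_scores)
    if total <= 0:
        raise ValueError("total harm must be positive")
    running = 0
    # Rank units from most to least harmful and accumulate their harm.
    for count, harm in enumerate(sorted(harm_scores, reverse=True), start=1):
        running += harm
        if running >= target * total:
            return count / len(harm_scores)
    return 1.0


# Hypothetical data: 100 dyads, one very high-harm dyad, 25 low-harm, 74 with no harm.
hypothetical_scores = [1500] + [10] * 25 + [0] * 74

share_for_80 = smallest_share_for_harm(hypothetical_scores, target=0.80)
share_for_100 = smallest_share_for_harm(hypothetical_scores, target=1.0)
print(f"{share_for_80:.0%} of dyads account for 80% of harm")
print(f"{share_for_100:.0%} of dyads account for all recorded harm")
```

With real data the harm scores would come from a weighted index such as the CCHI, and the result is sensitive to how harm is scored and how dyads are defined.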
4.5 Evidence on Serial Domestic Abuse

Research specifically on serial victims and offenders is less dense than research on general repeat abuse. The notion of a ‘serial domestic abuser’ is better known than that of a ‘serial abuse victim’, often driven by media attention and the wider
expectations around serial criminals (Robinson 2017). In this subsection we firstly consider how domestic abuse offenders have been typically classified in previous research and then expand specifically on serial offenders and what empirical research has so far concluded about their prevalence.
4.5.1 Typologies of Domestic Batterers

A wide body of research has attempted to develop a taxonomy of classifications for domestic abuse offenders. Among these studies, the evidence strongly indicates heterogeneity (Cantos and O’Leary 2014; Cavanaugh and Gelles 2005; Gondolf 1998; Gottman et al. 1995; Hamberger et al. 1996; Holtzworth-Munro and Stuart 1994; Johnson 1995; Johnson and Ferraro 2000), and from this basis a range of typologies have emerged, largely from the field of psychological research. Consequently, typologies tend to fall into two main groups: a small number of behavioural-based models and a large number of personality-based models. Behavioural-based typologies were explored by Brisson (1981) and Gondolf (1988). Gondolf identified three types of batterers: type I – ‘sociopathic’ abusers, who commit high levels of physical and social abuse; type II – ‘antisocial’ abusers, who are generally more violent but less likely to be arrested; and type III – ‘typical abusers’, who are generally violent but less disposed to serious violence. There are clear parallels between Gondolf’s typology and Johnson’s common couple violence/patriarchal terrorist taxonomy (Johnson 1995), its subsequent expansion (Johnson and Ferraro 2000), and the personality-based models developed by other researchers, as described by Cavanaugh and Gelles (2005) in their synthesis of the evidence. Johnson and Ferraro’s (2000) work identified typologies of relationships rather than offenders, but their labels still have relevance by way of implicitly describing the characteristics of perpetrators. ‘Common couple violence’ offenders are only violent within their relationships, and according to the authors, are approximately evenly split between males and females. In these couples, violence occurs just once or twice. 
This doesn’t necessarily preclude ‘common couple’ offenders from being serial offenders, but this doesn’t fit the narrative of ‘predators stalking prey’, and logically it may take a long period of time for offenders to accumulate multiple victims as they move through relationships. ‘Intimate terrorists’, on the other hand, could potentially fit the serial stereotype well. This type of relationship violence is explicitly relevant to the offender who uses violence as part of a wider pattern of control and coercion. Johnson and Ferraro suggested that these types of offenders are more dangerous to victims. The third Johnson and Ferraro relationship type, ‘violent resistance’, explains offenders responding to a threat from a victim who is normally the aggressor. It is less logical that such offenders would be serial, although it is of course still possible. The fourth type, ‘mutual violent control’, is an extension of intimate terrorism, but on the parts of both parties. The fifth and final kind of violent relationship defined by Johnson and Ferraro, ‘generalist-borderline violence’, is an extension of earlier attempts to define antisocial batterers
(Holtzworth-Munro and Stuart 1994; Jacobson and Gottman 1998). In these relationships, the offender undertakes violent acts as a symptom of being emotionally overwhelmed. Personality-based typologies first emerged with Elbow (1977), who described four personality syndromes in wife abusers, predicated on a combination of social learning and family perspectives. Elbow’s types (‘controller’, ‘defender’, ‘approval-seeker’ and ‘incorporator’) have less validity today than at the time of their creation owing to social changes and the growing knowledge of domestic abuse outside the context of heterosexual marriage. However, the generalisability of types and integration of theoretical constructs certainly influenced later research. Perhaps the most prominent typology to subsequently emerge is found in the work of Holtzworth-Munro and Stuart (1994), who reviewed 15 separate typologies to identify groups along theoretical lines of severity and frequency, generality of violence, and psychopathy and personality. This work led to the development of a tripartite typology: ‘family-only abusers’, who only commit crime in the domestic setting; ‘generally violent abusers’, who commit violent crime beyond their family and home; and ‘generalist abusers’, who straddle the other two groups. Holtzworth-Munro’s model is somewhat simplistic to the point of over-generalisation, but it has a high-level practical application that articulates a continuum of severity, and it has been validated, to a limited extent, by other research, including that of Johnson and Ferraro (2000), which offers multiple parallels. Hamberger and Hastings (1988), albeit with a very small sample (n = 204), found three clusters of spouse abuser type, which correlated with Holtzworth-Munro and Stuart. Saunders (1992), also with a small sample (n = 165), identified three types of abuser, too (‘family only’, ‘generalised’ and ‘emotionally volatile’). Other models have been developed, too. 
Tweed and Dutton (1998) concluded that prior research pointed to two distinct subtypes of batterer: a group which suppresses conflict in marriage and thus commits violence in non-intimate relationships, and a group that reports only intimate partner violence. Among these numerous typologies there is one common theme: the classification of the extent to which the offender’s violent behaviour exhibits outside the domestic environment. This particular aspect is supported by other research (Klein 1996; Buzawa et al. 1999), and there is a tangible, if abstract, agreement between the models that the frequency and severity of recidivism vary between typologies, whatever the framework. Cavanaugh and Gelles (2005) summarise this effectively in their description of a different overarching trifold typology: low-, moderate- and high-risk offenders, for which they assert that little escalation occurs from low to high. This is perhaps the most relevant research for practitioners who are primarily focused on the management of (relatively) short-term risk of reoffending, and most likely to be recognised due to its similarity in descriptive structure to risk assessment cohorts. Despite this array of typologies, there are large gaps in the evidence base for their application and theoretical design. Most research is based on married men, and little is known about the fit of models within different demographic groups, particularly minorities. Critical reviews of typologies have concluded support for the
Holtzworth-Munro and Stuart framework (Dixon and Browne 2003) and even suggested an expected proportion for each typology, but explicitly criticise research to date for a narrow focus on offenders only and the lack of a systematic approach to offender profiling. This problem is compounded by the relative ambiguity of definitions in this field. As Edelstein (2016) highlighted, researchers and scholars frequently misuse terms. Edelstein suggested that theory should form an agreed basis for constructs such as typologies. In the absence of this, this area of research, although rich in volume, is confusing, ambiguous and at times contradictory, limiting the potential for its meaningful practical application, at least in the law enforcement field. On the basis of this limited practical usability it is unsurprising that typologies of domestic abuse offenders have gained little traction among offender programmes (Cantos et al. 2015). Given the current established practice in England and Wales of using a ‘high/medium/low’ system prompted by the DASH process (see Chap. 2), any typology framework designed for practical application must seek to augment or complement this design. Some researchers have suggested that categorising offenders on the basis of their violence profile would be a useful development (Petterson and Strand 2017), while others have suggested this should be a prerequisite for assigning the correct intervention to an offender (Cavanaugh and Gelles 2005). Whatever the option, there is general agreement about the need for such a tool (Petterson and Strand 2017; Stoops et al. 2010; Edelstein 2016; Cavanaugh and Gelles 2005).
4.5.2 Serial Perpetrators

Serial perpetrators are potentially one variation on such a typology framework, but empirical evidence about them is thin on the ground. Despite this, the term ‘serial perpetrator’ has gained some traction among practitioners in recent times, as demonstrated by the public statements of intent from Chief Officers (Robinson 2017) and the police inspectorate (HMICFRS 2014a, 2015; Robinson 2017). The label first came to attention following Richards’ (2006) review of almost 400 domestic homicide and sexual assault cases in the London area. Richards identified a number of serial offenders who went from relationship to relationship committing abuse. While important in promoting the concept of serial abuse, Richards confused the definition of serial domestic abuse with serial violence, arguably conflating two of the Holtzworth-Munro and Stuart subtypes into a single definition. The number of serial offenders was also unspecified, making it impossible to gauge prevalence. Nevertheless, the concept has taken off, with the author since leading a national media campaign for a register of serial domestic abuse offenders similar to the sex offender register. The question of definition is of particular prominence in this topic. There is no consensus on what makes an offender a ‘serial offender’, presenting a significant obstacle to both the development of knowledge and the application of practical
solutions. Although police chiefs in England and Wales have developed their own definition (Robinson 2017), this is unique to police in those countries. Overall, there have been a number of methodological variations in defining serial offenders (Kocsis et al. 2002). The term ‘serial’ is determined based on differing considerations, primarily frequency of crime or motivation. For the Federal Bureau of Investigation, three homicides are required to render an offender a serial killer (Kocsis and Irwin 1998). Elsewhere, serial rapists require just two victims (Hazelwood and Burgess 1987). Other researchers have specified the inclusion of a minimum elapsed time period between offences or that offences must be of the same category (Best and Luckenbill 1996; Egger 1985; Holmes and Deburger 1998; Mitchell 1997). Defining seriality this way is problematic for a number of reasons, not least the arbitrariness of setting a threshold (see the arguments put forward in Edelstein 2016; Kocsis and Irwin 1998), so a range of alternatives have been advanced based on psychological or motivational factors. Kocsis and Irwin (1998) proposed that a psychologically-based method could lead to the identification of a serial offender when they have committed only one offence, but their analysis is not specifically aimed at domestic abuse and discounts the limited operational practicability of such a model. Edelstein (2016) is an advocate of ‘criminal careers’ as the defining characteristic of serial offender status, proposing distinctions between the professional career criminal who is motivated by material profit, the serial pathological career criminal seeking to pathologically profit, and the serial non-professional, who lacks any professionalism and offends out of habit. This theoretical construct poses interesting questions for research into serial domestic abuse and has overt parallels to Johnson’s (1995) and Holtzworth-Munro and Stuart’s (1994) frameworks. 
The key challenge for practical application, however, is whether the desire for pathological profit can be determined.
4.5.3 Prevalence of Serial Perpetrators of Domestic Abuse

Issues with definitions notwithstanding, there is at least some evidence that serial perpetration of abuse occurs and is moderately prevalent. Hester and Westmarland (2006) conducted the first UK-based research in conjunction with the Home Office. They studied 692 domestic violence perpetrators from the north-east of England, 90% of whom were males, over the course of a 3-year period and found that 50% of offenders had at least one further domestic incident, with 18% of those involving a different victim. This gave an overall serial prevalence rate of 9% (18% of the 50% with a further incident). Hester and Westmarland described serial perpetrators within the context of four groups of domestic abuse offender: a ‘one-incident’ group, for which their index offence was the only known record in the study period; a ‘mainly non-domestic’ group, who had one domestic offence and more than one other type of crime; a ‘dedicated repeat domestic violence’ group which committed multiple domestic crimes but no crimes of other kinds; and an ‘all-round repeat offenders group’ composed of those with numerous types of
offences, both domestic and non-domestic. This latter group was marginally the most prevalent. Hester and Westmarland only made passing reference to serial perpetrators, however, and it is not known how they are distributed within the ‘dedicated’ and ‘all-round’ groups. In the UK, there have been two other studies with notable findings on the prevalence of serial domestic abuse perpetrators. Firstly, our own Suffolk study from 2015 (Bland and Ariel 2015) analysed 18,675 offender cases from between 2009 and 2014. With data incorporating non-crime and crime incidents, we identified a repeat rate of 35% and, within this cohort, a 47.6% serial rate, giving an overall serial perpetrator prevalence of 16.7%. Secondly, Robinson (2017) analysed a range of police and partner data sources pertinent to 100 domestic abuse perpetrators in Wales. Obstructed by data quality, Robinson’s estimate of prevalence among this cohort ranged from 4% to 20% owing to the disparity in definitions used by different agencies. The problems Robinson encountered offer stark insight into the difficulties of exploring the serial tendencies of offenders in a practical setting, but ultimately the generalisability of this research was limited by a small sample size and narrow geographic focus. Robinson concluded that serial considerations should form part of offender management decisions alongside risk assessments but did not develop this idea beyond a strategic overview. Research into serial perpetrators of domestic abuse from outside the UK is even sparser, but generally finds higher rates. Klein et al. (2005) found that 28% of 552 male offenders on probation in Rhode Island offended against a different victim within a year. Bocko et al. (2004) found that 43% of 1341 offenders charged with violating a restraining order had more than one victim. This work was limited, however, by the exclusion of a third of its original sample which did not have viable relationship information.
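The overall serial prevalence figures in this subsection follow from multiplying two proportions: the share of offenders who repeat, and the share of those repeaters who offend against a different victim. The sketch below reproduces that arithmetic for the two UK studies using the percentages quoted in the text; the function name is hypothetical and the calculation assumes the second percentage is measured within the repeat cohort.

```python
def serial_prevalence(repeat_rate, serial_share_of_repeaters):
    """Overall share of offenders who are serial perpetrators.

    repeat_rate: proportion of all offenders with a further domestic incident.
    serial_share_of_repeaters: proportion of those repeaters whose further
        incident involved a different victim.
    """
    return repeat_rate * serial_share_of_repeaters


# Percentages as quoted in the text.
hester_westmarland = serial_prevalence(0.50, 0.18)   # 50% repeat, 18% of those serial
suffolk = serial_prevalence(0.35, 0.476)             # 35% repeat, 47.6% of those serial

print(f"Hester and Westmarland (2006): {hester_westmarland:.1%}")
print(f"Bland and Ariel (2015): {suffolk:.1%}")
```

Because the two studies differ in follow-up period and in whether non-crime incidents are counted, the resulting rates are not directly comparable.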
4.6 Evidence on Forecasting Domestic Abuse

If we follow current practice and theory of escalation, high harm domestic abuse can be forecast before it occurs, based on prior behaviour. In Chap. 8, we will test this explicitly using a machine learning statistical procedure. In this subsection we critique the recent history of actuarial forecast tools, and machine learning tools in particular, in criminal justice environments.
4.6.1 Actuarial Instruments in Criminal Justice Forecasts

Assessments of dangerousness in domestic abuse cases have existed for a long time, in a number of countries, resulting in a large number of varying instruments for the task. In the last two decades, the development of tools has accelerated as the demand on agencies charged with dealing with domestic abuse cases has increased. As one
of the primary figures in domestic abuse danger assessments, Jacquelyn Campbell, explained in her Vollmer Award address (2005), these instruments offer a method of triage, essential to allocating limited resources efficiently. At the heart of this expansion, one key issue has remained consistent: should instruments be based on clinical methods (in which forecasts are arrived at by expert panels or individuals, based on professional judgment and experience) or actuarial methods (in which forecasts are derived from an empirical, often mathematical, basis)? In fact, there are three forms of model in practice, the third being ‘structured professional judgement’, which comprises both clinical and actuarial elements. Each form has its exponents and critics, often based on interrelated aspects; a ‘con’ for actuarial is a ‘pro’ for clinical, and vice versa. Though well-worn, in the context of assessments of future dangerousness, these arguments are worthy of our attention because the rules of this game have not changed. The development of clinical instruments in danger assessments has been the product of practicality and cost more than of rigorous research. One argument contends that a victim’s own perception of their future risk is as good a predictive tool as any, but research offers limited support for this claim (Campbell 2005). In the context of mental health and future violence, it has been argued that the ‘science’ is not available to design actuarial models of sufficient effectiveness to replace clinical judgements (Litwack and Schlesinger 1999). Two decades later, this is not necessarily the case. Even before Litwack and Schlesinger’s claim, scholars were arguing that actuarial models offered the only ‘defensible’ option (Quinsey et al. 1998), and indeed most empirical studies, even dating back to Paul Meehl’s original focus on the actuarial versus clinical debate (1954), have in the main concluded in favour of the superior accuracy of the former. 
This is best summarised in two meta-analyses of actuarial versus clinical model studies set out in the 2000s. Grove et al. (2000) found that actuarial techniques (described by the authors as ‘mechanical’) ‘substantially outperformed clinical prediction in 33–47% of studies examined’ (p. 19). Conversely, just 6–16% of the 136 studies included in the analysis found the difference in predictive accuracy to be substantially in favour of clinical methods. Only six of these studies related to criminal recidivism or criminal behaviour, but the overall effect, and its order of magnitude – that actuarial instruments are more accurate than clinical tools with a Cohen’s d of 0.12 – is of note in the debate overall, as is the authors’ assertion that the superiority of actuarial tools is not universal. Ægisdottir et al. (2006) built on Grove et al.’s work in the field of mental health practice in particular, also finding that actuarial instruments made more accurate predictions than their clinical counterparts. In the 48 most rigorous studies in the analysis, actuarial tools were 13% more accurate on average. However, Ægisdottir and colleagues highlighted a number of subtleties which are worthy of examination in light of our purpose. They identified that statistical rules should be established, particularly where errors were of differing costs, and that not all statistical tools were equally accurate, nor were they all more accurate than clinical tools. They also emphasised the need for practitioners to be familiar with any tools used, particularly in respect of their ethical implications.
While the consensus of researchers is that actuarial models are more accurate at predicting outcomes than clinical methods (see also Gottfredson and Moriarty 2006; Hastie et al. 2009; Milner et al. 2017), they are far from being the dominant form of tool used in practice. Researchers commonly agree that actuarial tools can be better at using data more reliably and consistently, paying regard to base rates, and allowing for more accurate profiling of weights. However, actuarial methods require statistical expertise to build and deploy and can be costly, so it is unsurprising that most police forces do not use actuarial assessments in the field of domestic abuse. Alive to the practical difficulties of deploying actuarial tools, researchers have recommended models based on combinations of clinical and actuarial methods, commonly labelled as ‘structured professional judgement’ tools (Kropp 2004). The primary problem with this strand of tool is that it is even less specific than either of the others, and so potentially open to the problems of each. The devil, it seems, is in the detail that determines the precise role of factors which may influence the accuracy and fairness of forecasts (Urwin 2016). Foremost among researchers’ concerns is the role that heuristics play in clinical or structured professional judgement–based forecasts. By definition, these assessments rely (at least in part) on procedures where clinicians gather and interpret information through the subjective lenses of experience, training, and their own world views (Meehl 1954; Grove et al. 2000; Campbell 2007; Robinson et al. 2016). Even if we accept that all the data gathered for interpretation in this way is consistent (which it is almost certainly not, in practice – see Robinson et al. 2016, for discussion of data gathering for domestic abuse by police in England and Wales), it remains practically inevitable that different people will arrive at different conclusions for identical cases. 
In this process, heuristics are integral. The potential for heuristic bias in forecasting was classified by Tversky and Kahneman (1975) and reviewed in respect of policing forecasts by Urwin (2016), who concluded that the array of possible heuristic factors influencing the decisions of custody officers was extensive. This framework is applicable to our research, in which individuals, including generalists and specialists, make assessments of future risk in domestic abuse cases. The individual’s assessments are potentially affected by how readily they can recall relevant information, how many times they have encountered a similar scenario, and their confidence in their own knowledge (Tversky and Kahneman 1975; Kahneman 2011; Urwin 2016). The latter is commonly overestimated, with even individuals who know that actuarial tools are generally more accurate preferring to ‘go with their gut’ on an individual case basis. High-profile, rare events can have undue influence on ‘expert judgement’ precisely because they are easier to recall (Kahneman and Klein 2009). In theory, actuarial models can eliminate biases caused by heuristics, but this is not a given. If the predictor data on which an actuarial model is designed contains results that are the product of biases, these may trickle down to the resulting model (Harcourt 2014). In practice, the role of heuristics has not been definitively scrutinised in domestic abuse dangerousness assessments. As we discussed in Chap. 2, the most prevalent form of dangerousness assessment for domestic abuse in England and Wales, is the DASH, which is most commonly described as a structured professional judgement
exercise. Though not applied consistently (Robinson et al. 2016), a common application of the DASH is as follows: a responding officer collates the answers to 25+ questions, asked of the victim. Each affirmative answer receives one ‘point’ and contributes to a total score, a method first established almost 100 years ago (Burgess 1928). In some forces, this numerical score is combined with a professional’s judgement of risk to determine the outcome. This process is hypothetically repeated afresh each time a call-out is made to a domestic abuse incident. Robinson et al. (2016) conducted the most comprehensive review of DASH to date, conducting observations, interviews and surveys in three forces. Their review highlights several key points about the DASH risk assessment process as it was at the time that are of relevance to any critique of structured professional judgement or the potential application of new methods. In concluding that the DASH was applied inconsistently by police officers, the authors explained that they found evidence of officers adjusting or omitting questions, or in some cases choosing not to submit a form at all. They also found that officers tended to weight criminal offences, in particular giving greater weight to those involving physical harm, and that attention to coercive and controlling behaviours was missing. The Robinson review recommended a more ‘evidence-based’ approach, and subsequently co-authors Julia Wire and Andrew Myhill evaluated the pilot of a new risk assessment (Wire and Myhill 2018). The new risk assessment placed greater emphasis on coercive control, and the evaluation concluded that the tool led to higher rates of agreement between responding officers’ and secondary risk assessors’ judgements of risk. However, the methodology did not use equivalent comparison groups or test risk assessments for their predictive validity. 
The new tool increased the numbers of cases graded as ‘medium’ risk,1 which has potential demand implications for police forces. At the time of writing, additional forces were piloting the new instrument with a view to nationwide roll-out, even though the question of how effective the new tool is at predicting high-harm domestic abuse had still not been addressed. Whether this is even an important question remains a matter of debate, but Campbell (2005) presented the most conclusive summary of important issues for domestic abuse risk assessment and tackled this issue in particular. Campbell argued that, before considering the issue of predictive validity, the agency using any domestic abuse risk instrument must first decide what it is for: to predict extreme violence such as homicide, or simply the risk of reoffending. The latter is far more prevalent than the former in England and Wales but is still relatively rare overall (Bland and Ariel 2015; Barnham et al. 2017). It would seem that the DASH was conceived on the premise of the former (Richards et al. 2008; Robinson et al. 2016), but in either case the central issue remains one of prediction. It is therefore perhaps surprising that, in an area of such high demand and profile, neither the primary predictive instrument nor its proposed replacement has as yet undergone a countrywide or otherwise extensive assessment of predictive validity, especially when single-force research (Thornton 2017; Chalkley and Strang 2017; Grogger et al. 2020; Turner et al. 2019)
1 The typical domestic abuse grading structure is standard, medium or high risk.
4 The Existing Evidence
has strongly indicated a tendency toward low predictive validity, particularly a high rate of false negatives. The last of these four studies offers the most comprehensive view yet of the DASH’s predictive validity. Turner et al. focussed on the 30% of a metropolitan police force’s records (n = 350,000) in which the couple had more than one DASH record, and isolated those cases in which the couple had no prior DASH record in the two preceding years, resulting in a final sample of n = 61,080. Within this sample, the authors sought to establish the predictive validity of the 27 individual risk assessment questions and the overall risk grading (high, medium or standard). The outcome examined was the occurrence of future serious abuse (defined as assault with injury and above on the Crime Severity Score scale) within 1 year of the index crime. Among the cases that were re-victimised in this way, officers correctly risk assessed (i.e. gave a grading of ‘high’) in 5.7% of intimate partner cases (n = 41,570) and 2.7% of non-intimate partner cases (n = 19,510). By implication then, the false negative rates were over 90% for both categories. The false positive rate for officer predictions (i.e. cases where officers predicted some risk of future harm, but none occurred) was 94.4%. The authors concluded therefore that the DASH forecasts were not much better than random predictions, although they noted some caution because of the possibility that interventions in high risk cases may be responsible for some of the false positive results. The implications of these findings concerning the predictive power of the DASH are especially hard to reconcile with the current resourcing predicament in policing. If resources are tight, why are forces content to continue to potentially over-allocate resources to cases that will not result in either homicide or any form of reported reoffence? There is an ethical perspective to this issue.
Not all abuse is reported, so it might be argued that the abuse which is reported merits investigation. But the practical aspects of this argument cannot be ignored; there are not enough police and partner resources to go around, and the risk of over-committing to cases unnecessarily is that those cases truly at risk of high harm do not receive the preventative treatment they require. One solution would be to allocate more resources to this area of business. Another option, and one that may be more efficient operationally and financially, would be to find a risk assessment instrument that is more accurate at predicting high harm cases.
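The false negative rates implied by the Turner et al. figures quoted above can be checked with a line of arithmetic: if a share s of cases that went on to serious harm were graded ‘high’, the remaining 1 - s were missed. The short Python sketch below is illustrative only; the function and variable names are our own and do not come from the study.

```python
def false_negative_rate(sensitivity: float) -> float:
    """Share of cases that went on to serious harm but were NOT graded 'high'."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0 and 1")
    return 1.0 - sensitivity


# Shares of re-victimised cases graded 'high', as quoted in the text above.
reported_sensitivity = {
    "intimate partner": 0.057,
    "non-intimate partner": 0.027,
}

for group, share_caught in reported_sensitivity.items():
    rate = false_negative_rate(share_caught)
    print(f"{group}: false negative rate = {rate:.1%}")
```

Both results exceed 90%, which is consistent with the statement in the text.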
4.6.2 Machine Learning Techniques
In this research, we consider the potential for actuarial instruments to fill the predictive void. In particular, we will examine machine learning techniques, a new branch of statistical method made possible by advances in computer processing capacity. Machine learning uses computing to improve automatically (i.e., without human input at every stage) through ‘experience’. Machine learning is used within artificial intelligence (AI) procedures and, though they are often portrayed in the media as one and the same, machine learning is in fact distinct from AI. In general, machine learning comes in one of three varieties: supervised, unsupervised or reinforcement
learning (Jordan and Mitchell 2015). Supervised machine learning is the most commonly used form of the technique. It involves a human operator managing the algorithm(s) at every stage of the process by controlling inputs, reviewing outputs and adjusting accordingly. Unsupervised techniques have less human involvement in the control of inputs, often working with unlabelled or unstructured data. Reinforcement learning is a combination of the two techniques (see Jordan and Mitchell 2015, for a full review). Although the use of machine learning techniques for forecasting in criminal justice environments is relatively new, they have been enthusiastically adopted by some (Berk and Bleich 2013). However, the emerging discipline has not been exempt from criticism, with some researchers contending that the new instruments are no better than the old (Yang et al. 2010; Liu et al. 2011; Tollenaar and van der Heijden 2013). These criticisms rest on the condition that data are suitably transformed so that the more traditional logistic regression methods can be effective. In contemporary times, this is a significant condition. While theoretically not unreasonable, in practice, it is extremely common for data to exist in unsuitable conditions. Berk and Bleich (2013) also contended that it is not logical that major international companies such as Google, Amazon and Microsoft would be employing new statistical techniques in their business models if they were no better than existing methods. At the root of this problem is the trend of considering new statistical developments as mere enhancements of the traditional linear models, whereas in practice they are not. Berk and Bleich highlighted a key distinction relevant to our consideration of domestic abuse forecasting:
…a key distinction between forecasting and explanation has been badly conflated in some accounts. Understanding a phenomena may lead to improved forecasting accuracy, or it may not, but forecasting and explanation are different enterprises that can work at cross purposes. (Berk and Bleich 2013, p. 3)
The inference stemming from this distinction has important implications. If accurate forecasts may be achieved based on more variables than only those with apparent correlative or explanatory relationships to the outcomes we seek to predict, then a multitude of additional data sources become available to us. The subsequent questions are (1) what methods may we use to seize such an opportunity? and (2) what degree of accuracy could such methods achieve? Maximising the accuracy of forecasts should be a primary goal of forecasting instruments in criminal justice settings, Berk and Bleich asserted, because the resulting decisions have real consequences for people’s lives. They concluded that adaptive machine learning techniques offer a superior alternative to logistic models owing to their ability to detect complex, non-linear patterns in datasets. They also set out a thorough framework for measuring forecasting tools against each other, comprising (1) thorough establishment of what features are being compared, (2) comparisons based on data not used in the construction of the model, (3) appropriate comparison methods, (4) accurate characterisation, (5) comparable tuning parameter use and (6) close attention to practical interpretation. Using this framework, they compared the traditional logistic regression technique to two machine learning techniques – random forests and
stochastic gradient boosting – and concluded that random forests offer the strongest and most flexible option. Random forests, an ensemble of classification trees (explained further in Chap. 8), offer all the primary benefits of machine learning methods, as described by numerous authors in recent times (Barnes and Hyatt 2012; Berk 2012; Breiman 2001). Random forests, unlike other forms of machine learning, are not limited to the forecasting of binary outcomes such as ‘yes or no’. They offer the ability to account for asymmetric costs such as in the case of criminology where serious crimes potentially have greater ‘costs’ than less serious crimes. They build regularisation into their core calculations and can cope with a vast number of predictor variables, potentially making good use of the vast amount of data held by police forces. Importantly, they can cope with imbalanced distributions, for example where events (such as homicides) are rare, whereas traditional tools such as linear regression work most effectively when the distribution is simpler.
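The idea of asymmetric costs mentioned above can be illustrated with a generic expected-cost decision rule. This is a minimal sketch under stated assumptions, not the mechanism by which random forests in the cited studies incorporate costs (the text does not describe that mechanism). It assumes a model that outputs a probability of harm, and the function names are hypothetical. Forecasting ‘harm’ is the cheaper choice whenever p × cost(false negative) exceeds (1 - p) × cost(false positive), which rearranges to the threshold computed below.

```python
def cost_threshold(cost_false_negative: float, cost_false_positive: float) -> float:
    """Probability of harm above which forecasting 'harm' has lower expected cost."""
    if cost_false_negative <= 0 or cost_false_positive <= 0:
        raise ValueError("costs must be positive")
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def forecast_harm(prob_harm: float,
                  cost_false_negative: float,
                  cost_false_positive: float) -> bool:
    """Return True when 'harm' is the lower expected-cost forecast."""
    return prob_harm > cost_threshold(cost_false_negative, cost_false_positive)


# With equal costs the threshold is 0.5; with one false negative costed
# the same as ten false positives it falls to 1/11.
equal_cost_threshold = cost_threshold(1, 1)
ten_to_one_threshold = cost_threshold(10, 1)
print(f"1:1 threshold  = {equal_cost_threshold:.3f}")
print(f"10:1 threshold = {ten_to_one_threshold:.3f}")
```

Under this rule, a case with an estimated 20% probability of harm would be forecast as ‘no harm’ at equal costs but as ‘harm’ at a 10:1 ratio, which is the sense in which a cost ratio trades additional false positives for fewer false negatives.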
4.6.3 Previous Use of Random Forests for Criminal Justice Forecasting
The random forests technique has been used to construct criminal justice forecasts on several occasions in recent times. A summary of the examples of its use is worthy of consideration in the preparation of the forecasting methodology we set out in Chap. 8, so the following paragraphs summarise the main studies to have used the technique, highlighting the context, methodological application and predictive validity in each case. The first study to examine the random forests technique in a criminal justice setting was Berk et al. (2005), which attempted to develop a practical forecasting tool for the screening of domestic abuse incidents for the Los Angeles County Sheriff’s Department. The authors of the study, particularly Richard Berk, would go on to contribute significantly to the body of random forest forecasting research in subsequent years, and this study, which tested the use of a single CART (Classification and Regression Tree) method and random forests (a multiple CART method), was a primer for the studies to come. The authors collected data on potential predictors from 500 Los Angeles households to which officers were called out and used a small subset of these to build a screening tool which they retrospectively tested against known outcomes. The objective of the forecasting tool was to predict future instances of any kind of domestic abuse at households within 3 months of the forecast, but the study was beset by practical problems. The intention was to sample a large range of houses with both prior and no prior domestic abuse records, but implementation failed in this respect, and the final sample was heavily skewed toward houses with prior domestic records. Officers also failed to ask all the predictor questions required, resulting in listwise deletion being employed to deal with missing values in the data.
Still, the CART model used was initially successful at identifying 66% of households with any new call for police service. The authors
were concerned about overfitting (Breiman 2001), though. Overfitting is the term given when statistical models are too closely aligned to a limited set of data points. When exposed to the ‘real world’, an overfitted model will not replicate its testing performance. To address this concern, the authors tested the random forests technique as an alternative. This technique achieved 59% accuracy, but because it used ‘out-of-bag’ testing, a process whereby the portion of the dataset held back from the training of each tree is instead used for validating the model, the authors concluded that it was a far more robust and reliable instrument. In relation to domestic abuse cases, the two techniques were also approximately equal in forecasting accuracy. Berk, He and Sorenson’s paper contained many of the analytical points which became the hallmarks of later forecasting papers, including the use of confusion matrices to display the models’ results, the overt consideration of cost ratios between false negative (predictions of no domestic abuse that were wrong) and false positive (predictions of domestic abuse that were wrong) errors, and consideration of the impact of individual predictors. In Berk et al. (2006), the same technique comparisons (CART versus multiple CART) were made as in Berk et al. (2005), but this time to forecast which prisoners were likely to commit serious misconduct while in prison. This outcome was found to be generally rare in the studied population, inmates in California, with fewer than 3% committing serious misconduct in a 2-year period. Following the framework for forecasting analyses set out in the same year by Gottfredson and Moriarty (2006), the authors retained 1000 of their overall sample of 9662 for the purposes of testing their models. They then built forecasting models using logistic regression and CART techniques but found no notable improvement on the original marginal probability of 0.03. As with domestic abuse in Berk et al.
(2005), the cost ratio was set to (1) one false negative (an offender being incorrectly forecast as committing no misconduct) having the same ‘cost’ as ten false positives (an offender being incorrectly forecast as committing misconduct), and (2) one false negative to five false positives. The results indicated that random forests produced more accurate forecasts, correctly forecasting 49% and 62%, respectively, of misconduct for the two cost ratios. Their analysis also highlighted several key predictor variables which enhanced the accuracy of the forecasts. Berk et al. (2009) broadly replicated the methodology of the two earlier papers on which Berk had led. Their forecasting objective was the prediction of murder among a population of probationers and parolees. This diversion to the most serious form of crime marked a step towards attempting to forecast extremely rare outcomes, and the authors emphasised the ‘high stakes’ element of this matter in their title. A common theme in Berk’s work is the attention paid to the practical and personal implications of the forecasts in question, which in this paper takes the form of a stark contrast drawn between the actual cost of a false negative (a homicide) and a false positive (wrongfully extended incarceration and overcrowding in prisons). The authors observed that forecasts will never be perfect, and so it is essential to pay attention to the balance of errors. In this respect, the paper counters arguments that the world would be a better place without statistical forecasting on the basis that their tool of choice (random forests) allows a structured process for the
balance of errors to be accounted for whereas, in processes relying only on the subjectivity of individuals, no such overarching consideration can take place. The study’s target population was 60,000 cases from Philadelphia’s Adult Probation and Parole Department. The objective of the forecasting model was to predict the occurrence of a homicide or attempted homicide within 2 years of the beginning of community supervision. The authors again took care to explain (as in the two papers covered previously in this section) that predictor values require no causal link to the outcome object of the forecast, but they did highlight the practical importance of establishing a form of ‘common sense’ link to add to a sense of legitimacy among staff using the tool in practice. They also carefully considered the use of only information that officers would have readily available at the time they needed to run the forecast. The results of the random forest modelling were again set against the context of logistic regression performance. The latter resulted in a 99.7% error rate for the prediction of homicide. Using the same predictor information, random forests achieved a 57% error rate with little variation in performance when positive to negative cost ratios were adjusted between 7:1 and 12:1. When applied to test data, the model showed no indication of overfitting. Through the use of ‘importance plots’ (which demonstrate the contribution of each predictor variable to overall predictive accuracy) and ‘partial response functions’ (the pattern of each predictor variable’s predictive validity), the authors also identified a small number of individual predictor variables as contributing substantially to the overall performance of the model, inter alia, age, age at first contact and the number of prior gun crimes.
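The ‘out-of-bag’ testing referred to in these studies follows from the way each tree in a random forest is trained on a bootstrap sample, that is, a sample of the cases drawn with replacement. Cases that happen not to be drawn for a given tree are ‘out of bag’ for that tree and can be used to test it. The sketch below shows a single bootstrap draw only, not a full random forest; the helper name, the seed and the number of cases are illustrative assumptions rather than details from any of the studies.

```python
import random


def bootstrap_split(n_cases: int, rng: random.Random):
    """Draw one bootstrap sample; return (in-bag indices, out-of-bag indices)."""
    # Sample n_cases indices WITH replacement, as is done for each tree.
    in_bag = [rng.randrange(n_cases) for _ in range(n_cases)]
    drawn = set(in_bag)
    # Any case never drawn is out of bag for this tree.
    out_of_bag = [i for i in range(n_cases) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)   # fixed seed so the draw is repeatable
n_cases = 10_000         # hypothetical number of cases
in_bag, out_of_bag = bootstrap_split(n_cases, rng)
oob_share = len(out_of_bag) / n_cases
print(f"out-of-bag share for this tree: {oob_share:.3f}")
```

For large samples the out-of-bag share tends towards 1/e, or roughly 37% of cases per tree, so every tree has a sizeable set of cases it has never seen.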
A version of the forecasting model was later used in a field experiment relating to supervision levels, in which it was employed to determine ‘low risk’ offenders as candidates for participation. In 2012, Berk published a Springer Brief in Computer Science entitled Criminal Justice Forecasts of Risk: A Machine Learning Approach, which presented a detailed treatise on the justification and methodology for applying random forest modelling to criminal justice forecasts. In the book, Berk drew on the example of Barnes and Hyatt’s (2012) work with the Philadelphia Adult Probation and Parole Department. This work expanded on the previous works published with the Philadelphia Department in great detail, developing a thorough ‘do’s and don’ts’ approach to the design and implementation of a random forest model. It also tracked model performance through various iterations, each updated as new data became available. Like Berk et al. (2009), Barnes and Hyatt tested models not only on ‘out-of-bag’ data but also against a totally independent ‘test dataset’. Their recommendations for implementation included notes about data access, outcome definition, predictor selection, cost ratios, tuning (which means the adjustment of sample sizes and other parameters which may change the performance of the model), validation and practical use in the field. These later informed the implementation in Urwin (2016) and the principles for responsible algorithm use set out in Oswald et al. (2018). Barnes and Hyatt paid particular attention to the potentially controversial selection of some predictor variables, such as offender ethnicity, and their implications for the legitimacy of such forecasts. Furthering previous discussions on this topic, their work
highlights an important issue in forecasting, which Berk (2012) also emphasises: forecasting accuracy is not the ‘be all and end all’ of model performance. A model has to be politically acceptable and operationally viable to stand a chance at successful implementation. Four years later, Berk et al. (2016) published their development of a random forest forecasting instrument for domestic violence arraignment cases. Their objective was to determine whether a tool could be developed to usefully forecast the future dangerousness of domestic abuse offenders which may enable decision-makers to be better informed when deciding to release offenders or otherwise. They determined three outcomes for their model to forecast: (1) no arrests for domestic violence within 2 years, (2) a domestic violence arrest with no physical injury, within 2 years, and (3) a domestic violence arrest with physical injury, also within 2 years. Against a baseline situation of around 80% of those actually released at arraignment not being arrested for domestic violence within 2 years, the authors’ model correctly predicted no arrest 90% of the time, leading to the general conclusion that, if magistrates used the model, they could improve the failure rate of decisions by around half. By virtue of cost ratios, the model predicted the other two outcomes less efficiently, over-compensating its forecasts to avoid a high false negative rate. Accordingly, while 74% of all domestic violence with injury was correctly forecast, only 21% of the total forecasts made for that outcome turned out to be correct. This is a critical point which we will return to later when assessing the performance of our own model. The authors emphasised the use of readily available data, in effect recycling known information in a more efficient way by use of machine learning. Their analysis of 28,646 cases considered around 30 predictor variables, predominantly relating to an offender’s criminal history. 
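The contrast between the two percentages reported for the ‘arrest with injury’ outcome is the difference between two measures: the share of actual events that were forecast (often called recall or sensitivity) and the share of forecasts that turned out to be right (often called precision). The counts in the sketch below are hypothetical, chosen only so that the two measures come out close to the 74% and 21% quoted above; they are not taken from Berk et al. (2016).

```python
def recall(true_pos: int, false_neg: int) -> float:
    """Share of actual events that the model forecast."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos: int, false_pos: int) -> float:
    """Share of the model's positive forecasts that were correct."""
    return true_pos / (true_pos + false_pos)


# Hypothetical confusion-matrix counts (illustrative only).
true_pos = 74    # events that were forecast
false_neg = 26   # events that were missed
false_pos = 278  # forecasts of an event that did not occur

print(f"recall    = {recall(true_pos, false_neg):.0%}")
print(f"precision = {precision(true_pos, false_pos):.0%}")
```

A model tuned to avoid false negatives will usually show this pattern: most events are caught, at the price of many forecasts that do not materialise.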
Age and gender were the only personal characteristics included; ethnicity was excluded. The analysis also included segments on the relative importance of individual predictors to the overall performance of forecasts, but the authors made no attempt to refine the model, arguing that even a small boost to predictive power was relevant. This study represents the first published attempt to establish an algorithmic approach to domestic abuse forecasting, which we attempt to replicate in Chap. 8. Until 2016, every published paper on the use of random forest modelling for criminal justice forecasts involved law enforcement agencies from the United States. In 2016, Sheena Urwin, a police staff professional studying in the Cambridge Police Executive Programme, working with Dr. Geoff Barnes, wrote a thesis on the development and application of such an instrument in Durham, England (Urwin 2016). The Durham Harm Assessment Risk Tool (HART) aimed to forecast the future dangerousness of arrested offenders presenting at custody suites. The composition of the forecasts was similar to those of Berk et al. (2016) and Barnes and Hyatt (2012) in that it presented three possible outcomes: no offence, any less-serious offence, or a serious offence; all within a 2-year follow-up timeframe from when the forecast was made. The composition of the forecasting model followed the by-now-standard methodology for such tools. The model was trained and validated on separate datasets, with an 8% drop in overall accuracy and a 20% drop in
dangerous forecast (false negative) accuracy from training to testing. The latter accuracy drop prompted Urwin to emphasise the importance of regularly refreshing the model construction to account for changes in the operating context. Urwin’s HART model was conceived as the gateway triage tool to a deferred charge intervention. Offenders forecast as moderate risk (any less-serious offence within 2 years) would be eligible for a scheme known as Checkpoint in which they would be diverted from the normal criminal justice system on the proviso that they comply with specified conditions. Consequently, the balance of dangerous (incorrect forecasts of low risk) to cautious (incorrect forecasts of high risk) errors differed substantially from predecessor models because of the need to balance accuracy with the capacity of the Checkpoint programme. Previous random forest models (see Berk et al. 2005, 2006) employed a ratio of ten cautious errors (false positive) for every dangerous error (false negative). Urwin’s HART model used a ratio closer to 3:1. Conversely, the probability of a ‘very dangerous’ error (a serious re-offender forecast as having no risk) was just 2%, and this, Urwin argued, was low enough to give the Durham Chief Constable sufficient confidence in the model to put it into operation. At the same time, by effectively deliberately over-forecasting the risk of individuals, Urwin did not eschew the ethical dilemmas her algorithm produced. The Durham HART model was widely scrutinised in the British media (Baraniuk 2017; Burgess 2018), both in this respect and in terms of its fairness. Later revisions of the model removed the use of a sociodemographic classifier provided by the private company Experian due to complaints that it biased the model against less-affluent communities. Urwin’s thesis was also the first random forest analysis to test the model performance against clinical judgements.
Urwin developed a test wherein custody sergeants were given cases that were also tested by the model. The subsequent level of agreement was then analysed. In general, the predictions of the two methods were notably different. In moderate-risk cases, police officers and the algorithm agreed around two thirds of the time, but this dropped to around half the time in low-risk cases and less than a quarter of the time in high-risk cases. Urwin concluded that, typically, police officers were more risk-averse than the algorithm when it came to risk forecasting.
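An agreement analysis of this kind can be sketched as a simple tally of paired forecasts. The example below is illustrative only. The pairs are invented toy data, the function name is hypothetical, and grouping the cases by the algorithm’s risk band (rather than the officer’s) is our assumption, since the text does not specify which was used.

```python
from collections import defaultdict


def agreement_by_band(pairs):
    """Share of cases in each algorithm risk band where the officer agreed.

    pairs: iterable of (algorithm_band, officer_band) tuples.
    """
    totals = defaultdict(int)
    agreed = defaultdict(int)
    for algorithm_band, officer_band in pairs:
        totals[algorithm_band] += 1
        if algorithm_band == officer_band:
            agreed[algorithm_band] += 1
    return {band: agreed[band] / totals[band] for band in totals}


# Invented toy data: (algorithm forecast, officer forecast).
paired_forecasts = [
    ("low", "low"), ("low", "moderate"),
    ("moderate", "moderate"), ("moderate", "moderate"), ("moderate", "high"),
    ("high", "high"), ("high", "moderate"), ("high", "moderate"),
    ("high", "moderate"), ("high", "moderate"),
]

rates = agreement_by_band(paired_forecasts)
for band in ("low", "moderate", "high"):
    print(f"{band}: agreement = {rates[band]:.0%}")
```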
4.6.4 Criticisms and Problems
Promising though the use of random forests for criminal justice and even domestic abuse forecasts may appear to be, as a branch of actuarial instruments, and in particular as a machine learning technique, the method is subject to the same criticisms and problems as other instruments of the same type. At the superficial level, these can manifest in popular media as ‘big brother’ issues, in which a person’s hidden data play a disproportionate role in determining the consequences of their actions or the services available to them. In particular, critics have focused on the ethical and discriminatory aspects of particular variables, as was especially the case in response
to the Durham HART model (Urwin 2016; Baraniuk 2017; Liberty 2019). As we propose a replication of random forest forecasting, it is pertinent to understand the main criticisms against and problems of actuarial criminal justice forecasting instruments identified by scholars and commentators thus far. Perhaps the most comprehensive recent discussion of issues facing actuarial instruments was presented by Gottfredson and Moriarty (2006), who argued that the promise of such tools was as yet unrealised because key assumptions were being ignored or contradicted. Reviewing the original work of Gottfredson and Gottfredson (1986), the authors summarised the main issues surrounding implementation of actuarial tools as (1) the use of unreliable data in tools, (2) failure to consider the base rate (base rate meaning the rate at which the outcomes to be forecast occur within the population), and (3) the incorrect application of weighting factors. They also highlighted a number of potential methodological concerns which we return to in Chap. 8. These include the establishment of a cross-validation sample, separate from the training dataset, the selection of appropriate measures of predictive accuracy, the consideration of static and dynamic variables and the inclusion of ‘administrative overrides’ for practical and ethical considerations. The issues raised by Gottfredson and Moriarty have been frequently touched upon in publications concerned with criminal justice forecasts and random forests in particular (see Berk 2008, 2012; Berk et al. 2009; Berk and Bleich 2013). Berk and Hyatt (2015) brought together many of the ideas from those papers in their response to ‘misinformed views’ regarding actuarial models. 
They focused on five main criticisms: the legitimacy of actuarial instruments, insufficient levels of predictive accuracy, the double counting of predictor variables, the inability of models to be dynamic and respond to changing circumstances, and the potential for the introduction or consolidation of racial biases in decision-making. The authors concluded that clinical models, although appealingly simple to implement, also suffer from these issues, and drew attention to strategies to overcome each problem in actuarial models. A primary issue of concern, specifically with random forests, is the ‘black box’ nature of the procedure, which refers to the ‘unknowable’ aspect of its calculations. In practice, a random forests algorithm calculates so many decision points (sometimes millions) that it is practically impossible to audit each forecast. This can leave practitioners uncomfortable, and critics claim that police do not really understand the decisions they are making (Berk and Hyatt 2015; Berk et al. 2016). The size of the calculations can also create practical implementation issues with securing the appropriate computer processing power or specialist software required to perform a forecast in a timeframe that does not inhibit police personnel from carrying out their duties (Barnes and Hyatt 2012). This latter issue has been highlighted as one of the key considerations for law enforcement agencies when attempting to integrate machine learning techniques into their practices (Ridgeway 2013).
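The base rate problem raised by Gottfredson and Moriarty can be made concrete with a small calculation. When the outcome is rare, a forecast that never predicts the event looks highly accurate overall while identifying none of the cases that matter. The sketch below uses the test sample size of 1000 and the marginal probability of 0.03 mentioned earlier for Berk et al. (2006) purely as illustrative inputs; the function name is our own.

```python
def always_negative_performance(n_cases: int, base_rate: float):
    """Accuracy and sensitivity of a forecast that always says 'no event'."""
    positives = round(n_cases * base_rate)   # cases where the event occurs
    true_negatives = n_cases - positives     # all non-events are 'correct'
    accuracy = true_negatives / n_cases
    # No event is ever forecast, so none of the actual events is caught.
    sensitivity = 0.0 if positives else None
    return accuracy, sensitivity


accuracy, sensitivity = always_negative_performance(n_cases=1000, base_rate=0.03)
print(f"accuracy    = {accuracy:.0%}")
print(f"sensitivity = {sensitivity:.0%}")
```

Overall accuracy alone is therefore a poor yardstick for rare outcomes, which is one reason the studies above report errors separately for each outcome class.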
4.7 A Summary of the Evidence
The literature we have summarised in this chapter presents a range of evidence for the existence of repeat domestic abuse as an important aspect of domestic abuse. Yet there is relatively little evidence for the trend of chronological escalation in severity. There are also a number of gaps in our understanding of serial perpetrators which present a major obstruction to professional aims to target that group. Foremost among these obstructions is the lack of an agreed definition, but this is relatively simple to propose. More pressing is the need to obtain a more robust estimate of the prevalence of serial offenders, and then to describe and understand this subset of domestic abusers, particularly in comparison to other groups. Practitioners would likely find any additions to the evidence base in these respects helpful to their efforts to develop programmes targeting domestic abuse perpetrators. The literature indicates that serial perpetrators are a distinct group, albeit not in the majority. There is little to no understanding, though, whatever the size of the group, of their relative harmfulness. Calls for a register of these individuals, by means of which they would be tracked for life, reflect a perception that the group would consist of dangerous ‘predators’, but the potential impact of such a register has not yet been properly established. Assessments of future dangerousness in criminal justice settings have been discussed for decades, with many iterations of the task evolving. Currently, forecasting instruments can be categorised into three classifications: clinical, actuarial, or structured professional judgement, with the last of these being the most commonly used in contemporary domestic abuse practice.
In both clinical and structured professional judgement tools, heuristics are particularly important influencers of forecasting outcomes and one of the reasons that many scholars argue that actuarial instruments could improve on them. In England and Wales in particular, evidence shows that there is inconsistent application of the present tools, and there is no evidence at all regarding their predictive validity with reference to future dangerousness. Modern statistical techniques, combined with the large datasets now available to police agencies, open the possibility of using machine learning techniques such as random forests. Several studies, predominantly involving Professor Richard Berk and law enforcement agencies in the USA, have successfully established forecasting tools based on this model, but so far only one study has examined domestic abuse forecasts (Berk et al. 2016), and only one study has examined the use of random forests in England and Wales (Urwin 2016). A number of issues exist which the implementation of machine-learning-based actuarial instruments must address to stand a chance of being successful. These include establishing a clear framework for legitimacy and legality, including an assessment of ethics, thorough validation, consideration of base rates and appropriate IT design to enable practical use by frontline practitioners.
References
Ægisdóttir, S., White, M. J., Spengler, P. M., Maugherman, A. S., Anderson, L. A., Cook, R. S., Nichols, C. N., Lampropoulos, G. K., Walker, B. S., Cohen, G., & Rush, J. D. (2006). The meta-analysis of clinical judgment project: Fifty-six years of accumulated research on clinical versus statistical prediction. The Counseling Psychologist, 34(3), 341–382.
Aldarondo, E., & Mederos, F. (Eds.). (2002). Programs for men who batter: Intervention and prevention strategies in a diverse society. Kingston: Civic Research Institute Inc.
Anderson, M. A., Gillig, P. M., Sitaker, M., McCloskey, K., Malloy, K., & Grigsby, N. (2003). “Why doesn’t she just leave?” A descriptive study of victim reported impediments to her safety. Journal of Family Violence, 18, 151–155.
Baraniuk, C. (2017). Durham Police AI to help with custody decisions. [Online] bbc.co.uk. Available at https://www.bbc.co.uk/news/technology-39857645. Accessed 19th Feb 2019.
Barnes, G., & Hyatt, J. M. (2012). Classifying adult probationers by forecasting future offending. Washington, DC: National Institute of Justice.
Barnham, L., Barnes, G. C., & Sherman, L. W. (2017). Targeting escalation of intimate partner violence: Evidence from 52,000 offenders. Cambridge Journal of Evidence-Based Policing, 1, 1–27.
Barnish, M. (2004). Domestic violence: A literature review: Summary. London: HM Inspectorate of Probation.
Berk, R. A. (2008). Statistical learning from a regression perspective (Vol. 14). New York: Springer.
Berk, R. (2012). Criminal justice forecasts of risk: A machine learning approach. New York: Springer.
Berk, R. A., & Bleich, J. (2013). Statistical procedures for forecasting criminal behavior: A comparative assessment. Criminology & Public Policy, 12, 513.
Berk, R., & Hyatt, J. (2015). Machine learning forecasts of risk to inform sentencing decisions. Federal Sentencing Reporter, 27(4), 222–228.
Berk, R. A., He, Y., & Sorenson, S. B. (2005).
Developing a practical forecasting screener for domestic violence incidents. Evaluation Review, 29(4), 358–383.
Berk, R. A., Kriegler, B., & Baek, J. H. (2006). Forecasting dangerous inmate misconduct: An application of ensemble statistical procedures. Journal of Quantitative Criminology, 22(2), 131–145.
Berk, R., Sherman, L., Barnes, G., Kurtz, E., & Ahlman, L. (2009). Forecasting murder within a population of probationers and parolees: A high stakes application of statistical learning. Journal of the Royal Statistical Society: Series A (Statistics in Society), 172(1), 191–211.
Berk, R. A., Sorenson, S. B., & Barnes, G. (2016). Forecasting domestic violence: A machine learning approach to help inform arraignment decisions. Journal of Empirical Legal Studies, 13(1), 94–115.
Best, J., & Luckenbill, D. F. (1996). Careers in deviance and respectability. In D. F. Greenberg (Ed.), Criminal careers (pp. 3–14). Brookfield: Dartmouth.
Bland, M., & Ariel, B. (2015). Targeting escalation in reported domestic abuse: Evidence from 36,000 callouts. International Criminal Justice Review, 25(1), 30–53. https://doi.org/10.1177/1057567715574382.
Bocko, S., Cicchetti, C., Lempicki, L., & Powell, A. (2004). Restraining order violators, corrective programming and recidivism. Boston: Office of the Commissioner of Probation.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.
Brisson, N. J. (1981). Battering husbands: A survey of abusive men. Victimology: An International Journal, 6, 338–344.
Burgess, E. W. (1928). Factors determining success or failure on parole, Part IV of A.A. Bruce et al., The workings of the indeterminate sentence law and the Parole system in Illinois. Springfield: The Board of Parole.
Burgess, M. (2018). UK police are using AI to inform custodial decisions – But it could be discriminating against the poor. [Online] Available at: https://www.wired.co.uk/article/police-aiuk-durham-hart-checkpoint-algorithm-edit. Accessed 15th Jan 2019.
Buzawa, E., Hotaling, G., Klein, A., & Byrnes, J. (1999). Response to domestic violence in a pro-active court setting, final report. Washington, DC: US Department of Justice.
Campbell, J. C. (1995). Assessing dangerousness: Violence by sexual offenders, batterers, and child abusers. Newbury Park: Sage.
Campbell, J. C. (2005). Assessing dangerousness in domestic violence cases: History, challenges, and opportunities. Criminology & Public Policy, 4(4), 653–672.
Campbell, J. C. (2007). Prediction of homicide of and by battered women. Assessing dangerousness: Violence by batterers and child abusers, 2.
Campbell, J. C., Glass, N., Sharps, P. W., Laughon, K., & Bloom, T. (2007). Intimate partner homicide: Review and implications of research and policy. Trauma, Violence & Abuse, 8(3), 246–269.
Cantos, A. L., & O’Leary, K. D. (2014). One size does not fit all in treatment of intimate partner violence. Partner Abuse, 5(2), 204–236.
Cantos, A. L., Goldstein, D. A., Brenner, L., O’Leary, K. D., & Verborg, R. (2015). Correlates and program completion of family only and generally violent perpetrators of intimate partner violence. Behavioural Psychology/Psicologia Conductual, 23(3).
Cavanaugh, M. M., & Gelles, R. J. (2005). The utility of male domestic violence offender typologies: New directions for research, policy, and practice. Journal of Interpersonal Violence, 20(2), 155–166.
Chalkley, R., & Strang, H. (2017). Predicting domestic homicides and serious violence in Dorset: A replication of Thornton’s Thames Valley analysis. Cambridge Journal of Evidence-Based Policing, 1(2–3), 81–92.
Chambers-McClellan, A. (2002). Evidence for the escalation of domestic violence in 911 call records.
Doctoral dissertation, Medical College of Georgia.
Crawford, M., & Gartner, R. (1992). Women killing: Intimate femicide in Ontario, 1974–1990. Toronto: Women’s Directorate, Ministry of Social Services.
Dixon, L., & Browne, K. (2003). The heterogeneity of spouse abuse: A review. Aggression and Violent Behavior, 8(1), 107–130.
Dutton, D. G., & Kerry, G. (2002). Modus operandi and personality disorders in incarcerated spousal killers. Journal of Psychiatric Practice, 8(4), 216–228.
Edelstein, A. (2016). Rethinking conceptual definitions of the criminal career and serial criminality. Trauma, Violence & Abuse, 17(1), 62–71.
Egger, S. A. (1985). An analysis of the serial murder phenomenon and the law enforcement response. PhD dissertation, Sam Houston State University.
Elbow, M. (1977). Theoretical considerations of violent marriages. Social Casework, 58(9), 515–526.
Elisha, E., Idisis, Y., Timor, U., & Addad, M. (2010). Typology of intimate partner homicide: Personal, interpersonal, and environmental characteristics of men who murdered their female intimate partner. International Journal of Offender Therapy and Comparative Criminology, 54(4), 494–516.
Feld, S. L., & Straus, M. A. (1989). Escalation and desistance of wife assault in marriage. Criminology, 27(1), 141–162.
Feld, S. L., & Straus, M. A. (1990). Escalation and desistance of wife assault in marriage. In M. A. Straus & R. J. Gelles edited with the assistance of C. Smith (Eds.), Physical violence in American families: Risk factors and adaptations to violence in 8,145 families (pp. 489–505). New Brunswick: Transaction Publishers.
Felson, R. B., & Paré, P. P. (2005). The reporting of domestic violence and sexual assault by nonstrangers to the police. Journal of Marriage and Family, 67(3), 597–610.
Felson, R., Ackerman, J., & Gallagher, C. (2005). Police intervention and the repeat of domestic assault. Criminology, 43(3), 563–588.
Frieze, I. H., & Browne, A. (1989). Violence in marriage. In L. E. Ohlin & M. H.
Tonry (Eds.), Family violence. Chicago: University of Chicago Press.
Giles-Sims, J. (1983). Wife battering: A systems theory approach. New York: Guilford Press.
Gondolf, E. W. (1988). Who are these guys? Toward a behavioural typology of batterers. Violence and Victims, 3, 187–203.
Gondolf, E. W. (1998). The victims of court-ordered batterers: Their victimization, helpseeking, and perceptions. Violence Against Women, 4(6), 659–676.
Gottfredson, S. D., & Moriarty, L. J. (2006). Statistical risk assessment: Old problems and new applications. Crime and Delinquency, 52(1), 178–200.
Gottman, J. M., Jacobson, N. S., Rushe, R. H., Shortt, J. W., Babcock, J., La Taillade, J. J., & Waltz, J. (1995). The relationship between heart rate reactivity, emotionally aggressive behaviour and general violence in batterers. Journal of Family Psychology, 9(3), 227–248.
Goussinsky, R., & Yassour-Borochowitz, D. (2012). “I killed her, but I never laid a finger on her” – A phenomenological difference between wife-killing and wife-battering. Aggression and Violent Behavior, 17(6), 553–564.
Gracia, E. (2004). Unreported cases of domestic violence against women: Towards an epidemiology of social silence, tolerance, and inhibition. Journal of Epidemiology and Community Health, 58(7), 536–537.
Grogger, J., Ivandic, R., & Kirchmaier, T. (2020, February). Comparing conventional and machine-learning approaches to risk assessment in domestic abuse cases (CEP Discussion Paper No 1676).
Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. (2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12(1), 19.
Hamberger, L. K., & Hastings, J. E. (1988). Skills training for treatment of spouse abusers: An outcome study. Journal of Family Violence, 3(2), 121–130.
Hamberger, L. K., Lohr, J. M., Bonge, D., & Tolin, D. F. (1996). A large sample empirical typology of male spouse abusers and its relationship to dimensions of abuse. Violence and Victims, 11, 277–292.
Harcourt, B. E. (2014).
Risk as a proxy for race: The dangers of risk assessment. Federal Sentencing Reporter, 27, 237.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Prediction, inference and data mining. New York: Springer.
Hazelwood, R., & Burgess, A. W. (1987). An introduction to the serial rapist: Research by the FBI. FBI Law Enforcement Bulletin, 56(9), 16–24.
Her Majesty’s Inspectorate of the Constabulary, Fire and Rescue Services. (2014a). Everyone’s business: Improving the police response to domestic violence. [Online] Retrieved from https://www.justiceinspectorates.gov.uk/hmicfrs/wp-content/uploads/2014/04/improving-the-policeresponse-to-domestic-abuse.pdf [accessed 15th October 2016].
Her Majesty’s Inspectorate of the Constabulary, Fire and Rescue Services. (2015). Increasingly everyone’s business: A progress report on the police response to domestic abuse. [Online] https://www.justiceinspectorates.gov.uk/hmicfrs/publications/increasingly-everyones-business-a-progress-report-on-the-police-response-todomestic-abuse/ [accessed 7th May 2017].
Hester, M. (2013). Who does what to whom? Gender and domestic violence perpetrators in English police records. European Journal of Criminology, 10(5), 623–637.
Hester, M., & Westmarland, N. (2006). Domestic violence perpetrators. Criminal Justice Matters, 66(1), 34–35.
Holmes, R. M., & DeBurger, J. E. (1998). Profiles in terror: The serial murderer. In R. M. Holmes & S. T. Holmes (Eds.), Contemporary perspectives on serial murder (pp. 5–16). Thousand Oaks: Sage.
Holtzworth-Munroe, A., & Stuart, G. L. (1994). Typologies of male batterers: Three subtypes and the differences among them. Psychological Bulletin, 116(3), 476.
Jacobson, N. S., & Gottman, J. M. (1998). When men batter women: New insights into ending abusive relationships. Simon and Schuster.
Johnson, M. P. (1995). Patriarchal terrorism and common couple violence: Two forms of violence against women.
Journal of Marriage and the Family, 57, 283–294.
Johnson, M. P. (2006). Conflict and control: Gender symmetry and asymmetry in domestic violence. Violence Against Women, 12, 1003–1018.
Johnson, M. P., & Ferraro, K. J. (2000). Research on domestic violence in the 1990s: Making distinctions. Journal of Marriage and Family, 62(4), 948–963.
Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255–260.
Kahneman, D. (2011). Thinking, fast and slow. New York: Macmillan.
Kahneman, D., & Klein, G. (2009). Conditions for intuitive expertise: A failure to disagree. American Psychologist, 64(6), 515.
Kasturirangan, A., Krishnan, S., & Riger, S. (2004). The impact of culture and minority status on women’s experience of domestic violence. Trauma, Violence & Abuse, 5(4), 318–332.
Kerr, J., Whyte, C., & Strang, H. (2017). Targeting escalation and harm in intimate partner violence: Evidence from Northern Territory Police, Australia. Cambridge Journal of Evidence-Based Policing, 1, 1–17.
Klein, A. (1996). Reabuse in a population of court restrained male batterers. In E. Buzawa & C. Buzawa (Eds.), Do arrest and restraining orders work? (pp. 192–214). Thousand Oaks: Sage.
Klein, A., & Tobin, T. (2008). Longitudinal study of arrested batterers, 1995–2005: Career criminals. Violence Against Women, 14(2), 136–157.
Klein, A., Wilson, D., Crowe, A., & DeMichele, M. (2005). Evaluation of the Rhode Island probation specialized domestic violence supervision unit [NCJ 222912]. Retrieved from http://www.ncjrs.gov/App/Publications/abstract.aspx?ID=244821
Kocsis, R. N., & Irwin, H. J. (1998). The psychological profile of serial offenders and a redefinition of the misnomer of serial crime. Psychiatry, Psychology and Law, 5(2), 197–213.
Kocsis, R. N., Cooksey, R. W., & Irwin, H. J. (2002). Psychological profiling of offender characteristics from crime behaviors in serial rape offences. International Journal of Offender Therapy and Comparative Criminology, 46(2), 144–169.
Kropp, P. R.
(2004). Some questions regarding spousal assault risk assessment. Violence Against Women, 10(6), 676–697.
Liberty. (2019). Liberty report exposes police forces’ use of discriminatory data to predict crime. [Online]. https://www.libertyhumanrights.org.uk/news/press-releases-and-statements/libertyreport-exposes-police-forces’-use-discriminatory-data-0. Accessed 4th Mar 2019.
Litwack, T. R., & Schlesinger, L. B. (1999). Dangerousness risk assessments: Research, legal, and clinical considerations. In A. Hess & I. Weiner (Eds.), The handbook of forensic psychology (pp. 171–217). New York: Wiley.
Liu, Y. Y., Yang, M., Ramsay, M., Li, X. S., & Coid, J. W. (2011). A comparison of logistic regression, classification and regression tree, and neural networks models in predicting violent re-offending. Journal of Quantitative Criminology, 27(4), 547–573.
Lloyd, S., Farrell, G., & Pease, K. (1994). Preventing repeated domestic violence: A demonstration project on Merseyside. London: Home Office Police Research Group.
Loinaz, I. (2014). Typologies, risk and recidivism in partner-violent men with the B-SAFER: A pilot study. Psychology, Crime & Law, 20(2), 183–198.
Malach-Pines, A. (2002). Falling in love: How we choose the lovers we choose. New York: Taylor and Francis Group.
Maxwell, C. D., Garner, J. H., & Fagan, J. A. (2002). The preventive effects of arrest on intimate partner violence: Research, policy and theory. Criminology & Public Policy, 2(1), 51–80.
Meehl, P. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. Minneapolis: University of Minnesota Press.
Milner, J. S., Campbell, J. C., & Messing, J. T. (2017). Prediction issues for practitioners. In J. C. Campbell & J. T. Messing (Eds.), Assessing dangerousness: Domestic violence offenders and child abusers (pp. 33–54). New York: Springer.
Mintz, E. (1980). Obsession with the rejecting beloved. Psychoanalytic Review, 67, 479–492.
Mitchell, B. A. (1997).
The etiology of serial murder: Towards an integrated model. Cambridge: University of Cambridge.
Moffitt, T. E. (1993). Adolescence-limited and life-course-persistent antisocial behavior: A developmental taxonomy. Psychological Review, 100(4), 674.
Okun, L. (1986). Woman abuse: Facts replacing myths. New York: Albany State University of New York Press.
Oswald, M., Grace, J., Urwin, S., & Barnes, G. C. (2018). Algorithmic risk assessment policing models: Lessons from the Durham HART model and ‘experimental’ proportionality. Information and Communications Technology Law, 27(2), 223–250.
Pagelow, M. D. (1981). Woman-battering: Victims and their experiences. Beverly Hills: Sage.
Petersson, J., & Strand, S. (2017). Recidivism in intimate partner violence among antisocial and family-only perpetrators. Criminal Justice and Behaviour, 44(11), 1477–1495.
Piquero, A. R., Brame, R., Fagan, J., & Moffitt, T. E. (2006). Assessing the offending activity of criminal domestic violence suspects: Offense specialization, escalation, and de-escalation evidence from the Spouse Assault Replication Program. Public Health Reports, 121, 409.
Quinsey, V. L., Harris, G. T., Rice, M. E., & Cormier, C. A. (1998). Violent offenders: Appraising and managing risk. Washington, DC: American Psychological Association.
Richards, L. (2006, Autumn). Homicide prevention: Findings from the multi-agency domestic violence homicide review. The Journal of Homicide and Major Incident Investigation, 2(2). ACPO: Centrex.
Richards, L., Letchford, S., & Stratton, S. (2008). Policing domestic violence. Oxford: Blackstone’s Practical Policing, Oxford University Press.
Ridgeway, G. (2013). Linking prediction and prevention. Criminology & Public Policy, 12, 545.
Robinson, A. L. (2017). Serial domestic abuse in Wales: An exploratory study into its definition, prevalence, correlates, and management. Victims and Offenders, 12(5), 643–662.
Robinson, A. L., Myhill, A., Wire, J., Roberts, J., & Tilley, N. (2016). Risk-led policing of domestic abuse and the DASH risk model. In What works: Crime reduction research.
Cardiff/London: Cardiff University, College of Policing and UCL Department of Security and Crime Science.
Saunders, D. G. (1992). A typology of men who batter women: Three types derived from cluster analysis. American Journal of Orthopsychiatry, 62, 264–275.
Sherman, L. W. (1992). Policing domestic violence: Experiments and dilemmas. New York: Free Press.
Sherman, L. W., & Berk, R. A. (1984). The specific deterrent effects of arrest for domestic assault (American sociological review, pp. 261–272). Ann Arbor: Inter-University Consortium for Political and Social Research.
Sherman, L., Neyroud, P. W., & Neyroud, E. (2016). The Cambridge crime harm index: Measuring total harm from crime based on sentencing guidelines. Policing: A Journal of Policy and Practice, 10(3), 171–183.
Smith, K., Flatley, J., Coleman, K., Osborne, S., Kaiza, P., & Roe, S. (2010). Homicides, firearms offenses and intimate violence 2008/09 (Home Office Statistical Bulletin 01/10). London: Home Office.
Stark, E. (2007). Coercive control: How men entrap women in everyday life. New York: Oxford University Press.
Stoops, C., Bennett, L., & Vincent, N. (2010). Development and predictive ability of a behaviour-based typology of men who batter. Journal of Family Violence, 25(3), 325–335.
Stout, K. D. (1993). Intimate femicide: A study of men who have killed their mates. Journal of Offender Rehabilitation, 19(3–4), 81–94.
Straus, M. A. (1990). Injury and frequency of assault and the “Representative sample fallacy” in measuring wife beating and child abuse. In M. A. Straus & R. J. Gelles edited with the assistance of C. Smith (Eds.), Physical violence in American families: Risk factors and adaptations to violence in 8,145 families (pp. 75–91). New Brunswick: Transaction Publishers.
Svalin, K., Mellgren, C., Torstensson Levander, M., & Levander, S. (2017). The inter-rater reliability of violence risk assessment tools used by police employees in Swedish police settings. Nordisk Politiforskning, 1, 4.
Thornton, S. (2017).
Police attempts to predict domestic murder and serious assaults: Is early warning possible yet? Cambridge Journal of Evidence-Based Policing, 1, 1–17.
Tollenaar, N., & Van der Heijden, P. G. M. (2013). Which method predicts recidivism best?: A comparison of statistical, machine learning and data mining predictive models. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176(2), 565–584.
Turner, E., Medina, J., & Brown, G. (2019). Dashing hopes? The predictive accuracy of domestic abuse risk assessment by the police. The British Journal of Criminology, 59(5), azy074.
Tversky, A., & Kahneman, D. (1975). Judgment under uncertainty: Heuristics and biases. In Utility, probability, and human decision making (pp. 141–162). Dordrecht: Springer Netherlands.
Tweed, R. G., & Dutton, D. G. (1998). A comparison of impulsive and instrumental subgroups of batterers. Violence and Victims, 13(3), 217–230.
Urwin, S. (2016). Algorithmic forecasting of offender dangerousness for police custody officers: An assessment of accuracy for the Durham constabulary model. Master’s thesis. University of Cambridge, Wolfson College.
Walby, S. (2005). Improving the statistics on violence against women. Statistical Journal of the United Nations Economic Commission for Europe, 22(3–4), 193–216.
Walby, S., & Allen, J. (2004). Domestic violence, sexual assault and stalking: Findings from the British Crime Survey. London: Home Office.
Walker, L. E. (1979). The battered woman. New York: Harper and Row.
Walker, L. E. (1984). The battered woman syndrome. New York: Springer.
Wire, J., & Myhill, A. (2018). Piloting a new approach to domestic abuse frontline risk assessment. Evaluation Report for the College of Policing [Online] https://whatworks.college.police.uk/Research/Documents/DA_risk_assessment_pilot.pdf. Accessed 4th Mar 2019.
Yang, M., Liu, Y., & Coid, J. (2010). Applying neural networks and other statistical models to the classification of serious offenders and the prediction of recidivism. Ministry of Justice Research Series 6/10.
Chapter 5
Measuring Harm
5.1 What Is Harm and How Is It Measured?
Meaningful analysis of domestic abuse aimed at improving targeting strategies needs to differentiate between crimes based on their relative ‘harm’. In a criminological context ‘harm’ is traditionally described as an emotional, psychological, financial, societal or physical impact (see Adler 2001 or Sparrow 2008 for examples of discussions about the definition of harm in the context of crime). Accordingly, a number of harm measurement frameworks and tools have emerged in the last three decades. The underlying premise for these instruments is this: not all crimes are the same and treating them as such skews analysis and interpretation of policing issues, leading to the misallocation of resources and false negative results in intervention evaluations (Sherman 2007, 2013; Sherman et al. 2016a). Any promise that police data holds is undermined in the absence of a harm measurement instrument or with the selection of an inappropriate one.
We have indicated the need for such a tool in the analysis of domestic abuse in earlier chapters. Not all domestic abuse is the same; within the high volume are smaller numbers of severe and serious cases (Bland and Ariel 2015; Barnham et al. 2017; Kerr et al. 2017; ONS 2017, 2018), yet as explored in Chap. 2, the police response, marked by reduced capacity, invests relatively high resources in all cases. Analysing only aggregated trends in police data will offer at best only a limited remedy to this problem, whereas viewing the data through a lens which differentiates by harm could be the key to answering many research questions of substantial practical value. If we can filter the most harmful cases, we stand a better chance of understanding them, designing treatments for them, and possibly even forecasting them before they become harmful. Such a cohort may also offer the best potential for detecting effects in future domestic abuse experiments (see Sherman 2007, regarding the promise of ‘the power few’).
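The difference between counting crimes and weighting them by harm can be sketched in a few lines of code. The example below is a minimal illustration under stated assumptions: the offence labels, the weights and the records are all hypothetical, and the weights are not taken from any published harm index. It ranks the same four cases first by the number of recorded crimes and then by a harm-weighted total.

```python
from collections import Counter

# Hypothetical harm weights per offence type (illustrative values only,
# not taken from any published harm index).
HARM_WEIGHTS = {
    "common_assault": 1,
    "harassment": 10,
    "grievous_bodily_harm": 1500,
}

# Hypothetical police records: (case identifier, offence type).
RECORDS = [
    ("A", "common_assault"), ("A", "common_assault"), ("A", "harassment"),
    ("B", "common_assault"),
    ("C", "grievous_bodily_harm"),
    ("D", "harassment"), ("D", "common_assault"),
]


def count_by_case(records):
    """Simple aggregated count: every crime counts as one."""
    return Counter(case for case, _ in records)


def harm_by_case(records, weights):
    """Harm-weighted total: each crime contributes its offence weight."""
    totals = Counter()
    for case, offence in records:
        totals[case] += weights[offence]
    return totals


counts = count_by_case(RECORDS)
harm = harm_by_case(RECORDS, HARM_WEIGHTS)

top_by_count = counts.most_common(1)[0][0]
top_by_harm = harm.most_common(1)[0][0]
harm_share = harm[top_by_harm] / sum(harm.values())

print("Highest by count:", top_by_count)
print("Highest by harm:", top_by_harm)
print(f"Share of total harm in the top case: {harm_share:.0%}")
```

In this invented data the case with the most recorded crimes is not the case with the most harm, and a single case accounts for almost all of the weighted total. The pattern is built into the example, but it shows the kind of concentration that a count-based view cannot reveal.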
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 M. P. Bland, B. Ariel, Targeting Domestic Abuse with Police Data, https://doi.org/10.1007/978-3-030-54843-8_5
For over two decades, criminologists have debated the concept of the measurement of harm, and its potential role at the heart of anti-crime strategies. In recent years the debate has shifted into the professional arena and is now characterised by policing strategies targeting ‘vulnerability’ (College of Policing 2018; HMICFRS 2015) as well as calls from senior government officials for the adoption of ‘public health models’ to deal with specific issues such as knife crime (see The Independent 2018). At the core of these debates is the definition of ‘harm’ and its practical dependencies, such as how it is measured and translated into practice. These are complex issues that do not currently have wholly satisfactory answers, with the debate around the concept of harm ranging across a wide spectrum of views.
At one extreme, some critical criminologists challenge the very nature of our understanding of crime, claiming that it has no substantive ontology (Hillyard and Tombs 2007). This argument centres on the proposal that ‘harm’ is a more appropriate target for service delivery and research than the restrictive notion of crime. It suggests that the majority of police-recorded crime is ‘petty’ and inconsequential in terms of harm, while many of the more serious harms in society are omitted (Box 1983). This end of the harm debate spectrum is highly theoretical, to the point of impracticality, making it so obscure to practitioners as to be virtually invisible. This is not to say that it is invalid, but in terms of identifying a lens of harm with which we may realise the potential power of police data, it is hardly helpful.
However, this is not the full extent of scholarly discourse on harm and its potential influence on law enforcement activities. Sparrow (2008) specifically focuses on the measurement of harm at a practical level, drawing on invented scenarios to illustrate his points. One of Sparrow’s central recommendations is to ‘pick important problems and deal with them’ (Sparrow 2008, p. 5), which is a variation on the theme of Sherman’s evidence-based targeting (Sherman 2013). Sparrow notes that successful practitioners take time to understand where risk (as a construct of potential harm) is concentrated. The missing link, to which Sparrow dedicates a considerable number of pages, is the role of analysis in the quantification and subsequent definition of harm problems. Drawing on Goldstein (1990), Sparrow cogently argues for a focus on the ‘middle-layer’ of issues – that which lies between the response to individual incidents and thematic strategies. This is precisely the domain of this research and explains why an entire chapter is devoted to the identification of the right instrument with which to measure harm.
This chapter chronologically details the development of harm-measurement tools in the last three decades, paying specific attention to the methodology of their development. In this respect, the tests summarised in Sherman et al. (2016a) are used for the purposes of benchmarking. These tests are designed on the basis that, to be practically effective, any harm-measurement instrument must satisfy the following criteria:
1. The democracy test: Does the instrument satisfy conflicting arguments via democratic means?
2. The reliability test: Can the measure be consistently applied to different units of analysis, remaining consistent over time?
3. The cost test: Is the instrument available at no or low cost?
In this chapter, the third test is used as a proxy measure for ‘practicability’ by assuming that, if the tool is available at no or low cost, then it must be easy to implement. Our concern here is not for the difficulty of this research, but for its potential replicability among practitioners. Bland and Ariel (2015) utilised the Cambridge Crime Harm Index (CCHI), for which these three tests were conceived (see Sherman et al. 2016a), but for which no thorough selection process was undertaken. Since then, other tools have emerged to complement or compete with the CCHI and the tools which preceded it, and so here we will critically and objectively evaluate each of the possible options and draw a conclusion regarding which harm-measurement instrument to proceed with in the subsequent analysis.
5.1.1 Review of Harm Measurement Tools
Before considering proposals for specific instruments, it is worthwhile briefly reviewing the history of the development of such tools to better refine the method of selection. Four broad categories of ‘crime harm’ measurement tool have emerged in criminological research: (1) public perception–based indices; (2) cost of crime indices; (3) sentencing-weighted indices; and (4) theoretical constructs. In this section, we will review the development of each, beginning with the seminal work of Sellin and Wolfgang (1964). At the outset of this brief history, we must emphasise the fact that there is no consistent definition of ‘harm’ shared by these tools because none has been agreed on in previous research. Many of the tools examined purport to measure ‘severity’ or ‘crime seriousness’, and for the purpose of this research we treat these as proxy terms for harm on the following logic: if crime X is more serious or severe than crime Y, then it follows that X is more harmful. This logic necessitates the precondition that the tool uses the concept of ‘harm’ as a key influencing factor in the determination of seriousness, and specific attention is paid to this in the evaluation of the specific tools.
5.1.1.1 Public Perception–Based Tools
In 1931, Thorsten Sellin identified that aggregating simple counts of crime was a poor measure of criminality:
    Criminal statistics have not yet reached a uniformly high stage of development, however, and this in part accounts for the frequency with which they are abused. (Sellin 1931, p. 10)
It was not for another 33 years, however, that Sellin and his colleague Marvin Wolfgang would develop the first major instrument to differentiate between the seriousness of different forms of crime by an empirical mechanism and present it as a supplement to simple aggregated counts (Sellin and Wolfgang 1964). Their method was to sample students, judges and police officers, asking them to rate the severity of 141 different criminal scenarios which mirrored the Federal Bureau of Investigation’s Uniform Crime Reports, the prevailing measure of crime seriousness in the USA at the time. Sellin and Wolfgang’s framework comprised two major classes of crime, subdivided into subclasses (see Table 5.1).
Table 5.1 Sellin and Wolfgang’s (1964) severity typology
Class I
  A  Bodily injury
  B  Property theft
  C  Property damage
Class II
  D  Intimidation with threat of violence
  E  Intimidation with threat of damage
  F  Primary victimisation (to a person)
  G  Secondary victimisation (to a business or organisation)
  H  Tertiary victimisation (e.g., to the state or community)
  I  Mutual victimisation (e.g., adultery, other consensual illegal acts)
  J  No victimisation (juvenile offences)
For each of their 141 scenarios classified within these codes, Sellin and Wolfgang developed vignettes. These were presented to their participants, who were asked to rate them. But their vignettes were the source of much of the early criticism of Sellin and Wolfgang’s work. Rose (1966) highlighted that the vignettes were often inconsistently presented, and that the individual characteristics of victims or perpetrators were vital to the perception of seriousness because of the role of public stereotyping. This early critique drove right to the heart of a crucial matter in severity measurement: the issue of subjectivity in the perception of seriousness. This issue has been the subject of much scholarly debate throughout the years since Sellin and Wolfgang’s work and agreement remains elusive (see Cohen 1988; O’Connell and Whelan 1996; Rossi et al. 1985; Stylianou 2003; Sherman et al. 2016a for examples of this debate). In Wolfgang et al.’s (1985) follow-up work, in which the National Survey of Crime Severity was developed, the authors rejected Rose’s criticism. However, their rejection appears somewhat selective, being based on a study of 206 students comparing six scenarios which concluded that information about intent and culpability formed a critical component of a person’s judgement of severity (Riedel 1975).
Reliance on a single small-sample experiment for such an important finding is highly questionable, and indeed the study’s conclusion was questioned by Sebba (1984) and later contradicted in a larger experiment conducted in Israel (Fishman et al. 1986). Nonetheless, Sellin and Wolfgang’s work in 1964, and later Wolfgang, Figlio, Tracy and Singer’s work in 1985, became the benchmarks for the measurement of crime severity from the 1970s to the 1990s, spawning a number of replications and spin-offs (Akman and Normandeau 1968; Blumstein 1974; Epperlein and Nienstedt 1989; Fleming 1981; Lynch and Danner 1993; Parton et al. 1991; Rossi et al. 1974). While the findings and theoretical debates may have varied in this body of work, a consistent set of characteristics has emerged which distinguishes this strand of harm measurement tools from the other three we will discuss and continues to pervade the
overall debate. The public perception–based strand of instruments is characterised by three main aspects, all of which are relevant to the potential selection of such a tool for this research: (1) questionnaire design, (2) levels of measurement and (3) ‘additivity’. Stylianou (2003) gives an excellent full description of these, but it is worth summarising the main points here before evaluating the potential use of this type of tool for the purposes of this research.
Firstly, the issue of questionnaire design is fundamental to perception-based instruments. Much prior research has found similarities in the ways in which different groups of individuals conceptualise seriousness (McCleary et al. 1981; Pontell et al. 1985; Rossi et al. 1974). Warr (1989) distilled these factors into a simpler equation: seriousness as a product of perceived harmfulness and perceived wrongfulness. Other researchers already disagreed with this on a fundamental conceptual level (Blum-West 1985; Hansel 1987), and the matter at hand here can be described as follows: if scholars cannot agree on a definition of seriousness, and further still can demonstrate that external factors such as stereotypes may colour perceptions, then how definitive and representative can the output of severity surveys ever be? This is compounded somewhat by the fact that, by definition, this type of measurement instrument draws on probability samples, and in most of the literature to date, most of those samples consist of university students. Therefore, the selection of a tool from the public perception–based strand must carefully consider the structure of the questionnaire on which the output is based, specifically with regard to:
1. the typology of, and balance between, the scenarios presented,
2. the constitution of the population from which the sample is drawn, and
3. the order and presentation of the questions.
The second important consideration is the method by which the instrument measures severity. 
Researchers working with public-perception tools have three primary methods: (1) ordinal/categorical scales, (2) magnitude estimation scales and (3) matched-pairs comparisons. Sellin and Wolfgang (1964) favoured magnitude estimation; the researchers invited participants to judge severity by comparing each scenario to a benchmark scenario (e.g., twice as serious, one hundred times as serious), arguing that this better reflected the construct of the participant panel, and not that of the person designing the measurement method. Their thoughts on this subject have been echoed many times (Bridges and Lisagor 1975; Evans and Scott 1984; Figlio 1975; Rossi and Henry 1980; Wolfgang et al. 1985), yet the method has been challenged by Miethe (1991) and Parton et al. (1991), who cautioned that it would be necessary to train participants in order to ensure the reliability of the measure. Fishman et al. (1986) also remarked on the training and level of competency required by panel members. In the consideration of the appropriate model for this research, the measurement scale is critical. The relative magnitude of harm is an essential component of such a tool if this research is to properly expose trends in harm concentration, escalation and more. This dilemma is best illustrated by means of a simple example. Is a ‘common assault’ domestic crime, wherein the victim is assaulted but sustains no injury, less serious than a ‘grievous bodily harm’ crime in which the victim suffers a serious physical injury? The latter is more serious by our
reckoning – but by how much? Is it twice as serious? Ten times as serious? A hundred times as serious? This detail matters in practice: how many common assault crimes constitute the harm of one grievous bodily harm crime?
This leads us to the third important consideration among perception-based tools, which Stylianou (2003) labels ‘additivity’. Sellin and Wolfgang’s original premise (1964) was that two crimes of the same kind, committed either repeatedly or at the same time, were empirically equivalent to separate instances of the crime (e.g., committed by different people). Though this view was supported by Wellford and Wiatrowski (1975), it was challenged to varying degrees by Pease et al. (1974), Wagner and Pease (1978) and Gottfredson et al. (1980) on the basis of interactive considerations on the part of panel participants. Ignatans and Pease (2015), in developing a proposal for a UK harm index based on victims’ perceptions of seriousness as reported in the Crime Survey of England and Wales, associated ‘additivity’ with people’s judgements of seriousness, harm, and culpability. They argued that, in surveys which sample both single and chronic victims, the relative seriousness is factored into the output weighting. This is another key test in selecting the right tool for this research; while individual classification of harm is desirable (taking into account each individual circumstance of the victim and offender), it is unlikely to be practicable. Therefore, an assumption of broad ‘additivity’ is important to keep in mind when selecting the tool to ensure consistency of measurement and practicability (two of the three key tests set out by Sherman et al. 2016a).
5.1.1.2 Economic Harm–Based Tools
A second strand of harm metric tools is composed of those which classify harm according to economic harm or financial cost. 
While cost is one aspect of severity considered by many of the public perception–based tools discussed in the previous subsection, this strand can be distinguished by the use of currency values as the output harm metric; these tools literally put dollars and cents or pounds and pence forward as the denominator of harm. The depth of previous research covering this strand of tools is somewhat lighter than for perception-based tools. The most relevant models of interest to this research took their cues from a modest range of research from the last century. Cohen (1988) first introduced the broad concept of supplementing ‘costs incurred’ estimates with information about ‘pain, suffering, and fear caused by crime’ (Cohen 1988, p. 1). Cohen, whose research focused on the United States, observed that, up to the point of his publication, most efforts to measure the cost of crime had focused on actual financial costs. In this respect, earlier efforts cannot be considered as even proxy measurements of harm. Subsequent work involving Cohen concluded that adding the costs of pain, suffering, and fear more than quadrupled the cost of crime (Miller et al. 1996), with the increase being attributed primarily to violent crime. Subsequently, other cost models were developed in France (Palle and Godefroy 2000) and Australia (Walker 1997).
In England and Wales, cost-of-crime models took a leap forward with practitioners after the Home Office published a research paper detailing a thorough costing model to be used in performance management (Brand and Price 2000). The Home Office’s tool estimated costs related to the anticipation of, response to, and consequences of crime, within which values were assigned for emotional and physical impact. The tool was configured around the framework of notifiable offences in England and Wales at the time, grouping together categories into eight classifications. The cost estimates themselves were the product of a complex combination of victims’ surveys, commercial surveys and industrial estimates. By the authors’ own admission, the methodology for developing the costs of emotional and physical impacts required improvement. The Brand and Price methodology used public perception of the costs a person would be willing to incur to avoid a road traffic accident as a proxy for the impact of crime. This proxy was clearly unsuitable, and it was addressed in a subsequent Home Office paper (Dubourg et al. 2005), which assessed a range of surveys to identify the prevalence and severity of health conditions emanating from crimes, then transposed these to estimated losses of quality-adjusted life years. This study gained some traction among researchers and professionals in subsequent years (Ignatans and Pease 2015; Welsh et al. 2015). Despite this, the tool was not updated for more than a decade and became obsolete in the absence of official inflation adjustment. The use of cost-based models has been criticised by some as having low practical utility for practitioners because of the difficulty inherent in assessing a meaningful monetary value for many crimes as well as the need to constantly adjust for inflation (Ratcliffe 2015). The Home Office refreshed the model once again in 2018 (Heeks et al. 2018), concentrating primarily on victim-based crimes. 
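The inflation problem raised against cost-based models can be made concrete with a minimal sketch. It assumes a unit cost estimated in a base year and a price index for the base and target years. The function name, the index values and the cost figure are hypothetical placeholders chosen for illustration; none is taken from Brand and Price (2000), Dubourg et al. (2005) or Heeks et al. (2018).

```python
def uprate_cost(base_cost, index_base_year, index_target_year):
    """Express a unit cost estimated in a base year in target-year prices.

    Hypothetical helper: scales the cost by the ratio of two price index values.
    """
    if index_base_year <= 0:
        raise ValueError("price index must be positive")
    return base_cost * index_target_year / index_base_year


# Hypothetical figures for illustration only (not real estimates or index values).
cost_2005 = 1000.0
price_index = {2005: 100.0, 2015: 125.0}

cost_2015 = uprate_cost(cost_2005, price_index[2005], price_index[2015])
print(cost_2015)  # 1250.0
```

A model published without a maintained index of this kind leaves each analyst to pick their own uprating series, which is one way comparability between studies can be lost.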
5.1.1.3 Sentence-Based Tools
Sentence-based tools weight crimes on the basis of their respective punishments. In this sense, there are two main types of sentence-based tool: those which take their weightings from sentencing guidelines and those which take them from actual sentences imposed. These types of tool are a relatively recent development among criminologists, but they have become popular among analysts and researchers wishing to assess harm (see Barnham et al. 2017; Bland and Ariel 2015; Dudfield et al. 2017; Sherman et al. 2016b). Recent developments can be traced back to Sherman’s call (Sherman 2007) for a mechanism for weighting crimes in order to target experiments at what he called ‘the power few’ – units (whether people or places) to which the greatest harm was attributed. Sherman’s case was stated on the basis that such a cohort may offer the best opportunity for experimental criminology to detect effects in treatments, but harm indices have broader uses too. Two years later, the first such model emerged with the composition of the Canadian Crime Severity Index (Wallace 2009), though it should be pointed out that the catalyst for its development was not Sherman’s 2007 paper but rather a 2004 call from the Police Information and Statistics Committee of the Canadian Association
of Chiefs of Police (CACP), which requested a new method of reporting crime statistics from Statistics Canada, the Canadian equivalent to the UK Office for National Statistics. Their intention in seeking such a tool was different to Sherman’s; they wished to be able to detect changes in crime rates in a more nuanced way than mere aggregated crime statistics could portray, but this difference is somewhat semantic – both Sherman and the CACP sought variations on Sellin and Wolfgang’s original premise: not all crimes are equivalent. In the Canadian Crime Severity Index, crimes recorded by the police are assigned weightings based on the mean sentences given in Canadian courts over the preceding five years (Babyak et al. 2009, 2013). Almost ten years on, the tool has become a mainstream national statistic (see https://www.statcan.gc.ca/eng/sc/video/csi), and annual analyses of Canadian crime rates are presented using it. Sherman restated his argument for a ‘crime harm index’ repeatedly after his original 2007 paper (Sherman 2010, 2011, 2013) and in this time developed processes for what would later become the Cambridge Crime Harm Index (Sherman et al. 2016a). Like the Canadian index, the Cambridge Crime Harm Index is aligned to police-recorded crime classifications, but instead of taking individual weightings from average sentences, it takes them from the minimum sentences set out in guidelines given to judges and magistrates in England and Wales. This method is possible only because such guidelines exist, which is not the case in every country. The development of the Cambridge index was the catalyst for a relative explosion in sentence-based crime indices. 
The ‘Sentencing Gravity Score’ (Ratcliffe 2015) proposed a similar method of taking a cue from guidelines to establish weight, but instead of using the number of days in prison as the unit of output, as do the Cambridge and Canadian indices, used a 14-point scale of severity, derived from scores assigned by the Pennsylvania Commission on Sentencing, which was adopted in 1997. Ratcliffe favoured this method due to its specificity and independence from police input (Ratcliffe 2015). The Sentencing Gravity Score broadly correlates with homicide rates; where the index score was high in Pennsylvania, it followed that the homicide rate was high, although this relationship deteriorated at lower-level geographic units once traffic accidents and other proactive measures were included. In this respect, Ratcliffe’s model departs from Sherman, Neyroud and Neyroud’s in that the latter authors argued for the removal of crimes recorded as a result of proactive police activity (e.g., drug possession crimes) from the index. While Ratcliffe wished to build a model reflective of wider police activity, Sherman et al. were primarily concerned with the differential effects such inputs may have, being largely dependent on the individual proactive capacity or working practices of individual agencies. Both points have merit; the Sentencing Gravity Score was developed specifically for use in one state and tailored to maximise the chances of operationalisation, while the Cambridge Crime Harm Index sought to establish a measure which could be meaningfully transposed across police force boundaries.
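The practical effect of the disagreement over proactively recorded crime can be illustrated with a minimal sketch of a weighted harm total. The offence names, the weights (expressed as days in prison) and the set of offences treated as proactive are all hypothetical placeholders; they are not values from the Cambridge Crime Harm Index, the Sentencing Gravity Score or any other published index.

```python
from collections import Counter

# Hypothetical weights in "days in prison" per offence type (illustration only).
WEIGHTS = {"common_assault": 1.0, "burglary": 20.0, "drug_possession": 2.0}

# Offence types assumed, for this sketch, to be recorded mainly as a result of
# proactive police activity.
PROACTIVE = frozenset({"drug_possession"})


def total_harm(offences, weights, exclude=frozenset()):
    """Sum the weight of every recorded offence, skipping excluded types.

    Raises KeyError if an offence type has no weight, so gaps in the
    weighting table are not silently scored as zero.
    """
    counts = Counter(o for o in offences if o not in exclude)
    return sum(weights[offence] * n for offence, n in counts.items())


records = ["common_assault", "common_assault", "burglary", "drug_possession"]

with_proactive = total_harm(records, WEIGHTS)                # 24.0
without_proactive = total_harm(records, WEIGHTS, PROACTIVE)  # 22.0
print(with_proactive, without_proactive)
```

Under these assumptions the same set of records yields two different totals, and the gap between them grows with an agency's proactive capacity rather than with the harm reported by victims.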
Other models influenced by the development of Sherman et al.’s model more closely mirrored it in respect of output (i.e., scores reflective of the number of days in prison). Notably, replications of the Cambridge Crime Harm Index have evolved in Denmark (Andersen and Mueller-Johnson 2018), Sweden (Rinaldo 2015), California, USA (Mitchell 2016), Australia (House and Neyroud 2018) and New Zealand (Curtis-Ham and Walton 2017). Each of these studies responded to the challenge of a lack of standard sentencing guidelines in their countries of focus in different ways. In New Zealand, average actual sentences were used and applied to all crime types, including those which were the product of proactive policing activities, to enable users of the index to choose those crime classifications which best suited their needs. In Australia, researchers evaluated the possibility of using maximum sentences before rejecting the method because of reduced variability (Kwan 2016, as cited in House and Neyroud 2018). House and Neyroud attempted to survey the judiciary, but in light of a low response rate, opted to use a variation on the average of actual sentences, considering the average sentence of first-time offenders only. This method resembles something of a hybrid between the Canadian and Cambridge models, using real sentencing data but only in cases where lower sentence tariffs are normally applied. In Sweden, researchers had more success in surveying judges (Rinaldo 2015), but in Denmark this method was rejected owing to judges’ lack of specialisation in criminal law. Instead, Andersen and Mueller-Johnson (2018) surveyed Danish prosecutors, asking them to rate 43 crime types and controlling for inter-rater reliability. The most significant development in sentence indices emanating from the Cambridge Crime Harm Index is the publication of the Crime Severity Score by the UK Office for National Statistics (ONS 2016). 
Published initially as an ‘experimental statistic’, the tool was intended by the ONS to complement, rather than replace, aggregated statistics. The Crime Severity Score draws directly on the Cambridge Crime Harm Index in spirit but employs average actual sentences (for a five-year period; December 2011–December 2015) over minimum guideline sentences. In seeking an objective measure, the ONS disregarded sentencing guidelines owing to too many omissions in the full range of crimes recorded by police. The ONS encountered some methodological challenges in certain crimes with low sample sizes, even in a five-year timeframe, and may extend the timeframe in future to address this issue (ONS 2016). The primary output difference to the Cambridge Crime Harm Index is that most offence types are shown to be more serious with the Crime Severity Score (see Ashby 2017) because of the influence of aggravating factors. The calculations for ‘days in prison’ equivalency of community sentences and fines are very similar, though they differ slightly in terms of calculation specifics. The ONS intends to update the index every five years to reflect changes in sentencing values. The publication of the Crime Severity Score brought crime indices to the attention of the mainstream media in the UK for the first time (Shaw 2016; Evans 2016).
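The general idea of weighting by average actual sentences, together with the small-sample difficulty it creates, can be sketched as follows. This is an illustration under stated assumptions and not the ONS or Statistics Canada procedure: the offence labels, the sentence lengths and the minimum-sample threshold are hypothetical, and the conversion of community sentences and fines to days-in-prison equivalents is left out entirely.

```python
from statistics import mean

# Hypothetical threshold below which a weight is flagged as resting on too
# few sentencing outcomes (illustration only).
MIN_SAMPLE = 3


def average_sentence_weights(sentences_by_offence, min_sample=MIN_SAMPLE):
    """Return (weights, low_sample) from lists of sentence lengths in days.

    weights maps each offence type to its mean sentence; low_sample lists
    offence types whose mean is based on fewer than min_sample sentences.
    Offence types with no sentences at all receive no weight.
    """
    weights = {}
    low_sample = []
    for offence, days in sentences_by_offence.items():
        if not days:
            continue
        weights[offence] = mean(days)
        if len(days) < min_sample:
            low_sample.append(offence)
    return weights, sorted(low_sample)


# Hypothetical sentencing records, in days.
sample = {
    "offence_a": [10, 20, 30],
    "offence_b": [100, 300],
    "offence_c": [],
}

weights, flagged = average_sentence_weights(sample)
print(weights)  # {'offence_a': 20, 'offence_b': 200}
print(flagged)  # ['offence_b']
```

Lengthening the reference period, as the ONS suggested it might, amounts in this sketch to adding more entries to each list before the mean is taken.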
5.1.1.4 Theoretical-Framework Tools
The development of theoretical models is confined to a single study. Reflecting on the general progress of harm measurement among the criminological community, Greenfield and Paoli (2013) determined that not much had been done to establish definitive, systematic measurement instruments. While their paper predates the rapid rise in the number of sentence-based indices, it is difficult to challenge Greenfield and Paoli’s central premise. In the absence of such a tool, they proposed their own framework, while recognising ‘major conceptual and technical challenges’ (Greenfield and Paoli 2013, p. 865), based on the subjectivity of defining harm, which they argued was particularly difficult given the infinite nature of the subject, the legitimacy of the source of its measurement, and the extent to which the tool can be quantified and standardised. Their solution to these problems was to develop a highly complex overall model which, in practice, requires the identification of the bearers of harm, and the type of harm inflicted according to a taxonomy based on the work of von Hirsch and Jareborg (1991). It also requires the user to evaluate the severity and incidence of each type of resulting classification. Each of these factors has its own scale, and the positions on these scales determine the position of the type of harm on an overall prioritisation matrix. The authors advised that the determination of the positions on these scales should be made by a panel, with the output being the average of the results. Though this model sets out a comprehensive framework, the challenges associated with its implementation are, in the authors’ own words, ‘daunting’, with the result that it has seen little operationalisation. While it provides a recipe for a theoretically sound tool, it does not provide a prescriptive model that can be fully evaluated against the tests set out, and for this reason it is not examined further.
5.1.2 Assessing Which Tool to Use
From this brief history of harm measurement tools, it is evident that we have three viable options: perception-based tools, economic-based tools and sentence-based tools. To make a thorough assessment of suitability, a viable candidate from each strand should be evaluated, and to this end each option is assessed against the following criteria: (1) Is the tool openly accessible to researchers at no cost? (2) Does the tool apply to the legal context of England and Wales? and (3) Can the tool be practically applied to police-recorded crime datasets? In applying these tests, four tools have been selected for deeper assessment, as summarised in Table 5.2. While it is obvious that tools applying to countries other than England and Wales should ordinarily be discounted, we have included the Canadian Crime Severity Index, the Sentencing Gravity Score and the Severity Typology in this description to illustrate the points on which they fail to meet the viability criteria. It is not impossible to apply harm index outputs across national boundaries in a meaningful way (see Sherman et al. 2016a, b). The problems with these options lie primarily elsewhere. The methodology for the Canadian Crime Severity Index is not widely available, but more importantly, being based on sentences issued by Canadian courts, it is blatantly unsuitable for the purpose of analysing domestic abuse in England and Wales. Ratcliffe’s Sentencing Gravity Score is potentially more palatable as the gradings have been made public, and with fewer of them, the individual values are arguably less important, but the fact remains that it is guidance clearly not applicable to England and Wales. The Severity Typology developed by Sellin and Wolfgang is, by now, too old to have much contemporary relevance, and it too reflects views collected from a population not especially related to twenty-first century England and Wales.
Table 5.2 Viability assessment of harm measurement tools
Tool name | Author(s) | Open access? | Apply to England and Wales? | Apply to police datasets?
Cambridge Crime Harm Index | Sherman et al. (2016a) | ☑ | ☑ | ☑
Canadian Crime Severity Index | Wallace (2009) | ☒ | ☒ | ☒
CSEW Victim Seriousness Judgment | Ignatans and Pease (2015) | ☑ | ☑ | ☑
Home Office Economic and Social Costs of Crime | Heeks et al. (2018) | ☑ | ☑ | ☑
ONS Crime Severity Score | Office for National Statistics (2016) | ☑ | ☑ | ☑
Sentencing Gravity Score | Ratcliffe (2015) | ☑ | ☒ | ☑
Severity Typology | Sellin and Wolfgang (1964) | ☒ | ☒ | ☒
This leaves four clear choices for closer scrutiny: the Cambridge Crime Harm Index, the Crime Survey of England and Wales Seriousness Judgement, the Home Office Economic and Social Cost of Crime model, and the ONS Crime Severity Score. In this subsection, each is analysed in further detail against the criteria set out in Sherman et al. (2016a) and reiterated in Ignatans and Pease (2015) and Curtis-Ham and Walton (2017). To assist the reader in tracing the logic of this analysis, Table 5.3 explains the scales against which each tool is assessed. The following paragraphs assess, in turn, each of the four viable tools against each of these criteria, with a brief recap of the methodology of each tool.
5.1.2.1 Cambridge Crime Harm Index
Methodology
Weights crimes by the number of days in prison (or equivalent) each Home Office crime classification would attract under the minimum sentencing guidelines provided to magistrates and judges, excluding all aggravating and mitigating factors.
Table 5.3 Criteria and scales for assessing harm measurement tools
Resolves conflict democratically
Strong: Offers a clear method for the resolution of conflicting views that is demonstrably democratic
Moderate: Offers a method for the resolution of conflicting views which is clear, yet of questionable or invalid democratic intent
Weak: Offers an opaque, or no method for resolution of conflicting views
Demonstrates reliability
Strong: Output measure can be applied to different units of analysis and remains consistent for >10 years
Moderate: The output measure can either be applied equally to different units of analysis, or remains consistent over time (