English Pages 388 [378] Year 2021
Vergleichende Politikwissenschaft
Oliver Schlenkrich
Origin and Performance of Democracy Profiles
Vergleichende Politikwissenschaft
Series Editors
Steffen Kailitz, Hannah-Arendt-Institut für Totalitarismusforschung, Dresden, Germany
Susanne Pickel, LS für Politikwissenschaft, Universität Duisburg-Essen, Duisburg, Nordrhein-Westfalen, Germany
Claudia Wiesner, Hochschule Fulda, Fulda, Hessen, Germany
The series offers a forum for all who deal, in research and teaching, with topics and questions of all subfields of comparative politics. It is open to contributions from all theoretical and methodological approaches to comparative politics. Theoretically and/or conceptually oriented writings are as welcome as empirically or methodologically oriented ones. In keeping with the international nature of comparative politics, the series sees itself as an international forum for scholarly discourse. Volumes are therefore published in both German and English. The series is published on behalf of the Comparative Politics section of the German Political Science Association (Deutsche Vereinigung für Politikwissenschaft). Each volume in the series is accepted through a peer-review process.
More information about this series at http://www.springer.com/series/13436
Oliver Schlenkrich
University of Würzburg
Berlin, Germany

Ph.D. Thesis (Julius-Maximilians-Universität Würzburg, Graduate School of Law, Economics and Society, 2021). The Ph.D. Thesis was completed as part of the DFG (German Research Foundation) project “Causes of Quality Types and Democracy Profiles: Empirical Findings of the Democracy Matrix” (grant number LA 1210/5-2).
ISSN 2569-8672
ISSN 2569-8702 (electronic)
Vergleichende Politikwissenschaft
ISBN 978-3-658-34879-3
ISBN 978-3-658-34880-9 (eBook)
https://doi.org/10.1007/978-3-658-34880-9

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2021

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Responsible Editor: Stefanie Eggert

This Springer VS imprint is published by the registered company Springer Fachmedien Wiesbaden GmbH part of Springer Nature.
The registered company address is: Abraham-Lincoln-Str. 46, 65189 Wiesbaden, Germany
For Maria, Frederik and Hannes with love and thanks
Acknowledgments
I would like to express my sincere thanks to Prof. Dr. Hans-Joachim Lauth, who made it possible for me to pursue my doctorate and to complete it successfully. I also thank him in particular for the many productive and helpful discussions during our several years of collaboration in the DFG-funded Democracy Matrix project, from which this dissertation emerged. Finally, I thank him for giving me the freedom, and the support, to pursue not only my research interest in the study of democracy but also my other research interests in quantitative methods and Bayesian statistics. All of these research topics flow into this dissertation. It is no exaggeration to say that he has strongly shaped my thinking and my scholarly practice. I thank Prof. Dr. Christiane Gross, who served as second supervisor of this thesis. I thank her for her constructive methodological advice, which not only sharpened the focus of the methodology but, above all, substantially encouraged and motivated me in my approach. I also thank my colleagues, above all Lukas Lemm, Dr. Christoph Mohamad-Klotzbach and Dr. Theresa Stawski, for the friendly and scholarly exchange at the Lehrstuhl für Vergleichende Politikwissenschaft und Systemlehre in Würzburg. Not only was working with them very pleasant, but I was also able to discuss my dissertation topic with them and to gather new ideas and fresh motivation. Very special thanks also go to Ms. Michaela Thoma, who always advised me competently on university administrative matters and always knew what needed to be done and whom to write to or call, and when.
Last, but not least, I thank my dear wife Maria Schlenkrich, who always encouraged me during the doctorate, even when things were not going so well. I also thank my son Frederik for his laughter and for the insight that if you just try something often enough and stay stubborn, it usually works out in the end. I thank my newborn son, Hannes, who made it clear to me that I now had to finish this doctorate as quickly as possible. I also thank my parents, Svetla and Wilhelm Kauff, and my parents-in-law, Cornelia and Uwe Schlenkrich, all of whom showed immense patience and nevertheless did not lose faith in me. Finally, I thank the three editors of the DVPW series Vergleichende Politikwissenschaft, Prof. Dr. Steffen Kailitz, Prof. Dr. Susanne Pickel and Prof. Dr. Claudia Wiesner, for accepting my dissertation into the series. This book was written as part of the DFG project „Ursachen von Qualitätstypen und Demokratieprofilen: empirische Befunde der Demokratiematrix“ (“Causes of Quality Types and Democracy Profiles: Empirical Findings of the Democracy Matrix”, LA 1210/5-2).

Berlin, 17 February 2021
Contents

1 Introduction
  1.1 Introduction to the Introduction
  1.2 Main Goals and Research Question
  1.3 Shortcomings of the Current State of Research about Democracy Models
  1.4 Contribution of this Study to the Research
  1.5 Research Design and Methods
  1.6 Plan of the Book and Summary

Part I: Democracy Profiles

2 Identifying Profiles of Democracy
  2.1 Introduction
  2.2 Democracy Matrix: Trade-Offs and Democracy Profiles
    2.2.1 Democracy Conception and Trade-off-Relationship
    2.2.2 Measurement of Democracy Profiles
  2.3 Research Design: Multiple Imputation and Multi-Step Cluster Analysis
  2.4 Results of the Cluster Analysis
  2.5 Discussion: Temporal and Spatial Development of Democracy Profiles
  2.6 Summary and Conclusion

3 Analyzing the Varieties of Democracy Dataset
  3.1 Introduction
  3.2 Overview of the Varieties of Democracy Dataset
    3.2.1 Themes Covered by the V-Dem Dataset
    3.2.2 V-Dem Measurement Model: Bayesian Item Response Theory
    3.2.3 Research Questions
  3.3 Assessing the Quality of the Varieties of Democracy Dataset
    3.3.1 Conceptual Clarity of the Indicators
    3.3.2 Descriptive Analysis of the Expert Coders
    3.3.3 Empirical Analysis of the Expert Coders
    3.3.4 Multiple Choice and Percentage Variables
    3.3.5 Discussion
  3.4 Summary and Conclusion

Part II: AGIL Typology of Political Performance

4 AGIL Typology of Political Performance
  4.1 Introduction
  4.2 Literature Review
  4.3 The AGIL Typology of Political Performance
    4.3.1 Conceptual Derivation of the AGIL Typology
    4.3.2 Description of each Matrix Field
  4.4 Summary and Conclusion

5 Conceptualizing and Measuring Goal-oriented Performance
  5.1 Introduction
  5.2 Methodological Framework for Conceptualization and Measurement
  5.3 Adaptation: Economic and Environmental Outcomes
    5.3.1 Economic Outcomes (Goal-oriented Performance 1a)
    5.3.2 Environmental Outcomes (Goal-oriented Performance 1b)
  5.4 Goal-Attainment: Reformability of the Political System (Goal-oriented Performance 2)
  5.5 Integration: Social and Domestic Security Outcomes
    5.5.1 Social Outcomes (Goal-oriented Performance 3a)
    5.5.2 Domestic Security Outcomes (Goal-oriented Performance 3b)
  5.6 Latent Pattern Maintenance: Confidence in Institutions (Goal-oriented Performance 4)
  5.7 Summary and Conclusion

6 Aggregating Goal-oriented Performance
  6.1 Introduction
  6.2 Methodological Framework for the Aggregation
    6.2.1 Exploratory Factor Analysis
    6.2.2 Treatment of Missing Values: Multiple Imputation
    6.2.3 Transformation of the Indicators
  6.3 Application of the Aggregation Framework to the Performance Areas
    6.3.1 Economic Performance
    6.3.2 Environmental Performance
    6.3.3 Goal-Attainment Performance
    6.3.4 Social Performance
    6.3.5 Domestic Security Performance
    6.3.6 Latent Pattern Maintenance Performance
  6.4 Summary and Conclusion

7 Describing Goal-Oriented Performance
  7.1 Introduction
  7.2 Research Questions and Research Strategy
  7.3 Descriptive and Exploratory Empirical Analyses
    7.3.1 Temporal and Spatial Distribution of the Performance Indices
    7.3.2 Development of Goal-Oriented Performance
    7.3.3 Types of Goal-Oriented Performance
  7.4 Summary and Conclusion

Part III: Explaining Performance

8 Explaining Goal-Oriented Performance
  8.1 Introduction
  8.2 Literature Review
    8.2.1 Criteria for the Literature Review
    8.2.2 Literature Review: Democracy Models and Performance
  8.3 Theory and Hypotheses: Three Types of Effects of Democracy Profiles
  8.4 Methodology: The Bayesian TSCS Within-Between Model
    8.4.1 Time-Series Cross-Sectional Analysis: Autoregressive Distributed Lag Model and Error Correction Model
    8.4.2 Models for Goal-Attainment Performance and Confidence Performance
    8.4.3 Compositional Data as Independent Variables
    8.4.4 Operationalization: Datasets and Variables
    8.4.5 Workflow of the Regression Analysis
  8.5 Empirical Analyses
    8.5.1 Economic Performance
    8.5.2 Environmental Performance
    8.5.3 Goal-Attainment Performance
    8.5.4 Social Performance
    8.5.5 Domestic Security Performance
    8.5.6 Confidence Performance
    8.5.7 Discussion
  8.6 Summary and Conclusion

9 Explaining Policy Regime Performance
  9.1 Introduction
  9.2 Part I: Origin of Democracy Profiles
    9.2.1 Literature Review, Theoretical Framework and Hypotheses
    9.2.2 Methodological Framework
    9.2.3 Empirical Analysis
    9.2.4 Discussion
  9.3 Part II: Co-existence of Policy Regimes
    9.3.1 Conceptualization and Measurement of Policy Regimes
    9.3.2 Literature Review and Theoretical Framework
    9.3.3 Methodological Framework
    9.3.4 Descriptive Analysis and Discussion
  9.4 Summary and Conclusion

10 Conclusion
  10.1 Relevance and Research Question of the Study
  10.2 Important Findings and Implications of the Study
  10.3 Limits of the Study

References
Abbreviations and Acronyms

ADL: Autoregressive distributed lag model
AIC: Akaike information criterion
BIC: Bayesian information criterion
CME: Coordinated market economy
DAG: Directed acyclic graph
ECM: Error correction model
EFA: Exploratory factor analysis
FCM: Fuzzy C-means clustering
Fec: Libertarian and majoritarian democracy
FeC: Libertarian and control-focused democracy
fEC: Egalitarian and control-focused democracy
fEc: Egalitarian and majoritarian democracy
FEC: Balanced Democracy
GEP: General Environmental Performance
HPD: Highest posterior density
KMO: Kaiser-Meyer-Olkin Test
LDV: Lagged dependent variable
LME: Liberal market economy
LOOIC: Leave-One-Out cross-validation Information Criterion
LRM: Long-Run Multiplier
MAR: Missing At Random
MCAR: Missing Completely At Random
MNAR: Missing Not At Random
MSA: Measure of sampling adequacy
OECD: Organisation for Economic Co-operation and Development
OLS: Ordinary least squares
PAM: Partition around medoids clustering
PCA: Principal component analysis
PPC: Posterior predictive check
RMSEA: Root Mean Square Error of Approximation
TSCS: Time-series cross-sectional
VoC: Varieties of Capitalism
WAIC: Widely applicable information criterion
List of Figures

Figure 1.1 Plan of the Book
Figure 2.1 Concept of the Democracy Matrix
Figure 2.2 Calibrated Fit Indices for the Trade-Off Indicators
Figure 2.3 Principal Component Plot
Figure 2.4 Box Plots for the Cluster Solutions
Figure 2.5 Temporal Distribution of Democracy Profiles (Count and Percent)
Figure 2.6 Spatial Distribution of Democracy Profiles (1974–2017)
Figure 2.7 Temporal Development of the Democracy Profile for Single Countries after 1945
Figure 3.1 Number of Variables per Section
Figure 3.2 Study Design V-Dem
Figure 3.3 Directed Acyclic Graph of the V-Dem Measurement Model
Figure 3.4 Number of Country-Years coded by Experts
Figure 3.5 Number of Countries and Variables coded by Experts
Figure 3.6 Types of Coders per Country and Year
Figure 3.7 Number of Coders per country-year per section
Figure 3.8 Average Confidence of the Coders per Country-Year
Figure 3.9 Average Confidence of the Coders per Section
Figure 3.10 Average Disagreement of the Coders per Country-Year
Figure 3.11 Disagreement between coders per section
Figure 3.12 Average Rating of the Coder Types per Country-Year
Figure 3.13 DAG Representation of the Regression Models
Figure 3.14 Predicted Values for All Expert Coders
Figure 3.15 Predicted Values for the Number of Lateral and Bridge Coders
Figure 3.16 Predicted Values for Coder Disagreement
Figure 3.17 The Ratings for Germany for the Indicator “v2svstterr”
Figure 3.18 Mean Values for the Multiple-Choice Variable “HOS control over” (v2exctlhs)
Figure 3.19 Multiple Choice Variables and Measurement Model Variables
Figure 5.1 Missing Values in the Economic Outcome Indicators
Figure 5.2 Missing Values in the Environmental Outcome Indicators
Figure 5.3 Missing Values in the Reformability Indicators
Figure 5.4 Missing Values in the Social Outcome Indicators
Figure 5.5 Missing Values in the Domestic Security Outcome Indicators
Figure 5.6 Levels of Support by Norris
Figure 5.7 Missing Values in the Specific Support Indicators
Figure 6.1 Workflow for the Aggregation Procedure
Figure 6.2 Standard Work Flow of Multiple Imputation
Figure 6.3 Transformed Sample (Economic Performance)
Figure 6.4 Convergence Plot for Economic Performance
Figure 6.5 Density Plots for Economic Performance
Figure 6.6 LOOCV for Economic Performance
Figure 6.7 Parallel Analysis for Economic Performance
Figure 6.8 Factor Solution for Economic Performance
Figure 6.9 Transformed Sample (Environmental Performance)
Figure 6.10 Convergence Plot (Environmental Performance)
Figure 6.11 Density Plots for Environmental Performance
Figure 6.12 LOOCV of Environmental Performance
Figure 6.13 Parallel Analysis for Environmental Performance
Figure 6.14 Factor Solution for Environmental Performance
Figure 6.15 Transformed Indicators
Figure 6.16 Transformed Sample (Social Performance)
Figure 6.17 Convergence Plot for Social Performance
Figure 6.18 Density Plots for Social Performance
Figure 6.19 LOOCV of Social Performance
Figure 6.20 Parallel Analysis for Social Performance
Figure 6.21 Factor Solution for Social Performance
Figure 6.22 Transformed Sample (Domestic Security Performance)
Figure 6.23 Convergence Plot for Domestic Security Performance
Figure 6.24 Density Plots for Domestic Security Performance
Figure 6.25 LOOCV for Domestic Security Performance
Figure 6.26 Parallel Analysis for Domestic Security Performance
Figure 6.27 Factor Solution for Domestic Security Performance
Figure 6.28 Parallel Analysis for Confidence
Figure 6.29 Factor Solution for Confidence
Figure 7.1 Missing Values for Each Performance Area After Imputation
Figure 7.2 Economic Performance: Wealth
Figure 7.3 Economic Performance: Productivity
Figure 7.4 Economic Performance: Productivity (selected sample)
Figure 7.5 Environmental Performance: GEP
Figure 7.6 Goal-Attainment Performance: Amendment Rates
Figure 7.7 Social Performance: Economic Equality
Figure 7.8 Social Performance: Social Equality
Figure 7.9 Domestic Security Performance
Figure 7.10 Latent Pattern Maintenance Performance: Confidence
Figure 7.11 Calibrated Cluster Validity Indices
Figure 7.12 Principal Component Plot, Boxplot and World Map for the 2 Cluster Solution
Figure 7.13 Principal Component Plot, Boxplot and World Map for the 3 Cluster Solution
Figure 7.14 Principal Component Plot, Boxplot and World Map for the 4 Cluster Solution
Figure 8.1 DAG representation of the Bayesian Multilevel TSCS Model (ADL)
Figure 8.2 DAG representation of the Bayesian Student-t Model
Figure 8.3 DAG representation of the Multilevel Model for Survey Data
Figure 8.4 LRMs for the Democracy Profiles (Wealth)
Figure 8.5 Dynamic Simulation (Wealth)
Figure 8.6 Between-Effects (Wealth)
Figure 8.7 LRMs for the Democracy Profiles (Productivity)
Figure 8.8 Dynamic Simulation (Productivity)
Figure 8.9 Between-Effects (Productivity)
Figure 8.10 LRMs for the Democracy Profiles (Environmental Performance)
Figure 8.11 Dynamic Simulation (Environmental Performance)
Figure 8.12 Between-Effects (Environmental Performance)
Figure 8.13 LRMs for the Democracy Profiles (Economic Equality)
Figure 8.14 Dynamic Simulation (Economic Equality)
Figure 8.15 Between-Effects (Economic Equality)
Figure 8.16 LRMs for the Democracy Profiles (Social Equality)
Figure 8.17 Dynamic Simulation (Social Equality)
Figure 8.18 Between-Effects (Social Equality)
Figure 8.19 LRMs for the Democracy Profiles (Domestic Security Performance)
Figure 8.20 Dynamic Simulation (Domestic Security Performance)
Figure 8.21 Between-Effects (Domestic Security Performance)
Figure 9.1 DAG representation for the Dirichlet Regression
Figure 9.2 Odds-Ratio Plot for Structural Factors
Figure 9.3 Expected Values for Two Ideal Conditions (Structural Factors)
Figure 9.4 Odds-Ratio Plot for Power Resource Theory
Figure 9.5 Expected Values (Power Resource Theory)
Figure 9.6 Odds-Ratio Plot for Cultural Orientation
Figure 9.7 Expected Probabilities for Cultural Orientation
Figure 9.8 Relationship between Economic Regime, Welfare State and Democracy Profiles
Figure 9.9 Boxplots for the Economic Regime, Welfare State Regime and Democracy Profiles
Figure 9.10 Cultural Orientations and Policy Regimes
List of Tables

Table 2.1 Conceptually Derived Democracy Profiles
Table 3.1 Research Questions
Table 3.2 Multidimensional Indicators
Table 3.3 Vaguely Defined Indicators
Table 3.4 Ill-defined Scales
Table 3.5 Working Hypotheses
Table 3.6 Regression Results for the Number of All Expert Coders (Truncated Poisson Regression)
Table 3.7 Regression Results for the Number of Bridge and Lateral Expert Coders (Poisson Regression)
Table 3.8 Regression Results for Coder Disagreement (Normal Multilevel Regression)
Table 4.1 Overview Political Productivity of Political Systems (Almond/Powell)
Table 4.2 Putnam’s Performance Criteria in Making Democracy Work
Table 4.3 Overview Lijphart 2012
Table 4.4 Roller’s Typology of Performance Criteria
Table 4.5 Overview Worldwide Governance Indicators
Table 4.6 Overview SGI 2016
Table 4.7 Overview BTI 2018
Table 4.8 AGIL Typology of Performance
Table 5.1 Checklist for Conceptualization and Measurement Framework
Table 5.2 Selection of Indicators for Economic Outcomes
Table 5.3 Selection of Indicators for Environmental Outcomes
xix
xx
Table Table Table Table Table Table Table Table Table Table
List of Tables
5.4 5.5 5.6 5.7 5.8 6.1 6.2 6.3 6.4 6.5
Table 6.6 Table 6.7 Table 6.8 Table Table Table Table Table
6.9 6.10 7.1 7.2 7.3
Table 7.4 Table 7.5 Table 7.6 Table Table Table Table Table Table Table
7.7 8.1 8.2 8.3 8.4 8.5 8.6
Table 8.7 Table 8.8 Table 8.9
Selection of Indicators for Reformability . . . . . . . . . . . . . . . . Selection of Indicators for Social Outcomes . . . . . . . . . . . . . Selection of Indicators for Domestic Security Outcomes . . . Selection of Indicators for Specific Support . . . . . . . . . . . . . . Final Evaluation of the Research Criteria . . . . . . . . . . . . . . . . Checklist for Exploratory Factor Analysis . . . . . . . . . . . . . . . Checklist for Treatment of Missing Values . . . . . . . . . . . . . . . Tukey’s Ladder of Power . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Checklist for Transformation Procedure . . . . . . . . . . . . . . . . Kaiser-Meyer-Olkin (KMO) Test for Economic Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kaiser-Meyer-Olkin (KMO) Test for Environmental Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kaiser-Meyer-Olkin (KMO) Test for Social Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kaiser-Meyer-Olkin (KMO) Test for Domestic Security Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kaiser-Meyer-Olkin (KMO) Test for Confidence . . . . . . . . . . Summary of Aggregation Procedure . . . . . . . . . . . . . . . . . . . . Top Performer of Economic Performance—Wealth . . . . . . . . Top Performer of Environmental Performance—GEP . . . . . . Top Performer of Social Performance—Economic Equality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Top Performer of Social Performance—Social Equality . . . . Top Performer of Domestic Security Performance . . . . . . . . Summary Table for the Development of Goal-Oriented Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Stability of the Cluster Solution . . . . . . . . . . . . . . . . . . . . . . . . Literature Review Criteria . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . Effects of Democracy Profiles . . . . . . . . . . . . . . . . . . . . . . . . . Amalgamations of Democracy Profiles . . . . . . . . . . . . . . . . . . Operationalization of the Control Variables . . . . . . . . . . . . . . Workflow and Regression Diagnostics . . . . . . . . . . . . . . . . . . Regression Results for Goal-Attainment Performance (Lutz) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Regression Results for Goal-Attainment Performance (CCP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Regression Results for Confidence Performance . . . . . . . . . . Summary Table for Goal-oriented Performance . . . . . . . . . . .
127 131 137 142 145 153 157 158 159 163 168 175 180 182 186 193 197 200 201 202 205 207 222 239 253 257 262 274 275 287 289
List of Tables
Table 9.1 Table Table Table Table Table
9.2 9.3 9.4 9.5 9.6
Table 9.7 Table 9.8
xxi
Hypotheses and Their Impact on the Occurrence of Democracy Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Membership Probabilities of Selected Cases . . . . . . . . . . . . . Workflow of Regression Analysis . . . . . . . . . . . . . . . . . . . . . . Measurement of the Independent Variables . . . . . . . . . . . . . . Results of the Dirichlet Regression (Structural Factors) . . . . Results of the Dirichlet Regression (Power Resource Theory) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Results of the Dirichlet Regression (Cultural Factors) . . . . . Overview Policy Regimes . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
300 301 307 308 311 317 321 329
1 Introduction
1.1 Introduction to the Introduction
How can we make sense of all the different institutional designs of democracies? What causes these different institutional designs, and do these designs have an impact on policy outcomes? Many democracy researchers have worked on these questions. Steffani (1979) stresses that the strength of party discipline is a political consequence of the different constitutional designs of parliamentary and presidential systems. Lijphart (1984, 1999, 2012) created the famous distinction between consensus and majoritarian democracies and recommends the constitutional adoption of consensus democracy for emerging democracies. According to his empirical findings, consensus democracies are the better decision-makers, have a higher quality of democracy and are overall “kinder and gentler” than majoritarian democracies. In his account, consensus democracies emerge in large and ethnically fragmented countries due to their ability to moderate conflict, while majoritarian democracies emerge as a result of the British heritage. Additionally, there is the differentiation between collective and competitive veto points (Birchfield & Crepaz, 1998; Crepaz & Moser, 2004) and between decentralized and centripetal democracies (Gerring & Thacker, 2008). Finally, others argue that consociational (Lijphart, 1996) and power-sharing or “state-nation” (Stepan et al., 2010) democracies are favorable for heterogeneous societies because their institutional setup (minority rights, mutual veto powers and federalism) reduces the probability of violent conflicts or even civil wars. All in all, important conclusions in comparative political science are that there are different institutional choices, that there are reasons why some countries choose a specific institutional design, and finally that “institutions do matter”.
© The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2021 O. Schlenkrich, Origin and Performance of Democracy Profiles, Vergleichende Politikwissenschaft, https://doi.org/10.1007/978-3-658-34880-9_1
Recently, different institutional profiles of democracies have also been analyzed in quality-of-democracy research, which concludes that a perfect democracy does not exist. Rather, “every democratic country must make an inherently value-laden choice about what kind of democracy it wishes to be” (Diamond & Morlino, 2004, p. 21, emphasis in original). There are trade-offs between central dimensions and functions of democracy, and newer measurements of democracy (e.g. Democracy Barometer, V-Dem) try to explore this possibility. On a preliminary basis, the Democracy Barometer (Bühlmann et al., 2012) identifies several different clusters of democracies (e.g. institutions in Sweden or Denmark emphasize the dimensions of freedom and equality at the cost of the control dimension). Varieties of Democracy investigates the possibility of trade-offs in its conceptual papers as well (Coppedge et al., 2011), stressing the tension between institutions of the liberal and the majoritarian conception of democracy. However, these democracy measures are able neither to identify different democracy profiles empirically nor to analyze their consequences for political, social or economic outcomes.
All these concepts which distinguish democracies by their specific institutional profile can be subsumed under the term “democracy model”. The term “model” emphasizes the assumption that these institutional characteristics interact and thereby create a specific functional logic, which also radiates into other areas. Often it is proposed that these institutions shape and favor a specific behavior of political actors. For instance, Lijphart’s (1999, 2012) majoritarian and consensus democracy are democracy models because they imply that their characteristics are linked together (proportional representation leads to a multiparty system, which in turn creates the need for consensus). Similarly, the collective and competitive veto points by Birchfield and Crepaz (1998) count as democracy models, since the interplay of the institutions of the collective veto points (parliamentarism, proportional representation, multiparty legislatures and governments) “enable closer, and more personal, interaction of political actors, collective agency, and shared responsibility” (Bogaards, 2017a, p. 16).
1.2 Main Goals and Research Question
On the one hand, this study picks up the idea of trade-offs between central dimensions and functions of democracy from quality-of-democracy research. To this end, I draw on the dataset of the Democracy Matrix (Lauth & Schlenkrich, 2018a), which is a customized version of the Varieties of Democracy (V-Dem) dataset (Coppedge et al., 2018). It is a measurement instrument designed not only to gauge the quality of democracy but also to capture several trade-offs between dimensions caused by specific institutional choices of the democracies. It proposes various trade-offs between three fundamental democracy dimensions, namely political freedom, political equality, and political and legal control. On the other hand, this study closely follows Lijphart’s tradition of empirical democracy research. Lijphart not only creates a typology of consensus and majoritarian democracies, but also analyzes their impact on policy outcomes and why some countries adopt consensus democracy while others prefer majoritarian democracy. He thus analyzes his typology from two directions: the democracy models are an independent variable, but also a dependent variable. The main research questions of this study are therefore: Can we conceptually and empirically identify democracy profiles on the basis of trade-offs? If we are able to identify these democracy profiles, do they actually matter for policy outcomes (political performance)? And finally, what are the causes of these democracy profiles? Why do some countries adopt a specific democracy profile?
1.3 Shortcomings of the Current State of Research about Democracy Models
The central reference point in the discussion of models of democracy and political performance is the study “Patterns of Democracy” by Lijphart (1999, 2012), which is considered a landmark work in comparative political science (Kailitz, 2007; Bormann, 2010, p. 1). Therefore, the deficits of the current state of research are discussed especially with regard to Lijphart. The research gaps manifest themselves in three main areas: conceptual, theoretical, and methodological. The conceptual shortcoming refers on the one hand to the conceptualization of the democracy models and on the other hand to the conceptualization of performance. The starting point of Lijphart’s consensus and majoritarian democracy is mainly inductively derived: “features of majoritarian democracy were derived inductively from the experience of the United Kingdom, whereas the […] features of consensus democracy were derived deductively by taking the opposite of the majoritarian model” (Bogaards, 2017a, p. 3). The lack of a guiding democracy conception becomes apparent with the “conceptually questionable” (Bormann, 2010, p. 7) inclusion of corporatism and independent central banks as features of consensus democracy. Other approaches have strongly improved on this issue by anchoring their democracy models in democracy theory (e.g. Gerring et al., 2005; Gerring & Thacker, 2008).
In addition, Lijphart lacks a clear conception of what constitutes a valid performance measure. There is no conceptual justification for the performance measures Lijphart uses to demonstrate the “kinder and gentler” nature of consensus democracies. Bormann (2010, p. 8) states that he made a “haphazard choice of dependent variables which conflate policy outcomes and outputs”. Although some approaches improved in this respect by offering an overall performance framework (e.g. Roller, 2005), they still lack guiding criteria for deciding which single aspects of performance are relevant. Other approaches just focus on single aspects of performance without reference to a clear performance conception (e.g. Doorenspleet & Pellikaan, 2013; Birchfield & Crepaz, 1998; L. Anderson, 2001). Therefore, the analyzed performance criteria often seem inconsistent and incomplete.
The second shortcoming refers to the theory and can be separated into two points. First, there is a lack of a causal explanation of the relationship between democracy models and performance, in the sense of unclear or incomplete causal mechanisms. Lijphart “goes directly from measuring type of democracy to performance, without developing a theory linking the two” (Bogaards, 2017a, p. 15). This is a general problem in this research area: institutional explanations of the effects of democracy models face a serious limit. It seems that a profound theoretical relationship cannot be established because the causal chain linking democracy models and performance is too long and thus a “distinct causal path” (Gerring et al., 2005, p. 159) cannot be identified. Nevertheless, more attention should be paid to this area. Second, Lijphart’s approach lacks the appropriate control variables. He considers mainly the socio-economic approach, while other theories of the public policy approach are not considered (Schmidt, 2015).
Other approaches also improved in this area by taking the role of parties, culture or spatial correlation into account (Roller, 2005; Gerring & Thacker, 2008). However, these approaches miss other important control variables. They do not discuss the influence of informal institutions: Lauth (2010a) states that Lijphart misses this aspect and thus overestimates the impact of formal institutions. Informal institutions might not only distort the measurement and classification of majoritarian and consensus democracy but might also affect the performance results. Another important factor he neglects is the quality of statehood, although the state’s monopoly on the use of force and a working administration could be called into question for some of the countries in his sample (e.g. India, Jamaica).
Finally, there is a methodological weakness. First, although missing data represents a problem in his analysis (Lijphart, 2012, p. 267), he does not attempt to resolve this issue with appropriate methods. Second, while his performance data would allow a more powerful time-series cross-sectional (TSCS) analysis, he ignores the time characteristics of his data and performs only a cross-sectional analysis: he averages the data for all countries over certain time periods. This results in a loss of statistical power due to the artificial and significant reduction of the sample size. It is also problematic because the time periods are arbitrarily determined, both for the dependent and for the independent variables. In addition, although he controls for outliers, he does not report any other diagnostic procedures or robustness checks. Other approaches improved on this matter: Gerring and Thacker (2005; 2008) estimate a TSCS analysis. However, due to the complexity of this method (see chapter 9), the correctness of a TSCS analysis is highly dependent on the right model specification (S.E. Wilson & Butler, 2007). They control neither for autocorrelation with a lagged dependent variable nor for unit heterogeneity, which casts doubt on whether the model is correctly specified, especially since they do not report any model diagnostics (e.g. residual checks). Thus, all these considerations demonstrate a significant research gap and the need for further research.
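The loss of sample size caused by averaging can be made concrete with a small sketch. It assumes a toy panel of three hypothetical countries with invented values, and it is written in Python purely for illustration (the study's own scripts are in R). The helper names `cross_section` and `lagged_rows` are made up for this example and do not come from the study.

```python
# Toy panel: 3 hypothetical countries observed over 5 years (invented values).
panel = {
    "A": [1.0, 1.2, 1.1, 1.4, 1.5],
    "B": [2.0, 2.1, 2.3, 2.2, 2.6],
    "C": [0.5, 0.7, 0.6, 0.9, 1.0],
}


def cross_section(panel):
    """Collapse each country's series to its period mean (one row per country)."""
    return {country: sum(values) / len(values) for country, values in panel.items()}


def lagged_rows(panel):
    """Build (country, y_t, y_{t-1}) rows, as a lagged dependent variable requires."""
    rows = []
    for country, values in panel.items():
        for t in range(1, len(values)):
            rows.append((country, values[t], values[t - 1]))
    return rows


averaged = cross_section(panel)  # cross-sectional design: one observation per country
tscs = lagged_rows(panel)        # TSCS design: one observation per country-year with a lag

print(len(averaged), "observations after averaging")
print(len(tscs), "country-year observations with a lagged dependent variable")
```

Under these assumptions the averaged design keeps one observation per country, whereas the country-year design keeps every year except the first, which is lost to the lag.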
1.4 Contribution of this Study to the Research
This study contributes to the research about democracy models and performance by overcoming or mitigating these three research deficits. First, this study makes a two-fold conceptual improvement:
1. In contrast to the first research deficit, the typology of democracy models proposed in this study is derived entirely deductively. The characteristics of these democracy profiles are embedded in the framework of the 15-field matrix of democracy (Lauth, 2004; Lauth & Schlenkrich, 2018a) with its institutions and dimensions. These institutions and dimensions guide the selection process of the relevant features. In addition, democracy profiles are based on the idea of trade-offs. These trade-offs are conceptually linked to competing democracy conceptions (e.g. libertarian vs. egalitarian democracy conception). Each democracy profile is located at a different end of the trade-offs, so that these democracy profiles correspond to democracy conceptions and are deductively derived from them.
2. The performance areas used to assess the effects of the democracy profiles are selected for conceptual reasons. I propose a theory-based typology of performance which I call the AGIL typology of political performance. In contrast to other approaches, which usually define the broad framework but are unable to justify the selected performance areas in more detail, my typology is founded on Parsons’ AGIL paradigm. The AGIL paradigm, with its idea that a system necessarily has to perform certain functions, provides a heuristic framework which allows a reasoned selection of these performance elements.
Furthermore, the study contributes two theoretical innovations:
3. This study aims to develop a causal chain linking democracy profiles and performance. Based on a literature review, I identify three general effects of these democracy profiles: a direct effect caused by the form and mode of the decision-making process; an indirect effect which shapes the negotiation method and attitudes of actors within the political institutions; and a second indirect effect which shapes the attitudes of the citizens. However, this is only successful to a certain extent, because institutional explanations, including the effects of democracy profiles, face a serious limit: institutions enable or constrain the actions of individuals, but they do not determine them. Thus, the separate role of actors is important, and institutions do not have a causal effect alone; rather, there is a dense interplay with the actors within those institutions. This means that the causal chain of the effects of the democracy profile is usually too long and the outcome to be explained is causally too far away (Gerring et al., 2005, p. 159).
4. I discuss and incorporate additional control variables to further isolate the effects of the democracy profiles. Since my sample consists of a heterogeneous set of countries, it is necessary to consider the role of the quality of statehood (Grävingholt et al., 2015; Schlenkrich et al., 2016a, 2016b) and the influence of informal institutions (Lauth, 2000; Helmke & Levitsky, 2004). Both factors have an impact on performance.
Finally, the study makes a methodological contribution:
5. I enrich the overall deductive framework with more explorative and inductive methods: I compare the concepts of the democracy profiles and the performance typology to the results of a cluster analysis and an exploratory factor analysis. While the conceptualization ensures an overall deductive approach, I do not manually force a certain aggregation or a specific threshold value; instead, the exploratory methods “allow the data to speak for itself” and reveal empirical patterns in the data. The exchange between the concept and the methods works in both directions: on the one hand, it can lead to a revision of the concept if the data show surprising but conceptually compatible results; on the other hand, it can also result in the use of a different specification within those exploratory methods if the results do not match the concept (e.g. selecting a different number of clusters; excluding a specific factor).
6. I treat missing values in the study if all presuppositions are fulfilled (e.g. missing at random, MAR). Not treating missing values can lead to a significant bias in the estimates and to a loss of statistical power (Graham, 2012; Schlomer et al., 2010; Buuren, 2018). I apply multiple imputation (MI) and full-information maximum likelihood (FIML) (Graham, 2009, pp. 555–558), which are considered state of the art.
7. I employ time-series cross-sectional (TSCS) analysis. While other studies use cross-sectional analysis, TSCS analysis, which combines the features of time-series analysis with the properties of cross-sectional analysis, has theoretical and statistical advantages in the study of the effects of democracy profiles on performance. On the one hand, it allows complex and dynamic theories to be modeled, with effects varying over time and between countries. On the other hand, the combination of time and units increases the sample size and thus “increases statistical leverage” (Fortin-Rittberger, 2014). In addition, due to the difficulty of estimating time-invariant or slowly changing variables, such as the democracy profiles of this study, within the TSCS framework, I rely on a particular multilevel model called the within-between multilevel model (Bell & Jones, 2015). However, I enhance this model by applying it to the standard TSCS models with lagged dependent variables (ADL: autoregressive distributed lag models; ECM: error correction models, see Beck & Katz, 2011).
This study contributes methodologically not only by using the more complex TSCS framework but also by estimating within-between multilevel ADL and ECM models. Finally, it also contributes by using Bayesian instead of frequentist statistics, to show that Bayesian statistics is not only able to cope with the increasing model complexity but also offers more model diagnostics. Bayesian statistics is also helpful in dealing with missing data.
8. Finally, the last methodological enhancement is the consideration of uncertainties in the clustering and estimation process. Often there are serious overlaps between clusters. To account for this uncertainty in the classification, I use fuzzy cluster analysis (D’Urso, 2015) to obtain membership degrees, so that the cluster analysis does not assign a country to only one democracy profile. These membership degrees highlight how likely it is that a country belongs to a certain democracy profile (e.g. the United Kingdom has a high probability of belonging to the libertarian-majoritarian democracy profile, while Australia shows a higher uncertainty in its classification, since it belongs to the other democracy profiles to an almost equal degree). To deal with this kind of data, which is called “compositional” data (Aitchison, 1982), certain estimation methods have to be applied (log-ratio transformation, Dirichlet regression).
Thus, these eight points highlight the conceptual, theoretical and methodological relevance of this study. However, the study contributes in three more aspects:
9. I present and analyze empirical data for various policy outcomes. The last study which did something similar is the study by Roller (2005). However, her data end in 1995. Here, I present data until 2017. Besides presenting the development of each performance area, I also describe the impact of the financial crisis of 2007/2008 on these different outcomes.
10. Since the data for the democracy profiles are derived from the Varieties of Democracy (V-Dem) dataset, I analyze the quality of these data in terms of validity and reliability as well as the presuppositions of the V-Dem measurement model. V-Dem uses Bayesian ordinal item response theory to calibrate and aggregate the numerous ratings of the coders, thus trying to identify and correct measurement errors by the coders. However, it presupposes that countries are coded by a sufficient number of expert coders. To my knowledge, this is the first attempt to analyze the V-Dem dataset independently from the V-Dem institute itself.
11. Finally, the study has practical relevance. In particular, the AGIL typology of political performance makes it possible to assess the strengths and weaknesses in the performance of a political system. In this way, practitioners can easily identify possible and relevant areas for improvement.
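To illustrate what membership degrees and a log-ratio transformation look like in practice, the following minimal Python sketch may help. It assumes the textbook fuzzy c-means membership formula and the additive log-ratio (alr) transformation, which are not necessarily the exact variants used in this study. The distances are invented and the function names are hypothetical.

```python
import math


def fuzzy_memberships(distances, m=2.0):
    """Textbook fuzzy c-means membership degrees from distances to cluster centers.

    Assumes all distances are strictly positive; m > 1 is the fuzzifier.
    """
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_k / d_j) ** exponent for d_j in distances)
        for d_k in distances
    ]


def alr(composition):
    """Additive log-ratio transform, using the last part as the reference."""
    reference = composition[-1]
    return [math.log(part / reference) for part in composition[:-1]]


def alr_inverse(coordinates):
    """Map alr coordinates back to a composition that sums to one."""
    expanded = [math.exp(z) for z in coordinates] + [1.0]
    total = sum(expanded)
    return [value / total for value in expanded]


# Hypothetical country: close to the first profile, equally far from two others.
memberships = fuzzy_memberships([1.0, 2.0, 2.0])
coordinates = alr(memberships)
recovered = alr_inverse(coordinates)

print(memberships)   # membership degrees, constrained to sum to one
print(coordinates)   # unconstrained real values usable in standard regression
```

Because the membership degrees always sum to one, they cannot vary independently of each other. The log-ratio coordinates remove this constraint, which is one reason such transformations (or a Dirichlet model on the original scale) are used for compositional data.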
1.5 Research Design and Methods
While Lijphart’s (36 democracies) and especially Roller’s (21 OECD countries) samples are constrained, this study seeks to analyze a broader spatial and temporal sample by including all democracies, in order to be able to generalize the findings. All democracies have democracy profiles and should therefore be included in the analysis. However, the data availability varies between the research questions. The most data are available for the question about the identification and origins of democracy profiles. Due to the use of the V-Dem and Democracy Matrix datasets, the temporal and spatial coverage is rather broad for the identification of the democracy profiles: it includes all democracies¹ (111) from 1900 to 2017. However, it is not possible to achieve the same coverage for the performance research question. Here data availability is severely limited in terms of spatial and temporal coverage. This is particularly problematic because the measurement of each performance area is based on several indicators (see chapter 5). The earliest year for which data are available is 1970, and most data are available since 1990. However, it is important to note that most of the democracies (outside of the OECD world) only emerged since 1974 (third wave of democracy) and especially since 1990 (Lauth et al., 2020). Although it would certainly be advantageous to have longer time series, the data are more or less available for the most relevant period. To reduce the impact of missing values, I rely on multiple imputation.
The study depends solely on quantitative methods. Importantly, the effects of institutions on performance cannot be sufficiently analyzed with qualitative case studies. Institutional explanations of the effects of democracy profiles face a serious limitation: institutions do not have a causal effect alone, and policy outcomes are affected by many different variables, most of which are probably much closer to the outcome. So, “while one might be inclined to doubt an isolated finding such as the foregoing (parliamentary systems lead to lower infant mortality), one is forced to take this finding seriously if it is embedded in a larger pattern.” (Gerring et al., 2005, p. 159). Therefore, this study relies on quantitative methods, which make it possible to analyze a variety of different countries over a longer time period in order to isolate the effects of democracy profiles. Qualitative studies would have difficulties here because of the wealth of other, probably greater effects.
Overall, the study employs a variety of methods. Cluster analysis is used to empirically detect the democracy profiles and also certain performance profiles. I especially rely on fuzzy clustering to express the uncertainty in the classification and the empirical complexity, which implies no clear profiles but rather mixed ones. Exploratory factor analysis (EFA) allows the creation of latent factors out of various isolated indicators. These latent factors represent the performance areas (e.g. environmental performance, economic performance). In addition, EFA makes it possible to evaluate this aggregation procedure and to diagnose problems. Finally, I apply different regression techniques, which depend on the nature of the dependent variable. I use Dirichlet regression to include the uncertainty given by the membership probabilities resulting from the fuzzy classification process. Poisson regression is used to analyze what affects the number of coders for an item and country in the V-Dem dataset. Finally, I employ multilevel models and TSCS regression.
Due to the variety of methods applied here, I use so-called directed acyclic graphs (DAGs) throughout the book for a simple but detailed representation of the regression models (Levy & Mislevy, 2016; for an example, see Figure 3.13 in chapter 3). DAGs provide a simple visualization of all model parameters and their dependencies on each other: rectangles represent the observed or user-defined values (e.g. independent or dependent variables, or the priors in Bayesian analysis), while circles display all parameters which are estimated by the statistical model and are thus unknown. Those shapes are directed and connected to each other by “one-headed arrows, so that there is a ‘flow’ of dependence” (Levy & Mislevy, 2016, p. 36). The graphs are acyclic because this flow of dependence runs in only one direction through the model paths; returning to a shape that has already been passed is not possible.
Lastly, the study is completely transparent. Every step can be replicated, because all R scripts and datasets can be found at https://github.com/OSchlenkrich/PerformanceDemocracies and are available for download. The R environment and the R packages used for the calculations are also freely available (https://www.r-project.org/).
¹ The classification is based on the core measurement of the Democracy Matrix dataset and includes deficient and working democracies (Lauth & Schlenkrich, 2019), see chapter 2.
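The acyclicity property of such graphs can be checked mechanically. The sketch below, in Python for illustration only, encodes a hypothetical DAG for a simple regression (the node names `x`, `prior_beta`, `beta`, `sigma` and `y` are invented, not taken from the book's figures) and uses Kahn's algorithm to test whether the flow of dependence ever returns to a node already passed.

```python
from collections import deque


def topological_order(edges):
    """Return the nodes in dependency order, or None if the graph has a cycle."""
    nodes = {node for edge in edges for node in edge}
    indegree = {node: 0 for node in nodes}
    children = {node: [] for node in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    # Start from nodes without incoming arrows (observed data, user-set priors).
    queue = deque(sorted(node for node in nodes if indegree[node] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # If some nodes were never released, they sit on a cycle.
    return order if len(order) == len(nodes) else None


# Hypothetical regression DAG: arrows point from a quantity to what depends on it.
regression_dag = [
    ("prior_beta", "beta"),  # user-defined prior informs the coefficient
    ("beta", "y"),           # estimated coefficient
    ("x", "y"),              # observed independent variable
    ("sigma", "y"),          # estimated residual scale
]

order = topological_order(regression_dag)
cyclic = topological_order(regression_dag + [("y", "beta")])

print(order)   # a valid ordering exists, so the graph is acyclic
print(cyclic)  # None: the added arrow creates a cycle
```

A valid ordering exists only when no path leads back to its starting node, which is the sense in which the flow of dependence runs in one direction.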
1.6 Plan of the Book and Summary
This book contains overall 8 chapters divided into three parts (excluding the introduction and conclusion): The first part is concerned with the conceptualization and identification of democracy profiles. The second part conceptualizes the AGIL typology of performance and measures it empirically. The third part, finally, combines the previous two parts by causally analyzing the relationship between democracy profile and certain performance areas. Figure 1.1 visualizes the relationships of the parts and single chapters of this study. The first part contains two chapters: Chapters 2 deals with the conceptualization and identification of democracy profiles. Thereby, the democracy profiles are based on the idea of trade-offs between three central democracy dimensions: political freedom, political equality and political and legal control. It is theorized that a perfect democracy which maximizes all three dimensions simultaneously does not exist but rather that the emphasis on one dimension must lead to a reduction in another dimension. Two central trade-offs are defined: First, the trade-off between the freedom and equality dimensions (inclusiveness); and second the trade-off between the freedom and control dimensions (effective government). Applying a cluster analysis to the Democracy Matrix dataset, I am able to
1.6 Plan of the Book and Summary
11
Figure 1.1 Plan of the Book
identify five different democracy profiles, each with a different dimensional configuration: Libertarian-majoritarian democracies combine high freedom with low equality and control values, egalitarian-majoritarian democracies favor equality over freedom and control, libertarian-control-focused democracies are characterized by high freedom and control values at the cost of equality values, while egalitarian-control focused democracies emphasize equality and control at the cost of freedom. Finally, a cluster can be distinguished that balances all democracy dimensions. Since the Democracy Matrix relies on the Varieties-of-Democracy dataset, chapter 3 assesses the quality of this dataset. V-Dem is an expert survey with over 3000 experts contributing to it. It offers data for over 180 countries from 1900 to 2018 (V-Dem Version 9): Can we have confidence in the data used to calculate the quality of democracy and identify the democracy profiles? I conclude that the dataset has a conceptual bias towards the freedom dimension, some unclear formulated questions and reliability issues considering that only a minority of the expert coders rate most of the data. However, while the conceptual bias and validity issues concern the correct measurement of especially the equality dimension, the reliability does not affect all countries equally. However, since my
1
Introduction
study focuses on democracies, a class of countries with an above-average number of experts, there should be overall a sufficient degree of validity and reliability. The second part of this book conceptually and empirically develops a typology of political performance, which I call the AGIL typology of political performance. Chapter 4 drafts the overall framework by laying the conceptual foundations of the typology. This typology differentiates between three broad performance dimensions: goal-oriented performance, general performance, and policy regime performance. To overcome a major point of criticism of all previous proposals, Parsons’ AGIL scheme helps to justify the selected performance criteria within these three dimensions by focusing on criteria which are important for the survival of a system: adaptation, goal-attainment, integration and latent pattern maintenance. Derived from these general functions, I distinguish between economic outcomes, environmental outcomes, goal-attainment outcomes, social outcomes, domestic security outcomes and latent pattern maintenance outcomes. Since data for the general performance dimension are not available, I only focus on the goal-oriented performance and policy regime performance dimensions in this study. Chapter 5 conceptualizes in greater detail the individual areas of goal-oriented performance by using parsimonious definitions with discriminatory power and by selecting indicators based on their construct validity, data quality, and spatial and temporal coverage. Most of the definitions have strong discriminatory power and are measured by multiple indicators with high construct validity. In addition, I was able to derive time-series data with a broad spatial coverage, although this coverage varies by performance area. Overall, the dataset consists of data for a maximum of 85 democracies and covers the period from 1970 to 2017. The most problematic performance area is the goal-attainment performance.
Not only is the concept rather weak, as I was not able to sufficiently conceptualize what good reformability means, but the indicators also have weak construct validity since they only focus on constitutional amendments, leaving out the important part of institutional learning or implicit constitutional change. Moreover, the data are very limited, exhibit inconsistencies and, in contrast to the other performance outcomes, do not allow time-series analysis. Finally, the measurement of confidence in political institutions stands out because it is based on survey data. Chapter 6 is concerned with the aggregation of these indicators for each performance area. It is based on a methodological framework which combines multiple imputation, indicator transformation, exploratory factor analysis and diagnostic procedures. The aggregation with the help of exploratory factor analysis extracts several components for each performance area: economic
wealth and productivity; general environmental performance; economic equality and social equality; domestic security performance; and finally, confidence in political institutions in the latent pattern maintenance performance area. Chapter 7, the last chapter of the second part, analyzes these performance areas in a descriptive and exploratory way: How did the different performance areas develop over time? What was the impact of the financial crisis, and did the countries in the sample become increasingly similar (convergence) or did they diverge? And can we distinguish between different types of performance configurations? Knowledge of these performance data is not only interesting in itself but also relevant for this study, because it serves as a validity test of the data and is therefore a precondition for the causal analysis in the following chapters. There was no trend for domestic security performance and confidence in political institutions. Significant improvement could be identified for economic and environmental performance, while a troublesome negative development in the form of rising economic inequality was also apparent. The financial crisis affected all countries in some performance areas (the productivity component of economic performance), but its effects were concentrated in a few countries (Spain, Greece, Portugal and Cyprus). Finally, there is strong convergence in several performance areas (e.g., environmental performance, economic equality performance). With the two most important variables of this study in place, the performance outcomes and the democracy profiles, the third part of this book empirically analyzes the relationship between the democracy profiles and performance. Chapter 8 analyzes the effects of democracy profiles on the different performance outcomes. After developing a theory which differentiates between three general effect types of democracy profiles, I apply a time-series cross-sectional analysis.
For this analysis, I rely on Bayesian statistics because it has several advantages for this study (e.g., greater flexibility for the estimation of complex models, easy incorporation of missing data, robustness in small sample settings). The results show that the democracy profiles do not have an immediate effect; rather, their effects need a longer period of time to manifest themselves (usually decades). The results also indicate that no democracy profile performs better overall. Libertarian democracies have a (slight) advantage in economic performance, while egalitarian democracies show better results on the environmental outcomes, and especially on the economic equality and social equality outcomes. There are mixed results for goal-attainment performance, and no effects are found for confidence performance. In a last step, I empirically analyze policy regime performance in Chapter 9. Since democracy profiles can be understood as a part of the policy regime performance (goal-attainment), the second research question will be addressed: What
causes democracy profiles? I distinguish between several structural factors, power resource theory and cultural factors. The empirical regression analysis reveals that several structural factors are relevant (e.g., British heritage leads to a libertarian-majoritarian democracy profile identical to the British one), while there are no effects for the power resource theory. Cultural orientation is also important for the explanation of democracy profiles: A competitive and hierarchical culture leads to a more libertarian democracy profile. In a short excursus, the view is broadened to other political regimes, asking whether the democracy profiles co-exist with specific other policy regimes so that they form a coherent whole. A descriptive analysis shows that liberal market economies, libertarian democracy profiles and minimal welfare states are associated with each other, while coordinated market economies, social democratic/conservative welfare states and egalitarian democracy profiles co-exist. Besides institutional reasons, I emphasize a cultural explanation: They co-exist because they are part of the same cultural background. The former highlight competition and individualism, while the latter emphasize a harmonious culture with empathy for the vulnerable. I conclude in Chapter 10 by addressing the implications and limitations of this study.
Part I Democracy Profiles
2 Identifying Profiles of Democracy
2.1 Introduction
This chapter is a revised version, particularly in its methodology, of Oliver Schlenkrich. 2019. Identifying Profiles of Democracies: A Cluster Analysis Based on the Democracy Matrix Dataset from 1900 to 2017. Politics and Governance 7 (4): 315–330.
Electronic supplementary material: The online version of this chapter (https://doi.org/10.1007/978-3-658-34880-9_2) contains supplementary material, which is available to authorized users.
© The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2021. O. Schlenkrich, Origin and Performance of Democracy Profiles, Vergleichende Politikwissenschaft, https://doi.org/10.1007/978-3-658-34880-9_2
How can we make sense of all the different institutional designs of democracies? Structuring political reality is an important task of comparative politics, and typologies are a useful and necessary tool for it. Typologies structure the confusing political reality by reducing empirical complexity and focusing on its most relevant aspects. Various efforts have been made to capture the fundamental institutional choices in the diverse and heterogeneous world of democracies (Bogaards, 2017a): For example, democracies are divided into parliamentary and presidential systems (Shugart & Carey, 1992; Steffani, 1979), collective and competitive veto points (Birchfield & Crepaz, 1998; Crepaz & Moser, 2004), decentralized and centripetal democracies (Gerring & Thacker, 2008), or nation-state and state-nation institutions (Stepan et al., 2010). The most influential proposal is Lijphart’s (2012) typology of majoritarian and consensus democracy, which has been much debated and considerably criticized (Bormann, 2010; Fortin, 2008a; Giuliani, 2016; A. Kaiser, 1997; Lauth, 2010a). Recently, quality-of-democracy research has begun to distinguish between different types or profiles of democracies, concluding that a perfect democracy does not exist. A democracy cannot perform at its best in all dimensions and functions
simultaneously. Rather, “every democratic country must make an inherently value-laden choice about what kind of democracy it wishes to be” (Diamond & Morlino, 2004, p. 21). There are trade-offs between central dimensions and functions of democracy: Democracies emphasize some dimensions or functions, while others are necessarily neglected. Newer measurements of democracy (e.g., Democracy Barometer, V-Dem) attempt to shed light on this possibility. The Democracy Barometer (Bühlmann et al., 2012) identifies several different clusters of democracies on a preliminary basis. V-Dem examines the possibility of trade-offs in its conceptual papers and highlights the tension between institutions of the liberal and majoritarian conceptions of democracy (Coppedge et al., 2011). However, these democracy measures are not able to measure different democracy profiles (e.g., countries can have high degrees of democratic quality in each dimension). I draw on the novel dataset of the Democracy Matrix (Lauth & Schlenkrich, 2018a), which is a customized version of the Varieties of Democracy (V-Dem)1 dataset (Coppedge et al., 2018). It is a measurement instrument which is designed not only to gauge the quality of democracy but also to capture several trade-offs between dimensions caused by specific institutional choices of the democracies. It proposes various trade-offs between three fundamental democracy dimensions, namely political freedom, political equality, and political and legal control. Conceptually, I identify several democracy profiles: Libertarian-majoritarian democracies stress the freedom dimension over both the equality and control dimensions; egalitarian-majoritarian democracies focus on the equality dimension but neglect freedom and control.
In addition, there can be a mix of high freedom and high control values (libertarian-control-focused democracy) as well as a mix of high equality and high control values (egalitarian-control-focused democracy). This chapter applies a cluster analysis with validation strategies to this dataset to test whether these conceptually proposed democracy profiles can be detected empirically. The democracy profiles identified in this chapter represent the main research interest of this study: They are both the key dependent and the key independent variable. This chapter proceeds as follows: Section 2.2 describes the conceptualization and measurement of the Democracy Matrix. Section 2.3, the methodology section, presents the multiple steps of the cluster analysis and the cluster validation strategies. Finally, the results of the cluster analysis are presented (Section 2.4) and discussed (Section 2.5), followed by a conclusion (Section 2.6).
1 For a detailed analysis of the V-Dem dataset, see Chapter 3.
2.2 Democracy Matrix: Trade-Offs and Democracy Profiles
2.2.1 Democracy Conception and Trade-off-Relationship
How can we reasonably define democracy? In democracy theory, three different conceptual ranges have become apparent: minimal definitions, middle-range definitions, and maximal definitions. Although there is a large scientific consensus on the minimal definition of democracy (the repeated holding of elections with competition and broad participation), it has become clear that a nuanced view on the quality of democracy, especially for established democracies, is not possible within the boundaries of this definition. Maximal definitions overstretch the concept of democracy by focusing on socio-economic outcomes unrelated to the democratic procedures which are the real focus of the analysis (e.g., the welfare state within the social democracy concept). Middle-range definitions, by contrast, supplement the minimal democracy concept only insofar as this is necessary for a differentiated analysis; the definition thus remains within the limits of a narrow and procedural understanding of democracy. The democracy concept of the Democracy Matrix (Lauth, 2015; Lauth & Schlenkrich, 2018a) is based on such a middle-range understanding of democracy. The Democracy Matrix combines three dimensions with five central democratic functions: While the dimension of freedom captures the extent of citizens’ free self-determination based on civil and political rights, the equality dimension encompasses legal egalitarianism and the actual realization of those rights (input-egalitarianism). The control dimension takes into account the protection of the two other dimensions through legal control by judiciaries and political oversight by intermediary institutions, the media, and parliament. In addition, five key functions cut across these three dimensions, specifying the concept of democracy quality. The function “procedures of decision” captures the democratic quality of representative elections and direct democracy.
The “regulation of the intermediate sphere” analyzes interest aggregation and interest articulation by parties, interest organizations, and civil society. “Public Communication” evaluates the functioning of the media system and the public realm. The function “guarantee of rights” analyzes the democratic quality of the court system, whereas the last function “rules settlement/implementation” focuses on the democratic quality of the executive and legislative branches’ work. This produces 15 matrix-fields which guide and support a detailed analysis of the quality of democracy (see Figure 2.1). For example, the three matrix fields of the institution “Public Communication” assess whether the media system can freely
Notes: The dark grey boxes represent the three dimensions and the five central institutions of the Democracy Matrix. Light grey boxes are the 15 matrix fields, each representing a combination of one dimension and one institution. The focus of the analysis of the democratic quality for each matrix field is described by the text in the light grey boxes (e.g., “communicative freedom” is the focus of the matrix field which is part of the institution “Public Communication” and the freedom dimension). The two-headed arrows represent the derived trade-offs and the text inside the grey boxes describes the components involved in the trade-off, e.g., there is a trade-off between judicial review (control dimension) and effective government (freedom dimension). Source: Lauth/Schlenkrich (2018b) and Schlenkrich (2019)
Figure 2.1 Concept of the Democracy Matrix
operate (freedom dimension), whether interests are equally represented in the public sphere by diverse media outlets (equality dimension), and finally, whether the media system is able to criticize and control the government (control dimension). Democracies, as defined by the Democracy Matrix, preserve all dimensions of political freedom, political equality, and political and legal control, and maintain a democratic functional logic in all five key institutions. Some of their characteristics may be only partially developed as long as the central democratic functional logic is retained, as in deficient democracies in which elections occur in combination with some deficits in the rule of law. The Democracy Matrix also conceptualizes the internal relationship of these central dimensions to each other (Lauth, 2016a). It differentiates between complementary effects and conflicting effects (trade-offs) of the democracy dimensions. On the one hand, these dimensions reciprocally support one another: Elections are only meaningful if they are not only competitive but also allow nearly universal suffrage, or more generally, freedom needs a minimum level of equality and vice versa. On the other hand, there are tensions between the dimensions (Diamond & Morlino, 2004). This means that a perfect democracy that fully realizes all democracy dimensions cannot exist. Conflicting effects (trade-offs) can also be understood as a normative dilemma for democratic societies. They give expression to a political conflict over values, on which society must take a position. Stressing one value, which might have been selected in a process of negotiation by the different social forces (Bühlmann et al., 2012, p. 123) or which reflects a specific cultural preference (Maleki & Hendriks, 2015), changes the degrees of development of the individual dimensions and their weights relative to one another.
The conflicting effects of the dimensions or trade-offs allow citizens to shape their democracy according to their normative preferences. As Berlin (2000, p. 23) states: Liberty and equality, spontaneity and security, happiness and knowledge, mercy and justice—all these are ultimate human values, sought for themselves alone; yet when they are incompatible, they cannot all be attained, choices must be made, sometimes tragic losses accepted in the pursuit of some preferred ultimate end.
Trade-offs arise because some democracy concepts (e.g., egalitarian vs. libertarian democracy) can be arranged as opposing pairs and prefer different institutional solutions for the same function. These conceptions have an equal normative weight and it is equally possible to justify them (‘antinomy’: Hidalgo, 2014). In addition, they are recognized as having the same level of democracy quality, which means that the conceptions and their institutional decisions are neutral concerning the quality of democracy. Ultimately, every conception of democracy
emphasizes different political values, while others are neglected (e.g., equality as opposed to freedom). This means that they exhibit a different dimensional structuring of the same democratic quality (e.g., the equality dimension over the freedom dimension). Hence, due to their connection to different conceptions of democracy, institutions emphasize different democracy dimensions. The tensions between the dimensions are reflected in institutional decisions, and one cannot completely realize all three dimensions of the Democracy Matrix since they are unavoidably bound to conflicting goals. Thus, the framework of the Democracy Matrix, with its analytical distinction between dimensions and institutions, supports and controls the selection of relevant trade-offs. The Democracy Matrix differentiates between two opposing pairs of democracy conceptions. The first pair concerns the effectiveness of government, that is, the conflicting relationship between the freedom and control dimensions: Is the decision-making process divided between different powers which control the government, and does the government have to rely on a broad consensus? Or is there a higher level of freedom for the government through its centralized power? This follows the distinction between majoritarian and consensus democracies (Lijphart, 2012), which are opposing concepts of democracy and cannot be realized simultaneously. The former focuses on majority rule, the latter on an extended system of reciprocal mechanisms of oversight. Whereas consensus democracy emphasizes the interplay of several veto players (Tsebelis, 2002), which restrict the action of governments (e.g., strong second chambers, coalitions, constitutional courts), the ideal-typical majoritarian democratic institutions favor effective government, that is, structures with more limited capacities for oversight.
Consensus democracy can also be understood as a constitutional democracy, whose core element is a strong constitutional court. Popular legislative initiatives are included as a further trade-off element. However, the distinction between controlled and uncontrolled referendums is important (Vatter, 2009, p. 128). The former “relate to the power-concentrating characteristics of majoritarian democracy, while the latter can be initiated by citizens and confirm the power-dispersing characteristics of consensus democracy” (Bormann, 2010, p. 5). To emphasize the dimension involved in this trade-off, I call these types majoritarian and control-focused democracy profiles. The second opposition is the gap between libertarian and egalitarian conceptions of democracy which represent the tension between freedom and equality. This trade-off captures the inclusiveness of access to the government or political influence. Whereas egalitarian democracies underscore political equality, libertarian democracy focuses on the realization of political freedom. Egalitarian democracies emphasize inclusiveness by the introduction of equal representation and an
equal chance of representation through PR systems, egalitarian political finance, and fair media regulation. In contrast, libertarian democracies are considered to be more exclusive with their FPTP system and their “lack of restrictions on expenditure and contributions, market principles of access to the media [and] no public funding” (Smilov, 2008, p. 3). Although these trade-off conceptions can be considered in isolation, their differentiation and informative value arise from their combination: Combining the democracy conceptions creates coherent and incoherent democracy profiles. The coherent democracy profiles are the libertarian and majoritarian democracy (Fec) and the egalitarian and control-focused democracy (fEC). While the Fec is a combination of two democracy conceptions emphasizing the freedom dimension, the fEC is a combination of two democracy conceptions which weaken the freedom dimension. The incoherent democracy profiles are a mix of democracy conceptions in which one conception strengthens the freedom dimension, while the other weakens it. The libertarian and control-focused democracy has a less effective government due to its power-dispersing characteristics, which reduces the freedom dimension. However, it also strengthens the freedom dimension because it emphasizes unequal representation and exclusion. Depending on the relative strength of the two democracy conceptions, two configurations can be distinguished here: feC or FeC. The same applies to the egalitarian and majoritarian democracy: On the one hand, it favors inclusion and reduces the freedom dimension; on the other hand, it strengthens the freedom dimension through power-concentrating characteristics. This results in two dimensional configurations: FEc and fEc. Both opposing pairs and the accompanying profiles of democracy are displayed in a two-by-two matrix (see Table 2.1).
Moreover, these two opposing pairs of democracy conceptions resemble, on the one hand, the democracy models of decentralism and centripetalism (Gerring et al., 2005; Gerring & Thacker, 2008) and, on the other hand, the distinction between collective and competitive veto points (Birchfield & Crepaz, 1998). Gerring and Thacker differentiate between two fundamental aspects: authority and inclusion. While the trade-off between freedom and control encompasses the aspect of authority which “indicates the extent to which political institutions centralize constitutional sovereignty within a democratic framework” (Gerring & Thacker, 2008, p. 16), the trade-off between freedom and equality is similar to the inclusion element which “indicates the extent to incorporate a diversity of interests, ideas, and identities in the process of governance” (Gerring & Thacker, 2008, p. 16). Translating the democracy profiles to the types developed by Gerring and Thacker, libertarian-majoritarian democracies correspond to the centralized democracies (high authority, low inclusion), egalitarian-majoritarian democracies resemble the
centripetal model (high authority, high inclusion), and finally, the control-focused democracies (in their libertarian, but more so in their egalitarian form) are quite similar to the decentralized democracies (low authority, high inclusion). These considerations can also be linked to the differentiation between collective and competitive veto points. Whereas collective veto points result “from institutions where the different political actors operate in the same body and whose members interact with each other on a face to face basis” (Birchfield & Crepaz, 1998, p. 182), competitive veto points emerge when the power is “institutional diffused” (Crepaz & Moser, 2004, p. 266) in separate institutions between different political actors. On the one hand, the trade-offs between the freedom and control dimensions, especially the components of bicameralism and divided government, represent the competitive veto points. On the other hand, the trade-offs between the freedom and equality dimensions, especially the element of the electoral system, approximate the theoretical underpinnings of the collective veto points. Overall, the Democracy Matrix is able to incorporate and represent these diverse democracy conceptions by drawing on the idea of trade-offs between central dimensions of democracy.2
Table 2.1 Conceptually Derived Democracy Profiles
Rows: Inclusiveness (Equality vs. Freedom). Columns: Effective Government (Freedom vs. Control).
Inclusiveness | Effective Government: High | Effective Government: Low
High | Egalitarian and majoritarian democracy (FEc, fEc) | Egalitarian and control-focused democracy (fEC)
Low | Libertarian and majoritarian democracy (Fec) | Libertarian and control-focused democracy (FeC, feC)
Notes: The letters in brackets represent the three central dimensions of democracy, namely freedom (F), equality (E), and control (C). An upper-case letter instead of a lower-case letter indicates that the dimension is pronounced relative to the other dimensions. For example, the abbreviation Fec stands for a democracy that emphasizes the freedom dimension at the expense of the equality and control dimensions. Source: own table
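As a compact restatement of Table 2.1, the two trade-offs can be read as a lookup from a pair of binary choices to a conceptual profile. The following sketch is only illustrative: the function name `conceptual_profile` and the boolean encoding are assumptions made here, while the labels and dimensional configurations are taken from the table.

```python
# Illustrative lookup for the two-by-two typology in Table 2.1.
# Key: (high inclusiveness, high effective government).
PROFILES = {
    (True, True): ("egalitarian-majoritarian", ("FEc", "fEc")),
    (True, False): ("egalitarian-control-focused", ("fEC",)),
    (False, True): ("libertarian-majoritarian", ("Fec",)),
    (False, False): ("libertarian-control-focused", ("FeC", "feC")),
}


def conceptual_profile(high_inclusiveness: bool, high_effective_government: bool):
    """Return the profile label and its possible dimensional configurations."""
    return PROFILES[(high_inclusiveness, high_effective_government)]


# Low inclusiveness combined with high effective government.
label, configurations = conceptual_profile(False, True)
print(label, configurations)
```

The coherent profiles map to a single configuration each, whereas the incoherent profiles map to two, which mirrors the argument that their freedom dimension depends on the relative strength of the two conceptions.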
2 Doorenspleet/Pellikaan (2013) combine the three dimensions “centralization vs. decentralization”, “homogeneous vs. heterogeneous society” and “PR electoral systems vs. majority rule” to create a “cube” with eight different profiles of democracy (depoliticized democracy, consociational democracy, unitary democracy, consensus democracy, majoritarian democracy, federal democracy, centrifugal democracy, centripetal democracy). They base their ideas on the works of Lijphart (1996, 2012), Gerring/Thacker (2008) and Norris (2008). However, they often follow an idiosyncratic interpretation and shorten the argumentation and conceptions of these authors: For instance, Lijphart’s consensus democracy is not only about the electoral system and decentralization (although these are important elements). Similarly, Lijphart’s consociational democracies do not only focus on the electoral system and societal structure but also on mutual veto rights and cultural autonomy (Lijphart, 1996). In addition, consociational democracy is characterized by informal rules and behavior (Bogaards et al., 2019). Finally, they state that “centripetal democracy also includes federal democracy” (Doorenspleet & Pellikaan, 2013, p. 248), which is contrary to Gerring and Thacker’s conception.
2.2.2 Measurement of Democracy Profiles
How are this democracy conception and its democracy profiles measured? I use the data from version 1.1 of the Democracy Matrix dataset.3 The Democracy Matrix is a customized version of the Varieties-of-Democracy (V-Dem) dataset (Coppedge et al., 2018). V-Dem offers over 353 key indicators for determining democracy quality, covering a period from 1900 to 2018 (as of April 2019) and including approximately 180 countries. The data are collected according to an elaborate procedure and are subject to statistical tests to increase the reliability and validity of the assessments. Democracy Matrix Dataset V1.1 is based on version 8 of the V-Dem dataset.
3 See www.democracymatrix.com
The development of the Democracy Matrix follows the state of the art for measurement concepts and is made up of three phases: conceptualization, measurement, and aggregation (Munck & Verkuilen, 2002). The Democracy Matrix dataset not only measures each individual matrix field but also provides data for the matrix fields aggregated into dimensions and institutions (see Figure 2.1). In contrast to other democracy indices, the Democracy Matrix explicitly integrates trade-offs in the measurement stage by applying a two-step measurement strategy (Lauth, 2016a): Quality measuring indicators consist of the usual indicators used by various democracy measures, while trade-off measuring indicators measure the conflicting impact of the dimensions within the Democracy Matrix. The former indicators are linear in the sense that higher values indicate a higher democracy quality. The latter are bipolar, which means that each end of the scale indicates a highly developed characteristic of the profile. Therefore, maximum values are not possible simultaneously in each dimension. The conflicting effects are not characterized by generally differing degrees of democratic quality, but rather by the distribution of
democracy quality in different dimensions. Trade-off indicators represent the differences in the shape of these dimensions relative to each other. A libertarian-majoritarian and an egalitarian-majoritarian democracy have different profiles, but they could have the same democratic quality. For example, the freedom dimension of the institution “Public Communication” is measured as follows: The matrix field is conceptually based on the idea of communicative freedoms, which is made up of the two components “freedom of the press” and “freedom of opinion”. These two components are measured by seven V-Dem indicators in total. The first component, freedom of the press, is the average of the three indicators “harassment of journalists” (v2meharjrn), “government censorship effort” (v2mecenefm), and “internet censorship effort” (v2mecenefi). The freedom of opinion component is the average of the four indicators “freedom of discussion for women” (v2cldiscw), “freedom of discussion for men” (v2cldiscm), “freedom of religion” (v2clrelig), and “freedom of academic and cultural expression” (v2clacfree). Finally, both components are scaled between 0 and 1 and multiplied together in the sense of necessary conditions to derive the final value for this matrix field. These values are linear in the sense that higher values indicate a higher level of quality of democracy in this matrix field. All other matrix fields are measured similarly, so that the Democracy Matrix draws on a selection of approximately 100 V-Dem indicators. This is the first step of the measurement strategy: These quality measuring indicators are the basis for the regime classification and, if the country is classified as a democracy, for the subsequent trade-off measurement. Furthermore, the Democracy Matrix locates the following trade-off between the freedom and the equality dimension in the institution “Public Communication”: Libertarian Media Access versus Egalitarian Media Access.
Whereas libertarian media access is characterized by the fact that “it only provides for market access to the media” (Smilov, 2008, p. 9, emphasis in the original), the egalitarian model relies on free airtime and restrictions on the purchase of additional media airtime. This trade-off is the weighted average of the three V-Dem indicators “election paid interest group media” (v2elpaidig), “election paid campaign advertisements” (v2elpdcamp), and “election free campaign media” (v2elfrcamp). These combined indicators are then transformed in a bipolar way: If there are no restrictions, they yield higher values for the freedom dimension (up to 1) and lower values for the equality dimension (down to 0.75, which is the threshold of a working democracy). Conversely, the more regulation exists, the higher the value for the equality dimension (up to 1) and the lower the value for the freedom dimension (down to 0.75). Afterwards, these values are multiplied by the values of the quality measurement from the first step. This applies to all the matrix fields where a trade-off is identified and produces the final values for the trade-off measurement stage. By combining the quality-measuring indicators and the trade-off indicators, a more realistic evaluation is obtained: If the equality dimension has a low democratic quality according to the quality-measuring indicators, this can be mitigated to a certain extent by the trade-off indicators, but the result could still be an overall inegalitarian democracy profile.
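The two measurement steps described above can be illustrated with a minimal Python sketch. It is not the Democracy Matrix implementation: the function names are hypothetical, the indicators are assumed to be already rescaled to [0, 1], the components are averaged with equal weights, and the bipolar transformation is assumed to be linear between 0.75 and 1.

```python
from statistics import mean

# Lower bound of the trade-off multiplier (threshold of a working democracy in the text).
DEMOCRACY_THRESHOLD = 0.75


def quality_score(press_indicators, opinion_indicators):
    """Step 1: quality of one matrix field as the product of two component averages.

    Both inputs are assumed to be lists of indicator values already scaled to [0, 1].
    """
    press = mean(press_indicators)
    opinion = mean(opinion_indicators)
    return press * opinion  # multiplication treats both components as necessary conditions


def bipolar_multipliers(libertarian_share):
    """Step 2: map a trade-off position in [0, 1] to (freedom, equality) multipliers.

    1.0 stands for no restrictions on media access, 0.0 for full regulation.
    The linear mapping onto [0.75, 1] is an assumption of this sketch.
    """
    span = 1.0 - DEMOCRACY_THRESHOLD
    freedom = DEMOCRACY_THRESHOLD + span * libertarian_share
    equality = DEMOCRACY_THRESHOLD + span * (1.0 - libertarian_share)
    return freedom, equality


def trade_off_values(quality_freedom, quality_equality, libertarian_share):
    """Combine both steps: quality values multiplied by the trade-off multipliers."""
    f_mult, e_mult = bipolar_multipliers(libertarian_share)
    return quality_freedom * f_mult, quality_equality * e_mult


if __name__ == "__main__":
    # Hypothetical indicator values for one country-year.
    q = quality_score([0.9, 0.8, 1.0], [0.9, 0.9, 0.9, 0.9])
    print(trade_off_values(q, q, libertarian_share=1.0))
```

With identical quality values in both dimensions, a fully libertarian media access model leaves the freedom value unchanged and scales the equality value down to 75 percent of its quality score, which reflects the mitigation logic described in the text.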
2.3 Research Design: Multiple Imputation and Multi-Step Cluster Analysis
Can we empirically detect these democracy profiles in the data? Do countries have similar democracy profiles? To answer these questions, I apply a cluster analysis with a strong focus on cluster validation to the trade-off measurement data of the Democracy Matrix V1.1 dataset (Lauth & Schlenkrich, 2019). Before the cluster analysis, however, it is useful to deal with missing values to increase the sample size: The matrix fields with the trade-off indicators of the Democracy Matrix dataset have missing values. Therefore, I use multiple imputation to replace the missing values of the trade-off measurement indicators. In this section, I only provide a short description of multiple imputation. Because multiple imputation is used heavily in Chapter 6, that chapter contains more details. In short, multiple imputation creates several datasets by replacing the missing values with plausible values. These plausible imputed values vary across the datasets to express the uncertainty in these estimates (Buuren, 2018, p. 20). The normal workflow would be to apply the statistical analysis (e.g., regression) to each imputed dataset. Afterwards, the results of the analyses are combined using Rubin’s rules, so that they reflect the uncertainty introduced by replacing the missing values. One important prerequisite for multiple imputation is that the data is Missing at Random (MAR): “With MAR data the probability of having a missing data point is related to another variable in the data set but is not related to the variable of interest” (Schlomer et al., 2010, p. 2). By including other observed data in the missing data model, it is possible to estimate plausible values for the missing data. In contrast, multiple imputation should not be applied if the data is Missing Not At Random (MNAR). MNAR means that “the likelihood of missingness is related to the score on that same variable had the participant responded” (Schlomer et al., 2010, p. 3). 
This occurs when the scale of the missing data is biased due to truncation, e.g., there are no observed values for failed states precisely because they have failed. For the trade-off dataset of the Democracy Matrix, I can assume that the data follows the MAR pattern: It does not seem plausible that the values are missing because of exceptionally high or low values on the trade-off measurement. In addition, a value for a country is only imputed if it is classified as a democracy according to the core measurement, so that at least it has no missing values for the indicators of the core measurement level. Thus, I created 10 imputed datasets for the trade-off measurement variables.4 For each matrix field, a single value, the average of all imputed values, is then calculated. In contrast to regression analysis, there is no common and agreed way to propagate the uncertainty due to multiple imputation into the cluster analysis, although incorporating this uncertainty into the cluster classification would be advantageous (Basagaña et al., 2013). However, I can to some extent disregard this uncertainty, because for most cases only one or two trade-off components are missing. Because these components are aggregated to dimensional index scores, which are the focus of the cluster analysis, an imputed value does not in general have a large impact on the aggregated score. Several diagnostic procedures are applied and presented in the appendix (Figure 109, Figure 110, Figure 111, and Figure 112; for a detailed discussion see Chapter 6). The diagnosis revealed some problems with the imputation procedure (e.g., the predictive performance was limited for some indicators), but these were not severe enough to preclude continuing the analysis with the imputed values. Multiple imputation increases the sample size from 3339 to 4065 observations. In the next step, these observations are subjected to the cluster analysis to identify possible democracy profiles.
4 I included the following indicators from the Democracy Matrix dataset (Lauth & Schlenkrich, 2019). These variables represent the 15 matrix fields of the trade-off measurement: “decision_freedom_trade_off”, “decision_equality_trade_off”, “decision_control_trade_off”, “intermediate_freedom_trade_off”, “intermediate_equality_trade_off”, “intermediate_control_trade_off”, “communication_freedom_trade_off”, “communication_equality_trade_off”, “communication_control_trade_off”, “rights_freedom_trade_off”, “rights_equality_trade_off”, “rights_control_trade_off”, “rule_settlement_freedom_trade_off”, “rule_settlement_equality_trade_off” and “rule_settlement_control_trade_off”. To support the estimation of the missing values, I included the following auxiliary variables: GDP per capita, current US dollars (WDI, wdi_gdpcapcur via QoG dataset, Teorell et al., 2019), Educational Equality (V-Dem, v2peedueq, Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019), Lower chamber electoral system (V-Dem, v2elparlel, Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019), Legal Origin (lp_legor via QoG dataset, Teorell et al., 2019), Effective Number of Parliamentary or Legislative Parties (gol_enpp via QoG dataset, Teorell et al., 2019), and Population Size (WDI, wdi_pop via QoG dataset, Teorell et al., 2019).
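The difference between the usual pooling step and the averaging used here can be sketched in a few lines of Python. The function names are hypothetical and the snippet is only an illustration under simple assumptions: `pool_rubin` applies the standard pooling formulas (mean of the estimates; total variance as within-imputation variance plus (1 + 1/m) times the between-imputation variance), whereas `average_imputations` collapses the imputed values of one cell into a single number, as done for the cluster analysis.

```python
from statistics import mean, variance


def average_imputations(imputed_values):
    """Collapse the m imputed values of one missing cell into a single value."""
    return mean(imputed_values)


def pool_rubin(estimates, variances):
    """Pool m estimates and their sampling variances across imputed datasets.

    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    pooled = mean(estimates)
    within = mean(variances)       # average sampling variance within the imputations
    between = variance(estimates)  # variance of the estimates across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


if __name__ == "__main__":
    # Hypothetical values: ten imputations of one trade-off indicator.
    print(average_imputations([0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.82]))
    # Hypothetical regression coefficients and squared standard errors from three imputed datasets.
    print(pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

Averaging discards the between-imputation variance that the pooling step would retain, which is why the text argues separately that this uncertainty is small for the aggregated dimensional scores.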
Cluster analysis classifies observations using data in the form of variables (features) and different cluster algorithms (Everitt et al., 2011; Hastie et al., 2009; James et al., 2013; Kaufman & Rousseeuw, 2005). Cluster analysis can be seen as a form of exploratory data analysis because it reveals structures in the form of groupings within the data. Validation is an important aspect of cluster analysis, as different cluster solutions are often possible and cluster algorithms “tend to generate clusterings even for fairly homogeneous data sets” (Hennig, 2007, p. 258). Therefore, the application of several conceptual and methodological strategies is necessary to validate the cluster solution in this study:
1. A conceptual and theoretical validation
2. Examination of the internal cluster quality using fit indices and visualization techniques
3. Evaluation of the robustness of the cluster solution using resampling procedures
The conceptual and theoretical validation ensures that the cluster solution is not just a random artefact but rather conforms to democracy theory. For example, do the clusters found in the data correspond to our deductively expected democracy profiles of the previous section? Do we find a cluster of democracies which have a higher freedom than equality or control dimension (libertarian and majoritarian democracies, Fec)? The second validation strategy, examination of the internal cluster quality using fit indices, guarantees that the cluster solution fits the central clustering aim. Hennig (2019, p. 4) argues that “depending on the subject matter background and the clustering aim, different clusterings can be optimal on the same data set”. Therefore, the appropriate definition of the clustering goal is important as it guides the selection of the aspects of cluster validity and the clustering method. In the literature, two broad clustering aims are generally considered (Akhanli & Hennig, 2020, p. 
6): within-cluster homogeneity and between-cluster separation. The former focuses on the formation of clusters which are most similar in their characteristics, while the latter focuses on finding clusters which are separated by gaps from each other. I focus on the clustering goal of within-cluster homogeneity, because democracy profiles should be most similar in their defining characteristics. Democracy profiles are defined based on the specific dimensional configurations of political freedom, political equality, and political and legal control which emerged due to trade-offs. The observations belonging to the clusters should be homogeneous with respect to the shape of the dimensions, i.e. the configuration of democratic quality in these three dimensions, so that countries with the same trade-off design
should be classified together. Thus, I do not aim for the clustering objective of between-cluster separation. Due to the empirical complexity, the transitions from one democracy profile to another must be seen as gradual. This implies, however, that crisp clustering techniques might not be appropriate for our clustering concept. Hence, I accommodate this assumption by using fuzzy instead of crisp clustering (D’Urso, 2015). Instead of assigning an observation uniquely to only one cluster, fuzzy clustering calculates for each observation the “strength of membership in all or some of the clusters” (Everitt et al., 2011, p. 242). The strength of membership can vary between 0 and 1 and indicates whether an observation seriously overlaps with other clusters, which provides an estimate of the uncertainty in the classification. In addition, the membership degrees imply that the differences between the democracy profiles are continuous and not abrupt. How can the success of the clustering be evaluated with respect to cluster homogeneity? Akhanli/Hennig (2020) find that the use of two internal validity criteria, the average within-cluster dissimilarity ($I_{ave.wit}$) and the Pearson Gamma ($I_{Pearson}$), is best suited for the clustering goal of homogeneous clusters.5 $I_{ave.wit}$ is a measure of homogeneity, since it measures the average degree of similarity of the different characteristics within the clusters. $I_{Pearson}$ measures the degree of approximation of the dissimilarity structure by the cluster solution. It “supports small within-cluster distances […] but will also prefer distances between clusters to be large, adding some protection against splitting already homogeneous clusters” (Akhanli & Hennig, 2020, p. 16). However, these fit indices are not directly comparable, as their value ranges differ and as they tend to systematically favor a larger number of clusters. 
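To make the two indices concrete, the following Python sketch computes both for a crisp partition. It is only an illustration with hypothetical function names: it uses the Euclidean distance, reads the inner sum of the average within-cluster dissimilarity as running over ordered pairs (so that the index is the size-weighted mean of the average within-cluster distances), and returns the raw dissimilarity, for which lower values indicate more homogeneous clusters.

```python
from itertools import combinations
from math import dist, sqrt


def average_within_dissimilarity(points, labels):
    """Raw average within-cluster dissimilarity for a crisp partition."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        if len(members) < 2:
            continue  # a singleton has no within-cluster distance
        # every unordered pair occurs twice in the sum over x_i != x_j
        pair_sum = 2 * sum(dist(a, b) for a, b in combinations(members, 2))
        total += pair_sum / (len(members) - 1)
    return total / len(points)


def pearson_gamma(points, labels):
    """Correlation between pairwise distances and a 0/1 'different cluster' indicator."""
    pairs = list(combinations(range(len(points)), 2))
    d = [dist(points[i], points[j]) for i, j in pairs]
    c = [0.0 if labels[i] == labels[j] else 1.0 for i, j in pairs]
    mean_d, mean_c = sum(d) / len(d), sum(c) / len(c)
    cov = sum((a - mean_d) * (b - mean_c) for a, b in zip(d, c))
    var_d = sum((a - mean_d) ** 2 for a in d)
    var_c = sum((b - mean_c) ** 2 for b in c)
    if var_d == 0 or var_c == 0:
        raise ValueError("correlation undefined: distances or indicator are constant")
    return cov / sqrt(var_d * var_c)


if __name__ == "__main__":
    # Hypothetical two-dimensional data with two obvious groups.
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(average_within_dissimilarity(pts, [0, 0, 1, 1]))
    print(pearson_gamma(pts, [0, 0, 1, 1]))
```

A partition that follows the two groups yields a small within-cluster dissimilarity and a Pearson Gamma close to 1, whereas a partition that cuts across the groups lowers the correlation.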
Hennig (2019) and Akhanli/Hennig (2020) propose a calibration method which improves comparability and counteracts the preference of these indices for a higher number of clusters. It generates several random cluster solutions on the dataset and calculates their respective fit indices. This yields the expected value range of these fit indices for this specific dataset. A calibration is made possible by comparing the actual to the expected values of the fit indices. Thus, I apply Fuzzy c-means6 (FCM) to the data (D’Urso, 2015) and provide the crisp solutions for the more traditional k-means and PAM (partitioning around medoids, Kaufman & Rousseeuw, 2005) algorithms.
In addition, I illustrate the cluster solution by presenting the clustered objects using the first two principal components (Pison et al., 1999). The two principal components make it possible to present the dataset in a two-dimensional space by compressing as much of the information within the dataset as possible into fewer dimensions. Although my dataset consists of only three dimensions, so that the compression gain is modest, this still allows for easier visualization. It gives an overall impression of the quality of the cluster solutions and assists in evaluating the distances between the clusters and thus their separation (even though, as stated, between-cluster separation is not the main goal).
The third strategy is to test the robustness of the cluster solution. Valid clusters should also be stable, so that the cluster solution can be generalized to a certain degree (Hennig, 2019, p. 13). The data is randomly resampled using a nonparametric bootstrap method to assess the stability of the clusters over 100 resample runs (Hennig, 2007). The result is expressed in terms of the Jaccard similarity coefficient (a coefficient > 0.75 indicates a stable cluster). This ensures that the cluster solution discovered is not an artifact of the specific data sample, and that cluster solutions with too many small clusters are prevented.
The values of the variables representing the three dimensions of the trade-off measurement enter the cluster analysis.7 The spatio-temporal range of this study is the following: Since democracy profiles presuppose the existence of a democratic regime, an observation must be classified as a democracy in order to be included in the sample.
5 These fit indices are defined as follows (Halkidi et al., 2015; Hennig, 2019):
$$I_{ave.wit}(\mathcal{C}) = \frac{1}{n} \sum_{k=1}^{K} \frac{1}{n_k - 1} \sum_{x_i \neq x_j \in C_k} d(x_i, x_j),$$
where $x_i$ and $x_j$ are two of the $n$ observations, $d$ is the dissimilarity function (e.g. Euclidean), and the clustering $\mathcal{C}$ splits these observations into $K$ clusters $C_k$ of size $n_k$. Finally,
$$I_{Pearson}(\mathcal{C}) = \mathrm{cor}\left(\mathrm{vec}\left([d(x_i, x_j)]_{i<j}\right), \mathrm{vec}\left([c_{ij}]_{i<j}\right)\right),$$
where $\mathrm{vec}([d(x_i, x_j)]_{i<j})$ is the vector of pairwise dissimilarities between the observations $x_i$ and $x_j$, and $c_{ij}$ is 0 when $x_i$ and $x_j$ are in the same cluster and 1 when they are not. $I_{Pearson}(\mathcal{C})$ then represents the correlation between these two quantities.
After multiple imputation, 4065 observations can be included in the analyses. The analysis covers all years from 1900 to 2017. It includes 111 countries from all regions; the average number of years per country is 36.6, with a minimum of 1 year and a maximum of 118 years (see appendix, Chapter 1, for a more detailed overview). Thus, the cluster analysis is based on country-year observations. With this setting, it is easier to track the temporal change in the democracy profile for each country. However, the analysis has to ensure that a cluster is built, not from years of a single country, but from a reasonable number of countries.
6 It minimizes the following objective function:
$$\min \sum_{i=1}^{n} \sum_{k=1}^{C} u_{ik}^{m} d_{ik}^{2},$$
where $u_{ik}$ is the entry of the membership matrix for the $i$-th observation and the $k$-th cluster and $d_{ik}^{2}$ represents the squared Euclidean distance. $m$ is the so-called fuzziness parameter. To date, there are no theoretical justifications for the values of this parameter (D’Urso, 2015). I use the default $m = 2$.
7 The variables are freedom_dim_index_trade_off, equality_dim_index_trade_off, and control_dim_index_trade_off (see Lauth & Schlenkrich, 2019).
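As an illustration of how fuzzy c-means arrives at membership degrees, the following Python sketch implements the standard alternating update of centroids and memberships for the objective function given in the footnote. It is a minimal sketch, not the software used in this study: the function name, the random initialization, the stopping rule, and the example data are assumptions made for illustration.

```python
import random
from math import dist


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Return (membership matrix, centroids) for a list of equal-length tuples."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        s = sum(row)
        u.append([v / s for v in row])
    centroids = []
    for _ in range(n_iter):
        # Centroid update: weighted mean of the observations with weights u_ik^m.
        centroids = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(len(points))]
            total = sum(weights)
            centroids.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)))
        # Membership update: inversely related to the squared distance to each centroid.
        new_u = []
        for p in points:
            d2 = [dist(p, c) ** 2 for c in centroids]
            if any(v == 0.0 for v in d2):
                # Observation coincides with a centroid: full membership there.
                zeros = [1.0 if v == 0.0 else 0.0 for v in d2]
                s = sum(zeros)
                new_u.append([z / s for z in zeros])
                continue
            inv = [v ** (-1.0 / (m - 1.0)) for v in d2]
            s = sum(inv)
            new_u.append([v / s for v in inv])
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break  # memberships have stabilised
    return u, centroids


if __name__ == "__main__":
    # Hypothetical observations on two dimensions.
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    memberships, centers = fuzzy_c_means(data, n_clusters=2)
    for row in memberships:
        print([round(v, 3) for v in row])
```

Each row of the membership matrix sums to 1; a crisp classification can be obtained afterwards by assigning every observation to the cluster with its highest membership degree.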
2.4 Results of the Cluster Analysis
Figure 2.2 shows the results for the calibrated internal fit indices, the average within-cluster dissimilarity ($I_{ave.wit}$) and Pearson’s Gamma ($I_{Pearson}$), as well as the overall index, which is the average of these two indices. The higher the value of the fit indices, the better the cluster solution. For all fit indices, the three partitioning algorithms (FCM, k-means, PAM) move almost in tandem. $I_{ave.wit}$ especially favors the two- and five-cluster solutions, but overall, the values of the other cluster solutions are very similar to each other. Pearson’s Gamma especially prefers the two-cluster solution. To a lesser extent, the solutions with three, four and five clusters are also acceptable. Finally, the aggregated index shows that the most suitable cluster solutions for the specific goal of homogeneous clusters partition the data into two, three, four or five clusters. These four solutions are finally subjected to a stability test, because valid clusters must also be stable. The bootstrap resampling technique indicates that all cluster solutions are in fact stable (see Table 60 in the appendix for the results). All clusters have been recovered (all Jaccard similarity values > 0.75). The instability of the cluster solution increases slightly as more clusters are distinguished. Figure 2.3 visualizes the different cluster solutions by calculating the first two principal components of the data. Overall, the visualization shows that the observations form a point cloud with some observations placed in each corner of the plot. While the two-cluster solution splits this point cloud in half, the three- and four-cluster solutions split the dataset into more and more “cake pieces”. Each cake piece thereby encompasses these populous corners of the figure. Finally, the five-cluster solution places a new cluster in the middle of the plot. Nevertheless, the different observations lie close to each other and there is no gap between these clusters. 
This highlights the importance of fuzzy clustering which does not give a fixed classification but calculates membership degrees including the uncertainty of the classification.
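The stability assessment reported above can be sketched as follows. The Python snippet is a simplified illustration, not the procedure of Hennig (2007) in full: `cluster_fn` is a hypothetical placeholder for any clustering routine that returns one label per observation, duplicates in a bootstrap sample are dropped for simplicity, and each original cluster is compared with its best-matching cluster in the resampled solution.

```python
import random


def jaccard(a, b):
    """Jaccard similarity of two collections of observation indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_stability(data, cluster_fn, n_boot=100, seed=0):
    """Mean best-match Jaccard similarity for every cluster of the original solution."""
    rng = random.Random(seed)
    n = len(data)
    original = cluster_fn(data)
    labels = sorted(set(original))
    sums = {lab: 0.0 for lab in labels}
    for _ in range(n_boot):
        # Draw n indices with replacement and keep the distinct ones.
        idx = sorted(set(rng.randrange(n) for _ in range(n)))
        boot_labels = cluster_fn([data[i] for i in idx])
        boot_clusters = {}
        for i, lab in zip(idx, boot_labels):
            boot_clusters.setdefault(lab, set()).add(i)
        for lab in labels:
            # Original cluster restricted to the observations that were resampled.
            members = {i for i in idx if original[i] == lab}
            sums[lab] += max(jaccard(members, c) for c in boot_clusters.values())
    return {lab: s / n_boot for lab, s in sums.items()}


if __name__ == "__main__":
    # Hypothetical one-dimensional data and a trivial threshold "clustering".
    values = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    split = lambda xs: [0 if x < 2.5 else 1 for x in xs]
    print(cluster_stability(values, split))
```

Clusters whose mean Jaccard value stays above 0.75 would count as stable under the threshold used in this chapter.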
Note: The x-axis represents the number of clusters; the y-axis shows the fit index. Values above 0 indicate a better clustering result compared to the random clusterings. Source: own calculations
Figure 2.2 Calibrated Fit Indices for the Trade-Off Indicators
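The idea behind the calibrated values in Figure 2.2 can be illustrated with a short Python sketch. It is a strong simplification of the calibration proposed by Hennig (2019) and Akhanli/Hennig (2020): random clusterings are generated here by assigning labels at random, the index is a simple stand-in (the negative mean within-cluster distance), and the observed value is standardized with the mean and standard deviation of the random clusterings. All function names are hypothetical.

```python
import random
from itertools import combinations
from math import dist
from statistics import mean, stdev


def neg_avg_within(points, labels):
    """Stand-in index: negative mean within-cluster distance (higher = more homogeneous)."""
    d = [dist(points[i], points[j])
         for i, j in combinations(range(len(points)), 2)
         if labels[i] == labels[j]]
    return -mean(d) if d else 0.0


def calibrated_index(points, labels, index_fn, n_random=200, seed=0):
    """Standardize an index value against random clusterings with the same number of labels."""
    rng = random.Random(seed)
    k = len(set(labels))
    random_values = []
    for _ in range(n_random):
        random_labels = [rng.randrange(k) for _ in points]
        random_values.append(index_fn(points, random_labels))
    observed = index_fn(points, labels)
    return (observed - mean(random_values)) / stdev(random_values)


if __name__ == "__main__":
    # Hypothetical data with two clear groups.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(calibrated_index(pts, [0, 0, 0, 1, 1, 1], neg_avg_within))
```

A calibrated value above 0 means that the clustering is more homogeneous than the random clusterings on the same data, which is how the values in the figure are read.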
2.5 Discussion: Temporal and Spatial Development of Democracy Profiles
Figure 2.4 presents the dimensional configurations of the cluster solutions. Each cluster solution with its dimensional configuration resembles some of the conceptual considerations above: In particular, the two-cluster solution reflects the inclusiveness dimension or the trade-off between freedom and equality: One cluster emphasizes the equality dimension over the other dimensions, while the other cluster highlights the freedom dimension.
Note: This figure shows the first two principal components for each cluster solution. These two components explain 99% of the whole variability of the dataset. The shapes of the points and the grey colors represent the different clusters found in the data. Source: own presentation
Figure 2.3 Principal Component Plot
Source: own calculations with the dataset of the Democracy Matrix V1.1
Figure 2.4 Box Plots for the Cluster Solutions
The three-cluster solution introduces more differentiation of the inclusiveness dimension by now allowing a more balanced pattern between the freedom and equality dimension (FEc). This also sharpens the other two profiles: A libertarian and control-focused democracy (FeC) and an egalitarian and control-focused democracy (fEC) can be identified. This means that these three types are also distinguished along the effective government dimension (trade-off between freedom and control): While FEc emphasizes both freedom and equality values over control values and therefore prefers effective government, fEC has lower values on the effective government dimension. FeC is an incoherent type: On the one hand, it has higher values of freedom than equality because it has a low level of inclusiveness. On the other hand, it has high control values due to the preference for lower effective government. However, the trade-off between freedom and control (effective government) does not necessarily result in low freedom values, because the freedom dimension is reinforced by the other trade-off dimension. Overall, the types of this cluster solution represent three out of four conceptually derived types. The four-cluster solution divides the previously identified FEc democracy profile into an fEc and an Fec profile. While these four clusters represent the conceptual considerations very well overall, the Fec cluster shows somewhat higher egalitarian values than would be expected from a conceptual point of view. Finally, the five-cluster solution finds an additional balanced cluster (FEC). This balanced cluster does not contradict the idea of trade-offs. Rather, it means that some political systems do not occupy the extreme ends of the trade-offs (see Table 62 and Table 63 in the appendix for a detailed overview of which countries belong to the four- and five-cluster solutions).8 I continue with a detailed description of the five-cluster solution because, on the whole, it encompasses the other solutions and it also has a unique cluster, the balanced democracy profile. The other solutions are described in the appendix (see Figure 113, Figure 114 and Figure 115) and are only taken up again in the empirical analysis (see Chapters 8 and 9).
The temporal development of the democracy profiles from 1900 to 2017 is shown in Figure 2.5. If we divide the timeline according to the three waves of democracy (Huntington, 1993), we see that every wave of democracy has a distinct combination of democracy profiles: In the first wave of democracy (until 1926), democracy profiles emphasizing the freedom dimension (FeC or Fec) dominated. In addition, the balanced cluster (FEC) was also represented. In the second wave (1945–1962), egalitarian democracies (with either a weak or a strong control dimension: fEc, fEC) complemented this picture. During this wave, all democracy profiles coexisted with almost equal frequencies. However, this changed drastically with the third wave (since 1974). While libertarian democracy profiles (Fec, especially FeC) almost disappeared, profiles emerged that focused more on the equality dimension (fEc and fEC). The balanced cluster has also been growing. In general, countries which democratized after 1990 have opted for an egalitarian democracy profile. It seems that egalitarian profiles are on the rise and libertarian democracy profiles have gone out of fashion.
8 The five-cluster solution is almost identical to the cluster solution found by Schlenkrich (2019), even though the paper uses a different clustering strategy and different cluster algorithms. This also validates these clusters.
Source: own calculations with the dataset of the Democracy Matrix V1.1
Figure 2.5 Temporal Distribution of Democracy Profiles (Count and Percent)
Figure 2.6 shows the spatial distribution of the democracy profiles for the third wave of democracy. The countries are classified according to their longest-lasting democracy profile during this period. On the one hand, the majority of democracy profiles in North America and South America are control-focused democracies. The USA combines the control dimension with a more pronounced freedom dimension (FeC), whereas the countries in South America show higher control and equality values (fEC). On the other hand, Europe has a mixture of egalitarian-majoritarian (fEc) and egalitarian-control-focused democracies (fEC). The United Kingdom and Ireland are notable exceptions, as they are the only libertarian-majoritarian democracies in Europe (Fec). Similar to the finding by Lijphart, the United Kingdom seems to have transferred its libertarian democracy profile to some of its former colonies. Most of them have the same profile (Fec: Botswana, New Zealand, Solomon Islands, Trinidad and Tobago). However, there are also exceptions to this rule (FEC: India, Australia, Sri Lanka).
Notes: The world map shows the mode of the democracy profiles for each country in the period between 1974 and 2017. The mode is the value which appears most often in a set of values. White means that the democracy profile is not available because data is missing or the country is not classified as a democracy. A map in color can be found in the online appendix. Source: own calculations with the dataset of the Democracy Matrix V1.1
Figure 2.6 Spatial Distribution of Democracy Profiles (1974–2017)
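How a country is assigned to a single profile on such a map can be sketched in Python. The snippet is purely illustrative: the function names and the example values are hypothetical, the crisp profile of a country-year is taken to be the cluster with the highest membership degree, and the country-level profile is the mode of these yearly profiles.

```python
from collections import Counter


def crisp_profile(memberships):
    """Profile with the highest membership degree for one country-year."""
    return max(memberships, key=memberships.get)


def modal_profile(yearly_profiles):
    """Most frequent profile of a country over a period (the mode)."""
    return Counter(yearly_profiles).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical membership degrees for three country-years of one country.
    years = [
        {"Fec": 0.6, "fEc": 0.3, "FEC": 0.1},
        {"Fec": 0.5, "fEc": 0.4, "FEC": 0.1},
        {"Fec": 0.3, "fEc": 0.6, "FEC": 0.1},
    ]
    profiles = [crisp_profile(y) for y in years]
    print(profiles, modal_profile(profiles))
```

The mode hides gradual shifts in the membership degrees, which is why the development of single countries is better tracked with the membership degrees themselves.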
Finally, this new typology makes it possible to track the development of democracy profiles for individual countries. Figure 2.7 shows this development for five selected countries after 1945. Instead of using the crisp cluster solutions, the changes in the membership degrees are plotted.
Source: own calculations with the dataset of the Democracy Matrix V1.1
Figure 2.7 Temporal Development of the Democracy Profile for Single Countries after 1945
The United Kingdom is an example of a political system with a very stable democracy profile. It never changed its libertarian-majoritarian profile: The freedom dimension is favored by a highly disproportional electoral system, “no limits of the total expenditure of and donations to political parties” (Smilov, 2008, p. 14) at least until 2000, no judicial review, no divided government, and a weak second chamber. Although some of those characteristics have partially changed since 2000, this has not resulted in a change of the membership probabilities. There are also countries with minor changes. Germany was an egalitarian-control-focused democracy in its beginnings, but shifted towards a more egalitarian-majoritarian profile in the 1960s. However, the membership probability shows that there is a high chance that it still belongs to the fEC cluster. Nevertheless, Germany’s democracy profile contrasts in some aspects with the United Kingdom’s democracy profile: a proportional representation system (albeit with a 5% threshold, which makes it majoritarian in some cases, such as 2013, when two parties fell just below the 5% hurdle), an egalitarian party finance system and an egalitarian media access model strengthen the equality dimension, whereas a rather strong second chamber in combination with a strong constitutional court favors the control dimension. These institutional decisions come at the expense of the freedom dimension. Finally, there are political systems whose democracy profile has changed drastically: New Zealand was first a libertarian-majoritarian democracy and changed to an egalitarian-majoritarian profile in 1996 with the electoral reform from a First-Past-the-Post system to Proportional Representation. Switzerland established a balanced democracy profile in 1972. This was caused more by a change in the quality-measuring indicators than by a change in the trade-off indicators: Switzerland introduced women’s suffrage in 1971 and changed from a deficient democracy to a working democracy. Similarly, the United States had a high probability of being classified as a libertarian and control-focused democracy (FeC). With the civil rights movement, however, this inegalitarian democracy profile was partially replaced by an egalitarian-control-focused and a balanced democracy profile. Since 1990 it has even been classified as an egalitarian and control-focused democracy, although it is very likely to belong to the balanced cluster as well. With the election of Trump in 2016, the membership probabilities for the egalitarian democracy profiles (fEC, fEc) decreased in favor of libertarian profiles (Fec and FeC).
2.6 Summary and Conclusion
This chapter is a heavily revised version of Schlenkrich (2019). I incorporated several methodological improvements: First, I have greatly increased the sample size over the original study by imputing plausible values using multiple imputation, when the values for the trade-off indicators were missing. In addition, in accordance with the newer methodology of the Democracy Matrix, the trade-off indicators are based on the core measurement instead of the context measurement. The increased sample size will greatly benefit the empirical analysis. Finally, due to a recent improvement in the application of the cluster analysis, I select the number of clusters on the basis of calibrated fit indices which resemble specific clustering goals.
This chapter presents the main focus of this research study: the democracy profiles. The democracy profiles are the most important independent and dependent variable of this study. Based on the work by Lauth and Schlenkrich (2018a), I have shown how to conceptually link trade-offs between dimensions with democracy profiles: From a democracy theory perspective, a perfect democracy seems impossible, because a complete realization of all three key democracy dimensions (freedom, equality, and control) is unattainable. The tensions between dimensions manifest themselves in institutional choices, and two opposing pairs of profiles can be identified: libertarian vs. egalitarian democracy, as well as majoritarian vs. control-focused democracy. This chapter yields several important conclusions. A cluster analysis with a strong focus on cluster validation revealed that these deductively derived profiles can be found empirically. Based on the Democracy Matrix dataset, a customized version of the Varieties-of-Democracy dataset, I find empirical evidence for a libertarian-majoritarian democracy cluster (high political freedom values; lower values for political equality, and political and legal control; Fec) and an egalitarian-majoritarian democracy cluster (high equality, less freedom and control; fEc). In addition, there are control-focused clusters: High control values are associated either with higher freedom or higher equality levels (the libertarian-control-focused cluster FeC and the egalitarian-control-focused cluster fEC, respectively). Finally, there is a smaller balanced cluster (FEC) whose values do not vary across the dimensions. These clustering results are overall very similar to the findings of the original study (see Figure 116 in the appendix). Furthermore, this chapter shows that each wave of democracy has its own characteristic distribution of democracy profiles. In the first wave, libertarian democracies (either with majoritarian or control-focused features) dominated. 
The second wave presented a mixed picture, meaning that all profiles of democracy were more or less equally represented. The third wave showed that egalitarian democracy profiles (either with majoritarian or control-focused features) gained the upper hand. Regarding the spatial distribution, there is a concentration of control-focused democracies in North and South America, whereas a stronger mix of democracy profiles exists in Europe. An exception seems to be the United Kingdom with its libertarian-majoritarian democracy profile. I also discussed cases where the democracy profile was mostly stable over the whole period from 1945 to 2017 (United Kingdom, Germany) as well as countries where there was a significant change (New Zealand and Switzerland). Although the five-cluster solution was discussed in more detail in this chapter, the other solutions are taken up again in the empirical causal analysis (see Chapters 8 and 9).
In the next chapters, I ask whether these democracy profiles actually matter for performance. An important conclusion in comparative political science is that there are different institutional choices and that “institutions do matter” (Lijphart, 2012; Steffani, 1979; Duverger, 1951). In addition, I analyze the causes of democracy profiles in the last chapter. Why do countries develop specific democracy profiles? Furthermore, in Chapter 3 I investigate the quality of the V-Dem dataset in terms of validity and reliability, since this dataset is the basis for the Democracy Matrix and hence for the measurement of democracy profiles. An assessment of its quality is therefore essential.
3 Analyzing the Varieties of Democracy Dataset
3.1 Introduction
The Varieties of Democracy (V-Dem) dataset by the Department of Political Science at the University of Gothenburg (Sweden) and the Kellogg Institute of the University of Notre Dame (USA) is currently the largest dataset for empirical democracy research and quality of democracy research. It has been publicly available since January 2016 and covers over 180 countries and colonies from 1900 to 2018 in its recent version (April 2019; V.9).¹ It offers 353 different indicators. V-Dem states that each indicator per country is on average coded by five experts, leading to over 3000 experts who contributed to this project. The yearly update extends the time range but also includes new variables. This dataset is based on sophisticated methodological considerations. It uses Bayesian ordinal item response theory to calibrate and aggregate the numerous ratings of the coders, thus trying to identify and correct measurement errors by the coders. In theory, the measurement model can not only adjust for the “rater precision [but also for the raters’] differing standards of conceptualization” (Pemstein et al. 2015: 5). The latter means that experts can interpret the scale of the same indicator differently and that the measurement model will be able to identify this and align these different scales. From a methodological perspective, these are important improvements for the measurement of democracy (see next section for a detailed description of V-Dem’s measurement model). The V-Dem dataset is not only at the core of this research project, which uses its indicators to measure the Democracy Matrix and its corresponding democracy profiles; a growing number of studies also depend, and will continue to depend, on these data. This makes a thorough, independent discussion of the methodological base and empirical findings of the V-Dem dataset important. However, such a discussion is currently still lacking, even though the V-Dem institute itself publishes critical reviews and possible improvements of the reliability and validity of its own dataset (e.g. Pemstein et al., 2019). Therefore, what is the Varieties of Democracy dataset? Above all, how good is the quality of the data? And especially: What do the insights gained in this chapter mean for the Democracy Matrix dataset and the measurement of the democracy profiles? Section 3.2 provides an overview of the V-Dem dataset, focusing on the theoretical foundations and assumptions of the V-Dem measurement model, thereby developing the research focus for the subsequent sections of this chapter.
1 The dataset (V 9) is available at v-dem.net.
Electronic supplementary material The online version of this chapter (https://doi.org/10.1007/978-3-658-34880-9_3) contains supplementary material, which is available to authorized users.
© The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2021 O. Schlenkrich, Origin and Performance of Democracy Profiles, Vergleichende Politikwissenschaft, https://doi.org/10.1007/978-3-658-34880-9_3
3.2 Overview of the Varieties of Democracy Dataset
3.2.1 Themes Covered by the V-Dem Dataset
V-Dem offers indicators for a variety of different aspects of the political system. Figure 3.1 gives an overview of all variables in the dataset, differentiated by 12 sections. This includes variables coded by country experts and more technical variables coded by the V-Dem staff. The “election” section contains by far the most variables. This is followed by the sections “executive”, “direct democracy”, and “legislature”, which have almost the identical number of variables. The next group consists of the “civil liberties” and “political equality” sections. Since version 9 of the V-Dem dataset (published 2019), a large number of new variables (“Exclusion”) has been included, describing various aspects of social, socio-economic or gender exclusion. The last group of sections (judiciary, party system, civil society, sovereignty/state, media and deliberation) contains only 10 or fewer variables each. If we compare which variables are coded by country experts (164 in total) and which by the V-Dem team, we see quite a different picture for the largest sections: Only a minority of variables in the largest section “election” are coded by experts, while roughly 50 percent of the variables in the executive and legislature sections are coded by experts. And there are no country expert codings in the direct democracy section, highlighting the heavy focus on technical aspects of direct democracy procedures (e.g. threshold of signature gathering). In general, however, technical aspects are not capable of capturing the democratic quality of direct democracy procedures, so that it is not possible to evaluate whether direct democracy procedures were free or fair, although that would be a relevant question (Lauth, 2004). Finally, there are sections (civil liberties, political sovereignty, media, and civil society) which consist only of variables coded by country experts.
Notes: sv = Sovereignty/State; pe = Political Equality; ju = Judiciary; el = Elections; ps = Political Parties; me = Media; cl = Civil Liberty; lg = Legislature; dl = Deliberation; ex = Executive; cs = Civil Society; dd = Direct Democracy. Source: own calculations based on V-Dem Dataset V.9 and Codebook V.9.
Figure 3.1 Number of Variables per Section
Are there aspects missing which might be important for assessing the quality of democracies or democracy profiles? There are almost no indicators for federalism. This is unexpected, because V-Dem initially wanted to measure the principle of majoritarian and consensus democracy. In this democracy model, federalism is an important part due to its power dispersion characteristics. The indicators “regional offices relative power” (v2elrgpwr) and “local offices relative power” (v2ellocpwr) do not assess the strengths of the different state levels relative to each other but rather the strength of elected in comparison to non-elected offices. There might be similarities to the concept of federalism; however, these indicators overall seem to focus on other elements of a political system. Another aspect of the political system which is only marginally represented is the quality of statehood. It is surprising that aspects of statehood like the functioning and quality of the administration or the tax monopoly are underrepresented, even though this research area has become more prominent in recent years. This situation has somewhat improved with the inclusion of new indicators in version 9. In addition, there seems to be an overall conceptual bias towards freedom aspects, neglecting control and especially equality aspects of the political system. Although the political equality section contains a large number of variables, it is important to stress that V-Dem’s understanding of political equality focuses on the equality of the individual. It does not include the equality of whole organizations such as the media system or political parties, which is especially relevant for democratic quality. For example, V-Dem does not measure whether opposition parties have enough resources to compete with the governing parties. Similarly, the civil society measurement is based mainly on freedom aspects (restrictions and repression), while cultural aspects of civil society (e.g. nonviolence) are ignored (Lauth, 2017a). These issues are reflected in the measurement of the Democracy Matrix. One of the best measured institutions is the electoral system, where only two (but still important) indicators for the effectively equal right to vote and for the effectively equal right to be a candidate are missing. This means that the equality dimension of the procedures of decision is only measured on a de jure basis. The equality dimension of the intermediary system is the matrix field measured with the lowest validity. Among the few indicators in the political party and civil society sections, there is no indicator focusing on the equality of political parties or civil society organizations, as stated above. There are also significant gaps in the measurement of the public communication institution: There is no indicator in the media section for freedom of information in the sense of the transparency of the political process or for the equal provision of media to citizens. Finally, a measure for statehood is missing, so that aspects of the institution “rules settlement and implementation” cannot be fully measured.
3.2.2 V-Dem Measurement Model: Bayesian Item Response Theory
Experts are important for the assessment of the quality of democracy. Often the underlying concepts are so complex that variables based solely on quantitative data (e.g. turnout) are not suitable: “for most of the relevant characteristics of democracy, there is no possibility for a meaningful quantitative measurement, as they cannot be objectively quantified” (Lauth, 2004, p. 301, own translation). Democracy in particular can be seen as an “essentially contested term” (Coppedge et al., 2011, p. 257). Those complex concepts can only be “met by expert judgment, grounded in analytical and synthetic competence as well as in local knowledge” (Schedler, 2012, p. 30). With the famous exception of Vanhanen’s Index of Democratization (Vanhanen, 2003), which relies solely on quantitative variables, most measures of democracy therefore use variables based on qualitative data (i.e. “subjective judgements”) provided by experts (Lauth, 2010b). Thus, the knowledge of experts “permits researchers to explore topics that might otherwise be impossible to study in a systematic fashion” (Maestas, 2016, p. 1). Vanhanen (1997, p. 34), by contrast, criticizes the use of variables based on qualitative expert data: “I think that we should try to formulate intersubjectively usable and reliable measures of democracy based on available quantitative data. Just like ‘metre’ is scientifically a more satisfactory measure of length than subjective concepts of length varying from one person to person”. However, recent expert surveys are designed to deal with the problem of subjectivity: The Bertelsmann Transformation Index (Bertelsmann Stiftung, 2018a) uses a complex calibration procedure with various stages involving multiple experts (two country experts, intraregional calibration through a regional coordinator, interregional calibration through all regional coordinators and a final approval stage). V-Dem uses Item Response Theory to mathematically align the experts’ evaluations and calibrate the country scores.
The V-Dem approach follows a “mixed design” (Maestas, 2016, p. 7) of expert surveys (see Figure 3.2): The majority of experts rate only a single country. However, there are also lateral and bridge coders. Country coders rate only one country, usually over the entire period from 1900 to 2012. Bridge coders, on the other hand, evaluate more than two countries for an extended period of time. Finally, lateral coders only rate a short period of time (usually one year), but for several countries. This design “provides data that can be used in measurement models to help correct for cross-country biases in scaling” (Maestas, 2016, p. 7).
Notes: This plot shows two countries A and B over three time points (T1, T2 and T3). Grey squares and triangles represent different country coders; white circles indicate bridge coders and dark-grey circles stand for lateral coders. Source: own presentation, based on Pemstein et al. (2019)
Figure 3.2 Study Design V-Dem
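The three coder roles of the mixed design can be told apart from a coder-level table. The following is a minimal sketch in Python and not V-Dem's own procedure: the tuple layout `(coder_id, country, year)`, the function name `classify_coders`, and the cut-off `short_span` for what counts as a short period are illustrative assumptions. For simplicity, the sketch treats any coder who rates at least one additional country over a longer period as a bridge coder.

```python
from collections import defaultdict


def classify_coders(ratings, short_span=1):
    """Label each coder as 'country', 'bridge' or 'lateral'.

    ratings: iterable of (coder_id, country, year) tuples (hypothetical layout).
    short_span: assumed maximum number of years that still counts as a
        short period for lateral coding.
    """
    years = defaultdict(lambda: defaultdict(set))  # coder -> country -> years
    for coder, country, year in ratings:
        years[coder][country].add(year)

    labels = {}
    for coder, by_country in years.items():
        if len(by_country) == 1:
            labels[coder] = "country"
            continue
        # The country with the longest coded period is treated as the main one.
        spans = sorted((len(y) for y in by_country.values()), reverse=True)
        extra_spans = spans[1:]
        # Additional countries coded only briefly -> lateral; otherwise bridge.
        labels[coder] = "lateral" if max(extra_spans) <= short_span else "bridge"
    return labels


# Toy data: r1 codes only A, r2 codes A and B for ten years each,
# r3 codes A for ten years and B and C for a single year.
ratings = (
    [("r1", "A", y) for y in range(1900, 1910)]
    + [("r2", "A", y) for y in range(1900, 1910)]
    + [("r2", "B", y) for y in range(1900, 1910)]
    + [("r3", "A", y) for y in range(1900, 1910)]
    + [("r3", "B", 1905), ("r3", "C", 1905)]
)
labels = classify_coders(ratings)
```

In this toy example, `r1` is labeled a country coder, `r2` a bridge coder and `r3` a lateral coder.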
V-Dem uses an ordinal item response theory (IRT) model² to transform the expert ratings into country-year estimates of the democracy quality, controlling for the experts’ perceptions and reliability. The main research field using IRT-models³ is educational research, which has produced a vast amount of literature and study design proposals. Item response theory specifies “the probability for an observable taking on a particular value as a function of the latent variable for the examinee (subject, politician, and patient) and the measurement model parameters for that observable” (Levy & Mislevy, 2016, p. 253). In these research areas, IRT models are often applied to identify the ability of a subject (e.g. skill) given their response to specific items (test questions). The subject’s ability is the latent trait, a variable that is not directly observable. In addition, IRT makes it possible to model and estimate specific features of the items. The 1PL model takes the item-specific difficulty of a correct answer into account. In 2PL models, a second parameter is included: the discrimination parameter, which is the ability of an item to differentiate between subjects. Finally, 3PL models also capture the probability that an examinee might only be guessing the correct answer.⁴ V-Dem transfers the 2PL models into the realm of expert surveys: the examinees become countries, the abilities become country scores (e.g. the democratic quality) and finally the items become expert raters with a unique difficulty parameter and discriminatory power. Figure 3.3 shows the directed acyclic graph of the Varieties-of-Democracy measurement model. In the middle of this figure, there are the observed scores ($y_{ctr}$) of the raters ($r$), which evaluate one year⁵ ($t$) for one country ($c$) within one variable. The measurement model uses these scores to estimate three parameters: the latent trait ($z_{ct}$), which is the estimate for each country-year, the vector of difficulty parameters ($\tau_{rk}$) for each response category ($k$) and the discrimination parameter of the rater ($\beta_r$). The difficulty parameter accounts for the fact that “experts may diverge in how far apart they consider different levels” (Marquardt & Pemstein, 2018, p. 439), whereas the discrimination parameter estimates the reliability of the expert rater: this factor “weight[s] downwards the contribution of experts who unsystematically diverge in either the scale direction or direction of their codings from those experts who code the same cases” (Marquardt & Pemstein, 2018, p. 440).
2 The author was able to replicate the data for some indicators. However, the data are not completely replicable because V-Dem does not provide the coders’ country of origin in the raw data file in order to protect the coders. The coders’ country of origin is important for the correct specification of the hierarchical priors in the measurement model (see next section). For the Stan code, see Pemstein et al. (2019, p. 17).
3 In contrast, the classical test theory (CTT) deals more with metric variables, whereas the IRT deals with categorical data (Geiser & Eid, 2010, p. 312).
4 Due to the IRT framework, it is as much a study about different aspects of democracy as it is a study about the evaluating capacities of experts. V-Dem’s dataset can be used not only for causal analysis of political transformation processes, but also to analyze the effect of various background factors (e.g. gender, educational attainment, ideologies and conceptual frameworks) on experts’ evaluations of democratic quality.
5 The measurement model does not estimate country-years but rather regime periods, i.e. stretches of time: “we treat any stretch of time, within in country c, in which no expert provides two differing ratings, or estimates of confidence, as a single observation” (Pemstein et al. 2015: 9).
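The step from the 1PL to the 3PL model can be illustrated with the usual logistic form of these models from the educational testing literature. This is a minimal sketch under that assumption; the function name `irt_probability` and its parameter names are chosen here for illustration and do not come from the V-Dem documentation.

```python
import math


def irt_probability(theta, difficulty, discrimination=1.0, guessing=0.0):
    """Probability of a correct answer under a logistic IRT model.

    theta: latent ability of the subject.
    difficulty: item difficulty (the only item parameter in the 1PL model).
    discrimination: item discrimination (added in the 2PL model).
    guessing: lower asymptote for pure guessing (added in the 3PL model).
    """
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
    return guessing + (1.0 - guessing) * logistic


# 1PL: ability equals difficulty, so success and failure are equally likely.
p_1pl = irt_probability(theta=0.0, difficulty=0.0)
# 2PL: a more discriminating item separates subjects more sharply.
p_2pl = irt_probability(theta=1.0, difficulty=0.0, discrimination=2.0)
# 3PL: guessing raises the floor of the success probability.
p_3pl = irt_probability(theta=0.0, difficulty=0.0, guessing=0.2)
```

With the defaults the function reduces to the 1PL model; setting `discrimination` gives the 2PL model, and setting `guessing` in addition gives the 3PL model.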
Source: own presentation, based on Pemstein et al. (2019)
Figure 3.3 Directed Acyclic Graph of the V-Dem Measurement Model
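How rater thresholds and rater discrimination turn a latent country score into rating probabilities can be sketched with an ordinal probit model. The code below is an illustration of this general model class, not V-Dem's implementation: the probit link, the exact parameterization and the names `category_probabilities`, `tau_r` and `beta_r` are assumptions that only mirror the symbols used in the text.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probabilities(z_ct, tau_r, beta_r):
    """Probability that rater r assigns each ordinal category to a country-year.

    z_ct: latent trait of country c at time t.
    tau_r: increasing list of K-1 thresholds of rater r.
    beta_r: discrimination of rater r (assumed positive).
    """
    # Cumulative probabilities P(y <= k), padded with 0 and 1 at the ends.
    cumulative = [0.0] + [normal_cdf(t - beta_r * z_ct) for t in tau_r] + [1.0]
    # Category probabilities are differences of adjacent cumulative values.
    return [upper - lower for lower, upper in zip(cumulative, cumulative[1:])]


thresholds = [-1.0, 0.0, 1.0]  # three thresholds, four response categories
reliable = category_probabilities(z_ct=1.5, tau_r=thresholds, beta_r=2.0)
noisy = category_probabilities(z_ct=1.5, tau_r=thresholds, beta_r=0.2)
```

For the same latent score, the rater with the higher discrimination puts most of the probability mass on the top category, whereas the ratings of the rater with low discrimination are spread across categories and are therefore less informative about the latent trait.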
This measurement model is based on Bayesian statistics (for a detailed discussion see Chapter 8). Therefore, it presupposes a so-called prior distribution for each parameter: “a priori knowledge, beliefs, or assumptions about the parameter” (Levy & Mislevy, 2016, p. 26).⁶ Generally, these priors are often criticized, because they can strongly influence the resulting values and even, under certain circumstances, override the implications of the data. Therefore, priors are usually defined in a relatively vague way (so-called weakly informative priors or non-informative priors). However, V-Dem places a highly informative prior on the latent trait, the country score $z$. The data are thereby used twice: This prior is calculated from the data itself; the same data are then used again in the Bayesian estimation. This is a “double-dipping procedure” (Depaoli & van de Schoot, 2017, p. 245), which is problematic because the prior should represent the researcher’s belief about the parameter estimate before seeing the actual data. However, V-Dem takes a more pragmatic approach here and argues that the prior is necessary to ensure some degree of scale equivalence, especially for countries where information is low due to a small number of expert coders. These priors “help the model to place cases relative to another in a reasonable way when the model lacks the necessary information (i.e. it lacks sufficient bridge and lateral coding) to situate a case relative to the rest of the cases” (Pemstein et al. 2015: 7). This strong prior shows the core problem of V-Dem’s measurement model: On the one hand, there should be enough country coders per country so that unbiased estimates of the latent trait can be created (V-Dem’s goal is five coders per country-year). As V-Dem states, estimates based on three or fewer coders are very problematic and should be excluded from the analysis: “V-Dem’s methodology is based on the assumption that we have a minimum of five Country Experts for every single country-variable-year.[…] We strongly advise against using observations based on three or fewer coders” (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 30). On the other hand, studies (Marquardt & Pemstein, 2018; Pemstein et al., 2015) have shown that “bridging” observations is very important for unbiased results, so that each country is connected with the other countries and the rating can be securely placed on the measurement scale. Although V-Dem has a large pool of experts, there is still the question of how many overlapping codings they have. Because not every coder can rate every country in the dataset, the problem of scale equivalence between the country estimates arises: the ratings of countries cannot be placed on the same scale ($z$) by the measurement model and thus cannot be meaningfully compared if there are too few coders offering overlapping ratings for more than one country. Thus, cross-national comparability is only guaranteed if there are enough lateral and bridge coders who are able to code different countries. V-Dem states that they “currently lack the necessary overlapping observations to completely identify the scale of the latent trait cross-nationally […]. While we are developing techniques and collecting further data to overcome this issue, we currently adopt an explicitly Bayesian approach and make substantial use of prior information to obtain estimates that exhibit strong face validity, both within and across countries” (Pemstein et al., 2019, p. 8). Finally, in another respect, V-Dem has an explicit advantage over the other datasets by using its measurement model: In general, V-Dem is more flexible and can easily add new ratings to improve its dataset because its measurement model represents an automatic calibration method. Therefore, V-Dem can be considered an “open” dataset which can change by including new coders as new facts and information about the history of countries emerge. In contrast, the Bertelsmann Transformation Index (BTI), for example, is a more “closed” dataset; its ratings cannot easily be changed once the manual calibration process is completed.
6 V-Dem uses these priors as hyperparameters to form multilevel models by modeling a hierarchical prior specification on the difficulty parameter $\tau$. The difficulty parameter of each country expert partially depends on the difficulty parameter of all country experts which belong to the same country. At the same time, the difficulty parameters of these countries are partially dependent on a world average difficulty parameter. A hierarchical prior “has the attractive feature of retaining the core goal of specifying a prior distribution for the posterior distribution to be shrunk towards[,] but allows where that prior distribution is located to depend to some degree on the information in the data” (Levy & Mislevy, 2016, p. 290). This is known as the advantage of partial pooling or multilevel modeling (Gelman, 2013; Gelman & Hill, 2007).
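The bridging requirement discussed above can be read as a connectivity problem: two countries are linked if at least one coder rated both, and countries in different connected groups share no overlapping codings. The following sketch checks this with a small union-find routine. It is only an illustration; the input layout (a dictionary from coder id to the set of coded countries) and the function name `connected_country_groups` are assumptions, and the sketch says nothing about how many overlapping codings are sufficient in practice.

```python
def connected_country_groups(coder_countries):
    """Group countries that are linked through shared coders.

    coder_countries: dict mapping a coder id to the set of countries coded
        (hypothetical layout). Two countries are linked if one coder rated
        both; groups are the connected components of that graph.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for countries in coder_countries.values():
        countries = sorted(countries)
        for country in countries:
            find(country)  # register every country, even unbridged ones
        for other in countries[1:]:
            parent[find(other)] = find(countries[0])

    groups = {}
    for country in list(parent):
        groups.setdefault(find(country), set()).add(country)
    return sorted(sorted(group) for group in groups.values())


coders = {
    "r1": {"A"},
    "r2": {"A", "B"},  # bridges A and B
    "r3": {"C"},       # C has no overlapping coder
}
groups = connected_country_groups(coders)
```

In this toy example, countries A and B form one group, while C remains isolated and could only be placed on the common scale with the help of prior information.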
3.2.3 Research Questions
In this section, I lay out the central research criteria which will guide the following detailed discussion of the V-Dem dataset. There is a common test procedure for democracy indices developed by Munck and Verkuilen (2002) which has been frequently applied and adapted (Müller & Pickel, 2007; Pickel et al., 2015). A similar procedure was developed by Lauth (2004, 2010b, 2011). Both test procedures distinguish between three crucial tasks: conceptualization, measurement and aggregation. The conceptualization phase focuses on the elaboration of a democracy definition that is appropriate to the investigation and that is characterized by its discriminatory power. In addition, a differentiated and stringent concept tree must be created, which serves as a starting point for the next steps in the work process. The next step is the measurement of the concept tree using indicators for empirical measurement of the individual components and subcomponents. Two critical issues can be highlighted: the selection of valid and reliable indicators. Indicators are valid if they can measure the component which they are supposed to measure. Indicators are reliable if their measurement error is small and their ratings are replicable (Rammstedt, 2010). This is also where the conceptual one-dimensionality of indicators comes into play (Lauth, 2004, p. 234), so that
an unambiguous coding is guaranteed. If the indicator focuses on components that could belong to different dimensions and institutions, especially if these components contradict each other, it is unclear on which aspects the ratings of the experts are based (e.g. free and fair elections, which combines freedom with equality aspects into a single assessment). This reduces the validity and the reliability of the indicators. The last step in the work process is elaborating the aggregation, i.e. a calculation serving to provide a theoretically well-founded bundling of the values that have been measured by the indicators. However, this common framework only needs to be partially applied here. The reason is that the main research aim is not to analyze the V-Dem dataset as a whole, but to examine the dataset and its elements only to the extent necessary for this study. Therefore, I am not interested in the evaluation of the democracy concept of V-Dem, because I use the democracy concept of the Democracy Matrix (Lauth, 2004; Lauth & Schlenkrich, 2018a). I am also not concerned with the index aggregation procedures of V-Dem (e.g. polyarchy index, liberal democracy index), since I mainly use the untransformed indicators and develop my own aggregation rules (Lauth & Schlenkrich, 2020). Therefore, I concentrate here only on the measurement phase with its focus on the validity and reliability of the indicators, because these indicators are used to construct the Democracy Matrix and to measure the democracy profiles. All analyses in the following sections are based on the disaggregated data file of V-Dem version 9, which contains the raw expert coder ratings (coder-level dataset). Besides validity and reliability in general, this discussion makes clear that there should be a special focus on the prerequisites of the measurement model. In Section 3.3, I assess the quality of the Varieties of Democracy dataset.
Firstly, I evaluate the conceptual clarity and one-dimensionality of the indicators (see Subsection 3.3.1). Afterwards, I descriptively discuss the characteristics of the expert coders which are important for the quality and reliability of the dataset (e.g. number of countries coded per expert, type of coder, coder agreement) (see 3.3.2). In addition, I evaluate important prerequisites of the V-Dem measurement model (see 3.3.3): There must not only be enough country coders, but also experts who evaluate several countries at once (bridge or lateral coders). This chapter therefore asks whether the countries have sufficient coders and are sufficiently bridged. Using a regression analysis, I also examine whether any country or indicator characteristics influence the number of coders and the disagreement between coders. Another common type of variables in the V-Dem dataset are multiple choice and percentage variables. In contrast to the other types of variables, these variables are not subjected to the measurement model. Therefore, I discuss their validity and reliability in Section 3.3.4. A summary of the research questions is given in Table 3.1.
Table 3.1 Research Questions
Validity
• Conceptual Clarity of the Indicators: One-dimensionality & precise definitions of question wording
• Well-ordered response categories, i.e. first and last response as polar opposites
Reliability
• Workload of Coders
• Confidence of Coders
• Disagreement between expert coders
Measurement Model and Scale Equivalence
• Number of Country Coders
• Number of Lateral and Bridge Coders
Source: own presentation
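Two of the criteria in Table 3.1, the number of country coders and the disagreement between expert coders, can be computed directly from coder-level ratings. The sketch below is a simplified illustration and not the analysis carried out in this chapter: the tuple layout `(country, year, coder_id, score)`, the function name `coder_summary` and the use of the population standard deviation as a disagreement measure are assumptions. The default cut-off of four coders follows V-Dem's advice against using observations based on three or fewer coders.

```python
from collections import defaultdict
from statistics import pstdev


def coder_summary(ratings, min_coders=4):
    """Summarize coder coverage and disagreement per country-year.

    ratings: iterable of (country, year, coder_id, score) tuples
        (hypothetical layout of a coder-level file).
    min_coders: assumed cut-off; observations with fewer coders are flagged.
    """
    scores = defaultdict(dict)  # (country, year) -> {coder_id: score}
    for country, year, coder, score in ratings:
        scores[(country, year)][coder] = score

    summary = {}
    for key, by_coder in scores.items():
        values = list(by_coder.values())
        summary[key] = {
            "n_coders": len(values),
            # Population standard deviation as a simple disagreement measure.
            "disagreement": pstdev(values),
            "too_few_coders": len(values) < min_coders,
        }
    return summary


ratings = [
    ("A", 2000, "r1", 3), ("A", 2000, "r2", 3), ("A", 2000, "r3", 4),
    ("A", 2000, "r4", 4), ("A", 2000, "r5", 3),
    ("B", 2000, "r6", 1), ("B", 2000, "r7", 4),
]
summary = coder_summary(ratings)
```

In this toy example, country A in 2000 has five coders and is kept, whereas country B has only two strongly diverging ratings and is flagged.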
3.3 Assessing the Quality of the Varieties of Democracy Dataset
3.3.1 Conceptual Clarity of the Indicators
Are the V-Dem indicators conceptually coherent, so that an unambiguous coding is guaranteed? The combination of several dimensions and aspects in the sense of a composite indicator is problematic because it is unclear how the experts implicitly weigh each of these dimensions. This is particularly relevant if these dimensions and aspects do not go hand in hand. The V-Dem dataset contains composite indicators that investigate several different aspects at once (see Table 3.2). The indicator “election free and fair” (v2elfrfair) is multidimensional, as it simultaneously examines the freedom and equality dimension of elections. Free elections encompass freedom of choice between multiple parties and a free, undisturbed electoral process (e.g. no voting irregularities, no violence), whereas equal elections give every candidate and especially every political party a fair chance and every voter a possibility to vote. The same applies to the indicator “transparent laws with predictable enforcement” (v2cltrnslw). This indicator combines transparency, predictability and even stability as well as coherence in
Table 3.2 Multidimensional Indicators
Indicator | Question
Election free and fair (v2elfrfair) | “Taking all aspects of the pre-election period, election day, and the post-election process into account, would you consider this national election to be free and fair?” (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 60)
Transparent laws with predictable enforcement (v2cltrnslw) | “Are the laws of the land clear, well-publicized, coherent (consistent with each other), relatively stable from year to year, and enforced in a predictable manner?” (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 162)
Distinct party platforms (v2psplats) | “How many political parties with representation in the national legislature or presidency have publicly available party platforms (manifestos) that are publicized and relatively distinct from one another?” (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 90)
Source: Coppedge et al. (2019)
a single assessment. Empirically, all these aspects could point in a similar direction and they are connected with each other. However, there is no reason why laws cannot be arbitrarily enforced, even though these laws are transparent and “well-publicized” (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 162). Coherence refers to law-making, transparency to communication, stability to statehood and predictability to the regime type (autocracy or democracy). Finally, the indicator “distinct party platforms” (v2psplats) combines the aspects of public availability and distinctiveness, creating a composite indicator. Manifestos of parties could be publicly available, even though they are not distinguishable. Isolating the aspect of distinguishability into a clean indicator would be very useful for the post-democracy debate, which assumes that the positions of parties in the post-democratic age are increasingly becoming similar (Crouch, 2004). Another problem for the conceptual clarity of the indicators arises when the wording of the questions is very vague (see Table 3.3). This applies especially to the variables of the political equality section. In the wording of the indicators “power distributed by socioeconomic position” (v2pepwrses), “power distributed by social group” (v2pepwrsoc) and “power distributed by gender” (v2pepwrgen),
Table 3.3 Vaguely Defined Indicators
Indicator | Question
Power distributed by socioeconomic position (v2pepwrses) | “Is political power distributed according to socioeconomic position?” (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 190)
Power distributed by social group (v2pepwrsoc) | “Is political power distributed according to social groups?” (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 190)
Power distributed by gender (v2pepwrgen) | “Is political power distributed according to gender?” (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 191)
Reasoned justification (v2dlreason) | “When important policy changes are being considered, i.e. before a decision has been made, to what extent do political elites give public and reasoned justifications for their positions?” (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 147)
Common good (v2dlcommon) | “When important policy changes are being considered, to what extent do political elites justify their positions in terms of the common good?” (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 147)
Respect counterarguments (v2dlcountr) | “When important policy changes are being considered, to what extent do political elites acknowledge and respect counterarguments?” (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 148)
Source: Coppedge et al. (2019)
the term “political power” appears. But this term is never explained in more detail, even though V-Dem provides a rich glossary in the codebook explaining a lot of technical terms (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, pp. 354–359). Moreover, these indicators suffer from another issue: the question is kept too general, giving the coders no hint as to which aspects of political power they should evaluate (e.g., elections, party systems, media and parliament). Similar considerations can be made for the deliberation indicators “Reasoned justification” (v2dlreason), “Common good” (v2dlcommon) and “Respect counterarguments” (v2dlcountr). Even though the terms within the question wording
3.3 Assessing the Quality of the Varieties of Democracy Dataset
are better defined than those of the political equality indicators, it is still not clear on which political areas the expert coders should focus when evaluating deliberation. In all these cases, it seems difficult for the coders to give a correct and valid answer. These problems are reflected in the high disagreement of the expert coders (see Figure 3.11). This is particularly important because these indicators measure important aspects of the equality dimension, so that the conceptual bias in favor of freedom aspects of the political system is amplified. However, there are other equality indicators which circumvent most of the problems: the indicator “social group equality in respect for civil liberties” (v2clsocgrp) and the social exclusion variables (state jobs, public services, business opportunities) have a more clearly defined focus.
Finally, there are variables with ill-defined scales (see Table 3.4). The indicator “state fiscal capacity” (v2stfisccap) mixes an ordinal scale with a nominal scale. It evaluates fiscal capacity and therefore, in the sense of taxation, a central category of statehood (Lambach et al., 2015). The lowest value of this indicator is that the state is not capable of raising revenue. The other categories, however, do not necessarily show a gradual increase in this capacity but rather qualitatively different revenue sources (external sources, natural resources and taxes). Even though the highest category, taxes on economic transactions and income, expresses higher financial stability, the other categories could, at least theoretically, reflect the same level of fiscal capacity. It is therefore questionable to use this categorical variable as an input to the measurement model, which presupposes an ordinal scale level. The same applies to the indicator “national party control” (v2psnatpar). This indicator focuses on the party composition of the national government and/or parliament and is therefore related to the veto player approach by Tsebelis (2002).
However, the order of the responses is wrong. It is not categories 0 (coalition) and 2 (single party) that are polar opposites, but response categories 2 and 1 (divided government). Very few veto players are found in situations where a single party controls all relevant areas of government and legislation. The highest probability of stalemate is found not in a coalition but in a divided government (Tsebelis, 1995). V-Dem acknowledges this by offering another version of this indicator (divided party control index, v2x_divparctrl). This is, however, not a genuine indicator estimated by the measurement model but only a reordered version of the measurement model estimates of the original variable (v2psnatpar). As with the state fiscal capacity indicator, such an ill-defined scale, which is not ordinal because there is no apparent order from low to high, cannot be fitted by the measurement model. The measurement model then forces the nominal scale onto a continuous scale, which is unreasonable.
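The ordering argument can be illustrated with a minimal sketch that recodes the three raw response categories of v2psnatpar so that divided and unified party control form the two poles. The mapping and the function name are assumptions made for illustration only; this is not V-Dem's procedure for v2x_divparctrl, which reorders measurement model estimates rather than raw responses.

```python
# Hypothetical recoding of the raw v2psnatpar categories (illustration only).
# Original codes: 0 = unified coalition control, 1 = divided party control,
# 2 = unified party control.
# Assumed target order: 0 = divided, 1 = coalition, 2 = single party,
# so that the polar opposites named in the text sit at the two ends.
REORDER = {1: 0, 0: 1, 2: 2}


def recode_party_control(values):
    """Map raw category codes onto the assumed divided-to-unified order."""
    return [REORDER[v] for v in values]


raw = [0, 1, 2, 2, 1]
recoded = recode_party_control(raw)
print(recoded)  # [1, 0, 2, 2, 0]
```

Whether the middle category (coalition) is equidistant from both poles remains an open question; the sketch only fixes the order, not the distances.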
3
Analyzing the Varieties of Democracy Dataset
Table 3.4 Ill-defined Scales
Indicator: State fiscal capacity (v2stfisccap)
Question: “On which of the following sources of revenue does the central government primarily rely to finance its activities?” (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 175)
Scale:
0: The state is not capable of raising revenue to finance itself.
1: The state primarily relies on external sources of funding (loans and foreign aid) to finance its activities.
2: The state primarily relies on directly controlling economic assets (natural resource rents, public monopolies, and the expropriation of assets within and outside the country) to finance its activities.
3: The state primarily relies on taxes on property (land taxes) and trade (customs duties).
4: The state primarily relies on taxes on economic transactions (such as sales taxes) and/or taxes on income, corporate profits and capital.
Indicator: National party control (v2psnatpar)
Question: “How unified is party control of the national government?” (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 91)
Scale:
0: Unified coalition control. A single multi-party coalition controls the executive and legislative branches of the national government. (This is true almost by definition in a parliamentary system where a single coalition gathers together a majority of seats.).
1: Divided party control. (A) Different parties or individuals (unconnected to parties) control the executive and the legislature or (B) Executive power is divided between a president/monarch and a prime minister, each of which belongs to different parties; or between a non-partisan monarch and a prime minister.
2: Unified party control. A single party controls the executive and legislative branches of the national government. (This is true almost by definition in a parliamentary system where a single party has a majority of seats.).
(continued)
Table 3.4 (continued)
Indicator: Judicial review (v2jureview)
Question: “Does any court in the judiciary have the legal authority to invalidate governmental policies (e.g. statutes, regulations, decrees, administrative actions) on the grounds that they violate a constitutional provision?” (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 158)
Scale:
0: No.
1: Yes.
Source: Coppedge et al. (2019)
Finally, the indicator “Judicial review” (v2jureview) evaluates whether a court with the authority of judicial review exists, using two response categories (“yes” and “no”). It seems that this is just a factual question that is not intended to measure the strength of the supreme court, which would be an important measure. The empirical findings of this indicator, however, show that it can be used as a proxy for the strength of the supreme court (see Figure 118 in the appendix). Still, it is not clear why a more fine-grained scale is not applied here, especially since there is a prominent predecessor for this type of measurement (Lijphart, 2012).
3.3.2 Descriptive Analysis of the Expert Coders
The previous discussion shows that the V-Dem measurement relies on the quality of the expert coders. Therefore, I descriptively analyze certain characteristics of the experts which might influence their quality and reliability: What is the average workload of the experts? How many coders coded each country-year? How are the different types of coders (country, bridge, lateral and historic coders) distributed over the countries and periods? How confident are the coders and how much do they disagree? Overall, there are 3177 experts and 13,792,313 ratings in the dataset.
3.3.2.1 Workload of Experts
Figure 3.4 (top row) shows the total country-years coded over all variables by each expert. On average, each expert coded 4341 country-years (median: 2379). The top 3 experts coded 52,991, 59,509 and 73,973 country-years. The expert
Notes: In the above plot, the x-axis shows the coders and the y-axis shows the sum of the years coded by each coder. The plot below shows the cumulative percentage of country-years coded by the coders. Source: own calculations based on V-Dem Disaggregated Dataset V.9.
Figure 3.4 Number of Country-Years coded by Experts
(coder-ID: 2348) with 73,973 country-years coded 31 countries on 162 variables over 119 years. These countries are not a regional sample but a diverse set (e.g. Belgium, Germany, Honduras, Mexico, Eritrea, Kenya, Saudi Arabia, Somalia etc.). The coders with 52,991 and 59,509 country-years coded only 5 and 8 countries, respectively, but did so for 145 and 158 variables over 119 years. A reliable and meaningful coding of such a large number of variables and countries over this time period presupposes great knowledge. It is highly doubtful, to say the least, that a single person has such knowledge. This may reduce the quality of the dataset. The second plot in Figure 3.4 (bottom row) shows the cumulative percentage of country-years coded by the experts: about 2000 of the 3177 experts are responsible for only 20% of all country-year codings. The remaining 1177 coders code about 80% of all country-years. That means that roughly one third of all coders code four fifths of the data.
How many countries and variables are coded by the experts? Figure 3.5 shows that about 4/5 of the experts coded only 1 country. The highest number of countries coded is, as above, 31. The second highest number of coded countries is only 17. The average number of variables coded by experts is 69 (median 62), and 78 experts coded the whole set of indicators (164). It seems that many coders select only one country but code a multitude of variables for that country. This makes clear that the workload is very unequally distributed: only a minority of experts contributes a significant number of ratings to the dataset. However, is it possible that these experts have such knowledge over multiple cases, multiple years and multiple variables?
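The workload concentration described above can be sketched with a few lines of Python. The record layout (coder, country, year, variable) and all values are invented for illustration and do not reproduce the structure of the V-Dem disaggregated dataset.

```python
from collections import Counter


def cumulative_share(counts):
    """Cumulative share of all codings, coders sorted from least to most active."""
    ordered = sorted(counts)
    total = sum(ordered)
    shares, running = [], 0
    for count in ordered:
        running += count
        shares.append(running / total)
    return shares


# Toy records: (coder_id, country, year, variable). All values are made up.
records = (
    [("c1", "DEU", year, "v_a") for year in range(2000, 2010)]
    + [("c1", "MEX", year, "v_a") for year in range(2000, 2010)]
    + [("c2", "DEU", year, "v_a") for year in range(2000, 2004)]
    + [("c3", "KEN", 2000, "v_a")]
)

# Country-years coded per expert (the quantity on the y-axis of the top plot).
per_coder = Counter(coder for coder, _, _, _ in records)

# Cumulative share of codings (the quantity shown in the bottom plot).
shares = cumulative_share(per_coder.values())
print(per_coder.most_common())
print(shares)
```

In this toy example, two of the three coders together account for one fifth of the codings, which mirrors the kind of concentration reported for the real data.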
3.3.2.2 Number of Coders per Country-Year
Figure 3.6 shows the number of coders per country-year, summarized over all 164 variables. Due to Historical V-Dem, the number of country experts at the beginning of the twentieth century is slightly higher. After the drop in 1920, it slightly increases over the years. However, the mean number of country experts per country-year only comes close to the V-Dem target of 5 coders per country-year after 1990. In 2005 and especially in 2012, there is a strong spike in the number of coders. In 2005, an increased number of country coders and bridge coders joined the coding procedure, while in 2012 it was especially the number of lateral coders that increased. This shows that the interconnection between the countries, a presupposition of the measurement model, is mostly established in 2012.
Notes: The first plot shows on the y-axis the number of countries per coder, and the second plot shows on the y-axis the variables coded per coder. Both plots put the coders on the x-axis. Source: own calculations based on V-Dem Disaggregated Dataset V.9.
Figure 3.5 Number of Countries and Variables coded by Experts
Notes: The black horizontal line indicates V-Dem’s goal of 5 experts per country-year. The plot shows the number for each coder type per country-year, summarized over all 164 variables. Source: own calculations based on V-Dem Disaggregated Dataset V.9.
Figure 3.6 Types of Coders per Country and Year
Figure 3.7 shows that almost all sections are coded on average by 5 coders per country-year. The exception is the sovereignty section, which is coded by fewer than 4 experts per country-year. Furthermore, almost all variables of the sovereignty section which were included in the V8 update (e.g., v2stcritapparm, v2strenarm, v2stfisccap) are based on fewer than three experts per country-year. Newer variables have a smaller number of coders (see Figure 119 in the appendix). The best-covered sections are civil society, executive, legislature and deliberation. This is especially surprising for deliberation, which might be considered a very difficult question battery, so that one would assume that it is harder to find enough experts. Besides the sovereignty section, the least coded sections are political equality and judiciary, both of which can be considered sections with very difficult questions. However, these are only minor differences.
Notes: sv = Sovereignty/State; pe = Political Equality; ju = Judiciary; el = Elections; ps = Political Parties; me = Media; cl = Civil Liberty; lg = Legislature; dl = Deliberation; ex = Executive; cs = Civil Society. Source: own calculations based on V-Dem Dataset V.9.
Figure 3.7 Number of Coders per country-year per section
3.3.2.3 Coder Confidence
Confidence is measured by the self-assessment of the country experts: V-Dem lets the experts rate their confidence on a scale from 0 to 100, where 0 means no confidence at all and 100 means perfect confidence. As expected, the further the codings lie in the past, the less confident the experts feel (see Figure 3.8). Interestingly, confidence decreases slightly in 2012, the year in which the lateral coders were included. However, the lower confidence of V-Dem’s coders is not yet reflected in the reliability estimates of the V-Dem measurement model. This measurement model only estimates an overall reliability parameter and does not vary it over the years. V-Dem would like to implement such a feature in the future, if possible.
Notes: The black line indicates the average confidence of the coders. The error bars display the 25th and 75th percentile. Source: own calculations based on V-Dem Dataset V.9.
Figure 3.8 Average Confidence of the Coders per Country-Year
Figure 3.9 shows the confidence of the raters per section. Coders feel least confident in the section “judiciary”, followed by the sections “media”, “political equality” and “deliberation”. The most confident coders can be found in the sections “executive”, “legislature” and “sovereignty”. These observations align with the assumption that more difficult sections like judiciary, political equality and deliberation would produce less confidence.
Notes: sv = Sovereignty/State; pe = Political Equality; ju = Judiciary; el = Elections; ps = Political Parties; me = Media; cl = Civil Liberty; lg = Legislature; dl = Deliberation; ex = Executive; cs = Civil Society. The bars indicate the mean confidence of the experts within a section. Source: own calculations based on V-Dem Dataset V.9.
Figure 3.9 Average Confidence of the Coders per Section
3.3.2.4 Coder Disagreement
Coder disagreement, expressed as the standard deviation of the ratings for country-years (similar to Martínez i Coma & van Ham, 2015), is shown in Figure 3.10. Overall, the more recent the ratings, the less disagreement there is between the coders. There are several spikes. First, there was large disagreement after the end of the First World War. Second, there was also large disagreement between the experts after the Second World War. Both can be explained by the turmoil and instability during and after these two events. Interestingly, the end of communism resulted in only a very small increase in disagreement. Finally, there is an increase in disagreement after 2005 and especially 2012, when a large number of bridge and lateral coders joined the rating procedure.
Notes: Coder disagreement is represented as the standard deviations of the ratings for country-years. Source: own calculations based on V-Dem Dataset V.9.
Figure 3.10 Average Disagreement of the Coders per Country-Year
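The disagreement measure can be sketched as follows: the standard deviation of the ratings is computed within each country-year and variable and then averaged per year. The record layout and the toy values are invented, and the use of the sample standard deviation (rather than the population version) is an assumption, since the text does not specify which one is used.

```python
from collections import defaultdict
from statistics import mean, stdev


def disagreement_by_year(ratings):
    """Average within-group standard deviation of ratings per year.

    ratings: iterable of (country, year, variable, score) tuples.
    Groups with fewer than two ratings are skipped, because a sample
    standard deviation is undefined for a single value.
    """
    groups = defaultdict(list)
    for country, year, variable, score in ratings:
        groups[(country, year, variable)].append(score)

    sds_per_year = defaultdict(list)
    for (_, year, _), scores in groups.items():
        if len(scores) >= 2:
            sds_per_year[year].append(stdev(scores))

    return {year: mean(sds) for year, sds in sorted(sds_per_year.items())}


# Toy ratings: strong disagreement in 1919, full agreement in 2010.
toy_ratings = [
    ("DEU", 1919, "v_a", 0),
    ("DEU", 1919, "v_a", 4),
    ("DEU", 2010, "v_a", 3),
    ("DEU", 2010, "v_a", 3),
    ("DEU", 2010, "v_a", 3),
]
disagreement = disagreement_by_year(toy_ratings)
print(disagreement)
```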
Disagreement per section shows that the deliberation section is the one with the most disagreement by far (see Figure 3.11). This suggests that the questions in the deliberation section are hard to answer because they are vaguely formulated. Civil society and political equality follow. In contrast to the self-assessment, there is far less disagreement in the section “judiciary”. The next plot shows the disagreement for the sections only for those variables which have five response categories. There is almost no difference.
Do the different types of coders agree with each other? Figure 3.12 shows the average rating per country-year differentiated by coder type. The baseline is the country coders’ ratings. If all coder types rated in the same way, they would have values near 0. However, historical coders grant, on average, higher scores to their coded countries than country coders. This means that historical coders seem to have some bias, insofar as they need to express higher values after 1900 to highlight the difference in the state of democracy between 1789 and 1900. Bridge coders give, on average, lower values than country coders. Their ratings have become increasingly similar since 1965. There is a drop after 1990, when the bridge coders start to disagree again with the country coders. Since 2000, they have agreed very closely with the country coders. This means bridge coders see the state of democracy in the past as much worse than country coders do. The lateral coders fluctuate in their magnitude of disagreement over the years, but they have clearly granted higher values than country coders since 2000. However, it is important to
Notes: sv = Sovereignty/State; pe = Political Equality; ju = Judiciary; el = Elections; ps = Political Parties; me = Media; cl = Civil Liberty; lg = Legislature; dl = Deliberation; ex = Executive; cs = Civil Society. Source: own calculations based on V-Dem Dataset V.9.
Figure 3.11 Disagreement between coders per section
underline that, on the one hand, the overall share of lateral coders is rather low compared to the other coder types (see Figure 3.6 above). On the other hand, the lateral coders are almost in line with the country coders for 2012, when there is a huge influx of lateral coders and when agreement is probably most important.
Notes: BC = Bridge Coder; CC = Country Coder and HC = Historic Coders. First, I calculated the average rating of the country coders per country-year and variable. I then subtracted the ratings of the bridge, lateral and historic coders from that value. The plot then shows the average difference of these coders to the country coders per year. Source: own calculations based on V-Dem Dataset V.9.
Figure 3.12 Average Rating of the Coder Types per Country-Year
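The calculation described in the notes to Figure 3.12 can be sketched as below. The record layout, the coder-type labels and the toy values are assumptions for illustration. The sign convention follows the wording of the notes literally (country-coder mean minus the other coder's rating); the figure itself may use the opposite sign.

```python
from collections import defaultdict
from statistics import mean


def mean_difference_to_country_coders(ratings):
    """Average difference between country coders and other coder types.

    ratings: iterable of (coder_type, country, year, variable, score).
    Returns a dict keyed by (coder_type, year).
    """
    # Step 1: average country-coder (CC) rating per country-year and variable.
    cc_scores = defaultdict(list)
    for coder_type, country, year, variable, score in ratings:
        if coder_type == "CC":
            cc_scores[(country, year, variable)].append(score)
    cc_mean = {key: mean(scores) for key, scores in cc_scores.items()}

    # Step 2: subtract each non-CC rating from the matching CC average.
    differences = defaultdict(list)
    for coder_type, country, year, variable, score in ratings:
        key = (country, year, variable)
        if coder_type != "CC" and key in cc_mean:
            differences[(coder_type, year)].append(cc_mean[key] - score)

    # Step 3: average the differences per coder type and year.
    return {key: mean(values) for key, values in differences.items()}


# Toy data: the historic coder rates higher, the bridge coder lower.
toy = [
    ("CC", "DEU", 1950, "v_a", 2),
    ("CC", "DEU", 1950, "v_a", 4),
    ("HC", "DEU", 1950, "v_a", 4),
    ("BC", "DEU", 1950, "v_a", 1),
]
gaps = mean_difference_to_country_coders(toy)
print(gaps)
```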
3.3.3 Empirical Analysis of the Expert Coders
In this section, a regression model is specified to make general claims about the reliability of country estimates in terms of the number of coders (all expert coders, lateral and bridge coders) and about the causes of increased disagreement between raters. As stated above, these aspects are important because V-Dem’s measurement model is “based on the assumption that we have a minimum of five Country Experts for every single country-variable-year” (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 30). Similarly, V-Dem needs enough lateral and bridge coders for a successful identification of the latent variable scale. The more cross-country coders, the higher the chance that a country can be correctly located on the scale of the measurement model. Finally, the lower the
disagreement between the coders, the higher the (intercoder) reliability of the data, regardless of the measurement model. In Section 3.3.3.1, I formulate working hypotheses on factors that might contribute to or hinder a successful recruitment of sufficient experts. I divide the hypotheses according to the difficulty of the research object, the political and economic importance of the country, and finally, the characteristics of the questions themselves. Section 3.3.3.2 discusses the methodological framework. Due to the nature of the data (count variables) I use a specific type of regression, the Poisson regression. Finally, the last three sections test whether these working hypotheses can explain the number of all expert coders per country (3.3.3.3), the number of the lateral and bridge coders only (3.3.3.4) and coder disagreement (3.3.3.5).
3.3.3.1 Working Hypotheses and Operationalization
In this section, I describe all the hypotheses which are tested in the following empirical analysis. They should be understood as “working hypotheses”: they are not fully grounded in theory but are formulated in a more pragmatic way. It is important to stress that these are factors which could influence the expert recruitment outcome. Should these factors be statistically significant and have a substantial effect size, V-Dem’s recruitment process is in some way biased.
The first hypothesis focuses on the difficulty of the research object. It addresses the country as a transparent research object. The more transparent a country, the easier it is to find information and thus the more experts are available. And the more transparent the country, the less disagreement there is between the codings of the experts. I measure difficulty with five aspects: GDP per capita, democratic quality, occurrences of internal and international war, number of coups and quality of statehood. The higher the modernization of a country, the easier it is to find information and thus the higher the number of experts for this country. I measure modernization using GDP per capita from the World Development Indicators. The lower the democratic quality of a country, the smaller the number of experts. Countries with a low quality of democracy are less transparent than countries with high democratic quality. For the latter, more information is therefore available, which facilitates the recruitment of experts. However, another consideration could be made: countries with high democratic quality (working democracies) and low democratic quality (hard autocracies) attract more raters, because the former have higher transparency and availability of information, and the latter have a prolonged regime stability.
The latter may be less transparent, but they are actually easier to evaluate because their scores might not change much. The regimes in the middle of the democratic scale, the so-called ‘grey zone’, should attract the fewest raters. They
are not transparent and often fluctuate. Therefore, we have a nonlinear effect: the highest number of raters should be at both ends of the scale, the lowest number in the middle. This aspect is measured with the political rights scale of the Freedom House index. Furthermore, the more occurrences of internal or international war, the lower the number of experts. Turmoil leads to opaque developments, so that fewer experts can provide the knowledge which is needed to evaluate the situation. Here, Clio-Infra (2018) provides the dataset. In a similar vein, the more transitions through coups, the more difficult it is to rate the country. This is measured via the coups d’etat dataset (J.M. Powell & Thyne, 2011). The same reasoning applies to the quality of statehood, which is measured using the “Political Stability and Absence of Violence” indicator from the Worldwide Governance Indicators.
The second hypothesis focuses on the political and economic importance of the country: the more important the country is, the more experts are available and the higher the chances for V-Dem to recruit an expert. I measure this aspect with population size and trade (import/export). The higher the population size, the higher the number of coders. The higher the volume of international trade, measured as the amount of imports and exports, the higher the number of experts. I measure the first aspect with the World Development Indicators, the latter with a dataset by Gleditsch (2002). This factor is less relevant for the coder disagreement analysis.
The last hypothesis uses the characteristics of the questions themselves as a causal factor. The more difficult the question is, the fewer experts dare to answer it.
The difficulty of the question depends on the accessibility of the political object to which the question refers: assessing the quality of elections and civil liberties might be easy due to the availability of information and a more concrete political object, whereas evaluating deliberation and political equality seems very hard. In addition, the difficulty rises with a vague question wording. I measure the difficulty of the question by drawing on the previous analysis regarding the conceptual clarity of the indicators (see 3.3.1): a dummy variable is created in which the vaguely defined indicators (power distributed by socioeconomic position, v2pepwrses; power distributed by social group, v2pepwrsoc; power distributed by gender, v2pepwrgen; reasoned justification, v2dlreason; common good, v2dlcommon; respect counterarguments, v2dlcountr) are assigned a value of 1, while all other indicators are assigned a value of 0. For all regressions, I include as a control variable the version of the dataset in which the variable was first contained. Variables which were included in newer dataset versions have, as mentioned above, fewer expert coders. In addition, for the disagreement regression, the number of response categories per variable
and the number of experts per country-year are used as control variables: the higher the number of possible responses, the more likely it is that the coders disagree. Similarly, it is more difficult to reach agreement if the number of experts for a country-year is large. Table 3.5 summarizes these considerations regarding the working hypotheses and their measurement (summary statistics can be found in Figure 120 in the appendix).
Table 3.5 Working Hypotheses
Theoretical Framework | Measurement
Difficulty of the research object
• WDI GDP per capita constant 2010 US Dollars (log) (wdi_gdpcapcur)
• FH Political Rights Scale (mean) (e_fh_pr)
• Powell/Thyne Coups d’etat dataset (sum) (e_pt_coup)
• Clio Internal and international wars (sum) (e_miinterc, e_miinteco)
• Political Stability and Absence of Violence (wbgi_pve)
Political and Economic Importance
• Gleditsch Mean of Import and Export (log) (gle_imp, gle_exp)
• WDI Total Population Size (log) (wdi_pop)
• WDI Total Land Area (log, sq. km) (wdi_area)
Difficulty of indicator question
• Dummy variable: 1: vaguely defined indicators; 0 otherwise
Control Variables
• Variables included in the newer versions 8 and 9 (reference category is: variables in version 6 resp. 7)
• Number of response categories (only for Disagreement Regression)
• Number of Coders per Country-Year (only for Disagreement Regression)
Note: The brackets contain the variable names listed either in the QoG codebook or V-Dem Codebook. The values are the mean values over the period from 1974–2017. Source: own table
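The dummy coding of the vaguely defined indicators described above can be sketched as follows. The indicator names are taken from the text; the function name and the example list are hypothetical and serve only as an illustration.

```python
# Indicators classified as vaguely defined in Section 3.3.1.
VAGUE_INDICATORS = {
    "v2pepwrses",  # power distributed by socioeconomic position
    "v2pepwrsoc",  # power distributed by social group
    "v2pepwrgen",  # power distributed by gender
    "v2dlreason",  # reasoned justification
    "v2dlcommon",  # common good
    "v2dlcountr",  # respect counterarguments
}


def vague_dummy(indicator):
    """Return 1 for a vaguely defined indicator and 0 otherwise."""
    return 1 if indicator in VAGUE_INDICATORS else 0


# Hypothetical selection of indicator names for demonstration.
examples = ["v2pepwrses", "v2jureview", "v2dlcountr"]
dummies = {name: vague_dummy(name) for name in examples}
print(dummies)  # {'v2pepwrses': 1, 'v2jureview': 0, 'v2dlcountr': 1}
```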
3.3.3.2 Methodological Framework
In this analysis, I try to answer three questions: How many experts rate a country per year? How many bridge and lateral coders rate a country per year? And finally, how strong is the disagreement about the rating of a country between experts?
The dependent variables of the first two questions require the use of the so-called Poisson regression and its generalizations and adaptations (negative binomial and the truncated forms), whereas the last question can be tackled by a regression under the normal distribution. The dependent variable of the first two questions, the number of coders or of bridge and lateral coders, is a count variable. It is the number of experts for a country and variable: I calculate it by counting the number of unique coders, or of unique lateral and bridge coders, per country over the whole time period that country is in the dataset. Therefore, the dependent variable takes the values y ∈ {0, 1, 2, …}. Count data are discrete and often do not follow a normal distribution (Hox et al., 2017, p. 139; Tutz, 2010, p. 888). In addition, count data cannot become negative, so that a regression under the normal linear model could lead to implausible results. A transformation to a normal distribution may be possible; however, this changes the interpretation of the regression results (Hox et al., 2017, p. 139). To circumvent these issues, I draw on the Poisson distribution and the Poisson regression model, which can model count data, e.g. the number of coders per country or the number of bridge coders per country. However, the Poisson distribution has a major drawback. It is based on equidispersion: the mean and the variance of the distribution are represented by a single parameter (λ), so that the variance equals the mean. Therefore, the Poisson distribution “is frequently not flexible enough to adapt to the given data” (Tutz, 2011, p. 182). Data often exhibit a variance which is greater than the mean (overdispersion), so that “data-level variance is higher than would be predicted by the model” (Gelman & Hill, 2007, p. 325).
To model overdispersion, I can draw on the negative-binomial distribution, which is a two-parameter distribution in which the additional parameter κ allows for overdispersion (Lambert, 2018). An alternative is to model the overdispersion explicitly by extending the Poisson regression to a multilevel model that allows for a “data-level variance component” $\epsilon_i$ (Gelman & Hill, 2007, p. 325) in the sense of $y_i \sim \mathrm{Poisson}\left(e^{\beta_0 + \epsilon_i}\right)$ with $\epsilon_i \sim N\left(0, \sigma_\epsilon^2\right)$. The latter approach of adding $\epsilon_i$ is taken here, because the data structure and the questions themselves already require a multilevel model: the dependent variable, the number of coders, is nested within countries but also within variables. However, there are two caveats. First, the dependent variable of the first question does not contain zeros, because each country is coded by at least one expert coder. This is a problem for the Poisson distribution, which can be corrected by using the zero-truncated Poisson distribution. Second, an adequate model has to consider a possible offset: “counts can be interpreted relative to some baseline or ‘exposure’” (Gelman & Hill, 2007, p. 111). However, this is not important here, because the baseline does not vary for the different countries:
The number of unique coders is independent of the length of the existing time series for that country. Therefore, I refrain from including an offset. The DAG representation of the two multilevel Poisson models is shown in Figure 3.13. The figure also contains the multilevel regression model under the normal model, which is used to analyze the third research question. Prior to these regressions, I perform exploratory analyses to detect heteroscedasticity and multicollinearity and evaluate the number of missing values (for a more detailed discussion, see chapter 9). The results in the form of XY plots (dependent variables vs. independent variables) and correlation plots are shown in the appendix. In addition, I test the model capabilities by simulating values from the model and comparing them to the original data. Finally, model comparisons with the χ² statistic and AIC/BIC values were conducted.
Note: The left model shows the overdispersed multilevel (truncated) Poisson model, while the right model depicts the multilevel regression model under the normal model. The symbols in the figure denote the observed number of coders (dependent variable); the overall intercept; the random component on the V-Dem variable level; the random component on the country level; the regression coefficients and independent variables on the V-Dem variable and country level, respectively; and the variance parameters for the variable and country level. Finally, σ is the parameter for modeling overdispersion. Source: own presentation
Figure 3.13 DAG Representation of the Regression Models
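The two distributional points made in this section, overdispersion through an observation-level normal error and the truncation at zero, can be illustrated with a small simulation. This is a sketch under assumed values (intercept log 5, error standard deviation 0.5, fixed seed); it is not the estimation procedure of the analysis, and the simple Poisson sampler is only suitable for small to moderate rates.

```python
import math
import random
from statistics import mean, pvariance


def sample_poisson(lam, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def zero_truncated_pmf(k, lam):
    """P(Y = k | Y > 0) for a Poisson variable with rate lam."""
    if k < 1:
        return 0.0
    poisson_pmf = math.exp(-lam) * lam ** k / math.factorial(k)
    return poisson_pmf / (1.0 - math.exp(-lam))


rng = random.Random(42)
beta0, sigma = math.log(5.0), 0.5  # illustrative values, not estimates
n = 20000

# Plain Poisson: the variance should be close to the mean (equidispersion).
plain = [sample_poisson(math.exp(beta0), rng) for _ in range(n)]

# Overdispersed Poisson: y_i ~ Poisson(exp(beta0 + eps_i)), eps_i ~ N(0, sigma^2).
over = [sample_poisson(math.exp(beta0 + rng.gauss(0.0, sigma)), rng) for _ in range(n)]

ratio_plain = pvariance(plain) / mean(plain)
ratio_over = pvariance(over) / mean(over)
print(round(ratio_plain, 2), round(ratio_over, 2))

# The zero-truncated probabilities sum to one over k = 1, 2, ...
truncated_total = sum(zero_truncated_pmf(k, 2.0) for k in range(1, 60))
print(truncated_total)
```

A variance-to-mean ratio well above one in the second sample is the pattern that the additional error term is meant to absorb.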
3.3.3.3 Analysis I: Number of all Expert Coders per Country and Variable
The first analysis focuses on all expert coders. The exploratory analysis in the form of XY plots and correlation plots did not show any anomalies (see Figure 121 and Figure 122 in the appendix). There is a strong positive correlation between population size and land area (Pearson correlation coefficient 0.75). This might pose a problem for the regression model due to multicollinearity. Therefore, I test for it by excluding the land area indicator in one regression model. There are, however, missing observations in the dataset (see Figure 123 and Figure 124 in the appendix). The missing data in the independent variables might interfere with the validity of the analysis: countries with missing observations consistently show a lower number of expert coders. By excluding these observations, the model will overestimate the number of expert coders. However, based on the correlation diagram, the missing observations are only weakly negatively correlated with the independent variables. This means, for example, that low GDP is not strongly correlated with missingness and therefore the model will only slightly underestimate the impact of the independent variables. In total, there are over 24,469 observations clustered in 150 variables and 165 countries. To test whether the model can adapt to the data, I simulated datasets from the model and compared them to the data. It turned out that the model captures the overdispersion sufficiently (see Figure 125 in the appendix).
The empirical results are presented in Table 3.6. According to the standard deviations of the cluster variables in the null model (M1), there is considerable variation on level 2. The multilevel model is also highly favored against a single-level model in the χ² test (see Table 64). M2 and M3 test the variables from the difficulty of the research object hypothesis.
These models are significantly favored compared to the null model. V-Dem does not have bias towards high GDP countries or low-conflict countries. However, the higher the democratic quality and thus, the more transparent a country, the higher the number of country experts. Similarly, the higher the number of coups d’etats, which implies a higher coding difficulty, the lower the number of expert coders. Although these models imply that the higher the statehood, the lower the number of experts, which would go against the theoretical considerations, this factor becomes insignificant by including the other variables. There is no nonlinear effect of democratic quality (M3). According to the χ 2 -test and AIC/BIC values, this model is not superior to the model with a simple linear relationship. M4 and M5 include the variables for the political and economic hypothesis. These models are favored compared to the previous models. The bigger the land area and population size, the higher the number of expert coders. Economic importance measured by trading activity, on the other hand, does not have an effect. M4
−0.622 (0.677)
M4 −0.151 (0.686)
M5 −0.242 (0.688)
M6
Population Size, log (WDI)
0.216 *** (0.041)
0.154 *** (0.046)
0.027 (0.041)
−0.176 *** −0.179 *** 0.028 (0.052) (0.053) (0.042)
Mean Political Stability and Absence of Violence (WGI)
0.007 (0.009)
−0.006 (0.006)
0.044 ** (0.016)
−0.007 (0.007)
0.008 (0.009)
Sum of Internal/International Wars (Clio)
−0.052 * (0.026)
0.039 * (0.016)
−0.019 (0.05)
−0.023 (0.019)
−0.052 * (0.026)
Sum of Coups d’Etat (Powell/Thyne)
0.006 (0.011)
0.03 (0.067)
0.008 (0.05)
−0.023 (0.019)
0.065 ** (0.021)
Average Political Rights (FH, reversed)
−0.003 (0.032)
(continued)
0.154 *** (0.046)
0.027 (0.041)
−0.006 (0.006)
−0.023 (0.019)
0.044 ** (0.016)
−0.019 (0.05) 3
Average Political Rights (FH, reversed)sq
0.003 (0.03)
GDP per capita 2010, log (WDI)
−0.975 *** −0.975 *** −0.975 *** −0.975 *** −0.989 *** (0.029) (0.029) (0.029) (0.029) (0.029)
2.449 *** (0.284)
M3
V-Dem Update V9
2.369 *** (0.243)
M2
−0.831 *** −0.831 *** −0.831 *** −0.831 *** −0.846 *** (0.059) (0.059) (0.059) (0.059) (0.061)
2.454 *** (0.043)
M1
V-Dem Update V8
(Intercept)
Fixed Part
Term
Table 3.6 Regression Results for the Number of All Expert Coders (Truncated Poisson Regression)
76 Analyzing the Varieties of Democracy Dataset
−0.044 (0.045)
M4
M5
−0.018 (0.045)
0.043 ** (0.016)
−0.018 (0.045)
0.043 ** (0.016)
M6
24,469 150 165
df.residual
NObs
NVariables
NCountries
Note: 2507 out of 26,976 are missing Source: own presentation
119,935.48 24,465
BIC
165
150
24,469
24,458
119,643.05
119,553.9
165
150
24,469
24,457
119,652.86
119,555.6
119,903.06
0
AIC
0
0
0.383
−59,947.53 −59,765.95 −59,765.8
0.383
logLik
0.408
countrysd
obs_effectsd
0.36
variablesd
Random Part 0.112 0
0.275
0
0.275
0.11
165
150
24,469
24,456
119,562.26
119,456.89
165
150
24,469
24,455
119,565.35
119,451.88
165
150
24,469
24,453
119,581.68
119,452
−59,715.45 −59,711.94 −59,710
0
0.281
0.112
0.025 (0.019) 0.112
M3
Number of Response Categories
0.112
M2
−0.027 (0.091)
M1
Question Difficulty
Mean of Import/Export (Gleditsch)
Total Land Area (WDI)
Term
Table 3.6 (continued)
and M5, which include land area and population size stepwise, make it clear that multicollinearity is not a problem. The variables “coups d’état” and statehood become insignificant. Finally, M6 tests the effect of the difficulty of the indicator question. This measure is insignificant. According to the χ²-test, M6 is not superior to M5. The control variables show that the variables first introduced in V8 and V9 have a significantly lower number of expert coders.
How substantial are these effects (see Figure 3.14)? Population size has the strongest effect. From the least to the most populous country, it raises the number of experts from 5 to 30. Land area raises the number of experts from 8 to about 15. Democratic quality shows a moderate effect: from countries with no political rights to countries with full political rights, the number of experts increases from about 10 to 13. Therefore, V-Dem has a slight bias towards democratic regimes and a substantial bias towards populous and large countries. Statehood and coups d’état, even though significant in some models, show no substantial effect. Whereas variables from V6 have about 12 expert coders, newer variables (versions 8 and 9) have only 5 expert coders.
Note: y-values are the number of all expert coders. Source: own presentation
Figure 3.14 Predicted Values for All Expert Coders
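The model comparisons in this analysis rely on AIC and BIC values. As a minimal sketch of how these criteria follow from the log-likelihood, the snippet below recomputes them for the null model. It assumes that the first entries of the logLik and df.residual rows of Table 3.6 (−59,947.53 and 24,465) belong to M1; because the table is printed in rotated form, this mapping is an assumption, and the function name is purely illustrative.

```python
import math


def information_criteria(log_lik, n_obs, df_residual):
    """Return (k, AIC, BIC) for a fitted model.

    k is recovered as n_obs - df_residual, i.e. the number of estimated
    parameters (fixed effects plus random-effect standard deviations).
    """
    k = n_obs - df_residual
    aic = -2.0 * log_lik + 2.0 * k
    bic = -2.0 * log_lik + k * math.log(n_obs)
    return k, aic, bic


# Assumed reading of the M1 column of Table 3.6.
k, aic, bic = information_criteria(log_lik=-59947.53, n_obs=24469, df_residual=24465)
print(k, round(aic, 2), round(bic, 2))
```

Under this reading, the recomputed values agree with the AIC of 119,903.06 and the BIC of 119,935.48 printed in the table, which suggests that the null model estimates four parameters (the intercept and three standard deviations).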
3.3.3.4 Analysis II: Number of Bridge and Lateral Coders
The second analysis focuses only on lateral and bridge coders. The results of the exploratory analysis are very similar to those of the first analysis. The XY-plots and correlation plots did not show any anomalies (see Figure 126 and Figure 127 in the appendix). As in the first analysis, there is a strong positive correlation between population size and land area (Pearson correlation coefficient 0.75), which indicates possible multicollinearity. The missing observations in the dataset (see Figure 128 and Figure 129 in the appendix) might interfere with the validity of the analysis, because countries with missing observations consistently showed a lower number of expert coders. As before, the correlation diagram shows that missingness is only weakly negatively correlated with the independent variables. In total, there are 24,469 observations clustered in 150 variables and 165 countries. To test whether the model fits the data, I simulated datasets from the model and compared them to the observed data. The model accounts sufficiently for the overdispersion (see Figure 130 in the appendix).
The results are presented in Table 3.7. According to the standard deviations of the cluster variables in the null model (M1), there is considerable variation on level 2. The multilevel model is also strongly favored over a single-level model in the χ²-test (see Table 65 in the appendix). M2 and M3 test the variables from the difficulty of the research object theory. These models are significantly favored over the null model. Here, the variables “coups d’état” and statehood are significant: the higher the number of coups d’état, which implies a higher coding difficulty, the lower the number of expert coders; and the higher the statehood, the lower the number of experts, which actually goes against the theoretical considerations. Their significance vanishes once further variables are introduced into the models. In contrast to the first analysis, the number of lateral and bridge coders does not vary significantly with democratic quality. In addition, there is no nonlinear effect of democratic quality (M3). According to the χ²-test and the AIC/BIC values, this model is not superior to the simpler linear model. M4 and M5 include the variables for the political and economic importance hypothesis. These models are favored over the previous models. The greater the population size, the higher the number of bridge and lateral coders. Land area and economic importance measured by trading activity, on the other hand, do not have an effect. The variables “coups d’état” and statehood become insignificant. Finally, M6 tests the effect of the difficulty of the indicator question. It is not superior to the previous model according to the χ²-test. The control variables show that the variables first introduced in V8 and V9 have a significantly lower number of expert coders.
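The simulation-based check described at the start of this analysis can be illustrated with a deliberately simplified sketch. It is not the multilevel model estimated here: it fits an intercept-only Poisson model to hypothetical coder counts, simulates datasets from it, and compares the variance-to-mean ratio of the simulated data with the observed one. All function names and data values are assumptions made for illustration.

```python
import math
import random


def draw_poisson(lam, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def dispersion_ratio(values):
    """Variance-to-mean ratio; close to 1 for Poisson-distributed data."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return var / mean


def simulated_p_value(observed, n_sims=500, seed=1):
    """Share of simulated Poisson datasets at least as dispersed as the observed one."""
    rng = random.Random(seed)
    lam = sum(observed) / len(observed)  # intercept-only Poisson fit
    obs_ratio = dispersion_ratio(observed)
    hits = 0
    for _ in range(n_sims):
        sim = [draw_poisson(lam, rng) for _ in observed]
        if dispersion_ratio(sim) >= obs_ratio:
            hits += 1
    return obs_ratio, hits / n_sims


# Hypothetical coder counts: mostly small, with a few very large values.
observed = [3, 4, 5, 5, 6, 6, 7, 8, 5, 4, 6, 25, 30, 5, 6, 4, 7, 5, 6, 28]
ratio, p = simulated_p_value(observed)
print(round(ratio, 2), p)
```

In this toy example the plain Poisson model cannot reproduce the spread of the data, so almost no simulated dataset is as dispersed as the observed one. An observation-level random effect, such as the obs_effect term listed in the random part of the tables, is one common way to absorb this kind of overdispersion.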
−0.239 ** (0.086)
−0.246 ** (0.086)
Mean Political Stability and Absence of Violence (WGI)
Population Size, log (WDI)
0.003 (0.014)
0.002 (0.014)
Sum of Internal/International Wars (Clio)
−0.074 + (0.043)
−0.074 + (0.043)
Sum of Coups d’Etat (Powell/Thyne)
−0.014 (0.018)
0.296 *** (0.076)
0.225 ** (0.087)
0.012 (0.078)
−0.016 (0.012)
−0.017 (0.012) 0.014 (0.079)
−0.038 (0.036)
−0.038 (0.036)
−0.016 (0.029)
−0.022 (0.029)
(continued)
0.225 ** (0.087)
0.012 (0.078)
−0.016 (0.012)
−0.038 (0.036)
−0.016 (0.029)
−0.043 (0.094) 3
Average Political Rights (FH, reversed)sq
0.092 (0.109)
−0.043 (0.094)
−0.013 (0.093)
0.011 (0.035)
−0.029 (0.052)
Average Political Rights (FH, reversed)
−1.308 (1.296)
M6
−0.044 (0.049)
−1.308 (1.296)
M5
GDP per capita 2010, log (WDI)
−1.843 (1.262)
M4
−1.098 *** −1.098 *** −1.098 *** −1.098 *** −1.098 *** (0.038) (0.038) (0.038) (0.038) (0.038)
2.113 *** (0.464)
M3
V-Dem Update V9
2.301 *** (0.397)
M2
−1.077 *** −1.077 *** −1.077 *** −1.077 *** −1.077 *** (0.079) (0.079) (0.079) (0.079) (0.079)
1.788 *** (0.063)
M1
V-Dem Update V8
(Intercept)
Fixed Part
Term
Table 3.7 Regression Results for the Number of Bridge and Lateral Expert Coders (Poisson Regression)
M5
24,465 24,469 150 165
df.residual
NObs
NVariables
NCountries
Note: 2507 out of 26,976 are missing Source: own presentation
103,764.38
165
150
24,469
24,458
103,511.75
103,422.59
165
150
24,469
24,457
103,521.24
103,423.98
165
150
24,469
24,456
103,477.19
103,371.83
165
150
24,469
24,455
103,484.7
103,371.22
165
150
24,469
24,454
103,494.79
103,373.22
0
0.521
103,731.96
0
0.521
BIC
0
0.524
AIC
0
0.626
−51,861.98 −51,700.29 −51,699.99 −51,672.91 −51,671.61 −51,671.61
0
0.627
logLik
0.149
−0.006 (0.063)
−0.051 (0.084)
0.049 (0.03)
M6
0
0.149
−0.051 (0.084)
0.049 (0.03)
0.67
0.149
−0.081 (0.083)
M4
obs_effectsd
0.149
M3
countrysd
0.149
M2
0.421
M1
variablesd
Random Part
Question Difficulty
Mean of Import/Export (Gleditsch)
Total Land Area (WDI)
Term
Table 3.7 (continued)
Are these effects substantial (see Figure 3.15)? Population size has, again, the greatest effect. From its lowest to its highest value, the number of bridge and lateral coders rises from just 1.5 to 24. Land area, even though insignificant, also has a moderate effect: the number of bridge and lateral coders varies from 4 to 8 over its range. While there are on average 8 lateral and bridge coders for variables in V6, there are only 2 for the variables introduced in versions 8 and 9. The other effects are only minor. Overall, the recruitment process of the lateral and bridge coders seems to be less biased than the recruitment process of all experts.
Note: y-values are the number of bridge and lateral expert coders. Source: own presentation
Figure 3.15 Predicted Values for the Number of Lateral and Bridge Coders
3.3.3.5 Analysis III: Coder Disagreement
The last part of the empirical analysis focuses on coder disagreement (for a similar analysis of the Electoral Integrity Project, see Martínez i Coma & van Ham, 2015). The exploratory analysis revealed no anomalies in the independent variables (see Figure 131, Figure 132 and Figure 133 in the appendix), except for the multicollinearity issue with land area and population size as before. However, the dependent variable does not fit a normal distribution well: it has a bulk of values on the left side of the distribution. I continue with a regression under the normal model because glmmTMB, the R package used for the calculations, currently (version 1.0.2.1) does not support a Student-t distribution, which would probably soften the impact of these outlying values. The values simulated from the model show that the model cannot account for this bulk of values (see Figure 135 in the appendix) and that the model used is not optimal. In addition, there are missing observations. The disagreement for cases with missing observations is slightly higher, but missingness is only weakly correlated with the independent variables. In total, I do not expect a great distortion of the empirical results. Overall, there are 24,377 observations for 150 variables and 165 countries.
M1 is the null model. According to the standard deviations, there is considerable variation on the country, variable and observation level. The χ²-test significantly favors the multilevel over the single-level model (see Table 66 in the appendix). M2 and M3 include the indicators to test the hypothesis about the difficulty of the research object. Both models are significantly favored over the null model, and M3 with the nonlinear relationship between democratic quality and disagreement is also significantly favored over M2 with the merely linear relationship. In total, the standard deviation of the country level is reduced by roughly 30% (from 0.3 to 0.2). Besides the significance of the squared term of democratic quality, the GDP per capita variable is also significantly related to the dependent variable: the higher the GDP of the country, the lower the disagreement. Nevertheless, with the inclusion of more variables, this relationship becomes insignificant. M4 and M5 test the hypothesis about the political and economic importance of the country. However, no term is significant, and these models are, overall, no improvement according to the χ²-test. Finally, the last model M6 tests the hypotheses about the difficulty of the research question. Here, the χ²-test shows no significant improvement. Furthermore, the standard deviation for the variable level is not reduced in any way. Finally, all control variables show significant effects. The higher the number of expert coders, the higher the disagreement in the codings. Interestingly, variables included in version 8 show a significantly lower disagreement than variables from V6 (only in the last model M6), while variables included in version 9 show a significantly higher disagreement than variables from V6 (Table 3.8).
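The χ²-test used to compare nested models is a likelihood ratio test. The following sketch illustrates it for the comparison of M2 and M3, assuming that the second and third entries of the logLik row in Table 3.8 (−27,876.29 and −27,871.18) belong to these two models and that they differ by one parameter, the squared term. The function name is hypothetical, and the closed-form p-value holds only for one degree of freedom.

```python
import math


def likelihood_ratio_test_1df(log_lik_small, log_lik_large):
    """Likelihood ratio test for nested models that differ by one parameter.

    For one degree of freedom the chi-square survival function has the
    closed form erfc(sqrt(x / 2)), so no external library is needed.
    """
    statistic = 2.0 * (log_lik_large - log_lik_small)
    if statistic < 0:
        raise ValueError("the larger model must not have a lower log-likelihood")
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Assumed reading of the logLik row of Table 3.8 for M2 and M3.
stat, p = likelihood_ratio_test_1df(-27876.29, -27871.18)
print(round(stat, 2), round(p, 4))
```

Under these assumptions the test statistic is about 10.2 and the p-value lies well below 0.01, which is in line with the statement that M3 is significantly favored over M2.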
M3
M4
M5
M6
0.012 (0.014) −0.007 (0.005)
0.012 (0.014) −0.009 + (0.005)
Sum of Internal/ International Wars (Clio)
−0.063 *** 0.046 (0.012) (0.035)
Average Political Rights (FH, reversed)
Sum of Coups d’Etat (Powell/Thyne)
0.049 (0.035)
−0.092 *** −0.073 *** −0.034 (0.016) (0.017) (0.037)
GDP per capita 2010, log (WDI)
−0.019 ** (0.006)
−0.039 (0.037)
0.128 + (0.078)
0.128 + (0.078)
V-Dem Update V9
−0.008 (0.005)
0.009 (0.014)
−0.019 ** (0.006)
0.046 (0.035)
−0.466 ** (0.155)
−0.007 (0.005)
0.009 (0.014)
−0.019 ** (0.006)
−0.466 ** (0.155)
(continued)
−0.008 (0.005)
0.009 (0.014)
−0.019 ** (0.006)
0.046 (0.035)
−0.034 (0.037)
0.132 + (0.078)
−0.462 ** (0.155)
0.026 *** (0.004)
0.512 *** (0.028)
3
Average Political Rights (FH, reversed)sq
0.128 + (0.078)
0.128 + (0.078)
−0.467 ** (0.155)
−0.467 ** (0.155)
0.026 *** (0.004)
V-Dem Update V8
0.026 *** (0.004)
0.026 *** (0.004)
0.513 *** (0.028)
0.026 *** (0.004)
0.513 *** (0.028)
Average Coders per Year
0.513 *** (0.028)
−1.564 *** −1.816 *** −2.269 *** −2.176 *** −2.265 *** (0.187) (0.2) (0.532) (0.518) (0.532)
M2
0.513 *** (0.028)
−0.013 (0.053)
M1
Number of Response Categories
(Intercept)
Fixed Part
Term
Table 3.8 Regression Results for Coder Disagreement (Normal Multilevel Regression)
56,163.61 24,373 24,377
BIC
df.residual
NObs
24,377
24,364
55,883.9
55,778.58
24,377
24,363
55,883.79
55,770.37
24,377
24,360
55,912.06
55,774.34
0.745
24,377
24,361
55,902.53
55,772.91
0.745
(continued)
24,377
24,359
55,921.95
55,776.12
0.745
0.191
0.298
56,131.2
0.745
0.192
0.298
AIC
0.745
0.191
0.298
0.746
0.193
0.298
−28,061.6 −27,876.29 −27,871.18 −27,870.17 −27,870.45 −27,870.06
0.199
0.298
logLik
0.313
countrysd
0.059 (0.128)
Residualsd
0.568
variablesd
Random Part
Question Difficulty
−0.038 (0.033)
−0.032 (0.032)
−0.038 (0.033)
0.033 (0.034)
−0.019 (0.03)
M6
Mean of Import/Export (Gleditsch)
0.02 (0.029)
−0.019 (0.03)
M5
−0.009 (0.012)
−0.019 (0.03)
M4
−0.009 (0.012)
−0.01 (0.028)
−0.019 (0.029)
Total Land Area (WDI)
M3
M2
0.033 (0.034)
M1
Population Size, log (WDI)
Mean Political Stability and Absence of Violence (WGI)
Term
Table 3.8 (continued)
150 165
NVariables
NCountrytextid
Note: 2599 out of 26,976 are missing Source: own presentation
M1
Term
Table 3.8 (continued)
165
150
M2 165
150
M3 165
150
M4 165
150
M5 165
150
M6
Are these effects substantial (see Figure 3.16)? GDP per capita, which is significant at least in M2 and M3, shows only a small effect compared to other variables. The nonlinear effect of democratic quality is interesting because it does not follow the theoretical considerations. Autocracies show the same level of disagreement as countries with a democratic quality in the middle of the scale. Only countries with a very high democratic quality are coded with less disagreement. This effect is also moderate compared to the other variables. Another, more substantial effect comes from the variable “number of coders per year”: as the number of coders grows, the disagreement increases. The most substantial effect, however, comes from the number of response categories: when the number of response categories increases from 2 to 6, the disagreement rises strongly.
Note: y-axis is the coder disagreement. Source: own presentation
Figure 3.16 Predicted Values for Coder Disagreement
3.3.4 Multiple Choice and Percentage Variables
Multiple choice and percentage variables (e.g. v2svstterr) are not subjected to the measurement model, probably because they lack an ordinal scale. However, this means that no difficulty and discrimination parameters are estimated for each rater; instead, the codings of all raters are weighted equally. While the measurement model can identify unreliable coders and reduce their impact on the aggregate value, such a correction is not possible for the values of multiple choice and percentage variables. This causes at least two problems: the first is distortion from unreliable experts, while the second concerns distortions due to the varying number of expert coders. The first issue can be seen in the estimates for Germany for the variable “State authority over territory (v2svstterr)”⁷ (see Figure 3.17): the expert with the ID 648 coded very differently from the other coders. Whereas the other coders see almost no problem with the state’s monopoly on the use of force in Germany after 1945, this coder consistently gives zero points. This coder may have misunderstood the question. This also indicates a problem with the measurement model: it implicitly assumes that no rater accidentally turned the scale upside down, so that all raters understand the direction of the scale in the same way. Furthermore, the estimates of multiple choice or percentage variables are affected by the changing number of coders. Of course, this can be considered a general problem that probably affects other datasets as well. However, the problem is more severe in the V-Dem dataset, where the main goal is to recruit as many experts as possible in order to estimate the measurement model sufficiently well. As shown before, the number of lateral coders in particular spikes in the year 2012.
This has a dramatic effect on the estimates of the percentage and multiple-choice variables: Figure 3.18 shows a sudden change in 2012 in the values of the variable “HOS control over (v2exctlhs)”. Figure 3.19 generalizes this finding and shows that other variables are also affected. As can be seen, changes in the values of multiple choice and percentage variables go hand in hand with changes in the number of coders: the larger the change in the number of coders, the larger the change in the values of these variables. This is especially relevant because V-Dem does not offer uncertainty estimates for the multiple-choice variables. As the same figure shows, the variables subjected to the measurement model are hardly affected: here, large changes in the values occur without a change in the number of coders. Overall,
⁷ This problem was reported to V-Dem by the author. It seemed that it was still undiscovered by V-Dem at that time.
Note: State authority over territory (v2svstterr). Each line stands for a different coder. The y-axis shows the values of the indicator (100 indicates the absolute monopoly of power, 0 represents no monopoly of power). The x-axis shows the years. As can be seen, the coder indicated by black dots rates consistently lower than the other experts. Source: own calculation
Figure 3.17 The Ratings for Germany for the Indicator “v2svstterr”
this means that these variables could lead to an invalid analysis, at least for years with a large influx of expert coders. Variables of this kind work best when the number of experts is constant.
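Both problems can be made concrete with a small numerical sketch. It assumes, as described above, that the country-year value of a percentage variable is the equally weighted mean of all coder ratings. The ratings are hypothetical and are not taken from V-Dem.

```python
from statistics import mean


def country_year_value(ratings):
    """Equally weighted mean of all coder ratings (no measurement model)."""
    return mean(ratings)


# Hypothetical ratings on a 0-100 percentage scale (not actual V-Dem data).
consistent_coders = [98, 100, 97, 99]

# Problem 1: one coder who may have read the scale upside down.
with_outlier = consistent_coders + [0]

# Problem 2: an influx of new coders with a somewhat lower baseline.
after_influx = consistent_coders + [85, 80, 88, 82, 84]

baseline = country_year_value(consistent_coders)
outlier_value = country_year_value(with_outlier)
influx_value = country_year_value(after_influx)
print(baseline, outlier_value, round(influx_value, 1))
```

In this example a single deviating coder lowers the aggregate from 98.5 to 78.8, and the arrival of five new coders shifts it to about 90.3, although nothing about the country itself has changed.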
3.3.5 Discussion
In this section, I discuss these empirical findings in relation to the research criteria outlined above. Regarding the validity criterion, almost all of V-Dem’s indicators have a high validity. This is also supported by the detailed descriptions
Note: Question text: “In practice, from which of the following bodies must the head of state customarily seek approval prior to making important decisions on domestic policy?”. Values averaged across all countries. Source: own calculation, V-Dem Codebook (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 109)
Figure 3.18 Mean Values for the Multiple-Choice Variable “HOS control over” (v2exctlhs)
for almost every indicator and by the glossary in the codebook offered by V-Dem. However, there are a few variables which have a validity problem: some variables investigate several different aspects simultaneously, some have a very vague question wording and, finally, some indicators have ill-defined scales. It is important to note that these problems cannot be solved by the measurement model, because they affect the concepts of the variables themselves prior to the measurement model and because the design of the measurement model does not address them. Because these validity concerns affect the measurement of equality the most, they amplify V-Dem’s conceptual bias in favor of the freedom aspects of the political system. The next criterion is reliability. In some respects, the reliability of the dataset can be seen as doubtful at the very least, and not all of these aspects can be mitigated by the measurement model. The descriptive analysis revealed that the workload
Note: Randomly selected variables (HOS removal by other in practice – v2exrmhsol; stronger civil liberties characteristics – v2clrgstch; HOS control over – v2exctlhs; Regime support groups – v2regsupgroups; Property rights for men – v2clprptym; election assume office – v2elasmoff; party organizations – v2psorgs; Power distributed by social group – v2pepwrsoc). Top two rows display the percentage and multiple-choice variables. Bottom two rows contain measurement model variables. Values are averaged across countries and years. Source: own calculation, V-Dem Codebook (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 109)
Figure 3.19 Multiple Choice Variables and Measurement Model Variables
across the experts is distributed very unequally: 33% of all experts rate 80% of the data. Can these experts have the knowledge needed to rate such a large number of countries precisely and reliably across different political aspects? The problem is multiplied when we consider the time horizon, from 1900 to 2018, which needs to be coded by these experts. The analysis also revealed that V-Dem’s target of five coders per country-year is, on average, only reached after 1990. I showed that coder confidence decreases the further the ratings lie in the past. This relevant fact is not (yet) considered by the measurement model, which assumes that the raters are equally confident in all their ratings. Furthermore, coder disagreement is high when the ratings lie in the past and when there are radical changes (e.g. WW II). There is also a systematic bias between the different coder types: historical coders give higher values to country-years between 1900 and 1920 than country coders do for the same period. I argued that historical coders assign higher values in the 1900s to highlight the difference from the state of democracy between 1789 and 1900. Bridge coders give lower values in the past than country coders. Finally, the regression analysis found that the values for autocracies and countries with some democratic quality show higher disagreement than the values of countries with a very high democratic quality. The problems of the multiple choice and percentage variables also belong in the category of reliability: these variables are not subjected to the measurement model, and V-Dem offers no confidence intervals for them. This leads to two problems affecting reliability: first, the codings of reliable and unreliable experts are weighted equally, and second, their aggregated values show large changes when many country experts enter the survey or drop out of it. The data can therefore lead to an invalid analysis, at least for years with a large influx of expert coders.
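The workload concentration mentioned above (33% of the experts rating 80% of the data) can be expressed as a simple share calculation. The sketch below uses hypothetical rating counts per coder, chosen so that the most active third contributes exactly 80%; the function name is likewise an assumption.

```python
def share_of_ratings_by_top(counts, top_fraction):
    """Share of all ratings contributed by the most active `top_fraction` of coders."""
    ordered = sorted(counts, reverse=True)
    n_top = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:n_top]) / sum(ordered)


# Hypothetical number of ratings per coder (not actual V-Dem data).
ratings_per_coder = [3000, 2800, 2200, 500, 400, 400, 300, 200, 200]
share = share_of_ratings_by_top(ratings_per_coder, 0.33)
print(share)
```

With real data, the same calculation over the coder-level rating counts would show how strongly the dataset depends on a small group of highly active experts.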
Besides validity and reliability in general, the previous discussion makes it clear that special emphasis should be placed on the requirements of the measurement model. The measurement model not only needs a large number of coders for a reliable estimate, it must also have a large number of bridge and lateral coders for a reliable placement of the country estimates on the latent scale relative to other country-years. Applying a regression analysis, I tried to discover possible biases which affect the recruitment process and result in more or fewer coders. For the sample of all coders, the regression discovered a moderate bias towards democracies and a greater bias towards populous and larger countries. Countries with these characteristics had more unique coders. I did not find the same effects for the bridge and lateral coders: more populous countries still had more bridge and lateral coders, while there was no bias towards democracies. Overall, the working hypotheses could barely be confirmed: Only parts of the difficulty of the research
object and the political and economic importance had a significant effect. It was remarkable to find so few sources of bias (e.g., there is no effect regarding the question difficulty). This means that the recruitment process seems to be working sufficiently well.
3.4 Summary and Conclusion
The V-Dem dataset is used to measure and identify the democracy profiles that are the focus of this study. This underlines the importance of this chapter, which evaluates and analyzes the V-Dem dataset in order to show its advantages and disadvantages for this study: Can we have confidence in the data used to calculate the quality of democracy and identify the democracy profiles? To answer this question, the range of themes covered by the V-Dem indicators and the theoretical foundations and assumptions of the measurement model have been discussed. Three research questions were derived from this, which not only concern the validity and reliability of the dataset, but also place special emphasis on the prerequisites of the measurement model. In terms of validity, I analyzed whether the indicators were conceptually clearly defined (one-dimensionality, precise definition) and had well-ordered response categories. Regarding reliability, I assessed certain characteristics (workload, confidence) which might affect the reliability of the experts’ ratings. In addition, I analyzed the disagreement between experts, which might be considered an indicator of reliability. Finally, the measurement model needs a sufficiently large number of so-called country, bridge and lateral coders for reliable estimates. To my knowledge, this chapter represents the first attempt to analyze the V-Dem dataset independently of the V-Dem Institute itself. What can we conclude for the sample of this study? First, although V-Dem offers indicators for a variety of different themes, there is some conceptual bias towards the freedom dimension. This bias is aggravated by the fact that the conceptual clarity is compromised for core equality measures. This is problematic for the measurement of the equality matrix fields of the Democracy Matrix, meaning that these matrix fields could only be roughly operationalized.
Ultimately, this also leads to problems in the measurement of democracy profiles, especially in the detection of the egalitarian democracy profiles. Secondly, the reliability of the dataset can be seen as doubtful at the very least. Considering that 33% of the expert coders create 80% of the data, can experts have the knowledge needed to rate precisely and reliably such a large number of countries
over different political aspects? There was also a systematic bias regarding the different types of coders: most remarkably, historical coders give higher values than country coders for countries in the period from 1900 to 1920, that is, historical coders disagreed with the country coders. However, for the sample used here, which comprises democracies, the regression analysis showed that there is, on average, less disagreement in the values for democracies. This means that the selected sample of this study suffers less from these problems. Furthermore, multiple-choice and percentage variables also had reliability issues. In particular, their estimates fluctuate when new coders enter or when coders leave the survey. However, this type of variable is not as relevant for this study, because such variables were only used sporadically in the calculation of the Democracy Matrix (e.g. HOS removal by other in practice, v2exrmhsol, to calculate the effective governing power component). Thirdly, I used a regression analysis to examine what influences the number of all expert coders on the one hand and the number of bridge and lateral coders on the other hand. Surprisingly, I found few sources of bias in the recruitment process: democracies, and especially populous and large countries, had more coders on average, while populous countries had more bridge and lateral coders on average. This means that I can be less concerned about not meeting the requirements of the measurement model, because there should be more expert coders for our sample of democratic states. Regarding my study, this means that there should be a sufficient degree of validity and reliability. In addition, the number of coders should mostly fulfill the prerequisites of the measurement model. Overall, the V-Dem measurement model is a step in the right direction. As shown, it also requires at least a basic methodological understanding of Bayesian item response theory from users who decide to work with the V-Dem dataset.
Part II AGIL Typology of Political Performance
4 AGIL Typology of Political Performance
4.1 Introduction
This chapter aims to develop a conceptual framework of fundamental performance areas of democracies in order to judge and assess the performance of the different profiles of democracies: What are relevant performance criteria? Given the “multiplicity of heterogeneous criteria of political performance” (Roller, 2011, p. 7), it narrows down the analytical perspective and conceptualizes the dependent variable(s) of this study. This resembles a balancing act: On the one hand, the measure of performance must be as “comprehensive” (Putnam, 1994, p. 64) as possible. A misspecification of the theoretical framework has dramatic consequences for answering the research question because it makes the empirical results of this study “blind” to performance areas which are not captured by it and, in the worst case, leads to a complete misjudgment of the performance of the profiles of democracies. On the other hand, the analytical focus needs to be as parsimonious as possible so as not to lose track of the most relevant performance aspects. Therefore, the conceptualization of performance and hence the selection of performance aspects and variables for this study needs to be justified by theory. This highlights the extraordinary importance of this chapter. The chapter proceeds in two steps: a literature review of performance definitions and typologies is presented in section 4.2, and a typology of performance areas of my own is developed in section 4.3. Finally, the chapter concludes with a summary (section 4.4).
© The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2021 O. Schlenkrich, Origin and Performance of Democracy Profiles, Vergleichende Politikwissenschaft, https://doi.org/10.1007/978-3-658-34880-9_4
4.2 Literature Review
Performance is a prominent and relatively old research field in comparative politics (Dahl, 1967; Eckstein, 1971; Gurr & McClelland, 1971; G.B. Powell, 1982; for an overview, see Roller, 2011). I discuss the central and influential performance concepts and typologies of Almond and Powell (1982), Putnam (1994), Lijphart (1999, 2012) and Roller (2005) in this chronological order. In addition, I analyze the more empirically oriented approaches of several measurement instruments that assess performance and governance, such as the Worldwide Governance Indicators (Kaufmann et al., 2010), the Sustainable Governance Indicators (Schraad-Tischler & Seelkopf, 2017) and the Bertelsmann Transformation Index (Bertelsmann Stiftung, 2018a). This chapter deals with approaches that claim to propose a general, overarching definition of political performance that differentiates and at the same time unites different performance areas (e.g. economic and social performance). Studies that focus on individual performance areas and do not link them with others are discussed in chapter 8.
Almond and Powell (1982; for an empirical analysis, see G.B. Powell, 1982) develop a performance typology based on their political system approach (see Table 4.1). They assign different performance criteria to the three functions of a political system (system function, process function and policy function). Within their process function, they cover cultural aspects in the sense of the political culture approach with the elements “compliance and support”. This is an innovative feature that is not covered by the other concepts discussed below. Even though the political system approach gives a reasonable order to these criteria, the selection of items seems rather “inconsistent” (Roller, 2005, p. 26), especially for the policy goods. As a basis for selecting performance aspects, the recourse to the political system functions is too vague. Why should we analyze only welfare, security or liberty? Why not environmental outcomes?
Table 4.1 Overview Political Productivity of Political Systems (Almond/Powell)
Performance Areas | Sub-items
System goods | System Maintenance; System Adaptation
Process goods | Participation; Compliance and Support; Procedural Justice
Policy goods | Welfare; Security; Liberty
Source: own presentation based on Almond/Powell (1982)
Putnam (1994, Ch. 3) analyzes the performance of Italian regional governments using a set of twelve indicators. He stresses the multidimensionality of performance in the sense of “the possibility that different governments might simply be good at different things” (Putnam, 1994, p. 64) and distinguishes three broader performance areas in his conception (policy process, policy decision and policy implementation). The full set of performance criteria is listed in Table 4.2.
Table 4.2 Putnam’s Performance Criteria in Making Democracy Work
Performance Areas | Indicators
Policy Process | cabinet stability; budget promptness; statistical and information service
Policy Decision | reform legislation; legislative innovation
Policy Implementation | day care centers; family clinics; industrial policy instruments; agricultural spending capacity; local health unit expenditures; housing and urban development; bureaucratic responsiveness
Source: own presentation based on Putnam (1994, Ch. 3)
It seems that his study focuses on reasonable and relevant performance areas. However, Putnam does not justify the performance criteria with a thorough theoretical reflection, even though Putnam’s criteria closely resemble the policy cycle approach. This narrow conceptual perspective on performance might also be the
reason why he concludes, based on a statistical analysis, that all indicators can be aggregated into a single composite index of institutional performance, and thus why he dismisses his own idea of the multidimensionality of performance. In addition, he measures outputs rather than outcomes. Outputs refer to the content of laws that reflect the ambitions and plans of the government, while outcomes correspond to the actual effects and consequences of these laws (Lauth & Wagner, 2016, p. 105; Schmitt, 2012, pp. 30–31). Therefore, Putnam’s study “is actually not a study of performance but rather a policy output study” (Roller, 2005, p. 32). Measuring performance on the output level runs the risk of a serious misjudgment if the outputs do not successfully translate into real outcomes. Even though outcomes cannot be directly controlled by the political actors, Roller argues that outcome measures are more important for assessing performance because only outcomes, and not outputs, have actual consequences and change the environment of the political system: “performance does not refer to actions or efforts to reach goals […] but to the outcomes or actual results of these actions” (Roller, 2011, p. 1852). Lijphart states that the “fact that governments are not in full control does not mean that they have no control at all. When the economy performs well […] governments routinely claim credit for this happy state of affairs” (Lijphart, 2012, p. 260). Rather, he highlights the importance of statistically controlling for these external influences in the causal analysis of performance in order to isolate the genuine political effect.
Lijphart (1999, 2012) explores the differences between majoritarian and consensus democracy in three performance areas: governance, representation and policy orientations. He evaluates outcomes as well. On the one hand, Lijphart’s list is more extensive than Putnam’s and seems to capture more relevant performance areas (e.g. political equality or environmental policies). On the other hand, Lijphart, like Putnam, does not give a theoretical justification for the selected criteria. Therefore, it is not clear why his selected areas are relevant. All analyzed performance criteria are shown in Table 4.3.
Roller’s (2005) typology offers an immense improvement over Lijphart’s list. Based on an extensive literature review, Roller develops a “typology of performance criteria” differentiating between goal-oriented and general performance on the one hand and systemic and democratic performance on the other hand. General performance evaluates the achievement of procedural goals “whose realization promotes the attainment of specific policy goals” (Roller, 2005, p. 21), whereas goal-oriented performance aims at measuring substantive goals. Systemic performance captures substantive and procedural goals relevant to all political systems. In contrast, democratic performance evaluates these goals specifically with regard to democracies.
Table 4.3 Overview Lijphart 2012
Performance Areas | Sub-items
Governance | Good governance; Macroeconomic management; Control of violence
Representation | Quality of Democracy; Women’s representation; Political equality; Electoral participation; Satisfaction with democracy
Policy Orientation | Welfare state; Protection of the environment; Fewer imprisonments and death penalties; Foreign aid spending
Note: The sub-items are measured using different sources (e.g., quality of democracy is measured by the Voice and Accountability Index from the Worldwide Governance Indicators and the EIU Democracy index).
Source: own presentation based on Lijphart (1999, 2012)
Thereby, Roller focuses on outcome rather than output measures. This sets up the following 2×2 matrix (Table 4.4):
Table 4.4 Roller’s Typology of Performance Criteria
 | Goal-oriented performance | General performance
Systemic performance | Security, welfare (1,1) | Efficiency, stability (1,2)
Democratic performance | Liberty, equality (2,1) | Accountability, participation (2,2)
Source: Roller (2005)
This additional separation (systemic vs. democratic performance) is based on the work of Fuchs (1998), and the democratic performance dimension is almost identical to the measurement of the quality of democracy. This is problematic: Firstly, it blurs the distinction between goal-oriented and general performance. Liberty and equality are not mere substantive goals; rather, they characterize the quality of a procedure (e.g. free and equal electoral procedures). Thus, they are themselves procedural to some extent
and therefore do not belong to the goal-oriented but rather to the general performance criterion. Apart from substantive definitions of the quality of democracy (e.g. social democracy), it is not feasible to differentiate between goal-oriented (substantive) and general (procedural) performance from a quality of democracy perspective: Prominent definitions define quality of democracy solely in procedural terms (Lauth, 2004; Munck, 2016). Secondly, if democratic performance applies only to democracies, then what criterion is used for the regime classification in the first place (Munck, 2016)? Finally, systemic performance is only crudely conceptualized and remains underdetermined. What functions/outcomes must a political system necessarily produce for its society?
Another prominent performance measure is the set of Worldwide Governance Indicators (WGI; see Table 4.5). Its authors define governance as “the traditions and institutions by which authority in a country is exercised” (Kaufmann et al., 2010, p. 4). This governance definition is related to performance because it implies procedural goals in the sense of Roller’s general performance. They divide governance into six sub-dimensions: While “Voice and Accountability”, “Rule of Law” and “Control of Corruption” are related to the quality of democracy approach, “Political Stability and Absence of Violence” as well as “Government Effectiveness” are related to statehood. Finally, “Regulatory Quality” measures the market-friendliness of the polity. Scholars point out that this definition “is just about as broad as any definition of ‘politics’” (Rothstein & Teorell, 2008, p. 168) and that, therefore, the conceptual base of the Worldwide Governance Indicators might not be useful. 
However, the conception deviates from the other approaches mentioned in two important ways: On the one hand, the governance dimensions of the WGI are defined in terms of perceptions: “perceptions matter because agents base their actions on their perceptions, impression, and views” (Kaufmann et al., 2010, p. 18). Even though the exclusive reliance on subjective aspects seems problematic (Iqbal & Shah, 2008; Muno, 2012), it is a reasonable addition to the concepts discussed so far, which are based solely on facts. As Roller (2011, p. 1853) puts it, it “is equally valid to ask for the quality of government on the basis of objective criteria and to ask for citizens’ evaluation of the quality of government”. On the other hand, the WGI use a variety of sources (e.g. expert surveys; public opinion surveys) from different organizations (e.g. Latinobarometro; Varieties-of-Democracy; Freedom House) and aggregate them into a composite measure for each sub-dimension, drawing on an elaborate statistical procedure.
Two further influential measures have been developed by the Bertelsmann Stiftung: the Sustainable Governance Indicators (SGI) and the Bertelsmann Transformation Index (BTI). The SGI monitor to what extent OECD and EU countries “achieve
Table 4.5 Overview Worldwide Governance Indicators
Performance Areas | Sub-items
Process | Voice and Accountability; Political Stability and Absence of Violence
Capacity | Government Effectiveness; Regulatory Quality
Respect | Rule of Law; Control of Corruption
Note: Each sub-item is measured by aggregating a variety of data sources (e.g., expert surveys; opinion polls).
Source: own presentation based on Kaufmann et al. (2010)
sustainable policy outcomes and imbue political decision-making with a longer-term focus” (Schraad-Tischler & Seelkopf, 2017, p. 2). Like Lijphart, they differentiate between three performance areas (see Table 4.6): policy performance, quality of democracy and governance. For the same reasons as stated above, I do not discuss the quality of democracy dimension of the SGI here. While the policy performance area measures the policy outcome rather than the policy output and helps to identify the “reform needs” (Schraad-Tischler & Seelkopf, 2017, p. 3) of each country, the governance area is a “contextualized assessment of the extent to which the governments of OECD and EU states […] are able to identify pressing issues, develop appropriate solutions and implement them efficiently and efficaciously” (Schraad-Tischler & Seelkopf, 2017, p. 10). Of particular interest here is the sub-item “Executive Capacity” within the governance performance area, as it focuses exclusively on the procedural performance of policymaking. Executive accountability, however, is more closely linked to the quality of democracy dimension. In addition, one aspect not mentioned by the measures discussed before is sustainability, which highlights the difference between short-term and long-lasting goals.
Whereas the SGI assess OECD and EU countries, the BTI “analyzes and evaluates whether and how developing countries and countries in transition are steering social change toward democracy and a market economy” (Bertelsmann Stiftung, 2018a, p. 1). Political transformation is equivalent to the measurement of democracy and therefore corresponds to Roller’s democratic performance aspect. Economic transformation analyzes the quality of the social market economy. Finally, the governance index “assesses the quality of political leadership with which transformation processes are steered” (Bertelsmann Stiftung, 2018a, p. 1). On the one hand, both measures of the Bertelsmann Stiftung have a detailed
Table 4.6 Overview SGI 2016
Performance Areas | Sub-items | Indicators
Policy Performance | Economic Policies | Economy; Labor Markets; Taxes; Budgets; Research and innovation; Global financial system
Policy Performance | Social Policies | Education; Social Inclusion; Health; Families; Pensions; Integration; Safe Living; Global Inequalities
Policy Performance | Environmental Policies | Environment; Global Environmental Protection
Democracy | Quality of Democracy | Electoral Process; Access to Information; Civil Rights and Political Liberties; Rule of Law
Governance | Executive Capacity | Strategic Capacity; Interministerial Coordination; Evidence-based Instruments; Societal Consultation; Policy Communication; Implementation; Adaptability; Organizational Reform
Governance | Executive Accountability | Citizens’ Participatory Competence; Legislative Actors’ Resources; Media; Parties and Interest Associations
Source: own presentation based on Schraad-Tischler/Seelkopf (2017)
and broad understanding of performance, even differentiating between substantive and procedural performance in Roller’s sense. On the other hand, neither of them gives substantial reasons why they selected these different aspects. An overview of the BTI is given in Table 4.7.
Table 4.7 Overview BTI 2018
Performance Areas | Sub-items | Indicators
Status Index | Political Transformation | Stateness; Political Participation; Rule of Law; Stability of democratic institutions; Political and social integration
Status Index | Economic Transformation | Level of socio-economic development; Organization of the market and competition; Currency and price stability; Private property; Welfare regime; Economic performance; Sustainability
Governance Index | Governance | Level of difficulty; Steering capability; Resource efficiency; Consensus-building; International Cooperation
Source: own presentation based on Bertelsmann Stiftung (2018a)
4.3 The AGIL Typology of Political Performance
4.3.1 Conceptual Derivation of the AGIL Typology
Even though all of these performance typologies seem to capture the most important aspects, they are either not based on theory or they conflate different concepts due to the use of a broad definition of performance. Roller’s concept seems to be the most promising, since it is founded in theory. However, I argue that no items linked to the quality of democracy should be included in the performance typology. Instead, a narrower definition of performance is applied. This affects Roller’s democratic performance dimension, which is linked too closely to the quality of democracy concept. Apart from the need to avoid a tautological explanation in this study (democracy profiles would otherwise be used to explain a performance measure that itself contains the quality of democracy), performance and quality of democracy seem to be two different concepts: Quality of democracy is concerned with the “regime” and thus with the “criterion of political rule” (Merkel, 2010, p. 22, own translation). It evaluates the way in which access to power (Merkel, 2010, p. 22) is regulated (e.g. in a democratic or autocratic way) and how political rule is exerted (e.g. rule of law).
Performance measurement is independent of the concept of the regime and is only concerned with the way political goods are produced and with the quality of the political goods themselves. Therefore, quality of democracy measures mainly procedures, whereas performance measures both procedural and substantive goals. However, dealing only with the remaining two matrix fields of Roller’s typology (systemic performance combined with goal-oriented and general performance) is too abstract and difficult to apply empirically. What is needed is an approach grounded in theory which helps to differentiate within goal-oriented and general performance in order to justify and select the relevant performance criteria. Here, Almond and Powell’s political system approach (Almond & Powell, 1982) might be helpful. The political system approach identifies several important functions which a political system needs to perform in order to survive. However, it is insufficient to limit the performance analysis to the political system approach alone: Because it does not show which other subsystems of society are important, it cannot show which policy goods are relevant for society. This is why the policy goods of Almond/Powell seem to be inconsistent and incomplete as well. These other subsystems, which produce these important policy goods and which are nonetheless influenced by the political system due to its main steering capability, are to be included in the performance analysis. Therefore, we should not concentrate on the political system itself but should rather focus on a more general level, the social system in the sense of Parsons’ AGIL paradigm (Parsons, 2005). The AGIL scheme gives a heuristic and analytical focus because it includes all “functional prerequisites” of a system to survive (Allan, 2010; Brock et al., 2012), namely adaptation, goal attainment, integration and latent pattern maintenance (latency). 
In contrast to Almond and Powell’s approach, Parsons does not focus only on the political system itself but is concerned with the other subsystems of society as well. Parsons’ system theory shows which other subsystems, besides the political system, are important for the survival of society. Parsons’ theory, which states that every system must fulfill specific functions in order to survive, is able to justify to some extent the selected performance criteria. It is crucial to analyze the extent to which systems can maintain and perform these functions. It is also possible to compare those functions between different systems because all systems must fulfill the same functions. Thus, I propose a new and enhanced typology, the AGIL Typology of Political Performance, which combines parts of Roller’s typology with Parsons’ AGIL paradigm (see Table 4.8). Adaptation means that the social system is able to adapt to its outer environment and is able to change this environment if necessary (Abels, 2019, p. 202). This means especially the extraction and efficient use of scarce resources. The
subsystem which offers this function is the economy. In a broader sense, adaptation also covers the treatment of resources in the sense of environmental awareness and action (environmental protection). Goal Attainment is the system’s function of setting specific goals to plan and organize its activities and actions. It ensures that “every party is energized and moves in the same direction or toward the same goal” (Allan, 2010, p. 213). This function of the social system is fulfilled by the government and the political system. Whereas these two functions orient the social system towards the outer environment, the next two functions (integration, latent pattern maintenance) support the inner stability and order of the system (Abels, 2019, p. 204). Integration ensures that the different parts of the social system form a whole: this function solves the “problem of maintaining solidarity of the units of the social system, of holding them in line or insuring their cooperation” (Isajiw, 2013, p. 93). This function is performed by the legal system or conflict regulation. Finally, latent pattern maintenance is the cultural part of the system, its identity: it “is too costly to make people conform to social expectations through government and law; there has to be a method of making them willing to conform” (Allan, 2010, p. 213). It refers to the “stability and commitment to the norms and values of the society” (Isajiw, 2013, p. 91). This is the task of socialization or political culture. It adds a subjective dimension to the analysis, similar to the political goods of compliance and support in the approach by Almond/Powell.
This new typology is able to integrate the most important aspects of the different aforementioned conceptions, but places them in different table fields. Furthermore, I adopt Roller’s idea that goal-oriented and general performance are intertwined in the sense that general performance supports the realization of goal-oriented performance. 
Aspects of goal-oriented performance can be controlled more easily by the political system and/or changed more quickly than aspects of general performance. For general performance, path dependence applies more strictly. Thus, while goal-oriented performance is the substantive side of performance, general performance captures the procedural aspect of performance. On the one hand, general performance resembles the stages of the policy cycle (problem definition and agenda setting, policy formulation and adoption, implementation, evaluation; see Knill & Tosun, 2012). On the other hand, it is a governance dimension, because within these stages of the policy cycle it is also a question of the effective interaction of social and state actors. It evaluates, for instance, how well the system is able “to set and maintain priorities among the many conflicting demands made upon them […]; to target resources where they are most effective; to innovate when old policies have failed; to coordinate conflicting objectives into a coherent whole; to be able to impose losses on powerful groups; to represent
diffuse, unorganized interests in addition to concentrated, well-organized ones; to ensure effective implementation of government policies once they have been decided upon; to ensure policy stability so that policies have time to work; to make and maintain international commitments in the realms of trade and national defense to ensure their long-term well-being; and, above all, to manage political cleavages to ensure that society does not degenerate into civil war” (Weaver & Rockman, 1993, p. 6, emphasis in the original). In addition, like most of the literature discussed, I follow the idea that outcomes are more important than outputs: performance is not concerned with the actions or intentions to achieve certain goals but is the evaluation of the outcomes or actual results (Roller, 2011, p. 1852). Actions may even produce effects that are contrary to the intended goal; only the outcomes are a valid measure. However, my typology adds a third dimension of performance which can be seen as a typical mix of goal-oriented and general performance: It can be described as “policy regime” performance. Jahn (1998) uses this term in the context of environmental performance and states that “regime” implies “that policy outcomes of expansionist and limited energy policies are dependent on a set of rules, values and institutions which constitute clearly distinguishable environmental outcomes” (Jahn, 1998, n. 2). Such policy regimes, a mixture of policies and procedural rules, are widely recognized and conceptualized in political science. Besides the welfare state regime and Jahn’s environmental regimes, the research literature differentiates between economic regimes (Varieties of Capitalism), governmental regimes (e.g. presidentialism, parliamentarism and in a wider context majoritarian and consensus democracy) and consociationalism. 
The idea is that those policy regimes show a “typical” or “type-specific” performance which can be expected because it belongs to this specific policy regime and follows its functional logic. In sum, the AGIL typology of performance criteria, depicted in Table 4.8, analyses performance by differentiating between three performance dimensions (goal-oriented, policy regime and general performance), and four performance functions (adaptation, goal attainment, integration and latent pattern maintenance).
4.3.2 Description of each Matrix Field
In the following section, I give a short conceptual description of each matrix field (for a more detailed conceptual discussion of the goal-oriented performance see chapter 5; a more detailed description of the policy regime performance dimension is given in chapter 9):
Table 4.8 AGIL Typology of Performance
 | Goal-oriented Performance (substantive dimension): Policy outcome | Policy Regime Performance (substantive-procedural dimension): Policy-Governance-Outcome | General Performance (procedural dimension): Governance
Adaptation | (1a/1) Economic Outcomes; (1b/1) Environmental Outcomes | (1a/2) Economic Regime; (1b/2) Environmental Regime | (1/3) Efficient Use of Resources, Effective Implementation
Goal-Attainment | (2/1) Reforms of decision-making process; Reformability of the Political System | (2/2) Governmental Regime | (2/3) Set priorities; Innovation; Efficacy of the domestic Decision-Making Process
Integration | (3a/1) Social Outcomes; (3b/1) Domestic Security Outcomes | (3a/2) Welfare State Regime; (3b/2) Consociationalism | (3/3) Conflict Regulation (Civil Order)
Latent Pattern Maintenance | (4/1) Specific Support; Satisfaction; Confidence | (4/2) Political Culture | (4/3) Diffuse Support; Legitimacy
Source: own presentation
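The matrix structure of Table 4.8 can also be written down as a small lookup structure, which may be convenient when indicators are later assigned to matrix fields. The following Python sketch is only an illustration: the names Function, Dimension, AGIL_TYPOLOGY, matrix_field and criteria_by_dimension are hypothetical and are not part of the study's own data processing; only the field labels and criteria are taken from Table 4.8.

```python
from enum import Enum


class Function(Enum):
    """The four AGIL functions (rows of Table 4.8)."""
    ADAPTATION = 1
    GOAL_ATTAINMENT = 2
    INTEGRATION = 3
    LATENT_PATTERN_MAINTENANCE = 4


class Dimension(Enum):
    """The three performance dimensions (columns of Table 4.8)."""
    GOAL_ORIENTED = 1   # substantive dimension: policy outcome
    POLICY_REGIME = 2   # substantive-procedural dimension
    GENERAL = 3         # procedural dimension: governance


# Hypothetical encoding: each matrix field maps its label(s) to its criteria.
AGIL_TYPOLOGY = {
    (Function.ADAPTATION, Dimension.GOAL_ORIENTED): {
        "1a/1": "Economic Outcomes",
        "1b/1": "Environmental Outcomes",
    },
    (Function.ADAPTATION, Dimension.POLICY_REGIME): {
        "1a/2": "Economic Regime",
        "1b/2": "Environmental Regime",
    },
    (Function.ADAPTATION, Dimension.GENERAL): {
        "1/3": "Efficient Use of Resources, Effective Implementation",
    },
    (Function.GOAL_ATTAINMENT, Dimension.GOAL_ORIENTED): {
        "2/1": "Reforms of decision-making process; "
               "Reformability of the Political System",
    },
    (Function.GOAL_ATTAINMENT, Dimension.POLICY_REGIME): {
        "2/2": "Governmental Regime",
    },
    (Function.GOAL_ATTAINMENT, Dimension.GENERAL): {
        "2/3": "Set priorities; Innovation; "
               "Efficacy of the domestic Decision-Making Process",
    },
    (Function.INTEGRATION, Dimension.GOAL_ORIENTED): {
        "3a/1": "Social Outcomes",
        "3b/1": "Domestic Security Outcomes",
    },
    (Function.INTEGRATION, Dimension.POLICY_REGIME): {
        "3a/2": "Welfare State Regime",
        "3b/2": "Consociationalism",
    },
    (Function.INTEGRATION, Dimension.GENERAL): {
        "3/3": "Conflict Regulation (Civil Order)",
    },
    (Function.LATENT_PATTERN_MAINTENANCE, Dimension.GOAL_ORIENTED): {
        "4/1": "Specific Support; Satisfaction; Confidence",
    },
    (Function.LATENT_PATTERN_MAINTENANCE, Dimension.POLICY_REGIME): {
        "4/2": "Political Culture",
    },
    (Function.LATENT_PATTERN_MAINTENANCE, Dimension.GENERAL): {
        "4/3": "Diffuse Support; Legitimacy",
    },
}


def matrix_field(function, dimension):
    """Return the labelled criteria of one matrix field."""
    return AGIL_TYPOLOGY[(function, dimension)]


def criteria_by_dimension(dimension):
    """Collect the criteria of one dimension across all four functions."""
    return [
        criterion
        for function in Function
        for criterion in AGIL_TYPOLOGY[(function, dimension)].values()
    ]


if __name__ == "__main__":
    field = matrix_field(Function.INTEGRATION, Dimension.POLICY_REGIME)
    for label, criterion in field.items():
        print(f"({label}) {criterion}")
```

Keying the dictionary by the pair (function, dimension) mirrors the two classification principles of the typology, so that all criteria of one performance dimension, for example the goal-oriented dimension, can be collected across the four functions.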
• Goal-oriented Performance/Adaptation (1/1): This matrix field analyzes substantive outcomes of the adaptation function. The adaptation function handles the relationship between the system itself and the environment. On the one hand, this aspect analyses economic outcomes (1a/1) in the sense of an extraction of resources and transforming them into usable goods. On the other hand, adaptation can be understood in a broader sense as the capability to adapt to the outer environment and change this environment as necessary. Therefore, I include environmental outcomes (1b/1) in this matrix field as well.
• Regime Performance/Adaptation (1/2): This mix of substantive and procedural performance includes the economic regime in the sense of the prominent Varieties of Capitalism debate based on Hall and Soskice (2001), who differentiate between liberal market economies and coordinated market economies (1a/2). Less clear and still at an early stage of research is Jahn’s (2014) approach of environmental regimes (1b/2), which distinguishes “Green States”, which follow an environmentally friendly ideology, from “Productionist States”, which see nature mainly as a good for production and consumption. Similar is the debate about the “ecostate” (Duit, 2014; Christoff, 2005): “The welfare state and the ecostate are similar […] in the sense that the state takes on the function of mitigating the effects of market externalities: social costs in the case of the welfare state, and ecological costs in the case of the ecostate” (Duit, 2014, p. 322).
• General Performance/Adaptation (1/3): This matrix field evaluates the efficiency in the usage of resources and implementation, because “the stipulations of [a] policy must be put into action to bring about the behavioural changes intended by the policy-makers” (Knill & Tosun, 2012, p. 171). In the sense of governance, not only the government but also different actors or groups (e.g. bureaucracy) are part of the implementation process and need to be considered.
• Goal-oriented Performance/Goal Attainment (2/1): Goal-Attainment describes the steering capabilities of the system. In combination with goal-oriented performance, this matrix field assesses the substantive aspect of the steering capability. It encompasses constitutional reforms but also institutional learning (Brusis, 2008, p. 104). The framework of the SGI (SGI Team, 2019, p. 67) includes the following question, which describes this matrix field very well: “To what extent does the government improve its strategic capacity by changing the institutional arrangements of governing?”. 
• Regime Performance/Goal Attainment (2/2): This matrix field assesses the structure of the decision-making process (governmental system). These are the institutional rules which guide the decision-making process. Concepts such as parliamentarism vs. presidentialism (Steffani, 1979), but also the distinction between majoritarian and consensus democracies (Lijphart, 2012) can be found here. The democracy profiles which were identified in chapter 2 of this study can be found in this matrix field. The procedural aspect of these profiles is clear in terms of institutional rules; the substantive aspect is less clear. At the very least, however, Lijphart links consensus democracy to the production of kinder and gentler policies.
• General Performance/Goal Attainment (2/3): This matrix field analyzes the steering capabilities of the political system from a procedural perspective. Is
the political system able to innovate? Is the decision-making process efficient and free from the risk of gridlock or deadlock? And in times of globalization and supranationalization, is the political system able to foster efficient international cooperation?
• Goal-oriented Performance/Integration (3/1): The integration function ensures that the different parts of the social system form a coherent and stable whole. This finds its expression in social inclusion and security. Combined with the substantive dimension, this means that, on the one hand, this field analyzes social outcomes (3a/1), which include economic and social equality. On the other hand, it evaluates the domestic security outcomes (3b/1). Can citizens live safely together and is the state able to protect its citizens?
• Regime Performance/Integration (3/2): The famous welfare state typology by Esping-Andersen (1990) is located here (3a/2). In addition, consociationalism (Lijphart, 2004; R.B. Andeweg, 2000) as a conflict regulation regime can be included here (3b/2). These special systems are characterized by a grand coalition, mutual vetoes, PR electoral systems and federalism in addition to informal consociational arrangements (Bogaards et al., 2019). It is theorized that these systems provide stability and prevent violence in multinational states.
• General Performance/Integration (3/3): This aspect analyzes the conflict-regulating abilities which help the different parts of the system to form a whole. Weaver/Rockman (1993, p. 6) call this the “managing of cleavages”. This is especially important for political systems in multinational contexts (Stepan et al., 2010). Can the system regulate conflicts successfully to maintain a civil order?
• Goal-oriented Performance/Latent Pattern Maintenance (4/1): The latent pattern maintenance function describes the function of the “cultural system”. 
Although Parsons sees institutions such as the church or family as central institutions in this system, it makes sense to limit the cultural system’s content to political objects only. In this respect, it allows the performance typology to include elements of the political culture debate. In combination with goal-oriented performance, this matrix field analyzes the short-term evaluation of the political system by the population. According to Easton (1965), this can be called the specific support of the political system. Is the population satisfied with the work of the government or the politicians? Does it trust these institutions?
• Regime Performance/Latent Pattern Maintenance (4/2): This matrix field tries to identify the structure of political culture (Almond & Verba, 1963). Is there a political culture which corresponds to the structure of the political system? Even though the foundations were laid by Almond/Verba as early as 1963, the
identification of entire political cultures is still in the early stages of research. The two studies by Almond/Verba (1963, 1980), which identified specific types of political cultures in five countries, have not yet been extended to a larger sample.
• General Performance/Latent Pattern Maintenance (4/3): This matrix field analyzes the diffuse support of the political system. While specific support is linked to evaluative orientations, diffuse support taps affective orientations towards political objects. Diffuse support is therefore similar to “legitimation” or “belief in legitimacy” (Lauth, 2020). Although a lack of specific support may turn into a lack of diffuse support under specific circumstances, only diffuse support has consequences for the survivability and consolidation of a regime.
4.4 Summary and Conclusion
This chapter showed that many conceptions of performance lack a comprehensive conceptualization, which makes the selection of performance criteria inconsistent and incomplete. Therefore, I propose a new typology, the AGIL Typology of Political Performance. First, I focus on outcomes rather than outputs because outcomes measure the real consequences and impact of policies. Second, I draw on Roller’s idea of differentiating between substantive outcomes (goal-oriented performance) and procedural outcomes (general performance). However, I also include regime performance, which I define as a typical mix of substantive and procedural outcomes. Third, instead of the political system approach by Almond/Powell (1982), whose policy goods are inconsistent, I use Parsons’ AGIL paradigm (2005) to justify the selected performance criteria: I focus on all subsystems which, according to Parsons, are important for the survival of society. This makes it possible to overcome a major criticism of all previous proposals, which do not explain their selection of the relevant performance criteria in theoretical terms. This typology is the central starting point for the next chapters. In chapter 5, I examine the detailed conceptualization and measurement of the AGIL Typology of Political Performance, focusing especially on the goal-oriented performance dimension. To derive final values for each matrix field of the goal-oriented performance, I discuss and conduct the aggregation procedure in chapter 6. Chapter 7 presents a descriptive analysis of the development of the goal-oriented performance, while chapter 8 carries out the main causal analysis, which explains goal-oriented performance with the help of democracy profiles. Finally, regime performance is explored and analyzed in chapter 9.
5 Conceptualizing and Measuring Goal-oriented Performance
5.1 Introduction
In the previous chapter, I developed the overall framework for the assessment of political performance in the form of an enhanced typology, the AGIL Typology of Political Performance. In this chapter, I elaborate a proposal for measuring the individual fields of this typology (see Table 4.8 in the previous chapter). As a starting point, Table 7.4 in the appendix gives an overview of existing studies which analyze the relationship between a democracy model (e.g. consensus vs. majoritarian democracy) and performance areas (Bogaards, 2017b). These studies are sorted into the matrix fields of the AGIL typology of performance, so that research desiderata within performance research become visible. The overview shows that many studies focus on goal-oriented performance, so that almost all of its areas are covered (with the exception of the goal-attainment function). In addition, many studies investigate aspects which mix the two concepts of performance and measurement of democracy (“Others”). The biggest desideratum is the general performance dimension, which has hardly been studied due to its complicated measurement. However, this study cannot close this large gap either: the availability of empirical data for this performance dimension is still so limited that the matrix fields of general performance cannot be measured even in a rudimentary way. Therefore, I will only develop and discuss goal-oriented and policy regime performance in greater detail. I start with the goal-oriented performance in this chapter; chapter 9 deals with the policy regime performance. The various studies presented in Table 7.4 in the appendix are consulted to ensure a sound conceptual approach and measurement based on the research literature.

The discussion of the goal-oriented performance follows the well-known approach developed by Munck/Verkuilen (2002) (see also Müller & Pickel, 2007; Pickel et al., 2015). As discussed in chapter 3, they outline the importance of three steps for the development of a sound measurement concept: conceptualization, measurement and aggregation. Applying this approach addresses another major problem of many performance studies: the lack of a thorough discussion of the individual dependent variables. Often scholars “simply use a range of different and separate indicators that lack theoretical or substantive backing” (Vis et al., 2012, p. 74). After a review of my methodological framework (see section 5.2), I present the conceptualization and measurement for each matrix field of the goal-oriented performance dimension (i.e. the selection of the indicators). The presentation follows the order of the four central functions of social systems (AGIL), namely adaptation with its economic and environmental outcomes (section 5.3), goal-attainment measuring the reformability of the political system (section 5.4), integration with its social as well as domestic security outcomes (section 5.5) and latent pattern maintenance evaluating the confidence in the political system (section 5.6). A summary is given in section 5.7. The final step, the aggregation strategy, will be explained in the next chapter (chapter 6) due to its high complexity.

Electronic supplementary material: The online version of this chapter (https://doi.org/10.1007/978-3-658-34880-9_5) contains supplementary material, which is available to authorized users.
© The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2021. O. Schlenkrich, Origin and Performance of Democracy Profiles, Vergleichende Politikwissenschaft, https://doi.org/10.1007/978-3-658-34880-9_5
5.2 Methodological Framework for Conceptualization and Measurement
In general, I follow the three-phase approach by Munck/Verkuilen (2002), but adapt it to the needs of this study. In this chapter, I address the challenges of conceptualization and measurement; the step of aggregation (i.e. the theoretically and mathematically sound bundling of the values measured by the indicators) follows in the next chapter (see chapter 6 for a more detailed discussion). In the conceptualization phase, the focus is on developing a useful and parsimonious definition that has discriminatory power and that serves as a point of departure for the next steps. Since I elaborated the main framework, the AGIL typology of performance, in the previous chapter, the concepts to be developed here are already divided into smaller sections, the performance areas (e.g. economic outcomes or environmental outcomes). Therefore, I refrain from constructing
a differentiated concept tree, as suggested by Munck and Verkuilen (2002, p. 13), although I do discuss the concept of the individual goal-oriented performance fields in detail. The next step is to measure the concept (its operationalization) by selecting indicators for empirical measurement. Three criteria are important here: high construct validity, high data quality and broad coverage. Construct validity implies the following: First, the selected indicators need to actually measure the outlined components of the concept. Second, “cross-national equivalence” (Pickel et al., 2015, p. 506) and possible functional equivalents should be taken into account, so that the indicator is applicable and comparable across the countries examined here. This is closely linked to the aspect of data quality. It is often questionable whether the performance data are comparable across cases. Data are often reported to international organizations by countries that define or measure aspects of performance quite differently (e.g. “burglary” is defined differently in each country and is not readily comparable, see section 5.5.2 “Domestic Security Outcomes”). Or the data are based on different definitions (e.g. calculation of the Gini coefficient based on income vs. consumption data) but are nevertheless mixed in the dataset. Coverage refers to sufficient spatial and temporal completeness (Gerring & Thacker, 2008, p. 102; Wendling et al., 2018, p. 5) for a successful cross-national and longitudinal comparison of the performance of democracy profiles. The indicator and its dataset should cover the whole sample of the study (see chapter 2 about democracy profiles). However, this objective has to be balanced against construct validity and data quality, so that it might be necessary to limit the spatial coverage from global to only OECD countries, which offer a larger and often more reliable set of indicators.
The dataset should also consist of several measurements over time to allow for a time-series cross-sectional analysis (see chapter 9). Data for only a single year, which allow only cross-sectional analysis, are not sufficient; however, this cannot be avoided for some performance areas (goal-attainment performance; latent pattern maintenance performance). Table 5.1 summarizes the methodological framework.
Table 5.1 Checklist for Conceptualization and Measurement Framework

| Step of Index Construction | Task |
|---|---|
| Conceptualization | Parsimonious definition with discriminatory power |
| Measurement | Construct Validity and Cross-National Equivalence |
| | Data Quality |
| | Spatial and Temporal Coverage |

Source: own table
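The coverage criterion from the checklist can be illustrated with a short sketch. It is only a minimal illustration under stated assumptions: the country codes, years and values are invented, and the functions `coverage` and `cross_section_only` are hypothetical helpers, not part of any dataset or package used in this study. The sketch computes the share of observed country-years and flags countries with at most one observed year, for which only a cross-sectional analysis would be possible.

```python
# Hypothetical country-year panel for one indicator; None marks a missing value.
panel = {
    ("DEU", 1980): 2.1, ("DEU", 1981): None, ("DEU", 1982): 1.8,
    ("BRA", 1980): None, ("BRA", 1981): None, ("BRA", 1982): 4.0,
}


def coverage(panel):
    """Return the share of observed country-years and the observed years per country."""
    observed_years = {}
    for (country, year), value in panel.items():
        observed_years.setdefault(country, [])
        if value is not None:
            observed_years[country].append(year)
    n_observed = sum(len(years) for years in observed_years.values())
    share = n_observed / len(panel)
    return share, {c: sorted(y) for c, y in observed_years.items()}


def cross_section_only(observed_years):
    """Countries with at most one observed year (no longitudinal comparison possible)."""
    return sorted(c for c, years in observed_years.items() if len(years) <= 1)


share, years = coverage(panel)
print(share, years, cross_section_only(years))
```

In practice, such a check would be run per indicator before deciding whether the spatial coverage has to be restricted, for example to OECD countries.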
5.3 Adaptation: Economic and Environmental Outcomes
5.3.1 Economic Outcomes (Goal-oriented Performance 1a)
The performance of economic outcomes is defined by the extent to which it is increasing the wealth of the population (Roller, 2005, pp. 39–42). The Bertelsmann Transformation Index (BTI) defines a successful economic performance as “solid growth” (Bertelsmann Stiftung, 2018b, p. 32). Also, Gerring and Thacker (2008, p. 106) link economic development to “economic growth and prosperity”. Roller’s conceptualization of the economic outcome is based on the ‘magic square of economic policy’, which highlights the four main goals of economic policy, namely full employment, high price stability, high economic growth and balanced trade (Roller, 2005, p. 40). According to Lovell et al. (1995, p. 507), these are important and almost universal goals: “The first three objectives are enshrined in the United States Full Employment Act of 1946, and they appear to guide policy makers in most, if not all, other advanced nations as well. The importance of the fourth objective is clear from the recent contentious debates on NAFTA, and on the GATT and the creation of the World Trade Organization”. However, Roller excludes the trade balance because it “has no direct influence on citizen wealth” (Roller, 2005, p. 40). The OECD uses similar components in its ‘magic diamond’ (OECD, 1987) of economic policy. Between these different goals of the magic square, there is on the one hand a complementary relationship and on the other hand a conflicting relationship (trade-off): “First, the achievement of all four goals simultaneously was perceived as a clear and sustainable increase in economic wellbeing. And second, as pushing towards achieving one target might have negative effects on others […], achieving all targets at once was seen as something difficult to achieve” (Dullien, 2017, p. 7). It is important for this study to consider the conflicting relationships between these goals, as it may complicate the aggregation procedure or even prevent a
successful aggregation into a single component of economic performance. This applies especially to the trade-off between inflation and unemployment in the Keynesian view (Vis et al., 2012, pp. 80–81) and needs to be carefully examined in the next chapter. Another approach that is very popular in the US is the misery index developed by the economist Art Okun, which is simply the unweighted sum of the inflation and unemployment rates. Hanke (2014, n. pag.) modified the misery index by including additional variables: “Constituents prefer lower inflation rates, lower unemployment rates, lower lending rates, and higher GDP per capita [growth]”. Finally, the assessment of the BTI considers several macroeconomic indicators, the most important of which are GDP growth, unemployment, inflation, foreign direct investment, gross capital formation, public debt and current account balance (Bertelsmann Stiftung, 2018b, p. 32). In contrast to the other approaches, the BTI includes at least two further economic aspects: the economic consequences of public debt and austerity policy, and the effects of investment in the sense of gross capital formation and foreign direct investment (FDI), both of which can be considered drivers of economic growth (for FDI, see Sarkar, 2016). In this study, I use several indicators to measure macroeconomic performance. A first set of indicators encompasses GDP per capita, unemployment and price stability (no inflation). GDP per capita, which measures the income of a country, is the most widely accepted indicator. However, “among economists, it is by now widely acknowledged that GDP is a highly imperfect measure of well-being of the citizens of a country” (Dullien, 2017, p. 4). Instead of GDP growth rates, I choose GDP per capita because, similar to Gerring and Thacker (2008, p.
108), my interest is “in the level of prosperity attained in a given country, rather than its short-run rate of change”. [Footnote 1: The World Bank favors Gross National Income per capita to evaluate lending eligibility and repayment terms as well as to classify economies into different income groups. As the World Bank states, this indicator “is closely correlated with other, nonmonetary measures of the quality of life, such as life expectancy at birth, mortality rates of children, and enrollment rates in school” (World Bank Data Help Desk, 2019). However, this should be true for GDP per capita as well, since the two indicators are closely correlated (Pearson coefficient of .99).] Importantly, to allow for a valid comparison across countries, the GDP measure needs to be adjusted for purchasing power parity (PPP) to control for the different living costs and price levels across countries (e.g. Switzerland has a higher GDP per capita than Germany, but also much higher price levels). In addition, unemployment/employment is often difficult to compare across countries because it focuses mostly on formal employment. Informal employment, which is common,
for instance, in South American countries, is not captured. Nevertheless, it is the most important indicator for the labor market, and I therefore include it in my analysis. Although Roller excluded the trade balance, the current account balance can be regarded as a “key indicator of stability” (Dullien, 2017, p. 13): “large current account deficits increase a country’s external debt and thus endanger debt sustainability. Large current account surpluses […] entail growing indebtedness of foreign trade partners and thus destabilise the external economic environment. Furthermore, large surpluses lead in the medium to long term to revaluation pressure on the national currency and thus jeopardise medium- to long-term export opportunities” (Dullien, 2017, p. 13). Furthermore, GDP per capita is complemented by another indicator of the economic wellbeing of citizens: private consumption per capita measures the purchasing power of private households. Dullien (2017, p. 13) states that consumption “as an indicator in addition to GDP ensures that the kind of increases that really improve the material circumstances of the individual can be taken into consideration. Income increases for a rich minority that are just saved, although showing up in GDP, do not reflect rising consumption”. Finally, I incorporate the indicators used in the BTI framework (public debt, foreign direct investment and gross capital formation). It is important to note that this study does recognize possible negative effects of economic growth, such as economic inequality or environmental deterioration. Sustainable development in the form of environmental protection and a reduction of social inequality are important performance objectives (Buhr & Schmid, 2016; Dullien, 2017; Dullien & van Treeck, 2012, see next sections). While this performance area is only concerned with the performance of the economy, the other performance areas will examine these important goals (i.e.
social and environmental outcomes are measured separately). A simultaneous assessment of all performance fields (see chapter 7) will reveal whether the relationships between these performance areas are complementary or conflicting (Roller, 2005, pp. 198–218). For the operationalization of this concept, I rely on three sources: indicators from the International Monetary Fund (IMF), the World Bank (World Development Indicators, WDI) and the Penn World Tables (PWT). All sources collect data for many countries over a long period of time. IMF and WDI try to ensure the comparability of their data by developing various quality assessment systems such as the General Data Dissemination System (GDDS) or the Data Quality Assessment Framework (DQAF). PWT is a “database with information on relative levels of income, output, inputs and productivity, covering 182 countries between 1950 and 2014” (Teorell et al., 2019, p. 524). As Dawson et al. (2001, p. 989) state, even though there is large
measurement error in the PWT data varying by development status of the country, the PWT belongs “[f]or many purposes, […] to the best international data we have” (for a more recent criticism, see also Johnson et al., 2013). Overall, the quality and comparability of the data seem sufficient for them to be used in this study. Table 5.2 shows the selected indicators for Economic Outcomes in the Adaptation function.

Table 5.2 Selection of Indicators for Economic Outcomes

| Goal | Component | Name of Indicator | Source |
|---|---|---|---|
| Wealth | Based on Magic Square of Economic Policy | GDP per capita, current prices (Purchasing power parity, international dollars) | International Monetary Fund (World Economic Outlook 2019) |
| | | Inflation rate, average consumer prices | |
| | | Unemployment rate (% of total labor force) | |
| | | Central government debt (% of GDP) | |
| | | Share of gross capital formation at current PPPs | Penn World Table V.9 |
| | | Household final consumption expenditure per capita (constant 2010 US$) | World Bank (WDI indicators) |
| | | Foreign direct investment, net inflows (% of GDP) (wdi_fdiin via QoG: Teorell et al., 2019) | |
| | | Current account balance | |

Source: own table
Most of the data have been available since 1980, and the data coverage is excellent (see Figure 5.1). The GDP per capita and inflation indicators of the IMF show almost no missing values. A higher share of missing values is found in the unemployment indicator and especially in the government debt indicator. Gross capital formation (PWT) has a wider temporal coverage going back to the 1950s. It covers almost 100% of all cases. The coverage of the WDI indicators is also good. For these indicators, most of the data are available from 1980 onwards, and the share of missing values has since fallen to about 2% for the investment indicator and 18% for the consumption indicator. Since all datasets offer data for the 1980s, I aggregate and calculate the index for economic outcome performance beginning in the year 1980.

Figure 5.1 Missing Values in the Economic Outcome Indicators. Note: Black bars indicate the amount of missing values. Source: own calculation with IMF, PWT and WDI data
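The two misery indices mentioned in this section can be written down in a few lines. The following is a minimal sketch, not the measurement used in this study: the first function implements the unweighted sum of inflation and unemployment described above; the second is an assumption about how Hanke's modification could be operationalized from the quoted preferences (adding the lending rate and subtracting per capita GDP growth). All input values are invented percentages for illustration.

```python
def misery_index(inflation, unemployment):
    """Okun-style misery index: unweighted sum of inflation and unemployment rates."""
    return inflation + unemployment


def modified_misery_index(inflation, unemployment, lending_rate, gdp_pc_growth):
    """Assumed reading of Hanke's variant: the three 'bads' are added,
    growth of GDP per capita (a 'good') is subtracted."""
    return inflation + unemployment + lending_rate - gdp_pc_growth


# Hypothetical country-year with rates in percent.
basic = misery_index(inflation=2.0, unemployment=5.0)
modified = modified_misery_index(
    inflation=2.0, unemployment=5.0, lending_rate=4.0, gdp_pc_growth=1.5
)
print(basic, modified)
```

Because both indices simply add their components, they implicitly treat one percentage point of inflation and one percentage point of unemployment as equally harmful, which is exactly the kind of weighting decision the aggregation chapter has to justify.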
5.3.2 Environmental Outcomes (Goal-oriented Performance 1b)
Environmental matters “have become increasingly relevant in contemporary highly industrialized and globalized societies” (Jahn, 2016, p. 1). The danger of environmental pollution and climate change affects all areas of life (e.g. health issues, environmental catastrophes). In this respect, environmental outcomes can be considered a crucial part of performance.
An important distinction is that between measuring the state of the environment and measuring environmental performance. The state of the environment (e.g. biodiversity) is often driven not by political decisions but by apolitical factors beyond human control (geographical location) and should therefore not be considered here. Rather, this study focuses on the “evaluation of societal attainment with relation to environmental matters” (Meadowcroft, 2014, p. 28). Performance measures should be changeable through politics: “Measurements of differences in environmental conditions that stem entirely from natural circumstances do not speak directly to the issue of performance” (Meadowcroft, 2014, p. 28). In a similar vein, Scruggs (1999, p. 11) highlights that environmental performance should be defined “to be the results of human responses to human-induced environmental pollution problems”. Roller (2005, p. 45) also emphasizes that what matters is “not the existing concentration of air pollutants (immission) but the volume of pollutants emitted into the air by firms and citizens (emission)”. For example, the Ecological Footprint Index (Wackernagel et al., 2019) offsets biodiversity against the ecological footprint (consumption of the environment). Countries that are rich in biodiversity due to their geographical location (e.g. Brazil, DR Congo) perform better than countries with lower biodiversity, although the latter may pursue more protective environmental policies (Jahn, 2016, p. 96). This is not plausible, and the concept of the Ecological Footprint Index therefore cannot be used here. Roller’s (2005, pp. 44–46) concept of environmental performance is based on the OECD’s “pressure-state-response” framework. It distinguishes between two dimensions, namely “protecting environmental quality” and “protecting the quality and quantity of natural resources”. Whereas the former describes “‘sink-oriented’” (OECD, 1993, p.
11) issues like climate change, air quality and waste, the latter focuses on “‘source-oriented’” (OECD, 1993, p. 11) issues like water resources or forest resources. She measures the former dimension with emission-based indicators of air quality (sulphur oxides, nitrogen oxides and carbon dioxide emissions) and with indicators covering municipal waste production and fertilizer use. The second dimension is based on an indicator that measures water consumption. Jahn (2016) bases his study on 14 indicators collected by the OECD (indicators measuring atmospheric emissions, water pollution, waste generation, soil degradation and environmental relief measures). Using principal component analysis on his dataset (for a description of this procedure, see Section 6.2.1 in the next chapter), he identifies three components (see footnote 2): General Environmental Performance, Mundane Environmental Performance and Water Pollution. The General Performance Index encompasses air emissions, waste and water consumption. According to Jahn, it is the “most politically contested factor because major challenges of environmental performance are embodied in this dimension” (Jahn, 2016, p. 135). He proposes that, if one “is looking for the single most important environmental performance index, it would be this one” (Jahn, 2016, p. 151). The second factor, Mundane Environmental Performance, combines recycling and water treatment. Finally, the third factor, Water Pollution, contains the pollution of rivers and lakes.

For the concept and measurement of this performance area, I use the following strategy: Although a multidimensional concept and measurement of environmental performance might be beneficial, I simplify Jahn’s approach by focusing on only one key dimension of environmental performance, as I also study a variety of other performance areas. I only try to recreate Jahn’s most important factor, the General Environmental Performance Index. Like Jahn, I use OECD data to measure environmental performance (greenhouse gas emissions, sulphur oxides emissions, nitrogen oxides emissions, carbon monoxide emissions, generated municipal waste and water abstraction). However, I deviate from Jahn’s approach in one important respect: Because it is important for this study to compare countries at different economic levels, I do not use emissions per capita as a reference, as Jahn does, but rather emissions per unit of GDP. If I were to use the per capita transformation, India would be the best performing country due to its large population. When the unit of GDP is used as a reference, India performs worse and is therefore assessed more realistically. However, “this indicator is much more uncertain than population[.] For the CO2 intensity related to GDP of a country (CO2 per USD of GDP) it is recommended to compare levels between countries and longer term trends only” (Olivier et al., 2015, p. 30). Table 5.3 shows the selected indicators for environmental outcomes in the adaptation function.

Footnote 2: He uses principal component analysis (PCA) instead of exploratory factor analysis (EFA). However, this might not be the right methodological choice. PCA does not differentiate between shared variance (communality) and unique variance (unshared variance, error). Whereas PCA only reduces the dimensionality of the data, EFA tries to find the latent factor which causes the shared variance of the measured variables. The components of the PCA are “parsimonious representations of the original measured variables but are not latent constructs” (Watkins, 2018, p. 9). Despite these theoretical and computational differences, both methods usually yield the same results (M. Norris & Lecavalier, 2010, pp. 9–10). A detailed explanation of the EFA procedure can be found in the next chapter.
Table 5.3 Selection of Indicators for Environmental Outcomes

| Goal | Component | Name of Indicator | Source |
|---|---|---|---|
| Protection of the Environmental Quality | Atmospheric Emissions | Greenhouse gas (CO2) emissions per unit of GDP | OECD indicators |
| | | Sulphur Oxides Emissions per unit of GDP | |
| | | Nitrogen Oxides Emissions per unit of GDP | |
| | | Carbon Monoxide Emissions per unit of GDP | |
| | Waste | Total amount generated of municipal waste per capita (oecd_waste_t1b via QoG: Teorell et al., 2019) | |
| | Water consumption | Water abstractions per capita (oecd_water_t1a via QoG: Teorell et al., 2019) | |

Source: own table
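The consequence of choosing GDP rather than population as the reference unit for emissions can be made visible with a small sketch. The two countries and all figures below are invented solely to illustrate the mechanism (a populous country with a small economy versus a less populous country with a large economy); they are not OECD data, and the helper functions are hypothetical.

```python
# Hypothetical data: emissions, population and GDP in arbitrary but consistent units.
countries = {
    "A": {"emissions": 2000.0, "population": 1300.0, "gdp": 2500.0},
    "B": {"emissions": 800.0, "population": 80.0, "gdp": 3500.0},
}


def intensity(data, denominator):
    """Emissions divided by the chosen reference unit ('population' or 'gdp')."""
    return {name: v["emissions"] / v[denominator] for name, v in data.items()}


def ranking(scores):
    """Country names ordered from lowest (best) to highest (worst) intensity."""
    return sorted(scores, key=scores.get)


per_capita_rank = ranking(intensity(countries, "population"))
per_gdp_rank = ranking(intensity(countries, "gdp"))
print(per_capita_rank, per_gdp_rank)
```

With these invented numbers, country A ranks first per capita but second per unit of GDP, which mirrors the reasoning given above for the choice of reference unit.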
Figure 5.2 shows the distribution of missing values among the indicators. Most of the OECD data have only been provided since 1990. However, the OECD data also include statistics for a few non-OECD countries such as Argentina, Brazil, India, Indonesia, South Africa and Russia. Nevertheless, 50% of the observations are still missing from the data. Lastly, the indicators for water and waste have more missing values.
Figure 5.2 Missing Values in the Environmental Outcome Indicators. Note: Black bars indicate the amount of missing values. Source: own calculation with OECD data
5.4 Goal-Attainment: Reformability of the Political System (Goal-oriented Performance 2)
There is consensus that, on the one hand, all “constitutions require regular, periodic modification, whether through amendment, judicial or legislative alteration, or replacement” (Lutz, 1994, p. 357) and that very rigid constitutions reduce the political regime’s ability to survive. On the other hand, there is consensus that too much change or flexibility of the constitution can be a sign of a destabilized political system. Thus, constitutions “are likely to live longer if they include amendment procedures that finely balance flexibility and rigidity” (Negretto, 2012, p. 759). This is a debate that goes back to the founders of the American constitution. As Elkins et al. (2009a, p. 22) put it: “On Jefferson’s side, the consequences of constitutional replacement include increased representation, a more participatory public regarding higher law, and upgrades to suboptimal or outmoded institutions. Arguments on Madison’s side include a role in facilitating precommitment, binding a sometimes diverse and multitudinous citizenry, fostering the development of ancillary institutions, and a potential instrumental benefit in facilitating investment and economic activity”. Finally, Lorenz (2008, p. 362) states that changes of the constitutions are a genuine part of the development of democracies, while constitutional rigidity is just an illusion.
The frequency of constitutional amendments depends on at least two elements, namely the pressure for and the mechanisms of constitutional amendment. The first element is the pressure or demand to change the constitution. According to Lutz (1994, p. 357), a modification of the constitution becomes necessary when the economic or technological environment of the political system changes or when the political culture and values of the population evolve. These socio-economic changes can “complicate social relations or upset the balance between contending groups” (Banting & Simeon, 1985, p. 12), leading to constitutional conflicts. In addition, constitutional modifications may be necessary to correct unwanted institutional effects. The greater the gap between the demands and what the constitution can deliver, the greater the risk that the rules “become brittle and out of date, leading to pressure to adopt new rules through constitutional amendment, reinterpretation, or replacement” (Ginsburg & Melton, 2015, p. 688). The second element is the amendment procedures, which, depending on their design, may facilitate or hinder the reform process. There are two ways to change the constitutional framework: explicit constitutional change and implicit constitutional change (Lorenz, 2015, p. 377). The former refers to the formal process of changing the constitutional text or introducing a completely new constitution. The latter means a constitutional transformation through changing political practice as well as interpretation of the constitution by the judiciary (Ginsburg & Melton, 2015; Negretto, 2012; for a comprehensive list of constitutional change mechanisms, see Contiades & Fotiadou, 2012, pp. 436–440).
Although Tsebelis’ veto player theory gives a general idea of the relation between the frequency of successful amendments and the difficulty of the amendment procedure due to the number of veto groups, there is still “no agreement on how to classify constitutional amendment procedures according to their rigidity” (Bucur & Rasch, 2019, p. 157). This is why different approaches to measuring amendment difficulty are not very well correlated (Ginsburg & Melton, 2015, p. 688; Lorenz, 2005). However, these procedures alone cannot explain constitutional change; rather, it is important to take other political, economic and social factors into account (Bucur & Rasch, 2019, p. 172; Ginsburg & Melton, 2015). The above-mentioned demand and pressure approach makes it possible to treat a political system’s ability to reform as a measure of performance: a high-performing political system is one that can respond and react to pressure and demands by adapting its constitution or decision-making process.3 However, amending the constitution is not the only way to change the decision-making process. There could also be smaller organizational changes via institutional learning, which do not require a constitutional modification: “Institutional learning refers […] to the ability to reform organizational structures: to what extent do intra-executive actors review their own organizational structures (rules of procedure, external relations with parliament, government parties, administration and the public, etc.), and to what extent do institutional reforms improve the strategic capacity to act?” (Brusis, 2008, p. 104, own translation). Lastly, the difficulty of the amendment procedures is not a part of goal-oriented performance, but can serve as an independent variable.4

How can we measure reformability? Due to measurement difficulties and data limitations, it is not possible to quantitatively assess the reformability of the political system as a response or reaction to demands and pressures. Therefore, the analysis must focus solely on the frequency of constitutional change and presuppose that these changes are adequate responses to pressures and demands. Traditionally, the frequency of constitutional changes is measured by the amendment rate (Lorenz, 2005). The amendment rate is the number of constitutional changes in a specific time period (e.g. one year) and can therefore range between 0 (no amendments) and 1 (multiple amendments). As with the measures of amendment difficulty, there is a low correlation between the amendment rates from different approaches (Lorenz, 2005, p. 351). Therefore, I selected two sources which provide a rather broad spatial and temporal coverage (the Comparative Constitutions Project, CCP, and the amendment rate created by Lutz). In addition, the CCP offers a weighted amendment rate as well. In contrast to the unweighted amendment rate, the weighted amendment rate “takes into account the extent of the changes […] that resulted from each amendment” (Ginsburg & Melton, 2015, p. 705). However, the coverage for the sample of this study is rather poor, so that the weighted rate cannot be included in the analysis.5

3 Amendment difficulties and constitutional rigidity also play a role in the debate on constitutionalism vs. democracy. Here, too rigid constitutions and too difficult amendment procedures are seen as a problematic limitation of the principle of democracy. Democracy presupposes some freedom of choice and room for maneuver. However, too flexible constitutions and easy amendment possibilities carry the danger of the tyranny of the majority; for a discussion, see Lauth (2004, pp. 152–166). Constitutions and judicial review are even rejected by Munck (2016, pp. 14–16): “political arbitrariness is replaced by judicial arbitrariness, and popular majorities are replaced by a majority of judges”.
4 Difficulty of the amendment procedures is probably related to the democracy profiles: consensus democracies, or democracies which emphasize the control dimension at the cost of the other dimensions, can only change their constitution through a difficult process involving multiple veto groups.
5 For instance, the weighted amendment rates for countries such as Germany, Switzerland, Belgium, France, Italy and the United Kingdom are missing. However, these cases are important for the following empirical analysis.
However, what is a well-performing amendment rate? Lutz (1994, p. 357) gives an almost circular definition: “A successful constitutional system would seem to be defined by a constitution of considerable age that has a total number of amendments which, when divided by the constitution’s age in years, represent a moderate amendment rate”. Elkins et al. (2009a, pp. 140, 205) try to specify an optimal amendment rate via a statistical analysis.6 There is no consensus, and therefore I have to crudely assume that the higher the amendment rate, the higher the performance in the reformability area. The selection of indicators is displayed in Table 5.4.

Table 5.4 Selection of Indicators for Reformability

| Goal | Component | Name of Indicator | Source (QoG) |
|---|---|---|---|
| High Reformability | Amendment rate | Unweighted Amendment rate (arate_ccp) | CCP (Elkins et al., 2009b) |
| | | Amendment rate (arate_lutz) | Lutz (1994) |

Source: own table
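As a rough illustration of the two notions discussed here, the following minimal sketch computes the ratio described in the Lutz quotation (amendments divided by the constitution's age in years) and, separately, bands an estimated amendment rate using the cut-offs quoted from Elkins et al. (2009a, p. 205) in footnote 6. The function names, the example figures and the treatment of boundary values are assumptions made for illustration; they are not taken from the CCP or Lutz data, and the bands are not used in this study, which simply assumes that a higher rate indicates higher reformability.

```python
def amendment_rate(n_amendments, age_years):
    """Amendments per year of constitutional life (ratio in the Lutz quotation)."""
    if age_years <= 0:
        raise ValueError("age_years must be positive")
    return n_amendments / age_years


def flexibility_band(estimated_rate):
    """Band an *estimated* amendment rate with the cut-offs quoted in footnote 6.

    Assumption: boundary values (0.35, 0.45, 0.65, 0.75) are assigned to the
    more favourable band, because the quotation leaves them ambiguous.
    """
    if 0.45 <= estimated_rate <= 0.65:
        return "optimal"
    if 0.35 <= estimated_rate <= 0.75:
        return "moderate"
    return "suboptimal"


# Hypothetical constitution: 12 amendments in 60 years.
example_rate = amendment_rate(12, 60)

# Hypothetical estimated rates, chosen to fall into each band.
example_bands = [flexibility_band(r) for r in (0.20, 0.40, 0.50, 0.70, 0.80)]
```

The two functions are kept apart on purpose: the Lutz ratio is a simple count per year, whereas the Elkins et al. bands refer to a model-based estimate, so the output of the first should not be fed into the second without further justification.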
Figure 5.3 illustrates the distribution of missing values for both indicators. The CCP measure is nearly universally available, while the Lutz measure shows about 30% missing values from 1950 to 1990. The increase in missing values since 1990 is due to the massive emergence of new democracies. However, it is important to stress that the data here are not a time series. Since the amendment rate summarizes a time span in a single number, it is actually more like cross-sectional data.

Figure 5.3 Missing Values in the Reformability Indicators
Note: Black bars indicate the amount of missing values. Source: own calculation with CCP and Lutz data

6 They state: “a constitution has suboptimal flexibility if it has an estimated amendment rate of less than 0.35 or greater than 0.75, moderate flexibility if it has an estimated amendment rate of between 0.35 and 0.45 or 0.65 and 0.75, and optimal flexibility if it has an estimated amendment rate between 0.45 and 0.65” (Elkins et al., 2009a, p. 205).

5.5 Integration: Social and Domestic Security Outcomes

5.5.1 Social Outcomes (Goal-oriented Performance 3a)

Social policy and its effects, the social outcomes, have “immense and immediate importance for the distribution of life chances and life risks” (Häusermann, 2018, p. 2). Social policies attempt to protect against the consequences and risks of the capitalist economy through compensation (Bertelsmann Stiftung, 2018b, p. 31). Social policy can thus be defined as the intervention “in society and the economy […] to redistribute material resources between classes and to safeguard against social risks” (Häusermann, 2018, p. 3). This definition has two main components that go in tandem with each other: on the one hand, protection against social risks and, on the other hand, the redistribution of material resources to alleviate economic inequality. Häusermann (2018, p. 4) lists important risks against which most welfare states provide protection, such as old age, illness, disability or unemployment. Economic inequality has recently become an important topic, as economic inequalities appear to be increasing (Piketty, 2014; Streeck, 2017; Merkel, 2014). How can we measure these two components which are important for social outcomes? For the component “protection against social risks”, there is a discussion labeled the “dependent variable problem” (Clasen & Siegel, 2007). Traditionally, social expenditure as a percentage of GDP was used as a measure of welfare state generosity in quantitative studies (e.g. Crepaz, 1998). Although the data is easily available and it provides “a summary measure of different aspects of
programmes” (Green-Pedersen, 2007, p. 19), social expenditure indicators have serious flaws. On the one hand, public expenditure fluctuates with the general economic situation of a country (as it is measured as a percentage of GDP). It also depends on the unemployment rate or the aging of the population, which might lead to wrong conclusions (Green-Pedersen, 2007, p. 19; Jahn, 2011, p. 144). For example, social expenditure grew under Thatcher in Britain due to rising unemployment in the 1980s, although spending was actually cut (Esping-Andersen, 1990, p. 20; Scruggs, 2007, p. 137). On the other hand, expenditure indicators “only reflect the budgetary aggregate part of a social policy’s generosity but not the eligibility criteria or the distribution rules” (Häusermann, 2018, p. 6). A better alternative is the analysis of the “institutional characteristics of major social protection programs” (Scruggs, 2014), which culminates in the calculation of so-called “replacement rates”. The “income replacement rate is a measurement of the generosity of an instrument, which relates the level of benefits to the average wage of a typical employee” (Häusermann, 2018, p. 6). It describes the standardized benefits in the form of “monetary compensation paid to retirees, unemployed persons or those unable to work because of illness” (Schmitt, 2012, p. 32). This approach has become especially popular since the release of the Comparative Welfare Entitlements Dataset (CWED, Scruggs, 2014), which covers 33 countries from 1970 to 2010. The CWED considers replacement rates for a typical single worker and a typical family, as well as coverage and eligibility criteria, for three areas, namely unemployment insurance, sick pay insurance and public pensions. These replacement rates are “less sensitive to ‘demand factors’ […] than expenditures and more clearly connected to political decisions” (Wenzelburger et al., 2013, p. 1230). However, the data seem to have some validity issues.
On the one hand, some changes in the replacement rate “are influenced by other factors such as wage levels and taxation rules” (Green-Pedersen, 2007, p. 21). On the other hand, Wenzelburger et al. (2013) compared replacement rate datasets and showed that they diverged in some respects: “the absolute values and relative positions of several states are strongly affected by the way their peculiarities are dealt with” (Wenzelburger et al., 2013, p. 1245), e.g. some exclude unemployment assistance in Germany (Scruggs et al., 2017, p. 5). The replacement rates can be seen as an outcome measure because they measure the impact of political decisions on the benefit levels for workers and families. Overall, the CWED combines the replacement rates for unemployment insurance, sick pay insurance and public pensions into a “Combined Generosity Index”, which I will use. How can we measure the effects of the redistribution of material resources? There are two common measures, the poverty rate and the Gini index. Both are
included in this study, because they measure different aspects of economic inequality. On the one hand, the poverty rate (i.e. percentile ratio 90/10 or percentile ratio 80/20) focuses on inequality between those with the lowest and highest income and thus, “captures only one element of the income distribution (the percentage below an income cut-off, or poverty line) and can thus move independently of inequality, defined across the whole distribution” (Saunders, 2010, p. 3). On the other hand, the Gini index “captures the overall variation in incomes by measuring how far the actual distribution differs from one in which all incomes are distributed equally” (Saunders, 2010, p. 3). However, the cross-country comparability of these measures is limited. There is a trade-off between coverage and comparability. The “smallest but most consistent” (Ravallion, 2015, p. 529) source is the Luxembourg Income Study (LIS). This study developed a “comparative household income microdata base by adjusting national data sets to conform to a standardized conceptual and definitional template” (Saunders, 2010, p. 4). The World Development Indicators (WDI) have a wider coverage, but the data may not be as comparable as the LIS data. For example, WDI mixes income and consumption data for the calculation of the Gini coefficient. This might not be a problem in low-income countries, “where there is little capital accumulation, [so that] consumption, and income data sets are very comparable and consumption may indeed be preferable” (Smeeding & Latner, 2015, p. 617). However, data for rich countries are often reported in terms of income. Whereas consumption data underestimates the Gini coefficient, income data tend to overestimate it. I incorporate another indicator to measure economic inequality, “Means-tested v. 
universalistic policy”7 from the Varieties of Democracy dataset, which might not be considered a genuine outcome indicator but offers higher comparability while providing great temporal and spatial coverage. Finally, I add a third aspect to the measurement of social outcomes: equal opportunity based on gender and ethnicity or race. The BTI defines equal opportunity as the “equal access to participation in society regardless of their social background” (Bertelsmann Stiftung, 2018b, p. 31). They refer to equal opportunities in the areas of education, public services and employment. The opposite of equal opportunity is exclusion, “when individuals are denied access to services or participation in governed spaces based on their identity or belonging to a particular group” (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 195). It is a performance measure, because it focuses on social inequality or social exclusion, and not on political equality, which would be a measure of the quality of democracy. Besides a commonly used female labor force indicator from the International Labour Organization (ILO), the V-Dem dataset (since Version 9) contains indicators that can be used here: they measure, for instance, the equal distribution of clean water, security and healthcare across different social groups and genders. Another important part of equal opportunity is equal access to education. I use female secondary school enrollment as an indicator, which offers some variability in my sample; focusing on primary education would probably not highlight differences between the developed democracies in my sample. Table 5.5 presents the selected indicators for the measurement of social outcomes in the integration function.

7 “How many welfare programs are means-tested and how many benefit all (or virtually all) members of the polity?” (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 150)
Table 5.5 Selection of Indicators for Social Outcomes

| Goal | Component | Name of Indicator | Source |
|---|---|---|---|
| Economic and Social Equality | Insurance against social risks (Unemployment; Health; Pension) | Combined Generosity Index (sc_tgen) | CWED2 (via QoG: Teorell et al., 2019) |
| | Redistribution of material resources (Income Inequality) | Gini Coefficient (lis_gini) | LIS (via QoG: Teorell et al., 2019) |
| | | Percentile Ratio (90/10) (lis_pr9010) | LIS (via QoG: Teorell et al., 2019) |
| | | Percentile Ratio (80/20) (lis_pr8020) | LIS (via QoG: Teorell et al., 2019) |
| | | Gini Index (World Bank estimate) (wdi_gini) | World Bank (WDI indicators) (via QoG: Teorell et al., 2019) |
| | | Means-tested v. universalistic policy (v2dlunivl) | Varieties of Democracy |
| | Equal opportunity | Access to public services distributed by gender (C) (v2peapsgen) | Varieties of Democracy |
| | | Access to public services distributed by social group (C) (v2peapssoc) | Varieties of Democracy |
| | | Labor force, female (% of total labor force) (wdi_lfpf) | ILO (via QoG: Teorell et al., 2019) |
| | | School enrollment, secondary, female (wdi_gersf) | World Bank (WDI indicators) (via QoG: Teorell et al., 2019) |

Source: own table
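To make the difference between the two inequality measures in Table 5.5 more tangible, the following minimal sketch computes a Gini coefficient and a 90/10 percentile ratio for a plain list of incomes. It is an illustration under simplifying assumptions: the function names and the toy incomes are hypothetical, no survey weights or household equivalence scales are applied, and the percentile interpolation rule is one common convention rather than the procedure used by LIS or the World Bank.

```python
def gini(incomes):
    """Gini coefficient of non-negative incomes (0 means perfect equality)."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive income")
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # with ranks i = 1..n over the incomes sorted in ascending order.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def percentile(values, q):
    """q-th percentile (0 to 100), linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(xs) - 1)
    return xs[lower] + (xs[upper] - xs[lower]) * (pos - lower)


def percentile_ratio(values, high=90, low=10):
    """Ratio of a high to a low income percentile, e.g. 90/10 or 80/20."""
    return percentile(values, high) / percentile(values, low)


# Hypothetical toy distributions.
equal_society = [5, 5, 5, 5]
winner_takes_all = [0, 0, 0, 10]
spread_society = list(range(1, 12))  # incomes 1, 2, ..., 11

gini_equal = gini(equal_society)
gini_unequal = gini(winner_takes_all)
ratio_90_10 = percentile_ratio(spread_society)
```

The Gini coefficient uses every income in the distribution, whereas the percentile ratio compares only two points of it, which is why the two indicators can move independently of each other.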
Figure 5.4 shows the percentage of missing values across the indicators. V-Dem offers the broadest coverage, spanning the whole period with less than 10% missing values. The school enrollment variable from the WDI also offers wide coverage since 1970. It has 25% missing values until the end of the 1990s, decreasing to less than 20% since 2000. The female labor force indicator also provides a lot of data and covers over 95% of all cases in this study. More missing values are found for the inequality measures: the LIS indicators are measured in five-year intervals and cover only about 30% of the cases. The Gini index from the WDI has fewer missing values and covers about 55% of the cases since 2000. Finally, the Combined Generosity Index from the CWED2 database covers more than 50% of the sample until 1990, and about 25% of the sample thereafter. It does not provide data for the democracies that emerged during the 1990s because it focuses only on OECD countries.
5.5.2 Domestic Security Outcomes (Goal-oriented Performance 3b)
Domestic security can be seen as a “key function of government and also clearly qualifies as a ‘core’ function in more than a purely formal sense” (Paul Norris, 2007, p. 133). Roller (2005, p. 37) distinguishes between two components of domestic security: “protection of life and property” and “protection of public order”. She measures the protection of life and property through the number of violent crimes (murder/manslaughter, robbery) and property crimes (burglary). Within the protection of public order, she differentiates between riots and deaths from domestic political violence. However, she was unable to measure the latter component due to data limitations.
Figure 5.4 Missing Values in the Social Outcome Indicators
Note: Black bars indicate the amount of missing values. Source: own calculation with CWED2, LIS, V-Dem and WDI data
Wenzelburger (2013) also stresses the importance of evaluating the state of crime (number of violent or property offences). However, he adds additional indicators for this policy area (Wenzelburger, 2016): firstly, the incarceration rate as a measure of punitiveness,8 which combines the effects of the judicial and legislative system, and secondly, government spending for public order and safety, which gives a good impression of a country’s overall domestic security policy. In particular, government spending for public order and safety appears to be an important indicator: With the establishment of “private prisons in several countries, it is possible that expenditure data, including the cost to government of using private provision, might provide a better indication of overall resourcing than employment data, generally including only those directly employed by government” (Paul Norris, 2007, p. 133). In addition, Wenzelburger also uses the number of police personnel as an indicator, but points out the problems of comparability of this indicator (Wenzelburger, 2015, p. 669).

8 Punitiveness is a “mix of attitudes, enactments, motivations, policies, practices, and ways of thinking that taken together express greater intolerance of deviance and deviants, and greater support for harsher policies and severe punishments” (Tonry, 2007, p. 7).
For Powell (1982), the protection of the public order is a central performance criterion, too. He specifies the term “political order” as the “absence of turmoil and violence and the maintenance of the basic forms of the democratic regime” (G.B. Powell, 1982, p. 9). Political systems seek stability and self-preservation. In contrast to autocracy, democracies have a “potential for participation and responsiveness [so that] a resolution of conflict without violence” (G.B. Powell, 1982, p. 154) is possible. This function is also closely related to the concept of statehood (Rotberg, 2004; Mohamad-Klotzbach & Schlenkrich, 2017; Schlenkrich et al., 2016a; Schneckener, 2004a). On the one hand, the state must protect its citizens from other non-state forces by controlling its territory to maintain the monopoly on the use of physical force. On the other hand, a functioning administration ensures “the state’s ability to provide its citizens with basic life chances. These include the protection from (relatively easily) avoidable harmful diseases; a basic education […] and a basic administration that regulates social and economic activities” (Grävingholt et al., 2012, p. 9). Therefore, I will follow Roller’s concept, because its two components cover the most important aspects of domestic security. How can we measure them? There is no simple solution due to large data limitations. The data comparability is limited because of legal, statistical and substantial variations between the countries (Aebi, 2019; Harrendorf, 2018), e.g. the definition of criminal offences varies across countries. This means that the data often does not represent the actual or “true” number of offences, but rather the way the statistics are constructed. In order to increase comparability, Roller (2005, p. 38) proposes to “restrict oneself to offences where comparability is greatest with respect to the legal definitions of the offence, and where the frequency of reporting is the highest”. 
She suggests that this is the case for three offences: intentional homicides, robbery and burglary. Aebi/Linde (2010) analyze even more offences, namely intentional homicide, assault, robbery, burglary, theft and motor vehicle theft. Nevertheless, they limit themselves to European countries and analyze only time trends and not the level of the offences. Recent research shows, however, that we need to proceed even more restrictively: “[C]ompleted homicide is the only offense category for which police data may come close to the true picture of crime” (Harrendorf, 2018, p. 192). This is why this study only includes the indicator of intentional homicides, because it is necessary to assess not only the time trend but also the level of these offences across a more heterogeneous sample than the European countries. The data come from the United Nations Crime Trends Survey (UNCTS) and are based on an international classification system of crimes (International Classification of Crime for Statistical Purposes, ICCS). Although they use a standardized definition of crimes that is useful for international comparability, they
do not monitor compliance with this definition by national agencies and do not adjust the data, so that “the added value of the ICCS […] is almost minimal” (Harrendorf, 2018, p. 174). This shows the importance of using just the intentional homicides as an indicator. The ICCS defines it as an “[u]nlawful death inflicted upon a person with the intent to cause death or serious injury” (UNODC, 2015, p. 33).

For the first component, fight against crime, I consider the following indicators. I include general government spending for public order and safety (OECD, 2019) from the structure of general government expenditures by function (COFOG). “Public order and safety” encompasses police services, fire-protection services, law courts, prisons as well as research and development in the field of public order and safety (OECD, 2017, Annex C). Paul Norris (2007, p. 135) states that while “this constitutes a somewhat imperfect definition of the resources devoted to the realm of criminal justice (for instance, because of the inclusion of the fire service), these data are generally collected by government economic services to match the COFOG classification, and so might avoid some of the jurisdiction-based anomalies associated with data collected through the criminal justice system”. However, this is, strictly speaking, not an outcome indicator. Since it might be used as a proxy for domestic security outcomes and comparable data in this performance area is hard to find, I include this indicator. But instead of using spending for public order and safety in relation to GDP (e.g. % of GDP), I calculate the spending per capita: what matters is how much is spent per citizen. I include theft as an indicator.
However, I am not interested in theft per se but rather use it as a proxy variable for police performance: “For the total of crime and for minor offenses, rates can be compared but are mainly an indirect measure of police performance, more or less unrelated to the reality of crime” (Harrendorf, 2018, p. 194). A well-functioning police force can record more criminal charges, and citizens also report more because they trust the police. Thus, paradoxically, the theoretical link to performance is as follows: the higher the theft rate, the higher the police performance. Furthermore, I include the “Reliability of Police Services” indicator from the Global Competitiveness Report (GCR) 2019, which measures “to what extent can police services be relied upon to enforce law and order?” (World Economic Forum, 2019, p. 615). The data for this indicator were collected through the “Executive Opinion Surveys” based on 16,936 business executives in 139 countries (World Economic Forum, 2019, p. 633).
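The normalizations used for these indicators (offences per 100,000 inhabitants, and spending per capita rather than as a share of GDP) can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and the two countries and all their figures are invented to show that the two spending measures can rank countries differently; none of the numbers come from the UNODC or OECD data.

```python
def rate_per_100k(count, population):
    """Number of recorded offences per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


def spending_per_capita(total_spending, population):
    """Spending per inhabitant, in the currency unit of total_spending."""
    if population <= 0:
        raise ValueError("population must be positive")
    return total_spending / population


def spending_share_of_gdp(total_spending, gdp):
    """Spending as a percentage of GDP."""
    if gdp <= 0:
        raise ValueError("gdp must be positive")
    return 100 * total_spending / gdp


# Two invented countries of equal size but very different economic output.
countries = {
    "A": {"population": 10_000_000, "gdp": 1_000e9, "spending": 20e9, "homicides": 100},
    "B": {"population": 10_000_000, "gdp": 200e9, "spending": 6e9, "homicides": 500},
}

summary = {
    name: {
        "homicides_per_100k": rate_per_100k(c["homicides"], c["population"]),
        "spending_per_capita": spending_per_capita(c["spending"], c["population"]),
        "spending_pct_gdp": spending_share_of_gdp(c["spending"], c["gdp"]),
    }
    for name, c in countries.items()
}
```

In this invented example, country B devotes a larger share of its GDP to public order and safety than country A, yet spends far less per citizen, which is the distinction the per capita measure is meant to capture.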
Finally, I exclude a measure for punitiveness (incarceration). Although punitiveness is an important indicator for criminal justice research, because the “[s]ocieties vary substantially in the severity of the penalties they impose for various kinds of crimes and criminals” (Blumstein et al., 2005, p. 347), it appears less important for assessing the performance of domestic security outcomes, because the degree of punitiveness cannot be easily placed on a continuum: Is more or less punitiveness good for domestic security? For example, low degrees of punitiveness may be found in well-functioning systems that “are seen to be just, fair, and trustworthy [and which] can be more self-confident and restrict punishment to a minimum” (Harrendorf, 2018, p. 194) or in deficient criminal justice systems that fail to clear up criminal offences.

The second component, “protection of public order”, distinguishes between riots and deaths caused by organized violence, following Powell (1982). Riots are defined “as large numbers of citizens acting out of control in an unplanned and disorganized fashion, and destroying property” (G.B. Powell, 1982, p. 21). In contrast, deaths “by political violence sometimes result from rioting, usually as the police restore order, but are more frequently the outcome of systematic armed attacks by terrorists” (G.B. Powell, 1982, p. 21). I was unable to measure these components due to a lack of data; in this respect, the data limitations have not changed since Roller’s attempt. Usually the data for this measure is taken from the Cross-National Time-Series Data Archive (Banks & Wilson, 2019). However, this dataset is not free of charge. It is possible to measure the second sub-component, organized violence, with data from the Worldwide Governance Indicators (“Political Stability and Absence of Violence”). However, this data is seriously limited and cannot actually be analyzed over time due to its assumption of a “constant global average” (Iqbal & Shah, 2008, p. 30): the mean of each measured year is artificially set to zero, even if all countries dramatically improve in their governance from one year to another.

Table 5.6 shows the selected indicators for domestic security outcomes in the integration function. The number of missing values is depicted in Figure 5.5. Data for my sample only become available after 1990. The UNODC data cover 50% to over 85% of all observations for some time points. The Reliability of Police Services indicator also has broad coverage; however, the data collection began only in 2007. Finally, government spending for public order and safety covers only 35% of the sample because it is restricted to the OECD region.
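The “constant global average” problem of the governance indicators mentioned above can be illustrated with a small sketch. It assumes, purely for illustration, that the published score is the raw score minus the mean of all countries in the same year; the function name and the raw scores are hypothetical and do not reproduce the actual estimation procedure of the indicators.

```python
from statistics import mean


def center_within_year(raw_scores):
    """Subtract the yearly cross-country mean so that each year averages zero."""
    yearly_mean = mean(list(raw_scores.values()))
    return {country: value - yearly_mean for country, value in raw_scores.items()}


# Hypothetical raw stability scores for three countries.
raw_2000 = {"A": 2.0, "B": 4.0, "C": 6.0}
# Ten years later every country has improved by the same amount.
raw_2010 = {country: value + 3.0 for country, value in raw_2000.items()}

published_2000 = center_within_year(raw_2000)
published_2010 = center_within_year(raw_2010)

# After centering, the two years are indistinguishable:
# the common improvement has been removed from the published scores.
improvement_visible = published_2000 != published_2010
```

Because the yearly mean is fixed at zero, only changes in a country's position relative to the others remain visible, which is why such scores are hard to interpret as a time trend.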
Table 5.6 Selection of Indicators for Domestic Security Outcomes

| Goal | Component | Subcomponent | Name of Indicator | Source |
|---|---|---|---|---|
| Domestic Security | Fight Against Crime | Crime Level | Homicides per 100,000 | UNODC |
| | | Police Performance | Theft per 100,000 (Proxy Indicator) | UNODC |
| | | | Reliability of Police Services | GCR |
| | | Government Spending | General government spending: Public order and safety (per capita) | OECD |
| | Protection of Public Order | – | – | – |

Source: own table

Figure 5.5 Missing Values in the Domestic Security Outcome Indicators
Note: Black bars indicate the amount of missing values. Source: own calculation with UNODC, OECD and GCR data
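The missing-value shares summarized in the missing-value figures of this chapter amount to a simple count per indicator and year. The following minimal sketch shows one way to compute them; the record layout, the indicator names and the values are hypothetical and stand in for a country-year dataset in which a missing observation is stored as None.

```python
def missing_share_by_year(records, indicator):
    """Share of country-years per year in which `indicator` is missing (None)."""
    totals, missing = {}, {}
    for row in records:
        year = row["year"]
        totals[year] = totals.get(year, 0) + 1
        if row.get(indicator) is None:
            missing[year] = missing.get(year, 0) + 1
    return {year: missing.get(year, 0) / totals[year] for year in sorted(totals)}


# Hypothetical country-year observations.
records = [
    {"country": "A", "year": 2000, "homicide_rate": 1.2, "police_reliability": None},
    {"country": "B", "year": 2000, "homicide_rate": None, "police_reliability": None},
    {"country": "A", "year": 2010, "homicide_rate": 1.0, "police_reliability": 5.5},
    {"country": "B", "year": 2010, "homicide_rate": 2.3, "police_reliability": 4.1},
]

homicide_missing = missing_share_by_year(records, "homicide_rate")
police_missing = missing_share_by_year(records, "police_reliability")
```

In this toy example the police indicator is entirely missing in the earlier year and complete in the later one, mirroring the pattern of an indicator whose data collection started late.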
5.6 Latent Pattern Maintenance: Confidence in Institutions (Goal-oriented Performance 4)
Since the studies by Almond/Verba (1963, 1980), political culture has been “one of the most important concepts of empirical political research” (Fuchs, 2007, p. 1). Political culture is concerned with the survivability of a regime (Fuchs, 2007; Pickel & Pickel, 2006, pp. 66–69, 2016; Mishler & Rose, 2002). The congruence thesis implies that the likelihood that a political system survives increases with a political culture matching the structure of the political system. Using the example of the Weimar Republic, it is pointed out that a democracy without democrats cannot survive (Sontheimer & Bleek, 2005). Therefore, political culture is an important component of a successful consolidation of a democracy (Merkel, 2008, 2010). Thus, political culture is associated with the concept of legitimation in a descriptive sense of people’s beliefs (Thomassen & van Ham, 2017, p. 6; see for a definition of legitimation Lauth, 2020, 2017b; Peter, 2017). This is further elaborated in the concept of political support by Easton (1965, 1975). Easton (1975, p. 436) defines it as “an attitude by which a person orients himself to an object either favorably or unfavorably, positively or negatively”. He distinguishes between three political objects, the political community, the political regime and the political authorities. Importantly, Easton’s approach is also based on the distinction between two different types of feelings towards political objects, which he describes as specific and diffuse support. Specific support is linked to evaluative orientations, whereas diffuse support taps affective orientations towards political objects. While the “former might reflect the immediate performance of government, the latter represents deeper political feelings that might provide a potential reservoir of support in times of political stress” (Dalton, 2004, p. 8). What is the relationship between specific and diffuse support? 
Only diffuse support has consequences for the survivability and consolidation of a regime, although a lack of specific support may turn into a lack of diffuse support under specific circumstances. Specific support tends to depend on short-term factors, while diffuse support is influenced by long-term developments (Pippa Norris, 2011). Similar to specific and diffuse support, Lipset (1959) differentiates between legitimacy and effectiveness of the political system and argues that a prolonged time of ineffectiveness can make regimes appear as illegitimate with the risk of collapse. As Fuchs (2007, p. 165) notes, “it can be assumed that the causal direction moves top down (transfer) in fully established democracies [from diffuse to specific support], whereas it moves upwards (generalization) in new established democracies” from specific to diffuse support.
Easton’s influential approach has been adjusted in two key respects: On the one hand, Norris (1999, 2011) simplified the complex concept of political support. Instead of differentiating between specific and diffuse support for each political object, which makes the original concept hard to measure empirically, Norris equates the abstractness of the objects themselves with the types of political support: The “levels can be seen as ranging in a continuum from the most diffuse support for the nation-state down through successive levels to the most concrete support for particular politicians.” (Pippa Norris, 1999, pp. 9–10). It is a more pragmatic approach: By losing some conceptual clarity compared to Easton’s approach, Norris provides a conceptual framework which is “more suitable for empirical research” (Thomassen & van Ham, 2017, p. 7) due to easier measurement with the current data availability. This is also justified by the fact that Easton’s distinction between specific and diffuse support does not hold empirically, because the “difficulties in independently measuring diffuse and specific support are enormous, and separate indicators of the two generally are found to be highly correlated” (C.J. Anderson & Guillory, 1997, p. 70). On the other hand, Norris (1999, 2011) and Dalton (2004) split Easton’s single category “political regime” into three political objects, namely regime principles, regime performance and regime institutions. Thus, Norris defines five different political objects in a hierarchical order: political community, political principles, political performance, political institutions and finally, political authorities (see Figure 5.6). The most abstract level is the political community. It implies “a basic attachment to the nation beyond the present institutions of government and a general willingness to co-operate together politically” (Pippa Norris, 1999, p. 10). 
Political community is related to the concept of statehood (Gilley, 2006; for definitions of statehood, see Mohamad-Klotzbach & Schlenkrich, 2017; Schlenkrich et al., 2016b). The next level is the regime principles that signify the values of the political system. Here the main question is whether the citizens are in “agreement with these specific values, or [in] agreement with the idea of democracy as the best form of democracy” (Pippa Norris, 1999, p. 11). The third level is named “regime performance” and captures “how authoritarian or democratic political systems function in practice” (Pippa Norris, 1999, p. 11). It is the subjective evaluation of the quality of democracy. Regime institutions are the fourth level. It “includes attitudes towards governments, parliaments, the executive, the legal system and police, the state bureaucracy, political parties, and the military” (Pippa Norris, 1999, p. 11). Finally, the fifth level is the orientation towards political actors/authorities, “including evaluations of politicians as a class and the performance of particular leaders” (Pippa Norris, 1999, p. 12).
Figure 5.6 Levels of Support by Norris
Source: own figure based on Norris (1999, p. 10)
To establish a distinction between goal-oriented and general performance within the latent pattern maintenance function, I use Easton’s differentiation between specific and diffuse support. However, I follow Norris’ pragmatic approach to measure this concept. Goal-oriented performance is assessed at both levels, the political authorities and the political institutions, which have more to do with specific support than diffuse support. Evaluation of the regime performance is also included in the goal-oriented performance, because its measurement focuses on the institutions as they function in reality. However, it is still a middle category and could be linked to the general performance dimension. By contrast, political community and regime principles can be more clearly assigned to general performance, because they capture more diffuse support.
In contrast to the measurement of the other AGIL functions, I draw on survey data (micro-level data) rather than on macro-level data. To obtain the highest spatial and temporal coverage, I combine the World Values Survey (WVS, 2015) with the European Values Study (EVS, 2015). The so-called “Integrated Value Survey” (IVS), constructed from both surveys, consists of 113 countries and covers a period from 1981 to 2014. Regime performance is usually measured with the question “Are you very satisfied, fairly satisfied, not very satisfied or not at all satisfied with the way democracy is functioning (in your country)?” (van Ham & Thomassen, 2017, p. 18). However, this question is not included in the WVS/EVS. Therefore, I cannot measure this component. If I drew on a different dataset, I would not be able to obtain values for the large number of countries and time points needed for the empirical analysis. Evaluation of the regime institutions is traditionally measured by the indicators “confidence in institutions” or “trust in institutions” (van Ham & Thomassen, 2017, p. 18). Due to the focus on performance of political systems, I include public institutions (parliament, parties, judiciary and civil service), but exclude private institutions. However, confidence and trust are conceptually not the same thing. Confidence is the “belief in the capacity of an agency to perform effectively” (Pippa Norris, 2011, p. 19). Trust is the “rational or affective belief in the benevolent motivation and performance capacity of another party” (Pippa Norris, 2011, p. 19). Empirically, however, there is almost no difference (van der Meer, 2017, p. 4). For this study, the narrower question of confidence is more relevant, as it focuses only on the capacity and effective performance of the institution. This confidence indicator is included in the WVS/EVS. Finally, political authorities are usually measured with “trust in politicians” or the approval of incumbents.
Neither item is included in the WVS/EVS. Table 5.7 presents the selected indicators for goal-oriented performance in the latent pattern maintenance function. The missing values are shown in Figure 5.7. The first data for all variables become available in the early 1990s. Overall, the availability of the data varies greatly over time, and because it is survey data it is not available continuously on an annual basis. For some years, over 35% of all cases are observed. Importantly, surveys that were conducted closely in time can be aligned, which increases the sample size considerably (see Figure 136 in the appendix). Nevertheless, it seems clear that this kind of data is not suitable for time-series analysis, so I have to restrict the analysis of the latent pattern maintenance function to single points in time.
Table 5.7 Selection of Indicators for Specific Support
Goal | Component | Name of Indicator | Source
Specific Support | Regime performance | - | -
Specific Support | Regime institutions | Confidence in Parliament (E069_07); Confidence in Civil Service (E069_08); Confidence in Justice System/Courts (E069_17); Confidence in Parties (E069_12); Confidence in Government (E069_11) | Integrated Value Survey, Waves 1981–1984, 1989–1993, 1994–1999, 1999–2004, 2005–2009, 2010–2014
Specific Support | Political actors | - | -
Source: own table
Note: Black bars indicate the amount of missing values. Source: own calculation with WVS (2015)/EVS (2015)
Figure 5.7 Missing Values in the Specific Support Indicators
5.7 Summary and Conclusion
In this chapter, I developed the conceptualization and measurement for the goal-oriented performance dimension of the AGIL typology of political performance. This typology differentiates between three dimensions, namely goal-oriented, policy regime and general performance, and four central functions which every system has to fulfill in order to survive (adaptation, goal-attainment, integration and latent pattern maintenance). Due to strong limitations of the data for the general performance dimension, I had to limit the empirical analysis to the goal-oriented performance and parts of the policy regime performance. This means the general performance dimension remains a desideratum which has hardly been studied due to its complex conceptual nature and complex measurement. At the beginning of the chapter, I outlined several criteria based on Munck and Verkuilen (2002) which guided the conceptualization and measurement of the single matrix fields of the goal-oriented performance. The concept should be built on a useful and parsimonious definition with discriminatory power, while the selection of the indicators is based on construct validity, data quality and coverage. Overall, most of the concepts for each outcome were parsimoniously formulated and have strong discriminatory power. Closest to this ideal are the concepts for social outcome with its three components (insurance against social risks, redistribution of material resources, equal opportunity) and the concepts of domestic security outcome based on two components (fight against crime, protection of public order) as well as the concept of environmental outcome. Some components of the economic performance and the specific support are more vaguely formulated and have less discriminatory power (e.g. the line between specific and diffuse support could not be established clearly). Finally, the vaguest concept used here is the reformability performance.
While it may not be a complex concept, it could only be conceptualized insufficiently because it had to be left open what good reformability means and how it could be measured. The selected indicators generally have high construct validity. This is especially true for the environmental outcome, the social outcome and the specific support performance. Some problems occur for the indicators of the economic performance (e.g. the unemployment indicator is to some extent invalid) and the domestic security outcome (theft indicator as a proxy for police performance). Furthermore, construct validity is lower for the indicators in the goal-attainment performance. The reason is that these indicators focus only on constitutional amendments, leaving out the important part of institutional learning or implicit constitutional change.
The next criterion, data quality, can be summarized as follows: The data of the indicators for the specific support performance and the environmental outcomes are of high quality. Comparability issues are evident in the data for the economic performance (e.g. indicators by PWT) and for the social outcomes (e.g. inequality measures). The lowest data quality, and therefore the strongest comparability issues, can be found in the indicators of the domestic security outcome. In addition, the data quality of the amendment rates is at least doubtful, since, as discussed, these measures do not correlate very well with each other. The last guiding criterion was the evaluation of the coverage. For how many cases of my sample do these indicators provide data? Is it possible to perform a time-series analysis with high statistical power? The indicators for the economic outcome have the largest coverage, spatially and temporally. They cover almost the whole sample and provide data from 1980 onwards. A time-series analysis should be possible with this data. The same applies to social outcome, domestic security outcome and environmental performance, all of which provide data for a large number of cases and a long period of time, even though they appear more limited than the data for the economic outcome indicators. As discussed, there is not enough time-series data for the specific support performance. Finally, the indicators of the reformability performance do not offer time-series data, although they cover almost the whole sample. Table 5.8 gives a summary. The next chapter describes the aggregation procedure via exploratory factor analysis: To derive a final value for each performance component, the multiple values of the indicators need to be meaningfully aggregated. This step also offers a variety of diagnostic procedures, so that it might be necessary to exclude some indicators because they negatively affect the aggregation solution.
In addition, the discussion of the missing values makes it clear that ignoring them in the aggregation procedure is not an option: It would result in the loss of an unnecessarily large number of cases, because a case is often not missing the values of all indicators used to measure a performance area at the same time. There is at least some information about this case. Therefore, the next chapter also addresses ways to deal with missing data.
Table 5.8 Final Evaluation of the Research Criteria
Function | Component | Parsimonious and Discriminatory Concept | Construct Validity | Data Quality | Spatial Coverage | Temporal Coverage
Adaptation | Economic Performance | medium | medium | medium | broad (85 democracies) | broad (1980–2017)
Adaptation | Environmental Performance | high | high | high | medium (40 democracies) | medium (1990–2017)
Goal-Attainment | Reformability | low | medium | low | limited (104 democracies for CCP; 25 democracies for Lutz) | short (one point in time)
Integration | Social Outcomes | high | high | medium | medium (70 democracies) | broad (1970–2017)
Integration | Domestic Security Outcomes | high | medium | low | broad (80 democracies, 1990–2017) | medium (1990–2017)
Latent Pattern Maintenance | Specific support | medium | high | high | low (OECD, limited temporal coverage) | short (often one point in time)
Source: own table
6 Aggregating Goal-oriented Performance
6.1 Introduction
In the previous chapter, the conceptualization and measurement of the goal-oriented performance areas were discussed. A variety of indicators were chosen to represent the several conceptual components. In doing so, I balanced the validity of the indicators, their quality and their spatial and temporal coverage. In this chapter, these chosen indicators are aggregated to measure the matrix fields of the goal-oriented performance. This chapter is more technical than the others. It is important to aggregate these indicators, because the focus of this study is the overall performance in each field: The aggregate value reflects the interplay of all indicators. To score high in those indices, countries have to score high on all indicators, not just on a single indicator (Vis et al., 2012, pp. 74, 80). Measuring performance using a variety of indicators, each of them separately included in the analysis, does not give an assessment of the overall performance and makes “comparing the analyses’ findings […] usually difficult because they examine different dependent variables” (Vis et al., 2012, p. 74). There are several methods available to weight and aggregate these indicators into an index (Munck & Verkuilen, 2002, pp. 22–27; Goertz, 2006, pp. 39–50; Nardo et al., 2005, pp. 31–34): For example, if the components or indicators have the same weight and can compensate each other, the simple arithmetic average
Electronic supplementary material The online version of this chapter (https://doi.org/10.1007/978-3-658-34880-9_6) contains supplementary material, which is available to authorized users.
© The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2021 O. Schlenkrich, Origin and Performance of Democracy Profiles, Vergleichende Politikwissenschaft, https://doi.org/10.1007/978-3-658-34880-9_6
would be appropriate. In contrast, if the components exhibit necessary or sufficient relationships and thus a non-compensatory relationship, multiplication or the minimum and maximum values are appropriate aggregation techniques. However, I use exploratory factor analysis (EFA) to weight and aggregate the indicators into an index, because the conceptualization does not give a “clear match with any specific aggregation rule” (Munck & Verkuilen, 2002, p. 24) and I theorize that the indicators are caused by an underlying latent dimension, the specific performance area (V-Dem uses this method to aggregate its indices for the same reason, see Teorell et al., 2016). In addition, in contrast to a simple averaging strategy, EFA offers evaluation criteria to determine the relevance and goodness of each indicator, the number of latent factors and the model fit. It makes it possible to analyze how well the concepts proposed in the previous chapter fit the data. And finally, while I prefer a unidimensional index because the highest level of aggregation makes a causal analysis easier, such an index does not make sense if the values of the indicators do not move in the same direction. A factor analysis examines the correlation structure and, based on it, extracts the possible number of factors. For a successful factor analysis, I follow the steps in Figure 6.1, which are discussed in the next sections. In the first step, the raw data is transformed to meet the distributional prerequisites of the multiple imputation algorithm and the exploratory factor analysis. In the second step, I create multiple datasets via multiple imputation to fill in the missing values with plausible values. Contrary to the standard approach of multiple imputation, but necessary for the factor analysis, I average those values into a single dataset. The third step is applying the exploratory factor analysis and evaluating its model fit.
Then I extract the factor scores, which represent the performance of the countries. Lastly, for descriptive and interpretability purposes only, I transform the factor scores to a 0–100 scale, where 50 represents the average value of the sample. The next section deals with the methodological framework (6.2). There I discuss the steps and diagnostic tools of the exploratory factor analysis. In addition, I describe the multiple imputation procedure and transformation method of the indicators. The third section (6.3) shows the results of the factor analysis for each performance area. Finally, a summary is given in the last section (6.4).
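The difference between the compensatory and non-compensatory aggregation rules mentioned in this introduction can be made concrete with a small sketch. It is purely illustrative and not part of the analysis in this study, which relies on EFA instead; the function name `aggregate` and the example values are hypothetical, and the indicators are assumed to be scaled to the range 0 to 1.

```python
from math import prod
from statistics import mean


def aggregate(indicators, rule="mean"):
    """Aggregate indicator values (assumed to lie in [0, 1]) into one index value."""
    if rule == "mean":       # compensatory: a high value offsets a low one
        return mean(indicators)
    if rule == "product":    # non-compensatory: every component is necessary
        return prod(indicators)
    if rule == "min":        # weakest-link logic: the lowest value decides
        return min(indicators)
    if rule == "max":        # sufficiency logic: the highest value decides
        return max(indicators)
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical country with two strong indicators and one very weak indicator
country = [0.9, 0.8, 0.1]
for rule in ("mean", "product", "min", "max"):
    print(rule, round(aggregate(country, rule), 3))
```

Under the arithmetic mean the weak indicator is largely compensated, whereas the product and the minimum pull the index towards the weakest component.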
Source: similar to Buuren (2018, p. 19).
Figure 6.1 Workflow for the Aggregation Procedure
6.2 Methodological Framework for the Aggregation
6.2.1 Exploratory Factor Analysis
In this study, the main method for aggregating the individual indicators is the exploratory factor analysis (EFA): “EFA expresses the relationship between variables that can be directly measured, or observed, and those that cannot, typically referred to as latent variables” (Finch, 2013, p. 167). EFA supports the analysis by examining whether the latent factors correspond to the concepts developed for each performance area. Since the typology of performance is new and not yet empirically researched, I conduct the more flexible EFA instead of the confirmatory factor analysis (CFA). The latter is used to test more established theories and hypotheses (Finch, 2013, p. 167), whereas the former “is used when there is little supporting evidence for the factor structure” (M. Norris & Lecavalier, 2010, p. 8). Although EFA and principal component analysis (PCA) are often used synonymously, they should not be confused (Watkins, 2018, pp. 9–10). In contrast to the EFA, PCA does not distinguish between shared variance (communality) and unique variance (unshared variance, error). It is theoretically assumed that the shared variance accounts for the “underlying relationships” (M. Norris & Lecavalier, 2010, p. 9) between the manifest (measured or observed) variables. While the EFA identifies latent constructs that explain this shared variance
between the manifest variables, PCA is only a data reduction technique that summarizes all variance, whether shared or unique, with a smaller set of variables (James et al., 2013, p. 374). EFA is the method of choice to identify latent constructs (M. Norris & Lecavalier, 2010). There are some prerequisites for the use of EFA. There must be at least three observed variables per factor for statistical identification (Watkins, 2018, p. 4). In addition, a large sample size is needed. It is estimated at about 200 to 400 observations, depending on the magnitude of the factor loadings and thus the strength of the detectable effect (M. Norris & Lecavalier, 2010). Finally, the distributional assumptions of the EFA must be met (Watkins, 2018, pp. 5–9). Most important is the normality assumption of the observed variables. Skewness and kurtosis should not be too pronounced. If the normality assumption is violated, the Pearson correlation coefficient should not be used, but a more robust correlation method (e.g. Spearman). Likewise, the analysis of ordinal variables should be based on polychoric correlations. Outliers need to be identified and treated. Before conducting the EFA, it is “important to verify that the measured variables are sufficiently intercorrelated to justify factor analysis” (Watkins, 2018, p. 8). Here I use the Kaiser-Meyer-Olkin (KMO) measure, which ranges from 0 to 1. Values above 0.5, and especially above 0.6, for the overall correlation matrix and for single variables are seen as adequate (H.F. Kaiser & Rice, 1974; Tabachnick & Fidell, 2014, p. 667). If the overall KMO is lower than this threshold, “it is advisable to drop the individual indicators with the lowest individual KMO statistic values” (Nardo et al., 2005, p. 67) and to test again. There are at least three decisions to be made: the choice of the estimation method, the determination of the number of factors and the selection of the rotation of the factors. The main estimation method is the frequently used Pearson correlation.
This method requires metric variables and the assumption of normality of the underlying data. For ordinal variables, however, polychoric correlations are used: It is “recommended that EFA be based on polychoric correlations if the ordinal variables are measured by fewer than five to seven categories or when distributions of the ordinal variables are asymmetrical” (Watkins, 2018, p. 7). This applies only to the survey variables for the Latent Pattern Maintenance function. I apply Horn’s Parallel Analysis (1965) to extract the number of factors. This is one way to provide “the most accurate empirical estimates of the number of factors to retain” (Watkins, 2018, p. 12). This method is superior compared to the eigenvalue-greater-than-1 rule (Kaiser Rule) (Guttman, 1954; H.F. Kaiser, 1960, 1970) or Cattell’s (1966) scree plot which both tend to “overestimate the number of factors underlying a set of data” (Finch, 2013, p. 174) and are generally not recommended (P. Wilson & Cooper, 2008). Parallel analysis is built on the
assumption that some eigenvalues can become greater than 1, even though in reality the indicators are uncorrelated (Moosbrugger & Schermelleh-Engel, 2012, pp. 331–332). This means that some eigenvalues are only apparently important. To extract the correct number of factors, the eigenvalues of the current sample are therefore compared with artificially generated eigenvalues. These artificially generated eigenvalues are based on randomly generated datasets with variables that are truly uncorrelated in the population but are correlated by chance in the sample. The factors that have higher empirical eigenvalues than these artificial eigenvalues are retained. Furthermore, I use the root mean square error of approximation (RMSEA) as a model selection criterion. According to Davidov et al. (2016), a reasonable fit is achieved if the RMSEA shows values less than 0.08, while a close model fit is obtained with an RMSEA < 0.05. The rotation supports the interpretability of the factor solution. However, this only applies to solutions with two or more factors. If there is more than one factor to extract, I choose an oblique factor rotation so as not to restrict the intercorrelation of the factors unreasonably. If there is empirically no intercorrelation between the factors, the oblique rotation can produce orthogonal results (Watkins, 2018, p. 15), i.e. solutions without correlated factors. The goal of the factor rotation is the simple structure of the factor solution (Moosbrugger & Schermelleh-Engel, 2012, p. 332). It is an “attempt to find a solution where each factor is loaded by several salient variables and each variable has a salient loading on one factor and trivial loadings on the remaining factors” (Watkins, 2018, p. 16). A strong factor loading is defined as a loading above 0.6 (M. Norris & Lecavalier, 2010, p. 12). This means that cross-loading variables should be explained and removed if necessary.
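The logic of parallel analysis described above can be sketched in a few lines. The following is a simplified illustration under stated assumptions and not the implementation used for this study: it works with the eigenvalues of the full Pearson correlation matrix, compares them with the mean eigenvalues of simulated uncorrelated normal data, and uses a hand-written Jacobi routine so that it runs with the Python standard library only. All function names and the simulated one-factor dataset are hypothetical.

```python
import math
import random


def correlation_matrix(data):
    """Pearson correlation matrix for a list of rows (observations x variables)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1))
           for j in range(p)]
    z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in data]
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def eigenvalues(sym, sweeps=100, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, descending."""
    a = [row[:] for row in sym]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:                      # off-diagonal part is numerically zero
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):         # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):         # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def parallel_analysis(data, n_sim=50, seed=1):
    """Count leading eigenvalues that exceed the mean eigenvalues of random data."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    observed = eigenvalues(correlation_matrix(data))
    sums = [0.0] * p
    for _ in range(n_sim):
        noise = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        for j, ev in enumerate(eigenvalues(correlation_matrix(noise))):
            sums[j] += ev
    reference = [total / n_sim for total in sums]
    retained = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:                     # stop at the first non-exceeding eigenvalue
            break
        retained += 1
    return retained, observed, reference


# Simulated example: five indicators driven by one latent factor (loading 0.8)
rng = random.Random(42)
data = []
for _ in range(300):
    f = rng.gauss(0, 1)
    data.append([0.8 * f + 0.6 * rng.gauss(0, 1) for _ in range(5)])
n_factors, observed, reference = parallel_analysis(data)
print(n_factors, [round(v, 2) for v in observed])
```

Common software implementations differ in details, for example by using a percentile of the simulated eigenvalues instead of their mean or by analysing a reduced correlation matrix, so the sketch should be read as a demonstration of the principle only.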
I also need to check the internal consistency reliability of the factors in order to judge the measurement error. A standard measure of reliability is Cronbach’s Alpha (Lance et al., 2006; Nardo et al., 2005). However, the assumptions of Cronbach’s Alpha that “all items measure the same underlying variable, that they do so on the same scale, and that they are equally strongly associated to that underlying variable” (Peters, 2014, p. 59) are very restrictive and are often violated in practice, so that it should not be used. Instead, “in cases where coefficient alpha is likely biased (i.e. multidimensional measures […] with unequal factor loadings), omega coefficients may be more accurate estimates” (Watkins, 2017, p. 6). Thus, I report the hierarchical omega estimates for the general (ωtotal) and subscale (ωgroup) constructs (the latter only if I extract more than two factors). They state the reliable variance for each factor. The values for an appropriate level of reliability should be greater than 0.5 (Watkins, 2017, p. 6).
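Why alpha and omega diverge when loadings are unequal can be illustrated for the simplest case of a single factor with standardized items. This sketch is an assumption-laden simplification: it does not compute the hierarchical omega coefficients reported in this study, which require a higher-order or bifactor decomposition, and the function names and loading values are hypothetical.

```python
def standardized_alpha(loadings):
    """Standardized Cronbach's alpha from the correlations implied by a one-factor model."""
    k = len(loadings)
    # under a one-factor model the correlation of items i and j is loading_i * loading_j
    pairs = [loadings[i] * loadings[j] for i in range(k) for j in range(i + 1, k)]
    r_bar = sum(pairs) / len(pairs)            # mean inter-item correlation
    return k * r_bar / (1 + (k - 1) * r_bar)


def omega_total(loadings):
    """Omega for one factor and standardized items (unique variance = 1 - loading^2)."""
    common = sum(loadings) ** 2
    unique = sum(1 - l ** 2 for l in loadings)
    return common / (common + unique)


equal = [0.7, 0.7, 0.7]      # hypothetical loadings, equally strong
unequal = [0.8, 0.7, 0.6]    # hypothetical loadings, unequal strength

for name, loadings in (("equal", equal), ("unequal", unequal)):
    print(name, round(standardized_alpha(loadings), 4), round(omega_total(loadings), 4))
```

With equal loadings both coefficients coincide; with unequal loadings alpha falls below omega, which is the bias the quotation above refers to.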
However, I do not expect these factor scales to have a high degree of reliability. The selected indicators, as discussed in the previous chapter, are often plagued with measurement errors and comparability problems and often reflect only the best available and not the ideal choice. In addition, these indicators and items were not especially designed to measure this specific construct or scale, whereas this is usually the case in the psychological research where these reliability coefficients are commonly used. Nevertheless, the model fit of the factor analysis and the reliability coefficients still allow me to test the conceptual assumptions. In contrast to a mere manual aggregation based on coding rules, this analysis reveals patterns in the data and thus at least tells us whether these expectations comply with the empirical data. Finally, I am not only interested in the goodness of the factor solution, but also in the factor scores for each individual observation (e.g. its economic or social performance). Due to “factor indeterminacy” (Grice, 2001; Steiger, 1990), there are several ways to calculate those factor scores (DiStefano et al., 2009). Each way has advantages but also drawbacks. The most common are the regression scores, the Bartlett scores and the Anderson-Rubin scores. In contrast to the other two methods, the Bartlett scores produce estimates that are “unbiased and, therefore, more accurate reflections of the cases’ location on the latent continuum in the population” (DiStefano et al., 2009, p. 5). Thus, I apply the Bartlett scores. The discussion of the EFA has shown that it does not produce a single correct solution, but leaves room for the subjective decisions of the researcher. Even if it is an exploratory approach, it is important that all of these decisions are addressed and justified, as they influence the factor solution and thus the final performance estimates of the countries. The checklist for an appropriate EFA is given in Table 6.1.
In addition to these decisions, I need to treat the missing values as well as normalize the indicators before they enter the EFA.
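The difference between regression and Bartlett factor scores discussed in this section can be illustrated for the special case of one factor and standardized indicators, where both estimators reduce to simple weighted sums. The sketch below is illustrative only; the function name, the loadings and the example case are hypothetical, and multi-factor solutions require the corresponding matrix formulas.

```python
def factor_scores(z, loadings):
    """Regression and Bartlett scores for one case under a one-factor model.

    z: standardized indicator values of the case
    loadings: standardized factor loadings (unique variance = 1 - loading^2)
    """
    psi = [1 - l ** 2 for l in loadings]
    weighted = sum(l * x / u for l, x, u in zip(loadings, z, psi))
    info = sum(l ** 2 / u for l, u in zip(loadings, psi))
    bartlett = weighted / info          # (L' Psi^-1 L)^-1 L' Psi^-1 z
    regression = weighted / (1 + info)  # L' R^-1 z with R = L L' + Psi
    return regression, bartlett


loadings = [0.8, 0.7, 0.6]            # hypothetical loadings
true_score = 2.0                      # hypothetical latent value of one country
z = [l * true_score for l in loadings]  # indicators without measurement error

regression, bartlett = factor_scores(z, loadings)
print(round(regression, 3), round(bartlett, 3))
```

In this error-free example the Bartlett score recovers the latent value exactly, while the regression score is shrunk towards zero, which mirrors the unbiasedness argument cited above.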
6.2.2 Treatment of Missing Values: Multiple Imputation
As we have seen in the previous chapter, there are missing values in the selected indicators. Leaving missing values untreated can lead to a significant bias in the estimates and to a loss of statistical power (Graham, 2012; Schlomer et al., 2010; Buuren, 2018). Therefore, it is recommended “that researchers report the amount of missing data in a study, consider the potential sources and patterns of missing data, and use and report appropriate methods for handling missing data in their analysis” (Schlomer et al., 2010, p. 2).
Table 6.1 Checklist for Exploratory Factor Analysis
Step | Diagnostic Criterion
Existence of Factor Structure | Kaiser-Meyer-Olkin (KMO) > 0.5
Number of Factors to extract | Parallel Analysis and RMSEA
If more than one factor: Simple Structure | No cross-loadings of items (≈ strong factor loading above 0.6)
Rotation | Oblique Rotation
Internal Consistency Reliability | Hierarchical omega coefficient >= 0.5 (ωtotal and ωgroup)
Factor Scores | Bartlett Scores
Source: own table
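The first criterion of the checklist, the KMO measure, relates the squared correlations to the squared partial correlations, which can be obtained from the inverse of the correlation matrix. The following minimal sketch shows the calculation for a small hypothetical correlation matrix; the function names are illustrative, and the matrix inversion is written out by hand only to keep the example free of external libraries.

```python
def invert(matrix):
    """Inverse of a square matrix via Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        div = aug[col][col]               # a singular matrix would fail here
        aug[col] = [v / div for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kmo(corr):
    """Overall and per-variable KMO from a correlation matrix."""
    n = len(corr)
    inv = invert(corr)
    # partial correlations follow from the inverse correlation matrix
    partial = [[-inv[i][j] / (inv[i][i] * inv[j][j]) ** 0.5 for j in range(n)]
               for i in range(n)]
    r2 = [[corr[i][j] ** 2 if i != j else 0.0 for j in range(n)] for i in range(n)]
    p2 = [[partial[i][j] ** 2 if i != j else 0.0 for j in range(n)] for i in range(n)]
    per_variable = [sum(r2[i]) / (sum(r2[i]) + sum(p2[i])) for i in range(n)]
    total_r, total_p = sum(map(sum, r2)), sum(map(sum, p2))
    return total_r / (total_r + total_p), per_variable


# Hypothetical correlation matrix: three indicators, all pairwise correlations 0.5
corr = [[1.0, 0.5, 0.5],
        [0.5, 1.0, 0.5],
        [0.5, 0.5, 1.0]]
overall, per_variable = kmo(corr)
print(round(overall, 4), [round(v, 4) for v in per_variable])
```

For this matrix the overall value lies at about 0.69 and therefore above the 0.5 threshold of the checklist.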
Rubin (1976) distinguishes between three causes or mechanisms of missing data: Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR). Each mechanism results in different consequences for the statistical analysis with regard to obtaining unbiased estimates and calls for different methods of handling missing data (Graham, 2009, 2012). Missing Completely At Random (MCAR) means that “the probability of being missing is the same for all cases” (Buuren, 2018, p. 8) and thus is unrelated to the data. This mechanism leads to unbiased estimates regardless of the method used for handling missing data. However, it can produce low statistical power (Graham, 2009, p. 553). Missing At Random (MAR) means that the missing data is related to other observed data. By including the observed data in the missing data model, it is possible to account for the missingness and obtain unbiased estimates. Finally, there is Missing Not At Random (MNAR), which means that “the likelihood of missingness is related to the score on that same variable had the participant responded” (Schlomer et al., 2010, p. 3). For example, respondents may not answer income questions because they do not want to disclose their high income, or we may not obtain data for quality of statehood indicators because the state lacks the capacity to report them due to its low quality of statehood. These data are truncated and the assumption of ignorability is violated: We no longer believe “that the available data are sufficient to correct for the effects of the missing data” (Buuren, 2018, p. 40). Thus, MNAR data cannot be fixed and inevitably leads to biased estimates regardless of the method used (even standard analysis techniques like regression cannot be applied to this data anymore). Overall, the MAR assumption should be met for the sample of this study. Since
the sample consists of democracies, these countries provide a minimum level of transparency, so that even low performance values are likely to be reported. How can we treat missing values? Older methods of treating missing values are listwise deletion (complete case analysis) or pairwise deletion (available case analysis). Both methods are “generally not recommended” (Schlomer et al., 2010, p. 3). On the one hand, listwise deletion only works if the data are MCAR, which is often unrealistic. Even then listwise deletion results in a loss of statistical power. On the other hand, the issue with pairwise deletion is that it uses “different cases for each correlation, which results in difficulty in comparing correlations and oftentimes the inability to use these correlations in multivariate analyses” (Schlomer et al., 2010, p. 3). State-of-the-art methods encompass multiple imputation (MI) and full-information maximum likelihood (FIML) (Graham, 2009, pp. 555–558). The standard workflow of multiple imputation is shown in Figure 6.2. Multiple imputation creates several datasets (usually 5). In these datasets the missing values are replaced by plausible values. These datasets “are identical for the observed data entries, but differ in the imputed values. The magnitude of these difference [sic] reflects our uncertainty about what value to impute” (Buuren, 2018, p. 20). Then, the same statistical method (e.g. regression) is applied to all imputed datasets independently. Finally, these results, which vary due to the uncertainty in the imputed values for the missing data, are pooled into a single estimate using Rubin’s rules. The final estimate incorporates both “(a) the standard errors of the analysis of each data set and (b) the dispersion of parameter estimates across data sets” (Lorenzo-Seva & Van Ginkel, 2016, p. 597). However, this workflow cannot be easily applied to the EFA.
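The pooling step of the standard multiple imputation workflow can be written down compactly. The sketch below shows Rubin's rules for a single parameter: the pooled estimate is the mean of the per-dataset estimates, and the total variance combines the average within-imputation variance with the between-imputation variance inflated by the factor (1 + 1/m). The function name and the numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, standard_errors):
    """Pool one parameter across m imputed datasets with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(se ** 2 for se in standard_errors)   # (a) average squared standard error
    between = variance(estimates)                      # (b) dispersion across datasets
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# Hypothetical regression coefficient estimated on three imputed datasets
estimates = [1.0, 1.2, 0.8]
standard_errors = [0.2, 0.2, 0.2]
pooled, pooled_se = pool_rubin(estimates, standard_errors)
print(round(pooled, 3), round(pooled_se, 3))
```

The pooled standard error is larger than each single standard error because it also carries the uncertainty about the imputed values.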
On the one hand, the different imputed datasets could lead to a different number of factors being extracted (Nassiri et al., 2018, p. 502). On the other hand, the “freedom of the final position of rotated factors means that the rotated factor solution may turn out to be non-comparable” (Lorenzo-Seva & Van Ginkel, 2016, p. 599). There are complex ways to deal with these two issues (Lorenzo-Seva & Van Ginkel, 2016; Nassiri et al., 2018). However, I follow the simpler approach by Dray and Josse (2015), which is depicted in Figure 6.1. They propose to average the imputed data to obtain only one complete dataset. This dataset is then subjected to the EFA procedure to obtain the factor scores. Although this makes it impossible to derive uncertainties in the factor estimates due to the missing values, and thus conflicts with the standard workflow of multiple imputation, Dray and Josse (2015, p. 665) show that their approach “provide[s] the best estimates for missing values”. Importantly, the factor scores should also be less biased, because in the majority of cases only a few variables are missing.
Source: Buuren (2018, p. 19)
Figure 6.2 Standard Work Flow of Multiple Imputation
The correct specification of the imputation model is important. First, the imputation model and the analysis model should be similar: We have to include “all of the variables from the analysis model in the imputation model to ensure that the imputation model preserved the relationships between the variables of interest” (Nguyen et al., 2017, p. 3). The second consideration concerns the auxiliary variables: They should be highly correlated with the variables that contain missing data. Including auxiliary variables increases the “predictive power[,] even if including them […] would produce bias in estimating a causal effect […] or collinearity would preclude determining which variable had a relationship with the dependent variable” (Honaker et al., 2011, p. 8). Thus, it is important to note that the imputation model is not a causal model but rather a predictive one. The time-series structure of the dataset can be used to estimate better imputed values, because “knowing the observed values of observations close in time to any missing value may enormously aid the imputation of that value” (Honaker et al., 2011, p. 17). Thus, the imputed values should become more plausible when a linear time trend as well as lags and leads of the observed variables are included.
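To make the difference between the two workflows concrete, the following Python sketch contrasts pooling with Rubin’s rules and the averaging of imputed datasets. It is an illustration only and not the code used in this study (which relies on the R package Amelia II); the function names `pool_rubin` and `average_imputations` as well as all numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one parameter estimate across m imputed datasets (Rubin's rules)."""
    m = len(estimates)
    pooled = mean(estimates)
    # (a) average squared standard error within the imputed datasets
    within = mean([se ** 2 for se in std_errors])
    # (b) dispersion of the estimates across the imputed datasets
    between = variance(estimates)
    total_variance = within + (1 + 1 / m) * between
    return pooled, total_variance ** 0.5


def average_imputations(imputed_datasets):
    """Cell-wise mean of m completed datasets, each a list of rows of floats."""
    m = len(imputed_datasets)
    first = imputed_datasets[0]
    return [
        [sum(data[i][j] for data in imputed_datasets) / m for j in range(len(first[0]))]
        for i in range(len(first))
    ]


# Hypothetical regression coefficient estimated on five imputed datasets.
estimate, std_error = pool_rubin([0.50, 0.54, 0.46, 0.52, 0.48], [0.10] * 5)

# Two toy completed datasets that differ only in one imputed cell.
averaged = average_imputations([[[1.0, 2.0], [3.0, 4.0]], [[1.0, 4.0], [3.0, 4.0]]])
```

The pooled standard error is larger than any single within-dataset standard error because it adds the between-imputation variance. The averaged dataset, by contrast, is a single table, so this extra uncertainty is no longer visible in later analysis steps.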
There is a variety of diagnostic checks for multiple imputation. The first diagnostic assesses whether the algorithm has converged. The algorithm needs some iterations until the parameters of the imputed variables have stabilized and no longer change. To test this, the imputation model is started from different starting values and the convergence of the resulting chains is checked (Honaker et al., 2011, p. 29). If “we will see all of these chains converging to the same value, [we can] reasonably conclude that this is the likely global maximum” (Honaker et al., 2011, p. 34).
The second diagnostic check examines the plausibility of the imputations. A commonly used tool is the density plot, which compares the density of the imputed values to that of the observed values, so that discrepancies can be seen (Buuren & Groothuis-Oudshoorn, 2011). However, it is “important to recognise that discrepancies between observed and imputed data are not necessarily problematic, since under MAR we may expect such differences to arise” (Nguyen et al., 2017, p. 6). Nevertheless, imputations with odd distributions need to be investigated, because they may be caused by a violation of the MAR assumption or by a badly behaving imputation model (Abayomi et al., 2008; Honaker et al., 2011, pp. 25–27).
A third diagnostic check refers to the predictive performance of the imputation model via leave-one-out cross-validation (LOOCV): “a single observation is deleted and the proposed model is fitted to the remaining data and used to predict the outcome for the excluded data point” (Nguyen et al., 2017, p. 7). The closer the prediction is to the actually observed value, the better the imputation model. This tells us whether “the imputation model can confidently predict the true value of the observation” (Honaker et al., 2011, p. 28). Multiple imputation is carried out with the R package Amelia II (Honaker et al., 2011).
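The logic of the third diagnostic can be sketched in a few lines of Python. This is a simplified illustration under strong assumptions: a linear model with a single predictor stands in for the imputation model, the data are invented, and the functions `fit_line`, `loocv_predictions` and `rmse` are hypothetical names, not part of Amelia II.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loocv_predictions(xs, ys):
    """Delete each observation in turn, refit, and predict the deleted value."""
    predictions = []
    for i in range(len(xs)):
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(intercept + slope * xs[i])
    return predictions


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5


# Invented example: an auxiliary variable (xs) predicting an indicator (ys).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
error = rmse(ys, loocv_predictions(xs, ys))
```

A small prediction error indicates that the observed values can be recovered from the remaining data, which is the property the LOOCV plots in this chapter assess visually.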
For the large survey data in the latent pattern maintenance function, I use the simpler and time-saving full-information maximum likelihood (FIML) approach. FIML handles “the missing data and parameter estimation in a single step” (Graham, 2012, p. 53). However, it is not possible to use auxiliary variables with FIML: “One situation in which multiple imputation potentially holds an advantage over maximum likelihood estimation is in the use of auxiliary variables” (Enders, 2010, p. 336). Another disadvantage is that there are no diagnostic procedures to evaluate the quality of the FIML procedure. I use the function umxEFA from the R package umx (Bates et al., 2019), which allows us to compute the factor scores for respondents with missing data. A summary of the treatment of missing values in this study is given in Table 6.2.
Table 6.2 Checklist for Treatment of Missing Values
| Assumption/Prerequisite | Multiple Imputation: Diagnostic Procedure | FIML |
| --- | --- | --- |
| MAR-Assumption | Can only be made plausible; no statistical test available | Survey Data |
| Imputation and Analysis Model | Include auxiliary variables but also all variables from the causal analysis | |
| Convergence | Stabilization of parameter values | |
| Plausibility of Imputations | Density of observed vs. imputed values | |
| Predictive Performance | Leave-One-Out-Cross-Validation (LOOCV) | |
Source: own table
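The idea behind FIML described above, namely that each case contributes to the likelihood only through the variables it actually reports, can be illustrated for two variables. The sketch below is a simplification under stated assumptions: it evaluates a bivariate normal log-likelihood for given parameters instead of estimating a factor model, missing entries are coded as `None`, and all function names and values are hypothetical. It is not the procedure implemented in umx.

```python
import math


def normal_logpdf(x, mu, var):
    """Log-density of a univariate normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def bivariate_logpdf(x, y, mu, cov):
    """Log-density of a bivariate normal distribution."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy ** 2
    dx, dy = x - mu[0], y - mu[1]
    quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + quad)


def fiml_loglik(rows, mu, cov):
    """Sum casewise log-likelihoods, using only the observed entries of each case."""
    total = 0.0
    for x, y in rows:
        if x is not None and y is not None:
            total += bivariate_logpdf(x, y, mu, cov)
        elif x is not None:
            total += normal_logpdf(x, mu[0], cov[0][0])
        elif y is not None:
            total += normal_logpdf(y, mu[1], cov[1][1])
        # Cases without any observed value carry no information and are skipped.
    return total


# Invented survey answers; None marks a missing response.
rows = [(0.3, -0.2), (1.1, None), (None, 0.4)]
loglik = fiml_loglik(rows, mu=(0.0, 0.0), cov=((1.0, 0.5), (0.5, 1.0)))
```

An optimizer would search for the parameter values that maximize this sum, so that incomplete cases inform the estimates without any value being imputed.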
6.2.3 Transformation of the Indicators
The aim of the transformation of the indicators is to meet the normality assumption of the multiple imputation algorithm and the exploratory factor analysis. As discussed, both presuppose that the variables are normally distributed. A transformation of a non-normally distributed variable into an approximately normally distributed one is therefore indicated. A transformation can also deal with outliers that would otherwise distort the results.1 Hence, I apply data transformation methods that alter the data values by a specific mathematical function (Osborne, 2002). Tukey’s ladder of power (Tukey, 1977) is commonly used to identify the transformation that best resolves the skewness and thus brings the data closer to a normal distribution (see Table 6.3). However, there are three caveats to this method: First, “[p]ower transformations preserve the order of the data only when all values are positive and [second, they] are effective only when the ratio of the largest to the smallest data values is itself large” (Fox, 2016, p. 59). The third caveat is related to proportions, as power transformations are “often unhelpful for proportions because these quantities are bounded below by 0 and above by 1” (Fox, 2016, p. 80). For this type of variable, Tukey invented the “folded power transformations”, which can be used instead. Fortunately, it is easy to satisfy the first two conditions by adding a positive or negative constant. In doing so, the data should be shifted so that “its leftmost point (minimum value) is anchored at 1.0” (Osborne, 2002, p. 5) to increase the effectiveness of the transformation. In addition, most variables in this study are not proportions, i.e. they are not bounded by 0 and 1.
1 A transformation also establishes comparability between the variables by adjusting their variances. This would be particularly important in the context of a PCA, since the variables are often measured on different scales and the component solution should not “depend on an arbitrary choice of scaling” (James et al., 2013, p. 381). However, the EFA is usually based on the correlation matrix, which implies a standardization of the variables.
Table 6.3 Tukey’s Ladder of Power
| λ | −2 | −1 | −1/2 | 0 | 1/2 | 1 | 2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Transformation of x | −1/x² | −1/x | −1/√x | log x | √x | x | x² |
Source: based on Tukey (1977, pp. 172–173)
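The transformations in Table 6.3 and the shift of the minimum to 1.0 can be written down compactly. The Python sketch below is only an illustration: the function names are hypothetical, and the folded power is given in the general form p^λ − (1 − p)^λ (with the logit as the case λ = 0), which is my reading of the folded power family rather than a formula stated in this chapter.

```python
import math


def shift_to_one(values):
    """Add a constant so that the minimum value is anchored at 1.0."""
    shift = 1.0 - min(values)
    return [v + shift for v in values]


def ladder_power(x, lam):
    """Order-preserving transformation from Tukey's ladder for x > 0."""
    if lam == 0:
        return math.log(x)
    if lam < 0:
        return -(x ** lam)  # the minus sign keeps larger x mapped to larger values
    return x ** lam


def folded_power(p, lam):
    """Folded power for a proportion 0 < p < 1 (assumed form, see lead-in)."""
    if lam == 0:
        return math.log(p / (1 - p))  # logit as the limiting case
    return p ** lam - (1 - p) ** lam


# Invented indicator with negative values and one large outlier.
raw = [-3.0, 0.0, 5.0, 60.0]
transformed = [ladder_power(x, 0) for x in shift_to_one(raw)]
```

Because every rung of the ladder is increasing for positive x, the ranking of the countries on an indicator is unchanged; only the distances between them are compressed or stretched.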
A disadvantage of the data transformation is that a substantive interpretation of the transformed variable scores is no longer possible (Osborne, 2002, p. 4). Although the transformation preserves the order of the data points, the distance between the data points is modified. However, this is less relevant in this study, since the interest is not in the values of these variables, but rather in the scores of the latent variables produced by the factor analysis.
To find the power or folded power transformation that approximates the normal distribution as closely as possible, I perform a visual inspection and use a formal test, the Shapiro-Wilk test (Shapiro & Wilk, 1965). This test assesses how closely the transformed data follow a normal distribution by calculating the so-called W statistic: A higher W statistic means a better fit with the normal distribution. I then use the power or folded power transformation which maximizes the W statistic (Mangiafico, 2016, p. 708). Although there are other test statistics for normality, it is important to note that “true normality is exceedingly rare” (Osborne, 2002, p. 1), and it is therefore sufficient to improve the normality of the variables. For this purpose, the Shapiro-Wilk test should be sufficient.
In addition, I use winsorization to treat possibly influential data points. In contrast to trimming values, winsorization “reduces the influence of outliers without completely removing observations by adjusting response values more centrally” (Igo, 2010, p. 601). Depending on necessity, I replace values which lie outside the 1% or 2.5% quantile with the values at these boundaries. Finally, I scale the variables to have a mean of 0 and a standard deviation of 1 (as in a standard normal distribution). If necessary, I reverse the scores of the variables before entering the exploratory factor analysis, so that “high scores on all the variables mean the same thing” (Watkins, 2018, p. 6). This supports the interpretability of the factor solution.
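The selection, winsorizing and scaling steps can be sketched as follows. Note the swapped-in criterion: to stay within the Python standard library, the sketch does not compute the Shapiro-Wilk W statistic but a related probability-plot correlation (the squared correlation between the ordered data and approximate normal scores), which also approaches 1 for normal data. The function names, the two-sided winsorizing and the example data are assumptions for illustration, not the study’s implementation.

```python
import math
from statistics import NormalDist, mean, stdev


def power(x, lam):
    """Order-preserving ladder-of-powers transformation for x > 0."""
    if lam == 0:
        return math.log(x)
    return -(x ** lam) if lam < 0 else x ** lam


def normality_index(values):
    """Squared correlation of ordered data with normal scores (stand-in for W)."""
    n = len(values)
    ordered = sorted(values)
    scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, ms = mean(ordered), mean(scores)
    sxy = sum((x - mx) * (s - ms) for x, s in zip(ordered, scores))
    sxx = sum((x - mx) ** 2 for x in ordered)
    sss = sum((s - ms) ** 2 for s in scores)
    return sxy ** 2 / (sxx * sss)


def best_lambda(values, candidates=(-2, -1, -0.5, 0, 0.5, 1, 2)):
    """Pick the rung of the ladder that maximizes the normality index."""
    return max(candidates, key=lambda lam: normality_index([power(v, lam) for v in values]))


def quantile(ordered, q):
    """Quantile of an already sorted list using linear interpolation."""
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def winsorize(values, tail=0.025):
    """Replace values beyond the lower and upper quantile with the boundary values."""
    ordered = sorted(values)
    low, high = quantile(ordered, tail), quantile(ordered, 1 - tail)
    return [min(max(v, low), high) for v in values]


def standardize(values):
    """Scale to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Invented right-skewed indicator: exponentiated normal scores.
skewed = [math.exp(NormalDist().inv_cdf((i - 0.375) / 50.25)) for i in range(1, 51)]
lam = best_lambda(skewed)
prepared = standardize(winsorize([power(v, lam) for v in skewed]))
```

For the invented data the log transformation is selected, because the values were generated by exponentiating normal scores.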
For the empirical regression analysis in chapter 8, I use the untransformed factor scores, because they follow a standard normal distribution and are therefore best suited to the requirements of a regression analysis. For the descriptive analysis in chapter 7, however, the factor scores themselves are transformed to a scale from 0 to 100 to allow for an easier interpretation. Here I follow “the distance-to-target technique for indicator construction, which situates each country relative to targets for worst and best performance” (Wendling et al., 2018, p. 7). Since it is not possible to conceptualize exactly what constitutes good or bad performance, I use an empirical yardstick: The worst performance is set to the 2.5th percentile of the factor scores, and the best performance is set to the 97.5th percentile. This method has an advantage over min-max scaling, since “[t]rimming off the tails of the underlying distribution is helpful because it prevents outliers from having undue influence on the resulting scores” (Wendling et al., 2018, p. 8). The percentiles are calculated over the entire time series in order to maintain comparability of these scores over time. Table 6.4 summarizes all important steps of the transformation procedure.
Table 6.4 Checklist for Transformation Procedure
| Step | Procedure |
| --- | --- |
| Tukey’s ladder of power | For metric variables (all values must be positive, ratio of largest to smallest value > 5, minimum value set to 1) |
| Tukey’s folded power transformation | For proportions (bounded between 0 and 1) |
| Winsorizing | Replacing extreme outlying values |
| Standardization and Scale Reversing | Mean 0 and Standard Deviation 1; high scores on all variables should mean the same thing |
| Distance-to-target technique | Transform factor scores to a scale from 0 to 100 according to the 2.5th and 97.5th percentile (for descriptive chapter only) |
Source: own table
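The distance-to-target rescaling of the factor scores described above can be sketched as follows. The sketch assumes that scores beyond the two percentile targets are capped at 0 and 100 and that percentiles are obtained by linear interpolation; both choices, the function names and the example scores are illustrative assumptions rather than details given in the text.

```python
import math


def quantile(ordered, q):
    """Quantile of an already sorted list using linear interpolation."""
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def distance_to_target(scores, low_q=0.025, high_q=0.975):
    """Rescale scores to 0-100 relative to empirical worst and best targets."""
    ordered = sorted(scores)
    worst = quantile(ordered, low_q)   # target for worst performance
    best = quantile(ordered, high_q)   # target for best performance
    rescaled = []
    for score in scores:
        share = (score - worst) / (best - worst)
        rescaled.append(100.0 * min(max(share, 0.0), 1.0))  # cap at the targets
    return rescaled


# Invented factor scores pooled over all countries and years.
factor_scores = [float(i) for i in range(101)]
performance = distance_to_target(factor_scores)
```

Because the targets are computed once over the pooled country-years, a given factor score maps to the same value on the 0 to 100 scale in every year.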
6.3 Application of the Aggregation Framework to the Performance Areas
6.3.1 Economic Performance
A transformation of the raw indicators of the economic performance area is necessary. The distributions of the indicators are strongly skewed and there are extreme values (see Figure 138 in the appendix). However, a transformation of the values results in distributions that are approximately normally distributed (see Figure 6.3). Unfortunately, there are some deviations for the values of the investment indicator: The distribution shows a large bulk of values at the lower end. As we will see, this affects the goodness of the imputation procedure.
Source: own data
Figure 6.3 Transformed Sample (Economic Performance)
How did the multiple imputation algorithm work out? The algorithm converges to the same value, even though different starting points are used (see Figure 6.4). One chain took a little longer to converge. In total, there is a reasonable indication that we found the global maximum and that the convergence criterion is fulfilled.
Note: 5 chains run from different starting locations. If all chains move towards the same value indicated by the dashed line, the chains successfully converged. Source: own data
Figure 6.4 Convergence Plot for Economic Performance
According to the density plots for the indicators (see Figure 6.5), there is a misfit between the observed and imputed values for the variables.2 The imputed values for consumption per capita are consistently lower than the observed values. At the same time, the imputed values for the unemployment rate and inflation are higher than the observed values. As explained above, this can be expected under the MAR assumption. It means that the overall state of the economy is weaker in the countries with imputed values. This is plausible given that the missingness occurs mainly at the beginning of the time series and that the countries with missing values have a lower level of economic development. Therefore, these misfits do not pose a problem for the imputation model but rather show its capability of producing plausible values. The odd distribution of the imputed values for the GDP per capita indicator results from its relatively low number of missing values.
2 I use the following auxiliary variables to support the imputation: GDP per capita (WDI), Life Expectancy in years (WDI), Political Freedom Dimension (DeMaX), Political Equality Dimension (DeMaX), and Political and Legal Control Dimension (DeMaX). In addition, I include all relevant independent variables discussed in the next chapters, so that the imputation and analysis model are similar.
Source: own data
Figure 6.5 Density Plots for Economic Performance
Predictive performance seems acceptable overall, with the exception of one indicator (see Figure 6.6). For most indicators, the imputation model is able to predict the observed values successfully, because all confidence intervals overlap with the straight line. This does not apply to the investment indicator, where the predicted values do not fit at the beginning and end of the scale. This seems to be an extension of the previously identified problem in the distribution with the large bulk of low values. This means this indicator should not be used, and I exclude it from the further analysis. The (minor) misfits for the balance and inflation indicators affect only the lower end of the scale, so they might not pose a major problem for the analysis.
I now turn to the exploratory factor analysis. Including all indicators, the KMO test shows an overall value of 0.69 (see Table 6.5). Although the value is above the targeted threshold for a satisfactory fit, the MSA values for the balance indicator and the governmental debt indicator are below the threshold (< 0.6). This means that these items do not have a clear relationship with the other variables (see the correlation in Figure 137 in the appendix). Therefore, they are dropped from the further analysis. This results in an increase of the overall KMO value to 0.7. No item is below the targeted threshold.
Source: own data
Figure 6.6 LOOCV for Economic Performance
Table 6.5 Kaiser-Meyer-Olkin (KMO) Test for Economic Performance
| Indicator | Set 1 | Set 2 | Set 3 |
| --- | --- | --- | --- |
| Overall | 0.69 | 0.7 | 0.67 |
| Current account balance (WDI) | 0.53 | | |
| Household Consumption per capita (WDI) | 0.65 | 0.65 | 0.45 |
| Central Government Debt (%GDP) (IMF) | 0.55 | | |
| Unemployment rate (IMF) | 0.77 | 0.77 | 0.79 |
| GDP per capita (PPP) (IMF) | 0.66 | 0.66 | 0.65 |
| Inflation (IMF) | 0.75 | 0.75 | 0.73 |
| Gross Capital Formation at PPPs (PWT) | 0.85 | 0.84 | 0.84 |
Source: own table
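For readers who want to see how overall and item-wise KMO values of the kind reported above arise from a correlation matrix, the following Python sketch computes them from the correlations and the partial correlations (obtained from the inverse of the correlation matrix). It is a generic textbook-style illustration with an invented correlation matrix; the helper names are hypothetical, and the values in the table were not produced with this code.

```python
def invert(matrix):
    """Invert a small non-singular square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kmo(corr):
    """Return the overall KMO value and the per-item MSA values."""
    n = len(corr)
    inv = invert(corr)
    # Squared correlations and squared partial correlations, off-diagonal only.
    r2 = [[corr[i][j] ** 2 if i != j else 0.0 for j in range(n)] for i in range(n)]
    p2 = [[inv[i][j] ** 2 / (inv[i][i] * inv[j][j]) if i != j else 0.0
           for j in range(n)] for i in range(n)]
    per_item = [sum(r2[i]) / (sum(r2[i]) + sum(p2[i])) for i in range(n)]
    total_r2 = sum(map(sum, r2))
    total_p2 = sum(map(sum, p2))
    return total_r2 / (total_r2 + total_p2), per_item


# Invented correlation matrix of three indicators that all correlate at 0.5.
overall, items = kmo([[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]])
```

An item with a low MSA value shares little variance with the other indicators once their mutual relationships are partialled out, which is the rationale for dropping such items before the factor analysis.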
Parallel Analysis in Figure 6.7 suggests two factors. However, the eigenvalue of the second factor is very small and a one factor solution might be acceptable. The second criterion RMSEA clearly favors a two-factor solution. The RMSEA value for the one factor solution shows significant misfit and is above the threshold of 0.1. The two-factor solution reduces the RMSEA value significantly and it now is in the acceptable range. I therefore extract two factors.
| Number of factors | RMSEA |
| --- | --- |
| 1 Factor | 0.2 |
| 2 Factors | 0.058 |
Source: own figure and table
Figure 6.7 Parallel Analysis for Economic Performance
The factor solution is shown in Figure 6.8. One factor consists of rather strong loadings from the variables GDP per capita, consumption and inflation. This factor therefore describes the overall economic prosperity of a country and can be called “Economic Wealth”. The second factor is formed by the unemployment and gross capital formation indicators, so that it measures the productivity of the economy. Employment represents a human productivity factor, whereas gross capital formation can be considered an investment in the production process. This factor is more short-term focused and should fluctuate more. However, the factor solution narrowly fails to pass the final test of measurement reliability. The omega values for the subscales (ω_group) are below the threshold of 0.5, implying that there is considerable measurement error. As discussed in the previous chapter (e.g. certain comparability issues of the data), this measurement error is no surprise.
Source: own data
Figure 6.8 Factor Solution for Economic Performance
6.3.2 Environmental Performance
A transformation to a normal distribution is necessary. The raw sample showed very large outlying values. The transformation with the help of Tukey’s ladder of power and Tukey’s folded power transformation brought the values closer to the normal distribution. There are small bulks of values to the left and right of these distributions due to the winsorizing procedure. However, these small deviations should not pose a problem (Figure 6.9).
Source: own data
Figure 6.9 Transformed Sample (Environmental Performance)
The convergence test is shown in Figure 6.10. The algorithm converges to the same value, even though it uses different starting points. As above, only one chain took a little longer to converge. Therefore, there is a reasonable indication that the algorithm found the global maximum and converged successfully.
Note: 5 chains run from different starting locations. If all chains move towards the same value indicated by the dashed line, the chains successfully converged. Source: own data
Figure 6.10 Convergence Plot (Environmental Performance)
Are the imputations plausible? The density plots (see Figure 6.11) reveal minor misfits between the observed and imputed values for the variables.3 Overall, imputed values are somewhat higher for the (more local) air pollutants (SOx, NOx, CO), while they are lower for the greenhouse gas emissions. This is somewhat plausible given that the missingness occurs mainly at the beginning of the time series (1990) and that the countries with missing values have a lower level of economic development. Due to its smaller energy demand, a lower level of economic development might go along with the production of fewer greenhouse gases, but at the same time it might create more local air pollutants (see Figure 141 in the appendix for a visualization). This shows that the imputation model is able to generate plausible values.
3 I use the following auxiliary variables to support the imputation: GDP per capita (WDI), Educational Inequality (V-Dem), Renewable electricity output (WDI), Contribution of renewables to energy supply (OECD), Political Freedom Dimension (DeMaX), Political Equality Dimension (DeMaX) and Political and Legal Control Dimension (DeMaX). I also include all relevant independent variables discussed in the next chapters, so that the imputed values preserve the relationship between the dependent and the independent variables.
Source: own data
Figure 6.11 Density Plots for Environmental Performance
The predictive capability is strong (see Figure 6.12). There seems to be no major misfit, and almost all confidence intervals overlap with the straight line, so that the imputation model is able to predict the observed values accurately.
I now discuss the diagnostics and results of the exploratory factor analysis. The KMO test shows an overall value of 0.8 when all variables are included (see Table 6.6). In addition, all items show a KMO value above 0.6, with the exception of the waste indicator, which has a very low KMO value of 0.35 (see also the correlation plot in Figure 139 in the appendix). When the waste indicator is removed from the analysis, the overall KMO value rises to 0.86. I therefore drop this indicator from the further analysis.
Source: own data
Figure 6.12 LOOCV of Environmental Performance
Table 6.6 Kaiser-Meyer-Olkin (KMO) Test for Environmental Performance
| Indicator | Set 1 | Set 2 |
| --- | --- | --- |
| Overall | 0.8 | 0.86 |
| Water Abstractions per capita (OECD) | 0.65 | 0.93 |
| Municipal Waste per capita (OECD) | 0.35 | |
| Greenhouse Gas per unit GDP (OECD) | 0.86 | 0.85 |
| Sulphur Oxide per unit GDP (OECD) | 0.81 | 0.87 |
| Nitrogen Oxide per unit GDP (OECD) | 0.78 | 0.82 |
| Carbon Monoxide per unit GDP (OECD) | 0.9 | 0.9 |
Source: own table
Parallel Analysis suggests one factor (see Figure 6.13). There is no difference in the RMSEA value between the one-factor and the two-factor solution. Therefore, I extract only one factor. In absolute terms, the RMSEA is greater than the favored cutoff value of 0.08 for a satisfactory fit.
Source: own data
Figure 6.13 Parallel Analysis for Environmental Performance
The one-factor solution is shown in Figure 6.14. All indicators load strongly on this factor with the exception of the water abstraction indicator. Overall, this is similar to Jahn’s (2016) factor solution, which combined all these items into a single factor “General Environmental Performance Index”, even though I had to drop the waste indicator. Therefore, I call this factor “General Environmental Performance Index” (GEP). Finally, the total omega value shows that this scale is reliable (0.905).
Source: own data
Figure 6.14 Factor Solution for Environmental Performance
6.3.3 Goal-Attainment Performance
It is not possible to perform multiple imputation and exploratory factor analysis in this performance area due to the limitations of the data. There are only two indicators for the measurement of the goal-attainment performance. These indicators measure the same concept, the amendment rate, but come from two different sources. In contrast to the previous performance areas, the amendment rates do not resemble time-series data. Nevertheless, I transform the variables to be as close to a normal distribution as possible, to support the regression analysis in the next chapters. However, because the original distribution is heavily skewed towards 0 (see Figure 142 in the appendix), the approximation of the transformed values to a normal distribution is not successful. The distribution shows heavy tails, and there is no mass of values in the middle of the distribution, as would be expected under a normal distribution (especially for the Lutz indicator). This might cause problems for the empirical causal analysis, so that a robust version of regression should be favored here to soften the impact of the distributions (Figure 6.15).
Source: own data
Figure 6.15 Transformed Indicators
6.3.4 Social Performance
A transformation of the indicators is necessary (see Figure 144 in the appendix). The distributions of the Gini and poverty indicators in particular are skewed and contain extreme values. A transformation of the values results in distributions that are approximately normally distributed (see Figure 6.16). Since there are only slight deviations from the normal distribution, I do not suspect any problems arising from the shape of these distributions.
Source: own data
Figure 6.16 Transformed Sample (Social Performance)
Figure 6.17 shows that the algorithm converges. As with the imputation models of the other performance areas, it took somewhat longer for one chain to converge. However, it passes the convergence test.
Note: 5 chains run from different starting locations. If all chains move towards the same value indicated by the dashed line, the chains successfully converged. Source: own data
Figure 6.17 Convergence Plot for Social Performance
According to the density plots in Figure 6.18, there is a misfit between the observed and imputed values for the variables.4 This is especially relevant for the poverty indicators from the LIS project and the Generosity Index from CWED2, which have over 50% missing values. As can be seen, the imputation model replaces the missing values with values that indicate a higher inequality than the values for the observed sample. This is plausible given the lower GDP per capita and the higher educational and health inequality of the cases that are missing. This is visualized in the appendix. This shows that the imputation model is capable of imputing plausible and useful values.
4 I use the following auxiliary variables to support the imputation: GDP per capita (WDI), Educational Inequality (V-Dem), Health Inequality (V-Dem), Political Freedom Dimension (DeMaX), Political Equality Dimension (DeMaX), Political and Legal Control Dimension (DeMaX) and Public Social Expenditure (OECD). I also include all relevant independent variables discussed in the next chapters, so that the imputation and analysis model are similar.
Source: own data
Figure 6.18 Density Plots for Social Performance
Predictive performance is good overall. There seems to be only a minor misfit with regard to the Combined Generosity Index, because some confidence intervals at the lower end of the scale do not overlap with the straight line (Figure 6.19).
Source: own data
Figure 6.19 LOOCV of Social Performance
The overall KMO value (0.87) suggests that all variables are factorable (see Table 6.7). The KMO values for the single variables are all far above the threshold (0.5). Therefore, I do not exclude any variables from further analysis. In the parallel analysis, three factors have greater eigenvalues than would be expected for a random sample. However, the third factor has a very low eigenvalue and is only just above the line for the artificially created eigenvalues. The RMSEA values suggest four factors. However, a closer look at the four-factor solution (see Figure 145 in the appendix) makes it clear that this solution splits the items into too many factors, creating factors with only a single item and thus making the solution not viable for empirical analysis. There is almost no difference in the RMSEA values for the two- and three-factor solutions. Therefore, I extract only two factors, following the more parsimonious solution (Figure 6.20).
The two-factor solution is shown in Figure 6.21. Five items load onto one factor, whereas four items load onto the second factor. I name the first factor “Economic Equality”. The items have in common that they measure income inequality in the form of poverty or the overall Gini index. They also include insurance against social risks (Generosity Index by CWED2 and means-tested vs. universalistic policy indicator by V-Dem). The second factor represents inequalities on the basis of gender and social groups by focusing on the aspect of equal opportunities. Therefore, I call this factor “Social Equality”. The reliability measured by the omega coefficient is very high for the first factor (economic equality). However, there seems to be considerable measurement error in the second factor, indicated by a value (0.19) far below the targeted threshold of 0.5.
Table 6.7 Kaiser-Meyer-Olkin (KMO) Test for Social Performance
| Indicator | Set 1 |
| --- | --- |
| Overall | 0.87 |
| Combined Generosity Index (CWED2) | 0.93 |
| Poverty Rate (90/10) (LIS) | 0.81 |
| Poverty Rate (80/20) (LIS) | 0.8 |
| Labor force, female (% of total) (WDI) | 0.8 |
| Gini Index (WDI) | 0.97 |
| Secondary School Enrollment, Female (WDI) | 0.85 |
| Access Public Services by Gender (V-Dem) | 0.89 |
| Access Public Services by Social Group (V-Dem) | 0.94 |
| Means-tested vs. universalistic policy (V-Dem) | 0.87 |
Source: own table
Source: own data
Figure 6.20 Parallel Analysis for Social Performance
Source: own data
Figure 6.21 Factor Solution for Social Performance
6.3.5 Domestic Security Performance
The raw values already showed a very symmetrical distribution (see Figure 147 in the appendix), although a transformation of the variables further improves the approximation to a normal distribution. The only exceptions are the homicide and theft rate indicators, which display extreme values. Most of the transformed variables in Figure 6.22 exhibit only small deviations from the normal distribution and should not pose a problem for the imputation model. Less ideal are the indicators for the homicide rate and the reliability of police services, because they seem to have a bimodal distribution, which could affect the imputation model.
Source: own data
Figure 6.22 Transformed Sample (Domestic Security Performance)
In the following, the imputation model is discussed. The EM algorithm converges normally (see Figure 6.23) and passes this test. As before, one chain took a little bit longer to converge.
Note: 5 chains run from different starting locations. If all chains move towards the same value indicated by the dashed line, the chains successfully converged. Source: own data
Figure 6.23 Convergence Plot for Domestic Security Performance
According to the density plots, there is only a misfit between the observed and imputed values for the government spending variable and (to some extent) for the homicide variable.5 The imputed values for government spending are lower and the imputed homicide rates are consistently higher than the observed values. The pattern for the homicide rate is explainable, because the values for countries like Argentina, Peru and Senegal are missing, where higher homicide rates should be expected. The lower government spending is due to the missing values for low-GDP countries. In addition, it can be seen that the imputed values for the reliability of police services also follow a more bimodal distribution (less so for the homicide rate indicator), yielding some confidence in the imputation model (Figure 6.24).
5 I use the following auxiliary variables to support the imputation: GDP per capita (WDI), Confidence in the police force (Gallup World Poll via WGI), Confidence in the judicial system (Gallup World Poll via WGI), Educational Inequality (V-Dem), Means-tested vs. universalistic policy (V-Dem), Social class equality in respect for civil liberty (V-Dem), Power distributed by socioeconomic position (V-Dem), Political Freedom Dimension (DeMaX), Political Equality Dimension (DeMaX), and Political and Legal Control Dimension (DeMaX). I also include all relevant independent variables discussed in the next chapters, so that the imputation and analysis model are similar.
Source: own data
Figure 6.24 Density Plots for Domestic Security Performance
The imputation model has a high predictive performance. There seems to be almost no misfit, because all confidence intervals overlap with the straight line. However, the imputation model does have some trouble predicting the lower end of the homicide rates scale. It consistently places the values higher than they should be (Figure 6.25). The overall KMO test with a 0.71 value shows a satisfactory result, suggesting that the dataset is factorable (see Table 6.8). All KMO values for the single variable are far above the threshold of 0.5. This means I subject all indicators to the exploratory factor analysis. The parallel analysis suggests two factors, although the second factor has a small eigenvalue. However, inspecting the factor solution, it is clear that the second factor consists only of the theft indicator (see Figure 148 in the appendix). As discussed in the previous chapter, the theft indicator is problematic insofar as it is unreliable as a proxy and does not measure an aspect of police performance
6
Aggregating Goal-oriented Performance
Source: own data
Figure 6.25 LOOCV for Domestic Security Performance
Table 6.8 Kaiser-Meyer-Olkin (KMO) Test for Domestic Security Performance
Indicator | Set 1
Overall | 0.71
COFOG Public Order and Safety (per capita) (OECD) | 0.8
Homicide Rate (UNODC) | 0.74
Theft Rate (UNODC) | 0.69
Reliability of Police Services (GCR) | 0.65
Source: own table
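The KMO statistic used in this kind of table compares the squared correlations between indicators with their squared partial correlations. The following Python sketch illustrates the computation for a small correlation matrix. The matrix values are hypothetical and are not the study's data, and the helper names (`invert`, `kmo`) are introduced here for illustration only; the software actually used for the analysis may compute the statistic differently in detail.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kmo(corr):
    """Return (overall KMO, list of per-variable KMO values) for a correlation matrix."""
    n = len(corr)
    inv = invert(corr)
    # Partial correlation of i and j, controlling for all remaining variables.
    partial = [[-inv[i][j] / math.sqrt(inv[i][i] * inv[j][j]) for j in range(n)]
               for i in range(n)]
    per_variable = []
    total_r2 = total_p2 = 0.0
    for i in range(n):
        r2 = sum(corr[i][j] ** 2 for j in range(n) if j != i)
        p2 = sum(partial[i][j] ** 2 for j in range(n) if j != i)
        per_variable.append(r2 / (r2 + p2))
        total_r2 += r2
        total_p2 += p2
    return total_r2 / (total_r2 + total_p2), per_variable


# Hypothetical correlation matrix for four indicators (illustration only).
example = [
    [1.00, 0.50, 0.40, 0.30],
    [0.50, 1.00, 0.45, 0.35],
    [0.40, 0.45, 1.00, 0.50],
    [0.30, 0.35, 0.50, 1.00],
]
overall, per_variable = kmo(example)
print(round(overall, 2), [round(v, 2) for v in per_variable])
```

Values close to 1 indicate that the partial correlations are small relative to the raw correlations, which is the sense in which a dataset is called factorable.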
directly. Somewhat unintuitively, it was suggested that a higher number of reported thefts indicates a higher police performance. Therefore, I drop this indicator from the analysis. The parallel analysis now suggests a one-factor solution (see Figure 6.26). The RMSEA value is not calculable because the factor solution consists of only three items.
Source: own data
Figure 6.26 Parallel Analysis for Domestic Security Performance
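The logic of a parallel analysis can be sketched in a few lines: the eigenvalues of the observed correlation matrix are compared with the average eigenvalues obtained from random data of the same size, and factors are retained as long as the observed eigenvalue exceeds the random reference. The Python sketch below is a simplified variant based on the full correlation matrix (Horn's approach); it is an assumption for illustration, not necessarily the variant used in this study, and all function names and the simulated data are hypothetical.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlation matrix of a list of equally long observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    sd = [math.sqrt(cov[j][j]) for j in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]


def eigenvalues_symmetric(matrix, sweeps=50):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-18:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def parallel_analysis(rows, n_sims=100, seed=1):
    """Return (observed eigenvalues, mean random eigenvalues, suggested number of factors)."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    observed = eigenvalues_symmetric(correlation_matrix(rows))
    sums = [0.0] * p
    for _ in range(n_sims):
        noise = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        for j, ev in enumerate(eigenvalues_symmetric(correlation_matrix(noise))):
            sums[j] += ev
    reference = [s / n_sims for s in sums]
    n_factors = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:
            break
        n_factors += 1
    return observed, reference, n_factors


# Simulated data with one strong common factor (hypothetical, for illustration).
demo_rng = random.Random(42)
demo = []
for _ in range(200):
    f = demo_rng.gauss(0, 1)
    demo.append([0.8 * f + 0.6 * demo_rng.gauss(0, 1) for _ in range(4)])
observed, reference, n_factors = parallel_analysis(demo)
print(n_factors)
```

A second factor whose eigenvalue lies only barely above the random reference, as described in the text, is the typical case in which the suggestion of the parallel analysis is weighed against substantive considerations.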
The one-factor solution is shown in Figure 6.27. The reliability of police services (GCR) loads strongly on the single factor. The homicide rate and government spending for public order and safety contribute somewhat less strongly to the factor scores. I call this factor “Domestic Security”. This solution has a high internal consistency reliability (0.815).
6.3.6 Latent Pattern Maintenance Performance
In contrast to the other performance areas, this performance area is based on survey data with a large number of respondents. Because of the high computational costs, I therefore do not apply multiple imputation to replace the missing values for the latent pattern maintenance performance. Rather, I use full-information maximum likelihood (FIML) methods to deal with missing observations. Accordingly, there are no imputation diagnostics to report for this performance area.
Source: own data
Figure 6.27 Factor Solution for Domestic Security Performance
The overall KMO value of 0.84 suggests that the variables are factorable (see Table 6.9). All values for the single indicators are above the 0.8 threshold. I therefore include all variables in the exploratory factor analysis.
Table 6.9 Kaiser-Meyer-Olkin (KMO) Test for Confidence
Indicator | Set 1
Overall | 0.84
Confidence in Parliament (IVS) | 0.8
Confidence in Civil Service (IVS) | 0.87
Confidence in Judiciary (IVS) | 0.89
Confidence in Parties (IVS) | 0.85
Confidence in Government (IVS) | 0.84
Source: own table
The parallel analysis with all variables suggests three factors. However, the eigenvalues of the second and especially the third factor are very small and only barely above the values that would be expected from a random sample. The RMSEA value for the two-factor solution is a moderate 0.072 (see Figure 150 in the appendix). In the two-factor solution, the factor analysis assigns confidence in the judiciary to its own factor. This is to be expected conceptually. While the other institutions are all more or less political institutions, the judiciary appears to be “the least partisan and least political institution” (Dalton, 2004, p. 35). Therefore, I remove the confidence in the judiciary indicator to get a clearer empirical result: Although the parallel analysis suggests two factors, the eigenvalue of the second factor is minimal. The RMSEA value improves to a good 0.05 for the one-factor solution. I therefore extract only one factor (Figure 6.28).
Source: own data
Figure 6.28 Parallel Analysis for Confidence
All the items load rather strongly on this one factor (see Figure 6.29). This is especially true for the indicator “confidence in parliament”. The indicator for confidence in the civil service loads less strongly and has more unique variance. This might also be conceptually expected, since the civil service might be seen as less of a political institution than the parliament or government. This solution stands in contrast to the approach by Van Ham and Thomassen (2017). They argue that confidence in government can be used to measure support for political authorities, the lowest level of political support in Norris’ typology. However, this claim is not supported by the factor analysis: confidence in the government does not make up its own factor. Overall, I call this factor “confidence performance”. Finally, this solution has a high internal consistency reliability (total omega of 0.832).
Source: own data
Figure 6.29 Factor Solution for Confidence
6.4 Summary and Conclusion
This chapter discusses the aggregation procedure, which is the final step in the creation of the goal-oriented performance indices. In a first step, I laid out the methodological framework, which involves multiple steps: from the transformation of the values, through the treatment of missing values, to the final exploratory factor analysis that yields the factor scores. Along the way, I introduced several important statistical tools and diagnostic procedures. The main goal of the transformation is to bring the distribution of each indicator close to a normal distribution, which supports the multiple imputation procedure and the exploratory factor analysis. The treatment of missing values by multiple imputation and FIML procedures reduces a potential bias in the empirical analysis and improves statistical power. I applied several diagnostics to the imputation model, such as the convergence test, the predictive performance test (LOOCV) and the comparison of the densities of observed vs. imputed values. Finally, the exploratory factor analysis aggregates the multiple indicators into one or more factors. Again, several diagnostic procedures (KMO test, parallel analysis, RMSEA and reliability tests) are used to select variables and test the model fit. The biggest issue is that combining multiple imputation with exploratory factor analysis is very complicated. Although propagating the uncertainty of the multiple imputation into the factor scores is desirable, research in this area is still in its infancy, so this step had to be abandoned.
Table 6.10 summarizes the results of this chapter. First, all diagnostic procedures revealed problems and thus improved the aggregation of the goal-oriented performance measures. These problems were only minor or could be resolved by excluding a specific indicator, allowing the analysis to proceed. Least troublesome was the transformation procedure, as it affected only one variable from the economic performance area. The predictive performance of the multiple imputation algorithm was often more troublesome (e.g., inflation indicator and investment indicator in the economic performance analysis, homicide rate in the domestic security performance analysis). The deviation of the imputed from the observed values could often be explained, so that this deviation actually lends plausibility to the imputation model. Finally, the reliability of the factor scales is often a major weakness: The preferred scale reliability threshold was not reached or only reached for one factor. It is important to note that this threshold for omega was set for clinical studies with individual respondents and individually designed items, and it is unclear whether this threshold is transferable to cross-country studies.
Overall, the factors extracted empirically by the exploratory factor analysis often aligned very well with the concepts created in the previous chapter. The exception is the productivity component of the economic outcome performance, which was not conceptually expected. After creating the factor scores and performance indices, I now move on to the empirical analysis of the data. Chapter 7 analyzes the development of the performance areas over time in a descriptive manner. Chapter 8 then provides a causal explanation of the development of these performance areas using TSCS analysis.
Table 6.10 Summary of Aggregation Procedure
Function | Performance Area | Transformation | Multiple Imputation | Exploratory Factor Analysis | Extracted Factors
Adaptation | Economic Performance | approximately normally distributed with minor deviations | Imputation model with medium predictive accuracy | Partially successful aggregation | Economic Wealth; Productivity
Adaptation | Environmental Performance | approximately normally distributed | Imputation model with medium predictive accuracy | successful aggregation | General Environmental Performance (GEP)
Goal-Attainment | Reformability | strong deviations from normal distribution | n/a | n/a | Amendment Rate
Integration | Social Outcomes | approximately normally distributed | good imputation model | Partially successful aggregation | Economic Equality; Social Equality
Integration | Domestic Security Outcomes | approximately normally distributed | good imputation model | successful aggregation | Domestic Security
Latent Pattern Maintenance | Confidence | n/a | n/a | successful aggregation | Confidence
Source: own table
7 Describing Goal-Oriented Performance
7.1 Introduction
Before moving to the causal analysis via regression models, I explore the development of the goal-oriented performance areas. This is important in itself, but it is also an essential first test for the next part of the study: Do these data make sense, and do the empirical findings align with prior knowledge? Only if the performance data are valid does it make sense to continue with the causal analysis in the next chapter, where these indices are the main dependent variables. This chapter is descriptive and exploratory. In section 7.2, I develop several research questions, which help to evaluate the capabilities of the data. How many values are still missing after the imputation process? How do these performance areas develop over time? What is the relationship between the different performance areas? Do they go hand in hand, or can we identify trade-offs? All these research questions are addressed in section 7.3. While the evaluation of the first two questions is mainly supported by the descriptive presentation of graphics and tables, an exploratory cluster analysis is carried out to answer the last two questions about the relationship between the performance areas. This clustering strategy is similar to the clustering of the democracy profiles in chapter 2. Finally, the last section (7.4) gives a summary.
Electronic supplementary material The online version of this chapter (https://doi.org/10.1007/978-3-658-34880-9_7) contains supplementary material, which is available to authorized users.
© The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2021 O. Schlenkrich, Origin and Performance of Democracy Profiles, Vergleichende Politikwissenschaft, https://doi.org/10.1007/978-3-658-34880-9_7
7.2 Research Questions and Research Strategy
I follow the research strategy of Roller (2005) and Jahn (2016), whose studies serve as a model here. Roller analyzes four aspects of her performance sample: first, the overall performance level over the entire period under study (average for each country); second, the development of performance (the change from one year to the next); third, the question of international convergence in the performance areas; and finally, the structure of performance, namely whether a country can be highly effective in all dimensions or whether there is a trade-off between different goal-oriented performance areas. She identifies several theoretical and empirical patterns (sustainability, classical social democracy, libertarian model, best possible case, worst possible case, economic and socio-political straggler as well as economic straggler) (Roller 2005: 210–218). Jahn’s study (2016, pp. 98–99) on environmental performance examines similar questions but adds one more: What impact did the global financial crisis (2007/2008) have on environmental performance? Therefore, I analyze the following research questions in this chapter:
1. How did the goal-oriented performance areas develop in my sample? Which trends can be identified? Although I analyze the development of the entire sample, this may not be advantageous: The number of countries fluctuates between the years (e.g. new democracies emerge after 1990; some democracies regress into autocracies) and the sample is very heterogeneous. Therefore, I also focus on the OECD founder countries.1 This is a slightly more balanced sample in which the observations do not fluctuate as much, and it is also more homogeneous. For each sample, I compare the median and the 10th and 90th percentile values for each year. The median is more robust to outliers than the mean, while the percentile values show the spread and distribution of the performance values.
2. Is there an impact of the financial crisis in 2007/2008 on these different performance areas? I mark the time of the financial crisis in the following plots with a grey bar to facilitate visual inspection. The following theses are examined in relation to the financial crisis: Economic wealth should decline, and so should productivity. Jahn (2016, p. 99) develops two competing hypotheses for environmental performance: On the one hand, there should be fewer industrial air emissions due to the decline in economic activity and production. This improves the environmental performance. On the other hand, environmental regulations could be eased to generate economic development at the expense of environmental performance. The financial crisis should especially hit the poorer part of the population (e.g. through social welfare retrenchment), so that economic inequality should increase; there should be no impact on social equality. The rising economic inequality and the economic decline in the wake of the financial crisis should lead to an increase in the crime rate and thus to a decline in the performance of domestic security. And finally, confidence in public institutions should decrease.
3. Is there international convergence or divergence? Are the living conditions of the countries becoming increasingly similar? If the countries move towards more social and economic inequality, is this movement similar across the countries? OECD countries should be more closely connected to each other and should therefore move more in the same direction regarding the performance areas. I draw on the coefficient of variation (relative standard deviation)2 to describe this development. A lower coefficient value indicates less dispersion in a performance area, whereas a higher value shows higher variation.
4. Is there a relationship between the different performance areas? Do these performance areas go hand in hand, or are there trade-offs between these goals (e.g. economy vs. environment)? I identify types or patterns of performance using cluster analysis. Instead of using an arbitrary cut-off value to classify these performance types as Roller does, the cluster analysis helps to formalize this process. Although this analysis is more heuristic, it draws on the same cluster validation strategies as the previous cluster analysis for identifying democracy profiles (see chapter 2). I cluster the entire sample in order to encompass the most heterogeneous set of cases.
However, before I answer these questions, it is useful to examine the temporal and spatial distribution of the performance indices. Hence, another research question is:
5. How many values are still missing in these indices? How much did imputation improve the sample size?
1 In my sample, these are Austria, Belgium, Canada, Denmark, France, Germany, Greece, Ireland, Italy, Luxembourg, Netherlands, Norway, Portugal, Spain, Sweden, Switzerland, United Kingdom, United States of America. I exclude Turkey from this sample for stability, because I have only a few observations for Turkey.
2 It is the ratio of the sample standard deviation $s$ to the sample mean $\hat{x}$. It is then multiplied by 100 to convert the value to a percentage: $CV = \frac{s}{\hat{x}} \times 100$
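The coefficient of variation used for the convergence question can be sketched in a few lines of Python. The function name and the yearly scores below are hypothetical and serve only to illustrate the formula (sample standard deviation divided by sample mean, times 100); they are not taken from the study's data.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation relative to the sample mean, in percent."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / mean * 100


# Hypothetical index scores of three countries in two years (illustration only).
hypothetical_scores = {1990: [40.0, 55.0, 70.0], 2010: [60.0, 65.0, 70.0]}
for year, scores in hypothetical_scores.items():
    print(year, round(coefficient_of_variation(scores), 1))
```

In this made-up example the coefficient falls between the two years, which would be read as convergence: the countries' scores lie closer together relative to their mean.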
Finally, as mentioned in the aggregation chapter 6, the values of the indices themselves are transformed to a scale from 0 to 100 to make them easier to interpret. It is not possible to conceptualize exactly what “bad” or “good” performance is, so an empirical yardstick is used: The worst performance is set at the 2.5th percentile of the factor scores, and the best performance is set at the 97.5th percentile. This transformation was conducted over the whole time series in order to maintain the comparability of these scores over time.
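The percentile-based rescaling can be illustrated with a short Python sketch. It assumes that scores below the 2.5th percentile are set to 0 and scores above the 97.5th percentile are set to 100 (this clipping is an assumption, as the text does not spell it out), and it uses linear interpolation between ranks for the percentiles. The function names are hypothetical.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between neighbouring ranks (q in 0..100)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def rescale_0_100(scores, low_q=2.5, high_q=97.5):
    """Map pooled factor scores to 0..100 using empirical percentile anchors.

    Assumption: values outside the two anchors are clipped to 0 and 100.
    """
    low = percentile(scores, low_q)
    high = percentile(scores, high_q)
    if high == low:
        raise ValueError("percentile anchors coincide; cannot rescale")
    return [min(100.0, max(0.0, 100.0 * (s - low) / (high - low))) for s in scores]


# Hypothetical factor scores pooled over all countries and years.
pooled_scores = list(range(41))
rescaled = rescale_0_100(pooled_scores)
print(rescaled[0], rescaled[20], rescaled[40])
```

Passing the scores of all years at once, rather than rescaling year by year, mirrors the point made above: the anchors stay fixed, so a score of 50 means the same thing in every year.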
7.3 Descriptive and Exploratory Empirical Analyses
7.3.1 Temporal and Spatial Distribution of the Performance Indices
Figure 7.1 shows the missing values for each index after imputation. The missingness rate is calculated in relation to the whole sample of democracies. The indices for economic and social equality, which measure the performance of social outcomes, go furthest back in time, starting in the 1970s. They cover almost 70% of the sample for the most recent years. These indices include values for North America, parts of South America, Europe, and Australia and Oceania. They even cover (for some time points) India, Mongolia and parts of Southern Africa (see the world maps in Figure 151 in the appendix). The indices for wealth and productivity, which evaluate the economic performance, start in the 1980s and cover around 90% of the cases in the sample for the most recent years. In contrast to the economic and social equality indices, economic performance encompasses more countries from Eastern Europe. The index for domestic security starts in the 1990s and covers about 75% of all cases in the sample. Its spatial distribution is similar to that of the above indices, covering countries from very different regions. The goal-attainment indices, which refer to the amendment rates, have different sample sizes. Although both indices cover data since the 1950s, the amendment rate of the CCP covers almost 100% of cases. The index by Lutz covers about 60% of the sample until 1990 and is more constrained thereafter, providing data for only 25% of the cases. While the CCP gives values for most countries in all regions, the Lutz index is mainly focused on Europe. It also covers the USA, two countries from South America (Chile and Argentina), India and Australia. However, the data are not a real time series: the values refer to whole time periods (for the CCP indicators, the duration of the constitutions; e.g. the value for Germany has not changed since 1949). The data for environmental performance begin in the 1990s and cover approximately 50% of all cases. These data are restricted to OECD countries, but also contain data for Argentina, Brazil, India, Indonesia and South Africa. Finally, there is less data for the confidence index in the latent pattern maintenance function of performance. Data collection only started in the 1990s. The index covers around 40% of the cases only sporadically, and often less. However, this is not surprising, since the data are based on surveys which are not conducted on a regular basis. Nevertheless, over the years, the surveys cover North America, South America, Europe, Australia and Oceania. They even include India and some parts of Southern Africa.
Note: Missingness is calculated in relation to the whole sample of democracies. Black bars indicate the amount of missing values. Source: own data
Figure 7.1 Missing Values for Each Performance Area After Imputation
What does this imply for the descriptive analysis in this chapter and the regression-based empirical analysis in the next chapter? On the one hand, there are data for a diverse set of countries spanning almost all regions. On the other hand, data for such a large number of countries are only available for recent years and often only for a few points in time. Nevertheless, a time-series analysis is possible with most indices. There should be sufficient statistical power for the economic and social performance functions. The same applies to the domestic security performance. The environmental performance can only be analyzed with less power due to the high number of missing values and the short time span. Finally, no time-series analysis is possible for the goal-attainment index and the confidence index due to the nature of the data. These two indices will therefore not be analyzed longitudinally, but across countries by focusing on a few points in time.
7.3.2 Development of Goal-Oriented Performance
7.3.2.1 Economic Performance
The factor analysis resulted in two factors for the economic performance area: economic wealth and productivity. Figure 7.2 presents the average value for the wealth component for all countries and for the OECD countries. Both samples show a positive trend: Wealth has increased since the 1980s. The dip at the beginning of the 1990s in the sample consisting of all countries is due to the inclusion of the newer democracies. For the OECD countries, there is a strong positive trend. This trend stagnated somewhat due to the financial crisis in 2008, but the impact of the financial crisis on the wealth component is minimal. There is also a very small convergence trend: The sample with all countries has converged since the 1990s, whereas the OECD sample converges constantly.
Table 7.1 presents the top performers in the wealth component of economic performance. Luxembourg has consistently held the first rank since 1985. Norway holds the second place since the 1990s, but drops to the third place in 2015. Switzerland holds the third rank since the 1990s. The United States of America declined from rank 3 (1980) to rank 4 (1990–2010) and finally rank 5 (2015). Denmark is on the fourth or fifth rank, but is displaced by the Netherlands and especially Ireland (since 2005). Ireland holds the second place since 2015 (the values of all countries can be seen in Table 68 in the appendix).
Note: In the top figures, the thick line represents the median value. The error bars indicate the 90th and 10th percentiles. The bottom figures show the coefficients of variation (in percent). A smaller percentage indicates less variation. Each figure also shows the Pearson correlation coefficient indicating the strength of the trend. Source: own data
Figure 7.2 Economic Performance—Wealth
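The summary statistics behind trend figures of this kind (yearly median, 10th and 90th percentile, and a Pearson correlation as a trend indicator) can be sketched as follows. The sketch assumes that the trend coefficient is the correlation between the year and the yearly median; this is an interpretation for illustration, and the function names and scores are hypothetical.

```python
import math
import statistics


def percentile(values, q):
    """Percentile with linear interpolation between neighbouring ranks (q in 0..100)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def yearly_summary(scores_by_year):
    """Return {year: (10th percentile, median, 90th percentile)} in year order."""
    return {year: (percentile(v, 10), statistics.median(v), percentile(v, 90))
            for year, v in sorted(scores_by_year.items())}


def median_trend(scores_by_year):
    """Correlation between year and yearly median (assumed trend indicator)."""
    summary = yearly_summary(scores_by_year)
    years = list(summary)
    medians = [summary[y][1] for y in years]
    return pearson(years, medians)


# Hypothetical index scores per year (illustration only).
scores = {2000: [10, 20, 30], 2001: [20, 30, 40], 2002: [30, 40, 50]}
summary = yearly_summary(scores)
trend = median_trend(scores)
print(summary, trend)
```

Using the median rather than the mean keeps single outlying countries from driving the yearly summary, which is the motivation given in the research strategy.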
Table 7.1 Top Performer of Economic Performance: Wealth
rank | 1980 | 1985 | 1990 | 1995 | 2000 | 2005 | 2010 | 2015
1 | CHE | LUX | LUX | LUX | LUX | LUX | LUX | LUX
2 | LUX | CHE | NOR | NOR | NOR | NOR | NOR | IRL
3 | USA | USA | CHE | CHE | CHE | CHE | CHE | NOR
4 | DNK | DNK | USA | USA | USA | USA | USA | CHE
5 | CAN | CAN | DNK | DNK | IRL | IRL | IRL | USA
Note: CHE = Switzerland, LUX = Luxembourg, NOR = Norway, USA = United States of America, CAN = Canada, DNK = Denmark, IRL = Ireland Source: own data
The second component of economic performance, productivity, does not show a clear trend (see Figure 7.3). There are ups and downs. The downward trend at the beginning of the 1990s in the whole sample and the OECD sample is caused by the economic crisis at that time, which hit especially Sweden and Finland. In contrast to the wealth component, the impact of the financial crisis in 2008 is evident, especially for the OECD sample. Although the average value drops, the downward shift of the 10th percentile is more interesting: Not all countries are equally affected. Especially Spain and Greece experience a sharp decline (see Table 69 in the appendix). These two economic crises can also be identified with the coefficient of variation: its value rises at the beginning of the 1990s and especially since 2008, indicating a higher variation in these two samples. Between these two crises, from the mid-1990s to the mid-2000s, there is a small trend towards convergence.
Note: In the top figures, the thick line represents the median value. The error bars indicate the 90th and 10th percentiles. The bottom figures show the coefficients of variation (in percent). A smaller percentage indicates less variation. Each figure also shows the Pearson correlation coefficient indicating the strength of the trend. Source: own data
Figure 7.3 Economic Performance—Productivity
Because these data show more variation, Figure 7.4 does not present the top performers in five-year intervals, but rather shows the countries with the strongest positive and the strongest negative trends: Whereas developing countries like Barbados, Botswana, India, Jamaica or Trinidad and Tobago show a steep positive trend, more developed countries show a negative trend. Luxembourg shows a continuing downward trend. Sweden experiences a sharp drop in the 1990s. Lastly, Iceland, Cyprus and Greece suffer a drop due to the financial crisis.
Source: own data
Figure 7.4 Economic Performance—Productivity (selected sample)
7.3.2.2 Environmental Performance
The exploratory factor analysis revealed one factor, namely the general environmental performance. Figure 7.5 presents the empirical findings. There is a strong positive trend and almost no difference between the OECD founding countries and the whole sample. The effects of the financial crisis 2008 have caused this trend to stagnate, but the overall impact is rather small. Furthermore, a trend towards convergence is clearly apparent.
Note: In the top figures, the thick line represents the median value. The error bars indicate the 90th and 10th percentiles. The bottom figures show the coefficients of variation (in percent). A smaller percentage indicates less variation. Each figure also shows the Pearson correlation coefficient indicating the strength of the trend. Source: own data
Figure 7.5 Environmental Performance—GEP
The top performers are shown in Table 7.2. The undisputed leader is Switzerland, which has taken the top spot since the 1990s. Japan is in second place, but has fallen back to the third rank since 2000 and left the top five since 2010. Costa Rica takes the second place in 2005 and the third place since 2010. Sweden’s performance has risen from the fifth to the second place in 2010. Finally, there are fluctuations between Austria, Norway, Luxembourg and the Netherlands, which compete for the remaining ranks (the values for all countries can be seen in Table 70 in the appendix).
Table 7.2 Top Performer of Environmental Performance: GEP
rank | 1990 | 1995 | 2000 | 2005 | 2010 | 2015
1 | CHE | CHE | CHE | CHE | CHE | CHE
2 | JPN | JPN | SWE | CRI | SWE | SWE
3 | NOR | AUT | JPN | SWE | CRI | LUX
4 | AUT | NOR | NLD | NLD | NLD | IRL
5 | SWE | SWE | NOR | JPN | LUX | CRI
Note: CHE = Switzerland, LUX = Luxembourg, NOR = Norway, JPN = Japan, SWE = Sweden, AUT = Austria, NLD = Netherlands, CRI = Costa Rica, IRL = Ireland Source: own data
7.3.2.3 Goal-Attainment Performance
The amendment rates only show cross-country variation. I therefore visualize the values on a world map (see Figure 7.6). According to the Lutz measure, Japan, Australia, the United States of America, Denmark and Spain have the lowest amendment rates. In contrast, frequent amendments can be found in New Zealand, India, Portugal, Austria and Sweden. The CCP amendment rate comes to a somewhat different conclusion because it encompasses more cases and even deficient democracies: It identifies, among others, Tunisia, Timor-Leste, Thailand and Japan as countries with a low amendment rate, while the USA and Australia take a middle position. On the other hand, higher amendment rates can be found in Brazil, New Zealand, Mexico, Austria and Georgia. The difference between the CCP and Lutz measures mainly concerns European countries (e.g. France or Portugal). It is important to highlight that these variables were not subjected to the aggregation and diagnostic procedure from the previous chapters. Values for all countries can be found in Table 71 in the appendix.
Source: own data
Figure 7.6 Goal-Attainment Performance: Amendment Rates
7.3.2.4 Social Performance
According to the exploratory factor analysis, social performance consists of two factors: economic equality and social equality. Figure 7.7 presents the development of the first component, economic equality. A rather strong negative trend can be observed for both the whole sample and the smaller OECD sample. Even as countries become wealthier, economic equality has decreased. Interestingly, the error bars of the 10th percentile increased at the beginning of the 1990s, indicating that the newer democracies, which entered the sample after the collapse of communism, have a higher economic inequality. The OECD sample shows a sharp decline in the 1980s and then stabilizes. The impact of the financial crisis in 2008 is obvious, at least for the OECD sample. However, as shown, economic inequality increased even before the financial crisis. The performance of the economic equality factor declined and has not yet recovered. The coefficient of variation rose sharply for the whole sample until 1995, suggesting that the sample became more heterogeneous. Since then, however, the empirical findings indicate a trend towards convergence: All countries are simultaneously experiencing a loss of economic equality.
Note: In the top figures, the thick line represents the median value. The error bars indicate the 90th and 10th percentiles. The bottom figures show the coefficients of variation (in percent). A smaller percentage indicates less variation. Each figure also shows the Pearson correlation coefficient indicating the strength of the trend. Source: own data
Figure 7.7 Social Performance—Economic Equality
The top performers in the economic equality component of social performance are the Nordic countries (see Table 7.3): Sweden holds the first place almost unchallenged until 1995. Finland, Denmark and Norway can be found among the top five performers throughout the entire period. Iceland has developed into a top performer since 2000. Austria, Germany and the Netherlands, and even the United Kingdom, can be found on the fourth and fifth rank in the 1970s. Interestingly, newer democracies like the Czech Republic and Slovakia are also top performers since 1995. For the values of all countries, see Table 72 in the appendix.
The second latent factor of social performance is social equality. While the OECD sample shows a steady positive trend, the sample with all countries shows no trend (see Figure 7.8). With the emergence of newer democracies since the 1990s, the 10th percentile increases. This indicates that especially these newer democracies show a lower level of social equality. There is no noticeable effect of the financial crisis in 2008. There is a trend towards convergence for the OECD countries, whereas the whole sample shows a stronger diversification. However, this trend has stagnated since 2000.
Table 7.3 Top Performer of Social Performance: Economic Equality
rank | 1970 | 1975 | 1980 | 1985 | 1990 | 1995 | 2000 | 2005 | 2010 | 2015
1 | SWE | SWE | SWE | SWE | FIN | SWE | NOR | NOR | ISL | NOR
2 | DNK | FIN | FIN | FIN | SWE | FIN | ISL | SWE | SWE | ISL
3 | DEU | DNK | DNK | BEL | NOR | DNK | DNK | DNK | NOR | SWE
4 | AUT | NLD | BEL | NLD | BEL | NOR | SWE | ISL | CZE | CZE
5 | GBR | BEL | NLD | DNK | DNK | SVK | NLD | CZE | DNK | DNK
Note: CZE = Czech Republic, SVK = Slovakia, FIN = Finland, NOR = Norway, SWE = Sweden, DNK = Denmark, ISL = Iceland, BEL = Belgium, DEU = Germany, AUT = Austria, GBR = United Kingdom, NLD = Netherlands Source: own data
Note: In the top figures, the thick line represents the median value. The error bars indicate the 90th and 10th percentiles. The bottom figures show the coefficients of variation (in percentage). A smaller percentage indicates less variation. In each figure, the Pearson correlation coefficient indicating the strength of the trend is shown. Source: own data
Figure 7.8 Social Performance—Social Equality
7.3 Descriptive and Exploratory Empirical Analyses
Social equality is highest in the Scandinavian countries Denmark, Sweden and Norway (see Table 7.4). They are only surpassed by Belgium in 1995. Germany, Austria and even Australia are among the top performers until 1985. The Netherlands holds the fifth rank from 1975 to 2000 (with minor interruptions), and since 2005 even the third rank. Ireland can also be found on the fifth and then the fourth rank since 2010 (see Table 73 in the appendix).
Table 7.4 Top Performer of Social Performance—Social Equality
Rank  1970  1975  1980  1985  1990  1995  2000  2005  2010  2015
1     DNK   DNK   SWE   SWE   NOR   BEL   BEL   BEL   BEL   BEL
2     DEU   SWE   DNK   BEL   BEL   SWE   SWE   NOR   NOR   NOR
3     SWE   DEU   BEL   DNK   SWE   NOR   NOR   NLD   NLD   NLD
4     AUS   BEL   DEU   NLD   DNK   DNK   DNK   DNK   DNK   IRL
5     AUT   NLD   NLD   DEU   NLD   FIN   NLD   SWE   IRL   SWE
Note: NLD = Netherlands, FIN = Finland, NOR = Norway, SWE = Sweden, DNK = Denmark, IRL = Ireland, BEL = Belgium, DEU = Germany, AUT = Austria, AUS = Australia. Source: own data
7.3.2.5 Domestic Security Performance
For the whole sample, no clear trend is apparent in domestic security performance until 2005, but performance has been improving slightly since 2005 (see Figure 7.9). The sharp rise in the performance value in 1995 does not indicate a real improvement in performance but rather reflects the decrease in sample size, which excludes countries with a lower performance (from 50 in 1994 to 27 in 1995; see also the missing values in Figure 7.1). However, there is a positive trend for the OECD sample. In addition, no general effect of the financial crisis is visible. Instead, the impact is concentrated on the countries most affected by the financial crisis: there is a decline in domestic security performance for single countries like Greece, Portugal and Cyprus (see Table 74 in the appendix). The coefficient of variation also fluctuates without a clear trend.
The top performers in domestic security performance belong to the Nordic countries (see Table 7.5): Iceland, Norway, Denmark and Finland can be found on the top ranks for the whole period from 1990 to 2015 (however, Sweden is not a top performer). Switzerland improves from rank 5 to rank 1 since 2005, while Iceland drops out of the top performers. Luxembourg improves to the second rank. Other top performers in this area are Germany and Australia.
Note: In the top figures, the thick line represents the median value. The error bars indicate the 90th and 10th percentiles. The bottom figures show the coefficients of variation (in percentage). A smaller percentage indicates less variation. In each figure, the Pearson correlation coefficient indicating the strength of the trend is shown. Source: own data
Figure 7.9 Domestic Security Performance
Table 7.5 Top Performer of Domestic Security Performance
Rank  1990  1995  2000  2005  2010  2015
1     ISL   ISL   CHE   CHE   CHE   CHE
2     NOR   CHE   LUX   ISL   LUX   LUX
3     JPN   NOR   ISL   NLD   NLD   NLD
4     DNK   DNK   NOR   LUX   ISL   NOR
5     CHE   DEU   DEU   DEU   AUS   AUS
Note: NLD = Netherlands, NOR = Norway, DNK = Denmark, ISL = Iceland, CHE = Switzerland, LUX = Luxembourg, DEU = Germany, AUS = Australia. Source: own data
7.3.2.6 Confidence Performance
The confidence in political institutions is measured in the latent pattern maintenance function: The higher the confidence in institutions, the higher the performance in this area. I have estimated country means by aggregating the survey data using the survey weights provided by the ESS and WVS. In order to compensate for the irregular collection of survey data over time, I aggregated the data in five-year intervals starting from 1990 onwards. There might be a small decrease in confidence but overall, no apparent trend is discernible for the entire sample and for the smaller OECD sample (see Figure 7.10). This is surprising, given other research that states that there “is a contemporary malaise in the political spirit involving the three key elements of representative democracy
Note: In the top figures, the thick line represents the median value. The error bars indicate the 90th and 10th percentiles. The bottom figures show the coefficients of variation (in percentage). A smaller percentage indicates less variation. In each figure, the Pearson correlation coefficient indicating the strength of the trend is shown. Source: own data
Figure 7.10 Latent Pattern Maintenance Performance—Confidence
[…]: politicians, political parties, and parliament” (Dalton, 2004, p. 37; similar Pharr & Putnam, 2000). Or in a similar vein, that “satisfaction with democracy and intermediary political institutions declined considerably in West European democracies” (C.J. Anderson & Guillory, 1997, p. 67). The financial crisis had no impact on the confidence in institutions. However, there is a trend towards divergence for both the whole sample and the OECD sample. The confidence values between the countries are more spread out. The drop in 2005 is artificially caused by the smaller sample size for this period in the OECD sample. Values for all countries can be found in Table 75 in the appendix.
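The aggregation step described for the confidence indicator (weighted country means, pooled into five-year intervals from 1990 onwards) can be sketched as follows. This is only a minimal illustration under stated assumptions: the record layout, the helper names `interval_start` and `weighted_country_means`, and all numbers are hypothetical and are not taken from the ESS or WVS files or from the author's own code.

```python
from collections import defaultdict

# Hypothetical respondent-level records: (country, survey_year, confidence, weight).
# The values are invented for illustration; they are not ESS or WVS data.
responses = [
    ("SWE", 1991, 3.0, 1.0),
    ("SWE", 1993, 2.0, 3.0),
    ("SWE", 1996, 4.0, 2.0),
    ("DEU", 1990, 2.0, 1.0),
    ("DEU", 1994, 3.0, 1.0),
]


def interval_start(year, first_year=1990, width=5):
    """Map a survey year to the first year of its five-year interval."""
    return first_year + ((year - first_year) // width) * width


def weighted_country_means(records):
    """Return {(country, interval_start): survey-weighted mean confidence}."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for country, year, value, weight in records:
        key = (country, interval_start(year))
        weighted_sum[key] += weight * value
        weight_total[key] += weight
    return {key: weighted_sum[key] / weight_total[key] for key in weighted_sum}


means = weighted_country_means(responses)
for key in sorted(means):
    print(key, round(means[key], 3))
```

Pooling all respondents of an interval before averaging means that survey waves with larger total weight contribute more to the interval mean; averaging wave means instead would be an equally defensible choice that the text does not settle.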
7.3.2.7 Discussion of Development of Goal-Oriented Performance
What do these empirical findings imply for the hypotheses formulated at the beginning of this chapter? Over the years, there seem to be significant improvements in economic performance as well as environmental performance. At least for the OECD countries, there is also an improvement in social equality performance and domestic security performance. However, there are also negative developments: there is a troubling rise in economic inequalities. This partially confirms post-democratic theories (Crouch, 2004; Merkel, 2014), which assume rising economic inequalities.
Turning to the hypothesis about the impact of the financial crisis, the findings are rather mixed. There is a clear impact on the productivity component of economic performance, and there is also evidence for an impact within economic equality and domestic security performance. However, the countries are often not affected equally; rather, the impact on the performance areas seems to be concentrated on the few countries that were hit hardest by the financial crisis (Spain, Greece, Portugal and Cyprus).
Finally, there is some evidence for the convergence thesis. There is a strong convergence movement in environmental performance and, to a lesser extent, in the economic wealth component and in the aspect of economic inequality. In addition, there is further evidence that the financial crisis hit the countries differently, because the financial crisis often coincides with a spike in the coefficient of variation (for instance in economic performance). Table 7.6 summarizes the findings.
Table 7.6 Summary Table for the Development of Goal-Oriented Performance
Goal-Oriented Performance               Factor                             Trend for all democracies  Trend for OECD countries  Impact of Financial Crisis  Convergence
Economic Performance                    Wealth                             Positive                   Positive                  No                          Yes
Economic Performance                    Productivity                       No trend                   No trend                  Yes                         No
Environmental Performance               General Environmental Performance  Positive                   Positive                  No                          Yes
Social Performance                      Economic Equality                  Negative                   Negative                  Yes                         Yes
Social Performance                      Social Equality                    No trend                   Positive                  No                          Yes
Domestic Security Performance           Domestic Security                  No trend                   Positive                  Yes                         No pattern
Latent Pattern Maintenance Performance  Confidence in Institutions         No trend                   No trend                  No                          No
Note: Goal-Attainment Performance is not included in this table as it is not a time-series. Source: own data
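The descriptive quantities behind the trend and convergence columns (yearly median, 10th and 90th percentiles, coefficient of variation in percent, and a Pearson correlation with time) can be illustrated with a short sketch. The scores below are invented, and several details are assumptions rather than the author's documented procedure: the use of the sample standard deviation, the linear-interpolation percentile, and the helper names.

```python
import math
import statistics

# Hypothetical yearly cross-sections of a performance index (one value per country).
scores_by_year = {
    2000: [0.2, 0.4, 0.6, 0.8],
    2005: [0.3, 0.45, 0.6, 0.75],
    2010: [0.4, 0.5, 0.6, 0.7],
}


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def coefficient_of_variation(values):
    """Sample standard deviation relative to the mean, in percent (assumed variant)."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


years = sorted(scores_by_year)
medians = [statistics.median(scores_by_year[y]) for y in years]
p10 = [percentile(scores_by_year[y], 10) for y in years]
p90 = [percentile(scores_by_year[y], 90) for y in years]
cvs = [coefficient_of_variation(scores_by_year[y]) for y in years]

# A positive correlation of the median with time indicates an improving trend;
# a negative correlation of the coefficient of variation indicates convergence.
median_trend = pearson(years, medians)
cv_trend = pearson(years, cvs)
print(round(median_trend, 3), round(cv_trend, 3))
```

In this toy example the median rises while the spread shrinks, which corresponds to the pattern labelled "Positive" trend with "Yes" for convergence in the summary table.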
7.3.3
Types of Goal-Oriented Performance
7.3.3.1 Methodology
In this section, I try to find patterns in the data by applying a cluster analysis to answer the question of whether there is a relationship between the different performance areas. Do these performance areas go hand in hand, or are there trade-offs between these goals (e.g. economy vs. environment)? However, it is important to note that the goal of this clustering section is to understand and inductively learn from the data generated. Therefore, this section is more heuristic than the previous analysis on the clustering of democracy profiles. I do not intend to find the “best” or “most valid” solution here and will therefore apply the validity criteria more loosely. Nevertheless, I apply the same steps as discussed previously in chapter 2, which contains a more detailed description of the cluster analysis algorithms and validation strategies used here.
The central aim of the clustering is to identify performance clusters whose members are very similar in their characteristics. Hence, the clusters should show within-cluster homogeneity. I therefore evaluate this goal with two internal validity criteria, the average within-cluster dissimilarity ($I_{ave.wit}$) and the Pearson Gamma ($I_{Pearson}$) (see chapter 2 for a definition). I also use random cluster solutions (Akhanli & Hennig, 2020; Hennig, 2019) to calibrate and aggregate those fit indices. In addition, the stability or robustness of the cluster solution is checked.
For the clustering process, I use all performance indices except the Lutz index for the goal-attainment performance. The reason is that the spatial coverage of this indicator by Lutz is very limited; including it would reduce the overall sample size too much and hinder the identification of clusters. To increase the sample size as much as possible, I average all existing performance values for each country in three ten-year intervals. Thus, I obtain 100 observations for 37 countries for the three decades (1990–1999, 2000–2009 and 2010–2017).
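To make the two internal validity criteria more concrete, the following sketch computes a simple version of each for a toy dataset. The exact definitions used in the study are given in chapter 2, which is not reproduced here, so this is an assumption-laden illustration: the average within-cluster dissimilarity is taken as the mean Euclidean distance over all pairs sharing a cluster, and the Pearson Gamma as the correlation between pairwise distances and a 0/1 indicator for "different cluster". The data points, labels and function names are hypothetical, and the calibration against random clusterings is not shown.

```python
import math
from itertools import combinations

# Hypothetical two-dimensional performance profiles; illustrative only.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
good_labels = [0, 0, 0, 1, 1, 1]  # respects the two visible groups
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups


def pair_data(data, labels):
    """Return pairwise distances and 0/1 flags (1 = pair lies in different clusters)."""
    distances, different = [], []
    for i, j in combinations(range(len(data)), 2):
        distances.append(math.dist(data[i], data[j]))
        different.append(0.0 if labels[i] == labels[j] else 1.0)
    return distances, different


def average_within(data, labels):
    """Mean dissimilarity over all pairs that share a cluster (lower is better)."""
    distances, different = pair_data(data, labels)
    within = [d for d, flag in zip(distances, different) if flag == 0.0]
    return sum(within) / len(within)


def pearson_gamma(data, labels):
    """Correlation between distances and the different-cluster indicator (higher is better)."""
    distances, different = pair_data(data, labels)
    mean_d = sum(distances) / len(distances)
    mean_f = sum(different) / len(different)
    cov = sum((d - mean_d) * (f - mean_f) for d, f in zip(distances, different))
    var_d = sum((d - mean_d) ** 2 for d in distances)
    var_f = sum((f - mean_f) ** 2 for f in different)
    return cov / math.sqrt(var_d * var_f)


for name, labels in [("good", good_labels), ("bad", bad_labels)]:
    print(name, round(average_within(points, labels), 3), round(pearson_gamma(points, labels), 3))
```

The partition that respects the two groups yields a smaller within-cluster dissimilarity and a larger Pearson Gamma than the mixed partition, which is the direction in which both criteria are read.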
7.3.3.2 Empirical Results
Figure 7.11 shows the results of the calibrated cluster validity indices. The aggregated index shows two peaks: for the k-means algorithm, it clearly favors the three-cluster solution, while PAM seems to favor nine clusters. On the disaggregated component $I_{ave.wit}$, k-means also produces good results for a partition into two and four clusters, whereas the PAM algorithm also shows high values for a split into four and five clusters. $I_{Pearson}$ resembles the aggregated index. Thus, the cluster validity indices suggest the following numbers of clusters: 2, 3 and 4 for k-means and 4, 5 and 9 for PAM.
A valid clustering is also stable. However, the robustness of the cluster solutions varies significantly (see Table 7.7). The stability is good for the two-cluster solution and acceptable for the three-cluster solution. The stability of all other cluster solutions is problematic, implying that there is serious overlap between the clusters, so that they are not clearly separated from each other. According to Hennig (2018, p. 44), a cluster becomes unstable when the Jaccard value drops below 0.6. It is even “dissolved” if the value drops below 0.5. This means that all other cluster solutions have at least one unstable cluster, and dissolved clusters occur in all PAM cluster solutions. However, since this analysis is mostly heuristic, I include the k-means four-cluster solution with one unstable cluster but exclude all PAM solutions with dissolved clusters.3
3
The four-cluster solution of k-Means and the four-cluster solution of PAM are very similar.
Source: own data
Figure 7.11 Calibrated Cluster Validity Indices
Table 7.7 Stability of the Cluster Solution
Clustering                    Clusterwise Jaccard Values
2 Cluster Solution (k-Means)  0.97, 0.96
3 Cluster Solution (k-Means)  0.67, 0.66, 0.61
4 Cluster Solution (k-Means)  0.68, 0.72, 0.56, 0.62
4 Cluster Solution (PAM)      0.77, 0.62, 0.48, 0.52
5 Cluster Solution (PAM)      0.7, 0.57, 0.49, 0.47, 0.46
9 Cluster Solution (PAM)      0.62, 0.43, 0.54, 0.69, 0.56, 0.67, 0.88, 0.78, 0.42
Source: own data
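The stability reading of Table 7.7 can be reproduced mechanically by applying the two thresholds cited from Hennig (below 0.6 unstable, below 0.5 dissolved) to the reported values. The sketch below does this; the dictionary keys and the helper names `jaccard` and `classify` are my own, and the bootstrap resampling that produces the Jaccard values in the first place is not shown.

```python
# Clusterwise Jaccard values as reported in Table 7.7.
jaccard_values = {
    "2 clusters (k-means)": [0.97, 0.96],
    "3 clusters (k-means)": [0.67, 0.66, 0.61],
    "4 clusters (k-means)": [0.68, 0.72, 0.56, 0.62],
    "4 clusters (PAM)": [0.77, 0.62, 0.48, 0.52],
    "5 clusters (PAM)": [0.7, 0.57, 0.49, 0.47, 0.46],
    "9 clusters (PAM)": [0.62, 0.43, 0.54, 0.69, 0.56, 0.67, 0.88, 0.78, 0.42],
}


def jaccard(a, b):
    """Jaccard similarity of two sets of observation identifiers."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def classify(value, unstable_below=0.6, dissolved_below=0.5):
    """Label a clusterwise Jaccard value using the thresholds cited in the text."""
    if value < dissolved_below:
        return "dissolved"
    if value < unstable_below:
        return "unstable"
    return "stable"


summary = {
    name: [classify(value) for value in values]
    for name, values in jaccard_values.items()
}
for name, labels in summary.items():
    print(f"{name}: {labels.count('unstable')} unstable, {labels.count('dissolved')} dissolved")
```

Applied to the table, the two- and three-cluster k-means solutions contain only stable clusters, the four-cluster k-means solution has one unstable cluster, and every PAM solution contains at least one dissolved cluster, which matches the selection made in the text.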
We now turn to the empirical findings of the cluster analysis. The two-cluster solution splits the dataset into a low and a high performing group (see Figure 7.12). These performance profiles are symmetrically shaped. All performance areas are on an identical level, so that this solution is not able to capture trade-offs or differences. The low performing group consists of countries from South America,
Note: The world map shows the most recent classification of a country (typically 2010). Source: own data
Figure 7.12 Principal Component Plot, Boxplot and World Map for the 2 Cluster Solution
Southern and Eastern Europe (at least for the 1990s period) and the Baltic countries, while the top-performing group contains North America, the Scandinavian and Central-European countries (as well as Japan and Australia).
Figure 7.13 for the three-cluster solution is very similar to the previous solution but introduces an additional medium performing group. As before, the clusters of this solution are almost symmetrically shaped (with the exception that the lowest performing group shows a somewhat higher goal-attainment performance and latent pattern maintenance performance). The medium performing group is
Note: The world map shows the most recent classification of a country (typically 2010). Source: own data
Figure 7.13 Principal Component Plot, Boxplot and World Map for the 3 Cluster Solution
especially characterized by a very low economic performance, high economic and social inequalities and low domestic security performance. The high performing group consists of Central European countries (e.g. France, Germany, Switzerland) and the Nordic countries (Sweden, Denmark, Norway). The USA, Canada, Australia and New Zealand also belong to this group. At the opposite end, there is a low performing group, which mainly contains Latin American countries like Colombia, Brazil and Argentina. However, India and Turkey belong to this group as well. In contrast to the two-cluster solution, countries from Eastern Europe and Southern Europe (Spain, Greece) as well as the Baltic countries (Latvia, Estonia, Lithuania) are now part of the medium performance group.
The four-cluster solution is presented in Figure 7.14. The principal component plot shows that the clusters overlap seriously with each other, which highlights the instability of this solution. Nevertheless, it is the first cluster solution which shows some asymmetrical clusters. The clusters have the following characteristics: There is still a cluster which has a top performance in all areas. However, the number of countries belonging to this cluster is significantly reduced: now only Central European countries (Germany, Austria, Switzerland) and Nordic countries (Iceland, Sweden, Norway, Finland) belong to this group. The lowest performing group is identical in its characteristics to that of the previous cluster solution, and its membership has not changed either. In contrast to the previous solution, the medium performers are split into two subgroups: one subgroup is characterized by a higher social performance, while it lacks economic, environmental and domestic security performance. The other subgroup has almost the opposite characteristics: higher economic performance goes hand in hand with a lower social performance and a somewhat lower environmental performance.
Eastern European countries, Baltic states, Greece and Portugal belong to the former group, while the United States, Chile, Australia, United Kingdom and Japan belong to the latter group. Somewhat surprisingly, Spain and Italy can also be found in this group.
7.3.3.3 Discussion of Clustering of Goal-Oriented Performance
The cluster analysis revealed that a variety of different performance profiles can be identified. However, it also showed that there is serious overlap between the clusters and that stable cluster solutions with more than three differentiated clusters therefore could not be obtained. This might not be surprising given the empirical complexity created by comparing such a large number of performance areas simultaneously. Nevertheless, it can be argued that the clusters found have some validity: often these clusters coincide with regions and align with theoretical assumptions. For instance, the USA has a performance profile with a lower
Note: The world map shows the most recent classification of a country (typically 2010). Source: own data
Figure 7.14 Principal Component Plot, Boxplot and World Map for the 4 Cluster Solution
social performance, Scandinavian countries are top performers, and there is a difference in the performance profile between Southern European countries and Central European countries. This gives some confidence in the cluster analysis. Overall, it seems that performance areas can be reconciled. In every solution I found clusters which balance the different performance areas. There is also
no trade-off between economic and environmental performance. The only asymmetrical shape which was found here and can be understood intuitively is the combination of a strong economic performance with a lower social performance (and vice versa). A strong economy can go hand in hand with social and economic inequalities. Furthermore, it seems that the goal-attainment performance (amendment rate) and the latent pattern maintenance performance (confidence) are not useful for differentiating the clusters: top performance in all areas can be achieved with either a very high or a very low amendment rate. Less easily understandable is the differentiation of high and low confidence performance in the “laggard” group. How is it possible that these systems generate so much confidence even though they lack performance in all other areas?
How can these findings be compared to Roller’s (2005) approach? In contrast to Roller, I included a larger and more diverse set of countries spanning several regions (e.g. India, South American countries). In addition, I used a different methodological approach by not applying a subjective threshold but rather letting the clustering algorithm, together with some diagnostics, decide how to split and classify the dataset. However, there are some similarities to the types found by Roller: I identify almost the same countries (Scandinavian countries, Austria and France) as the best possible cases which have high performance in all areas. I also identified a cluster very similar to Roller’s “libertarian model”, which combines high economic performance with lower environmental and social performance. While there was a large gap between economic and social performance, the difference to the environmental performance was smaller. Nevertheless, in contrast to Roller, I identified more balanced performance profiles; for example, I was not able to identify a “classical social democracy” which has only a high economic and social performance.
7.4
Summary and Conclusion
This chapter analyzed, in a descriptive and exploratory manner, the performance indices which were conceptually defined, measured and aggregated in the previous chapters. A deep understanding and knowledge of this performance data is not only interesting in itself, but is also relevant for this study, because it is a prerequisite for the causal analysis in the next chapter, where these indices are the main dependent variables. Thus, based on the research literature, I have formulated and answered several research questions: First, how did the multiple imputation procedure improve the temporal and spatial distribution? Temporal and spatial distribution varies considerably among
the indices. For example, the economic and social equality indices start in the 1970s, while the data for the environmental indices start only in the 1990s. Surprisingly, countries from a variety of regions are included in the analysis, although, as stated before, non-OECD countries are often included with only a few time points.
The second set of research questions refers to the development of the performance areas. Are there any trends? Was there an impact of the financial crisis? And finally, do these states move in tandem, so that their performance converges? Significant improvement could be identified for the economic and environmental performance. However, a troubling negative development in the form of increasing economic inequalities was also apparent. The financial crisis impacted all countries in some performance areas (productivity component of the economic performance), but otherwise its effects were concentrated on a few countries (Spain, Greece, Portugal and Cyprus). These countries suffered performance losses in areas where the whole sample or the OECD sample did not show a deterioration. Finally, there is a strong convergence movement in the environmental performance and, to a lesser extent, in the economic wealth component.
The last research question addresses types or profiles of performance. What is the relationship between the performance areas? Do these performance areas go hand in hand, or are there trade-offs between these goals? To answer this question, I applied a cluster analysis to the data for the performance areas. The internal validity criteria revealed that several solutions, from only two clusters up to nine clusters, can be identified. However, the solutions with a higher number of clusters were very unstable, indicating that these solutions are not very robust to small changes in the dataset.
All in all, I found symmetrical, balanced clusters (top performers, medium performers, laggards) which have an almost identical level of performance in all areas. However, there is one asymmetrical shape in particular, which combines high economic performance with low social performance (and somewhat lower environmental performance). It seems too far-fetched to speak of a trade-off between economic and environmental performance based solely on this type. Therefore, all in all, no trade-offs were found.
And finally: Did the data make sense? Overall, these empirical findings show that the performance data have a high level of validity. For example, the data show the impact of the financial crisis as well as a declining economic equality, which is also found in several research articles. It also makes intuitive sense which countries were classified as top performers or laggards. The only problematic data are the goal-attainment performance measures in the form of the amendment rates by Lutz and the CCP, which differ considerably for some European countries. Nevertheless, the main causal analysis can be conducted in the next chapter.
Part III Explaining Performance
8
Explaining Goal-Oriented Performance
8.1
Introduction
This chapter is the first part of the causal analysis of this study. It connects the performance measures with the democracy profiles: I examine whether democracy profiles have a causal effect on goal-oriented performance. Specifically, this means that the democracy profiles are the independent variables, while the performance areas are the dependent variables. Chapter 9, the second part of the causal analysis, explains why countries adopt a specific democracy profile as well as whether democracy profiles are linked to other performance types such as varieties of capitalism or welfare states. There, the democracy profiles are the dependent variable.
In order to establish a reasonable and meaningful connection between democracy profiles and the goal-oriented performance of a country in several performance areas, I address two aspects in the literature review in section 8.2: The first is the theoretical basis, while the second concerns the creation of an adaptable and robust statistical model that can be applied to the empirical analysis of this highly complex type of data. I build the theoretical foundation and hypotheses on a discussion of the theoretical assumptions and empirical findings of several key studies on goal-oriented performance. To elaborate the statistical model, I also include a critical assessment of the methodology and robustness checks used in these studies. The theoretical discussion results in the identification of three causal effects of the democracy profiles on performance (section 8.3). I use a time-series cross-sectional Bayesian multilevel framework for this analysis (section 8.4), which possesses the flexibility needed to overcome most of the methodological criticisms mentioned in the literature review. Section 8.5 applies the theoretical and methodological considerations to the actual data and thus tests the hypotheses. The last section (8.6) of this chapter summarizes all findings.
Electronic Supplementary Material: The online version of this chapter (https://doi.org/10.1007/978-3-658-34880-9_8) contains supplementary material, which is available to authorized users.
© The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2021. O. Schlenkrich, Origin and Performance of Democracy Profiles, Vergleichende Politikwissenschaft, https://doi.org/10.1007/978-3-658-34880-9_8
8.2
Literature Review
8.2.1
Criteria for the Literature Review
The central reference point in the discussion of models of democracy and political performance is the study “Patterns of Democracy” by Lijphart (1999, 2012), which is considered a landmark work in comparative political science (Kailitz, 2007; Bormann, 2010, p. 1). It is therefore widely discussed and has sparked considerable criticism. His study and its critique serve to develop important test criteria, which then guide the further literature review. Lijphart analyses the effects of majoritarian and consensus democracy on various performance outcomes. He states that his analysis is directed against the “conventional wisdom” that majoritarian democracies can be more effective in governing while consensus democracy is better at representing. He partially refutes these propositions on the basis of his empirical findings that majoritarian democracies are “not superior to consensus democracies in providing good governance, managing the economy, and maintaining civil peace” (Lijphart, 2012, p. 273); rather, consensus democracies are often superior to majoritarian democracies in these performance measures (e.g. reducing unemployment). In line with “conventional wisdom”, he finds strong evidence that consensus democracy has “kinder and gentler qualities [than majoritarian democracies:] they are more likely to be welfare states; they have a better record with regard to the protection of the environment; they put fewer people in prison and are less likely to use the death penalty; and the consensus democracies in the developed world are more generous with their economic assistance to the developing nations” (Lijphart, 2012, pp. 274–275). However, these significant results are limited to the executive-parties dimension; for the federal-unitary dimension he finds no notable results.
Nevertheless, these findings make him prefer consensus democracy to majoritarian democracies, and he notes that the “consensus option is the more attractive choice for countries designing their first democratic constitutions or contemplating democratic reform” (Lijphart, 2012, p. 296).
This study attracted considerable criticism. I summarize the critique following Müller-Rommel’s (2008) distinction between a theoretical and methodological criticism.1 The main criticism regarding the theoretical component of this study is twofold: First, there is a lack of a theoretical explanation of the relationship between democracy profiles and performance in the sense of unclear or incomplete causal mechanisms. Lijphart argues that the performance advantage of consensus democracies mainly stems from the higher support from all societal sectors for policies which are decided by broad consensus. Moreover, because of the possibility of a rapid alternation of governments, majoritarian democracies can bring about “sharp changes [in policies] that are too frequent and too abrupt” (Lijphart, 2012, p. 257). Finally, consensus democracies are more stable in ethnically fragmented societies because they have the possibility to involve more groups in the decision-making process. However, as Bogaards (2017a, p. 15) states, “how this should translate into lower unemployment or, say, more development aid is not elaborated. Lijphart goes directly from measuring type of democracy to performance, without developing a theory linking the two”. Schmidt (2015, p. 38) gives a concrete example: he points out that the central bank and federalism, two important institutions that are generally considered to affect inflation, are only part of the federal-unitary dimension but not the executive-parties dimension. How, then, do consensus and majoritarian democracy in the executive-parties dimension influence the inflation rate (Schmidt, 2015, p. 38)? Similarly, Bormann (2010, p. 9) argues that the positive and significant effect of consensus democracy on macroeconomic performance “largely depends on the connection to corporatism and central bank independence”.
Critics argue that both factors are not logically related to the concept of consensus democracy and therefore question whether they should be assigned to the executive-parties or federal-unitary dimension (Müller-Rommel, 2008, p. 86). Second, his approach lacks the appropriate control variables. Other theories from the public policy approach are not considered (Schmidt, 2015). The effects of political actors and external factors such as globalization are not taken into account. Two of his three control variables (population size and human development) are based solely on the socioeconomic approach, while the third variable (ethnic fractionalization), which he only includes in his regression on political violence, is still a structural variable but has cultural implications. According to
1
There is also a considerable critique of the conceptualization and measurement of consensus and majoritarian democracy (Müller-Rommel, 2008; Schmidt, 2019; Croissant, 2010; Bogaards, 2017a).
Schmidt (2015), the inclusion of the ideological composition of the governmental parties as a control variable would have the effect that Lijphart’s institutional variable, the executive-parties dimension, becomes insignificant for welfare spending. He states that the “analysis of public policy and of variations in policy profiles will have to go beyond the institution-centred approach that Lijphart has primarily used and will have to consider—in addition to institutions and socioeconomic realities—actors, power resources, policy inheritance and the interaction between regime type and government capacities, such as mechanisms of governance” (Schmidt, 2015, p. 43).
In addition to these theories, informal institutions (Lauth, 2000, 2004; Helmke & Levitsky, 2004) should also be considered. Institutions in the neo-institutionalist sense can be defined as “norm patterns which shape behaviour, and which in turn structure societal action and enhance the security with which citizens can expect reciprocal behaviour from fellow citizens” (Lauth, 2000, p. 23). In contrast to formal institutions, informal institutions are sets of rules that are not officially codified. Informal institutions interact with formal institutions in various ways. The relevant case here is when informal institutions obstruct the working of formal institutions. Lauth (2010a) states that Lijphart misses this aspect and thus overestimates the impact of formal institutions. Informal institutions do not only distort the measurement and classification of majoritarian and consensus democracy, but in particular they distort the relationship between these democracy models and the different performance outcomes: Consensus democracies in corrupt countries operate differently than consensus democracies in contexts with low corruption. Lauth (2010a, p. 55) argues that, by including countries with strong informal institutions in his 1999 and 2012 samples (for instance, Argentina and India), Lijphart partially invalidates his analysis and conclusions in comparison to the more homogeneous sample of the 1984 study.
Another important factor is statehood. Statehood can be defined as the functioning of the monopoly on the use of physical force and administration (Schlenkrich et al., 2016b, 2016a). Lijphart overlooks this aspect, and Fortin (2008b) notes that this may be another reason why Lijphart’s concept cannot easily be generalized to other regions of the world. The performance of majoritarian and consensus democracy also depends on the “role of the state in different contexts, more specifically, its capacity to penetrate, regulate, extract and appropriate resources” (Fortin, 2008b, p. 216). The last two points, informal institutions and statehood, are especially important to consider if the analysis includes countries outside Western Europe or North America.
Having dealt with the theoretical component of the criticism, I now turn to the methodological review of Lijphart’s approach. Müller-Rommel (2008, p. 88)
criticizes the sample selection, which results in a bias: The higher performance of consensus democracies, in contrast to majoritarian democracies, is not based on the higher effectiveness of consensus democracy, but on the poorer economic condition of the majoritarian democracies in Lijphart’s sample. In addition, there are at least three other important methodological aspects: First, is the statistical model adequate for the research question? This can be questioned for Lijphart’s approach. He applies a simple multivariate regression, although he deals with time-series cross-sectional (TSCS) data (see section 8.4 for a detailed explanation). He averages the data for all countries over certain time periods. However, these time periods are arbitrarily determined, both for the dependent and for the independent variables. Due to this artificial reduction of the sample size, the analysis also loses a lot of statistical power. TSCS analysis can “solve some of the classical problems of having too many explanatory variables for too few cases” (Fortin-Rittberger, 2014, p. 390). Second, are diagnostic procedures and robustness checks performed to test the assumptions of the statistical model (Gelman & Hill, 2007, Section 3.6; King & Roberts, 2015; Meuleman et al., 2014)? Important aspects are, for instance, the detection of outliers, residual analysis and posterior predictive checks (see section 8.4 for a detailed discussion). Lijphart discusses the role of outliers throughout his empirical analysis and even removes them where necessary. However, he does not give proper reasons for removing these observations from the statistical model (Schmidt, 2019, p. 336). He does not discuss other diagnostic procedures. Finally, although missing data represents a problem in his analysis (Lijphart, 2012, p. 267), he does not consider resolving this issue using appropriate methodological approaches but decides to continue with a smaller sample size. 
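The loss of observations through period averaging noted above can be illustrated with a small sketch. The panel below is entirely hypothetical (three countries, 30 years, invented values) and is not Lijphart’s data; the function name is likewise an assumption made for illustration.

```python
from statistics import mean


def collapse_to_country_means(panel):
    """Average each country's yearly values into one observation per country."""
    by_country = {}
    for country, _year, value in panel:
        by_country.setdefault(country, []).append(value)
    return {country: mean(values) for country, values in by_country.items()}


# Hypothetical panel: 3 countries observed yearly from 1981 to 2010 (30 years each).
panel = [
    (country, year, base + 0.1 * (year - 1981))
    for country, base in [("A", 1.0), ("B", 2.0), ("C", 3.0)]
    for year in range(1981, 2011)
]

country_means = collapse_to_country_means(panel)

# Country-years that a TSCS model could use.
n_country_years = len(panel)
# Observations that remain once every country is reduced to its period mean.
n_after_averaging = len(country_means)

print(n_country_years, n_after_averaging)
```

In this toy setting, 90 country-years shrink to three observations, and the within-country trend built into the invented values disappears from the averaged data.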
As I have discussed in the previous chapter 6, not taking missing values into account may distort estimates and weaken statistical power. Multiple imputation strategies can be applied to recover missing data. All these are important points of criticism, and these issues need to be solved for an adequate theoretical framework and robust empirical analysis. I generalize these critical questions, resulting in a discussion template that guides the following literature review (see Table 8.1): As far as the theoretical component of the reviewed studies is concerned, I investigate, on the one hand, the causal mechanism linking the democracy models to the performance outcomes, and, on the other hand, I examine if other important theoretical components are taken into account in the analysis. Regarding the methodological component, I focus on the appropriateness of the statistical model for the research question, the applied diagnostic and robustness checks and the handling of missing data. I will start
with studies which have a broad definition of performance, encompassing different areas of performance. Besides Lijphart’s approach, which I discussed above, three other studies fit in this category: Roller (2005), Gerring/Thacker (2005; 2008) and Doorenspleet/Pellikaan (2013). Then I discuss studies that have a narrower focus by targeting single performance areas. This is done in the order of the AGIL typology of political performance (Adaptation, Goal-Attainment, Integration and Latent Pattern Maintenance) created in this study.

Table 8.1 Literature Review Criteria

Theoretical Component
What is the causal explanation for the effects of the democracy model on performance outcomes?
Are other influential factors considered as control variables? These include public policy theories (Schmidt, 2001) such as economic modernization, power resource theory, partisan theory, international factors, historical path dependence and cultural explanations. In addition, the effects of informal institutions and statehood should be considered. The last two points are especially important if the sample consists of very heterogeneous countries.

Methodological Component
Which statistical model is employed? Is it appropriate for the research question?
Which diagnostic procedures and robustness checks are applied (e.g., to detect outliers)?
How is missing data dealt with?

Source: own table
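For the missing-data criterion, one common way to combine results from several imputed data sets is Rubin’s rules. The following is a minimal sketch under stated assumptions: the function name and the three coefficient estimates are hypothetical, and the text above does not specify which pooling procedure a given study uses.

```python
from statistics import mean, variance


def pool_rubin(estimates, squared_standard_errors):
    """Pool one coefficient across m imputed data sets using Rubin's rules.

    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or m != len(squared_standard_errors):
        raise ValueError("need matching inputs from at least two imputations")
    pooled_estimate = mean(estimates)
    within = mean(squared_standard_errors)   # average sampling variance
    between = variance(estimates)            # variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between   # total variance of the pooled estimate
    return pooled_estimate, total


# Hypothetical results for the same coefficient from three imputed data sets.
pooled, total_variance = pool_rubin([0.5, 0.6, 0.7], [0.04, 0.04, 0.04])
print(pooled, total_variance)
```

The between-imputation term is what an analysis on a single filled-in data set would ignore; it widens the standard error to reflect uncertainty about the missing values.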
8.2.2
Literature Review: Democracy Models and Performance
Roller’s (2005) “theoretical model for explaining the performance of liberal democracies” is probably one of the best elaborated of all studies reviewed. Her approach includes a central actor, the government. Government behavior is the most important explanatory variable—through the creation of policy outputs—to explain the outcomes. This underlines the importance of political actors in Roller’s model. Based on rational choice institutionalism, the government’s corridor of action and decision-making is limited by several contextual factors: First, the degree of socio-economic modernity, which limits government actions financially. Second, economic globalization includes external pressures on the government.
Third, the institutional characteristics of the government are influential and represent Roller’s main research interest: She distinguishes democracy models based on formal structures (constitutional majoritarian and constitutional negotiation democracies) and informal structures (informal majoritarian and informal negotiation democracies). While the formal structures represent constitutional veto points (e.g. federalism, bicameralism), informal structures regulate the “relationship between governing and opposition parties” (Roller, 2005, p. 127). In this respect, this distinction is closely linked to the competitive and collective veto points by Birchfield and Crepaz (see chapter 2 for a discussion). It also partly resembles Lijphart’s distinction between the executive-parties and federal-unitary dimensions. Finally, culture is taken into account on a theoretical level, even if it is not included in the final explanatory model due to data constraints. In addition, of all the studies presented here, Roller’s approach has the broadest analytical perspective: She not only focuses on different performance areas but also analyzes three different research questions regarding performance: What causes stability or variability of performance? What influences the level of performance? And what affects the structure of performance (i.e. the relationship between different performance areas)? While I discuss her findings regarding the structure of performance in chapter 7, I focus here on the explanation of the level of performance. She finds that constitutional negotiation democracies show a somewhat higher economic performance, while they also show poorer health and environmental outcomes. In contrast, “informal negotiation democracies generally produce better policy performance than informal majoritarian democracies, particularly in social policy and poverty rates, in municipal waste production” (Roller, 2005, p. 252). 
This means that she can partly confirm Lijphart’s findings of the “kinder and gentler” qualities of informal negotiation democracies, that is, of consensus democracy on the executive-parties dimension. Her findings are not based on TSCS analysis, even though her data would allow such a procedure. Instead, the dependent variables are the countries’ average performance values from 1974 to 1995. However, averaging leads to a loss of information by creating an undifferentiated blend over different time periods, and by reducing the sample size to just 21 observations. Such a small sample results in very low statistical power, making it hard to detect any significant effects. In accordance with her theory, she uses GDP per capita, openness of the economy (imports and exports as % of GDP) and
ideological orientation of the government (Schmidt-index, see footnote 2) as control variables. Finally, several diagnostic procedures are applied (e.g. test for multicollinearity). Gerring/Thacker (2005; 2008) develop a theory about “centripetal democratic governance”. They test their assumption that centripetal democracies (PR, parliamentary, unitary, unicameral) perform better than decentralized democracies (plurality vote, presidential, federal, bicameral). Based on their notion of good governance, they test several different indicators (e.g. bureaucratic quality, tax revenue, participation, GDP per capita, infant mortality, life expectancy) which they sort into three different categories (political development, economic development and human development). In contrast to the other authors, they concentrate on filling a theoretical blind spot by providing more theoretical arguments and more detailed explanations why centripetalism should lead to a higher performance on these indicators: They state that “centripetal institutions encourage strong political parties, corporatist-style interest representation, collegial decisionmaking, and authoritative public administration” (Gerring et al., 2005, p. 568). Strong political parties encourage party government, which in turn leads to a focus on the public interest: “political leaders climb a ladder of abstraction—from parochial interest, to party interest, to the public interest” (Gerring & Thacker, 2008, p. 37). Furthermore, successful conflict mediation implies the recognition of different interests and groups within the decision-making process. They do not discuss this in a more general way as policy conflicts but rather reduce conflict mediation to ethnic conflicts. This makes the whole argument relatively thin: Even though ethnic conflict mediation is surely important, this kind of conflict is only relevant for a few democracies. 
Therefore, a more powerful argument would have been to explain how centripetal institutions resolve disputes in general policy conflicts and support improved policy formulation. They argue that unitarism, parliamentarism and PR systems reduce ethnic violence. Whereas these arguments are sound for parliamentarism and PR electoral systems, because parliamentarism, especially in combination with PR electoral systems, can create a “share-able” government and thus process diverse ethnic interests, they are rather weak for unitarism: Consociationalism and the research on state-nations (Stepan et al., 2010) show that federal arrangements can mediate ethnic conflict (see also Schneckener, 2004b), while unitarism can intensify the conflict. The last causal mechanism for the favorable performance of centripetal institutions is the authoritative and deliberative policy coordination:
2 The Schmidt index (Schmidt, 1992) is a five-point scale measuring the ideology of the governmental party: 1 Hegemony of right-wing (and centre) parties; 2 Dominance of right-wing (and centre) parties; 3 Balance of power between left and right; 4 Dominance of social-democratic and other left parties; 5 Hegemony of social-democratic and other left parties.
“Where authority is centralized and inclusive actors have strong incentives to cooperate, differences are more likely to be resolved in ways that are collectively beneficial. Deliberation […] is more likely to occur” (Gerring & Thacker, 2008, p. 63). They establish empirically that centripetalism indeed results in higher economic, human and political development: Centripetal institutions provide higher bureaucratic quality, higher tax revenues, more participation, higher GDP per capita, lower infant mortality, longer life expectancy and many other benefits (Gerring et al., 2005, p. 576; Gerring & Thacker, 2008, p. 117). They do not base their control variables on a strong theory as Roller does. However, they include a plethora of control variables: Besides a time trend to control for spurious correlation between two similarly trending variables, they pick the regime history of the country, regional dummies, socialist history, legal origin, ethnic fractionalization, population size, center-periphery (distance to financial centers), resource curse (oil and diamond production) and religion (% Protestants and Muslims) as control variables. By also including a variable to control for spatial autocorrelation, they go beyond other approaches, since countries “lying close to one another may display similar values for extraneous reasons (culture, geography, diffusion, and so forth)” (Gerring et al., 2005, pp. 574–575). They include all countries which have been democratic for at least one year according to the Polity index (>0 on the combined democracy score ranging from −10 to 10) over the period from 1960 to 2000, so that between 77 and 126 countries are included. They perform a TSCS analysis without a lagged dependent variable but with a correction for first-order autocorrelation similar to the Prais-Winsten approach. In addition, they employ different model specifications as robustness checks. 
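To make the reference to the Prais-Winsten approach more concrete, the textbook form of the transformation for a single country’s series can be sketched as follows. This is an illustration only, not necessarily the exact procedure used by Gerring and colleagues: the autocorrelation parameter rho is assumed to be known here (in practice it is estimated from residuals), and the function name and numbers are hypothetical.

```python
from math import sqrt


def prais_winsten_transform(series, rho):
    """Apply the Prais-Winsten AR(1) transformation to one unit's time series.

    The first observation is rescaled by sqrt(1 - rho^2) instead of being
    dropped; every later observation is quasi-differenced.
    """
    if not -1 < rho < 1:
        raise ValueError("rho must lie strictly between -1 and 1")
    if not series:
        return []
    first = sqrt(1 - rho ** 2) * series[0]
    rest = [series[t] - rho * series[t - 1] for t in range(1, len(series))]
    return [first] + rest


# Hypothetical yearly values for one country and an assumed rho of 0.5.
transformed = prais_winsten_transform([2.0, 3.0, 4.0], rho=0.5)
print(transformed)
```

In a TSCS setting the same transformation would be applied within each country to the dependent variable and to every regressor before the model is re-estimated.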
Nevertheless, it is not clear whether all challenges of TSCS analysis are resolved and whether they estimate a correctly specified model (e.g. by eliminating all autocorrelation, see section 8.4.1). Doorenspleet/Pellikaan (2013) also analyze what type of democracy performs best. By combining the features of the electoral system (plurality vote vs. proportional representation), political system (centralized vs. decentralized) and the structure of the society (homogeneous vs. heterogeneous), they obtain eight different models of democracy. They do not provide a definition of good performance; rather, they draw on the Worldwide Governance Indicators to measure performance. However, relying on the Worldwide Governance Indicators is problematic, as discussed in chapters 4 and 6. They find that the electoral system always has an impact on the quality of governance: PR outperforms plurality vote. In addition, “[c]entralization is best in homogeneous societies, while decentralization is best in heterogeneous societies” (Doorenspleet & Pellikaan, 2013, p. 259). However, the study has shortcomings. They do not formulate clear theoretical reasons
why one democracy profile is superior to others. They discuss Lijphart’s empirical findings and the study by Gerring/Thacker, but without referring to theoretical reasons or presenting their hypotheses. It is not clear what they define as democracy and, therefore, which cases are selected and on which years the analysis is based. 98 cases are included in their analysis, which means that their sample must be very heterogeneous, including not only functioning but also deficient democracies. There is no detailed description of which countries belong to which model of democracy; rather, they offer only a short description based on regions (Doorenspleet & Pellikaan, 2013, p. 248). Methodologically, they compare mean values of the different democracy models and also perform multiple regressions. They do not discuss other explanations and do not use any control variables in their analysis. This is very problematic given their heterogeneous sample and makes it more likely that their results are biased. They do not perform any robustness checks. Therefore, the empirical results seem very doubtful. I now discuss studies that have a narrower focus because they examine the effects of democracy models only on particular performance areas. Anderson (2001) analyses the effects of institutional designs on economic performance. Economic performance is differentiated into unemployment and inflation. The institutional design encompasses the democracy model (consensus vs. majoritarian democracy), corporatism and central bank independence. It is argued that corporatism, with its large and encompassing organizations, focuses more on the common good than pluralistic systems with small groups pursuing their particularistic goals. Corporatism, therefore, should lead to lower levels of unemployment and inflation. 
Central bank independence isolates the monetary realm from politics, so that it removes the “temptation for elected officials […] to manipulate the money supply for particularistic interests” (L. Anderson, 2001, p. 438). Thus, independence of central banks results in a lower inflation rate. Finally, according to Anderson consensus democracy actually leads to a worse economic performance than majoritarian systems because proportional representation allows the creation of several political groups and the multiple veto points of the consensus democracy give each group the possibility to follow its particularistic interests by giving them veto power. In addition, he argues that corporatism and central bank independence cannot be coherently incorporated into the consensus democracy model, as Lijphart does. He considers two control variables: government ideology and globalization. Left parties in government tend to reduce unemployment, whereas right parties in government prefer low inflation. The party composition of the government is measured via the Schmidt-index. He theorizes that globalization, measured as the
sum of imports and exports as percentage of the GDP, leads to higher unemployment. His analysis covers 18 OECD countries from 1970 to 1990. He calculates the mean rates for unemployment and inflation for each decade (n = 36). This means he doubles the number of observations by counting each country twice. However, even though there is some time between these observations, they are still not independent, and the applied multivariate analysis should not treat them as independent, because doing so inflates the effective sample size. He carries out robustness tests and outlier detection. Finally, he sees his hypothesis confirmed: “the most appropriate constellation of institutions for achieving optimal macroeconomic performance would appear to be a majoritarian political system, a corporatist system of interest intermediation, and an ICB [independent central bank]” (L. Anderson, 2001, pp. 448–449). Vis et al. (2012) examine the effects of corporatism and the democracy model on economic performance. According to the authors, economic performance is a mixture of the components of economic growth, employment and public debt. In a first step, they classify the economic performance of 19 OECD countries between 1975 and 2005 based on a so-called fuzzy-set ideal-type analysis (e.g. the high growth—low debt class; or the high employment, high growth class). Then, they combine this classification with the institutional variables of corporatism and democracy model in a cross tabulation. Although they take up the corporatism and consensus democracy hypothesis, they do not establish a theoretical relationship between the institutional variables and the outcome. Their empirical findings suggest that “there is no systematic direct relationship between a country’s institutional setup—represented by corporatism or consensus democracy—and economic performance” (Vis et al., 2012, p. 89). Their analysis is rather qualitative and exploratory; therefore, they do not use any statistical model. 
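Returning to the non-independence of pooled observations noted above for Anderson (18 countries with two decade means each, n = 36), a rule-of-thumb calculation shows how quickly the effective sample size shrinks. The sketch uses the Kish design-effect approximation n_eff = n / (1 + (m - 1) * rho), which is not taken from the studies reviewed here; the within-country correlation values are purely hypothetical, and the approximation strictly applies to estimating a mean rather than regression coefficients.

```python
def effective_sample_size(n_observations, per_unit, rho):
    """Approximate effective sample size for clustered observations.

    n_observations: total number of observations (e.g. 36 country-decades)
    per_unit:       observations per unit (e.g. 2 decades per country)
    rho:            assumed correlation between observations of the same unit
    """
    if not 0 <= rho <= 1:
        raise ValueError("rho must lie between 0 and 1")
    design_effect = 1 + (per_unit - 1) * rho
    return n_observations / design_effect


# 36 country-decades, 2 per country, under hypothetical within-country correlations.
for rho in (0.0, 0.5, 0.8, 1.0):
    print(rho, round(effective_sample_size(36, 2, rho), 1))
```

With no within-country correlation all 36 observations count fully; with perfect correlation the second decade adds nothing and the effective sample size falls back to the 18 countries.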
Also, they do not include control variables (e.g. party ideology, globalization or cultural factors) and carry out no robustness checks. Scruggs (1999) tests the effects of corporatism and the democracy model on environmental performance in 17 OECD countries. In contrast to pluralism, corporatism is favorable to good environmental performance because of its “ability to pursue public goods” (Scruggs, 1999, p. 5). Large, encompassing organizations found in corporatist arrangements favor national interests instead of particularistic ones. In addition, they help to “socialize distributional costs of environmental policies” (Scruggs, 1999, p. 5) by economically supporting the losers of environmental protection. They can more easily overcome problems of collective action in environmental protection through consensus, negotiation and concertation. Similarly, in contrast to majoritarian democracies, consensus democracies through their consensual negotiations focus on the public good
and increase environmental protection. His approach comprehensively measures environmental performance in terms of air emissions (e.g. sulphur dioxide, nitrogen), municipal waste and waste-water treatment. The performance measure is not calculated as a time series, but as an aggregated indicator measuring environmental improvement between 1970 and 1990. Corporatism is measured using Lijphart’s and Crepaz’s rankings, and majoritarian and consensus democracy is a recalculation of Lijphart’s index to match the period from 1970 to 1990. However, his recalculation does not distinguish between the executive-parties and federal-unitary dimensions. Yet, as Lijphart’s analysis shows, most of the impact comes from the executive-parties dimension. He applies several control variables: socio-economic modernization (level of manufacturing in gross domestic product, percentage change in energy use, percentage of nuclear power in total energy consumption, average level of per capita income growth and population density). He also incorporates the power resource theory by focusing on the strength of the environmental movement (average vote share for national green and left-libertarian parties, environmental group membership and difference between materialists and post-materialists). Due to the static performance measure, he uses bivariate and multivariate regressions instead of a time-series analysis. However, this results in low statistical power due to the small sample size. This is problematic because the data would allow more than just a cross-sectional analysis. He concludes that the “effect of corporatism appears […] to be resilient to different model specifications, operationalizations of corporatism and tests for particularly influential cases. [The] idea that consensual democracy is also associated with good environmental performance is not supported by the evidence” (Scruggs, 1999, p. 30). 
Poloni-Staudinger (2008) comes to a somewhat different conclusion when she tests the effect of the democracy model on environmental performance. She distinguishes between four types of environmental performance: mundane environmentalism (recycling, waste water treatment, unleaded gasoline and maintenance of protected areas), environmental taxation, conservation (red books and strict nature reserves) and, finally, nuclear energy production. She gives no reasons why she does not take into account air emissions (e.g. greenhouse gases), which can be considered an important part of environmental performance (see section 5.3.2). Her approach theorizes that consensus democracies on the executive-parties dimension are positively linked to good environmental performance, as the inclusion of green interests and parties is facilitated by coalition governments and proportional representation. Consensus democracies on the federal-unitary dimension are also positively related to environmental performance because federalism shows “more sensitivity to subnational concerns and
provide[s] more access points to government” (Poloni-Staudinger, 2008, p. 414). Control variables include economic and social development (measured by the Human Development Index), population density, and green parties. She analyzes 23 OECD countries for the year 1998, stating that these “data correspond most closely, temporally, with Lijphart’s majoritarian/consensus scales” (Poloni-Staudinger, 2008, n. 4). However, Lijphart’s indices are not calculated on a yearly basis but are rather summary measures for either the years from 1945 to 1996, or from 1977 to 1996 (Lijphart, 1999, p. 48). It is therefore not necessary to limit the analysis to the year 1998. As a result of this restriction, she performs only a cross-sectional analysis, which goes along with weak statistical power. Robustness checks are not mentioned. While her study finds no effects for the nuclear energy and taxation categories, it finds that the federal-unitary dimension is positively associated with both mundane environmentalism and conservation. The executive-parties dimension is positively associated with mundane environmentalism but negatively related to conservation. She suspects that this could be “due to the higher costs associated with conservation policies, which increase the likelihood that these policies will be vetoed by other actors in the decision-making process” (Poloni-Staudinger, 2008, p. 425). There are no studies for goal-attainment performance which focus on the effects of democracy models on the amendment rate. However, several studies analyze the relationship between the difficulty of the amendment procedure and the amendment rate. The difficulty of the amendment procedure is a feature of Lijphart’s model of majoritarian and consensus democracy (constitutional rigidity), which implicitly relates these studies to democracy models. Lutz (1994, p. 365) suggests—based on an analysis of the U.S. 
federal states and on a cross-national sample (European States, North and South America, Asia-Pacific)—that “the variance in amendment rate is largely explained by the interaction of two variables: the length of the constitution and the difficulty of the amendment process”. The more difficult the amendment procedure, the less likely a constitutional change is. The longer the constitution, measured in words, the more likely a constitutional change is, because the constitution regulates more areas and therefore needs more adaptations. Ginsburg/Melton (2015) find that not institutional factors but rather societal values are relevant. They include a cultural factor, the amendment culture, which explains most of the amendment rate. Even though the idea of an amendment culture might be intriguing, their measurement of the amendment culture is rather “questionable” (Bucur & Rasch, 2019, p. 171). They suggest measuring the country’s amendment culture by using the amendment rate of its previous constitution as a proxy. However, this measure might be considered invalid.
Finally, Reutter/Lorenz (2016) analyze the amendment frequency of the German Länder. They include institutional factors, the amendment difficulty, the age of the constitution and the length of the constitution. However, they also introduce partisan variables which can be linked to democracy profiles quite easily. Whereas the fragmentation of the party system is an indicator of consensus democracy, governments with a large majority are an indication of a majoritarian democracy. On the one hand, they reason that party fragmentation decreases the amendment rate; on the other hand, they state that majority governments can alter the constitution more readily. According to them, left-wing parties will amend the constitution more frequently than right-wing parties. They analyze 199 legislative terms across the 16 German Länder. The empirical results are as follows: Older and longer constitutions are amended more often than younger and shorter ones. Amendment difficulty leads to a lower amendment rate. For the partisan variables, they find a mixed picture regarding the effects of the democracy profiles: “This would mean that in the Länder, constitutional policy is shaped by factors of both majoritarian and consensual policy-making. Thus, an amendment is more likely to be passed if there are strong governments and a high number of effective parties” (Reutter & Lorenz, 2016, p. 119). However, their study considers the 199 observations to be independent observations. This overestimates their effective sample size: for instance, the institutional variables do not vary between the cases independently. The amendment procedure, the age of the constitution and the length of the constitution are related between some of the cases, since they are grouped within their Land. Therefore, a multilevel analysis would have been a more thorough technique. Overall, Bucur/Rasch (2019, p. 
172) give a summary of the state of research in the goal-attainment area: “Amendment institutions at best provide only a partial explanation of constitutional change. The answer to this question must take into account a number of additional factors of a political, economic, and social nature”. In addition, empirical analyses often show ambiguous results, depending on the sample and measurement of the amendment rate (Lorenz, 2015). Birchfield/Crepaz (1998) investigate the effects of democracy models on income inequality (measured as the income share of the top 20% and as a rich-poor ratio) in 18 OECD countries. They show that competitive veto points (see footnote 3) result in increasing income inequality, whereas collective veto points reduce income inequality: the “more widespread the access to political institutions, and the more representative the political system, the more citizens will take part in the political process to change it in their favor which will manifest itself […] in lower
3 See Chapter 2 for a description of these democracy types.
income inequality” (Birchfield & Crepaz, 1998, p. 191). They include various control measures: cabinet ideology (bourgeois vs. social democratic), voter turnout, macro-economic variables (economic growth, GDP, unemployment). Openness of the economy is included to capture external influences in the form of globalization (measured as foreign trade as a percentage of GDP). No cultural aspects are considered, however. They pool two time points (late 1970s and late 1980s), and state that “autocorrelation is of no great concern” (Birchfield & Crepaz, 1998, p. 188). This view can be questioned, because the income inequality of a country does not change quickly from one decade to another. It is not reasonable to treat the two time points of a country as independent observations, as these time points are highly correlated (see footnote 4). Thus, the effective sample size is smaller than declared, which biases the standard errors of the regression. They discuss and perform robustness checks in the sense of outlier detection. Schmidt (2001) studies the effects on social expenditures. He discusses a variety of different public policy theories. First, he discusses the theory of economic modernization. The extent of social expenditure is seen as a reaction to societal and economic developments. However, the causal mechanisms are unclear and the theory neglects politics. Second, he presents the power resource theory and the partisan theory. Whereas the power resource theory focuses on the power distribution of extra-parliamentary groups (e.g. classes or unions), the partisan theory states that the extent of social expenditure is regulated by the parties in parliament (“parties matter”). Left parties (social democratic parties) or Christian democratic parties favor a strong welfare state. Third, institutional theories are included in the discussion: On the one hand, consensus democracies result in higher social expenditure. 
On the other hand, an entrenchment of the welfare state as well as an expansion of the welfare state is made more difficult with a higher number of veto players due to the lower chances of policy change. Fourth, international factors lead to a higher degree of redistribution because the state has to cushion the negative effects of globalization. Fifth, social expenditure follows a historical path dependence (legacy theory). The politics of today are determined to a certain extent by the politics of the past. He also includes a cultural perspective with regard to the families of nation concept, however, he does not test this strain of theory.5 Schmidt finds that all of these theories are important to explain social expenditure empirically. For democracy models, two findings are important: Coalition 4
As a hard robustness test, they could have computed the regression for each time point separately. 5 For a detailed discussion, see Schmidt et al. (2007).
8
Explaining Goal-Oriented Performance
governments lead to higher social expenditure because of the need to find compromises. Veto players slow down state activity and are therefore negatively related to social expenditure. Other findings include historical path dependency (lagged dependent variable). Economic modernization leads to higher social expenditure, whereas economic pressure (unemployment and costs of the public sector) results in lower social expenditure. Left parties and Christian democratic parties lead to higher social expenditure. His analysis is based on 21 OECD countries from 1960 to 1995. He uses time-series cross-sectional analysis à la Beck/Katz (1995) and includes a lagged dependent variable to correct for autocorrelation. However, he does not seem to account for unit heterogeneity by including country dummies, which makes the empirical results questionable. He does not perform robustness checks based on outlier detection.

Crepaz/Moser (2004) analyze the factors influencing different measures of public expenditure (current disbursements, government consumption and social expenditure) in 15 OECD countries from 1960 to 1996. They focus on the consequences of institutional designs of democracies in terms of competitive and collective veto points and develop the theory in greater detail: Actors in collective veto points have common interests because they share responsibility for policies, act face-to-face with other government members and have an “intrinsic interest to ensure that one’s own party is not singled out as the reason why a government falls” (Crepaz & Moser, 2004, p. 265). In contrast, competitive veto points are characterized by “political strategizing” (Crepaz, 1998, p. 64), and it becomes easier to use veto power because “political institutions that have institutional vetoes, i.e. the ‘faceless’ institutions themselves are blamed rather than the parties or some prominent actors within parties” (Crepaz & Moser, 2004, p. 266). Therefore, they theorize that collective veto points expand public spending, whereas competitive veto points reduce or impede the expansion of public expenditure. They also stress the effects of globalization: “Not only are domestic institutions affecting [the outcome], but there is growing evidence that globalization is also having a measurable effect” (Crepaz & Moser, 2004, p. 260). They find empirically that collective veto points have an expansionary effect, whereas competitive veto points exert a restricting effect. In addition, public expenditure, especially social spending, is not negatively affected by globalization. There is no race to the bottom; rather, globalization has a positive and increasing effect. Their models include several control variables. These are based on the power resource theory (ideology of government) and the state of the economy (unemployment, inflation, age of population, GDP per capita, dummy variables for the oil crises). Cultural aspects
are not considered. They apply a TSCS model without a lagged dependent variable and correct for autocorrelation using the Prais-Winsten transformation. Compliance with the model assumptions is not checked, however.

Lappi-Seppälä (2010) analyzes the causes of punitivity of criminal justice systems. Punitivity is understood to be both systemic punitivity and attitudinal punitivity. The former refers to how severe the punishment of criminal justice systems is, measured by the incarceration rate. The latter refers to the extent to which the population supports harsh punishments, measured by items of two international surveys (European Social Survey and International Crime Victims Survey). His approach is based on several explanatory factors: crime level (homicide rate), social factors (welfare and economic inequality), political culture (social trust and legitimacy as well as fear), institutional factors (majoritarian vs. consensus democracy, corporatism) and regime type (autocracies vs. democracies). According to his assumption, consensus democracy and corporatism have a direct and an indirect effect. Indirectly, both factors promote a stronger welfare state due to the inclusion of more interests in the decision-making process, and they therefore exhibit less economic inequality. Less economic inequality promotes “trust and legitimacy, which facilitate compliance with norms based on legitimacy and acceptance” (Lappi-Seppälä, 2008, p. 314). The direct impact of consensus democracy lies in the way political discourse and deliberation are conducted. Majoritarian democracy focuses on controversies and distinction: the “main project for the opposition is to talk up societal or political crisis” (Lappi-Seppälä, 2010, p. 323). Compared to consensus democracies, this would result in a delegitimization of governmental policies and political institutions. Moreover, consensus democracy leads to a stable and consistent criminal policy that cannot be changed by rapid turnarounds as might occur in majoritarian democracies. He concludes that lower rates of “imprisonment and a less punitive penal climate seem to have their roots in a consensual and corporatist political culture, in high levels of social trust and political legitimacy, as well as in a strong welfare state” (Lappi-Seppälä, 2010, p. 326). His sample consists of 30 countries: Most of them are European, but New Zealand and Canada are also included. However, his study analyzes each explanatory strand in isolation. There is no attempt to “produce a ‘final causal model’ explaining differences in punitivity” (Lappi-Seppälä, 2010, p. 311). He does, however, discuss outliers and thereby carries out robustness tests.

Anderson/Guillory (1997) analyze the relationship between the democracy model and satisfaction with democracy. They hypothesize that citizens who “win” an election, in the sense that they support a party which won the election and constitutes the government, are more satisfied with democracy than citizens who lose an election. However, this effect is mediated by the democracy
model: “the more consensual the set of political institutions in a country, the greater is the extent in which negative consequences of losing elections are muted” (C.J. Anderson & Guillory, 1997, p. 68). Whereas the gap in satisfaction with democracy between winners and losers is wide in majoritarian democracies, this gap narrows in consensus democracies. The reason is that a consensus democracy offers the losing citizens opportunities to still be fairly represented in parliament or even in government, whereas in a majoritarian democracy the principle of “the winner takes it all” applies. Their hypothesis is confirmed empirically. Compared to the other studies discussed so far, this study is based on survey data and combines macro-indicators with individual-level variables. They draw on the Eurobarometer survey conducted in 1990 for eleven European countries. On the individual level, they control for national and personal evaluation of the economic situation, interest in politics and socio-demographic characteristics (education, age, gender, income). They perform separate regressions for each country (‘no pooling’) as well as pooled models. They do not perform a multilevel regression with a cross-level interaction between the democracy model and the individual attitudes, which is the current state of the art. Applying multilevel regression would increase the sample size compared to the no-pooling approach and at the same time take the different country contexts into account. Robustness checks are done as well as outlier detection.

Hakhverdian/Koop (2007) analyze whether democracy models have an effect on the election of populist parties. They theorize that consensus democracies, both in the executive-parties dimension and in the federal-unitary dimension, lead to higher support for populist parties compared to majoritarian democracies. The reason is that “inclusiveness comes at the expense of accountability” (Hakhverdian & Koop, 2007, p. 407) and responsiveness in consensus democracies. Consensus in the executive-parties dimension implies negotiations at the elite level, proportional representation and coalitions, which might in turn lead to a cartelization of the party system (Katz & Mair, 1995) and therefore to the rise of populist parties as a protest movement. Consensus in the federal-unitary dimension increases the “institutional complexity [which] obscures accountability and responsiveness of political elites” (Hakhverdian & Koop, 2007, p. 412). They can confirm their hypothesis: Consensus democracy in the executive-parties dimension as well as in the federal-unitary dimension leads to an increased electoral success of populist parties. This also holds if these two dimensions are combined: Consensus democracies in both dimensions show increased support for populist parties compared to purely majoritarian democracies. Their analysis encompasses 19 European
8.3 Theory and Hypotheses: Three Types of Effects …
countries from 1990 to 2006. The methods are bivariate regression and cross-tabulation. However, they do not control for other theories. Robustness checks are done via outlier detection.

Lauth/Schlenkrich (2018c) come to a different conclusion for the same research question. Based on survey data (European Social Survey 7 and 8), they show that consensus democracy does not directly affect the support of populist parties but does have an effect on individual susceptibility to populism. In this regard, however, consensus democracies perform better than majoritarian democracies: majoritarian democracies lead to a higher susceptibility to populism than consensus democracies. They include several control variables on the individual level (education, political education, personal evaluation of the economic situation, unemployment and rejection of migration). On the macro level, they control for the age of democracy. They analyze 20 European countries around 2014 using a Bayesian multilevel path analysis.
8.3
Theory and Hypotheses: Three Types of Effects of Democracy Profiles
The literature review has identified several causal mechanisms linking the democracy profiles to various policy outcomes. It should also be clear, however, that these causal links are usually unsatisfactorily weak. As Bogaards (2017a, p. 16) states, “the lack of a causal mechanism is a more general problem in the literature relating type of democracy to outcome variables”. It seems impossible to identify detailed causal paths that explain why democracy profiles influence policy outcomes. By contrast, other explanations that focus, for instance, on party politics are much more convincing. Why is this so? The effects of the democracy profiles belong to the institutional explanations. However, there is a serious limit to structural or institutional explanations: Institutions enable or constrain the actions of actors, but only to a certain extent. Thus, the role of actors is important and needs to be analyzed. Institutions do not have a causal effect on their own; rather, there is a dense interplay with the actors within those institutions. This means that the causal chain of the effects of the democracy profile is too long and the outcome to be explained is causally too distant. Policy outcomes are affected by many different variables, most of which are probably much closer to the outcome. Thus, there are many intermediate effects which cannot be captured with this study design. For Gerring/Thacker (2008, p. 160), who have a similar study design, “the most bothersome aspect of [their] theory [is]: it sits atop a large and opaque black box.
The inputs and outputs are clear, but what goes on inside is not”. The same is true for my study. Nevertheless, they stress that studying the effects of institutions is important: “Highly disaggregated studies run the risk of missing the big picture, and this, as it happens, is the most policy-relevant part of the problem. For we cannot switch institutions on a whim to suit different policy needs” (Gerring & Thacker, 2008, p. 163). Furthermore, it can be theorized that institutions have no immediate impact on performance, but that they slowly but steadily affect actors over time. If a particular institution is linked to a specific outcome, it would not achieve its full effect size immediately; instead, its effect would unfold slowly over time.

I summarize the various explanatory factors which I reviewed before and transfer them to the democracy profiles used in this study. These factors can be categorized into three types of effects: The first effect type results from the specific set of rules of each democracy profile, which shapes the form and mode of the decision-making process. In this sense, it is a direct effect of the democracy profile. The reviewed approaches argue that a favorable decision-making process is characterized by a certain policy stability hindering abrupt policy changes, a fast and authoritative decision-making process with a low probability of gridlock, as well as a decision-making process which is open to the interests of new parties. Somewhat contrary to the authoritative decision-making process, federalism is seen as favorable because it allows the inclusion of subnational interests. A decision-making process shaped in this way promotes performance in various areas. Thus, the direct effect is mainly linked to the trade-off between the freedom and control dimension. Libertarian democracies with a low control dimension can score points with a fast decision-making process and a low probability of gridlock. Control-focused democracies, on the other hand, ensure policy stability and are able to include subnational interests. However, there is a “threat of a blockade of the political decision-making process” (Schmidt, 2015, p. 40). In this regard, no democracy profile is a clear winner. Finally, the equality-freedom trade-off is also involved to a great extent. Egalitarian democracies, with either high or low control values, make it easier for small parties to be included and recognized in the decision-making process.

The second type of effect is an indirect effect (indirect effect I): The specific rules of the democracy profile shape the negotiation method and attitudes of the actors inside its political institutions. It is argued that favorable negotiation methods and attitudes can be created if institutional settings let “different political actors operate in the same body and interact with each other on a face-to-face basis” (Crepaz, 1998, pp. 64–65). Scholars argue that a favorable attitude is one where a public interest and “shared responsibility” (Crepaz & Moser, 2004,
p. 265) between the different political actors can emerge. In addition, these settings increase compromise and incentivize deliberation. This helps to overcome the collective-action problem and also to reduce particularistic interests. Overall, this leads to higher performance. This effect type is mainly linked to the trade-off between the freedom and equality dimension, but also incorporates elements of the trade-off between the freedom and control dimension. Egalitarian democracies, especially combined with a low control value, are mostly characterized by coalitions created by a multiparty system and proportional representation. Less favorable are egalitarian control-focused democracies, because they have competitive veto points in addition to collective veto points. In contrast, Anderson (2001) argues that veto power creates particularistic interests: In his reasoning, libertarian-majoritarian democracies would be the most favorable, followed to a lesser extent by egalitarian-majoritarian democracies.

There is a third type of effect that stems from the democracy profiles (indirect effect II): democracy profiles indirectly create positive attitudes among citizens outside of the political system. On the one hand, these attitudes are a performance goal in themselves (see confidence performance); on the other hand, they also help to promote other performance goals. A high legitimacy of the political system and its policies and a high satisfaction create acceptance of the policies. Implementation of policies is also made easier when there is support from all societal sectors. Finally, by pacifying multinational states, it creates stability, which in turn would increase performance. As before, this mainly relates to the freedom-equality trade-off, so that egalitarian democracies, either majoritarian or control-focused, are seen as superior. They have a consensual political discourse and they create a broad consensus. However, egalitarian control-focused democracies are especially helpful in multinational contexts, because they allow the inclusion of more groups with significant veto powers in the decision-making process.

Finally, the balanced profile (FEC) is difficult to evaluate. I assume that it has an intermediate position within these different effect types due to the lack of a clear dimensional preference. For these reasons, I suggest the following general hypotheses:

Hypothesis 1: The more likely a country belongs to the egalitarian democracy profile (fEc) or the egalitarian control-focused democracy profile (fEC), the more likely it is to show a higher level of performance.

Hypothesis 2: The more in line a country is with the libertarian democracy profile (Fec) or the libertarian control-focused democracy profile (FeC), the less likely it is to have a higher level of performance.
Hypothesis 3: The more similar a country is to the balanced democracy profile (FEC), the more likely it is to have effects that lie between the libertarian democracy profiles and the egalitarian democracy profiles.

Hypothesis 4: The higher the inclusiveness dimension (freedom vs. equality trade-off) of a country, the more likely it is to show a higher performance level.

Hypothesis 5: The higher the effective government dimension (freedom vs. control trade-off) of a country, the less likely it is to have a higher performance level.

I include control variables as well, which will be discussed below. A summary is given in Table 8.2.
8.4
Methodology: The Bayesian TSCS Within-Between Model
8.4.1
Time-Series Cross-Sectional Analysis: Autoregressive Distributed Lag Model and Error Correction Model
8.4.1.1 Challenges of Time-Series Cross-Sectional Data
Time-Series Cross-Sectional (TSCS) analysis combines the features of time-series analysis with the properties of cross-sectional analysis. This has theoretical and statistical advantages. On the one hand, it allows modeling complex and dynamic theories with effects varying over time and between countries. On the other hand, the combination of time and units increases the sample size and thus “increases statistical leverage” (Fortin-Rittberger, 2014, p. 389). Due to the complexity of this method, the correctness of the results of TSCS analyses is highly dependent on the right model specification (S.E. Wilson & Butler, 2007). In particular, the analysis has to deal with four important violations of OLS assumptions (Beck & Katz, 1995; Fortin-Rittberger, 2014; S.E. Wilson & Butler, 2007):

(1) Unit heterogeneity: TSCS data are hierarchical data because the observations are nested within units, i.e. countries. The characteristics of these countries (e.g. culture, political institutions) differ from one another and affect the regression, so that the intercepts vary from country to country. This refers to “unobserved variables that remain constant over time and are not explained by the independent variables included in a model” (Fortin-Rittberger, 2014, p. 394). This is usually solved by fixed effects (FE) models, which add a dummy variable for each country. This would lead to two problems in this study: The inclusion
Table 8.2 Effects of Democracy Profiles

 | fEc | Fec | FeC | FEC | fEC | E | c

Form and mode of the decision-making process (direct effects)
facilitates the creation of new parties with new interests and makes it easier to involve small parties in the decision-making process | Y | N | N | ID | Y | Y | ID
no abrupt policy changes (stability of policies) due to multiple veto points | N | N | Y | ID | Y | Y | N
no gridlock | Y | Y | N | ID | N | N | Y
fast and authoritative decision-making process | Y | Y | N | ID | N | N | Y
federalism is more sensitive to subnational interests and more access points | N | N | Y | ID | Y | N | N

Shaping the negotiation method and attitudes of actors within the political institutions (indirect effects I)
a public interest and a higher responsibility through collective veto points (parliamentarism and multiparty system) instead of competitive veto points as “faceless” institutions and diffusion of responsibility | Y | N | N | ID | Y | Y | Y
overcoming the collective-action problem by consensus, compromise and negotiation | Y | N | N | ID | Y | Y | ID
no particularistic interest due to the lack of veto points | Y | Y | N | ID | N | Y | Y

(continued)
of country dummies eliminates the ability to include time-invariant variables or to estimate the effects of slow-changing variables (S.E. Wilson & Butler, 2007, pp. 105–106). However,
Table 8.2 (continued)

 | fEc | Fec | FeC | FEC | fEC | E | c
cooperation incentivizes deliberation | Y | N | N | ID | Y | Y | ID

Popular support through representation (indirect effects II)
higher legitimation of policies through deliberation and consensual political discourse instead of controversies and distinction | Y | N | N | ID | Y | Y | ID
higher support from all societal sectors for policies due to broad consensus | Y | N | N | ID | Y | Y | ID
higher satisfaction among citizens through broad representation | Y | N | N | ID | Y | Y | ID
more stability in multinational contexts due to inclusion of more groups in decision-making process | N | N | N | ID | Y | Y | N

Sum “Y” | 10 | 3 | 2 | 0 | 10 | 10 | 4
Sum “N” | 3 | 10 | 11 | 0 | 3 | 3 | 3

Note: “Y” means “Statement applies”; “N” means “Statement does not apply” and “ID” means “Democracy Profile shows no clear preference”. E: libertarian-egalitarian trade-off dimension (high values = egalitarian); c: majoritarian-control-focused trade-off dimension (high values = majoritarian). Source: own presentation based on literature review
in this study, I am also interested in the effects of those contextual variables. Furthermore, FE models are an expression of the no-pooling approach, in which regressions are implicitly calculated separately for each country: They “assume that nothing learned about any one category informs estimates for the other categories—the parameters are independent of one another and learn from completely separate portions of the data” (McElreath, 2015, p. 355). Thus, these models are inefficient.
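The first problem of FE models can be made concrete with a toy example. The sketch below uses a hypothetical panel of three countries with invented values; the variable names (`x`, `z`) and the helper `demean_by_group` are illustrative assumptions and not part of the study's actual data or code. It shows that the within transformation implied by country dummies reduces a time-invariant variable to zeros, and that adding such a variable next to a full set of country dummies makes the design matrix rank-deficient.

```python
import numpy as np

# Hypothetical toy panel: 3 countries observed over 4 years each.
country = np.repeat([0, 1, 2], 4)
# Time-varying predictor (changes within each country).
x = np.array([1.0, 2.0, 3.0, 4.0, 2.0, 2.5, 3.5, 4.0, 0.5, 1.0, 1.5, 3.0])
# Time-invariant contextual variable (e.g. an institutional feature).
z = np.array([0.2, 0.9, 0.5])[country]


def demean_by_group(values, groups):
    """Subtract each group's mean, as the fixed-effects (within) transformation does."""
    means = np.array([values[groups == g].mean() for g in np.unique(groups)])
    return values - means[groups]


x_within = demean_by_group(x, country)  # keeps its within-country variation
z_within = demean_by_group(z, country)  # collapses to zeros

# Country dummies plus x and z: z is a linear combination of the dummies,
# so the design matrix has one column more than its rank.
dummies = (country[:, None] == np.unique(country)[None, :]).astype(float)
design = np.column_stack([dummies, x, z])
rank = np.linalg.matrix_rank(design)

print("z after within transformation:", z_within)
print("columns:", design.shape[1], "rank:", rank)
```

Because `z_within` carries no variation, its coefficient cannot be estimated once country dummies are included, which is exactly why contextual variables drop out of FE models.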
A solution to both problems is offered by so-called random effects or multilevel models. While time-varying predictors can be considered on the first level, the observation level, time-invariant predictors can be included on the higher level, the country level. In addition, multilevel models can estimate the effect of higher-level variables on lower-level variables (cross-level interactions). They also follow a partial pooling logic in which the information available from both levels is combined, so that we are “learning simultaneously about each cluster while learning about the population of clusters” (McElreath, 2015, p. 355). However, it is often cited as a drawback that random effects models cannot be used in conjunction with TSCS data. Especially in TSCS data, there is a violation of the “assumptions underpinning RCMs [Random Coefficient Models, a variant of multilevel models], namely that there is no correlation of random effects with regressors and no correlation between random components” (Fortin-Rittberger, 2014, p. 396). However, Bell and Jones (2015) show for TSCS data that it is possible to split a time-varying covariate into its within and between part: $x_{gt} = x^{W}_{gt} + x^{B}_{g}$.6 While $x_{gt}$ is an “uninterpretable blend” (Raudenbush & Bryk, 2002, p. 139), the split allows the model to distinguish between the within and between effect of that variable, effectively removing the correlation: “This will capture the problematic correlation before it falls into the group-level error term and creates a violation of an important Gauss-Markov regression assumption” (Bafumi & Gelman, 2006, p. 13). This type of multilevel model is a so-called Within-Between model: “the main criticism of RE, the correlation between covariates and residuals, is readily solvable using the within-between formulation espoused here, although the solution is used all too rarely in RE modeling” (Bell & Jones, 2015, p. 17).
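The within-between split described above can be illustrated in a few lines. This is a minimal sketch on an invented two-country panel; the column names (`country`, `year`, `x`, `x_within`, `x_between`) are assumptions chosen for the example. The within part is centered on the country mean and the between part is the country mean centered on the grand mean, following the definitions given in the text.

```python
import pandas as pd

# Hypothetical toy panel; column names and values are illustrative only.
df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "x": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0],
})

group_mean = df.groupby("country")["x"].transform("mean")  # country mean of x
grand_mean = df["x"].mean()                                # overall mean of x

df["x_within"] = df["x"] - group_mean       # within part: deviation from country mean
df["x_between"] = group_mean - grand_mean   # between part: country mean vs. grand mean

# The two parts are orthogonal, so their effects can be estimated separately.
cross_product = (df["x_within"] * df["x_between"]).sum()

print(df)
print("cross product of within and between parts:", cross_product)
```

With these centering choices the two parts add up to `x` minus its grand mean, and the within part has a mean of zero in every country, which is what removes its correlation with country-level components.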
Thus, multilevel models give the opportunity to incorporate contextual variables like institutions into the regression and to answer the research question: How do these institutions, which are slowly or never changing within a country, affect the outcome of performance?

(2) Unit root or nonstationarity: A variable is non-stationary or has a unit root if it is a random walk, $y_t = y_{t-1} + \varepsilon_t$. A random walk is unpredictable and has statistical characteristics (mean, variance) that change over time. Two random walks can, by chance alone, show a high correlation, so that time series with unit roots can lead to spurious regression results indicating a relationship between variables which actually does not exist. In contrast, a stationary process is a “process that is characterized by changes over time, while these changes are not directly
6 The within effect $x^{W}_{gt}$ is group-mean centered, $x^{W}_{gt} = x_{gt} - \bar{x}_{g}$; the between effect $x^{B}_{g}$ is grand-mean centered, $x^{B}_{g} = \bar{x}_{g} - \bar{x}$.
a function of time” (Hamaker & Grasman, 2015, p. 2). In contrast to economic data, the detection of unit roots in political science datasets, which often cover only a short time frame, is rather difficult (Beck & Katz, 2011, p. 343). In addition, according to Beck and Katz (2011), many variables in political science are bounded (e.g. percentage variables), so that the assumption of a unit root, which implies an ever-increasing mean value, cannot be applied. They state that the “impressive apparatus built over the past two decades to estimate models with [non-stationary] series [in econometrics] does not provide the tools needed for many, if not most, political economy TSCS datasets” (Beck & Katz, 2011, p. 343). Generally, the unit root tests proposed in the econometrics literature are not suitable for the TSCS data in political science: There is “the weak power of statistical tests, the disjuncture between the theoretically infinite variance of unit root processes and the limited variance of bounded time series, and the lack of sufficient theory predicting unit roots” (Keele et al., 2016, p. 299). Nevertheless, pretesting is still important. Fortunately, there is a simple test which is used by Beck/Katz (2011): I regress each variable on its lag (in a multilevel framework) and examine the coefficient of the lagged variable. If it is 1 or above,7 I assume that there is a unit root in the time series. In that case, I only use the between-effect of that variable in the regression and exclude the time-varying within-effect to eliminate the distortion caused by a unit root.

(3) Serially correlated errors (autocorrelation): The values of the dependent variable follow an autoregressive process (AR), e.g. current values of social equality performance depend to some extent on past values of social equality performance. We speak of serial correlation (Fortin-Rittberger, 2014, p. 392). This is usually solved by including a lagged dependent variable (LDV) in the regression.
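The pretest for unit roots described under (2) can be sketched as follows. This is a simplified illustration, not the procedure used in the study: it replaces the multilevel framework with a pooled OLS regression of each series on its own lag, and the simulated panels, the seed and the function name `lag_coefficient` are assumptions made for the example.

```python
import numpy as np


def lag_coefficient(panel):
    """Pooled OLS slope from regressing y_t on y_{t-1}.

    panel: 2-D array with one row per unit and one column per time point.
    """
    y_now = panel[:, 1:].ravel()
    y_lag = panel[:, :-1].ravel()
    X = np.column_stack([np.ones_like(y_lag), y_lag])
    beta, *_ = np.linalg.lstsq(X, y_now, rcond=None)
    return beta[1]


rng = np.random.default_rng(0)
n_units, n_time = 20, 200
shocks = rng.normal(size=(n_units, n_time))

# Stationary AR(1) series with an autoregressive coefficient of 0.5.
stationary = np.zeros((n_units, n_time))
for t in range(1, n_time):
    stationary[:, t] = 0.5 * stationary[:, t - 1] + shocks[:, t]

# Random walk: each value is the previous value plus a new shock.
random_walk = np.cumsum(shocks, axis=1)

phi_stationary = lag_coefficient(stationary)
phi_random_walk = lag_coefficient(random_walk)

print("lag coefficient, stationary series:", round(phi_stationary, 3))
print("lag coefficient, random walk:", round(phi_random_walk, 3))
```

In finite samples the estimated coefficient of a true random walk tends to fall slightly below 1, so estimates very close to 1 deserve the same caution as estimates at or above 1.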
This is criticized because the LDV will absorb most of the predictive power of the other variables (Achen, 2000). However, “there is nothing atheoretical about the use of a lagged dependent variable, and there is nothing that should lead anyone to think the use of a lagged dependent variable causes incorrect harm” (Beck & Katz, 2011, p. 336). Based on simulation studies, Wilkins (2018, p. 409) comes to a similar conclusion that the “results demonstrate that the ‘suspicion, if not outright rejection’ with which LDVs have come to be viewed in certain quarters
7 Coefficients for the lagged dependent variable close to its “bounds are often a sign of model misspecification” (Keele et al., 2016, p. 302): For the ADL, the coefficient should lie between 0 and 1, while a coefficient ≥ 1 indicates a unit root or an explosive time series. For the ECM, this coefficient should be in the range −1 to 0. Values ≥ 0 are troublesome. ADL and ECM are explained below.
is unwarranted”. Unlike the other time-varying variables in the model, the autoregressive parameter does not need to be split into a within- and a between-effect (Hamaker & Grasman, 2015). There are two general dynamic models which include an LDV, the so-called Autoregressive Distributed Lag Model8 (ADL) and the Error Correction Model9 (ECM). Both models are equivalent and can be transformed into each other (De Boef & Keele, 2008, p. 189). The ADL model offers a more traditional interpretation of the model parameters, whereas the ECM has the advantage that it can be used in conjunction with cointegrated data. I use both models by transferring them to a multilevel framework. Importantly, serial correlation needs to be eliminated for a correctly specified model: “Researchers should […] test their residuals for autocorrelation. Adding additional lags of the dependent variable helps to correct residual autocorrelation because residual autocorrelation can be specified as a restricted form of a higher order autoregressive model” (Wilkins, 2018, p. 409). Therefore, to obtain a proper model, the following test procedure is applied:

1. Estimate the model using the Autoregressive Distributed Lag (ADL) Model with AR(1).
2. Extract the residuals and create a lagged residual variable.
3. Regress the residuals on the lagged residual variable and include all model parameters as before. If the coefficient of the lagged residuals is insignificant, the serial correlation is successfully removed and the empirical analysis can continue. However, if the coefficient is significant:
4. Estimate the model using the Autoregressive Distributed Lag (ADL) Model with AR(2) by including a second LDV.
5. Repeat steps 2 and 3. If the coefficient of the lagged residuals becomes insignificant, the serial correlation is removed and the inclusion of a second lag is warranted in order to continue the empirical analysis. If the coefficient is significant:

8 The ADL model is defined as follows (Beck & Katz, 2011; De Boef & Keele, 2008; Wilkins, 2018): $y_{i,t} = \alpha_0 + \alpha_1 y_{i,t-1} + \beta_0 x_{i,t} + \beta_1 x_{i,t-1} + \varepsilon$. $y_{i,t}$ is the dependent variable, $\varepsilon$ is an IID error term, $\alpha_1 y_{i,t-1}$ represents the LDV, $\beta_0 x_{i,t}$ and $\beta_1 x_{i,t-1}$ are the (lagged) independent variables and $\alpha_0$ is the intercept.
9 The formula for the ECM is (Beck & Katz, 2011; De Boef & Keele, 2008; Wilkins, 2018): $\Delta y_{i,t} = \alpha_0 + \alpha_1^{*} y_{i,t-1} + \beta_0 \Delta x_{i,t} + \beta_1^{*} x_{i,t-1} + \varepsilon$, where $\Delta y_{i,t}$ is the first-differenced dependent variable, $\varepsilon$ is an IID error term, $\alpha_1^{*} y_{i,t-1}$ is the LDV, $\beta_0 \Delta x_{i,t}$ is the first-differenced independent variable, $\beta_1^{*} x_{i,t-1}$ is the lagged independent variable and $\alpha_0$ is the intercept.
6. Estimate the Error Correction Model (ECM) with AR(1) or AR(2) to check whether the series is cointegrated. Using the ADL model with a cointegrated series will result in high residual correlation, despite the inclusion of LDVs. However, the ECM model can deal with cointegrated data and if it is successful, the residual correlation is removed.
(4) Panel Heteroscedasticity and Contemporaneous Correlation of Errors: Panel heteroscedasticity means that the error variances vary between the units. Instead of a constant variance across countries, panel heteroscedasticity leads to unique variance parameters for each country, so that some countries will have less uncertainty in their estimates than others. Contemporaneous error correlation refers to situations when observations “from unit may be correlated with another unit during the same period” (Fortin-Rittberger, 2014, p. 397). This may happen when “units experience a common shock in that period” (Shor et al., 2007, p. 172). This means that TSCS data is hierarchical in another way too. The observations are not only nested within countries, but also within points in time. Therefore, the multilevel structure corresponds to a “cross-classified” (Raudenbush & Bryk, 2002, pp. 373–398) structure.
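The residual-based test procedure for serial correlation (steps 1 to 5 above) can be illustrated with a minimal sketch. It is not the estimation used in this study: it assumes a single simulated time series instead of TSCS data, plain OLS instead of a Bayesian multilevel model, and a |t| > 1.96 rule as the significance criterion. All function names and the simulated data are hypothetical.

```python
import math
import random


def ols(X, y):
    """OLS via the normal equations; returns coefficients, standard errors, residuals."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | I]; Gauss-Jordan elimination turns it into [I | (X'X)^-1].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [a / pivot for a in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    inv = [row[k:] for row in A]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return beta, se, resid


def adl_design(y, x, lags):
    """Design rows [1, y_{t-1}, ..., y_{t-lags}, x_t, x_{t-1}] with targets y_t."""
    rows = [[1.0] + [y[t - l] for l in range(1, lags + 1)] + [x[t], x[t - 1]]
            for t in range(lags, len(y))]
    return rows, y[lags:]


def lagged_residual_t(X, resid):
    """Steps 2-3: t-statistic of e_{t-1} when e_t is regressed on e_{t-1} and X."""
    Z = [[resid[t - 1]] + X[t] for t in range(1, len(resid))]
    beta, se, _ = ols(Z, resid[1:])
    return beta[0] / se[0]


# Simulated single series whose true process is AR(2) (illustrative data only).
random.seed(1)
T = 600
x = [random.gauss(0, 1) for _ in range(T)]
y = [0.0, 0.0]
for t in range(2, T):
    y.append(0.5 * y[t - 1] + 0.3 * y[t - 2] + 0.4 * x[t] + random.gauss(0, 1))

t_stats = {}
for lags in (1, 2):  # step 1 (AR(1)) and step 4 (AR(2))
    X, target = adl_design(y, x, lags)
    _, _, resid = ols(X, target)
    t_stats[lags] = lagged_residual_t(X, resid)
    print(f"ADL with AR({lags}): t-statistic of lagged residual = {t_stats[lags]:.2f}")
```

Because the simulated process has two autoregressive lags, the AR(1) specification should leave clearly autocorrelated residuals, while adding the second LDV should shrink the test statistic. Step 6 (the ECM) is not covered by this sketch.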
8.4.1.2 Advantages of Bayesian Estimation for this Study
I transfer the TSCS multilevel model to Bayesian statistics. Using Bayesian statistics has several advantages in this study. However, what is Bayesian Statistics? At the heart of Bayesian analysis stands Bayes’ rule (Gelman, 2013, p. 7):
$$p(\theta \mid y) = \frac{p(\theta)\, p(y \mid \theta)}{p(y)}$$
where θ are the model parameters and y is the observed data. Or in other terms (McElreath, 2015, p. 37):
$$\text{Posterior} = \frac{\text{Prior} \times \text{Likelihood}}{\text{Average Likelihood}}$$
This formula consists of four parts: The prior is the initial belief in the parameter values before the data was seen. Although the prior makes Bayesian statistics inherently subjective, it is “no more inherently subjective than are likelihoods and the repeat sampling assumptions required for significance testing” (McElreath, 2015, p. 35) in frequentist statistics. Bayesian analysis deals with this subjectivity in two ways: First, by applying conservative priors, so-called weakly
informative priors, which help the model to converge but constrain the parameter values only mildly and let the data speak for itself. Second, by evaluating in a sensitivity analysis how the empirical results change when different priors are selected for the same model. We “must be aware of the sensitivity of our inferences to the choice of prior” (Lambert, 2018, p. 103). If the sample is large enough, the influence of the prior is usually very small. This is generally the case in the present study. The likelihood is the most important part and contains the model and the parameters. It “provides the plausibility of an observation (data), given a fixed value for the parameters” (McElreath, 2015, p. 45). It is an assumption about the data generating process. The average likelihood “standardize[s] the posterior, to ensure it sums (integrates) to one” (McElreath, 2015, p. 37), so that it is a valid probability distribution. The average likelihood is the reason why Bayesian statistics was simply not feasible in the past. There is no easy analytical solution to the average likelihood if the model is multidimensional: “If a model has more than about three parameters, then it is difficult to calculate any of the integrals necessary to do applied Bayesian inference” (Lambert, 2018, p. 116). The posterior distribution is therefore approximated with Markov Chain Monte Carlo (MCMC) methods such as Gibbs sampling and, more recently, Hamiltonian Monte Carlo (HMC). But these methods require a large amount of computing power, which has only recently become available. Applying Bayes’ rule, the combination of the prior and likelihood, which is then normalized by the average likelihood, results in the posterior distribution. The posterior distribution can thus be understood as a kind of weighted average of the prior distribution and the likelihood (Lambert, 2018).
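The four parts of Bayes’ rule can be made concrete with a small grid approximation. The sketch below is purely illustrative and unrelated to the models of this study: it assumes a binomial likelihood, a flat prior, and hypothetical data (6 successes in 9 trials); the function name is likewise hypothetical.

```python
import math


def grid_posterior(successes, trials, grid_size=101):
    """Grid approximation of p(theta | y) for a binomial likelihood and a flat prior."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0 for _ in grid]  # flat prior (an assumption of this sketch)
    likelihood = [
        math.comb(trials, successes) * t ** successes * (1 - t) ** (trials - successes)
        for t in grid
    ]
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    # The "average likelihood" p(y) is approximated by the sum over the grid;
    # dividing by it makes the posterior sum to one.
    average_likelihood = sum(unnormalized)
    posterior = [u / average_likelihood for u in unnormalized]
    return grid, posterior


grid, posterior = grid_posterior(successes=6, trials=9)
mode = grid[posterior.index(max(posterior))]
print(f"posterior mode on the grid: {mode:.2f}")
```

With only one parameter the normalizing sum is trivial to compute. The difficulty described in the text arises because the corresponding grid grows exponentially with the number of parameters, which is why sampling methods are used instead.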
Bayes’ rule transforms the prior belief in the parameter values, before the data was seen, into the posterior distribution, by learning from the data according to the likelihood. As Kruschke and Liddell (2018, p. 183) put it, we “start with a prior degree of belief in each possibility, then we collect some data and reallocate credibility across possibilities, resulting in a posterior degree of belief in each possibility”. Bayesian statistics offers at least five advantages over the frequentist approach:
• Often, frequentist approaches are too inflexible for complex models: “frequentist approaches are limited by (1) hill-climbing algorithms [Maximum Likelihood Estimation] for finding parameters that sometimes fail to converge, (2) large-N approximations to sampling distributions that provide overly optimistic p values and CI’s, and (3) software that constrains the types of model structures and data distributions. On the other hand, modern Bayesian algorithms and software are robust across a wide range of complex models that can be very flexibly specified by the analyst, and the results are exact for any size N no
matter how small” (Kruschke & Liddell, 2018, p. 193). In addition, van de Schoot and Depaoli (2014, p. 79) state that complex models cannot be estimated with conventional statistics, because “numerical integration is often required to compute estimates based on maximum likelihood estimation, and this method is intractable due to the high dimensional integration needed to estimate the maximum likelihood”. This is especially true for multilevel models with complicated level designs and various parameters on different levels.
• Missing data can easily be incorporated: Bayesian estimation offers an easy and straightforward solution for missing values in complex models (Gelman, 2013, pp. 451–452; Zhou & Reiter, 2010): First, after the imputation of multiple datasets, a model is fitted to each imputed dataset (see chapter 6 for a discussion of multiple imputation). Then, instead of using Rubin’s formula, which presupposes normality, the results are pooled by mixing the posterior distributions of all models, which is possible even when the normality assumption is violated. Although I do not think that this assumption is necessarily violated, it is safer to use the Bayesian method. Afterwards, all statistics (e.g. the highest posterior density interval) can be computed on this mixed posterior distribution, which now propagates the uncertainty of the imputed values.
• It is more robust in small samples: Small samples pose a problem for quantitative analysis, especially in terms of statistical power (Gross & Kriwy, 2009). Simulation studies (Stegmueller, 2013; Zhang et al., 2007) show that Bayesian estimation is more robust in small sample settings than Maximum Likelihood Estimation: “Bayesian credible intervals are too wide, i.e., they provide more conservative tests of hypotheses, while [Maximum Likelihood] confidence intervals are too short, providing tests that are potentially very misleading, even at medium sample sizes” (Stegmueller, 2013, p. 755).
It can work in small samples because Bayesian analysis is “not based on the asymptotic nature of the estimators as MLE” (Zhang et al., 2007, p. 381).
• It is more supportive of the New Statistics: New Statistics (Cumming, 2014; Wasserstein & Lazar, 2016) focuses on effect sizes and confidence intervals instead of Null Hypothesis testing.¹⁰ Without the consideration of the effect size, Null Hypothesis testing is not meaningful: “Statistical significance is not equivalent to scientific, human, or economic significance. Smaller p-values do not necessarily imply the presence of larger or more important effects, and larger p-values do not imply a lack of importance or even lack of effect. Any effect, no matter how tiny, can produce a small p-value if the sample size or
¹⁰ There is an extensive literature dealing with common misinterpretations of p-values, confidence intervals, etc. (Gigerenzer, 1993; Goodman, 2008; Greenland et al., 2016).
measurement precision is high enough, and large effects may produce unimpressive p-values if the sample size is small or measurements are imprecise” (Wasserstein & Lazar, 2016, p. 132). Therefore, I especially focus on effect sizes in the form of Bayesian credibility intervals. In contrast to frequentist confidence intervals, credibility intervals offer a more natural interpretation by incorporating distributional information (Kruschke & Liddell, 2018). A 95% credibility interval indicates a 95% chance that the true value of a parameter lies within its boundaries, given the model and the prior. Confidence intervals, on the other hand, merely state which parameter values would not be rejected by a significance test, without indicating that values near the center of the interval are more probable. I call a model parameter significant if the 95% highest posterior density (HPD) interval of this parameter does not contain 0. The HPD “indicates which points of a distribution are most credible, and which cover most of the distribution. Thus, [it] summarizes the distribution by specifying an interval that spans most of the distribution, […] such that every point inside the interval has higher credibility than any point outside the interval” (Kruschke, 2015, p. 87).
• It offers diagnostic capabilities like PPC: Bayesian estimation offers two diagnostic capabilities which complement each other: Posterior Predictive Checking (PPC) and information criteria like WAIC and LOOIC (Vehtari et al., 2017). The main idea of PPC is that “if a model is a good fit we should be able to use it to generate data that resemble the data we observed” (Gabry et al., 2019, p. 8). Widely Applicable Information Criterion (WAIC) and especially Leave-One-Out Cross-Validation Information Criterion (LOOIC) are measures of predictive accuracy. WAIC and LOOIC assess the “out-of-sample predictive accuracy using within-sample fits” (Vehtari et al., 2017, p. 1413).
They help to prevent overfitting by assessing how generalizable a model is by approximating how well a model would predict future values. These information criteria can be used for model comparison and selection.
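Two of the points above, pooling posterior draws across imputed datasets and judging a parameter by its 95% HPD interval, can be sketched together. This is a minimal illustration under stated assumptions: the posterior draws are simulated stand-ins rather than output of a fitted model, the posterior is assumed to be unimodal, and the function name and all numbers are hypothetical.

```python
import math
import random


def hpd_interval(draws, mass=0.95):
    """Shortest interval containing `mass` of the draws (assumes a unimodal posterior)."""
    s = sorted(draws)
    n = len(s)
    k = int(math.ceil(mass * n))  # number of draws that must lie inside the interval
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    width, start = min((s[i + k - 1] - s[i], i) for i in range(n - k + 1))
    return s[start], s[start + k - 1]


random.seed(42)
# Hypothetical posterior draws of one coefficient from models fitted to
# M = 3 imputed datasets (simulated here instead of sampled from a real fit).
fits = [[random.gauss(mu, 0.1) for _ in range(2000)] for mu in (0.48, 0.50, 0.53)]

# Pooling by mixing: the draws of all imputed-data models are simply concatenated.
pooled = [draw for fit in fits for draw in fit]

low, high = hpd_interval(pooled, mass=0.95)
significant = not (low <= 0.0 <= high)  # the rule used in the text: 0 outside the HPD
print(f"95% HPD: [{low:.3f}, {high:.3f}], excludes 0: {significant}")
```

Concatenating the draws gives every imputed dataset equal weight, which presupposes the same number of draws per model.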
8.4.1.3 Bayesian Multilevel Model for TSCS data
The TSCS model can be transformed into a Bayesian multilevel model (Shor et al., 2007). This has several advantages over a simple fixed effects design. On the one hand, it can estimate not only the effects of time-invariant variables but also cross-level interactions, in which a higher-level variable influences the effect of a lower-level variable. It also allows the modeling of random coefficients. On the other hand, through partial pooling, information from different parts of the model can be combined, for example to mitigate the effects of outliers. However, to exclude bias, the time-varying independent variables are split into a within- and a between-effect according to the within-between model (see above). To account for unit
heterogeneity, a random intercept varying by countries is included. Contemporaneous correlation requires a random intercept varying over the time points. This transforms the hierarchical model into a cross-classified multilevel model: Observations are not only nested within countries but also nested within time periods, allowing for time patterns and shocks. Panel heteroscedasticity, a different error or precision term per country, is modeled by adding a random intercept of countries in the variance (or precision) term. Finally, serially correlated errors are accounted for by introducing a lagged dependent variable. This shows another important difference to the classical TSCS approach: Common TSCS analyses apply robust standard errors, the panel corrected standard errors (PCSE) proposed by Beck and Katz (1995). These analyses thus follow a two-step strategy: First, they estimate the OLS regression in the knowledge that the properties of the data actually violate the model assumptions. Only in a second step do they then replace the incorrectly estimated model standard errors with robust standard errors. Robust standard errors, however, indicate that there is a model misspecification and King and Roberts (2015, pp. 160–161) argue that the researcher should not stop there, but instead “follow venerable best practices by using well-known model diagnostics to evaluate and then respecify [the] statistical model”. Robust standard errors are not an end in themselves, but rather a diagnostic tool that indicates that the model needs to be modified. If the model is then correctly specified, “classical and robust standard error estimates will be approximately the same” (King & Roberts, 2015, p. 161). The retrospective application of robust standard errors is not possible in Bayesian statistics. Instead, Bayesian statistics must actively consider the TSCS assumptions and incorporate them into the model specifications.
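The split of a time-varying independent variable into a within- and a between-component, as required by the within-between model, can be sketched as follows. The panel data, the variable name `gdp`, and the function name are hypothetical and serve only to illustrate the decomposition.

```python
from collections import defaultdict


def within_between(rows, var):
    """Return copies of `rows` with `<var>_between` (country mean) and `<var>_within` (deviation)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        totals[row["country"]] += row[var]
        counts[row["country"]] += 1
    out = []
    for row in rows:
        mean = totals[row["country"]] / counts[row["country"]]
        out.append({**row, f"{var}_between": mean, f"{var}_within": row[var] - mean})
    return out


# Hypothetical country-year panel with one time-varying covariate.
panel = [
    {"country": "A", "year": 2000, "gdp": 1.0},
    {"country": "A", "year": 2001, "gdp": 2.0},
    {"country": "A", "year": 2002, "gdp": 3.0},
    {"country": "B", "year": 2000, "gdp": 10.0},
    {"country": "B", "year": 2001, "gdp": 14.0},
]
split = within_between(panel, "gdp")
for row in split:
    print(row["country"], row["year"], row["gdp_between"], row["gdp_within"])
```

The between-component is constant within a country and captures cross-country differences, while the within-component sums to zero within each country and captures change over time.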
For all models, I apply non-informative and weakly informative priors that do not contain information about the relationship of the variables, so that the data can speak for itself; at the same time, these priors help the model to converge by pointing to a plausible parameter space. For the variance parameters, I use half-Cauchy priors. This distribution has desirable features for multilevel models, in particular that it restricts the variance components to be “away from very large values [and provides] more flexible and […] better behavior near 0” (Gelman, 2006, p. 528). For the coefficients of the independent variables I use a normal distribution with a mean of 0 and a standard deviation of 100, which can be considered in this study as a non-informative prior. For the autoregression parameter (LDV), the standard normal distribution N(0, 1) is used. This is still a weakly informative prior, because the LDV parameter usually varies between 0 and 1. The prior supports the model by indicating a plausible parameter space but it does not prevent the model from considering a unit root or an explosive process. The prior for the
intercept is a normal distribution with mean 0 and standard deviation 2. Importantly, this is also a weakly informative prior since the dependent variable follows a standard normal distribution. The calculations were accomplished with the R package “brms” (Bürkner, 2017). Figure 8.1 shows the DAG representation of the Bayesian Multilevel TSCS ADL model.
Source: own representation
Figure 8.1 DAG representation of the Bayesian Multilevel TSCS Model (ADL)
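Since the figure itself is not reproduced here, the model described in the text can be written out as a sketch. The notation is an assumption of this sketch, not the author's: one independent variable $x$ stands in for all covariates, $x^{W}$ and $\bar{x}_i$ denote its within- and between-components, the second argument of each normal distribution is a standard deviation, the log link for $\sigma_i$ is assumed, and the half-Cauchy scale $s$ is left unspecified because the text does not state it.

```latex
\begin{align*}
y_{i,t} &\sim \mathcal{N}\left(\mu_{i,t},\, \sigma_i\right) \\
\mu_{i,t} &= \alpha_0 + \alpha_1 y_{i,t-1} + \beta_0 x^{W}_{i,t} + \beta_1 x^{W}_{i,t-1}
             + \beta_B \bar{x}_i + u_i + v_t \\
x^{W}_{i,t} &= x_{i,t} - \bar{x}_i \\
\log \sigma_i &= \gamma_0 + w_i \\
u_i &\sim \mathcal{N}(0, \tau_u), \qquad v_t \sim \mathcal{N}(0, \tau_v), \qquad w_i \sim \mathcal{N}(0, \tau_w) \\
\alpha_0 &\sim \mathcal{N}(0, 2), \qquad \alpha_1 \sim \mathcal{N}(0, 1), \qquad
\beta_0, \beta_1, \beta_B \sim \mathcal{N}(0, 100) \\
\tau_u, \tau_v, \tau_w &\sim \text{Half-Cauchy}(0, s)
\end{align*}
```

Here $u_i$ is the country intercept (unit heterogeneity), $v_t$ the time intercept (contemporaneous correlation), $w_i$ the country-specific deviation in the error scale (panel heteroscedasticity), and $\alpha_1 y_{i,t-1}$ the LDV (serial correlation).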
8.4.2 Models for Goal-Attainment Performance and Confidence Performance
Goal-attainment performance and confidence performance are measured differently than the other performance areas. The data for the goal-attainment performance is not a time series and therefore does not have a multilevel structure. Accordingly, I perform a single-level regression. In line with the
TSCS models, I apply Bayesian statistics using the R package “brms” (Bürkner, 2017). However, I compare the regression model based on a normal distribution with a more robust regression based on the Student-t distribution. The Student-t distribution has fatter tails than the normal distribution and is therefore less sensitive to outliers: “Regressions estimated using the t model are said to be robust in that the coefficient estimates are less influenced by individual outlying data points” (Gelman & Hill, 2007, p. 124). Diagnostics are reduced to checking the quality of the imputation model, linearity, multicollinearity and homoscedasticity with respect to covariates. I apply non-informative and weakly informative priors for this model. The prior for the variance parameter, as above, is a half-Cauchy distribution, while I apply a normal distribution with mean 0 and standard deviation of 100 for the coefficients of the independent variables. A gamma (2, 0.1) prior is used for the degrees of freedom parameter of the Student-t distribution. This prior is recommended since the distribution allows “some mass to large values of [the degrees of freedom parameter] (virtually implying Normality) as well as to small values of the degrees of freedom, thus also allowing for thicker tails” (Juárez & Steel, 2010, p. 1135). Figure 8.2 shows the DAG representation of this model.
Source: own representation
Figure 8.2 DAG representation of the Bayesian Student-t Model
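Why the Student-t likelihood is less sensitive to outliers can be seen by comparing log-densities directly. The sketch below only evaluates the two densities; it does not fit a regression, and the residual of 6 standard deviations and the degrees of freedom of 4 are arbitrary illustrative choices.

```python
import math


def normal_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of the normal distribution."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def student_t_logpdf(x, nu, mu=0.0, sigma=1.0):
    """Log-density of the Student-t distribution with nu degrees of freedom."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))


outlier = 6.0  # a residual six standard deviations away from the mean
penalty_normal = normal_logpdf(outlier)
penalty_t = student_t_logpdf(outlier, nu=4)
print(f"log-density at the outlier: normal {penalty_normal:.2f}, Student-t {penalty_t:.2f}")

# With very many degrees of freedom the Student-t density approaches the normal one.
gap_large_nu = abs(student_t_logpdf(1.0, nu=1e6) - normal_logpdf(1.0))
```

The normal model assigns the outlying point a far lower log-density, so the fit is pulled strongly toward it, whereas the Student-t model treats the same point as comparatively plausible. A gamma prior with shape 2 and rate 0.1 has a mean of 20, so it covers both near-normal and heavy-tailed behavior.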
Finally, the data for confidence performance is survey data and has a multilevel structure: Respondents are nested within countries. Due to the large sample size and the associated long computing time, Bayesian estimation is not feasible. Instead, I apply standard maximum likelihood estimation. I rely on the glmmTMB package (Magnusson et al., 2020) for the R environment. The diagnostic procedure is similar to that of the TSCS models, but the diagnostics associated with the time series character are omitted (e.g. contemporaneous correlation, autocorrelation, unit root). This model is depicted in Figure 8.3.
Figure 8.3 DAG representation of the Multilevel Model for Survey Data
Source: own representation
8.4.3 Compositional Data as Independent Variables
Finally, there is another important methodological issue which needs to be addressed. The most important independent variables of this study, the democracy profiles of a country, are of the “compositional” data type (Aitchison, 1986). The variables express the probability of a country belonging to each democracy profile as a proportion. For instance, the data for Germany in 2016 show the following values for each democracy profile (five-cluster solution): FEC: 0.124, fEc: 0.672, Fec: 0.040, FeC: 0.019 and fEC: 0.146. Thereby, each specific democracy profile is a part or a subcomposition of the overall composition. The special characteristic of
compositional data is that it consists of strictly positive values which add up to a constant (Greenacre, 2019, p. 2). It is important to emphasize that compositional data contain relative information about the components instead of absolute information. The example shows that Germany is more likely to be a member of the fEc than of the Fec or FeC democracy profile. There is also a certain probability that Germany belongs to the fEC or FEC democracy profile in this year. However, compositional data pose a problem for regression analysis. On the one hand, the interpretation of the coefficients is difficult. In general, regression coefficients are interpreted as the effect of a unit change of the explanatory variable, while all other variables are kept constant. However, this does not work for compositional data because the parts of the composition are interrelated. Here, a change in one part must be accompanied by an increase or decrease in the other parts (Buis, 2019, p. 19). On the other hand, results with compositional data should be subcompositionally coherent, so that the result does not depend on which parts of the composition are chosen (Greenacre, 2019, pp. 5–6). As Hron et al. (2012, p. 1115) state, “[c]ompositional explanatory variables should not be directly used in a linear regression model because any inference statistic can become misleading”. This is why Greenacre (2019, p. 115) says: “Don’t use their original values, use logratios of the compositional parts”. The ratios between the parts express relative information and are invariant to the choice of subcomposition. Taking logarithms is also “the key to convert ratios into the appropriate additive scale for statistical computations, as well as tending to symmetrize their distributions and reduce the effect of outliers” (Greenacre, 2019, p. 7).
This transformation shifts “compositional data from their original sample space to an unrestricted real space, where standard statistical methods can be applied for their further analysis” (Filzmoser et al., 2018, p. 5). However, logratios cannot be computed if zeros are present because it is not possible to divide by 0 or calculate log(0). A frequently used strategy is to replace the zeros by a very small positive value (say 0.005) to ensure the calculation of logratios (Greenacre, 2019, p. 57). However, zero values do not appear in the dataset of this study. At the same time, the components of the compositional data allow for flexible handling and can be added together to form amalgamations (Greenacre, 2019, p. 19). Instead of testing all pairwise combinations of the democracy profiles, I will consider the following logratio amalgamations for the four-cluster and five-cluster solution (see Table 8.3): The first set of logratios, which is based on the four-cluster solution, tests the effects of the different trade-off dimensions: the effective government dimension (freedom vs. control trade-off) is tested by combining the control-focused profiles FeC and fEC and contrasting them with the Fec and fEc profiles. The inclusiveness dimension (freedom vs. equality trade-off) is
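analyzed by aggregating the egalitarian profiles fEC and fEc and comparing it to libertarian profiles Fec and FeC. For the second set, I assess the impact of a particular democracy profile as opposed to all other profiles. Thereby, I use the more detailed five-cluster solution.
Table 8.3 Amalgamations of Democracy Profiles
Four-Cluster Solution
$\text{FKM4\_c} = \log\left(\frac{\text{Fec} + \text{fEc}}{\text{FeC} + \text{fEC}}\right)$
$\text{FKM4\_E} = \log\left(\frac{\text{fEC} + \text{fEc}}{\text{FeC} + \text{Fec}}\right)$
Five-Cluster Solution
$\text{FKM5\_Fec} = \log\left(\frac{\text{Fec}}{\text{FeC} + \text{fEC} + \text{fEc} + \text{FEC}}\right)$
$\text{FKM5\_fEc} = \log\left(\frac{\text{fEc}}{\text{FeC} + \text{fEC} + \text{Fec} + \text{FEC}}\right)$
$\text{FKM5\_FeC} = \log\left(\frac{\text{FeC}}{\text{Fec} + \text{fEC} + \text{fEc} + \text{FEC}}\right)$
$\text{FKM5\_fEC} = \log\left(\frac{\text{fEC}}{\text{FeC} + \text{Fec} + \text{fEc} + \text{FEC}}\right)$
$\text{FKM5\_FEC} = \log\left(\frac{\text{FEC}}{\text{FeC} + \text{fEC} + \text{fEc} + \text{Fec}}\right)$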
Source: own table
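The amalgamated logratios of Table 8.3 can be computed with a few lines of code. The sketch below uses the membership values for Germany in 2016 quoted in the text for the one-versus-rest logratios; the four-part composition used for FKM4_c and FKM4_E is hypothetical, because the text does not report four-cluster values. Function names are illustrative.

```python
import math


def amalgamation_logratio(composition, numerator, denominator):
    """log(sum of numerator parts / sum of denominator parts)."""
    top = sum(composition[part] for part in numerator)
    bottom = sum(composition[part] for part in denominator)
    if top <= 0 or bottom <= 0:
        raise ValueError("logratios are undefined for zero or negative amalgamations")
    return math.log(top / bottom)


def one_vs_rest(composition, profile):
    """Logratio of one profile against the amalgamation of all other profiles."""
    rest = [part for part in composition if part != profile]
    return amalgamation_logratio(composition, [profile], rest)


# Membership values for Germany in 2016 as quoted in the text.
germany_2016 = {"FEC": 0.124, "fEc": 0.672, "Fec": 0.040, "FeC": 0.019, "fEC": 0.146}
fkm5 = {f"FKM5_{p}": one_vs_rest(germany_2016, p) for p in germany_2016}

# Hypothetical four-part composition, used only to illustrate FKM4_c and FKM4_E.
example_4 = {"Fec": 0.05, "fEc": 0.70, "FeC": 0.05, "fEC": 0.20}
fkm4_c = amalgamation_logratio(example_4, ["Fec", "fEc"], ["FeC", "fEC"])
fkm4_e = amalgamation_logratio(example_4, ["fEC", "fEc"], ["FeC", "Fec"])

for name, value in fkm5.items():
    print(f"{name}: {value:.3f}")
print(f"FKM4_c: {fkm4_c:.3f}, FKM4_E: {fkm4_e:.3f}")
```

A positive logratio means that the numerator amalgamation is more likely than the denominator amalgamation. For Germany only the fEc profile has a positive one-versus-rest logratio, because it is the only profile with a membership value above 0.5.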
8.4.4 Operationalization: Datasets and Variables
To ensure that the most important control variables are included in the empirical analyses, I apply a twofold strategy: On the one hand, I follow Schmidt’s general explanation scheme of public policy which differentiates between the theories of economic modernization, power resource theory, partisan theory, institutional theories, international factors and historical path dependence (see 8.2.2). I enrich these theories by controlling for informal institutions, statehood and spatial correlation. On the other hand, based on the literature review, I have selected additional control variables that are more specific to the policy area in question. However, it is important to note that my aim is not to explain the performance outcomes as well as possible, but rather to test the impact of the democracy profiles (similarly Roller, 2005, p. 245). For this purpose, the inclusion of control variables covering a broad spectrum of effects should be sufficient. I draw data from two main sources for the control variables: The Quality of Government (QoG) Standard Time-Series dataset Version January 2019 (Teorell et al., 2019) and the V-Dem dataset Version 9 (Coppedge, Gerring, Knutsen, Lindberg, Teorell, et al., 2019). The QoG collects and unifies datasets from over 100 different sources. It does not modify the data, but combines the sources under a
common unique identifier, making data handling easier. Therefore, I usually do not refer to the original sources and variable names; all variable names and detailed sources can be found in the QoG codebook. The advantages and disadvantages of the V-Dem dataset are already discussed in Chapter 3. QoG and V-Dem use the same identifiers, so merging the data from these two sources is more convenient and less error-prone. Economic modernization is measured with different variables for the performance areas: To avoid endogeneity, I choose the indicator “population ages 65 and above (% of the total population)” from the World Development Indicators (variable name: wdi_pop65) for the economic performance area. GDP per capita in constant 2010 US dollars (variable name: wdi_gdpcapcon2010) is selected for the other performance dimensions. I include an additional factor for the domestic security performance which fits in this category of economic modernization: As Lappi-Seppälä (2010) showed, economic inequality affects the crime rate. I measure it with my Economic Inequality Index of the Social Performance Area. It is not reasonable to include the Power Resource Theory and the Partisan theory in this analysis, as this would lead to a loss of 25% to 50% of all data for the regression analysis (see for example Figure 160 or Figure 169 in the appendix). While the Power Resource Theory can be measured by the Trade Union Density Rate (% of total employees) from the database “Institutional Characteristics of Trade Unions, Wage Setting, State Intervention and Social Pacts in 55 countries between 1960 and 2018” (ICTWSS), the Partisan theory is usually measured by the so-called Schmidt index of the CPDS (see footnote 2). The CPDS dataset is created by Armingeon (2018) and covers political and institutional characteristics of 36 countries from 1960 to 2016. However, the variables of these datasets cannot be imputed in any meaningful way.
This is because they encompass only certain regions (OECD/EU countries). It is not known whether the data can be transferred to other regions or other countries. It could be, for instance, that a certain party family does not exist in these other countries or regions. Furthermore, restricting the sample would limit the variability of the dependent variable and especially the variability of the democracy profiles. While the Power Resource Theory and Partisan theory had to be omitted from the analysis, I distinguish between two institutional factors. The first is central bank independence, which in the approaches examined plays a crucial role in the explanation of economic performance by influencing the inflation and unemployment rates. To measure central bank independence, I rely on the Central Bank Independence (CBI) dataset (Garriga, 2016). This new dataset covers de jure central bank independence in four dimensions for 182 countries from 1970 to 2012. It assesses the “CEO’s characteristics (appointment, dismissal, and term of
office of the chief executive officer of the bank); policy formulation attributions (who formulates and has the final decision in monetary policy and the role of the central bank in the budget process); the central bank’s objectives; and the central bank’s limitations on lending to the public sector” (Garriga, 2016, p. 854). The reliability of this index was tested, and it also correlated strongly with other indices for central bank independence based on a smaller sample size. I use the weighted index, which weights the four components on conceptual grounds and ranges from 0 (lowest independence) to 1 (highest independence) (variable name: lvaw_garriga). Another institutional factor that is also relevant for the other performance areas is corporatism. Corporatism is drawn from the variable “CSO structure” from the Varieties of Democracy dataset (variable name: v2csstruc_1). It measures whether “large encompassing organizations dominate [and whether the] government and CSOs are linked formally through a corporatist system of interest intermediation” (Coppedge, Gerring, Knutsen, Lindberg, Skaaning, et al., 2019, p. 181). To complement this picture, informal institutions are included. Informal institutions can counteract the functioning of formal institutions, thus hindering performance. Therefore, I draw on the Political Corruption Index from the Varieties of Democracy dataset (variable name: v2x_corr). This index measures corruption in the public sector, in the executive, legislature and judiciary.¹¹ International factors are important to control for. I measure the openness of the economy, its capacity to trade, via the sum of imports and exports as % of GDP. The variable comes from the World Development Indicators (wdi_trade). Historical path dependence is an important factor, too. In this case the measurement via the lagged dependent variables is suitable.
Once a path has been taken, it is difficult to leave it, so that current values should depend to some extent on previous values. Furthermore, by controlling for the effects of the financial crisis in 2008, we examine whether this crisis led to a change of path in the sense of a “critical juncture” (Lauth, 2016b, p. 188).¹² Finally, an important prerequisite of performance is an intact statehood. Statehood is measured by the indicator “State authority over territory” from the V-Dem dataset (variable name: v2svstterr). It asks: “Over what percentage (%) of the territory does the state have effective control?” (Coppedge, Gerring, Knutsen,
¹¹ Please note that the Democracy Matrix includes a corruption measure only in the contextual measurement stage. However, the trade-off measurement of the democracy profiles is not based on the context measurement but on the core measurement.
¹² This is measured by creating a binary variable fiscalcrisis_cat which takes the value 0 for all years before 2008 and 1 for all years since 2008.
Lindberg, Skaaning, et al., 2019, p. 175).¹³ I also include spatial correlation as a control variable, because countries “lying close to one another may display similar values for extraneous reasons (culture, geography, diffusion, and so forth)” (Gerring et al., 2005, pp. 574–575). For each country, I compute the average of the dependent variable across all other countries, weighted by their physical distance to the country in question. This procedure is similar to that of Gerring et al. (2005). Finally, there are two performance areas that require a distinct operationalization due to different data structures and methods applied: goal-attainment performance and latent pattern maintenance performance. The former is a cross-sectional analysis, while the latter consists of an analysis of survey data. Two institutional factors, age and length of the constitution, are included as control variables for the goal-attainment performance, as in other studies in this research field. Both variables come from the Comparative Constitutions Project (see Section 5.4 for a critique). Since the number of observations is quite small for this performance area (at least for the amendment rate by Lutz), I include the Partisan indicator. I average the scores over the time period used to calculate the amendment rate. For latent pattern maintenance performance, I control for the personal evaluation of the economic situation, interest in politics and socio-demographic characteristics (education, age, gender, income) of the respondents on the individual level. I also control for social capital in the form of social trust and values such as post-materialism (Inglehart, 1977, 2007).¹⁴ Furthermore, I include the GDP per capita variable on the country level as a control for economic modernization. Due to the smaller sample size at the second level, I refrain from including other control variables. Table 8.4 shows a summary of the operationalization for each performance area.
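The spatial correlation control, a distance-weighted average of the dependent variable in all other countries, can be sketched as follows. The text does not state the exact weighting function, so inverse-distance weights are an assumption of this sketch; the country labels, values, distances, and the function name are hypothetical.

```python
def spatial_lag(values, distances, country):
    """Distance-weighted average of the dependent variable in all other countries.

    Uses inverse-distance weights, which is an assumption of this sketch:
    closer countries receive larger weights.
    """
    weights = {
        other: 1.0 / dist
        for other, dist in distances[country].items()
        if other != country
    }
    total = sum(weights.values())
    return sum(weight * values[other] for other, weight in weights.items()) / total


# Hypothetical values of a dependent variable in one year.
values = {"A": 1.0, "B": 3.0, "C": 5.0}
# Hypothetical pairwise distances in kilometers.
distances = {
    "A": {"B": 100.0, "C": 400.0},
    "B": {"A": 100.0, "C": 300.0},
    "C": {"A": 400.0, "B": 300.0},
}

lag_a = spatial_lag(values, distances, "A")
print(f"spatial lag for country A: {lag_a:.2f}")
```

For country A the result lies between the values of B and C but closer to that of B, the nearer neighbor. In a TSCS setting the computation would be repeated for every country and year.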
8.4.5 Workflow of the Regression Analysis
The analysis follows the workflow and diagnostic approach described here in order to reduce errors and false conclusions. In a first step, missing data is imputed
¹³ For a detailed discussion of this indicator, see section 3.3.4.
¹⁴ The formulation and labels of the items are as follows: Satisfaction with financial situation of household (1: Dissatisfied; 10: Satisfied); Interest in politics (1: Very interested; 4: Not at all interested); Highest educational level attained (1: Inadequately completed elementary education; 8: University with degree); Scale of incomes (1: Lower step; 11: Highest step); Most people can be trusted (1: Most people can be trusted; 2: Can’t be too careful); Post-Materialist index (1: Materialist; 3: Postmaterialist). The scales for interest in politics and social trust are reversed.
Corporatism
Corporatism; Central Bank Independence
Political Corruption
Openness of the Economy
Institutional Factors
Informal Institutions
International Factors
Age of constitution; Length of constitution
Cabinet Composition via Schmidt index
–
–
Goal-Attainment Performance
Openness of the Economy
–
Political Corruption –
–
Partisan Theory –
GDP per capita
–
Population (65+)
Environmental Performance
Power Resource – Theory
Economic Modernization
Economic Performance
Table 8.4 Operationalization of the Control Variables Domestic Security Performance
Openness of the Economy
Political Corruption
Corporatism
–
–
–
Political Corruption
–
–
–
GDP per capita GDP per capita; Economic Inequality Index
Social Performance
(continued)
National Level: GDP per capita
Individual Level: personal evaluation of the economic situation; interests in politics; education; age; gender; income; social trust; post-materialism
Survey Data
Integration
8.4 Methodology: The Bayesian TSCS Within-Between Model 257
Distance weighted average of Dependent Variable
Spatial Correlation
–
–
Goal-Attainment Performance
Distance weighted average of Dependent Variable
State Authority Over Territory
Lagged Dependent Variable; Financial Crisis 2008
Social Performance
Distance weighted average of Dependent Variable
State Authority Over Territory
Lagged Dependent Variable; Financial Crisis 2008
Domestic Security Performance
Integration
8
Note: All regression models include a trend-variable as well. Source: Own Table
Distance weighted – average of Dependent Variable
State Authority State Authority Over Territory Over Territory
Statehood
Lagged Dependent Variable; Financial Crisis 2008
Lagged Dependent Variable; Financial Crisis 2008
Environmental Performance
Historical Path Dependence
Economic Performance
Table 8.4 (continued)
258 Explaining Goal-Oriented Performance
8.4 Methodology: The Bayesian TSCS Within-Between Model
259
for the independent variables. The procedure and diagnostics of multiple imputation is explained in detail in chapter 6. In a second step, I perform exploratory analysis and descriptive statistics. The amount of missing values (if it was not possible to impute them) and possible outliers is evaluated. It is also helpful in detecting heteroscedasticity by visualizing the relationship of the various independent variables on the components of the dependent variables via scatterplots. In addition, I check the stationarity of the variables included in the analysis. However, since the unit root tests proposed in the econometrics literature are often not suitable for the TSCS data in political science, I rely on the simple autoregression test (see 8.4.1.1). Before the interpretation of the results of the multilevel TSCS regression, I examine the convergence of the Hamiltonian Monte Carlo algorithm. Thereby, the algorithm uses different starting points and tests whether all these chains move to the same values. The standard criterion R ≤ 1.05 which indicates a good mixing of the chains is used (Gelman, 2013, p. 285; Gelman & Rubin, 1992). The idea is “to calculate the variance of samples within each chain and compare it to the between-chain variance. Intuitively, if these are about the same, then it indicates that we would find it difficult, on the basis of the sampling distribution alone, to differentiate between samples from one chain versus those from any of the others” (Lambert, 2018, p. 315). I also check the trace plots for each parameter. To assess whether there is a need to model unit heterogeneity, contemporaneous correlation, panel heteroscedasticity and autocorrelation, I base the decision on the following idea: Each model is flawed and does not capture the reality perfectly. Instead of asking whether the model is true or false, I ask “Do the model’s deficiencies have a noticeable effect on the substantive inferences?” (Gelman, 2013, p. 142). 
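The convergence criterion mentioned above compares within-chain and between-chain variance. As a minimal sketch of that idea, the following computes the basic Gelman and Rubin potential scale reduction factor for simulated chains. It is not the (split, rank-normalized) variant that current Stan-based software reports, and the chains here are artificial draws rather than output of the study's model.

```python
import random
from statistics import mean, variance

def r_hat(chains):
    """Basic Gelman-Rubin statistic for a list of equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean(variance(c) for c in chains)        # W: average within-chain variance
    between = n * variance(chain_means)               # B: between-chain variance
    var_plus = (n - 1) / n * within + between / n     # pooled estimate of posterior variance
    return (var_plus / within) ** 0.5

rng = random.Random(1)
# Eight chains that all explore the same distribution
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(8)]
# Eight chains stuck around different values (no convergence)
stuck = [[rng.gauss(float(k), 1.0) for _ in range(1000)] for k in range(8)]

r_mixed, r_stuck = r_hat(mixed), r_hat(stuck)
```

For the well-mixed chains the statistic stays close to 1, whereas chains centred on different values push it far above the 1.05 threshold.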
Therefore, a good model is one “that can account for the variation in data that is pertinent to this specific purpose” (Lambert, 2018, p. 217). Bayesian analysis offers two ways of checking whether the model is appropriate: Posterior Predictive Checks (PPC) and information criteria. Posterior Predictive Checks (PPC) “combine uncertainty about parameters, as described by the posterior distribution, with uncertainty about outcomes, as described by the assumed likelihood function. These checks are useful […] for prospecting for ways in which your models are inadequate” (McElreath, 2015, p. 68). The aim of PPC is to “drive intuitions about the qualitative manner in which the model succeeds or fails, and about what sort of novel model formulation might better capture the trends in the data” (Kruschke, 2015, p. 331). The simulation is done from the posterior predictive distribution $p(y^{rep} \mid y) = \int p(y^{rep} \mid \theta)\, p(\theta \mid y)\, d\theta$, where $y$ is the observed data, $y^{rep}$ is the replicated data and $\theta$ are the regression parameters. In this sense Bayesian models are “generative” (McElreath, 2015, p. 62). They produce data given the estimated parameters and the likelihood function of the model. Thus, the posterior predictive distribution combines the uncertainty in the parameters with the uncertainty due to sampling variation. Afterwards, the observed data is compared to the posterior predictive distribution generated by the model. The comparison is based on a discrepancy measure $T(y, \theta)$ that needs to be defined. The result of this test can be expressed by a “Bayesian p-value [which] is defined as the probability that the replicated data could be more extreme than the observed data, as measured by the test quantity” (Gelman, 2013, p. 146) in the sense that $T(y^{rep}, \theta) \geq T(y, \theta)$. A model does not generate data similar to the observed data, “if a discrepancy is of practical importance and its observed value has a tail-area probability near 0 or 1, indicating that the observed pattern would be unlikely to be seen in replications of the data if the model were true” (Gelman, 2013, p. 150). Bayesian p-values around 0.5 indicate a good fit to the data. Furthermore, information criteria support model comparison and model selection. I rely on the Bayesian Leave-One-Out Cross-Validation Information Criterion (LOOIC). LOOIC measures the predictive accuracy by testing the “model’s capacity to generalize” (Lambert, 2018, p. 225) to data collected in the future. Although there is no opportunity to collect data in the future and to test whether the model correctly predicts these data, LOOIC is a good approximation. In a next step, I check the assumptions of homoscedasticity and linearity with respect to the covariates using residual plots. The last step is concerned with the visualization of the model implications (Lag Distribution, Long-Run Multiplier and Dynamic Simulation): Due to the TSCS characteristics of the data, credibility intervals alone would not be informative.
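The logic of a posterior predictive check and its Bayesian p-value can be sketched in a few lines. The example below deliberately uses a toy model that is not the study's model: a normal likelihood whose standard deviation is fixed at the sample value, with a flat prior on the mean, so that posterior draws of the mean can be generated directly. The discrepancy measure (here the maximum of the data) and all data are chosen for illustration only.

```python
import random
from statistics import mean, stdev

def bayesian_p_value(y, discrepancy, n_draws=2000, seed=0):
    """Share of replicated data sets whose discrepancy is at least the observed one.

    Toy assumptions: y ~ Normal(mu, s) with s fixed at the sample sd and a flat
    prior on mu, which gives the posterior mu | y ~ Normal(mean(y), s / sqrt(n)).
    """
    rng = random.Random(seed)
    n, y_bar, s = len(y), mean(y), stdev(y)
    t_obs = discrepancy(y)
    exceed = 0
    for _ in range(n_draws):
        mu = rng.gauss(y_bar, s / n ** 0.5)            # draw theta from p(theta | y)
        y_rep = [rng.gauss(mu, s) for _ in range(n)]   # draw y_rep from p(y_rep | theta)
        exceed += discrepancy(y_rep) >= t_obs          # T(y_rep) >= T(y)?
    return exceed / n_draws

rng = random.Random(1)
clean = [rng.gauss(0.0, 1.0) for _ in range(50)]
contaminated = clean[:-1] + [50.0]                     # one extreme outlier

p_clean = bayesian_p_value(clean, max)
p_contaminated = bayesian_p_value(contaminated, max)
```

With the outlier present, the normal model practically never reproduces a maximum as large as the observed one, so the p-value collapses towards 0 and flags the misfit.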
Thus, besides these credibility intervals, it is crucial to see how the effect unfolds over time to assess the effect size. To visualize the effects, I plot the implied lag distribution of the variables. The lag distribution rests on the “impulse response function” (Beck & Katz, 2011, p. 335). The visualization traces how a one-time shock from X affects Y and how the effect dissipates over future time periods (see De Boef & Keele, 2008, p. 188 for various lag distributions). For instance, is there an instantaneous effect that vanishes immediately in the next period, or is the effect distributed over a longer span of future time periods? In addition, I calculate the Long-Run Multiplier (LRM). The LRM is the “total effect $X_t$ has on $Y_t$ distributed over future time periods” (De Boef & Keele, 2008, p. 191): While X and Y are in a state of equilibrium at first, a sudden change in $X_t$ will disturb this equilibrium, so that Y will also change and in the future reach a new equilibrium with X. The long-run multiplier shows the cumulative change in Y over these future periods caused by X. However, the “LRM can be statistically significant even if individual terms in the regression model are not. The statistical significance of any single term in the [ECM] or the ADL is of little consequence for assessing long run effects” (Keele et al., 2016, p. 295). According to Keele et al. (2016, p. 295, n. 13), the Long-Run Multiplier for the ADL(p,q) model is

$LRE_{ADL} = \dfrac{\sum_{i=0}^{q} \beta_i}{1 - \sum_{i=1}^{p} \alpha_i}$,

while the LRM for the ECM(p,q) is

$LRE_{ECM} = -\dfrac{\sum_{i=0}^{q} \beta_i^{*}}{\sum_{i=1}^{p} \alpha_1^{*}}$.

Importantly, in contrast to frequentist statistics, this value with its HPD interval can be easily calculated with Bayesian statistics by applying the posterior distribution of the model parameters to these formulas. In addition, I apply a technique called dynamic simulation (Williams & Whitten, 2012) that follows the impact of a variable over time. This method helps to “fully explore these long-term effects in dynamic relationships so that they can make the full slate of inferences but also so that they can avoid making inferences that are only valid when examining effects in the short-term” (Williams & Whitten, 2012, p. 688). A summary is given in Table 8.5. Since coupling TSCS analysis with Bayesian statistics is not often practiced, I test the capabilities of the model in Appendix 6.1 using simulated data. On the one hand, this helps to ensure that the statistical model of this study is properly set up to capture the relevant assumptions of TSCS data. On the other hand, it also provides information on how well (and in what situations) the applied diagnostic procedures (LOO information criterion, posterior predictive checks and residual plots) are able to detect erroneous specifications of the model.
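To make the last point concrete, the sketch below applies the ADL formula to each posterior draw and summarizes the resulting LRM draws with an HPD interval. It also computes the implied lag distribution for one parameter set. The "posterior draws" are simulated from normal distributions with made-up means. They stand in for the output of a fitted model, and the helper names `lrm_adl`, `lag_distribution` and `hpd_interval` are hypothetical.

```python
import math
import random

def lrm_adl(betas, alphas):
    """Long-run multiplier of an ADL(p,q): sum(beta_0..beta_q) / (1 - sum(alpha_1..alpha_p))."""
    return sum(betas) / (1.0 - sum(alphas))

def lag_distribution(betas, alphas, horizon):
    """Effect on Y in periods 0..horizon-1 of a one-time unit shock to X (impulse response)."""
    effects = []
    for h in range(horizon):
        direct = betas[h] if h < len(betas) else 0.0
        carried = sum(a * effects[h - k - 1] for k, a in enumerate(alphas) if h - k - 1 >= 0)
        effects.append(direct + carried)
    return effects

def hpd_interval(draws, mass=0.95):
    """Shortest interval that contains the share `mass` of the draws."""
    xs = sorted(draws)
    k = int(math.ceil(mass * len(xs)))
    start = min(range(len(xs) - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

# Stand-in for posterior draws of an ADL(1,1); the means 0.2, 0.1 and 0.5 are invented.
rng = random.Random(42)
draws = [
    {"betas": [rng.gauss(0.2, 0.05), rng.gauss(0.1, 0.05)], "alphas": [rng.gauss(0.5, 0.05)]}
    for _ in range(4000)
]
lrm_draws = [lrm_adl(d["betas"], d["alphas"]) for d in draws]
lo, hi = hpd_interval(lrm_draws)

# Lag distribution at the invented posterior means; its sum approaches the LRM of 0.6
effects = lag_distribution([0.2, 0.1], [0.5], horizon=60)
```

Because the LRM is a nonlinear function of several coefficients, working draw by draw propagates the joint parameter uncertainty into the interval without any additional approximation.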
Table 8.5 Workflow and Regression Diagnostics

| Step | Statistical Test | Visualization |
|---|---|---|
| Multiple Imputation: Treat missing values (check convergence; check fit of imputation model) | | Convergence Plot; Plot showing observed vs. imputed values |
| Exploratory Analysis: Missing values, Distributions, Detection of Heteroscedasticity and Multicollinearity; Outliers | | Missingness Plot; XY-Plot |
| Check Stationarity | Autoregression Test (values >= 1 indicate unit root or explosive process) | - |
| Convergence of Markov Chains | $\hat{R} \leq 1.05$ | Trace Plots |
| Testing of TSCS assumptions: Unit heterogeneity | Loo Comparison: Model without random intercept for countries vs. Model with random intercept for countries | Posterior Predictive Check: Country means |
| Testing of TSCS assumptions: Contemporaneous Correlation | Loo Comparison: Model without random intercept for time vs. Model with random intercept for time | Posterior Predictive Check: Time means |
| Testing of TSCS assumptions: Panel heteroscedasticity | Loo Comparison: Model with homoscedastic errors vs. Model with heteroscedastic errors | Posterior Predictive Check: Variance |
| Testing of TSCS assumptions: Autocorrelation and Trend | Lagged Residual Test | |
| Linearity | - | Pearson Residuals vs. Fitted Values/Predictors |
| Homoscedasticity with respect to covariates | - | Pearson Residuals vs. Predictors |
| Visualization of the Effects and Long-Run Multiplier (LRM) | Significance of LRM | Lag Distribution Plot; Dynamic Simulation |

Source: Own Table

8.5 Empirical Analyses
8.5.1 Economic Performance
I start by presenting the findings of the exploratory analysis for the wealth and productivity component. Overall, not many observations for the TSCS regression of economic performance are missing (see Figure 160 in the appendix). The transformation of the values to a normal distribution succeeds to a satisfactory extent, and the convergence as well as the predictive performance of the imputation model are acceptable (see Figure 161, Figure 162 and Figure 163 in the appendix). Since only a few values were missing, I impute only five datasets. Finally, the XY-plot (see Figure 164 in the appendix) does not reveal troublesome outlying observations. Beck’s residual test reveals that although some variables are sluggish, the confidence intervals of the lagged residual term for each variable are still far away from 1 (see Table 83 in the appendix). However, the population size indicator (independent variable) shows a problematic value of ≥1. Therefore, a unit root is present and I use only the between estimate of population size. In total, this results in 2136 observations for 85 countries from 1980 to 2017. The testing of the model assumptions of the economic performance shows that the model with the highest complexity (unit heterogeneity, contemporaneous correlation, panel heteroscedasticity and autocorrelation) is favored according to the LOOIC value (see Table 84 and Table 85 in the appendix). The PPC indicates the same: the more complex the model is, the substantially better it fits the observed data (Figure 165 and Figure 166 in the appendix). However, the specification of the number of LDVs was different for each component of economic performance. Regarding the wealth component, one lagged dependent variable was not sufficient to remove the serial correlation in the residuals. Testing a second lag removed all of the residual autocorrelation. However, the chains did not converge fully. Changing to the ECM with two LDVs solved this problem: All 8 chains with 3000 iterations (warmup = 1000, post-warmup samples = 2000, thin = 2) converged with $\hat{R} < 1.05$. Finally, regarding the productivity component, the inclusion of two LDVs removed the autocorrelation of the residuals. It was not necessary to switch to the ECM specification. All 8 chains with 3000 iterations (warmup = 1000, post-warmup samples = 2000, thin = 2) converged with $\hat{R} < 1.05$. Table 86 in the appendix lists all models and parameters for the wealth component of the economic performance. The empirical results of the wealth component can be described as follows: Only one LRM of the control variables is significant according to the 95% HPD interval: the openness of the economy has a total positive effect (Figure 167 in the appendix). Regarding the LRM effects of the democracy profiles, only the LRM for the effective government dimension (freedom vs. control trade-off) is significant (Figure 8.4). A change in this dimension towards a more majoritarian democracy leads to a positive total effect on wealth over time. The lag distribution of this trade-off dimension shows a delayed effect which then dissipates at a slow pace. To evaluate whether the effect is substantial, I created two scenarios based on the USA and India (see Figure 8.5). The USA starts with a higher wealth performance value than India. Both countries develop over time to either a more majoritarian or a more control-focused democracy (from 0 to the 95% quantile or from 0 to the 5% quantile, respectively).
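The scenario logic just described (two countries with different starting values, each moved gradually towards the 95% or the 5% quantile of a profile variable) can be sketched as follows. This is a simplified illustration with a single lag and fixed, invented coefficients. It is not the dynamic simulation software of Williams and Whitten (2012), and in an actual analysis the loop would be repeated for every posterior draw to obtain HPD bands around the paths.

```python
def simulate(y_start, x_path, alpha, beta, intercept=0.0):
    """Iterate y_t = intercept + alpha * y_{t-1} + beta * x_t along a scenario for X."""
    y, path = y_start, []
    for x in x_path:
        y = intercept + alpha * y + beta * x
        path.append(y)
    return path

def ramp(target, years):
    """X moves linearly from 0 to `target` over the scenario horizon."""
    return [target * (t + 1) / years for t in range(years)]

YEARS = 45
ALPHA, BETA = 0.9, 0.01     # hypothetical coefficients, not estimates from the study
Q95, Q05 = 1.6, -1.6        # hypothetical 95% and 5% quantiles of the standardized X

gaps = {}
for label, y_start in {"high start": 1.0, "low start": -1.0}.items():
    up = simulate(y_start, ramp(Q95, YEARS), ALPHA, BETA)
    down = simulate(y_start, ramp(Q05, YEARS), ALPHA, BETA)
    # Difference between the two scenarios in each year
    gaps[label] = [u - d for u, d in zip(up, down)]
```

In this linear setup the gap between the two scenarios does not depend on the starting value of Y, so the choice of contrasting countries mainly serves to show the effect against different baseline levels.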
Figure 8.4 LRMs for the Democracy Profiles (Wealth)
Note: Dependent and independent variables are standardized. When the independent variable changes by one standard deviation, the dependent variable changes by the number of standard deviations given by the LRM, holding other variables constant. 95%-HPD-Interval. Source: own presentation

Figure 8.5 Dynamic Simulation (Wealth)
Note: 95%-HPD-Interval. Source: own presentation

The effect size of the effective government dimension is about 0.1 standard deviations difference on the wealth component between a more majoritarian and a more control-focused democracy after 45 years. Therefore, the more control-focused a democracy becomes over the years, the less wealth it creates over time. The effect becomes significantly different after 25 years. There are no statistically significant between-effects in the control or democracy profile variables (Figure 8.6). The only exception is the indicator for the “population ages 65 and above”, which measures the overall modernization status and has a positive effect on wealth. The empirical results for the productivity component are as follows: The full table with all models and parameters can be found in Table 87 in the appendix. Regarding the control variables, only the LRM of the openness of the economy shows a significant positive relationship to productivity (the 95%-HPD interval does not contain 0) (see Figure 168 in the appendix). Regarding the LRM effects of the democracy profiles, there is a long-run positive effect of the FeC and Fec profiles on productivity, while the inclusiveness dimension is actually negatively related to productivity. These effects are statistically significant according to the 95% HPD interval. The lag distribution of the Fec profile shows a large initial effect, which then quickly disappears. The FeC profile shows a large initial effect which then slowly dissipates. Finally, the egalitarian profiles (inclusiveness dimension) show a large negative initial effect which then slowly vanishes (Figure 8.7). Are these effects substantial? To test this, I use the values of two countries, Switzerland and India. While Switzerland has a high level of productivity right at the start, India has a low productivity outcome. I construct scenarios in which these countries become more libertarian-majoritarian (Fec), more libertarian-control-focused (FeC) or more inclusive (from 0 to the 95% quantile). Each of these developments is compared to the opposite movement (from 0 to the 5% quantile). There is more than a 0.2 standard deviation difference between these transformations after 45 years (see Figure 8.8). The effects become significantly different after 35 years. The between-effects of the control variables do not reach the statistical significance level, with the exception of quality of statehood (see Figure 8.9): The higher the quality of statehood, the greater the economic productivity. Democracy profiles also show statistically significant effects. The more libertarian-control-focused (FeC) a democracy is, the higher the productivity. It raises productivity by 0.2 standard deviations. The closer a democracy is to the egalitarian-majoritarian profile (fEc), the lower the economic performance in terms of productivity (−0.2 SD). If I use a lower significance level (90% HPD-interval), the between-effect of the trade-off dimension about effective government becomes significant. The more majoritarian a democracy, the lower its productivity (−0.2 SD). All in all, the sizes of the within- and between-effects are very similar.
Figure 8.6 Between-Effects (Wealth)
Note: Dependent and independent variables are standardized. When the independent variable changes by one standard deviation, the dependent variable changes by the number of standard deviations given by the values of the coefficients, holding other variables constant. 95%-HPD-Interval. Source: own presentation
Figure 8.7 LRMs for the Democracy Profiles (Productivity)
Note: Dependent and independent variables are standardized. When the independent variable changes by one standard deviation, the dependent variable changes by the number of standard deviations given by the LRM, holding other variables constant. 95%-HPD-Interval. Source: own presentation

Figure 8.8 Dynamic Simulation (Productivity)
Note: 95%-HPD-Interval. Source: own presentation

8.5.2 Environmental Performance
The results of the exploratory analysis of the environmental performance suggest the following: Overall, there are not many observations missing for the TSCS regression (see Figure 169 in the appendix). The transformation of the values was successful, and the convergence and the predictive performance of the imputation model are excellent (Figure 170, Figure 171 and Figure 172 in the appendix). As only a few values were missing, I impute five datasets. Finally, the XY-plot (see Figure 164 in the appendix) reveals no problematic observations. Beck’s residual test shows that some variables are only slowly changing, but their confidence intervals are still far away from 1 (see Table 88 in the appendix). The coefficient for the dependent variable, General Environmental Performance, is near 1, although the confidence intervals imply that a unit root is still rejected. The empirical analysis should therefore carefully examine the LDV coefficient of the fully specified model. In total, this results in 958 observations for 40 countries from 1990 to 2017. The testing of the model assumptions shows that the model with the highest complexity (unit heterogeneity, contemporaneous correlation, panel heteroscedasticity and autocorrelation) is favored according to the LOOIC value (see Table 89 in the appendix). The same is indicated by the posterior predictive check, which shows an improved fit to the observed data with higher model complexity (see Figure 174 in the appendix). One lag of the dependent variable was sufficient to remove the serial correlation in the residuals. In the multivariate solution the coefficient of the LDV is even further away from 1. All 8 chains with 3000 iterations (warmup = 1000, post-warmup samples = 2000, thin = 2) converged with $\hat{R} < 1.05$.
Figure 8.9 Between-Effects (Productivity)
Note: Dependent and independent variables are standardized. When the independent variable changes by one standard deviation, the dependent variable changes by the number of standard deviations given by the values of the coefficients, holding other variables constant. 95%-HPD-Interval. Source: own presentation
The full table with all models and parameters can be found in Table 90 in the appendix. The empirical results for the (general) environmental performance can be summarized as follows: There are no significant LRMs for the control variables, with the exception of the GDP per capita indicator (see Figure 175). There are no long-run effects of the democracy profiles on environmental performance (see Figure 8.10). If a lower significance level is used, the inclusiveness dimension shows a positive LRM. The effect is delayed by one time period and then slowly disappears over time.
Figure 8.10 LRMs for the Democracy Profiles (Environmental Performance)
Note: Dependent and independent variables are standardized. When the independent variable changes by one standard deviation, the dependent variable changes by the number of standard deviations given by the LRM, holding other variables constant. 95%-HPD-Interval. Source: own presentation
Figure 8.11 shows whether the effect for the inclusiveness dimension is substantial. I picked Switzerland, which has a high environmental performance as a starting value, and Australia, which has a rather low starting performance. The effect is lower than a 0.1 standard deviation difference in environmental performance after 45 years. The difference becomes significant more quickly for a country with lower environmental performance (after 25 years). In a second step, I discuss the time-invariant between-effects. Neither the control variables nor the coefficients for the democracy profiles are statistically significant. Using the 90% HPD interval, the GDP per capita becomes significant: the higher the GDP per capita, the higher the environmental performance. Overall, this falsifies the assumption that there is a trade-off between GDP per capita and environmental performance (Figure 8.12).
Figure 8.11 Dynamic Simulation (Environmental Performance)
Note: 95%-HPD-Interval. Source: own presentation
8.5.3 Goal-Attainment Performance
The exploratory analysis of the goal-attainment performance indicates that not many observations are missing (see Figure 176 in the appendix). Some observations for the indicator “length of the constitution” are missing. This mainly applies to the length of an older, earlier constitution of a country. Nevertheless, the length of the more recent constitution is available for this country (e.g. the length of the constitution from 1809 for Sweden is missing; however, the constitution from 1974 is included). Therefore, I refrain from multiple imputation. The distribution graph and the XY-plot (see Figure 177 and Figure 178 in the appendix) do not display troublesome outlying observations. In total, this results in a maximum of 25 observations for the amendment rate by Lutz, and a maximum of 123 observations for the amendment rate by the CCP. Since the number of observations for the Lutz indicator is quite small, the Partisan indicator can be included without a substantial loss of information. However, there is a greater loss of information with respect to the CCP measure. For both dependent variables, the model based on the student-t distribution did not fit the data better than a regression based on the normal distribution according to the LOOIC (−0.1 [0.5]; 0.0 [0.8]). Therefore, I employ the simpler regression model under the normal distribution assumption.
Figure 8.12 Between-Effects (Environmental Performance)
Note: Dependent and independent variables are standardized. When the independent variable changes by one standard deviation, the dependent variable changes by the number of standard deviations given by the values of the coefficients, holding other variables constant. 95%-HPD-Interval. Source: own presentation
The regression results for the amendment rate by Lutz in Table 8.6 show the following: While the age of the constitution has no effect, the length of the constitution is significant: As the length of the constitution grows, the amendment rate increases. The democracy profiles do not have any effect. Since the statistical power is quite low due to the small sample size, I discuss not only the 95% HPD intervals but also the 90% HPD intervals. Using a lower level of significance, two effects can be found: The fEc profile is positively related to the amendment rate, while the FEC profile is negatively associated. Controlling for the partisan variable does not lead to interesting findings for the democracy profiles (see Table 91 in the appendix). However, it appears that social-democratic and left parties change the constitution more frequently, which corresponds to the theoretical argumentation of Reutter/Lorenz (2016). Regarding the regression diagnostics, the residual versus fitted plot does not show any patterns (see Figure 179 in the appendix). Therefore, the regression model appears adequate. The empirical results for the CCP amendment rate provide a somewhat different picture (see Table 8.7). Both factors, age of constitution and length of constitution, significantly increase the frequency of constitutional change. The democracy profiles are not related to the CCP measure. These results do not change using a lower significance level (90% HPD). Interestingly, when the control for the partisan variable is included (see Table 92 in the appendix), the only significant effect is the FeC variable. However, the FeC variable does not decrease the frequency of constitutional change (as the theory would predict) but rather increases it. Using a lower level of significance does not change the effects; however, the Fec variable is on the brink of becoming significant. This aligns with the theory that a libertarian-majoritarian democracy changes the constitution more frequently (if it has a written constitution). The regression diagnostic in the form of the residual versus fitted plot does not show any patterns. But it indicates that the distribution does not follow a normal distribution; instead, it has a bulk of values on its left side (see Figure 179 in the appendix). This might have distorting effects on the standard errors and significance.
Table 8.6 Regression Results for Goal-Attainment Performance (Lutz)

| Predictors | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 |
|---|---|---|---|---|---|---|---|---|
| Intercept | −0.22 (−0.92, 0.47) | −0.15 (−0.87, 0.57) | −0.52 (−1.42, 0.33) | −0.28 (−0.98, 0.44) | −0.56 (−1.28, 0.27) | −0.16 (−0.84, 0.50) | −0.36 (−1.15, 0.41) | −0.47 (−1.20, 0.30) |
| Length Constitution (CCP) | 0.44 (0.03, 0.86) | 0.46 (0.05, 0.88) | 0.52 (0.08, 0.97) | 0.48 (0.05, 0.92) | 0.47 (0.08, 0.87) | 0.39 (−0.00, 0.78) | 0.45 (0.03, 0.87) | 0.52 (0.11, 0.96) |
| Age Constitution (CCP) | 0.34 (−0.34, 1.02) | 0.28 (−0.45, 0.94) | 0.57 (−0.21, 1.33) | 0.36 (−0.31, 1.01) | 0.56 (−0.12, 1.24) | 0.31 (−0.32, 0.94) | 0.45 (−0.30, 1.17) | 0.48 (−0.14, 1.18) |
| Libertarian-Majoritarian D. (Fec) | | 0.29 (−0.27, 0.85) | | | | | | |
| Libertarian-Control-Focused D. (FeC) | | | −0.41 (−1.04, 0.29) | | | | | |
| Egalitarian-Control-Focused D. (fEC) | | | | −0.29 (−0.89, 0.38) | | | | |
| Egalitarian-Majoritarian D. (fEc) | | | | | 0.49 (−0.01, 0.98) | | | |
| Balanced D. (FEC) | | | | | | −0.69 (−1.35, −0.06) | | |
| Inclusiveness + (E) | | | | | | | 0.21 (−0.35, 0.82) | |
| Effective Government + (c) | | | | | | | | 0.40 (−0.09, 0.89) |
| Residual | 0.98 (0.70, 1.29) | 0.98 (0.69, 1.30) | 0.97 (0.68, 1.29) | 0.98 (0.69, 1.31) | 0.92 (0.65, 1.23) | 0.90 (0.64, 1.20) | 0.99 (0.70, 1.31) | 0.94 (0.66, 1.22) |
| Num. obs. | 25 | 25 | 25 | 25 | 25 | 25 | 25 | 25 |

Note: 95%-HPD-Interval. Source: own table

Table 8.7 Regression Results for Goal-Attainment Performance (CCP)

| Predictors | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 |
|---|---|---|---|---|---|---|---|---|
| Intercept | −0.16 (−0.33, 0.01) | −0.17 (−0.35, 0.01) | −0.14 (−0.32, 0.04) | −0.17 (−0.35, 0.01) | −0.14 (−0.32, 0.03) | −0.17 (−0.35, 0.01) | −0.16 (−0.35, 0.02) | −0.16 (−0.34, 0.02) |
| Length Constitution (CCP) | 0.25 (0.06, 0.43) | 0.25 (0.08, 0.42) | 0.24 (0.07, 0.42) | 0.25 (0.06, 0.42) | 0.23 (0.06, 0.43) | 0.25 (0.09, 0.43) | 0.25 (0.09, 0.43) | 0.24 (0.06, 0.41) |
| Age Constitution (CCP) | 0.31 (0.13, 0.51) | 0.31 (0.12, 0.48) | 0.31 (0.11, 0.49) | 0.32 (0.13, 0.51) | 0.29 (0.10, 0.48) | 0.32 (0.13, 0.50) | 0.31 (0.13, 0.50) | 0.32 (0.13, 0.51) |
| Libertarian-Majoritarian D. (Fec) | | −0.01 (−0.28, 0.23) | | | | | | |
| Libertarian-Control-Focused D. (FeC) | | | 0.13 (−0.17, 0.40) | | | | | |
| Egalitarian-Control-Focused D. (fEC) | | | | 0.06 (−0.17, 0.27) | | | | |
| Egalitarian-Majoritarian D. (fEc) | | | | | −0.16 (−0.39, 0.06) | | | |
| Balanced D. (FEC) | | | | | | 0.05 (−0.19, 0.31) | | |
| Inclusiveness + (E) | | | | | | | −0.01 (−0.25, 0.23) | |
| Effective Government + (c) | | | | | | | | −0.13 (−0.34, 0.11) |
| Residual | 0.98 (0.86, 1.12) | 0.99 (0.87, 1.11) | 0.98 (0.86, 1.11) | 0.99 (0.87, 1.11) | 0.98 (0.86, 1.11) | 0.99 (0.87, 1.12) | 0.99 (0.86, 1.11) | 0.98 (0.86, 1.12) |
| Num. obs. | 123 | 123 | 123 | 123 | 123 | 123 | 123 | 123 |

Note: 95%-HPD-Interval. Source: own table

8.5.4 Social Performance
The exploratory analysis of the economic equality and social equality components of the social performance indicates the following: Overall, not many observations are missing (see Figure 180 in the appendix). The transformation of the values, the convergence and the predictive performance of the imputation model are acceptable (Figure 181, Figure 182 and Figure 183 in the appendix). Because of the few missing values, I impute only five datasets. Finally, the XY-plot (see Figure 184 in the appendix) does not reveal any outliers. According to Beck’s residual test the confidence intervals of the lagged residual terms are far away from 1 (see Table 93 in the appendix). This gives considerable evidence that no unit roots are in the dataset. In total, this results in 1410 observations for 70 countries from 1970 to 2017. The testing of the model assumptions shows that the model with the highest complexity (unit heterogeneity, contemporaneous correlation, panel heteroscedasticity and autocorrelation) is favored according to the LOOIC values (see Table 94 and Table 95 in the appendix). However, the standard errors of the LOOIC values show that there is not necessarily an improvement in model fit when including a random intercept of time to control for contemporaneous correlation. The PPC shows that especially the inclusion of the country random intercept improves the overall model fit, while the inclusion of a random intercept for time and panel heteroscedasticity improves only the estimates for a few years and countries (see Figure 185 and Figure 186 in the appendix). I nevertheless use the most complex model, even if it only provides better estimates for some countries and years. One lag of the dependent variable was not sufficient to remove the serial correlation in the residuals. The serial correlation vanished after adding a second lag to the model. All 8 chains with 3000 iterations (warmup = 1000, post-warmup samples = 2000, thin = 2) converged with $\hat{R} < 1.05$. The empirical results for the economic equality performance are as follows: The full table with all models and parameters can be found in Table 96 in the appendix. In a first step, I focus on the LRMs, which show the total effect of the within-effects on the dependent variable, economic equality, distributed over time. Regarding the control variables, all LRMs are statistically insignificant (the 95%-HPD interval does contain 0) (see Figure 187). Regarding the LRM effects of the democracy profiles (see Figure 8.13), there is a long-run negative effect of the FeC profile on economic equality. The lag distribution shows a large initial effect, which first falls off quickly and then slowly disappears. In addition, the use of a 90%-HPD interval (or 0.1 significance level) shows that there is a long-run positive effect of the inclusiveness dimension on economic equality. There is a large initial effect which then slowly vanishes over time. Are these within-effects substantial (see Figure 8.14)? This is visualized using the USA and Sweden as contrasting examples: While the USA has a lower economic equality value as a starting point, Sweden has a higher economic equality value. For both countries, two scenarios are created: one in which the within-variables of the democracy profiles increase from 0 to the 95% quantile, and one in which they decrease from 0 to the 5% quantile over the years. One can probably not speak of a substantial effect when the variables only make about 0.15 standard deviations of difference after 45 years.
However, considering that the democracy profile as an institutional explanation is causally far removed from the outcome to be explained, it is still a surprising and sizeable effect. Although economic equality decreases in all simulations in the first few years, this declining trend is reversed, and economic equality increases after a change to a profile that is less libertarian-control-focused and more inclusive. This effect becomes statistically significant only after 25 or more years. In a second step, I discuss the time-invariant between-effects (see Figure 8.15). Almost all control variables, with the exception of corruption, reach statistical significance: the more open the economy, the higher the quality of statehood, the higher the overall GDP per capita, and the more corporatist a country is, the greater the economic equality. Democracy profiles also show statistically significant effects. The fEc profile leads to higher economic equality (a 0.2 SD increase in economic equality for a 1 SD increase in fEc). The more majoritarian a democracy is, the higher the economic equality (0.2 SD).
Note: Dependent and independent variables are standardized. While the independent variable changes by one standard deviation, the dependent variable changes by the amount of standard deviation given by the LRM, holding other variables constant. 95%-HPD-Interval. Source: own presentation
Figure 8.13 LRMs for the Democracy Profiles (Economic Equality)
The inclusiveness dimension shows a higher economic equality (0.1 SD) based on a lower significance level (90% HPD interval). In contrast, the control-focused profiles FeC and fEC both show a lower economic equality (0.1 and 0.2 SD, respectively). The FEC profile and the Fec profile have no effect. This plot reveals that the between-effects of the democracy profiles are only slightly more substantial than their within-counterparts. I now turn to the social equality performance. The full table with all models and parameters for the social equality performance is in Table 97 in the appendix. The LRMs of the control variables are statistically insignificant (the 95% HPD interval contains 0) (see Figure 187). Regarding the LRM effects of the democracy profiles in Figure 8.16, there is a long-run negative effect of the Fec profile on social equality. The negative effect for the FeC profile is also significant
Note: 95%-HPD-Interval. Source: own presentation
Figure 8.14 Dynamic Simulation (Economic Equality)
according to the 90% HPD interval. The lag distributions for both profiles show a large initial effect, which then slowly disappears. In addition, a 90% HPD interval (or 0.1 significance level) reveals that there is a long-run positive effect of the inclusiveness dimension on social equality. There is a large initial effect, which then slowly vanishes over time. The within-effects for social equality are less substantial than the within-effects for economic equality. The scenarios are the same as before: the magnitude of the effects is less than 0.1 standard deviations of social equality after 45 years (see Figure 8.17). There is a statistically significant difference between the democracy profiles only after 40 years. In addition, the effects are less pronounced the higher the initial value of the social equality performance. The time-invariant between-effects of the trade openness of the economy, GDP per capita and corruption are significantly related to social equality (see Figure 8.18). While a more open economy and a higher GDP per capita increase social equality, corruption decreases it. Only the between-effect of the Fec profile is significantly negatively related to social equality (decrease of 0.2 SD
Note: Dependent and independent variables are standardized. While the independent variable changes by one standard deviation, the dependent variable changes by the amount of standard deviation given by the values of the coefficients, holding other variables constant. 95%-HPD-Interval. Source: own presentation
Figure 8.15 Between-Effects (Economic Equality)
in social equality). A 90% HPD interval shows the between-effect of the inclusiveness dimension as significant: the more inclusive the democracy profile (the higher the political equality dimension), the higher the social equality (0.15 SD). In contrast, the egalitarian-majoritarian profile (fEc), the control-focused profiles (FeC and fEC) and the balanced profile (FEC) have no effect. Nevertheless, the between-effects are more substantial than the within-effects.
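The long-run multiplier and the lag distribution referred to in this section can be illustrated with a minimal sketch. It assumes a generic autoregressive distributed lag model with two lags of the dependent variable; the function names and all coefficient values are hypothetical and are not estimates from this study.

```python
# Hypothetical ADL model with two lags of the dependent variable:
#   y_t = phi1 * y_{t-1} + phi2 * y_{t-2} + beta0 * x_t + beta1 * x_{t-1} + e_t
# All coefficient values below are illustrative assumptions.


def long_run_multiplier(phis, betas):
    """Total long-run effect on y of a permanent one-unit change in x."""
    denom = 1.0 - sum(phis)
    if denom <= 0:
        # Simple sanity check only; it does not prove stationarity.
        raise ValueError("sum of lag coefficients must be below 1")
    return sum(betas) / denom


def lag_distribution(phis, betas, horizon):
    """Effect on y in each period of a one-unit change in x lasting one period."""
    effects = []
    for h in range(horizon):
        # Direct effect of x (only for the first len(betas) periods).
        direct = betas[h] if h < len(betas) else 0.0
        # Effect carried forward through the lagged dependent variable.
        carried = sum(
            phi * effects[h - k]
            for k, phi in enumerate(phis, start=1)
            if h - k >= 0
        )
        effects.append(direct + carried)
    return effects


phis = [0.6, 0.2]    # assumed coefficients on y_{t-1} and y_{t-2}
betas = [0.05, 0.0]  # assumed coefficients on x_t and x_{t-1}

lrm = long_run_multiplier(phis, betas)
weights = lag_distribution(phis, betas, horizon=400)
cumulative_45 = sum(weights[:45])

print(f"LRM: {lrm:.3f}")
print(f"Effect accumulated after 45 periods: {cumulative_45:.3f}")
```

In this sketch the lag weights sum to the LRM, which is why a large initial effect that decays slowly can still add up to a noticeable total effect over several decades.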
8.5.5 Domestic Security Performance
The exploratory analysis of the domestic security performance reveals that most missing values can be found in one variable (see Figure 189 in the appendix): The economic inequality indicator shows a significant proportion of missing values. In
Note: Dependent and independent variables are standardized. While the independent variable changes by one standard deviation, the dependent variable changes by the amount of standard deviation given by the LRM, holding other variables constant. 95%-HPD-Interval. Source: own presentation
Figure 8.16 LRMs for the Democracy Profiles (Social Equality)
contrast to the partisan and power resource variables, economic inequality values can be imputed. The transformation of the values, the convergence and the predictive performance of the imputation model are excellent (Figure 190, Figure 191 and Figure 192 in the appendix). Because a more substantial part is missing, I impute ten datasets. Finally, no outliers are present according to the XY-plot (see Figure 193 in the appendix). In contrast to the other performance areas, Beck’s test reveals that, overall, most variables are not sluggish (see Table 98 in the appendix). Their confidence intervals are far away from 1. This is a clear indication that there are no unit roots in the dataset. Finally, this results in 1461 observations for 80 countries from 1990 to 2017. The testing of the model assumptions shows that the model with the highest complexity (unit heterogeneity, contemporaneous correlation, panel heteroscedasticity and autocorrelation) is favored according to the LOOIC value (see Table 99
Note: 95%-HPD-Interval. Source: own presentation
Figure 8.17 Dynamic Simulation (Social Equality)
in the appendix). The PPC indicates the same: the fit to the observed data improves substantially as the model becomes more complex (see Figure 194 in the appendix). One lag of the dependent variable was not sufficient to remove the serial correlation in the residuals. Testing a second lag still showed residual autocorrelation. Therefore, I tried the ECM specification. One lag within the ECM removed all serial correlation in the residuals. All 8 chains with 3000 iterations (warmup = 1000, post-warmup samples = 2000, thin = 2) converged according to R̂ < 1.05. All model parameters are listed in Table 100 in the appendix. The empirical results for the domestic security performance can be described as follows: The GDP per capita variable is the only significant control parameter according to the
Note: Dependent and independent variables are standardized. While the independent variable changes by one standard deviation, the dependent variable changes by the amount of standard deviation given by the values of the coefficients, holding other variables constant. 95%-HPD-Interval. Source: own presentation
Figure 8.18 Between-Effects (Social Equality)
95% HPD interval (see Figure 175). Regarding the LRM effects of the democracy profiles (see Figure 8.19), there are statistically significant long-run effects of the democracy profiles. While the fEC profile has a positive LRM, the FEC profile has a negative LRM. The lag distribution of the fEC profile reveals a large initial effect, which then decays slowly. In contrast, the lag distribution of the FEC profile shows two large initial effects, which then dissipate at a slower rate. These effects are visualized in Figure 8.20 for two countries over 45 years. I selected Germany as one example because it has a high starting value for domestic security performance. Argentina is a contrasting example because it has a lower starting value. These scenarios reflect a movement towards a more egalitarian-control-focused democracy profile as well as towards a more balanced
Note: Dependent and independent variables are standardized. While the independent variable changes by one standard deviation, the dependent variable changes by the amount of standard deviation given by the LRM, holding other variables constant. 95%-HPD-Interval. Source: own presentation
Figure 8.19 LRMs for the Democracy Profiles (Domestic Security Performance)
profile. The difference between these democracy profiles is 0.2 standard deviations after 45 years. The effect becomes significant after about 25 years. However, there are no statistically significant between-effects in the control or democracy profile variables (see Figure 8.21).
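How an LRM is obtained from an ECM can be shown with a short derivation. This is a sketch under assumed notation for a generic model with one lag; the exact specification estimated for domestic security is not shown in this section. Starting from an autoregressive distributed lag model, subtracting $y_{t-1}$ from both sides and adding and subtracting $\beta_0 x_{t-1}$ gives the error correction form:

```latex
\begin{align*}
y_t &= \alpha_0 + \alpha_1 y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + \varepsilon_t \\
\Delta y_t &= \alpha_0 + (\alpha_1 - 1)\, y_{t-1} + \beta_0 \Delta x_t
              + (\beta_0 + \beta_1)\, x_{t-1} + \varepsilon_t \\
\text{LRM} &= \frac{\beta_0 + \beta_1}{1 - \alpha_1}
            = -\,\frac{\text{coefficient on } x_{t-1}}{\text{coefficient on } y_{t-1}}
\end{align*}
```

Both forms imply the same LRM; the ECM only reparameterizes the model so that the short-run effect ($\beta_0$) and the long-run adjustment are separate terms.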
8.5.6 Confidence Performance
The exploratory analysis of the survey data related to the confidence performance suggests that only a few values are missing (see Figure 196 in the appendix). However, two items had to be excluded because they were not surveyed in certain countries (subjective and objective income questions). Imputing whole questions seems neither possible nor practical. However, both of these questions are
Note: 95%-HPD-Interval. Source: own presentation
Figure 8.20 Dynamic Simulation (Domestic Security Performance)
related to the education item. Education can be regarded as an indirect measurement for both items. An imputation procedure is therefore not necessary and would make the analysis too complicated. The scales of the items were transformed so that 0 represents the lowest category on each item (see Figure 197 in the appendix). Age was median-centered. Furthermore, no outliers appear in the second-level variables (GDP per capita and democracy profile, see Figure 199 in the appendix). This finally results in 76998 respondents in 58 countries from 2000 to 2014: I included the latest available survey of each country (e.g., there were two surveys in Australia, in 2005 and 2008, so I included only the newer 2008 survey in the regression analysis). The logratios of the democracy profiles are the average values from 1974 to 2010.¹⁵ The empirical results in Table 8.8 do not show any effects for the democracy profile variables, indicated by the absence of significant coefficients and the lack
¹⁵ Alternative time periods were tested, but this did not change the results.
Note: Dependent and independent variables are standardized. While the independent variable changes by one standard deviation, the dependent variable changes by the amount of standard deviation given by the values of the coefficients, holding other variables constant. 95%-HPD-Interval. Source: own presentation
Figure 8.21 Between-Effects (Domestic Security Performance)
of improvement of the AIC/BIC values. Confidence in political institutions and the specific democracy profile do not appear to be related. Although not significant, the democracy profile fEC explains a larger portion of the variance on the second level. While the other level-2 variables reduce the second-level variation (country sd in Table 8.8) from 0.184 to only 0.180, the fEC variable reduces it to 0.178. The more likely a country is to belong to the fEC profile, the higher the level of confidence. Nevertheless, the effect size is not substantial (see Figure 200 in the appendix). Finally, a country’s GDP per capita is not related to its overall level of confidence. The control variables on level 1 all behave as theory predicts: the higher the age of the respondents, the higher their confidence; if the respondent is female, she has more confidence; the higher the education, the less
confidence; the same applies to postmaterialists, who are more critical of traditional institutions; the higher the respondents’ interest in politics, the higher their confidence; and finally, the higher the social trust, the higher the confidence.
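The fit statistics discussed for the confidence models can be checked with a few lines of arithmetic. The sketch below uses values read from Table 8.8 (M1 is the empty model, M2 adds the level-1 controls, M5 includes the fEC profile); the helper names are hypothetical, and the number of parameters is inferred as NObs minus df.residual, which is an assumption about how that row is defined.

```python
import math

# Values as reported in Table 8.8.
n_obs = 76998
country_sd = {"M1": 0.2, "M2": 0.184, "M5": 0.178}
residual_sd = {"M1": 0.464}
log_lik = {"M1": -50283.26, "M2": -48414.41, "M5": -48412.47}
df_residual = {"M1": 76995, "M2": 76989, "M5": 76987}

# Assumption: number of estimated parameters = NObs - df.residual.
n_params = {m: n_obs - df for m, df in df_residual.items()}


def icc(sd_between, sd_within):
    """Share of the total variance located at the country level."""
    return sd_between**2 / (sd_between**2 + sd_within**2)


def variance_reduction(sd_base, sd_new):
    """Proportional reduction in level-2 variance between two models."""
    return 1.0 - sd_new**2 / sd_base**2


def aic(ll, k):
    return -2.0 * ll + 2.0 * k


def bic(ll, k, n):
    return -2.0 * ll + k * math.log(n)


icc_null = icc(country_sd["M1"], residual_sd["M1"])
reduction = variance_reduction(country_sd["M2"], country_sd["M5"])

print(f"ICC of the empty model: {icc_null:.3f}")
print(f"Level-2 variance reduction, M2 to M5: {reduction:.1%}")
for model in ("M1", "M2", "M5"):
    k = n_params[model]
    print(model, round(aic(log_lik[model], k), 2), round(bic(log_lik[model], k, n_obs), 2))
```

Because the two extra parameters in M5 improve the log-likelihood by only about two points, the penalty terms offset most of the gain, which is consistent with the lack of improvement in AIC and BIC noted above.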
8.5.7 Discussion
The theory suggests that the effects of the democracy profiles on these policy outcomes are causally distant and are mediated in particular by political actors. Therefore, if democracy profiles are linked to certain performance outcomes, it is assumed that their effects materialize only slowly. The hypotheses assume that especially the egalitarian democracy profiles should perform better on various outcomes than libertarian democracy profiles. The theory also states that the egalitarian-majoritarian and the egalitarian-control-focused democracy should perform well on various outcomes, while the libertarian-majoritarian and libertarian-control-focused democracies should show an overall lower performance. Table 8.9 summarizes the empirical findings. We can draw several conclusions from the empirical analyses: First, a change in the democracy profiles does not have an immediate effect; as the simulations show, these effects need a longer time span to be realized and to become significantly different. In this respect, the results conform to the theoretical assumptions. For some performance areas, the democracy profile needed over 25 years to show a significant impact.¹⁶ Second, the impact of these democracy profiles ranges between 0.1 and 0.2 standard deviations on the performance scales after 45 years. The effect size can probably not be regarded as strong. But, given that the democracy profile as an institutional explanation is causally so far away from the performance outcome, it can still be considered a remarkable effect. The fact that one can measure an effect at all is surprising. Other explanations would probably have a more substantial effect, but they are also closer on the causal chain. Third, the empirical analyses revealed that there is at least one statistically significant effect of the democracy profiles for each performance area, with the exception of the confidence performance. 
Most effects are concentrated in the social performance area (economic and social equality) on the one hand,
¹⁶ However, the time span for a significant effect varies with the specification of the simulation. When there is a faster change to the highest value of the independent variable (democracy profile), the effect becomes significant more quickly. Nevertheless, the effects of the democracy profile still needed about 15 years to be realized.
Table 8.8 Regression Results for Confidence Performance
Term | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9
Fixed Part | | | | | | | | |
(Intercept) | −0.01 (0.026) | −0.14 *** (0.025) | −0.144 *** (0.027) | −0.139 *** (0.026) | −0.154 *** (0.025) | −0.138 *** (0.03) | −0.143 *** (0.025) | −0.139 *** (0.029) | −0.141 *** (0.025)
Age | | 0 *** (0) | 0 *** (0) | 0 *** (0) | 0 *** (0) | 0 *** (0) | 0 *** (0) | 0 *** (0) | 0 *** (0)
Female | | 0.029 *** (0.003) | 0.029 *** (0.003) | 0.029 *** (0.003) | 0.029 *** (0.003) | 0.029 *** (0.003) | 0.029 *** (0.003) | 0.029 *** (0.003) | 0.029 *** (0.003)
Education | | −0.005 *** (0.001) | −0.005 *** (0.001) | −0.005 *** (0.001) | −0.005 *** (0.001) | −0.005 *** (0.001) | −0.005 *** (0.001) | −0.005 *** (0.001) | −0.005 *** (0.001)
Interest in Politics | | 0.095 *** (0.002) | 0.095 *** (0.002) | 0.095 *** (0.002) | 0.095 *** (0.002) | 0.095 *** (0.002) | 0.095 *** (0.002) | 0.095 *** (0.002) | 0.095 *** (0.002)
Postmaterialism | | −0.035 *** (0.003) | −0.035 *** (0.003) | −0.035 *** (0.003) | −0.035 *** (0.003) | −0.035 *** (0.003) | −0.035 *** (0.003) | −0.035 *** (0.003) | −0.035 *** (0.003)
Social Trust | | 0.11 *** (0.004) | 0.11 *** (0.004) | 0.109 *** (0.004) | 0.11 *** (0.004) | 0.11 *** (0.004) | 0.11 *** (0.004) | 0.11 *** (0.004) | 0.11 *** (0.004)
GDP per capita (wdi) | | | −0.03 (0.023) | −0.028 (0.023) | −0.01 (0.026) | −0.029 (0.023) | −0.029 (0.022) | −0.032 (0.023) | −0.018 (0.026)
Egalitarian-Majoritarian D. (fEc) | | | 0 (0.04) | | | | | |
Libertarian-Majoritarian D. (Fec) | | | | −0.016 (0.03) | | | | |
Egalitarian-Control-Focused D. (fEC) | | | | | 0.042 (0.029) | | | |
Libertarian-Control-Focused D. (FeC) | | | | | | 0.019 (0.05) | | |
Balanced D. (FEC) | | | | | | | −0.016 (0.032) | |
Inclusiveness + (E) | | | | | | | | −0.011 (0.037) |
Effective Government + (c) | | | | | | | | | −0.032 (0.035)
Random Part | | | | | | | | |
country sd | 0.2 | 0.184 | 0.181 | 0.181 | 0.178 | 0.181 | 0.181 | 0.181 | 0.18
Residual sd | 0.464 | 0.453 | 0.453 | 0.453 | 0.453 | 0.453 | 0.453 | 0.453 | 0.453
logLik | −50283.26 | −48414.41 | −48413.49 | −48413.36 | −48412.47 | −48413.42 | −48413.36 | −48413.45 | −48413.07
AIC | 100572.52 | 96846.82 | 96848.99 | 96848.71 | 96846.94 | 96848.85 | 96848.73 | 96848.91 | 96848.14
BIC | 100600.27 | 96930.08 | 96950.76 | 96950.48 | 96948.71 | 96950.61 | 96950.5 | 96950.67 | 96949.91
df.residual | 76995 | 76989 | 76987 | 76987 | 76987 | 76987 | 76987 | 76987 | 76987
NObs | 76998 | 76998 | 76998 | 76998 | 76998 | 76998 | 76998 | 76998 | 76998
NCountries | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 58
Source: own table
Table 8.9 Summary Table for Goal-oriented Performance
Performance Area | Fec | FeC | FEC | fEC | fEc | E | c
Wealth | – | – | – | – | – | – | Positive
Productivity | Positive | Positive | – | – | Negative | Negative | Negative
Environmental Performance | – | – | – | – | – | Positive | –
Goal-Attainment Performance | Positive | Positive | Negative | – | Positive | – | –
Economic Equality | – | Negative | – | Negative | Positive | Positive | Positive
Social Equality | Negative | Negative | – | – | – | Positive | –
Domestic Security Performance | – | – | Negative | Positive | – | – | –
LPM Performance (Confidence) | – | – | – | – | – | – | –
Sum Positive | 2 | 2 | 0 | 1 | 2 | 3 | 2
Sum Negative | 1 | 2 | 2 | 1 | 1 | 1 | 1
Source: own table
and the productivity component of the economic performance on the other hand. Effects for the goal-attainment area can be found as well. Fourth, there is not one overall better performing democracy profile: The democracy profiles Fec and FeC perform better on the productivity component of the economic performance, while they perform worse on social performance by lowering economic and social equality. This is especially true for the libertarian-control-focused democracy (FeC). The egalitarian-majoritarian profile (fEc) shows the contrasting effects: it shows a higher economic equality (no effect on social equality) and a lower productivity performance. It is important to stress that productivity is a distinct component of the economic performance area, but the wealth component, which can be considered more important, is probably not affected by these democracy profiles. The goal-attainment performance shows mixed results: Although the Fec (only barely significant) and fEc profiles show a higher reformability, which conforms to the theory, the higher goal-attainment performance of the FeC profile is contrary to the theoretical expectations. The fEC profile does show a higher domestic security performance, which is in line with
theory. Finally, the two negative effects of the FEC profile on goal-attainment and domestic security performance are not explained by the theory. Fifth, the trade-off dimensions also show relationships. With regard to the dimension of inclusiveness (freedom vs. equality trade-off), the following applies: The more egalitarian a democracy becomes, the better it scores not only in terms of social performance (both economic and social equality), but also in terms of environmental performance. Regarding the effective government dimension (freedom vs. control trade-off), the more majoritarian a democracy becomes, the better it performs in relation to economic equality and economic wealth. However, both dimensions perform worse on the productivity component. Since the dimensions of inclusiveness and effective government encompass a better performance in two completely separate performance areas, these dimensions come closest to effects resulting in an overall better performance. Overall, these general hypotheses, which related a democracy profile to an overall better performance, could not be verified; rather, the empirical results show a much greater complexity. How do these empirical findings relate to the state of research (see section 8.2)? The findings for the economic performance are in contrast to Vis et al. (2012) who do not find any systematic pattern between consensus democracy and economic performance. The findings of my study are contrary to Lijphart (2012, p. 273) because he sees a slightly better performance for the consensus democracies on unemployment, budget balance and economic freedom. My results are more similar to Anderson’s (2001) findings that consensus democracies actually perform worse on macroeconomic performance. My empirical findings show that majoritarian democracies perform better on the wealth component, while libertarian democracies perform better on the productivity component (which includes employment). 
My findings thus imply that different aspects of Lijphart’s majoritarian democracy are important for these two components (economic wealth and productivity). Roller’s study finds that constitutional negotiation democracies perform better in economic performance. By equating constitutional negotiation democracies with a lower effective government dimension, I can only confirm this finding for the productivity component but not for the wealth component. The findings for environmental performance disagree with Scruggs’s (1999) analysis that the democracy model does not have any influence on environmental performance. They agree with Lijphart’s assessment that consensus democracies show higher environmental protection. They partly agree with Poloni-Staudinger’s (2008) mixed findings insofar as the egalitarian democracy has a positive effect on environmental performance, while there is no effect of the effective government dimension.
Regarding the goal-attainment performance, the empirical findings suggest that institutional variables have an influence on the amendment rate. That is contrary to the results of Ginsburg/Melton (2015), who state that institutional variables are not relevant. These findings align with Reutter/Lorenz (2016, p. 119), who state that “constitutional policy is shaped by factors of both majoritarian and consensual policy-making”: Strong governments in the form of the Fec or (to a lesser extent) the FeC profile lead to an increase in reformability, as do egalitarian democracies with a more fragmented party system. In addition, the empirical findings for the social performance area are consistent with Birchfield/Crepaz (1998), Crepaz/Moser (2004) and Schmidt (2001). Birchfield/Crepaz find that collective veto points increase income equality, while competitive veto points decrease income equality. Similarly, while libertarian democracies and especially the FeC profile reduce social and economic equality, the egalitarian-majoritarian democracy profile increases economic equality. The equality-reducing effect of the egalitarian-control-focused democracy might be explained by the fact that it consists of a mixture of collective and competitive veto points. In accordance with Schmidt, the coalition governments caused by the inclusiveness dimension result in compromises and thus higher economic and social equality. The more control-focused a democracy becomes, the more it is slowed down by veto players, resulting in a lower economic equality performance. The positive relationship between the inclusiveness dimension (trade-off between freedom and equality) and social performance confirms Lijphart’s assessment that consensus democracy leads to more economic and social equality. However, with the positive contribution of the effective government dimension to economic equality, I provide an additional empirical finding. 
With respect to the domestic security performance, the empirical findings are inconsistently related to the literature, although the dependent variables are slightly different, because Lijphart and Lappi-Seppälä (2008) only analyze the incarceration rate. There is no theoretical reason why the balanced profile shows a negative effect on domestic security. While the positive effect of the fEC profile accords with the theory, it is surprising that other democracy profiles do not reach a level of significance. Finally, I do not find any systematic relationship between the democracy profile and confidence performance, so this study cannot confirm Lijphart’s finding that consensus democracy leads to a more favorable political culture. However, Lijphart is concerned with another aspect of political culture, the satisfaction with democracy, while my analysis focuses on confidence in institutions, although both aspects should be related.
8.6 Summary and Conclusion
This chapter answers the research question of whether democracy profiles have an effect on goal-oriented performance. The analysis builds on the previous steps and chapters. These include the identification of democracy profiles, as well as the conceptualization, measurement and aggregation of the AGIL typology of political performance. This chapter relates the identified democracy profiles to each outcome of the goal-oriented performance dimension. The first section reviewed the literature on the relationship between democracy models and performance. Overall, it showed that the literature is based on the assumption that democracy models matter and that they have a variety of effects. Yet, these causal links are usually unsatisfactorily weak, so that “the lack of a causal mechanism is a more general problem in the literature relating type of democracy to outcome variables” (Bogaards, 2017a, p. 16). It was theorized that the reason for this is the great length of the causal chain which links these institutions to the performance outcomes. Between the institutions and the performance outcome lie a number of intermediate factors in which political actors play a central role. Nevertheless, I extracted three general effects from the literature review: While the direct effect results from the form and mode of the decision-making process, which is determined by the institutional framework, the indirect effects focus on the one hand on the negotiation method and attitudes of the political actors shaped within those institutions and on the other hand on the attitudes of the citizens outside of these institutions. It is theoretically assumed that certain democracy profiles facilitate the creation of new parties and make it easier to involve small parties in the decision-making process. In addition, these democracy profiles can result in less gridlock and in a fast and authoritative decision-making process. 
They can create a public interest and a higher responsibility among the political actors and incentivize deliberation. Furthermore, when these institutions foster consensus, they are able to gain higher legitimacy and support from all societal sectors for policies. Finally, certain democracy profiles might provide more stability in multinational contexts. By relating these effects to the democracy profiles, I derived several hypotheses. The most important ones are: The more likely a country is to belong to the egalitarian-majoritarian democracy profile (fEc) or the egalitarian-control-focused democracy profile (fEC), the more likely it is to display higher performance. The more likely a country is to belong to the libertarian-majoritarian democracy profile (Fec) or the libertarian-control-focused democracy profile (FeC), the less likely it is to have a high level of performance. And the higher the inclusiveness dimension (freedom vs. equality trade-off) of a country, the more likely it is to show a higher performance.
The next section was concerned with the methodology for the empirical analysis. A statistical model for analyzing time-series cross-sectional data, on which the study is mainly based, has to address several methodological challenges: unit roots or the problem of spurious regression, autocorrelation, unit heterogeneity, contemporaneous correlation and panel heteroscedasticity. To account for unit heterogeneity, the model includes a random intercept varying by country. This has several advantages over a simple fixed-effects design. On the one hand, it allows the estimation of the effects of time-invariant or “sluggish” variables such as democracy profiles. On the other hand, through partial pooling, it can combine the information from different parts of the model, for example to mitigate the effects of outliers. However, to exclude bias, the time-varying independent variables are split into a within- and a between-effect according to the within-between model. Contemporaneous correlation requires a random intercept varying over the time points. Panel heteroscedasticity, a different error variance per country, is modeled by adding a random intercept for countries in the variance (or precision) term. Finally, serially correlated errors are accounted for by introducing a lagged dependent variable. I employed simpler regression models for the analysis of goal-attainment performance, which is based only on cross-sectional data, and for the confidence performance, which relies on survey data. In addition, I outlined several diagnostic procedures (e.g., lagged residuals to detect autocorrelation), and I used the long-run multiplier (LRM) to assess and describe the total effect of an independent variable on the dependent variable over time. As a visualization technique to evaluate the effects over time, I used dynamic simulation (Williams & Whitten, 2012). Furthermore, I discussed the control variables. 
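The split of a time-varying variable into a between- and a within-component, as summarized above, can be sketched in a few lines. The panel, the variable name `x` and the helper function are made up for illustration; this is a minimal sketch of the decomposition only, not of the full model used in the study.

```python
from collections import defaultdict
from statistics import mean

# Toy country-year panel; countries and values are invented.
panel = [
    {"country": "A", "year": 2000, "x": 1.0},
    {"country": "A", "year": 2001, "x": 2.0},
    {"country": "A", "year": 2002, "x": 3.0},
    {"country": "B", "year": 2000, "x": 4.0},
    {"country": "B", "year": 2001, "x": 4.5},
    {"country": "B", "year": 2002, "x": 5.0},
]


def within_between(rows, var):
    """Add the country mean (between) and the deviation from it (within)."""
    values_by_country = defaultdict(list)
    for row in rows:
        values_by_country[row["country"]].append(row[var])
    country_means = {c: mean(v) for c, v in values_by_country.items()}

    decomposed_rows = []
    for row in rows:
        between = country_means[row["country"]]
        decomposed_rows.append(
            {
                **row,
                f"{var}_between": between,          # time-invariant part
                f"{var}_within": row[var] - between,  # change over time
            }
        )
    return decomposed_rows


decomposed = within_between(panel, "x")
for row in decomposed:
    print(row["country"], row["year"], row["x_between"], row["x_within"])
```

The between-component is constant within each country and captures differences across countries, while the within-component sums to zero per country and captures change over time; entering both in the regression yields the separate between- and within-effects reported in the empirical sections.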
With regard to the control variables, a gap in the previous approaches was identified: None of these approaches controlled for informal institutions and the quality of statehood, although they might severely affect and distort the performance outcomes. By including a corruption and a statehood indicator, I was able to close this gap. The empirical analysis can be summarized as follows: A change in the democracy profiles does not have an immediate effect; these effects need a longer time span (over 25 years) to be realized and to become significantly different. The impact of these democracy profiles is quite small, even after 45 years. But it can still be considered a remarkable effect, because the democracy profile as an institutional explanation is causally far away from the performance outcome. With the exception of the confidence performance, the empirical analysis showed that there is at least one statistically significant effect of the democracy profiles in each performance area. Most effects are concentrated in the social performance area
(economic and social equality) and the productivity component of the economic performance. There is not one overall better performing democracy profile: The democracy profiles Fec and FeC perform better on the productivity component of the economic performance, while they perform worse on social performance by lowering economic and social equality. The egalitarian-majoritarian profile (fEc) shows contrary effects: It combines a higher economic equality and a lower productivity performance. Regarding the goal-attainment performance, the Fec, FeC and fEc profiles show a higher reformability. This means that both a strong party and a multiparty system can lead to a higher goal-attainment performance (Reutter & Lorenz, 2016, p. 119). The fEC profile does show a higher domestic security performance, which is in line with theory. Finally, the two negative effects of the FEC profile on goal-attainment and domestic security performance are not explained by theory. The impact of the trade-off dimensions is also significant. The more inclusive a democracy becomes (trade-off between freedom and equality), the better it scores not only in terms of social performance (both economic and social equality), but also in terms of environmental performance. Regarding the effective government dimension (trade-off between freedom and control), the more majoritarian a democracy becomes, the better it performs in relation to economic equality and economic wealth. However, high values in both dimensions are associated with a worse performance on the productivity component. The proposed general hypotheses, which related democracy profiles to an overall better performance, could not be verified; rather, the empirical results showed a much greater complexity. Nevertheless, the discussion of the study findings in the context of the literature review revealed that the study produced very similar empirical results. 
The difference to the literature is that the study here presents a full assessment of the goal-oriented performance of several countries. This reinforces the conclusion that there is not one overall better performing democracy profile, as Lijphart suggests. However, a severe limit of the analysis is that no variables of the partisan and power resource theories were included. This would help control for the different political interests of important political actors or groups. The reason for this was that it would reduce the sample size too much and make the sample too homogeneous in terms of democracy profiles for a reasonable empirical analysis. It was also not possible to impute this type of data, as some party families may simply not exist in other regions, or parties may operate differently in regions other than Europe.
9 Explaining Policy Regime Performance
9.1 Introduction
While the chapters before dealt with goal-oriented performance and explained various outcomes within the goal-oriented performance dimension using the democracy profiles as the main independent variable, this chapter changes the perspective: On the one hand, the democracy profiles become a dependent variable, and on the other hand, this chapter examines policy regime performance. Policy regime performance refers to certain mixtures of policies and procedural rules (Jahn, 1998, n. 2), and therefore represents an intermediate performance dimension, since it combines goal-oriented and general performance: Countries with a specific policy regime show a “typical” or “type-specific” mix of goal-oriented and general performance which can be expected because the country belongs to this specific policy regime and follows its functional logic. As discussed in chapter 4, by using the AGIL paradigm, it is possible to identify and differentiate between several important policy regimes: Economic regimes (varieties of capitalism), environmental regimes (eco state), governmental regimes (democracy profiles), welfare state regimes, consociationalism and political culture. In this chapter, I focus on the governmental regimes with the democracy profiles identified in this study; the economic regimes in the sense of the varieties of capitalism approach (Hall & Soskice, 2001) as well as the welfare state regimes à la Esping-Andersen (1990). The other policy regimes cannot be analyzed due to
Electronic Supplementary Material The online version of this chapter (https://doi.org/10.1007/978-3-658-34880-9_9) contains supplementary material, which is available to authorized users.
© The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2021 O. Schlenkrich, Origin and Performance of Democracy Profiles, Vergleichende Politikwissenschaft, https://doi.org/10.1007/978-3-658-34880-9_9
limitations in their research and lack of available data. I focus on two questions in this chapter:
1. What causes democracy profiles? According to Bogaards (2017a, p. 18), “so far nobody has tried to explain the geographical distribution of patterns of democracy”. Chapter 2 included only a descriptive analysis of the regional patterns of these democracy profiles and suggested some reasons for these patterns. However, it did not test them. Thus, the present chapter gives the opportunity to validate these suggestions. Furthermore, if I obtain meaningful results which conform with my theoretical expectations, this will also lend considerable validation to the conceptualization and measurement of democracy profiles. This question is discussed in section 9.2.
2. Is there a relationship between democracy profiles and other configurations in the regime performance dimension? As Iversen and Soskice (2011, p. 633) point out, Lijphart’s consensus democracy goes hand in hand with coordinated market economies (CMEs) and social-democratic and conservative welfare states, while majoritarian democracy seems to prefer liberal market economies and liberal welfare states. Showing that the same applies to democracy profiles gives additional validation to the typology used in this study. I also analyze the specific cultural background as a possible explanation for this co-existence of specific regimes. This question is dealt with in section 9.3.
9.2 Part I: Origin of Democracy Profiles
9.2.1 Literature Review, Theoretical Framework and Hypotheses
Part I of this chapter centers on the research question of why some countries adopt specific democracy profiles. Lijphart (2012, pp. 246–249) is a good starting point, because he not only analyzes the consequences of his majoritarian-consensus dichotomy but also briefly examines its causes. He gives three explanations which he also confirms empirically: First, consensus democracy is the preferred institutional design for multinational and culturally diverse countries because consensus democracy is related to the concept of consociational democracy (Lijphart, 1969, 1996, 2004) due to the basic idea of multiple veto players. Federalism, power-sharing governments and veto rights support the establishment of “politics of recognition” (Schneckener, 2004b) or policies of “state-nations” (Stepan et al., 2010). It is reasoned that
“more heterogeneous and divided societies [need] an institutional arrangement and a political culture that could manage or accommodate inherent tensions and bridge internal cleavages” (Bormann, 2010, p. 2). Many studies show that democracies with many institutional veto points in ethnically heterogeneous societies are less conflict-prone and thus exhibit greater stability (e.g. Stepan et al., 2010; Schneckener, 2004b). By allowing a fair representation of minority groups in the decision-making process, consensus democracy discourages ethnic conflicts and thus promotes political stability. Lijphart can show empirically that the higher the degree of pluralism within a country, the more likely it is to be a consensus democracy. Second, larger countries need a more decentralized administration to be governable. His empirical analysis supports this hypothesis: The larger the population, the higher the likelihood that the country is a consensus democracy. And finally, Lijphart considers diffusion processes: Not only internal causes should be considered, but also possible takeover and diffusion effects of democracy profiles or parts thereof across national borders. Because the United Kingdom is a prime example of a majoritarian democracy in Lijphart’s typology, he reasons that a former British colony is more likely to be a majoritarian democracy as well. This means that the United Kingdom imposed its governmental regime on its former colonies. And in fact, Lijphart (2012, p. 246) finds empirically that Commonwealth countries have a majoritarian democratic system similar to that of Great Britain. Building on the work of Lijphart, I transfer his considerations to the democracy profiles distinguished in this study: Ethnic heterogeneity increases the probability that the country belongs to the egalitarian and control-focused democracy profile (fEC) and lessens the probability of the emergence of other democracy profiles.
The fEC profile allows an equal representation of minority groups and establishes veto points hindering effective government. Larger countries, which need decentralization to be governable, should exhibit a control-focused instead of a majoritarian democracy profile: This hypothesis applies to both the egalitarian (fEC) and the libertarian variant (FeC). Similar to Lijphart, I restrict the analysis of the diffusion process to the impact of the colonial history of the United Kingdom. If the diffusion theory is right, the colonies should have the same libertarian and majoritarian democracy profile (Fec) as the United Kingdom, and the other democracy profiles should occur to a lesser extent in the colonies. However, this study goes beyond Lijphart and adds two additional theory clusters. While Lijphart analyzes only structural causal factors, I include the generalized behavior of meso actors and cultural variables as explanatory factors: On
the one hand, I examine the effects in the sense of the Power Resource Theory: Democracy profiles express a political conflict of values or goals, on which a society must position itself. The negotiation processes among the various social forces (Bühlmann et al., 2012, p. 123) change the characteristics and weightings of the individual dimensions in relation to one another. Thus, democracy profiles can be seen as an expression of social and political power relations. A strong labor movement and a strong left or social democratic party favor an egalitarian (fEc or fEC) instead of a libertarian democracy profile. A similar assumption is also made by the post-democracy theory, which sees a departure from egalitarian democracy due to the weakness of labor movements and social democratic parties (Crouch, 2004). On the other hand, I include cultural factors in the analysis: Culturalist theories begin with the investigation of values. Are there clusters of values or dominance structures of values that lead to the formation of certain profiles of democracy? A cultural explanation for the emergence of democracy models is given by Maleki and Hendriks (2015, p. 10), who theorize that “the preference and inclination for a specific cultural orientation in a society can explain the preference for and practice of one model of democracy or another”. This relationship between the type of democracy and cultural values is the “most understudied” (Maleki & Hendriks, 2015, p. 4) of all given explanations. They conclude, on the one hand, that a more majoritarian model of democracy (which they call aggregative democracy) “is strongly correlated with the cultural orientation of mastery” (Maleki & Hendriks, 2015, p. 20). A mastery culture is associated with competition and individualism: it “gives more sympathy to the strong.
[…] On the contrary, in a society with low mastery, consensus, solidarity, and harmony are emphasized more and sympathy for the weak is praised” (Maleki & Hendriks, 2015, p. 10). On the other hand, a more participative democracy (measured in terms of electoral and non-electoral participation) is correlated with a less hierarchical cultural orientation and a stronger future orientation. In a hierarchical culture, “power holders are entitled to privileges and subordinates expect to be told what to do” (Maleki & Hendriks, 2015, p. 12), so that hierarchical cultures are more elitist. A culture with future orientation means that people tend to plan ahead and invest in participation and deliberation. Short-term orientations go along with competition and alternation of government parties, whereas long-term orientations prefer incremental changes in a consensual system. However, while the theorized relationships between the cultural orientations of mastery and hierarchy with certain democracy models are rather convincing, the future orientation relationship is a rather weak argument: Democratic participation can also be emotive rather
than deliberative, and deliberation can also take place in an elitist majoritarian democracy. I adopt this theoretical approach: A mastery orientation leads to democracy profiles which emphasize the freedom dimension (Fec or FeC) at the cost of the equality dimension. A hierarchical and therefore more elitist culture leads to majoritarian democracy profiles (Fec or fEc), while it weakens the occurrence of control-focused democracies with decentralization and direct democracy, which offer various participation options. Finally, this cultural approach poses a problem: I am aware that there is a danger of endogeneity in this analysis. Is it possible that the institutional design leads to a specific culture and the causal chain is actually reversed? Even if this cannot be completely ruled out in theoretical terms, it would be somewhat unconvincing to regard political institutions alone as the decisive formative force of values. This would contradict the common theory of value transformation (Inglehart, 1977, 2007). Religion can be considered as another cultural factor. This factor also plays a role in the explanations of the three worlds of welfare states or the varieties of capitalism (Manow, 2004; Schröder, 2014). Catholicism represents the idea of subsidiarity (Schröder, 2014, p. 176). Subsidiarity means that smaller units should be strengthened and higher units should only intervene if the smaller units cannot handle a task. Therefore, I derive the final hypothesis: The more a country has been influenced by Catholicism, the more likely it is to adopt a control-focused democracy (FeC or fEC). Because I cannot provide any hypotheses regarding the emergence of the balanced democracy profile (FEC), it is reasonable to focus here on the four-cluster solution identified in chapter 2 (the results for the five-cluster solution are presented in the appendix, see Table 107).
Since the sample of democracy profiles included in this analysis is relatively heterogeneous (compared with Lijphart’s 36 democracies, for instance), it is necessary to include a control variable. I include socio-economic modernization as a control since there is the possibility that certain democracy profiles depend on the country’s socio-economic status. Table 9.1 gives a summary of all derived hypotheses.
Table 9.1 Hypotheses and Their Impact on the Occurrence of Democracy Profiles

| Short Name of Hypothesis | Hypothesis | Effect on Fec | Effect on fEc | Effect on FeC | Effect on fEC |
|---|---|---|---|---|---|
| Cultural Diversity | The higher the degree of cultural diversity within a country, the more pronounced are the egalitarian and control dimensions. | − | − | − | + |
| Population Size | The larger the population of a country, the higher the control dimension. | − | − | + | + |
| British Heritage | If the country is a former British colony or possesses British heritage, its democracy profile will be a libertarian-majoritarian democracy identical to the British one. | + | − | − | − |
| Labor Movement | The stronger the labor movement, the greater the probability that the country is an egalitarian democracy. | − | + | − | + |
| Left Parties | The stronger the left and/or the social democratic party, the more likely the country is to be an egalitarian democracy. | − | + | − | + |
| Mastery Orientation | If the country has a cultural orientation of mastery, the freedom dimension is more pronounced, resulting in a libertarian-majoritarian or a libertarian-control-focused democracy profile. | + | − | + | − |
| Hierarchy Orientation | If a country has a cultural orientation of hierarchy, it is more likely to belong to the majoritarian democracy profiles (Fec or fEc). | + | + | − | − |
| Catholicism | The more a country has been influenced by Catholicism, the more likely it is to adopt a control-focused democracy. | − | − | + | + |

Control Variable: Socio-Economic Modernization
Source: own table
9.2.2 Methodological Framework
9.2.2.1 Characteristics of the Dependent Variable: Dirichlet Regression
The dependent variable of this analysis is the set of membership probabilities of a country for the democracy profiles. These membership probabilities were obtained by using fuzzy instead of crisp clustering (see chapter 2). As discussed, the observations belonging to a specific cluster are not strongly separated from other clusters; often there are serious overlaps between clusters. A crisp classification without membership probabilities loses this kind of information. Instead of classifying a country as belonging to only one democracy profile, the membership probabilities highlight the uncertainty incorporated into the classification. This is shown in Table 9.2 for the four-cluster solution:

Table 9.2 Membership Probabilities of Selected Cases

| Country | Fec | fEc | FeC | fEC |
|---|---|---|---|---|
| United Kingdom | 0.74 | 0.10 | 0.10 | 0.06 |
| Australia | 0.37 | 0.28 | 0.22 | 0.13 |

Source: own table
Whereas the United Kingdom can be considered a pure type of the Fec democracy profile with a membership probability of about 74%, Australia shows a much higher uncertainty in the classification: The membership probabilities indicate that it most likely belongs to the Fec democracy profile (37%). However, there is also a high chance that it is actually an fEc profile (28%). Furthermore, the membership probabilities for the FeC and fEC profiles are still 22% and 13%, respectively. Instead of using
a single threshold which neglects the uncertainty of the classification, I focus on the membership probabilities for a more robust estimation. Membership probabilities are proportional or “compositional” (Aitchison, 1982) data. However, this kind of data poses problems for causal analysis, because the empirical analysis needs to be based on a different estimation method than standard linear regression. The reason is that, on the one hand, the response variable consists of a vector of observations instead of only one value, and, on the other hand, it violates the assumptions of a regression under the normal distribution, namely a normal error term and constant variance: The observations are “limited to numerical values between, and including, 0 and 1, and the variability in the observed proportions usually varies systematically with the mean of the response” (Douma & Weedon, 2019, p. 1413). Since these variables are limited between 0 and 1, values near these boundaries are often skewed in one direction and therefore the variance is also constrained. Thus, the use of compositional data as a dependent variable in an inferential model requires a special regression technique. Historical approaches to estimating regressions with a compositional dependent variable involve separate regressions for each component after applying log-ratio transformations (Douma & Weedon, 2019, p. 1418). However, these transformations are difficult to interpret and introduce bias: “if heteroscedasticity persists after the transformation, one has to either violate the assumption of homoscedasticity in linear models, or incorporate model terms capturing heteroscedasticity that further complicate the model (and interpretation)” (Maier, 2014, p. 2; similar Barceló et al., 1996). Therefore, newer regression methods are directly based on the Dirichlet distribution. The Dirichlet distribution is able to model observations¹ with $C$ parts $(y_1 + \dots + y_{C-1} + y_C)$ constrained to $\sum_{c=1}^{C} y_c = 1$.

¹ The Dirichlet distribution cannot handle exact zeros or ones ($y_c = 0$ and $y_c = 1$). An often proposed solution is a transformation of the raw values according to $y_{ic}^{*} = \frac{y_{ic}(n-1) + \frac{1}{C}}{n}$, where $n$ is the total number of observations (Douma & Weedon, 2019, p. 1420). This shifts each fraction slightly toward $1/C$ and thereby moves exact zeros and ones into the open interval. However, exact zeros or ones do not appear in the dataset.

The density function of the Dirichlet distribution (Maier, 2014; Gueorguieva et al., 2008) is

\[ f(y_1, \dots, y_C \mid \alpha_1, \dots, \alpha_C) = \frac{1}{B(\alpha)} \prod_{c=1}^{C} y_c^{(\alpha_c - 1)}, \]

where $\alpha$ are the shape parameters, $B(\alpha) = \frac{\prod_{c=1}^{C} \Gamma(\alpha_c)}{\Gamma\left(\sum_{c=1}^{C} \alpha_c\right)}$ is the multinomial Beta function, and $\Gamma(x)$ is the Gamma function defined as $\int_0^{\infty} t^{x-1} e^{-t}\, dt$. Thus, the Dirichlet distribution is a generalization of the Beta distribution to any number of categories (Douma & Weedon, 2019, p. 1418). Here I use the Dirichlet regression based on its alternative formulation² with a mean ($\mu_c$) and a precision ($\phi$) parameter, so that $\alpha_c = \mu_c \phi$ (Maier, 2014). The higher the precision $\phi$, the less uncertainty there is in the estimates of $\mu_c$. This is similar to a multinomial regression approach (Long & Freese, 2014) with its reference category. The reference category can be set arbitrarily. In order to get a full picture of the model, all contrasts of outcomes should be considered. The formula is as follows:

\[ y_c \sim \mathrm{Dirichlet}(\mu_c, \phi) \quad \text{for } c \in [1, \dots, C \geq 2] \]
\[ \mu_c = \frac{e^{\eta_c}}{\sum_{a=1}^{C} e^{\eta_a}}, \qquad \mu_b = \frac{1}{\sum_{a=1}^{C} e^{\eta_a}} \]
\[ \eta_c = X\beta_c, \qquad \phi = e^{Z\gamma} \]

where $y_c$ is the response matrix with the constraints $y_c \in [0, 1]$ and $\sum_{c=1}^{C} y_c = 1$, $\phi$ is the precision parameter, $\beta_c$ are the regression coefficients and $X$ is the predictor matrix for the response $y_c$, $\gamma$ are the regression coefficients and $Z$ is the predictor matrix for the precision parameter $\phi$, and $\mu_b$ is the reference category with $e^{X \cdot 0} = 1$ (all regression coefficients are zero), whereas $\mu_c$ indicates all other components. Finally, the link function for the $\mu_c$ is the logit, while the link function for the $\phi$ parameter is the log. By exponentiating the logit coefficients, odds ratios are obtained, facilitating interpretation. The DAG representation is shown in Figure 9.1. The package “DirichletReg” (Maier, 2014) was used to perform the calculations.
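The mean and precision parametrization described above can be traced numerically. The following Python sketch is only an illustration under stated assumptions, not the estimation routine of the DirichletReg package: the linear predictors and the precision value are made up, and the function names are hypothetical. It maps linear predictors to means with a reference category, converts them to shape parameters via $\alpha_c = \mu_c \phi$, and evaluates the log density for the United Kingdom row of Table 9.2.

```python
import math

def means_from_predictors(etas):
    """Multinomial logit link; the reference category enters with eta = 0."""
    exps = [math.exp(e) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]

def dirichlet_logpdf(y, alpha):
    """Log density: -log B(alpha) + sum_c (alpha_c - 1) * log(y_c)."""
    log_beta = sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))
    return -log_beta + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, y))

def squeeze(y, n):
    """Boundary transformation y* = (y * (n - 1) + 1/C) / n from the footnote."""
    c = len(y)
    return [(v * (n - 1) + 1.0 / c) / n for v in y]

# Hypothetical linear predictors for Fec, fEc, FeC, fEC; Fec is the reference (eta = 0).
etas = [0.0, -0.3, 0.4, -1.0]
phi = math.exp(2.0)                 # log link for the precision; 2.0 is a made-up value
mu = means_from_predictors(etas)
alpha = [m * phi for m in mu]       # alternative parametrization: alpha_c = mu_c * phi

uk = [0.74, 0.10, 0.10, 0.06]       # United Kingdom row of Table 9.2
print(mu, dirichlet_logpdf(uk, alpha))
```

Exponentiating a coefficient (for example `math.exp(0.4)`) gives the odds ratio of that component relative to the reference category, which is the quantity the odds-ratio interpretation in the text relies on.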
² The common parametrization only estimates the shape parameters $\alpha$ of the Dirichlet distribution. The alternative approach explicitly models the heteroscedasticity (Maier, 2014).

Figure 9.1 DAG representation for the Dirichlet Regression (Source: own plot)

9.2.2.2 Workflow of Regression Analysis
The workflow of this regression analysis follows four main steps (see Table 9.3), the last of which, visualization, is described in two parts below (items 4 and 5). This workflow results from a selection from the 10-step protocol by Zuur and Ieno
(2016) and Zuur et al. (2010). These steps have been adapted to the needs of the Dirichlet regression (Douma & Weedon, 2019, pp. 1420–1423).
1. Data Exploration and Descriptive Statistics: The first step consists of a description of the number of missing values and an assessment of whether action in the form of multiple imputation needs to be taken. Multiple imputation is described in detail in chapter 6. Inspecting the correlations between the independent variables via correlation diagrams is important to get a sense of a possible multicollinearity problem. I treat variables with extreme caution if their correlation reaches |r| ≥ 0.7 (Pearson correlation coefficient), which is a commonly preferred threshold (Dormann et al., 2013). Finally, I visualize the effect of the independent variables on the dependent variable via scatterplots: On the one hand, this helps to “detect observations that do not comply with the general pattern between two variables” (Zuur et al., 2010, p. 10) and thus assists in identifying outliers. On the other hand, it shows “how the variation in the proportions vary as function of the covariates” (Douma & Weedon, 2019, p. 1420) and thus supports the identification of heteroscedasticity and of the need to model the precision parameter independently.
2. Model fit and model selection criteria. One of the characteristics of a Dirichlet regression is that an independent variable does not influence the dependent variable through a single regression coefficient but through several, in contrast to
the simple linear regression (Kühnel & Krebs, 2010, p. 861). This makes testing and interpreting a significant effect of an independent variable rather difficult. Therefore, to test whether a variable significantly affects the outcome variable, I employ the likelihood ratio test of nested models as a kind of “omnibus test” (Long & Freese, 2014, p. 396). The null hypothesis states that all regression coefficients of the newly included independent variable are zero and that this variable does not improve the model fit. The information criterion AIC is also employed here. AIC measures the “predictive accuracy” (McElreath, 2015, p. 15) of the model, but at the same time it applies a penalty that increases with the number of model parameters and thus with the complexity of the model. This means that the best model is not the most complex model with high predictive accuracy but a parsimonious model that achieves high predictive accuracy with the lowest possible number of variables. A lower AIC value means improved model fit. If the AIC values of the compared models differ by more than two points, then there is usually a significant improvement of the model fit (Douma & Weedon, 2019).
3. Model validation. I inspect the residuals to detect heteroscedastic patterns and potential outliers (Salkind, 2010). This is especially important for the small sample size in this chapter. According to Gueorguieva et al. (2008) and Maier (2014), the following residual types for the Dirichlet regression can be defined: First, the raw residuals $r_{ic}^{raw} = y_{ic} - \mu_c$, where $y_{ic}$ are the observed values and $\mu_c$ are the expected values. Raw residuals are not relevant for this type of analysis, because they do not take the “skewness and heteroscedasticity” (Maier, 2014, p. 14) of the Dirichlet distribution into account. Standardized residuals account for this kind of variance:

\[ r_{ic}^{standard} = \frac{y_{ic} - \mu_c}{\sqrt{\frac{\mu_c (1 - \mu_c)}{1 + \phi_i}}}, \]

where $\phi_i$ is the precision parameter. Thereby, “systematic pattern of variation in the spread of residuals along the range fitted values or covariates indicates the need for a separate model for the precision parameter $\phi$” (Douma & Weedon, 2019, p. 1423). Overall, it is difficult to assess the presence of heteroscedasticity when the sample size is as small as in this study. Unfortunately, no measures of outlier detection have been developed for the Dirichlet regression yet (Douma & Weedon, 2019, p. 1423). However, I can resort to the composite residuals $r_{i}^{composite} = \sum_{c=1}^{C} \left(r_{ic}^{standard}\right)^2$ because, as Gueorguieva et al. (2008) state, the composite residuals are highly correlated with Cook’s Distance, a common measure for the influence of an observation. Thus, composite residuals help to identify outliers.
4. Visualization of the model results. Since the dependent variable consists of several values which can be influenced differently by the independent variables, a regression table alone cannot convey these relationships. Therefore, I use a special visualization technique called the odds-ratio plot (Long, 1987; Long & Freese, 2014). The odds-ratio plot was originally developed for the multinomial (categorical) regression, but it can easily be transferred to the Dirichlet regression, since both regression types are related. The odds-ratio plot shows the effects of every independent variable on each component of the dependent variable. The independent variables form the rows of the graph, while their regression coefficients are arranged horizontally in each row, indicated by letters representing each outcome. A coefficient placed to the right of another one indicates that, with an increase in the independent variable, the likelihood of that outcome increases relative to the other one. In contrast, a coefficient placed to the left of another one implies a decrease in the likelihood of that outcome relative to the other one. In addition, the distance between two coefficients indicates the magnitude of the effect. Furthermore, the significance of the coefficients for each outcome is visualized: If a coefficient is insignificant, a line is added “connecting the two outcomes, suggesting that those outcomes are ‘tied together’” (Long & Freese, 2014, p. 440). Finally, the size and sign of a letter indicate how large and in which direction the change in probability is in terms of marginal effects (a standard deviation increase for continuous variables; a unit increase for dummy variables).
5. Visualization of model implications. I use model simulations of expected values³ to visualize the average effect of the independent variables on the outcomes (King et al., 2000). This procedure helps to conclude whether a variable is not only significant but also has a substantive impact. On the one hand, I simulate values for certain ideal scenarios to evaluate the effects of a combination of explanatory variables. On the other hand, I also show the effects of varying the values of one independent variable, while the other variables are kept at their mean value.
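The multicollinearity screening in step 1 can be sketched in a few lines. The Python example below is a minimal illustration, assuming each independent variable is available as a plain list of country-level values; the variable names and all numbers are made up and are not data from this study. It flags every pair of variables whose absolute Pearson correlation reaches the 0.7 threshold.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_collinear(columns, threshold=0.7):
    """Return variable pairs with |r| >= threshold, together with r."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Made-up illustrative values, not data from the study.
columns = {
    "log_population": [1.0, 2.0, 3.0, 4.0, 5.0],
    "log_gdp": [2.1, 3.9, 6.2, 8.1, 9.8],
    "diversity": [0.5, 0.1, 0.4, 0.2, 0.3],
}
print(flag_collinear(columns))
```

In this toy input only the first two variables are strongly correlated, so only that pair would be treated with caution.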
³ King et al. (2000, p. 350) highlight the difference between predicted and expected values: “A predicted value contains both fundamental and estimation uncertainty, whereas an expected value averages over the fundamental variability arising from sheer randomness in the world, leaving only the estimation uncertainty caused by not having an infinite number of observations. Thus, predicted values have a larger variance than expected values”.
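The model comparison in step 2 reduces to simple arithmetic once the log-likelihoods of two nested models are known. The Python sketch below is an illustration only: the log-likelihoods are made-up numbers, and the parameter counts assume four profiles with one reference category and a single precision intercept, so that one additional covariate adds three coefficients. Instead of computing a p-value, it compares the test statistic with tabulated 5% critical values of the chi-square distribution.

```python
def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * log_lik

def likelihood_ratio(ll_restricted, ll_full, k_restricted, k_full):
    """Test statistic and degrees of freedom for two nested models."""
    statistic = 2.0 * (ll_full - ll_restricted)
    df = k_full - k_restricted
    return statistic, df

# Tabulated 5% critical values of the chi-square distribution for small df.
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}

# Made-up log-likelihoods; adding one covariate adds C - 1 = 3 coefficients.
ll_restricted, k_restricted = 250.0, 7
ll_full, k_full = 256.5, 10

statistic, df = likelihood_ratio(ll_restricted, ll_full, k_restricted, k_full)
reject_null = statistic > CHI2_CRITICAL_05[df]
aic_gain = aic(ll_restricted, k_restricted) - aic(ll_full, k_full)
print(statistic, df, reject_null, aic_gain)
```

With these hypothetical numbers both criteria point in the same direction: the omnibus test rejects the null hypothesis and the AIC improves by more than two points.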
Table 9.3 Workflow of Regression Analysis

| Steps | Goals |
|---|---|
| 1. Data Exploration | Evaluation of Missing Values; Checking of Multicollinearity; Heteroscedasticity |
| 2. Model fit | Likelihood Ratio Test; AIC |
| 3. Model Validation | Residual Analysis (Standardized and Composite Residuals) |
| 4. Visualization | Odds-Ratio Plot; Expected Values |

Source: own table
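The residual analysis in step 3 can be illustrated with a short Python sketch. It assumes that fitted means and a precision value are already available from an estimated model; here they are hypothetical numbers chosen for illustration, and only the observed Australian values are taken from Table 9.2. The standardized residual divides the raw residual by the square root of $\mu_c (1 - \mu_c) / (1 + \phi_i)$, and the composite residual sums the squared standardized residuals over all components.

```python
import math

def standardized_residuals(y, mu, phi):
    """(y_c - mu_c) / sqrt(mu_c * (1 - mu_c) / (1 + phi)) for each component."""
    return [
        (y_c - mu_c) / math.sqrt(mu_c * (1.0 - mu_c) / (1.0 + phi))
        for y_c, mu_c in zip(y, mu)
    ]

def composite_residual(y, mu, phi):
    """Sum of squared standardized residuals over all components."""
    return sum(r ** 2 for r in standardized_residuals(y, mu, phi))

australia = [0.37, 0.28, 0.22, 0.13]   # observed values from Table 9.2
fitted = [0.40, 0.25, 0.20, 0.15]      # hypothetical fitted means
phi = 10.0                             # hypothetical precision

print(standardized_residuals(australia, fitted, phi))
print(composite_residual(australia, fitted, phi))
```

Observations with unusually large composite residuals would then be inspected as potential outliers.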
9.2.2.3 Operationalization of the Dependent and Independent Variables
In this section, I describe the measurement of the dependent and especially the independent variables. Since I do not rely on a time-series regression, I report the data for the countries averaged over the period from 1950 to 2000 (if not otherwise specified). Accordingly, the dependent variable is the average membership probability for each country from 1950 to 2017, normalized so that it sums to 1. Cultural diversity is measured with the fractionalization index developed by Alesina et al. (2003). They differentiate between ethnic, linguistic and religious fractionalization and provide data for about 190 countries. The measurement reflects the state of fractionalization between 1980 and 2000. However, since this variable should not change much over time, it is possible to transfer these data to earlier or later periods. I create the independent variable as the geometric mean of ethnic, religious and linguistic fractionalization. Population size and GDP per capita for socio-economic modernization come from the World Development Indicators (WDI). Both variables are log-transformed to reduce the impact of outliers. Catholicism is measured via the World Religion Data from the Correlates of War Project (Zeez & Henderson, 2013). I divide the number of Catholics by the total population to obtain a relative measure. British heritage is measured on the one hand with a dummy variable for the colonial history (1 = British; 0 = other) from the Colonial History Data Set (Hensel, 2018) and on the other hand with an indicator for legal origin (a dummy variable: 1 = English; 0 = other) taken from the study by La Porta et al. (1999). The data for the labor movement is the union density rate provided
by the ICTWSS database (Visser, 2016). To measure the strength of social democratic and left parties, I rely on the CPDS database (Armingeon et al., 2018).4 I create a summary measure, which is the sum of the share of seats in parliament for social democratic and left-socialist parties and then averaged over the entire period. Finally, cultural orientations are measured with the values for the Mastery and Hierarchy orientation from a study conducted by Schwartz (Licht et al., 2007). This study consists of “a sample of over 15,000 urban teachers in 51 countries, surveyed in 1988–1998” (Maleki & Hendriks, 2015, p. 10). Table 9.4 summarizes the operationalization of the hypotheses (summary statistics can be found in Table 101 in the appendix). Table 9.4 Measurement of the Independent Variables Theoretical Component
Independent Variables
Source
Cultural Diversity
Geometric mean of ethnic, religious and language fractionalization
Alesina et al. (2003)
Population Size
Population Size (logged)
WDI
British Heritage/British Colony
Colonial History (1 = British, 0 = Other) Legal Origin (1 = English; 0 = Other)
ICOW (Hensel, 2018) La Porta et al. (1999)
Labor Movement
Union Density Rate
ICTWSS
Left Party / Social Democracy Party
Sum of share of seats in parliament for social democratic and left-socialist parties
CPDS (Armingeon et al., 2018)
Cultural Orientations
Mastery Orientation Hierarchy Orientation
Licht et al. (2007)
Catholicism
Roman Catholics (% of total population)
Zeez/Henderson (2013)
Socio-Economic Modernization (Control Variable)
GDP per Capita (logged)
WDI
Source: own table
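The variable construction described in this section can be illustrated with a minimal sketch. All numbers are hypothetical and serve only to show the arithmetic; the use of the natural logarithm is an assumption, since the text only states that the variables are log-transformed.

```python
import math

def geometric_mean(values):
    """Geometric mean of non-negative values (0 if any value is 0)."""
    if any(v < 0 for v in values):
        raise ValueError("fractionalization scores must be non-negative")
    if any(v == 0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))

def normalize(shares):
    """Rescale averaged membership probabilities so that they sum to 1."""
    total = sum(shares)
    return [s / total for s in shares]

# Hypothetical country (not taken from the data set).
# Fractionalization scores: ethnic, linguistic, religious.
cultural_diversity = geometric_mean([0.20, 0.10, 0.40])

# Log-transformed covariates (natural log assumed).
log_gdp_per_capita = math.log(25_000)
log_population = math.log(10_000_000)

# Relative measure of Catholicism: Catholics divided by total population.
catholic_share = 3_000_000 / 10_000_000

# Averaged membership probabilities for four profiles, rescaled to sum to 1.
membership = normalize([0.50, 0.30, 0.15, 0.10])
```

The geometric mean penalizes low values more strongly than the arithmetic mean: a country scores high on the combined indicator only if it is fractionalized on all three components.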
The pattern of missing values is shown in Figure 201 in the appendix. Among the 112 observations with a democracy profile, there are only very few missing values for the indicators of the structural explanation. These cases (Kosovo, Palestine/West Bank, Taiwan, Montenegro and the German Democratic Republic) are excluded from the analysis. Given this small number of missing values, a more elaborate method such as multiple imputation would offer too little benefit. For the variables capturing the power resource theory, many more observations are missing: about half or more of all observations with a democracy profile (64 to 76 missing observations). The missing values are concentrated in the non-OECD countries. It is not possible to impute those observations. They might be missing not at random (MNAR) (see chapter 6 for a detailed discussion of the multiple imputation procedure), because unions and left or social-democratic parties might simply not exist or might work differently in other countries. The same goes for the cultural variables: only 46 out of 112 observations are not missing. I refrain from imputing these values, because it is not clear whether Schwartz’s concept is applicable to countries with missing data. Overall, this makes it necessary to split the regression into three parts according to the structural explanatory factors, the power resource theory and the cultural values. Including all these variables in one model would lose too much information due to the missing values. It is also necessary to reduce the number of independent variables for the regression models with the power resource theory and the cultural values.
4 A short description of the ICTWSS and CPDS can be found in chapter 6.
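The consequence of the missing-value pattern, namely one complete-case sample per block of explanatory variables, can be sketched as follows. The records, variable names and block definitions are hypothetical and only illustrate the logic of the split.

```python
# Toy records; None marks a missing value. All names and numbers are hypothetical.
countries = [
    {"name": "A", "log_pop": 16.1, "british_colony": 0, "union_density": 35.0, "mastery": 3.9},
    {"name": "B", "log_pop": 15.2, "british_colony": 1, "union_density": None, "mastery": 4.1},
    {"name": "C", "log_pop": 17.8, "british_colony": 0, "union_density": None, "mastery": None},
    {"name": "D", "log_pop": None, "british_colony": 0, "union_density": None, "mastery": None},
]

# One variable list per regression; each regression uses its own complete cases.
BLOCKS = {
    "structural": ["log_pop", "british_colony"],
    "power_resources": ["log_pop", "british_colony", "union_density"],
    "cultural": ["log_pop", "mastery"],
}

def complete_cases(rows, variables):
    """Keep only the rows with no missing value on the given variables."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

samples = {block: complete_cases(countries, variables) for block, variables in BLOCKS.items()}
sizes = {block: len(rows) for block, rows in samples.items()}

# A single model with all variables would keep only the rows complete on everything.
all_variables = sorted({v for variables in BLOCKS.values() for v in variables})
pooled_size = len(complete_cases(countries, all_variables))
```

In this toy example the pooled model retains fewer cases than any block-specific model that omits the sparsest variable, which is the reason given in the text for estimating three separate regressions.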
9.2.3 Empirical Analysis
9.2.3.1 Structural Factors
The exploratory analysis revealed almost no anomalies for the structural indicators (see Figure 203 in the appendix). On the one hand, the correlation between the indicators of colonial history and legal origin surpasses the threshold of |r| ≥ 0.7. However, this is not surprising given that they measure the same theoretical component. I enter these indicators separately in the regression to test for and resolve the problem of multicollinearity. On the other hand, the XY-plots show that there are no outliers, but that the membership probabilities of the FeC democracy profile vary little. This is not surprising because, as discussed in chapter 2, this specific democracy profile almost disappears in the third wave of democracy. Furthermore, the plots indicate that the precision does not vary with any covariate and that it is therefore not pressing to model the precision part of the Dirichlet regression.
Table 9.5 presents the results of the regression analysis for the structural explanatory factors (N = 107). It shows the regression coefficients for the democracy profiles fEc, fEC and FeC, whereas the libertarian-majoritarian profile (Fec) is the reference category and is thus omitted. Choosing Fec as the reference category makes sense, since it is an exceptional and coherent democracy profile, in which the freedom dimension is favored in both trade-off dimensions (libertarian vs. egalitarian; majoritarian vs. control-focused) (see chapter 2). However, I also display the odds-ratio plots, which visualize all relationships between the independent and the dependent variables at once.
In total, seven models were calculated in a hierarchical regression analysis, where one explanatory variable is added to the model in each successive step. This setup makes it possible to test via the likelihood ratio test whether the added independent variable significantly improves the model fit and thus whether the hypothesis it represents is significantly linked to the dependent variable. To evaluate the model fit, the table also shows the AIC values. Model 1 is the starting point and does not contain any explanatory variables. Model 2 includes the GDP per capita variable. The LR test shows a significant improvement (LR χ² = 8.92, df = 4, p = 0.03), and the AIC improves by more than 2 points as well. Model 3 adds the population size indicator. The LR test is significant (LR χ² = 15.97, df = 4, p < 0.01), and the AIC value improves greatly. Thus, population size is a significant explanatory factor for the democracy profiles. Model 4 includes the cultural diversity indicator. Here, the AIC and the LR test (LR χ² = 2.39, df = 4, p = 0.5) show no improvement. For now, this means that cultural diversity is not related to the democracy profiles. However, this changes with models 5 and 6, which separately include British colony and English legal origin as indicators. Controlling for either of these factors results in a significant cultural diversity indicator.
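A minimal sketch of the likelihood ratio test and the AIC comparison used in this model-building strategy, applied to the log-likelihoods of Model 1 and Model 2 reported in Table 9.5. Two points are assumptions of the sketch rather than statements from the text: the parameter counts (one coefficient per non-reference profile plus a single precision parameter) and the resulting 3 degrees of freedom for one added covariate. Under these assumptions the sketch reproduces the reported test statistic, a p-value that rounds to the reported 0.03, and both AIC values.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2               # closed form for df = 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1    # closed form for df = 1
    while k < df:                               # recurrence from df = k to df = k + 2
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def lr_test(ll_restricted, ll_full, df):
    """Likelihood ratio statistic and p-value for two nested models."""
    stat = 2.0 * (ll_full - ll_restricted)
    return stat, chi2_sf(stat, df)

def aic(log_lik, n_params):
    """Akaike information criterion; lower values indicate a better fit."""
    return 2.0 * n_params - 2.0 * log_lik

# Log-likelihoods of Model 1 (no covariates) and Model 2 (+ GDP per capita).
LL_M1, LL_M2 = 244.21, 248.67

# Assumption: the added covariate contributes one coefficient per non-reference profile.
stat, p_value = lr_test(LL_M1, LL_M2, df=3)

# Assumption: 3 intercepts + 1 precision parameter in Model 1, 3 more coefficients in Model 2.
aic_m1 = aic(LL_M1, n_params=4)
aic_m2 = aic(LL_M2, n_params=7)
```

The same two functions can be applied to any pair of nested models in the tables of this chapter by inserting the respective log-likelihoods and the difference in the number of parameters.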
According to the significant LR tests (LR χ² = 24.36, df = 4, p < 0.01; LR χ² = 16.3, df = 4, p < 0.01) and the AIC values, British colonial heritage and English legal origin are significantly linked to the democracy profiles as well. The better AIC value for the British colony indicator shows that model 5 is overall better than model 6; therefore, I continue the model building with the British colony indicator. Finally, model 7, which includes the percentage of Catholics as an indicator, shows no significant improvement (LR χ² = 1.77, df = 4, p = 0.62). The standardized residuals of the democracy profiles (see Figure 204 in the appendix) indicate a problem with the FeC category, which is not well predicted. The reason is that this category lacks variation, as there are only very few countries that fit into this group. Therefore, this problem cannot be overcome by modeling the precision parameter. The composite residuals of the model, which can be used to detect outliers to some extent (see section 9.2.2.2), indicate at least three outliers (see Figure 205 in the appendix): El Salvador, Guatemala and The Gambia. Excluding these countries from the regression does not change the significance of any indicator (see Table 102 in the appendix).

Table 9.5 Results of the Dirichlet Regression (Structural Factors)

| Term | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
|---|---|---|---|---|---|---|---|
| fEc | | | | | | | |
| Intercept | 0.39*** (0.09) | 0.39 (0.56) | −0.43 (1.05) | −0.45 (1.07) | 0.38 (1.02) | 0.11 (1.04) | 0.44 (1.02) |
| GDP per capita, log (WDI) | | 0 (0.07) | 0 (0.07) | 0.01 (0.07) | 0.01 (0.07) | −0.01 (0.07) | 0.01 (0.07) |
| Population Size, log (WDI) | | | 0.05 (0.06) | 0.05 (0.05) | 0 (0.05) | 0.02 (0.05) | 0 (0.05) |
| Cultural Diversity (Alesina) | | | | 0.06 (0.45) | 0.79+ (0.48) | 0.73 (0.48) | 0.73 (0.48) |
| British Colony (ICOW) | | | | | −0.9*** (0.21) | | −0.92*** (0.22) |
| British Legal Origin (La Porta) | | | | | | −0.78*** (0.21) | |
| Catholicism (World Religion Dataset) | | | | | | | −0.14 (0.25) |
| fEC | | | | | | | |
| Intercept | 0.12 (0.1) | 1.46** (0.56) | −1.88+ (1.07) | −2.22* (1.1) | −1.42 (1.05) | −1.64 (1.06) | −1.44 (1.05) |
| GDP per capita, log (WDI) | | −0.17* (0.07) | −0.18* (0.07) | −0.14+ (0.08) | −0.14+ (0.07) | −0.16* (0.07) | −0.15* (0.07) |
| Population Size, log (WDI) | | | 0.22*** (0.06) | 0.21*** (0.06) | 0.16** (0.06) | 0.18** (0.06) | 0.15** (0.06) |
| Cultural Diversity (Alesina) | | | | 0.57 (0.46) | 1.43** (0.49) | 1.23* (0.49) | 1.48** (0.5) |
| British Colony (ICOW) | | | | | −1.03*** (0.23) | | −1.02*** (0.23) |
| British Legal Origin (La Porta) | | | | | | −0.77*** (0.22) | |
| Catholicism (World Religion Dataset) | | | | | | | 0.12 (0.27) |
| FeC | | | | | | | |
| Intercept | −0.68*** (0.11) | 0.16 (0.63) | −1.76 (1.23) | −2.07 (1.27) | −1.52 (1.23) | −1.73 (1.24) | −1.52 (1.22) |
| GDP per capita, log (WDI) | | −0.11 (0.08) | −0.11 (0.08) | −0.07 (0.09) | −0.08 (0.09) | −0.09 (0.09) | −0.09 (0.09) |
| Population Size, log (WDI) | | | 0.12+ (0.07) | 0.11+ (0.07) | 0.08 (0.06) | 0.1 (0.07) | 0.07 (0.06) |
| Cultural Diversity (Alesina) | | | | 0.5 (0.53) | 0.96+ (0.58) | 0.86 (0.58) | 1.02+ (0.59) |
| British Colony (ICOW) | | | | | −0.51+ (0.26) | | −0.49+ (0.26) |
| British Legal Origin (La Porta) | | | | | | −0.37 (0.26) | |
| Catholicism (World Religion Dataset) | | | | | | | 0.18 (0.3) |
| phi | 1.69*** (0.07) | 1.72*** (0.07) | 1.78*** (0.07) | 1.79*** (0.07) | 1.88*** (0.07) | 1.85*** (0.07) | 1.88*** (0.07) |
| LL | 244.21 | 248.67 | 256.66 | 257.85 | 270.03 | 266 | 270.91 |
| AIC | −480.42 | −483.34 | −493.31 | −489.7 | −508.06 | −500 | −503.83 |
| N | 107 | 107 | 107 | 107 | 107 | 107 | 107 |

Ref.: Fec. Note: The estimated coefficients are on the logit scale, while the precision parameter is reported on a log scale. Standard errors are in parentheses. The significance levels are as follows: *** p < 0.001, ** p < 0.01, * p < 0.05, + p < 0.1. The reference category (Fec) is omitted from the output. Source: own calculations

Furthermore, the odds-ratio plot is shown in Figure 9.2. This plot shows all effects on the dependent variables at once. Firstly, the odds-ratio plot shows that being a former British colony significantly decreases the odds of being an egalitarian-majoritarian (fEc) or egalitarian-control-focused democracy profile (fEC) compared to the libertarian-majoritarian democracy profile. The effect for the libertarian-control-focused democracy profile (FeC) is only significant compared to the fEC democracy profile. The marginal effects for this variable are also rather large. British heritage greatly increases the probability of being a Fec or FeC democracy profile and substantially reduces the probability of being a fEc or fEC democracy profile.
The second largest effect can be found for the population size variable. Population size increases the odds ratios for the egalitarian-control-focused democracy profile (fEC) compared to the other three profiles. On average, the larger the population size, the higher the probability of being a fEC democracy. Although the effect is not significant, a similar, yet smaller positive marginal effect can be observed for the FeC profile, meaning that the probability for the FeC profile rises with increasing population size. Cultural diversity significantly increases the odds of being an egalitarian-control-focused democracy profile (fEC) compared to a Fec profile. The other odds ratios are not significant. The marginal effect shows that the membership probability of being a fEC profile increases with cultural diversity, while the probability of being a Fec profile decreases. Finally, I included the GDP per capita indicator as a control variable. It shows a significant decrease in odds for the fEC profile compared to the other three profiles.
The marginal effect shows that the probability of being fEC is reduced with a higher
GDP per capita. Overall, the odds-ratio plot shows the weakness of this model: This model cannot explain the FeC democracy profile well. A somewhat different visualization, the predicted membership probabilities for the significant variables, can be found in Figure 206 in the appendix.
Note: The top x-axis shows the log-odds; the bottom x-axis shows the logit coefficients. A line from one democracy profile to another shows that there is no significant relationship. The plot shows the effects of an increase of one SD for the continuous variables and of 1 for the dichotomous variables. Source: own calculations
Figure 9.2 Odds-Ratio Plot for Structural Factors
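How the logit-scale coefficients translate into the pairwise odds ratios discussed above can be sketched as follows. The coefficients are the British colony estimates of Model 5 as read from Table 9.5 (the assignment of cells to models follows my reading of the table layout and should be treated as an assumption). The reference profile Fec has a coefficient of 0 by construction, so any pair of profiles can be compared through the difference of their coefficients.

```python
import math

# British colony coefficients (Model 5, logit scale), read from Table 9.5.
# The reference profile Fec is fixed at 0.
british_colony = {"Fec": 0.0, "fEc": -0.90, "fEC": -1.03, "FeC": -0.51}

def odds_ratio(coefs, profile, baseline):
    """Odds ratio of `profile` versus `baseline` for a one-unit increase in the covariate."""
    return math.exp(coefs[profile] - coefs[baseline])

# Comparisons against the reference profile: exp(coefficient).
or_fEc_vs_Fec = odds_ratio(british_colony, "fEc", "Fec")
or_fEC_vs_Fec = odds_ratio(british_colony, "fEC", "Fec")

# Comparison between two non-reference profiles: exp(difference of coefficients).
or_FeC_vs_fEC = odds_ratio(british_colony, "FeC", "fEC")
```

Values below 1 indicate that British colonial heritage lowers the odds of the first profile relative to the second; values above 1 indicate the opposite. Whether a given comparison is statistically significant cannot be read from the point estimates alone, since that requires the standard error of the coefficient difference.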
Finally, I illustrate the implications of this model using two ideal conditions: The first condition is a country with a large population size, great cultural diversity and no British heritage. The second condition describes a country with a small population, a low level of cultural diversity and with British heritage. All other variables are at their means. Figure 9.3 indicates that the first condition makes it
far more likely that egalitarian democracy profiles (fEC or fEc) emerge compared to the libertarian profiles (Fec or FeC). The second condition makes it more likely that libertarian-majoritarian democracies (Fec) emerge. There is no increase in the membership probabilities of the FeC profile compared to the first condition. This confirms the above-mentioned weakness of the model: it tells us almost nothing about the distribution of the FeC profile.
Source: own calculations
Figure 9.3 Expected Values for Two Ideal Conditions (Structural Factors)
9.2.3.2 Power Resource Theory
According to the exploratory analysis, there are no anomalies: neither were high correlations present, nor did the XY-plots reveal troublesome outliers. The number of variables had to be limited due to the smaller sample size (N = 36). Therefore, I included only the most relevant independent variables of the structural explanation. I excluded the Catholicism indicator because, as shown, it was not significantly and substantially related to the democracy profiles. I also excluded the GDP per capita variable, because the reduced sample with its focus on OECD countries is rather homogeneous in this respect. The cultural diversity indicator was significant, but it contributed little to the explanation compared to the population size indicator. Therefore, the regression includes two structural factors (population size and British colony).
The results of the Dirichlet regression are shown in Table 9.6. Model 1 is the baseline model, including the remaining structural factors. Model 2, which includes the union indicator, is not superior to Model 1 according to the AIC values and the LR test (LR χ² = 5.87, df = 4, p = 0.12). The social democratic/left party indicator is added in Model 3. This also does not improve the model fit compared to Model 1, as shown by the AIC values and the LR test (LR χ² = 5.08, df = 4, p = 0.17). Model 4, which includes both power resource variables, is not superior to the simple Model 1 either (LR χ² = 9.31, df = 4, p = 0.16). The standardized residuals indicate the same problem with the FeC category (see Figure 207 in the appendix). The composite residuals indicate Switzerland as a slight outlier (see Figure 208 in the appendix). Removing this observation did not change any results (see Table 103 in the appendix).
The odds-ratio plot presents no significant relations for the union indicator and the social democratic/left party indicator. In this empirical analysis, democracy profiles are not related to the labor movement or to a higher parliamentary representation of social democratic and left parties (Figure 9.4). These effects might not be significant due to the small sample size. Therefore, it is at least worthwhile to explore the substance of these effects. The next plot (see Figure 9.5) shows the predicted membership probabilities for the democracy profiles when either the union density or the seat share of social democratic and left parties in parliament varies from low to high values, while the other variables are kept at their means.
This shows that for both variables the membership probability of the fEc profile increases. It also shows that for the union indicator the membership probability of the libertarian-majoritarian democracy profile increases. All in all, I would suggest that both factors have a substantial effect: while the union factor increases the probability of the fEc profile from 41% to 49% and that of the Fec profile from 21% to 33%, the left party factor increases the probability of the fEc profile by over 15%. The union effect only partly aligns with the hypothesis, because it increases the probability not only of the egalitarian fEc profile but also of the libertarian Fec profile. However, the contribution of the social democratic/left party indicator to the fEc profile corresponds to the theory and is quite substantial.
Table 9.6 Results of the Dirichlet Regression (Power Resource Theory)

| Term | M1 | M2 | M3 | M4 |
|---|---|---|---|---|
| fEc | | | | |
| Intercept | 1.25 (1.45) | 1.49 (1.76) | 1.25 (1.49) | 1.63 (1.76) |
| Population Size, log (WDI) | −0.03 (0.09) | −0.04 (0.1) | −0.04 (0.09) | −0.05 (0.1) |
| British Colony (ICOW) | −1.28*** (0.35) | −1.27*** (0.34) | −1.3*** (0.35) | −1.28*** (0.35) |
| Union Density Rate (ICTWSS) | | 0 (0.01) | | 0 (0.01) |
| Left/Social Democratic Party (CPDS) | | | 0.01 (0.01) | 0 (0.01) |
| fEC | | | | |
| Intercept | −0.06 (1.72) | 2.51 (2.11) | 0.18 (1.63) | 2.43 (2.05) |
| Population Size, log (WDI) | 0 (0.11) | −0.11 (0.12) | 0.03 (0.1) | −0.08 (0.12) |
| British Colony (ICOW) | −0.88* (0.38) | −0.81* (0.38) | −1.06** (0.39) | −0.95* (0.39) |
| Union Density Rate (ICTWSS) | | −0.02+ (0.01) | | −0.02+ (0.01) |
| Left/Social Democratic Party (CPDS) | | | −0.02 (0.01) | −0.02 (0.01) |
| FeC | | | | |
| Intercept | −2.15 (1.96) | −0.15 (2.51) | −1.85 (1.89) | −0.04 (2.44) |
| Population Size, log (WDI) | 0.07 (0.12) | −0.01 (0.14) | 0.09 (0.12) | 0.01 (0.13) |
| British Colony (ICOW) | −0.09 (0.4) | −0.02 (0.39) | −0.28 (0.44) | −0.19 (0.44) |
| Union Density Rate (ICTWSS) | | −0.02 (0.01) | | −0.02 (0.01) |
| Left/Social Democratic Party (CPDS) | | | −0.02 (0.02) | −0.01 (0.02) |
| Phi | 2.06*** (0.13) | 2.12*** (0.13) | 2.11*** (0.13) | 2.16*** (0.13) |
| LL | 105.48 | 108.42 | 108.02 | 110.14 |
| AIC | −190.96 | −190.84 | −190.04 | −188.28 |
| N | 36 | 36 | 36 | 36 |

Ref.: Fec. Note: The estimated coefficients are on the logit scale, while the precision parameter is reported on a log scale. Standard errors are in parentheses. The significance levels are as follows: *** p < 0.001, ** p < 0.01, * p < 0.05, + p < 0.1. The reference category (Fec) is omitted from the output. Source: own calculations
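The table note states that the coefficients are on the logit scale and the precision parameter on a log scale. A minimal sketch of how such estimates translate into expected membership probabilities, using the Model 1 coefficients of Table 9.6: it assumes the common mean/precision parametrization of the Dirichlet regression (multinomial logit link for the means with Fec fixed at 0, log link for the precision), which the text does not spell out here. The value of the logged population size is hypothetical.

```python
import math

# Model 1 of Table 9.6 (logit scale); Fec is the reference profile.
COEFS = {
    "fEc": {"intercept": 1.25, "log_pop": -0.03, "british_colony": -1.28},
    "fEC": {"intercept": -0.06, "log_pop": 0.00, "british_colony": -0.88},
    "FeC": {"intercept": -2.15, "log_pop": 0.07, "british_colony": -0.09},
}
LOG_PHI = 2.06  # precision parameter, reported on the log scale

def expected_shares(log_pop, british_colony):
    """Expected membership probabilities via the inverse multinomial logit."""
    eta = {"Fec": 0.0}  # linear predictor of the reference profile
    for profile, b in COEFS.items():
        eta[profile] = (b["intercept"]
                        + b["log_pop"] * log_pop
                        + b["british_colony"] * british_colony)
    denom = sum(math.exp(v) for v in eta.values())
    return {profile: math.exp(v) / denom for profile, v in eta.items()}

def dirichlet_alphas(shares):
    """Dirichlet shape parameters: expected share times precision (assumed parametrization)."""
    phi = math.exp(LOG_PHI)
    return {profile: phi * mu for profile, mu in shares.items()}

LOG_POP = 16.0  # hypothetical value, roughly 8.9 million inhabitants

no_colony = expected_shares(LOG_POP, british_colony=0)
colony = expected_shares(LOG_POP, british_colony=1)
alphas = dirichlet_alphas(colony)
```

Comparing `no_colony` and `colony` shows the direction described in the text: with British colonial heritage, the expected share of the Fec profile rises and that of the fEc profile falls, while the shares always sum to 1.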
Note: The top x-axis shows the log-odds; the bottom x-axis shows the logit coefficients. A line from one democracy profile to another shows that there is no significant relationship. The plot shows the effects of an increase of one SD for the continuous variables and of 1 for the dichotomous variables. Source: own calculations
Figure 9.4 Odds-Ratio Plot for Power Resource Theory
Source: own calculations
Figure 9.5 Expected Values (Power Resource Theory)
9.2.3.3 Cultural Factors
I have separated the testing of the impact of cultural factors on the democracy profiles from the structural and power resource factors, because the analysis of the cultural factors rests on a smaller sample size (N = 45). Therefore, I have to limit the number of variables. I excluded the British colony variable because, on the one hand, this variable was correlated with the cultural variables (especially mastery orientation), although the correlation is below the threshold of a Pearson correlation coefficient of |r| ≥ 0.7. On the other hand, a removal is also sensible insofar as the British colony variable, which is part of the diffusion theory, also represents cultural values (British cultural values). Nevertheless, including the British colony variable would remove the effect of the mastery orientation (see Table 105 in the appendix). In addition, there is a strong correlation (|r| ≥ 0.7) between the hierarchy cultural orientation and the GDP per capita indicator, which carries the danger of multicollinearity: the higher the GDP, the lower the hierarchy orientation. However, testing the inclusion and exclusion of the GDP per capita indicator revealed that the effects of the hierarchy and mastery orientation were actually strengthened by controlling for GDP per capita (see Table 104 in the appendix). I therefore include GDP per capita as a control variable. I excluded the cultural diversity indicator, because it contributed little to the explanation compared to the population size indicator. Furthermore, I do not include any variables from the power resource theory, because this would reduce the sample size significantly. In contrast to the power resource theory regression, I include the percentage of Catholics here because, although it was not a significant explanatory factor before, it belongs to the cultural hypothesis. The following structural variables remain: GDP per capita, population size and percentage of Catholics.
Table 9.7 presents the empirical results for the cultural factors. The first model (M1) is the baseline model, which includes the remaining structural factors. Model 2 tests the hierarchy hypothesis. The model improved slightly according to the AIC values and the LR test (LR χ² = 8, df = 4, p = 0.05). Model 3 includes the indicator for mastery orientation. Model 3 is only a slight improvement compared to Model 1 according to the LR test (LR χ² = 7.41, df = 4, p = 0.06), while the AIC improvement is somewhat below the threshold of two points and thus shows no improvement. Finally, model 4 incorporates both variables. Neither the LR test (LR χ² = 6.29, df = 4, p = 0.1) nor the AIC value shows an improvement over model 2 or 3. Residual analysis reveals the same problem with FeC as above (see Figure 209 in the appendix) and indicates the presence of small outliers (see Figure 210 in the appendix). Removing the small outliers Israel, Ireland and Chile from the regression would slightly weaken the significance of the effect of the mastery orientation for the fEc profile (p < 0.1) (see Table 106 in the appendix).

Table 9.7 Results of the Dirichlet Regression (Cultural Factors)

| Term | M1 | M2 | M3 | M4 |
|---|---|---|---|---|
| fEc | | | | |
| Intercept | 2.23 (2.19) | 7.16* (2.81) | 10.05* (4.01) | 13.48** (4.25) |
| GDP per capita, log (WDI) | 0 (0.13) | −0.36+ (0.18) | −0.11 (0.13) | −0.42* (0.18) |
| Population Size, log (WDI) | −0.11 (0.1) | −0.03 (0.1) | −0.04 (0.1) | 0.02 (0.1) |
| Catholicism (World Religion Dataset) | 0.12 (0.38) | −0.38 (0.42) | −0.11 (0.39) | −0.55 (0.43) |
| Hierarchy Orientation (Schwartz) | | −1.35** (0.51) | | −1.23* (0.5) |
| Mastery Orientation (Schwartz) | | | −2.11* (0.92) | −1.81* (0.91) |
| fEC | | | | |
| Intercept | 2.93 (2.34) | 5.21+ (2.92) | 11.67** (4.23) | 13.45** (4.53) |
| GDP per capita, log (WDI) | −0.39** (0.13) | −0.57** (0.19) | −0.53*** (0.14) | −0.66*** (0.19) |
| Population Size, log (WDI) | 0.03 (0.11) | 0.08 (0.11) | 0.1 (0.11) | 0.13 (0.11) |
| Catholicism (World Religion Dataset) | 0.14 (0.39) | −0.12 (0.43) | −0.11 (0.4) | −0.34 (0.44) |
| Hierarchy Orientation (Schwartz) | | −0.66 (0.54) | | −0.52 (0.53) |
| Mastery Orientation (Schwartz) | | | −2.26* (0.9) | −2.24* (0.92) |
| FeC | | | | |
| Intercept | −1.05 (2.74) | −0.31 (3.33) | 3.81 (4.97) | 4.38 (5.32) |
| GDP per capita, log (WDI) | −0.13 (0.15) | −0.19 (0.22) | −0.21 (0.16) | −0.24 (0.23) |
| Population Size, log (WDI) | 0.09 (0.13) | 0.1 (0.13) | 0.14 (0.13) | 0.14 (0.13) |
| Catholicism (World Religion Dataset) | 0.01 (0.44) | −0.1 (0.49) | −0.14 (0.45) | −0.25 (0.51) |
| Hierarchy Orientation (Schwartz) | | −0.22 (0.6) | | −0.14 (0.6) |
| Mastery Orientation (Schwartz) | | | −1.32 (1.06) | −1.32 (1.09) |
| phi | 1.86*** (0.11) | 1.92*** (0.11) | 1.92*** (0.11) | 1.98*** (0.12) |
| LL | 116.34 | 120.34 | 120.04 | 123.48 |
| AIC | −206.67 | −208.67 | −208.08 | −208.96 |
| N | 45 | 45 | 45 | 45 |

Ref.: Fec. Note: The estimated coefficients are on the logit scale, while the precision parameter is reported on a log scale. Standard errors are in parentheses. The significance levels are as follows: *** p < 0.001, ** p < 0.01, * p < 0.05, + p < 0.1. The reference category (Fec) is omitted from the output. Source: own calculations

Figure 9.6 shows the odds-ratio plot. The first variable, mastery orientation, decreases the odds of being a fEC or fEc profile compared to the Fec profile. These effects are statistically significant: the odds ratios for the fEc and fEC democracy profiles differ significantly from the Fec profile. However, this is not the case for the FeC profile, which lies in between. The marginal effects for this variable are also rather large. Mastery orientation, and thus a competitive cultural orientation, especially increases the probability of being a Fec democracy profile and decreases the probability of being a fEc or fEC democracy profile. The second variable, hierarchy orientation, significantly decreases the odds of being a fEc profile compared to the Fec democracy profile. Odds ratios in relation to the fEC or FeC profiles are not significant. In terms of marginal effects, hierarchy orientation increases the probability of being a Fec or FeC democracy profile and decreases the membership probability of the fEC and especially of the fEc democracy profile.
The predicted membership probabilities of the model are depicted in Figure 9.7. The values for the mastery and hierarchy orientation vary from low to high, whereas all other variables are kept at their mean values. The probability of the Fec profile increases with a stronger mastery orientation, whereas the fEc and fEC membership probabilities decrease. The hierarchy orientation has a rather strong effect. It sharply decreases the probability of a country belonging to the fEc profile. A stronger hierarchy orientation increases the probability for the FeC but especially for the Fec democracy profile. Overall, the contribution of the cultural factors seems relatively substantial.
Note: The top x-axis shows the log-odds; the bottom x-axis shows the logit coefficients. A line from one democracy profile to another shows that there is no significant relationship. The plot shows the effects of an increase of one SD for the continuous variables and of 1 for the dichotomous variables. Source: own calculations
Figure 9.6 Odds-Ratio Plot for Cultural Orientation
Source: own calculations
Figure 9.7 Expected Probabilities for Cultural Orientation
9.2.4 Discussion
Some of the hypotheses in section 9.2.1 have been confirmed. With respect to structural factors, the hypothesis is confirmed that, if a country is a former British colony or possesses British heritage in the form of an English legal system, its democracy profile will be a libertarian-majoritarian democracy identical to the British one. Additionally, British heritage decreases the membership probabilities of the fEC and fEc profiles; both profiles have a stronger equality dimension compared to the other democracy dimensions. The hypothesis that the larger the population of a country, the higher the control dimension, is only partly confirmed: population size increases only the odds for the egalitarian-control-focused democracy profile (fEC) compared to the other three profiles. The other control-focused democracy profile, FeC, is not significantly related to population size. In addition, the hypothesis that the higher the degree of cultural diversity within a country, the more pronounced the equality and control dimensions are, is also confirmed to some extent. Cultural diversity significantly increases the odds of being an egalitarian-control-focused democracy profile (fEC) only compared to a Fec profile and not to the other profiles. Nevertheless, especially compared to the Fec democracy profile, the fEC profile allows an equal representation of minority groups and establishes veto points hindering effective government, and might therefore be used more frequently in multinational contexts.
Regarding the power resource theory, neither the labor movement hypothesis (the stronger the labor movement, the greater the probability that it is an egalitarian democracy) nor the hypothesis that the stronger the left and/or social democratic party, the more likely it is to be an egalitarian democracy, can be confirmed statistically. I stated that the effects were somewhat substantial although not significant: especially the egalitarian-majoritarian profile was affected by the parliamentary representation of social democratic and left parties, which corresponds to the theory.
Finally, the results for the cultural factors are as follows. The hypothesis that the more a country has been influenced by Catholicism, the more likely it is to adopt a control-focused democracy, could not be confirmed. The hypothesis that a cultural orientation of mastery leads to a more pronounced freedom dimension, resulting in a libertarian-majoritarian or a libertarian-control-focused democracy profile, can partially be confirmed. Mastery orientation decreases the odds of being a fEC or fEc profile compared to the Fec profile. However, there is no effect for the other libertarian democracy profile (FeC). This makes sense, as the control dimension of the FeC profile offers power dispersion and therefore hinders effective competition: it prevents the dominance of the strongest and integrates minorities to some extent. The hypothesis that a country with a cultural orientation of hierarchy is more likely to belong to the majoritarian democracy profiles (Fec or fEc) is likewise only partially confirmed. Hierarchy orientation significantly decreases the odds of being a fEc profile compared to the Fec democracy profile.
In this respect, the hierarchy orientation influences not the effective government dimension, as I theorized, but the inclusion dimension. However, this makes sense as well: the higher the inclusion of the political system, the easier it is to participate, and the more likely everyone is considered equal. The empirical analysis also revealed the weaknesses of the theory and model. While the theory and model identified and confirmed reasonable explanatory factors of the Fec, fEc and to some extent the fEC democracy profiles, there is little evidence of what causes the FeC democracy profile. In this respect, there is considerable room for improvement.
9.3 Part II: Co-existence of Policy Regimes
9.3.1 Conceptualization and Measurement of Policy Regimes
In part II of this chapter, I examine whether there is a relationship between democracy profiles and other configurations in the policy regime performance dimension: Do specific combinations of policy regimes co-exist? The AGIL typology of performance differentiates between various policy regimes: In the adaptation function, it distinguishes between economic regimes in the sense of the Varieties of Capitalism approach and environmental regimes (Jahn, 1998; Duit, 2014, e.g. ecostates). The goal-attainment function includes governmental regimes such as the democracy profiles of this study. The integration function differentiates between welfare regimes and consociationalism. Finally, the latent pattern maintenance function encompasses the political culture approach. However, due to limitations of the research and the lack of available and credible data, I can only analyze the relationship between democracy profiles and Varieties of Capitalism as well as welfare state regimes. Despite the early research by Almond/Verba (1963, 1980), which identified several types of political cultures in five countries (parochial, participatory, subject and the mixed civic culture), this research has not been continued and expanded to other countries. The research on environmental regimes (Christoff, 2005; Duit, 2014; Jahn, 2014) is still in its infancy, so that reliable concepts and data are missing. Finally, consociationalism is rare and often explored in case studies (Bogaards et al., 2019). Quantitative datasets, if they exist, are problematic because they measure only a few aspects of consociationalism and focus only on the formal instead of the important informal institutions of consociationalism (Bogaards et al., 2019, p. 349).
The Varieties-of-Capitalism (VoC) approach was developed by Hall/Soskice (2001), who made the famous distinction between liberal market economies (LME) and coordinated market economies (CME). The main actors in the VoC approach are the firms (Höpner, 2009, p. 
309) which have to solve coordination problems in four spheres (financing, industrial relations, educational system, inter-firm relations). The central difference between the LME and the CME is that firms in LMEs follow an overall competitive market logic, while firms in CMEs use strategic and long-term coordination through networks and associations (Hall & Soskice, 2001, p. 8). This coordination mechanism characterizes each of these spheres: Financing in LMEs takes place primarily through capital markets, while firms in CMEs rely on domestic bank lending. Industrial relations are characterized in LMEs by “open markets: rates of unionization are low and labor laws are flexible, allowing firms to hire and fire employees with ease” (Orvis &
Drogus, 2017, p. 861). In contrast, CMEs have worker protection and large and strong unions which negotiate collective labor agreements. Whereas LMEs rely on general education, CMEs use vocational training by firms. Finally, the inter-firm relations are concerned with the transfer of knowledge. Transfer of knowledge in LMEs is done by takeover, while CMEs cooperate in joint ventures by relying on independent arbitrators (business associations) (Schröder, 2014, p. 43). These elements are linked through “institutional complementarities” (Hall & Soskice, 2001, p. 17), meaning that a sphere with a specific coordination logic (e.g. strategic coordination) makes it more likely that another sphere shows the same logic. Hall and Soskice claim that this results in types of economies which differ considerably in their institutional setup. This institutional setup is rather stable over time due to path dependency (Buhr & Schmid, 2016, p. 726). Finally, this results in distinct comparative advantages of the two types: “CMEs […] are assumed to have a premium on incremental innovation, whereas LMEs […], in contrast, are supposed to focus on radical innovation” (Nölke & Vliegenthart, 2009, p. 675). Whereas LMEs are good at inventing new products, CMEs excel at incrementally improving existing products. This means that the one best type does not exist (Buhr & Schmid, 2016, p. 726), but each type has its own comparative advantage.[5] The classification of the economic regimes of 20 OECD countries according to the varieties of capitalism typology is listed in Table 9.8.

A second approach concerns the type of welfare regime. Similar to Titmuss’s (1974) typology of the residual, industrial achievement-performance and institutional redistributive model of social policy, Esping-Andersen (1990) differentiates between three worlds of welfare states: a conservative, a social-democratic and a liberal welfare state. 
This typology rests on two dimensions: the degree of decommodification and the promoted social stratification system (Arts & Gelissen, 2002, p. 141). Decommodification means the “workers’ degree of independence from the labor market as a source of subsistence” (Häusermann, 2018, p. 10), while stratification refers to the degree of economic inequality in a society. The liberal welfare state is based on the conception of the “minimal” state (Schmidt et al., 2007, p. 262) and emphasizes individualism and the market principle. It goes along with a low level of decommodification and a stark social stratification. It rests on “means testing [and] private (as opposed to public) health and retirement insurance” (Hicks & Kenworthy, 2003, p. 29). The conservative welfare state is based on Catholic social policy and offers a moderate amount of decommodification. Its social insurance programmes preserve the occupational income differences and follow the principle of subsidiarity (Arts & Gelissen, 2002, p. 142). Finally, the social-democratic welfare state shows the highest level of decommodification. The stratification system is directed at “generous universal and highly distributive benefits not dependent on any individual contributions” (Arts & Gelissen, 2002, p. 142). The goal is a basic provision for all citizens, so that social insurance programmes are “most universalistic in coverage and homogeneous in benefit level” (Hicks & Kenworthy, 2003, p. 29). Table 9.8 shows the distribution of welfare regime types for OECD countries.

The third approach differentiates governmental regimes: Simple approaches divide political systems into parliamentary, semi-presidential and presidential systems (Steffani, 1979; Duverger, 1980). A more sophisticated approach, which includes more variables, is Lijphart’s (1999, 2012) distinction between consensus and majoritarian democracy. The approach of this study is to differentiate between five democracy profiles (FEC, Fec, fEc, FeC, fEC), as discussed in the previous chapters.

[5] Research suggests that distinguishing only two types of economies might not be sufficient to capture the complex reality, e.g. dependent market economies as described by Nölke/Vliegenthart (2009) or state-dominated economies (Schneider & Paunescu, 2012). Nevertheless, this study focuses only on the dichotomy of liberal market and coordinated market economies, because this “framework […] for the better or worse has become the dominant paradigm in the study of varieties of capitalism” (Witt, 2010, p. 3).
9.3.2 Literature Review and Theoretical Framework
How are democracy profiles, VoC and welfare states related? Is it possible to combine these different typologies? As some scholars (Iversen & Soskice, 2011; Orvis & Drogus, 2017, p. 862; Soskice, 2007) point out, there are complementarities between varieties of capitalism, welfare states and the governmental systems (consensus versus majoritarian democracies). Consensus democracies tend to go along with CMEs and strong social democratic or conservative welfare states, whereas majoritarian democracies are associated with LMEs and liberal welfare states. The literature offers several reasons why these specific configurations reinforce and support each other.

How do the economic regime and the welfare state regime reinforce each other? Businesses and employers “played a much more important role in welfare state development than Esping-Andersen had assumed” (Arts & Gelissen, 2010, p. 574), so that the VoC approach enhances Esping-Andersen’s power resource theory (Häusermann, 2018, p. 11). Businesses in CMEs require employees to specialize in certain skills. Therefore, a “precondition for skill specificity […] is the need of extensive guarantees” (Soskice, 2007, p. 92) for the workers, which are provided by a strong welfare state (social-democratic or conservative). On the
other hand, a flexible labor market goes along with a liberal welfare state because the “training system focus in general skills, making a high level of social protection less necessary” (Arts & Gelissen, 2010, p. 574). In addition, “the case for an insurance-based welfare state is accepted by employers” (Iversen & Soskice, 2011, p. 642). Therefore, the interest in the creation of a strong welfare state crosses the classes of employers and workers, resulting in cooperation (Mares, 2003).

Table 9.8 Overview Policy Regimes

| Country | Welfare Regime 1990 (Esping-Andersen) | Varieties of Capitalism 1990s (Hall/Soskice) |
|---|---|---|
| Australia | Liberal | LME |
| Austria | Social Democratic | CME |
| Belgium | Social Democratic | CME |
| Canada | Liberal | LME |
| Denmark | Social Democratic | CME |
| Finland | Conservative | CME |
| France | Conservative | Mixed |
| Germany | Conservative | CME |
| Ireland | Liberal | LME |
| Italy | Conservative | Mixed |
| Japan | Conservative | CME |
| Netherlands | Social Democratic | CME |
| New Zealand | Liberal | LME |
| Norway | Social Democratic | CME |
| Portugal | – | Mixed |
| Spain | – | Mixed |
| Sweden | Social Democratic | CME |
| Switzerland | Conservative | CME |
| United Kingdom | Liberal | LME |
| United States | Liberal | LME |

Source: Höpner (2009, p. 313), Hicks/Kenworthy (2003, p. 28) and Esping-Andersen (1990, p. 52)

Why do specific governmental systems and economic regimes reinforce each other? On the one hand, consensus democracy offers a “framework for interest
groups to take part in policymaking” (Soskice, 2007, p. 93) and supports unions and associations in the important role which CMEs have assigned to them. On the other hand, PR systems “empirically favour left of centre coalitions, while majoritarian systems favour centre-right” (Soskice, 2007, p. 94). This is called the “structural centre-left bias” (Iversen & Soskice, 2011, p. 632) of consensus systems. In PR systems, the center party will choose a left party as coalition partner “to tax the rich while a centre-right party gains little from taxing the poor” (Iversen & Soskice, 2011, p. 638). The strength of left and social-democratic parties results in CMEs and strong welfare states. However, the VoC types shape the electoral system of the governmental system as well: “[L]ow economic coordination at the end of the nineteenth century bred class conflict that led parties on the right to favor majoritarian institutions as the best protection against the rising left” (Ido, 2012, p. 7). In contrast, the cross-class interests between business elites and workers in CMEs led to the establishment of PR systems.

If we apply these considerations to the democracy profiles of this study, the electoral system focuses on the equality vs. freedom trade-off (inclusiveness). PR systems allow a greater representation of the different groups of the society, while plurality voting is more exclusive. Therefore, egalitarian democracies (fEc and fEC) should favor CMEs and thus strong welfare states, while libertarian democracies (Fec or FeC) should promote LMEs and minimal welfare states.

I can add yet another reason for the co-occurrence of these three concepts: A specific set of policy regimes might share the same cultural values. CMEs, social-democratic/conservative welfare states and egalitarian democracy profiles (fEc or fEC) are based on an egalitarian culture. In contrast, LMEs, liberal welfare states and libertarian democracy profiles (Fec or FeC) are based on competition. 
Finally, Figure 9.8 summarizes and visualizes these interrelationships.
9.3.3 Methodological Framework
Figure 9.8 Relationship between Economic Regime, Welfare State and Democracy Profiles
Note: I applied this figure to the democracy profiles and added the cultural explanatory component. Source: Soskice (2007, p. 95)

The methodological framework applied here is minimal. First, I do not propose a causal relationship between these different policy regimes, which would imply a temporal sequence in the emergence of these regimes. Rather, I speak of a co-existence of these regimes, which does not necessarily refer to a causal sequence. This makes regression analysis not the preferred option. Second, the sample size is very small and statistical power is therefore low. Third, even if I applied regression analysis, the small sample size would not permit the inclusion of control variables. This would negate the advantages of multivariate regression analysis. Therefore, I concentrate on a purely descriptive analysis using simple box plots without reference to any statistical parameters. This also makes it possible to combine the descriptive analysis and the discussion section.

The classifications of the welfare state regimes and varieties of capitalism are snapshots from around 1990. Similarly, I create the variable measuring the democracy profiles for each country by averaging the membership probabilities for each democracy profile over all years from 1974 to 1990. As a robustness check, I also use the period from 1990 to 2000 as a reference point of the democracy profiles (see Figure 211 in the appendix).
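The averaging of membership probabilities over a reference period can be sketched as follows. This is only a minimal illustration, assuming the yearly membership probabilities are available as (country, year, probabilities) records; the function name `average_memberships`, the record layout and the values for the placeholder country "A" are hypothetical and are not taken from the data of this study.

```python
from collections import defaultdict

# The five democracy profiles distinguished in this study.
PROFILES = ("FEC", "Fec", "fEc", "FeC", "fEC")


def average_memberships(rows, start=1974, end=1990):
    """Average yearly membership probabilities per country over [start, end]."""
    sums = defaultdict(lambda: dict.fromkeys(PROFILES, 0.0))
    counts = defaultdict(int)
    for country, year, probs in rows:
        if start <= year <= end:  # keep only years inside the reference period
            counts[country] += 1
            for profile in PROFILES:
                sums[country][profile] += probs[profile]
    return {
        country: {p: sums[country][p] / counts[country] for p in PROFILES}
        for country in counts
    }


# Hypothetical illustrative records (not real data).
rows = [
    ("A", 1974, {"FEC": 0.2, "Fec": 0.6, "fEc": 0.1, "FeC": 0.1, "fEC": 0.0}),
    ("A", 1990, {"FEC": 0.4, "Fec": 0.4, "fEc": 0.1, "FeC": 0.1, "fEC": 0.0}),
    ("A", 1995, {"FEC": 0.7, "Fec": 0.1, "fEc": 0.1, "FeC": 0.0, "fEC": 0.1}),
]

main_period = average_memberships(rows)                    # 1974 to 1990
robustness = average_memberships(rows, start=1990, end=2000)  # 1990 to 2000
print(main_period["A"])
print(robustness["A"])
```

Because each yearly record sums to one, the averaged values per country also sum to one, so the result can still be read as a set of membership probabilities.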
9.3.4 Descriptive Analysis and Discussion
The box plots in Figure 9.9 reveal that certain economic regimes, welfare state regimes and democracy profiles actually co-exist with each other. Regarding the
typology of welfare state regimes, liberal welfare state regimes show a higher membership probability of the libertarian-majoritarian democracy profile (Fec). Social-democratic welfare states are associated with egalitarian-majoritarian democracy profiles, while the conservative welfare state represents a mixture of egalitarian-majoritarian, egalitarian-control-focused and to some extent also the libertarian-majoritarian democracy types. The balanced democracy profile can be found in all welfare state regimes, while small portions of the libertarian-control-focused democracy profile can be found especially in the liberal welfare state and the conservative welfare state.
Source: own figure
Figure 9.9 Boxplots for the Economic Regime, Welfare State Regime and Democracy Profiles
In terms of the varieties of capitalism typology, liberal market economies (LMEs) exhibit libertarian-majoritarian democracy profiles, while coordinated market economies (CMEs) contain especially egalitarian-majoritarian democracy profiles. They also include small shares of egalitarian-control-focused democracy
profiles. Balanced democracy profiles can be found in all economic regimes. Finally, the mixed category shows a variety of different democracy profiles, whereby egalitarian-majoritarian democracies predominate.

Furthermore, the welfare regime typology and the varieties of capitalism typology are related. This is shown in the lower row of Figure 9.9. While all liberal market economies are also liberal welfare states, the coordinated market economies go along with social democratic and conservative welfare state regimes. This makes it clear that certain policy regimes coexist and seem to be complementary to each other (see also the box plot in Figure 212 in the appendix, which combines all typologies at once). Using the period from 1990 to 2000 as a basis for the membership probabilities of the democracy profiles does not change the results (see Figure 211 in the appendix). This indicates a reasonable degree of robustness.

Finally, I test whether these regime configurations actually share the same cultural values. I draw on Schwartz’s data for cultural orientations (Licht et al., 2007), which I used earlier in part I (see section 9.2). I also select the same cultural value orientation for the analysis: A high mastery orientation highlights competitiveness and success, while a low mastery orientation emphasizes “consensus, solidarity, and harmony […] and sympathy for the weak” (Maleki & Hendriks, 2015, p. 10). The previous regression analysis showed that mastery orientation and the democracy profiles are significantly related: The probability of a libertarian democracy profile (especially Fec) increases with a stronger mastery orientation, whereas the membership probabilities of the fEc and the fEC democracy profiles decrease. Figure 9.10 shows a similar relationship for the welfare states and VoC types: Liberal welfare states have a higher mastery orientation compared to conservative or social democratic welfare states. 
Similarly, LMEs have a higher mastery orientation compared to CMEs. This makes it clear that cultural values reinforce the relationship between these different policy regime types. A specific set of policy regimes co-occurs because the regimes share the same cultural values.

Overall, the descriptive analysis showed that specific democracy profiles tend to go along with specific types of capitalism and welfare states. I observed two groups: On the one hand, the fEc and fEC profiles are associated with CMEs and social democratic and conservative welfare states. On the other hand, libertarian-majoritarian democracy profiles are more likely to be associated with LMEs and liberal welfare states. There is no clear picture for the balanced democracy profile, which might be expected from a conceptual point of view, as it is undecided about its dimensional orientation and therefore can coexist with all other policy regimes. There are also no clear results for the FeC profile. The reason is that almost no countries have this profile and it almost disappeared in the third wave.
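The relationship between the welfare regime typology and the varieties of capitalism typology described above can be checked with a simple cross-tabulation of the classifications listed in Table 9.8. The following is a minimal sketch; the variable and function names are illustrative choices, and `None` is used here as an assumption to represent the two countries without a welfare regime classification.

```python
from collections import Counter

# (country, welfare regime, VoC type) as listed in Table 9.8.
# None marks countries without a welfare regime classification in the table.
TABLE_9_8 = [
    ("Australia", "Liberal", "LME"),
    ("Austria", "Social Democratic", "CME"),
    ("Belgium", "Social Democratic", "CME"),
    ("Canada", "Liberal", "LME"),
    ("Denmark", "Social Democratic", "CME"),
    ("Finland", "Conservative", "CME"),
    ("France", "Conservative", "Mixed"),
    ("Germany", "Conservative", "CME"),
    ("Ireland", "Liberal", "LME"),
    ("Italy", "Conservative", "Mixed"),
    ("Japan", "Conservative", "CME"),
    ("Netherlands", "Social Democratic", "CME"),
    ("New Zealand", "Liberal", "LME"),
    ("Norway", "Social Democratic", "CME"),
    ("Portugal", None, "Mixed"),
    ("Spain", None, "Mixed"),
    ("Sweden", "Social Democratic", "CME"),
    ("Switzerland", "Conservative", "CME"),
    ("United Kingdom", "Liberal", "LME"),
    ("United States", "Liberal", "LME"),
]


def crosstab(rows):
    """Count countries for each (VoC type, welfare regime) combination."""
    return Counter((voc, welfare) for _, welfare, voc in rows)


counts = crosstab(TABLE_9_8)

# Welfare regimes that occur among liberal market economies.
lme_welfare = {welfare for (voc, welfare) in counts if voc == "LME"}

for (voc, welfare), n in sorted(counts.items(), key=str):
    print(voc, welfare, n)
```

In this table, the set of welfare regimes found among LMEs contains only the liberal type, while CMEs split into social democratic and conservative welfare states, which matches the pattern described in the text.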
Source: own figure
Figure 9.10 Cultural Orientations and Policy Regimes
Another important observation is that majoritarian democracy profiles occupy the extremes, insofar as they can be found especially in liberal and social-democratic welfare states. The same applies to CMEs and LMEs. Democracies with a more pronounced control dimension (here only fEC) belong more often to the intermediate types: conservative welfare states and mixed economies. A possible explanation for the intermediate position of the fEC profile with regard to the welfare states might be that a strong control dimension with multiple veto players makes policy change difficult and constrains a possible expansion of the welfare state, while it also hinders welfare state retrenchment (Schmidt et al., 2007, pp. 66–67; 269–271). Finally, the analysis revealed that cultural values play a role: A competitive culture goes hand in hand with LMEs, liberal welfare states and libertarian democracy profiles. In contrast, a harmonic culture with empathy for the vulnerable is associated with CMEs, social democratic/conservative welfare states and egalitarian democracy profiles.
9.4 Summary and Conclusion
The focus of this chapter was the explanation of policy regime performance. In the first part of this chapter, the causes of the profiles of democracy were analyzed. I proposed several theoretical reasons why democracy profiles emerge in certain countries. These hypotheses were put to the test by applying a Dirichlet regression, which makes it possible to model the special characteristics of the dependent variable, the membership probabilities of the democracy profiles. In each regression, several diagnostic procedures were applied.

The regression analysis confirmed several hypotheses, sometimes only partially: The cultural diversity hypothesis was confirmed partially, because an increase in cultural diversity only had a positive effect on the egalitarian-control-focused democracy profile (fEC) compared to the Fec profile and not to the other profiles. However, this can also be explained: the Fec and fEC profiles are the most contrasting profiles. While the other profiles might be appropriate in these settings of cultural diversity, the Fec profile is certainly not. The population size hypothesis was also only partially confirmed. Population size increases only the odds for the egalitarian-control-focused democracy profile (fEC) compared to the other three profiles, while the other control-focused democracy profile, FeC, is not significantly related to population size. The British heritage hypothesis was fully verified: A democracy that was a former British colony or whose legal origin is British also has a democracy profile similar to the British one (libertarian-majoritarian democracy). The cultural hypothesis can also be confirmed partially: Mastery orientation has a negative effect on the membership probabilities of the fEC and fEc profiles compared to the Fec profile. However, there is no effect for the other libertarian democracy profile (FeC). Furthermore, hierarchy orientation significantly decreases the odds of the fEc profile compared to the Fec democracy profile. 
Thus, hierarchy orientation influences not the effective government dimension, as I theorized, but the inclusion dimension. Finally, there was no significant effect for the variables of the power resource theory; in this respect, the corresponding hypotheses were not confirmed. However, I argued that the effect of the social-democratic/left party indicator is still substantial: an increased representation of those parties increases especially the probabilities of the fEc profile. In this respect, the hypothesis should not be dismissed so easily.

The second part of this chapter analyzed the co-existence of certain democracy profiles, welfare state types and VoC types. I stated several theoretical reasons why specific policy regimes are related to each other. Besides institutional reasons (e.g. electoral system; decision-making framework) and the power resource theory (interests of business elites and workers), I presented another strand of explanation linked to cultural values: LMEs, libertarian democracy profiles and minimal welfare states are associated with each other, because they share the same background of cultural values. These values highlight competition and individualism. In contrast, a harmonic culture with empathy for the vulnerable is associated with CMEs, social democratic/conservative welfare states and egalitarian democracy profiles. This part shows that the different conceptual approaches, which were developed independently from each other (Iversen & Soskice, 2011, p. 633), form a coherent picture.

Overall, this chapter gives considerable validation to the conceptualization and measurement of democracy profiles. I was able to confirm the theoretically expected relationships, and thus the conceptualization and measurement cannot be completely off track. In addition, further validation is gained because the pattern of co-existence between these policy regimes is confirmed. However, the analysis also revealed a problem in the typology: There was almost no confirmed relationship for the FeC profile. This profile is above all a historical one and is hardly present today, which reduces the classifying power of the typology.
10 Conclusion
10.1 Relevance and Research Question of the Study
Comparative political science in the tradition of Lijphart’s famous study “Patterns of Democracy” concludes that democracies can be distinguished by specific institutional designs, that there is a reason why some countries choose a specific institutional design and, finally, that “institutions do matter” for performance in policy outcomes. However, significant research gaps are evident from a conceptual, theoretical, and methodological perspective. Current approaches lack a clear conception of models of democracy and of what constitutes “good” performance. In addition, these approaches only weakly theorize the causal relationship between the democracy profile and the policy outcome and do not take relevant control variables into account. Finally, the methodological weakness shows in the use of standard regression models instead of time-series cross-sectional analysis, which neglects the time-series structure of the data and results in low statistical power. This indicates the need for further research.

Thus, in this study, I addressed three distinct but interdependent research questions that reinforce each other: The first research question was: Is it possible to identify democracy profiles based on the idea of trade-offs? While democracy models are generally distinguished in comparative political science, democracy models in quality of democracy research have only recently been analyzed, with the conclusion that a perfect democracy does not exist and that “every democratic country must make an inherently value-laden choice about what kind of democracy it wishes to be” (Diamond & Morlino, 2004, p. 21, emphasis in original). This is the starting point of the study and opens up other research questions: If such democracy profiles can be identified, do they actually influence performance?
© The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2021 O. Schlenkrich, Origin and Performance of Democracy Profiles, Vergleichende Politikwissenschaft, https://doi.org/10.1007/978-3-658-34880-9_10
The typology established by working on the first question is a prerequisite for the second question. But the most concise typology lacks relevance until it can be demonstrated that its types do in fact have measurable effects. Finally, the third research question addresses the origins of these democracy profiles. What causes democracy profiles? If the empirically derived causes match the theoretical expectations, it gives considerable validation to the typology. The following section 10.2 summarizes the most important findings of the study and derives its main implications, while section 10.3 discusses the limitations of the study and starting points for future research.
10.2 Important Findings and Implications of the Study
The important findings and implications of the first research question are as follows: Other approaches discussed the existence of trade-offs conceptually (Diamond & Morlino, 2004; Coppedge et al., 2011; Bühlmann et al., 2012), but they were unable to identify any democracy profiles or trade-offs empirically. The present study was able to find competing democracy conceptions realized as democracy profiles in the empirical world. These democracy profiles show that a perfect democracy which maximizes all three democracy dimensions does not exist, so that the trade-offs between freedom and equality (inclusiveness) as well as freedom and control (effective government) support the structuring of the empirical reality. In particular, the use of fuzzy clustering was highly relevant for this study, since there were serious overlaps between these clusters of democracy profiles that would make a clear-cut classification unconvincing. I identified a maximum of five different democracy profiles: Libertarian-majoritarian democracies combine high freedom with low equality and control values, while egalitarian-majoritarian democracies favor equality over freedom and control. Libertarian-control-focused democracies are characterized by high values of freedom and control at the expense of equality, while egalitarian-control-focused democracies emphasize equality and control at the cost of freedom. Finally, a balanced cluster can be determined whose dimensions are all at the same level.

This study also provides the first analysis of the V-Dem dataset outside of the V-Dem Institute (to my knowledge). V-Dem studies the general characteristics of its measurement model based on item response theory by simulating different scenarios and then testing how well the measurement model functions in those scenarios (Marquardt & Pemstein, 2018; Pemstein et al., 2015, 2019). 
In addition, V-Dem discusses the validation of single questions or indices and studies the influence of the characteristics of the experts (e.g. gender) on their
response behavior (McMann et al., 2016). Unfortunately, the data for the coder characteristics is not publicly available. However, this study works with the actual raw data of the experts’ ratings, taking all indicators into account. Thus, it takes a holistic perspective. It analyzes not only the validity and reliability of the whole data set, but also places special emphasis on the prerequisites of the measurement model. My study implies that the dataset has a conceptual bias towards the freedom dimension, some unclearly formulated questions and reliability issues, considering that only a minority of the expert coders rate most of the data. However, I found few sources of bias in the experts’ recruitment process: democracies, and especially populous and large countries, had more coders on average, but all in all the distribution of coders was relatively even.

Considering the AGIL typology of political performance, Parsons’ AGIL scheme is a powerful heuristic framework for thinking about the “functional prerequisites” of a system to survive (adaptation, goal-attainment, integration and latent pattern maintenance). It helps to overcome a major point of critique of the current research by justifying the selected performance criteria. From these functional prerequisites, I derived several important performance areas: economic outcomes, environmental outcomes, goal-attainment outcomes, social outcomes, domestic security outcomes and latent pattern maintenance outcomes. Furthermore, the exploratory factor analysis might be preferable to a more theoretically driven or manual aggregation process, because it provides a variety of diagnostics which help to estimate the quality of the aggregation (e.g. RMSEA, KMO values, coefficient omega). 
This aggregation procedure extracts several components for each performance area: Economic wealth and productivity; general environmental performance, economic equality performance and social equality performance, domestic security performance and finally, confidence in the latent pattern maintenance performance area. As for the second research question, different democracy conceptions affect performance. The results point to the fact that there is not a single best democracy profile: Libertarian democracies have a (slight) advantage in economic performance, while egalitarian democracies have better results in the environmental outcomes, and especially in the economic equality and social equality outcomes. There are mixed results for the goal-attainment and no statistically significant effects for the confidence performance dimension. Therefore, if we equate Lijphart’s consensus democracies (on the executive-parties dimension) with the egalitarian democracies, the results align with his study showing the “kinder and gentler” nature of egalitarian democracies in terms of economic and social equality. In contrast to Lijphart, who states that there is no difference in
macroeconomic performance between the democracy models, libertarian democracies show better performance in this area. One implication for research is that these effects are, in general, only weak. But they are still important, because the causal distance between democracy profiles and outcomes is large and there are many factors one can think of which are closer to the outcome. Furthermore, these effects do not result in an immediate performance change, but only take effect after a longer period of time. In contrast to other studies, which often focus on single aspects of performance, this study gives an overview of the effects of democracy profiles on all relevant performance areas.

The TSCS multilevel within-between model allows the inclusion of time-invariant variables. These variables are of substantial interest in this study and in comparative politics in general. These within-between models can also be linked to the more classical autoregressive distributed lag (ADL) models and error correction models (ECM), as this study shows. It is important to stress that the correctness of the TSCS analyses is highly dependent on the right model specifications (S.E. Wilson & Butler, 2007). Violating one of the assumptions of the regression (e.g. unit root, autocorrelation, unit heterogeneity) can lead to substantial bias in the estimates. By applying this type of multilevel model, it is no longer necessary to accept model misspecifications (e.g. not using fixed effects) in order to be able to answer important research questions about context factors (Bell & Jones, 2015). Bayesian methods were especially helpful here, since they not only allow the estimation of such complex models but also permit, in a simple way, the inclusion of the uncertainty created by replacing the missing values with multiple imputation. Finally, Bayesian methods allow an easy calculation of the significance of the long-run multiplier (LRM), which is not directly accessible with frequentist statistics (De Boef & Keele, 2008). 
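The point about the long-run multiplier can be illustrated with a small sketch. It assumes a simple ADL(1,1) model, y_t = a0 + a1*y_{t-1} + b0*x_t + b1*x_{t-1} + e_t, for which the long-run multiplier is (b0 + b1) / (1 - a1). With posterior draws of the coefficients, the LRM can be computed draw by draw and then summarized directly. The function names and the draws below are hypothetical placeholders, not the models or results of this study.

```python
import statistics


def long_run_multiplier(alpha1, beta0, beta1):
    """LRM of an ADL(1,1) model: (beta0 + beta1) / (1 - alpha1)."""
    if abs(alpha1) >= 1:
        raise ValueError("alpha1 must lie strictly between -1 and 1")
    return (beta0 + beta1) / (1.0 - alpha1)


def _quantile(sorted_values, q):
    """Nearest-rank quantile of an already sorted list."""
    return sorted_values[round(q * (len(sorted_values) - 1))]


def summarize_lrm(draws, level=0.95):
    """Summarize the LRM over posterior draws of (alpha1, beta0, beta1)."""
    lrms = sorted(long_run_multiplier(a1, b0, b1) for a1, b0, b1 in draws)
    tail = (1.0 - level) / 2.0
    return {
        "median": statistics.median(lrms),
        "lower": _quantile(lrms, tail),
        "upper": _quantile(lrms, 1.0 - tail),
        # Share of draws with a positive long-run effect.
        "prob_positive": sum(v > 0 for v in lrms) / len(lrms),
    }


# Hypothetical posterior draws (alpha1, beta0, beta1), for illustration only.
draws = [
    (0.50, 0.20, 0.10),
    (0.55, 0.15, 0.10),
    (0.45, 0.25, 0.05),
    (0.60, 0.10, 0.10),
    (0.50, 0.30, 0.10),
]

print(summarize_lrm(draws))
```

Because the LRM is a nonlinear function of several coefficients, summarizing it over the draws avoids the approximation needed to derive its standard error in a frequentist setting.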
Therefore, a more frequent use of Bayesian methods in comparative political science is desirable.

Finally, for the third research question, an important explanatory factor that became obvious is culture. Throughout the study I argued that trade-offs represent a value-laden choice. I find evidence for this thesis, because democracy profiles have their origins in a similar culture. In addition, democracy profiles form a coherent whole with other policy regimes (e.g. coordinated market economies and strong welfare states go hand in hand with an egalitarian democracy profile). Here culture is also a substantial factor: the libertarian configuration emphasizes competition and individualism, while the egalitarian configuration stresses a more harmonic culture. This factor should be given greater weight in further research (Maleki & Hendriks, 2015). In addition, the Dirichlet regression for compositional data was useful to incorporate and consider the uncertainty of the classification. It might also be used outside of this study for other important research questions in political science (e.g. allocation of public budgets).
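To illustrate why a Dirichlet model fits compositional data such as membership probabilities or budget shares, the following sketch evaluates the log-density of a Dirichlet distribution for a vector of shares that sum to one. It is a minimal illustration using only the standard library; the function name and the example values are hypothetical, and the regression step itself (linking the parameters to covariates) is deliberately left out.

```python
import math


def dirichlet_logpdf(shares, alpha):
    """Log-density of a Dirichlet distribution at a composition `shares`."""
    if len(shares) != len(alpha):
        raise ValueError("shares and alpha must have the same length")
    if any(a <= 0 for a in alpha):
        raise ValueError("all alpha parameters must be positive")
    if any(s <= 0 for s in shares) or abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must be positive and sum to one")
    # Log of the normalizing constant: Gamma(sum alpha) / prod Gamma(alpha_j).
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(s) for a, s in zip(alpha, shares))


# Hypothetical parameters for three shares; the implied mean is alpha / sum(alpha).
alpha = (2.0, 5.0, 3.0)
near_mean = (0.2, 0.5, 0.3)
far_from_mean = (0.6, 0.2, 0.2)

print(dirichlet_logpdf(near_mean, alpha))
print(dirichlet_logpdf(far_from_mean, alpha))
```

The density respects the constraint that all components are positive and sum to one, which an ordinary linear regression on each share would ignore.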
10.3 Limits of the Study
As shown, the study has several implications for the current research on the relationship between democracy models and performance. However, there are also severe limitations. While the study improves the research by tracing the democracy models back to their respective democracy conceptions and by empirically verifying these types based on the idea of trade-offs, there are serious limits in the conceptualization and classification. First, relevant trade-offs could not be identified for all matrix fields, so that the aggregation to the final dimensional value on which the cluster analysis was based included matrix fields with trade-offs and without trade-offs. This averaging process could have led to the loss of sharp distinctions between the democracy profiles. Therefore, an important question is which other democracy conceptions and trade-offs are important. Second, this typology works very well for the second wave of democracy, where all democracy profiles appear at an almost equal rate. It works less well for the third wave of democracy (since 1974). On the one hand, the FeC democracy profile (libertarian-control-focused) vanishes almost completely in the 1990s. Although this profile is conceptually relevant, it can be regarded more as a historical profile. On the other hand, the balanced profile (FEC) and the egalitarian profiles (fEc and fEC) gained too much empirical weight, reducing the discriminatory power of this typology for recent years.

Concerning the second research question, the AGIL typology of political performance provided a useful tool by focusing on performance criteria that are more relevant than others and should be included in the overall performance assessment. However, the limits of the typology also became apparent in this study. Although the goal-attainment function can be considered relevant, because it encompasses not only constitutional reforms but also institutional learning, it could only be weakly defined conceptually. 
This was subsequently reflected in the poor measurement and empirical analysis. Besides a few items of the Sustainable Governance Indicators (SGI), which cover only OECD countries for a short time-period, there are no indicators to assess the reformability of a political system. The same applies to the general performance dimension. Besides its relevance, there is not enough data. So, these are still open questions. The answer to the second research question showed that trade-offs and the derived democracy profiles matter. The structuring of the variety of causal effects
342
10
Conclusion
into three effect types might be considered a theoretical improvement. However, as the empirical analysis shows by not verifying the hypotheses derived from these three types, it is probably not useful to think of general effects of democracy profiles which can be transferred to all performance areas at the same time. Therefore, an implication would be to think about more concrete explanations of democracy profiles with regard to single policy outcomes. Even those studies which focus on individual performance areas often use overly broad theories. Similarly, the incorporation of additional control variables which include the effects of informal institutions and statehood on performance is important from a theoretical point of view, however, they often explained less than one might have expected (indicated by the lack of significance). Still, this does not mean that an inclusion of this variables is not warranted. One severe limit is that the power resource theory and partisan theory could not be included in the explanation of performance, although they are central explanatory factors. This would help control for the political interests of important political actors or groups. The reason for this was that it would reduce the sample size too much and make the sample too homogeneous in terms of democracy profiles for a reasonable empirical analysis. This raises the question of whether it is possible to measure the interests of political actors differently than on the basis of the party families. It would help to create a variable which is not only reasonable for European countries but can be generalized to other regions as well. A disadvantage of Bayesian methods is the long computation time and the high demand on computer resources, especially if the sample size is large and the model complex. Some of these models needed to run for several hours until the results were finally available. 
Nevertheless, the methodological strategy of combining the TSCS model with Bayesian statistics turned out to be very valuable. Regarding the third research question about the origins of democracy profiles, the major theoretical limit for this research question is that it is not clear in which direction the causal arrow points. It is possible that the institutional design leads to a specific culture and the causal chain is actually reversed. All these points show the need for further research.
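The first limitation named above, the averaging of matrix fields with and without trade-offs, can be made concrete with a small numerical sketch. The scores, the number of fields, and the use of a plain arithmetic mean are all hypothetical assumptions chosen for illustration; they are not the values or the aggregation rule of the study. The sketch only shows the general mechanism: fields on which two profiles do not differ shrink the gap between their dimensional values.

```python
from statistics import mean


def dimension_value(tradeoff_scores, neutral_scores):
    """Aggregate matrix-field scores into one dimensional value.

    A plain arithmetic mean is assumed here for illustration only.
    """
    return mean(list(tradeoff_scores) + list(neutral_scores))


# Hypothetical scores for one dimension with five matrix fields.
# Two fields carry a trade-off, so the two profiles differ on them.
profile_a_tradeoff = [0.9, 0.9]
profile_b_tradeoff = [0.5, 0.5]
# Three fields carry no trade-off, so both profiles score the same.
neutral = [0.8, 0.8, 0.8]

# Gap between the profiles on the trade-off fields alone.
gap_tradeoff_only = mean(profile_a_tradeoff) - mean(profile_b_tradeoff)

# Gap after averaging over all five fields.
gap_after_averaging = (
    dimension_value(profile_a_tradeoff, neutral)
    - dimension_value(profile_b_tradeoff, neutral)
)

# Share of the original gap that survives the aggregation.
dilution = gap_after_averaging / gap_tradeoff_only

print(f"gap on trade-off fields: {gap_tradeoff_only:.2f}")
print(f"gap after averaging:     {gap_after_averaging:.2f}")
print(f"share of gap retained:   {dilution:.2f}")
```

Under these assumptions the retained share equals the proportion of fields that carry a trade-off (two of five), which is why a cluster analysis run on the aggregated values may separate the profiles less sharply than the trade-off fields alone would suggest.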
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2021 O. Schlenkrich, Origin and Performance of Democracy Profiles, Vergleichende Politikwissenschaft, https://doi.org/10.1007/978-3-658-34880-9
Formale und informelle Institutionen in der Vergleichenden Politikwissenschaft. In H.-J. Lauth, M. Kneuer & G. Pickel (eds.), Handbuch Vergleichende Politikwissenschaft. Wiesbaden: Springer Fachmedien Wiesbaden. Retrieved from http:// link.springer.com/10.1007/978-3-658-02338-6_14 Lauth, H.-J. (2017a). Zivilgesellschaft und die Qualität der Demokratie. In A. Croissant, S. Kneip & A. Petring (eds.), Demokratie, Diktatur, Gerechtigkeit. Wiesbaden: Springer Fachmedien Wiesbaden. Retrieved from http://link.springer.com/10.1007/978-3-65816090-6_19 Lauth, H.-J. (2017b). Legitimation autoritärer Regime durch Recht. Zeitschrift für Vergleichende Politikwissenschaft 11(2): 247–273. Lauth, H.-J. (2020). Legitimacy and Legitimation. In The SAGE Handbook of Political Science. 1 Oliver’s Yard, 55 City Road London EC1Y 1SP: SAGE Publications Ltd. Retrieved from https://sk.sagepub.com/reference/the-sage-handbook-of-political-science/i4967.xml Lauth, H.-J. & Schlenkrich, O. (2018a). Making Trade-Offs Visible: Theoretical and Methodological Considerations about the Relationship between Dimensions and Institutions of Democracy and Empirical Findings. Politics and Governance 6(1): 78.
354
References
Lauth, H.-J. & Schlenkrich, O. (2018b). Trade-Off Measurement in the Democracy Matrix. Retrieved from https://www.democracymatrix.com/concept-tree-operationalisat ion/trade-off-measurement Lauth, H.-J. & Schlenkrich, O. (2018c). Demokratie in komplexen Gesellschaften. In T. Mannewitz (ed.), Die Demokratie und ihre Defekte. Wiesbaden: Springer Fachmedien Wiesbaden. Retrieved from http://link.springer.com/10.1007/978-3-658-20848-6_7 Lauth, H.-J. & Schlenkrich, O. (2019). Democracy Matrix Codebook V.1.1. Retrieved from https://www.democracymatrix.com/fileadmin/Mediapool/Downloads/Archive/Dem ocracyMatrix_Codebook_v1_1.pdf Lauth, H.-J. & Schlenkrich, O. (2020). Democracy Matrix Codebook V.3. Retrieved from https://www.democracymatrix.com/fileadmin/Mediapool/Downloads/Democr acyMatrix_Codebook_v3.pdf Lauth, H.-J., Schlenkrich, O. & Lemm, L. (2020). No Age of Autocratization! Growing Hybridity in the Center of the Regime Continuum. Retrieved from https://www.demokr atiematrix.de Lauth, H.-J. & Wagner, C. (eds.). (2016). Vergleichende Politikwissenschaft: Analyse und Vergleich politischer Systeme. In Politikwissenschaft: eine Einführung (8th ed.). Paderborn: Schöningh. Levy, R. & Mislevy, R.J. (2016). Bayesian Psychometric Modeling. Boca Raton: CRC Press, Taylor & Francis Group. Lewin, L., Lewin, B., Bäck, H. & Westin, L. (2008). A Kinder, Gentler Democracy? The Consensus Model and Swedish Disability Politics. Scandinavian Political Studies 31(3): 291–310. Licht, A.N., Goldschmidt, C. & Schwartz, S.H. (2007). Culture rules: The foundations of the rule of law and other norms of governance. Journal of Comparative Economics 35(4): 659–688. Lijphart, A. (1969). Consociational Democracy. World Politics 21(2): 207–225. Lijphart, A. (1984). Democracies: Patterns of Majoritarian and Consensus Government in Twenty-One Countries. New Haven: Yale University Press. Lijphart, A. (1996). The Puzzle of Indian Democracy: A Consociational Interpretation. 
The American Political Science Review 90(2): 258–268. Lijphart, A. (1999). Patterns of Democracy: Government Forms and Performance in Thirty-six Countries. Yale University Press. Lijphart, A. (2004). Constitutional Design for Divided Societies. Journal of Democracy 15(2): 96–109. Lijphart, A. (2012). Patterns of Democracy: Government Forms and Performance in Thirty-six Countries (2nd ed.). New Haven: Yale University Press. Lipset, S.M. (1959). Some Social Requisites of Democracy: Economic Development and Political Legitimacy. American Political Science Review 53(1): 69–105. Long, J.S. (1987). A Graphical Method for the Interpretation of Multinomial Logit Analysis. Sociological Methods & Research 15(4): 420–446. Long, J.S. & Freese, J. (2014). Regression models for categorical dependent variables using Stata (Third edition.). College Station, Texas: Stata Press Publication, StataCorp LP. Lorenz, A. (2005). How to Measure Constitutional Rigidity: Four Concepts and Two Alternatives. Journal of Theoretical Politics 17(3): 339–361.
References
355
Lorenz, A. (2008). Verfassungsänderungen in etablierten Demokratien: Motivlagen und Aushandlungsmuster. Wiesbaden: VS, Verl. für Sozialwiss. Retrieved from https://doi.org/10. 1007/978-3-531-91193-9 Lorenz, A. (2015). Verfassungen in der vergleichenden Politikwissenschaft. In H.-J. Lauth, M. Kneuer & G. Pickel (eds.), Handbuch Vergleichende Politikwissenschaft. Wiesbaden: Springer Fachmedien Wiesbaden. Retrieved from http://link.springer.com/http://link.spr inger.com/10.1007/978-3-658-02993-7_28-1 Lorenzo-Seva, U. & Van Ginkel, J.R. (2016). Multiple Imputation of missing values in exploratory factor analysis of multidimensional scales: estimating latent trait scores. Anales de Psicología 32(2): 596. Lovell, C.A.K., Pastor, J.T. & Turner, J.A. (1995). Measuring macroeconomic performance in the OECD: A comparison of European and non-European countries. European Journal of Operational Research 87(3): 507–518. Lutz, D.S. (1994). Toward a Theory of Constitutional Amendment. American Political Science Review 88(2): 355–370. Maestas, C. (2016). Expert Surveys as a Measurement Tool. (L.R. Atkeson & R.M. Alvarez, eds.) (Vol. 1). Oxford University Press. Retrieved from http://oxfordhandbooks.com/view/ 10.1093/oxfordhb/9780190213299.001.0001/oxfordhb-9780190213299-e-13 Magnusson, A., Skaug, H., Nielsen, A., Berg, C., Kristensen, K., Maechler, M., … Brooks, M. (2020). Package ‘glmmTMB’. Retrieved from https://cran.r-project.org/web/packages/ glmmTMB/glmmTMB.pdf Maier, M.J. (2014). DirichletReg: Dirichlet Regression for Compositional Data in R 13. Maleki, A. & Hendriks, F. (2015). The relation between cultural values and models of democracy: a cross-national study. Democratization 22(6): 981–1010. Mangiafico, S.S. (2016). Summary and Analysis of Extension Program Evaluation in R, version 1.18.1. Retrieved from http://rcompanion.org/documents/RHandbookProgramEvalu ation.pdf Manow, P. (2004). 
‘The good, the bad, and the ugly’: Esping-Andersen’s Regime Typology and the Religious Roots of the Western Welfare State. MPIfG Working Paper 3: MaxPlanck-Institut für Gesellschaftsforschung. Mares, I. (2003). The politics of social risk: business and welfare state development. Cambridge, UK ; New York: Cambridge University Press. Marquardt, K.L. & Pemstein, D. (2018). IRT Models for Expert-Coded Panel Data. Political Analysis 26(4): 431–456. Martínez i Coma, F. & van Ham, C. (2015). Can experts judge elections? Testing the validity of expert judgments for measuring election integrity: Can Experts Judge Elections? European Journal of Political Research 54(2): 305–325. McElreath, R. (2015). Statistical rethinking: a Bayesian course with examples in R and Stan. Boca Raton: CRC Press/Taylor & Francis Group. McMann, K., Pemstein, D., Seim, B. & Lindberg, S.I. (2016). Strategies of Validation: Assessing the Varieties of Democracy Corruption Data. V-Dem Working Paper 23. Meadowcroft, J. (2014). Comparing Environmental Performance. In A. Duit (ed.), State and Environment. The MIT Press. Retrieved from http://mitpress.universitypressscholarship. com/view/10.7551/mitpress/9780262027120.001.0001/upso-9780262027120-chapter-2 Merkel, W. (2008). Plausible theory, unexpected results: The rapid democratic consolidation in central and eastern Europe. IPG 11–29.
356
References
Merkel, W. (2010). Systemtransformation. Eine Einführung in die Theorie und Empirie der Transformationsforschung. Wiesbaden: VS-Verlag. Merkel, W. (2014). Is capitalism compatible with democracy? Zeitschrift für Vergleichende Politikwissenschaft 8(2): 109–128. Meuleman, B., Loosveldt, G. & Emonds, V. (2014). Regression Analysis: Assumptions and Diagnostics. In The SAGE Handbook of Regression Analysis and Causal Inference. 1 Oliver’s Yard, 55 City Road, London EC1Y 1SP United Kingdom: SAGE Publications Ltd. Retrieved from http://methods.sagepub.com/book/regression-analysis-and-causal-inf erence/n5.xml Mishler, W. & Rose, R. (2002). Learning and re-learning regime support: The dynamics of post-communist regimes. European Journal of Political Research 41(1): 5–36. Mohamad-Klotzbach, C. & Schlenkrich, O. (2017). Die wiederentdeckte Relevanz von Staat und Staatlichkeit. Zeitschrift für Vergleichende Politikwissenschaft 11(4): 479–487. Moosbrugger, H. & Schermelleh-Engel, K. (2012). Exploratorische (EFA) und Konfirmatorische Faktorenanalyse (CFA). In H. Moosbrugger & A. Kelava (eds.), Testtheorie und Fragebogenkonstruktion. Berlin, Heidelberg: Springer Berlin Heidelberg. Retrieved from http://link.springer.com/10.1007/978-3-642-20072-4_13 Müller, T. & Pickel, S. (2007). Wie lässt sich Demokratie am besten messen? Zur Konzeptqualität von Demokratie-Indizes. Politische Vierteljahresschrift 48(3): 511–539. Müller-Rommel, F. (2008). Demokratiemuster und Leistungsbilanz von Regierungen: Kritische Anmerkungen zu Arend Lijphart’s „Patterns of Democracy“. Zeitschrift für Vergleichende Politikwissenschaft 2(1): 78–94. Munck, G.L. (2016). What is democracy? A reconceptualization of the quality of democracy. Democratization 23(1): 1–26. Munck, G.L. & Verkuilen, J. (2002). Conceptualizing and Measuring Democracy: Evaluating Alternative Indices. Comparative Political Studies 35(1): 5–34. Muno, W. (2012). 
Die Vermessung der Welt: Eine Analyse der Worldwide Governance Indicators der Weltbank. Zeitschrift für Vergleichende Politikwissenschaft 6(1): 87–113. Nardo, M., Saisana, M., Saltelli, A., Tarantola, S., Hoffman, A. & Giovannini, E. (2005). Handbook on Constructing Composite Indicators: Methodology and User Guide (OECD Statistics Working Papers No. 2005/03). Paris: OECD. Retrieved from https://www.oecdilibrary.org/economics/handbook-on-constructing-composite-indicators_533411815016 Nassiri, V., Lovik, A., Molenberghs, G. & Verbeke, G. (2018). On using multiple imputation for exploratory factor analysis of incomplete data. Behavior Research Methods 50(2): 501–517. Negretto, G.L. (2012). Replacing and Amending Constitutions: The Logic of Constitutional Change in Latin America. Law & Society Review 46(4): 749–779. Nguyen, C.D., Carlin, J.B. & Lee, K.J. (2017). Model checking in multiple imputation: an overview and case study. Emerging Themes in Epidemiology 14(1): 8. Nölke, A. & Vliegenthart, A. (2009). Enlarging the Varieties of Capitalism: The Emergence of Dependent Market Economies in East Central Europe. World Politics 61(4): 670–702. Norris, M. & Lecavalier, L. (2010). Evaluating the Use of Exploratory Factor Analysis in Developmental Disability Psychological Research. Journal of Autism and Developmental Disorders 40(1): 8–20.
References
357
Norris, Paul. (2007). Expenditure on Public Order and Safety. In The Disappearing State? Edward Elgar Publishing. Retrieved from https://EconPapers.repec.org/RePEc:elg:eec hap:3779_6 Norris, Pippa. (1999). Critical Citizens: Global Support for Democratic Government. Oxford: Oxford University Press. Retrieved from http://www.oxfordscholarship.com/view/10. 1093/0198295685.001.0001/acprof-9780198295686 Norris, Pippa. (2008). Driving Democracy: Do Power-Sharing Institutions Work? Cambridge: Cambridge University Press. Retrieved from http://ebooks.cambridge.org/ref/id/CBO978 0511790614 Norris, Pippa. (2011). Democratic deficit: critical citizens revisited. New York: Cambridge University Press. OECD. (1987). OECD Economic Outlook, Volume 1987 Issue 1 (Vol. 1987). OECD Publishing. Retrieved from https://www.oecd-ilibrary.org/economics/oecd-economic-outlookvolume-1987-issue-1_eco_outlook-v1987-1-en OECD. (1993). OECD Core Set of Indicators For Environmental Performance ReviewsA synthesis report by the Group on the State of the Environment. Paris: OECD Publishing. Retrieved from http://www.oecd.org/officialdocuments/publicdisplaydocum entpdf/?cote=OCDE/GD(93)179&docLanguage=En OECD. (2017). Government at a Glance 2017. OECD. Retrieved from https://www.oecd-ili brary.org/governance/government-at-a-glance-2017_gov_glance-2017-en OECD. (2019). General government spending (indicator). OECD. Retrieved from https:// www.oecd-ilibrary.org/governance/general-government-spending/indicator/english_a 31cbf4d-en Olivier, J., G.J., Janssens-Maenhout, G., Muntean, M. & Peters, J.A.H.W. (2015). Trends in global CO2 emissions: 2015 Report. The Hague: PBL Publishers. Retrieved from https://edgar.jrc.ec.europa.eu/news_docs/jrc-2015-trends-in-global-co2-emissions2015-report-98184.pdf Orvis, S. & Drogus, C.A. (2017). Introducing Comparative Politics: Concepts and Cases in Context. CQ Press. Osborne, J.W. (2002). Notes on the use of data transformations. 
Practical Assessment, Research & Evaluation 8(6). Ozymy, J. & Rey, D. (2013). Wild Spaces or Polluted Places: Contentious Policies, Consensus Institutions, and Environmental Performance in Industrialized Democracies. Global Environmental Politics 13(4): 81–100. Parsons, T. (2005). The Social System. London: Routledge. Pemstein, D., Marquardt, K.L., Tzelgov, E., Wang, Y., Medzihorsky, J., Krusell, J., … Roemer, J. (2019). The V-Dem Measurement Model: Latent Variable Analysis for Cross-National and Cross-Temporal Expert-Coded Data. SSRN Electronic Journal. Retrieved from https:// www.ssrn.com/abstract=3395892 Pemstein, D., Tzelgov, E. & Wang, Y.-T. (2015). Evaluating and Improving Item Response Theory Models for Cross-National Expert Surveys. University of Gothenburg. Retrieved from https://www.v-dem.net/media/filer_public/22/41/224180e9-464b-4da2-b067b7d1af9b2731/v-dem_working_paper_2015_1.pdf Peter, F. (2017). Political Legitimacy. In E. N. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Summer 2017.). Metaphysics Research Lab, Stanford University. Retrieved from https://plato.stanford.edu/archives/sum2017/entries/legitimacy/
358
References
Peters, G.-J.Y. (2014). The alpha and the omega of scale reliabilityand validity. The European Health Psychologist 16(2): 56–69. Pharr, S.J. & Putnam, R.D. (eds.). (2000). Disaffected democracies: what’s troubling the trilateral countries? Princeton, N.J: Princeton University Press. Pickel, S. & Pickel, G. (2006). Politische Kultur- und Demokratieforschung: Grundbegriffe, Theorien, Methoden: eine Einführung (1. Auflage.). Wiesbaden: VS, Verlag für Sozialwissenschaften. Pickel, S. & Pickel, G. (2016). Politische Kultur in der Vergleichenden Politikwissenschaft. In H.-J. Lauth, M. Kneuer & G. Pickel (eds.), Handbuch Vergleichende Politikwissenschaft. Wiesbaden: Springer Fachmedien Wiesbaden. Retrieved from http://link.springer.com/10. 1007/978-3-658-02338-6_41 Pickel, S., Stark, T. & Breustedt, W. (2015). Assessing the Quality of Quality Measures of Democracy: A Theoretical Framework and its Empirical Application. European Political Science 14(4): 496–520. Piketty, T. (2014). Capital in the twenty-first century. Cambridge Massachusetts: The Belknap Press of Harvard University Press. Pison, G., Struyf, A. & Rousseeuw, P.J. (1999). Displaying a clustering with CLUSPLOT. Computational Statistics & Data Analysis 30(4): 381–392. Poloni-Staudinger, L.M. (2008). Are consensus democracies more environmentally effective? Environmental Politics 17(3): 410–430. Powell, G.B. (1982). Contemporary democracies: participation, stability, and violence (5. print.). Cambridge: Harvard Univ. Press. Powell, J.M. & Thyne, C.L. (2011). Global instances of coups from 1950 to 2010: A new dataset. Journal of Peace Research 48(2): 249–259. Putnam, R.D. (1994). Making Democracy Work: Civic Traditions in Modern Italy. Princeton University Press. Qvortrup, M. & Lijphart, A. (2013). Domestic Terrorism and Democratic Regime Types. Civil Wars 15(4): 471–485. Rammstedt, B. (2010). Reliabilität, Validität, Objektivität. In C. Wolf & H. Best (eds.), Handbuch der sozialwissenschaftlichen Datenanalyse. 
Wiesbaden: VS Verlag für Sozialwissenschaften. Retrieved from https://doi.org/10.1007/978-3-531-92038-2_11 Raudenbush, S.W. & Bryk, A.S. (2002). Hierarchical linear models: applications and data analysis methods (2nd ed.). Thousand Oaks: Sage Publications. Ravallion, M. (2015). The Luxembourg Income Study. The Journal of Economic Inequality 13(4): 527–547. Reutter, W. & Lorenz, A. (2016). Explaining the Frequency of Constitutional Change in the German Länder: Institutional and Party Factors. Publius: The Journal of Federalism 46(1): 103–127. Roller, E. (2005). The Performance of Democracies: Political Institutions and Public Policy. (J. Bendix, trans.) (1st ed.). OUP Oxford. Roller, E. (2011). Performance. In International Encyclopedia of Political Science. 2455 Teller Road, Thousand Oaks California 91320 United States: SAGE Publications, Inc. Retrieved from http://sk.sagepub.com/Reference/intlpoliticalscience/n428.xml Rotberg, R.I. (ed.). (2004). When states fail: causes and consequences. Princeton, N.J: Princeton University Press.
References
359
Rothstein, B. & Teorell, J. (2008). What Is Quality of Government? A Theory of Impartial Government Institutions. Governance 21(2): 165–190. Rubin, D.B. (1976). Inference and missing data. Biometrika 63(3): 581–592. Salkind, N. (2010). Residual Plot. In N. Salkind (ed.), Encyclopedia of Research Design. 2455 Teller Road, Thousand Oaks California 91320 United States: SAGE Publications, Inc. Retrieved from http://methods.sagepub.com/reference/encyc-of-research-design/ n384.xml Sarkar, P. (2016). Foreign Direct Investment, Capital Formation, and Growth. In M. Roy & S. Sinha Roy (eds.), International Trade and International Finance. New Delhi: Springer India. Retrieved from http://link.springer.com/10.1007/978-81-322-2797-7_17 Saunders, P. (2010). Inequality and Poverty. Oxford University Press. Retrieved from http:// oxfordhandbooks.com/view/10.1093/oxfordhb/9780199579396.001.0001/oxfordhb-978 0199579396-e-36 Schedler, A. (2012). Judgment and Measurement in Political Science. Perspectives on Politics 10(1): 21–36. Schlenkrich, O. (2019). Identifying Profiles of Democracies: A Cluster Analysis Based on the Democracy Matrix Dataset from 1900 to 2017. Politics and Governance 7(4): 315. Schlenkrich, O., Lemm, L. & Mohamad-Klotzbach, C. (2016a). State Fragility in the Democratic Republic of the Congo 1960–2014: A New Approach for Assessing the Quality of Statehood by Analysing the Relationship between Capacities, Challenges and State Actors. In J. Bobineau & P. Gieg (eds.), The Democratic Republic of the Congo – Problems, Progress and Prospects. Berlin: LIT Publishers. Schlenkrich, O., Lemm, L. & Mohamad-Klotzbach, C. (2016b). The contextualized index of statehood (CIS): assessing the interaction between contextual challenges and the organizational capacities of states. Zeitschrift für Vergleichende Politikwissenschaft 10(3): 241–272. Schlicht-Schmälzle, R. & Möller, S. (2012). 
Macro-Political Determinants of Educational Inequality between Migrants and Natives in Western Europe. West European Politics 35(5): 1044–1074. Schlomer, G.L., Bauman, S. & Card, N.A. (2010). Best practices for missing data management in counseling psychology. Journal of Counseling Psychology 57(1): 1–10. Schmidt, M.G. (1992). Regierungen: Parteipolitische Zusammensetzung. In M. G. Schmidt (ed.), Lexikon der Politik (Vol. 3: Die westlichen Länder). München: C.H. Beck Verlag. Schmidt, M.G. (2001). Ursachen und Folgen wohlfahrtsstaatlicher Politik: Ein internationaler Vergleich. In M. G. Schmidt (ed.), Wohlfahrtsstaatliche Politik. Wiesbaden: VS Verlag für Sozialwissenschaften. Retrieved from http://link.springer.com/10.1007/978-3663-01428-7_2 Schmidt, M.G. (2015). The Four Worlds of Democracy: Commentary on Arend Lijphart’s Revised Edition of Patterns of Democracy (2012). In V. Schneider & B. Eberlein (eds.), Complex Democracy. Cham: Springer International Publishing. Retrieved from http://link. springer.com/10.1007/978-3-319-15850-1_3 Schmidt, M.G. (2019). Demokratietheorien: Eine Einführung. Wiesbaden: Springer Fachmedien Wiesbaden. Retrieved from http://link.springer.com/10.1007/978-3-658-25839-9 Schmidt, M.G., Ostheim, T., Siegel, N.A. & Zohlnhöfer, R. (eds.). (2007). Der Wohlfahrtsstaat. Wiesbaden: VS-Verlag.
360
References
Schmitt, S. (2012). Comparative approaches to the study of public policy-making. In Routledge Handbook of Public Policy. Routledge. Retrieved from https://www.taylorfrancis. com/books/9780203097571 Schneckener, U. (ed.). (2004a). States at Risk: fragile Staaten als Sicherheits- und Entwicklungsproblem. Retrieved from https://nbn-resolving.org/urn:nbn:de:0168-ssoar-243853 Schneckener, U. (2004b). Models of Ethnic Conflict Regulation—The Politics of Recognition. In U. Schneckener & S. Wolff (eds.), Managing and Settling Ethnic Conflicts: Perspectives on Successes and Failures in Europe, Africa and Asia. C. Hurst & Co. Publishers. Schneider, M.R. & Paunescu, M. (2012). Changing varieties of capitalism and revealed comparative advantages from 1990 to 2005: a test of the Hall and Soskice claims. Socio-Economic Review 10(4): 731–753. Schraad-Tischler, D. & Seelkopf, L. (2017). Concept and Methodology: Sustainable Governance Indicators 19. Schröder, M. (2014). Varianten des Kapitalismus. Wiesbaden: Springer Fachmedien Wiesbaden. Retrieved from http://link.springer.com/10.1007/978-3-658-05242-3 Scruggs, L. (1999). Institutions and Environmental Performance in Seventeen Western Democracies. British Journal of Political Science 29(1): 1–31. Scruggs, L. (2007). Welfare State Generosity Across Space and Time. In Investigating Welfare State Change. Edward Elgar Publishing. Retrieved from http://www.elgaronline.com/ view/9781845427399.00017.xml Scruggs, L. (2014). Social Welfare Generosity Scores in CWED2: A Methodological Genealogy. CWED Working Paper 01. Retrieved from http://cwed2.org/Data/CWED2_WP_01_ 2014_Scruggs.pdf Scruggs, L., Jahn, D. & Kuitto, K. (2017). Comparative Welfare Entitlements Dataset 2 Codebook. Version 2017–09. University of Connecticut & University of Greifswald. SGI Team. (2019). Sustainable Governance Indicators 2019: Codebook. Retrieved from https://www.sgi-network.org/docs/2019/basics/SGI2019_Codebook.pdf Shapiro, S.S. & Wilk, M.B. (1965). 
An Analysis of Variance Test for Normality (Complete Samples). Biometrika 52(3/4): 591. Shor, B., Bafumi, J., Keele, L. & Park, D. (2007). A Bayesian Multilevel Modeling Approach to Time-Series Cross-Sectional Data. Political Analysis 15(2): 165–181. Shugart, M.S. & Carey, J.M. (1992). Presidents and Assemblies: Constitutional Design and Electoral Dynamics. Cambridge [England] ; New York: Cambridge University Press. Smeeding, T. & Latner, J.P. (2015). PovcalNet, WDI and ‘All the Ginis’: a critical review. The Journal of Economic Inequality 13(4): 603–628. Smilov, D. (2008). Dilemmas for a Democratic Society: Comparative Regulation of Money and Politics. DISC Working Paper 4. Central European University. Sontheimer, K. & Bleek, W. (2005). Grundzüge des politischen Systems Deutschlands (Aktualisierte Neuausg., 12., Aufl.). München: Piper. Soskice, D. (2007). Macroeconomics and Varieties of Capitalism. In B. Hancké, M. Rhodes & M. Thatcher (eds.), Beyond Varieties of Capitalism. Oxford: Oxford University Press. Retrieved from http://www.oxfordscholarship.com/view/10.1093/acprof:oso/978019920 6483.001.0001/acprof-9780199206483 Steffani, W. (1979). Parlamentarische und präsidentielle Demokratie. Wiesbaden: VS-Verlag.
References
361
Stegmueller, D. (2013). How Many Countries for Multilevel Modeling? A Comparison of Frequentist and Bayesian Approaches. American Journal of Political Science 57(3): 748– 761. Steiger, J.H. (1990). Some additional thoughts on components and factors. Multivariate Behavioral Research 25: 173–180. Stepan, A., Linz, J.J. & Yadav, Y. (2010). The Rise of “State-Nations”. Journal of Democracy 21(3): 50–68. Streeck, W. (2017). Buying time: the delayed crisis of democratic capitalism. (P. Camiller & D. Fernbach, trans.) (Second edition, with a new preface.). London New York: Verso. Tabachnick, B.G. & Fidell, L.S. (2014). Using multivariate statistics. Harlow, Essex: Pearson Education. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&scope= site&db=nlebk&db=nlabk&AN=1418064 Tavits, M. (2004). The Size of Government in Majoritarian and Consensus Democracies. Comparative Political Studies 37(3): 340–359. Teorell, J., Coppedge, M., Skaaning, S.-E. & Lindberg, S.I. (2016). Measuring Electoral Democracy with V-Dem Data:Introducing a New Polyarchy Index. Varieties of Democracy Institute. Retrieved from https://www.v-dem.net/media/filer_public/b7/1f/b71 f18e0-852e-4e52-adc4-9923f7baaac6/v-dem_working_paper_2016_25_edited.pdf Teorell, J., Holmberg, S., Rothstein, B., Pachon, N.A. & Axelsson, S. (2019). The Quality of Government Standard Dataset, version Jan19. University of Gothenburg: The Quality of Government Institute. Retrieved from http://www.qog.pol.gu.se doi:https://doi.org/10. 18157/qogstdjan19 Thomassen, J. & van Ham, C. (2017). A Legitimacy Crisis of Representative Democracy? (Vol. 1). Oxford University Press. Retrieved from http://www.oxfordscholarship.com/view/10. 1093/oso/9780198793717.001.0001/oso-9780198793717-chapter-1 Titmuss, R.M. (1974). What is Social Policy? In B. Abel-Smith & K. Titmuss (eds.), Social Policy: An Introduction. New York, NY. Tonry, M. (2007). Determinants of Penal Policies. Crime and Justice 36(1): 1–48. Tsebelis, G. (1995). 
Decision Making in Political Systems: Veto Players in Presidentialism, Parliamentarism, Multicameralism and Multipartyism. British Journal of Political Science 25(3): 289–325. Tsebelis, G. (2002). Veto players: how political institutions work. Princeton, N.J: Princeton University Press. Tukey, J.W. (1977). Exploratory Data Analysis. Addison-Wesley Publishing Company. Tutz, G. (2010). Regression für Zählvariablen. In C. Wolf & H. Best (eds.), Handbuch der sozialwissenschaftlichen Datenanalyse. Wiesbaden: VS Verlag für Sozialwissenschaften. Retrieved from http://link.springer.com/10.1007/978-3-531-92038-2_33 Tutz, G. (2011). Regression for Categorical Data. Cambridge: Cambridge University Press. Retrieved from http://ebooks.cambridge.org/ref/id/CBO9780511842061 UNODC. (2015). International Classification of Crime for Statistical Purposes (ICCS). Version 1. Retrieved from https://www.unodc.org/documents/data-and-analysis/statistics/ crime/ICCS/ICCS_English_2016_web.pdf van de Schoot, R. & Depaoli, S. (2014). Bayesian analyses: Where to start and what to report. The European Health Psychologist 16(2): 75–84. van der Meer, T.W.G. (2017). Political Trust and the “Crisis of Democracy”. In Oxford Research Encyclopedia of Politics. Oxford University Press. Retrieved
362
References
from https://oxfordre.com/politics/view/10.1093/acrefore/9780190228637.001.0001/acr efore-9780190228637-e-77 van Ham, C. & Thomassen, J. (2017). The Myth of Legitimacy Decline (Vol. 1). Oxford University Press. Retrieved from http://www.oxfordscholarship.com/view/10.1093/oso/ 9780198793717.001.0001/oso-9780198793717-chapter-2 Vanhanen, T. (1997). Prospects of democracy: a study of 172 countries. New York: Routledge. Vanhanen, T. (2003). Democratization: A Comparative Analysis of 170 Countries. London ; New York: Routledge. Vatter, A. (2009). Lijphart expanded: three dimensions of democracy in advanced OECD countries? European Political Science Review 1(1): 125–154. Vehtari, A., Gelman, A. & Gabry, J. (2017). Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC. Statistics and Computing 27(5): 1413–1432. Vis, B., Woldendorp, J. & Keman, H. (2012). Economic performance and institutions: capturing the dependent variable. European Political Science Review 4(1): 73–96. Visser, J. (2016). ICTWSS Data base. version 5.1. Amsterdam Institute for Advanced Labour Studies (AIAS), University of Amsterdam. Wackernagel, M., Beyers, B. & Rout, K. (2019). Ecological footprint: managing our biocapacity budget. Wasserstein, R.L. & Lazar, N.A. (2016). The ASA Statement on p-Values: Context, Process, and Purpose. The American Statistician 70(2): 129–133. Watkins, M.W. (2017). The reliability of multidimensional neuropsychological measures: from alpha to omega. The Clinical Neuropsychologist 31(6–7): 1113–1126. Watkins, M.W. (2018). Exploratory Factor Analysis: A Guide to Best Practice. Journal of Black Psychology 44(3): 219–246. Weaver, R.K. & Rockman, B. (1993). Assessing the Effects of Institutions. In R. K. Weaver & B. A. Rockman (eds.), Do institutions matter?: government capabilities in the United States and abroad. Washington, D.C.: Brookings Institution. Retrieved from https://bb. 
vle.keele.ac.uk/bbcswebdav/xid-1990951_1 Wendling, Z.A., Emerson, J.W., Esty, D.C., Levy, M.A. & de Sherbinin, A. (2018). 2018 Environmental Performance Index. New Haven, CT: Yale Center for Environmental Law & Policy. Wenzelburger, G. (2013). Die Politik der Inneren Sicherheit: Konturen eines Forschungsfelds aus Sicht der vergleichenden Politikforschung. Zeitschrift für Vergleichende Politikwissenschaft 7(1): 1–25. Wenzelburger, G. (2015). Die Politik der Inneren Sicherheit. In G. Wenzelburger & R. Zohlnhöfer (eds.), Handbuch Policy-Forschung. Wiesbaden: Springer Fachmedien Wiesbaden. Retrieved from http://link.springer.com/10.1007/978-3-658-01968-6_26 Wenzelburger, G. (2016). Innere Sicherheit in der Vergleichenden Politikwissenschaft. In H.-J. Lauth, M. Kneuer & G. Pickel (eds.), Handbuch Vergleichende Politikwissenschaft. Wiesbaden: Springer Fachmedien Wiesbaden. Retrieved from http://link.springer.com/10. 1007/978-3-658-02338-6_59 Wenzelburger, G., Zohlnhöfer, R. & Wolf, F. (2013). Implications of dataset choice in comparative welfare state research. Journal of European Public Policy 20(9): 1229–1250. Wilkins, A.S. (2018). To Lag or Not to Lag?: Re-Evaluating the Use of Lagged Dependent Variables in Regression Analysis. Political Science Research and Methods 6(2): 393–411.
References
363
Williams, L.K. & Whitten, G.D. (2012). But Wait, There’s More! Maximizing Substantive Inferences from TSCS Models. The Journal of Politics 74(3): 685–693. Wilson, P. & Cooper, C. (2008). Finding the magic number. The Psychologist 21: 866–867. Wilson, S.E. & Butler, D.M. (2007). A Lot More to Do: The Sensitivity of Time-Series Cross-Section Analyses to Simple Alternative Specifications. Political Analysis 15(2): 101–123. Witt, M.A. (2010). China: What Variety of Capitalism? SSRN Electronic Journal. Retrieved from http://www.ssrn.com/abstract=1695940 World Bank Data Help Desk. (2019, August 27). Why use GNI per capita to classify economies into income groupings? Retrieved 27 August 2019, from https://datahelpdesk.worldbank. org/knowledgebase/articles/378831-why-use-gni-per-capita-to-classify-economies-into World Economic Forum. (2019). The Global Competitiveness Report 2019. Retrieved from http://www3.weforum.org/docs/WEF_TheGlobalCompetitivenessReport2019.pdf WVS. (2015). World Value Survey 1981–2014 official aggregate v.20150418, 2015. World Values Survey Association. Retrieved from www.worldvaluessurvey.org Zeez, M. & Henderson, E.A. (2013). The World Religion Dataset, 1945-2010: Logic, Estimates, and Trends. International Interactions (39): 265–291. Zhang, Z., Hamagami, F., Lijuan Wang, L., Nesselroade, J.R. & Grimm, K.J. (2007). Bayesian analysis of longitudinal data using growth curve models. International Journal of Behavioral Development 31(4): 374–383. Zhou, X. & Reiter, J.P. (2010). A Note on Bayesian Inference After Multiple Imputation. The American Statistician 64(2): 159–163. Zuur, A.F. & Ieno, E.N. (2016). A protocol for conducting and presenting results of regressiontype analyses. (R. Freckleton, ed.)Methods in Ecology and Evolution 7(6): 636–645. Zuur, A.F., Ieno, E.N. & Elphick, C.S. (2010). A protocol for data exploration to avoid common statistical problems: Data exploration. Methods in Ecology and Evolution 1(1): 3–14.