Trends and Perspectives in Empirical Social Research
Editors: Ingwer Borg and Peter Ph. Möhler
Walter de Gruyter · Berlin · New York 1994
Prof. Dr. Ingwer Borg, Scientific Director at ZUMA (Center for Survey Research and Methodology) in Mannheim and Professor for Psychology at the Justus-Liebig-University of Giessen.
Prof. Dr. Peter Ph. Möhler, Director of ZUMA (Center for Survey Research and Methodology) in Mannheim and Hon. Professor at the University of Mannheim.
Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.
Library of Congress Cataloging-in-Publication Data
Trends and perspectives in empirical social research / Ingwer Borg and Peter Ph. Möhler (eds.).
Includes bibliographical references and index.
ISBN 3-11-014311-9 (cloth). ISBN 3-11-014312-7 (pbk.)
1. Social sciences - Research. I. Borg, Ingwer. II. Möhler, Peter Ph., 1945- .
H62.T65 1994 300-dc20 94-26590 CIP

Die Deutsche Bibliothek - Cataloging-in-Publication Data
Trends and perspectives in empirical social research / ed. Ingwer Borg and Peter Ph. Möhler. - Berlin ; New York : de Gruyter, 1994
ISBN 3-11-014311-9 (cloth). ISBN 3-11-014312-7 (pbk.)
Added entry: Borg, Ingwer [ed.]
© Copyright 1994 by Walter de Gruyter & Co., D-10785 Berlin All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced in any form - by photoprint, microfilm, or any other means nor transmitted nor translated into a machine language without written permission from the publisher. Printing: WB-Druck, Rieden. - Binding: D. Mikolai, Berlin. - Cover Design: Johannes Rother, Berlin. - Printed in Germany.
Editors' Note

In 1994 ZUMA, the Center for Survey Research and Methodology, celebrates its twentieth anniversary. The institute was established in 1974 to assist and advise researchers on the design and execution of their empirical social research, in particular research involving survey procedures. It is one of three institutions that make up GESIS (Gesellschaft Sozialwissenschaftlicher Infrastruktureinrichtungen e.V.), the German federally and state-funded organization whose task is to provide an infrastructure for the German social sciences. Over the past twenty years, ZUMA has grown in size and developed in scope. From the outset, assisting and advising scholars in their empirical research has been the anchor of its activities. The center's involvement can be restricted to one specific aspect of a research project or can extend to every stage, from research design, through questionnaire design and sampling, to data processing, data protection, and statistical analysis. The researchers who consult ZUMA come from all branches of social research (nationwide surveys, elite studies, panel studies, text analyses, observation) and from such varied substantive fields as Political Sociology, Sociology, Psychology, Gerontology, Criminology, Communications Research, Epidemiology, and Ethnology. Over time, a number of new responsibilities have been added to this central commitment. In addition to focusing its in-house research on empirical research methodology and technology, ZUMA is directly active in important survey projects. It houses, for instance, the German General Social Survey (ALLBUS) and the German part of the International Social Survey Programme. Two more recent additions to the center are the Department of Microdata and the Department of Social Indicators.
The first provides the scientific community with data from the bureau of the census; the second maintains a time series of data on subjective and objective social indicators, with the emphasis on measures of well-being and social welfare.

ZUMA's twentieth anniversary has been greeted as a welcome opportunity to invite colleagues associated with the center or leading in their field to contribute to a volume in which, looking back, they trace developments in their particular field of expertise and outline the current state of the art and, looking forward, point the way to future developments. The articles in the volume reflect the research areas in which ZUMA is active, from national and cross-national survey research to modeling, and from quantitative approaches to hermeneutic perspectives.

The editors wish to thank Rita Haaf, Jolantha Müllner, Bernhard Krüger and Paul Lüttinger of ZUMA for their assistance in converting sometimes rough-and-ready texts into camera-ready copy while struggling with the intricacies and mysteries of DOS, Windows, WinWord, Novell and PostScript. A special tribute goes to Dagmar Haas, who supported the editors in collecting and editing the texts, and to Sabine Hochholdinger, for bringing her text processing skills to this undertaking.

Ingwer Borg
Peter Ph. Möhler
Contents

Social Indicators Research: Societal Monitoring and Social Reporting
Heinz-Herbert Noll and Wolfgang Zapf
1 The Origin of a Movement and its Objectives
2 Principles and Approaches
3 Social Reporting: Achievements and Results
4 Evaluating the Effort: A View from the Nineties
5 Looking Ahead

Nationwide General Social Surveys
James A. Davis, Peter Ph. Möhler, and Tom W. Smith
1 Overview
2 Origins of NGSS
3 Conclusion: Trends and Perspectives

Measurement in Multi-National Surveys
Duane F. Alwin, Michael Braun, Janet Harkness, and Jacqueline Scott
1 Introduction
2 Comparative Research in the Social Sciences
3 Problems of Functional Equivalence of Measurements
4 Conclusions - The Future

Cognitive and Communicative Aspects of Survey Measurement
Norbert Schwarz, Herbert Bless, Hans-J. Hippler, Fritz Strack, and Seymour Sudman
1 Introduction
2 Respondents' Tasks
3 Question Comprehension
4 Attitude Measurement and the Emergence of Context Effects
5 Concluding Remarks

Secondary Analysis of Official Microdata
Richard Alba, Walter Müller, and Bernhard Schimpl-Neimanns
1 Introduction
2 Illustrations from the United States
3 Empirical Social Research and Official Data in Germany
4 Strengths and Weaknesses of Secondary Analysis of Official Microdata
5 Analytic Potential of Major Bodies of Official Microdata
6 Research Potential and Prospects

Computer-Assisted Interviewing in Social and Market Research
Rolf Porst, Michael Schneid, and Jan Willem van Brouwershaven
1 History
2 Definitions and Descriptions
3 Classification and Function of Computer Technology for Social and Market Research
4 Features and Capabilities of Computer-Assisted Data Collection Systems
5 Implications and Methodological Aspects of the Use of Computers
6 New Data Collection Techniques
7 The Future of Computer-Assisted Data Collection

The Study of Work Values: A Call for a More Balanced Perspective
Arthur P. Brief and Ramon J. Aldag
1 Introduction
2 A Brief History of Work Values and their Study
3 Support for the External View
4 Other Evidence for the Functionality of Economic Work Outcomes
5 Implications for Management Practice
6 Implications for Applied Organizational Research

Theory and Practice of Sample Surveys
Horst Stenger and Siegfried Gabler
1 Introduction
2 Fixed Populations and Sampling Designs
3 Superpopulation Models
4 Analytic Studies
5 Nonsampling Errors
6 Sampling at ZUMA

Statistics and the Sciences
Jan de Leeuw
1 Introduction
2 Statistics
3 The Evaluation of Statistical Techniques
4 The Role of Models in Statistics
5 Connection Models and Techniques

Measurement: The Reasonable Ineffectiveness of Mathematics in the Social Sciences
Peter H. Schönemann
1 Introduction
2 The Dangers of Premature Precision
3 Scaling and Measurement
4 Uniqueness, Meaningfulness and Klein's Erlanger Program
5 The Psychology of Rectangles
6 The Unreasonable Effectiveness of Mathematics in the Natural Sciences

Nominal, Ordinal, Interval and Ratio Typologies are Misleading
Paul Velleman and Leland Wilkinson
1 Introduction
2 Stevens' Typology of Data
3 Prescribing and Proscribing Statistics
4 Classical Criticisms of Stevens' Proscriptions
5 The Controversy over Statistics and Scale Types
6 Alternative Scale Taxonomies
7 Proscribing Transformations
8 Good Data Analysis does not Assume Data Types
9 Stevens' Categories do not Describe Fixed Attributes of Data
10 Stevens' Categories are Insufficient to Describe Data Scales
11 Statistics Procedures cannot be Classified according to Stevens' Criteria
12 Scale Types are not Precise Categories
13 Scales and Data Analysis
14 Meaningfulness
15 The Axiomatic Argument
16 A Role for Data Types
17 Conclusion

Evolving Notions of Facet Theory
Ingwer Borg
1 Introduction
2 Early FT and Attitudinal Behavior
3 Early FT and Intelligence Testing
4 On Mapping Sentences and Models
5 Content Facets and Range Facets
6 On Item Definitions and Attitudes
7 Correspondence Hypotheses
8 FT as a Theory
9 FT Perspectives

Factor Analysis in the 1980's and the 1990's: Some Old Debates and Some New Developments
James H. Steiger
1 Introduction
2 Some Theoretical Background
3 Key Developments from 1980-1994
4 Challenges and Directions for Future Research

Causal Modeling: Some Trends and Perspectives
Frank Faulbaum and Peter M. Bentler
1 Aims and Scope of Causal Modeling: Overview and Historical Developments
2 Constructing Initial Candidate Models
3 Model Estimation
4 Evaluation of Model Fit (Fit Indexes)
5 Model Modification
6 Concluding Remarks and Suggestions for Future Research

Attitude Theory and Measurement: Implications for Survey Research
Icek Ajzen and Dagmar Krebs
1 Historical Perspective
2 Unidimensional Conceptions of Attitude
3 Multidimensional Conceptions of Attitude
4 Conclusions

Reconciling Macro and Micro Perspectives by Multilevel Models: An Application to Regional Wage Differences
Uwe Blien, Michael Wiedenbeck, and Gerhard Arminger
1 Multilevel Models Bridge the Gap between the Micro and Macro Level
2 An Example: the Regional Wage Curve Hypothesis
3 Random Coefficient Models
4 Data and Variables to Test the Wage-Curve Hypothesis
5 Results
6 Summary

A Phenomenological Approach to Social Research: The Perspective of the Other
Carl F. Graumann
1 The Notion of Approach
2 The Conception of a Phenomenological Approach
3 Phenomenology and Social Research
4 Phenomenological Approaches in Social Research: Two Illustrations
5 The Phenomenological Approach in Perspective

Hermeneutic Interpretation in Qualitative Research: Between Art and Rules
Manfred Lueger and Jürgen H.P. Hoffmeyer-Zlotnik
1 The Tradition of Hermeneutics
2 Psychoanalytic Hermeneutics
3 Objective Hermeneutics
4 Hermeneutics of the Sociology of Knowledge
5 Interpretation: Rules versus Art or Rules Within Art
6 Prospects

On the Integration of Quantitative and Qualitative Methodological Paradigms (Based on the Example of Content Analysis)
Norbert Groeben and Ruth Rustemeyer
1 The Metatheoretical Point of Departure: The Position of Content Analysis Between Monism and Dualism
2 Prototypes for Systematizing Understanding
3 (Methodological) Goal Criteria for a Constructive Approach to Content Analysis: Adjustment of Rules with Explicit Elaboration of Inferences
4 On the Link Between Methodical Systematics and Object Adjustment as a Way of Reconciling the Quantitative and the Qualitative Paradigm

Trends and Perspectives in Content Analysis
Peter R. Schrott and David J. Lanoue
1 Introduction
2 Post-War Research on Content Analysis
3 A Content Analysis of Content Analyses: Research Design
4 Results
5 Recent Developments in Content Analysis
6 Conclusion

About the Contributors
Name Index
Subject Index
Social Indicators Research: Societal Monitoring and Social Reporting
Heinz-Herbert Noll and Wolfgang Zapf

1 The Origin of a Movement and its Objectives
The area of the social sciences known as social indicators research emerged in the United States in the mid-1960s. Its original source can be traced to a project of the space agency NASA that investigated the side effects of the space program on American society. The project concluded that there was an almost complete lack not only of adequate data but also of an adequate methodology for measuring such social consequences. Raymond A. Bauer, the director of the project, presumably coined the concept of "social indicators". In his definition, social indicators were "statistics, statistical series, and all other forms of evidence that enable us to assess where we stand and are going with respect to our values and goals" (Bauer, 1966, p. 1). Modern social indicators research had predecessors as early as the 1920s and 1930s, in the trend reports of W.F. Ogburn, and in the 1950s, in the "level of living" research of J. Drewnowski. The series of trend reports published in the American Journal of Sociology between 1928 and 1942 under the heading "Recent Social Changes" and, especially, the so-called Ogburn report on "Recent Social Trends in the United States", published in 1933 by the "President's Committee on Social Trends" (established by President Hoover and directed by Ogburn), were models for modern social reporting. The work of Drewnowski and of an expert commission of the United Nations in the 1950s is another source of social indicators research, in so far as they tried to improve the measurement of living standards by identifying particular components and developing indicators for those components. Other countries and international organizations soon adopted the ideals, concepts, and early approaches of social indicators research that had first been discussed in the United States.
The OECD (Bertrand, 1986/87) started its program of work on social indicators in 1970, and at roughly the same time the Economic and Social Council of the United Nations, together with the Conference of European Statisticians, began to develop a "System of Social and Demographic Statistics". The program of social indicators research had an unusually strong impact; it was pursued with strong commitment and a sense of mission that united all participants, and for these reasons this innovation came to be called the "social indicators movement". Besides social scientists, economists, and statisticians, high-ranking civil servants and politicians were also involved, which is quite unusual for a research program. One reason for the rise and rapid diffusion of this "movement" was the political climate and sociopolitical mood of the late 1960s and early 1970s (Rockwell, 1986/87, p. 256f.). Even in that phase of high growth rates, doubts were raised in the highly developed western industrial societies about economic growth as the major goal of societal progress. The "social costs" of economic growth and "public poverty" as the other side of the coin of "private affluence" attracted public attention and gained prominence in social policy. There was increasing doubt whether "more" necessarily meant "better", and there were public calls to put quality before quantity. The concept of "quality of life" that emerged from this discussion was propagated as an alternative to the increasingly questionable concept of the affluent society, and thereafter served social policy as a new, but also much more complex, multidimensional goal. As a goal of social policy, quality of life encompasses all domains of life and comprises not only individual material and immaterial well-being but also collective values such as freedom, justice, and the preservation of the natural conditions of life for present and future generations.

Another characteristic idea in the political climate of the 1960s and early 1970s was that societal structures and processes could be comprehensively modeled and actively steered by politics. Interventionist ideas and concepts were very popular. The "Great Society" program in the United States and the reform policies of the social-liberal coalition in the German Federal Republic were manifestations of this optimism. The concepts of the "active society" and of an active social policy represented the envisaged transition from a reactive politics of "muddling through" to a new model of governing committed to rationalizing the political process, and thus to leveling the so-called issue-attention cycle and to shaping living conditions at large prospectively and by design. A policy with such aims, however, depends in a specific way on information that enables it to recognize problems early, to set priorities, and to monitor and control the success and impact of its measures (Zapf, 1977a; Fattaccini, 1986/87).
Even if the possibility of comprehensive planning and of political guidance of societal development is judged much less optimistically today than before, and even if sociopolitical problems and tasks have changed considerably, the concept of quality of life is still of growing importance, and the need for adequate information has not decreased but rather increased. Against this background, the rise of social indicators research can be interpreted above all as a response to the increased demand for information from an active social policy and to the challenge of operationalizing and quantifying its core formula: the concept of quality of life. From the beginning, however, the social indicators movement in the tradition of Ogburn was also characterized, besides its policy orientation, by the concern to investigate significant trends of social change and, by doing so, to provide an information base for observing the modernization of societies at different levels of development (Sheldon & Moore, 1968; Modell, 1987). In a broad sense, therefore, all statistical information that is "important in reference to the quality of life, modernization, and an active social policy" can be considered "social indicators" (Zapf, 1977b, p. 246). All in all, the social indicators movement sought to realize a program for fundamentally improving the societal information base and putting these improvements into practice: "The felt need was for more adequate monitoring and reporting of social conditions and processes - implying a need to develop improved measures of these phenomena, together with expanded data collection capabilities. Thus the dual goals of the social indicators movement were apparent from the start to establish an improved social reporting capability as soon as possible, and to encourage long-term research and development in the general area of social measurement and model-building" (Johnston, 1990, p. 433).
The program of social indicators research took as its model economic indicators and the far more developed practice of economic reporting, but at the same time it went beyond the narrow and one-sided perspective of a purely economic interpretation of societal development.
2 Principles and Approaches
From today's perspective, the most important functions of social indicators are the measurement of welfare and the observation of trends of social change. In a broad sense, social indicators are regarded as instruments for the regular observation and analysis of social change, the "monitoring of social change" (Sheldon & Moore, 1968). The major goal is to develop standards for the level of modernization of a society and to register progress in modernization along with its attendant problems and consequences. In this context, social indicators are "all data which enlighten us in some way about structures and processes, goals and achievements, values and opinions" (Zapf, 1977b, p. 236). The tasks deriving from this monitoring function are, in particular, the description of social trends, the explanation of these trends, the identification of relevant relationships between different developments, and the investigation of the consequences of these changes in time series of indicators (Land, 1983). Descriptive indicators of social change may be objective as well as subjective indicators, and they may be related explicitly to social policy goals, although this is not a necessary condition. Their primary function is therefore not the direct guidance and efficiency control of political decisions, but broad societal enlightenment and the provision of an information base that supports policy-making indirectly.

The second major function of social indicators is the measurement of welfare, whereby welfare development can be understood as a specific dimension of the comprehensive modernization of societies (Zapf, 1991; 1993). This function has its origin in the criticism of an overly simplistic growth policy and in the inadequacy of the gross national product as a welfare indicator and a one-dimensional standard of societal development.
By contrast, a multidimensional concept of welfare, as reflected in the concept of quality of life, demands a differentiated measurement in which welfare gains and losses are quantified across all relevant areas of life and compared with goals. In their function as welfare indicators or indicators of the quality of life, social indicators should be related to individuals, oriented towards societal goals, and should measure the outputs, not the inputs, of social processes. As welfare indicators, social indicators always have a direct normative reference, and changes in them should be interpreted uniformly as improvements or deteriorations of living conditions and/or the quality of life. It is in this sense that Mancur Olson, in his classic definition, called a social indicator "a statistic of direct normative interest which facilitates concise, comprehensive and balanced judgments about the condition of major aspects of a society. It is in all cases a direct measure of welfare and is subject to the interpretation that if it changes in the 'right' direction, while other things remain equal, things have gotten better, or people are 'better off'" (Department of Health, Education, and Welfare, 1969, p. 97). Depending on the frame of reference of such evaluations, we speak of "objective" or "subjective" social indicators (Noll, 1990). Objective social indicators are statistics that represent social facts independently of personal evaluations; subjective social indicators, on the other hand, emphasize the individual experience and evaluation of social circumstances. One concept that bases welfare measurement exclusively on objective indicators is the level of living approach of Scandinavian welfare research. Here, welfare is understood as "individuals' command over resources in terms of money, property, knowledge, psychic and physical energy, social relations, security ..."
(Erikson/Uusitalo, 1987, p. 189); the focus is on objective living conditions and life chances. By contrast, subjective approaches have arisen mainly from American quality-of-life research, which, rooted in social psychology, ultimately defines welfare as subjective well-being. Using objective indicators implies that living conditions can be judged favorable or unfavorable on the basis of objective observations that compare actual conditions with normative criteria, e.g. with societal values or political goals. This presupposes, however, a consensus about which dimensions are relevant to welfare in the first place, about what counts as good and bad conditions, and about the direction in which society should move. The use of subjective social indicators, in contrast, rests on the premise that welfare is ultimately perceived by individual citizens and can best be judged by them (Campbell, 1972). This position, too, is not undisputed and provoked a deep controversy about the principles of welfare measurement, which, however, has now largely been settled. Today, the consensus is to base welfare measurement on both objective and subjective indicators, given the fact "that similar living conditions are evaluated quite differently, that people in bad conditions are satisfied and privileged persons may be very dissatisfied" (Zapf, 1984). Individual welfare, or quality of life, is therefore defined as "good living conditions which go together with positive subjective well-being" (Zapf, 1984, p. 23). For the observation and analysis of quality of life, the relationships between objective and subjective indicators are of interest, because subjective well-being is only partially determined by external conditions. The categories of individual welfare that can be specified by combining information on objective living conditions and subjective well-being are not only of analytical interest but may also be helpful in guiding social policy (Fig. 1).

Figure 1: Categories of individual welfare
Objective living conditions | Subjective well-being: good | Subjective well-being: bad
Good | WELL-BEING | DISSONANCE
Bad | ADAPTATION | DEPRIVATION
Source: Zapf (1984)

The coincidence of good living conditions and positive well-being is the preferred combination and is called "Well-Being" here, following OECD terminology. "Deprivation" is the constellation in which bad living conditions coincide with negative well-being. "Dissonance" describes the inconsistent combination of good living conditions and dissatisfaction, also called the "dissatisfaction dilemma". Finally, "Adaptation" is the combination of bad living conditions and satisfaction, also called the "satisfaction paradox". Ceteris paribus, the larger the proportion of the population in the well-being category, the higher the welfare level of a society. Besides the deprived, who are the traditional clients of social policy, the adapted also form a special problem group: "The adapted often represent the reality of powerlessness and social retreat. Often it is just this group, which subjectively adapts to obvious deficiencies, that is overlooked and dismissed by the established social policy programs" (Zapf, 1984, p. 26).

In the early stages of the social indicators movement, besides the descriptive and analytical functions of welfare measurement and the monitoring of social change, social indicators were ascribed much more far-reaching functions: goal and priority setting, program evaluation, serving as a societal warning system, and predicting and guiding political processes. These ambitious expectations have not been fulfilled, however, and from today's perspective must be regarded as unrealistic.
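The typology of individual welfare in Figure 1 amounts to a cross-classification of two dichotomized dimensions, and the claim that a larger share in the well-being category means, ceteris paribus, a higher welfare level is a statement about category proportions. The following Python sketch illustrates both under stated assumptions: it presumes that each respondent has already been coded as having good or bad objective living conditions and good or bad subjective well-being (the scheme itself does not prescribe how to dichotomize), and the function names `welfare_category` and `category_shares` as well as the sample data are hypothetical, not part of Zapf's work.

```python
from collections import Counter

# Figure 1 (Zapf, 1984) as a lookup table. Keys are
# (objective living conditions good?, subjective well-being good?).
# How respondents are coded as "good" or "bad" is an assumption
# left to the analyst; the typology itself does not fix it.
CATEGORIES = {
    (True, True): "Well-Being",
    (True, False): "Dissonance",
    (False, True): "Adaptation",
    (False, False): "Deprivation",
}


def welfare_category(good_conditions: bool, good_well_being: bool) -> str:
    """Return the Figure 1 category for a single respondent."""
    return CATEGORIES[(good_conditions, good_well_being)]


def category_shares(respondents: list[tuple[bool, bool]]) -> dict[str, float]:
    """Return the proportion of respondents in each of the four categories."""
    if not respondents:
        raise ValueError("need at least one respondent")
    counts = Counter(welfare_category(c, w) for c, w in respondents)
    n = len(respondents)
    return {name: counts[name] / n for name in CATEGORIES.values()}


if __name__ == "__main__":
    # Invented sample of five respondents, for illustration only.
    sample = [(True, True), (True, True), (True, False), (False, True), (False, False)]
    print(category_shares(sample))
    # {'Well-Being': 0.4, 'Dissonance': 0.2, 'Adaptation': 0.2, 'Deprivation': 0.2}
```

In this sketch, comparing the Well-Being share across populations or survey waves mirrors the ceteris paribus reading in the text, while the Adaptation share picks out the group that Zapf describes as easily overlooked by established social policy programs.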
3 Social Reporting: Achievements and Results
Social reporting is the most important and most successful application of social indicators research. Modeled after economic reporting, social reporting aims at providing "information on social structures and processes and on preconditions and consequences of social policy, regularly, in time, systematically, and autonomously" (Zapf, 1977c, p. 11). In a less ambitious sense, social reporting can also be defined as "the description, explanation and interpretation of social trends for an audience without special training in social science methods" (Rockwell, 1983, p. 90), or simply as the presentation of data that enable the evaluation of the living conditions of the population and their change over time (Duff, 1989, p. 1). The function of social reporting, therefore, is to describe and analyze the state of, and changes in, a population's living conditions and quality of life on an adequate empirical data base, in the sense of regular and comprehensive monitoring. As a specific mode of producing, distributing, and presenting socially relevant knowledge, social reporting is today well established within the information systems of many countries and of international and supranational organizations such as the OECD, the European Union, and the United Nations. This development began in the United States, when the "U.S. Department of Health, Education, and Welfare" commissioned the economist Mancur Olson to conceptualize the prototype of a national social report (Department of Health, Education, and Welfare, 1969). Since then, the center of gravity of these activities has shifted to Europe. 1 Leaving aside for the moment the former socialist countries of Eastern Europe, only a few European countries, e.g. Belgium, Greece, and Ireland, do not conduct social reporting at the national level (Fig. 2).
If one regards the successful, if still incomplete, spread of national social reporting as the diffusion of an innovation within the system of societal information, certain regularities can be detected. In Europe, national social reporting evidently took hold most readily where there was an articulated welfare-state program of social policy, an interventionist orientation of government, innovative statistical agencies, and geographical centrality (Rothenbacher, 1993; Habich & Noll, 1993). Whereas the Scandinavian countries, Great Britain, the Netherlands, France, and the German Federal Republic were among the trendsetters in establishing and institutionalizing social reporting, the Southern European nations were latecomers. The "classics" among social reports (the British "Social Trends", the Dutch "Social and Cultural Report", and the French "Données Sociales") have now been published regularly for more than two decades. With a series of new reports published in the early 1990s in Spain, Italy, Portugal, Turkey, and Cyprus, the Southern European gap has now more or less been closed (Noll, 1993a).
Figure 2: Social reports from European countries

Country | Institution | Title | First edition | Latest edition | Periodicity
Austria | Statistisches Zentralamt | Sozialstatistische Daten | 1977 | 1990 | 4/5 years
Cyprus | Department of Statistics and Research, Ministry of Finance | Social Indicators | 1992 | 1992 | ?
Denmark | Danmarks Statistik / Socialforskningsinstituttet | Levevilkår i Danmark | 1976 | 1992 | 4 years
Federal Republic of Germany | Statistisches Bundesamt (from 1985 with Sfb 3; 1992 with WZB and ZUMA) | Datenreport | 1983 | 1992 | 2 years
France | Institut National de la Statistique et des Études Économiques | Données Sociales | 1973 | 1993 | 3 years
Great Britain | Central Statistical Office | Social Trends | 1970 | 1993 | annually
Hungary | Tarki | Social Report | 1990 | 1990 (English 1992) | ?
Italy | Istituto Nazionale di Statistica | Sintesi della Vita Sociale Italiana | 1990 | 1990 | ?
Netherlands | Social and Cultural Planning Office | Social and Cultural Report | 1974 | 1992 (English 1993) | 2 years
Portugal | Instituto Nacional de Estatística | Portugal Social 1985-1990 | 1992 | 1992 | ?
Spain | Instituto Nacional de Estadística | Indicadores Sociales / Panorámica Social | 1991 / 1974 | 1991 | ? / ?
Sweden | Statistics Sweden | Perspektiv på Välfärden | 1987 | | irregularly
Turkey | Prime Ministry State Planning Organization | Social Indicators | 1990 | 1990 | ?
The rise of social reporting in these countries is evidently connected not only with the reform and expansion of their statistical infrastructure but also with the social modernization of recent years. In addition, the process of European integration has had a significant impact on the further development of social reporting. As the publication of a social report in Hungary suggests, political liberalization and the transition to a market economy appear to have given the development of social reporting new momentum in the Eastern European countries as well.
Figure 3: Selected supranational social reports
Institution | Title | First edition | Periodicity
EUROSTAT | Social Indicators for the European Community | 1977 | 1980, 1984
EUROSTAT | Social Portrait of Europe | 1991 | 3 years
OECD | Living Conditions in OECD Countries | 1986 | (single edition)
The World Bank | World Development Report | 1978 | annually
The World Bank | Social Indicators of Development | 1987 | annually
United Nations Development Programme | Human Development Report | 1990 | annually
Supranational organizations took up social reporting early and continue to be among the most important actors in this area.² The "OECD Programme of Work on Social Indicators" (OECD, 1982; Bertrand, 1986/87) and the "System of Social and Demographic Statistics" of the United Nations (United Nations, 1975), conceptualized by Richard Stone, have heavily influenced modern social reporting. The OECD, however, failed to convert its concepts into a regular reporting system: the OECD program was canceled in the mid-1980s after the first and only publication of the report "Living Conditions in OECD Countries" (Fig. 3). Today, the OECD's social reporting activities are restricted to special areas, such as education (OECD, 1994), science (OECD, 1988), or the environment (OECD, 1991a, 1991b). At present, the diverse activities of the United Nations and its specialized agencies concentrate on global observation of development in third-world countries. Besides the World Bank reports, the Human Development Report of the United Nations Development Programme is of special interest in our context. In particular, the attempt to use the Human Development Index as a synthetic measure of societal development, integrating different dimensions of the level of living, has attracted attention and started a new discussion about the construction of summary indices (Klingebiel, 1992; Lind, 1992). In contrast to the United Nations and the OECD, the European Community took a more pragmatic approach to social reporting from the beginning. The series "Social Indicators for the European Community" was replaced in 1991 by the report "Social Portrait of Europe". At present, in the European Union (EU) context, there are efforts to use new data collections - e.g., an EU-wide household panel and a time-budget study - to improve the data base for social reporting.
The increasing demand for information about the social dimension of European integration (Noll, 1993b), and not least the upcoming enlargement of the EU through the accession of the Scandinavian countries and Austria, lead us to expect that the EU will increase its social reporting activities in the future. The available reports demonstrate that social reporting is characterized by a plurality of conceptual approaches, reporting schemes, actors, and institutional solutions. There is clearly no single generally agreed-upon model, but rather a plurality of more or less successful and convincing variants. Nevertheless, there are common characteristics, and some standards have now been established. The majority of the so-called comprehensive reports follow a system of life domains or social concerns as originally proposed by the OECD. Increasingly, the topics include not only objective living conditions but also aspects of the subjective well-being of the population. It is now broadly agreed that social reporting needs a specific data base of the kind already established in several countries, e.g., Sweden (Level-of-Living Survey), the Netherlands (Leefsituatiesurvey), and Germany (Wohlfahrtssurvey), in the form of so-called comprehensive surveys, that is, regular representative probability surveys oriented towards objective living conditions and, in most cases, also towards their subjective perception and evaluation. The very ambitious systems of social indicators propagated in the early stages of the social indicators movement, however, are found less and less in today's practical social reporting. The agents of social reporting are for the most part statistical offices, but also include special planning agencies, ministries, associations (e.g., trade unions), and professional institutions. The available reports therefore differ in analytical depth, in the methods adopted, and in style of presentation.
In any case, guaranteeing the continuity of monitoring is important: "The establishment of a new nationwide statistical system requires centralized technical resources and continuity in the work. Only then can it become a societal institution, with some financial and legal autonomy and regular publishing activities" (Vogel, 1990, p. 442). The advantages of official social reporting are the continuity and reliability of reporting. In contrast, the advantages of nonofficial, private social reporting are greater autonomy, freedom, and flexibility. As important as institutionalization and routinization might be, equally important for ambitious social reporting is the impulse of methodological and conceptual innovation, which is best guaranteed in the realm of professional reporting efforts.
Figure 4: Typology of Social Reports
Level: supranational; national; regional; local
Coverage:
• comprehensive: life-domain structure; list of social concerns; life-cycle concept
• noncomprehensive: single domains (health, education, work); subgroups (women, elderly, children); social problems (poverty, crime, drug abuse)
Actors:
• official: statistical offices; ministries; state agencies; local authorities
• nonofficial: research institutes; associations (welfare associations, unions, parties)
Among the most important trends in the development of social reporting observed in recent years in the Federal Republic of Germany and elsewhere are tendencies toward topical specialization and geographical differentiation.³ On the one hand, the demands, ideas, and concepts developed by social indicators research have been taken up to create specialized reporting systems for particular life and policy areas. In addition to the official and nonofficial examples of comprehensive social reporting, a series of special reports has now come into being which concentrate on specific sectors of life, e.g., health, education, or family; on special social problems, e.g., poverty or crime; or on particular population groups, e.g., children, senior citizens, or women. On the other hand, aside from topical specialization, social reporting is presently also characterized by a trend towards spatial differentiation. The social reporting concepts primarily developed for the societal or supranational level are increasingly being applied to lower-level aggregates such as regions, provinces, or cities and communities (Asselin et al., 1992; INSEE, 1989; Instituto Vasco de Estadistica, 1990; Noll & Schröder, 1994). As in national social reporting, the primary concern is to observe the quality of life of citizens; however, a further concern is to recognize tendencies toward structural change and social problems early on, to identify problem groups, and to evaluate the attainment of political goals.
4
Evaluating the Effort: A View from the Nineties
Looking back at the record of social indicators research and the goals it has attained, one can distinguish several stages (Andrews, 1990; Fattaccini, 1986/87; Land, 1983). The founding stage, from the mid-1960s to the early 1970s, was characterized by the formation of the social indicators movement, the development of programs, and the realization of significant pilot studies of social reporting. The boom period of social indicators research took place in the 1970s. In this period, the then innovative ideas and concepts were taken up worldwide - including in the then socialist societies of Eastern Europe (Priller, 1990; Noll, 1992) - and efforts were made to implement them in scientific research as well as in the activities of statistical offices, administrations, and international organizations. The success of this period is manifested not only in a flood of publications, in the acceptance of social indicators research as a self-sustaining area of research within academic social science, and in the establishment of regular social reporting in many countries, but also in the creation of a specific infrastructure of data generation for societal monitoring and social reporting, e.g., Quality-of-Life Surveys, Level-of-Living Surveys, General Social Surveys, and household panels: "The 1970s produced substantial institutional, conceptual, and methodological progress with respect to the social indicators field" (Andrews, 1990, p. 402). Even then, the center of activities and innovations in social indicators research had already begun to shift from the United States to Europe, especially to the Scandinavian countries, the Netherlands, and the Federal Republic of Germany.
A third period, from the end of the 1970s until the mid-1980s, was characterized by stagnation and in part by decreasing interest in social indicators research: the number of publications went down, research projects ended, support for social indicators research was drastically reduced in the United States, and the "Center for Coordination of Research on Social Indicators" was closed (Rockwell, 1986/87). In addition, international organizations, especially the OECD, drastically reduced their commitment to this area. This development can partly be explained by:
- the economic crisis which beset the Western industrial societies at the beginning of the 1980s and redirected the focus of politics back towards basic economic problems;
- the change in political climate and social policy programs, which in several countries accompanied the transition from liberal to conservative governments;
- the diffusion and, in part, routinization of ideas, concepts, and methodological standards in other areas of social science and official statistics (Glatzer, 1981);
- unsolved methodological and theoretical problems in social indicators research, not least the fact that it could not sufficiently communicate its successes and the usefulness of its products (Andrews, 1990; Fattaccini, 1986/87; Vogel, 1990).
What some observers regarded as crisis, disillusionment, and disappointment (Innes, 1990) may be regarded from a more optimistic perspective as a process of consolidation and maturation (Andrews, 1990, p. 405; Noll, 1990, p. 77). In support of such an evaluation, one can point not only to the fact that central activities of social monitoring and social reporting have continued in many countries, but also to the revitalization observable since the mid-1980s. As already demonstrated, it has manifested itself in a new wave of social reporting efforts on subnational, national, and international levels, in the establishment of new institutions of social monitoring, and in the further elaboration of social science and statistical infrastructures.
Overall, in retrospect, the success of social indicators research lies more in the area of general societal enlightenment than in the production of technical expert knowledge or the provision of special planning intelligence for politics. The ambitious ideas of using social indicators to contribute to a rationalization of political processes, to establish goals and priorities, to evaluate political programs, and to introduce a societal early warning system - ideas which were thriving in the early phase of the social indicators movement - have proven to be unrealizable. In this regard, social indicators have suffered the same fate as other scientific instruments of political decision making, e.g., cost-benefit analysis or the Planning-Programming-Budgeting System. Presumably, the complexity of the political process was underestimated, and the relevance for political action of empirical information about the state and change of societies was judged too optimistically: "The failure was more due to an overly simplistic view of how and under what conditions knowledge influences policy, than to, as some observers suggested, a fundamental conflict between the worlds of knowledge and public action" (Innes, 1990, p. 430). Compared with an instrumental or technocratic model, which assumes a direct demand on the part of politics for scientific information to solve social policy problems, a model of enlightenment (Weiss, 1979; Innes, 1989), according to which social science is connected with politics rather indirectly, seems more realistic.
In this sense it is only logical that social indicators research and social reporting are today ascribed a less ambitious and less direct function as providers of information: "Social reporting ... belongs to the democratic infrastructure and has a special political function. To put it simply, social reporting places welfare issues on the political agenda. It supplies material to the public debate, influencing the media and, indirectly, the administration" (Vogel, 1990, p. 441). Although some of the ambitious expectations have not been fulfilled, social indicators research has nevertheless had some important successes with regard to its central goals: "The descriptive social indicators perspective - emphasizing the production of data, analyses, and models by which to improve social reporting - has been one of the singularly most successful lines of development pursued over the past two decades" (Land, 1983, p. 22). If one agrees that its basic premises are still valid, that the demand for theoretical and problem-oriented information about living conditions and social developments is increasing, and that the preconditions for social monitoring and social reporting are more favorable today than ever before, then one should view its future rather optimistically.
5
Looking Ahead
Where should social indicators research be going, and what are the prospects for further developments in social reporting? Looking at recent activities and taking into account changes in information demands as well as improvements in reporting capabilities, the following topics and tendencies seem relevant and likely:
- Monitoring the Transformation of Former Socialist Societies: One obvious task for social indicators research is the observation of system transformation in the former socialist societies of Central and Eastern Europe. In some countries, e.g., Hungary and Germany, transformation studies have already produced interesting results. It is important not only to apply the available instruments to the observation of these processes but also to be aware of the particular theoretical and methodological problems of analyzing these historically unique events.
- Reconsidering Concepts of Welfare and the Quality of Life: In the recent past a discussion about the goals of societal development has started - e.g., in connection with the concepts of reflexive modernization (Beck, 1991) and sustainable development (Pronk & Haq, 1992) - which might also be of importance for social indicators research and social reporting. In view of changed social contexts and problems, for example the crisis of the welfare state or global ecological problems, the question arises whether concepts of modernization, individual welfare, subjective well-being, and quality of life are still adequate in their present form as frames of reference for observing social development, or whether they need to be reformulated.
- Construction of Synthetic Welfare Indices: The need for summary indices, synthesizing the various dimensions of welfare into one single measure, is not new. However, a renewed discussion of this old topic can now be observed, which will probably continue and become even more prominent in future research. The most important manifestation of this new search for summary welfare indices is the Human Development Index, constructed within the United Nations Development Programme in order to rank nations according to their state of human development. Other examples of this kind of research are the "Index of Sustainable Economic Welfare" (Cobb, 1991) or so-called "all-inclusive-quality-of-life-indices" (Johnston, 1988). The search for summary welfare indices need not contradict the "great need for simple and user-friendly statistical indicators, capable of orienting and putting on firmer grounds the policy dialogue over social issues" (Garonna, 1994, p. 9), which some observers cite as a major challenge for future work.
- Use of Longitudinal Data and Dynamic Analysis: The availability of longitudinal data has been decisively improved by life-history surveys, but even more so by household panels such as those carried out today in several countries, e.g., the United States, Canada, Great Britain, and Germany, and which will in future be carried out throughout Europe. For social indicators research this opens new perspectives for the description and explanation of social change. Longitudinal information that goes beyond time series of aggregate data offers better opportunities for dynamic analysis, something that has also been called for on the subjective side of the quality of life: "More needs to be known about the psychodynamics of aspirations and evaluations of reality if we are to understand better how people come to feel as they do about their life circumstances" (Andrews, 1990, p. 405). The improved data base also gives new impetus to model-building and microsimulation, which were already used in social indicators research in the 1970s and might be very useful instruments for future social reporting (Garonna, 1994, p. 8).
- Strengthening the International Perspective: Comparative information and comparative analyses gain further importance with the increasing economic and political integration taking place not only in Europe but also in other regions of the world. For social reporting there is a growing demand to harmonize the different national reporting systems, which continue to be quite independent of each other, and to develop the international and supranational reporting systems further (Vogel, 1990, p. 443), e.g., a Europe-wide social reporting system worthy of the name. To accomplish these goals, the establishment of inter- and supranational surveys in keeping with the ideas of social monitoring and welfare measurement will be one of the most important future tasks.
- Revival of Social Accounting and Social Modeling: Recently, social accounting concepts, as developed by Richard Stone and others, have regained attention and are regarded as a potential future model for the informational infrastructure (Garonna, 1994). Large-scale efforts are already under way to extend national accounts with satellite systems, besides the development of accounting systems for particular life domains, such as education or the environment. What makes the social accounting approach particularly attractive is the integrated framework it offers for observing and analyzing the linkages between different components and elements within a larger system. Beyond this, it has also been suggested that social indicators research place more emphasis on social modeling in order to be able "to advance social change toward positive goals" (Ferriss, 1990, p. 416).
- Developing Prospective Social Reporting: There are good reasons for developing social reporting, which has been largely retrospective, into an instrument with a much stronger future orientation. Whether the inclusion of social indicators in prognostic models is at present a realistic goal may be left open. However, the application of scenario techniques and projections, model accounts, and simulations on the basis of social indicators is certainly possible. A consistent and rigorous use of these possibilities could mark the beginning of a prospective mode of social reporting and could provide insights into probable developments, insights that will be increasingly required in the future, and not only in politics.
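The "Construction of Synthetic Welfare Indices" item in the list above concerns collapsing several welfare dimensions into one number, the Human Development Index being the best-known case. As a minimal sketch of that general technique, assuming simple min-max rescaling of each dimension against fixed "goalposts" followed by an (optionally weighted) arithmetic mean, such an index could be computed as follows. The dimension names, goalposts, and country values are hypothetical illustrations; this is not the UNDP's exact formula.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Goalpost:
    """Hypothetical lower and upper bounds used to rescale one welfare dimension."""
    minimum: float
    maximum: float


def dimension_index(value: float, goal: Goalpost) -> float:
    """Min-max rescale a raw value onto [0, 1]; values outside the goalposts are clamped."""
    if goal.maximum <= goal.minimum:
        raise ValueError("goalpost maximum must exceed its minimum")
    scaled = (value - goal.minimum) / (goal.maximum - goal.minimum)
    return min(1.0, max(0.0, scaled))


def composite_index(
    values: Dict[str, float],
    goalposts: Dict[str, Goalpost],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted arithmetic mean of the dimension indices (equal weights if none are given)."""
    if weights is None:
        weights = {name: 1.0 for name in goalposts}
    total_weight = sum(weights[name] for name in goalposts)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(
        weights[name] * dimension_index(values[name], goalposts[name])
        for name in goalposts
    )
    return weighted_sum / total_weight


# Hypothetical goalposts and data for one country (illustrative only).
GOALPOSTS = {
    "life_expectancy_years": Goalpost(25.0, 85.0),
    "adult_literacy_percent": Goalpost(0.0, 100.0),
    "income_per_capita": Goalpost(100.0, 40100.0),
}
country = {
    "life_expectancy_years": 70.0,
    "adult_literacy_percent": 90.0,
    "income_per_capita": 20100.0,
}

if __name__ == "__main__":
    # Dimension indices here are 0.75, 0.9 and 0.5; the composite is their equal-weight mean.
    print(round(composite_index(country, GOALPOSTS), 3))
```

Equal weights are the simplest assumption. Passing a `weights` dictionary shows how sensitive the composite value, and hence any ranking of countries, is to that choice.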
Notes
1 The United States is one of the few countries where the publication of a comprehensive social report was cancelled after three editions and where social reporting in this sense has failed: "the United States differs from most of the rest of the Western world, where social reporting is visibly alive..." (Rockwell, 1986/87, p. 255).
2 Cf. more extensively Habich & Noll (1993), p. 139ff.
3 Cf. further Habich & Noll (1993) and Noll & Schröder (1994).
References
Andrews, F.M. (1990). The Evolution of a Movement. Journal of Public Policy, 9, 401-405.
Asselin, S., et al. (1992). Portrait Social du Quebec. Quebec: Les Publications du Quebec.
Bauer, R.A. (Ed.). (1966). Social Indicators. Cambridge, Mass., and London: The M.I.T. Press.
Beck, U. (1991). Der Konflikt der zwei Modernen. In W. Zapf (Ed.), Die Modernisierung moderner Gesellschaften. Proceedings of the 25th Deutscher Soziologentag in Frankfurt am Main 1990 (pp. 40-53). Frankfurt: Campus.
Bertrand, R. (1986/87). Les Indicateurs sociaux. The Tocqueville Review, 8, 211-233.
Bulmer, M. (1990). Problems of Theory and Measurement. Journal of Public Policy, 9, 407-412.
Campbell, A. (1972). Aspiration, Satisfaction and Fulfillment. In A. Campbell and P. Converse (Eds.), The Human Meaning of Social Change (pp. 441-446). New York: Russell Sage Foundation.
Cobb, C. (1991). Der 'Index of Sustainable Economic Welfare' oder: Hat die Wohlfahrt in der Gesellschaft wirklich zugenommen? In H. Diefenbacher, S. Habicht-Erenler (Eds.), Wachstum und Wohlstand. Neuere Konzepte zur Erfassung von Sozial- und Umweltverträglichkeit (pp. 61-72). Marburg: Metropolis.
Department of Health, Education, and Welfare (Ed.). (1969). Toward a Social Report. Washington: U.S. Government Printing Office.
Duff, L. (1989). Social Reports. A Bibliography of National and International Documents. Manuscript. Lanham, Maryland.
Erikson, R., Uusitalo, H. (1987). The Scandinavian Approach to Welfare Research. Swedish Institute for Social Research, Reprint Series No. 181, Stockholm.
Fattaccini, R. (1986/87). Le Mouvement des indicateurs sociaux aux Etats-Unis. The Tocqueville Review, 8, 235-249.
Ferriss, A. (1990). Whatever Happened, Indeed! Journal of Public Policy, 9, 401-405.
Garonna, P. (1994). Statistics Facing the Concerns of a Changing Society. Unpublished paper. Istituto Nazionale di Statistica, Rome.
Glatzer, W. (1981). An Overview of the International Development in Macro Social Indicators. Accounting, Organizations and Society, 6, 219-234.
Habich, R., Noll, H.-H., in collaboration with W. Zapf (1993). Soziale Indikatoren und Sozialberichterstattung. Internationale Erfahrungen und gegenwärtiger Forschungsstand. Expertise für das Bundesamt für Statistik der Schweiz. Berlin and Mannheim.
Innes, J.E. (1989). Knowledge and Public Policy. The Search for Meaningful Indicators. New Brunswick and London: Transaction Publishers.
Innes, J.E. (1990). Disappointments and Legacies of Social Indicators. Journal of Public Policy, 9, 429-432.
INSEE (1989). Données sociales Île-de-France. Paris.
Instituto Vasco de Estadistica (1990). Encuesta de Condiciones de Vida 1989. Indicadores Generales. 2 vols. Vitoria-Gasteiz.
Johnston, D.F. (1988). Toward a Comprehensive 'Quality-of-Life' Index. Social Indicators Research, 20, 473-496.
Johnston, D.F. (1990). Some Reflections on the United States. Journal of Public Policy, 9, 433-436.
Klingebiel, S. (1992). Entwicklungsindikatoren in der politischen und wissenschaftlichen Diskussion: Der Human Development Index, der Human Freedom Index und andere neuere Indikatoren-Konzepte. INEF-Report, 2. University of Duisburg.
Land, K. (1983). Social Indicators. Annual Review of Sociology, 9, 1-26.
Lind, N. (1992). Some Thoughts on the Human Development Index. Social Indicators Research, 27, 89-101.
Modell, J. (1987). Conveying Social Change: Some Formal Considerations. The Tocqueville Review, 8 (1986/87), 187-209.
Noll, H.-H. (1990). Sozialindikatorenforschung in der Bundesrepublik - Konzepte, Forschungsansätze und Perspektiven. In H. Timmermann (Ed.), Lebenslagen. Sozialindikatorenforschung in beiden Teilen Deutschlands (pp. 69-87). Saarbrücken: Dadder.
Noll, H.-H. (1991). Soziale Indikatoren. In G. Reinhold (Ed.), Soziologie-Lexikon (pp. 510-513). Munich and Vienna: R. Oldenbourg.
Noll, H.-H. (1992). Sozialindikatorenforschung und Sozialberichterstattung in der DDR. Berliner Journal für Soziologie, 3/4, 319-322.
Noll, H.-H. (1993a). Neue Welle der Sozialberichterstattung in Südeuropa. ISI-Informationsdienst Soziale Indikatoren, 10, 13-16.
Noll, H.-H. (1993b). Lebensbedingungen und Wohlfahrtsdisparitäten in der Europäischen Gemeinschaft. In W. Glatzer (Ed.), Einstellungen und Lebensbedingungen in Europa. Soziale Indikatoren XVII (pp. 73-98). Frankfurt: Campus.
Noll, H.-H., Schröder, H. (1994). Sozialberichterstattung in der Bundesrepublik Deutschland: Bestandsaufnahme und konzeptionelle Empfehlungen für einen Bericht zur sozialen Lage in Baden-Württemberg. Vorstudie für das Ministerium für Arbeit, Gesundheit und Sozialordnung Baden-Württemberg. Mannheim.
OECD (1982). The OECD List of Social Indicators. OECD Social Indicator Development Programme. Paris: OECD.
OECD (1988). Main Science and Technology Indicators 1982-1988. Paris: OECD.
OECD (1991a). The State of the Environment. Paris: OECD.
OECD (1991b). Environmental Indicators. A Preliminary Set. Paris: OECD.
OECD (1994). Making Education Count. Developing and Using International Indicators. Paris: OECD.
Priller, E. (1990). Sozialindikatorenforschung und Sozialberichterstattung in der DDR. In H. Timmermann (Ed.), Lebenslagen. Sozialindikatorenforschung in beiden Teilen Deutschlands (pp. 109-123). Saarbrücken: Dadder.
Pronk, J., ul Haq, M. (Eds.) (1992). Sustainable Development - From Concept to Action. The Hague Report. The Hague and New York.
Rockwell, R. (1983). Social Indicators at the Council. A Review of the Current Program and an Overview of Future Plans. Items, 37, 90-94.
Rockwell, R. (1986/87). Prospects for Social Reporting in the United States. A Receding Horizon. The Tocqueville Review, 8, 251-262.
Rothenbacher, F. (1993). National and International Approaches in Social Reporting. Social Indicators Research, 29 (1), 1-62.
Sheldon, E.B., Moore, W.E. (1968). Indicators of Social Change. Concepts and Measurement. New York: Russell Sage Foundation.
United Nations (1975). Towards a System of Social and Demographic Statistics.
Vogel, J. (1990). Social Indicators: A Swedish Perspective. Journal of Public Policy, 9, 439-444.
Zapf, W. (1972). Zur Messung der Lebensqualität. Zeitschrift für Soziologie, 1, 353-376.
Zapf, W. (1977a). Gesellschaftliche Dauerbeobachtung und aktive Politik. In H.-J. Krupp and W. Zapf, Sozialpolitik und Sozialberichterstattung (pp. 210-230). Frankfurt: Campus.
Zapf, W. (1977b). Soziale Indikatoren - eine Zwischenbilanz. In H.-J. Krupp and W. Zapf, Sozialpolitik und Sozialberichterstattung (pp. 231-246). Frankfurt: Campus.
Zapf, W. (1977c). Einleitung in das SPES-Indikatorensystem. In W. Zapf (Ed.), Lebensbedingungen in der Bundesrepublik. Sozialer Wandel und Wohlfahrtsentwicklung (pp. 11-27). Frankfurt: Campus.
Zapf, W. (1984). Individuelle Wohlfahrt: Lebensbedingungen und wahrgenommene Lebensqualität. In W. Glatzer and W. Zapf (Eds.), Lebensqualität in der Bundesrepublik (pp. 13-26). Frankfurt: Campus.
Zapf, W. (1990). Einleitung. In WZB-AG Sozialberichterstattung (Ed.), Sozialreport 1990. Documentation of a workshop at the Wissenschaftszentrum Berlin für Sozialforschung. Working Paper P90-102, Berlin.
Zapf, W. (1991). Modernisierung und Modernisierungstheorien. In W. Zapf (Ed.), Die Modernisierung moderner Gesellschaften. Proceedings of the 25th Deutscher Soziologentag in Frankfurt a. M. 1990 (pp. 23-39). Frankfurt/M. and New York: Campus.
Zapf, W. (1993). Wohlfahrtsentwicklung und Modernisierung. In W. Glatzer (Ed.), Einstellungen und Lebensbedingungen in Europa. Soziale Indikatoren XVII (pp. 163-176). Frankfurt: Campus.
Nationwide General Social Surveys
James A. Davis, Peter Ph. Möhler, and Tom W. Smith
1
Overview
Modern societies require advanced tools for monitoring social dynamics and social change, tools which must take account of the complexity of the social structures they seek to measure and model. Among the most useful tools for monitoring and understanding societies across time and across countries are Nationwide General Social Surveys (NGSS), such as the NORC General Social Survey (GSS) in the United States, the British Social Attitudes surveys (BSA) in Great Britain, and the Allgemeine Bevölkerungsumfrage der Sozialwissenschaften (ALLBUS) in Germany.¹ These surveys are, of course, all representative of their populations. NGSS are significant, however, in several important ways: they are collective research, part of methodologically rigorous, scientifically controlled research programs; they are academically owned and independent; they use and develop standardized measurements; and they are designed to be replicated over time and across countries. Together with other national surveys and panels, NGSS form an essential part of the general data base for social scientists and of the independent academic social science infrastructure in modern societies. Moreover, in the three countries discussed here, NGSS are widely used for teaching survey methods and substantive research areas.
2
Origins of NGSS
Nationwide General Social Surveys in the United States, Great Britain, and Germany were initially designed and funded for two reasons. Firstly, modern societies increasingly need to obtain scientifically reliable data through permanent survey monitoring (a topic closely connected to social indicators; see Noll & Zapf in this volume; Davis & Smith, 1992; Davis et al., 1993; Jowell et al., 1993). Secondly, the United States was the first country to institute such a time series: in the early sixties, the knowledge and skills in survey research held by the major US research institutions had led to the funds needed for large-scale representative surveys being concentrated at a few very large research institutes, such as NORC in Chicago and the ISR in Ann Arbor. Suffering from this lack of funds while teaching at a smaller university, James A. Davis originated the idea of a shared, public-use survey, later to be called the "General Social Survey" (Davis & Smith, 1992). Davis managed to secure funding for a first, small-scale study, and since 1973 the NORC General Social Survey has been carried out almost every year. In Germany, a change in the late seventies to the federal law regulating the activities of the German Census Office meant that, by the early eighties, German academic researchers had lost access to almost all Census Office data sets. Up to that point, almost all analyses of German social structure had relied on direct analysis of the three percent annual sample (about 600,000 cases) readily available from the Office. Deprived of this data base, structural sociologists urgently looked for alternatives. The solution they found was based on cooperation: they combined their interests with those of other researchers and applied for a "Nationaler Sozialer Survey" (NSS), which, for obvious historical reasons, later changed its name to "Allgemeine Bevölkerungsumfrage der Sozialwissenschaften" (ALLBUS). Meanwhile, in Britain, Social and Community Planning Research (SCPR), already a well-established survey research institute, raised the funds for an annual general social survey called the "British Social Attitudes Survey" (BSA). Other countries followed the example of these three surveys, a recent addition being the Polish General Social Survey. The needs for replicative, systematic observation and for the sharing of data are reflected in the special characteristics that distinguish NGSS from almost all other surveys and polls.
2.1 NGSS as a Collective Research Program
In his introduction to the Guide to the NORC General Social Survey (Davis & Smith, 1992), Peter Marsden points to the once standard ideal of a social scientist: "In one way of conducting social science research, independent investigators take responsibility for all phases of a research project: After formulating ideas, they collect, analyze, and present evidence." (Davis & Smith, 1992, p. vii). Today this rather holistic and individualistic approach may still be appropriate in certain areas, but not in all. As in some of the natural sciences (meteorology or nuclear physics, for example), social scientists make use of collective, multi-purpose and multi-user databases. In many cases they have to rely not on a single measurement source but on a multitude of observations from different sources. Many social scientists also use the modern multi-user and multi-purpose surveys as a source of background information for their own specialized studies. One of the aims of NGSS was to promote data as a collective good. The technical advancement of NGSS as robust, reliable and valid scientific surveys representing the population of a nation-state is thus only half of the success story. The other half is closely connected with the idea of data sharing. As the founding of the Roper Center as early as the late forties suggests, the idea itself is not new, but it took almost twenty years to become popular. In Europe, Stein Rokkan, who played a major role in establishing European comparative social research, was among the first to promote the idea of sharing data (at that time, punch cards) in addition to publishing tables (Rokkan, 1966). The term "data" is used here to refer to quantitative, "machine-readable" data stored on keypunched cards, tapes, disk drives, etc. These data are the numerical transcript of the respondents' answers to social scientists' standardized or open-ended questions.
They are ready for input into statistical analysis packages such as SPSS, NSDstat+ and SAS. NGSS data thus differ from, for instance, the famous trend reports by Ogburn (cf. Noll & Zapf, in this volume) or the tabulations in statistical yearbooks, in that they represent the original measurements, which can be rearranged, analyzed and tabulated ad infinitum for secondary, tertiary or any further analysis. At the time when Stein Rokkan and others promoted the idea of data archives, archived data were in a way "second hand", because scientists and research organizations tended to deposit data in an archive only once they had no further use for them. Sometimes there was a gap of several years between the collection of the data and making them accessible to the public. Researchers, and especially students, who did not have the support of major research centers thus often had to rely on "outdated, recycled data or dubious local samplings" (Davis & Smith, 1992: 2).
Nationwide General Social Surveys
What was urgently needed were data as current and up-to-date as possible, spanning a wide range of research topics and with all the merits of modern nationwide survey research. The key contribution of NGSS was a collective effort to gather such data not for one specific research project but for the scientific community as a whole; in other words, to provide the community with a "shared, current reference data set". The notion of "current data sets" implies that NGSS data are distributed to the scientific community as quickly as possible after data editing, before the research team itself has carried out any in-depth analysis.
2.2 NGSS as Multi-user and Multi-purpose Surveys
The idea of developing NGSS as a collective good depended on the data being useful for a wide range of social research and for as many scientists as possible. To meet the first requirement, NGSS are designed as multi-topic surveys. In addition, they typically include an elaborate section on background or demographic variables needed for later analyses of the different topics. In contrast to polls, whose interviews are now shorter than ever (down to 10-15 minutes in the United States and to 30 minutes in Germany), NGSS are richer in content, combining different substantive topics in a single interview lasting 60 to 90 minutes. To be as economical as possible, NGSS tend to use only indicators already proven to be reliable and valid. The ideal is to rely on cumulative research in the scientific community and to select the items and questions with the highest scientific value. Being an ideal, however, this often needs pragmatic tempering. When the first NGSS, the NORC General Social Survey, was established in 1972/1973, few research areas had a common set of robust indicators. The solution adopted was to have the NGSS research teams develop new indicators; hence, the NGSS designed and tested many of today's standard items. This is even more true of the German NGSS, whose researchers had both to develop many items and to 'borrow' items from NORC questionnaires. Even with the long NGSS interviews, time is at a premium. To help overcome these limits, NGSS often employ multi-purpose indicators. Standard background variables such as education or occupation, for instance, are designed to serve both as dependent and as independent variables. The use of multi-purpose standard instruments mitigates a common objection to "secondary data analysis", namely, that individual researchers do not participate in questionnaire design (Scheuch, 1973; Porst, 1985).
Because the instruments are standard, they reflect the state of the art in the scientific community, and, by virtue of that standardization, the results become comparable at least at the level of the variables entered into statistical analysis. The high acceptance of NGSS data in the scientific community indicates that ingenuity and originality at the level of items and questions matter less than theoretically sound and methodologically rigorous data analysis. By widening the utility of items and covering a wide range of topics that can be cross-referenced within one survey, NGSS maximize their multi-purpose function and, with it, their multi-user function. Researchers are, of course, not the only target group of NGSS. The surveys are also designed to inform the general public and students, which they achieve through extensive dissemination in reports and books, such as the American GSS Reports, the British Social Attitudes (BSA) series, and the ALLBUS "Blickpunkt Gesellschaft" (Focus on Society) series.²
What distinguishes NGSS from many other multi-user enterprises is their defined status as, in a sense, the property of the scientific community alone. The research teams of the NORC GSS and the ALLBUS report to the scientific community, and their work is evaluated by specific academic bodies. This, of course, also puts a certain pressure on NGSS: if they fail to address topics relevant to the scientific community or to satisfy the rigorous methodological standards required of them, support, and with it funding, will disappear. This strong tie to independent academia makes NGSS scientifically independent data sources, in contrast to government-controlled data (such as the EUROBAROMETER series) or privately owned data (such as the World Value Surveys).
2.3 Promoting Standardization
As noted above, NGSS tend to use and replicate valid and tested questions and items in order to enable comparative and cumulative research. This point deserves emphasis. Comparability over time and space is crucial for observing social dynamics, but it is difficult to implement. For example, even something as simple as grouping respondents' ages can be done in many ways. Even if a standard grouping, such as five-year intervals, is preferred, comparability over time may not be secured, because some surveys start recoding respondents' ages at 16, others at 18, and so on. To avoid such problems, NGSS provide respondents' ages in years, ungrouped. Then again, different ways of asking for this information can also affect the responses received: "How old are you (today)?" vs. "In which year (month, day) were you born?". Scientific rigor requires the slightly awkward second formulation. Without going into detail on the context effects of question wording (cf. Schwarz et al., in this volume) or the problems of equivalence of items within and between cultures (cf. Alwin et al., in this volume), this simple example illustrates the kind of standardization of measurement that surveys like NGSS require. Indeed, NGSS are very rigorous about carrying out replications as exactly as possible, since they insist on comparisons based on consistent measurement. One problem, however, is that standardization in the sense of overly rigid rules, such as "use the same words, typeface and layout and, if possible, the same type of interviewer" or "randomize everything", tends to freeze further development and needs to be counterbalanced by methodological experiments striving for better (but still comparable) measurement. Over the last two decades, GSS, BSA and ALLBUS have all contributed extensively to methodological development, to ensure better instruments as well as standardized measures (cf. the GSS Methodological Reports, the SCPR Survey Methods Centre reports and the Methodenstudien of ALLBUS). Typically, methodological experiments use split-half tests or elaborate pretests, while in some cases they employ special studies, such as the additional panel in the 1986 ALLBUS (Bohrnstedt et al., 1987), special studies testing quota designs vs. random sampling in the NORC GSS (Stephenson, 1979), the implementation of Ersatz-Networks in the GSS (Burt, 1987), and, in BSA, panel vs. cross-section designs (Lievesly & Waterton, 1985) and scale development (Heath et al., in press).
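The age-grouping problem can be made concrete with a small sketch. The Python below is illustrative only: the ages are hypothetical, and `five_year_band` and `band_edges` are helper names invented for this example, not part of any NGSS codebook or software. It assumes ages are stored ungrouped in whole years, as NGSS provide them, and shows that such data can be recoded into either five-year convention after the fact, while the two conventions (bands starting at 16 versus 18) share no band boundary at all, so data already grouped under one convention cannot be reconciled with the other.

```python
from typing import Optional, Set


def five_year_band(age: int, start: int) -> Optional[str]:
    """Assign an age (in whole years) to a five-year band whose first band begins at `start`.

    Returns None for respondents younger than `start` (hypothetical convention).
    """
    if age < start:
        return None
    lower = start + 5 * ((age - start) // 5)
    return f"{lower}-{lower + 4}"


def band_edges(start: int, oldest: int) -> Set[int]:
    """Lower bounds of all five-year bands from `start` up to `oldest`."""
    return set(range(start, oldest + 1, 5))


# Hypothetical ungrouped ages, in years (illustrative values only).
ages = [16, 18, 19, 21, 23, 40, 67]

# From ungrouped ages, either grouping convention can be derived later.
grouped_from_16 = [five_year_band(a, 16) for a in ages]
grouped_from_18 = [five_year_band(a, 18) for a in ages]

# Band boundaries shared by both conventions: none, because 16 and 18
# differ by 2, which is not a multiple of 5.
shared = band_edges(16, 99) & band_edges(18, 99)

print(grouped_from_16)
print(grouped_from_18)
print(sorted(shared))
```

Run as a script, it prints the two groupings and an empty list of shared boundaries. The design choice it illustrates is the one described above: releasing the finest-grained value keeps every later grouping open to the analyst.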
2.4 Replication
To observe the dynamics of societies, notably social change, replicative measurements over long periods of time are necessary. NGSS are designed to generate very long time series by means of replicative surveys (Porst, 1985: 24, citing Otis Dudley Duncan), which of course means collecting data for a rather long time before a set of surveys suitable for time series analysis is available. Yet despite the many instances of survey research over the last hundred years or so (Bulmer, Bales & Kish-Sklar, 1991), the number of nationwide representative cross-sections and time series surveys is still rather small. For Germany, for example, data are sorely lacking both for the Nazi era, when no surveys were conducted, and for the years after 1945, for which almost all data files have been destroyed (Merritt & Merritt, 1980). It is easy to overlook that the abundance of survey data today and the wide coverage of polls in the media are relatively recent phenomena. NGSS seek to provide future generations of social scientists, politicians and publics with the kind of data we would like to have today for comparing contemporary society with former times. Whether replication of questions is always possible or desirable has to be decided case by case. The meaning of questions may change, for instance, so that the facts measured may not be the same at two points in time; and even if the questions point to comparable facts, those facts may have ceased to be relevant for society. Conversely, what seems outdated today may be of interest tomorrow (consider the revival of East/West items like "socialism is a good idea badly implemented in the past", used during the Cold War but dropped in the late seventies). Moreover, questions can prove to be flawed (cf. Schwarz et al., in this volume); the issue is then whether to retain them or not.
To add to the complexity, different social dynamics are in effect at any given time in any given society (the simultaneity of different social time systems). Each social dynamic requires its own clock, indicating the appropriate time span between two measurements, and setting such clocks requires methodological and substantive reasons for selecting the specific time parameters. Intervals that are too short may introduce fluctuation noise into the data, hiding the general trend; intervals that are too long may yield a stable picture that fails to reflect the actual dynamics of the processes.³ Up to now, little has been written about how to achieve replication other than in the "literal" sense.⁴ Compared with the long tradition in official statistics of changing the base of time series indicators as and when needed, academic social science may appear to be very rigid. This, in fact, is not the case. Official statistics are governmental data collections, and as such they have in many cases resulted in "politically destroyed time series". Moreover, official statistics, at least in Germany, are by definition "error-free", a pristine state achieved by an editing process best labeled "data cleansing". Academic survey research, on the other hand, cannot simply declare replication adjustments valid by definition. Instead, the rules of sound scientific proof require careful testing of the effects of changes in measurements. In our view this sort of rigidity is desirable, even though it is often time-consuming and costly. Even so, NGSS research teams are aware of the problem of literal replication and of the need for better decision rules for replications. It is, for example, now standard to test for context and other effects (cf. Schwarz et al., in this volume). In addition, experiments are planned to investigate replication and comparison systematically (Davis et al., 1993).
2.5 International Comparison with NGSS
It is sometimes suggested that cross-national comparison differs little from the comparison of data from two sample surveys within a nation-state (Kish, 1994). However, international/intercultural survey research poses a number of substantive and pragmatic problems above and beyond those posed by within-nation comparison (Scheuch, 1968; Przeworski & Teune, 1970; Scheuch, 1973; Davis & Jowell, 1989; Oyen, 1990; Alwin et al., in this volume). Most notable is the problem of variables versus systems, i.e. the simplistic use of the proper name "NATION-X" as a variable in the analysis: "In our view, the crux of the problem lies in the status of proper names of social systems within general theory.... The status of social units, however, is not the same as the status of variables" (Przeworski & Teune, 1970: 8). Because the nation-state, as the most comprehensive unit, embeds many institutional and cultural phenomena, the concept of a Nationwide General Social Survey implies two goals: first, to represent the nation-state and its political boundaries; second (and theoretically more important), to represent the cultural, legal and economic boundaries that in many cases coincide with those of the nation-state. To date there is neither a methodological nor a substantive concept for a supranational sampling design. All NGSS therefore rely on standard national sampling frames; the ALLBUS, for instance, relies on a sampling frame derived from constituencies (Arbeitsgemeinschaft ADM-Stichproben, 1994). The nation-state continues to be the largest comprehensive unit for researching public opinion on an individual basis. With the advent of the European Union this might change; at present, however, the nation-state remains the top-level unit for sample surveys.⁵ It follows that, for the time being, Przeworski & Teune's point concerning systems (nation-states) versus variables remains valid.
Another as yet unsolved problem is that of international/intercultural equivalence (see Alwin et al., in this volume). Despite early optimism that questionnaire equivalence could be reduced to translation equivalence and that this could be handled by procedures such as "back-translation" (Scheuch, 1973: 220), the search for better methods continues. As with replication, further methodological research is urgently needed here. These methodological problems notwithstanding, cross-national comparison and across-time comparison are among the most powerful tools for understanding and explaining societies. NGSS achieve a cross-national perspective by building international networks and modes of cooperation, following the examples of numerous internationally renowned comparative studies of the last four decades (e.g. the World Fertility Surveys, Almond and Verba's five-nation study (Almond & Verba, 1963), the two waves of the Political Action study group (Barnes et al., 1979; Jennings et al., 1989), the World Value Surveys, the EUROBAROMETER surveys of the European Commission and, from 1985 on, the surveys of the International Social Survey Programme (ISSP)). As early as the second ALLBUS (1982), the American and German NGSS devoted a section of their questionnaires to a common set of items. Then, one year after the first BSA, James A. Davis and Roger Jowell agreed to seek multilateral instead of bilateral cooperation and discussed a bolt-on supplementary questionnaire designed specifically for cross-national comparison. SCPR raised the funds for the first meetings and invited representatives of ZUMA and the Australian National University to join the group. All three NGSS, together with the Australian National Social Science Survey, agreed to start a continuous and truly international/intercultural collaboration in 1985.
They organized the ISSP to "(a) jointly develop topical modules dealing with important areas of social science, (b) field the modules as a 15-minute supplement to the regular national surveys (or as a special survey if necessary), (c) include an extensive common core of background variables, and (d) make the data available to the social science community as soon as possible." (Davis & Smith, 1992: 27; cf. Davis & Jowell, 1989: 3).⁶ From the four-nation study first fielded in 1985, the ISSP has developed into a 22-nation survey, extending the idea of NGSS, albeit on a modest scale, to many countries.⁷
3 Trends and Perspectives
The vision of continuous measurement of social dynamics held by Duncan, Rokkan and many others has in several respects not been realized in NGSS, and a number of challenges urgently need to be met. Response rates, for example, are declining to (or stabilizing at) around 50 to 60 percent in Germany and a number of other countries. Fieldwork quality has been lowered in some standard surveys by very loose rules for random walk procedures. Telephone interviews are increasingly replacing face-to-face interviews, bringing with them cuts in interview length and changes in question design. While face-to-face interviews of 60 to 90 minutes with strict sampling designs are now typical for NGSS, the surveys are under tremendous economic pressure (in both money and time). Maintaining their very high standards will thus be a major methodological and technical challenge for NGSS. In other words, the issue is whether sound and necessarily expensive methodology can survive against unsound and cheap (in both senses of the word) alternatives. The NGSS time series, now long enough for some substantial time series analyses, can help solve this problem: they should inform the methodological debate about keeping indicators comparable over time and space. It is easy to show that such analyses point to the need for high-quality, well-designed and well-documented surveys, of which the NGSS are an outstanding example. NGSS also have, of course, to keep up with methodological and substantive developments and to recognize and correct the influence of outdated perspectives on how the surveys are conducted, a typical example being the male orientation of many questionnaires, sometimes called the "Parsonian view of society". The major problem, however, will be to keep the time series running in the face of requests to include new and "fascinating" topics.
In sum, the NGSS will need to strike a balance between the needs of long-term observation and keeping up with current developments in academic thinking and in society. Given the volume of survey data reported daily in the media (and the low standards of such reporting), a better understanding of survey data should be part of everyday knowledge. Today, only a small section of the public, journalists included, is able to understand reported survey data fully and to look critically at a given interpretation of them. This can be taken as evidence that survey data are used as a means of illustration rather than as a tool for critical social observation. To increase the number of critical readers of poll coverage, survey research, its methodology and its technology would need to be included in the standard curricula of schools and colleges. The Scandinavian nations have demonstrated that NGSS can be used successfully in school curricula, especially since computers have become common as teaching aids. Alongside a country's traditions in survey research and its organization of university and other academic research (for instance, the French model of huge national research institutes like the CNRS versus the American model of institutes for social research as part of the university system itself), the successful implementation of an NGSS is a good indicator of the general standing of social research in that country. Implementing an NGSS requires an investment of financial and human resources comparable to long-term projects in the natural sciences. Consequently, it takes a good deal of trust in the cost-benefit ratio of NGSS in a given society to get a project of this magnitude off the ground. Given that modern societies need more and better data to become better analysts of themselves, more countries are likely to consider NGSS as an instrument for producing high-quality, scientifically controlled data for across-time and across-country comparisons. One real methodological challenge will be to make NGSS comparable across many nations without imposing a specific (probably Western) perspective on any country. NGSS can doubtless learn from, and hopefully adapt to, perceptions in countries culturally very different from those of the pioneering NGSS. This, perhaps, will be the most interesting challenge for the future.
Notes
¹ The NORC GSS is funded by the National Science Foundation; the ALLBUS is jointly funded by the Federal and State governments; core funding of BSA comes from the Sainsbury Family Charitable Trusts.
² The GSS Report Series is published by NORC in Chicago, the British Social Attitudes Series by SCPR in London, and the ALLBUS Blickpunkt Gesellschaft Series by ZUMA in Mannheim. In addition, each survey is documented in a codebook distributed by the respective data archives. Cumulative codebooks spanning a period of time are also available.
³ To make this point clear, consider the intervals at which a person's pulse is measured: in most cases, measuring once a day is absolutely sufficient; in the operating theater, however, a patient's pulse is measured continuously.
⁴ The term "literal replication" may be misleading, since it implies a stable, never-changing phenomenon, namely the "black on white spots" called "characters". According to linguists, this is at least an oversimplification; more likely it is a sociological fiction.
⁵ One argument in favor of our prediction is the current design of the EUROBAROMETER studies, the official studies of the European Commission, which are based on an additive integration of individual "national" sample surveys.
⁶ While the institutions supporting the ISSP are important today, one should not forget to name those who founded it: Roger Jowell (SCPR), Jonathan Kelly (Australia), James A. Davis and Tom Smith (NORC) and Manfred Küchler (ZUMA).
⁷ Members of the ISSP as of 1994 are: Australia, Austria, Bulgaria, Canada, Czech Republic, Germany, Great Britain, Hungary, Ireland, Israel, Italy, Japan, the Netherlands, New Zealand, Norway, the Philippines, Poland, Russia, Slovenia, Spain, Sweden, and the United States. Special sub-samples are fielded in Northern Ireland and in East and West Germany; in Canada the survey is fielded in Canadian French and Canadian English; in the Philippines it is fielded in four different languages. The combined data files and codebooks are distributed by the Central Archive in Cologne, Germany.
References
Almond, G. & Verba, S. (1967). The Civic Culture. Boston, MA: Little, Brown.
Arbeitsgemeinschaft ADM-Stichproben & Bureau Wendt (1994). Das ADM-Stichproben-System (Stand 1993). In S. Gabler, J.H.P. Hoffmeyer-Zlotnik & D. Krebs (Eds.), Gewichtung in der Umfrageforschung (pp. 188-202). Opladen: Westdeutscher Verlag.
Barnes, S., Kaase, M. et al. (1979). Political Action: Mass Participation in Five Western Democracies. Beverly Hills, CA: Sage.
Bohrnstedt, G.W., Müller, W. & Möhler, P.P. (Eds.) (1987). Special Issue of Sociological Methods and Research on an Empirical Study of the Reliability and Stability of Survey Research Items, 3. New York: Sage.
Bulmer, M., Bales, K. & Kish-Sklar, K. (Eds.) (1991). The Social Survey in Historical Perspective 1880-1940. Cambridge: Cambridge University Press.
Burt, R.S. (1987). A Note on the General Social Survey's Ersatz Network Density Item. Social Networks, 9, 75-85.
Davis, J.A., Möhler, P.P., Smith, T.W. & Harkness, J.A. (1993). MINTS: Research into Intercultural Methodology of Comparison. Proposal for the Humboldt Foundation. ZUMA Mannheim and NORC Chicago.
Davis, J.A. & Jowell, R. (1989). Measuring National Differences. In R. Jowell, S. Witherspoon & L. Brook (Eds.), British Social Attitudes: Special International Report. Aldershot: Gower.
Davis, J.A. & Smith, T.W. (1992). The NORC General Social Survey: A User's Guide. Newbury Park: Sage.
Heath, A.F., Evans, G. & Martin, J. (in press). The Measurement of Core Beliefs and Values: The Development of Balanced Socialist/Laissez Faire and Libertarian/Authoritarian Scales. British Journal of Political Science.
Jennings, M.K. et al. (1990). Continuities in Political Action. New York: de Gruyter.
Jowell, R., Möhler, P.P., Ester, P., Ward, C. & Calvi, G. (1993). Study Programme for Quantitative Research (SPQR). Proposal for a Social Science Network to the European Commission's Human Dimensions Programme. ZUMA Mannheim.
Kish, L. (1994). Multipopulation Survey Designs: Five Types with Seven Shared Aspects. International Statistical Review (in press).
Lievesly, D. & Waterton, J. (1985). Measuring Individual Attitude Change. In R. Jowell & S. Witherspoon (Eds.), British Social Attitudes: The 1985 Report. Aldershot: Gower.
Merritt, A. & Merritt, R. (1980). Public Opinion in Semisovereign Germany: The HICOG Surveys, 1949-1955. Urbana, IL: University of Illinois Press.
Oyen, E. (Ed.) (1990). Comparative Methodology: Theory and Practice in International Research. Newbury Park: Sage.
Porst, R. (1985). Praxis der Umfrageforschung. Stuttgart: Teubner.
Przeworski, A. & Teune, H. (1970). The Logic of Comparative Social Inquiry. New York: John Wiley.
Rokkan, S. (1966). Data Archives for the Social Sciences. Paris: UNESCO.
Scheuch, E.K. (1973). Entwicklungsrichtungen bei der Analyse sozialwissenschaftlicher Daten. In R. König (Ed.), Handbuch der empirischen Sozialforschung (3rd ed., pp. 161-226). Stuttgart: F. Enke.
Stephenson, C.B. (1979). Probability Sampling with Quotas: An Experiment. Public Opinion Quarterly, 43, 477-496.
Measurement in Multi-National Surveys
Duane F. Alwin, Michael Braun, Janet Harkness and Jacqueline Scott
1 Introduction
While the use of intercultural comparisons as a general research strategy has a century-long tradition, multi-national surveys, especially those including a large number of countries, are a relatively recent development. Compared with other kinds of research, multi-national surveys offer greater possibilities for advancing the social sciences, but they also present a number of additional difficulties. In this paper we first briefly assess the role of multi-national surveys within the broader framework of comparative research in modern social science. Second, we discuss a number of issues surrounding what is often considered the critical problem faced by comparative researchers using survey data: the problem of functional equivalence. Several different rationales are given for the use of multi-national surveys. They are, for example, viewed as indispensable for drawing comparisons across national contexts (Kish, in press), and as valuable for developing and testing general propositions about the nature of societies and social processes (Kohn, 1987). When the focus of comparative research is on testing general theories and hypotheses, e.g. on whether predictive relationships found in one country hold up in other countries (Kohn & Slomczynski, 1991), nation is often treated as a "context" variable and may not be of interest per se (Kohn, 1987; Scheuch, 1968). In this case, nations may also be classified according to a set of analytic variables, which are then used as explanatory and/or control variables (Rokkan, 1964; Przeworski & Teune, 1970), making nations the "unit" of analysis (Kohn, 1987). More often, however, national differences are themselves the focus of interest; nation is then used as an "object of analysis" (Kohn, 1987).
When the aims of research are essentially descriptive, say in assessing national differences, a very high premium is usually placed on strict replication of methods in sampling, mode of administration, items asked, question wording, and so forth. However, when the aims of comparative research are more theoretical, in the sense of establishing the generality of sociological propositions, the concern is often with conceptual rather than literal replication of methods and measures (Lykken, 1968), although equal emphasis is placed on rigorous methodological standards. The relationship between the two notions of replication is quite complex: on the one hand, literal replications may or may not be conceptually equivalent; on the other hand, conceptual equivalence can be realized both by procedures that constitute literal replications and by others that do not. Both strategies for meeting the criterion of comparability, conceptual and literal replication, can be regarded as "prototypes", since there are perhaps no "pure" types, only exemplars that seem to typify one extreme or the other. The conceptual approach gives primary attention to (a) the precise definition of concepts in the development of procedures and measures (such as survey questions), (b) evaluating the extent to which these concepts are applicable to each national context, and (c) designing appropriate within-nation procedures and measurement strategies. A more literal approach to establishing conceptual equivalence relies on the "sameness" of procedures and questions. Studies such as the International Social Survey Program (ISSP)¹ emphasize the literal replication of questions concerning attitudes and beliefs (cf. Davis et al. in this volume). While the literal approach may also constitute a conceptually equivalent replication, this is not necessarily the case. The alternative to literal replication asks different questions in different countries with a view to capturing a single common underlying concept. The difficulty with this way of maintaining "conceptual equivalence" lies in constructing items that differ in their manifest content but are functionally equivalent with regard to the theoretical dimension. A comparison of marginal distributions also becomes impossible, although, as Küchler (1987) points out, relationships to third variables can still be compared across countries. On the other hand, any comparison of items that are literal replications but not conceptually equivalent poses severe problems of interpretation. As a general rule, when national differences are explored, whether through univariate distributions, multivariate analyses or LISREL-type models (see Alwin, 1992), the assumption is that variables have basically the same meaning in the different national contexts. This assumption, however, is often problematic, as discussed below.
2 Comparative Research in the Social Sciences
2.1 Comparative Studies in the Past and Present
As Kish (in press) observes, comparative research is not new: "multinational comparisons ... have been made since time immemorial", certainly going as far back as Aristotle's comparison of the constitutions of the Greek city-states (Ross, 1964). And although there is recurrent pessimism regarding the results of comparative analysis in the social sciences (e.g. Form, 1979), comparative orientations have been central to many key sociological studies, as, for example, in the writings of Marx, Weber and Durkheim (Rokkan, 1964). Recent decades have seen a rapid increase in the number of multi-national studies launched both by official international agencies (e.g. UNESCO) and by academic researchers. Multi-national studies have been undertaken on fertility, health, education, income, social mobility, class structure, political behavior, and the use of time, and on perceptions, beliefs, values, and attitudes in a variety of substantive fields of the social sciences (see review by Kish, in press). Most of these cross-national studies take survey-oriented approaches, studying national and cultural differences by asking people questions about their attitudes, beliefs, and behaviors. Other studies are oriented more toward aggregate-level data, assumed to reflect cultural values and behavior. Even among studies using surveys, there is considerable variation, not only in the topics on which they focus but also in their design (Kish, in press). Some studies are organized as ongoing replicative endeavors, attempting to gauge changes over time as well as national differences (see Davis & Jowell, 1989). Some include numerous countries while others are much smaller in scale; there is also considerable variation in how questions are put to respondents.
This variation, along with the vastness of the ever-expanding international survey database, makes it virtually impossible to draw any general conclusions about the use of surveys in multi-national studies. Regardless of how optimistic or pessimistic one is about the ultimate
Duane F. Alwin, Michael Braun, Janet Harkness, and Jacqueline Scott
success of comparative research strategies, the sheer volume of studies and their increasing availability for secondary analysis create a situation in which the use of multi-national surveys is no longer the exclusive domain of international or "area" specialists. One upshot of this increased availability is that many analysts have not themselves participated in collecting the data and have no first-hand experience of the special problems involved in designing and implementing comparative measures. In addition, secondary researchers often lack a close working relationship with colleagues in the countries involved, who could provide useful information on methodological peculiarities of the national surveys and on the interpretation of results.
2.2 Research Strategies
With respect to issues of measurement, one critical feature of multi-national survey research strategies is the distinction referred to above between nation as "context" or "unit" on the one hand and nation as "object" of analysis on the other (Kohn, 1967). In the following section we take examples from existing research to highlight this distinction and to illustrate the implications different approaches have for measurement strategies. Cross-national comparisons are often used to study the influence of changing socio-economic processes on the development of human qualities and capacities. One example is Inglehart's (1977, 1990) important investigation of the role of socio-economic experience in the development of cultural value priorities. Drawing on Maslow's (1954) hierarchy of needs, Inglehart's basic idea is that, among the variety of needs people have, they give most attention to those they believe are in short supply. To test this idea, Inglehart contrasted birth cohorts born before and after World War II, working on the assumption that those growing up in pre-war Europe experienced socio-economic scarcity and would therefore be likely to emphasize economic and physical security. Those growing up in post-war Europe, on the other hand, would have experienced prosperity and thus be more predisposed to give high priority to non-material goals, such as political freedom and the quality of life (Inglehart, 1979). Another well-known example of cross-national research on socio-economic processes is the work of Kohn and his colleagues on the extent to which job conditions lead to changes in workers' personalities, value orientations, and attitudes (Kohn & Slomczynski, 1990). One prominent hypothesis in this research has been the universality of certain causal relationships between work and personality in industrialized societies. Kohn and Slomczynski (1990, p. 3) argue that people's social class positions and their place in the stratification order affect personality mainly through their influence on conditions of work and other proximate conditions of life. Changes in the complexity of work, for example, should be associated with consequent changes in intellectual complexity. These causal relationships have been explored in the US, Poland, and Japan, three countries quite diverse culturally, politically, and linguistically. Such research, which seeks support for causal relationships between theoretical variables in diverse national settings, turns on the idea that if such relationships are found to be similar, confidence in the generality of the theoretical hypotheses is increased, "not only beyond the boundaries of either nation but also beyond either type of society" (Kohn and Slomczynski, 1990, p. 55). If, on the other hand, differences are found, the characteristics of the societies chosen for study can help to explain them and caution against drawing general conclusions. In each of these examples the literal replication of measures is less important than the fit between concept and indicators within a particular country. For example, it is well known
that socio-economic indicators cannot be literally replicated or even easily equated across national contexts. Some success has been achieved in measuring cross-national similarities in such things as occupational prestige (see Treiman, 1977). Nevertheless, despite the great advantages of the available international occupational classifications (e.g. ISCO-88, see International Labour Organization), it is also well known that national occupational classification systems are quite varied, and for some purposes serious disparities exist (Erikson & Goldthorpe, 1992). It is also very difficult to arrive at an equivalent classification scheme for measuring education (Braun and Müller, forthcoming), since educational systems are quite different, are tied to different labor markets, and are tied to them in different ways. Finally, even other indicators of socio-economic position, such as income level, are difficult to equate because of variation both in units and in the meaning of income cross-nationally. Common approaches to comparing income distributions cross-nationally use percentile or stanine scores (e.g. Easterlin, 1975; Duncan, 1975), but even these are affected by differences in the coarseness of measurement categories. In other areas, such as the study of poverty, it is very difficult to equate measures literally across national contexts, because doing so would undermine the theoretical adequacy of the indicators for representing within-nation variability. Economic systems are so vastly different that it is difficult, if not impossible, to arrive at a common definition of poverty, although some creative approaches are available (Duncan et al., 1993). Another perennial problem is gauging political orientations cross-nationally. Take the concept of political liberalism-conservatism, for example.
In Western countries, especially in the United States, a political liberal is understood to favor some nationalisation of goods and services and to support the idea that a certain degree of welfarism is essential to a just society. A political conservative opposes too much government involvement and prefers the ideals of "free enterprise" and "rugged individualism" to social welfare supports for individuals. In the contemporary republics of the former Soviet Union, however, the notions associated with being "politically liberal" and "politically conservative" are virtually the reverse. Even within the same nation, the changing nature of the political environment over time necessitates an emphasis on conceptual rather than literal replication (Alwin, Cohen & Newcomb, 1991, p. 76). One example of research using nation as the unit of analysis is the investigation of national differences in attitudes towards women's dual role in the family and workforce. One factor that might help explain cross-national differences in the acceptability of maternal employment is the availability of childcare or parental leave policies (Alwin, Braun & Scott, 1992). The analytic strategy of using nation as a unit involves classifying countries according to potentially influential socio-institutional characteristics, in this case the availability of alternative childcare or parental leave policies. Each nation might be represented by a set of dummy variables, and the dependent variables would be measured as similarly as possible across national contexts. In some cases, it may also be desirable (and possible) to measure directly the analytical variables that differentiate nations and to enter these into the analysis instead of the countries as such (Rokkan, 1964; Przeworski & Teune, 1970).
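A minimal sketch of the two designs just described, using invented data. Hypothetical countries "A", "B" and "C" are either coded as dummy variables (with "A" as the reference category) or replaced by a single country-level indicator of childcare availability. All names and values are assumptions for illustration, not figures from the studies cited.

```python
# Hypothetical respondents: (country, 1 = finds maternal employment acceptable).
respondents = [
    ("A", 0), ("A", 1), ("A", 0),
    ("B", 1), ("B", 1), ("B", 0),
    ("C", 1), ("C", 0), ("C", 1),
]

# Hypothetical country-level characteristic: 1 = widely available childcare.
childcare = {"A": 0, "B": 1, "C": 0}


def dummy_code(country, levels, reference):
    """0/1 indicators for every country except the reference category."""
    return [1 if country == level else 0 for level in levels if level != reference]


def mean(values):
    return sum(values) / len(values)


levels = sorted({country for country, _ in respondents})

# Design 1: nation as unit, each country represented by its own dummy variable.
dummy_rows = [dummy_code(country, levels, reference="A") for country, _ in respondents]

# Design 2: country identity replaced by the analytic variable itself.
policy = [childcare[country] for country, _ in respondents]

# With one binary predictor, the OLS slope equals the difference in group means.
with_policy = [y for (_, y), p in zip(respondents, policy) if p == 1]
without_policy = [y for (_, y), p in zip(respondents, policy) if p == 0]
policy_difference = mean(with_policy) - mean(without_policy)

print(dummy_rows[0], dummy_rows[3], dummy_rows[6])  # [0, 0] [1, 0] [0, 1]
print(round(policy_difference, 3))                  # 0.167
```

With only a handful of countries, the country-level variable is confounded with every other feature that distinguishes them, so an estimate like `policy_difference` should be read with corresponding caution.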
Kohn (1987), however, argues that social research is not yet ready for this approach and suggests that researchers should concentrate on analyzing nation as context. Thus in studies where nation is used as "context" or as "unit", it may in fact be advantageous, for the functional equivalence of indicators, to rely primarily on conceptual replication rather than on any sort of literal equivalence. In other words, it may be better for this type of research to optimize the within-country fit between concept and indicators. The suggestion is that research using nation as context should place primary emphasis on conceptual replication, whereas research using nation as the object of analysis should place more emphasis on the literal replication of measures.
The distinction between the "nation as object" and "nation as context" approaches should not be pushed too far, since in practice the two often merge. Nevertheless, the distinction is useful for highlighting a difference in emphasis: for some national comparisons a literal replication of measures takes precedence over the optimal operationalization of a concept within any particular nation. However, if the functional equivalence of an indicator is questionable, then the comparison of literally replicated measures loses its value, even if the focus is on comparing nations in their own right. Thus the common practice of using survey questions as if the variables had basically the same meaning in the different national contexts raises a number of problems, to which we now turn.
3 Problems of Functional Equivalence of Measurements
The conventional methodological literature on survey techniques has, in the past, tended to stress the impact of question context on respondents' interpretations and the resultant responses to survey questions. In internationally comparative research, however, other contexts, such as the socio-economic or linguistic environment, matter too. The "generalized" contexts constituted by the objective social structure of the countries (political system, economic situation, social inequalities, provision of public child care), not to mention language and culture, pose a problem for validity and thus for the comparability of information gathered in different nations. On one common understanding, items (questions) are functionally equivalent when the questions asked in the different countries stand in identical relationships to the intended theoretical dimensions. This is a prerequisite both for theory testing and for comparing nations as units. Yet functional equivalence in terms of text equivalence is more complicated than has been assumed in many studies. Our discussion of equivalence concerns both attitudinal and demographic variables. It will become evident that the underlying problem is not just one of language, and that in seeking equivalence the different objective realities of the countries involved must also be taken into account. The issue of functional equivalence and the criterion of comparability are not unique to multi-national surveys. Indeed, as Kaplan (1964) pointed out, the essence of measurement lies in the principle of "standardisation", that is, the principle that units of magnitude have a constancy across time and space. It would be impossible to make comparisons across units of observation if one were not reasonably certain of such constancies.
In survey practice, the working criterion is to develop questions and response categories assumed to have a common meaning across respondents, even though it is widely recognized that different categories of respondents may take a question to mean very different things. So even the routinely assumed within-survey functional equivalence may more often be an unproven assumption than an established reality. Differing interpretations of survey questions, or differing uses of response scales, can be conceptualized in terms of survey measurement errors, both random and systematic. Multi-national survey measurement, as we have noted, adds further dimensions to this problem.
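One way to make the distinction between random and systematic error concrete is a multi-group factor model, in the spirit of the LISREL-type models mentioned earlier. It is offered here as an illustration, not as a model proposed in this chapter. Let $y_{ijc}$ be the response of respondent $i$ in country $c$ to item $j$, $\eta_{ic}$ the latent attitude with country mean $\kappa_c$, $\lambda_{jc}$ a loading, $\tau_{jc}$ an intercept, and $\varepsilon_{ijc}$ random error. All of these symbols are introduced here for illustration:

```latex
\begin{aligned}
y_{ijc} &= \tau_{jc} + \lambda_{jc}\,\eta_{ic} + \varepsilon_{ijc},
  \qquad \mathrm{E}(\varepsilon_{ijc}) = 0, \quad \mathrm{E}(\eta_{ic}) = \kappa_c, \\
\mathrm{E}(y_{ijc}) - \mathrm{E}(y_{ijd})
  &= (\tau_{jc} - \tau_{jd}) + \lambda_{jc}\,\kappa_c - \lambda_{jd}\,\kappa_d .
\end{aligned}
```

Random error enters through $\varepsilon_{ijc}$. Systematic differences in how respondents interpret a question or use a response scale appear as country-specific intercepts $\tau_{jc}$ and loadings $\lambda_{jc}$. The difference in observed means reduces to $\lambda_j(\kappa_c - \kappa_d)$, a difference in the underlying attitude itself, only if $\tau_{jc} = \tau_{jd}$ and $\lambda_{jc} = \lambda_{jd} = \lambda_j$.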
3.1 Measurement Approaches
The difference between conceptual and literal replication of survey measures can be illustrated using the example of measuring educational attainment, which is fundamental to comparative
analyses of social change. There are several issues here: a) the similarity of the concepts of education used by sociological theories in multi-national contexts, b) the impossibility of directly comparing measures of education because the systems and institutions are different, and c) the related inability to use the same survey items across national contexts. In the US researchers typically ask for the "number of years of schooling completed"; in Great Britain, for the "age at which the person left school"; and in Germany, for diplomas, certificates, and tracking. Comparing the educational categories of these countries is thus not easy, as their educational systems are structured in entirely different ways. It makes no sense to regard educational variables as comparable in any purely mechanical way (Scheuch, 1968). Instead, the different social realities and institutional contexts have to be taken into consideration. How this can best be achieved depends largely on the dimensions on which the research focuses. In analyzing the effects of the educational system on the perpetuation of social inequality, for instance, the systems of education have to be gauged in their institutional contexts. The CASMIN project constructed a comparative scale of educational attainment that is used in the analysis of mobility (König & Müller, 1986; König, Lüttinger & Müller, 1988), but inevitably such scales permit only very rough distinctions between levels of educational certification. Differences between social systems, however, affect more than the measurement of socio-demographic variables. They also affect the possible interpretations of behavioral and attitudinal questions. The same overt behavior might have an entirely different significance under different societal circumstances.
For example, a question on respondents' participation in demonstrations is useless as a comparable measure of non-conventional political participation if one country has a democratic political regime and the other a dictatorship. Comparisons become even more confusing if, in the dictatorial regime, an obligation to participate in official mass demonstrations is institutionalized. Thus attempts to compare East and West Germany on the basis of participation in demonstrations could lead to the mistaken conclusion that non-conventional political participation was more common in the East. Moreover, former members of the ruling party in the GDR (Sozialistische Einheitspartei Deutschlands, SED) are over-represented among those respondents who report having participated in demonstrations. It would seem that respondents include the official demonstrations of the former GDR in their responses, but these demonstrations can hardly be viewed as non-conventional. On the other hand, former SED members were also over-represented in the anti-government demonstrations that took place after reunification, and these protest marches could indeed be interpreted as non-conventional participation. Thus comparisons are problematic not only because of cross-national differences but also because socio-political regimes within a nation differ across time. A similar problem occurs with attitudinal questions. Consider, for example, comparing national pride across a specified range of topics. Asked whether they take pride in their country's sporting achievements, respondents may lack pride for at least two reasons: either the country is unsuccessful in sports, or sporting achievements are not considered a valid source of pride. Similar ambiguities abound in other areas.
For instance, Germans might be less proud than Americans of their armed forces either because their military power is not commensurate with the German economy, or because Germans are more wary of taking pride in military strength. Extensive probing would be required to disentangle these interpretations. Moreover, interpretations may also vary across subgroups within countries, which further complicates cross-national comparisons. The sources of national pride are likely to be very different for the Welsh and the English (indeed, a source of national pride for the Welsh may be that they are NOT English), yet both are classed as "British" for the purposes of cross-national comparison.
A related issue, attitudes to "foreigners", illustrates these problems forcefully. This is not only because different groups of foreigners are present in different countries, but also because the circumstances under which they have come and the conditions attached to their stay are usually not identical. Thus a question on whether foreigners should be allowed to buy real estate may be interpreted in different ways and understood, for example, to refer to people who work in the country but do not have citizenship, to foreign non-residents who would like to buy a vacation home, or to foreign companies wanting to invest for profit. Moreover, the issue might be far more salient in one country than in another. Thus Norwegians may appear less tolerant than Germans in their responses to an item like this because they may think of the foreigners they expect to buy summer residences in large numbers, whereas Germans are less likely to associate the question with a concrete group of prospective house buyers. Different social realities affect not only the understanding of the question but also the meaning of the results. Thus, in order to evaluate the low support expressed by Germans for any increase in expenditure on education, account must be taken of the actual (and in this case high) expenditure on education. Otherwise the false conclusion could be reached that Germans do not see education as a high priority. Similarly, people in Sweden may disagree that the government should do more to ensure that fathers are able to take time off following the birth of their child, not because of any lack of egalitarianism, but because the legislative endorsement of paternity leave is already in place. The challenge in designing functionally equivalent measures is to capture comparable situations in the face of very different social realities.
3.2 The Impact of Social Change on Replication Issues
The main aim of many internationally comparative projects is the study of social change, pursued through a series of replicative surveys. There are good reasons for not changing the questionnaire and its constituent items, because doing so would make results less comparable across points in time and would break a time series. However, as suggested earlier, one factor which interferes with strict replication is change in social reality itself, which may devalue the questions used. Though change is what we want to monitor with replicative surveys, too much change can make old questions obsolete, as the issues originally alluded to in the items lose importance and new social problems require new or additional items. Replicating old questions might become impossible because the way they are understood in everyday communication changes, or because the old formulations become unacceptable in a changed environment (for example, terminology for minority groups or for geographical and administrative regions). Taking new structures into account (the new Germany, the former Yugoslavia) in a sense shifts the object of analysis, too. Just as the range of responses appropriate to a particular survey question and the phrasing of the items are likely to change as social norms change, so the appropriate range of responses is likely to differ across countries in which social change is progressing at different rates. For example, on an ISSP item regarding whether maternal employment has a negative impact on young children, the question format precludes the feminist position that maternal employment may have a beneficial impact. "Strongly disagreeing" that a pre-school child is likely to suffer is not the equivalent of asserting that a child is likely to benefit if the mother works.
Yet when the question was first framed in the 1960s (by the American feminist scholar Alice Rossi), to have asked about the positive advantages of maternal employment would have seemed silly. The dilemma facing survey researchers is whether to abide by the "dead hand of history", keeping the question wording and response options the same, in order to obtain comparable trend data, or
whether to track living history and permit the issue to be reformulated for the 1990s. Research projects such as ISSP adopt what seems a reasonably sensible compromise strategy: they keep the traditional items but supplement them with "modern" ones, with the option of dropping some of the traditional ones, or replicating them at longer intervals, once a new time series has been established.
3.3 Translation of Survey Questions
In cross-cultural surveys, sampling strategies, data collection strategies, and coding and analysis procedures are kept uniform by keeping them the same. We referred to some of the problems this raises in the foregoing sections. As soon as more than one language is involved, however, the option of the "same procedure" is not available. The issue then becomes how to produce questionnaires in different languages which are equivalent instruments. Our discussion of the issues involved here must perforce be brief. A number of procedures are commonly used for dealing with translation issues in surveys, notably back-translation and de-centering (Brislin, 1970, 1980, p. 431f.; 1986, p. 159f.). When first advocated for surveys, these were seen as methods for developing the master or source language questionnaire (SLQ) and the target (i.e. translated) language questionnaire (TLQ). The basic idea was that translating a draft SLQ provides information about the suitability of the items, their formulation, and the adequacy of their translation into the target language. Adjustments based on this information, i.e. modifications of the SLQ and TLQ, lead to source-language and translated items that are considered functionally equivalent in sociological terms and suitable for the cultures and languages involved in the study. De-centering repeats the back-translation process, the notion being that the end product in the source language after a series of back-translations retains a kernel meaning valid for both source and target language, which could then well constitute the final SLQ version. The growing need for translations in survey work is matched by a renewed interest in how to go about it, as reflected in the AAPOR 1994 session on questionnaire translation. Misgivings about the usefulness of back-translation on the part of survey research experts are not new (Rokkan, 1964; Scheuch, 1968).
Our own experience in developing International Social Survey Program (ISSP) and European Community Household Panel (ECHP, Scott, 1993) questionnaires leads us to echo these misgivings (Braun, 1993; Braun & Harkness, in preparation; Harkness, 1992, 1993). From a linguistic standpoint, too, back-translation and de-centering raise many more issues than they claim to solve (Harkness, in preparation). In the decades since these procedures were introduced to the survey world, developments in survey strategies and in translation studies have made new possibilities available for SLQ and TLQ development and translation. Recent developments in linguistics and translation studies, for example, give greater emphasis to the discourse aspect of language, that is, language as communication. And while the notion of fixed meanings for words has finally been shelved, a number of strategies are available to explain how, in a given context, we understand what we understand. These ideas could benefit both monolingual and multilingual questionnaire development (Harkness, in preparation). In reports on cross-cultural studies published in periodicals, we have sometimes found only scanty information about the procedures employed to ensure equivalence. References to back-translation, however, suggest that for some researchers the procedure has somehow achieved the status of an adequate and suitable procedure for testing TLQ (for details see Harkness, in preparation). Other projects reporting in detail on SLQ and TLQ development (and the difficulties met) testify to the energy and ingenuity of researchers in pursuing what appears to be a grail quest, but they also reflect the
absence of generally available, efficient, relatively speedy and affordable procedures and strategies (McKay & Lavallee, 1993; Hayashi et al., 1992). One obvious consequence is that the quality of questionnaire development and/or questionnaire translation stands or falls with the perseverance, expertise, timetable, and budget of the researchers involved. There is a growing need for cooperation in working towards a set of semi-streamlined procedures to reduce the problems involved in producing a multi-cultural, multi-language instrument. We take up various aspects of proposals towards this goal elsewhere (Braun & Harkness, in preparation; Borg & Harkness, in preparation; Harkness, 1994). In view of the space available, we will neither reiterate criticisms and proposals made by other researchers (Brislin, 1986; Küchler, 1987; Scheuch, 1968) nor really try to outline our own; instead, we simply point to a number of general factors hitherto neglected in the discussion of questionnaire translation: 1) The notions of equivalence in the field of translation studies differ from those in survey research even when the terminology is the same (conceptual equivalence, functional equivalence), and some consensus needs to be reached. 2) Insufficient communication between survey researchers and translation studies experts has resulted in a situation in which neither side appreciates the requirements and limitations that hold for the other. Researchers speak of literal or close translation, not realizing, it seems, that there are many aspects to which a translator (or translation) can be especially faithful and that achieving one kind of "equivalence" can mean excluding another. In parallel fashion, the translator remains uninformed about the notions of item equivalence relevant for the survey researcher.
This lack of understanding of requirements and expectations leads to an inadequate definition of the goals to be achieved in a translation, yet proper and fair assessment of a translation product requires that the goals to be met be established beforehand. Given this insufficient articulation of the potentials and restrictions which hold for both sides, those involved in SLQ and TLQ development are under-informed about: a) what can be produced, b) how to decide what should be produced, and c) how to assess what is produced. 3) Newer approaches to translation based on more recent models of communication have made little impact on how survey translation is actually undertaken. 4) (In connection with the foregoing) Little consideration is given to the fact that a questionnaire is also a text destined for discourse (Harkness, 1994). Thus text analysis and discourse analysis, and cross-cultural differences in these spheres, must be taken into account in the development of SLQ and TLQ. 5) Already available and emerging techniques in sociology could contribute positively to the construction and refinement of SLQ and TLQ. The AAPOR 1994 session on focus group contributions to instrument development considers one such example. On a different tack, facet analysis (Borg & Shye, in press) of items and dimensions could clarify decisions on equivalence for translations (Brislin, 1980, p. 423f.; Borg & Harkness, in preparation), while empirical research into response scale construction on a cross-cultural basis (as proposed in a new ZUMA-NORC project) can provide a more secure basis for maintaining or rejecting certain verbal and visual scale measurements across countries. 6) Too little cross-cultural cooperation is undertaken on the methodological issues involved here. Everyone does her or his best, but in the end few benefit from the experience of others.
One small step towards remedying this will be taken by the ISSP methodology group, which will begin coordinating international initiatives with feedback to the survey community. In sum, we are convinced that SLQ and TLQ development procedures can be improved, need to be improved, and that techniques are available to begin doing so. We are equally convinced that a cross-cultural cooperative initiative is needed to move forward.
3.4 Data Analysis
Data analysis may necessitate reappraisal of the functional equivalence of measures (Scheuch, 1968; Smith, 1988). Linguistic problems of measures are often intrinsically entangled with the social reality under investigation. It is therefore necessary to address the issue of functional equivalence not only during questionnaire construction and translation but also at the later stage of analysis. As a rule, analytical investigations require that a theoretical concept be operationalized by more than one indicator. The first step, then, is to check whether the cross-national ordering of the marginals or means is similar for the different indicators. If the ordering differs, this may suggest serious problems with the measures. If only one measure is used, it is important to realize that the assumption of uni-dimensionality may be invalid when comparisons are drawn across nations and time. Used with care, techniques such as cluster analysis, factor analysis, multidimensional scaling, and facet analysis are useful for assessing cross-national similarities and differences in the dimensionality of the items used. Multivariate analysis of cross-national data can help reveal certain anomalies of functional equivalence. The regression of attitude measures on socio-demographic variables such as age, education, or income, for example, could be used to help clarify the meaning of an attitudinal question. High correlations with age and education may suggest an ideological interpretation, while a correlation with income (net of education) makes a self-interest-based interpretation more likely. Even so, such interpretations are hard to disentangle, since relationships between variables may vary by country.
For instance, the fact that different educational groups in East Germany are closer to each other in the importance they attach to the family than the respective groups in West Germany could be due to a) a different meaning of the family in the two parts of Germany, b) a difference in the relationship between education and the importance rating of the family, or c) both of these factors. Moreover, the reasons for attaching importance to the family may vary considerably across nations. Thus the higher importance given to the family in East Germany may reflect the fact that the family is regarded as a refuge and support in times of social and economic hardship, rather than any attachment to traditional values per se (Braun, 1992; Braun, Scott & Alwin, 1994). Similar differences in interpretation apply to many attitudinal domains, and it would be false to assume that the same causal processes apply in different national contexts. In West Germany, for example, there is a marked tendency for the less well educated and the older cohorts to show a stronger familial orientation (presumably because this reflects a traditional ideological stance), whereas in the East the corresponding associations are much weaker. This illustrates again that what may seem to be the same attitude in each country may not only reflect different dimensions but also be associated with quite distinct causal processes, depending on the different social realities.
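The first diagnostic step described in this section, checking whether different indicators of the same concept order the countries in the same way, can be sketched in a few lines of Python. The text does not prescribe a statistic. The sketch assumes Spearman's rank correlation between the country orderings of each pair of indicators, and the indicator names, country labels and means are invented purely for illustration; they are not survey results.

```python
from typing import Dict, List, Tuple


def average_ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("an indicator gives every country the same mean")
    return cov / (vx * vy) ** 0.5


def ordering_agreement(
    means: Dict[str, Dict[str, float]]
) -> Dict[Tuple[str, str], float]:
    """Compare the cross-national ordering of means for every pair of indicators.

    `means` maps indicator -> {country: mean}; all indicators are assumed
    to cover the same set of countries.
    """
    indicators = sorted(means)
    countries = sorted(means[indicators[0]])
    result = {}
    for i, a in enumerate(indicators):
        for b in indicators[i + 1:]:
            xa = [means[a][c] for c in countries]
            xb = [means[b][c] for c in countries]
            result[(a, b)] = spearman(xa, xb)
    return result


# Hypothetical means for four unnamed countries (illustration only).
example_means = {
    "item_1": {"A": 3.1, "B": 2.4, "C": 3.8, "D": 2.9},
    "item_2": {"A": 2.9, "B": 2.2, "C": 3.5, "D": 2.6},
    "item_3": {"A": 2.0, "B": 3.3, "C": 2.5, "D": 3.0},
}

for pair, rho in ordering_agreement(example_means).items():
    print(pair, round(rho, 2))
```

With these made-up numbers, item_1 and item_2 order the four countries identically (rho = 1.0), whereas item_3 largely reverses that ordering (rho = -0.8). That is the kind of discrepancy that, as noted above, may signal problems with the measures. A rank-based coefficient fits here because the check concerns orderings rather than the size of differences. With only a handful of countries, such coefficients are very unstable.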
4 Conclusions - The Future
We conclude that both literal and conceptual replication are valid and desirable approaches to establishing functional equivalence. As with any research method, the applicability of one or the other approach depends on the concept and dimensions being assessed, and on the general research goal. Functional equivalence, it should be remembered, is not an aim in itself but a means to make meaningful comparisons. Whatever the approach, a focus on this set of issues at the point of conceptualizing the problem under study, at the point of drafting questionnaires,
Duane F. Alwin, Michael Braun, Janet Harkness, and Jacqueline Scott
and during the phase of data analysis, coupled with continuous feedback to theory, is necessary (Scheuch, 1968; Smith, 1988). As we noted at the outset, there is a great deal of optimism about the overall value of multi-national surveys in social science. Some observers see the prospects for conducting multi-national surveys as bright, given the growth in expertise in how to carry out such research. This is the view taken by Kish (in press), who suggests that, while multi-national surveys are not new, "... the deliberate design of valid and efficient multi-national surveys is new and it needs the attention of statisticians". Increasing opportunities are emerging for the proper design of multi-national surveys that permit more meaningful and valid cross-national comparisons. Others may be less optimistic, feeling there is more to it than the application of statistical knowledge to the design and execution of surveys. Some would place a higher priority on theoretical and conceptual development, seeking a richer nomological basis for multi-national research. Przeworski (1983), for example, argues that methodological advice is rarely followed by those wishing to draw conclusions from comparative studies, and that social researchers are too often more interested in collecting multi-national data than in testing theoretical propositions. It seems obvious to us that in social research generally there is a gap between the desirable and the possible: research designs often embody a series of compromises between the two (Kish, 1987). Comparative social research is no exception. However, we assume that as problems are discovered and attention is given to their solution, comparative researchers will learn how to make informed decisions regarding the key trade-offs involved. With time and the cumulation of knowledge, researchers will learn which compromises are worth making and which ones truly compromise the goals of the research endeavor.
Translation problems compromise some of the data in multi-national surveys. Such problems can be identified in the majority of inter-cultural comparative research (and differences in texts are in any case inevitable). However, since documentation of the problems noted is at best available only to insiders, there is a real need not only to improve translation procedures but also to document problems for external users and to develop strategies for replacing faulty items in replications. International comparative research will, we hope, also improve our understanding of what questions mean, lead to more adequate interpretation of results, and inform question design in general; all of these issues are relevant to single-nation studies, too. Increased sensitivity to problems of functional equivalence is of long-term methodological benefit. The researcher becomes aware of differences in meaning resulting from even slight modifications in the formulation of questions, and of possible violations of uni-dimensionality within a single country. It should by now be sufficiently clear that a naive approach to multi-national survey data will not suffice. However, while enormous problems remain to be solved, this should not discourage those who wish to get involved with this extremely fruitful strand of research. And while many lessons of multi-national research need to be learned through concrete experience, it is also clear that errors can serve to inform the development of new hypotheses and research ideas. Thus the recognition of such problems can serve as a basis for future progress.
Notes
1 The International Social Survey Programme (see Braun, in press; Davis et al., in this volume) is an annual, ongoing project with members in twenty-two countries to date. The surveys deal with different topics of interest to social science research, with a view to replication every five years or so. The questionnaires are developed in English on the basis of team work. A drafting group consisting of members from 4-5 countries drafts a source language questionnaire. The members comment on revised versions of this throughout the year until a final draft version is discussed, modified and ratified at the annual general assembly. All target language versions are prepared on the basis of the source language questionnaire. Members adhere to stipulated sampling, fielding, coding and data editing procedures and archive their data with the Zentralarchiv in Cologne within fixed deadlines. As membership has grown, increasing the complexity of coordinating and cooperating on instrument design, so, too, has recognition of the importance of an adequately cross-cultural source language questionnaire, which in turn depends on sufficient input from members outside the language and culture of the source language questionnaire. The ISSP set up a methodology group in 1993 to address these issues.
References
Alwin, D.F. (1992, June). The application of structural equation models in comparative research. Paper presented at the ISA Conference on Methodology, Trento.
Alwin, D.F., Braun, M. & Scott, J. (1992). The separation of work and the family: attitudes towards women's labor-force participation in Germany, Great Britain, and the United States. European Sociological Review 8, 13-37.
Alwin, D.F., Cohen, R.L. & Newcomb, Th. M. (1991). Political attitudes over the life span. Madison: University of Wisconsin Press.
Borg, I. & Harkness, J. (in preparation). Approaches to questionnaire development and translation: facet analysis and frame-and-scene semantics.
Borg, I. & Shye, S. (in press). Facet Theory: Form and Content. Newbury Park, CA: Sage.
Braun, M. (1992). Arbeitsplatzunsicherheit und die Bedeutung des Berufs. In W. Glatzer & H.H. Noll, Lebensverhältnisse in Deutschland: Ungleichheit und Angleichung (pp. 75-88). Frankfurt: Campus.
Braun, M. (1993, May). Potential problems of functional equivalence in ISSP 88 (Family and Changing Gender Roles). Paper presented at the Scientific Meeting of the International Social Survey Program, Chicago.
Braun, M. (in press). The International Social Survey Programme (ISSP). In P. Flora, F. Kraus, H.H. Noll & F. Rothenbacher, Social Statistics and Social Reporting in and for Europe. Europe in comparison - a series of guidebooks for the social sciences, Vol. I. Bonn: Informationszentrum Sozialwissenschaften.
Braun, M. & Harkness, J. (in preparation). Data-based and text-based approaches to survey translations.
Braun, M. & Müller, W. (forthcoming). Measurement of education in comparative research. In T. Kolosi (Ed.), Handbook of Methodology.
Braun, M., Scott, J. & Alwin, D.F. (1994). Economic necessity or self-actualization? Attitudes toward women's labor-force participation in East and West Germany. European Sociological Review 10.
Brislin, R.W. (1970). Back-translation for cross-cultural research. Journal of Cross-Cultural Psychology 1, 185-216.
Brislin, R.W. (1980). Translation and content analysis of oral and written material. In H.C. Triandis & J.W. Berry (Eds.), Handbook of Cross-cultural Psychology, Vol. 2 (pp. 389-444). Boston: Allyn and Bacon.
Brislin, R.W. (1986). The wording and translation of research instruments. In W.J. Lonner & J.W. Berry (Eds.), Field Methods in Cross-Cultural Research (pp. 137-164). Beverly Hills: Sage.
Davis, J.A. & Jowell, R. (1989). Measuring national differences - An introduction to the International Social Survey Programme. In R. Jowell et al., British Social Attitudes - Special International Report (pp. 1-13). Aldershot: Gower.
Duncan, O.D. (1975). Does Money Buy Satisfaction? Social Indicators Research 2, 67-74.
Duncan, G., Gustafson, B., Hauser, R., Schmauss, G., Messinger, H., Muffels, R., Nolan, B. & Ray, J.-C. (1993). Poverty dynamics in eight countries. Journal of Population Economics 6, 215-234.
Easterlin, R.A. (1975). Does economic growth improve the human lot? Some empirical evidence. In P.A. David & M.W. Reder (Eds.), Nations and Households in Economic Growth (pp. 89-125). New York: Academic Press.
Erikson, R. & Goldthorpe, J.H. (1992). The Constant Flux: A Study of Class Mobility in Industrial Societies. Oxford: Clarendon Press.
Form, W.H. (1979). Comparative industrial sociology and the convergence hypothesis. Annual Review of Sociology 5, 1-25.
Harkness, J. (1992, September). Something rotten in the state of Norway?: Translation Studies and the translation of survey questionnaires. Paper presented at the Congress of the German Sociological Association, Düsseldorf.
Harkness, J. (1993, May). Mountains and molehills: response scales in cross-cultural survey questionnaires. Paper presented at the Annual Meeting of the American Association for Public Opinion Research, St. Charles, IL.
Harkness, J. (1994, July). Talking back to questionnaires: the questionnaire as text and discourse. Paper presented at the World Congress of Sociology, Bielefeld.
Harkness, J. (in preparation). Backing up and backing down on cross-cultural questionnaire translation.
Hayashi, C., Suzuki, T. & Sasaki, M. (1992). Data Analysis for Comparative Social Research: International Perspectives. Amsterdam: North-Holland.
Inglehart, R. (1977). The Silent Revolution: Changing Values and Political Styles among Western Publics. Princeton, NJ: Princeton University Press.
Inglehart, R. (1979). Value priorities and socioeconomic change. In S.H. Barnes, M. Kaase et al. (Eds.), Political Action: Mass Participation in Five Western Democracies (pp. 305-342). London: Sage.
Inglehart, R. (1990). Culture Shift in Advanced Industrial Society. Princeton, NJ: Princeton University Press.
International Labour Office (1990). International Standard Classification of Occupations: ISCO-88. Geneva: International Labour Office.
Kaplan, A. (1964). The Conduct of Inquiry. San Francisco: Chandler.
Kish, L. (1987). Statistical Design for Research. New York: John Wiley.
Kish, L. (in press). Multipopulation survey design: five types with seven shared aspects. International Statistical Review.
König, W., Lüttinger, P. & Müller, W. (1988). Comparative analysis of the development and structure of educational systems: methodological foundations and the construction of a comparative educational scale. CASMIN Working Paper No. 12. Mannheim: Institut für Sozialwissenschaften.
König, W. & Müller, W. (1986). Educational systems and labor markets as determinants of worklife mobility in France and West Germany: a comparison of men's career mobility, 1965-1970. European Sociological Review 2, 73-96.
Kohn, M.L. (1987). Cross-national research as an analytic strategy. American Sociological Review 52, 713-731.
Kohn, M.L. & Slomczynski, K.M. (1990). Social Structure and Self-Direction: A Comparative Analysis of the United States and Poland. Cambridge, MA: Basil Blackwell.
Küchler, M. (1987). The utility of surveys for cross-national research. Social Science Research 16, 229-244.
Lykken, D. (1968). Statistical significance in psychological research. Psychological Bulletin 70, 151-159.
Maslow, A.H. (1954). Motivation and Personality. New York: Harper & Row.
McKay, R.B. & Lavallee, A.P. (1993, May). The Hispanic version of the redesigned CPS questionnaire: applying sociolinguistic and survey research methods to translation survey questionnaires. Paper presented at the Annual Meeting of the American Association for Public Opinion Research, St. Charles, IL.
Przeworski, A. (1983). Methods of cross-national research, 1970-1983: an overview. Paper presented at the Forum on Cross National Policy Research, Science Center Berlin, Berlin.
Przeworski, A. & Teune, H. (1970). The Logic of Comparative Social Inquiry. New York: Wiley.
Rokkan, St. (1964). Comparative cross-national research: the context of current efforts. In R.L. Merritt & St. Rokkan (Eds.), Comparing Nations: The Use of Quantitative Data in Cross-National Research (pp. 3-25). New Haven, CT: Yale University Press.
Ross, W.D. (1964). Aristotle. London: Methuen.
Scheuch, E.K. (1968). The cross-cultural use of sample surveys: problems of comparability. In St. Rokkan (Ed.), Comparative Research Across Cultures and Nations (pp. 176-209). Paris: Mouton.
Scott, J. (1993). The national report of the UK. In B. von Rosenbladt & S. Reifenrath (Eds.), ECHP Pilot Survey: implementation of wave 1, Technical Report. Luxembourg: Statistical Office of the European Communities.
Smith, T.W. (1988). The ups and downs of cross-national survey research. GSS Cross-National Report No. 8. Chicago: NORC.
Treiman, D.J. (1977). Occupational Prestige in Comparative Perspective. New York: Academic Press.
Cognitive and Communicative Aspects of Survey Measurement
Norbert Schwarz, Herbert Bless, Hans-J. Hippler, Fritz Strack, and Seymour Sudman
1 Introduction
That survey data are only as meaningful as the answers that respondents provide has long been recognized by survey researchers. Nevertheless, survey methodology has traditionally been characterized by rigorous theories of sampling on the one hand and the so-called "art of asking questions" on the other. Only recently have the cognitive and communicative processes underlying question answering in surveys received theoretical attention. This development reflects increasing collaboration among cognitive and social psychologists and survey methodologists, which was initiated by two conferences. One was held under the auspices of the U.S. National Academy in the fall of 1983 and focused primarily on behavioral reports in surveys (see Jabine et al., 1984). The other was held at ZUMA in the summer of 1984 and focused primarily on issues of attitude measurement (see Hippler, Schwarz, & Sudman, 1987). In the ten years since these initial conferences, work on cognitive aspects of survey measurement has developed at a rapid pace: several edited volumes have been published (e.g., Jobe & Loftus, 1991; Schwarz & Sudman, 1992, 1994; Tanur, 1992) and a first textbook is in press (Sudman, Bradburn, & Schwarz, in press). Moreover, several major survey centers in the U.S. and in Europe have established cognitive laboratories to help with questionnaire development, and the first degree-granting program in survey methodology (the University of Maryland-University of Michigan Joint Program in Survey Methods) requires courses in cognitive and social psychology as part of its curriculum. Drawing on psychological theories of language comprehension, memory, and judgment, researchers have begun to formulate explicit models of the question answering process and have tested these models in tightly controlled laboratory experiments and split-ballot surveys.
This work links survey methodologists' expertise in the "art of asking questions" to recent developments in cognitive science, thus providing a useful theoretical and empirical basis for understanding the processes by which survey respondents arrive at an answer. Below we review key aspects of this interdisciplinary research. Following an initial discussion of respondents' tasks, we focus on issues of question comprehension and attitude measurement. Due to space constraints, we do not address issues of autobiographical memory and the validity of behavioral reports, which represent the third major area of research in this interdisciplinary field (see Bradburn, Rips, & Shevell, 1987; Schwarz, 1990, for reviews, and Schwarz & Sudman, 1994 for research examples).
2 Respondents' Tasks
From a cognitive perspective, answering a survey question requires that respondents perform several tasks (see Strack & Martin, 1987; Tourangeau, 1984; Tourangeau & Rasinski, 1988, for a more detailed discussion). As a first step, respondents have to interpret the question to understand what is meant. If the question is an opinion question, they may either retrieve a previously formed opinion from memory or "compute" an opinion on the spot. To do so, they need to retrieve relevant information from memory to form a mental representation of the target that they are to evaluate. In most cases, they will also need to retrieve or construct some standard against which the target is evaluated. Once a "private" judgment is formed in their mind, respondents have to communicate it to the researcher. To do so, they may need to format their judgment to fit the response alternatives provided as part of the question. Moreover, respondents may wish to edit their response before communicating it, owing to considerations of social desirability and situational adequacy. Accordingly, interpreting the question, generating an opinion or a representation of the relevant behavior, formatting the response, and editing the answer are the main psychological components of a process that starts with respondents' exposure to a survey question and ends with their overt report. In addition, the processes involved in answering a question may themselves change respondents' cognitive representation of the issue addressed and may affect their subsequent behavior (see Feldman & Lynch, 1988).
3 Question Comprehension
The key issue at the question comprehension stage is whether the respondent's understanding of the question matches what the researcher had in mind: Is the attitude object, or the behavior, that the respondent identifies as the target of the question the one that the researcher intended? Does the respondent's understanding tap the same facet of the issue and the same evaluative dimension? From a psychological point of view, question comprehension reflects the operation of two intertwined processes (cf. Clark & Schober, 1992; Schwarz, 1994; Strack, in press; Strack & Schwarz, 1992). The first refers to the semantic understanding of the utterance. If the words used are ambiguous or unfamiliar, for example, respondents need to disambiguate their meaning. However, understanding the words is not sufficient to answer a question. For example, if respondents are asked, "What have you done today?", they are likely to understand the meaning of the words. However, they still need to determine what kind of activities the researcher is interested in. Should they report, for example, that they took a shower, or not? Hence, understanding a question in a way that allows an appropriate answer requires not only an understanding of its literal meaning but also inferences about the questioner's intention, which determine the pragmatic meaning of the question. To understand how respondents infer the intended meaning of a question, we need to consider the assumptions that govern the conduct of conversation in everyday life. These tacit assumptions were systematically described by Paul Grice (1975), a philosopher of language (see Clark & Schober, 1992; Schwarz, 1994; Schwarz & Hippler, 1991; Strack, in press; Strack & Schwarz, 1992, for applications to survey research). According to Grice's analysis, conversations proceed according to a cooperative principle. This principle can be expressed in the form of four maxims.
There is a maxim of quality that enjoins speakers not to say anything they believe to be false or for which they lack adequate evidence, and a maxim of relation that enjoins speakers to make their contribution relevant to the aims of the ongoing conversation. In addition, a maxim of quantity requires speakers to make their contribution as informative as is required, but not more informative than is required, while a maxim of manner holds that the contribution should be clear rather than obscure, ambiguous or wordy. In other words, speakers should try to be informative, truthful, relevant, and clear. As a result, "communicated information comes with a guarantee of relevance" (Sperber & Wilson, 1986, p. vi) and listeners interpret speakers' utterances "on the assumption that they are trying to live up to these ideals" (Clark & Clark, 1977, p. 122). These tacit assumptions have important implications for survey research.
3.1 Response Alternatives
Suppose, for example, that respondents are asked in an open response format, "What have you done today?" To give a meaningful answer, respondents have to determine which activities may be of interest to the researcher. In an attempt to be informative, respondents are likely to omit activities that the researcher is obviously aware of, such as "I gave a survey interview", or may take for granted anyway, such as "I took a shower". If respondents were given a list of activities that included giving an interview and taking a shower, most respondents would endorse them. At the same time, however, such a list would reduce the likelihood that respondents report activities that are not represented on the list (see Schuman & Presser, 1981; Schwarz & Hippler, 1991, for a review of relevant studies). Both of these question form effects reflect the fact that response alternatives can clarify the intended meaning of a question, in the present example by specifying the activities the researcher is interested in. Whereas this example may seem rather obvious, more subtle influences are frequently overlooked. Suppose that respondents are asked how frequently they felt "really irritated" recently. To answer this question, they again have to determine what the researcher means by "really irritated". Does this term refer to major or to minor annoyances? To identify the intended meaning of the question, they may consult the response alternatives provided by the researcher. If the response alternatives present low frequency categories, e.g., ranging from "less than once a year" to "more than once a month", they may conclude that the researcher has relatively rare events in mind and that the question cannot refer to minor irritations, which are likely to occur more often.
In line with this assumption, Schwarz, Strack, Müller, and Chassein (1988) observed that respondents who had to report the frequency of irritating experiences on a low frequency scale assumed that the question referred to major annoyances, whereas respondents who had to give their report on a high frequency scale assumed that the question referred to minor annoyances. Thus, respondents identified different experiences as the target of the question, depending on the frequency range of the response alternatives provided to them. Similarly, Schwarz, Knäuper, Hippler, Noelle-Neumann, and Clark (1991) observed that respondents may use the specific numeric values provided as part of a rating scale to interpret the meaning of the scale's labels. In their study, a representative sample of German adults was asked, "How successful would you say you have been in life?" This question was accompanied by an 11-point rating scale, ranging from "not at all successful" to "extremely successful". However, in one condition the numeric values of the rating scale ranged from 0 ("not at all successful") to 10 ("extremely successful"), whereas in the other condition they ranged from -5 ("not at all successful") to +5 ("extremely successful"). The results showed a dramatic impact of the numeric values presented to respondents. Whereas 34 percent of the respondents endorsed a value between 0 and 5 on the 0 to 10 scale, only 13 percent endorsed one of the formally equivalent values between -5 and 0 on the -5 to +5 scale. Subsequent experiments indicated that this difference reflects differential interpretations of the term "not at all successful". When this label was combined with the numeric value "0", respondents interpreted it to reflect the absence of success. However, when the same label was combined with the numeric value "-5", and the scale offered "0" as the mid-point, they interpreted it to reflect the presence of failure. This differential interpretation of the same term as a function of its accompanying numeric value was also reflected in the inferences that judges drew from a report given along a rating scale. For example, in one experiment, a fictitious student reported his academic success along one of the above scales, checking either a "-4" or a "2". As expected, judges who were asked to estimate how often this student had failed an exam assumed that he failed twice as often when he checked a "-4" than when he checked a "2", although both values are formally equivalent along 11-point rating scales of the type described above. In combination, these findings demonstrate that respondents use the response alternatives in interpreting the meaning of a question. In doing so, they proceed on the tacit assumption that every contribution is relevant to the aims of the ongoing conversation. In the survey interview, these contributions include apparently formal features of questionnaire design, such as the numeric values given on a rating scale. Hence, identically worded questions may acquire different meanings, depending on the response alternatives by which they are accompanied (see Schwarz & Hippler, 1991, for a more extended discussion).
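The sense in which the two versions of the "success in life" scale are "formally equivalent" can be made concrete with a small Python sketch. This is our own illustration, not a procedure from the study, and the function name `map_scale` and its defaults are hypothetical. It maps each point of the 0-to-10 version linearly onto the -5-to-+5 version, showing that the values 0 to 5 (endorsed by 34 percent) correspond one-to-one to the values -5 to 0 (endorsed by only 13 percent).

```python
def map_scale(value, src=(0, 10), dst=(-5, 5)):
    """Map a point on one rating scale linearly onto another scale.

    Assumes both scales have the same number of equally spaced points
    and the same verbal endpoint labels, so that only the printed
    numeric values differ.
    """
    src_lo, src_hi = src
    dst_lo, dst_hi = dst
    if not src_lo <= value <= src_hi:
        raise ValueError(f"{value} lies outside the source scale {src}")
    # Multiply before dividing so that integer inputs map to exact values.
    return dst_lo + (value - src_lo) * (dst_hi - dst_lo) / (src_hi - src_lo)


# The values 0..5 on the 0-to-10 version ...
lower_half_unipolar = list(range(0, 6))
# ... correspond point for point to -5..0 on the -5-to-+5 version.
print([map_scale(v) for v in lower_half_unipolar])
# prints [-5.0, -4.0, -3.0, -2.0, -1.0, 0.0]
```

Here the mapping is a simple shift: subtract 5. The arithmetic therefore offers no reason for the response distributions to differ. This is consistent with the interpretation above that the difference arises from how respondents read the label "not at all successful" in the light of its numeric value.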
3.2 Question Context
Respondents' interpretation of a question's intended meaning is further influenced by the context in which the question is presented. Not surprisingly, the more ambiguous the wording of the question, the more pronounced this influence is. As an extreme case, consider research in which respondents are asked to report their opinion about a highly obscure, or even completely fictitious, issue, such as the "Agricultural Trade Act of 1978" (e.g., Bishop, Tuchfarber, & Oldendick, 1986; Schuman & Presser, 1981). Questions of this type reflect public opinion researchers' concern that the "fear of appearing uninformed" may induce "many respondents to conjure up opinions even when they had not given the particular issue any thought prior to the interview" (Erikson, Luttberg, & Tedin, 1988, p. 44). To explore how meaningful respondents' answers are, survey researchers introduced questions about issues that do not exist. Presumably, respondents' willingness to report an opinion on a fictitious issue casts some doubt on the reports provided in survey interviews in general. In fact, about 30% to 50% of respondents typically provide an answer to issues invented by the researcher. This has been interpreted as evidence of social pressure that induces respondents to give meaningless answers in the absence of any knowledge. From a conversational point of view, however, these responses may be more meaningful than has typically been assumed. The sheer fact that a question about some issue is asked presupposes that this issue exists, or else asking a question about it would violate every norm of conversational conduct. Respondents have no reason to assume that the researcher would ask meaningless questions and will hence try to make sense of the question.
If the question is highly ambiguous and the interviewer does not provide additional clarification, respondents are likely to turn to the context of the ambiguous question to determine its meaning, much as they would be expected to do in any other conversation. Once respondents have assigned a particular meaning to the issue, thus transforming the fictitious issue into a better defined issue that makes sense in the context of the interview, they may have no difficulty in reporting a subjectively meaningful opinion. Even if they have not given the particular issue much thought, they may easily identify the broader set of issues to which this particular one apparently belongs. If so, they can use their general attitude toward the broader set of issues to determine their attitude toward this particular one. An experimental survey on educational policies may illustrate this point. In this study, Strack, Schwarz, and Wänke (1991, Experiment 1) asked a sample of German college students to report their attitude toward the German government's alleged plan to introduce an "educational contribution". For some subjects, this target question was preceded by a question that asked them to estimate the average tuition fees that students have to pay at US universities (in contrast to Germany, where university education is free). Others had to estimate the amount of money that the Swedish government pays every student as financial support. As expected, students' attitude toward an "educational contribution" was more favorable when the preceding question referred to money that students receive from the government than when it referred to tuition fees that students have to pay. Subsequently, respondents were asked what the "educational contribution" implied. Content analyses of respondents' definitions of the fictitious issue clearly demonstrated that respondents used the context of the "educational contribution" question to determine its meaning. Thus, respondents turned to the content of related questions to determine the meaning of an ambiguous one. In doing so, they interpreted the ambiguous question in a way that made sense of it, and subsequently provided a subjectively meaningful response to their definition of the question. This finding stands in stark contrast to the assumption that responses to ill-defined terms are largely random in nature, representing a "mental flip of coin", as Converse (1964) and other early researchers hypothesized.
As Strack et al.'s (1991) results indicate, the assumption of random responding does not capture the underlying process. What is at the heart of reported opinions about fictitious issues is not that respondents are willing to give subjectively meaningless answers by flipping a coin, but that researchers violate conversational rules by asking meaningless questions in a context that suggests otherwise. Respondents, however, have no reason to suspect this may be the case and work hard at making sense of the question asked. To do so, they draw on the context of the question, much as they would be expected to do in any other conversation.
3.3
Summary
As the preceding examples illustrate, question comprehension is not primarily an issue of understanding the literal meaning of an utterance. Rather, question comprehension involves extensive inferences about the speaker's intentions to determine the pragmatic meaning of the question. To make these inferences, respondents draw on the nature of preceding questions as well as the response alternatives. Accordingly, survey methodologists' traditional focus on using the "right words" in questionnaire writing needs to be complemented by a consideration of the conversational processes involved in the question answering process.
4
Attitude Measurement and the Emergence of Context Effects
That attitude measurement is context dependent is no news to survey researchers. Many studies have demonstrated that preceding questions may influence the responses given to subsequent
Cognitive and Communicative Aspects of Survey Measurement
ones (see Schuman & Presser, 1981; Schwarz & Strack, 1991; Schwarz & Sudman, 1992; Tourangeau & Rasinski, 1988, for research examples and reviews). However, the conditions under which context effects may emerge are not well understood - and when they emerge, it has typically been difficult to predict their direction. In recent years, considerable progress has been made in this domain and several related conceptual models have been offered (Feldman & Lynch, 1988; Schwarz & Bless, 1992; Schwarz & Strack, 1991; Strack & Martin, 1987; Tourangeau & Rasinski, 1988). Below we draw on Schwarz and Bless' (1992a) mental construal model, which specifies the conditions under which question order effects emerge and predicts their direction, their size, and their generalization across related issues.
4.1
The Construal of Targets and Standards
The model assumes that individuals who are asked to form a judgment about some target stimulus first need to retrieve some cognitive representation of it. In addition, they need to determine some standard of comparison to evaluate the stimulus. Both the representation of the target stimulus and the representation of the standard are, in part, context dependent. Individuals do not retrieve all knowledge that may bear on the stimulus, nor do they retrieve and use all knowledge that may potentially be relevant to constructing a standard. Rather, they rely on the subset of potentially relevant information that is most accessible at the time of judgment (see Bodenhausen & Wyer, 1987; Higgins, 1989). Accordingly, their temporary representation of the target stimulus, as well as their construction of a standard of comparison, includes information that is chronically accessible, and hence context independent, as well as information that is only temporarily accessible, due to contextual influences. Whereas differences in the chronic accessibility of information reflect respondent characteristics, differences in the temporary accessibility of information are primarily due to questionnaire variables. Most importantly, information that has been used for answering a preceding question is particularly likely to come to mind when respondents are later asked a related question to which it may be relevant. How the information that comes to mind influences the judgment depends on how it is categorized, that is, on whether it is used to construct a representation of the target or a representation of the standard or scale anchor against which the target is evaluated.
4.1.1
Assimilation Effects
Information that is included in the temporary representation that individuals form of the target category results in assimilation effects, because the judgment is based on the information included in the representation used. Accordingly, the addition of information with positive implications results in a more positive judgment, whereas the addition of information with negative implications results in a more negative judgment. The size of assimilation effects increases with the amount and extremity of the temporarily accessible information included in the representation of the target, and decreases with the amount and extremity of the chronically accessible information included in it. Hence, we would expect respondents who are experts on a given issue to show less pronounced assimilation effects than novices, because experts can draw on a larger set of chronically accessible information, which in turn reduces the impact of adding a given piece of temporarily accessible information. Note, however, that expert status needs to be defined with regard to the specific issue at hand. Global variables, such as years of schooling, are unlikely to moderate the size of assimilation effects, unless they are confounded with the amount of knowledge regarding the issue under
Norbert Schwarz, Herbert Bless, Hans-J. Hippler, Fritz Strack, and Seymour Sudman
consideration. Thus, it comes as little surprise that formal education has been found to show inconsistent relationships with the emergence and size of context effects. By the same token, the impact of a given piece of information that is brought to mind by a preceding question is reduced the more additional information is brought to mind by other context questions. Hence, the impact of a given question decreases as the number of related context questions increases (e.g., Schwarz, Strack, & Mai, 1991).
4.1.2
Contrast Effects
According to the model, the same piece of information that elicits an assimilation effect may also result in a contrast effect. This is the case when the information is excluded from, rather than included in, the cognitive representation formed of the target. As a first possibility, suppose that a given piece of information with positive (negative) implications is excluded from the representation of the target category. If so, the representation will contain less positive (negative) information, resulting in less positive (negative) judgments. The size of such a subtraction based contrast effect increases with the amount and extremity of the temporarily accessible information that is excluded from the representation of the target, and decreases with the amount and extremity of the information that remains in the representation of the target. Hence, we would again expect, for example, that experts show less pronounced subtraction based contrast effects, reflecting that a larger amount of chronically accessible information is used in constructing the representation of the target in the first place. As a second possibility, respondents may not only exclude accessible information from the representation formed of the target, but may also use this information in constructing a standard of comparison or scale anchor. If the implications of the temporarily accessible information are more extreme than the implications of the chronically accessible information used in constructing a standard or scale anchor, this process results in a more extreme standard, eliciting contrast effects for that reason. The size of these comparison based contrast effects increases with the extremity and amount of temporarily accessible information used in constructing the standard or scale anchor, and decreases with the amount and extremity of chronically accessible information used in making this construction. 
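The qualitative relations described above can be made concrete with a small numerical sketch. It assumes, purely for illustration, that a judgment equals the average value of the information in the target representation minus the average value of the information in the standard. This averaging rule, the function names `evaluate`, `assimilation_effect`, `subtraction_contrast`, and `comparison_contrast`, and all numerical values are hypothetical; they are not part of the model as stated in this chapter.

```python
from statistics import mean

# Hypothetical values on an arbitrary evaluative scale (illustration only).
STANDARD = [0.5, 0.5]  # chronically accessible information used for the standard
PRIMED = 3.0           # extreme positive item brought to mind by a context question


def evaluate(target, standard):
    """Assumed averaging rule: mean of the target representation minus
    mean of the standard of comparison."""
    return mean(target) - mean(standard)


def assimilation_effect(chronic_target):
    """Inclusion: the primed item is added to the target representation."""
    included = evaluate(chronic_target + [PRIMED], STANDARD)
    return included - evaluate(chronic_target, STANDARD)


def subtraction_contrast(chronic_target_with_item):
    """Exclusion without comparison: an item that would otherwise be part of
    the target representation is removed; the standard is left unchanged."""
    remaining = list(chronic_target_with_item)
    remaining.remove(PRIMED)  # drop one occurrence of the excluded item
    return evaluate(remaining, STANDARD) - evaluate(chronic_target_with_item, STANDARD)


def comparison_contrast(chronic_target):
    """Exclusion plus comparison: the primed item is used to build the standard."""
    shifted = evaluate(chronic_target, STANDARD + [PRIMED])
    return shifted - evaluate(chronic_target, STANDARD)


if __name__ == "__main__":
    novice = [0.0, 1.0]      # little chronically accessible information
    expert = [0.0, 1.0] * 3  # more chronically accessible information, same mean
    print("assimilation:", assimilation_effect(novice), assimilation_effect(expert))
    print("subtraction:", subtraction_contrast(novice + [PRIMED]),
          subtraction_contrast(expert + [PRIMED]))
    print("comparison:", comparison_contrast(novice), comparison_contrast(expert))
```

With these invented numbers, the assimilation effect is 5/6 for the "novice" and 5/14 for the "expert", and the subtraction based contrast effects are their mirror images, in line with the prediction that more chronically accessible information dilutes both kinds of effect. The comparison based contrast effect (-5/6) is the same for every target under this rule, because it operates only through the shared standard; adding more chronically accessible items to `STANDARD` would shrink it.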
Which of these processes drives the emergence of a contrast effect determines whether the contrast effect is limited to a single target or generalizes across related targets. If the contrast effect is based on the mere subtraction of information from the representation formed of the target, it is limited to the evaluation of this particular target, because the evaluation is based on the information "left" in the representation of the target. If the information that is excluded from the representation of the target is used in constructing a standard of comparison or scale anchor, on the other hand, contrast effects are likely to emerge on each judgment to which this standard or scale anchor is relevant. That the model provides two related mechanisms for the emergence of contrast effects raises the question of the conditions under which we are likely to obtain subtraction based or comparison based contrast effects. Information that is excluded from the representation of the target is used in constructing a standard of comparison or scale anchor only if it has been thought about with regard to the relevant judgmental dimension. An empirical example may clarify this point. Schwarz, Münkel, and Hippler (1990) asked respondents to rate a number of beverages according to how "typically German" they are. In one condition, this task was preceded by a question about how frequently Germans drink beer or, in another version, vodka. If the preceding question referred to the consumption of vodka, an atypical drink for Germans, the subsequent beverages were rated as more typically German than if the preceding consumption question referred to beer. Other respondents were also asked a question about vodka or beer. However, they had to estimate the caloric content of the drink rather than the frequency of its consumption. In this case, the subsequent typicality ratings were unaffected by the context question. In combination, this pattern of findings indicates that respondents used the highly accessible drinks in constructing a standard or scale anchor only when the question that brought these drinks to mind tapped the underlying dimension of frequency of consumption that is crucial to typicality judgments. If respondents thought about these drinks with regard to some other dimension, here their caloric content, the drinks were not used in constructing a standard or scale anchor, despite their high accessibility in memory. In summary, information that is excluded from the representation of the target category results in subtraction based contrast effects if it has not been thought about with regard to the underlying dimension of judgment. If it has been thought about with regard to the relevant dimension, the excluded information is likely to be used in constructing a standard or scale anchor, resulting in comparison based contrast effects. Whereas subtraction effects are limited to the evaluation of the target from which the information is excluded, comparison based contrast effects generalize to the evaluation of every target to which the standard or scale anchor is relevant.
4.1.3
What Triggers the Exclusion of Information?
The model assumes that the default operation is to include information that comes to mind in the representation of the target. This suggests that assimilation effects should be more common than contrast effects in survey research, an issue that meta-analyses should address. The exclusion of information, in contrast, needs to be triggered by salient features of the question answering process. In principle, any variable that affects the categorization of information is likely to affect the emergence of assimilation and contrast effects, linking the present model to cognitive research on categorization processes in general. Schwarz and Bless (1992a) review a host of heterogeneous variables that have been shown to affect context effects in social judgment. Below, we address the ones that are of particular relevance to survey research. These variables can be conceptualized as bearing on three decisions that respondents have to make with regard to the information that comes to mind. As shown in Figure 1, some information that comes to mind may simply be irrelevant, pertaining to issues that are unrelated to the question asked. Other information may potentially be relevant to the task at hand, and respondents have to decide what to do with it. The first decision bears on why this information comes to mind. Information that seems to come to mind for the "wrong reason", e.g., because respondents are aware of the potential influence of a preceding question, is likely to be excluded (e.g., Lombardi, Higgins, & Bargh, 1987; Ottati, Riggle, Wyer, Schwarz, & Kuklinski, 1989; Strack, Schwarz, Bless, Kübler, & Wänke, 1993). The second decision bears on whether the information that comes to mind "belongs to" the target category or not.
The content of preceding questions (e.g., Schwarz & Bless, 1992a), the width of the target category (e.g., Schwarz & Bless, 1992b), the extremity of the information (e.g., Herr, 1986), or its representativeness for the target category (e.g., Strack, Schwarz, & Gschneidinger, 1985) are relevant at this stage. Finally, conversational norms may determine respondents' perception of what they are supposed to do with highly accessible information (e.g., Schwarz, Strack, & Mai, 1991; Strack, Martin, & Schwarz, 1988).
Judgmental task: construct a representation of the target category.
May the information that comes to mind bear on the task?
  No: ignore it (no effect).
  Yes: determine what to do with it:
    a. Does it come to mind due to irrelevant influences? (e.g., awareness of priming episode) Yes: EXCLUDE. No: go to b.
    b. Does it "belong" to the category? (e.g., representativeness, category width, explicit categorization) No: EXCLUDE. Yes: go to c.
    c. Am I intended to use it? (conversational norms) No: EXCLUDE. Yes: INCLUDE.
INCLUDE in temporary representation of target category: ASSIMILATION EFFECT.
EXCLUDE from temporary representation of target category. Does it bear on the dimension of judgment?
  No: CONTRAST EFFECT LIMITED TO TARGET.
  Yes: use to construct standard/anchor: CONTRAST EFFECT ACROSS TARGETS.
Figure 1. Inclusion/exclusion and the emergence of assimilation and contrast effects.
Whenever any of these decisions results in the exclusion of information from the representation formed of the target, it will elicit a contrast effect, the size of which depends on the variables discussed above. Whether this contrast effect is limited to the target, or generalizes across related targets, depends on whether the excluded information is merely subtracted from the representation of the target or used in constructing a standard or scale anchor. Whenever the information that comes to mind is included in the representation formed of the target, on the other hand, it results in an assimilation effect, the size of which depends on the variables discussed above. Hence, the model predicts the emergence, the direction, the size, and the generalization of context effects in attitude measurement.
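The decision sequence summarized in Figure 1 can be written as a small function. The sketch below is a hypothetical encoding: the class `AccessibleInfo`, its field names, and the `Effect` labels are invented for illustration, and each yes/no answer is supplied by the analyst rather than derived from any measured variable. It predicts only the type of effect, not its size.

```python
from dataclasses import dataclass
from enum import Enum


class Effect(Enum):
    NONE = "no effect"
    ASSIMILATION = "assimilation effect"
    CONTRAST_LIMITED_TO_TARGET = "contrast effect limited to target"
    CONTRAST_ACROSS_TARGETS = "contrast effect across targets"


@dataclass
class AccessibleInfo:
    """Hypothetical yes/no answers to the questions in Figure 1."""
    bears_on_task: bool          # may the information bear on the task at all?
    irrelevant_influence: bool   # a. comes to mind for the "wrong reason"?
    belongs_to_category: bool    # b. does it "belong" to the target category?
    intended_to_use: bool        # c. do conversational norms invite its use?
    bears_on_dimension: bool     # was it thought about on the judgmental dimension?


def predicted_effect(info: AccessibleInfo) -> Effect:
    """Trace Figure 1: inclusion is the default; any 'exclude' answer
    leads to a contrast effect whose scope depends on the dimension check."""
    if not info.bears_on_task:
        return Effect.NONE
    include = (
        not info.irrelevant_influence
        and info.belongs_to_category
        and info.intended_to_use
    )
    if include:
        return Effect.ASSIMILATION
    if info.bears_on_dimension:
        return Effect.CONTRAST_ACROSS_TARGETS  # used to construct standard/anchor
    return Effect.CONTRAST_LIMITED_TO_TARGET   # merely subtracted from the target


if __name__ == "__main__":
    # Presidency question (Schwarz & Bless, 1992a): Weizsäcker does not "belong"
    # to the CDU, and the question does not tap the evaluative dimension.
    presidency = AccessibleInfo(True, False, False, True, False)
    # Scandal question with a single-politician target (Schwarz & Bless, 1992b).
    scandal_narrow = AccessibleInfo(True, False, False, True, True)
    for label, case in [("presidency", presidency), ("scandal", scandal_narrow)]:
        print(label, "->", predicted_effect(case).value)
```

Because the three exclusion checks are joined by a single conjunction, the order in which they are evaluated does not change the outcome, which matches the figure: a "wrong" answer at any of steps a to c leads to exclusion.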
4.2
Implications for Questionnaire Construction
How do these considerations bear on the emergence of assimilation and contrast effects in survey research? In this section, we review a number of different questionnaire variables, along with selective empirical evidence where available. The most important variables are the content and number of preceding questions, the generality of the target question, the spacing of related questions in the questionnaire, introductions or the lack of introductions to a block of questions, and the graphical lay-out of self-administered questionnaires.
4.2.1
The Content of Preceding Questions
The content of preceding questions determines the information that becomes temporarily accessible in memory. In addition, it may determine respondents' decision of whether the information that is brought to mind does or does not "belong" to the target category they are to evaluate. Schwarz and Bless (1992a) asked German college students to evaluate the Christian Democratic Party (CDU), which governs the Federal Republic of Germany. To do so, respondents presumably recall chronically accessible information from memory, which may include that the CDU is a conservative party, that Chancellor Kohl is a member of it, and so on. For some respondents, the party evaluation question was preceded by one of two political knowledge questions, both of which pertained to the same politician, Richard von Weizsäcker. This politician is a member of the CDU who is highly respected by Germans, independent of their party preference. To elicit the inclusion of Richard von Weizsäcker in respondents' representation of the Christian Democrats, some respondents were asked, "Do you happen to know of which party Richard von Weizsäcker has been a member for more than 20 years?". Answering this question resulted in more favorable evaluations of politicians of the Christian Democratic Party, relative to a control condition in which no question about Richard von Weizsäcker was asked.
This assimilation effect indicates that Richard von Weizsäcker was included in the representation formed of the CDU. However, Richard von Weizsäcker has not only been a member of the Christian Democratic Party for several decades; he also served as President of the Federal Republic of Germany at the time of the study, and the office of President required that he no longer participate in party politics. As a representative figurehead of the Federal Republic, the President is supposed to take a neutral stand on party issues, much like the Queen in the United Kingdom. Accordingly, other respondents were asked, "Do you happen to know which office Richard von Weizsäcker holds, that sets him aside from party politics?". Answering this question should exclude Richard von Weizsäcker from the representation that respondents form of Christian Democratic Party politicians, resulting in a contrast effect. In line with this prediction, these
respondents evaluated the CDU less positively than respondents who were not asked a question about Richard von Weizsäcker. Given that neither the party membership question nor the presidency question taps the evaluative dimension, the model predicts that the observed contrast effect reflects a mere subtraction process. According to this account, Richard von Weizsäcker was chronically accessible to some respondents in the control condition, and the party membership question increased, whereas the presidency question decreased, the number of respondents who included him in their representation of the CDU. As a result, the contrast effect should be limited to evaluations of the Christian Democrats and should not generalize to evaluations of related targets. In line with this assumption, evaluations of the Social Democratic Party were not affected by the context questions about Richard von Weizsäcker. Had the context questions tapped the evaluative dimension, on the other hand, the model would predict that the contrast effect generalizes across targets, because Weizsäcker would then be used in constructing a standard. In summary, these findings illustrate that the same piece of information may result in assimilation as well as contrast effects, depending on whether it is included in, or excluded from, the representation that respondents form of the target category. In the present case, these operations were a function of the specific content of the knowledge questions asked, which not only brought Richard von Weizsäcker to mind, but also determined his inclusion in, or exclusion from, the representation constructed of the CDU. Note, however, that context effects can be detected in the sample as a whole only if the majority of the sample shares the same evaluation of the information that comes to mind. Suppose, for example, that only half of the respondents had thought highly of Richard von Weizsäcker, whereas the others had not respected him.
In that case, his inclusion in the representation of the CDU would have resulted in more favorable judgments for some respondents, but less favorable judgments for others. Whereas each of these effects would reflect an assimilation of the general judgment to the evaluation of Richard von Weizsäcker at the theoretical level, these effects could have canceled one another, resulting in the apparent absence of context effects in the sample as a whole. Schwarz, Strack, and Mai (1991) observed such a finding in a different content domain. In their study, thinking about one's marriage increased life-satisfaction for happily married respondents, but decreased life-satisfaction for unhappily married ones, resulting in the absence of a context effect in the sample as a whole. It is therefore important to keep in mind that context effects are conditional (see Smith, 1992): For any given respondent, the impact of the same general cognitive process depends on the implications of the specific information that is brought to the respondent's mind. Unless we acknowledge the conditional character of context effects in our analyses, we may erroneously conclude that none were observed.
4.2.2
The Number of Preceding Questions
According to the model, the impact of a given piece of information depends on the amount and extremity of competing information used in constructing a representation. Accordingly, adding a given piece of information to the representation formed of the target results in a larger assimilation effect when the representation contains a small rather than a large amount of other information. Consistent with this assumption, Schwarz, Strack, and Mai (1991) observed that answering a marital satisfaction question before answering a question about one's general life-satisfaction increased the correlation of both measures from r = .32 (in the general - specific order) to r = .67 (in the specific - general order) when the marital satisfaction question was the only domain satisfaction question asked, reflecting an assimilation effect. This effect was less pronounced,
however, when three specific life domains (work, leisure, and marriage) were addressed prior to the general life-satisfaction question, resulting in a correlation of r = .42 for marital and general life-satisfaction. Thus, the impact of information bearing on respondents' marriage was less pronounced as other information relevant to evaluating one's life became more accessible, due to a larger number of relevant context questions.
4.2.3
The Generality of the Target Question: Category Width
One of the most important determinants of assimilation versus contrast effects in survey practice is probably the generality of the target question. For example, a question about the trustworthiness of politicians could refer to all politicians in Germany, to politicians of the CDU, or to a specified individual politician, e.g., Oskar Lafontaine. In psychological terms, these questions address target categories of different width. The first question pertains to a wide category that allows the inclusion of any German politician who may come to mind, whereas the second question only allows the inclusion of Christian Democrats. The last question, pertaining to Oskar Lafontaine, does not allow the inclusion of any other politician, because a given person makes up a category by him- or herself. How would this difference in category width affect the emergence of context effects? Suppose, for example, that a preceding question asks respondents to recall some politicians who were involved in a scandal, rendering these politicians highly accessible. According to the model, the politicians involved in the scandal are members of the general category "politicians" and are therefore likely to be included in the temporary representation that respondents form of that category. If so, their evaluation of the trustworthiness of politicians in general should decrease, reflecting an assimilation effect. The politicians who were involved in the scandal cannot, however, be included in the representation formed of the narrow category "Oskar Lafontaine". Hence, the scandal ridden politicians may instead serve as a standard of comparison or scale anchor, resulting in a contrast effect. Experimental data confirmed this prediction. Specifically, Schwarz and Bless (1992b) asked some respondents to name two politicians who were involved in the well-known Barschel scandal.
Compared to respondents who were not asked a scandal related question, these respondents subsequently reported lower trust in German politicians in general. Other respondents, however, were asked to evaluate not the trustworthiness of German politicians in general, but the trustworthiness of three specific individual politicians. In this case, having answered the scandal question increased the reported trustworthiness of each of the three specific politicians, although these exemplars were not particularly trustworthy to begin with. This pattern of findings reflects that the scandal ridden politicians could be included in the representation formed of "German politicians" in general, but not in the representation formed of any specific person. Given that a question about scandals taps the trustworthiness dimension to which the subsequent ratings pertained, the scandal ridden politicians were instead used to construct the standard or scale anchor, resulting in a contrast effect. In general, the inclusion/exclusion model predicts that the more inclusive the target category, the more likely assimilation effects are to emerge. Accordingly, general questions that assess respondents' opinion about a wide target category, which allows for the inclusion of a variety of different information, should be most likely to show assimilation effects. Specific questions that assess respondents' opinion about a narrowly defined target, on the other hand, should be more likely to show contrast effects. This is because information that comes to mind can more often be included in one's representation of a global target than in one's representation of a specific target.
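The category width argument can be sketched in the same spirit. In the hypothetical example below, a target category is represented as a set of members; primed exemplars that are members are included in the representation, and primed exemplars that are not members are added to the standard, on the assumption (as in the scandal study) that they were thought about on the trustworthiness dimension. The politician labels, the 1 to 7 trust values, and the averaging rule are all invented for illustration, and the study itself used three specific politicians rather than the single one shown here.

```python
from statistics import mean

# Hypothetical trust values on a 1-7 scale (illustration only).
CHRONIC = {"politician_a": 4.0, "politician_b": 4.0, "politician_c": 4.0}
SCANDAL = {"scandal_1": 1.0, "scandal_2": 1.0}  # made accessible by a context question
DEFAULT_STANDARD = [4.0]                         # assumed chronically accessible anchor


def trust_judgment(target_members, primed):
    """Include primed exemplars that belong to the target category; use the
    rest to construct the standard (assumed to have been thought about on the
    trust dimension). The judgment is target mean minus standard mean."""
    representation = [CHRONIC[m] for m in target_members if m in CHRONIC]
    standard = list(DEFAULT_STANDARD)
    for name, value in primed.items():
        if name in target_members:
            representation.append(value)  # inclusion: assimilation
        else:
            standard.append(value)        # exclusion: comparison based contrast
    return mean(representation) - mean(standard)


wide = set(CHRONIC) | set(SCANDAL)  # "German politicians" in general
narrow = {"politician_a"}           # one specific politician

if __name__ == "__main__":
    for label, category in [("wide", wide), ("narrow", narrow)]:
        shift = trust_judgment(category, SCANDAL) - trust_judgment(category, {})
        print(f"{label}: context shift = {shift:+.2f}")
```

The same primed exemplars lower the judgment of the wide category and raise the judgment of the single politician, which is the pattern the model predicts; only the membership test differs between the two cases.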
4.2.4
The Spacing of Items in a Questionnaire
The spacing of items in a questionnaire may determine the direction of context effects for two different, but related, reasons. First, psychological experiments have shown that respondents exclude information that comes to mind if they assume that it does so for the "wrong" reason. For example, Lombardi, Higgins, and Bargh (1987) observed that priming effects in a person perception task were obtained only when respondents were not aware of the priming episode (see also Strack, Schwarz, Bless, Kübler, & Wänke, 1993). If respondents are aware that the information comes to mind only because it was triggered by a preceding question, they may exclude it for that reason. As a second possibility, conversational norms may induce respondents to ignore information that they have already provided in response to a specific question when they are later asked to answer a more general one. This is because conversational norms require us to provide information that is "new" to the recipient, rather than to reiterate information that has already been given (see Schwarz, in press; Strack & Schwarz, 1992, for more detailed discussions). Both of these possibilities may be influenced by the spacing of items and by introductions to blocks of related items, to be addressed below.
A study by Ottati, Riggle, Wyer, Schwarz, and Kuklinski (1989) bears on the impact of item spacing. They asked respondents to report their agreement with general and specific statements pertaining to civil liberties. For example, a general statement would read, "Citizens should have the right to speak freely in public." In one condition, this general statement was preceded by a specific statement that pertained to a favorable or unfavorable group, e.g., "The Parent-Teacher Association (or the Ku Klux Klan, respectively) should have the right to speak freely in public".
As expected, respondents expressed a more favorable attitude toward the general statement if it was preceded by a specific one that pertained to a favorable rather than an unfavorable group. However, this assimilation effect was obtained only when the items were separated by eight filler items. If the items were presented immediately adjacent to one another, a contrast effect emerged. In this case, respondents reported a more favorable attitude toward the general statement if the preceding statement referred to a negative rather than a positive group. This latter finding presumably reflects the exclusion of the primed information as a function of conversational norms and/or awareness of the possible influence of the preceding item. These considerations suggest that information that is primed by a preceding question is more likely to be included in the representation formed to answer a subsequent question when both questions are separated by unrelated filler items than when they are not. As a result, assimilation effects are likely to emerge in the former case, whereas contrast effects are likely to emerge in the latter. Note, however, that this prediction pertains only to assimilation effects that are based on the inclusion of primed information. Some assimilation effects are not based on this process, but arise because respondents use the content of a preceding question to interpret the meaning of an ambiguous subsequent question (see Strack, 1992, for a more detailed discussion). For example, in a study reviewed above, Strack, Schwarz, and Wänke (1991) observed that college students were more likely to support an obscure "educational contribution" when they could infer from the context that it implied that they would receive money from the state, rather than that they would have to pay money for their education.
Assimilation effects of this type occur at the level of question comprehension and reflect a deliberate effort to make sense of an ambiguous question. Hence they are likely to be obtained when the relevant context question and the ambiguous question are presented together, thus emphasizing their apparent relatedness.
4.2.5
Introductions to Item Blocks
As alluded to above, communicators are expected to avoid redundancy (Grice, 1975). In psycholinguistics this is known as the "given-new contract", which requires speakers to provide information that is "new", rather than to reiterate information that is already "given" (Haviland & Clark, 1974). Several studies indicate that this conversational norm is evoked when related questions are perceived as belonging to the same conversational context (e.g., Schwarz, Strack, & Mai, 1991; Strack, Martin, & Schwarz, 1988; Strack, Schwarz, & Wänke, 1991). If the questions follow a part-whole format, using the information that has already been provided in response to a specific ("part") question in answering a subsequent, more general ("whole") question would violate the conversational norm of non-redundancy. Variables that can evoke this norm include introductions to a block of related items and the graphical lay-out of self-administered questionnaires. For example, Schwarz et al. (1991) asked respondents to report their marital satisfaction and their general life-satisfaction. When the marital satisfaction question was asked as the last question on one page of the questionnaire and the general question as the first question on the next page, happily married respondents reported higher, and unhappily married respondents reported lower, general life-satisfaction than when the general question came first. This reflects an assimilation effect as discussed above. When both questions were introduced by a joint lead-in, thus assigning them to the same conversational context, respondents excluded the information that they had reported in response to the specific question, much as if the question had read, "Aside from your marriage, that you already told us about, how satisfied are you with other aspects of your life?"
Accordingly, the joint lead-in elicited a contrast effect, with happily married respondents reporting lower life-satisfaction than unhappily married respondents. We conclude from this and similar findings (cf. Schwarz, in press; Strack & Schwarz, 1992) that conversational norms can trigger the exclusion of information that has already been provided from the cognitive representation formed for answering a subsequent question. Variables that can elicit the application of the conversational norm of non-redundancy include lead-ins to blocks of items as well as the graphical lay-out of self-administered questionnaires.
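The conditional character of context effects noted in the discussion of the Weizsäcker and marriage examples has a direct consequence for analysis: effects should be estimated within subgroups defined by the implications of the information brought to mind, not only for the pooled sample. The sketch below uses invented responses, not data from Schwarz, Strack, and Mai (1991), to show how an assimilation effect that raises the scores of happily married respondents and lowers those of unhappily married respondents can vanish in a pooled comparison. The record layout and the function names `order_effect` and `effects_by_group` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Invented responses: (marital_happiness, question_order, life_satisfaction),
# scored on a hypothetical scale. These are not data from the chapter.
RESPONSES = [
    ("happy", "general_first", 7), ("happy", "general_first", 7),
    ("happy", "marriage_first", 8), ("happy", "marriage_first", 8),
    ("unhappy", "general_first", 4), ("unhappy", "general_first", 4),
    ("unhappy", "marriage_first", 3), ("unhappy", "marriage_first", 3),
]


def order_effect(rows):
    """Mean life-satisfaction when the marriage question comes first,
    minus the mean when the general question comes first."""
    by_order = defaultdict(list)
    for _group, order, score in rows:
        by_order[order].append(score)
    return mean(by_order["marriage_first"]) - mean(by_order["general_first"])


def effects_by_group(rows):
    """Order effect within each subgroup, plus the pooled estimate."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    effects = {group: order_effect(group_rows) for group, group_rows in groups.items()}
    effects["pooled"] = order_effect(rows)
    return effects


if __name__ == "__main__":
    for group, effect in effects_by_group(RESPONSES).items():
        print(f"{group}: {effect:+.2f}")
```

Here the subgroup effects are +1 and -1, while the pooled effect is zero, so an analysis of the full sample alone would wrongly suggest that question order had no influence.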
4.3
Summary
In summary, a consideration of the mental processes that underlie the construal of temporary representations of targets and standards provides a conceptual framework that allows predictions regarding the emergence, direction, size, and generalization of context effects in attitude measurement. The reviewed inclusion/exclusion model holds that any variable that influences the categorization of information that comes to mind is likely to moderate the emergence of assimilation or contrast effects. Schwarz and Bless (1992a) provide a comprehensive review of a host of different variables that have been studied in psychological research. The ones that are most relevant to questionnaire construction include the content and number of preceding questions, the width of the target category, the spacing of items in a questionnaire, the lead-in to blocks of related questions, and the graphical lay-out of self-administered questionnaires. Moreover, the model allows for the conceptualization of respondent variables, such as expertise, motivation, and cognitive ability, within the same conceptual framework.
Norbert Schwarz, Herbert Bless, Hans-J. Hippler, Fritz Strack, and Seymour Sudman
5
Concluding Remarks
At the level of empirical phenomena, the issues that we addressed in this chapter are anything but new. Survey researchers have long been aware of the context dependency of attitude measurement, the potential impact of response alternatives, and respondents' difficulties in making sense of the questions asked. However, much of the research into these issues has been largely atheoretical, making it difficult to draw conclusions beyond the specific question under investigation (see Hippler & Schwarz, 1987; Schuman, 1992, for a historical review). As a result, research has often been ad hoc, and its findings did not add up to a cumulative body of knowledge. In contrast, the recent collaboration of survey methodologists and cognitive social psychologists has produced a number of comprehensive conceptual models that allow the derivation of theoretically informed predictions going beyond the specific question at hand. Although some of these predictions are likely to prove false as research progresses, the existence of comprehensive conceptual frameworks is a crucial prerequisite for systematic and cumulative research. We hope that the present review illustrates the promise of one of the most rapidly developing areas of survey methodology.
References
Bishop, G. F., Oldendick, R. W., & Tuchfarber, R. J. (1986). Opinions on fictitious issues: The pressure to answer survey questions. Public Opinion Quarterly, 50, 240-250.
Bodenhausen, G. V., & Wyer, R. S. (1987). Social cognition and social reality: Information acquisition and use in the laboratory and the real world. In H. J. Hippler, N. Schwarz, & S. Sudman (Eds.), Social information processing and survey methodology (pp. 6-41). New York: Springer Verlag.
Bradburn, N. M. (1983). Response effects. In P. H. Rossi, J. D. Wright, & A. B. Anderson (Eds.), Handbook of survey research (pp. 289-328). New York: Academic Press.
Bradburn, N. M., Rips, L. J., & Shevell, S. K. (1987). Answering autobiographical questions: The impact of memory and inference on surveys. Science, 236, 157-161.
Clark, H. H., & Clark, E. V. (1977). Psychology and language. New York: Harcourt, Brace, Jovanovich.
Clark, H. H., & Schober, M. F. (1992). Asking questions and influencing answers. In J. M. Tanur (Ed.), Questions about questions (pp. 15-48). New York: Russell Sage.
Converse, P. E. (1964). The nature of belief systems in mass publics. In D. Apter (Ed.), Ideology and discontent (pp. 238-45). New York: Free Press of Glencoe.
Erikson, R. S., Luttbeg, N. R., & Tedin, K. T. (1988). American public opinion (3rd ed.). New York: Macmillan.
Feldman, J. M., & Lynch, J. G. (1988). Self-generated validity and other effects of measurement on belief, attitude, intention, and behavior. Journal of Applied Psychology, 73, 421-435.
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. L. Morgan (Eds.), Syntax and semantics, Vol. 3: Speech acts (pp. 41-58). New York: Academic Press.
Haviland, S. E., & Clark, H. H. (1974). What's new? Acquiring new information as a process of comprehension. Journal of Verbal Learning and Verbal Behavior, 13, 512-521.
Herr, P. M. (1986). Consequences of priming: Judgment and behavior. Journal of Personality and Social Psychology, 51, 1106-1115.
Higgins, E. T. (1989). Knowledge accessibility and activation: Subjectivity and suffering from unconscious sources. In J. S. Uleman & J. A. Bargh (Eds.), Unintended thought (pp. 75-123). New York: Guilford Press.
Hippler, H. J., Schwarz, N., & Sudman, S. (Eds.). (1987). Social information processing and survey methodology. New York: Springer Verlag.
Cognitive and Communicative Aspects of Survey Measurement
Jabine, T. B., Straf, M. L., Tanur, J. M., & Tourangeau, R. (Eds.). (1984). Cognitive aspects of survey methodology: Building a bridge between disciplines. Washington, DC: National Academy Press.
Jobe, J., & Loftus, E. (Eds.). (1991). Cognitive aspects of survey methodology. Special issue of Applied Cognitive Psychology, 5.
Kahneman, D., & Miller, D. (1986). Norm theory: Comparing reality to its alternatives. Psychological Review, 93, 136-153.
Lombardi, W. J., Higgins, E. T., & Bargh, J. A. (1987). The role of consciousness in priming effects on categorization: Assimilation and contrast as a function of awareness of the priming task. Personality and Social Psychology Bulletin, 13, 411-429.
Martin, L. L., Seta, J. J., & Crelia, R. A. (1990). Assimilation and contrast as a function of people's willingness to expend effort in forming an impression. Journal of Personality and Social Psychology, 59, 27-37.
Ottati, V. C., Riggle, E. J., Wyer, R. S., Schwarz, N., & Kuklinski, J. (1989). The cognitive and affective bases of opinion survey responses. Journal of Personality and Social Psychology, 57, 404-415.
Payne, S. L. (1951). The art of asking questions. Princeton: Princeton University Press.
Schuman, H., & Presser, S. (1981). Questions and answers in attitude surveys. New York: Academic Press.
Schwarz, N. (1990). Assessing frequency reports of mundane behaviors: Contributions of cognitive psychology to questionnaire construction. In C. Hendrick & M. S. Clark (Eds.), Research methods in personality and social psychology (Review of Personality and Social Psychology, Vol. 11, pp. 98-119). Beverly Hills, CA: Sage.
Schwarz, N. (1994). Judgment in a social context: Biases, shortcomings, and the logic of conversation. In M. Zanna (Ed.), Advances in experimental social psychology (Vol. 26). San Diego, CA: Academic Press.
Schwarz, N., & Bless, H. (1992a). Constructing reality and its alternatives: Assimilation and contrast effects in social judgment. In L. L. Martin & A. Tesser (Eds.), The construction of social judgments (pp. 217-245). Hillsdale, NJ: Erlbaum.
Schwarz, N., & Bless, H. (1992b). Scandals and the public's trust in politicians: Assimilation and contrast effects. Personality and Social Psychology Bulletin, 18, 574-579.
Schwarz, N., & Hippler, H. J. (1991). Response alternatives: The impact of their choice and ordering. In P. Biemer, R. Groves, N. Mathiowetz, & S. Sudman (Eds.), Measurement error in surveys (pp. 41-56). Chichester: Wiley.
Schwarz, N., Knäuper, B., Hippler, H. J., Noelle-Neumann, E., & Clark, F. (1991). Rating scales: Numeric values may change the meaning of scale labels. Public Opinion Quarterly, 55, 570-582.
Schwarz, N., Münkel, T., & Hippler, H. J. (1990). What determines a "perspective"? Contrast effects as a function of the dimension tapped by preceding questions. European Journal of Social Psychology, 20, 357-361.
Schwarz, N., & Strack, F. (1991). Context effects in attitude surveys: Applying cognitive theory to social research. In W. Stroebe & M. Hewstone (Eds.), European review of social psychology (Vol. 2, pp. 31-50). Chichester: Wiley.
Schwarz, N., Strack, F., Hippler, H. J., & Bishop, G. (1991). The impact of administration mode on response effects in survey measurement. Applied Cognitive Psychology, 5, 193-212.
Schwarz, N., Strack, F., & Mai, H. P. (1991). Assimilation and contrast effects in part-whole question sequences: A conversational logic analysis. Public Opinion Quarterly, 55, 3-23.
Schwarz, N., Strack, F., Müller, G., & Chassein, B. (1988). The range of response alternatives may determine the meaning of the question: Further evidence on informative functions of response alternatives. Social Cognition, 6, 107-117.
Schwarz, N., & Sudman, S. (Eds.). (1992). Context effects in social and psychological research. New York: Springer Verlag.
Schwarz, N., & Sudman, S. (1994). Autobiographical memory and the validity of retrospective reports. New York: Springer Verlag.
Smith, E. E. (1990). Categorization. In D. N. Osherson & E. E. Smith (Eds.), Thinking: An invitation to cognitive science (Vol. 3, pp. 33-54). Cambridge, MA: MIT Press.
Smith, T. W. (1992). Thoughts on the nature of context effects. In N. Schwarz & S. Sudman (Eds.), Context effects in social and psychological research. New York: Springer Verlag.
Sperber, D., & Wilson, D. (1986). Relevance: Communication and cognition. Cambridge, MA: Harvard University Press.
Strack, F. (in press). Urteilsprozesse in standardisierten Befragungen: Kognitive und kommunikative Einflüsse [Judgmental processes in standardized interviews: Cognitive and communicative influences]. Heidelberg, FRG: Springer Verlag.
Strack, F. (1992). Order effects in survey research: Activative and informative functions of preceding questions. In N. Schwarz & S. Sudman (Eds.), Context effects in social and psychological research (pp. 23-34). New York: Springer Verlag.
Strack, F., & Martin, L. (1987). Thinking, judging, and communicating: A process account of context effects in attitude surveys. In H. J. Hippler, N. Schwarz, & S. Sudman (Eds.), Social information processing and survey methodology (pp. 123-148). New York: Springer Verlag.
Strack, F., Martin, L. L., & Schwarz, N. (1988). Priming and communication: The social determinants of information use in judgments of life-satisfaction. European Journal of Social Psychology, 18, 429-442.
Strack, F., & Schwarz, N. (1992). Implicit cooperation: The case of standardized questioning. In G. Semin & F. Fiedler (Eds.), Social cognition and language (pp. 173-193). Beverly Hills: Sage.
Strack, F., Schwarz, N., & Gschneidinger, E. (1985). Happiness and reminiscing: The role of time perspective, mood, and mode of thinking. Journal of Personality and Social Psychology, 49, 1460-1469.
Strack, F., Schwarz, N., Kübler, A., & Wänke, M. (1993). Awareness of the influence as a determinant of assimilation versus contrast. European Journal of Social Psychology, 23, 53-62.
Strack, F., Schwarz, N., & Wänke, M. (1991). Semantic and pragmatic aspects of context effects in social and psychological research. Social Cognition, 9, 111-125.
Sudman, S., Bradburn, N., & Schwarz, N. (in press). Applications of cognitive science to survey methodology. San Francisco, CA: Jossey-Bass.
Tanur, J. M. (Ed.). (1992). Questions about questions. New York: Russell Sage.
Tourangeau, R. (1984). Cognitive science and survey methods: A cognitive perspective. In T. Jabine, M. Straf, J. Tanur, & R. Tourangeau (Eds.), Cognitive aspects of survey methodology: Building a bridge between disciplines (pp. 73-100). Washington, DC: National Academy Press.
Tourangeau, R., & Rasinski, K. A. (1988). Cognitive processes underlying context effects in attitude measurement. Psychological Bulletin, 103, 299-314.
Secondary Analysis of Official Microdata
Richard Alba, Walter Müller, and Bernhard Schimpl-Neimanns
1
Introduction
The use of official data, i.e., data originally collected by government agencies, has a long tradition in sociological research. A renowned example is Durkheim's Suicide, in which he attempted, through systematic comparisons of available suicide statistics, to demonstrate the necessity of a specifically sociological branch of science. Traditional secondary analysis, like Suicide, drew upon published or aggregated data because a single researcher was not in a position to analyze the frequently extensive original materials gathered by official bodies. An inherent problem was thus the risk of individualistic and ecological misinterpretation (Robinson, 1950; Scheuch, 1966). Also problematic was that the analysis could be shaped by decisions the collecting agencies made at the time of aggregation, usually to serve administrative purposes. These decisions often proved less than ideal for the problems that non-governmental researchers later tackled with the data. The role of official data has changed since these early days. Empirical social research is no longer so dependent on external data sources, for it has itself become a producer of data, which can be generated to fit the specific contours of the research problem. Data collected by researchers are usually available as microdata, i.e., data containing information on the individual persons, households, or other units under study. Microdata allow a researcher considerable latitude in defining variables and in applying complex statistical methods of analysis. Changing practices in these respects have, in turn, affected the use of official data. Since the revolution in computer technology, individual researchers are often able to analyze official microdata directly.
This means that, in principle, official microdata offer opportunities as substantial as those offered by data collected by researchers and that, in particular, the data can be manipulated in a variety of ways to adapt them to the research problem and to the demands of the best analytic methods. Nevertheless, data from official sources are little used in German social research. This situation stands in strong contrast to that in the United States and some European countries. The fact that official data are employed quite successfully elsewhere demonstrates that an existing potential remains underexploited in Germany. In this paper, we address this situation and attempt to show that in Germany, too, official data could be of great benefit to sociological analysis, provided they can be used in microdata form. We begin with an overview of their role in the United States, where the secondary analysis of official data is an established instrument of empirical social research. We then describe the relationship between official statistics and empirical social research in Germany, where the use of these data is far more restricted. A brief account of the past provides insight into the origins of the current situation.
Richard Alba, Walter Müller, and Bernhard Schimpl-Neimanns
We subsequently address directly the research potential in Germany, using for illustrative purposes the Employment Statistics (Beschäftigtenstatistik), the Sample Survey of Income and Expenditure (Einkommens- und Verbrauchsstichprobe), and a detailed example of a Microcensus analysis. Finally, we look ahead to possible directions in social research and their relationship to the role of official microdata.
2
Illustrations from the United States
The state of the art of research with official microdata in the USA highlights the potential of this type of data. A number of data sets are publicly available from the U.S. Bureau of the Census and other agencies and, as a result, are distributed widely among research centers and individual researchers, making them readily accessible to virtually anyone trained in data analysis. The most important of these data sets include:
1. The Public Use Samples from various decennial censuses. These data sets represent large samples of the households and individuals counted in the censuses. For recent censuses, the samples are as large as five percent and include variables describing housing and residential location, migration, household composition, fertility, racial and ethnic background, education, labor force participation, income, and other characteristics.
2. The Current Population Surveys. These are monthly surveys of approximately 60,000 households that serve the official purpose of measuring labor force participation for the calculation of unemployment statistics. However, they invariably include a number of demographic and socioeconomic variables and thus can be used for analyses that go well beyond their official function. In addition, an annual supplement explores a specific topic, such as fertility history or ethnic attachment, in greater detail.
3. Other specialized surveys, such as the National Health Interview Surveys, annual surveys by the National Center for Health Statistics that gather data on the frequency of illness and the use of medical services; the High School and Beyond surveys, longitudinal studies of American youth conducted by the National Center for Education Statistics, starting with students in two base years (1972 and 1980); and the National Crime Surveys, conducted by the U.S. Department of Justice since 1972 to investigate the frequency of crime victimization in the general population.
Data of all three types have worked their way so thoroughly into the practice of sociologists, demographers, economists, and geographers that a substantial percentage of articles published in the leading journals of these fields draw on microdata from official sources. Examining, for instance, the articles that appeared during a recent three-year period (1990-92) in the American Journal of Sociology and the American Sociological Review, the two leading American sociology journals, we found that nearly one in four empirically based articles on American society (such articles were the majority of those published) used official data. In addition, Census Bureau microdata from the 1980 Census provided the major data source for a series of volumes, sponsored by the Social Science Research Council, that assessed fundamental social and demographic trends in the United States (e.g., Lieberson & Waters, 1988; Frey & Speare, 1988). The range of topics that can be investigated with these data sources is impressive, although research on any topic is inevitably constrained by the limited set of variables included in official data. To give some sense of this range, we have selected a few titles of recent articles in major American sociological and demographic journals: "Race, family structure, and changing poverty among American children" (Eggebeen & Lichter, 1991); "Sons, daughters, and the risk of marital disruption" (Morgan, Lye, & Condran, 1988); "Ethnicity, geography, and occupational achievement of Hispanic men in the United States" (Stolzenberg, 1990); "Industrial restructuring, gender segregation, and sex differences in earnings" (Tienda, Smith, & Ortiz, 1987); "Youth, underemployment, and property crime: Differential effects of job availability and job quality on juvenile and young adult arrest rates" (Allan & Steffensmeier, 1989); "The epidemic theory of ghettos and neighborhood effects of dropping out and teenage childbearing" (Crane, 1991). As this listing helps to make clear, microdata from official sources are essential to research on racial and ethnic differentiation; marriage, fertility, and family structure; socioeconomic attainment and inequality; gender inequality; urban and community studies; migration; and regional differentiation. Even research in criminology and deviance has benefited from the availability of such data. One of the more interesting developments of recent years is that American social scientists keep finding new ways to combine official data sources to enhance the power of their analyses. We now discuss several examples in detail to illustrate the sophistication, persuasiveness, and substantive variety of recent research employing such data. The first example is research on changes in the propensity to marry, as analyzed by Qian and Preston (1993). Their article demonstrates the use of microdata from different time points to shed light on important social trends - in this case, on sharply changing patterns of marriage and singlehood in the U.S., where men and women are marrying, on average, at later ages, and a rising percentage remain single for life. These trends have potentially profound social consequences, underscored by their link to the climbing proportion of births to unmarried women (about a quarter of all U.S. births in 1987).
The authors employ Current Population Survey data from 1973, 1980, and 1988, allowing them to consider the changes during the 1970s separately from those in the 1980s. They analyze changes in marriage patterns by age and education through the application of a methodologically sophisticated two-sex model that distinguishes between changes in the marriage pool, i.e., in the relative availability of different types of eligible partners, and changes in the "attractiveness" of partners with different characteristics. They conclude that the sharp decline in marriage rates during the 1970s occurred across the board - that is, it was not highly differentiated by age or education. This broad sweep was not true of the smaller decline during the 1980s, however, for it was concentrated among younger women. Quite different methodological and substantive possibilities are illustrated by the second example, an analysis of racial and ethnic variations in residential patterns in suburbia by Logan and Alba (1993; for a technical exposition, see Alba & Logan, 1992). Their analysis represents a new approach to a classical problem in using microdata to investigate questions in social geography: because of confidentiality restrictions, microdata files generally do not identify geographical units of residence in anything but the crudest of terms; but aggregate data sets, which do contain detailed information about specific communities, do not permit analyses at the level of the individual or household. Thus, the crux of the problem is how to analyze who lives where, without running afoul of the ecological fallacy, i.e., false conclusions about the forces operating at the individual level drawn from aggregate data patterns. Alba and Logan solve this by combining coefficients calculated from microdata and aggregate files in a single covariance matrix, which is then used to estimate OLS regression models. 
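The reason a covariance matrix can stand in for the raw data is that OLS estimates depend on the data only through means and covariances: the slopes are b = S_xx^(-1) s_xy and the intercept is the mean of y minus b times the predictor means. The Python sketch below illustrates only that general principle. The function names are hypothetical, the layout (predictors in the first k rows and columns, outcome last) is an assumption, and nothing here reproduces how Alba and Logan actually estimated the individual entries from microdata and aggregate files.

```python
from typing import List, Sequence, Tuple


def solve_linear(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copies the inputs so they are not modified.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot into place for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictor covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_from_moments(
    cov: Sequence[Sequence[float]], means: Sequence[float]
) -> Tuple[float, List[float]]:
    """Return (intercept, slopes) computed from summary moments only.

    cov   -- (k+1) x (k+1) covariance matrix; the LAST row/column is the
             dependent variable, the first k are the predictors (assumed layout).
    means -- the k+1 variable means in the same order.
    """
    k = len(means) - 1
    s_xx = [list(row[:k]) for row in cov[:k]]  # predictor covariances
    s_xy = [cov[i][k] for i in range(k)]       # predictor-outcome covariances
    slopes = solve_linear(s_xx, s_xy)          # b = S_xx^(-1) s_xy
    intercept = means[k] - sum(b * mu for b, mu in zip(slopes, means[:k]))
    return intercept, slopes
```

Because the function sees only summary moments, each block of the matrix could in principle be filled from a different file. Whether such a combined matrix is internally consistent (for example, positive definite) must then be checked separately; this sketch does so only crudely, by rejecting a singular predictor block.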
In the paper under discussion, they estimate what they call "locational-attainment" models, which predict the affluence of the community where a person resides (measured as its median household income) from individual-level variables such as household income, English-language ability, and family characteristics. Their group-specific models show that some racial/ethnic groups, namely whites and Hispanics, receive consistent locational returns (i.e., improvements in the affluence of the communities where they live) based on their individual socioeconomic and acculturation status. The locational patterns of Asians differ in that they are unrelated to measures of acculturation. For blacks, locational outcomes correspond least to any of these human-capital characteristics. As a consequence, blacks live in poorer communities than do their counterparts in the other groups, i.e., members of those groups with similar values on the socioeconomic and acculturation variables. The third article we discuss here is DiPrete's (1993) analysis of job mobility among American workers during the 1980s, which focuses on an issue of broad import for the U.S., namely the impact of industrial restructuring. The analysis is motivated by the notion that the consequences of structural change for workers are mediated by organizational labor markets. To explore this idea, DiPrete uses Current Population Survey data from two years in the 1980s (1983 and 1987) to identify workers in different industries who changed or lost jobs, data from the Bureau of Labor Statistics to measure employment by industry, and data from the Small Business Data Base to indicate firm size by industry. The latter two data sources allow the analysis to specify how job mobility is affected by the restructuring patterns of specific industries and by the presence of large firms, which are presumed to be better able than small firms to shelter employees from economic turbulence. Nevertheless, DiPrete finds that the effects of economic restructuring on labor markets were pervasive during the 1980s, especially for blue-collar and service workers (white-collar workers, especially those in the upper ranks, were least affected).
This does not deny that organizational labor markets were partially effective: the analysis does detect such effects, and DiPrete concludes that large organizations were able in some respects to shelter their work forces from the potent force of economic restructuring, but that, in the end, "these effects appear to be of limited importance in this period" (p. 92). Taken together, the three analyses described here illustrate the substantive range and sophistication of what is possible with official microdata in the U.S., where these data are a taken-for-granted part of the data stock available for research.
3
Empirical Social Research and Official Data in Germany
In the United States, official microdata not only play a fundamental role in social research; there is also a degree of cooperation between the agencies that produce such data and the research community that uses them. In Germany, the relationship between these two groups is more difficult, for reasons that reach far back in history. Both groups, in fact, share common historical roots¹: they developed from the same early beginnings of systematic social observation for the purposes of the absolutist state ("cameralistic statistics"), predominantly qualitative "university statistics," "political arithmetics" and, last but not least, probability theory and mathematical statistics. However, elements in the history of official statistics and empirical social research marked the beginning of separate courses of development quite early. Compared with France and Great Britain, modern social statistics was institutionalized rather late in Germany.² When the Kaiserlich Statistisches Amt (Imperial Statistical Office) was established in 1872, shortly after the founding of the German Empire, it was regarded as significant that the agency not only reported data but also employed scientists. Holder and Ehling (1991, p. 18) point out, however, that the function of the statistical bureaus, i.e., whether they were to be, above all, administrative agencies rather than more scientifically oriented institutions, has remained in dispute. In fact, official statistical bodies have always had to concentrate primarily on data needs for administrative purposes and could do little basic research because of limited staff and financial resources. While this has been deplored at the statistical bureaus (Wingen, 1989), it has by no means furthered cooperation with the university-based social research community. The social problems emerging in the wake of industrialization gave rise to research questions that could not be answered with statistical methods confined to bureaucratic accounting schemes. The Verein für Socialpolitik (Association for Social Policy and Legislation) carried out inquiries that were similar to - indeed, derived from - the social surveys conducted in other countries and that can be regarded as the starting point of survey research in Germany (Schäfer, 1971). In a self-conscious departure from the applied research and social-critical orientations of the Verein für Socialpolitik, the Deutsche Gesellschaft für Soziologie (German Society for Sociology) was founded in 1909 with a more theoretical and philosophical orientation. Initially, the Deutsche Statistische Gesellschaft (German Statistical Society) and the Deutsche Gesellschaft für Soziologie agreed to establish close institutional links, but this never came to pass. Instead, there was distance and even rivalry. According to its supporter, von Mayr, the school of social bookkeeping in statistics felt exposed to enemies from inside and outside of statistics. Among the enemies within were the mathematical statisticians; among those without, the sociologists, who pointed to the problems associated with social measurement and argued that statistics was merely a subsidiary science.
Certain priorities in the history of sociology thus have adversely affected the relationship between statistics and empirical social research, and their influence can still be seen today. As Esser (1989, p. 78) puts it, there is a "doctrine of 'pure sociology' hostile to empiricism, on the one hand; and, on the other, a style of research that exhausts itself in the obsession with empirical, 'socio-statistical' detail and is devoid of any theoretical basis." Schad (1972, pp. 25) regards the failure of the intended cooperation between sociologists and statisticians as one of the reasons why empirical social research had hardly developed in pre-war Germany. Among the few examples of empirical work using official data from this period is Geiger's (1932) well-known analysis of the social structure, based on the 1925 census of occupations and work places. After 1945, social research with a quantitative orientation mainly followed trends in the United States, thus borrowing the American emphasis on microsociological problems and survey research. The personal interview came to be regarded as the ideal instrument in sociology. One exception was the study by Peisert (1967) on social status and education opportunity, which demonstrated the fruitfulness of addressing "theoretical questions by manipulating the dry statistics of the population census in such a way that they yield unexpected findings" (p. 7). Starting in the mid-seventies, however, came growing interest and steady improvement in the technology for directly using individual data from official surveys on behalf of social research. Researchers at the Universities of Frankfurt and Mannheim played a pioneering role in this process: Economists and sociologists developed a Socio-Political Indicators and Decisionmaking System (SPES, Sozialpolitisches Indikatoren- und Entscheidungssystem), and used it to analyze changes in the social structure of the Federal Republic of Germany. 
These investigations showed for the first time that the direct analysis of microdata was a major step forward in the use of official data. This strategy was subsequently continued at the Sonderforschungsbereich 3 "Mikroanalytische Grundlagen der Gesellschaftspolitik" (Special Collaborative Program 3, Microanalytical Foundations of Social Policy, financed by the German National Science Foundation), and in particular in the VASMA project (Comparative analysis of social structure with large data sets), where, among other things, the individual data from the 10 percent sample survey of the 1970 population census were subjected to secondary analysis (see Haller & Müller, 1983; Blossfeld, 1985; Handl, 1988). Since the 1950s, there have been several developments in German official data that are of great significance for their relationship to the social sciences. Most important, despite a few setbacks, the number of data sets of interest to social-science research has increased substantially. The Microcensus was introduced in 1957 and has since been conducted annually (with occasional interruptions), and the first Sample Survey of Income and Expenditure took place in 1962/63. A program for education and university statistics was established in 1971, and one for Employment Statistics has developed since the mid-seventies. The most recent innovation, the time-budget survey, was initiated in 1991/92. Other major improvements include the creation of an environmental reporting system and a household panel covering the whole EU (European Union). These surveys and data programs have provided data sets in fields that are critical for observing social trends. If the data sets could be used shortly after collection, they would provide up-to-date data sources, since the intervals between collections are generally brief. But backward steps and declines in data quality cannot be overlooked either. Examples are the drastic cuts in the data collection program of the population census and the restrictions imposed by legislation and bureaucratic decision on the use of census data. Although the Bundesverfassungsgericht (Federal Constitutional Court) has not ruled out the use of individual population census data by scientists, researchers now receive mainly tabulated data, which are unsuitable for many scientific purposes.
More than in other European countries, official data in Germany are strictly regulated and dependent on the vagaries of the political climate (Als, 1993, p. 40). While in countries such as France and Great Britain professional statisticians and survey research experts are responsible for designing official surveys, in Germany even the wording of the response categories is sometimes determined by law or ordinance. Regulations also restrict the opportunities to analyze official data sets to an extent that is surprising in light of experience elsewhere. For instance, the rich possibilities inherent in the Microcensus as a panel survey have not been exploited for more than ten years (see below). This is perhaps an extreme illustration of the degree to which the existing potential for analysis goes unused. But even the implementation of recommendations by a scientific advisory panel on how to modernize the Microcensus and increase its potential without additional cost (Esser, Grohmann, Müller, & Schäffer, 1989) was blocked by political decisions. The overregulation and increasing inflexibility of official data have been deplored by a number of observers (Wingen, 1989; Holder & Ehling, 1991; Jäger, 1992). In Germany, these developments affect the use of official microdata within academic research. During the 1970s, a satisfactory solution to the issue of data access permitted intensive data use and resulted in numerous publications. However, as a consequence of the public debate over data protection and the restrictive regulations it produced, the problems of data access have become so burdensome that they have at times interfered with the use of the Microcensus as a data source for secondary analyses, or made it less interesting because of limitations on the information available for analysis.
Nevertheless, the 'population census judgement' (Volkszählungsgesetzurteil) of the Federal Constitutional Court acknowledged the legitimacy of the scientific need for access to microdata. The 1987 Federal Statistics Law took this judgement into account by introducing the concept of "factual anonymity", which stipulates that individual data be traceable to respondents only by inordinate expenditures of time, costs, and personnel, if at all. This concept affords sufficient practical protection for respondents, while augmenting the potential for data analysis by relaxing formerly stringent masking techniques. The Federal Statistical Office (Statistisches Bundesamt),
Secondary Analysis of Official Microdata
Mannheim University, and ZUMA tested implementations of factual anonymity in a joint research project. Empirical procedures were used to develop concrete measures to protect confidentiality and to determine their impact. These measures now serve as a basis for improved practice by the statistical bureaus in disseminating microdata (Müller, Blien, Knoche, & Wirth, 1991). But compared with the situation in other countries, there are still deficiencies. In the United States, official microdata can be accessed as 'Public Use Files' (see above), which form part of the standard holdings of social science data archives. Similarly, social scientists in Great Britain can use official data via the ESRC Data Archive. In France, LASMAS (Laboratoire d'Analyse Secondaire et de Méthodes Appliquées à la Sociologie) has been entrusted by the French National Institute of Statistics and Economic Studies (INSEE) with the processing and distribution of data to be used by scientists. Similar institutions exist in several Scandinavian and Eastern European countries. These institutions demonstrate the international acceptance of access to official microdata for scientific purposes. In Germany, however, data are made available only for specific, previously defined research projects and must be deleted upon expiration of the contractual period of use. These restrictions make it very difficult to conduct continuous research in fields where microdata are essential, such as the analysis of social change. Moreover, the costs of access to the data are many times those in other countries and thus constitute a further obstacle to use. Beyond these legal, practical, and economic problems of data access, acquiring and analyzing large data sets from government agencies presupposes substantial methodological, data processing, and financial resources, which generally only sizable research institutions can supply.
To improve the individual researcher's opportunities for using such data, the Microdata Department was set up at ZUMA when GESIS, the German Society for Social Science Infrastructure (Gesellschaft Sozialwissenschaftlicher Infrastruktureinrichtungen e.V.), was founded. This ZUMA department provides advice and technical support to social scientists analyzing official microdata and carries out its own research projects with such data.
4
Strengths and Weaknesses of Secondary Analysis of Official Microdata
In principle, analysis of official microdata has the same advantages and shortcomings as any other kind of secondary data analysis: a) A fundamental norm of scientific procedure is that the production of knowledge must be open to criticism and competing approaches and must guarantee that replications and reanalyses are possible. The secondary analysis of existing data by third parties plays an extremely critical role in this process, for disputes between alternative interpretations often can be resolved only by reanalysis of the data. b) Another important advantage is the economical use of the means available. To the extent that a given data set can be and is employed for different purposes, resources are used more efficiently. c) The most important shortcoming is that data may not be well matched to the requirements of a research problem. Such requirements may originate in the scientific theory to be tested, the assumptions of an analysis method, or the particularities of a problem of applied social research. The researcher who uses existing data may have to settle for a solution that falls short of the optimum. We will not discuss these general conditions in detail here, as they apply to all kinds of secondary analyses (compare Hyman, 1972; Hakim, 1982; Kiecolt & Nathan, 1985). These conditions may have even greater relevance for data from official sources than for data collected by scientific institutions, for the following reasons: a) Precisely because official
data are so often central to important policy decisions, governmental bodies cannot be allowed to monopolize their use, and guarantees must exist that the data can be analyzed from a variety of perspectives. b) Since the collection of official data involves high costs, and the data are often underutilized by the agencies that collected them, analyses by non-governmental researchers are strongly justified by considerations of cost efficiency. c) However, since official data are generated within a framework set by administrative procedures and policies, they sometimes fall short of scientific requirements. We now survey some special features of official data, with specific reference to Germany, in order to consider some of their strengths and shortcomings. The examples described in detail in the subsequent section then illustrate some typical features of official data: Large sample size: In most instances, official data sets encompass extremely large numbers of cases. The data typically derive from complete censuses or very large samples. Low sample bias: Official surveys are generally affected much less by non-response than are the usual social science surveys. Data that become available in the course of administrative action (e.g., the Employment Statistics) include all, or nearly all, cases subjected to a specific procedure. In Germany, participation in most official population surveys, such as the Microcensus, is compulsory. For instance, the level of non-response by households covered in the Microcensus is below five percent. Nevertheless, whether systematic selectivity exists must be checked for each data set. Context-relatedness: The design of instruments for collecting official data often lends itself to investigations of the social contexts of individual actors.
This is true, for instance, of such population surveys as the population census, the Microcensus, and the Sample Survey of Income and Expenditure, since in all these cases data are provided by all persons living in the same household. Consequently, individual respondents can be characterized by the composition and other aspects of the households in which they live. In the Employment Statistics, to take another example, all employees can be related to the workplaces in which they are employed. As a result, a workplace can be characterized by the aggregate characteristics of its workers (e.g., work-force size), and individual employees by their positions in the employment structure of the workplace. Longitudinal and international comparability: The data gathering of statistical bureaus has a high degree of continuity. Many data collections are repeated on the basis of the same concepts in an identical or only slightly modified form over long periods. Of great value for comparisons between countries is the fact that international standards exist for many concepts used in official surveys. While these standards do not solve the fundamental problems of comparative research, they can alleviate them. Bias in favor of legally defined statuses: Social reality as reflected in official data sets (especially given the German bureaucratic tradition) is strongly biased towards legally defined characteristics and almost completely neglects attitudes, intentions, and meanings. What is recorded, for instance, is not participation in a consensual union but marital status, not the social status of a person but membership in groups defined in terms of social insurance rights. Sampling uncertainties: To protect confidentiality, the official data distributed to the research community do not, as a rule, contain small-area geographic identifiers, i.e., information identifying the enumeration district or region of residence below the level of federal states. 
This adversely affects sampling error calculations. For one thing, it means that variance-reducing sample stratification cannot be taken into account. For another, insofar as official data are generated through cluster sampling (true of the Microcensus, for instance), sampling error is underestimated, since observations in cluster samples are not independent of each other but this dependency cannot be taken into account when the clustering at the enumeration district level is
unknown. Consequently, correct error calculation is not possible in practical work, and approximations have to be used.3 In short, the special features outlined above suggest the following picture of the significance of official microdata. Since the response rates of common sample surveys of the population are quite low in Germany (in many instances, they are now below 60 percent), unbiased reference statistics are especially important for checking the quality of estimates obtained by means of surveys. The Microcensus has made it possible to attribute the systematic biases of survey data, which can be observed especially for variables relating to social class, to specific factors: these biases, known to survey researchers as "middle-class bias", are basically due to the underrepresentation of persons with low and the overrepresentation of persons with higher educational attainment, and also to the difficulty of reaching respondents who are gainfully employed (Hartmann & Schimpl-Neimanns, 1992). The enormous size of official data sets clearly creates specific strengths. Precise results can be obtained even for small population groups, and the weight and relevance of findings from surveys based on smaller samples can be assessed. These advantages are particularly important for estimates of absolute population figures or if estimates are needed for small regional units. Furthermore, the small estimation errors in official data make it feasible to investigate changes over short intervals of time. Such trends can only be identified if errors in estimates are smaller than the magnitude of change. Thus, in order to observe the often slowly developing trends of social change, and in particular to identify the trends in different regions or population groups, the analysis of repeated large-scale surveys is indispensable.
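The two statistical points above, that ignoring cluster sampling understates sampling error and that change can only be detected when it exceeds the estimation error, can be illustrated with Kish's well-known design-effect approximation, deff = 1 + (m - 1)ρ. The sketch below is illustrative only: the average of 10 persons per sampling district and the intra-cluster correlation ρ = 0.05 are assumptions, not Microcensus figures. It also ignores stratification (which would lower the variance) and treats the two years as independent samples.

```python
import math


def se_proportion(p: float, n: int, deff: float = 1.0) -> float:
    """Standard error of an estimated proportion p from n cases, inflated
    by a design effect deff (deff = 1 means simple random sampling)."""
    return math.sqrt(deff * p * (1 - p) / n)


def kish_deff(mean_cluster_size: float, icc: float) -> float:
    """Kish's approximation of the design effect of a cluster sample:
    1 + (m - 1) * rho, with m the average number of cases per cluster
    and rho the intra-cluster correlation."""
    return 1 + (mean_cluster_size - 1) * icc


def change_detectable(p1, p2, n1, n2, deff=1.0, z=1.96):
    """True if the change p2 - p1 exceeds z standard errors of the difference.
    The two samples are treated as independent, which ignores any overlap
    between them (overlap would usually make the comparison more precise)."""
    se_diff = math.hypot(se_proportion(p1, n1, deff), se_proportion(p2, n2, deff))
    return abs(p2 - p1) > z * se_diff


# Hypothetical inputs: 10 persons per sampling district, rho = 0.05.
deff = kish_deff(10, 0.05)
print(round(deff, 2))
# Treating a cluster sample as a simple random sample understates the error:
print(se_proportion(0.30, 800_000), se_proportion(0.30, 800_000, deff))
# A change from 30.0 to 30.2 percent, with Microcensus-sized vs. survey-sized samples:
print(change_detectable(0.30, 0.302, 800_000, 800_000, deff))
print(change_detectable(0.30, 0.302, 3_000, 3_000, deff))
```

With 800,000 cases per year the 0.2-point change exceeds about two standard errors even after the design-effect inflation; with 3,000 cases it is lost in the sampling noise.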
The continuity and international comparability of official data are clearly beneficial for the analysis of long-term social change and for comparative social research. No other data sources permit researchers to analyze, with at least somewhat comparable categories, central social trends since the beginning of industrialization, such as the transformation of occupational, economic, and workplace structures, of family and household patterns, or of women's role in economic life since 1875 (for examples see Willms-Herget, 1985; Stockmann, 1986; for an overview see Statistisches Bundesamt, 1990). The advantages of official microdata have also been recognized in comparative research (see, e.g., the comparative analyses emanating from the LIS (Luxembourg Income Study) project on income gaps and poverty (Smeeding, O'Higgins, & Rainwater, 1990); Esping-Andersen, Assimakopoulou, and Kersbergen (1993) on institutional determinants of class structure; König and Müller (1986) on the relationship of educational and employment systems). Given limited space, our most difficult task is to describe in detail the subjects that lend themselves best to analysis with official data and to characterize the way in which social reality is reflected by such data. Some official data are certainly not well suited to the requirements of social science; at the same time, their potential is often underestimated. The potential contribution of official data varies considerably from one field to another, and it is therefore necessary to take a close look at individual data sets. In the following section, we briefly describe selected German data that are or will soon be available in the form of microdata. We can give no more than cursory descriptions here, which convey only rough impressions of the existing potential. To compensate, the description of the Microcensus is followed by a short presentation of some results from a recent analysis.
We thus illustrate how important research questions can be readily addressed with this data set, whereas addressing them would require far more effort if social scientists were to conduct their own surveys for the same purpose.
5
Analytic Potential of Major Bodies of Official Microdata
Federal agencies in Germany maintain a number of data sets that are based on individual-level data. We cannot discuss all of them here (for an overview see Statistisches Bundesamt, 1988). We focus instead on the data sets that constitute the backbone of German social statistics, including the Employment Statistics, the Survey of Income and Expenditure, and the Microcensus. Since they exemplify quite different types of surveys and data, their special features are briefly described in this section.4
5.1
Employment Statistics
The Employment Statistics (Beschäftigtenstatistik) provide an example of data produced by administrative action.5 These data are derived from the insurance accounts maintained by the Bundesanstalt für Arbeit (Federal Employment Services), which are in turn based on employer reports to social security agencies for those employees who are legally obliged to make social insurance contributions. Such information has to be provided at the beginning and end of an employment relationship (one that is subject to social insurance) and in the event of changes in or interruptions of such a relationship. This means that the data in principle contain information on the entire working life of an employee as far as employment subject to social insurance is concerned. The variables included are: year of birth/age, sex, general education and vocational training, nationality, and, for any employment spell, branch of industry, place of work, occupation/profession, class of worker, full-time/part-time employment, and gross earnings6 subject to social insurance contributions. These data cover the great majority of wage earners and salaried employees, some eighty percent of total employment. Public officials, self-employed persons, and family workers, as well as persons whose income remains below the lower income threshold established for social insurance contributions,7 are exempt from the obligation to pay contributions and are therefore not covered in the accounting system. At the Bundesanstalt für Arbeit, individual plants are assigned separate identification numbers. This workplace number is added to the data of the insured; it is used to establish a workplace file and thus enables analyses relating to workplaces (Bellmann & Buttler, 1989). All that has been published so far from this enormous body of data is in highly aggregated form: tables of averages by industrial sectors and regions, which are published quarterly, annually, and for longer periods.
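Because every employee record carries a workplace number, workplace-level characteristics can be built by aggregating over employees and then attached back to each individual for contextual analysis. The sketch below is a minimal illustration with hypothetical records of the form (person, workplace, gross earnings). It assumes that all employee records of a workplace are present, as in the complete file; in a one-percent sample, counting sampled employees would not give the work-force size, which would then have to come from a workplace file.

```python
from collections import defaultdict


def workplace_context(records):
    """records: list of (person_id, workplace_id, gross_earnings) tuples.

    Returns one dict per employee with workplace-level aggregates attached:
    work-force size, mean earnings at the workplace, and the employee's
    earnings relative to that mean (a simple measure of position)."""
    by_workplace = defaultdict(list)
    for _, workplace, earnings in records:
        by_workplace[workplace].append(earnings)

    result = []
    for person, workplace, earnings in records:
        wages = by_workplace[workplace]
        mean_wage = sum(wages) / len(wages)
        result.append({
            "person": person,
            "workplace": workplace,
            "workforce_size": len(wages),
            "workplace_mean_earnings": mean_wage,
            "relative_earnings": earnings / mean_wage,
        })
    return result


# Hypothetical records: two employees in workplace w1, one in w2.
records = [("p1", "w1", 3000), ("p2", "w1", 2000), ("p3", "w2", 4000)]
for row in workplace_context(records):
    print(row)
```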
Since analysis of the entirety of these data (Historical File) would consume considerable resources, the Institut für Arbeitsmarkt- und Berufsforschung (IAB, the Institute for Employment Research) of the Bundesanstalt für Arbeit has extracted a sample. So far, little experience with this file has been reported by researchers outside the Bundesanstalt für Arbeit and the IAB, since few outside researchers had access to the data in the past. In a cooperative project with the Wissenschaftszentrum Berlin für Sozialforschung (WZB, the Science Center Berlin) and ZUMA, the IAB is currently preparing an anonymized one percent subsample that will contain longitudinal data on employees for the period 1975 to 1990/1991, augmented by data on workplaces and on the receipt of benefits (e.g., unemployment benefits). Insofar as data protection regulations permit, these data will be made available to interested researchers via the Zentralarchiv (Central Archive) in Cologne (Bender, 1993). Though the coverage of only a small number of variables limits the range of subjects that can be analyzed with the Employment Statistics, the availability of information on workplaces adds considerably to their value. Because of the large number of cases involved, these data constitute a unique resource for regional and sectoral analyses of the labor market (see
Bellmann & Möller, 1993). In addition, contextual analyses can be constructed using the data on workplaces, as can analyses at the workplace level itself (see Bellmann, Boen, & Lehmann, 1994)8. Even the one-percent sample from the Historical File will still contain about 460 thousand cases and will provide a sound basis for event history analyses, e.g., of income changes (see Blien & Rudolph, 1989). Compared with longitudinal surveys, the Employment Statistics, which are produced in the course of administrative action, have the advantage that problems of panel mortality are minimal. Neither do they contain any of the various errors that may arise when data on occupational careers are collected retrospectively. Since these data reflect employment careers, they afford the opportunity to examine individual changes as well as changes in aggregates, for example, growth or shrinkage processes in workplaces, industrial sectors, and regions (see Boeri & Cramer, 1991; Arminger, Blien, & Wiedenbeck, this volume). Also, individual-level processes can be analyzed against the backdrop of structural change.
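Since employers report the start and end of each employment relationship, the records can be converted into spells for event history analysis. The following is a minimal sketch, not the IAB's procedure: spells are hypothetical and measured in whole years, and a spell still open when observation stops is treated as right-censored. It then applies the Kaplan-Meier (product-limit) estimator, one standard event history tool, to estimate the probability that an employment spell lasts beyond a given duration. Real analyses would work with exact notification dates.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Spell:
    person: str
    start: int            # year the employment relationship began (hypothetical unit)
    end: Optional[int]    # year it ended; None if still running when observation stops


def durations(spells: List[Spell], window_end: int) -> List[Tuple[int, bool]]:
    """Convert spells to (duration, ended) pairs; open spells are right-censored
    at window_end, i.e. recorded with ended=False."""
    out = []
    for s in spells:
        if s.end is None:
            out.append((window_end - s.start, False))
        else:
            out.append((s.end - s.start, True))
    return out


def kaplan_meier(obs: List[Tuple[int, bool]]) -> List[Tuple[int, float]]:
    """Product-limit estimate of the probability that a spell lasts beyond t,
    evaluated at each observed ending time t."""
    curve = []
    surv = 1.0
    for t in sorted({d for d, ended in obs if ended}):
        at_risk = sum(1 for d, _ in obs if d >= t)            # spells still running at t
        events = sum(1 for d, ended in obs if d == t and ended)
        surv *= 1 - events / at_risk
        curve.append((t, surv))
    return curve


spells = [Spell("A", 0, 2), Spell("B", 0, 4), Spell("C", 1, None), Spell("D", 2, 5)]
print(kaplan_meier(durations(spells, window_end=6)))
```

Censored spells stay in the risk set up to their censoring time, which is what distinguishes this estimator from simply averaging the completed durations.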
5.2
Sample Survey of Income and Expenditure
The Sample Survey of Income and Expenditure (EVS, Einkommens- und Verbrauchsstichprobe) is one of the few quota samples found in the corpus of official data. It was first conducted in 1962/63 and has been repeated ever since at five-year intervals. In 1993, it was carried out for the first time in the new federal states (Euler, 1992). Participation in this sample survey is not obligatory. Because of the systematic non-response evident in the pilot surveys conducted with random sampling methods for the first EVS, random sampling was dropped. From the beginning, the method of quota sampling has been used (in 1993 with disproportionate stratified sampling) to compensate for the non-response associated with voluntary participation. For related reasons, households with extremely high incomes have been excluded since 1969; in 1993, the cutoff was a monthly net household income of DM 35,000 (US $ 22,000, approximately). Households headed by foreigners (i.e., non-citizens) were first covered by a pilot survey in 1988 and have been polled on a regular basis since 1993. Like all voluntary surveys, the EVS is affected by falling participation rates. According to the quota plan, the 1993 EVS was to collect data on 70,000 households, for an average sampling fraction of 0.2 percent. For the first time in the history of the EVS, a smaller number of households participated (only 81 percent of the target) than was anticipated (Pöschl, 1993). Because the survey is based on quota sampling, the generalization of results is somewhat problematic, since the sample selectivity among different groups of respondents is unknown. In terms of content, a strong point of the EVS is that it contains a variety of data on income and its sources as well as on consumption patterns. No other survey supplies such an array of data on the origin and distribution of household income, household wealth and indebtedness, social transfers, household expenditures, and possession of consumer durables. 
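With disproportionate stratified sampling, some groups are deliberately oversampled, so unweighted means over all respondents are biased and each household has to be weighted by the number of households it represents in its stratum. The sketch below shows the idea with three invented strata. Only the totals are taken from the text (70,000 households out of roughly 35 million, which matches the stated 0.2 percent sampling fraction); the stratum sizes and expenditure means are hypothetical. Weighting corrects the known disproportionality of the design, but it cannot remove the unknown selectivity of a quota sample within strata.

```python
# Hypothetical strata: (households in population N_h, responding households n_h,
# mean monthly expenditure in DM among responding households)
strata = {
    "low income": (10_000_000, 15_000, 2_000.0),
    "middle income": (20_000_000, 30_000, 3_500.0),
    "high income": (5_000_000, 25_000, 7_000.0),
}


def design_weights(strata):
    """Each responding household stands for N_h / n_h households in its stratum."""
    return {h: N / n for h, (N, n, _) in strata.items()}


def stratified_mean(strata):
    """Population mean: stratum means weighted by population size N_h."""
    total = sum(N for N, _, _ in strata.values())
    return sum(N * ybar for N, _, ybar in strata.values()) / total


def naive_mean(strata):
    """Unweighted mean over all respondents (biased when sampling is disproportionate)."""
    total = sum(n for _, n, _ in strata.values())
    return sum(n * ybar for _, n, ybar in strata.values()) / total


print(design_weights(strata))
print(round(stratified_mean(strata), 2))
print(round(naive_mean(strata), 2))
```

Because the high-income stratum is oversampled in this invented design, the naive mean overstates average expenditure.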
These distinctive features of the data were exploited, for instance, by a recent study of household expenditures on children in relation to household income and socio-economic composition (Euler, 1993). EVS data would also be well suited for an investigation of notions about the pluralization and individualization of life-styles (see Beck, 1986; Zapf et al., 1987; Mayer & Blossfeld, 1990). However, because the EVS does not collect data on education, it cannot be used for human-capital analyses of labor supply and earned income. Variables extracted from the EVS data by the former Sonderforschungsbereich 3 at the Universities of Frankfurt and Mannheim have recently been stored with the Luxembourg Income Study Project (LIS) and can be analyzed there by means of teleprocessing. Researchers can also
acquire individual EVS data in accordance with the confidentiality criteria described by Müller, Blien, Knoche, and Wirth (1991, pp. 440.). So far, social research has used the EVS mainly for studies of poverty. While the stock of official data is generally far from ideal for income analyses, the EVS is among the data sources best suited for this purpose.9 The EVS made it possible, for instance, to set standardized poverty limits using household-specific social welfare rates. Further, the findings of Hauser, Cremer-Schäfer, and Nouvertné (1981) from the EVS show a sharp decline between 1962/63 and 1973 in relative poverty, defined as a household income of less than 50 percent of average income; for the period from 1978 to 1983, however, Rohwer (1987) found an increase in income inequality.
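The relative poverty definition used above (household income below 50 percent of average income) amounts to a simple computation once weights are available. The sketch below is a minimal version under stated assumptions: it uses the arithmetic mean of household income and optional survey weights, and it applies no equivalence scale for household size. The cited studies may have defined income, averages, and weights differently, and the incomes shown are invented.

```python
def relative_poverty_rate(incomes, weights=None, share=0.5):
    """Weighted share of households whose income is below `share` times
    the (weighted) mean income. No equivalence scale is applied."""
    if weights is None:
        weights = [1.0] * len(incomes)
    total = sum(weights)
    mean_income = sum(w * y for w, y in zip(weights, incomes)) / total
    poverty_line = share * mean_income
    poor = sum(w for w, y in zip(weights, incomes) if y < poverty_line)
    return poor / total


# Hypothetical monthly household incomes in DM
incomes = [800, 1500, 2000, 2500, 3200]
print(relative_poverty_rate(incomes))                           # unweighted
print(relative_poverty_rate(incomes, weights=[2, 1, 1, 1, 1]))  # first household counts twice
```

Here the mean is 2,000 DM, so the poverty line is 1,000 DM and one of five households falls below it; doubling that household's weight lowers the mean and raises the poverty share.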
5.3
Microcensus
The Microcensus has been carried out at least once in nearly every year since 1957. As a survey collecting information on a variety of subjects, it supplies data that can be evaluated in many ways at the individual and the household or family levels (Statistisches Bundesamt, 1989; Esser, Grohmann, Müller, & Schäffer, 1989). Admittedly, the basic data program covers only a small number of variables. However, through supplementary modules on special subjects (some repeated at regular intervals), a data base has been established that covers a wide range of subjects and can be used for labor-market and broader socio-economic analyses, as well as for analyses of household and family patterns. At regular intervals, the Microcensus collects information on, among other things, basic demographic characteristics, labor-force activities (labor-force status, self-employment, occupation, industry, working hours, and terms and conditions of employment), income and its sources, past and present schooling and vocational training, and various aspects of social security, health, and other subjects. A sample of one percent of the population, presently some 800,000 persons in about 350,000 households, is surveyed each year. The new federal states have been included since 1991, and a new sampling plan, operational since 1990, has assured the validity of results for different regions (see Reinders, 1993; Heidenreich, 1994; Meyer, 1994). The Microcensus employs a one-stage stratified area or cluster sample, based on sampling districts consisting of neighboring buildings or parts thereof. In principle, this scheme makes it possible to characterize individuals not only by the families and households they belong to but also by the neighborhoods where they live. Since a sampling district is replaced only after it has been surveyed four times, the Microcensus is an implicit panel survey.
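Why four surveys per sampling district amount to an implicit panel can be seen with a small rotation model. The sketch assumes, beyond what the text states, that equal-sized cohorts of districts enter each year and each cohort stays for four consecutive annual surveys, so that about a quarter of the districts is replaced every year. Under that assumption, three quarters of the districts surveyed in one year are surveyed again the next. In an area sample the district, not the person, stays in the sample, so households that move away drop out.

```python
def in_sample(entry_year: int, year: int, stay: int = 4) -> bool:
    """A district entering in entry_year is surveyed in `stay` consecutive years
    (assumed rotation rule)."""
    return entry_year <= year < entry_year + stay


def overlap_share(year: int, stay: int = 4) -> float:
    """Share of the districts surveyed in `year` that are surveyed again in year + 1,
    assuming equal-sized cohorts of districts enter every year."""
    cohorts = range(year - stay + 1, year + 1)   # entry years present in `year`
    kept = [e for e in cohorts if in_sample(e, year + 1, stay)]
    return len(kept) / len(cohorts)


print(overlap_share(1991))           # four surveys per district
print(overlap_share(1991, stay=2))   # for comparison: two surveys per district
```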
Regrettably, this feature of the Microcensus's sample rotation can no longer be used and is ignored even for evaluative purposes by governmental statistical bureaus. In our view, the data-protection rationale for disabling the panel feature is not justified, even when the need for data confidentiality is given maximum weight. To measure changes at the individual and household levels, even the statistical bureaus must therefore resort to retrospective questions, which are included, for instance, in the EU labor force survey forming part of the Microcensus (0.45 percent sample) but are less informative than panel analyses. However, the EU labor force survey data, which are well suited for international comparisons, are not made available to social scientists as individual data. Because respondents are under a legal obligation to provide information in the Microcensus, non-response is quite low; in 1991, the non-response rate averaged 3.3 percent of all households in the old federal states, compared to 4.2 percent in the new federal states (Heidenreich, 1993, 1994). Since 1990, the biennial supplementary module is no longer
obligatory; this change affects questions on education and vocational training, as well as special information to be provided by foreigners. In 1991, for example, nearly 10 percent of the respondents aged 18 years and over did not answer any questions on education and training. Scientific institutions can acquire two variants of Microcensus microdata for research purposes. The so-called basic file is a 70 percent subsample. Apart from identifying the federal state and locating a household in terms of a rough classification of community size, it does not give any geographic information; but all other data are included in nearly complete detail. The regional file, which is better suited for regional analyses, contains more detailed regional information (though all regional reporting units must still have a minimum of 100,000 inhabitants). To protect confidentiality, some variables that are reported in detail in the basic file - occupation, industry, and nationality - are included in a coarser form in the regional one. The Microcensus is the official data base that has so far been used most intensively by social science. This is due partly to the size and high quality of its sample, but primarily to the wide range of problems that can be studied with these data. For instance, the supplementary survey in the 1971 Microcensus, entitled "Occupational and Social Restratification of the Population," stimulated a large number of studies on stratification themes - inequality of educational opportunities, patterns of occupational and social mobility, changing participation of women in the labor market, and the integration of refugees and expellees. Recent publications serve as additional proof of the extraordinary potential of this survey (Handl & Herrmann, in press; Müller & Haun, 1994). 
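Delivering variables "in a coarser form" is typically done by recoding. Two common forms are collapsing rare categories into a residual class (global recoding) and truncating hierarchical classification codes to their leading digits. The sketch below shows both. The nationality labels, the occupation codes, and the threshold of two cases are hypothetical, and the statistical offices' actual anonymization rules are not described here.

```python
from collections import Counter


def global_recode(values, min_count, other="other"):
    """Replace every category observed fewer than min_count times by `other`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


def truncate_code(code: str, digits: int) -> str:
    """Coarsen a hierarchical classification code by keeping its leading digits."""
    return code[:digits]


# Hypothetical nationality codes and four-digit occupation codes
nationality = ["TR", "TR", "IT", "GR", "DE", "DE", "DE"]
print(global_recode(nationality, min_count=2))
print([truncate_code(c, 2) for c in ["7813", "7821", "2512"]])
```

Both operations trade analytic detail for a lower risk that a rare combination of characteristics identifies a respondent, which is the trade-off behind the basic and regional files.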
We cannot treat these and many other studies in detail here, but we will illustrate the potential of the Microcensus for social research by briefly summarizing some results of a recent study of ethnic inequality in the German school system, based on the 1989 Microcensus (Alba, Handl, & Müller, in press). This study investigates whether children of immigrant minorities face disadvantages during their school careers in comparison with German children, what factors influence the degree of any disadvantage, and what specific educational differences exist between different groups of immigrant children. To answer these questions, data on large numbers of children of different ethnic origins must be available. These data must also contain information on the most important non-ethnic determinants of placement in the school system, for, in order to determine whether any educational differences between groups of children are traceable specifically to ethnic origins, other relevant factors must be taken into account. Undoubtedly, the social, cultural, and economic situation in the home has a critical impact on children's participation in education. Although the Microcensus does not contain a single direct question on the social background of each person, much of the required information can still be derived because the Microcensus is a household survey which collects data for all persons living in a household. Consequently, information for parents (or other household members), such as their educational attainment, can be used to characterize the home environment of school-age children. Variables derived in this way can be related to children's school placement, and their power to explain ethnic differences in education can be assessed. 
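The derivation described above, where each child is characterized by the attributes of other members of the same household, is essentially a join on the household identifier. The sketch below illustrates it with hypothetical person records. The field names, the "head"/"child" relationship codes, and the education labels are illustrative and are not the Microcensus variable coding. It attaches the household head's education and the number of children in the household to each child aged 13 to 15.

```python
from collections import Counter

# Hypothetical person records of a household survey (field names are illustrative)
persons = [
    {"hh": 1, "rel": "head", "age": 44, "educ": "Abitur"},
    {"hh": 1, "rel": "child", "age": 14, "educ": None},
    {"hh": 2, "rel": "head", "age": 39, "educ": "Hauptschule with apprenticeship"},
    {"hh": 2, "rel": "child", "age": 13, "educ": None},
    {"hh": 2, "rel": "child", "age": 9, "educ": None},
]


def children_with_home_context(persons, min_age=13, max_age=15):
    """Return one record per child in the age range, enriched with the
    household head's education and the number of children in the household."""
    heads = {p["hh"]: p for p in persons if p["rel"] == "head"}
    n_children = Counter(p["hh"] for p in persons if p["rel"] == "child")
    rows = []
    for p in persons:
        if p["rel"] == "child" and min_age <= p["age"] <= max_age:
            head = heads.get(p["hh"])
            rows.append({
                "hh": p["hh"],
                "age": p["age"],
                "head_educ": head["educ"] if head else None,
                "n_children": n_children[p["hh"]],
            })
    return rows


for row in children_with_home_context(persons):
    print(row)
```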
Table 1 illustrates such an analysis scheme: it presents the results of a multivariate logistic regression analysis of whether 13-15 year-olds are placed in the Hauptschule track (scored 0) or one of the higher tracks of the school system (Realschule or Gymnasium, scored as 1). Other research has shown that this fork in the German system is the most fateful for young people's school and professional careers (Müller & Haun, 1994).
Table 1: Logistic regression analysis of educational placement in Realschule or Gymnasium (higher) vs. Hauptschule (standard) of 13- to 15-year-olds
Variable                                            Coefficient
Household head's education
  No diploma (or no information)                    -2.327*
  Hauptschule without apprenticeship                -2.385*
  Hauptschule with apprenticeship                   -1.945*
  Realschule without apprenticeship                 -1.434*
  Realschule with apprenticeship                    -1.205*
  Abitur (Gymnasium)                                 -.732*
  Technical college                                  -.608*
  College or university**                            0
Household head's occupation
  Agricultural                                       -.858*
  Simple manual                                     -1.272*
  Qualified manual                                   -.993*
  Technical                                          -.127
  Engineering                                        -.095
  Simple service                                    -1.184*
  Qualified service                                  -.541
  Semi-professional                                  -.486
  Professional**                                     0
  Clerical/simple administrative                     -.505
  Qualified administrative                           -.179
  Managerial                                         -.155
  Unemployed/social welfare                         -1.894*
  Other employment                                   -.953*
  Unknown                                           -1.071*
  Self-employed                                       .116
Family size and gender
  Number of children in household                    -.168*
  Male                                               -.352*
Residence
  In Bundesland with high proportion of foreigners    .041
  Small city**                                       0
  Medium-sized city                                   .106
  Large city                                          .531*
Generation / age at arrival
  German/second generation**                         0
  Came 5-9 years old                                 -.525*
  Came 10-14 years old                               -.757*
  Unknown                                             .228
Ethnicity
  German**                                           0
  Turkish                                            -.515*
  Yugoslav                                           -.310*
  Italian                                            -.739*
  Greek                                              -.001
  Other foreign                                       .008
Constant                                             3.361*
Model χ²                                            2,426.0 with 35 d.f.
* p < .05; ** reference category.
Source: Microcensus 1989 (70% subsample; ZUMA file).
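Logit coefficients are often easier to read as odds ratios, exp(β), which give the multiplicative change in the odds of higher-track placement relative to the reference category, holding the other variables in the model constant. A small Python sketch using a few coefficients transcribed from Table 1; the selection of rows and the dictionary name `COEFFICIENTS` are illustrative only.

```python
import math

# Selected logit coefficients transcribed from Table 1.
# Reference categories: German/second generation (age at arrival), German (ethnicity).
COEFFICIENTS = {
    "Came 5-9 years old": -0.525,
    "Came 10-14 years old": -0.757,
    "Italian": -0.739,
    "Turkish": -0.515,
    "Yugoslav": -0.310,
    "Greek": -0.001,
}


def odds_ratio(beta: float) -> float:
    """Convert a logit coefficient into an odds ratio relative to the reference category."""
    return math.exp(beta)


for label, beta in COEFFICIENTS.items():
    print(f"{label:<22} beta = {beta:+.3f}   odds ratio = {odds_ratio(beta):.2f}")
```

For example, exp(-.757) is about 0.47: the odds of higher-track placement for children who arrived at ages 10-14 are a little under half those of otherwise comparable second-generation children, while the Greek coefficient corresponds to an odds ratio of essentially 1.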
The analysis demonstrates that the educational attainment and occupational position of the parent who is household head have pronounced effects on a child's school placement. The probability of placement outside the Hauptschule track, and hence in a track of the school system that leads to better educational opportunities, declines, almost without exception, with each step down in the education of the household head (each coefficient in the table is a contrast with the reference category, household heads with a college or university education). The distance between the extreme categories of the head's education is approximately 2.4 in logit terms, which corresponds roughly to a difference of 50 percentage points in the average probability of placement in a higher track. The effect of the head's occupational status is smaller. Nevertheless, compared with the children of parents in professional occupations, children whose parents hold simple manual or service jobs are considerably more likely to be found in the
Secondary Analysis of Official Microdata
Hauptschule track. Even more disadvantaged are the children of parents whose main income comes from unemployment compensation or social welfare payments. In addition, children from large families are more likely to be found in Hauptschulen. The same is true of boys compared with girls, whereas living in a large city is an advantage when it comes to attending schools in the more favored tracks. With respect to ethnicity, the findings show, on the one hand, evidence of a process of assimilation, reflected in the effects of generation in Germany and/or age at immigration on school placement. The study defined second-generation children as foreign children who were either born in Germany or arrived before the age of 5, and thus before school entry. Compared with such children, foreigners arriving later are distinctly disadvantaged in school placement, and their probability of being placed in Hauptschulen appears to increase the later they arrive (the relevant logit coefficients are -.525 for those who arrived between the ages of 5 and 9, and -.757 for those who arrived between the ages of 10 and 14). On the other hand, even second-generation children do not have nearly the same chances as German children of attending schools in the more favored tracks. This inequality is reflected in the coefficients for the different ethnic origins, which show, further, that there are important differences among children from different immigrant groups. In particular, Italian, Turkish, and Yugoslav children (in declining order of disadvantage) are more likely to be placed in Hauptschulen, while the children of Greek immigrants do not appear to be disadvantaged at all. In further analyses, Alba, Handl, and Müller (in press) examine additional aspects of how socio-economic and ethnic factors affect school outcomes and vocational training.
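The statement that a gap of about 2.4 logits corresponds to roughly 50 percentage points can be checked with the inverse logit. Because the probability scale is non-linear, the translation depends on where on the logit scale the two groups lie. The sketch below assumes, for simplicity, that the two groups sit symmetrically around a chosen midpoint; this only approximates the authors' averaging over the whole sample, and the helper names are hypothetical.

```python
import math


def inv_logit(eta: float) -> float:
    """Map a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def probability_gap(logit_gap: float, midpoint: float = 0.0) -> float:
    """Probability difference between two groups whose logits differ by logit_gap,
    placed symmetrically around midpoint (a simplifying assumption)."""
    high = inv_logit(midpoint + logit_gap / 2)
    low = inv_logit(midpoint - logit_gap / 2)
    return high - low


# Gap between the extreme education categories in Table 1 (-2.385 vs. 0)
for midpoint in (-1.0, 0.0, 1.0):
    gap = probability_gap(2.385, midpoint)
    print(f"midpoint {midpoint:+.1f}: {100 * gap:.1f} percentage points")
```

Around the middle of the probability scale the gap is about 53 points; shifting both groups one logit toward either end shrinks it to about 45 points, which brackets the authors' rough figure of 50.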
However, even the limited results presented here should suffice to illustrate the potential of the Microcensus to illuminate critical features of social structure. Indeed, it would be extremely difficult to arrive at such powerful results with any other current German data set, for even the special foreigner samples of the well-known Socio-economic Panel are too small to allow distinctions among different nationalities and an equivalently refined analytical model. Moreover, since the Microcensus is conducted every year, one can, in principle, use it to monitor social change - in this case, whether ethnic disadvantages remain constant or change over time.
6 Research Potential and Prospects
This brief survey of official microdata demonstrates the potential of this type of data to contribute to research on numerous important questions concerning social patterns, trends, and policy. As we have seen, major problems of basic research in the social sciences can be addressed by applying innovative methods to official data. The use of such data is far more advanced in the United States than in Europe, and in Germany in particular. Microdata are widely used in American social research (sometimes in conjunction with other data), and their use is accepted as part of the normal research process. In this final section, we consider the prospect that demand for analyses of official microdata will increase in Europe. As we have argued, the research potential of official microdata stems mainly from the combination of large sample sizes with repeated measurement at regular intervals. These features enable the analyst of official microdata to obtain detailed, over-time portraits of social patterns in such fields as families and households, migration, occupations and labor markets, and schooling. Reliable information can be provided on small population groups, and results can be broken down by geographic unit. Because data collections are repeated, changes in social patterns can be detected. Given
these strengths, a number of current social trends, some likely to have an even greater impact in the future, suggest a growing need for the kinds of insights that official microdata can yield. We refer here to trends concerning social differentiation within nations, on the one hand, and supranational integration in Europe, on the other. It is often maintained that trends towards an individualization of life courses and a pluralization of family types and life-styles are among the fundamental effects of advancing modernization and a rising standard of living on domestic social differentiation and social structure (see Beck, 1986; Zapf et al., 1987). The large official surveys that are conducted regularly can deliver critical information here, if only in answering the following questions: In what domains do individualization and pluralization actually take place? What is their extent, and what precise forms do they take? If the assumptions about a developmental process are correct, there will be a growing demand for data through which an increasingly differentiated society can be comprehended, and this will require the ability to represent small population groups reliably. Social differentiation has increased in most European societies over the last several decades, in large part because these societies have become immigration societies. A number of different ethnic minorities live in most of these countries today and are likely to remain permanently. Given current demographic realities, minorities are likely to increase their share of the population in the near future. Europe is thus almost certain to be confronted with a number of problems relating to ethnic differentiation and integration. Research on migration and minorities will need to expand, although this field may not become as central in Europe as it has long been in the United States. The design of major data collections must take this need into account.
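Why sample size matters for small population groups can be illustrated with the standard error of a proportion, sqrt(p(1 - p)/n). The sketch below assumes simple random sampling and a proportion of 0.5 (the worst case); real household samples such as the Microcensus are clustered, so their sampling errors would be somewhat larger. The subgroup sizes are hypothetical.

```python
import math


def se_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)


def ci95_halfwidth(p: float, n: int) -> float:
    """Half-width of an approximate 95% confidence interval (normal approximation)."""
    return 1.96 * se_proportion(p, n)


# Hypothetical numbers of respondents belonging to a small subgroup
for n in (50, 500, 5000):
    print(f"n = {n:>5}: +/- {100 * ci95_halfwidth(0.5, n):.1f} percentage points")
```

A subgroup of 50 respondents yields a margin of roughly 14 points, whereas 5,000 respondents bring it below 1.5 points; for many minority groups, only very large samples reach subgroup counts of that order.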
As the discussion in the preceding section indicates, even one of the largest social surveys, the Socio-economic Panel, falls somewhat short when reliable information on ethnic subgroups is required, e.g., on the characteristics of secondary-school-age children in different immigrant minorities. This is so even though considerable effort has been made in this survey to include foreign residents through disproportionate sampling. As a consequence of European integration, comparative approaches are becoming more important for empirical research. While most research is still conducted within a national context, the restrictions imposed by national borders are gradually losing their legitimacy. The integration process makes it imperative to compare conditions and trends in any single nation with those in other European countries. Reliable, differentiated, and comparable data on social structure are the critical prerequisite for achieving this goal. Fortunately, the European statistical bureaus have for some time been making efforts to coordinate their data collection programs. For example, the European labor force survey, which is linked with the Microcensus in Germany, gathers a considerable amount of data at regular intervals in the EU countries, and great emphasis is placed on its international comparability. We cannot assess here, in general, the degree to which the sometimes conflicting goals of international comparability and accurate recording of country-specific conditions are attained. Nevertheless, we have no doubt that it will take years and considerable investment before empirical social research has at its disposal, for Europe as a whole, even a fraction of the sizeable stock of official data that exists within specific countries today. The integration of Europe, if it continues to proceed as forecast, will not only entail the abolition of borders between individual countries.
As a supranational union takes shape and national units lose some of their power, subnational regions will also grow in importance in accordance with the principle known as "subsidiarity", which is basic to the political framework of the European Union. The shaping of regional identities and the fostering of a more or less balanced development of different regions within Europe are major items on the agenda for the EU. It can be anticipated that regional issues will be objects of contention in political debates,
perhaps above all in unified Germany, and that they will require more emphasis on the regionalization of research findings than in the past. In Germany, as in Europe as a whole, the research community will have to rely on data from official sources if a complete picture of regional differentiation is to be produced. Given these broad trends, data from official sources seem certain to gain in value for research at various levels. More than ever before, they are indispensable to research on social structures and their development in European societies. It is therefore regrettable that access to these data for research purposes has still not been satisfactorily resolved in a number of instances. There are great differences in this respect from one country to another. While in the United States, and also in Great Britain, France, and the countries of Scandinavia and Eastern Europe, data availability regulations are for the most part compatible with the needs of scientific research, considerable problems remain in other countries. Unfortunately, Germany is one of them, even though a satisfactory solution to the problem of confidentiality has been forged after a concerted effort by all parties involved. In principle, then, all questions relating to data protection have been settled since 1991; but even with these resolved, high costs and the time lag between data collection and data availability have turned out to be further obstacles to optimal use of the data.10 For Europe as a whole, the problem of access to microdata has not been settled even in principle (see Als, 1993, p. 175), so there is as yet no way to use microdata at the European level. Resolving this problem is of vital importance, since comparative analyses in Europe will inevitably have to rely primarily on official microdata. In many fields of social science, American research produces extraordinarily fruitful results because it is able to draw upon official microdata.
In the U.S., such data have long been available to researchers and widely disseminated among research institutes; the research practices involved in using them are well understood and part of the standard toolkit of researchers. If the chance to use such resources and to benefit from the findings they can yield about social trends in a unified Germany and an integrating Europe is not to be missed by default, then ways will have to be found to unleash their full research potential in Germany and in Europe.
Notes
1. For details see Oberschall (1965), Schäfer (1971), Fürst (1972), Schad (1972), and Maus (1973).
2. This was due to political and economic backwardness and to the fact that the country was split into individual states. Statistical bureaus were founded in the individual states as precursors of imperial statistics, the first of them being the Preussisches Statistisches Büro (Prussian Statistical Bureau) in 1805.
3. A major ESRC Research Programme into the Analysis of Large and Complex Datasets (1993) was launched recently.
4. We can only briefly mention here the microdata from the former German Democratic Republic, which contain vital data about social structure in the East just prior to the transformations initiated by unification (for details about these data, see Lüttinger & Wirth, 1993; Schimpl-Neimanns, Sutter, & Wirth, 1993). The data of prime interest for research that are still available, because they are stored in machine-readable form, are the 1971 and 1981 population censuses, the 1988 Income Sample Survey, the 1982-1992 Household Budget Statistics, and the 1985 and 1990 Time Budget Surveys.
5. For more detailed descriptions see Herberger and Becker (1983), Cramer and Majer (1991), and Statistisches Bundesamt (1992).
6. Wages are reported at the end of each year or after a job change; higher wages are censored at the limit up to which contributions are levied (at present DM 7,600 per month).
7. These are persons working fewer than 50 workdays per year or earning less than DM 500 per month. Persons in minor employment have been included in the data only since 1990.
8. However, some limitations on workplace analyses must be acknowledged, since the workplace identifiers have not been assigned by a consistent procedure (see König & Weisshuhn, 1989).
9. However, some groups with a high poverty potential (homeless people, persons in institutions) are not covered by this survey and must therefore be studied separately.
10. Data from the Microcensus become available to researchers only two to three years after the surveys are carried out.
References
Alba, R., & Logan, J. (1992). Analyzing locational attainments: Constructing individual-level regression models using aggregate data. Sociological Methods & Research, 20, 367-397.
Alba, R., Handl, J., & Müller, W. (in press). Ethnische Ungleichheit im Deutschen Bildungssystem. Kölner Zeitschrift für Soziologie und Sozialpsychologie.
Allan, E.A., & Steffensmeier, D. (1989). Youth, underemployment, and property crime: Differential effects of job availability and job quality on juvenile and young adult arrest rates. American Sociological Review, 54, 107-123.
Als, G. (1993). Organisation der Statistik in den Mitgliedsstaaten der Europäischen Gemeinschaft. Vol. 1. Luxemburg: Amt für amtliche Veröffentlichungen der EG.
Beck, U. (1986). Risikogesellschaft. Auf dem Weg in eine andere Moderne. Frankfurt: Suhrkamp.
Bellmann, L., & Buttler, F. (1989). Lohnstrukturflexibilität - Theorie und Empirie der Transaktionskosten und Effizienzlöhne. Mitteilungen aus der Arbeitsmarkt- und Berufsforschung, 22, 202-217.
Bellmann, L., Boeri, T., & Lehmann, U. (1994). Analyse betrieblicher Wachstumsprozesse auf der Basis der Beschäftigtenstatistik. In U. Hochmuth & J. Wagner (Eds.), Firmenpanelstudien in Deutschland (pp. 83-105). Tübingen: Franke.
Bellmann, L., & Möller, J. (1993). Institutional influences on interindustry wage differentials. Paper prepared for the 2nd Workshop 'Institutional Frameworks and Labor Market Performance', November 18-20, Nürnberg: IAB.
Bender, S. (1993). Anonymisierung der IAB-Stichprobe aus der Beschäftigtenstatistik. Unpublished Project Paper. Nürnberg: IAB.
Blien, U., & Rudolph, H. (1989). Einkommensentwicklungen bei Betriebswechsel und Betriebsverbleib im Vergleich. Mitteilungen aus der Arbeitsmarkt- und Berufsforschung, 22, 553-567.
Blossfeld, H.-P. (1985). Bildungsexpansion und Berufschancen. Frankfurt: Campus.
Boeri, T., & Cramer, U. (1991). Betriebliche Wachstumsprozesse: Eine statistische Analyse mit der Beschäftigtenstatistik 1977-1987. Mitteilungen aus der Arbeitsmarkt- und Berufsforschung, 24, 70-80.
Cramer, U., & Majer, W. (1991). Ist die Beschäftigtenstatistik revisionsbedürftig? Mitteilungen aus der Arbeitsmarkt- und Berufsforschung, 24, 81-90.
Crane, J. (1991). The epidemic theory of ghettos and neighborhood effects on dropping out and teenage childbearing. American Journal of Sociology, 96, 1226-1259.
DiPrete, T. (1993). Industrial restructuring and the mobility response of American workers in the 1980s. American Sociological Review, 58, 74-96.
Eggebeen, D., & Lichter, D. (1991). Race, family structure, and changing poverty among American children. American Sociological Review, 56, 801-817.
Esping-Andersen, G., Assimakopoulou, Z., & Kersbergen, K. van. (1993). Trends in contemporary class structuration: A six-nation comparison. In G. Esping-Andersen (Ed.), Changing classes. Stratification and mobility in post-industrial societies (pp. 32-57). London: Sage.
ESRC research programme into the analysis of large and complex datasets (1993). ESRC Data Archive Bulletin, 54, 11-18.
Esser, H. (1989). Amtliche Statistik und empirische Sozialforschung: Bemerkungen zu einem (scheinbar) schwierigen Verhältnis. Allgemeines Statistisches Archiv, 73, 70-86.
Esser, H., Grohmann, H., Müller, W., & Schäffer, K.-A. (1989). Mikrozensus im Wandel. Untersuchungen und Empfehlungen zur inhaltlichen und methodischen Gestaltung. Stuttgart: Metzler-Poeschel.
Euler, M. (1992). Einkommens- und Verbrauchsstichprobe 1993. Wirtschaft und Statistik, 463-469.
Euler, M. (1993). Aufwendungen für Kinder. Wirtschaft und Statistik, 759-769.
Frey, W., & Speare, A. (1988). Regional and metropolitan growth and decline in the United States. New York: Russell Sage.
Fürst, G. (1972). Wandlungen im Programm und in den Aufgaben der amtlichen Statistik in den letzten 100 Jahren. In Statistisches Bundesamt (Ed.), Bevölkerung und Wirtschaft 1872-1972 (pp. 11-83). Stuttgart: Kohlhammer.
Geiger, T. (1932). Die Soziale Schichtung des deutschen Volkes. Soziographischer Versuch auf statistischer Grundlage. Stuttgart: Enke.
Hakim, C. (1982). Secondary analysis in social research. A guide to data sources and methods with examples. London: Allen & Unwin.
Haller, M., & Müller, W. (Eds.). (1983). Beschäftigungssystem im gesellschaftlichen Wandel. Frankfurt: Campus.
Handl, J. (1988). Berufschancen und Heiratsmuster von Frauen: empirische Untersuchungen zu Prozessen der sozialen Mobilität. Frankfurt: Campus.
Handl, J., & Herrmann, C. (in press). Soziale und berufliche Umschichtung der Bevölkerung in Bayern nach 1945: Eine Sekundäranalyse der Mikrozensus-Zusatzerhebung von 1971. München: Judicium Verlag.
Hartmann, P.H., & Schimpl-Neimanns, B. (1992). Sind Sozialstrukturanalysen mit Umfragedaten möglich? Analysen zur Repräsentativität einer Sozialforschungsumfrage. Kölner Zeitschrift für Soziologie und Sozialpsychologie, 44, 315-340.
Hauser, R., Cremer-Schäfer, H., & Nouvertné, U. (1981). Armut, Niedrigeinkommen und Unterversorgung in der Bundesrepublik Deutschland. Bestandsaufnahme und sozialpolitische Perspektiven. Frankfurt: Campus.
Heidenreich, H.-J. (1993). Einführung des Mikrozensus in den neuen Bundesländern. Probleme und Erfahrungen. In P. Lüttinger & H. Wirth (Eds.), Amtliche Daten der DDR und der neuen Bundesländer: Informationsquelle für die Sozialwissenschaften - Tagungsdokumentation (pp. 11-26). Mannheim: ZUMA.
Heidenreich, H.-J. (1994). Hochrechnung des Mikrozensus ab 1990. In S. Gabler, J. Hoffmeyer-Zlotnik, & D. Krebs (Eds.), Gewichtung in der Umfragepraxis (pp. 112-123). Opladen: Westdeutscher Verlag.
Herberger, L., & Becker, B. (1983). Sozialversicherungspflichtig Beschäftigte in der Beschäftigtenstatistik und im Mikrozensus. Wirtschaft und Statistik, 290-304.
Holder, E., & Ehling, M. (1991). Zur Entwicklung der amtlichen Statistik in Deutschland. In Schriften des Zentralinstituts für sozialwissenschaftliche Forschung der Freien Universität Berlin, 65 (pp. 15-31). Berlin.
Hyman, H.H. (1972). Secondary analysis of sample surveys: Principles, procedures, and potentialities. New York: Wiley.
Jäger, M. (1992). Mikrozensus im Wandel - Entwicklungen in Deutschland und Österreich. Österreichische Zeitschrift für Statistik und Informatik, 22, 371-387.
Kiecolt, K.J., & Nathan, L.E. (1985). Secondary analysis of survey data. Series: Quantitative Applications in the Social Sciences. Beverly Hills: Sage.
König, W., & Müller, W. (1986). Educational systems and labour markets as determinants of worklife mobility in France and West Germany: a comparison of men's career mobility, 1965-1970. European Sociological Review, 2, 73-96.
König, A., & Weißhuhn, G., with the collaboration of Seetz, J. (1989). Betriebsgrößenentwicklungen, Beschäftigungsgewinne und -Verluste in den Wirtschaftsbereichen der Bundesrepublik Deutschland 1980-1986. In R. Schettkat & M. Wagner (Eds.), Technologischer Wandel und Beschäftigung. Fakten, Analysen, Trends (pp. 121-143). Berlin: de Gruyter.
Lieberson, S., & Waters, M. (1988). From many strands: Ethnic and racial groups in contemporary America. New York: Russell Sage.
Logan, J., & Alba, R. (1993). Locational returns to human capital: Minority access to suburban community resources. Demography, 30, 243-268.
Lüttinger, P., & Wirth, H. (Eds.). (1993). Amtliche Daten der DDR und der neuen Bundesländer: Informationsquelle für die Sozialwissenschaften - Tagungsdokumentation. Mannheim: ZUMA.
Maus, H. (1973). Zur Vorgeschichte der empirischen Sozialforschung. In R. König (Ed.), Handbuch der empirischen Sozialforschung. Band 1, Geschichte und Grundprobleme der empirischen Sozialforschung (pp. 21-56). Stuttgart: Enke.
Mayer, K.U., & Blossfeld, H.-P. (1990). Die gesellschaftliche Konstruktion sozialer Ungleichheit im Lebenslauf. In P.A. Berger & S. Hradil (Eds.), Lebenslagen, Lebensläufe, Lebensstile (Soziale Welt, Sonderband 7) (pp. 297-318). Göttingen: Schwartz.
Meyer, K. (1994). Zum Auswahlplan des Mikrozensus ab 1990. In S. Gabler, J. Hoffmeyer-Zlotnik, & D. Krebs (Eds.), Gewichtung in der Umfragepraxis (pp. 106-111). Opladen: Westdeutscher Verlag.
Morgan, S.P., Lye, D., & Condran, G. (1988). Sons, daughters, and the risk of marital disruption. American Journal of Sociology, 94, 110-129.
Müller, W., Blien, U., Knoche, P., & Wirth, H. (1991). Die faktische Anonymität von Mikrodaten. Stuttgart: Metzler-Poeschel.
Müller, W., & Haun, D. (1994). Bildungsungleichheit im sozialen Wandel. Kölner Zeitschrift für Soziologie und Sozialpsychologie, 46, 1-42.
Oberschall, A. (1965). Empirical social research in Germany 1848-1914. Paris: Mouton.
Peisert, H. (1967). Soziale Lage und Bildungschancen in Deutschland. München: Piper.
Pöschl, H. (1993). Werbung und Beteiligung der Haushalte an der Einkommens- und Verbrauchsstichprobe 1993. Wirtschaft und Statistik, 385-390.
Qian, Z., & Preston, S. (1993). Changes in American marriage, 1972 to 1987: Availability and forces of attraction by age and education. American Sociological Review, 58, 482-495.
Reinders, M. (1993). Fehlerrechnung zum Mikrozensus 1990. Statistische Rundschau Nordrhein-Westfalen, 398-404.
Robinson, W.S. (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15, 351-357.
Rohwer, G. (1987). Niedrige Einkommen 1978-1983. Eine Auswertung der Einkommens- und Verbrauchsstichprobe. Allgemeines Statistisches Archiv, 71, 375-392.
Schad, S.P. (1972). Empirical social research in Weimar-Germany. Paris: Mouton.
Schäfer, U.G. (1971). Historische Nationalökonomie und Sozialstatistik als Gesellschaftswissenschaften. Köln: Böhlau.
Scheuch, E.K. (1966). Cross-national comparisons using aggregate data: Some substantive and methodological problems. In R.L. Merritt & S. Rokkan (Eds.), Comparing nations. The use of quantitative data in cross-national research (pp. 131-167). New Haven: Yale University Press.
Schimpl-Neimanns, B., Sutter, C., & Wirth, H. (1993). Abschlußbericht zum Projekt "Mikrodaten der amtlichen Statistik der DDR bis 1990. Bestandsaufnahme und Nutzungsmöglichkeiten für Sekundäranalysen über soziale Ungleichheit". Unpublished Research Report. Mannheim: ZUMA.
Smeeding, T.M., O'Higgins, M., & Rainwater, L. (Eds.). (1990). Poverty, inequality and income distribution in comparative perspective: The Luxembourg Income Study (LIS). New York: Harvester Wheatsheaf.
Statistisches Bundesamt (1988). Katalog der Statistiken zum Arbeitsgebiet der Bundesstatistik. Mainz: Kohlhammer.
Statistisches Bundesamt (Ed.). (1989). Inhaltliche Fragen bevölkerungsstatistischer Stichproben am Beispiel des Mikrozensus. Bericht zur Konferenz vom 21. und 22. Oktober 1988. Schriftenreihe Ausgewählte Arbeitsunterlagen, Heft 10. Wiesbaden.
Statistisches Bundesamt (Ed.). (1990). Historische Statistik in der Bundesrepublik Deutschland. Schriftenreihe Forum der Bundesstatistik, Band 15. Stuttgart: Metzler-Poeschel.
Statistisches Bundesamt (Ed.). (1992). Bevölkerung und Erwerbstätigkeit, Fachserie 1, Reihe 4.2.1, Struktur der Arbeitnehmer, 31. März 1992. Stuttgart: Metzler-Poeschel.
Stockmann, R. (1986). Gesellschaftliche Modernisierung und Betriebsstruktur: die Entwicklung von Arbeitsstätten in Deutschland 1875-1986. Frankfurt: Campus.
Stolzenberg, R. (1990). Ethnicity, geography, and occupational achievement of Hispanic men in the United States. American Sociological Review, 55, 143-154.
Tienda, M., Smith, S., & Ortiz, V. (1987). Industrial restructuring, gender segregation, and sex differences in earnings. American Sociological Review, 52, 195-210.
Willms-Herget, A. (1985). Frauenarbeit. Zur Integration der Frauen in den Arbeitsmarkt. Frankfurt: Campus.
Wingen, M. (1989). Herausforderungen der amtlichen Statistik durch den gesellschaftlichen Wandel. Allgemeines Statistisches Archiv, 73, 16-41.
Zapf, W., Breuer, S., Hampel, J., Krause, P., Mohr, H.-M., & Wiegand, E. (1987). Individualisierung und Sicherheit. Untersuchungen zur Lebensqualität in der Bundesrepublik Deutschland (Schriftenreihe des Bundeskanzleramts, Bd. 4). München: Beck.
Computer-Assisted Interviewing in Social and Market Research
Rolf Porst, Michael Schneid, and Jan Willem van Brouwershaven
1 History
The victorious advance of computer-assisted data collection began in the United States in the 1970s with computer-assisted telephone interviewing (CATI). Until then, computers in social survey research had served exclusively as "tabulating machines". Using computers for data collection was made possible, on the one hand, by the introduction of interactive remote terminals and interactive programs (see Dandurand 1987). On the other hand, this development was accelerated by the recognition that computers could not only store and process huge data sets but also retrieve data and present information on terminals extremely quickly. The advantages of computers for data collection were first recognized in commercial market research. Supported by AT&T, Chilton Research Services, an American market-research company, carried out the first survey using computer-assisted telephone interviewing (or "cathode ray tube interviewing") in 1971. In the mid-1970s, other market-research companies followed with their first efforts at computer-assisted data collection. At that time, their monitors were linked to mainframe computers, and the interviews were directed by "first-generation software" (see Curry 1989). The programs were hard to handle, and experienced programmers were needed to build the questionnaires. At the end of the 1970s, second-generation software was developed that ran on minicomputers. It was easier to handle than its predecessors and had greater capabilities, in both size and quality, but its expense still limited its use to larger companies. At the beginning of the 1980s, third-generation software came onto the market, and the spread of personal computers brought the breakthrough in computer-assisted data collection: even smaller firms turned to computer-assisted interviewing.
It was at this time that German market researchers also became aware of the new technique. Infratest (Munich) launched its own CATI program, called "Infracall" (see Anders 1988), and other German market-research firms followed. Today, one in three commercial market-research companies conducts computer-assisted interviews (see Schneid 1991). The development of portable computers and laptops represented another qualitative step in the history of computer-assisted interviewing, since the advantages of the method could now be transferred from telephone interviewing to face-to-face interviewing. Techniques have since been developed and put into use that obviate the need for interviewers, or even for contact between two persons.
Academic social scientists in the United States have been involved in computer-assisted data collection since the mid-1970s. In the Californian Disability Survey, conducted by scholars at the University of California in 1978, about 30,000 households were interviewed by CATI. By 1987, more than 40% of all U.S. universities were working with computer-assisted data collection systems. Whereas German commercial market research adopted the new technique with only a short time lag, academic social scientists in Germany lagged well behind their American colleagues. In the early 1980s, they preferred to debate the general use of telephone surveys rather than the advantages of computer-assisted telephone interviewing. At that time, academic empirical research in Germany had hardly any practical experience with telephone interviewing, to say nothing of computer-assisted telephone interviewing. ZUMA ran its first telephone interviews in early 1982 and its first CATI survey in 1985 (see Schneid 1986). Government agencies throughout the Western world also soon made use of computer-assisted data collection. In the United States, the U.S. Bureau of the Census, in cooperation with the Center of Computer Based Behavioral Studies of the University of California, had developed its own system at the beginning of the 1970s. In 1988, the U.S. government alone had more than 50 installations (Saris 1991, p. 5). Since 1990, the National Center for Health Statistics has been conducting about 800 computer-assisted personal interviews a week, and the National Agricultural Statistics Service of the United States more than 125,000 CATI interviews a year. Since 1989, a Public Electronic Network (PEN) has been in use in Santa Monica (California) for citizen consultations: citizens can contact the network themselves, put questions to the government, and in this way start lively discussions (Depla, Schalken, and Tops 1993).
In the Netherlands, the use of CAPI (Computer-Assisted Personal Interviewing) by the Dutch Statistical Office grew from practically zero in 1985 to more than 3,000 interviews per month by 1990. With a home-computer-based system called tele-interviewing, 150,000 interviews per year are carried out (Saris 1991, p. 6). In Delft (Netherlands), a citizens' panel has been developed (see Severijnen and Willems 1993): every year, 1,000 heads of households are approached for face-to-face interviews, their family members complete self-administered questionnaires, and the data processing is computer-assisted. This instrument has five basic functions: (1) description and the creation of a database; (2) picking up signals from the community; (3) communication; (4) policy evaluation; and (5) prediction. In Germany, the Central Statistics Office (Statistisches Bundesamt) has had very encouraging experiences with computer-assisted personal interviews (see Riede and Dora 1991) but has yet to make extensive use of computer-assisted data collection. The literature on computer-assisted interviewing tells us little about its use in the private sector. This is certainly not due to any lack of use of such systems by the private sector: figures from SKIM Software Division (the European distributor of the Sawtooth systems, located in Rotterdam) indicate that about 20% of its software installations were for private-sector clients. Usage has increased since standard interviewing software for the PC was developed in the mid-1980s; these systems were easier to use than those developed for mainframes. With the growing use of PCs since then, more and more private companies have started using such software themselves. The use of computer-assisted data collection systems is further boosted by the growing number of students leaving university with experience in operating such systems.
When they assume new positions, they bring to their companies a knowledge of these systems and their practical applications that is often acquired only through practical use.
Computer-Assisted Interviewing in Social and Market Research
Two groups of private-sector users can be distinguished: (a) self-users, who use the systems without any help, and (b) co-users, who use the systems themselves but with the help of market-research agencies. This cooperation may combine elements from different phases of the research process. All in all, the use of computer-assisted data collection is still growing: the total number of CATI installations alone is estimated to exceed 1,000 worldwide (Saris 1991, p. 5).
2
Definitions and Descriptions
CAI (Computer-Assisted Interviewing) has gained acceptance as a generic term covering all varieties of computer-assisted data collection. Other terms are occasionally found, such as CADAC (Computer-Assisted Data Collection) or CASIC (Computer-Assisted Survey Information Collection). CATI (Computer-Assisted Telephone Interviewing), CAPI (Computer-Assisted Personal Interviewing), and CSAQ (Computerized Self-Administered Questionnaires) fall within the scope of CAI. Common to all of these forms of data collection is that they do not represent new methodologies but are simply new techniques for making traditional forms of data collection in social survey research more effective. This also holds for another form of data collection that should be mentioned here, so-called disk-by-mail (DBM) interviewing (see Goldstein 1987; Higgins et al. 1987; Wilson 1989; Zandan and Frost 1989; Gershenfeld et al. 1991). DBM is a special kind of postal interviewing in which a disk rather than a paper questionnaire is sent to the respondents. The respondents, who must have a computer, answer the questions on the disk and send it back to the researcher or research agency. Although disk-by-mail surveys do not seem to be very widespread, IntelliQuest alone conducted 34 DBM surveys in three years and, for this purpose, sent out more than 30,000 disks (Zandan and Frost 1989). Electronic mail surveys (EMS) represent another special case of CSAQ: respondents receive a questionnaire electronically (e.g., by BTX or e-mail) and send it back the same way (see Günther and Semrau 1984; Perlman 1985; Kiesler and Sproull 1986; Sproull 1986). There are other techniques in computer-assisted interviewing, especially in the area of CSAQ. Some are at present merely hopes for the future; others have already been developed and tested, at least sporadically (see Appel and Nicholls 1993). We will come back to these new data collection methods in section 6.
3
Classification and Function of Computer Technology for Social and Market Research
In this section we will try to develop some theoretical perspectives on computer-assisted data collection in the global context of computer technology for social and market research. Although this can only be a first step toward a general theoretical framework for computer-assisted data collection, we want to initiate the process, convinced as we are that such a framework is necessary.
Rolf Porst, Michael Schneid, and Jan Willem van Brouwershaven
3.1
The Product Range - Production Process Matrix
The "Product" Range
In order to provide an overview of the use of computer-assisted systems, we will employ a framework that combines the different kinds of data "products" produced in the (market) research process with the "production process" that produces them. This framework can also be used to compare paper-and-pencil interviewing with computerized interviewing. The products, as shown on the continuum in Fig. 1, range from simple data to strategic marketing (see Curry 1988). Moving from left to right, they are characterized by increasing levels of support, personnel expertise, and added value:
Raw Data → Information → Analyses/Modeling → Strategic advice
Figure 1. The (marketing) research product range.
The most basic products provided by a (market) researcher are (raw) data reports: sales, demographic, competitive, or other data, tabulated in report form or supplied in an electronic database. In addition to data in their basic form, the researcher may provide information: raw data that have been organized, summarized, or plotted so that their content is easily understood. Information products include reports, cross-tabulations, univariate statistical analyses, and charts and graphs. Next on the continuum are analyses and modeling. Analysis involves inference, interpretation, and forecasting, all based on the information product; as used here, it employs multivariate techniques, time-series analysis, and computer modeling. The product with the most value added is strategic advice. For example, the research department may provide advice on how to defend against a competitive move, prevent slippage in market share, or approach a new market segment. In social research, strategic advice may involve political decisions or provide an empirical basis for legislative processes.
The "Production" Process
The "production" process that generates the products of (market) research can be subdivided into a number of stages, each characterized by skills and production factors that are of vital importance for the researcher who wants to excel at that stage. To excel at the first stage, problem tracing, the researcher has to understand the market and the area of interest thoroughly and has to be a proficient communicator with sharp eyes and ears for the essentials. To excel at the second stage, analyzing the problem and designing the research, the researcher has to be a first-class methodologist and a marketing or social-science theoretician, knowing what kind of data must be produced to solve a problem and how to generate them. The output of the first two stages is a questionnaire of a certain length and complexity, based on a certain technology and a specific type of interviewer/respondent interaction.
Data collection involves two parties: the respondent and the interviewer. The respondent can be active or unaware, and the interviewer can be active or inactive. Mapping these two variables onto two dimensions yields the four basic interviewing modalities by which interviewing methods based on particular technologies can be classified.
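Figure 2. The four basic interview modalities.
Dimensions: respondent active vs. respondent unaware; interviewer inactive vs. interviewer active.
Respondent active, interviewer inactive (self-administering): P&P mail, disk by mail, CSAQ.
Respondent active, interviewer active (person-to-person): CATI, CAPI, group discussion.
Respondent unaware: membership cards, scanning, in-store observation, observation.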
Different factors are of vital importance at the third stage, data collection and data processing. Here the keys to excellence are economies of scale, efficiency, field management, quality control, and punctuality. The qualities needed at the fourth stage, analysis and interpretation, are 80 percent identical with the "unique selling points" of the second stage; in addition, efficiency and accuracy may make a difference. To provide advice and implement the results, one has to be a marketer or a specialist in the area of study. Although the skills and production factors of vital importance for the researcher who wants to excel differ from stage to stage, the stages are linked and mutually dependent. The production process may appear to the researcher to consist of discrete steps, but it is in fact continuous. As each product in the (market) research product range has its own characteristics and unique selling points, the stages emphasized in the production process will differ from product to product. Combining product range and production process yields the PP/PR matrix (based on Huisman 1988).
Figure 3. Production Process / Product Range Matrix (PP/PR matrix) © SKIM
Columns (product range): raw data; information; analyses/models; strategic advice.
Rows (production process): problem tracing; analyzing problem, research/design; data collecting/data processing; analyzing/interpretation; advice, implementation of results.
The number of dots in each cell marks the strength of the connection.
In Fig. 3, the number of dots represents the strength of the connection between a given product-range element and a given production-process element. For a correct interpretation of the matrix, the relative emphasis is essential. For example, considered in absolute terms, the quality of data and data collection is equally important for every product. Anyone supplying strategic advice, however, will be assessed very critically on his or her qualities in problem tracing and in the interpretation of results.
3.2
The Function of Computer Technology in the Production Process
To get an idea of what the computer can contribute to the production process, we will briefly discuss its different stages. The potential advantages of the computer in the first two stages of the production process are enormous. Most of them, however, such as using a PC for tracing and analyzing the problem, have not yet been translated into widely available and applicable software. The most promising and interesting developments concern marketing information systems (MIS), decision support systems (DSS), and artificial intelligence (AI) systems. The challenge facing researchers in the future will be to select the information they need from an overload of data. Once again, it will be the combination of data-analysis and data-disclosure skills on the one hand and knowledge of substantive problems on the other that clears the way for a growing use of MIS, DSS, or AI systems. Both the time saved and the opportunity to tune the research design more closely to the core of the problem will favor the use of computers at these stages. Aside from MIS, DSS, and AI systems, the PC will also be employed in problem-formulating studies: the expanded use of PC-based CAPI systems has not been confined to data collection and processing, and first experiences with computers in qualitative and problem-defining studies are promising. For each product of the marketing research process, data must be collected and processed.
The development of computer technology is most clearly seen in person-to-person interviewing. Paper-and-pencil data collection was first replaced by computer-assisted telephone interviewing, which added automated sample management and automated routing and made card punching redundant. In addition, the results are available immediately after data collection. As programs for computerized interviewing became more proficient, they added further value to the data-collection process: their arithmetic power made calculations possible; complicated routings could be followed that an interviewer could manage only with great effort or not at all; and checks and controls could be built in. All of these options enable computerized interviewing to do the same job better and thus improve quality. With the development of interactive computer programs, data could be collected that cannot be collected with paper and pencil at all. The inclusion of pictures and color gives the interview a more "real-life" look. With computers, one can handle studies one could not handle before. The same advantages hold for self-administered as for person-to-person interviews. The paper-and-pencil mail survey can be replaced by a disk-by-mail survey. The researcher has to consider, however, the availability of PCs in the sample: for business research it will approach 100%, while for consumer research a rate of 20% can be expected. Especially for self-administered interviewing, new technologies (described below) create opportunities that make the interviewer redundant. The scanning computers used in supermarkets make both respondents and interviewers superfluous. They also integrate the data-collection and data-processing phases. There is as yet no technology that makes it possible to computerize both observation and data processing.
There is, however, observation software that makes it easier to process, for example, videotaped actions of persons. Data analysis was originally carried out in a mainframe environment and has always been a specialist's job. Now a growing software library for the personal computer has made data analysis more accessible. The combination of much greater efficiency, improved quality, and speed has resulted in widespread use of the PC at the analysis stage. The introduction of the PC into the advisory stage is not exclusively tied to the research production process: advances in word processors and graphics programs, in a word desktop publishing, are what make the PC popular at this stage. In this article we concentrate on the data-collection process: its possibilities, its opportunities, and its future. We must not forget, however, that the production process in (market) research has to be an integrated process. Integrating this process is the main advantage of computer technology, and especially of the personal computer. In addition, it offers the opportunity to improve the quality of the product at each stage and to limit production time and cost.
4
Features and Capabilities of Computer-Assisted Data Collection Systems
When discussing the features and capabilities of computer-assisted data collection, one must keep in mind that specific programs differ considerably in both performance and cost. The capabilities of one program are not necessarily found in another, which in turn may offer attractive functions the first program lacks (see Schneid 1989; Porst and Schneid 1991). Thus, when we talk about CAI, we refer to modern CAI programs in general rather than to any specific program.
4.1
Before Interviewing
Constructing the questionnaire has to be so easy that no programmer or other specialist is required. In current CAI programs, questions and answers may first be written with common word processors and then converted for the CAI program. In addition, all programs have their own text editors, which are rather easy to use. With regard to the questionnaire, CAI software systems must be able to do at least everything that can be done in a paper questionnaire, and a little more. That is, all kinds of questions must be possible: closed as well as open-ended questions, single-response as well as multiple-response questions, numeric questions, and different kinds of scales. Especially in CAPI interviews, various visual features have to be available to help respondents with their task. Modern CAI programs are easy to handle and flexible. Flexibility means that little effort is required to install the files needed for interviewing. In the best case, all necessary files can be copied onto a field disk, and nothing else is needed to conduct the interview, apart from a computer, of course. Data entry is usually carried out via a keyboard, light pen, touch screen, or mouse. Using the mouse makes interviewing relatively fast and comfortable (see Poynter 1991), while using a light pen or touch screen is rather tiring for respondents. Other input media, such as the paddle, joystick, or notepad, are also known. Whereas the features and capabilities described so far apply to all kinds of CAI, we now turn to CATI-specific software. CATI software is usually of two kinds: software needed to develop and run the questionnaire, and software used for study management. It is the latter that lessens the influence of the interviewer in computer-assisted telephone interviews compared with traditional telephone interviewing.
This becomes evident, for instance, in the sampling procedure. Modern CATI software includes modules that support sampling and modulation of the sample and that manage calls, appointments, and contacts. The program determines the priority of every single call as well as the time at which it is made. Of course, respondents can also be hand-picked if this seems advisable. Furthermore, CATI programs can randomly assign respondents to different categories, which may be important when working with split ballots.
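To make the idea of call priorities and split-ballot assignment more concrete, here is a minimal Python sketch. It is not taken from any CATI package: `CallQueue`, `assign_split_ballot`, and the priority rule (earliest due call first; at equal times, appointments before cold calls, then numbers tried fewer times) are hypothetical choices made for illustration only.

```python
import heapq
import itertools
import random
from datetime import datetime, timedelta


class CallQueue:
    """Hypothetical CATI call queue (illustrative only, not a real package's API).

    Assumed priority rule: the earliest due call comes first; at equal times,
    appointments precede cold calls, then numbers with fewer attempts.
    """

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker so heap entries never compare numbers

    def schedule(self, number, due, attempts=0, appointment=False):
        key = (due, 0 if appointment else 1, attempts, next(self._seq))
        heapq.heappush(self._heap, (key, number))

    def next_call(self, now):
        """Return (number, attempts) for the top call due at `now`, else None."""
        if self._heap and self._heap[0][0][0] <= now:
            key, number = heapq.heappop(self._heap)
            return number, key[2]
        return None


def assign_split_ballot(respondent_ids, versions=("A", "B"), seed=None):
    """Randomly assign respondents to questionnaire versions in (near-)equal groups."""
    rng = random.Random(seed)
    ids = list(respondent_ids)
    rng.shuffle(ids)
    return {rid: versions[i % len(versions)] for i, rid in enumerate(ids)}


# Example with made-up identifiers: the appointment due at 18:00 is called first.
queue = CallQueue()
start = datetime(1994, 3, 1, 18, 0)
queue.schedule("tel-001", start)
queue.schedule("tel-002", start, appointment=True)
queue.schedule("tel-003", start + timedelta(hours=1))
print(queue.next_call(start))  # ('tel-002', 0)
```

A number that is not answered could simply be put back with `schedule(number, later_time, attempts + 1)`; real systems use far richer rules (time zones, interviewer assignment, maximum attempts).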
4.2
Interviewing
The interview itself is entirely directed by the program; the interviewer simply follows the directions given on the screen. While the interview program is running, it is possible to return to preceding questions. CAI programs can present preceding questions on the screen either with or without the answers given, enabling respondents to review their prior answers and change them as they see fit. To obtain detailed information about the interviewing process, it may be necessary for interviewers to make notes on respondents' reactions. Some CAI programs allow interviewers to make such notes for every single question; these comments are stored in separate files.
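The navigation described above can be pictured as a small piece of interview state. The following Python sketch is hypothetical (the class `InterviewSession` and its methods are not from any CAI product): it keeps answers and interviewer notes in separate stores, lets the interviewer step back to a preceding question with its stored answer, and finds the first unanswered question so that an interrupted interview can be restarted there.

```python
from dataclasses import dataclass, field


@dataclass
class InterviewSession:
    """Hypothetical in-memory interview state, for illustration only."""
    question_ids: list
    answers: dict = field(default_factory=dict)  # question id -> answer
    notes: dict = field(default_factory=dict)    # question id -> notes, kept apart from answers
    position: int = 0

    def current(self):
        return self.question_ids[self.position]

    def answer(self, value):
        """Store (or overwrite) the answer to the current question and move on."""
        self.answers[self.current()] = value
        self.position = min(self.position + 1, len(self.question_ids) - 1)

    def back(self):
        """Return to the preceding question together with its stored answer, if any."""
        self.position = max(self.position - 1, 0)
        return self.current(), self.answers.get(self.current())

    def add_note(self, text):
        """Attach an interviewer note to the current question."""
        self.notes.setdefault(self.current(), []).append(text)

    def resume_position(self):
        """Index of the first unanswered question, or None if all are answered."""
        for i, qid in enumerate(self.question_ids):
            if qid not in self.answers:
                return i
        return None


session = InterviewSession(["q1", "q2", "q3"])
session.answer("yes")
session.answer(3)
print(session.back())  # ('q2', 3)
```

In a real system the notes would be written to a separate file, as the text says; a second dictionary stands in for that here.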
Furthermore, questions can be skipped if the respondent refuses to answer, which may be important especially when asking delicate or difficult questions. The interview can be set up so that the program automatically skips a question if it is not answered within a time period defined by the researcher, or so that the interview is stopped or broken off in such a case. The latter is important when the unanswered question is of central significance for the rest of the interview, e.g., for screening or global skips. Interviews can also be interrupted for any reason and restarted at exactly the next unanswered question. Important technical capabilities of computer-assisted data collection include calculations, randomization of questions and answer categories, validation of responses, complex branching, and complex coding (see also Saris 1991, p. 20ff.). Because these capabilities are central to assessing the advantages of computer-assisted interviewing, it makes sense to look at them more closely. Modern CAI programs allow even very complex kinds of branching. This opportunity is fully used when a skip depends on the answers to more than one question (for example, a skip is activated only if the respondent gives 1960 or earlier as her or his year of birth and "married" as her or his marital status). This can lead to an extreme individualization of the interview without confusing the interviewers, who might ask 25 questions without even realizing that the complete questionnaire consists of 250 (see Groves and Mathiowetz 1984). In general, one of the most interesting aspects of computer-assisted data collection is that even extremely complicated and complex questionnaires can be administered rather easily (Frey 1983).
This includes the possibility of conducting a survey in more than one language, so that each respondent can choose to answer in the language in which he or she is most comfortable (see Karweit and Meyers 1983). This aspect may be less important to social scientists, but it is of considerable interest to market researchers conducting point-of-sale studies at international conferences and trade shows. Sometimes it makes sense to ask a question only if a specific numeric value is reached by combining the answers to several preceding questions (for example, the question on specific investments is asked only if total household income exceeds, say, $100,000 per year, where income has been ascertained by asking separately about the net income of each household member). CAI programs can perform the four basic arithmetic operations and even rather difficult calculations. To avoid context effects, it is useful to randomize items or questions; CAI programs satisfy this important requirement and can even randomize answer categories or complete parts of the questionnaire. Researchers who deal with questions of method or methodology will especially appreciate that CAI programs can measure the time the respondent needs to answer each question. Naturally, the total duration of the interview is measured as well, which is not only an interesting piece of information in itself but may also serve as an indicator of incorrect or even forged interviews: researchers should be wary when the interviews of a specific interviewer take noticeably less time than average. Finally, during the interview, CAI programs can recall information from external databases or from data gathered in a preceding interview, which is very important in panel surveys. Other CAI capabilities are specifically linked with CATI.
We have already mentioned the sampling and modulation of the sample and the management of calls, appointments, and contacts. In addition, interviewers' output can be recorded, that is, the number of contacts and appointments, the number of interviews conducted, and the average time taken to conduct them.
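The capabilities discussed in this subsection can be illustrated with a few small Python functions. This is a sketch under stated assumptions, not any program's actual logic: the function names, answer keys, the per-respondent seeding scheme, and the 0.75 threshold for "noticeably less time than average" are all hypothetical. It shows the compound skip condition (born in 1960 or earlier *and* married), the computed income filter, reproducible per-respondent randomization of item order, and a check on interview durations by interviewer.

```python
import random
import statistics


def married_and_born_1960_or_earlier(answers):
    """Compound filter: true only if BOTH conditions from the text hold."""
    year = answers.get("birth_year")
    return year is not None and year <= 1960 and answers.get("marital_status") == "married"


def ask_investment_question(member_net_incomes, threshold=100_000):
    """Computed filter: sum the separately asked member incomes, compare to threshold."""
    return sum(member_net_incomes) > threshold


def randomized_order(items, respondent_id, study_seed=1994):
    """Per-respondent random item order that can be reproduced later from the two ids."""
    rng = random.Random(f"{study_seed}-{respondent_id}")
    order = list(items)
    rng.shuffle(order)
    return order


def flag_fast_interviewers(durations, ratio=0.75):
    """Flag interviewers whose mean interview length is below ratio * overall mean.

    durations maps interviewer id -> list of interview lengths in minutes.
    The ratio of 0.75 is a hypothetical threshold, not an established rule.
    """
    all_lengths = [m for lengths in durations.values() for m in lengths]
    overall = statistics.mean(all_lengths)
    return sorted(name for name, lengths in durations.items()
                  if statistics.mean(lengths) < ratio * overall)


print(flag_fast_interviewers({"int01": [30, 32, 29],
                              "int02": [31, 28, 33],
                              "int03": [12, 14, 13]}))  # ['int03']
```

Seeding the shuffle with the respondent's id means the order actually presented can be reconstructed at analysis time, which is useful when context effects are examined later. A flagged interviewer is only a candidate for checking, not evidence of forgery.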
4.3
After Interviewing
Because no separate data entry is necessary, the collected data can be analyzed immediately after data collection ends. Data editing is possible, e.g., deleting data or transforming specific data. Some CAI programs can even calculate frequency distributions during the field period or immediately after data collection and thus give initial information about the results. Finally, the programs include subroutines for converting the raw data so that they can be read by standard statistical software such as SPSS or SAS. If a questionnaire includes open-ended questions, the answers can be coded with computer assistance after interviewing: besides the question and the answer given, the program shows the code list on the screen, so that the answer can be coded interactively.
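As a rough illustration of the post-interview steps, the Python sketch below computes a frequency distribution over the interviews collected so far and writes the records as a plain CSV file, a delimited text format that both SPSS and SAS can import. The function names and record layout are hypothetical; actual CAI programs have their own export routines and formats.

```python
import csv
import io
from collections import Counter


def frequencies(records, variable):
    """Frequency distribution of one variable over the interviews collected so far."""
    return Counter(r.get(variable) for r in records)


def export_csv(records, variables, stream):
    """Write one row per interview; missing answers become empty cells."""
    writer = csv.DictWriter(stream, fieldnames=variables, extrasaction="ignore")
    writer.writeheader()
    for r in records:
        writer.writerow(r)


records = [{"id": 1, "q1": "yes", "q2": 3},
           {"id": 2, "q1": "no"},
           {"id": 3, "q1": "yes", "q2": 5}]
print(frequencies(records, "q1"))  # Counter({'yes': 2, 'no': 1})
buffer = io.StringIO()
export_csv(records, ["id", "q1", "q2"], buffer)
```

Writing to a file instead of the in-memory buffer only requires opening it with `open(path, "w", newline="")`, as the `csv` module documentation recommends.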
5
Implications and Methodological Aspects of the Use of Computers
5.1
Unique Investments
The investments needed to create the basis for computer-assisted interviewing differ according to the kind of data collection and the technical equipment required. For hardware, investments depend on whether you want to work with CAPI or CATI and, in the latter case, whether a server and network are used or only free-standing PCs. For software, you can choose between commercially available software and developing your own. Different considerations arise for CAPI. PC-based (face-to-face) interviewing, with a network of PCs in the hands of the interviewers, changes the key economic figures and bottlenecks of a research firm. The research agency's organization has to be adapted, since each interviewer is walking around with a one-thousand-dollar investment that may easily be lost or soon need to be replaced. Originally, interviewing costs were variable costs, but this is changing rapidly: in addition to the interviewers and their travel costs, the institute has to add a fee to cover investments. Thus, computer interviewing is more expensive than paper-and-pencil interviewing. These higher costs have to be covered by lower costs in the data-processing phase or by a higher price paid for a better product. Unlike in the United States, it is not yet possible in Europe, as far as we know, to rent laptops on a large scale. The PC software environment is characterized by one-time prices and no software leasing. Compared with hardware costs, software costs for PC-based interviewing are reasonable and limited.
5.2
Time and Money
Data collection may be divided into three phases: planning and preparing the data collection, interviewing, and data preparation. More than in traditional interviewing, in CAI it is absolutely essential to test all aspects of the questionnaire to be used, as well as all phases of data collection and data editing. Testing the questionnaire usually takes more time than programming it. The time and cost of preparing for data collection increase with the complexity of the questionnaire. The time and cost of interviewer training depend on the job experience of the interviewers to be trained. When training interviewers for CAPI/CATI, one must distinguish between novices and more experienced interviewers. With novices, the goal is to familiarize them with the computer and make them feel confident; with experienced interviewers, one can work toward optimal utilization of CAPI/CATI. In contrast to general interviewer training, training for a specific survey is less costly in time and money: a variety of interviewer problems (skips, probing, etc.) need not be covered at all because they are handled automatically by the computer program (Sudman 1983). The most effective way of instructing novice interviewers has proved to be personal training combined with a computer to practice on. Wojcik and Hunt (1993) discuss some ways of training experienced interviewers. First, use self-study materials for interviewers who have already received CAPI training; this lowers the costs associated with CAPI/CATI training. Training can take the form of a tutorial on paper, on computer, or on video. Second, use real respondents for training: the best training for an interviewer is interaction with a real respondent. In this case, one can instruct the respondent to give certain answers or leave the respondent free to give his or her own answers.
The latter option provides a more realistic interview but has two drawbacks. First, the interviewer might not reach certain parts of the interview because of branching. Second, an open interview does not allow the researcher to compare the actual answers with the answers expected. Nicholls and Groves report that computer-assisted (telephone) interviews last about 50% longer in pure interviewing time than traditional interviews without computers. Four reasons have been given for this. First, in a traditional paper-and-pencil interview, the experienced interviewer can record the answer to one question while already reading the next. Second, typing on a keyboard takes more time than writing with a pencil. Third, the computer forces the interviewer to read questions completely and the interviewee to answer them completely. And fourth, checking the validity and consistency of answers during the interview increases the time needed to conduct it (Nicholls and Groves 1986). On the other hand, time can be saved when the program manages calls, appointments, and contacts. Time can also be saved because completed questionnaires no longer need to be screened. Using CAI systems for the data-editing phase leads to remarkable savings in time and costs, because separate data entry is not necessary; data editing is limited to combining data in an output database for analysis. All in all, CAI may be cheaper than traditional data collection, but it need not be. Whether CAI systems save money and time depends on such factors as the characteristics of the specific study, its complexity, its sample size, and its questionnaire.
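Whether CAI pays off can be framed as a break-even calculation: CAI usually has higher fixed costs (programming, testing) and a different cost per interview (longer interviewing time and an investment fee for hardware, but no separate data entry). The Python sketch below uses this simple linear cost model. Only the factor 1.5 for interviewing time echoes the 50% figure reported above; the model itself, the function names, and all monetary figures are hypothetical and serve purely as an illustration.

```python
import math


def per_interview_cost(minutes, wage_per_minute, time_factor=1.0,
                       hardware_fee=0.0, data_entry=0.0):
    """Variable cost of one interview under a simple (hypothetical) linear model."""
    return minutes * time_factor * wage_per_minute + hardware_fee + data_entry


def total_cost(n, fixed, per_interview):
    """Fixed set-up cost (programming, testing, training) plus n interviews."""
    return fixed + n * per_interview


def break_even_n(fixed_cai, per_cai, fixed_papi, per_papi):
    """Smallest n from which CAI stays no more expensive than paper and pencil,
    or None if that never happens."""
    saving = per_papi - per_cai            # saving per interview with CAI
    extra_fixed = fixed_cai - fixed_papi   # additional set-up cost of CAI
    if saving <= 0:
        return 0 if saving == 0 and extra_fixed <= 0 else None
    return max(0, math.ceil(extra_fixed / saving))


# Hypothetical figures: 40-minute interview, 0.25 currency units per interviewer-minute.
papi = per_interview_cost(40, 0.25, data_entry=6.0)                     # 16.0
cai = per_interview_cost(40, 0.25, time_factor=1.5, hardware_fee=0.5)   # 15.5
print(break_even_n(fixed_cai=3000, per_cai=cai, fixed_papi=1000, per_papi=papi))  # 4000
```

With these made-up numbers, CAI becomes cheaper only beyond 4,000 interviews; a slightly higher hardware fee would make it more expensive at any sample size, which mirrors the conclusion that CAI may, but need not, be cheaper.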
5.3
Response and Nonresponse
Willingness to participate in computer-assisted interviews is no lower than in traditional data collection; if anything, computer-assisted interviewing is judged to be modern and interesting. Refusals in reaction to the computer are rare. If you consider some important research on acceptance in different countries (see Thornberry et al. 1990; Van Bastelaer and Sikkel 1987; Bateson and Hunter 1990; Riede and Dora 1991), you will find no specific problems regarding response to computer-assisted personal interviewing. Surveys in different countries have led to the conclusion that respondents behave no differently in computer-assisted interviewing than in traditional interviewing situations (National Center of Health Statistics 1988; Statistics Sweden 1989; Riede and Dorn 1991). Respondents have positive attitudes toward the computer as an interviewing agent. The attractiveness of computer-assisted interviewing for respondents is mentioned in various studies (see, for example, Erdmann and Greist 1983). Response rates are reported to be rather high, although low rates can also be found. In general, computer-assisted interviewing has not been found to lead to self-selection among respondents.
5.4
Interviewers
Like respondents, interviewers also accept computer-assisted data collection compared with paper-and-pencil interviewing (see Sebestik et al. 1988; Couper and Groves 1990; Thornberry et al. 1990), and interviewers themselves cite the advantages of computer-assisted interviewing (Groves and Mathiowetz 1984). What differs from traditional face-to-face interviewing is the relationship between interviewer and interviewee. Interviewers report problems of interaction with respondents, since they tend to concentrate on the screen rather than on the respondent (Thornberry et al. 1990). The interaction between human and computer may result in new biases and problems (Tull and Hawkins 1987), which unfortunately have not yet been adequately solved or, in some cases, even considered. Although some other difficulties in the interaction between interviewer and respondent have been mentioned (see Sebestik et al. 1988; Thornberry et al. 1990), we are still in the dark about the overall influence of computers on this relationship. The same is true of the consequences of reduced interviewer-interviewee rapport for data quality.
5.5
The Quality of the Data
Along with time and cost, data quality is another extremely important aspect of data collection. In general, one of the main advantages of computer-assisted interviewing is that data quality is clearly better than in traditional interviewing (Shanks 1983). Nonresponse error arises because not every sampled element can actually be interviewed, either because the person cannot be contacted or because he or she refuses to be interviewed. Regarding contacts, Groves and Nicholls (1986) describe a significantly higher contact rate in CATI studies than in traditional telephone interviewing and attribute this to CAI's advantages in managing calls, appointments, and contacts. On the other hand, Statistics Canada, in cooperation with the U.S. Bureau of the Census, observed that the contact rate in CATI studies is significantly lower than in traditional telephone interviewing and that this results from the greater duration of the interview (Groves and Nicholls 1986; Catlin and Ingram 1988). In terms of refusals, no significant differences between CATI and traditional telephone interviewing have been found. In general, there are no indications of increased unwillingness to participate. Only 1% of those who had agreed to participate in a face-to-face survey conducted by the German Central Statistics Office (Statistisches Bundesamt) refused after being told that the interview would be computer assisted. Measurement errors may be influenced by the interviewer, the interviewee, the questionnaire, and/or the recording of answers. In a computer-assisted interview, they are reduced by automatic skips and plausibility checks. Furthermore, context effects can be controlled by rotating the presentation of questions, items, or even larger parts of a questionnaire. And finally, no data-entry errors can occur. On the other hand, typing errors cannot be excluded, programming errors may lead to errors in the data, and, though least likely, the program or even the whole system may crash, in which case at least part of the work will be irretrievably lost. Most researchers report a lower tendency to give socially desirable answers than with traditional survey methods (Kiesler and Sproull 1986; Martin and Nagao 1989; Waterton and Duffy 1984). Respondents seem to answer even delicate or difficult questions more honestly and reliably. This holds for various kinds of computer-assisted data collection, such as CSAQ (O'Brien and Dugdale 1978) or EMS (Kiesler and Sproull 1986; Sproull 1986; Vavra 1986), and for different thematic areas, such as sexual behavior, abortion (Johnson and Sturtevant, n.d.), and alcohol abuse (Lucas et al. 1977; Waterton and Duffy 1984). Data quality in general is better than in traditional interviews because typical weak points of traditional data collection are eliminated.
There are no formally incorrect answers, no inconsistent answers, and no skipping errors - unless such errors have been programmed into the interview itself. Computer-assisted interviewing therefore forces the researcher as well as the programmer to design questionnaires as logically as possible. This leads to better data quality because certain sources of error are eliminated before interviewing begins (see Sudman 1983).
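The automatic skips, plausibility checks, and rotation described above can be illustrated with a minimal Python sketch. It is not taken from any particular CAI package: the `Question` structure, the question identifiers, the valid ranges, and the scripted respondent that stands in for keyboard entry are all hypothetical. Filtered questions are skipped automatically, implausible or inconsistent entries are rejected and the question is asked again, and a block of items is presented in a random order for each interview.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Answers = Dict[str, int]


@dataclass
class Question:
    qid: str
    text: str
    # Plausibility check on a candidate answer, given the answers recorded so far.
    check: Callable[[int, Answers], bool]
    # Filter condition: if it returns False, the question is skipped automatically.
    ask_if: Callable[[Answers], bool] = lambda answers: True


def order_questions(questions: List[Question], rotate: Sequence[str],
                    rng: random.Random) -> List[Question]:
    """Shuffle the rotated items among their own positions; other questions stay put."""
    positions = [i for i, q in enumerate(questions) if q.qid in rotate]
    block = [questions[i] for i in positions]
    rng.shuffle(block)
    ordered = list(questions)
    for pos, q in zip(positions, block):
        ordered[pos] = q
    return ordered


def run_interview(questions: List[Question], respond: Callable[[Question], int],
                  rotate: Sequence[str] = (), seed=None) -> Tuple[Answers, List[str]]:
    rng = random.Random(seed)
    answers: Answers = {}
    asked: List[str] = []
    for q in order_questions(questions, rotate, rng):
        if not q.ask_if(answers):
            continue  # automatic skip: filter (skipping) errors cannot occur
        value = respond(q)
        while not q.check(value, answers):
            value = respond(q)  # implausible entry: the program asks again
        answers[q.qid] = value
        asked.append(q.qid)
    return answers, asked


def in_range(lo: int, hi: int) -> Callable[[int, Answers], bool]:
    return lambda value, answers: lo <= value <= hi


def employed(answers: Answers) -> bool:
    return answers.get("employed") == 1


# Hypothetical questionnaire for illustration only.
QUESTIONS = [
    Question("age", "How old are you?", in_range(16, 99)),
    Question("employed", "Are you currently employed? (1 = yes, 2 = no)", in_range(1, 2)),
    Question("hours", "Hours worked per week?", in_range(1, 80), ask_if=employed),
    # Consistency check across questions: the first job cannot lie after the current age.
    Question("first_job_age", "Age at first job?",
             lambda v, a: 10 <= v <= a["age"], ask_if=employed),
    Question("trust_press", "Trust in the press (1-5)?", in_range(1, 5)),
    Question("trust_courts", "Trust in the courts (1-5)?", in_range(1, 5)),
    Question("trust_police", "Trust in the police (1-5)?", in_range(1, 5)),
]
TRUST_BLOCK = ["trust_press", "trust_courts", "trust_police"]


def scripted(entries: Dict[str, List[int]]) -> Callable[[Question], int]:
    """Stand-in for keyboard entry: returns the scripted values for each question in turn."""
    queues = {qid: list(values) for qid, values in entries.items()}
    return lambda q: queues[q.qid].pop(0)


# 120 hours, a first job at 40 for a 34-year-old, and a trust score of 7 are all rejected.
answers, asked = run_interview(
    QUESTIONS,
    scripted({"age": [34], "employed": [1], "hours": [120, 38],
              "first_job_age": [40, 19], "trust_press": [7, 3],
              "trust_courts": [4], "trust_police": [2]}),
    rotate=TRUST_BLOCK, seed=42)
```

Because the checks run at the moment of entry, an implausible value never reaches the data file; the interviewer simply sees the question again. Rotation is done per interview with its own random generator, so the order of the trust items varies across respondents while the rest of the questionnaire stays fixed.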
5.6
Anonymity (Data Protection)
Data protection and anonymity do not appear to pose greater problems in computer-assisted data collection than in traditional surveys. Some respondents even expect a greater degree of anonymity than in traditional forms of interviewing, and some researchers report that respondents judged the anonymity of computer-assisted interviewing positively. With regard to the interviewer-interviewee relationship, however, many questions on anonymity and data protection cannot yet be answered.
5.7
Further Comparisons of Different Data Collection Methods
Resistance to computer interviewing is fed by fairy tales and widespread misunderstandings about its nature. One is that computer interviewing causes bias and selective response because respondents resist computers. Another is that programmers and specialists are needed to run computer-assisted interviews. Neither argument is valid any longer. CAI's potential to make better use of available information, resulting in research designs finely tuned to the problems they address, and its unique combinatorial capabilities, resulting in new approaches, should outweigh any potential threats associated with computers. If this is the case, computer-assisted interviewing should surpass other kinds of data collection. Subjectively, respondents find face-to-face interviewing significantly more convenient and congenial than CAPI; CAPI, on the other hand, is considered easier. In general, both kinds of interviewing are assessed positively, with the face-to-face interview rated slightly more favorably (Kurz and Schnoetzinger 1982). Respondents also estimated the CAI interview to take longer than the face-to-face interview. There is a large body of research comparing CATI with traditional telephone interviewing without computers (see, for example, Groves and Mathiowetz 1984; Coulter 1985; Harlow et al. 1985; Catlin and Ingram 1988), some of it rather inconsistent. For example, House (1985) "proved" that there are no differences in interviewing time between CATI and the paper method, whereas other researchers "proved" that CATI interviews take more time than traditional telephone interviews (see Nicholls and Groves 1986; Catlin and Ingram 1988). Regarding interviewer errors as a dimension of data quality, CAI led to fewer errors than the paper method (Van Valey and Crull 1991). Wyatt (1991) compared CATI and paper-and-pencil interviewing from preparation through analysis. The overall interviewing time was 10 minutes. He found that developing and testing the CATI questionnaire takes far more time than the same steps for traditional telephone interviewing. Editing, coding of open-ended questions, and tabulation take about the same time with both methods, whereas CATI saves time in data collection. Beckenbach (1992) compared CAPI, CATI, and CSAQ in a systematic methodological study and concluded that CAI will presumably work without any restrictions.
The computer-assisted methods are at least as well accepted as traditional interviews. Anonymity is presumed to be higher than in traditional face-to-face interviews, which leads to better data quality. Above all, data quality is improved by computer techniques and capabilities that prevent errors in data collection.
6
New Data Collection Techniques
6.1
Perspectives on Data Collecting Techniques
In this section we discuss several further techniques in computer-assisted interviewing (see Appel and Nicholls 1993), especially those that may be classified as computerized self-administered questionnaires (CSAQ). Some of them could only have been developed with computers as the data-collecting agent.
TDE (Touchtone Data Entry): TDE allows the respondent to answer questions using the keypad of a touchtone telephone. TDE acts as an interviewer that reads the questions in a digitized voice and records the touchtone entries. Because only questions with numeric responses can be asked, the use of TDE for surveys is severely limited. Nevertheless, TDE can be used for reporting numeric data, and for this reason it is being tested for the monthly reporting of numeric data in several establishment surveys in the United States (Werking, Tupek, and Clayton 1988).
VRE (Voice Recognition): VRE allows the respondent to reply to computer-generated prompts by speaking over a telephone. VRE acts as an interviewer that reads the questions in a digitized voice, recognizes the respondent's verbal replies, and echoes them back for confirmation. Appel and Nicholls (1993) divide VRE technology into three levels of vocabulary size: a small vocabulary, limited to the digits 0-9, yes, and no; a medium vocabulary of up to 100 words per prompt; and a large vocabulary of one hundred to thousands of words.
EDI (Electronic Data Interchange): EDI stands for the electronic transfer of information in a standard format between two persons or units. It is usually used between business partners (Ambler and Messenbourg 1992).
Appel and Nicholls (1993) discuss some other new technologies in the area of computer-assisted interviewing: pen-based computers, defined as computers that provide a pen for input rather than a keyboard; optical imaging, referring to a process in which questionnaires are electronically scanned, converted to digital images, and stored in computer-readable form; optical character recognition (OCR), a method in which computer software recognizes each element of an alphanumeric character string and converts it to the corresponding ASCII code; and image processing of FAX data returns (IP-FDR), which is an extension of imaging and OCR technologies.
6.2
Conjoint Analysis
Conjoint analysis is a widely used research technique. It helps researchers, for example, to determine what features a new product should have and how it should be priced. In the past, its usefulness was limited by the available data collection methods: there was too much information for respondents to consider thoughtfully. Because the respondents' time and attention are such important factors, the scope of many studies had to be limited. Too much information was presented, and respondents' answers were compromised by "information overload". Computer systems such as Sawtooth's ACA System for Adaptive Conjoint Analysis move beyond the limitations of traditional data collection methods. ACA adapts the interview to each respondent, using a PC-based interviewing procedure that takes advantage of the computer's intelligence: during the interview, the computer learns enough about the respondent's values to focus only on the areas of greatest interest to that respondent. As a result, you get:
* Greater realism: more attributes can be tested, with more levels per attribute
* Greater relevance: the information gathered is more respondent-specific
* Higher quality data: respondents have a higher level of interest and involvement in the data collection task
The ACA System contains everything you need to execute a conjoint study: an interviewing module; a utility calculator for estimating respondent utilities during the interview; and an easy-to-use market simulator for estimating the market's shares of preference for your products and your competitors' products. ACA thus contributes to the integration of data collection, data storage, analysis, simulation, and presentation.
It is not the sophisticated arithmetic and programming that make these kinds of programs interesting, but the opportunity to upgrade the job of the research manager and to offer products from the right-hand side of the product range (the more advanced sectors). In the future, this will change the organization of the production process.
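To make the market-simulator step of a conjoint study concrete, here is a minimal Python sketch of how respondent-level utilities can be turned into shares of preference. It assumes an additive part-worth model and shows two common simulation rules, a first-choice rule and a logit share-of-preference rule; it is not Sawtooth's ACA code, and the attributes, levels, part-worth values, and the `scale` parameter are hypothetical illustrations rather than data from any study.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical respondent-level part-worths, keyed by (attribute, level).
PartWorths = Dict[Tuple[str, str], float]
Product = Dict[str, str]  # attribute -> level offered


def total_utility(pw: PartWorths, product: Product) -> float:
    """Additive model: a product's utility is the sum of its levels' part-worths."""
    return sum(pw[(attr, level)] for attr, level in product.items())


def first_choice_shares(respondents: List[PartWorths], products: List[Product]) -> List[float]:
    """Each respondent 'chooses' the product with the highest total utility."""
    counts = [0] * len(products)
    for pw in respondents:
        utils = [total_utility(pw, p) for p in products]
        counts[utils.index(max(utils))] += 1
    return [c / len(respondents) for c in counts]


def logit_shares(respondents: List[PartWorths], products: List[Product],
                 scale: float = 1.0) -> List[float]:
    """Share of preference: each respondent's preference is split across products
    in proportion to exp(scale * utility), then averaged over respondents."""
    shares = [0.0] * len(products)
    for pw in respondents:
        utils = [scale * total_utility(pw, p) for p in products]
        top = max(utils)  # subtract the maximum for numerical stability
        weights = [math.exp(u - top) for u in utils]
        total = sum(weights)
        for i, w in enumerate(weights):
            shares[i] += w / total
    return [s / len(respondents) for s in shares]


# Made-up part-worths for three respondents (illustration only).
RESPONDENTS: List[PartWorths] = [
    {("brand", "A"): 1.0, ("brand", "B"): 0.0, ("price", "low"): 0.2, ("price", "high"): -0.2},
    {("brand", "A"): 0.0, ("brand", "B"): 0.8, ("price", "low"): 1.0, ("price", "high"): -1.0},
    {("brand", "A"): 0.6, ("brand", "B"): 0.1, ("price", "low"): 0.3, ("price", "high"): -0.3},
]
PRODUCTS: List[Product] = [
    {"brand": "A", "price": "high"},  # "our" product
    {"brand": "B", "price": "low"},   # competitor's product
]

print(first_choice_shares(RESPONDENTS, PRODUCTS))
print(logit_shares(RESPONDENTS, PRODUCTS))
```

The first-choice rule counts only each respondent's winner, so small utility differences can flip a whole respondent's vote; the logit rule spreads preference smoothly and is therefore more sensitive to modest changes in a product's features or price. Which rule a given simulator uses, and how it sets the scale, is a design choice that this sketch does not settle.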
7
The Future of Computer-Assisted Data Collection
Looking ahead, the first development to expect is the continued rapid improvement of hardware. Time and again, the speed, graphics capabilities, and storage capacity of computers increase while their size and price fall. These changes cause unease in the market because people do not know what to expect. Despite this uneasiness, the falling price of hardware makes it possible for more and more users to start computerized interviewing, both CATI and CAPI, and the use of computerized interviewing continues to grow and spread. The introduction of the Apple PowerPC is good news. This computer is the result of cooperation between IBM, Motorola, and Apple, which brings standardization a step closer; soon it will not matter what kind of computer you have. Johnson (1992) believes that computers will soon be as common as telephones and televisions - indeed, these three instruments may merge into a single appliance. This would be a real boon for computer interviewing, because computer-administered surveys would then become as common among consumers as they already are in business-to-business surveys, where PCs are common today. Speaking as a software developer, he also remarked that software is getting easier to use: although new versions are easier to use than older ones in some ways, there is still much room for improvement. He described this as one of the highest priorities of his own development work and expected dramatic improvements in the future. Curry (1992) mentioned several technologies that seem relevant to marketing research:
Multimedia: The most exciting emerging technology is multimedia, the ability to produce sound and full-motion video on the PC using compact-disc technology. Once standards are established, Curry expects interactive video to be reincorporated into PC interviewing using this new technology. Multimedia promises to make interactive video instantaneous and affordable.
Handwriting Recognition: Handwriting recognition converts handwritten text into machine-readable form in real time. When this technology becomes available, keyboards can be eliminated from computer interviewing. Unfortunately, handwriting recognition has proven difficult to develop, and the technology probably will not be available until the late 1990s.
Palm-Top Computers: Palm-top computers are about the size of a wallet, yet IBM compatible. Inexpensive palm-top computers will make PC interviewing universal and the interviewer's clipboard a relic of the past.
What has already happened in market research will also happen, at least gradually, in academic social research. Computer-assisted interviewing will overtake traditional data collection (see, for example, Shanks 1983; Dandurand 1987; Lavrakas 1987). CAI is easier to conduct and yields better-quality data. Moreover, data collected with computers are available sooner than other data, because no data entry is necessary after the end of the field period. Further development of software will allow more complex questionnaires for measuring more complex social realities (e.g., diaries, life careers, social monitoring), and the degree to which academic social researchers can use these more complex methods is simply a question of infrastructure. In general, the development of more complex software will lead to a further diffusion of computer-assisted data collection methods. The falling prices of hardware and the further development of laptops, notebooks, penbooks, and other easily portable computers will lead to more "point of sale" studies, and this also applies to social survey research; consider, for instance, the continuous monitoring of media use or the recording of daily routines. Along with CATI, the use of CAPI will also increase in the social sciences, and it is only a question of time before new techniques in computer-assisted data collection enter social survey research.
References
Ambler, C. and Th. L. Messenbourg (1992). EDI - Reporting Standard or the Future. In Proceedings of the Bureau of the Census 1992 Annual Research Conference (pp. 289-297). Arlington, Virginia: Bureau of the Census.
Anders, M. (1988). Telefonbefragung - wissenschaftlich nicht fundiert? München: Infratest.
Appel, M. V. and W. L. Nicholls (1993). New CASIC Technologies at the U.S. Census Bureau. Paper presented at the 1993 AAPOR Conference.
Bateson, N. and P. Hunter (1990). The Use of CAPI for Official British Surveys. Paper presented at the World Congress of Sociology, Madrid.
Beckenbach, A. (1992). Befragungen mit dem Computer, Methode der Zukunft? Unpublished diploma thesis, University of Mannheim.
Catlin, G. and S. Ingram (1988). The Effects of CATI on Costs and Data Quality: A Comparison of CATI and Paper Methods in Centralized Interviewing. In R. M. Groves et al. (Eds.), Telephone Survey Methodology (pp. 437-450). New York: Wiley.
Coulter, R. (1985). A Comparison of CATI and NONCATI on a Nebraska Hog Survey. Statistical Reporting Service, U.S. Department of Agriculture, April.
Couper, M., R. M. Groves, and C. A. Jacobs (1990). Building Predictive Models of CAPI Acceptance in a Field Interviewing Staff. In Proceedings of the Bureau of the Census Annual Research Conference (pp. 685-702). Arlington, Virginia: Bureau of the Census, U.S. Department of Commerce.
Curry, J. (1988). Using your PC strategically. Quirk's Marketing Research, 2, 20-22.
Curry, J. (1989). Introduction and Perspective. In 1989 Sawtooth Software Conference Proceedings, Starting a PC-Based CATI Facility, Volume II (pp. 1-6). Sun Valley, Idaho: Sawtooth Software.
Curry, J. (1992). Marketing opportunities with advanced research techniques. In SKIM Seminar Proceedings 1992. Rotterdam: SKIM Software Division.
Dandurand, L. (1987). Historical perspectives and the future of computer interviewing. In Sawtooth Software Conference on Perceptual Mapping, Conjoint Analysis, and Computer Interviewing (pp. 1-9). Ketchum, Idaho: Sawtooth Software.
Depla, P., K. Schalken, and P. W. Tops (1993). Bestuurskunde 6: Burgeronderzoek en gemoderniseerd lokaal bestuur, De betekenis van informatie- en communicatietechnologie, 311-326.
Erdmann, H., M. H. Klein, and J. H. Greist (1983). The reliability of a computer interview for drug use/abuse information. Behavior Research Methods & Instrumentation, 15, 66-68.
Frey, J. H. (1983). Survey research by telephone. Beverly Hills, CA: Sage.
Gershenfeld, S., T. Atherton, M. Ben-Akiva, and L. Musetti (1991). Context-Specific Choice Experiments for Multi-Featured Products: A Disk-By-Mail Survey Application. In Proceedings of the Sawtooth Software Conference, Gaining a Competitive Advantage Through PC-Based Interviewing and Analysis, Volume 1 (pp. 19-24). Sun Valley, Idaho: Sawtooth.
Goldstein, H. (1987). Computer Surveys by Mail. In Proceedings of the Sawtooth Software Conference on Perceptual Mapping, Conjoint Analysis, and Computer Interviewing (pp. 55-59). Sun Valley, Idaho: Sawtooth.
Groves, R. M. and N. A. Mathiowetz (1984). Computer assisted telephone interviewing: Effects on interviewers and respondents. Public Opinion Quarterly, 48, 356-369.
Groves, R. M. and W. L. Nicholls II (1986). The Status of Computer-Assisted Telephone Interviewing: Part II - Data Quality Issues. Journal of Official Statistics, 2(2), 117-134.
Günther, J. and E. Semrau (1984). Meinungsforschung mit Bildschirmtext? Entwicklung eines Instruments zur Nutzung von Bildschirmtext für die empirische Meinungsforschung. Wien: Literas.
Harlow, B. L., J. F. Rosenthal, and R. G. Ziegler (1985). A Comparison of Computer-Assisted and Hard Copy Telephone Interviewing. American Journal of Epidemiology, 122, 335-340.
Higgins, C. A., T. P. Dimnik, and H. P. Greenwood (1987). The DISKQ survey method. Journal of the Market Research Society, 29, 437-445.
House, C. C. (1985). Questionnaire design with computer assisted interviewing. Journal of Official Statistics, 1, 209-219.
Huisman, D. (1988). PC-Based research in Europe and the USA now and after 1992: Strengths, weakness, fairy tales. Presented at the seminar on "The impact of new user-oriented computer facilities on market research", Copenhagen (Denmark), October.
Johnson, B. and V. Sturtevant (n.d.). A comparison of computer interviewing with traditional paper-pencil format: Soliciting sensitive information.
Johnson, R. (1992). Ci3: Introduction and evolution. In Proceedings of the Sawtooth Conference 1992 (pp. 91-102). Sun Valley, Idaho: Sawtooth Software.
Karweit, N. and E. D. Meyers Jr. (1983). Computers in survey research. In P. H. Rossi, J. D. Wright, and A. B. Anderson (Eds.), Handbook of survey research (pp. 379-414). Orlando, FL: Academic Press.
Kiesler, S. and L. S. Sproull (1986). Response effects in the electronic survey. Public Opinion Quarterly, 50, 402-413.
Kurz, H. and J. Schnoetzinger (1982). Effizienzvergleich zwischen Computerbefragung und mündlichem Interview. Seminar "Computer in der Datenerhebung", Vienna.
Lavrakas, P. J. (1987). Telephone survey methods: Sampling, selection, and supervision. Newbury Park, CA: Sage.
Lucas, R. W., P. J. Mullin, C. B. X. Luna, and D. C. McInroy (1977). Psychiatrists and a computer as interrogators of patients with alcohol-related illnesses: A comparison. British Journal of Psychiatry, 131, 160-167.
Martin, C. L. and D. H. Nagao (1989). Some Effects of Computerized Interviewing on Job Applicant Responses. Journal of Applied Psychology, 74, 72-80.
National Center for Health Statistics and Bureau of Statistics (1988). Report of the 1987 Automated National Health Interview Survey Feasibility Study: An Investigation of Computer Assisted Personal Interviewing. Working Paper Series Number 32, U.S. Department of Health and Human Services.
Nicholls II, W. L. and R. M. Groves (1986). The Status of Computer-Assisted Telephone Interviewing: Part I - Introduction and Impact on Cost and Timeliness of Survey Data. Journal of Official Statistics, 2(2), 93-115.
O'Brien, T. and V. Dugdale (1978). Questionnaire administration by computer. Journal of the Market Research Society, 20, 228-237.
Perlman, G. (1985). Electronic surveys. Behavior Research Methods, Instruments & Computers, 17, 203-205.
Porst, R. and M. Schneid (1991). Software-Anforderungen an computergestützte Befragungssysteme. ZUMA-Arbeitsbericht Nr. 91/21.
Poynter, R. (1991). Of Mice and Man and Women. In Proceedings of the 1991 Sawtooth Software Conference (pp. 55-66). Sun Valley, Idaho: Sawtooth.
Riede, T. and V. Dorn (1991). Zur Ersetzbarkeit von Laptops in Haushaltsbefragungen in der Bundesrepublik Deutschland. Schriftenreihe Ausgewählte Arbeitsunterlagen, 20.
Saris, W. E. (1991). Computer-Assisted Interviewing. London: Sage.
Schneid, M. (1986). Methodenprojekt: Egozentrierte Netzwerke und Kontextanalyse. Bericht über die telephonische Nachbefragung. ZUMA-Technischer Bericht T85/09.
Schneid, M. (1989). Datenerhebung am PC. planung und analyse, April/Mai, 148-154.
Schneid, M. (1991). Einsatz computergestützter Befragungssysteme in der Bundesrepublik Deutschland. ZUMA-Arbeitsbericht Nr. 91/20.
Sebestik, J., H. Zelon, D. DeWitt, J. M. O'Reilly, and K. McGowan (1988). Initial Experiences with CAPI. In Proceedings of the Fourth Annual Research Conference (pp. 357-371). Arlington, Virginia: Bureau of the Census.
Severijnen, P. C. A. and P. R. Willems (1993). AGORA, is géén bedreigde diersoort: STASPANEL DELFT. Paper presented at the 1993 NVvM congress.
Shanks, J. M. (1983). The current status of computer-assisted telephone interviewing: Recent progress and future prospects. Sociological Methods & Research, 12, 119-142.
Sproull, L. S. (1986). Using electronic mail for data collection in organizational research. Academy of Management Psychologist, 42, 159-169.
Statistics Sweden (1989). Computer Assisted Data Collection in the Labour Force Survey: Report of Technical Tests. Technical Report.
Sudman, S. (1983). Survey research and technological change. Sociological Methods & Research, 12, 217-230.
Thornberry, O., B. Rowe, and R. Biggar (1990). Use of CAPI with the U.S. National Health Interview Survey. Presentation at the XII World Congress of Sociology, Madrid, July.
Tull, D. S. and D. I. Hawkins (1987). Marketing research: Measurement and method (4th ed.). New York: Macmillan.
Van Bastelaer, A. M. L. and D. Sikkel (1987). From Three to Threehundred Hand-Held Computers. In CBS Select 4, Automation in Survey Processing (pp. 27-36). Voorburg: Netherlands Central Bureau of Statistics.
Van Valey, Th. L. and S. R. Crull (1991). Interviewer Error: A Comparison of Methods. In Proceedings of the Sawtooth Software Conference (pp. 1-10). Sun Valley, Idaho: Sawtooth.
Vavra, T. G. (1986). A comparison of computer administered questionnaires with paper and pencil questionnaires. Paper presented to the 3rd Annual Microcomputers in Marketing Education Workshop. Pomona: California State Polytechnic University.
Waterton, J. J. and J. C. Duffy (1984). A comparison of computer interviewing techniques and traditional methods in the collection of self-report alcohol consumption data in a field survey. International Statistical Review, 52, 173-182.
Werking, G. S., A. Tupek, and R. Clayton (1988). CATI and Touchtone Self-Response Applications for Establishment Survey. Journal of Official Statistics, 4, 349-362.
Wilson, B. (1989). Disk-By-Mail Surveys: Three Years' Experience. In Proceedings of the 1989 Sawtooth Software Conference (pp. 1-4). Sun Valley, Idaho: Sawtooth.
Wojcik, M. S. and A. Hunt (1993). CAPI Training: Where Do We Go From Here? Paper presented at the 1993 AAPOR Conference.
Wyatt, E. (1991). Quality in Computer Interviewing - Tricks of the Trade. In Proceedings of the Sawtooth Software Conference (pp. 33-39). Sun Valley, Idaho: Sawtooth.
Zandan, P. and L. Frost (1989). Customer Satisfaction Research Using Disks-By-Mail. In Proceedings of the 1989 Sawtooth Software Conference (pp. 5-17). Sun Valley, Idaho: Sawtooth.
The Study of Work Values: A Call for a More Balanced Perspective
Arthur P. Brief and Ramon J. Aldag
1
Introduction
The title of this chapter is somewhat arbitrary. Rather than using the term "work values," we could have relied upon several alternatives that capture what the chapter is about. These include the functions, purposes, or meanings of "occupational work" (Miller, 1980) in people's lives. While each alternative can be construed as conveying a distinctive idea, they all share a common core: the question of why people work - what is it that people gain from working that motivates them to invest their physical, intellectual, and emotional energies in it? For many thinkers, this question strikes at the very essence of who we are as individuals and as collectives of people. This is why so much effort across disciplines, in both the humanities and the social sciences, has been devoted to the study of work values. The aims of this chapter are to assess the study of work values and, based upon that assessment, to suggest potentially fruitful avenues for future investigations. We will not, however, attempt to provide a comprehensive review of the work values literature. Our approach will be selective in two ways. First, we focus on the research of only one loosely defined group of investigators, organizational scientists. This group - composed of psychologists, sociologists, and others concerned with understanding the thoughts, feelings, and actions of people at work - was selected because much of the recent work values research has been performed by its members and because we identify ourselves as organizational scientists, so this research is accessible to us. We believe most readers will not find our choice particularly problematic because, as noted, the work values research in the organizational sciences is relatively large and current and is conducted by investigators rooted in various disciplines.
Second, our approach is selective in the way we treat the work values research of organizational scientists. Because of the size of this segment of the literature, we again are forced to sample rather than survey the population. In doing so, however, we will attempt to portray adequately the key conceptual themes and empirical methodologies in this body of research. The remainder of the chapter unfolds as follows. First, a very brief history of work values and its study in the organizational sciences is presented. The results of this historical analysis indicate that a particular view, which we label the "internal orientation," has dominated the study of work values in the organizational sciences. This orientation emphasizes the value of work itself. In the next section of the chapter, we offer evidence for an alternative view, the "external orientation," which focuses on the value of the economic outcomes of work. In doing so, special attention is paid to the measurement of work values. The chapter closes with a set of suggestions for future studies of work values.
2
A Brief History of Work Values and their Study
Although the development and transmission of work values are historical processes (cf. Aldag & Brief, 1979a), this history often is ignored or misunderstood. Interests, values, and ideologies influence the content of what is accepted as historical fact. Of the distortions that result, the one most central to this chapter is the tendency of historical accounts to overlook the alternative courses that history might have taken; without careful analysis of the total configuration of forces that shape history, the course that history did take then appears inevitable. In this section, we examine the conventional view of work values in the organizational sciences, summarizing its history and substance. We begin with a very cursory look at the concept of work in the history of Western civilization. [For a fuller description, see Tilgher (1931); de Grazia (1964); and Neff (1968).] Ancient Greek philosophers saw work as a waste of a citizen's time and as a corrupting activity that made the pursuit of truth and virtue more difficult. Leisure provided the vehicle for attaining truth and virtue; it was reserved for the exercise of the mind and spirit. Aristotle, for example, maintained that leisure itself was a source of intrinsic pleasure, happiness, and felicity. In his view, happiness did not arise from an occupation; it was a property of those who had leisure. Through the writings of Aristotle, as well as of Plato and Epicurus, these work values spread to Rome, where they were largely embraced. In both societies, of course, this classical ideology of work depended on slavery; slaves were viewed more as instruments than as people. As Grant (1960, p. 112) observed, the Roman writer Cato the Elder offered the following advice: "The best principle of management is to treat both slaves and animals well enough to give them the strength to work hard."
Thus, work was a curse that could be separated from the good life, but this separation required a class of persons to whom one could assign, without feeling remorse, vulgar or degrading activities. Ancient Hebrew philosophers and theologians shared the Greco-Roman system of work values, with one principal exception: they saw work as a product of original sin and valued it as a means of atonement. In other words, to the ancient Hebrews, work was both a consequence of original sin and a way of redeeming oneself in the eyes of God. Work therefore began to appear in a positive light, as a way of cooperating with God in the world's salvation. These more positive feelings about work became part of very early Christian teachings and continued to be evident through the Middle Ages. Work was viewed as a route to goodness; it was a path for accumulating a surplus of goods and services to be shared with the needy. The hoarding of riches, however, was considered a transgression of the law of God - the sin of avarice. Work itself (i.e., the activities of work per se), on the other hand, was seen, for example, by Thomas Aquinas as morally neutral - a natural condition of Christian life. With Luther, the moral neutrality of work itself began to wane. Luther advocated the concept of a "calling," a life-task set by God. Weber (1930) argued that the rise of capitalism in America was spurred by a work ethic that emphasized the inherent morality of working hard. He traced this emphasis to the teachings of Luther as later interpreted by Calvin, who saw a person's work, whatever it may be, as predestined by God. Furthermore, Calvin taught that excelling at one's work was a sign of salvation. Puritan interpretations of Calvin's theological view of work ultimately led to an ethic embracing the morality of hard work. Thus, the able-bodied individual who does not work hard is seen as immoral. From a so-called Protestant work ethic perspective, therefore, work serves moral ends.
Weber further argued that work served this moral function for the bulk of pre-Industrial Revolution workers in America and thereby contributed to the nation's economic progress. Weber's arguments have come to be treated by many as conventional wisdom (e.g., Aldag & Brief, 1979a; Dubin, 1976; George, 1968; Heneman, 1973; Parker & Smith, 1976; Vroom, 1964). For example, Hulin and Blood (1968) described the work norms of middle-class Americans as follows: "Positive affect for occupational achievement, a belief in the intrinsic value of hard work, a striving for the attainment of responsible positions, and a belief in the work-related aspects of Calvinism and the Protestant ethic" (p. 48). They added that the dominance of these values can be explained by children learning in school and at home those values "brought by the Anglo-Saxon Protestants from Europe in the 17th and 18th centuries. The values have become the standard in middle-class society. Children are taught these values in school by their middle-class teachers and attempt to reach goals defined in terms of these values of behavior consistent with these values" (p. 52). Such conventional wisdom is supported by at least three other beliefs evident among organizational scientists. These beliefs relate to the changing importance of needs, the contribution of intrinsic outcomes to overall job satisfaction, and self-reports of the importance of job outcomes. Regarding changes in need importance, Maslow's need hierarchy, with its conception of the successive prepotency of needs, has become firmly entrenched in the popular consciousness, contrary evidence notwithstanding. The common view, based on this succession, is that so-called lower order needs have now been fairly well satisfied for the majority of American workers and that greater attention should therefore be given to the satisfaction of so-called higher order needs. Often left unstated is the implicit assumption that higher order needs are best satisfied by intrinsic outcomes (e.g., work itself), while lower order need satisfaction is more tightly linked to extrinsic outcomes (e.g., pay).
Some point to the fact that overall affect toward the job, as captured by general job satisfaction measures, is primarily determined by affect toward the noneconomic aspects of the job (e.g., Campbell, Converse, & Rodgers, 1976). For instance, Aldag and Brief (1978) found that when general job satisfaction was regressed on the job satisfaction facets of work itself, pay, promotional opportunities, coworkers, and supervision, only the coefficient for satisfaction with the work itself was significant. Such findings suggest that the content of the work itself is the critical determinant of overall job satisfaction. Those favoring the internal view also present another set of empirical arguments. For instance, they cite studies showing that 75% of workers report they would continue to work even if they inherited enough money to live comfortably (Kanter, 1978). They also note that workers do not consistently rank pay as the most important motivator of work: Lawler (1971) reviewed these studies and found pay to be ranked first in only 27% of them. As can be seen, the conventional view, initially reflective of a portion of Weber's description of the Protestant work ethic, has been modified to emphasize a perspective on work values largely devoid of religious content and tied closely to a psychological concern with the intrinsic outcomes of the activity of working. Thus, the conventional view in the organizational sciences is an internal one, holding that work influences worker well-being primarily through its impact on so-called higher order needs. In its extreme form, this orientation is exemplified by the writings of Argyris (1957, 1964, 1973). He asserted that there are important psychological needs (e.g., the needs to be relatively independent and active, and to use important abilities) that can only be satisfied fully at work. That is, if work satisfies these higher order psychological needs, it can be seen as an end in and of itself.
Arthur P. Brief and Ramon J. Aldag
Such an internal orientation toward the functions of work is evident in the writings of Maslow (1943), McGregor (1960), Herzberg (1966), and their more contemporary proponents, who suggest various ways of structuring the work itself as a means of enhancing the motivation to perform. This, coupled with the socially desirable, apparently humanistic aura of the internal orientation, has produced an overwhelming sentiment in its favor, at least in most areas of the organizational sciences literature. Research on task characteristics, participative management, and other so-called organizational development tactics, for example, appears to rest in part, implicitly or explicitly, on the internal orientation (see Nord, 1986). But, as will be discussed in more detail later, the internal orientation offers only a partially valid picture, for it is based on an incomplete understanding of history (Nord, Brief, Atieh, & Doherty, 1988). What has been accepted as historical fact has been shaped by a complex interaction of values, interests, and desires. Yet, in the absence of in-depth analyses of the historical process, those who hold the internal view interpret that process as far more linear than it actually was. Moreover, they fail to see the ideological content of their perspective and become subject to what Bem and Bem (1970) labeled an "unconscious ideology." An unconscious ideology contains attitudes and beliefs that an individual accepts without awareness. These beliefs constrain a person's ability to conceive of alternative possibilities and tend to go unrecognized until confronted by a fundamentally different perspective. The internal view of work values is so widely shared by organizational scientists, as well as by managers, politicians, and even trade union leaders in America, that it contains elements of an unconscious ideology. Managers frequently respond to problems with workers by lamenting the decay of the work ethic. Organizational scientists have built much of their theory, research, and advice to managers on the premise that work is somehow noble and that psychologically engaging work is a necessary condition for human development. Politicians appear to find it useful to attach their comments to an assumed set of work values.
In celebrating Labor Day, 1971, for instance, the President of the United States saluted "the dignity of work, the value of achievement, and the morality of self-reliance. None of these are going out of style" (Guttman, 1976, p. 4). Indeed, the work ethic has long been assumed to be so widely shared that, as Rodgers (1974) observed, trade unionists, radicals, and conservatives alike often attempted in the past to rally support for their causes by appealing to its premises. Even today, it is popular to call for rekindling the work ethic as a solution to a variety of social and economic problems. Clearly, the internal view is a major feature of American society. We will not only show that the internal view rests on a partially distorted historical understanding but also argue that its proponents seem to discount too easily contemporary empirical results inconsistent with their perspective. Below, in presenting evidence for an alternative, external view of work values, we address both of these difficulties with the internal orientation.
3 Support for the External View
In presenting support for the external view, the four sets of evidence in support of the internal orientation will first be addressed. Then, additional evidence drawn from the literatures of job involvement, subjective well-being, and social stratification will be briefly considered.
3.1 The Work Ethic
As noted above, Weber's (1930) arguments concerning the moral function of work have become widely accepted. If this conventional perspective can be refuted, then the view that work serves other functions (economic in nature) becomes more plausible. In fact, a number of analyses seriously question Weber's view of social history (e.g., Guttman, 1976; Hays, 1957; Nelson, 1975; Pollard, 1963; Stone, 1974; Thompson, 1963). Rodgers' (1974) study of the work ethic in America between 1850 and 1920 will be used to exemplify these ideas. Rodgers noted, for example, the "nagging contradiction between the ideals of duty and of success - between the appeal to the dignity of all labor, even the humblest, and the equally universal counsel to work one's way as quickly as possible out of manual toil" (p. 14). In addition to the tensions within the work ethic of the era, Rodgers saw a tension between the classes. While the middling, largely Protestant, property-owning classes emphasized the moral function of work, the "ascetic injunctions of Puritanism never penetrated very far into the urban working classes" (p. 15). The Industrial Revolution, according to Rodgers, drove a widening wedge between Protestant work ideals and the realities of work. Workers responded by "reporting irregularly for work, moving restlessly from job to job, or engaging in slowdowns and work restrictions" (p. 155). These behavioral responses were coupled with "a withdrawal of interest in work and indulgence in the dream of one day when men would no longer live to work, but would make labor 'auxiliary' to their lives" (p. 174). As Rodgers noted, the middle-class moralists responded to the dilemmas created by the Industrial Revolution with such actions as launching the cooperative movement, crusading for worker democracy, and inaugurating handicraft and industrial betterment programs. While this represents only a fraction of Rodgers' analysis, the conclusions are clear. Prior to the Industrial Revolution in America, a moralist perspective on the functions of work was represented by a middle class espousing a not wholly consistent rhetoric. The Industrial Revolution itself sharpened the differences between these moralistic spokespersons and the masses of workers.
Thus, the moral function of work as emphasized by Weber was probably never widely evident in America, and the moralist foundation of the internal orientation appears weak. Accordingly, as reasoned by Nord, Brief, Atieh, and Doherty (1988), alternative functions of work need to be examined.
3.2 Changing Need Structure
Arguments about the relative importance of intrinsic outcomes that rest on historical accounts of the successive prepotency of needs are flawed by an assumed fit between need hierarchy level and outcome type. It is generally taken for granted that economic outcomes satisfy physiological and security needs. For instance, from a philosophical point of view, Okrent (1978-1979) sought to answer the question of why people work. His initial response was "People work in order to live, to eat, to survive. This is immediately apparent and self-evident" (p. 322). However, the fact that economic outcomes may satisfy these needs does not preclude their serving an instrumental function for other needs. This fact is often discounted in need-based theories of work motivation. For instance, Alderfer (1972) recognized the role of income in satisfying existence needs, but subsequently ignored its instrumental function in fulfilling relatedness and growth needs. He defined growth need in part as the need to master one's environment (i.e., to feel in control or efficacious). As will be seen shortly, however, self-efficacy in life may be linked to the work domain through income. Thus, in a very basic way, income is a vehicle for fulfilling even so-called higher order needs. In this regard, Locke (1976, p. 1322) has stated: The root of desire for pay as such is the individual's desire to satisfy his physical needs (food, shelter, etc.). But it can mean much more than this. Money also serves as a symbol of achievement (McClelland, 1961), as a source of recognition and as a means of obtaining other values (e.g., leisure, art, etc.). To some it is a status symbol; to others it means security; to others it allows greater freedom of action in all areas of life.
Further, the argument that "lower-order" needs have been relatively well satisfied, and thus have become less important, is contradicted by the literature on distributive justice, which challenges the conventional wisdom concerning the motivating potential of pay. In particular, Mirowsky (1987) developed a set of hypotheses regarding the psycho-economics of feeling underpaid based, in part, on the works of Rainwater (1974), Jasso (1978), and Veblen (1899). He tested those hypotheses using a national sample of 680 married couples. Interestingly, he found that, as income rises, people shift from comparisons based on the amount needed to "get along" to the amount needed to "get ahead," producing a U-shaped relationship between income level and perceived underpayment. He concluded that: There is a tendency to assume that human goals are like geographic destinations. One chooses a destination, sets off in its direction, and closes the distance with each advancing step. The old image of a draught-horse plodding after a carrot dangling from a stick tied to its harness is a reminder that the geographic analogy may be wrong. Advancement may be driven by two motives: avoiding the bottom and seeking the top. The bottom is a more-or-less fixed point that recedes with achievement, but the top is very different. Each step towards the top multiplies the cost of the next and displaces the old level of comparison with a new and higher one. The motive is less and less satisfied. The goal recedes as it is approached. The logical consequences are often ironic: the highest pay does not minimize the sense of underpayment, and the pay that does often seems unfairly low. (p. 1433)
3.3 Contribution to Overall Satisfaction
While there is, in fact, some evidence that job satisfaction measures largely capture reactions to the work itself, such evidence is potentially misleading. What ultimately matters in assessing the relative impacts of economic and other outcomes is the role these factors play in workers' entire lives, not just in their affect toward work. Viewing economic and noneconomic factors in this broader context yields a markedly different picture of their relative roles. To address this issue, it is useful to consider the relationship of work to the quality of life. Research on this topic has focused primarily on the relationship between job satisfaction and life satisfaction. Across studies, less than 10% of the variance in life satisfaction has been found to be attributable to job satisfaction (Rice, Near, & Hunt, 1980). This result should be interpreted in light of two factors. First, strong, positive correlations between job and life satisfaction are to be expected because the relationship, at least at the conceptual level, represents a part-whole correlation (Quinn, Staines, & McCullough, 1974). Second, measures of job and life satisfaction generally are collected at one point in time with scales that often use similar if not identical formats, so that some degree of covariation is attributable solely to common method variance (Rice et al., 1980). Thus, the figure of less than 10% common variance probably overstates the true shared variance, yet even taken at face value it is surprisingly low. The absence of a strong link between job and life satisfaction is further evidenced by the fact that attempts to isolate moderators that strengthen the relationship have yielded no substantial results (e.g., Brief & Hollenbeck, 1985).
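To make the size of this bound concrete (our own arithmetic, not a figure reported by Rice, Near, and Hunt), shared variance between two measures is their squared correlation, so less than 10% shared variance implies a job-life satisfaction correlation below roughly .32:

```latex
r^{2}_{\text{job,life}} < 0.10
\quad\Longrightarrow\quad
\lvert r_{\text{job,life}} \rvert < \sqrt{0.10} \approx 0.316,
\qquad \text{e.g. } r = .30 \;\Rightarrow\; r^{2} = .09 .
```

Since both the part-whole overlap and common method variance inflate the observed correlation, the true value would lie below even this bound.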
How can this remarkable set of findings be explained? Does work really have so little to do with overall quality of life? Such a conclusion, in the present authors' opinion, would rest on a logical flaw concerning causal mechanisms: it assumes that work contributes to overall quality of life only through the intervening variable of job satisfaction. In fact, other causal mechanisms seem more likely. Recall that most job-life satisfaction researchers have relied upon measures of overall job satisfaction. As noted above, some evidence (e.g., Aldag & Brief, 1978; Campbell, Converse, & Rodgers, 1976) has indicated that such measures largely capture affect toward the noneconomic facets of jobs (e.g., the work itself). However, Chacko (1983), based upon a study of the relative contributions of satisfaction with economic and noneconomic job facets to life satisfaction, concluded that economic facets appear to be the stronger correlates of life satisfaction. One is encouraged, therefore, to look beyond the internal aspects of jobs to isolate the functions of work: other functions appear to lie in what people take away from their jobs, which is of instrumental value in the other domains of their lives. That is, economic outcomes may affect life satisfaction directly, rather than indirectly through overall job satisfaction. This position is strongly supported by the research program of Andrews and Withey (1974). They identified 30 semi-independent domains of life and, like most other quality of life researchers, found that the work-related domain contributed only marginally to perceived quality of life. Importantly, however, Andrews and Withey also found the work domain to stand apart from the domains more central to quality of life, except for one linkage: income was linked directly or indirectly to self-efficacy, family fun, and housing.
These, along with money, were the domains found to make the largest independent contributions to perceived quality of life. Again, the noneconomic dimensions of work were shown to stand in isolation. Similar results for a sample of German workers were reported by Bergermaier, Borg, and Champoux (1984). The likelihood of underestimating the impact of pay on satisfaction is also suggested by the literature on the interplay of job context and job content. For instance, Oldham, Hackman, and Pearce (1976) have shown that individuals who were satisfied with work context, including pay, were generally more responsive to job content factors than were those dissatisfied with context. Katerberg, Hom, and Hulin (1979), while finding only limited support for the moderating role of satisfaction with context on the relationship between job complexity and employee responses, did find that most of the effect of context was produced by pay satisfaction. In a related vein, Ferris and Gilmore (1984) have shown organizational climate to moderate the impact of specific task characteristics on job satisfaction. Dunham, Pierce, and Newstrom (1983) have reviewed this literature and proposed psychological absorption/distraction as a process to explain the role of contextual satisfaction. Taken together, this stream of research suggests that job contextual factors, including pay and job security, may serve as necessary conditions for job content to exert its effects. Viewed in this light, the impact of job content on employee responses (such as satisfaction/dissatisfaction) may itself be due, at least in part, to the perceived adequacy of economic factors.
3.4 The Reported Importance of Extrinsic Rewards
Self-reports of behavioral tendencies and of the relative importance of various work outcomes must obviously be viewed with caution. The recent surge in the number and size of state-operated lotteries and commercial giveaways should provide an interesting, if limited, test of whether "instant millionaires" will in fact choose to continue to work. Until then, however, such reports must be viewed as baseless speculation. On the rated importance of various job outcomes, there is a larger literature, and the evidence provides little convincing support for the internal view. For instance, Lawler (1971) noted in his review of studies ranking outcome importance that the results were sensitive to question wording and, more importantly, to pay level: respondents at lower pay levels ranked pay higher. More generally, it seems that some minimum-hurdle levels of economic outcomes from work are required before other work outcomes gain importance (Cofer & Appley, 1964; Wahba & Bridwell, 1976). As pay levels increase, therefore, the importance of pay as a motivator may decrease relative to other outcomes. However, if a person's economic security is threatened, the importance ranking of pay is likely to increase. The "givebacks" in recent labor contracts provide concrete evidence of this phenomenon. Another reason to doubt the validity of self-reports of attribute importance relates to the social desirability of responses. Quite simply, most people probably see it as more socially acceptable to show an interest in, say, job challenge and responsibility than in economic outcomes. Evidence in support of this contention is provided by policy-capturing studies, which typically show job security and economic outcomes to have markedly higher importance rankings than self-reports suggest. For instance, Zedeck (1977) found job security to be ranked fifth of six outcomes by a simple self-reported ranking method, but first by policy capturing. In that study, pay retained a second-place ranking under both methods. Feldman and Arnold (1978) found pay and fringe benefits to be ranked fourth of six outcomes by simple self-reported ranking, but first by policy capturing.
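Policy capturing infers importance from judgments rather than from direct self-report: a respondent rates many hypothetical job profiles that vary systematically on their attributes, and attribute weights are recovered by regressing the ratings on the attribute levels. The following is a minimal sketch of that logic, not the procedure of any study cited here. It assumes a simulated, noise-free respondent and a full two-level factorial design coded -1/+1, under which ordinary least squares reduces to a simple average; the attribute names, "true" weights, and self-reported ranking are all hypothetical.

```python
"""Minimal policy-capturing sketch (hypothetical attributes and weights).

A simulated respondent rates every job profile in a full 2^k factorial
design coded -1/+1. Because the columns of such a design are mutually
orthogonal (and orthogonal to the intercept), the least-squares weight
for each attribute reduces to mean(rating * code).
"""
from itertools import product

# Hypothetical job attributes; each is either low (-1) or high (+1).
ATTRIBUTES = ["pay", "job_security", "challenge", "autonomy"]

# Hypothetical weights that generate this simulated respondent's ratings.
TRUE_WEIGHTS = {"pay": 2.0, "job_security": 2.5, "challenge": 0.5, "autonomy": 0.25}


def rate(profile):
    """Simulated overall rating: a baseline of 5.0 plus weighted attribute codes."""
    return 5.0 + sum(TRUE_WEIGHTS[a] * code for a, code in zip(ATTRIBUTES, profile))


def capture_policy(ratings_by_profile):
    """Estimate attribute weights by OLS on an orthogonal -1/+1 factorial design."""
    n = len(ratings_by_profile)
    return {
        attr: sum(rating * profile[j] for profile, rating in ratings_by_profile) / n
        for j, attr in enumerate(ATTRIBUTES)
    }


def rank(weights):
    """Order attributes from most to least important by absolute weight."""
    return sorted(weights, key=lambda a: abs(weights[a]), reverse=True)


profiles = list(product([-1, 1], repeat=len(ATTRIBUTES)))  # 16 job profiles
data = [(p, rate(p)) for p in profiles]
inferred = capture_policy(data)

# Hypothetical self-reported ranking, as social desirability might shape it.
self_report = ["challenge", "autonomy", "pay", "job_security"]

print("policy-captured ranking:", rank(inferred))
print("self-reported ranking:  ", self_report)
```

With real, noisy ratings the estimates would only approximate the generating weights, and a study with many attributes would more likely use a fractional design and a general least-squares routine.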
Arnold and Feldman (1981) found the discrepancy between self-reported and policy-captured rankings to be especially great for subjects with high needs for social approval. Such individuals overreported (relative to the weights inferred from policy capturing) the importance of autonomy and of opportunities to use important skills, and underreported the importance of pay and fringe benefits (see Schwab, Rynes, & Aldag, 1987, for a fuller discussion of this issue). In addition, it is interesting to speculate about how the results of studies such as those reviewed by Lawler would have been altered by the inclusion of unemployed respondents seeking work. The extensive research program of Warr and his colleagues on unemployment in Britain provides some clues. Warr (1983) found the psychological health of the unemployed (as indexed by levels of anxiety, depression, insomnia, irritability, self-confidence, listlessness, and concentration) to be poorer than that of employed individuals. Further, this poorer health was apparently the result of unemployment, since a return to paid employment was commonly followed by an improvement in psychological health. As further evidence on this point, Brief, Konovsky, Goodwin, George, and Link (in press) examined economic and experiential functions of work for a sample of 148 unemployed individuals. Economic deprivation was gauged by items assessing the extent to which subjects had had difficulty affording life's necessities since becoming unemployed. Experiential deprivation was assessed by items reflecting the relative absence, due to unemployment, of the latent consequences of work (time structure, opportunity for social experiences outside the family, status level, and variety of activity). As predicted, increasing length of unemployment was associated with heightened economic and experiential deprivation. Economic deprivation, in turn, was associated both with more experiential deprivation and with lower subjective well-being.
Unexpectedly, however, experiential deprivation was not related to subjective well-being. Such findings further reinforce the importance of the economic functions of work in people's lives. [For more on the relationship between unemployment and health, see, for example, Catalano and Dooley (1977, 1983); Jahoda (1982); and Kahn (1981)].
An additional reason to believe that the actual importance of pay may be greater than workers' rankings suggest relates to the relative variances of outcomes. The importance an individual attaches to an outcome depends both on the desirability of the outcome and on its perceived variance across the available alternatives. In general, rated importance is likely to increase monotonically with variance. For instance, an individual who recognizes that all feasible jobs have similar pay ranges but offer substantially different degrees of job challenge has little reason to emphasize pay. Competitive forces probably constrain pay levels more tightly than they constrain intrinsic factors, and intrinsic factors may be harder to quantify objectively and thus more subject to perceptual widening of variance; differential variability of alternative outcomes is therefore likely to dampen the relative importance ranking of extrinsic outcomes. Thus, any self-ranking of the importance of outcomes probably says more about social desirability, current outcome levels, the employment status of those sampled, and the relative perceived variances of outcomes than about the outcomes per se. Moreover, each of these considerations probably serves to understate the actual importance of the economic outcomes of work.
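The variance argument can be made concrete with a range-based importance index of the kind often used in conjoint analysis, in which an outcome's share of importance is its weight multiplied by the spread of its levels across the options being compared. The sketch below assumes that index; the equal weights and the perceived ranges for pay and job challenge are illustrative values, not data from the studies cited.

```python
"""Hypothetical sketch: narrow perceived variance can deflate an outcome's importance.

Each outcome's importance share = weight * perceived range across feasible
jobs, normalized so the shares sum to 1. All numbers are illustrative.
"""


def importance_shares(weights, ranges):
    """Return each outcome's share of the total weight-times-range impact."""
    impact = {k: weights[k] * ranges[k] for k in weights}
    total = sum(impact.values())
    return {k: v / total for k, v in impact.items()}


# The person values a unit of pay and a unit of challenge equally...
weights = {"pay": 1.0, "challenge": 1.0}
# ...but perceives feasible jobs as differing three times as much in challenge as in pay.
ranges = {"pay": 1.0, "challenge": 3.0}

shares = importance_shares(weights, ranges)
print(shares)  # {'pay': 0.25, 'challenge': 0.75}
```

Equal weights yield unequal shares purely because of the ranges, which is the sense in which a self-ranking may reflect perceived variance rather than the value actually placed on an outcome.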
4 Other Evidence for the Functionality of Economic Work Outcomes
Other streams of evidence support the external orientation. For instance, the role of income is highlighted in a study by Gould and Werbel (1983). They found, for a sample of municipal employees in a large southern city, that (a) both job involvement and organizational identification were lower among males whose spouses worked than among those whose spouses were not employed; and (b) for males whose spouses were employed, job involvement and organizational identification were higher for those with children than for those who were childless. Gould and Werbel interpreted these findings as consistent with Hall and Hall's (1978) speculations regarding the relationship between financial need and work involvement. Using LISREL analysis on data from the 1977 Quality of Employment Survey, Fenwick and Olson (1986) examined support for worker participation among union and nonunion workers. They found that, while dissatisfaction with intrinsic outcomes did not lead to support for participation, dissatisfaction with extrinsic outcomes did. These findings led Fenwick and Olson to criticize advocates of participation for "the lack of attention given to the relationship between extrinsic dissatisfaction and support for participation, and the apparently instrumental view many workers have of participation...advocates of participation either tend to dismiss extrinsic and instrumental orientations of workers or see them as barriers to participation. Our analysis suggests otherwise. Indeed, it leads us to argue that failure of proponents to consider the extrinsic basis of support for participation could limit or undermine the various strategies of participation advocated" (p. 519). Klein (1987) presented evidence consistent with the Fenwick and Olson (1986) argument. 
She evaluated three models as explanations for employees' reactions to stock ownership plans: (a) ownership per se increases employees' commitment to the company; (b) employee ownership increases worker participation which, in turn, increases employee commitment; and (c) employee ownership increases organizational commitment if ownership is financially rewarding to employees. Based upon the strong support she obtained for the financial-benefits model of ownership, Klein concluded that "... by documenting the impact of financial rewards on employee attitudes, the study invites additional research on compensation and benefit systems. Psychologists have tended to neglect this important area of study. (In contrast, participative management is the focus of considerable psychological research and theory.)" (p. 329). Further, while it may be socially desirable to assert that "Money can't buy you happiness," the subjective well-being literature suggests otherwise. Diener (1984, p. 553) concluded, after reviewing the relevant research, that "there is an overwhelming amount of evidence that shows a positive relationship between income and SWB (Subjective Well-Being) within countries." Consistent with Diener's conclusions on the psychology of well-being are a number of studies on the sociology of work (e.g., Dubin, 1956; Lyman, 1955; Tausky, 1969). For instance, Goldthorpe, Lockwood, Bechhofer, and Platt (1969, p. 164) concluded from a study of British blue-collar workers that "the meaning they gave to the activities and relationships of work was a predominantly instrumental one; work was defined and experienced essentially as a means to the pursuit of ends outside of work and usually ones relating to standards of domestic living." In a related vein, research by Ross and Huber (1985) on a national sample of married couples vividly illustrated the consequences for emotional well-being of the ability to meet family obligations. Their findings concerning the impact of earnings on depression led them to conclude flatly that "For a husband, the bottom line in socioeconomic status and psychological well-being appears to be money and what it symbolizes to himself and others" (p. 324).
Similarly, Adelmann (1987), using data collected by face-to-face interviews in a 1976 cross-sectional national survey of adults aged 21 and over (Veroff, Douvan, & Kulka, 1981), as well as data from the Dictionary of Occupational Titles, hypothesized that higher occupational complexity, personal control over self and others, and personal income would be associated with higher levels of happiness and self-confidence and with lower psychological vulnerability. She found personal income to be a stronger predictor of happiness and low vulnerability than age, education, occupational complexity, or control. Further, income was a stronger predictor of self-confidence than any of those variables except education. There is substantial additional evidence to support this conclusion regarding the relationship of socioeconomic status to well-being (e.g., Dohrenwend & Dohrenwend, 1969; Kessler, 1982; Liem & Liem, 1978). Almost half a century ago, moreover, researchers of the Great Depression (e.g., Bakke, 1940; Komarovsky, 1940) supplied thick descriptions of how the relationship was particularly manifest among male breadwinners who had lost their jobs. In reviewing this Great Depression research, O'Brien (1986) examined two alternative interpretations of the psychological effects of unemployment: (a) the loss of job activities per se leads to distress, implying that job activities are essential for the maintenance of psychological health; and (b) the distress resulting from job loss is primarily attributable to economic deprivation, that is, to income loss. Based upon his review, O'Brien concluded that "The literature seems to identify the major stressor as economic deprivation, not the absence of job activities" (p. 201), and that "Psychologists have tended to understate the importance of economic factors by focusing on the loss of activity, time structure, and job prestige" (p. 206).
Results presented by Staines, Pottick, and Fudge (1986) are consistent with O'Brien's conclusions, as well as with those of Ross and Huber (1985). Staines et al. sought to better understand the relationship between wives' employment and husbands' attitudes toward work and life. They found husbands of working wives to feel less adequate as family breadwinners than do husbands of housewives, and these feelings appear to account in substantial measure for their lower levels of job and life satisfaction. Thus, it seems that husbands manifest an external orientation toward the functions of work through their adherence to what Bernard (1981) calls the "good provider role." The robust nature of the findings reported by O'Brien is illustrated by a study on female breadwinners. Downey and Moen (1987), noting previous research showing men's sense of per-
The Study of Work Values: A Call for a More Balanced Perspective
109
sonal efficacy to be related to their earnings, used data from the University of Michigan's Panel Study of Income Dynamics to examine the income-efficacy relationship for women heading households. They considered three theoretical perspectives: role enhancement (personal income is positively related to efficacy), sex-role socialization (personal income is less important than family roles in promoting efficacy), and role combination (the effects of income on efficacy are moderated by family demands). They estimated several models of efficacy change, incorporating measures of previous efficacy, income, income change, employment transitions, family transitions, as well as controls for race, education, and physical health, and concluded: Our findings indicate that achievement in the form of increased earnings affects the personal efficacy of women heading households in the same way as previous research has shown it to affect that of men (Andrisani, 1977, 1978; Duncan & Liker, 1983). Namely, an increase in personal income is associated with an increase in efficacy. Moreover, the positive impact of earnings on personal efficacy is independent of family transitions. Specifically, the effect is not contingent on either marital transitions (getting married), or on parental status or transitions ... The present study does not support the role-combination hypothesis. Instead, the data are consistent with the role-enhancement perspective, which proposes that the rewards of employment should promote feelings of self-worth. However, the employment role itself, apart from income, had no effect on efficacy (p. 328). Perhaps the most obvious place to look for evidence of the functionality of economic work outcomes is the economics literature. However, since that literature seems to accept the primacy of economic outcomes for workers as gospel, explicit consideration of the issue is not deemed necessary. For instance, Schwab et al. 
(1987) have observed that the economic theory of job search has focused on a single job attribute, pay. Economics does not stand alone as a discipline in which the economic instrumentality of work is presumed to be primary. In industrial relations, Wheeler (1985), for example, has stated, "When one applies Occam's razor to lay bare the core of the industrial relationship, one finds an exchange of autonomy for pay" (p. 241). Indeed, even in the personnel and human resource management literature (albeit in a very limited way), the functionality of economic work outcomes has been presumed. In particular, Heneman (1985) reviewed the research on the consequences of pay satisfaction and was "less than overwhelmed by the quantity of research that has been conducted." As a result, he offered as his major suggestion for future research, "quite simply, more" (p. 137). The presumptions reviewed above clearly supply only the weakest form of evidence for the functionality of economic work outcomes. They do demonstrate, however, that one can fall into the trap of treating the external orientation implicitly rather than, as is urged here, addressing it explicitly. Finally, it should be recognized that economic outcomes are closely entwined with the other two determinants of social stratification structures, status and power (Gerth & Mills, 1946). Therefore, economic outcomes may serve an array of additional functions, social in nature, which we have not considered here. For example, Mills (1953, p. 230) long ago noted that "The economic motives for work are not its only firm rationale ... Work is also a means of gaining status, at the place of work, and in the general community ... And also work carries various sorts of power over materials and tools and machines, but more crucially now, over other people." [For discussions of the functionality of work-derived status and power, see, for example, Inkeles (1960), Bradburn (1969), and Campbell (1981).]
In sum, evidence in favor of the argument that economic outcomes of work serve essential functions in people's lives appears convincing. Consistent with the views of others (e.g., Fein, 1976; Neff, 1968; Nord, 1977; Strauss, 1963; Wool, 1973), the present authors conclude that
Arthur P. Brief and Ramon J. Aldag
the function of economic outcomes has been underemphasized by many behavioral scientists in trying to understand those factors underlying work behavior. As we noted earlier, it appears that behavioral scientists may be subject to an unconscious ideology that constrains an individual's ability to conceive of alternative possibilities and generally goes unrecognized until confronted by a fundamentally different perspective (Anthony, 1977; Nord, Brief, Atieh, & Doherty, 1988). An unconscious ideology of the internal orientation appears to be prevalent, and a forceful presentation of the external perspective is needed to bring that ideology to awareness.
5 Implications for Management Practice
The implications of the arguments of this paper for management practice are clear. Their essence, at least for the management of performance, is that managers should place greater emphasis on contingencies between performance and economic outcomes as a means of motivating performance. This suggests, for example, that managers rely more on performance-contingent pay plans, develop innovative ways of linking various sorts of fringe benefits to performance, and make explicit to workers that their jobs, and thus their economic security, depend on their performance. The arguments of this paper suggest, too, that the economic consequences of other interventions, such as job redesign or leadership training, should also be thoroughly considered. Implicitly or explicitly threatening workers with the loss of their economic security as a means of promoting job performance may be viewed by some as contrary to a humanistic value system. The present authors suspect that it is the troubling nature of such an implication that has led many writers to ignore the roles of economic outcomes or to argue against their relevance. In a sense, of course, the aversion to performance-contingent use of economic outcomes is curious. Those who believe intrinsic outcomes to be of primary importance to workers apparently feel no moral qualms about manipulating, for instance, challenge and responsibility. Yet linking economic outcomes (which they assert to be less important) to performance is seen as inappropriate. It may be that, at some level, the advocates of the use of intrinsic outcomes really do believe economic outcomes to be important, to both workers and their families.
In fact, if there is for some reason a tradeoff between extrinsic and intrinsic outcomes (and the authors of this paper are not convinced there is), and if extrinsic outcomes contribute more directly to family welfare, the desire of workers for intrinsic outcomes may be seen as selfish, and the heavy use of such rewards by organizations as short-sighted. Arguments against the use of performance-contingent economic outcomes, including job security, that rest on organizational effectiveness criteria (e.g., Greenhalgh & Rosenblatt, 1984) rather than on values appear unacceptable. Such arguments seem sound only as long as one assumes that labor market conditions, including the various legal constraints evident in that market, are such that the threat posed by one employer is offset by the opportunity provided by another. In other words, threatening employees with the loss of their jobs will have negative organizational consequences (e.g., reduced productivity and increased voluntary turnover) only when perceived ease of movement is high. This value-laden dilemma cannot be resolved through research. It is probably a dilemma that many managers have had to contend with as an ethical or moral issue. If work, in fact, serves important economic functions, managers would have found that under appropriate labor market conditions their use of performance-contingent economic outcomes, including threats, generally produced desired behaviors. The criterion for choosing to use such an approach,
therefore, is not whether or not they "work," but rather whether their use is right or wrong on some higher level.
6 Implications for Applied Organizational Research
This paper initially asserted that many applied organizational researchers assume an internal orientation toward the functions of work and that this orientation is associated with advocating a restructuring of the work itself as a means of enhancing the motivation to perform. Thus, much applied organizational research can be seen to rest on the belief that workers, as complex psychological beings, require their work to satisfy an assortment of needs, principal among them so-called higher order needs. The arguments presented in this paper question the exclusivity of this belief by attesting to the plausibility of an external (or economic) orientation toward the functions of work. In this section, ways are suggested in which applied organizational research might be altered to reflect both orientations adequately. This aim should not be interpreted as a claim that the applied literature is devoid of research reflecting a dual orientation; there are some, regrettably uncommon, examples. For instance, a number of studies have addressed the job satisfaction-job performance relationship, typically with inconsistent and generally weak results (cf. Iaffaldano & Muchinsky, 1985; Vroom, 1964). In virtually all of these studies, job satisfaction was operationalized by some measure of overall job satisfaction, so distinctions between satisfaction with intrinsic (internal) and extrinsic (external) outcomes of work were ignored. Wanous (1974), however, drew such a distinction and thus reflected a dual orientation toward the functions of work. From his results, Wanous concluded that if satisfaction is at all causally related to performance, the causal agent is satisfaction with economic outcomes, a result inconsistent with exclusive reliance on an internal orientation. Another example of a dual orientation can be found in the study of turnover.
A number of researchers have explicitly recognized the importance of economic (or employment) security in seeking to understand the relationship between job satisfaction (regardless of its intrinsic or extrinsic origins) and turnover (e.g., Hulin, Roznowski, & Hachiya, 1985; Muchinsky & Morrow, 1980). This recognition is rooted in findings, such as those of Eagly (1965), indicating that labor market conditions accounted for over 70% of the year-to-year variance in voluntary termination rates. Moreover, Carsten and Spector (1987), in a meta-analytic study, found the job satisfaction-turnover relationship to be strong during periods of economic prosperity and weak during times of economic hardship. Similarly, much of the recent work on procedural justice (e.g., Folger & Greenberg, 1985) reflects a dual orientation, in that concern for the perceived fairness of the procedures used in making decisions has often focused on the mechanisms used for distributing such organizational rewards as pay. For instance, Greenberg (1987) found support for the notion that procedural justice is a necessary precondition for distributive justice, but only when payments are low. Interestingly, he also found that positive reactions to pay generalize to the task itself. Since task properties are also known to covary with pay levels (e.g., Milkovich & Newman, 1984), the frequently observed association between task properties and job satisfaction (e.g., Hackman & Oldham, 1975) may be attributable, at least in part, to pay levels. Perhaps the most obvious reflection of a dual orientation is found in certain programs of cross-cultural organizational research. In particular, researchers in this area do not assume a given orientation. Rather, they have sought
to empirically ascertain the differences, if any, in orientations of workers within and between countries (e.g., MOW International Research Team, 1987; Shenkar & Ronen, 1987). It should be noted that the tendency to rely on a single orientation has been nurtured by certain methodological concerns. In particular, in an attempt to sidestep problems caused by scale use tendencies, some researchers have utilized ipsative scales. While these scales may have superior psychometric properties (see Aldag & Brief, 1979b), they lead to an either-or attitude. For instance, growth need strength measure B of the Job Diagnostic Survey is really a measure of the strength of growth needs relative to other needs and says nothing about their absolute levels. The remainder of this section offers some suggestions concerning how future applied organizational research might be altered by assuming a dual orientation toward the functions of work.
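The either-or limitation of ipsative scores can be illustrated with a small sketch. It assumes hypothetical 1-7 ratings of three needs and uses simple row-centering (subtracting each respondent's own mean) as a stand-in for forced-choice ipsative formats; it is not the actual scoring procedure of the Job Diagnostic Survey, and the function name `ipsatize` is invented for illustration.

```python
def ipsatize(ratings):
    """Row-center one respondent's ratings: each rating minus that respondent's mean.

    Illustrative stand-in for ipsative scoring; the result shows only how each
    need ranks relative to the respondent's other needs.
    """
    mean = sum(ratings) / len(ratings)
    return [r - mean for r in ratings]


# Hypothetical 1-7 ratings for [growth, pay, security] needs.
respondent_a = [7, 6, 5]  # strong needs across the board
respondent_b = [3, 2, 1]  # weak needs across the board

print(ipsatize(respondent_a))  # [1.0, 0.0, -1.0]
print(ipsatize(respondent_b))  # [1.0, 0.0, -1.0]
```

Respondent A's growth need is far stronger in absolute terms than respondent B's, yet both profiles are identical after ipsatizing, which is exactly the loss of absolute-level information described above.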
6.1 Investigating the Economic Meaning of Interventions
First, research attention on interventions, which purportedly are designed to enhance both intrinsic satisfaction and performance (e.g., restructuring efforts such as job enrichment programs), would be broadened to include investigation of the economic meaning that workers attach to those interventions. Reviews of the research on contemporary approaches to enrichment indicate that such interventions produce, at best, very modest improvements in performance (see Aldag, Barr, & Brief, 1981; Roberts & Glick, 1981). Such marginal results may be due in part to the failure of interventionists and researchers to adequately consider the economic meaning of their efforts. As noted earlier, only one enrichment study has examined this meaning in any detail. Locke, Sirota, and Wolfson (1976), in attempting to understand the negative reactions of some workers to an enrichment program implemented in a government agency, conducted a series of post-experimental interviews. They concluded that "It was clear from the interviews that the employees viewed their jobs instrumentally, that is, as a means to an end. Their comments (e.g., "How could anyone agree with [the item on the questionnaire] 'I live, eat, and breathe my job'?") indicated that the concept of intrinsically satisfying work was not psychologically real to them. Their greatest concern was to get good ratings so that they could get promoted and get more pay. When these outcomes did not follow the enrichment program, the employees were angry and bitter" (p. 710). While concerned with job transfers rather than an enrichment intervention, Simonds and Orife (1975) drew conclusions similar to those of Locke et al. They found that economic factors, rather than intrinsics, predicted requested transfers. (See also Kennedy & O'Neill, 1958; MacKinney, Wernimont, & Galitz, 1962; and Turner & Michlette, 1962 for less direct, but consistent, evidence.)
An extension of the first suggestion is that the economic meaning of any intervention aimed at boosting performance should be investigated. For example, research should address how, if at all, workers interpret the economic meaning of performance appraisal/feedback and goal setting programs. It may be the case that, at least in part, such programs are organizationally efficacious because workers see noncompliance with their dictates as a threat to economic security. Additionally, the less organizationally efficacious effects of interventions such as participatory decision making (Filley, House, & Kerr, 1976; Locke & Schweiger, 1979; Schweiger & Leana, 1986) may be attributable to workers not seeing their involvement as instrumental for the attainment of economic outcomes. If this were found to be the case, then participatory programs might benefit from explicitly incorporating some instrumental component in their design.
6.2 Addressing the Integration of Worker and Organizational Goals
A second suggestion for how adoption of a dual orientation toward the functions of work might influence applied organizational research concerns current efforts to understand how to integrate the goals of organizational effectiveness with the presumed goals of workers to satisfy an array of psychological needs at work (e.g., Argyris, 1973; McGregor, 1967; Ouchi, 1981). Rather than exclusively trying to understand how management can better indoctrinate workers in the mission of the organization, efforts also would be aimed at studying how to manage an ideologically indifferent or even hostile workforce. While such indifference may emerge from workers being relatively unconcerned with the intrinsic features of their work, hostility may arise from management's attempts to substitute intrinsic for extrinsic outcomes, or from the conflict between management, which wants to control labor costs, and workers, who want to enhance their economic gains and security. In regard to this latter point, some see such economic conflict between management and labor as part and parcel of capitalist organizational life (e.g., Marx, 1967). Edwards (1979, p. 12) has concisely described the relationship in production that he sees as the basis for this conflict: Workers must provide labor power in order to receive their wages, that is, they must show up for work; but they need not necessarily provide labor, much less the amount of labor that the capitalist desires to extract from the labor power they have sold. In a situation where the workers do not control their own labor processes and cannot make their work a creative experience, any exertion beyond the minimum needed to avert boredom will not be in the workers' interest. On the other side, for the capitalist it is true without limit that the more work he can wring out of the labor power he has purchased, the more goods will be produced; and they will be produced without any increased wage costs.
It is this discrepancy between what the capitalist can buy in the market and what he needs for production that makes it imperative for him to control the labor process and the workers' activities. The capitalist need not be motivated to control things by an obsession for power; simple desire for profit will do it. Edwards goes on to define control as "the ability of capitalists and/or managers to obtain desired work behaviors from workers" (p. 17) and to describe the system for doing so as entailing the coordination of three elements: (1) direction (e.g., specifying what needs to be done with what degree of precision or accuracy); (2) evaluation (e.g., assessing each worker's performance); and (3) discipline (e.g., disciplining and rewarding workers in order to elicit cooperation and enforce compliance). These elements are at the core of human resource management practices. Thus, Edwards supplies a Marxist rationale for what human resource management is about. His rationale, rooted in the idea that labor-management conflict is unavoidable, is quite different from the story line offered in texts on the subject. For example, Schuler and Youngblood (1986) defined "effective personnel management" as "recognition of the importance of a company's work force as vital human resources and the utilization of several functions and activities to ensure they are used effectively and legally for the benefit of the individual, the organization, and society" (p. 6). One can begin to see how consideration of the external orientation can lead to questions about the origins and intended consequences of contemporary personnel practices. For more on the origins and purposes of these practices, see, for example, Baritz (1960), Bramel and Friend (1981), Braverman (1974), Burawoy (1979), Clawson (1980), and Nelson (1975).
6.3 Considering Roles of Intrinsic and Extrinsic Variables in Work-Nonwork Linkages
Third, research on work-nonwork linkages would demonstrate a fuller recognition of the distinction between intrinsic (internal) and extrinsic (economic) work outcomes and of the mechanisms by which these outcomes serve to integrate the work domain with other, perhaps more central, life concerns. Indeed, we anticipate that such demonstrations would lend further support to the arguments presented here regarding an external orientation toward the functions of work. Job stress is an example of an issue for which such work-nonwork linkages are important. While there has been a tendency in the stress literature to focus on such determinants as role ambiguity and intrinsic task characteristics, Brief and Atieh (1987) have argued that such a focus may be "making mountains out of molehills." Instead, they suggest that ample evidence exists that factors perceived by workers as influencing the maintenance of their financial well-being may also be critical stressors.
6.4 Considering Individual Differences
Fourth, the perspective developed in this paper suggests that individual differences may deserve additional attention. Simply put, although the economic outcomes of work may be seen as serving essential functions in most, if not all, workers' lives, when those functions will be served and how much income will be required to serve them probably vary across individuals. That is, some workers will see themselves as working all their lives to survive economically, while others will have this survival function served early in their working lives. Moreover, workers will vary considerably in terms of the income they define as adequate to ensure survival. These individual differences are probably attributable to a complex interplay of psychological, sociological, and other factors, and their detailed treatment is beyond the scope of this paper. However, two classes of individual difference variables are viewed as important to understanding the functionality of a given level of earned income: economic needs and economic expectations. Following Gould and Werbel (1983), economic needs are seen as being indexed by such factors as the number of dependents the worker must support and the availability of alternative sources of money to the household. Possible indices of economic expectations include the worker's socioeconomic origins, level of education, and occupational prestige. The speculation proposed here is that as economic needs and economic expectations rise, higher levels of income earned from work are required to adequately serve such functions as meaningful enhancement of one's life satisfaction. While this notion is consistent with various themes in the sociology of work literature (e.g., Morse, 1953; Wilensky, 1960; Kalleberg, 1977), most theorizing by applied organizational researchers about the relative importance of work outcomes (e.g., Lawler, 1973; Locke, 1976; Landy, 1978) has neglected these types of individual difference indicators.
George and Brief (1990), surveying a sample of almost 500 managerial and professional workers employed by an insurance company, found evidence supportive of this perspective. Using an objective indicator of financial requirements based on marital status, employment status of one's spouse, and number of children aged 22 and under, they found that (1) pay satisfaction and life satisfaction are positively correlated; (2) the pay-life satisfaction relationship is moderated by financial requirements such that the relationship is stronger for those with high requirements; and (3) the pay-life satisfaction relationship is jointly moderated by financial
requirements and gender such that the relationship is strongest for males with high financial requirements. Similarly, Doran, Stone, Brief, and George (1991), using a longitudinal design with a sample of retail salespersons, tested hypotheses regarding behavioral intentions as antecedents of job attitudes. Hierarchical regression analyses revealed that workers' intentions to leave at organizational entry predicted subsequent job satisfaction on 19 of 21 satisfaction scales. In general, these relationships were moderated by perceived choice, construed as the absence of externally imposed financial requirements or economic pressures to stay on the job. Financial requirements were assessed using the criteria applied by George and Brief (1990) (i.e., marital status, whether one's spouse works, and number of children aged 22 and under) as well as a housing arrangements variable tapping financial requirements due to housing costs. Consistent with cognitive dissonance theory, the intent to leave-job satisfaction relationships were stronger when economic choice was greater (i.e., when financial requirements were lower). For instance, while there was a strong negative correlation between intent to leave and general satisfaction in the low-financial-requirements subgroup, the corresponding correlation in the high-financial-requirements subgroup was not significant. The results demonstrate the important role that externally imposed pressures to stay on the job play in the construal of choice.
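The studies above tested moderation with regression analyses. A simpler, commonly used way to inspect the same kind of pattern is to compute the correlation separately within each subgroup and compare the two with Fisher's r-to-z test for independent correlations. The sketch below does that; the data and the low/high financial-requirements split are entirely hypothetical, and this is not the analysis George and Brief or Doran et al. reported.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations.

    Uses Fisher's r-to-z transformation; |z| > 1.96 corresponds to p < .05,
    two-tailed. Requires -1 < r < 1 and n > 3 in each group.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / standard_error


# Hypothetical data: intent to leave at entry (x) and later satisfaction (y),
# split by a hypothetical low/high financial-requirements classification.
low_req_intent, low_req_sat = [1, 2, 3, 4, 5], [5, 3, 4, 1, 2]
high_req_intent, high_req_sat = [1, 2, 3, 4, 5], [2, 4, 5, 1, 3]

r_low = pearson_r(low_req_intent, low_req_sat)
r_high = pearson_r(high_req_intent, high_req_sat)
z = fisher_z_difference(r_low, len(low_req_sat), r_high, len(high_req_sat))
print(f"r (low requirements)  = {r_low:.2f}")   # -0.80
print(f"r (high requirements) = {r_high:.2f}")  # -0.10
print(f"z for the difference  = {z:.2f}")
```

With only five made-up cases per group the difference is far from significant (z is about -1.0); the point is the procedure, not the numbers. Real subgroup comparisons would need samples of the size George and Brief used.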
6.5 Using an Expanded Arsenal of Research Methodologies
A final suggestion is that the use of an expanded array of methodologies could provide a potentially richer understanding of reactions to work. In particular, expanded use of policy capturing, social judgment analysis, information boards, and cognitive mapping procedures may be helpful.
Policy Capturing
We have already discussed studies employing policy capturing which have added insights concerning work values, and we feel expanded use is appropriate. Policy capturing bypasses the process of obtaining subject reactions to specific attributes. Instead, subjects provide only holistic evaluations of multiattribute alternatives. These overall assessments then become dependent variables in analyses in which the attribute levels associated with alternatives (such as alternative jobs) have been experimentally manipulated and serve as explanatory variables. Multiple regression or analysis of variance procedures are then used to assess, post hoc, the influence of variation in attribute levels on holistic judgments. Evaluation policies of individuals are "captured" in the limited sense of explaining variance in their overall evaluations. Moreover, the standardized regression coefficients (or η² values) associated with each attribute can arguably be interpreted as indicants of relative attribute importance in making overall assessments. Policy capturing offers some clear benefits. For one thing, it does not rely on self-insight to the same degree as do direct assessments. In addition, policy capturing approaches have been shown to be less subject to social desirability contamination than direct estimates (e.g., Brookhouse, Guion, & Doherty, 1986). Policy capturing has, in fact, been used to assess socially sensitive issues, such as sexual harassment (York, 1989). Further, policy-capturing methods incorporate multiple attribute levels into the holistic descriptions.
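The core policy-capturing analysis can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical respondent who rates eight job profiles built from a balanced, effect-coded 2 x 2 x 2 design on three hypothetical attributes (pay, security, challenge). Because that design makes the attribute columns orthogonal, each regression slope reduces to a simple ratio; a non-orthogonal design would need a full least-squares fit. The standardized weights are the quantities the text says can arguably be read as relative importance.

```python
import math
from itertools import product

ATTRIBUTES = ["pay", "security", "challenge"]  # hypothetical job attributes

# Balanced 2 x 2 x 2 full factorial design, effect-coded (-1 = low, +1 = high),
# so every attribute column has mean 0 and the columns are mutually orthogonal.
profiles = list(product([-1, 1], repeat=len(ATTRIBUTES)))

# Hypothetical holistic attractiveness ratings (1-9) from one respondent,
# one per job profile, in the same order as `profiles`.
ratings = [2, 2, 3, 5, 5, 7, 8, 8]


def capture_policy(profiles, ratings):
    """Return standardized regression weights per attribute and R^2.

    Because the design is orthogonal with centered columns, each multiple-
    regression slope equals sum(x * y) / sum(x * x), and R^2 is the sum of
    the squared standardized weights.
    """
    n = len(ratings)
    mean_y = sum(ratings) / n
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ratings) / n)
    weights = {}
    for j, name in enumerate(ATTRIBUTES):
        xs = [profile[j] for profile in profiles]
        slope = sum(x * y for x, y in zip(xs, ratings)) / sum(x * x for x in xs)
        sd_x = math.sqrt(sum(x * x for x in xs) / n)  # column mean is 0
        weights[name] = slope * sd_x / sd_y
    r_squared = sum(w * w for w in weights.values())
    return weights, r_squared


weights, r_squared = capture_policy(profiles, ratings)
for name, beta in weights.items():
    print(f"{name}: {beta:.2f}")
print(f"R^2 = {r_squared:.2f}")
```

With these made-up ratings, pay receives the largest standardized weight (about 0.85), ahead of security (about 0.43) and challenge (about 0.21), and the three attributes together account for about 95% of the variance in the holistic judgments.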
Policy-capturing methods therefore permit examination of whether the importance of an attribute varies as a function of attribute level and/or the distribution and variability of attribute levels. Finally, the sorts of judgments required by policy capturing are typically closer to those that individuals generally make in reacting to jobs than are direct estimations of attribute importance. That is, in "real world" situations individuals are typically
asked to estimate overall job attractiveness or intent to apply for a position rather than to rate the importance of individual attributes. This methodology could be used in a variety of ways to enhance our understanding of work values. As one example, individuals can be clustered based on the similarity of their policies, and differences across clusters in financial need, demographic or personality characteristics, or other individual differences or variables can then be assessed. Further, policy capturing procedures could be used to explore the degree to which policies vary as a function of the particular criterion employed. It may be the case, for instance, that individuals will give different relative weights to economic and noneconomic factors depending on whether the criterion variable is "satisfaction," "happiness," "meaningfulness," or "goodness." Finally, policy capturing methodologies permit consideration of the relative validities of alternative model forms. As such, policy capturing could be used to assess whether, for example, individuals combine economic and noneconomic variables in a compensatory manner, such that good levels of one set can offset poor levels of the other, or in a noncompensatory manner, such that tradeoffs are not possible.
Social Judgment Analysis
Another potentially useful methodology, social judgment analysis, represents the systematic application of policy capturing to the group context. With social judgment analysis (e.g., Rohrbaugh, 1979; Harmon & Rohrbaugh, 1990), the policies of group members are first captured. Members are then each given feedback about their policies, after which they share information about their policies to explore in depth the logic of their underlying judgment policies and to attempt to arrive at a group judgment policy.
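Returning to the compensatory versus noncompensatory distinction raised in the discussion of policy capturing, one simple way to compare the two model forms is to generate predictions from each and see which better tracks a person's holistic judgments. The sketch below assumes hypothetical jobs scored 1-5 on an economic and a noneconomic composite and uses fixed, unweighted forms: an additive average for the compensatory model and the minimum (a conjunctive rule) for the noncompensatory one. Correlation is used as a rough fit index; an actual study would use more profiles and estimate weights rather than fix them.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compensatory(econ, nonecon):
    """Additive model: a high score on one set can offset a low score on the other."""
    return (econ + nonecon) / 2


def noncompensatory(econ, nonecon):
    """Conjunctive model: the weaker set caps the evaluation, so no tradeoff is possible."""
    return min(econ, nonecon)


# Hypothetical jobs scored 1-5 on an economic composite and a noneconomic composite.
jobs = [(5, 1), (1, 5), (3, 3), (4, 4), (2, 5), (5, 2)]
# Hypothetical holistic attractiveness ratings (1-9) for the same jobs.
ratings = [2, 1, 5, 7, 3, 3]

for model in (compensatory, noncompensatory):
    predictions = [model(econ, nonecon) for econ, nonecon in jobs]
    print(f"{model.__name__}: r = {pearson_r(predictions, ratings):.2f}")
```

For these invented ratings the conjunctive form fits far better (r of about 0.99 versus about 0.68): jobs that are strong on one set but weak on the other are rated low, which is the signature of a noncompensatory policy.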
It would be interesting to compare the group policy that emerges when social judgment analysis is applied to economic and noneconomic variables with the initial policies of individual group members. Examination of the nature and direction of changes from those initial policies would be especially insightful.
Information Boards
Policy capturing, social judgment analysis, and related approaches may provide valuable information, but they do not directly examine the decision process. As a result, they cannot determine whether, for example, individuals weight one variable or set of variables at early stages of an evaluative process and others at later stages. It may be the case, for instance, that individuals, in determining their reactions to a job, first consider the levels of economic factors and, if those are acceptable, then consider levels of noneconomic factors (or vice versa). In contrast, decision process tracing approaches permit direct examination of the decision process. While a variety of process tracing techniques, including eye fixation analysis (e.g., Russo & Rosen, 1975) and verbal protocol analysis (e.g., Isenberg, 1986), may be appropriate for examining issues relating to work values, information boards could probably be applied most readily and profitably. With information boards (e.g., Payne, Braunstein, & Carroll, 1978), individuals are presented with a decision task in which they must explicitly search for information about available alternatives. The information is generally presented in the form of an array (displayed either as a physical board or on a computer screen), with the alternatives (such as jobs) presented on one axis and attribute names (such as pay, job security, and job challenge) on the other. Each cell in the array contains the value of the appropriate alternative and attribute (e.g., pay on a particular job). The value is not revealed until the individual explicitly seeks the information.
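An information board and the simplest analysis of its search log can be sketched as follows. The board contents, the subject's search sequence, and all names are hypothetical. Each consecutive pair of acquisitions is classified as within-alternative (same job, different attribute), within-attribute (same attribute, different job), or other, and the counts are summarized with an index of the form (within-alternative - within-attribute) / (within-alternative + within-attribute), in the spirit of the search-pattern measures used in this literature; it is an illustration, not a standard implementation.

```python
# Hypothetical board: alternatives (jobs) by attributes; values stay hidden until opened.
BOARD = {
    "Job A": {"pay": "high", "security": "low", "challenge": "high"},
    "Job B": {"pay": "medium", "security": "high", "challenge": "low"},
    "Job C": {"pay": "low", "security": "high", "challenge": "medium"},
}


class InformationBoard:
    """Reveals one cell at a time and logs every acquisition in order."""

    def __init__(self, board):
        self._board = board
        self.log = []  # sequence of (alternative, attribute) cells opened

    def open_cell(self, alternative, attribute):
        self.log.append((alternative, attribute))
        return self._board[alternative][attribute]


def classify_transitions(log):
    """Count consecutive acquisitions that stay within an alternative or within an attribute."""
    counts = {"within_alternative": 0, "within_attribute": 0, "other": 0}
    for (alt1, att1), (alt2, att2) in zip(log, log[1:]):
        if alt1 == alt2 and att1 != att2:
            counts["within_alternative"] += 1
        elif att1 == att2 and alt1 != alt2:
            counts["within_attribute"] += 1
        else:  # both changed, or the same cell was reopened
            counts["other"] += 1
    return counts


def search_index(counts):
    """+1 = purely alternative-wise search, -1 = purely attribute-wise search."""
    alt, att = counts["within_alternative"], counts["within_attribute"]
    return (alt - att) / (alt + att) if alt + att else 0.0


# A hypothetical subject first compares pay across all jobs, then inspects Job B in depth.
board = InformationBoard(BOARD)
for job in ["Job A", "Job B", "Job C"]:
    board.open_cell(job, "pay")
for attribute in ["security", "challenge"]:
    board.open_cell("Job B", attribute)

counts = classify_transitions(board.log)
print(counts)  # {'within_alternative': 1, 'within_attribute': 2, 'other': 1}
print(round(search_index(counts), 2))  # -0.33
```

The log also preserves order, so one can check which attribute was examined first (here, pay) and whether search shifts from attribute-wise to alternative-wise partway through, as in the third question the text raises.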
On an information board, subjects are permitted to acquire as much or as little information as they wish prior to reaching a decision. Use of information boards could provide direct evidence relating to a variety of questions. For instance, by considering the amount of information requested and the order of requests, including whether search for information is within alternatives (that is, the individual checks a number of cells for one alternative before considering another) or within attributes (the individual looks at the levels of a particular attribute for more than one alternative), such questions as the following can be addressed: (1) In what order do subjects consider economic and noneconomic variables? (2) Do subjects appear to employ multiple hurdles, such that an alternative is not further considered if levels of economic or noneconomic variables are low? (3) Is there evidence that model use changes during the decision process, such that, for example, the subject first searches within attributes (perhaps suggesting that a particular attribute is critical) and then within alternatives (suggesting that alternatives surviving initial hurdles are then evaluated on a more compensatory basis)? Further, combining process tracing approaches with policy capturing (e.g., Einhorn, Kleinmuntz, & Kleinmuntz, 1979) may provide even greater insight into decision processes.
Cognitive Mapping
Increasing attention has been paid in recent years to the roles of schemata (e.g., Nisbett & Ross, 1980; Taylor & Crocker, 1981), and schema-based approaches to organizational behavior are emerging (e.g., Daft & Weick, 1984; Phillips & Lord, 1982). One approach to understanding employees' schemata is cognitive mapping. Cognitive mapping techniques "recognize that there are many realities of organizational life, and that knowledge of such realities as they exist within the minds of managers may be enlightening" (Aldag & Stearns, 1988, pp. 262-263).
While the term "cognitive mapping" is used in a variety of ways (e.g., Bougon, 1983), it is often applied to the mapping of causal relationships. In this vein, for instance, Jones (1986) wrote: In cognitive mapping a "map" is developed by coding persons' explanatory and predictive beliefs, in the form of belief constructs, as descriptions of relevant entities, concrete or abstract, and the relationships between these. Causal relationships, such that one thing leads to and affects another, are represented by arrows; noncausal, connotative relationships are represented by a simple line. (p. 73) In general, individuals are asked for their perceptions of causal relationships. That is, they are asked whether the causal relationship between one variable and another is positive, negative, or zero, often in the form of a matrix. These responses are typically aggregated across individuals. Row sums show the potency of the corresponding row variable, and column sums show the extent to which that column variable is seen as an outcome. Quite simply, cognitive mapping procedures may help inform us about how individuals see economic and noneconomic variables as related to one another as well as to various outcomes. Such an examination of cognitive maps would provide insights into the schemata that drive individual evaluative processes.
In conclusion, it seems clear to us that economic outcomes have more importance to workers than is typically suggested by the current organizational behavior literature. Recognition of the roles of economic outcomes has important implications both for practice and for research. Again, we are not arguing that intrinsic outcomes are unimportant. Instead, the stance adopted is that of devil's advocate, to encourage movement toward a more balanced view.
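As a supplement to the discussion of cognitive mapping, the matrix procedure it describes (each respondent rates the causal link from every row variable to every column variable as positive, negative, or zero; matrices are summed across respondents; row sums are read as potency and column sums as the extent to which a variable is seen as an outcome) can be sketched as follows. The variables and responses are hypothetical. The `absolute` option is an added assumption, not part of the procedure described: it lets positive and negative links both count instead of cancelling.

```python
VARIABLES = ["pay", "challenge", "satisfaction"]  # hypothetical variables

# One matrix per hypothetical respondent: entry [i][j] is the perceived causal
# effect of row variable i on column variable j (+1 positive, -1 negative, 0 none).
respondents = [
    [[0, 0, 1],
     [0, 0, 1],
     [0, 0, 0]],
    [[0, -1, 1],
     [0, 0, 0],
     [0, 0, 0]],
]


def aggregate(matrices):
    """Sum the respondents' matrices cell by cell."""
    size = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(size)] for i in range(size)]


def potency_and_outcome(matrix, absolute=False):
    """Row sums (potency as a cause) and column sums (extent seen as an outcome).

    With absolute=True, positive and negative links both add to the totals
    instead of cancelling.
    """
    score = abs if absolute else (lambda value: value)
    size = len(matrix)
    potency = {VARIABLES[i]: sum(score(matrix[i][j]) for j in range(size)) for i in range(size)}
    outcome = {VARIABLES[j]: sum(score(matrix[i][j]) for i in range(size)) for j in range(size)}
    return potency, outcome


group_map = aggregate(respondents)
potency, outcome = potency_and_outcome(group_map)
print(group_map)  # [[0, -1, 2], [0, 0, 1], [0, 0, 0]]
print(potency)    # {'pay': 1, 'challenge': 1, 'satisfaction': 0}
print(outcome)    # {'pay': 0, 'challenge': -1, 'satisfaction': 3}
```

With signed sums, pay's potency (1) understates how often it was seen as a cause, because one respondent's negative link to challenge cancels part of its positive links; with `absolute=True` its potency is 3. Which convention suits a given study is a design choice the text leaves open.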
Arthur P. Brief and Ramon J. Aldag
Notes
1 Portions of this paper are drawn from Nord, Brief, Atieh, and Doherty (1988), but principally from Brief and Aldag (1989).
References
Adelmann, P. K. (1987). Occupational complexity, control, and personal income: Their relation to psychological well-being in men and women. Journal of Applied Psychology, 72, 529-537.
Aldag, R. J., Barr, S., & Brief, A. P. (1981). Measurement of perceived task characteristics. Psychological Bulletin, 90, 415-431.
Aldag, R. J., & Brief, A. P. (1978). Examination of alternative models of job satisfaction. Human Relations, 31, 91-98.
Aldag, R. J., & Brief, A. P. (1979a). Task design and employee motivation. Glenview, IL: Scott, Foresman and Company.
Aldag, R. J., & Brief, A. P. (1979b). Examination of a measure of higher-order need strength. Human Relations, 32, 705-718.
Aldag, R. J., & Stearns, T. M. (1988). Issues in research methodology. Journal of Management, 14, 253-276.
Alderfer, C. P. (1972). Existence, relatedness, and growth. New York: Free Press.
Andrews, F. M., & Withey, S. B. (1974). Developing measures of perceived life quality: Results from several national surveys. Social Indicators Research, 1, 1-26.
Andrisani, P. J. (1977). Internal-external attitudes, personal initiative, and the labor market experience of black and white men. Journal of Human Resources, 12, 308-328.
Andrisani, P. J. (1978). Work attitudes and labor market experience. New York: Praeger Publications.
Anthony, P. D. (1977). The ideology of work. London: Tavistock Publications.
Argyris, C. (1957). Personality and organization. New York: Harper & Row.
Argyris, C. (1964). Integrating the individual and the organization. New York: Harper & Row.
Argyris, C. (1973). Personality and organization revisited. Administrative Science Quarterly, 18, 141-167.
Arnold, H. J., & Feldman, D. C. (1981). Social desirability response bias in self-report choice situations. Academy of Management Journal, 24, 377-385.
Bakke, W. E. (1940). Citizens without work. New Haven: Yale University Press.
Baritz, L. (1960). The servants of power. New York: John Wiley.
Bem, S. L., & Bem, D. J. (1970). Case study of a nonconscious ideology: Training the woman to know her place. In D. J. Bem, Beliefs, attitudes, and human affairs (pp. 89-99). Belmont, CA: Brooks/Cole.
Bergermaier, R., Borg, I., & Champoux, J. E. (1984). Structural relationships among facets of work, nonwork, and general well being. Work and Occupations, 11, 163-181.
Bernard, J. (1981). The good-provider role: Its rise and fall. American Psychologist, 36, 1-12.
Bradburn, N. M. (1969). The structure of psychological well-being. Chicago: Aldine.
Bramel, D., & Friend, R. (1981). Hawthorne, the myth of the docile worker, and class bias in psychology. American Psychologist, 36, 867-878.
Braverman, H. (1974). Labor and monopoly capital. New York: Monthly Review Press.
The Study of Work Values: A Call for a More Balanced Perspective
Brief, A. P., & Aldag, R. J. (1989). The economic functions of work. In K. Rowland & G. Ferris (Eds.), Research in personnel and human resources management (Vol. 7, pp. 1-23). Greenwich, CT: JAI Press.
Brief, A. P., & Atieh, J. M. (1987). Studying job stress: Are we making mountains out of molehills? Journal of Occupational Behavior, 8, 115-126.
Brief, A. P., & Hollenbeck, J. R. (1985). Work and the quality of life. International Journal of Psychology, 20, 199-206.
Brief, A. P., Konovsky, M. A., Goodwin, R., George, J. M., & Link, K. (in press). Inferring the meaning of work from the effects of unemployment. Journal of Applied Social Psychology.
Burawoy, M. (1979). Manufacturing consent: Changes in the labor process under monopoly capitalism. Chicago: University of Chicago Press.
Campbell, A. (1981). The sense of well-being in America: Recent patterns and trends. New York: McGraw-Hill.
Campbell, A., Converse, P. E., & Rodgers, W. L. (1976). The quality of American life: Perceptions, evaluations, and satisfactions. New York: Russell Sage Foundation.
Carsten, J. M., & Spector, P. E. (1987). Unemployment, job satisfaction, and turnover: A meta-analytic test of the Muchinsky model. Journal of Applied Psychology, 72, 374-381.
Catalano, R. A., & Dooley, D. (1977). Economic predictors of depressed mood and stressful life events in a metropolitan community. Journal of Health and Social Behavior, 18, 292-307.
Catalano, R. A., & Dooley, D. (1983). The health effects of economic instability: A test of the economic stress hypothesis. Journal of Health and Social Behavior, 24, 46-60.
Chacko, T. I. (1983). Job and life satisfactions: A causal analysis of their relationships. Academy of Management Journal, 26, 163-169.
Clawson, D. (1980). Bureaucracy and the labor process: The transformation of U.S. industry, 1860-1920. New York: Monthly Review Press.
Cofer, C. N., & Appley, M. H. (1964). Motivation: Theory and research. New York: Wiley.
Daft, R. L., & Weick, K. E. (1984). Toward a model of organizations as interpretation systems. Academy of Management Review, 9, 284-295.
de Grazia, S. (1964). Of time, work, and leisure. Garden City, NY: Anchor Books.
Diener, E. (1984). Subjective well-being. Psychological Bulletin, 95, 542-575.
Dohrenwend, B. P., & Dohrenwend, B. S. (1969). Social status and psychological disorder. New York: Wiley.
Doran, L. I., Stone, V. K., Brief, A. P., & George, J. M. (1991). Behavioral intentions as predictors of job attitudes: The role of economic choice. Journal of Applied Psychology, 76, 40-45.
Downey, G., & Moen, P. (1987). Personal efficacy, income, and family transitions: A longitudinal study of women heading households. Journal of Health and Social Behavior, 28, 320-333.
Dubin, R. (1956). Industrial workers' attitudes: A study of the central life interests of industrial workers. Social Problems, 3, 131-142.
Dubin, R. (1976). Work in modern society. In R. Dubin (Ed.), Handbook of work, organization, and society (pp. 5-35). Chicago: Rand-McNally.
Duncan, G. J., & Liker, J. K. (1983). Disentangling the efficacy-earnings relationship among white men. In Five thousand American families: Patterns of economic progress (Vol. 10, pp. 218-248). Ann Arbor: Institute for Social Research.
Dunham, R. B., Pierce, J. L., & Newstrom, J. W. (1983). Job context and job content: A conceptual perspective. Journal of Management, 9, 187-202.
Eagly, R. V. (1965). Market power as an intervening mechanism in Phillips curve analysis. Economica, 32, 48-64.
Edwards, R. (1979). Contested terrain. New York: Basic Books.
Einhorn, H. J. (1971). Use of nonlinear, noncompensatory models as a function of task and amount of information. Organizational Behavior and Human Performance, 6, 1-27.
Einhorn, H. J., Kleinmuntz, D. N., & Kleinmuntz, B. (1979). Linear regression and process-tracing models of judgment. Psychological Review, 86, 465-485.
Fein, M. (1976). Motivation for work. In R. Dubin (Ed.), Handbook of work, organization, and society (pp. 465-530). Chicago: Rand McNally.
Feldman, D. C., & Arnold, H. J. (1978). Position choice: Comparing the importance of organizational and job factors. Journal of Applied Psychology, 63, 706-710.
Fenwick, R., & Olson, J. (1986). Support for worker participation: Attitudes among union and non-union workers. American Sociological Review, 51, 505-522.
Ferris, G. R., & Gilmore, D. C. (1984). The moderating role of work context in job design research: A test of competing models. Academy of Management Journal, 27, 885-892.
Filley, A. C., House, R. J., & Kerr, S. (1976). Managerial process and organizational behavior (2nd ed.). Glenview, IL: Scott, Foresman and Company.
Folger, R., & Greenberg, J. (1985). Procedural justice: An interpretive analysis of personnel systems. In K. M. Rowland & G. R. Ferris (Eds.), Research in personnel and human resources management (Vol. 3, pp. 141-183). Greenwich, CT: JAI Press.
George, C. S. (1968). The history of management thought. Englewood Cliffs, NJ: Prentice-Hall.
George, J. M., & Brief, A. P. (1990). The economic instrumentality of work: An examination of the moderating effects of financial requirements and sex on the pay-life satisfaction relationship. Journal of Vocational Behavior, 37, 357-368.
Gerth, H. H., & Mills, C. W. (Trans.). (1946). From Max Weber: Essays in sociology. New York: Oxford University Press.
Goldthorpe, J. H., Lockwood, D., Bechhofer, F., & Platt, J. (1969). The affluent worker in the class structure. Cambridge: Cambridge University Press.
Gould, S., & Werbel, J. D. (1983). Work involvement: A comparison of dual wage earner and single wage earner families. Journal of Applied Psychology, 68, 313-319.
Grant, M. (1960). The world of Rome. London: Weidenfeld and Nicolson.
Greenberg, J. (1987). Reactions to procedural injustice in payment distributions: Do the means justify the ends? Journal of Applied Psychology, 72, 55-61.
Greenhalgh, L., & Rosenblatt, Z. (1984). Job insecurity: Toward conceptual clarity. Academy of Management Review, 9, 438-445.
Gutman, H. G. (1976). Work, culture, and society in industrializing America: Essays in American working-class and social history. New York: Alfred A. Knopf.
Hackman, J. R., & Oldham, G. R. (1975). Development of the Job Diagnostic Survey. Journal of Applied Psychology, 60, 159-170.
Hall, F. S., & Hall, D. T. (1978). Dual careers: How do couples and companies cope with the problems? Organizational Dynamics, 6, 57-79.
Harmon, J., & Rohrbaugh, J. (1990). Social judgment analysis and small group decision making: Cognitive feedback effects on individual and collective performance. Organizational Behavior and Human Decision Processes, 46, 34-54.
Hays, S. P. (1957). The response to industrialism, 1885-1914. Chicago: University of Chicago Press.
Heneman, H. G., Jr. (1973). Work and nonwork: Historical perspectives. In M. D. Dunnette (Ed.), Work and nonwork in the year 2000 (pp. 12-27). Monterey, CA: Brooks/Cole.
Heneman, H. G., III. (1985). Pay satisfaction. In K. M. Rowland & G. R. Ferris (Eds.), Research in personnel and human resources management (Vol. 3, pp. 115-140). Greenwich, CT: JAI Press.
Herzberg, F. (1966). Work and the nature of man. Cleveland: World Publishing.
Hulin, C. L., & Blood, M. R. (1968). Job enlargement, individual differences, and worker responses. Psychological Bulletin, 69, 41-55.
Hulin, C. L., Roznowski, J., & Hachiya, P. (1985). Alternative opportunities and withdrawal decisions: Empirical and theoretical discrepancies and an integration. Psychological Bulletin, 97, 233-250.
Iaffaldano, M. T., & Muchinsky, P. M. (1985). Job satisfaction and job performance: A meta-analysis. Psychological Bulletin, 97, 251-273.
Inkeles, A. (1960). Industrial man: The relation of status to experience, perception, and value. American Journal of Sociology, 66, 1-21.
Isenberg, D. J. (1986). Thinking and managing: A verbal protocol analysis of managerial problem solving. Academy of Management Journal, 29, 775-788.
Jahoda, M. (1982). Employment and unemployment. Cambridge: Cambridge University Press.
Jasso, G. (1978). On the justice of earnings: A new specification of the justice evaluation function. American Journal of Sociology, 83, 1398-1419.
Jones, S. (1986). Addressing internal politics: A role for modeling in consultant-client interaction. Small Group Behavior, 17, 67-82.
Kahn, R. L. (1981). Work and health. New York: John Wiley & Sons.
Kalleberg, A. L. (1977). Work values and job rewards: A theory of job satisfaction. American Sociological Review, 42, 124-143.
Kanter, R. M. (1978). Work in a new America. Daedalus, 107(1), 47-78.
Katerberg, R., Hom, P. W., & Hulin, C. L. (1979). Effects of job complexity on the reactions of part-time employees. Organizational Behavior and Human Performance, 24, 317-332.
Kennedy, J. E., & O'Neill, H. E. (1958). Job content and workers' opinions. Journal of Applied Psychology, 47, 372-375.
Kessler, R. C. (1982). A disaggregation of the relationship between socioeconomic status and psychological distress. American Sociological Review, 47, 752-764.
Klein, K. J. (1987). Employee stock ownership and employee attitudes: A test of three models. Journal of Applied Psychology, 72, 319-322.
Komarovsky, M. (1940). The unemployed man and his family. New York: Octagon Books.
Landy, F. J. (1978). An opponent process theory of job satisfaction. Journal of Applied Psychology, 63, 533-547.
Lawler, E. E., III. (1971). Pay and organizational development. Reading, MA: Addison-Wesley.
Lawler, E. E., III. (1973). Motivation in work organizations. Monterey, CA: Brooks/Cole.
Liem, R., & Liem, J. (1978). Social class and mental illness reconsidered: The role of economic stress and social support. Journal of Health and Social Behavior, 19, 139-156.
Locke, E. A. (1976). The nature and causes of job satisfaction. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology (pp. 1297-1349). Chicago: Rand-McNally.
Locke, E. A., & Schweiger, D. M. (1979). Participation in decision-making: One more look. In B. M. Staw (Ed.), Research in organizational behavior (Vol. 1, pp. 265-339). Greenwich, CT: JAI Press.
Locke, E. A., Sirota, D., & Wolfson, A. D. (1976). An experimental case study of the successes and failures of job enrichment in a government agency. Journal of Applied Psychology, 61, 701-711.
Lyman, E. (1955). Occupational differences in the value attached to work. American Journal of Sociology, 61, 138-144.
MacKinney, A. C., Wernimont, P. F., & Galitz, W. O. (1962). Has specialization reduced job satisfaction? Personnel, 39, 8-17.
Marx, K. (1967). Capital (Vols. 1-3). New York: International Publishers.
Maslow, A. H. (1943). A theory of human motivation. Psychological Review, 50, 370-396.
McClelland, D. C. (1961). The achieving society. Princeton: Van Nostrand.
McGregor, D. (1960). The human side of enterprise. New York: McGraw-Hill.
McGregor, D. (1967). The professional manager. New York: McGraw-Hill.
Milkovich, G. T., & Newman, J. M. (1984). Compensation. Plano, TX: Business Publications, Inc.
Miller, G. (1980). The interpretation of nonoccupational work in modern society: A preliminary discussion and typology. Social Problems, 27, 381-391.
Mills, C. W. (1953). White collar. New York: Oxford University Press.
Mirowsky, J. (1987). The psycho-economics of feeling underpaid: Distributive justice and the earnings of husbands and wives. American Journal of Sociology, 92, 1404-1434.
Morse, N. C. (1953). Satisfactions in the white-collar job. Ann Arbor: Institute for Social Research, University of Michigan.
MOW International Research Team (1987). The meaning of working. London: Academic Press.
Muchinsky, P. M., & Morrow, P. C. (1980). A multidisciplinary model of voluntary turnover. Journal of Vocational Behavior, 17, 263-290.
Neff, W. S. (1968). Work and human behavior. New York: Atherton.
Nelson, D. (1975). Managers and workers: Origins of the new factory system in the United States, 1880-1920. Madison, WI: University of Wisconsin Press.
Nisbett, R., & Ross, L. (1980). Human inference: Strategies and shortcomings of social judgment. Englewood Cliffs, NJ: Prentice-Hall.
Nord, W. R. (1977). Job satisfaction reconsidered. American Psychologist, 32, 1026-1035.
Nord, W. R. (1986). New paths and unexpected bedfellows in OD's future. Organizational Development Distinguished Address, Academy of Management, Chicago, August 1986.
Nord, W. R., Brief, A. P., Atieh, J. M., & Doherty, E. (1988). Work values and the practice of organizational psychology. In L. L. Cummings & B. M. Staw (Eds.), Research in organizational behavior (Vol. 10, pp. 1-42). Greenwich, CT: JAI Press.
O'Brien, G. E. (1986). Psychology of work and employment. Chichester, England: John Wiley & Sons.
Okrent, M. (1978-1979). Work, play and technology. The Philosophical Forum, 10, 321-340.
Oldham, G. R., Hackman, J. R., & Pearce, J. L. (1976). Conditions under which employees respond positively to enriched work. Journal of Applied Psychology, 61, 395-403.
Ouchi, W. (1981). Theory Z: How American business can meet the Japanese challenge. Reading, MA: Addison-Wesley.
Parker, S. R., & Smith, M. A. (1976). Work and leisure. In R. Dubin (Ed.), Handbook of work, organization, and society (pp. 37-62). Chicago: Rand-McNally.
Payne, J. W., Braunstein, M. L., & Carroll, J. S. (1978). Exploring predecisional behavior: An alternative approach to decision research. Organizational Behavior and Human Performance, 22, 17-44.
Phillips, J. S., & Lord, R. G. (1982). Schematic information processing and perceptions of leadership in problem-solving groups. Journal of Applied Psychology, 67, 486-492.
Pollard, S. (1963). Factory discipline in the industrial revolution. Economic History Review, 16, 254-271.
Quinn, R. P., Staines, G. L., & McCullough, M. R. (1974). Job satisfaction: Is there a trend? Manpower Research Monograph No. 30, U.S. Department of Labor.
Rainwater, L. (1974). What money buys: Inequality and the social meaning of income. New York: Basic Books.
Rice, R. W., Near, J. P., & Hunt, R. G. (1980). The job satisfaction/life satisfaction relationship: A review of empirical research. Basic and Applied Social Psychology, 1, 37-64.
Roberts, K. H., & Glick, W. (1981). The job characteristics approach to task design: A critical review. Journal of Applied Psychology, 66, 193-217.
Rodgers, D. T. (1974). The work ethic in industrial America, 1850-1920. Chicago: University of Chicago Press.
Rohrbaugh, J. (1979). Improving the quality of group judgment: Social judgment analysis and the Delphi technique. Organizational Behavior and Human Performance, 24, 73-92.
Ross, C. E., & Huber, J. (1985). Hardship and depression. Journal of Health and Social Behavior, 26, 312-327.
Russo, J. E., & Rosen, L. D. (1975). An eye fixation analysis of multialternative choice. Memory and Cognition, 3, 267-276.
Schwab, D. P., Rynes, S. L., & Aldag, R. J. (1987). Theories and research on job search and choice. In K. Rowland & G. Ferris (Eds.), Research in personnel and human resources management (Vol. 5, pp. 129-166). Greenwich, CT: JAI Press.
Schweiger, D. M., & Leana, C. R. (1986). Participation in decision making. In E. A. Locke (Ed.), Generalizing from laboratory to field settings. Lexington, MA: Lexington Books.
Schuler, R. S., & Youngblood, S. A. (1986). Effective personnel management. St. Paul: West Publishing Company.
Shenkar, O., & Ronen, S. (1987). Structure and importance of work goals among managers in the People's Republic of China. Academy of Management Journal, 30, 564-576.
Simonds, R. H., & Orife, J. W. (1975). Worker behavior versus enrichment theory. Administrative Science Quarterly, 20, 606-612.
Staines, G. L., Pottick, K. J., & Fudge, D. A. (1986). Wives' employment and husbands' attitudes toward work and life. Journal of Applied Psychology, 71, 118-128.
Stone, K. (1974). The origins of job structures in the steel industry. Review of Radical Political Economics, 6, 113-173.
Strauss, G. (1963). Some notes on power equalization. In H. J. Leavitt (Ed.), The social science of organizations (pp. 39-84). Englewood Cliffs, NJ: Prentice-Hall.
Tausky, C. (1969). Meanings of work among blue-collar men. Pacific Sociological Review, 12, 49-55.
Taylor, S. E., & Crocker, J. (1981). Schematic bases of social information processing. In E. T. Higgins, C. P. Herman, & M. P. Zanna (Eds.), Social cognition (Vol. 1, pp. 89-134). Hillsdale, NJ: Erlbaum.
Thompson, E. P. (1963). The making of the English working class. London: Victor Gollancz.
Tilgher, A. (1931). Work: What it has meant through the ages. London: Harrap.
Turner, A. N., & Michlette, A. L. (1962). Sources of satisfaction in repetitive work. Occupational Psychology, 36, 215-231.
Veblen, T. (1899). The theory of the leisure class. New York: Macmillan.
Veroff, J., Douvan, E., & Kulka, R. (1981). The inner American: A self-portrait from 1957 to 1976. New York: Basic Books.
Vroom, V. H. (1964). Work and motivation. New York: John Wiley & Sons.
Wahba, M. A., & Bridwell, L. G. (1976). Maslow reconsidered: A review of research on the need hierarchy theory. Organizational Behavior and Human Performance, 15, 212-240.
Wanous, J. P. (1974). A causal-correlational analysis of the job satisfaction and performance relationship. Journal of Applied Psychology, 59, 139-144.
Warr, P. B. (1983). Work, jobs, and unemployment. Bulletin of the British Psychological Society, 36, 305-311.
Weber, M. (1930). The Protestant ethic and the spirit of capitalism (T. Parsons, Trans.). New York: Charles Scribner's Sons.
Wheeler, H. N. (1985). Toward an integrative theory of industrial conflict. In K. M. Rowland & G. R. Ferris (Eds.), Research in personnel and human resources management (Vol. 3, pp. 231-270). Greenwich, CT: JAI Press.
Wilensky, H. L. (1960). Work, careers, and social integration. International Social Science Journal, 12, 543-560.
Wool, H. (1973). What's wrong with work in America? A review essay. Monthly Labor Review, 96, 38-44.
York, K. M. (1989). Defining sexual harassment in the workplace: A policy-capturing approach. Academy of Management Journal, 32, 830-850.
Zedeck, S. (1977). An information processing model and approach to the study of motivation. Organizational Behavior and Human Performance, 18, 47-77.
Theory and Practice of Sample Surveys
Horst Stenger and Siegfried Gabler
1
Introduction
Survey sampling consists essentially of selecting a part of a population and drawing inferences from it. The object of inference may be the whole population, subsets of the population called domains, or parameters of models for the population called superpopulation models. Deriving statistical conclusions from the data ultimately available means interpreting these data in terms of their stochastic background. That background is formed mainly by a random selection procedure, a superpopulation model, and response and measurement models.
2
Fixed Populations and Sampling Designs
Consider a finite population $U$ of $N$ units $u$, typically persons, with associated values $y_u, z_{u1}, \ldots, z_{uM}$ of a characteristic of interest $\eta$ and of auxiliary characteristics $\zeta_1, \ldots, \zeta_M$ somehow related to $\eta$. All $y_u$, $u \in U$, are unknown. The vectors $z_u = (z_{u1}, \ldots, z_{uM})'$, $u \in U$, may also be unknown; in any case their mean
$$\bar z = (\bar z_1, \ldots, \bar z_M)' = \frac{1}{N}\sum_{u \in U} z_u$$
is known to us. To take an example, let $\eta$ be income and use sex, race, age, and so forth as auxiliary variables. To emphasize that $y_u$ is not random but fixed, $U$ is called a fixed population. Information is needed on a population parameter $\theta = \theta(y_u, u \in U)$, i.e., $\theta$ is a function of $y_u$, $u \in U$. In addition, $\theta$ may depend on $z_u$, $u \in U$, which is not made explicit in our notation. The population mean
$$\bar y = \frac{1}{N}\sum_{u \in U} y_u,$$
the population variance
$$\sigma^2 = \frac{1}{N}\sum_{u \in U} (y_u - \bar y)^2$$
of $\eta$, and the population correlation coefficient
$$\rho_i = \frac{\sum_{u \in U} (y_u - \bar y)(z_{ui} - \bar z_i)/N}{\sqrt{\sigma^2 \sum_{u \in U} (z_{ui} - \bar z_i)^2 / N}}$$
of $\eta$ and $\zeta_i$ are examples. Usually, a population is partitioned into (disjoint) strata, e.g., by geographic characteristics, with each stratum hierarchically clustered. That is, each stratum is composed of first-stage (primary) clusters, which in turn are sets of second-stage (secondary) clusters, and so on, with elements $u \in U$ as last-stage units. A structure of this type may be described by appropriately defined auxiliary characteristics. The (partial) knowledge of $z_u$, $u \in U$, does not provide meaningful evidence on the population parameter $\theta$ of interest. It is necessary, then, to select a sample $s \subset U$, usually by a procedure adapted to the complex structure of the population. The values $y_u$, $u \in s$, are then ascertained, and a function $e = e(y_u, u \in s)$, called an estimator, is used as a substitute for $\theta$. The estimator $e$ may also depend on $z_u$, $u \in s$, and on $\bar z$, or even on $z_u$, $u \in U$, if these vectors are known. In classical sampling theory, established essentially by J. Neyman (1934) and Godambe (1955), the sample is selected by a stochastic procedure defining probabilities $p(s) \ge 0$, with $\sum_{s \subset U} p(s) = 1$, for $s \subset U$. The performance of an estimator $e$ is measured by its p-bias
$$E_p e - \theta = \sum_{s \subset U} p(s)\, e(y_u, u \in s) - \theta$$
and its p-variance
$$\operatorname{var}_p e = E_p (e - E_p e)^2.$$
The probability distribution defined by $p$ is called the (sampling) design, and $e$ is p-unbiased if $E_p(e - \theta) = 0$.
As an example, consider
$$p(s) = \begin{cases} \binom{N}{n}^{-1} & \text{for } s \text{ with } |s| = n, \\ 0 & \text{otherwise,} \end{cases}$$
defining a sampling design $p$ called simple random sampling (srs) of size $n$, $n$ being an integer less than $N$. Then the estimator
$$e = e(y_u, u \in s) = \frac{1}{n}\sum_{u \in s} y_u = \bar y_s,$$
which is called the sample mean, is p-unbiased for the population mean, and
$$\operatorname{var}_p \bar y_s = \frac{\sigma^2}{n}\,\frac{N - n}{N - 1}.$$
Let $p$ be an arbitrary design and define the inclusion probabilities
$$\pi_u = \sum_{s:\, u \in s} p(s), \qquad \pi_{uv} = \sum_{s:\, u, v \in s} p(s)$$
for $u, v \in U$. Then, provided $\pi_u > 0$ for all $u \in U$, the estimator
$$\frac{1}{N}\sum_{u \in s} \frac{y_u}{\pi_u},$$
called the Horvitz-Thompson estimator, is p-unbiased for the population mean, with p-variance
$$\frac{1}{N^2}\sum_{u \in U}\sum_{v \in U} (\pi_{uv} - \pi_u \pi_v)\,\frac{y_u}{\pi_u}\,\frac{y_v}{\pi_v},$$
where $\pi_{uu} = \pi_u$. For a self-weighting design, all inclusion probabilities $\pi_u$, $u \in U$, are equal to $n/N$, and the Horvitz-Thompson estimator is
$$\frac{1}{n}\sum_{u \in s} y_u,$$
where $n$ is the expected sample size of $p$.
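These design-based quantities can be checked by brute force on a toy population. The Python sketch below rests on our own assumptions: a five-unit population with made-up values, plus a hand-specified unequal-probability design. It proceeds as follows:

- it enumerates every sample of a design;
- it computes the inclusion probabilities $\pi_u$ and $\pi_{uv}$;
- it confirms that the Horvitz-Thompson estimator is p-unbiased and that its exact p-variance agrees with the formula.

For srs with $N = 5$ and $n = 2$ it should also reproduce $(\sigma^2/n)(N-n)/(N-1) = 3$. The function names are illustrative only.

```python
from itertools import combinations
from math import comb


def srs_design(N, n):
    """Simple random sampling of size n: every n-subset has probability 1 / C(N, n)."""
    prob = 1.0 / comb(N, n)
    return {frozenset(s): prob for s in combinations(range(N), n)}


def inclusion_probabilities(design, N):
    """First-order pi[u] and second-order pi2[u][v], with pi2[u][u] = pi[u]."""
    pi = [0.0] * N
    pi2 = [[0.0] * N for _ in range(N)]
    for s, prob in design.items():
        for u in s:
            pi[u] += prob
            for v in s:
                pi2[u][v] += prob
    return pi, pi2


def horvitz_thompson_mean(y, s, pi):
    """(1/N) * sum over the sample of y_u / pi_u."""
    return sum(y[u] / pi[u] for u in s) / len(y)


def ht_variance(y, pi, pi2):
    """(1/N^2) * sum_u sum_v (pi_uv - pi_u pi_v) (y_u / pi_u) (y_v / pi_v)."""
    N = len(y)
    return sum(
        (pi2[u][v] - pi[u] * pi[v]) * (y[u] / pi[u]) * (y[v] / pi[v])
        for u in range(N) for v in range(N)
    ) / N**2


def design_moments(design, estimator):
    """Exact p-expectation and p-variance of an estimator, by enumerating all samples."""
    mean = sum(prob * estimator(s) for s, prob in design.items())
    var = sum(prob * (estimator(s) - mean) ** 2 for s, prob in design.items())
    return mean, var


y = [3.0, 7.0, 1.0, 9.0, 5.0]      # made-up values: population mean 5, sigma^2 = 8
N = len(y)

# A hypothetical unequal-probability design; every unit has pi_u > 0.
custom = {frozenset({0, 1}): 0.2, frozenset({2, 3}): 0.3,
          frozenset({1, 4}): 0.1, frozenset({0, 2, 4}): 0.4}

results = {}
for name, design in (("srs", srs_design(N, 2)), ("custom", custom)):
    pi, pi2 = inclusion_probabilities(design, N)
    mean, var = design_moments(design, lambda s: horvitz_thompson_mean(y, s, pi))
    results[name] = (mean, var, ht_variance(y, pi, pi2))
    print(name, results[name])
```

Enumeration is feasible only for tiny populations. The point is to see the identities hold exactly, not to suggest a practical algorithm.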
2.1
Weighting
A linear estimator (see Godambe, 1955)
$$\sum_{u \in s} a_{su} y_u$$
with $a_{su} > 0$ for $u \in s$ may be given a geometric interpretation. Consider the points $(y_u, z_u')'$, $u \in s$, in an $(M+1)$-dimensional space, with $a_{su}$, $u \in s$, as associated weights; note, however, that the $a_{su}$, $u \in s$, need not add up to 1. The weighted sample mean of all $z_u$, $u \in s$, may coincide with the true mean $\bar z$ of all $z_u$, $u \in U$, i.e.,
$$\sum_{u \in s} a_{su} z_u = \bar z,$$
in which case $\sum_{u \in s} a_{su} y_u$ is called representative (see Hájek, 1981, p. 156). Starting with weights that do not satisfy this equation, Deville & Särndal (1992) look for new weights $\tilde a_{su}$ that satisfy it without differing too much from the original weights. To measure the distance between the new and the original weights, the function
$$\sum_{u \in s} (\tilde a_{su} - a_{su})^2 / Q_u$$
seems appropriate, with $Q_u > 0$, $u \in U$, to be determined. Minimizing this function over the $\tilde a_{su}$ subject to $\sum_{u \in s} \tilde a_{su} z_u = \bar z$ leads to the estimator
$$\sum_{u \in s} \tilde a_{su} y_u = \sum_{u \in s} a_{su} y_u + \Big(\bar z - \sum_{u \in s} a_{su} z_u\Big)' \Big(\sum_{u \in s} Q_u z_u z_u'\Big)^{-1} \sum_{u \in s} Q_u z_u y_u.$$
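This estimator is called the generalized regression (GREG) estimator (see Chaudhuri & Stenger, 1992, chap. 4) if $a_{su} = 1/(N\pi_u)$, $u \in s$, i.e., if the starting weights are those of the Horvitz-Thompson estimator.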
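A minimal numerical sketch of this weight adjustment follows, assuming NumPy and made-up sample data. Setting the derivative of the Lagrangian to zero gives the closed-form solution of the constrained minimization:

- $\tilde a_{su} = a_{su} + Q_u z_u'\lambda$;
- $\lambda = (\sum_{u \in s} Q_u z_u z_u')^{-1}(\bar z - \sum_{u \in s} a_{su} z_u)$.

In the example:

- starting weights $a_{su} = 1/n$ correspond to srs, for which $1/(N\pi_u) = 1/n$;
- the auxiliary vector (a constant and one covariate), its population mean, and the choice $Q_u = 1$ are all hypothetical.

```python
import numpy as np


def calibrate(a, z, zbar, Q):
    """Weights minimising sum((a_new - a)**2 / Q) subject to sum(a_new * z_u) = zbar.

    Closed form: a_new = a + Q * (z @ lam), lam = T^{-1} (zbar - z' a), T = sum Q_u z_u z_u'.
    """
    a = np.asarray(a, dtype=float)
    Q = np.asarray(Q, dtype=float)
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:                                  # a single auxiliary characteristic
        z = z[:, None]
    T = z.T @ (Q[:, None] * z)                       # sum_u Q_u z_u z_u'
    lam = np.linalg.solve(T, np.asarray(zbar, dtype=float) - z.T @ a)
    return a + Q * (z @ lam)


def greg_closed_form(a, y, z, zbar, Q):
    """sum a_su y_u + (zbar - sum a_su z_u)' (sum Q_u z_u z_u')^{-1} sum Q_u z_u y_u."""
    a, y, Q = (np.asarray(v, dtype=float) for v in (a, y, Q))
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    T = z.T @ (Q[:, None] * z)
    beta = np.linalg.solve(T, z.T @ (Q * y))
    return a @ y + (np.asarray(zbar, dtype=float) - z.T @ a) @ beta


# Made-up srs sample of n = 5 with auxiliary vector z_u = (1, x_u)'.
y = np.array([12.0, 15.0, 9.0, 20.0, 14.0])
x = np.array([3.0, 4.0, 2.0, 6.0, 4.0])
z = np.column_stack([np.ones_like(x), x])
zbar = np.array([1.0, 3.5])        # hypothetical known population mean of (1, x)
a = np.full(len(y), 1 / len(y))    # starting weights a_su = 1/n
Q = np.ones(len(y))                # Q_u = 1, an arbitrary choice

w = calibrate(a, z, zbar, Q)
print(w.round(4), z.T @ w)
print(w @ y, greg_closed_form(a, y, z, zbar, Q))
```

Because $z_u$ contains a constant, the new weights also sum to 1. They stay positive here, but with this quadratic distance they can turn negative in other samples.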
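To derive another special case of interest, let the population be partitioned into strata $U(1), \ldots, U(L)$ and define, for $i = 1, \ldots, L$,
$$z_{ui} = \begin{cases} 1 & \text{if } u \in U(i), \\ 0 & \text{if not.} \end{cases}$$
Assume further that $a_{su} = Q_u$, $u \in s$. Then, as is easily seen, we obtain the poststratified estimator
$$\sum_{i=1}^{L} \frac{N(i)}{N}\, \bar y(i),$$
where
$$N(i) = \sum_{u \in U} z_{ui} \quad \text{and} \quad \bar y(i) = \sum_{u \in s \cap U(i)} Q_u y_u \Big/ \sum_{u \in s \cap U(i)} Q_u.$$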
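A short Python sketch of the poststratified estimator follows. The stratum labels, population stratum sizes $N(i)$, and $Q_u$ values are hypothetical; equal $Q_u$ correspond to a self-weighting design. The sketch assumes every stratum contains at least one sampled unit, since $\bar y(i)$ is otherwise undefined.

```python
def poststratified_mean(y, strata, Q, stratum_sizes):
    """sum_i N(i)/N * ybar(i), with ybar(i) the Q-weighted sample mean in stratum i."""
    N = sum(stratum_sizes.values())
    estimate = 0.0
    for label, N_i in stratum_sizes.items():
        members = [u for u, h in enumerate(strata) if h == label]
        if not members:
            raise ValueError(f"no sampled unit in stratum {label!r}")
        ybar_i = sum(Q[u] * y[u] for u in members) / sum(Q[u] for u in members)
        estimate += N_i / N * ybar_i
    return estimate


y = [10.0, 12.0, 20.0, 22.0, 21.0]       # sampled values (made up)
strata = ["A", "A", "B", "B", "B"]       # stratum of each sampled unit
Q = [1.0] * len(y)                       # equal Q_u
sizes = {"A": 600, "B": 400}             # known N(i) (hypothetical)
print(poststratified_mean(y, strata, Q, sizes))
```

The unweighted sample mean here is 17.0. The poststratified estimate moves toward stratum A, which makes up 60% of the population but only 40% of the sample.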
Again, the case $Q_u = 1/\pi_u$ is of special importance. Now assume a population partitioned into strata $U(1), \ldots, U(L)$ and, at the same time, into strata $V(1), \ldots, V(M)$. Suppose the number $N(\ell, m)$ of units in $U(\ell) \cap V(m)$ is known for all $\ell$ and $m$. Then the weight adjustment described above applies, leading to the poststratified estimator
$$\sum_{\ell=1}^{L}\sum_{m=1}^{M} \frac{N(\ell, m)}{N}\, \bar y(\ell, m),$$
where, for a self-weighting design, $\bar y(\ell, m)$ is the mean of all $n(\ell, m)$ observations in $U(\ell) \cap V(m)$. If only the marginal frequencies
$$N(\ell, \cdot) = \sum_{m} N(\ell, m) \quad \text{and} \quad N(\cdot, m) = \sum_{\ell} N(\ell, m)$$
are known, possibly from two different surveys, then poststratification by the cross-classification is impossible. Instead, the weight $1/n$ of each observation in the sample mean $\bar y_s$ can be adjusted to the marginal frequencies. This leads to the well-known raking ratio estimator, provided the distance between the new and the original weights is measured by
$$\sum_{u \in s} \Big( \tilde a_{su} \log \frac{\tilde a_{su}}{a_{su}} - \tilde a_{su} + a_{su} \Big).$$
The new weights $\tilde a_{su}$, which are always positive, may be determined by the widely used iterative proportional fitting (IPF) algorithm (see Deville, Särndal & Sautory, 1993).
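The IPF computation can be sketched as follows. The sketch assumes NumPy, made-up sample labels, and known marginal proportions $N(\ell,\cdot)/N$ and $N(\cdot,m)/N$ (hypothetical figures). Starting from $a_{su} = 1/n$, the weights are alternately rescaled within row classes and within column classes until both sets of margins are met. Every cell of the cross-classification is represented in the toy sample, which keeps the adjustment problem feasible.

```python
import numpy as np


def rake(a, rows, cols, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Iterative proportional fitting of unit weights to two sets of marginal totals."""
    w = np.asarray(a, dtype=float).copy()
    rows, cols = np.asarray(rows), np.asarray(cols)
    for _ in range(max_iter):
        for labels, targets in ((rows, row_targets), (cols, col_targets)):
            for k, target in targets.items():
                mask = labels == k
                w[mask] *= target / w[mask].sum()
        # After the column step the column margins hold exactly; check the row margins.
        gap = max(abs(w[rows == k].sum() - t) for k, t in row_targets.items())
        if gap < tol:
            return w
    raise RuntimeError("IPF did not converge")


sex = ["m", "m", "m", "f", "f", "m", "f", "f", "m"]
age = ["young", "old", "young", "old", "young", "old", "old", "young", "young"]
y = np.array([31.0, 45.0, 28.0, 52.0, 30.0, 47.0, 49.0, 27.0, 29.0])
a = np.full(len(y), 1 / len(y))                      # a_su = 1/n
w = rake(a, sex, age, {"m": 0.48, "f": 0.52}, {"young": 0.35, "old": 0.65})
print(w.round(4), w @ y)
```

Each pass multiplies the weights by positive factors, so the raked weights stay positive, as the text notes.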
2.2
Variance Estimation
An estimator
$$\sum_{u \in s} w_u,$$
with $w_u$ a function of $y_u$ and $z_u$ (and not of $y_{u'}, z_{u'}$, $u' \neq u$), is called a linear statistic (see Smith, 1994). Its p-expectation is
$$\sum_{u \in U} \pi_u w_u,$$
and its p-variance
$$\sum_{u \in U}\sum_{v \in U} (\pi_{uv} - \pi_u \pi_v)\, w_u w_v$$
is estimated without p-bias, provided $\pi_{uv} > 0$ for all $u, v \in U$, by
$$\sum_{u \in s}\sum_{v \in s} (\pi_{uv} - \pi_u \pi_v)\, w_u w_v / \pi_{uv}.$$
Note that the Horvitz-Thompson estimator is a linear statistic, while the GREG estimator is a nonlinear function of linear statistics. By linearizing nonlinear functions of linear statistics, approximate expectations and variances may be calculated. To outline the method, we consider a nonlinear function
$$f\Big(\sum_{u \in s} w_u\Big)$$
of only one linear statistic. Clearly, this function will generally not be p-unbiased for $f(\sum_{u \in U} \pi_u w_u)$. However, under weak assumptions, it may be linearized according to Taylor's formula as
$$f\Big(\sum_{u \in s} w_u\Big) \approx f\Big(\sum_{u \in U} \pi_u w_u\Big) + f'\Big(\sum_{u \in U} \pi_u w_u\Big)\Big(\sum_{u \in s} w_u - \sum_{u \in U} \pi_u w_u\Big),$$
where $f'$ denotes the first-order derivative of $f$. Then $f(\sum_{u \in s} w_u)$ is called approximately p-unbiased for $f(\sum_{u \in U} \pi_u w_u)$, with the approximate p-variance
$$\Big[ f'\Big(\sum_{u \in U} \pi_u w_u\Big) \Big]^2 \operatorname{var}_p \sum_{u \in s} w_u.$$
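The following Python sketch is a toy check under our own assumptions: made-up values, srs with $N = 6$ and $n = 3$, and $f = \log$. It does two things:

- It enumerates all samples to confirm that $\sum_{u,v \in s}(\pi_{uv} - \pi_u\pi_v)w_u w_v/\pi_{uv}$ is exactly p-unbiased for the p-variance of the linear statistic $\sum_{u \in s} w_u$, with $w_u = y_u/(N\pi_u)$ (the Horvitz-Thompson estimator).
- It compares the exact p-variance of $f(\sum_{u \in s} w_u)$ with the linearization approximation $[f'(\sum_{u \in U}\pi_u w_u)]^2 \operatorname{var}_p \sum_{u \in s} w_u$.

```python
from itertools import combinations
from math import comb, log

y = [10.0, 12.0, 9.0, 11.0, 13.0, 8.0]     # made-up population values
N, n = len(y), 3
samples = list(combinations(range(N), n))
p = 1.0 / comb(N, n)                        # srs: every sample has the same probability
pi = n / N                                  # pi_u for all u
pi_uv = n * (n - 1) / (N * (N - 1))         # pi_uv for u != v
w = [yu / (N * pi) for yu in y]             # w_u = y_u / (N pi_u)


def pi2(u, v):
    """Second-order inclusion probability, with pi_uu = pi_u."""
    return pi if u == v else pi_uv


def var_hat(s):
    """Unbiased estimator of var_p(sum_{u in s} w_u)."""
    return sum((pi2(u, v) - pi * pi) * w[u] * w[v] / pi2(u, v) for u in s for v in s)


stat = [sum(w[u] for u in s) for s in samples]           # linear statistic per sample
mean_stat = sum(p * t for t in stat)                     # equals sum_U pi_u w_u
var_stat = sum(p * (t - mean_stat) ** 2 for t in stat)
expected_var_hat = sum(p * var_hat(s) for s in samples)

# Nonlinear function f = log of the linear statistic, and its first-order approximation.
f_values = [log(t) for t in stat]
mean_f = sum(p * v for v in f_values)
exact_var_f = sum(p * (v - mean_f) ** 2 for v in f_values)
approx_var_f = (1.0 / mean_stat) ** 2 * var_stat         # [f'(sum_U pi_u w_u)]^2 var_p
print(var_stat, expected_var_hat)
print(exact_var_f, approx_var_f)
```

In practice $f'(\sum_{u \in U}\pi_u w_u)$ is unknown. A common plug-in replaces it by $f'(\sum_{u \in s} w_u)$ when estimating the variance of the linearization.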
For a nonlinear function of linear statistics, an approximately p-unbiased estimator of the p-variance is derived by estimating the p-variance of the linearization. Replication is another technique for obtaining the variance estimates needed, e.g., to construct confidence intervals. Strictly speaking, replication means repeating a design independently, each time followed by estimation of the parameter. If $k$ is the number of replications, with estimates $e_1, \ldots, e_k$, then $\sum_{i=1}^{k} e_i / k$ may be used as the ultimate estimate, with estimated p-variance
$$\frac{1}{k(k-1)} \sum_{i=1}^{k} \Big( e_i - \frac{1}{k}\sum_{j=1}^{k} e_j \Big)^2.$$
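The replication formula translates directly into a few lines of Python. In the illustration, each replicate is a hypothetical srs of size 5, drawn independently from a made-up population of the numbers 1 to 100 and estimated by its sample mean. A fixed seed makes the run repeatable.

```python
import random


def replicated_estimate(estimates):
    """Mean of k replicate estimates and its estimated p-variance sum (e_i - mean)^2 / (k (k - 1))."""
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two replicates")
    mean = sum(estimates) / k
    var = sum((e - mean) ** 2 for e in estimates) / (k * (k - 1))
    return mean, var


rng = random.Random(42)
population = [float(v) for v in range(1, 101)]
replicates = [sum(rng.sample(population, 5)) / 5 for _ in range(10)]   # k = 10 replications
print(replicated_estimate(replicates))
```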
Improvements such as the random group method, the balanced repeated replication and the balanced half-samples method are described in detail by Wolter (1985). In addition, jackknife and bootstrap methods have been developed and applied to estimate p-variances of estimators. These are resampling methods, i.e., the sample at hand is used to construct artificial samples that allow one to assess the variability of the underlying sampling design (see Rao & Wu, 1985; 1988).
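A minimal sketch of the two resampling ideas for the simplest case follows: the sample mean of a made-up sample. The delete-one jackknife shown here reproduces $s^2/n$ exactly for the mean, where $s^2$ denotes the sample variance with divisor $n-1$. The bootstrap is the naive with-replacement version, which ignores the finite population correction and the sampling design. The adjustments needed for complex designs are what the references just cited address.

```python
import random
import statistics


def jackknife_variance(sample, estimator):
    """Delete-one jackknife: (n - 1)/n * sum of squared deviations of leave-one-out estimates."""
    n = len(sample)
    leave_one_out = [estimator(sample[:i] + sample[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    return (n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out)


def naive_bootstrap_variance(sample, estimator, replicates=2000, seed=0):
    """Resample n units with replacement; ignores the design and the finite population correction."""
    rng = random.Random(seed)
    n = len(sample)
    values = [estimator([rng.choice(sample) for _ in range(n)]) for _ in range(replicates)]
    return statistics.variance(values)


sample = [3.0, 7.0, 1.0, 9.0, 5.0]
print(jackknife_variance(sample, statistics.mean))
print(naive_bootstrap_variance(sample, statistics.mean))
```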
3
Superpopulation Models
Frequently, the values $y_u$, $u \in U$, may be interpreted as realizations of random variables $Y_u$, $u \in U$. Their distribution depends on $z_u$, $u \in U$, and on additional parameters called model parameters. The joint distribution $m$ of $Y_u$, $u \in U$, is called a superpopulation model. In classical sampling theory, $m$ enters in the planning phase of a survey, when a decision has to be made as to which strategy $(p, e)$ to choose. That is, $E_m E_p (E - \theta)^2$ will be a selection criterion. Here $E_m$ denotes expectation with respect to the model $m$, and $E$ is equal to $e$ with $y_u$ replaced by $Y_u$, $u \in s$. Note that it is not indicated explicitly that $\theta = \theta(Y_u, u \in U)$ is a random variable here. In this approach, the performance criteria remain unchanged; hence, inference is design-based, just as it was earlier. Suppose we were wrong in believing $m$ to be true. Our sampling strategy would then be less efficient than it would have been under $m$, a fact that is indicated by the variance estimator and, therefore, by the length of the confidence interval. Hence, we are not led astray. Of course, we can also use the m-bias $E_m(E - \theta)$ and the m-variance $\operatorname{var}_m E$ as performance characteristics. Then inference is model-based, and no design characteristics interfere.
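To make the criterion $E_m E_p (E - \theta)^2$ concrete, here is a Monte Carlo sketch under a hypothetical model. The model is our assumption, not the authors': $Y_u = \beta z_u + \varepsilon_u$ with independent normal residuals and known $z_u = 1, \ldots, N$. Each replication draws a population from $m$ and then a sample from $p$ (srs), so averaging the squared errors approximates the double expectation. Two strategies are compared, both using srs:

- the sample mean;
- the ratio estimator $\bar z\, \bar y_s / \bar z_s$, which is the GREG estimator with a single auxiliary variable, $a_{su} = 1/n$, and $Q_u = 1/z_u$.

```python
import random


def anticipated_mse(N=100, n=10, beta=2.0, sigma=1.0, reps=2000, seed=1):
    """Monte Carlo estimate of E_m E_p (E - theta)^2 for two strategies under srs.

    Hypothetical model m: Y_u = beta * z_u + eps_u, eps_u iid N(0, sigma^2),
    with z_u = 1, ..., N known for every unit; theta is the mean of the Y_u.
    """
    rng = random.Random(seed)
    z = [float(u) for u in range(1, N + 1)]
    zbar = sum(z) / N
    sq_err_mean = sq_err_ratio = 0.0
    for _ in range(reps):
        Y = [beta * zu + rng.gauss(0.0, sigma) for zu in z]   # population drawn from m
        theta = sum(Y) / N
        s = rng.sample(range(N), n)                            # sample drawn from p (srs)
        ybar_s = sum(Y[u] for u in s) / n
        zbar_s = sum(z[u] for u in s) / n
        sq_err_mean += (ybar_s - theta) ** 2                   # strategy (srs, sample mean)
        sq_err_ratio += (zbar * ybar_s / zbar_s - theta) ** 2  # strategy (srs, ratio estimator)
    return sq_err_mean / reps, sq_err_ratio / reps


print(anticipated_mse())
```

Under such a model the ratio strategy should come out far ahead, because it removes the sampling variation of $\beta \bar z_s$ that dominates the error of the plain sample mean. Under a badly wrong model the ranking could change.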
Following this predictive approach, we need not select samples randomly. Choosing a balanced sample (see Royall, 1988) is also admissible but has found less general acceptance. The widely used quota sampling can be seen as a technique for selecting balanced samples. For critical discussions of this cost-saving method, we refer to Smith (1983), Moser (1952), and Moser & Stuart (1953). Questions of weighting in the model-based approach are also of importance (see Pfeffermann, 1993). It is certainly tempting to incorporate prior knowledge in a superpopulation model and then to use estimators, called predictors in this context, that have optimality properties under this model. However, model-based inference depends crucially on the correctness of the underlying model, while design-based inference is robust, giving meaningful results whatever model may be true. Hence, it is advisable to combine both principles. As an example, we mention small area estimation: to provide reliable estimates for communities, one models similarity between domains and applies Bayes and empirical Bayes methods (see Prasad, 1988; Ghosh & Meeden, 1986). Another combination of design and model principles, called the model-assisted approach, will be described in greater detail. Consider the model $m$,
$$Y_u = z_u' \beta + \varepsilon_u, \quad u \in U,$$
where $\varepsilon_u$, $u \in U$, are residuals with $E_m \varepsilon_u = 0$ and
$$E_m \varepsilon_u \varepsilon_v = \begin{cases} V_u & \text{for } u = v, \\ 0 & \text{otherwise.} \end{cases}$$
With arbitrary Q u > 0 , u e U, and arbitrary design p, define L = ( XQuiuz'u Γ1 IQuz„Yu u€s ues ^
Nues 71 "
ue s71"
and
"Q
The realization $t_Q$ of $T_Q$ is identical with the GREG estimator introduced earlier; it is approximately $p$-unbiased for the population mean and $m$-unbiased (in the strict sense) for $\bar Y$. In addition, we will try to choose $Q_u = V_u^{-1}$. Then

$$\operatorname{var}_m(T_Q - \bar Y) \le \operatorname{var}_m(T - \bar Y)$$

for all $m$-unbiased predictors $T$ which are linear in $Y_u$, $u \in s$. Hence, the GREG predictor $T_Q$ has good design and model properties. It should be emphasized that Cassel, Särndal & Wretman (1977) propose to apply the GREG predictor in a design-based way, i.e., to use $E_p(t_Q - \bar y)^2$ as a performance characteristic. We also want to mention the minimax approach to survey sampling, which aims at a game-theoretic foundation of design- and model-based inference. Here, the statistician choosing a strategy $(p, e)$ is playing a game against nature, which fixes the values $y_u$, $u \in U$. The mixed strategies of nature are to be interpreted as models. Hence, the statistician will look for the least favorable
Horst Stenger and Siegfried Gabler
model available to nature and play the best response to it. For the equilibrium pair $m$ and $(p, e)$ emerging in this way, there is no difference between design- and model-based inference (see Stenger, 1989; Gabler, 1990).
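The GREG predictor $T_Q$ of the model-assisted approach can be sketched in a few lines for the special case of a single auxiliary variable, so that $z_u$ and $B_Q$ are scalars and the model has no intercept. This is a minimal illustration, not a general implementation: the function name `greg_mean` and its argument layout are hypothetical, `z_total` stands for the known population total $\sum_{u \in U} z_u$, and `q` holds the weights $Q_u$.

```python
def greg_mean(y, z, pi, q, z_total, N):
    """GREG predictor T_Q of the population mean, single auxiliary variable.

    y, z, pi, q: values of Y_u, z_u, pi_u and Q_u for the sampled units u in s.
    z_total:     known population total of z_u over u in U.
    N:           population size.
    """
    # Scalar version of B_Q = (sum_s Q_u z_u z_u')^(-1) sum_s Q_u z_u Y_u
    b = sum(qu * zu * yu for yu, zu, qu in zip(y, z, q)) / sum(
        qu * zu * zu for zu, qu in zip(z, q)
    )
    # Residuals Y_u - z_u * B_Q, weighted by 1 / pi_u
    resid = sum((yu - zu * b) / pu for yu, zu, pu in zip(y, z, pi))
    return (z_total * b + resid) / N


# Hypothetical example: srs of n = 3 from N = 30, so pi_u = 0.1 for all u.
y, z = [3.0, 5.0, 9.0], [1.0, 2.0, 4.0]
print(greg_mean(y, z, [0.1] * 3, [1 / zu for zu in z], z_total=75.0, N=30))
```

With constant inclusion probabilities and $Q_u = 1/z_u$ (i.e., $Q_u = V_u^{-1}$ under the assumption $V_u \propto z_u$), the residual term vanishes and $T_Q$ reduces to the familiar ratio estimator $\bar z \sum_s Y_u / \sum_s z_u$, which is what the example computes.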
4
Analytic Studies
If a model is introduced, information on the model parameters may be needed. The corresponding inference is called analytic and differs slightly from descriptive inference, which aims at population parameters. Assume, for instance, that $\eta$ is a categorical characteristic, i.e., $Y_u = k$ if $u$ is of category $k$, where $k = 1, \dots, K$. Let $N_k$ be the number of $u \in U$ with $Y_u = k$, and suppose that

$$H_0: E_m N_k = N\theta_k, \quad k = 1, \dots, K,$$

is to be tested at significance level $\alpha = 0.05$; here $\theta_1, \dots, \theta_K$ are predetermined probabilities. Data are provided by a design $p$; we write $n_k$ for the number of $u \in s$ with $Y_u = k$, and $n = \sum_k n_k$. Now assume that $Y_u$, $u \in U$, are independent with identical distributions (iid). Then $n_1, \dots, n_K$ are multinomial with parameters $\theta_1, \dots, \theta_K$ under $H_0$. Therefore, $H_0$ will be rejected at significance level $\alpha$ if

$$n \sum_{k=1}^{K} \frac{(n_k/n - \theta_k)^2}{\theta_k} > \chi^2_{K-1,\alpha}.$$
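The rejection rule above is the classical Pearson goodness-of-fit test, and a minimal sketch makes the arithmetic explicit. The function names are hypothetical, and the critical value is passed in rather than computed: for $K - 1 = 2$ degrees of freedom the upper $\alpha$ point of $\chi^2_2$ is exactly $-2 \ln \alpha$ (about 5.991 for $\alpha = 0.05$); for other degrees of freedom it can be obtained, for example, from `scipy.stats.chi2.ppf(1 - alpha, K - 1)`.

```python
import math


def pearson_statistic(counts, theta):
    """n * sum_k (n_k / n - theta_k)^2 / theta_k for observed counts n_1..n_K."""
    if len(counts) != len(theta):
        raise ValueError("counts and theta must have the same length K")
    n = sum(counts)
    return n * sum((nk / n - tk) ** 2 / tk for nk, tk in zip(counts, theta))


def reject_h0(counts, theta, critical_value):
    """Reject H0 if the statistic exceeds the chi-square(K-1) critical value."""
    return pearson_statistic(counts, theta) > critical_value


# Hypothetical example with K = 3 categories and alpha = 0.05.
counts = [30, 50, 20]
theta = [0.25, 0.50, 0.25]
crit = -2 * math.log(0.05)  # upper 5% point of chi-square with 2 df
print(pearson_statistic(counts, theta), reject_h0(counts, theta, crit))
```

Here the statistic is about 2.0, well below 5.991, so $H_0$ is not rejected. The validity of this rule rests on the multinomial distribution of $n_1, \dots, n_K$, which is exactly what the following discussion questions for complex designs.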
Note that $p$ is completely arbitrary. Now we drop the assumption that $Y_u$, $u \in U$, are independent and assume srs instead. Then $n_1, \dots, n_K$ are approximately multinomial, now with respect to the randomization distribution induced by the design. The parameters of this multinomial distribution are $N_k/N$, $k = 1, \dots, K$, as long as the randomization distribution is considered alone. In view of the compound randomization and model distribution, $n_1, \dots, n_K$ are approximately multinomial with parameters $\theta_1, \dots, \theta_K$ under $H_0$, because $N_k/N$ will be consistent for $\theta_k$, $k = 1, \dots, K$, under weak assumptions. Hence, the same test applies, now without restrictive assumptions on the model. Usually, however, the iid assumption is unrealistic and, at the same time, the survey design is so complex that neither of the above approaches is adequate. Let $p$ be a self-weighting design such that

$$E_p \frac{n_k}{n} = \frac{N_k}{N}, \quad k = 1, \dots, K,$$

and define

$$f = \Big(\frac{n_1}{n}, \dots, \frac{n_K}{n}\Big)', \qquad F = \Big(\frac{N_1}{N}, \dots, \frac{N_K}{N}\Big)', \qquad \theta = (\theta_1, \dots, \theta_K)', \qquad V = n \operatorname{var}_p f.$$
Then

$$n (f - F)' V^{+} (f - F)$$

is approximately $\chi^2$-distributed with $K - 1$ degrees of freedom; here $V^{+}$ is the Moore-Penrose inverse of $V$. Therefore, reject $H_0$ if

$$n (f - \theta)' V^{+} (f - \theta) > \chi^2_{K-1,\alpha}$$

to obtain a level-$\alpha$ test (see Rao & Thomas, 1989). However, $V$ may not be known to the analyst interested in $H_0$. This will be the case especially when the data are collapsed across strata and clusters, e.g., for reasons of confidentiality of individual data or to simplify the presentation of the data. In this situation, with only $f$ known (and $V$ unknown), a decision is usually based on the statistic

$$n \sum_{k=1}^{K} \frac{(n_k/n - \theta_k)^2}{\theta_k} = n (f - \theta)' W^{-1} (f - \theta),$$

where $W = \operatorname{diag}(\theta_1, \dots, \theta_K)$. However, this statistic is approximately distributed as

$$\sum_{k=1}^{K-1} \lambda_k X_k,$$

where $X_1, \dots, X_{K-1}$ are independent $\chi^2_1$-variables and $\lambda_1 \ge \dots \ge \lambda_{K-1} > 0$ are the positive eigenvalues of $W^{-1} V$. Note that $\lambda_1 = \lambda_2 = \dots = \lambda_{K-1} = 1$ for srs. For stratified sampling, $\lambda_1 < 1$, so that the test described is conservative. For cluster sampling, however, some or all eigenvalues are typically greater than 1, and the true significance level of the test described can be much higher than $\alpha$. For corrections necessary under cluster sampling, see Rao & Scott (1981). A quick and easy approximation to the distribution of a sum of weighted chi-square variables is given by Gabler & Wolff (1987).
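The quantities in this paragraph can be computed directly once $V$ is available. The sketch below is illustrative only: it assumes $V$ is known, it represents "cluster sampling" crudely by multiplying the srs covariance $\operatorname{diag}(\theta) - \theta\theta'$ by a hypothetical design effect of 2, and all function names are hypothetical. Besides the Wald statistic with $V^{+}$, the Pearson statistic and the eigenvalues of $W^{-1}V$, it shows the first-order correction of Rao & Scott (1981), which divides the Pearson statistic by the mean eigenvalue, and a Monte Carlo estimate of the true level of the uncorrected test.

```python
import numpy as np


def wald_statistic(f, theta, V, n):
    """n (f - theta)' V^+ (f - theta), with V^+ the Moore-Penrose inverse of V."""
    d = np.asarray(f, dtype=float) - np.asarray(theta, dtype=float)
    return float(n * d @ np.linalg.pinv(V) @ d)


def pearson_statistic(f, theta, n):
    """n (f - theta)' W^{-1} (f - theta) with W = diag(theta)."""
    theta = np.asarray(theta, dtype=float)
    d = np.asarray(f, dtype=float) - theta
    return float(n * np.sum(d ** 2 / theta))


def design_eigenvalues(theta, V, tol=1e-10):
    """Positive eigenvalues of W^{-1} V in decreasing order.

    W^{-1} V is similar to the symmetric matrix W^{-1/2} V W^{-1/2},
    so its eigenvalues are real and can be computed with eigvalsh.
    """
    r = 1.0 / np.sqrt(np.asarray(theta, dtype=float))
    sym = r[:, None] * np.asarray(V, dtype=float) * r[None, :]
    lam = np.sort(np.linalg.eigvalsh(sym))[::-1]
    return lam[lam > tol]


def rao_scott_first_order(f, theta, V, n):
    """Pearson statistic divided by the mean positive eigenvalue of W^{-1} V."""
    return pearson_statistic(f, theta, n) / design_eigenvalues(theta, V).mean()


def naive_level(lam, critical_value, draws=200_000, seed=0):
    """Monte Carlo P(sum_k lam_k X_k > critical_value), X_k iid chi-square(1)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((draws, len(lam))) ** 2
    return float(np.mean(x @ np.asarray(lam, dtype=float) > critical_value))


theta = np.array([0.25, 0.50, 0.25])
V_srs = np.diag(theta) - np.outer(theta, theta)  # multinomial covariance under H0
V_clu = 2.0 * V_srs                               # hypothetical design effect of 2
f, n = [0.30, 0.50, 0.20], 100
crit = -2 * np.log(0.05)                          # upper 5% point of chi-square(2)
print(design_eigenvalues(theta, V_srs))
print(wald_statistic(f, theta, V_srs, n), pearson_statistic(f, theta, n))
print(naive_level(design_eigenvalues(theta, V_clu), crit))
```

For srs both eigenvalues are 1 and the Wald and Pearson statistics coincide (both about 2.0). With the doubled covariance, the nominal 5% test actually rejects a true $H_0$ with probability $1/\sqrt{20} \approx 0.22$, which illustrates how far off the uncorrected test can be under clustering.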
5
Nonsampling Errors
So far we have assumed that the units of investigation have been unambiguously identified and listed and that the true values of the relevant characteristics may be ascertained for any subset of units, with subsequent error-free data processing. However, all activities in planning, implementing, and evaluating surveys are subject to error. Suppose there are no conceptual problems, i.e., the units and the characteristics to be measured are defined uniquely and in complete agreement with the objective of the survey, and a true value exists for each unit-characteristic pair. Assume, additionally, that data processing is error-free. Even then, the deviation between a true population value and its estimate is only partly accounted for by the sampling design. Imperfect frames, missing responses, and incorrect responses are other important sources of error, called nonsampling errors, which cause bias and variance inflation. Strategies to reduce nonsampling errors, such as randomized response procedures (see Chaudhuri & Mukerjee, 1988), as well as methods to deal with unavoidable nonsampling errors, are discussed extensively in the current survey literature (see Lessler & Kalsbeek, 1992). Here, a few comments on the nonresponse problem seem necessary. It would be unrealistic to assume that people could be partitioned into disjoint sets of respondents and nonrespondents. Consider all activities starting with the selection of a person's address and ending either with the person's response or with the interviewer's decision to forgo further attempts to get an answer, classifying the person as a "not-at-home" or a "refusal." Early contributions by Hartley (1946), Politz & Simmons (1949; 1950), and Hansen & Hurwitz (1946) are already based on an implicit modeling of these activities and events, while, for the rest, the design approach to inference is accepted. According to this quasi-randomization approach, favored by many survey statisticians today, the population has to be partitioned into cells small enough to allow the response behaviour of persons to be modeled by homogeneous response probabilities; in other words, it is assumed that within a cell the attempts to get responses are independent Bernoulli trials with the same (model) parameter. Under these assumptions, weighting procedures are applicable. For srs, it is appropriate to use the weighting class estimator
$$\bar y_{wc} = \sum_{c} \frac{n_c}{n}\, \bar y_c,$$

where $n_c$ is the number of units drawn from cell $c$, $\bar y_c$ the mean of all responses in cell $c$, and $n = \sum_c n_c$. If the cell sizes are known, poststratification is recommended, while raking ratio methods apply to cross-classified data with known marginal frequencies in the population. In multipurpose surveys, $H > 1$ characteristics $\eta_1, \dots, \eta_H$ are of interest, i.e., a vector $y_u = (y_{u1}, \dots, y_{uH})'$ is associated with $u$, $u \in U$. Weighting methods are mainly applied in situations of unit nonresponse, where for some units in the sample all components of the vector of interest remain unknown, while for the other sampling units all components are available. Imputation methods are preferred in the presence of item nonresponse, where only one or a few components of some vectors $y_u$, $u \in s$, are missing (see Little & Rubin, 1987; Madow, Nisselson & Olkin, 1983). Taking nonresponse into account in a survey analysis necessarily presupposes modeling assumptions. Therefore, it may be easier to deal with nonresponse in the framework of model-based inference, especially if hypothetical response probabilities are not sufficient to explain the observations and more complex response models are necessary.
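The weighting class estimator and raking can both be sketched in a few lines. This is a minimal illustration with hypothetical function names and data layouts: `weighting_class_mean` assumes srs and at least one respondent per cell, and `rake` implements raking as iterative proportional fitting of a two-way table of weights to known row and column margins with a common grand total.

```python
def weighting_class_mean(cells):
    """Weighting class estimator sum_c (n_c / n) * ybar_c under srs.

    cells: list of (n_c, responses_c), where n_c is the number of sampled
    units in cell c and responses_c holds the observed y values of the
    respondents in that cell (assumed non-empty).
    """
    n = sum(n_c for n_c, _ in cells)
    return sum(n_c / n * sum(resp) / len(resp) for n_c, resp in cells)


def rake(weights, row_targets, col_targets, max_iter=1000, tol=1e-10):
    """Raking (iterative proportional fitting) of a two-way table of weights.

    Alternately rescales rows and columns; after each column step the
    column sums match col_targets exactly, and iteration stops once the
    row sums also match row_targets within tol.
    """
    w = [list(map(float, row)) for row in weights]
    for _ in range(max_iter):
        for i, target in enumerate(row_targets):
            s = sum(w[i])
            w[i] = [x * target / s for x in w[i]]
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in w)
            for row in w:
                row[j] *= target / s
        if max(abs(sum(row) - t) for row, t in zip(w, row_targets)) < tol:
            break
    return w


# Hypothetical cells: 60 sampled units (3 respondents), 40 sampled units (4 respondents).
print(weighting_class_mean([(60, [1, 1, 0]), (40, [0, 0, 1, 1])]))
print(rake([[10, 20], [30, 40]], row_targets=[40, 60], col_targets=[50, 50]))
```

The first call gives $0.6 \cdot 2/3 + 0.4 \cdot 1/2 = 0.6$. Raking changes only the cell totals; how these are spread over the respondents in each cell is a separate modeling decision.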
6
Sampling at ZUMA
The theory outlined so far forms the basis for ZUMA's advice to practitioners on sampling. In many cases, the definition of the sampled population must be clarified with regard to regional, temporal, and substantive aspects. Together with the researcher, we look for prior information, such as strata and clusters, which could play an important role in an efficient design. Since the same words have different meanings in colloquial language and in statistics, we must clarify what the researcher really means. An example is the concept of representativeness (see Kruskal & Mosteller, 1979a-c, 1980). Often we must assess whether it is even possible to draw a random sample. Sometimes only quasi-experimental designs are possible, and quota sampling cannot be avoided entirely either. ZUMA also responds to inquiries about telephone surveys (Frey, Kunz & Lüschen, 1990; Groves & Kahn, 1979), which have their own special advantages and disadvantages, and about panel surveys (Kasprzyk et al., 1989). Samples of special populations, such as ethnic groups, drug users, or the homeless, cannot be selected by means of classical random designs. One reason for this is the small share of the total population made up by such groups. Moreover, a frame for the target population often cannot be constructed, or the units of the population may be registered in several frames. Furthermore, such groups are often difficult to interview. In these populations, the units are often connected by a network. Using suitable sampling procedures, one gathers knowledge not only about the members of the network but also about the structure of the network and the relations between the units. Special examples of such link-tracing or ascending designs are snowball sampling and network or multiplicity sampling. It is difficult to generalize the sampling results obtained with such designs. Fundamental contributions in this field have been made by Coleman (1958), Erickson (1979), Goodman (1961), Sudman & Kalton (1986), Kalton & Anderson (1986), Sudman, Sirken & Cowan (1988), and van Meter (1990). If the aim of sampling is to estimate the total number of persons or objects in a population, one may choose capture-recapture sampling methods (Thompson, 1992). One of the main concerns of the sampling work at ZUMA is to support and develop nationwide surveys that are representative of the population of Germany.
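For readers unfamiliar with capture-recapture, the simplest estimators of a population size fit in a few lines. This sketch shows the classical Lincoln-Petersen estimator and Chapman's modification; it assumes a closed population, two independent samples, and equal "catchability" of all units, assumptions that are often questionable for the special populations mentioned above. The function names are hypothetical.

```python
def lincoln_petersen(n1, n2, m):
    """Lincoln-Petersen estimate n1 * n2 / m of the population size.

    n1: units captured (marked) in the first sample
    n2: units captured in the second sample
    m:  units in the second sample that were already marked
    """
    if m == 0:
        raise ValueError("no recaptures: the estimate is undefined")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's modification, which remains defined when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts: 100 marked, 80 in the second sample, 20 of them marked.
print(lincoln_petersen(100, 80, 20), chapman(100, 80, 20))
```

The Lincoln-Petersen estimate here is 400; Chapman's version gives a somewhat smaller value and is less biased for small numbers of recaptures.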
In the context of the Allgemeine Bevölkerungsumfrage der Sozialwissenschaften (ALLBUS: German General Social Survey; see Mayer & Schmidt, 1984), which has been conducted biennially since 1980, the ADM design (ADM, 1979), well known in connection with commercial and scientific surveys, has been applied in every survey year except 1994. Initially, the population consisted of all Germans in (West) Germany living in private households and aged 18 or older at the time of the interview. Recently, foreigners have also been included if their knowledge of German is good enough for an interview. This change in the definition of the population has repercussions for the sampling design. The classical ADM design consists of three stages. It is basically a multistage and, in a certain sense, a stratified sampling design. The territory of Germany is partitioned into a great number of small areas, the primary units or sample points. The units of the second stage are the private households. The last level of selection consists of the members of the chosen households who belong to the target population. The partition into primary units is mainly based on the voting districts used in Federal Parliament (Bundestag) elections. The universe of voting districts is available on an EDP tape with several auxiliary characteristics, which can be used as a sampling frame. The voting districts are selected by systematic sampling with probabilities proportional to size, the measure of size being the (estimated) number of private households in a district. Since no list of the private households within a primary unit exists, the households are ascertained by the interviewer according to prescribed walking instructions: the commissioned institute gives the interviewer a starting point and rules for continuing from it. Usually the interviewer has to ring every third doorbell. Finally, it must be determined which single person of the target population in each household is to be interviewed; this is done by means of the Kish table.
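The first stage of the ADM design, systematic selection of voting districts with probability proportional to size, can be sketched as follows. This is a simplified illustration, not the ADM procedure itself: the function name is hypothetical, the sizes stand in for the estimated numbers of private households, and the list order is taken as given (in practice, sorting the frame, e.g. by region, is what gives systematic selection its implicit stratification).

```python
import random


def systematic_pps(sizes, m, rng=None):
    """Systematic selection of m units with probability proportional to size.

    Units are laid out on a line in list order, unit i occupying an interval
    of length sizes[i]. A random start in [0, step) is drawn, with
    step = total / m, and the units hit by start, start + step, ... are
    selected. Unit i is then selected with probability m * sizes[i] / total,
    provided no size exceeds step.
    """
    rng = rng or random.Random()
    total = float(sum(sizes))
    step = total / m
    start = rng.random() * step
    selected, cum, i = [], 0.0, 0
    for k in range(m):
        point = start + k * step
        # advance to the unit whose interval [cum, cum + size) contains point
        while i < len(sizes) - 1 and cum + sizes[i] <= point:
            cum += sizes[i]
            i += 1
        selected.append(i)
    return selected


# Hypothetical frame of four districts with 10, 20, 30 and 40 households.
print(systematic_pps([10, 20, 30, 40], m=2, rng=random.Random(1)))
```

With these sizes and $m = 2$, the last district should be selected with probability $2 \cdot 40 / 100 = 0.8$, and the two selected districts are always distinct because no size exceeds the step of 50.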
Theoretically, the ADM design yields a self-weighting sample with regard to households. We will not discuss here the disadvantages of the ADM design that are repeatedly cited in the literature. Clearly, once foreigners are included in the survey, the selection in the first stage can no longer be performed adequately, since only Germans are entitled to vote. This and other considerations led to the idea of selecting respondents from the address registers of the municipal registration offices. In the first stage, municipalities are selected with probabilities proportional to the number of units (at a certain point in time). Each chosen municipality delivers the same number of individuals of the target population. This results in a self-weighting design with regard to individuals. The future will tell us more about the advantages and disadvantages of this sampling design. One advantage is that the percentage of foreigners in the sample is closer to their percentage in the population. A disadvantage is the higher cost of this selection procedure. The high rate of nonresponse, which appears to be independent of the design, also causes problems for the ALLBUS. Treating these problems has occupied ZUMA for a long time; the question of weighting, in particular, has a long history at ZUMA. An overview of the current discussion can be found in Gabler et al. (1994). A further important focus of ZUMA's sampling work will be the international comparison of samples. An initial starting point is the International Social Survey Programme (ISSP), in which more than 20 nations are now participating.
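A one-line calculation shows why the register-based design is self-weighting with regard to individuals. As a sketch, assume (the symbols are introduced here for illustration only) that $r$ municipalities are drawn with probabilities proportional to $M_i$, the number of target persons in municipality $i$, with $M = \sum_i M_i$ and $r M_i / M \le 1$, and that $k$ persons are then drawn by simple random sampling from the register of each selected municipality. For person $j$ in municipality $i$,

$$\pi_{ij} = \frac{r M_i}{M} \cdot \frac{k}{M_i} = \frac{r k}{M},$$

which depends neither on $i$ nor on $j$: every member of the target population has the same inclusion probability, so no design weights are needed. The same argument, with households in place of persons, underlies the self-weighting property of the ADM design with regard to households, provided a fixed number of households is selected per sample point.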
References
ADM - Arbeitskreis Deutscher Marktforschungsinstitute (Ed.) (1979). Muster-Stichproben-Pläne. Published by Felix Schaefer. Munich: Verlag Moderne Industrie.
Cassel, C.M., Särndal, C.E. & Wretman, J.H. (1977). Foundations of inference in survey sampling. New York: Wiley.
Chaudhuri, A. & Stenger, H. (1992). Survey sampling. New York: Marcel Dekker.
Chaudhuri, A. & Mukerjee, R. (1988). Randomized response: Theory and techniques. New York: Marcel Dekker.
Coleman, J.S. (1958). Relational analysis: The study of social organizations with survey methods. Human Organization, 17, 28-36.
Deville, J.C. & Särndal, C.E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376-382.
Deville, J.C., Särndal, C.E. & Sautory, O. (1993). Generalized raking procedures in survey sampling. Journal of the American Statistical Association, 88, 1013-1020.
Erickson, B.H. (1978). Some problems of inference from chain data. In K.F. Schuessler (Ed.), Sociological Methodology 1979 (pp. 276-302). San Francisco: Jossey-Bass.
Frey, J.H., Kunz, G. & Lüschen, G. (1990). Telefonumfragen in der Sozialforschung. Opladen: Westdeutscher Verlag.
Gabler, S. (1990). Minimax solutions in sampling from finite populations. New York: Springer Verlag.
Gabler, S., Hoffmeyer-Zlotnik, J.H.P. & Krebs, D. (Eds.) (1994). Gewichtung in der Umfragepraxis. Opladen: Westdeutscher Verlag.
Gabler, S. & Wolff, C. (1987). A quick and easy approximation to the distribution of a sum of weighted chi-square variables. Statistische Hefte, 28, 317-325.
Ghosh, M. & Meeden, G. (1986). Empirical Bayes estimation in finite population sampling. Journal of the American Statistical Association, 81, 1058-1062.
Godambe, V.P. (1955). A unified theory of sampling from finite populations. Journal of the Royal Statistical Society, Ser. B, 17, 269-278.
Goodman, L.A. (1961). Snowball sampling. Annals of Mathematical Statistics, 32, 148-170.
Granovetter, M. (1976). Network sampling: Some first steps. American Journal of Sociology, 81, 1267-1303.
Groves, R.M. & Kahn, R.L. (1979). Surveys by telephone: A national comparison with personal interviews. New York: Academic Press.
Hájek, J. (1981). Sampling from a finite population. New York: Marcel Dekker.
Hansen, M.H. & Hurwitz, W.N. (1946). The problem of nonresponse in sample surveys. Journal of the American Statistical Association, 41, 517-529.
Hartley, H.O. (1946). Discussion of "A review of recent statistical developments in sampling and sampling surveys" by F. Yates. Journal of the Royal Statistical Society, 109, 37-38.
Kalton, G. & Anderson, D.W. (1986). Sampling rare populations. Journal of the Royal Statistical Society, Ser. A, 149, 65-82.
Kasprzyk, D., Duncan, G., Kalton, G. & Singh, M.P. (Eds.) (1989). Panel surveys. New York: Wiley.
Kruskal, W. & Mosteller, F. (1979a). Representative sampling, I: Non-scientific literature. International Statistical Review, 47, 13-24.
Kruskal, W. & Mosteller, F. (1979b). Representative sampling, II: Scientific literature. International Statistical Review, 47, 111-127.
Kruskal, W. & Mosteller, F. (1979c). Representative sampling, III: The current statistical literature. International Statistical Review, 47, 245-265.
Kruskal, W. & Mosteller, F. (1980). Representative sampling, IV: The history of the concept in statistics, 1895-1939. International Statistical Review, 48, 169-195.
Lessler, J.T. & Kalsbeek, W.D. (1992). Nonsampling error in surveys. New York: Wiley.
Little, R.J.A. & Rubin, D.B. (1987). Statistical analysis with missing data. New York: Wiley.
Madow, W.G., Nisselson, H. & Olkin, I. (Eds.) (1983). Incomplete data in sample surveys, Vols. 1-3. New York: Academic Press.
Mayer, K.U. & Schmidt, P. (Eds.) (1984). Allgemeine Bevölkerungsumfrage der Sozialwissenschaften. Frankfurt: Campus Verlag.
Meter, K.M. van (1990). Methodological and design issues: Techniques for assessing the representatives of snowball sampling. In W. Wiebel (Ed.), Hidden population. NIDA Research Monograph 98. Rockville.
Moser, C.A. (1952). Quota sampling. Journal of the Royal Statistical Society, Ser. A, 115, 411-423.
Moser, C.A. & Stuart, A. (1953). An empirical study of quota sampling. Journal of the Royal Statistical Society, Ser. A, 116, 349-405.
Neyman, J. (1934). On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97, 558-625.
Pfeffermann, D. (1993). The role of sampling weights when modeling survey data. International Statistical Review, 61, 317-337.
Politz, A.N. & Simmons, W.R. (1949). I. An attempt to get the "not at homes" into the sample without callbacks. II. Further theoretical considerations regarding the plan for eliminating callbacks. Journal of the American Statistical Association, 44, 9-31.
Politz, A.N. & Simmons, W.R. (1950). Note on an attempt to get the "not at homes" into the sample without callbacks. Journal of the American Statistical Association, 45, 136-137.
Prasad, N.G.N. (1988). Small area estimation and measurement of response error variance in surveys. Unpublished Ph.D. thesis, Carleton University, Ottawa.
Rao, J.N.K. & Scott, A.J. (1981). The analysis of categorical data from complex sample surveys: Chi-squared tests for goodness of fit and independence in two-way tables. Journal of the American Statistical Association, 76, 221-230.
Rao, J.N.K. & Thomas, D.R. (1989). Chi-squared tests for contingency tables. In C.J. Skinner et al. (Eds.), Analysis of complex surveys (pp. 89-114). New York: Wiley.
Rao, J.N.K. & Wu, C.F.J. (1985). Inference from stratified samples: Second-order analysis of three methods for non-linear statistics. Journal of the American Statistical Association, 80, 620-630.
Rao, J.N.K. & Wu, C.F.J. (1988). Resampling inference with complex survey data. Journal of the American Statistical Association, 83, 231-241.
Royall, R.M. (1988). The prediction approach to sampling theory. In P.R. Krishnaiah & C.R. Rao (Eds.), Handbook of statistics, Vol. 6 (pp. 399-413). Amsterdam: Elsevier Science Publishers.
Smith, T.M.F. (1983). On the validity of inferences from non-random samples. Journal of the Royal Statistical Society, Ser. A, 146, 394-403.
Smith, T.M.F. (1988). To weight or not to weight, that is the question. In J.M. Bernardo et al. (Eds.), Bayesian Statistics 3 (pp. 437-451). Oxford: Oxford University Press.
Smith, T.M.F. (1994). Sample surveys 1975-1990: An age of reconciliation? International Statistical Review, 62, 5-19.
Stenger, H. (1989). Asymptotic analysis of minimax strategies in survey sampling. The Annals of Statistics, 17, 1301-1314.
Sudman, S. & Kalton, G. (1986). New developments in the sampling of special populations. Annual Review of Sociology, 12, 401-429.
Sudman, S., Sirken, M.G. & Cowan, C.D. (1988). Sampling rare and elusive populations. Science, 240, 991-995.
Thompson, S.K. (1992). Sampling. New York: Wiley.
Wolter, K.M. (1985). Introduction to variance estimation. New York: Springer Verlag.
Statistics and the Sciences
Jan de Leeuw
"When, after the agreeable fatigues of solicitation, Mrs. Millamant set out a long bill of conditions subject to which she might by degrees dwindle into a wife, Mirabell offered in return the condition that he might not thereby be beyond measure enlarged into a husband. With age and experience in research come the twin dangers of dwindling into a philosopher of science while being enlarged into a dotard." (C. Truesdell)
1
Introduction
This paper summarizes and extends the arguments in a number of earlier papers (De Leeuw, 1984; De Leeuw, 1988a; De Leeuw, 1988b; Dekker, 1992; Gifi, 1990). Although it is meant as a contribution to the methodology of the social and behavioral sciences, I think my argument actually applies to all disciplines that use statistics. The common concern in the papers and chapters mentioned above is to demarcate the responsibilities of the statistician and those of the empirical scientist. This means we assume that there is a legitimate academic discipline called "Statistics". This is, by no means, uncontroversial. Many scientists feel that they do not need statisticians to analyze their data, and many university administrators think that statistics is just an undergraduate course that students take to satisfy the general quantitative requirements. Quite a few statistics departments have disappeared, or could easily disappear, because it is tempting to distribute statisticians over the quantitative programs of various disciplines. In order to describe what belongs to science and what belongs to statistics I have to grope around in the murky area called the Foundations of Statistics. In this area I generally side with the hard-nosed frequentists, and every year or so I reread, with increasing pleasure, the papers by Kiefer (1977) and LeCam (1977).
2
Statistics
2.1
Definition
Statistics is defined as the science of building and evaluating tools for data analysis. The word "tools" is chosen on purpose here. It indicates that statistics is close to engineering, and in some instances perhaps even close to carpentry. The tools we refer to are statistical techniques. I find it helpful to distinguish between statistical techniques and statistical models, although many people seem to use these words almost interchangeably. They talk about the output of a factor analysis model, or they analyze their data by using a LISREL model. This is both vague and confusing. Not making this distinction blurs the boundary between theory and technological implementation. It suggests that statistical techniques can, in some sense, replace scientific theory.
2.1.1
Inference
By now, some statisticians may be getting quite nervous. What about probability? What about inference? What about decisions? The answer is quite simple. Inference is not the business of the statistician. It is often said that statistics "transforms certain knowledge about the sample into uncertain knowledge about the population". This is, indeed, a catchy phrase, but what does it really mean? Nothing much, as far as I can see. It restates the obvious fact that everybody, including scientists, generalizes, but it suggests that statistics can help make such generalizations more respectable in some logical or methodological sense. This suggestion is quite misleading. Some data analysis techniques are used by scientists to make various kinds of extrapolations and interpolations. This is proper and altogether unavoidable. Of course science must generalize beyond the actual data it has collected. But in each of these cases, whether we extrapolate or interpolate, in time or in space, there are no deductive rules that can be applied. Missing data are indeed missing. They have to be imputed, preferably on the basis of prior knowledge. If we have a strong model, or strong a priori information of another type, then we can interpolate with great confidence, and extrapolate with somewhat less confidence. Many practical situations in which statistics is especially useful can be thought of as "making a convincing story" or "trying to convince the jury" or "trying to convince the reviewers". One has to take into account the possibility that somebody else can formulate a very different story, precisely because so much information is missing and has to be imputed in some way.
2.1.2
Decisions
The point of view that "Statistics is the science of decision-making under uncertainty" also does not make sense to me. It is too general a definition to be useful. Everything that lives and breathes is involved in decision making under uncertainty. If the definition is made more specific by defining "uncertainty" and "decision-making", then it suddenly turns out to be much too narrow. We would all be sitting in our chairs, afraid to take any of these decisions, because we would have to take all their possible consequences into account, preferably with monetary costs attached. More importantly, however, it is not the statistician who makes the scientific decisions. The statistician makes statistical decisions, i.e., which tool will I use, which gauge will I apply it to, what will I advise this client to do, and so on. A great deal of mischief has come from the fact that in some cases scientists have actually delegated scientific decision making to statisticians, or even to some arbitrary statistical tools. The use of the word significant illustrates this nicely, as does the word normal in "normal distribution". Just as it is useful to distinguish models and techniques, it is useful to distinguish scientists and statisticians. Fortunately, models and techniques, as well as scientists and statisticians, are closely connected. Often the analysis and discussion of a scientific experiment involves both models and techniques, and it is done by a person who is both a scientist and a statistician. This practical confounding of the two does not mean, however, that we cannot make a distinction.
2.1.3
Probability
Statistical techniques sometimes use probability, and sometimes they don't. Statisticians propose and study statistical techniques. The idea that only the language of probability can be used for data analysis, which especially the Bayesians tend to believe, is just cultural imperialism. There is much scientific data analysis going on that does not use probability, but only analysis, algebra, set theory, or graph theory. To call this inferior, by implication, is quite infuriating.
2.2
Techniques
Statistical techniques are mappings of data into statistics. The data and the statistics are not necessarily quantitative, although in most cases numbers are involved. What do I mean by mappings? Data, which are codings of results of experiments, are mapped into some statistical space. From the data we compute a mean, a cross table, a correlation matrix. Or we generate five pages of computer output. Thus we map, for instance, rectangular data matrices into the space of correlation matrices.
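The view of a technique as a mapping, and the data reduction it usually involves, can be made concrete with a small example. The sketch below (the function name and data are hypothetical) maps an $n \times p$ data matrix into its $p \times p$ correlation matrix, and then builds a visibly different data matrix (rows reversed, one column rescaled, the other shifted) that is mapped into the same correlation matrix, so the data cannot be recovered from the statistic.

```python
import math


def correlation_matrix(data):
    """Map an n x p data matrix (list of rows) into its p x p correlation matrix."""
    n, p = len(data), len(data[0])
    cols = [[row[j] for row in data] for j in range(p)]
    means = [sum(c) / n for c in cols]
    centered = [[x - m for x in c] for c, m in zip(cols, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
            for j in range(p)
        ]
        for i in range(p)
    ]


X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 4.0]]
# A different data matrix: rows reversed, first column rescaled, second shifted.
Y = [[10.0 * a, b + 100.0] for a, b in reversed(X)]
print(correlation_matrix(X))
print(correlation_matrix(Y))
```

Both calls print the same matrix (up to rounding), with an off-diagonal correlation of about 0.707, although `X` and `Y` are different data sets.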
Figure 1. Gathering and analyzing data. [Diagram boxes: Protocol, Coding, Technique]
Almost always, data reduction is involved, which means that the mapping is not injective: different data sets can be mapped into the same statistics. We can compute the covariance matrix from the data, but not the data from the covariance matrix. This is illustrated in Figure 1. Survey forms, sense impressions, or experimental protocol sheets are in the dashed box on the left. Coding transforms these raw protocols into data, and statistical techniques map the data into statistics. It is not entirely clear whether coding is a part of statistics. Obviously, it is very important, because it determines the form of the data, and consequently it determines the types of statistical techniques that can be used. Curiously enough, not much attention is paid to coding in teaching or philosophical discussions, although many coding decisions are obviously at least as important as the choice between maximum likelihood and least squares, between likelihood and posterior distribution, or between the normal and the t-distribution. Clearly, part of the coding phase is related to the area of experimental design, which is often considered to be part of statistics.
3
The Evaluation of Statistical Techniques
In Gifi (1990) the business of evaluating statistical techniques is discussed in detail. In classical statistics, we start with models. A model is then combined with a principle, such as maximum likelihood, to derive a technique. This is a mechanical process, which produces a unique technique from any model, given the principle. Unfortunately, the process takes place entirely within statistics (or mathematics), and there is no actual contact with reality. The one-to-one correspondence between models and techniques, based on narrowly defined notions of optimality, is often not really useful. There are not many scientific disciplines in which we can afford to start with the model, without ever questioning it, and let it completely dictate the technique. Strong prior knowledge of this sort is available, it seems, only in some areas of the physical sciences. And even in those areas the prior knowledge is often not specific enough to determine the technique completely. More often than not this is not really a problem, because the precise choice of the technique does not make much of a difference, due to low error levels. In many quantitative disciplines, most typically in econometrics, the appropriate statistical method is to assume a statistical model, then collect the data, then test the model by comparing the statistics with the model. If the model does not fit, it is rejected. This is supposedly "sticking out one's neck", which is presumably the macho Popper thing to do. Various things are problematic about this prescription. They are by now tedious to repeat, but here we go anyway. In the first place, if you follow the prescription, and your data are any good, your head gets chopped off. In the second place, because people know their heads will get chopped off, nobody follows the prescription.
They collect data, look at their data, modify their model, look again, stick out their neck a tiny bit, modify their model again, and finally walk around with a proud look on their face and a non-rejected model in their hands, pretending to have followed the Popperian prescription. Thus the prescription leads to fraud. The only reason it is still around is that some scientists take their models, and themselves, much too seriously. In order to discuss the business of evaluating techniques, Gifi (1990) distinguishes between the gauging of a technique and the stability analysis of a technique. This supposedly covers almost all of classical statistics, both the correspondence between models and techniques, and the study of standard errors and confidence intervals.
3.1 Gauging
We are gauging a statistical technique if we apply it to a data set with known properties, and then study how the technique represents these known properties. A little reflection shows that the notion of gauging is a radical departure from the usual practice of deriving a unique technique from a model and some optimality principle. We apply different techniques to the same gauge (i.e. the same model), and we gauge a technique by applying it to different models.
Statistics and the Sciences
Gifi (1990) discusses a number of different gauges. We repeat the list here, because it illustrates clearly what we mean by gauging.
Probabilistic gauges. In multivariate analysis the multivariate normal distribution is the main gauge, but other interesting gauges are the Poisson process, the Markov chain, the Rasch model, the Cauchy distribution, and so on. If we apply our technique to the distribution or process, as if it were our data, then we see what happens to the known aspects of the gauge. If we apply correspondence analysis to the bivariate normal distribution, for instance, we find the Hermite-Chebyshev polynomials.
Statistical gauges. In statistics we apply techniques to random samples from a distribution, i.e. we have a number of independent random variables which all come from the same distribution.
Monte Carlo gauges. If the formulas become too complicated, we can always do the actual sampling, for instance from a multivariate normal. In this way we construct artificial data sets and apply our techniques to them.
Algebraic gauges. As we said above, statistics is not probability (Benz, 1992). In multivariate analysis the algebraic aspects are often more important than the probabilistic ones.
Empirical gauges. Sometimes we are in the fortunate situation that an empirical finding is well established. This usually happens in the natural sciences, where we have very precise determinations of constants and of the form of laws. We can then apply statistical techniques to data sets that obey these laws, or exhibit these constants, and compare our results to the "true" value. There are some fine examples of such empirical gauging in Stigler (1977), Wilson (1926), and Wilson & Worcester (1939).
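A Monte Carlo gauge can be sketched in a few lines. The sketch below assumes, purely for illustration, that the gauge is a standard bivariate normal with known correlation ρ = 0.5 and that the technique under study is the ordinary Pearson correlation; the function names are hypothetical. It samples artificial data sets, applies the technique to each, and compares the average result with the known property of the gauge.

```python
import math
import random


def pearson(xs, ys):
    """Ordinary product-moment correlation: the technique being gauged."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def bivariate_normal_sample(n, rho, rng):
    """Draw n pairs from a standard bivariate normal with correlation rho (the gauge)."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return xs, ys


def monte_carlo_gauge(rho, n=200, reps=200, seed=1):
    """Apply the technique to `reps` artificial data sets.

    Returns the average estimate and its deviation from the known rho.
    """
    rng = random.Random(seed)
    estimates = [pearson(*bivariate_normal_sample(n, rho, rng)) for _ in range(reps)]
    mean = sum(estimates) / reps
    return mean, mean - rho


mean_r, deviation = monte_carlo_gauge(0.5)
print(f"average r = {mean_r:.3f}, deviation from the gauge = {deviation:+.3f}")
```

Swapping in another technique while keeping the gauge fixed gives the comparison of different techniques on the same gauge described earlier.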
3.2 Stability Analysis
The other statistical activity used to evaluate techniques is stability analysis. If we make a small and unimportant change in our data, then the result of our technique should not change dramatically. This is a continuity or smoothness condition on the mapping that defines the technique. Classical statistics has always studied stability by using standard errors or confidence intervals. Gifi considers this much too narrow: other forms of stability are important as well.
Replication stability. If we replicate our experiment and then reanalyze the results, the results should not be too different. This is a general scientific principle, to some extent tautological, because the principle is implied by the definition of "replication".
Statistical stability. Statistics has been described as a poor man's way of replicating experiments. If we cannot actually replicate, because we do not have the time or the money, we assume a statistical model which tells us what will happen if we replicate. We then perform our stability analysis over the hypothetical replications generated by the model. This means computing standard errors, confidence intervals, null-hypothesis tests, and so on.
Stability under data selection. If we take a random sample from our data, the results of the technique should not change dramatically. Of course, this type of sampling is an experiment that we can easily replicate, especially these days with fast computers. Stability analysis based on resampling and subsampling has gained enormous popularity in the last 15 years.
Jan de Leeuw
Stability under model selection. A small and unimportant change in the model (leaving out a variable, fixing a regression coefficient, allowing for autocorrelation) should have no major consequences for the results of the technique derived from the model. This is especially true, of course, if we vary aspects of the model we are not sure of (such as normality or independence). Much of the study of robustness falls under this heading.
Numerical stability. Changing computational precision should not change the results of the technique in a major way. This type of stability is typically studied in numerical analysis, but of course numerical stability is an important property of data analysis techniques as well. Compare the study of robustness, and the bouncing betas of regression analysis.
Analytical stability. If the mapping of the data into the statistics space is differentiable, we can compute its derivative and use it in stability analysis.
Algebraic stability. Linear algebra often uses perturbation theory or eigenvalue bounds to establish or quantify stability.
Stability under selection of technique. Finally, if we apply a slightly different technique (least absolute deviations instead of least squares), the results should not be too different.
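Stability under data selection is easy to simulate. The sketch below assumes the bootstrap (resampling with replacement) as the resampling scheme and computes the spread of a statistic across resamples; the data set and function name are hypothetical. Comparing the mean (the least-squares location estimate) with the median (the least-absolute-deviations location estimate) also touches on stability under selection of technique.

```python
import random
import statistics


def bootstrap_spread(data, statistic, reps=1000, seed=0):
    """Standard deviation of `statistic` over resamples drawn with replacement from `data`."""
    rng = random.Random(seed)
    n = len(data)
    values = [statistic(rng.choices(data, k=n)) for _ in range(reps)]
    return statistics.stdev(values)


# Hypothetical data: 19 well-behaved values and one gross outlier.
data = [float(x) for x in range(1, 20)] + [200.0]
print("mean:  ", round(bootstrap_spread(data, statistics.mean), 2))
print("median:", round(bootstrap_spread(data, statistics.median), 2))
```

With the outlier present, the bootstrap spread of the median is far smaller than that of the mean, so on this made-up data set the median is the more stable technique.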
3.3 Models
There is an enormous number of books published these days about modeling. In fact, going through some or all of these books is quite a humbling experience. I do not aim so high. For our purposes a model is just a subset of the statistics space. If we study covariances, it is a set of covariance matrices. If we are interested in five-dimensional contingency tables, then it is a subset of the space of such tables. We must immediately take issue with the idea that the model is, in some sense, "true" (De Leeuw, 1988a). This notion is difficult to define, and largely irrelevant. The definitions given so far lead us to conclude that, if the word means anything, then models are most certainly not true. For our purposes, it suffices that the model assists us in selecting and evaluating statistical techniques. Models can be extremely useful and efficient, even though they are obviously untrue.
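To make "a model is a subset of the statistics space" concrete, here is a small sketch that uses compound symmetry (equal variances, equal covariances) as a hypothetical model for 3 × 3 covariance matrices; neither the model nor the helper names come from the text. The least-squares projection onto this set simply averages the diagonal and the off-diagonal elements, and the distance to the projection measures how far a matrix lies from the model.

```python
import math


def project_compound_symmetry(S):
    """Least-squares (Frobenius) projection of a symmetric matrix onto the
    compound-symmetry model {Sigma: equal variances, equal covariances}.
    The positive-semidefiniteness constraint is ignored in this sketch."""
    p = len(S)
    var = sum(S[i][i] for i in range(p)) / p
    cov = sum(S[i][j] for i in range(p) for j in range(p) if i != j) / (p * (p - 1))
    return [[var if i == j else cov for j in range(p)] for i in range(p)]


def distance_to_model(S):
    """Frobenius distance from S to its projection on the model."""
    P = project_compound_symmetry(S)
    p = len(S)
    return math.sqrt(sum((S[i][j] - P[i][j]) ** 2 for i in range(p) for j in range(p)))


S_in = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]   # lies in the model
S_out = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]  # does not
print(distance_to_model(S_in), distance_to_model(S_out))
```

A matrix inside the subset has distance zero; any other matrix has a positive distance, whether or not one believes the model is "true".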
4 The Role of Models in Statistics
4.1 Why Models?
Why are models useful, given that they are always false? There are many reasons; we mention only some important ones.
• Science is, presumably, cumulative. This means that we all stand, to use Newton's beautiful phrase, "on the shoulders of giants". It also means, fortunately, that we stand on top of a lot of miscellaneous stuff put together by thousands of midgets. If we want to study a scientific problem, we do this in its historical context; we do not start from scratch. This is one of the peculiar things about the social sciences. They do not seem to accumulate knowledge, there are very few giants, and every once in a while the midgets destroy the heaps. But ideally, the model incorporates the prior knowledge in the discipline.
• Models facilitate communication. They are languages that users in a particular field have to learn, and that they use to talk to each other efficiently. Regression analysis, path analysis, factor analysis, and survival analysis are all examples of this. There is an (unfortunate, I guess) tendency to narrow down the language even more, so that, for example, in the seventies LISREL became the language of choice for a large group of scientists in various disciplines. If you wanted to get your paper accepted, you had to talk LISREL or SPSS.
• Models enhance precision. This is the main reason for using models from the statistical point of view. If there is prior knowledge in a precise form, then it can be used to sharpen the tools. Although a very specialized tool can only be used in a limited number of situations, in those situations it works really well. If our model, i.e. the formalized theory about the relationship between the variables in our experiment, is very specific, then we can get very low standard errors and very high power from statistical techniques based on the model. There is, obviously, a downside. If we have a specialized tool and we want to use it in another situation, then we are in trouble: we are pulling out nails with tweezers, or mowing the lawn with an ax. If we have a tool that can be used in a great many situations, then it may not be very powerful. Think of the Swiss Army Knife, for instance. Again, the social and behavioral sciences are in an unfortunate situation here. Because there is no strong prior knowledge, there are no specialized tools, and thus there is not much power.
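The precision argument can be illustrated with a small simulation that anticipates the IQ example below. As a sketch, assume a normal population with mean 100 and standard deviation 15 (hypothetical values), and compare two estimators of the proportion above 140 from samples of 1,000: the model-free sample proportion and a plug-in estimate 1 − Φ((140 − x̄)/s) that relies on the normal model. When the model holds, the model-based estimator should vary noticeably less across replications. If the population were not normal, the same estimator could be badly biased, which is the downside of the specialized tool.

```python
import math
import random
import statistics


def normal_cdf(z):
    """Standard normal distribution function Phi, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def compare_estimators(mu=100.0, sigma=15.0, cutoff=140.0, n=1000, reps=400, seed=7):
    """Return the sampling SDs of the model-free and model-based estimates of P(X > cutoff)."""
    rng = random.Random(seed)
    count_est, model_est = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        # Model-free technique: the sample proportion above the cutoff.
        count_est.append(sum(1 for x in sample if x > cutoff) / n)
        # Model-based technique: plug the sample mean and SD into the normal tail.
        m = sum(sample) / n
        s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
        model_est.append(1.0 - normal_cdf((cutoff - m) / s))
    return statistics.stdev(count_est), statistics.stdev(model_est)


sd_count, sd_model = compare_estimators()
print(f"SD of sample proportion:     {sd_count:.5f}")
print(f"SD of normal-model estimate: {sd_model:.5f}")
```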
4.2 An Example
We give a simple example of the use of models. Suppose the Netherlands has N = 14,000,000 inhabitants. This is the population. We make a list of all these people, and we use a random number generator to select a sample of n = 1,000 of them. For simplicity, suppose we sample with replacement. We compute the number in our sample with an IQ larger than 140. Suppose there are m = 12. We now want to say something about the number M of individuals in the population with an IQ larger than 140. What can we say? Well, obviously M ≥ 12. But usually more specific statements are made, such as: we estimate M to be
$$\hat{M}_1 = \frac{14{,}000{,}000}{1{,}000} \times 12 = 168{,}000. \qquad (1)$$
This estimate is unbiased and has a standard error of about 48,000. Before we analyze what this means, let us look at two other statistical techniques that also illustrate the role of models. We assume that IQ is normal in the population with mean μ and standard deviation σ. This is our model. Again, it is obviously not "true". The population is finite, and thus at the very most our model is an approximation. The proportion of individuals in the population with an IQ of more than 140 is now
$$p = 1 - \Phi\left(\frac{140 - \mu}{\sigma}\right),$$
where Φ is the standard normal distribution function.
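The arithmetic behind equation (1) and the quoted standard error can be checked with a short sketch. It assumes simple random sampling with replacement, so that m is binomial with parameters n and M/N, and estimates the standard error as N·sqrt(p̂(1 − p̂)/n) with p̂ = m/n. The function for the normal-model estimate implements p = 1 − Φ((140 − μ)/σ) with Φ computed via `math.erf`; the function names are hypothetical, and no sample values are plugged in for μ and σ.

```python
import math


def counting_estimate(N, n, m):
    """M_hat = N * m / n and its estimated standard error, assuming
    sampling with replacement so that m ~ Binomial(n, M / N)."""
    p_hat = m / n
    M_hat = N * m / n
    se = N * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return M_hat, se


def normal_model_estimate(N, mu, sigma, cutoff=140.0):
    """N * p with p = 1 - Phi((cutoff - mu) / sigma), i.e. under the normal model."""
    z = (cutoff - mu) / sigma
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return N * (1.0 - phi)


M_hat, se = counting_estimate(N=14_000_000, n=1_000, m=12)
print(M_hat)      # 168000.0, as in equation (1)
print(round(se))  # roughly 48,200: the "about 48,000" quoted in the text
```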
If we do not know μ and σ, we have to estimate them first. Suppose we have IQ measurements for all 1,000 individuals in the sample. The mean turns out to be $\hat{\mu} = 101.35$ and the standard deviation D = drive, K = incentive, and H = habit strength. Hull's version then, explicated as a mapping sentence with "operational" terms (e.g. 'trial' rather than 'habit strength'), reads as follows:
Ingwer Borg
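For rat (r) in maze (x) it holds that
$$f_1\left(\begin{Bmatrix}\text{water}\\ \text{food}\end{Bmatrix} \text{ deprivation of } \begin{Bmatrix}t\ \text{min}\\ \vdots\\ 0\ \text{min}\end{Bmatrix}\right) \cdot f_2\left(\begin{Bmatrix}m\ \text{reward units}\\ \vdots\\ 0\ \text{reward units}\end{Bmatrix}\right) \cdot f_3\left(\begin{Bmatrix}n\ \text{trials}\\ \vdots\\ 0\ \text{trials}\end{Bmatrix}\right) \text{ in } M = f_4\{k\ \text{units of performance}\}$$
of approach behavior towards goal box, where $f_1, \ldots, f_4$ are strictly increasing real-valued functions, given $K_{\max}$.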
The side constraint t « ), then for a model $M_k$, $\lambda_k = T_k^0 = nF_k^0$, where $T_k^0$ is the value of $T_k$ obtained when $\Sigma_0$ substitutes $S$ in the discrepancy function $F$ that $T_k$ utilizes, and $F_k^0$ is the corresponding minimum of $F$ under $M_k$ obtained when $\Sigma(\theta)$ is fitted to $\Sigma_0$. Asymptotically, $\lambda_k = T_k = nF_k$. The CFI compares the noncentrality parameter $\lambda_k$ of a model $M_k$ with the noncentrality parameter $\lambda_i$ of the independence model $M_i$. It can thus be considered a measure of the comparative reduction in the degree of misspecification. The population value is defined as
$$\Delta = \frac{\lambda_i - \lambda_k}{\lambda_i}.$$
In the unnatural case that the sample sizes of $M_i$ and $M_k$ are unequal, we have to use
$$\Delta = 1 - \frac{F_k^0}{F_i^0}.$$
It is easy to verify that $\Delta_{ik} = \Delta_{ij} + \Delta_{jk}$, that is, that the increments are additive. The population value may be estimated by
$$\hat{\Delta} = \mathrm{CFI} = 1 - \frac{\max(\hat{\lambda}_k, 0)}{\max(\hat{\lambda}_i, \hat{\lambda}_k, 0)},$$
where $\hat{\lambda}_k = n d_k$, $\hat{\lambda}_i = n d_i$, and where $d_i$ and $d_k$ are the asymptotically unbiased estimates of $F_i^0$ and $F_k^0$, with $d_i = (T_i - df_i)/n$ and $d_k = (T_k - df_k)/n$. If this is the case, $d_k$ becomes negative. Goffin (1993) points out that RNI may be a less biased estimator of Δ, so that the choice between RNI and CFI might depend on whether one is more interested in a biased but more efficient estimator like the CFI or in a less biased but less efficient estimator like the RNI. He further remarks that the existence of the above inequality may indicate overfitting. As noted above, effort has been devoted to building sensible indices that not only evaluate model fit but also appropriately penalize excess model complexity. Recent research indicates that these parsimonious fit indexes are not yet ready for routine use (Williams & Holohan, 1994).
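The CFI can be computed directly from the two test statistics, because $\hat{\lambda} = n d = T - df$ and the sample size cancels. The sketch below does this and, for comparison, also returns the untruncated relative noncentrality index RNI = 1 − (T_k − df_k)/(T_i − df_i); that RNI formula is the usual definition in the literature, not one given in this passage. The function name and the fit statistics in the usage lines are hypothetical.

```python
def cfi_rni(t_k, df_k, t_i, df_i):
    """Comparative fit index (CFI) and relative noncentrality index (RNI).

    t_k, df_k: test statistic and degrees of freedom of the model M_k
    t_i, df_i: the same for the independence model M_i
    Since lambda_hat = n * d = T - df, the sample size n cancels out.
    """
    lam_k = t_k - df_k  # estimated noncentrality of M_k (may be negative)
    lam_i = t_i - df_i  # estimated noncentrality of M_i
    denom = max(lam_i, lam_k, 0.0)
    cfi = 1.0 if denom == 0.0 else 1.0 - max(lam_k, 0.0) / denom
    rni = 1.0 - lam_k / lam_i  # no truncation, so it can leave [0, 1]
    return cfi, rni


print(cfi_rni(60.0, 40, 500.0, 45))  # both about 0.956
print(cfi_rni(30.0, 40, 500.0, 45))  # CFI = 1.0, RNI about 1.022
```

The second call shows the case $T_k < df_k$: $d_k$ is negative, the CFI is truncated at 1, and the RNI exceeds 1.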
5 Model Modification
An initial model may be modified in steps until a final model is obtained that fits the data better. This process is often called specification search (MacCallum, 1986; MacCallum et al., 1992; Silvia & MacCallum, 1988). Model modification can be viewed as the successive detection of specification errors (Kaplan, 1988), each followed by a correction for misspecification, until an optimally specified model is reached. The logic underlying model modification is not unlike that discussed in connection with incremental fit indexes. At each step, a comparison between two nested models $M_j$ and $M_k$ has to be performed, where $M_j$ may be either more or less constrained than $M_k$.
Frank Faulbaum and Peter M. Bentler
Such a comparison may be done by applying a χ² difference test (D test), a Lagrange Multiplier test (LM test), or a Wald test (W test). Asymptotically, all three lead to the same results (Satorra, 1989). In addition, the estimated parameter change may be considered. The D test refers to the χ² difference of two models; this difference is χ² distributed with $df_j - df_k$ degrees of freedom ($M_j$ being the more constrained model). While the D test must be based on two separate model evaluations, the LM test and W test need only the information from a single evaluation. In its multivariate implementation, the LM test tests r simultaneous constraints. It computes an LM statistic which is asymptotically distributed as a χ² variate and can be used to find those parameters and constraints that should be freed in order to significantly improve the overall fit of the model under consideration. In its univariate form it corresponds to the modification indexes offered by the LISREL program. The multivariate W test can be used to test whether a set of free parameters can be simultaneously constrained. It computes a W statistic which is also asymptotically χ² distributed. The estimated parameter change is naturally associated with an LM test and can be used to help determine whether a parameter, when freed, would be large enough to be substantively important (Bentler, 1986; Bentler, 1989; Bentler & Chou, 1986). It has been argued that even if an LM test does not suggest statistically that a fixed parameter should be freed, a large Parameter Change statistic implies that the model is badly misspecified and the parameter should be freed for that reason. This is the point of view taken by Saris, Satorra and Sörbom (1987), who developed the concept (see also Kaplan, 1988; Luijben et al., 1988). However, the size of the Parameter Change can be affected by how variables and factors are scaled or identified, so the absolute size of this statistic is hard to interpret.
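The D test is simple to carry out once both models have been fitted. The sketch below assumes ML-type χ² statistics for two nested models and uses a closed-form χ² upper-tail probability for integer degrees of freedom, written with the standard library only so that no statistics package is assumed; all names and the fit statistics in the usage line are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variate with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sum_{k=1}^{(df-1)/2} exp(-x/2) (x/2)^(k-1/2) / Gamma(k + 1/2)
    total = math.erfc(math.sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half + (k - 0.5) * math.log(half) - math.lgamma(k + 0.5))
    return total


def chi2_difference_test(t_j, df_j, t_k, df_k):
    """D test for nested models; M_j is the more constrained model (df_j > df_k)."""
    d = t_j - t_k
    ddf = df_j - df_k
    if ddf <= 0:
        raise ValueError("M_j must have more degrees of freedom than M_k")
    return d, ddf, chi2_sf(d, ddf)


# Hypothetical fit results for two nested models.
d, ddf, p = chi2_difference_test(t_j=85.3, df_j=26, t_k=71.2, df_k=24)
print(f"D = {d:.1f} on {ddf} df, p = {p:.4f}")
```

With these made-up values the difference of 14.1 on 2 degrees of freedom gives p of roughly 0.0009, so the constraints of the more restricted model would be rejected.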
A partially standardized version, SEPC-K, was introduced by Kaplan (1989). Chou and Bentler (1993) developed a multivariate estimated parameter change, MEPC, together with a fully standardized version of it, which they call SMEPC. Many pitfalls, for example capitalizing on chance (MacCallum et al., 1992), may occur during the process of model modification. For instance, the process may be influenced by the initial model, and which constraints are introduced or released may vary from sample to sample. Chou and Bentler (1990) observed that the multivariate LM test may recommend too many parameters to be freed, a problem that could be addressed by subsequently applying a multivariate W test. In general, the initial model should be based on strong substantive theoretical arguments, and only those freely estimated parameters should be interpreted that substantially exceed the significance level. In this respect, overfitting beyond the significance level, as occasionally recommended to obtain an optimally specified model (Kaplan, 1988), may be dangerous. Moreover, simulation studies may be performed to get an overview of the sampling variations. It seems reasonable to assume that the fewer modification steps are taken, the fewer the opportunities for falling into these pitfalls. In this respect the multivariate LM and W tests, currently offered only by EQS, may have special merits, because they allow the modification process to end in fewer steps.
6 Concluding Remarks and Suggestions for Future Research
Some conclusions seem apparent. Steiger's (1990) statement that there can never be one best coefficient for assessing fit still seems valid, though some simulation studies seem to favor certain classes of indexes. The chi-square test statistic should not be the sole basis for determining model fit. In particular, failure of the variables to satisfy the distributional assumptions of the test statistic can lead to the rejection of correct models or the failure to reject incorrect models. Thus, test statistics based on the appropriate distributional assumptions should be used, although the ML chi-square seems to be rather robust under certain circumstances. Further, no single measure of overall fit should be relied on exclusively; different types of indexes should be combined, and additional exploratory analyses such as the screening of outliers (Bollen, 1989; Bollen & Arminger, 1991) should be performed.
Causal Modeling: Some Trends and Perspectives
Although, of course, the objective of fitting structural equation models is to understand a substantive area, not simply to obtain an adequate fit, and the resulting model should add to our substantive understanding (Bollen & Long, 1992, p. 129), this question evidently does not pertain to the proper area of statistics. The background theories needed for the interpretation of structural modeling results belong to what is commonly called metadata (Hand, 1992, 1994). The question of how to assist the interpretation of structural modeling results with knowledge-based systems, thus making computer-assisted interpretation and modification of models possible, might not be an obsolete one (Faulbaum, 1994). Suitable technology, for example interactive deductive databases, is already in use in other fields. In any case, methods of finding alternative models that transcend the normally used paradigms are welcome, though it may be difficult to judge which models can in principle already be discovered by applying the modification methods of CSM. In fact, Bentler (1989) gave an example of how nonstandard models can be specified by using the Bentler-Weeks model representation. The findings of Lee and Hershberger (1989) and Stelzl (1986) may also be helpful. Their rules may be viewed as model-generating rules.
Finally, programs for finding causal graphs, like TETRAD, may assist the user in finding empirically and theoretically attractive initial candidate models that already lie very near the final model, so that extensive statistically based modification is not needed. Returning to statistical analysis, we found some promising progress in robust estimation methods, which circumvent the drawbacks of ADF estimation. What is nevertheless needed are distribution-free estimators and tests that work well in small to medium-sized samples and that can be computed when the number of variables is very large (say, 50). Generally, methods for the analysis of very large models do not currently exist. Methods for large models (e.g., 200 variables, 30 factors, 500 parameters) that are correct, although not necessarily optimal (not ML), have yet to be developed. Still other uses of structural equation modeling, in addition to those discussed here, seem attractive, e.g., methods for multiple group modeling when group membership is not known, such as normal mixture distributions with people belonging to one group or another with certain probabilities (clustering under a structural model). Despite substantial progress in many fields of CSM, several helpful features are still not available in current computer programs, such as correct and efficient ways to deal with missing data, e.g., using the EM algorithm to obtain correct standard errors and chi-squares in such a case. Other procedures still to be developed include the ability to impose constraints of positive semidefiniteness on covariance matrices in a model, multiple comparison procedures to correct probability levels in models with multiple tests (see Cudeck & O'Dell, 1994, for an approach relevant to this issue), and, finally, completely graphical computer programs (e.g., diagram input, diagram output), perhaps with scanner input for immediate model running from a sketch.
References
Alwin, D.F. & Jackson, D.J. (1980). Measurement models for response errors in surveys: Issues and applications. In K. Schuessler (Ed.), Sociological Methodology 1980 (pp. 68-199). San Francisco: Jossey Bass.
Amemiya, Y. (1985). On the goodness-of-fit for linear structural relationships (Tech Rep No. 10). Stanford, CA: Stanford University, Econometric Workshop.
Amemiya, Y. & Anderson, T.W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. Annals of Statistics, 18, 1453-1463.
Anderson, J.C. & Gerbing, D.W. (1984). The effect of sampling error on convergence, improper solutions and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika, 49, 155-173.
Anderson, J.C. & Gerbing, D.W. (1988). Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin, 103, 411-423.
Anderson, T.W. & Amemiya, Y. (1988). The asymptotic normal distribution of estimators in factor analysis under general conditions. Annals of Statistics, 16, 759-771.
Arminger, G. (1994). Specification and estimation of non-standard mean and covariance structure models with MECOSA. In F. Faulbaum (Ed.), SoftStat '93. Advances in statistical software 4 (pp. 13-22). Stuttgart: Gustav Fischer.
Arminger, G. & Schoenberg, R.J. (1989). Pseudo maximum likelihood estimation and a test for misspecification in mean and covariance structure models. Psychometrika, 54, 409-425.
Balderjahn, I. (1989). Robustness of estimation methods against small sample size and nonnormality in confirmatory factor analysis models. In O. Opitz (Ed.), Conceptual and Numerical Analysis of Data (pp. 3-11). Berlin: Springer.
Baumrind, D. (1993). Specious causal attributions in the social sciences: The reformulated stepping-stone theory of heroin use as exemplar. Journal of Personality and Social Psychology, 45, 1289-1298.
Bearden, W.O., Sharma, S. & Teel, J.E. (1982). Sample size effects on chi-square and other statistics used in evaluating causal models. Journal of Marketing Research, 19, 425-430.
Bekker, P.A., Merckens, A. & Wansbeek, T.J. (1994). Identification, equivalent models, and computer algebra. Boston: Academic Press.
Bentler, P.M. (1980). Multivariate analysis with latent variables: Causal modeling. Annual Review of Psychology, 31, 419-456.
Bentler, P.M. (1983). Some contributions to efficient statistics for structural models: Specification and estimation of moment structure. Psychometrika, 48, 493-517.
Bentler, P.M. (1984). Theory and implementation of EQS: A structural equations program. Los Angeles: BMDP Statistical Software.
Bentler, P.M. (1986). Lagrange Multiplier and Wald tests for EQS and EQS/PC. Los Angeles: BMDP Statistical Software.
Bentler, P.M. (1989). EQS structural equations program manual. Los Angeles: BMDP Statistical Software.
Bentler, P.M. (1990a). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238-246.
Bentler, P.M. (1990b). Fit indexes, Lagrange multipliers, constraint changes and incomplete data in structural models. Multivariate Behavioral Research, 25, 163-172.
Bentler, P.M. & Berkane, M. (1986). The greatest lower bound to the elliptical theory kurtosis parameter. Biometrika, 73, 240-241.
Bentler, P.M. & Bonett, D.G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.
Bentler, P.M. & Chou, C.-P. (1986, April). Statistics for parameter expansion and contraction in structural models. Paper presented at American Educational Research Association meetings, Los Angeles.
Bentler, P.M. & Chou, C.-P. (1987). Practical issues in structural modeling. Sociological Methods & Research, 16, 78-117.
Bentler, P.M., Lee, S.-Y. & Weng, L.-J. (1987). Multiple population covariance structure analysis under arbitrary distribution theory. Communications in Statistics-Theory and Methods, 16, 1951-1964.
Bentler, P.M. & Mooijaart, A. (1989). Choice of structural model via parsimony: A rationale based on precision. Psychological Bulletin, 106, 315-317.
Bentler, P.M. & Weeks, D.G. (1980). Linear structural equations with latent variables. Psychometrika, 45, 283-308.
Bentler, P.M. & Wu, E.-J. (1993). EQS/Windows user's guide. Los Angeles: BMDP Statistical Software.
Blalock, H.M. (1964). Causal inferences in nonexperimental research. Chapel Hill: University of North Carolina Press.
Bohrnstedt, G.W., Möhler, P.Ph. & Müller, W. (Eds.). (1987). An empirical study of the reliability and stability of survey research items (Special issue). Sociological Methods & Research, 15 (3).
Bollen, K.A. (1987). Total, direct, and indirect effects in structural equation models. In C.C. Clogg (Ed.), Sociological Methodology 1987 (pp. 37-69). San Francisco: Jossey Bass.
Bollen, K.A. (1989a). Structural equations with latent variables. New York: Wiley.
Bollen, K.A. (1989b). A new incremental fit index for general structural equation models. Sociological Methods & Research, 17, 303-316.
Bollen, K.A. (1990a). Overall fit in covariance structure models: Two types of sample size effects. Psychological Bulletin, 107, 256-259.
Bollen, K.A. (1990b). Outlier screening and a distribution-free test for vanishing Tetrads. Sociological Methods & Research, 19, 80-92.
Bollen, K.A. & Arminger, G. (1991). Observational residuals in factor analysis and structural equation models. In P.V. Marsden (Ed.), Sociological Methodology 1991 (pp. 235-262). Washington, D.C.: American Sociological Association.
Bollen, K.A. & Long, J.S. (1992). Tests for structural equation models. Sociological Methods & Research, 21, 123-131.
Bollen, K.A. & Ting, K.-F. (1993). Confirmatory Tetrad analysis. In P.V. Marsden (Ed.), Sociological Methodology 1993 (pp. 147-175). Washington, D.C.: The American Sociological Association.
Bollen, K.A. & Long, J.S. (1993). Testing structural equation models. London: Sage.
Boomsma, A. (1982). The robustness of LISREL against small sample sizes in factor analysis models. In K.G. Jöreskog & H. Wold (Eds.), Systems under indirect observation, Part I (pp. 149-173). Amsterdam: North Holland.
Boomsma, A. (1983). On the robustness of LISREL (maximum likelihood estimation) against small sample size and nonnormality. Ph.D. Thesis, University of Groningen.
Boudon, R. (1979). Generating models as a research strategy. In R.K. Merton, J.S. Coleman & P.H. Rossi (Eds.), Qualitative and quantitative research (pp. 51-64). New York: The Free Press.
Browne, M.W. (1982). Covariance structures. In D.M. Hawkins (Ed.), Topics in applied multivariate analysis (pp. 72-141). London: Cambridge University Press.
Browne, M.W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62-83.
Browne, M.W. (1985, July). Robustness of normal theory tests of factor analysis and related models against nonnormally distributed factors. Paper presented at the Fourth European Meeting of the Psychometric Society and the Classification Societies, Cambridge, England.
Browne, M.W. (1987). Robustness of statistical inference in factor analysis and related models. Biometrika, 74, 375-384.
Browne, M.W. & Cudeck, R. (1989). Single sample cross validation indices for covariance structures. Multivariate Behavioral Research, 24, 445-455.
Browne, M.W. & Cudeck, R. (1992). Alternative ways of assessing model fit. Sociological Methods & Research, 21, 230-258.
Browne, M.W. & Mels, G. (1992). RAMONA user's guide. Columbus, OH: Ohio State University, Department of Psychology.
Browne, M.W. & Shapiro, A. (1988). Robustness of normal theory methods in the analysis of linear latent variate models. British Journal of Mathematical and Statistical Psychology, 41, 193-208.
Byrne, B.M. (1994). Structural equation modeling with EQS and EQS/Windows. London: Sage.
Chou, C.-P., Bentler, P.M. & Satorra, A. (1991). Scaled test statistics and robust standard errors for non-normal data in covariance structure analysis: A Monte Carlo study. British Journal of Mathematical and Statistical Psychology, 44, 347-357.
Chou, C.-P. & Bentler, P.M. (1990). Model modification in covariance structure modeling: A comparison among Likelihood Ratio, Lagrange Multiplier, and Wald Tests. Multivariate Behavioral Research, 25, 115-136.
Chou, C.-P. & Bentler, P.M. (1993). Invariant standardized estimated parameter change for model modification in covariance structure analysis. Multivariate Behavioral Research, 28, 97-110.
Cliff, N. (1983). Some cautions concerning the application of causal modeling methods. Multivariate Behavioral Research, 18, 115-126.
Cox, D.R. & Wermuth, N. (1993). Linear dependencies represented by chain graphs. Statistical Science, 8, 204-283.
Cudeck, R. & Browne, M.W. (1983). Cross-validation of covariance structures. Multivariate Behavioral Research, 18, 147-167.
Cudeck, R. & Henly, S.J. (1991). Model selection in covariance structures analysis and the "problem" of sample size: A clarification. Psychological Bulletin, 109, 512-519.
Cudeck, R. & O'Dell, L.L. (1994). Applications of standard error estimates in unrestricted factor analysis: Significance tests for factor loadings and correlations. Psychological Bulletin, 115, 475-487.
Dunn, G., Everitt, B. & Pickles, A. (1993). Modelling covariances and latent variables using EQS. London: Chapman & Hall.
Faulbaum, F. (1983). Konfirmatorische Analysen von Wichtigkeitseinstufungen beruflicher Merkmale [Confirmatory analyses of importance ratings of occupational characteristics]. ZUMA-Nachrichten, 13, 22-44.
Faulbaum, F. (1994). Kontextuelle Wissensbasen als Erweiterung der Datenanalyse [Contextual knowledge bases as an extension of data analysis]. In H. Best, B. Endres-Niggemeyer, M. Herfurth & H.-P. Ohly (Eds.), Informations- und Wissensverarbeitung in den Sozialwissenschaften [Information and knowledge processing in the social sciences] (pp. 185-205). Opladen: Westdeutscher Verlag.
Faulbaum, F. (in press). Zur Entdeckung von Einflußstrukturen: Die Rolle von TETRAD II im Prozeß der Modellspezifikation [On the discovery of influence structures: The role of TETRAD II in the process of model specification]. In R. Brandmaier & C. Rietz (Eds.), Methodische Grundlagen und Anwendungen von Strukturgleichungsmodellen [Methodological foundations and applications of structural equation models]. Stuttgart: Gustav Fischer.
Fornell, C. & Yi, Y. (1992). Assumptions of the two-step approach to latent variable modeling. Sociological Methods & Research, 20, 291-320.
Gerbing, D.W. & Anderson, J.C. (1992). Monte Carlo evaluations of goodness of fit indices of structural equation models. Sociological Methods & Research, 21, 132-160.
Goffin, R.D. (1993). A comparison of two new indices for the assessment of fit of structural equation models. Multivariate Behavioral Research, 28, 205-214.
Gollob, H.F. & Reichardt, C.S. (1987). Taking account of time lags in causal models. Child Development, 58, 80-92.
Hand, D.J. (1992). Microdata, macrodata and metadata. In Y. Dodge & J. Whittaker (Eds.), Computational Statistics. Volume 2 (pp. 325-340). Heidelberg: Physica Verlag.
Hand, D.J. (1994). The impact of artificial intelligence on statistical data analysis. In F. Faulbaum (Ed.), SoftStat '93. Advances in statistical software 4 (pp. 3-10). Stuttgart: Fischer.
Harlow, L.L. (1985). Behavior of some elliptical theory estimators with nonnormal data in a covariance structure framework: A Monte Carlo study. Ph.D. Thesis, University of California, Los Angeles.
Hartmann, W. (1989). Proc CALIS: Analysis of covariance structures. In F. Faulbaum, R. Haux & K.H. Jöckel (Eds.), SoftStat '89. Fortschritte der Statistik-Software 2 [Advances in statistical software 2] (pp. 74-81). Stuttgart: Gustav Fischer.
Hausman, J. (1978). Specification tests in econometrics. Econometrica, 46, 1251-1272.
Heise, D. (1975). Causal analysis. New York: Wiley.
Henly, S.J. (1993). Robustness of some estimators for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 46, 313-338.
Hoelter, J.W. (1983). The analysis of covariance structures: Goodness-of-fit indices. Sociological Methods & Research, 11, 325-344.
Holland, P.W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945-970.
Holland, P.W. (1988). Causal inference, path analysis, and recursive structural equation models. In C.C. Clogg (Ed.), Sociological Methodology 1988 (pp. 449-484). San Francisco: Jossey Bass.
Hu, L.-T. & Bentler, P.M. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351-362.
James, L.R., Mulaik, S.A. & Brett, J.M. (1982). Causal analysis: Assumptions, models, and data. Beverly Hills: Sage.
Jöreskog, K.G. (1967). Some contributions to maximum likelihood factor analysis. Psychometrika, 32, 443-482.
Jöreskog, K.G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34, 183-202.
Jöreskog, K.G. (1970). A general method for analysis of covariance structures. Biometrika, 57, 239-251.
Jöreskog, K.G. (1971). Statistical analysis of sets of congeneric tests. Psychometrika, 36, 109-133.
Jöreskog, K.G. (1977). Structural equation models in the social sciences: Specification, estimation and testing. In P.R. Krishnaiah (Ed.), Applications in statistics (pp. 265-287). Amsterdam: North-Holland.
Jöreskog, K.G. (1978). Structural analysis of covariance and correlation matrices. Psychometrika, 43, 443-477.
Jöreskog, K.G. (1981). Statistical models for longitudinal studies. In F. Schulsinger, S.A. Mednick & J. Knop (Eds.), Longitudinal research (pp. 118-124). London: Martinus Nijhoff.
Frank Faulbaum and Peter M. Bentler
Jöreskog, K.G. & Sörbom, D. (1988). LISREL 7: A guide to the program and applications. Chicago: SPSS Inc.
Jöreskog, K.G. & Sörbom, D. (1993). LISREL 8: Structural equation modeling with the SIMPLIS command language. Chicago: Scientific Software International.
Kano, Y. (1990). A simple adjustment of the normal theory inference for a wide class of distribution in linear latent variate models. Technical Report, University of Osaka Prefecture, Osaka, Japan, Department of Mathematical Sciences, College of Engineering.
Kano, Y., Berkane, M. & Bentler, P.M. (1990). Covariance structure analysis with heterogeneous kurtosis parameters. Biometrika, 77, 575-585.
Kaplan, D. (1988). The impact of specification error on the estimation, testing and improvement of structural equation models. Multivariate Behavioral Research, 23, 69-86.
Kaplan, D. (1989). Model modification in covariance structure analysis: Application of the expected parameter change statistic. Multivariate Behavioral Research, 24, 285-305.
Kaplan, D. (1991). The behavior of three weighted least squares estimators for structured means analysis with non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 44, 333-346.
Kenny, D.A. (1979). Correlation and causality. New York: Wiley.
LaDu, T.J. & Tanaka, J.S. (1989). The influence of sample size, estimation method, and model specification on goodness-of-fit assessments in structural equation models. Journal of Applied Psychology, 74, 625-635.
Lee, S.-Y., Poon, W.-Y. & Bentler, P.M. (1990). A three stage estimation procedure for structural equation models. Psychometrika, 55, 45-51.
Lee, S.-Y. & Poon, W.-Y. (1992a). Structural equation models with continuous and polytomous variables. Psychometrika, 57, 89-105.
Lee, S.-Y. & Poon, W.-Y. (1992b). Two-level analysis of covariance structure for unbalanced designs with small level-one samples. British Journal of Mathematical and Statistical Psychology, 45, 109-123.
Lee, S.-Y. & Tsui, K.-L. (1982). Covariance structure analysis in several populations. Psychometrika, 47, 297-308.
Lee, S. & Hershberger, S. (1990). A simple rule for generating equivalent models in covariance structure modeling. Multivariate Behavioral Research, 25, 313-334.
Lohmöller, J.-B. (1989). Latent variable path modeling with partial least squares. Heidelberg: Physica Verlag.
Long, S.J. & Trivedi, P.K. (1992). Some specification tests for the linear regression model. Sociological Methods & Research, 21, 161-204.
Luijben, T.C., Boomsma, A. & Molenaar, I.W. (1988). Modification of factor analysis models in covariance structure analysis: A Monte Carlo study. In T.K. Dijkstra (Ed.), On model uncertainty and its statistical implications (pp. 70-101). New York: Springer.
MacCallum, R.C. (1986). Specification searches in covariance structure modeling. Psychological Bulletin, 100, 107-120.
MacCallum, R.C. & Tucker, L.R. (1991). Representing sources of error in the common-factor model: Implications for theory and practice. Psychological Bulletin, 109, 502-511.
MacCallum, R.C., Roznowski, M. & Necowitz, L.B. (1992). Model modifications in covariance structure analysis: The problem of capitalization on chance. Psychological Bulletin, 111, 490-504.
MacCallum, R.C. & Browne, M.W. (1993). The use of causal indicators in covariance structure models: Some practical issues. Psychological Bulletin, 114, 533-541.
Causal Modeling: Some Trends and Perspectives
MacCallum, R.C., Wegener, D.T., Uchino, B.N. & Fabrigar, L.R. (1993). The problem of equivalent models in applications of covariance structure analysis. Psychological Bulletin, 114, 185-199.
MacCallum, R.C., Roznowski, M., Mar, C.M. & Reith, J.V. (1994). Alternative strategies for cross-validation of covariance structure models. Multivariate Behavioral Research, 29, 1-32.
Marini, M.M. & Singer, B. (1988). Causality in the social sciences. In C.C. Clogg (Ed.), Sociological Methodology 1988 (pp. 347-409). San Francisco: Jossey Bass.
Marsh, H.W., Balla, J.R. & McDonald, R.P. (1988). Goodness-of-fit indexes in confirmatory factor analysis: The effect of sample size. Psychological Bulletin, 103, 391-410.
McArdle, J.J. & Cattell, R.B. (1994). Structural equation models of factorial invariance in parallel proportional profiles and oblique confactor problems. Multivariate Behavioral Research, 29, 63-113.
McArdle, J.J. & McDonald, R.P. (1984). Some algebraic properties of the Reticular Action Model for moment structures. British Journal of Mathematical and Statistical Psychology, 37, 234-251.
McDonald, R.P. (1980). A simple comprehensive model for the analysis of covariance structures: Some remarks on applications. British Journal of Mathematical and Statistical Psychology, 33, 161-183.
McDonald, R.P. (1988). An index of goodness-of-fit based on noncentrality. Journal of Classification, 5, 97-103.
McDonald, R.P. & Marsh, H.W. (1990). Choosing a multivariate model: Noncentrality and goodness of fit. Psychological Bulletin, 107, 247-255.
Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58, 525-543.
Mooijaart, A. & Bentler, P.M. (1991). Robustness of normal theory statistics in structural equation models. Statistica Neerlandica, 45, 159-172.
Muirhead, R.J. (1982). Aspects of multivariate statistical theory. New York: Wiley.
Mulaik, S.A. (1987). Toward a conception of causality applicable to experimentation and causal modeling. Child Development, 58, 19-32.
Mulaik, S.A. (1993). Objectivity and multivariate statistics. Multivariate Behavioral Research, 28, 171-203.
Mulaik, S.A., James, L.R., Van Alstine, J., Bennett, N., Lind, S. & Stillwell, C.D. (1989). Evaluation of goodness-of-fit indices for structural equation models. Psychological Bulletin, 105, 430-445.
Muthén, B.O. (1988). LISCOMP: Analysis of linear structural equations with a comprehensive measurement model (2nd edition). Mooresville: Scientific Software.
Muthén, B.O. (1989a). Multiple-group structural modelling with non-normal continuous variables. British Journal of Mathematical and Statistical Psychology, 42, 55-62.
Muthén, B.O. (1989b). Latent variable modeling in heterogeneous populations. Psychometrika, 54, 557-585.
Muthén, B.O. (1994). Multilevel covariance structure analysis. Sociological Methods & Research, 22, 376-398.
Muthén, B.O. & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189.
Muthén, B.O. & Kaplan, D. (1992). A comparison of some methodologies for the factor analysis of non-normal Likert variables: A note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45, 19-30.
Neale, M.C. (1991). Mx: Statistical modeling. Richmond, VA: McNeale.
Pearl, J. & Verma, T. (1991). A theory of inferred causation. In J.A. Allen, R. Fikes & E. Sandewall (Eds.), Proceedings of the Second International Conference on the Principles of Knowledge Representation and Reasoning (pp. 441-452). San Mateo, CA: Morgan Kaufmann.
Poon, W.-Y. & Lee, S.-Y. (1991). A distribution free approach for analysis of two-level structural equation model. Computational Statistics & Data Analysis, 17, 265-275.
Poon, W.-Y. & Lee, S.-Y. (1992). Statistical analysis of continuous and polytomous variables in several populations. British Journal of Mathematical and Statistical Psychology, 45, 139-149.
Potthast, M.J. (1993). Confirmatory factor analysis of ordered categorical variables. British Journal of Mathematical and Statistical Psychology, 46, 273-286.
Salmon, W.C. (1984). Scientific explanation and the causal structure of the world. Princeton: Princeton University Press.
Saris, W.E., Satorra, A. & Sörbom, D. (1987). The detection and correction of specification errors in structural equation models. In C.C. Clogg (Ed.), Sociological Methodology 1987 (pp. 105-129). San Francisco: Jossey Bass.
SAS Institute, Inc. (1990). SAS/STAT user's guide. Volume 1. Cary, NC: SAS Institute Inc.
Satorra, A. (1989). Alternative test criteria in covariance structure analysis: A unified approach. Psychometrika, 54, 131-151.
Satorra, A. & Bentler, P.M. (1988a). Scaling corrections for chi-square statistics in covariance structure analysis. American Statistical Association 1988 proceedings of the Business and Economics Sections (pp. 308-313). Alexandria, VA: American Statistical Association.
Satorra, A. & Bentler, P.M. (1988b). Scaling corrections for statistics in covariance structure analysis (UCLA Statistics Series 2). Los Angeles: University of California, Department of Psychology.
Satorra, A. & Bentler, P.M. (1990). Model conditions for asymptotic robustness in the analysis of linear relations. Computational Statistics & Data Analysis, 10, 235-249.
Satorra, A. & Bentler, P.M. (1991). Goodness-of-fit test under IV estimations: Asymptotic robustness of a NT test statistic. In R. Gutierrez & M.J. Valderrama (Eds.), Applied stochastic models and data analysis (pp. 555-567). Singapore: World Scientific.
Satorra, A. & Bentler, P.M. (in press). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C.C. Clogg (Eds.), Analysis of latent variables in developmental research. London: Sage.
Schepers, A. & Arminger, G. (1992). MECOSA user guide: A program for the analysis of mean and covariance structures with non-metric dependent variables. Frauenfeld: SLIAG.
Schoenberg, R. & Arminger, G. (1990). LINCS 2.0: Linear covariance structures. A computer program for the analysis of linear models incorporating measurement error disturbances as well as structural disturbances. Kent, WA: RJS Software.
Shapiro, A. (1984). A note on the consistency of estimators in the analysis of moment structures. British Journal of Mathematical and Statistical Psychology, 37, 84-88.
Shapiro, A. (1987). Robustness properties of the MDF analysis of moment structures. South African Statistical Journal, 21, 39-62.
Shapiro, A. & Browne, M.W. (1987). Analysis of covariance structures under elliptical distributions. Journal of the American Statistical Association, 82, 1092-1097.
Silvia, E.S. & MacCallum, R.C. (1988). Some factors affecting the success of specification searches in covariance structure modeling. Multivariate Behavioral Research, 23, 297-326.
Simon, H.A. (1979). The logic of causal ordering. In R.K. Merton, J.S. Coleman & P.H. Rossi (Eds.), Qualitative and quantitative social research (pp. 65-81). New York: Free Press.
Spirtes, P., Glymour, C. & Scheines, R. (1993). Causation, prediction and search. Berlin: Springer.
Spirtes, P., Scheines, R., Glymour, C. & Meek, C. (1993). TETRAD II: Tools for discovery. Pittsburgh: Spirtes, Scheines, Glymour and Meek.
Sörbom, D. (1976). A statistical model for the measurement of change. In D.N.M. de Gruijter & J.L.T. van der Kamp (Eds.), Advances in psychological and educational measurement (pp. 159-169). New York: Wiley.
Steiger, J.H. (1990). Structural model evaluation and modification: An interval estimation approach. Multivariate Behavioral Research, 25, 173-180.
Steiger, J.H. (1994). SEPATH: A Statistica for Windows structural equations modeling program. In F. Faulbaum (Ed.), SoftStat '93. Advances in statistical software 4 (pp. 99-105). Stuttgart: Gustav Fischer.
Steiger, J.H. & Lind, J. (1980). Statistically based tests for the number of common factors. Paper presented at the Annual Meeting of the Psychometric Society, Iowa City.
Stelzl, I. (1986). Changing a causal hypothesis without changing the fit: Some rules for generating equivalent path models. Multivariate Behavioral Research, 21, 309-331.
Tanaka, J.S. (1984). Some results on the estimation of covariance structure models. Unpublished doctoral dissertation, University of California, Los Angeles.
Tucker, L.R. & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 1-10.
Wermuth, N. & Lauritzen, S. (1990). On substantive research hypotheses, conditional independence graphs and graphical chain models. Journal of the Royal Statistical Society, Series B, 52, 21-50.
Wheaton, B. (1987). Assessment of fit in overidentified models with latent variables. Sociological Methods & Research, 16, 118-154.
Whittaker, J. (1990). Graphical models in applied multivariate statistics. New York: Wiley.
Williams, L.J. & Holahan, P.J. (1994). Parsimony-based fit indices for multiple-indicator models: Do they work? Structural Equation Modeling, 1, 161-189.
Attitude Theory and Measurement: Implications for Survey Research
Icek Ajzen and Dagmar Krebs
1
Historical Perspective
Scientific progress is facilitated by advances in scientific methods, advances that enable us to measure phenomena of interest with greater reliability and precision. A good case in point is the development of sophisticated attitude scaling techniques in the three decades preceding the 1960s. Armed with these methods, survey researchers set out to assess a profusion of attitudes with respect to a wide range of social issues (see the collections of attitude scales in Robinson, Shaver, & Wrightsman, 1991, and Shaw & Wright, 1967). At the same time, social psychologists, encouraged by the enhanced scientific status of the attitude construct, embarked on an ambitious program of experimental research designed to develop and test theories of attitude structure, formation, and change (see Eagly & Chaiken, 1993, for an up-to-date overview). For the most part, the work of survey researchers and experimenters proceeded along parallel lines, with little concern for the interdependence of attitude theory and measurement. Efforts to link attitude measurement in an integral fashion to the requirements of theory arose largely in response to the frequently observed failure of attitudes to predict actual behavior. Although disappointing empirical findings concerning the attitude-behavior relation had already started to appear in the 1930s (e.g., Corey, 1937; LaPierre, 1934), the greatest challenge to the prevailing conceptualization and measurement of attitude was posed by a review of the empirical literature in the late 1960s, which revealed little if any relation between verbal attitudes and overt actions (Wicker, 1969). The poor predictive validity of existing attitude measures stimulated the development of theories concerning the attitude-behavior relation and, at the same time, made researchers aware of the need to adapt their attitude measures to the requirements of these newly developed theories.
The present chapter examines the measurement implications of several attitude conceptualizations, as well as the ramifications of these conceptualizations for the prediction of behavior. Consideration of early unidimensional definitions of attitude is followed by a discussion of the tripartite model, which views attitudes as composed of cognitive, affective, and conative components. Measurement procedures appropriate to these conceptions are reviewed, and the possibility of behavioral prediction is addressed. We then turn to contemporary theoretical perspectives, especially the theories of reasoned action and planned behavior, and discuss their implications for attitude measurement and the prediction of behavior.
2
Unidimensional Conceptions of Attitude
Attitude measurement procedures are closely tied to the definition of attitude, but not all definitions have clear measurement implications. Consider, for example, Allport's (1935) well-known definition of attitude as "... a mental and neural state of readiness, organized through experience, exerting a directive or dynamic influence upon the individual's response to all objects and situations with which it is related" (p. 810). While holding out the promise that attitudes can predict any behavior with respect to the attitude object, this definition provides no guidelines for operationalizing its central feature, the "mental and neural state of readiness." It is perhaps for this reason that no attitude scaling techniques conforming to Allport's definition have ever been developed. By way of contrast, Thurstone's (1931) definition of attitude as affect for or against a psychological object, although vilified by Allport (1935) and others as overly simplistic, opened the way for the development of widely used attitude scaling methods. Well aware of the complexity of people's social attitudes, Thurstone nevertheless recognized that all measurement involves abstraction and has to restrict itself to clearly defined continua along which a construct can be quantified. For Thurstone, the essential property of attitude was the evaluation or affect with respect to an object, thus defining the relevant underlying continuum as ranging from positive to negative, favorable to unfavorable, liking to disliking. A variety of alternative definitions of attitude have been offered over the years (see Fishbein & Ajzen, 1975; McGuire, 1985 for reviews), but most contemporary social psychologists would agree with Thurstone that the characteristic attribute of attitude is its evaluative dimension. This consensus is reflected in a recent textbook definition of attitude as "... 
a psychological tendency that is expressed by evaluating a particular entity with some degree of favor or disfavor" (Eagly & Chaiken, 1993, p. 1).
2.1
Measures of Unidimensional Attitudes
Attitude scales are generally designed to produce, for each respondent, a score that places the individual on a certain point along the unidimensional evaluative continuum, from extremely negative, through neutral, to extremely positive. Many methods are available to accomplish this aim (see, e.g., Edwards, 1957; Fishbein & Ajzen, 1975; Green, 1954; Himmelfarb, 1993, for general overviews). The present discussion briefly considers some of the common procedures and basic principles, and examines their implications for the attitude-behavior relation.
2.1.1
Direct Measures
The simplest procedure for placing individuals on a point along an evaluative dimension is to ask them to report directly on their own attitudes. To assess attitudes toward legal abortion, for example, respondents could be asked to circle a number on the following scale:

I favor legal abortion  : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 :  I oppose legal abortion
For many purposes, measures of this kind are quite adequate. Their main drawback is the high degree of random measurement error often associated with single responses. For this reason, it is usually preferable to use a multi-item measure, even when assessing attitudes directly. By far the most popular multi-item direct measurement procedure is the semantic differential
developed by Charles Osgood and his associates (Osgood, Suci, & Tannenbaum, 1957). It consists of a set of bipolar adjective pairs selected for their evaluative content by means of factor analysis. The adjectives are placed on opposite ends of 7-place scales, and respondents are asked to evaluate the attitude object by placing a checkmark on each scale, as in the following example:

Legal abortion is
harmful     :___:___:___:___:___:___:___: beneficial
good        :___:___:___:___:___:___:___: bad
desirable   :___:___:___:___:___:___:___: undesirable
unpleasant  :___:___:___:___:___:___:___: pleasant
nice        :___:___:___:___:___:___:___: awful

Responses are scored from -3 on the negative side of each scale to +3 on the positive side, and the attitude toward legal abortion is computed by summing the five item scores. Easily constructed, the semantic differential has proved very popular in attitude research, especially in the context of laboratory experiments. Its major disadvantage, shared by other direct assessment procedures, is that it may elicit relatively superficial evaluative responses. Because they do not require thorough examination of the different aspects involved in a complex issue such as legal abortion, responses to direct attitudinal probes can easily be biased by the particular wording of the issue or by temporarily salient situational cues. In addition, direct measures do not permit us to explore and map the contents of an attitudinal domain.

2.1.2
Indirect Measures
Many limitations of direct measures can be overcome by adopting an indirect measurement procedure. Such procedures give respondents the opportunity to consider different aspects of an attitudinal domain, and their responses to a set of specific questions are used to infer their attitudes. In Thurstone scaling (Thurstone, 1931; Thurstone & Chave, 1929), a sample of judges is used to establish the evaluative implication, i.e., the scale value, of each of a large set of items. Evaluatively unambiguous items are selected for inclusion in the attitude scale.
Respondents are asked to check the statements with which they agree, and their attitude scores are computed by averaging the scale values of the endorsed items. Table 1 shows a few statements and their scale values, taken from a Thurstone scale that was developed to assess attitudes toward capital punishment (Peterson & Thurstone, 1933). Although it is sometimes useful to know the location of each item on the attitude continuum, using an independent sample of judges to establish item scale values is costly and time consuming. For this reason, many survey researchers prefer Likert scaling (Likert, 1932). In Likert's method of summated ratings, items constructed by the investigator are classified as positive or negative with respect to the attitude object, and responses to these items are assessed on a 5-point multiple-choice agreement scale, as shown in Table 2. Responses are scored from 1 to 5, with reverse scoring for negative items, and the sum over all items on the scale is used as a preliminary attitude score. The final attitude scale is constructed by selecting items that display high correlations with the preliminary attitude score. The signs of the item-total correlations can also be used to verify the initial classifications of items as positive or negative.
Scale value   Item
0.0           Capital punishment is absolutely never justified.
2.4           Capital punishment cannot be regarded as a sane method to deal with crime.
5.5           It doesn't make any difference to me whether we have capital punishment or not.
8.4           We must have capital punishment for some crimes.
11.0          Every criminal should be executed.
Table 1. Sample items and scale values from a Thurstone scale designed to measure attitudes toward capital punishment (Source: Peterson & Thurstone, 1933)
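Thurstone scoring of the Table 1 items can be sketched as the mean scale value of the statements a respondent agrees with. The dictionary keys below are shorthand labels introduced only for this sketch, and the endorsement patterns in the usage lines are hypothetical. An actual Thurstone scale contains many more statements than these five samples.

```python
from statistics import mean

# Scale values of the five sample statements in Table 1 (Peterson & Thurstone, 1933).
# The short keys are labels invented for this sketch; higher values are more
# favorable toward capital punishment.
SCALE_VALUES = {
    "absolutely never justified": 0.0,
    "not a sane method": 2.4,
    "makes no difference to me": 5.5,
    "needed for some crimes": 8.4,
    "every criminal executed": 11.0,
}


def thurstone_score(endorsed, scale_values=SCALE_VALUES):
    """Attitude score = mean scale value of the statements the respondent agrees with."""
    if not endorsed:
        raise ValueError("the respondent endorsed no statements")
    return mean(scale_values[item] for item in endorsed)


if __name__ == "__main__":
    print(thurstone_score(["makes no difference to me", "needed for some crimes"]))  # 6.95
    print(thurstone_score(["absolutely never justified", "not a sane method"]))  # 1.2
```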
The statements below describe attitudes toward the role of women in society that different people have. There are no right or wrong answers, only opinions. You are asked to express your feelings about each statement by choosing one response between A and E indicating best your own attitude: (A) strongly agree; (B) agree; (C) undecided; (D) disagree or (E) strongly disagree. Please indicate your opinion by blackening either A, B, C, D or E on the answer sheet for each item.
1. Swearing and obscenity are more repulsive in the speech of a woman than of a man.
2. Women should worry less about their rights and more about becoming good wives and mothers.
3. Under modern economic conditions with women being active outside the home, men should share in household tasks such as washing dishes and doing the laundry.
4. Women should take increasing responsibility for leadership in solving the intellectual and social problems of the day.
5. Intoxication among women is worse than intoxication among men.
Table 2. Examples of Likert items (with instructions) from the Attitudes toward Women Scale (Source: Spence, Helmreich & Stapp, 1973)
2.2
Prediction of Behavior
Although the unidimensional scaling procedures described above can be used to measure attitudes with respect to virtually any object, behavior, or event, until fairly recently most surveys dealt with attitudes toward broad social issues, such as religion, capital punishment, racial discrimination, and political liberalism/conservatism. Consistent with Allport's definition of attitude, and with the views of most other theorists, such global attitudes were assumed to predict and explain the various behaviors people display with respect to the attitude object. The weak and inconsistent empirical relations between attitudes and behavior that became apparent in the 1960s thus came as a great surprise, and they threatened to undermine confidence in the utility of the attitude construct. To be sure, the attitudes people express on important social issues can
be of interest in their own right, as can comparisons among the attitudes of different segments of the population. Nevertheless, it had been an article of faith that these attitudes are not merely verbal expressions of a position on an issue, but that they are significant determinants of behavior toward the attitude object. Thus, religious people were expected to express their religiosity in various ways, and prejudiced individuals were expected to display more discriminatory behavior than nonprejudiced individuals. Counterintuitive though it may seem, the lack of a strong relation between attitudes and behavior was anticipated by Thurstone (1931), who wrote that two individuals could have identical attitude scores but "... their associations about the psychological object might be entirely different and ... their overt actions might take quite different forms" (p. 236). It is only in their totality that the associations or beliefs about an object imply an attitude of a certain favorability, and it is only in their totality that overt behaviors show the influence of the attitudinal disposition. This idea is captured in the principle of aggregation (Ajzen, 1988; Fishbein & Ajzen, 1974). Any single behavior is determined by a multitude of factors besides attitudes. We can assume, however, that whereas attitudes have some effect on all relevant behaviors, these other factors differ from one behavior to another. When we aggregate over a representative sample of behaviors in the attitudinal domain, the influences of these other, "contaminating" factors tend to cancel out, leaving the consistent impact of the attitude. In support of the aggregation principle, empirical research over the past 20 years has shown that although general attitudes do not predict specific actions, they correlate very well with measures of behavior that are based on multiple acts with respect to the attitude object (for reviews, see Ajzen, 1988; Ajzen & Fishbein, 1977; Eagly & Chaiken, 1993; Kraus, 1994).
2.2.1
Attitude Accessibility
Despite these findings, the search has continued for conditions under which general attitudes will be either poor or good predictors of specific behaviors. A large number of potential moderating variables have been suggested, including personality characteristics, involvement, amount of knowledge, and the internal consistency of the attitude (see Ajzen, 1988, for a review). The most comprehensive framework for understanding the operation of moderating variables, however, has been offered by Fazio (1986, 1990). According to Fazio's analysis, attitudes, and especially general attitudes, often fail to predict specific behaviors because they are not sufficiently accessible to be automatically activated. Fazio's (1990) MODE model distinguishes between a deliberative or controlled mode of information processing and spontaneous or automatic processing, usually defined as processing that occurs without cognitive effort (cf. Bargh, 1984). Attitudes are assumed to guide behavior in a spontaneous fashion when people are either not sufficiently motivated to engage in extensive deliberation or incapable of doing so. To guide behavior spontaneously, however, attitudes have to be automatically activated in the presence of the attitude object. Measures of attitude will fail to predict behavior when the behavior is performed under circumstances that favor neither the deliberative construction or recall of the relevant attitude nor its automatic activation. The spontaneous mode of attitude-behavior influence is said to involve a sequence of steps. At the point of behavior, the attitude must be automatically activated, thus becoming available to guide action. Automatic activation, in turn, depends on attitude strength, because strong attitudes, with a well-established association between object and evaluation, are assumed to be chronically accessible in memory.
Once activated, the attitude is said to influence perception or construction of the situation and thus to produce attitude-consistent behavior.
Studies using response latency as an indicator of attitude accessibility have shown that attitudes with relatively fast response times predict behavior better than attitudes with relatively slow response times (Fazio, Powell, & Williams, 1989; Fazio & Williams, 1986). In addition, variables likely to influence attitude accessibility, such as direct experience with the attitude object and repeated expressions of the attitude, have been shown to improve attitudinal prediction of behavior (Fazio, Chen, McDonel, & Sherman, 1982; Houston & Fazio, 1989; Powell & Fazio, 1984; Regan & Fazio, 1977). Recent analyses and empirical studies, however, have questioned the proposition that only highly accessible attitudes are automatically activated (Bargh, Chaiken, Govender, & Pratto, 1992), and have tried to explain the moderating effects of such variables as direct experience in terms of attitude stability rather than accessibility (Doll & Ajzen, 1992).
2.3
The Expectancy-Value Model of Attitude
Thurstone and Likert scaling are based on the assumption that people's attitudes, i.e., their favorability with respect to a given issue, can be inferred from their agreement or disagreement with opinion or belief statements concerning the issue in question. This idea is formalized and developed in what is probably the most widely accepted theory of attitude in social psychology today, the expectancy-value (EV) model (see Feather, 1982). One of the most complete statements of this model can be found in Fishbein's (1963, 1967a) summation theory of attitude, although somewhat narrower versions were proposed earlier by Peak (1955) and Rosenberg (1956). In Fishbein's theory, people's evaluations of, or attitudes toward, an object are determined by their salient beliefs about the object, where a belief is defined as the subjective probability that the object has a certain attribute (Fishbein & Ajzen, 1975). The terms "object" and "attribute" are used in the generic sense and they refer to any discriminable aspect of an individual's world. For example, a person may believe that there is a 75% chance that making abortion illegal will result in the birth of unwanted children. The belief object, "making abortion illegal," is linked to the attribute, "birth of unwanted children," with a subjective probability of 0.75. Each belief thus associates the attitude object with a certain attribute. According to the EV model, a person's overall attitude is determined by the subjective values or evaluations of the attributes associated with the attitude object and by the strength of these associations. Specifically, the evaluation of each attribute contributes to the attitude in direct proportion to the person's subjective probability that the object has the attribute in question. The basic structure of the model is shown as follows,
$$A \propto \sum_{i=1}^{n} b_i e_i$$

where $A$ is the attitude toward some object, $b_i$ is the strength of the belief (the subjective probability) that the object has attribute $i$, $e_i$ is the evaluation of attribute $i$, and $n$ is the number of beliefs associated with the object. People can, of course, form many different beliefs about an object, but it is assumed that they can attend to only a relatively small number (perhaps 5 to 9) at any given moment. It is these salient beliefs that are considered to be the prevailing determinants of a person's attitude. Based on the EV model, a measure of attitude can be obtained by eliciting salient beliefs, assessing the strength and attribute evaluation for each salient belief, computing the products of belief
Icek Ajzen and Dagmar Krebs
strength and attribute evaluation, and summing the products across all salient beliefs. A large number of empirical studies have provided support for the basic structure of the EV model of attitude (see Eagly & Chaiken, 1993; Fishbein & Ajzen, 1975).

2.3.1 Implications for Survey Research

Thurstone and Likert scales can be used not only to assess people's locations on the evaluative continuum vis-à-vis the attitude object but also to map the attitudinal domain in terms of broad areas of concern. The expectancy-value model goes a step further in that it attempts to get at the underlying determinants of attitudes. Agreement or disagreement with belief statements formulated by the investigator can be used to infer attitudes, but such statements do not necessarily constitute the determinants of those attitudes. Salient beliefs, in contrast, are assumed to provide information about the important considerations that produce the observed attitudes. In work with the EV model of attitude, it has been recommended practice to conduct a pilot study in which salient beliefs about the attitude object are elicited in a free-response format (see Ajzen & Fishbein, 1980). For example, when the object in question is a person or group of people, salient beliefs are elicited by asking respondents to list the characteristics, qualities, and attributes of the person or group. Similarly, when the attitude object is a behavior, such as mountain climbing, respondents can be asked to list the advantages and disadvantages of mountain climbing, and anything else that comes to mind when they think about engaging in this activity. The particular beliefs listed by a given respondent are considered to be that person's salient beliefs, and the most frequently mentioned beliefs in the pilot sample constitute the set of beliefs salient in the research population, termed modal salient beliefs (Fishbein & Ajzen, 1975).
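The step from individual free responses to modal salient beliefs amounts to a frequency count across the pilot sample. The sketch below is illustrative only: it assumes the free responses have already been content-coded into short belief labels, and the frequency cutoff `min_share` is a hypothetical parameter, since the chapter does not say how frequent a belief must be to count as modal. The function name and the mountain-climbing labels are likewise invented.

```python
from collections import Counter

def modal_salient_beliefs(responses, min_share=0.2):
    """Return coded beliefs mentioned by at least `min_share` of pilot respondents.

    `responses` holds one list of coded beliefs per respondent. Each belief is
    counted at most once per respondent. `min_share` is a hypothetical cutoff,
    not a value taken from the chapter.
    """
    counts = Counter()
    for beliefs in responses:
        # dict.fromkeys removes duplicates while keeping first-mention order
        counts.update(list(dict.fromkeys(beliefs)))
    cutoff = min_share * len(responses)
    # most_common() orders beliefs from most to least frequently mentioned
    return [belief for belief, n in counts.most_common() if n >= cutoff]

# Hypothetical coded free responses about mountain climbing
pilot = [
    ["exercise", "risk of injury", "scenery"],
    ["risk of injury", "expensive equipment"],
    ["exercise", "risk of injury"],
    ["scenery", "exercise", "time away from family"],
    ["risk of injury", "exercise"],
]
print(modal_salient_beliefs(pilot, min_share=0.4))
# → ['exercise', 'risk of injury', 'scenery']
```

Counting each belief once per respondent keeps a single talkative respondent from inflating a belief's apparent salience in the population.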
It is then possible to assess belief strength (subjective probability) of each salient belief as well as its attribute evaluation, as in the following example.

Belief Strength
Capital punishment is an effective deterrent to crime.
extremely likely ___:___:___:___:___:___:___ extremely unlikely

Attribute Evaluation
An effective deterrent to crime is
extremely good ___:___:___:___:___:___:___ extremely bad
By examining the relations between each salient belief (its strength and its attribute evaluation) and the overall attitude, we can gain some understanding of the factors that support a given attitude; and by comparing the salient beliefs of different segments of the population, we can learn why people differ in the attitudes they hold (see Ajzen & Fishbein, 1980 for illustrations).
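Applied to ratings like those above, the EV index is simply the sum of the belief-strength by attribute-evaluation products. The sketch below assumes that both seven-point scales are scored bipolarly from +3 (left anchor, "extremely likely" or "extremely good") to -3 (right anchor). That scoring is an assumption; the chapter does not specify one. The respondent's marks and the helper names are hypothetical.

```python
def score_bipolar(position, points=7):
    """Map a scale position (1 = left anchor) to a bipolar score, +3 ... -3."""
    midpoint = (points + 1) / 2
    return midpoint - position

def ev_attitude(strengths, evaluations):
    """Expectancy-value index: sum of b_i * e_i over the salient beliefs."""
    if len(strengths) != len(evaluations):
        raise ValueError("need one strength and one evaluation per belief")
    return sum(b * e for b, e in zip(strengths, evaluations))

# Hypothetical respondent: positions marked for three salient beliefs
strength_marks = [2, 6, 4]    # belief-strength scale positions
evaluation_marks = [1, 7, 2]  # attribute-evaluation scale positions
b = [score_bipolar(m) for m in strength_marks]    # [2.0, -2.0, 0.0]
e = [score_bipolar(m) for m in evaluation_marks]  # [3.0, -3.0, 2.0]
products = [bi * ei for bi, ei in zip(b, e)]      # [6.0, 6.0, 0.0]
print(ev_attitude(b, e))  # → 12.0
```

The per-belief products are what one would relate to the overall attitude, or compare across population segments. Note that under bipolar scoring, disbelieving that the object has a negative attribute (here -2 × -3) raises the attitude score. Whether belief strength should be scored bipolarly or unipolarly is a scaling decision the EV model itself leaves open.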
3 Multidimensional Conceptions of Attitude
Although there is today widespread agreement that attitude is best defined as a unidimensional construct, it is possible to go beyond attitude to consider the structure of the domain to which it applies. Two approaches to this issue are discussed on the following pages, one focusing on
Attitude Theory and Measurement: Implications for Survey Research
categories of responses subsumed under the attitude construct, the other on an attitude's antecedents and consequences.
3.1 The Tripartite Model of Attitude
It is generally acknowledged that attitude is a latent variable or hypothetical construct. Being inaccessible to direct observation, it must be inferred from measurable responses, and given the nature of the construct, these responses must reflect positive or negative evaluations of the attitude object. Beyond this requirement, however, virtually no limitations are placed on the kinds of responses that can be considered. To simplify matters, it is possible to categorize attitude-relevant responses into various subgroups. The most popular classification system goes back at least to Plato and distinguishes between three categories of responses: cognition, affect, and conation (see Allport, 1968; Hilgard, 1980; and McGuire, 1985 for general discussions). The cognitive category consists of responses that reflect perceptions of, and information about, the attitude object; the affective category consists of feelings with respect to the object; and the conative category consists of behavioral inclinations, intentions, commitments, and actions with respect to the attitude object. Examples of the three response classes with respect to attitudes toward Castro's Cuba are shown in Table 3 (see Ostrom, 1969).
Cognitive component
The US should not concern herself with Cuba.
The Cuban people are industrious and hardworking.

Affective component
The thought of a communist Cuba disgusts me.
I am overjoyed that liberation has come to the Cuban people.

Conative component
If I were president I would order an invasion of Cuba.
I would award Castro the Nobel Peace Prize.

Table 3. Sample items used to assess the cognitive, affective, and conative components of attitudes toward Castro's Cuba (Source: Ostrom, 1969)
3.1.1 A Hierarchical Model

Thus far we have assumed that an evaluative disposition is the same whether it is inferred from responses of a cognitive, affective, or conative nature. The tripartite view of attitude, however, holds that cognitive, affective, and conative response tendencies represent conceptually distinct components of attitude (see, e.g., Krech, Crutchfield, & Ballachey, 1962; McGuire, 1985). Specifically, the model offered by Rosenberg and Hovland (1960), which serves as the starting point of most contemporary analyses, is a hierarchical model that includes cognition, affect, and conation as first-order factors and attitude as a single second-order factor. In this model, the three components are defined independently and yet comprise, at a higher level of abstraction, the single construct of attitude.
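A small simulation can show what the hierarchical structure implies for the correlations among the three components. This is a sketch under simplifying assumptions. Each standardized component is generated as a hypothetical loading (0.8, an arbitrary illustrative value) times a single second-order attitude factor, plus a unique part, and measurement error is ignored. Under these assumptions the expected correlation between two components is the product of their loadings. The function names are invented for the illustration.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_components(n=20000, loadings=(0.8, 0.8, 0.8), seed=1):
    """Simulate cognitive, affective, and conative scores that load on one
    second-order attitude factor (hypothetical loadings, no measurement error).

    Each component = loading * attitude + unique part; the unique variance
    1 - loading**2 keeps every component at unit variance.
    """
    rng = random.Random(seed)
    names = ("cognition", "affect", "conation")
    scores = {name: [] for name in names}
    for _ in range(n):
        attitude = rng.gauss(0.0, 1.0)  # second-order factor
        for name, lam in zip(names, loadings):
            unique = rng.gauss(0.0, math.sqrt(1.0 - lam ** 2))
            scores[name].append(lam * attitude + unique)
    return scores

scores = simulate_components()
pairs = [("cognition", "affect"), ("cognition", "conation"), ("affect", "conation")]
# Expected value for each pair: 0.8 * 0.8 = 0.64, i.e. substantial but not redundant
for a, b in pairs:
    print(a, b, round(pearson(scores[a], scores[b]), 2))
```

Lower loadings, or measurement error in the observed scores, would pull these correlations down. This is one reason why the same correlation matrix can be read either as distinct components or as a single factor plus method variance.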
The empirical implications of the hierarchical attitude model can be stated as follows. Given that the three components reflect the same underlying attitude, they should correlate to some degree with each other. Yet, to the extent that the distinction between cognitive, affective, and conative response categories is of psychological significance, measures of the three components should not be completely redundant. In combination, these expectations imply correlations of moderate magnitude among measures of the three components. A number of attempts have been made over the years to confirm the discriminant validity of measures designed to tap the different components, with the aid of multitrait-multimethod matrices and by means of confirmatory factor analyses. Depending on the method used and the assumptions made, the data have been interpreted as supporting either a tripartite model or a single-factor model (see the exchange between Dillon and Kumar, 1985, and Bagozzi and Burnkrant, 1985). The major issue seems to revolve around whether differences between measures of the cognitive, affective, and conative components are to be interpreted as due to differences in the methods used to assess them (i.e., as theoretically uninteresting method variance) or as due to true differences between conceptually independent components. At a general level, however, most of the data reported in the literature are quite consistent with the hierarchical model: a single factor is found to account for much of the variance in attitudinal responses, and the correlations among measures of the three components, although leaving room for some unique variance, are typically of considerable magnitude (see, e.g., Breckler, 1984).

3.1.2 Implications for Survey Research

It has sometimes been argued that a proper account of existing attitudes, and accurate prediction of behavior, require independent assessment of the cognitive, affective, and conative components of attitude.
However, given the strong empirical relations among measures of the three components, it is now generally agreed that attitudes can be inferred from any one of the three response classes; most often they are inferred from verbal cognitive responses (see Eagly & Chaiken, 1993). Similarly, there is little if any improvement in the prediction of behavior when the measure of attitude is based on more than one response category. Thus, although it may be of theoretical interest to obtain independent measures of the three components, unidimensional measures are easier to construct and sufficient for most practical purposes.
3.2 The Theories of Reasoned Action and Planned Behavior
The hierarchical tripartite model of attitude views cognition, affect, and conation as parallel first-order factors and overall evaluation or attitude as a general second-order factor. The theory of reasoned action (Ajzen & Fishbein, 1973, 1980; Fishbein, 1967b; Fishbein & Ajzen, 1975) and its successor, the theory of planned behavior (Ajzen, 1985, 1988, 1991) offer a theoretical framework that structures the attitudinal domain in a different way. Within this framework, the term "attitude" is reserved strictly for the overall evaluative response while cognition, affect, and conation are treated as conceptually distinct antecedents or consequences of attitude. The theories of reasoned action and planned behavior recognize that global attitudes toward a broad target cannot be expected to predict specific behaviors with respect to that target. Ajzen and Fishbein (1977; Ajzen, 1988) formulated a principle of compatibility to clarify the conditions under which strong attitude-behavior correlations can be expected. Similar to Guttman's (1955) contiguity hypothesis, the principle of compatibility states that measures of attitude and behavior are compatible, and should thus correlate with each other, to the extent
that they address the same behavior, directed at the same target, and in the same context. It can be seen that the principle of aggregation discussed earlier is a special case of the principle of compatibility. Compatibility can be established either by aggregating behaviors to elevate the generality of the behavioral measure to that of a general attitude (aggregation) or by measuring attitudes with respect to the specific behavior of interest. Numerous investigations have supported the principle of compatibility by showing that attitudes correlate strongly with behavior when the two constructs are assessed at the same level of generality or specificity (e.g., Fishbein & Ajzen, 1974; Weigel & Newman, 1976; Werner, 1978; for reviews see Ajzen, 1988; Kraus, 1994).

3.2.1 The Theory of Reasoned Action

The theory of reasoned action (TRA) applies the principle of compatibility to the prediction of specific behavioral tendencies. On the assumption that most behaviors of interest to social psychologists are under volitional control (see Ajzen & Fishbein, 1980), the TRA stipulates that intention is the immediate antecedent of the corresponding behavior. At least with respect to volitional acts, people are expected to do what they intend to do. Intention, in turn, is determined by two factors. The first is the attitude toward the behavior, which expresses a personal preference and is derived from the summed products of belief strength and attribute evaluation. The second is the subjective norm, which reflects the perceived social pressure to perform or not to perform the behavior and is derived from the summed products of specific normative beliefs and the motivation to comply with the respective referents. Attitude and subjective norm combine in a weighted linear fashion to produce the intention.
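In the TRA the weights attached to attitude and subjective norm are not fixed in advance. They are estimated from data, typically by regressing intention on the two predictors. The sketch below uses the closed-form standardized solution for two predictors. The simulated respondents, the assumed attitude-norm correlation of .4, and the "true" weights of 0.5 and 0.3 are all invented for illustration. The helper names `corr` and `tra_weights` are hypothetical.

```python
import math
import random

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def tra_weights(attitude, norm, intention):
    """Standardized weights w1 (attitude) and w2 (subjective norm) and the
    multiple correlation R from regressing intention on both predictors."""
    r_ia = corr(intention, attitude)
    r_in = corr(intention, norm)
    r_an = corr(attitude, norm)
    denom = 1.0 - r_an ** 2
    w1 = (r_ia - r_in * r_an) / denom
    w2 = (r_in - r_ia * r_an) / denom
    multiple_r = math.sqrt(w1 * r_ia + w2 * r_in)  # R^2 = sum of w_j * r_j
    return w1, w2, multiple_r

# Hypothetical standardized data: corr(A_B, SN) = .4, true weights .5 and .3;
# noise variance .54 makes Var(I) = 1, so .5 and .3 are the standardized weights.
rng = random.Random(7)
A_B, SN, I = [], [], []
for _ in range(5000):
    a = rng.gauss(0, 1)
    s = 0.4 * a + math.sqrt(1 - 0.4 ** 2) * rng.gauss(0, 1)
    A_B.append(a)
    SN.append(s)
    I.append(0.5 * a + 0.3 * s + rng.gauss(0, math.sqrt(0.54)))

w1, w2, R = tra_weights(A_B, SN, I)
print(round(w1, 2), round(w2, 2), round(R, 2))
```

The estimates should land close to 0.5, 0.3, and a multiple correlation of about .68. The relative size of the two weights indicates whether a given intention is driven more by personal preference or by perceived social pressure.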
The complete model representing the TRA can thus be written as follows:

$$B \approx I \propto [w_1 A_B + w_2 SN]$$

where $B$ is the behavior, $I$ is the intention, $A_B$ is the attitude toward the behavior, $SN$ is the subjective norm, and $w_1$ and $w_2$ are empirically determined weights. The approximation sign indicates that a measure of intention is expected to predict subsequent behavior only if the intention has not changed as a result of intervening events. Personality characteristics, demographic variables, and other background factors influence intentions and behavior indirectly, through their effects on attitudes and subjective norms. The theory of reasoned action is typically evaluated by means of correlational techniques: a measure of intention is regressed on measures of attitude and subjective norm, and a correlation is computed between intention and behavior. Over the past 25 years, hundreds of studies have tested and applied the TRA in a multitude of different contexts. By and large, the model has been well supported whenever its constructs were carefully operationalized (Eagly & Chaiken, 1993). This conclusion is confirmed by a meta-analysis based on 150 data sets published in 113 articles between 1969 and 1989 (van den Putte, 1991). Across all studies, regardless of the quality of the methods and procedures employed, the average correlation between intention and behavior was .62, and the average multiple correlation for the prediction of intention was .68.

3.2.2 The Theory of Planned Behavior

The requirement imposed by the theory of reasoned action that behavior be under volitional control places limitations on the model's range of application, and the assumption that most behaviors of interest to social psychologists are in fact under volitional control has frequently been
challenged (e.g., Bentler & Speckart, 1979; Liska, 1984; Triandis, 1980). The theory of planned behavior (TPB) was developed partly in response to these concerns. Consistent with Bandura's (1977, 1982) work on self-efficacy expectations, the TPB incorporates a construct that deals with people's perception of control over the behavior, i.e., their beliefs that they can perform the behavior if they so desire and that they have the required time, skills, and other resources. Logically and formally, perceived behavioral control (PBC) is expected to interact with the other constructs in the theory: attitudes and subjective norms should influence intentions to the extent that PBC is high, and, similarly, the effect of intention on behavior also depends on the degree of perceived behavioral control:

$$B \approx PBC \cdot I \propto PBC\,[w_1 A_B + w_2 SN].$$
Research with the theory of planned behavior, however, has shown that most of the variance in intentions and behavior can be accounted for by linear combinations, and the interaction terms are typically not significant (see Ajzen, 1991). Because of these findings, simpler linear models have actually been evaluated in most applications of the theory:

$$B \approx [w_1 I + w_2 PBC]$$

and

$$I \propto [w_1 A_B + w_2 SN + w_3 PBC].$$
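Whether perceived behavioral control adds to the prediction of intention is commonly assessed by comparing the variance explained with and without PBC, that is, by a hierarchical regression with a nested-model F test. The sketch below is a minimal version. It uses simulated standardized scores with invented weights (0.4 for attitude, 0.2 for subjective norm, 0.3 for PBC) and, for simplicity, mutually independent predictors. The helper names `ols_r_squared` and `f_change` are hypothetical, and the F test for the R² increment is the standard one rather than a procedure specified in this chapter.

```python
import math
import random

def ols_r_squared(predictors, y):
    """R^2 of an OLS regression of y on the predictor columns plus an intercept.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting; adequate for a handful of predictors.
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))  # pivot row
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    coef = [m[i][k] / m[i][i] for i in range(k)]
    mean_y = sum(y) / n
    fitted = [sum(b * x for b, x in zip(coef, r)) for r in rows]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def f_change(r2_reduced, r2_full, n, k_full, q):
    """F for the R^2 increase when q predictors are added
    (k_full = predictors in the full model, intercept excluded)."""
    return ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / (n - k_full - 1))

# Hypothetical data; noise variance .71 makes Var(I) = 1
rng = random.Random(11)
n = 3000
A_B = [rng.gauss(0, 1) for _ in range(n)]
SN = [rng.gauss(0, 1) for _ in range(n)]
PBC = [rng.gauss(0, 1) for _ in range(n)]
I = [0.4 * a + 0.2 * s + 0.3 * c + rng.gauss(0, math.sqrt(0.71))
     for a, s, c in zip(A_B, SN, PBC)]

r2_tra = ols_r_squared([A_B, SN], I)       # TRA predictors only
r2_tpb = ols_r_squared([A_B, SN, PBC], I)  # TPB: add perceived behavioral control
F = f_change(r2_tra, r2_tpb, n, k_full=3, q=1)
print(round(r2_tra, 2), round(r2_tpb, 2), round(F, 1))
```

With these invented weights the explained variance should rise from about .20 to about .29. The large F for one added predictor corresponds to the kind of significant improvement the text reports.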
Studies testing these models have found in virtually every case that inclusion of perceived behavioral control significantly improves prediction of intentions, and in many instances also prediction of behavior (see Ajzen, in press).

3.2.3 Implications for Survey Research

As is the case for attitudes in general, the constructs in the theories of reasoned action and planned behavior can be assessed directly, by means of one or more rating scales, or indirectly on the basis of relevant salient beliefs. Sample items for direct measures concerning the behavior of cheating on a test or exam are shown in Table 4 (see Beck & Ajzen, 1991); the indirect measures of the different constructs follow the logic of the expectancy-value model described earlier. Surveys based on the theory of planned behavior can provide useful information about the important determinants of a given behavior. By examining specific beliefs that are salient with respect to the behavior, we gain an understanding of the kinds of considerations that lead to the formation of behavioral intentions and, ultimately, to performance or nonperformance of the behavior. These considerations have to do with the perceived consequences of the behavior (underlying the attitude toward the behavior), the perceived expectations of important referent individuals or groups (determining the subjective norm), and perceptions concerning required resources and obstacles that may interfere with implementation of an intended behavior (determining perceived behavioral control). Information of this kind can be used not only to gain an understanding of a behavior's determinants but also to design effective behavioral intervention programs (see, e.g., Terry, 1993; Van Ryn & Vinokur, 1992).
4 Conclusions
Theoretical developments concerning the structure of attitudes and their relations to overt behavior have important implications for the assessment of attitudes in survey research, implications that are seldom realized in practice. Most surveys continue to rely on Likert-type instruments to assess attitudes toward global social issues. Although such surveys can provide useful information, contemporary attitude theory offers alternative approaches that may prove superior in many applications. The expectancy-value model provides a means not only of measuring attitudes but also of exploring their underlying cognitive and affective determinants. This technique requires constructing statements that represent salient beliefs about the attitude object and then assessing subjective probabilities as well as evaluations with respect to each belief. Variance in attitudes can result from differences in subjective probabilities (belief strength) or from differences in the evaluations associated with the beliefs. In contrast, the belief statements that appear on Likert scales have a predetermined evaluative significance (they are either clearly positive or clearly negative); no differences in evaluations are admitted. All variance in attitudes must therefore be attributed to differences in belief strength (degree of agreement or disagreement with the items on the scale). Moreover, because the items selected for a Likert scale are not chosen to represent salient beliefs, they may have little to do with the actual cognitive and affective determinants of the attitude. Consider, for example, the statement, "Abortion causes psychological distress." Because most respondents tend to agree with this statement, Likert scaling methods would exclude it from a scale designed to assess attitudes toward abortion. Yet this may be one of the important (salient) beliefs about abortion in the research population, being in part responsible for the attitudes people hold.
Conversely, the statement, "Most women would be willing to have an abortion" might well be included on a Likert scale because respondents vary in their agreement with the item. However, unless this happens to be a salient belief in the research population, agreement or disagreement with the item does not further our understanding of the factors that determine attitudes toward abortion. In contrast, by using only salient beliefs, and assessing belief strength as well as evaluations in accordance with the expectancy-value model, we obtain a differentiated picture of an attitude's cognitive and evaluative determinants.

The great majority of attitude surveys are concerned with general social issues, but for many practical purposes it is important to focus on specific action tendencies. Thus, to design effective intervention programs, we must have some understanding of the factors that determine behaviors such as problem drinking and drug abuse, organ donation, adhering to medical regimens, using public transportation, or recycling paper and other materials. The theories of reasoned action and planned behavior offer a framework for survey research dealing with issues of this kind. They emphasize the importance of focusing on the particular behavior of interest, and on assessing beliefs, attitudes, and intentions with respect to that behavior. Empirical research has demonstrated the utility of this approach and its advantages over the use of traditional item batteries that focus on general social issues.
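The Likert item-selection logic criticized in these conclusions keeps items that discriminate among respondents, regardless of whether they are salient. One common item-analysis criterion, not necessarily the one used in any study cited here, is the corrected item-total correlation: each item is correlated with the sum of the remaining items. The sketch below assumes agreement coded 1 (strongly disagree) to 5 (strongly agree), all items keyed in the same direction, and a hypothetical retention cutoff of 0.3. The six respondents are invented. Item 0 plays the role of a near-consensus statement such as "Abortion causes psychological distress."

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant item cannot discriminate between respondents
    return sxy / math.sqrt(sxx * syy)

def screen_likert_items(responses, cutoff=0.3):
    """Corrected item-total correlation per item; keep items at or above cutoff.

    `responses` is a list of respondents, each a list of item scores.
    `cutoff` is a hypothetical rule of thumb, not a value from the chapter.
    """
    n_items = len(responses[0])
    item_rest = {}
    for j in range(n_items):
        item = [person[j] for person in responses]
        rest = [sum(person) - person[j] for person in responses]  # total minus item j
        item_rest[j] = pearson(item, rest)
    kept = [j for j in range(n_items) if item_rest[j] >= cutoff]
    return item_rest, kept

# Hypothetical data: item 0 draws near-universal agreement; items 1-3 vary
responses = [
    [5, 5, 4, 5],
    [5, 4, 5, 4],
    [4, 3, 3, 3],
    [5, 2, 2, 3],
    [5, 2, 1, 1],
    [5, 1, 2, 1],
]
item_rest, kept = screen_likert_items(responses)
print(kept)  # → [1, 2, 3]
```

The consensus item is dropped because it barely varies, not because it is irrelevant to the attitude. That is exactly the information about salient beliefs that the expectancy-value approach tries to retain.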
Attitude toward the behavior:
Cheating on a test or exam is
foolish ___:___:___:___:___:___:___ wise
good ___:___:___:___:___:___:___ bad
pleasant ___:___:___:___:___:___:___ unpleasant
unattractive ___:___:___:___:___:___:___ attractive
useful ___:___:___:___:___:___:___ useless

Subjective norm:
If I cheated on a test or exam, most people who are important to me would
approve ___:___:___:___:___:___:___ disapprove
No one who is important to me thinks it is OK to cheat on a test or exam.
agree ___:___:___:___:___:___:___ disagree

Perceived Behavioral Control:
For me to cheat on a test or exam is
easy ___:___:___:___:___:___:___ difficult
If I want to, I can cheat on a test or exam.
true ___:___:___:___:___:___:___ false

Intention:
If I had the opportunity, I would cheat on a test or exam.
likely ___:___:___:___:___:___:___ unlikely
I would never cheat on a test or exam.
true ___:___:___:___:___:___:___ false

Behavior (6 months later):
How many times have you cheated on a test or exam in the past 6 months?
never / once / twice / 3 times / 4 times / 5 times / more than 5 times

Table 4. Sample items for assessing the constructs in the theory of planned behavior with respect to cheating on a test or exam (Source: Beck & Ajzen, 1991)
References
Ajzen, I. (1985). From intentions to actions: A theory of planned behavior. In J. Kuhl & J. Beckmann (Eds.), Action control: From cognition to behavior (pp. 11-39). Heidelberg: Springer.
Ajzen, I. (1988). Attitudes, personality, and behavior. Chicago: Dorsey Press.
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50, 179-211.
Ajzen, I. (in press). Decision making. In E. T. Higgins & A. W. Kruglanski (Eds.), Social psychology: Handbook of basic principles. New York: Guilford.
Ajzen, I., & Fishbein, M. (1973). Attitudinal and normative variables as predictors of specific behaviors. Journal of Personality and Social Psychology, 27, 41-57.
Ajzen, I., & Fishbein, M. (1977). Attitude-behavior relations: A theoretical analysis and review of empirical research. Psychological Bulletin, 84, 888-918.
Ajzen, I., & Fishbein, M. (1980). Understanding attitudes and predicting social behavior. Englewood Cliffs, NJ: Prentice Hall.
Allport, G. W. (1935). Attitudes. In C. Murchison (Ed.), A handbook of social psychology (pp. 798-844). Worcester, MA: Clark University Press.
Allport, G. W. (1968). The historical background of modern social psychology. In G. Lindzey & E. Aronson (Eds.), The handbook of social psychology (2nd ed., Vol. 1, pp. 1-80). Reading, MA: Addison-Wesley.
Bagozzi, R. P., & Burnkrant, R. E. (1985). Attitude organization and the attitude-behavior relation: A reply to Dillon and Kumar. Journal of Personality and Social Psychology, 49, 47-57.
Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84, 191-215.
Bandura, A. (1982). Self-efficacy mechanism in human agency. American Psychologist, 37, 122-147.
Bargh, J. A. (1984). Automatic and conscious processing of social information. In R. S. Wyer, Jr., & T. K. Srull (Eds.), Handbook of social cognition (Vol. 3, pp. 1-43). Hillsdale, NJ: Erlbaum.
Bargh, J. A., Chaiken, S., Govender, R., & Pratto, F. (1992). The generality of the automatic attitude activation effect. Journal of Personality and Social Psychology, 62, 893-912.
Beck, L., & Ajzen, I. (1991). Predicting dishonest actions using the theory of planned behavior. Journal of Research in Personality, 25, 285-301.
Bentler, P. M., & Speckart, G. (1979). Models of attitude-behavior relations. Psychological Review, 86, 452-464.
Breckler, S. J. (1984). Empirical validation of affect, behavior, and cognition as distinct components of attitude. Journal of Personality and Social Psychology, 47, 1191-1205.
Corey, S. M. (1937). Professed attitudes and actual behavior. Journal of Educational Psychology, 28, 271-280.
Dillon, W. R., & Kumar, A. (1985). Attitude organization and the attitude-behavior relation: A critique of Bagozzi and Burnkrant's reanalysis of Fishbein and Ajzen. Journal of Personality and Social Psychology, 49, 33-46.
Doll, J., & Ajzen, I. (1992). Accessibility and stability of predictors in the theory of planned behavior. Journal of Personality and Social Psychology, 63, 754-765.
Eagly, A. H., & Chaiken, S. (1993). The psychology of attitudes. Fort Worth, TX: Harcourt Brace Jovanovich.
Edwards, A. L. (1957). Techniques of attitude scale construction. New York: Appleton-Century-Crofts.
Fazio, R. H. (1986). How do attitudes guide behavior? In R. M. Sorrentino & E. T. Higgins (Eds.), The handbook of motivation and cognition: Foundations of social behavior (pp. 204-243). New York: Guilford.
Fazio, R. H. (1990). Multiple processes by which attitudes guide behavior: The MODE model as an integrative framework. In M. P. Zanna (Ed.), Advances in experimental social psychology (Vol. 23, pp. 75-109). San Diego, CA: Academic Press.
Fazio, R. H., Chen, J., McDonel, E. C., & Sherman, S. J. (1982). Attitude accessibility, attitude-behavior consistency, and the strength of the object-evaluation association. Journal of Experimental Social Psychology, 18, 339-357.
Fazio, R. H., Powell, M. C., & Williams, C. J. (1989). The role of attitude accessibility in the attitude-to-behavior process. Journal of Consumer Research, 16, 280-288.
Fazio, R. H., & Williams, C. J. (1986). Attitude accessibility as a moderator of the attitude-perception and attitude-behavior relations: An investigation of the 1984 presidential election. Journal of Personality and Social Psychology, 51, 505-514.
Feather, N. T. (Ed.). (1982). Expectations and actions: Expectancy-value models in psychology. Hillsdale, NJ: Erlbaum.
Fishbein, M. (1963). An investigation of the relationships between beliefs about an object and the attitude toward that object. Human Relations, 16, 233-240.
Fishbein, M. (1967a). A consideration of beliefs and their role in attitude measurement. In M. Fishbein (Ed.), Readings in attitude theory and measurement (pp. 257-255). New York: Wiley.
Fishbein, M. (1967b). Attitude and the prediction of behavior. In M. Fishbein (Ed.), Readings in attitude theory and measurement (pp. 477-492). New York: Wiley.
Fishbein, M., & Ajzen, I. (1974). Attitudes toward objects as predictors of single and multiple behavioral criteria. Psychological Review, 81, 59-74.
Fishbein, M., & Ajzen, I. (1975). Belief, attitude, intention, and behavior: An introduction to theory and research. Reading, MA: Addison-Wesley.
Green, B. F. (1954). Attitude measurement. In G. Lindzey (Ed.), Handbook of social psychology (Vol. 1, pp. 335-369). Reading, MA: Addison-Wesley.
Guttman, L. (1955). An outline of some new methodology for social research. Public Opinion Quarterly, 18, 395-404.
Hilgard, E. R. (1980). The trilogy of mind: Cognition, affection, and conation. Journal of the History of the Behavioral Sciences, 16, 107-117.
Himmelfarb, S. (1993). The measurement of attitudes. In A. H. Eagly & S. Chaiken (Eds.), The psychology of attitudes (pp. 23-87). Fort Worth, TX: Harcourt Brace Jovanovich.
Houston, D. A., & Fazio, R. H. (1989). Biased processing as a function of attitude accessibility: Making objective judgments subjectively. Social Cognition, 7, 51-66.
Kraus, S. J. (1994). Attitudes and the prediction of behavior: A meta-analysis of the empirical literature. Personality and Social Psychology Bulletin.
Krech, D., Crutchfield, R. S., & Ballachey, E. L. (1962). Individual in society. New York: McGraw-Hill.
LaPierre, R. T. (1934). Attitudes vs. actions. Social Forces, 13, 230-237.
Likert, R. A. (1932). A technique for the measurement of attitudes. Archives of Psychology, No. 140.
Liska, A. E. (1984). A critical examination of the causal structure of the Fishbein/Ajzen attitude-behavior model. Social Psychology Quarterly, 47, 61-74.
McGuire, W. J. (1985). Attitudes and attitude change. In G. Lindzey & E. Aronson (Eds.), Handbook of social psychology (3rd ed., Vol. 2, pp. 233-346). New York: Random House.
Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. (1957). The measurement of meaning. Urbana, IL: University of Illinois Press.
Ostrom, T. M. (1969). The relationship between the affective, behavioral, and cognitive components of attitude. Journal of Experimental Social Psychology, 5, 12-30.
Peak, H. (1955). Attitude and motivation. In M. R. Jones (Ed.), Nebraska symposium on motivation (Vol. 3, pp. 149-188). Lincoln: University of Nebraska Press.
Peterson, R. C., & Thurstone, L. L. (1933). Motion pictures and the social attitudes of children. New York: Arno Press.
Powell, M. C., & Fazio, R. H. (1984). Attitude accessibility as a function of repeated attitudinal expression. Personality and Social Psychology Bulletin, 10, 139-148.
Regan, D. T., & Fazio, R. H. (1977). On the consistency between attitudes and behavior: Look to the method of attitude formation. Journal of Experimental Social Psychology, 13, 38-45.
Robinson, J. P., Shaver, P. R., & Wrightsman, L. S. (Eds.). (1991). Measures of personality and social psychological attitudes (Vol. 1). San Diego, CA: Academic Press.
Rosenberg, M. J. (1956). Cognitive structure and attitudinal affect. Journal of Abnormal and Social Psychology, 53, 367-372.
Rosenberg, M. J., & Hovland, C. I. (1960). Cognitive, affective, and behavioral components of attitudes. In C. I. Hovland & M. J. Rosenberg (Eds.), Attitude organization and change (pp. 1-14). New Haven, CT: Yale University Press.
Shaw, M. E., & Wright, J. M. (1967). Scales for the measurement of attitudes. New York: McGraw-Hill.
Terry, D. J. (1993). Self-efficacy expectancies and theory of reasoned action. In D. J. Terry, C. Gallois, & M. McCamish (Eds.), The theory of reasoned action: Its application to AIDS-preventive behavior. United Kingdom: Penguin.
Thurstone, L. L. (1931). The measurement of attitudes. Journal of Abnormal and Social Psychology, 26, 249-269.
Thurstone, L. L., & Chave, E. J. (1929). The measurement of attitude. Chicago: University of Chicago Press.
Triandis, H. C. (1980). Values, attitudes, and interpersonal behavior. In H. E. Howe, Jr., & M. M. Page (Eds.), Nebraska Symposium on Motivation, 1979 (Vol. 27, pp. 195-259). Lincoln: University of Nebraska Press.
van den Putte, B. (1991). 20 years of the theory of reasoned action of Fishbein and Ajzen: A meta-analysis. Unpublished manuscript, University of Amsterdam, The Netherlands.
Van Ryn, M., & Vinokur, A. (1992). How did it work? An examination of the mechanisms through which a community intervention influenced job-search behavior among an unemployed sample. American Journal of Community Psychology, 5, 577-597.
Weigel, R. H., & Newman, L. S. (1976). Increasing attitude-behavior correspondence by broadening the scope of the behavioral measure. Journal of Personality and Social Psychology, 33, 793-802.
Werner, P. D. (1978). Personality and attitude-activism correspondence. Journal of Personality and Social Psychology, 36, 1375-1390.
Wicker, A. W. (1969). Attitudes versus actions: The relationship of verbal and overt behavioral responses to attitude objects. Journal of Social Issues, 25, 41-78.
Reconciling Macro and Micro Perspectives by Multilevel Models: An Application to Regional Wage Differences
Uwe Blien, Michael Wiedenbeck, and Gerhard Arminger
1 Multilevel Models Bridge the Gap Between the Micro and Macro Level
Multilevel models help to solve a basic problem of the social sciences, namely, the reconciliation of the micro and macro perspectives on social reality. The behavior of youths is explained by referring to their individual characteristics as well as to their families, neighborhoods, and peer groups. The development of wages over time is associated with individual characteristics, such as education and work experience, and with the development of firms or regional labor markets. To understand the test performance of students, the characteristics not only of the students but also of their classes, schools, or regions are important. The relation between the micro (individual) and the macro (aggregate, i.e., social or societal) level is important in many sociological and economic theories. In some approaches, the dynamics of the macro level are determined by forces located at the macro level. In others (like the rational choice approaches in sociology or neoclassical economics), macro-level dynamics are explained primarily or solely by the preferences of individual decision makers. In both types of approaches, however, the aggregate has an effect at the individual level. Even in individualistic approaches, society sets conditions for, or places restrictions on, the actor's behavior. To take both the micro and the macro level into account, empirical analyses use data that have a nested structure: elementary units (such as people, respondents, actors, pupils, or workers) are placed within aggregate units (such as firms, schools, regions, or families). In most cases these data are analyzed in a conventional way, for instance, by simple OLS regression: researchers use multivariate statistical methods that treat data from different levels alike. Variables measured at an aggregate level are used to describe the variation of responses or properties of individuals across aggregate units.
Linear regression models with random coefficients have mainly been applied in experimental design and educational statistics (cf. Bryk and Raudenbush 1992). In this paper we demonstrate that these models are useful in the more general context of the social sciences. They are appropriate for the simultaneous estimation of the regression coefficients of both elementary- and aggregate-level variables on a dependent elementary-level variable, and they allow a decomposition of variance that ordinary regression does not provide. We give a short outline of this type of model and present an analysis of regional wages as an example.
2
An Example: the Regional Wage Curve Hypothesis
An important research problem in sociology and economics is the explanation of regional patterns of economic activity and of related regional disparities. Different theories explain why production and wages are high in some regions and low in others. They refer to specific regional cultures, to regional networks between firms or employees, to regional business cycles, and to other factors related to the respective regions. Blanchflower and Oswald (1992) established the so-called wage curve, that is, an inverse relationship between the regional wage level and regional unemployment. Wages are assumed to be a decreasing convex function of the unemployment rate. The theoretical foundations of the wage curve are regional versions of efficiency wage theory and of bargaining models. The efficiency wage approach argues that workers are motivated to work hard either by a wage premium or by the implied threat of dismissal and consequent unemployment. If the unemployment rate is high, workers who lose their jobs will obtain no income or only unemployment benefits, since it will be difficult to find another job. "Knowing this, firms need pay only low wages to extract the required level of effort from workers. Fear of unemployment then disciplines workers. If unemployment is low, by contrast, employers have to offer high wages. If they do not, employees are likely, realizing that it will be easy to find another job if dismissed, to take the risk" of putting little effort into their work (Blanchflower and Oswald 1992, p. 9). An extra wage premium is then needed to motivate workers. As an alternative to the efficiency wage approach, a bargaining model can be assumed to hold: unemployment weakens the relative strength of unions and of individual workers in wage negotiations. The existence of the wage curve has been corroborated by several empirical studies, for Germany by Gerlach and Wagner (1993).
The wage curve is a functional relationship between aggregates, not individuals. At least in the efficiency wage version of the approach, individual wages are not set by individual workers; they are the outcome of calculations by firms that react to conditions on the regional markets. Other influences may also operate at the regional wage level. Average wages depend on differences in the distribution of worker skills and types of jobs, and individual wages are related to the personal characteristics of workers (e.g., qualification or occupational experience). Since regional wages may be influenced by the composition of the regional work force, an analysis has to take the attributes of individual workers into account as well. Since it is well known that the labor markets for men and for women differ in many respects, gender should also be included in the analysis. In their first analysis of the wage curve, Blanchflower and Oswald (1990) estimated a simple wage function of the Mincer type using OLS regression. They controlled for all personal characteristics of the individual worker and for their unequal distribution across regions. As usual, the logarithm of the individual wage was the dependent variable. They introduced dummies for regions and the unemployment rate measured at the regional level. They assumed that this OLS approach gave consistent estimates not only of the regression coefficients of the aggregate-level variables but also of their asymptotic standard errors. In their second paper, Blanchflower and Oswald (1992) changed their research strategy and used only data on aggregate variables. The distribution of jobs and individual workers was controlled for by their respective proportions in the relevant categories. This change in research design was motivated by a discussion among econometricians showing that merging variables measured at different levels of aggregation can lead to erroneous results.
Moulton (1990) and Kloek (1981), for instance, argued that the standard errors of coefficients estimated by OLS regression are biased if the random disturbances in the regression are correlated within groups. "The magnitude of the downward bias for the standard errors increases with the average group size, the intraclass correlation of disturbances, and the intraclass correlation of regressors. For an aggregate regressor, which is fixed within the groups, the intraclass correlation of the regressor is 1" (Moulton 1990, p. 335). Moulton ran some experiments to assess the magnitude of the resulting error in wage regressions. He found that even random numbers had "significant" effects when they were treated as a variable measured at the regional level and included in an ordinary wage function with wages measured at the individual level. Working with aggregate data only, the strategy Blanchflower and Oswald chose in their second paper, therefore seems reasonable. However, it raises other serious problems. First, aggregating metric variables implies a loss of accuracy. Second, with categorical variables that have many categories, the model may no longer be identifiable. To avoid these problems, statistical models such as random coefficient models should be used, which connect information measured at the micro level directly with information measured at the macro level.
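Moulton's experiment can be mimicked with a small simulation. The sketch below assumes equal region sizes of 400 and normal disturbances, and borrows the variance components of the null model in Table 1a (regional variance 0.013, individual variance 0.37) purely as illustrative inputs; the function names are our own. It regresses simulated log wages on pure random numbers that are constant within regions and counts how often a conventional OLS t-test declares them "significant" at the 5% level.

```python
import numpy as np


def naive_ols_t(y, x):
    """t-statistic of the slope in an OLS regression of y on (1, x), using the usual iid standard error."""
    X = np.column_stack([np.ones_like(x), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ y)
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    return b[1] / np.sqrt(s2 * XtX_inv[1, 1])


def rejection_rate(n_regions=22, per_region=400, tau2=0.013, sigma2=0.37,
                   reps=200, seed=0):
    """Share of replications in which a pure-noise regional regressor is 'significant' at 5%."""
    rng = np.random.default_rng(seed)
    region = np.repeat(np.arange(n_regions), per_region)
    hits = 0
    for _ in range(reps):
        u = rng.normal(0.0, np.sqrt(tau2), n_regions)            # regional disturbance
        y = u[region] + rng.normal(0.0, np.sqrt(sigma2), region.size)
        z = rng.normal(size=n_regions)[region]                   # random numbers, constant within a region
        hits += int(abs(naive_ols_t(y, z)) > 1.96)
    return hits / reps


rate_clustered = rejection_rate()
rate_independent = rejection_rate(tau2=0.0, seed=1)
print(f"clustered disturbances:   {rate_clustered:.2f}")
print(f"independent disturbances: {rate_independent:.2f}")
```

With correlated regional disturbances the share of "significant" results should lie far above the nominal 5%; with `tau2=0.0` (independent disturbances) it should stay close to 5%.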
3
Random Coefficient Models
In recent years there has been growing interest in the use of random coefficient models for the analysis of multilevel data. Many important applications are found in school effectiveness research (see, e.g., Bock 1989; Raudenbush and Willms 1991). However, their application appears appropriate in a much wider context. In the following, we give a short summary of the rationale for using random coefficient models. Some of the arguments have already been mentioned in the preceding sections and are presented here more formally. We start with basic features of random coefficient models for single-level data, which can be extended to two-level (and general multilevel) data. The crucial point is the contrast with the traditional way of modeling, in which the regression coefficients of explanatory variables from different levels on an individual-level dependent variable are estimated within the framework of single-level models. These models are based on the assumption of independent observations, whereas random coefficient models allow for aggregate-level variation of regression coefficients and explicitly take into account the stochastic dependence of the data.
3.1 Linear Random Coefficient Models for Single-Level Data
Assume that we have observations of a response variable Y and some exogenous variables $X^k$, $k = 1, \ldots, K$. Suppose further that all the variables are microvariables, that is, they vary between individuals. The regression equation is, as usual,
$$Y_i = \sum_{k=0}^{K} X_i^k \beta_k + \varepsilon_i, \tag{1}$$
where $X^0$ equals 1 for every observation. $Y_i$ is decomposed into a systematic part, represented by a linear function of the X's, and a random part $\varepsilon_i$, which is stochastically independent across individuals. The assumption of stochastic independence may not be fulfilled when the individuals are clustered within groups, such as classes or schools in education or regions in the labor market. For individuals that belong to the same cluster, the errors may be correlated. In addition, the assumption that the regression coefficients are the same for all individuals may be questionable. A typical example where the assumption of equal regression constants and coefficients is too restrictive is the analysis of growth curves in developmental studies. A simple linear growth curve for individual i is given by
$$Y_{it} = \alpha + t\beta + \varepsilon_{it}, \qquad i = 1, \ldots, n,$$
where $t = 1, 2, \ldots, T$ denotes consecutive time points. Individual differences are captured by letting $\alpha$ and $\beta$ vary for each individual, yielding individual growth curves
$$Y_{it} = \alpha_i + t\beta_i + \varepsilon_{it}.$$
Linear regression models with individually varying regression coefficients may be written as
$$Y_i = \sum_{k=0}^{K} X_i^k \beta_k^i + \varepsilon_i. \tag{2}$$
This model attaches a separate regression hyperplane to each individual. Unlike in the growth curve model, where each individual is observed at T time points, the $\beta_k^i$ cannot be identified here, because each individual contributes only a single observation. If we are not interested in the individual regression coefficients $\beta_k^i$ but rather in how much they vary across individuals, we may write $\beta_k^i$ as the sum of a systematic and a random part: $\beta_k^i = \beta_k + \delta_k^i$, $k = 0, 1, \ldots, K$, where the vector $\delta^i = (\delta_0^i, \ldots, \delta_K^i)'$ follows a distribution with expected value 0 and covariance matrix $V(\delta)$. The theoretical underpinning of this random coefficient model is the principle of exchangeability (see, for example, Lindley and Smith 1972; Longford 1993), according to which the joint distribution of the vectors $\delta^i$ of random deviations is invariant with respect to any permutation of the indices i. From this point of view, it is not important to assess a specific vector of coefficients for each individual. Individuals are sampled randomly, and their individually varying structural parameters have a common systematic part $\beta$, which is a population parameter, whereas the individual parts vary randomly according to an unknown centered distribution. Only the structural population parameters and the dispersion parameters of the distribution governing the individual deviations are of interest. Substituting $\beta_k^i = \beta_k + \delta_k^i$ in (2) yields
$$Y_i = \sum_{k=0}^{K} X_i^k \beta_k + \sum_{k=0}^{K} X_i^k \delta_k^i + \varepsilon_i. \tag{3}$$
Combining the random parts into a new error term $v_i = \sum_{k=0}^{K} X_i^k \delta_k^i + \varepsilon_i$ yields the model
$$Y_i = \sum_{k=0}^{K} X_i^k \beta_k + v_i$$
with a heteroskedastic error term $v_i$. For each individual i, the variance of $Y_i$ given $X_i^k$, $k = 0, 1, \ldots, K$, is
$$\operatorname{var}(Y_i) = \sigma^2 + X_i \Omega X_i',$$
where $\Omega = V(\delta)$, $\delta^i$ and $\varepsilon_i$ are assumed to be independent, and $X_i$ is the row vector of X-values for the i-th individual. The $Y_i$ are still independent, but their conditional variances, given $X_i$, now depend explicitly on $X_i$. This dependence can be detected by inspecting the residuals of an OLS solution under model (1). OLS yields consistent estimates of $\beta$, but the OLS estimates of the standard errors of the parameters are inconsistent.
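A quick simulation illustrates model (3) under assumed values (an intercept and one regressor, a diagonal Ω, and $\sigma^2 = 0.25$, all chosen for illustration only). OLS recovers $\beta$, but the residual variance grows with x in line with $\operatorname{var}(Y_i) = \sigma^2 + X_i \Omega X_i'$, so conventional OLS standard errors, which assume a constant variance, are not valid.

```python
import numpy as np


def conditional_variance(x_row, omega, sigma2):
    """var(Y_i | X_i) = sigma^2 + X_i Omega X_i' under the random coefficient model (3)."""
    x_row = np.asarray(x_row, dtype=float)
    return sigma2 + x_row @ omega @ x_row


rng = np.random.default_rng(42)
n = 200_000
beta = np.array([1.0, 0.5])            # beta_0, beta_1 (illustrative)
omega = np.diag([0.20, 0.10])          # V(delta), illustrative
sigma2 = 0.25

x = rng.uniform(0.0, 4.0, n)
X = np.column_stack([np.ones(n), x])   # X^0 = 1
delta = rng.multivariate_normal(np.zeros(2), omega, size=n)   # individual deviations delta^i
y = X @ beta + np.sum(X * delta, axis=1) + rng.normal(0.0, np.sqrt(sigma2), n)

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_ols
print("OLS estimates:", beta_ols.round(3))
print("residual variance for x < 1:", resid[x < 1].var().round(2),
      "| for x > 3:", resid[x > 3].var().round(2))
print("model variance at x = 0.5 and x = 3.5:",
      conditional_variance([1.0, 0.5], omega, sigma2),
      conditional_variance([1.0, 3.5], omega, sigma2))
```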
3.2 Random Coefficient Models for Multilevel Data
The arguments for specifying random coefficient models for single-level data carry over to a multilevel situation. In a two-level structure, individuals i are nested within groups j. The endogenous variable Y is again measured at the individual level; the regressors X may be individual-level or group-level variables. We want to emphasize that the essential feature of models for multilevel data is the nesting of the data, not the use of group-level regressors. For the two-level structure, a linear random coefficient model is given by
$$Y_{ij} = \sum_{k=0}^{K} X_{ij}^k \beta_k^{ij} + \varepsilon_{ij}, \tag{4}$$
where $\varepsilon_{ij}$ is the random disturbance with expected value 0 and variance $\sigma^2$. In the analysis of a two-level structure we are mainly interested in the variation of the regression coefficients across groups. We therefore neglect the (possible) individual variation of the regression coefficients within groups and emphasize the variation between groups:
$$\beta_k^{ij} = \beta_k + \delta_k^j. \tag{5}$$
$\delta_k^j$ now represents a random deviation from the mean regression coefficient of the variable $X^k$, which is common to all individuals in group j. Substituting (5) in (4) yields
$$Y_{ij} = \sum_{k=0}^{K} X_{ij}^k (\beta_k + \delta_k^j) + \varepsilon_{ij}. \tag{6}$$
If $\delta_k^j$ is fixed, model (6) is an ANCOVA model with $\delta_k^j$ as the regression coefficient of the interaction between variable $X^k$ and group j. Applying ANCOVA results in a large number of parameters, which may be too many for a reasonable interpretation. Treating the deviations as random variables not only makes the model more parsimonious but also aims at describing the variability of the effects across groups. In matrix notation, (6) can be written equivalently as
$$Y_j = X_j \beta + X_j \delta_j + \varepsilon_j, \tag{7}$$
where $n_j$ is the size of group j, $Y_j$ is an $n_j$-vector, and $X_j$ is an $n_j \times (K+1)$ matrix. We assume that $\varepsilon_j$ and $\delta_j$ are stochastically independent and normally distributed:
$$\varepsilon_j \sim N(0, \sigma^2 I_{n_j}), \qquad \delta_j \sim N(0, \Omega),$$
where $I_{n_j}$ is the identity matrix of order $n_j$. Then the variance of $Y_j$, given $X_j$, is
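$$\operatorname{var}(Y_j) = X_j \Omega X_j' + \sigma^2 I_{n_j}. \tag{8}$$
If $\Omega \neq 0$, the individual components of $Y_j$ are no longer independent. The conditional covariance of two individuals i and i' of the same group j, given $X_j$, is
$$\operatorname{cov}(Y_{ij}, Y_{i'j}) = X_{ij} \Omega X_{i'j}', \tag{9}$$
where $X_{ij}$ denotes the i-th row of $X_j$.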
Model (7) has some well-known submodels. If all components of $\delta_j$ are restricted to zero, it becomes the usual regression model (1). The so-called random intercept model has the intercept as its only random coefficient. On the other hand, model (7) is a special case of the mixed model
$$Y_j = X_j \beta + Z_j \delta_j + \varepsilon_j, \tag{10}$$
which differs from model (7) in that the random part has its own design matrix $Z_j$ instead of $X_j$. This specification seeks to model the dispersion of the dependent variable separately, which is not in line with the rationale of model (7) and was not pursued in our analysis. Although it can be embedded in a model of type (7), it is advisable to stick to more parsimonious models unless there are strong reasons based on substantive theory. If $\Omega$ and $\sigma^2$ were known, we could use Generalized Least Squares to estimate the average regression coefficients $\beta$. In general, these parameters are unknown and must themselves be estimated. If $\delta_j$ has only a "small" variance, the results of OLS estimation may be very similar to those of an estimation that takes the variance of the random coefficients into account. However, if the variation of the random components $\delta_j$ is large and OLS is used for model (7), there may be a loss of efficiency and a risk of biased estimates of the standard errors of the regression coefficients. Hox and Kreft (1994), for example, emphasize the considerable increase in type-I errors caused by even small variances of random group effects. These disadvantages depend on the design matrix X. Analyzing the impact of X on the efficiency of estimation and on the biases in the estimated standard errors leads to complex functions, nonlinear in X, which are not tractable and are therefore neglected here. We think further research is needed to assess theoretically the advantages of estimation methods that take the dependencies of the observations fully into account, in contrast to the computationally much easier method of OLS. A useful discussion of this topic is given in Longford (1993). There are nevertheless some indications that, in general, the standard errors of the estimated fixed effects of group-level regressors are more seriously biased than those of individual-level ones.
Earlier results, such as those of Kloek (1981) or Moulton (1990), point in that direction. They show that the underestimation of the standard errors of group-level regressors may be considerable.
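If $\Omega$ and $\sigma^2$ were known, the GLS estimator of $\beta$ would be $(\sum_j X_j' V_j^{-1} X_j)^{-1} \sum_j X_j' V_j^{-1} Y_j$, with $V_j$ from equation (8). The sketch below implements this for simulated data with illustrative parameter values (the function name is our own) and compares the GLS standard errors with the conventional standard errors of pooled OLS, which ignore the within-group correlation. In practice $\Omega$ and $\sigma^2$ are unknown and have to be estimated.

```python
import numpy as np


def gls_estimate(groups, omega, sigma2):
    """GLS estimate of beta in model (7) for known Omega and sigma^2.

    groups: list of (X_j, y_j) pairs. Returns beta_hat and its covariance
    (sum_j X_j' V_j^{-1} X_j)^{-1}, with V_j taken from equation (8).
    """
    k = groups[0][0].shape[1]
    A, b = np.zeros((k, k)), np.zeros(k)
    for X_j, y_j in groups:
        V_j = X_j @ omega @ X_j.T + sigma2 * np.eye(len(y_j))
        Vinv_X = np.linalg.solve(V_j, X_j)        # V_j^{-1} X_j
        A += X_j.T @ Vinv_X
        b += Vinv_X.T @ y_j                       # X_j' V_j^{-1} y_j (V_j is symmetric)
    cov = np.linalg.inv(A)
    return cov @ b, cov


# Simulated data: 40 groups of 60 with a random intercept and a random slope (illustrative values)
rng = np.random.default_rng(1)
beta = np.array([2.0, 0.5])
omega = np.array([[0.20, 0.00],
                  [0.00, 0.05]])
sigma2 = 1.0
groups = []
for _ in range(40):
    x = rng.normal(size=60)
    X_j = np.column_stack([np.ones(60), x])
    delta_j = rng.multivariate_normal(np.zeros(2), omega)
    y_j = X_j @ (beta + delta_j) + rng.normal(0.0, np.sqrt(sigma2), 60)
    groups.append((X_j, y_j))

beta_gls, cov_gls = gls_estimate(groups, omega, sigma2)
se_gls = np.sqrt(np.diag(cov_gls))

# Pooled OLS with conventional (iid) standard errors, for comparison
X_all = np.vstack([X_j for X_j, _ in groups])
y_all = np.concatenate([y_j for _, y_j in groups])
beta_ols, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)
resid = y_all - X_all @ beta_ols
s2 = resid @ resid / (len(y_all) - X_all.shape[1])
se_ols = np.sqrt(np.diag(s2 * np.linalg.inv(X_all.T @ X_all)))

print("GLS:", beta_gls.round(3), "SE", se_gls.round(3))
print("OLS:", beta_ols.round(3), "naive SE", se_ols.round(3))
```

Both estimators should center on $\beta$, but the naive OLS standard errors come out noticeably smaller than the GLS ones, which reflect the between-group variation of the coefficients.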
3.3 Special Identification Problems
In linear random coefficient models for two-level structures, one usually assumes that only some of the regression coefficients are random. The choice will primarily depend on substantive theory. However, there are some additional identification problems that must be taken into account. A typical example is the simultaneous use of a random coefficient for the intercept, which is included in almost all random coefficient models, and a random coefficient for a second-level variable. Since the second-level variable varies only across groups, the variance of a random coefficient for such a variable cannot be distinguished from the variance of the random intercept and should therefore be set to 0.
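A numerical check of this point, with illustrative values: for a group with a constant value $z_j$ of a group-level variable, a random coefficient on z adds the term $\omega_{00} + 2 z_j \omega_{01} + z_j^2 \omega_{11}$ to every element of $X_j \Omega X_j'$, which is exactly the kind of within-group shift a random intercept produces. Two different $\Omega$ matrices therefore imply the same covariance matrix for the group.

```python
import numpy as np

n_j, z_j = 4, 2.5                                  # group size and the group's value of z (illustrative)
X_j = np.column_stack([np.ones(n_j), np.full(n_j, z_j)])

omega_intercept_only = np.array([[0.40, 0.0],
                                 [0.0,  0.0]])
omega_with_z_slope = np.array([[0.10, 0.0],
                               [0.0,  0.048]])     # 0.10 + 2.5**2 * 0.048 = 0.40

V_a = X_j @ omega_intercept_only @ X_j.T
V_b = X_j @ omega_with_z_slope @ X_j.T
print(np.allclose(V_a, V_b))                       # same covariance contribution within the group
```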
3.4 Random Intercepts and Shifts of Origin in the Explanatory Variables
Explanatory variables may be used in their raw state, but sometimes, for reasons of substantive theory, it is advisable to use them after a transformation. Thus, it may be more reasonable to use the difference between a variable and its grand mean as an explanatory variable instead of the variable itself. Another (trivial) transformation consists in merely excluding some explanatory variables from the model. In other cases, a nonlinear transformation has to be applied. In general, any transformation of the explanatory variables means an essential change of the model; in other words, the class of distributions for the dependent variable defined by the model changes if the explanatory variables are transformed. A well-known and simple example is the "centering" of the explanatory variables of a model similar to (1), i.e., replacing them by their differences from their grand means, when no intercept is included. In general, the mean of the dependent variable in a model with uncentered explanatory variables then lies in a different subspace than in the case of centered variables. A problem arises when the explanatory variables are merely equivalent representatives of variables generated by a class of transformations. In such a case it is natural to require that the class of distributions defined by the model remains the same when the explanatory variables are replaced by their transformed versions. This is referred to as invariance with respect to linear transformations, here, shifts in the origins of the explanatory variables. Some variables have no "natural" or theoretically defined origin, and the setting of the origin of an explanatory variable may be arbitrary to a certain degree. But inferences based on the estimated parameters of the model should not be influenced by the choice of origin.
One might conclude from (8) and (9) that the covariance structure of the dependent variable Y varies with a shift of the origins of the explanatory variables X, because it depends explicitly on X. This would violate the principle of invariance. In fact, however, a shift of origins does not change the mean or the covariance structure of the endogenous variable Y, because the parameters adjust. Following the arguments of Longford (1989), we write $X_j = (1_{n_j}, Z_j)$, $\beta = (\beta_0, \gamma')'$, and $\delta_j = (\delta_0^j, \theta_j')'$. The shift in the origin of $Z^k$ to $a_k$, a constant value not depending on i or j, can be formally described by transforming $Z_{ij}^k$ to $Z_{ij}^k - a_k$ or, in matrix notation, transforming $Z_j$ to $Z_j - 1_{n_j} a'$, where $1_{n_j}$ is the vector $(1, 1, \ldots, 1)'$ with $n_j$ components and $a' = (a_1, a_2, \ldots, a_K)$. After rearranging some terms, model (7) can be written equivalently as
$$Y_j = 1_{n_j}(\beta_0 + a'\gamma) + (Z_j - 1_{n_j} a')\gamma + 1_{n_j}(\delta_0^j + a'\theta_j) + (Z_j - 1_{n_j} a')\theta_j + \varepsilon_j.$$
The shift in the origin of $Z_j$ to $a'$ results in a new intercept $\beta_0^* = \beta_0 + a'\gamma$ and a new random deviation from the intercept, $\delta_0^{j*} = \delta_0^j + a'\theta_j$. The variance of $\delta_0^{j*}$ is
$$\operatorname{var}(\delta_0^{j*}) = \operatorname{var}(\delta_0^j) + 2a'\operatorname{cov}(\theta_j, \delta_0^j) + a'\operatorname{var}(\theta_j)\,a.$$
The random deviations $\theta_j$ of the Z-variables remain unchanged, and their covariance with $\delta_0^{j*}$ is
$$\operatorname{cov}(\delta_0^{j*}, \theta_j) = \operatorname{cov}(\delta_0^j, \theta_j) + a'\operatorname{var}(\theta_j).$$
Elementary algebra shows that the mean and the covariance structure of the endogenous variable are not changed. Under the assumption of normally distributed disturbances, the shift operation therefore renders an equivalent model. The dependence of the variances of the group disturbances on the origins of the regressors is important for the interpretation of the results. Shifting the origins far beyond the range of the regressor variables leads to arbitrarily large values for the variance of the random intercept. This is obvious in the typical situation depicted in Figure 1.
Figure 1: Group-specific regression lines
Because the variances of the other random effects, the slopes, are not changed, the impact of the variability of the macro units via the regressors becomes less important compared with the random intercepts. Intuitively, centering the regressor variables should yield sensible results. For nonsingular $\operatorname{var}(\theta_j)$, $\operatorname{var}(\delta_0^{j*})$ attains its minimum at the shift in the origin of $Z_j$ to
$$a^{*\prime} = -\operatorname{cov}(\delta_0^j, \theta_j)\,(\operatorname{var}(\theta_j))^{-1},$$
where $\operatorname{cov}(\delta_0^{j*}, \theta_j)$ is equal to zero. One possibility would be to use this "minimizing" shift $a^{*\prime}$. Then the random intercept and the random slopes would be uncorrelated, and the two sources of variability would be separated in an optimal way. A recursive scheme could be employed to set the origin while simultaneously estimating all the model parameters, but this raises technical problems.
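The algebra of the shift can be checked numerically. In the sketch below (one random slope, illustrative $\Omega$ and Z values, helper names our own), the matrix T maps $(\delta_0^j, \theta_j)$ to $(\delta_0^j + a'\theta_j, \theta_j)$, so the covariance matrix after the shift is $T \Omega T'$. The check confirms that the covariance of $Y_j$ is unchanged by a shift, that the intercept variance grows for a shift far outside the data, and that the minimizing shift $a^*$ makes the intercept and slope deviations uncorrelated.

```python
import numpy as np


def shift_matrix(a):
    """T such that T @ (delta_0, theta) = (delta_0 + a' theta, theta)."""
    a = np.atleast_1d(np.asarray(a, dtype=float))
    T = np.eye(a.size + 1)
    T[0, 1:] = a
    return T


def shifted_omega(omega, a):
    """Covariance matrix of (delta_0*, theta) after shifting the origin of Z to a."""
    T = shift_matrix(a)
    return T @ omega @ T.T


# Illustrative Omega for one random slope: [[var(delta_0), cov], [cov, var(theta)]]
omega = np.array([[0.40, -0.06],
                  [-0.06, 0.02]])
z = np.array([1.0, 2.5, 4.0])                     # Z values of three members of one group
a = np.array([10.0])                              # shift the origin of Z to 10

X = np.column_stack([np.ones_like(z), z])
X_shifted = np.column_stack([np.ones_like(z), z - a[0]])
cov_original = X @ omega @ X.T
cov_shifted = X_shifted @ shifted_omega(omega, a) @ X_shifted.T
print("covariance structure unchanged:", np.allclose(cov_original, cov_shifted))
print("intercept variance after shift to 10:", shifted_omega(omega, a)[0, 0])

# Minimizing shift a*' = -cov(delta_0, theta) var(theta)^{-1}
a_star = -omega[0, 1:] @ np.linalg.inv(omega[1:, 1:])
omega_star = shifted_omega(omega, a_star)
print("a* =", a_star, "| var(delta_0*) =", omega_star[0, 0], "| cov =", omega_star[0, 1:])
```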
3.5 Estimation and Computation
Most of the computer programs presently available use normal-theory maximum likelihood estimation. A comprehensive review is given in Kreft et al. (1990), and a useful discussion of the relations between different algorithms in Longford (1993). Our analysis was performed with VARCL, written by N. T. Longford (Longford 1988). Computational aspects matter a great deal, because large groups would, in principle, require the inversion of large matrices. The iterative procedure of VARCL is based on a Fisher scoring algorithm, which appears to be very fast under ordinary circumstances. Nevertheless, DeLeeuw and Kreft (1994) point out that ill-conditioned data can lead to slow convergence, a problem shared with ordinary regression.
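VARCL's Fisher scoring is not reproduced here. As a hedged illustration of normal-theory maximum likelihood in the simplest case, the sketch below fits a random intercept model with the EM algorithm, a different but standard route to the ML estimates, on simulated data. The function name and the simulated parameter values are assumptions for illustration; EM is simple to write but typically converges more slowly than Fisher scoring.

```python
import numpy as np


def em_random_intercept(y, X, group, max_iter=1000, tol=1e-10):
    """ML estimates for y_ij = x_ij' beta + u_j + e_ij, u_j ~ N(0, tau2), e_ij ~ N(0, sigma2).

    Uses the EM algorithm (not the Fisher scoring algorithm of VARCL).
    """
    _, g = np.unique(group, return_inverse=True)
    J, N = g.max() + 1, y.size
    n_j = np.bincount(g, minlength=J)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    tau2 = sigma2 = np.var(y - X @ beta) / 2.0          # crude starting values
    for _ in range(max_iter):
        # E-step: posterior mean m_j and variance v_j of each u_j given the data
        r_sum = np.bincount(g, weights=y - X @ beta, minlength=J)
        m = tau2 * r_sum / (sigma2 + n_j * tau2)
        v = tau2 * sigma2 / (sigma2 + n_j * tau2)
        # M-step: update beta, then the two variance components
        beta = np.linalg.lstsq(X, y - m[g], rcond=None)[0]
        e = y - X @ beta - m[g]
        sigma2_new = (e @ e + np.sum(n_j * v)) / N
        tau2_new = np.mean(m ** 2 + v)
        change = abs(tau2_new - tau2) + abs(sigma2_new - sigma2)
        tau2, sigma2 = tau2_new, sigma2_new
        if change < tol:
            break
    return beta, tau2, sigma2


# Simulated example: 100 groups of 40, tau2 = 0.5, sigma2 = 1.0 (illustrative values)
rng = np.random.default_rng(7)
J, n = 100, 40
group = np.repeat(np.arange(J), n)
x = rng.normal(size=J * n)
X = np.column_stack([np.ones(J * n), x])
u = rng.normal(0.0, np.sqrt(0.5), J)
y = X @ np.array([1.0, 2.0]) + u[group] + rng.normal(0.0, 1.0, J * n)

beta_hat, tau2_hat, sigma2_hat = em_random_intercept(y, X, group)
print(beta_hat.round(3), round(tau2_hat, 3), round(sigma2_hat, 3))
```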
3.6 Explained Variance
In ordinary regression models, the proportion of "explained" variance, $R^2$, i.e., 1 minus the ratio of the estimated residual variance to the estimated total variance of the dependent variable, is used as a measure of model fit. There is no simple extension to random coefficient models. A natural analogue of $R^2$ is based on the reduction of the estimated variances of both the individual-level and the group-level disturbance. The null model, $Y_{ij} = \beta_0 + \delta_0^j + \varepsilon_{ij}$, is used as a reference. The explained variance $R_1^2$ at the individual level is defined as
$$R_1^2 = 1 - \frac{\hat\sigma^2}{\hat\sigma_0^2},$$
where $\hat\sigma^2$ is the model-based individual variance and $\hat\sigma_0^2$ the individual variance of the null model. For the group level, the definition is
$$R_2^2 = 1 - \frac{X_i \hat\Omega X_i'}{\hat\Omega_0},$$
where $\hat\Omega_0$ represents the variance of the group-level disturbance of the null model. The explained group-level variance depends on $X_i$. This makes interpretation difficult, because in extreme cases each observation has its own reduction in group-level variance.
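Both measures are simple ratios once the variance components are estimated. The sketch below (function names our own) reproduces the values reported with Table 1b, $R_1^2 = 0.696$ and $R_2^2 = 0.808$, from the variance components in Tables 1a and 1b, where only the intercept is random so that $X_i$ reduces to (1). A hypothetical random slope variance is then added to show how $R_2^2$ changes with $X_i$.

```python
import numpy as np


def r2_individual(sigma2_model, sigma2_null):
    """R_1^2 = 1 - sigma^2 / sigma_0^2."""
    return 1.0 - sigma2_model / sigma2_null


def r2_group(x_row, omega_model, omega0_null):
    """R_2^2 = 1 - X_i Omega X_i' / Omega_0; depends on the covariate row X_i."""
    x_row = np.atleast_1d(np.asarray(x_row, dtype=float))
    return 1.0 - x_row @ omega_model @ x_row / omega0_null


# Variance components from Tables 1a (null model) and 1b (random intercept only)
print(round(r2_individual(0.1127, 0.3702), 3))
print(round(r2_group([1.0], np.array([[0.0025]]), 0.0130), 3))

# Hypothetical random intercept plus random slope: R_2^2 now varies with x
omega = np.array([[0.0025, 0.0],
                  [0.0,    0.0010]])
for x in (0.0, 1.0, 2.0):
    print(x, round(r2_group([1.0, x], omega, 0.0130), 3))
```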
4
Data and Variables to Test the Wage-Curve Hypothesis
Our analysis uses two sources of data: the unemployment statistics and a sample from the employment statistics of the BA (Bundesanstalt für Arbeit, Federal Employment Services). The regional unemployment rate is obviously important for estimating wage curves; it is measured as the average for 1989. Since many characteristics of regional economies (e.g., the price level) are correlated with the degree of urbanization, a binary variable indicates whether the respective district is a town or a rural district. The wage variable and the other independent variables are measured at the individual level. We plan to include all regions of western Germany in a test of the wage-curve hypothesis; a random sample with data on 1% of all employed workers will be available soon. So far, only a smaller 1% sample from 22 municipal and rural districts (NUTS III) could be used, in an exploratory study. The sample drawn from the employment statistics has 9083 cases. It includes white- and blue-collar workers covered by the social security system. This criterion excludes civil servants and individuals with an income lower than about DM 500 a month. Here we give only a very short description of the data (for more details, cf. Blien 1994; Cramer 1986; Rudolph 1986; and Alba, Müller, and Schimpl-Neimanns, in this volume). Wages are reported either at the end of the year (December 31) or when a worker leaves a firm. Only employment spells from 1989 are used for the analysis, so the longest possible spell is exactly one year. Each spell is treated independently: the 1% sample refers to a percentage of employment cases, not to a percentage of individual workers.
Most of the variables normally used in estimating wage functions are reported in the records of the employment statistics. The WAGE variable gives the average daily gross income a worker earned in an employment spell in 1989. For persons with wages exceeding a defined threshold, the contribution assessment ceiling of the social insurance system, only the value of this threshold is reported (DM 6175 a month in 1989). The duration of a spell is computed in calendar days, not in days worked. Two dummy variables control for weekly working time below 19 hours and for weekly working time between 19 and 35 hours. A binary variable represents gender; it equals 1 for men and 2 for women. Professional experience in the labor market is approximated by age minus the sum of 15 and the estimated time the respective worker needs for any qualification level above the lowest. To allow for nonlinearity, it is also included in squared form. Occupational status, qualification level, industry, and occupational group are indicated by sets of dummy variables. The categories can be seen in Table 2, where the reference groups are given in parentheses after the heading of the respective group. Note that the industry categories are recorded not for individual workers but for firms. Since labor-segmentation approaches lead one to expect wages to differ between firms, it would have been interesting to include firms in the analysis as an intermediate level between regions and individuals. This was not possible because a 1% sample of all employment cases was used: most firms in the sample have only one employee. All cases with missing values in the wage variable were excluded from the analysis. The other variables had only a small percentage of missing values, for example, 1.5% for occupation. The highest proportion of missing values was found for the qualification level.
Here 6 % of the cases have missing values. The missing values were included in the reference categories of the respective group of dummy variables.
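The construction of the experience variable and of a dummy set, with missing values assigned to the reference category, can be sketched as follows. The record layout, field names, and qualification categories are hypothetical; the actual coding of the BA employment statistics is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical qualification categories; the first one is the reference category.
QUALIFICATION_LEVELS = ["lowest", "vocational training", "university"]


@dataclass
class Spell:
    age: int                          # age in years
    extra_training_years: float       # hypothetical: time needed for qualification above the lowest level
    qualification: Optional[str]      # None marks a missing value


def experience(s: Spell) -> float:
    """Professional experience: age minus the sum of 15 and the extra training time."""
    return s.age - (15 + s.extra_training_years)


def qualification_dummies(s: Spell, levels: List[str] = QUALIFICATION_LEVELS) -> List[int]:
    """One dummy per non-reference level; missing values fall into the reference category."""
    return [1 if s.qualification == level else 0 for level in levels[1:]]


def design_row(s: Spell) -> List[float]:
    """Experience, experience squared, and the qualification dummies for one spell."""
    exp_years = experience(s)
    return [exp_years, exp_years ** 2] + qualification_dummies(s)


print(design_row(Spell(age=30, extra_training_years=3.0, qualification="vocational training")))
print(design_row(Spell(age=30, extra_training_years=0.0, qualification=None)))
```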
5
Results
The following model is estimated:
$$\ln w_{ij} = \beta_{0j} + \beta_{1j} x_{1ij} + \sum_{k=2}^{K} \beta_k x_{kij} + \gamma_1 z_{1j} + \gamma_2 z_{2j} + \varepsilon_{ij}. \tag{11}$$
This model is a modification of the usual wage function approach used to analyze regional wages (cf. Bellmann, Gerlach 1983; Gerlach and Kehlbeck 1988; Wagner 1991; Gerlach and Jakobi 1990; Wagner 1993). As usual, the natural logarithm of the wage is the dependent variable (Mincer 1974). The intercept $\beta_{0j} = \beta_0 + \delta_{0j}$ varies randomly between regions. To include the segmentation of the labor market by gender, the coefficient $\beta_{1j} = \beta_1 + \delta_{1j}$ of the gender variable $x_{1ij}$ is random; $z_{1j}$ is the unemployment rate, and $z_{2j}$ is the variable indicating the degree of urbanization. $x_{2ij}$ to $x_{Kij}$ are the variables measured at the individual level. If the intercept and the gender coefficient are assumed to be constant, the model reduces to the standard wage function, which can be estimated with OLS. The regional variables extend the form of the ordinary wage function. First, the null multilevel model, which includes no explanatory variables, is fitted. This model decomposes the variance into a regional and an individual component. Table 1a displays the results. In the null model, the regional level accounts for 3.4% of the total variance. As usual, most of the variation is at the individual level.
Table 1a. Variance components of the null multilevel model
Level | Variance | Standard deviation | Standard error
Individual level | 0.3702 | 0.6084 |
Regional level | 0.0130 | 0.1140 | 0.0187
Table 1b. Variance components of a multilevel model with variables that refer only to the individual level
Level | Variance | Standard deviation | Standard error
Individual level | 0.1127 | 0.3357 |
Regional level | 0.0025 | 0.0503 | 0.0086
The explained variances are $R_1^2 = 0.696$ for the individual level and $R_2^2 = 0.808$ for the regional level.
The conditional expectations of the average wages in the 22 districts are also obtained from fitting the multilevel model. The highest average wage is paid in Wolfsburg (44% above the estimated mean of the sample), the lowest in the rural district of Pirmasens (12% below). Including the individual-level explanatory variables reduces the variance component of the regions to a greater extent than that of the individuals (cf. Table 1b). Most of the variation of regional wages can therefore be attributed to differences in the distribution of workers and jobs across regions. The $R^2$ defined for groups is larger than the $R^2$ defined for individual respondents. Table 2 first shows the results for a constant coefficient model estimated with OLS using all available variables, including both variables measured at the regional level.² This model is the baseline for comparison with a multilevel model with random coefficients and the same variables. The $R^2$ of the OLS estimation is relatively high (0.70). The coefficients have the expected signs, and their magnitudes give little cause for surprise. Most variables are significant, and omitting those that are not has only minor effects on the coefficients and standard errors of the remaining variables. In all two-level models tested, the intercept was random and varied over the 22 regions. The coefficients of some variables were also treated as random. The variables indicating qualification level and professional experience showed no significant regional variation. However, significant variation of the gender differences across regions was found; the improvement in fit can be assessed by the likelihood ratio test statistic (the difference of the deviances, cf. Table 2). This is in line with the hypothesis that men and women are found in different segments of the labor market.
The coefficients of the individual-level variables estimated in the multilevel model are very similar to those estimated with OLS. This is because OLS estimates are consistent and our sample is large. However, the similarity of the estimates might also be partly attributable to the homogeneity of the 22 regions represented in the data. The most important result of the analysis is the corroboration of the wage curve. At the regional level, wages react relatively elastically to variations in the regional unemployment rate: according to the multilevel model, each additional percentage point of unemployment reduces daily income by 1.70% on average. The coefficients of the unemployment rate and of the urban-rural difference are a little lower than in the OLS estimates.
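Since the dependent variable is the log wage and the unemployment rate enters linearly, the estimated regional wage curve has a semi-logarithmic form. Writing $\gamma$ for the coefficient of the unemployment rate u and c for the remaining terms (our notation), a short derivation shows that this form is decreasing and convex in u, as the wage-curve hypothesis requires, and how $\gamma$ translates into a percentage change:

```latex
\begin{aligned}
\ln w &= c + \gamma u, \qquad \gamma < 0,\\
w(u) &= e^{c + \gamma u}, \qquad
\frac{dw}{du} = \gamma\, w(u) < 0, \qquad
\frac{d^2 w}{du^2} = \gamma^2\, w(u) > 0,\\
\frac{w(u+1) - w(u)}{w(u)} &= e^{\gamma} - 1 \approx \gamma \quad (|\gamma| \text{ small}).
\end{aligned}
```

For a coefficient of -0.0170, for example, the exact conversion gives $e^{-0.017} - 1 \approx -0.0169$, close to the approximation $100\gamma = -1.70\%$.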
In the two-level model, the standard errors of the variables measured at the regional level are much larger, although both variables are still significant. The difference illustrates the tendency of OLS regression to underestimate the standard errors of variables measured at the aggregate level when the disturbances are correlated within groups. This is often the case for respondents who belong to the same social group or the same regional labor market. Thus, a researcher cannot be sure that the standard errors of regional variables obtained by OLS are not biased downward, and the conclusions drawn may rest on spurious regressions. Table 3 shows the estimated variance components of the full multilevel model.
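A rough, hedged check of the size of this effect uses Moulton's (1990) approximation for a regressor that is constant within groups (intraclass correlation of the regressor equal to 1) and equal group sizes m: the OLS variance understates the true sampling variance by a factor of about $1 + (m - 1)\rho_\varepsilon$. The sketch below takes the average region size 9083/22 and, as a stand-in for the residual intraclass correlation $\rho_\varepsilon$ of the full model (which Table 3 would provide), the variance components of Table 1b; it also recomputes the 3.4% regional share of the null model from Table 1a. Function names are our own.

```python
def icc(var_group, var_indiv):
    """Intraclass correlation: share of the group-level variance in the total variance."""
    return var_group / (var_group + var_indiv)


def moulton_factor(avg_group_size, rho_x, rho_e):
    """Approximate ratio of true to OLS sampling variance (equal group sizes assumed)."""
    return 1.0 + (avg_group_size - 1.0) * rho_x * rho_e


rho_null = icc(0.0130, 0.3702)            # Table 1a, null model
rho_resid = icc(0.0025, 0.1127)           # Table 1b, used here only as a stand-in
m_bar = 9083 / 22                         # average number of cases per region
factor = moulton_factor(m_bar, 1.0, rho_resid)

print(f"regional share, null model: {rho_null:.3f}")
print(f"residual ICC: {rho_resid:.4f}, variance factor: {factor:.2f}, "
      f"standard error factor: {factor ** 0.5:.2f}")
```

Even this small residual intraclass correlation implies that conventional OLS standard errors for a regional variable could be too small by a factor of roughly three, which is consistent in direction with the much larger standard errors of the two-level model.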
Table 2. Estimation with OLS and with a two-level model with random coefficients, using a one-percent sample of the employment statistics of 22 rural and municipal districts (N = 9083). Groups of dummy variables are indicated by a header (reference categories in parentheses).
Variable | Constant coefficient model (OLS): Estimate | Standard error | Random coefficient model (maximum likelihood): Estimate | Standard error
Mean intercept | 4.4988 | | 4.5076 |
Gender | -0.2318 | 0.0160 | -0.2448 | 0.0115
Occupational experience | 0.0231 | 0.0011 | 0.0230 | 0.0011
Occ. experience squared | -0.0004 | 0.0000 | -0.0004 | 0.0000
Occupational status (Simple blue-collar workers) | | | |
Apprentice | -1.0764 | 0.0182 | -1.0738 | 0.0181
Qualified BCW | 0.0743 | 0.0122 | 0.0789 | 0.0122
Foreman | 0.2733 | 0.0301 | 0.2815 | 0.0299
Homeworker | -0.8034 | 0.0879 | -0.7836 | 0.0876
Female higher WCW | 0.0815 | 0.0565 | 0.0781 | 0.0563
Female lower WCW | 0.0506 | 0.0179 | 0.0544 | 0.0178
Male WCW | 0.1879 | 0.0185 | 0.1953 | 0.0185
Working time < 19h | -0.9952 | 0.0283 | -0.9943 | 0.0283