The Unit Problem and Other Current Topics in Business Survey Methodology
Edited Proceedings of the European Establishment Statistics Workshop 2017
Edited by
Boris Lorenc, Paul A. Smith, Mojca Bavdaž, Gustav Haraldsen, Desislava Nedyalkova, Li-Chun Zhang and Thomas Zimmermann
This book first published 2018
Cambridge Scholars Publishing
Lady Stephenson Library, Newcastle upon Tyne, NE6 2PA, UK
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Copyright © 2018 by Boris Lorenc, Paul A. Smith, Mojca Bavdaž, Gustav Haraldsen, Desislava Nedyalkova, Li-Chun Zhang, Thomas Zimmermann and contributors
All rights for this book reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.
ISBN (10): 1-5275-1661-X
ISBN (13): 978-1-5275-1661-8
CONTENTS
Preface
The European Network for Better Establishment Statistics
1. Introduction
   Paul A. Smith
2. The unit problem: an overview
   Paul A. Smith, Boris Lorenc and Arnout van Delden
3. The unit problem from a statistical business register perspective
   Roland Sturm
4. How to improve the quality of the statistics by combining different statistical units?
   Olivier Haag
5. Issues when integrating data sets with different unit types
   Arnout van Delden
6. Producing business indicators using multiple territorial domains
   Daniela Ichim
7. Improving the efficiency of enterprise profiling
   Johan Lammers
8. The impact of profiling on sampling: how to optimise sample design when statistical units differ from data collection units
   Emmanuel Gros and Ronan Le Gleut
9. Coping with new requirements for the sampling design in the survey of the service sector
   Thomas Zimmermann, Sven Schmiedel and Kai Lorentz
10. Sampling coordination of business surveys at Statistics Netherlands
   Marc Smeets and Harm Jan Boonstra
11. Sample coordination and response burden for business surveys: methodology and practice of the procedure implemented at INSEE
   Emmanuel Gros and Ronan Le Gleut
12. Response processes and response quality in business surveys
   Gustav Haraldsen
13. Paradata as an aide to questionnaire design
   Jordan Stewart, Ian Sidney and Emma Timm
14. Studying the impact of embedded validation on response burden, data quality and costs
   Boris Lorenc, Anders Norberg and Magnus Ohlsson
15. Adaptations of winsorization caused by profiling
   Arnaud Fizzala
16. Big data price index
   Li-Chun Zhang
17. Analysis of scanner data for the consumer price index at Statistics Canada
   Catherine Deshaies-Moreault, Brett Harper and Wesley Yung
18. Surveys on prices at the Statistical Office of the Republic of Slovenia
   Mojca Noč Razinger
19. An overview of data visualisation
   José Vila, José L. Cervera-Ferri, Jorge Camões, Irena Bolko and Mojca Bavdaž
Index
PREFACE
During preparations for the 2017 European Establishment Statistics Workshop (EESW17), an opportunity was offered to the EESW17 Scientific Committee to produce a Proceedings volume. As books dedicated to methodology for producing business statistics are few and tend to appear rarely, the committee decided to explore this possibility further by consulting the potential authors. We received an overwhelmingly positive response, which settled our course through an intensive period of work in late 2017 and the first half of 2018. The result is this edited Proceedings of selected, considerably revised papers from EESW17, which we – the chapters’ authors and volume’s co-editors – are now proud to present to a wider audience of business statistics methodologists and practitioners.

The first chapter of the volume, Introduction, is a conspectus of the book’s content: an overview of the current topics in business statistics methodology treated in the chapters. We trust these topics will be of relevance to other colleagues too. When the chapters are placed side by side, some trends emerge. Among these, the unit problem is a major new one. In the belief that it is only a technical problem, solvable by proper regulation, the problem – both its formulation and the development of methodologies for addressing its components – has been put aside for decades. However, the five chapters that address the unit problem here explicitly signal that it is time to acknowledge the unit problem as one of the constituent error sources in the Total Survey Error framework and to start addressing it properly. Several further papers report changes in methods to deal with changes in units models. Other trends include the greater use of alternative data sources, seen particularly in the papers on price indices, the development of coordinated sampling practices, and (in one summary paper) the use of data visualisation to assist with the dissemination of statistical outputs.
In their work towards publishing the volume, the co-editors – who were also members of the Scientific Committee of EESW17 – have had the great pleasure to work with a group of talented, able and patient authors who have worked hard on improving the chapters’ content through repeated revisions. We are grateful to all the authors for their endurance and accomplishment.
We look forward to meeting our readers at the next EESWs and hope that the chapters in this volume provide a timely input for further developments of business statistics methodology and its practices.

Boris Lorenc
Paul Smith
Mojca Bavdaž
Gustav Haraldsen
Desislava Nedyalkova
Li-Chun Zhang
Thomas Zimmermann
June 2018
THE EUROPEAN NETWORK FOR BETTER ESTABLISHMENT STATISTICS
The European Network for Better Establishment Statistics (ENBES) is the network which organises European Establishment Statistics Workshops. ENBES was launched in 2009 at a meeting during the New Techniques and Technologies for Statistics Conference in Brussels. ENBES is dedicated to improving cooperation and sharing knowledge on theory, methodology and practices within European establishment statistics. “Establishment” is used with a similar definition to that in the International Conference on Establishment Surveys (ICES) conference series to refer to businesses, farms, hospitals, schools and other similar institutions, for which there was no pre-existing generic term in English (Cox and Chinnappa, 1995).

ENBES encourages cooperation on and development of the methods and practices for enterprise statistics, primarily through a biennial series of multi-day workshops: 2009 in Stockholm, 2011 in Neuchâtel, 2013 in Nuremberg, 2015 in Poznań, and 2017 in Southampton. It also holds occasional one-day workshops, which are usually devoted to a specific theme. Since 2013 ENBES has been affiliated to the Royal Statistical Society in the UK, whose continuing organisational support has allowed it to operate without the need for a formal organisational registration.

ENBES’s biennial workshops provide an opportunity for discussion of the methodologies and practices of business statistics, covering a variety of different topics. They are designed to allow more space for development of ideas and discussions than a traditional conference, to encourage progress in business statistics through the synergy of the participants. The chapters in this volume derive from the ENBES workshop which took place from 30 August to 1 September 2017 in Southampton, UK, and have allowed the authors an opportunity to further develop the ideas which were presented there.
Reference
Cox, B.G. and Chinnappa, B.N. (1995). Unique features of business surveys. In Cox, B.G., Binder, D.A., Chinnappa, B.N., Christianson, A., Colledge, M.J. and Kott, P.S. (eds.), Business Survey Methods, pp. 1-17. New York: Wiley.
CHAPTER 1
INTRODUCTION
PAUL A. SMITH¹

¹ S3RI, University of Southampton, Highfield, Southampton, SO17 1BJ, UK. Email: [email protected].

Abstract
This chapter introduces the topics of the different chapters and sets them in the context of current views of business survey methodology and practice. The chapter also highlights how topics are related to each other and how they demonstrate the features which are peculiar to business surveys.

The 2017 instance of the European Establishment Statistics Workshop (EESW17) consisted of 35 contributed papers and three posters. Compared with a large conference these numbers are modest; however, the contributions covered a broad range of themes within business statistics. The classical themes stretched from the design and sampling for business – and more broadly, establishment – surveys, through data collection and data editing, to estimation. Additional themes covered data provider-oriented and user-oriented matters such as response burden management and communication with data providers and with the users of statistics. Further themes involved aspects of modernisation of systems for business statistics production, the impact of new data sources on production of indices (such as the consumer price index), and some considerations regarding nonresponse in business surveys. And, most relevantly, as is reflected in the title of this volume, the workshop covered the emerging theme of the unit problem in business and establishment statistics. Unlike a large conference, the EESW series is tailored to enable considerable feedback from colleagues after presenting a paper, and thus enables greater exchange among peers. The papers included here have
benefitted from this feedback and discussion. In the remainder of this chapter, we introduce the topics of the different chapters and set them in the context of current views of business survey methodology and practice. Taken together, the chapters reflect the current advances in establishment statistics methodology as pursued by European business survey methodologists.

Chapter 2 gives an overview of the unit problem: the issues that emerge when the type of units used to produce statistics differs from the type of unit that the statistical concept is intended to represent. This has several aspects, including when the unit type used to calculate the statistics is different from the unit type on which the data are collected, when unit types are in a hierarchical relation, and when different data sources need to be merged to improve data content and to produce more detailed statistics.

Chapters 3 to 7 all describe approaches to deal with aspects of the unit problem, particularly the delineation of businesses. This activity is fundamental to the construction of business statistics, where we need to know what units we are trying to construct statistics about. The availability of a business register is a key feature that sets business surveys apart from social surveys (where generally a corresponding register does not exist, except in a few countries with a longstanding tradition of population registers). Such a register is based on administrative data usually derived from tax and/or employment registration processes. The business register lists all the businesses that are known to be present in a country, and usually contains a limited amount of information on their characteristics, particularly their size. Such a list supports detailed sampling designs, which are important since it is necessary to include all the largest businesses in a survey in order to obtain accurate results.
Many, particularly larger, businesses have complex structures, so the delineation of businesses is needed to support this detailed sampling by distinguishing separate businesses. This process is dependent on a units model which describes how the different units which make up a business are treated in the register. Different units models could lead to different statistical outputs, and this is the essence of one element of the Unit Problem, discussed in more detail in Chapter 2. Additional aspects of the unit problem include understanding the intended unit structure at data collection, and communicating statistics to users (where the conceptual interpretation may differ from the interpretation ‘needed’ by the user). The European Union had attempted to update the definition of enterprise within the units regulation, but there was no agreement to change and therefore the original definition is now being implemented. In Chapter 3, Sturm discusses this implementation in Germany and how this relates to
the structures and profiling activities within the German business register. In Chapter 4 Haag discusses a similar revision of the French business register as part of a change to a system with statistics based on the enterprise, but where the legal unit is retained as the observation unit. (This leads later in Chapters 8 and 15 to further developments of the sampling design and outlier processing, changes which are needed to support this approach.) Chapter 5 also discusses units, but goes further to consider the whole system of statistical data integration when there are inputs derived from different units. This is a particularly important issue when there are multiple data sources with different underlying units models, as it underpins the ability to put all these sources together to develop new outputs and realise efficiencies in statistical production. Chapter 6 also deals with the challenge of making estimates using different unit definitions, more specifically how to make regional estimates according to several different definitions of “region”, derived from different regional classifications of unit structures. Chapter 7 also addresses profiling, but examines the efficiency of the profiling activities, and how they can achieve the most value for the least cost.

Following the order of processing of a business survey, we then consider the process of sample design, which continues to be an important one for business surveys. Two chapters deal with sample design challenges: Chapter 8 continues the story of the implementation of the business units regulation in France by designing a cluster-based business survey where enterprises (clusters of legal units) are selected at the first stage, and then all the legal units within selected enterprises are surveyed as the observation units.
Cluster designs in business surveys are unusual, although there are other similar cases of one-stage designs (two-stage cluster designs are much rarer), and this is therefore an interesting example. In Chapter 9 Zimmermann et al. investigate changes to a sample design brought about by a court decision on equal treatment which is at odds with efficient sample design. This mismatch between design considerations and the interpretation of the legal framework is a lesson in itself of the unintended consequences that can arise in setting up a national statistical system. The immediate need is for a design with minimum (and justified) use of take-all strata, but with as small an effect on accuracy as possible; a change to the estimator helps to retrieve some of the lost accuracy. In the longer term it would be sensible to seek an appropriate update to the legal framework.

Sample coordination is a generic term for ensuring that surveys (or different occasions of the same survey) either have some units in common (positive coordination) or that the same units are not included (negative
coordination). The former gives more accurate estimates of changes, and the latter a fairer distribution of the burden of completing questionnaires, and there is some trade-off between these extremes in any system. There is often some negative coordination in social surveys, but there the large size of the population means that its impact is usually negligible. Business surveys, however, often cover the same population, need to include high sampling fractions, and may have rotating designs without fixed waves. Sample coordination has therefore been an important topic in business surveys, often implemented through a system based on permanent random numbers (Ohlsson 1995). Here Chapters 10 and 11 describe the methods and implementations of coordinated sampling in the Netherlands and France respectively. These are interesting case studies, since coordination has many varieties and implementations (see also Ernst et al. 2000, Lindblom 2003, Nedyalkova et al. 2009) and there is no recent review of the aims and approaches in this topic area.

A series of chapters deals with the process of obtaining information from businesses. Chapter 12 categorises respondents to a business survey questionnaire by their information retrieval process, and uses this to investigate the effect of business complexity and size on the response process, specifically the quality of the responses and the burden of responding. Chapter 13 discusses the challenges of converting a paper questionnaire to a web questionnaire, and in particular deals with ways in which the paradata collected in the process of administering the web version can feed back to improvements in the questionnaire design. The authors use a classification of the ways in which questionnaires are completed to help in this analysis. Chapter 14 examines experimentally the impact of increasing the number of embedded edit checks in an on-line questionnaire (which already contains some such edits).
It shows minimal impact on data quality and efficiency for a small increase in embedded edits, but further testing over a wider range of numbers of embedded edits is needed. The main message from this work is the need for cognitive survey methodologists to focus much more on aspects of embedded editing. Chapter 15 is related to the change to the enterprise as the basic unit for business statistics in France, and examines how to implement winsorisation in this new sampling context. There are also challenging data collection problems in surveys of prices as the basis of price indices, and complex clustered sample designs are sometimes needed to make such collections practical. Many National Statistical Institutes are investigating the possibility of using alternative data sources to supplement or replace these, and three chapters deal with this topic. Chapter 16 provides a strategic overview of the challenges of
using alternative data sources for prices. Chapter 17 describes an exploratory study using a variety of methods of index calculation with scanner data from a restricted range of products in Canada. The extra possibilities from the availability of weighting information at lower levels and higher frequencies are interesting, but do not always have the desired effect on the indices, and may just add noise. Chapter 18 provides an overview of price collections in Slovenia and how they are moving to more cost-efficient procedures, particularly how agreements with retailers for the provision of scanner data are leading to these data being introduced to the main consumer price index calculation.

Finally, Chapter 19 summarises several presentations from the ENBES workshop that dealt with the topic of data visualisation. This has clear applications in business statistics, but is a general and much wider subject. The way in which users interact with visualisations, and how they can be used to convey important information and, particularly, stories in the data, is the subject of some empirical investigation. These experiments are also related to existing research on user interactions with graphics, and are accompanied by some commentary on the opportunities and challenges of modern devices.

In summary, this volume presents a range of recent developments in business statistics, many related to aspects of the Units Problem and providing additional information to develop the assessment of quality in relation to the choice of units. A number of current research areas are represented, and some future avenues are suggested. We hope that these contributions will also act as a spur to further research on the methodology for establishment statistics and its application to solve real problems and challenges. If we inspire some activity then ENBES will be continuing to achieve its objectives for promoting knowledge sharing and cooperation in this important area.
So if you find these topics interesting we encourage you to participate in ENBES and its events and activities at www.enbes.org.
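To make the permanent random number (PRN) idea behind sample coordination (Ohlsson 1995) more concrete, the following is a minimal sketch in Python. It is not the procedure used in any of the systems described in Chapters 10 and 11. It assumes Poisson-type selection, in which a unit is sampled when its PRN falls within a selection interval, and the function names `assign_prns` and `prn_sample`, the seed, the population size and the sampling fractions are all hypothetical choices made for illustration.

```python
import random


def assign_prns(unit_ids, seed=2017):
    """Give every unit one uniform(0, 1) number that it keeps permanently.

    The seed is an arbitrary illustrative value; a real register would
    store the numbers rather than regenerate them.
    """
    rng = random.Random(seed)
    return {unit: rng.random() for unit in unit_ids}


def prn_sample(prns, start, fraction):
    """Select units whose PRN lies in [start, start + fraction).

    The interval wraps around 1.0, so start=0.95 and fraction=0.10
    selects PRNs in [0.95, 1.0) together with [0.0, 0.05).
    """
    end = start + fraction
    if end <= 1.0:
        return {u for u, r in prns.items() if start <= r < end}
    return {u for u, r in prns.items() if r >= start or r < end - 1.0}


# Hypothetical frame of 1000 units.
prns = assign_prns(range(1000))

# Same interval on two occasions: identical samples (positive coordination).
survey_a = prn_sample(prns, start=0.0, fraction=0.10)
survey_b = prn_sample(prns, start=0.0, fraction=0.10)

# Disjoint interval: no unit in common with survey_a (negative coordination).
survey_c = prn_sample(prns, start=0.50, fraction=0.10)
```

Reusing the same interval gives positive coordination, whereas disjoint intervals give negative coordination. Shifting the start point slightly between occasions would give a partial overlap, which is one simple way to think about a rotating design.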
References
Ernst, L.R., Valliant, R. and Casady, R.J. (2000). Permanent and collocated random number sampling and the coverage of births and deaths. Journal of Official Statistics 16, 211-228.
Lindblom, A. (2003). SAMU – The System for Coordination of Frame Populations and Samples from the Business Register at Statistics Sweden. Background Facts on Economic Statistics 2003:3, Statistics Sweden. Available at: http://www.scb.se/statistik/OV/AA9999/2003M00/X100ST0303.pdf (accessed 6 April 2018).
Nedyalkova, D., Qualité, L. and Tillé, Y. (2009). General framework for the rotation of units in repeated survey sampling. Statistica Neerlandica 6, 269-293.
Ohlsson, E. (1995). Coordination of samples using permanent random numbers. In Cox, B.G., Binder, D.A., Chinnappa, B.N., Christianson, A., Colledge, M.J. and Kott, P.S. (eds.), Business Survey Methods, pp. 153-170. New York: Wiley.
CHAPTER 2
THE UNIT PROBLEM: AN OVERVIEW
PAUL A. SMITH¹, BORIS LORENC² AND ARNOUT VAN DELDEN³

¹ S3RI, University of Southampton, Highfield, Southampton, SO17 1BJ, UK. Email: [email protected]
² Bright Lynx LLC, Vaarika 1-1, 10614 Tallinn, Estonia. Email: boris.lorenc@blresearch.ee
³ Statistics Netherlands, P.O. Box 24500, 2490 HA The Hague, The Netherlands. Email: [email protected]

Abstract
This chapter introduces unit error and the unit problem and gives some historical background. It then sets the unit error and unit problem within the contexts of two familiar survey methodology frameworks: the European Statistical System (ESS) quality dimensions and the Generic Statistical Business Process Model (GSBPM). To exemplify different aspects of the unit problem, connections to the chapters in this collection that treat the unit problem are made.

1. Introduction
This chapter introduces unit error and the unit problem and gives some historical background, as well as referring to some of the latest developments in measuring and understanding its effects on survey quality. The unit problem, work on which has been encouraged by ENBES over several years, is a statistical quality issue more pronounced in business statistics due to businesses’ complex organisational structures. After defining the unit error in Section 2 and giving some historical background in Section 3, the chapter proceeds to review the unit error and unit problem
within the contexts of two by now familiar survey methodology frameworks: the European Statistical System (ESS) quality dimensions in Section 4 and the Generic Statistical Business Process Model (GSBPM) in Section 5. To exemplify different aspects of the unit problem, we make connections to the chapters in this collection that treat the unit problem.
2. The unit problem
The unit problem is based on the unit error (Zhang, 2011; van Delden et al., 2018), which refers to errors in a statistical output that are caused by deviations from an ideal case in identification, characterisation and delineation of the units and in establishing relationships between the units relevant for producing a desired statistical output. With that starting point, the unit problem refers to the challenges and obstacles to understanding of the unit error and to efforts to deal with it. The unit problem therefore indicates a paucity of investigation of the unit errors. A number of chapters in this volume, the statement paper “On the Unit Problem in Business Statistics” prepared for EESW17 (Lorenc et al., 2017), and the van Delden et al. (2018) letter to the editor of the Journal of Official Statistics, are contributions towards initiating a change to this paucity by focusing the methodological attention of survey statisticians on the unit error.

Awareness of unit errors and the unit problem is not new. In the opening chapter of a compilation containing a selection of edited papers from the first major international conference on business surveys held in 1993, Cox and Chinnappa (1995) reviewed a number of issues related to units that are treated in subsequent chapters of that volume. The issues include: that unit types in business surveys are not natural and often defined for the purposes of the statistics produced; that the hierarchy of units in business surveys can be complex, including criteria such as location and legal and administrative structures; that actual business structures are difficult to relate to units for sampling and reporting, indicating possible issues with availability of data and the correctness of the collected data; and that business populations are extraordinarily dynamic (Cox and Chinnappa, 1995).
As pointed out by Pietsch (1995) in the same volume, even in the ideal case of no conceptual and operational mismatch in unit type definition and application, the fragmentary approach taken by surveys in gauging businesses’ operations (different areas of business operations targeted, different unit types, different reference periods, different data providers, etc.) contributes to the variability in the statistical output produced, due in part to issues now referred to as the unit error.
Meeting annually since 1986, what is now the Wiesbaden Group on Business Registers had already become well established as the body treating issues relating to units by the early 1990s. Struijs and Willeboordse (1992) reported on a survey of the state of affairs regarding units at that time. From the distant perspective of 2018, it can perhaps be said that the low visibility of survey methodology and sampling theory in the management of units in business registers, or more generally in treating the impact of what we now call the unit error on produced statistics, has characterised the whole period up until the 2010s.
3. Broader approaches
Only more recently have more encompassing approaches emerged. Based on the survey lifecycle model (Groves et al., 2004, Fig. 2.5) reflecting the context of directly collected data from individuals and households, Bakker (2013) and Zhang (2012) have developed two-phase life cycle models of integrated statistical microdata for the production of statistics using administrative and survey data, thus spanning several “lifecycles” of data. Around the same time Zhang (2011) introduced the unit error in the context of household surveys. Business registers are not created for the purpose of conducting a single survey, and can therefore be seen as an earlier stage within a specific survey’s lifecycle. Creation of units when they enter the business register can lead to unit errors, so the two-phase perspective applies even here. Van Delden (2018, Chapter 5 in this volume) highlights this approach.

Lorenc et al. (2017) and van Delden et al. (2018) express the need to address the unit error and unit problem in a methodological way, for which they argue that the Total Survey Error (TSE) approach is the most appropriate. The unit error then gets acknowledged as one among the other, better-known types of errors associated with the production of statistics, such as sampling error, nonresponse error, measurement error, and so on. Within that framework, part of the challenge is to obtain estimates of the effect of the units model on the statistical outputs. Ichim (2018, Chapter 6 in this volume) implicitly provides such estimates, as the difference between using different unit types in regional estimation, and some micro-level comparison of the effect of switching units in employment surveys in the UK is presented in Smith et al. (2003). This is helpful, but how to integrate these estimates into a survey error framework is an open problem.
Should every statistic in principle include an estimate of the “unit model error” which characterises how much the outputs could vary under a range of different unit models? That would be a complex set
of calculations, which could only be done over a small number of case studies – both a small number of statistics and a small number of unit models – because of the effort required to define units and recast collected data to those units. Even to use a single ideal measure would be resource-intensive, and it is far from clear in many situations what the ideal unit structure is. However, when done, such studies can indicate the contribution of the unit error to the uncertainty of the produced statistics. For instance, the study by INSEE referred to in Haag (2018, Chapter 4 in this volume) shows that when legal units are chosen to present the large enterprises’ share of exports from France, that share is 22%. But when the enterprise is the unit on which the statistics are calculated, that share is 52%. It also shows that important breakdowns of variables like turnover and exports can vary substantially: there are differences of up to 5 percentage points in the breakdown of turnover between NACE sectors and 30 percentage points in the breakdown of exports between small and large units (which, although a big difference, is perhaps less surprising because of the way large businesses are typically built up of multiple smaller units). This uncertainty due to unit choice is much larger than the usual sampling error in business surveys and thus deserves much more research than it has received thus far. Therefore, in spite of being an additional effort, studies to estimate the impact of the choice of unit type – a component of the unit error – are much needed.

The European approach to the unit problem has been to harmonise (initially with limited take-up, but now more stringently) on one particular units model (Eurostat, 2014) so that statistics are comparable. But this does not give any indication of how important the model is in defining the outputs. The model in itself allows inconsistencies to arise, as discussed among others by Sturm (2015) and van Delden et al.
(2018), contributing to the variability in statistical outputs that use this model. Further, as Sturm (2018, Chapter 3 in this volume) discusses, the stringent harmonisation goes down to a certain level, but still leaves open a wide variety of choices below that level in implementing the particular units model, leading to further deviations and variability. Measurement errors can contribute to the unit error in two ways. At one level, the unit error could be considered to be a measurement problem, since it is necessary to define the object of interest before attempting to collect data from it, and any mismatch between the objective and what is actually collected will be a type of measurement error. Nonetheless, the unit problem itself is wider, because it also encompasses how the object of interest is defined. The second contribution of measurement error is in the
variables from which the units are profiled (profiling is how the units model is implemented). Any errors in the variables on which profiling is based may cause errors in the unit structure, even if the model is conceptually perfect. Studies of data collection in business statistics have in recent years tried to understand how the data collection process within companies works, and whether one can make adjustments to this process to possibly reduce the impact of unit errors or at least to trace their effects. Haraldsen (2018, Chapter 12 in this volume) is an example of such a contribution. Haraldsen (2013) gives an overview of practical problems affecting the quality of data while conducting business surveys, where many of the examples arise as a result of unit error. In the remainder of this overview, we consider the unit problem in relation to the European Statistical System’s standard quality dimensions (Eurostat 2015) (and these thoughts can be straightforwardly adapted to other quality classifications), and also to the Generic Statistical Business Process Model (GSBPM) (UNECE 2013).
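The sensitivity of a size-class breakdown to the choice of unit, as in the INSEE comparison cited earlier in this section, can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions: the records, the 250-employee cut-off and the names `LEGAL_UNITS`, `large_share` and `consolidate` are all invented for illustration, and consolidation is done by simple summation, which ignores any elimination of flows between legal units of the same enterprise.

```python
from collections import defaultdict

# Hypothetical records: (legal_unit, enterprise, employees, exports).
# All values are invented for illustration; they are not INSEE data.
LEGAL_UNITS = [
    ("LU1", "E1", 150, 40.0),
    ("LU2", "E1", 200, 30.0),
    ("LU3", "E1", 100, 10.0),
    ("LU4", "E2", 300, 15.0),
    ("LU5", "E3", 50, 5.0),
]

LARGE_THRESHOLD = 250  # assumed employee cut-off for the "large" size class


def large_share(units):
    """Share of total exports from units with at least LARGE_THRESHOLD employees.

    `units` is an iterable of (employees, exports) pairs.
    """
    units = list(units)
    total = sum(exports for _, exports in units)
    large = sum(exports for employees, exports in units
                if employees >= LARGE_THRESHOLD)
    return large / total


def consolidate(legal_units):
    """Sum employees and exports of the legal units within each enterprise."""
    totals = defaultdict(lambda: [0, 0.0])
    for _, enterprise, employees, exports in legal_units:
        totals[enterprise][0] += employees
        totals[enterprise][1] += exports
    return [(employees, exports) for employees, exports in totals.values()]


# Same microdata, two unit types, two different "large unit" shares.
share_by_legal_unit = large_share((emp, exp) for _, _, emp, exp in LEGAL_UNITS)
share_by_enterprise = large_share(consolidate(LEGAL_UNITS))
print(f"legal units: {share_by_legal_unit:.0%}, enterprises: {share_by_enterprise:.0%}")
```

In this invented data set only one legal unit exceeds the cut-off, so the large-unit share of exports is 15% on a legal unit basis, whereas two of the three consolidated enterprises exceed it and the share rises to 95%. The direction of the effect matches the text (large businesses are built up of multiple smaller units); the magnitudes are arbitrary.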
4. ESS quality dimensions perspective
The ESS uses five dimensions of output quality, and we consider each in turn:
Relevance, assessment of user needs and perceptions. This is a challenging component because enterprise statistics have a wide range of uses and users. To begin with, the National Accounts (NA) by tradition have been a very important user, and their whole-economy coverage has promoted the use of a consistent units model. However, it has been noted that in practice different countries use kind-of-activity units, enterprises or even enterprise groups as the basic statistical unit underlying their supply and use tables. In the institutional sector accounts, some countries use legal units as the best approximation of institutional units (the unit type used by NA), while other countries apply enterprises or enterprise groups as being equivalent to institutional units. This is said to have a clear impact on the national accounts aggregates (OECD, 2016). The growing importance of the input-output framework to underpin NA leads to a wish for more detailed outputs by product. This level of disaggregation makes the units model less critical, since the more detailed components, once estimated, can be added up in any way as necessary. However, a prerequisite for such detailed outputs is that the required data are available. A rich source of such data can be found in the actual
transactions in business administrations. The problem, however, is that not all businesses classify their transactions in the same way. The use of a central reference code set, to which different business classification systems are mapped, might be a solution to this problem (Buiten et al., 2016). There is also the issue that businesses cannot always be counted on to make these transactions available to external entities (NSIs included), and that providing data is a burden on businesses (Haraldsen, Chapter 8). Other users require different breakdowns, such as regional or by country of ownership, or by any of myriad other variables. Where such variables are clearly defined by the unit model and the relevant data are available from a survey or other source, the outputs should be directly estimable and relevant, but where they are not we may expect a mismatch between the actual estimates and some conceptual true value, diminishing the relevance of the statistical outputs. There is also a component of relevance related to user perceptions and the choice of the model itself. The model must be close enough to perceptions of reality that its use to describe business structures (and the inevitable approximations) is accepted by most users as a satisfactory representation for statistical purposes. There is an aspect of communication of statistics involved in this, at two stages in the production process: (i) at the data collection stage, whether the intended concepts related to units and their characterising variables have been understood correctly by the data provider, and (ii) similarly at the dissemination stage, whether the declared concepts related to units have been understood correctly by the users (including whether the user understands the uncertainty that may emerge at the data collection stage).
This comes down to whether the statistics producer can vouch that the intended concepts regarding units have been understood and applied correctly during data collection, and whether any remaining uncertainty has been communicated clearly to the user. Accuracy and reliability. There are several components of the application of the units model which contribute to accuracy. One is in the choice of the model itself, which leads to a “units model assumption error”. A second relates to the data requirement for implementation of the model (and its timeliness, see below), and therefore whether the profiling of a business using that model is done correctly or with error. There are therefore two components of accuracy, one related to the range of values achievable from different unit models, which is a kind of variance, and one related to the difference between what is measurable and the target concept, a kind of bias and closely related to relevance. Regarding NA, when the unit type that is chosen for compiling NA differs from the unit type that is chosen for the data that are used to compile
NA, which is currently the case, this not only affects coherence but also the accuracy (variance). The input data that enter NA need modifications to make them suitable for the NA unit type. The NA unit type is more ‘homogeneous in economic activity’ than its input, to enable the production of a detailed input-output table. So, in compiling the NA, data of the largest (inhomogeneous) companies are adjusted to fit the NA unit type, which likely affects accuracy. Such adjustments are usually done when a company has shifts in their composition, when it splits off part of its legal units for instance. Timeliness and punctuality. Business populations are dynamic, and the processes of collecting information and using it to profile a business in line with a model are at best contemporaneous and at worst occur with a long lag. Therefore, the units model could be correctly implemented with current evidence, but the evidence itself may be out of date. There is some published information on lags in updating registers (Hedlin et al. 2001, Smith and Perry 2001) but more work in this area and how it relates to applications of units models and profiling would be beneficial. Accessibility and clarity. The profiling and units models are not in general publicly available, so they do not score highly on this quality dimension. There may be a case for more openness about such structures, in part as a way to obtain feedback from users and researchers on how well the models work in different situations. Coherence and comparability. A consistent units model across the whole economy promotes coherence within the national accounts, and similarly consistent models in different countries promote comparability of models, methods and outputs across the international statistical system. 
How to measure this comparability is however challenging – one possibility is to relate it to the accuracy and relevance dimensions by considering how great the mismatch in statistics would be under alternative models. However, as discussed earlier, this is likely to be an expensive type of case study because of the need to recode business structures onto an alternative model. Some elements can be treated in the same way as classification updates, since a new units model effectively applies a different classification of units; see Smith and James (2017) for a review of reclassification approaches in official statistics. Lorenc et al. (2017) also consider cost and response burden, which are important considerations, although not quality components in the ESS framework. There is a cost to the statistical system (e.g. of profiling), which is directly affected by the operational features (and Lammers (2018, Chapter 5 in this volume) considers the efficiency of this process in the Netherlands), but can ultimately be attributed to the conceptualisation. The
response burden is also strongly affected by the conceptualisation in that it drives whether the data that exist in businesses’ accounting systems or administrative sources can be used. Finally, although there are many complications in the unit model, van de Ven (2018) makes an appeal that the approach to units should be as simple and practical as possible – for example, so that data linkage can be undertaken straightforwardly.
5. GSBPM perspective
The GSBPM (UNECE 2013) codifies the steps in the design and processing of official statistics, and it is interesting to consider in which stages the units model has the greatest impact. The level 1 processes in the GSBPM are shown in Fig. 2-1 (there is a more detailed subprocess level, not shown).
[Figure 2-1: two overarching processes (quality management, metadata management) spanning the eight level 1 processes: specify needs, design, build, collect, process, analyse, disseminate, evaluate.]
Figure 2-1: Level 1 (high level) processes in the GSBPM (redrawn from UNECE 2013).
The unit model has most effect in the early stages of the GSBPM, since it provides a foundation for establishment statistics. Specify needs and Design are closely related to the relevance dimension of quality, where the chosen unit model needs to be capable of supporting a range of different user requirements, in as consistent and comparable a way as possible. These elements therefore imply appropriate decisions over the choice of unit model. Build means that the methods and software to support the implementation of the units model need to be constructed, and these are populated with information derived from the Collect processes. The data are then Processed to implement the units model, which may include some additional stages such as imputing for missing data. The subsequent stages rely on these underpinnings, but do not directly relate to the unit model. The two overarching processes, for Quality and Metadata Management are however important – the assessment of the quality impacts of the choice of the unit model is the essence of the unit problem (van Delden et al., 2018), and it is vital to document the sources of evidence and decisions which are taken to apply the unit model structures for a particular business, so that these can be revisited when necessary.
There is a feedback loop in the GSBPM where the quality of the outputs is Evaluated, and used to improve the processing steps in the next cycle. To some extent it is this feedback loop which has not been clearly implemented in business statistics, leading to the situation where the unit problem is clearly an area of weakness. But with the extra attention being brought to bear on this issue, we can hope that progress on improving the units model and the quality measures that describe its effects can be made.
References
Bakker, B.F.M. (2013). Micro-integration: State of the art. In Report on WP1 of ESSnet on Data Integration. Available at: https://www.istat.it/it/files/2013/12/FinalReport_WP1.pdf (accessed 2017-07-04).
Buiten, G., Boom, R. van den, Roos, M. and Snijkers, G. (2016). Issues in automated financial data collection in the Netherlands. Proceedings of the Fifth International Conference of Establishment Surveys, June 20-23, 2016, Geneva, Switzerland. Virginia: American Statistical Association.
Cox, B.G. and Chinnappa, B.N. (1995). Unique features of business surveys. In Cox, B.G., Binder, D.A., Chinnappa, B.N., Christianson, A., Colledge, M.J. and Kott, P.S. (eds.), Business survey methods, pp 1-17. New York: Wiley.
van Delden, A. (2018). Issues when integrating data sets with different unit types. In Lorenc, B., Smith, P.A., Bavdaž, M., Haraldsen, G., Nedyalkova, D., Zhang, L.-C. and Zimmermann, T. (2018) (eds.). The unit problem and other current topics in business survey methodology, pp 47-63. Newcastle upon Tyne: Cambridge Scholars.
van Delden, A., Lorenc, B., Struijs, P. and Zhang, L.-C. (2018). Letter to the Editor: On statistical unit errors in business statistics. Journal of Official Statistics 34, 573-580.
ENBES (2014). ENBES Workshop “The Unit Problem in Business Statistics Methodology” held in Geneva, Switzerland, on November 10th 2014. Available at: https://statswiki.unece.org/download/attachments/126353571/ENBES%20Workshop%20Unit%20Problem%20Summary_20150220_logo.pdf?version=1&modificationDate=1477298058445&api=v2 (accessed 2018-04-03).
Eurostat (2014). The statistical units model (Version: 15 May 2014), version presented to the Business Statistics Directors Group Meeting 24 June 2014.
Eurostat (2015). ESS handbook for quality reports, 2014 edition. Luxembourg: Publications Office of the European Union.
Groves, R.M., Fowler Jr., F.J., Couper, M., Lepkowski, J.M., Singer, E. and Tourangeau, R. (2004). Survey methodology. New York: Wiley.
Haag, O. (2018). How to improve the quality of the statistics by combining different statistical units? In Lorenc, B., Smith, P.A., Bavdaž, M., Haraldsen, G., Nedyalkova, D., Zhang, L.-C. and Zimmermann, T. (2018) (eds.). The unit problem and other current topics in business survey methodology, pp 31-46. Newcastle upon Tyne: Cambridge Scholars.
Haraldsen, G. (2013). Quality issues in business surveys. In Snijkers, G., G. Haraldsen, J. Jones, and D.K. Willimack, Designing and conducting business surveys, pp 83-125. Hoboken, New Jersey: Wiley.
Haraldsen, G. (2018). Response processes and response quality in business surveys. In Lorenc, B., Smith, P.A., Bavdaž, M., Haraldsen, G., Nedyalkova, D., Zhang, L.-C. and Zimmermann, T. (2018) (eds.). The unit problem and other current topics in business survey methodology, pp 155-176. Newcastle upon Tyne: Cambridge Scholars.
Hedlin, D., Pont, M.E. and Fenton, T.S. (2001). Estimating the effects of birth and death lags on a business register. In ICES-II: Proceedings of the Second International Conference on Establishment Surveys. Contributed Papers (CD), pp 1099-1104. Virginia: American Statistical Association.
Ichim, D. (2018). Producing business indicators using multiple territorial domains. In Lorenc, B., Smith, P.A., Bavdaž, M., Haraldsen, G., Nedyalkova, D., Zhang, L.-C. and Zimmermann, T. (2018) (eds.). The unit problem and other current topics in business survey methodology, pp 65-78. Newcastle upon Tyne: Cambridge Scholars.
Lorenc, B., van Delden, A., Struijs, P. and Zhang, L.-C. (2017). Statement on the unit problem in business statistics. Paper written for the European Establishment Statistics Workshop 2017, 30 August – 1 September, 2017, Southampton, UK. Available at: https://statswiki.unece.org/download/attachments/122325493/ENBES%20Unit%20Problem%20Statement.pdf?version=1&modificationDate=1500370501222&api=v2 (accessed 2018-04).
OECD (2016). Reassessment of the role of the statistical unit in the System of National Accounts. Prepared for the Fifteenth session of the Group of Experts on National Accounts, Geneva, 17-20 May 2016. Available at: https://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.20/2016/ECE.CES.GE.20.20_OECD.pdf (accessed 2018-06-13).
Pietsch, L. (1995). Profiling large businesses to derive frame units. In Cox, B.G., Binder, D.A., Chinnappa, B.N., Christianson, A., Colledge, M.J. and Kott, P.S. (eds.), Business survey methods, pp 101-114. New York: Wiley.
Smith, P.A. and James, G.G. (2017). Changing industrial classification to SIC (2007) at the UK Office for National Statistics. Journal of Official Statistics 33, 223-247.
Smith, P. and Perry, J. (2001). Surveys of business register quality in central European countries. In ICES-II: Proceedings of the Second International Conference on Establishment Surveys. Contributed Papers (CD), pp 1105-1110. Virginia: American Statistical Association.
Smith, P., Pont, M. and Jones, T. (2003). Developments in business survey methodology in the Office for National Statistics, 1994-2000 (with discussion). Journal of the Royal Statistical Society, Series D 52, 257-295.
Struijs, P. and Willeboordse, A. (1992). Terminology, definitions and use of statistical units. 7th Round Table on Business Registers, Copenhagen 12-16 October 1992. Available at: https://circabc.europa.eu/sd/a/f975e6c9-4a3c-44c4-b750-eae48cb4efb6/Terminology%252c%20Definitions%20and%20Use%20of%20Statistical%20Units.pdf (accessed 18 Jun 2018).
Sturm, R. (2015). Revised definitions for statistical units–methodology, application and user needs. The main conceptual issues of the “units discussion” of the years 2009–2014. Statistika 95, 55-63.
Sturm, R. (2018). The unit problem from a statistical business register perspective. In Lorenc, B., Smith, P.A., Bavdaž, M., Haraldsen, G., Nedyalkova, D., Zhang, L.-C. and Zimmermann, T. (2018) (eds.). The unit problem and other current topics in business survey methodology, pp 19-30. Newcastle upon Tyne: Cambridge Scholars.
UNECE (2013). Generic Statistical Business Process Model GSBPM (Version 5.0). Geneva: UNECE. Available at: http://www1.unece.org/stat/platform/display/GSBPM/GSBPM+v5.0 (accessed 16 April 2018).
van de Ven, P. (2018). Economic statistics: how to become lean and mean? Journal of Official Statistics 34, 309–321.
Zhang, L.-C. (2011). A unit-error theory for register-based household statistics. Journal of Official Statistics 27, 415-432.
Zhang, L.-C. (2012). Topics of statistical theory for register-based statistics and data integration. Statistica Neerlandica 66, 41-63.
CHAPTER 3
THE UNIT PROBLEM FROM A STATISTICAL BUSINESS REGISTER PERSPECTIVE
ROLAND STURM1
Abstract
The chapter builds on work in progress in Germany on implementing profiling as a component in regularly updating the statistical business register. I am addressing three questions. (1) What are we looking for and what do we get? I highlight borderline issues between enterprise and kind-of-activity unit in practical statistics production. (2) Aiming for harmony or aiming for harmonization? Since many ways to do profiling work are in use in the member states of the European Union, statisticians may provide a wide range of practical outcomes regarding units for everyone to follow, but the definition of enterprise is not harmonised in the same way. (3) Is the enterprise definition settled for good? I present some conceptual and empirical findings on the enterprise concept, focusing on two issues: splitting of legal units and a whole enterprise group classified as one enterprise. These findings suggest that the proposals for a revision of the wording of the enterprise definition which were formulated by the Statistical Units Task Force in 2014 are still relevant.
1. Introduction
Continued attention is given to the unit problem as a major obstacle to producing and communicating good business statistics. Regarding statistical units, a distinction can be postulated between three stages of abstraction from reality (Sturm, 2014):
1 Destatis, Gustav-Stresemann-Ring 11, 65189 Wiesbaden, Germany. Email: [email protected].
The definitions should capture decisive characteristics of units which are important from conceptual and analytical points of view. As concepts and analyses serve practical purposes these definitions are of course driven by issues of real life, they are not purely academic.
The operational rules describe in more detail how the definitions should be understood or how they could be handled in reality, that is, in application. Therefore, the operational rules build a bridge between definitions – which should be concise but also as short as possible in wording – and application. When drafting definitions and operational rules, sometimes it has to be worked out what belongs to the pure definition and what is already practical and therefore belongs to the operational rule. Still, the operational rules should be general and not particular for only one specific context or situation.
The application of the definitions starts by deciding which unit to choose for which statistical purpose. Operational rules often have to be elaborated in further detail regarding a certain context and it has to be worked out how to handle the manifold practical aspects, e.g. how to collect data from respondents about observation units and how to transform this data to produce figures about the statistical units.
One of the crucial components in dealing with statistical units is the statistical business register. Register experts have been involved intensively in Eurostat’s efforts to improve the consistent application of the enterprise concept in recent years (Sturm and Redecker, 2016). Since the 2014 ENBES workshop (ENBES 2014), which dealt with the unit issue, major developments have taken place and have been dealt with in the working groups for Structural Business Statistics (SBS), for Short Term Statistics (STS) and for Business Registers and Statistical Units (BR&SU) at Eurostat.
Firstly, in 2015, Eurostat abandoned the attempt to change the enterprise definition of Regulation 696/93 (the “Statistical Units Regulation”, European Union 1993), or at least postponed any major discussion about changes of the unit definitions for the coming years (Eurostat, 2015). Secondly, all Member States which do not apply the enterprise definition appropriately have provided Eurostat with action plans to implement the enterprise definition in Structural Business Statistics (SBS). Thirdly, profiling as a method to identify enterprises is being established in many statistical offices – commonly in the register
departments of the offices but also either in the business statistics departments or in collaboration between the two. The method proceeds by analysing enterprise groups in order to find out about the enterprises within the groups. The conceptual debate of the years 2009 to 2015 (Sturm, 2015) about enterprises, global enterprises (GENs), truncated enterprises and enterprises at national level is settled now, at first sight. At the same time, an increasing number of empirical findings from practical profiling work are becoming available. At the end of 2015, German official statistics committed to introducing profiling as a new task of the statistical business register and started preparations. A methodology for profiling in Germany is being elaborated, based on the work of the ESSnet on Profiling (ESSnet Profiling, 2014) – a Eurostat-sponsored multi-year project that produced a set of recommendations about how to do profiling in practice – to which Destatis, the Federal Statistical Office of Germany, contributed in the years 2009-2013. Staff for decentralized profiling work in most of the 14 Länder (states’) statistical offices in Germany are being recruited and trained as statistical profilers, and are now becoming familiar with the method of profiling in practical terms. A first testing period of practical profiling was started in 2017, adding empirical evidence to the first case studies which were done in the years before. Germany has yet to reach the level of the countries with a longer tradition in profiling. Eurostat at the same time is concentrating on multinational cases and strongly promoting so-called European Profiling as a collaborative approach of statisticians in several national statistical offices who deal with the profiling of multinational enterprise groups. Eurostat is devoting considerable resources to achieve the presently announced goal of 300 multinational profiling cases in a period ending in 2018.
Statistical offices in the Member States are adapting to this approach to varying degrees, while at the same time putting in place national action plans to implement the enterprise definition for SBS in the coming years. Based on this situation, this chapter addresses three questions regarding the application of the enterprise statistical unit in SBS – in Germany and in the European Union:
(1) What are we looking for and what do we get? – Borderline issues between enterprise and kind-of-activity unit in practical statistics.
(2) Aiming for harmony or aiming for harmonization? – Will the action plans in EU countries to introduce the enterprise in structural business statistics result in more consistency?
(3) Is the enterprise definition settled for good? – Conceptual and empirical findings on the enterprise concept.
2. What are we looking for and what do we get?
The enterprise is characterized as a unit with certain autonomy whereas a kind-of-activity unit (KAU) comprises all main or secondary activities within one enterprise that belong to the same class (4-digit level) of NACE Rev.2. The KAU therefore represents a part of the enterprise with homogeneous activity. Main and secondary activities are those activities of an enterprise which are presented at the output markets of the enterprise, e.g. production of cars in the case of an enterprise producing (and selling) cars for the market. The main activity is the most important one in quantitative terms, whereas secondary activities are less important in quantitative terms. Ancillary activities of an enterprise on the contrary are activities which are provided inside the enterprise in order to contribute to the activities on the market, e.g. marketing department, IT department and real estate management in the case of a car producing enterprise. Ancillary activities within the enterprise do not constitute KAUs but are to be attributed to the main and secondary activities. In the practical work of delineating enterprises and/or KAUs (delineation meaning the process of identifying units and describing them by characteristics which can be stored in the statistical business register), several circumstances occur which blur the picture of the units we can practically delineate compared to the units we are aiming for. One of these practical limitations in German profiling work is caused by the decision that – for the time being – the profilers, in their delineation of enterprises, shall not split legal units. This decision has been made for two reasons. Firstly, the wording of the definition of the enterprise (“combination of legal units”) suggests no splitting of legal units. Secondly, no survey data on split legal units is available from the current business surveys.
German SBS statisticians intend to produce enterprise figures by consolidating survey results obtained for legal units. The survey data do not contain splitting ratios of the variables collected to distinguish main and secondary activities of the legal units. These would be needed to combine split parts of legal units with other legal units to form complex enterprises and to describe these enterprises by economic variables. An enterprise group is the starting point of a profiling case. Within it, the profilers’ goal is to identify autonomous entities which, according to the delineation criteria of enterprises, have the characteristics of enterprises. Manual profilers work “top down”, meaning they start from the enterprise group and investigate whether there are autonomous entities (i.e. enterprises) within the enterprise group. When these have been identified,
in the next step the profilers assign the legal units of the enterprise group to the different enterprises. In a real case of a big enterprise group in Germany, profilers identified several such autonomous entities within the group and would have described these as enterprises. But in the next step of the profiling case (assigning the legal units of the enterprise group to the different enterprises) the profilers were confronted with the situation that the activities of one dominant legal unit within the group contributed to the activities of several of the enterprises – sometimes this legal unit even contributed the overwhelming part of an enterprise’s activities. That one dominant legal unit within the enterprise group (which consists of about 200 legal units) accounted for 90% of the overall turnover of the enterprise group. But as legal units are not to be split, the profilers have been left with three possible second-best options:
Option A. Postulate the dominant legal unit to form an enterprise of its own. Additionally, form several enterprises which contain the other 200 legal units of the group. This would result in a NACE structure of the enterprise population which is highly misleading since the activities of the dominant legal unit are quite heterogeneous but for the publication of statistical figures they would all be attributed to the main activity (in the actual case the main activity comprises only about 20% of the volume of the activities of the dominant legal unit).
Option B. Do as in Option A plus merge the dominant legal unit with the other legal units which are involved in the main enterprise activity of the dominant legal unit.
Option C. Interpret the whole enterprise group as one enterprise, thereby saving all the efforts of delineating different relatively small (but still complex) enterprises which are by far outweighed by the dominant legal unit.
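The distortion described under Option A can be made concrete with a small worked example. This is an illustrative sketch only: the NACE class codes and production volumes below are invented, the names `DOMINANT_UNIT`, `main_activity`, `published_breakdown` and `misattributed_share` are hypothetical, and the main activity is simply taken to be the largest class, which is a simplification of how main activity is determined in practice. Only the roughly 20% main-activity share mirrors the case in the text.

```python
# Hypothetical production volume of the dominant legal unit by NACE class.
# Codes and amounts are invented; the main activity holds 18 of 90 (20%).
DOMINANT_UNIT = {
    "29.10": 18.0,
    "45.11": 17.0,
    "64.91": 17.0,
    "77.11": 16.0,
    "33.12": 12.0,
    "62.01": 10.0,
}


def main_activity(volume_by_class):
    """Class with the largest volume (simplified main-activity rule)."""
    return max(volume_by_class, key=volume_by_class.get)


def published_breakdown(volume_by_class):
    """Breakdown when the unit is not split: everything goes to the main class."""
    return {main_activity(volume_by_class): sum(volume_by_class.values())}


def misattributed_share(volume_by_class):
    """Share of the unit's volume published under a class it does not belong to."""
    total = sum(volume_by_class.values())
    main_volume = volume_by_class[main_activity(volume_by_class)]
    return (total - main_volume) / total


print(published_breakdown(DOMINANT_UNIT))
print(f"misattributed: {misattributed_share(DOMINANT_UNIT):.0%}")
```

With these invented figures the whole volume of 90 is published under class 29.10, so 80% of the dominant unit's activity appears in a class where it did not take place. Because that unit accounts for most of the group, the same distortion carries through almost unchanged to the NACE structure of the enterprise population.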
These second-best solutions bring up (at least) three borderline issues between enterprise and kind-of-activity units in practical operations.
Issue 1: In each of the three second-best solutions, structural business statistics – based on enterprises – would be biased considerably, with each of the options A, B or C resulting in different figures. At the same time, under each option the statisticians’ work is perfectly in line with their agreed methodology.
Issue 2: Each of the second-best solutions for the enterprise delineation could be a good starting point for short term statistics (STS). In STS, according to the current European legislative plans, only the concept of the
KAU will be applied. Using the KAU concept provides the STS statisticians with a different conceptual framework compared to their SBS colleagues. In contrast to the wording of the enterprise definition, the wording of the definition of the KAU does not suggest refraining from splitting the dominant legal unit of our real-world example. Already today some of the surveys of STS collect data about homogeneous parts (with respect to NACE Rev.2 class) of the observation units (which in Germany are, in fact, legal units). Based on this, STS will possibly be in a position in the end to present more appropriate figures for enterprises – but will label them as KAU data. In the case of option A – which treats the dominant legal unit as an enterprise of its own – they will in fact subdivide this legal unit into “KAUs of legal units”, meaning units which are conceptually not included in the unit model of European statistics. As already described, conceptually the KAU is understood to be a part of an enterprise that is homogeneous with respect to the class (4-digit) level of NACE Rev.2. The consideration about the KAU in the unit debate of 2009-2015 was not about modifying the definition itself, but about making the application of the KAU definition more operational. This is laid down in an operational rule which introduces thresholds intended to limit the fragmentation of enterprises into KAUs. It suggests that in “practical implementation, the delineation of KAUs may be restricted to enterprises which because of their size (e.g. production value) have:
(i) a significant influence on the aggregated (national) data at NACE activity level, and
(ii) at the level of the individual enterprise, as guidance one secondary activity accounts for:
- more than 30% of its total production at the 4-digit (class) level of the valid NACE classification, or
- more than 20% of its total production at the 2-digit (division) level of the valid NACE classification” (Eurostat, 2015, Annex 2).
For all other enterprises the KAU is considered to be equal to the enterprise. The application of this approach to produce STS figures for KAUs would deal with the economically very important entities (which are also in the focus of profiling in order to delineate enterprises), only that delineation of enterprises would be substituted by building KAUs of legal units as proxies for enterprises. Should it turn out that among these KAUs of legal units there are many cases like the example mentioned, then STS figures (which in future will be based on KAUs) might turn out to be better
proxies for the true SBS figures than the SBS figures which are generated on the basis of inappropriately built enterprises.

Issue 3: Many enterprises will not be delineated manually, but by automated profiling algorithms. These algorithms work – contrary to the procedure applied in the manual profiling, which is “top down” – in the “bottom up” way: based on the legal units of an enterprise group the algorithm aims to select within an enterprise group the legal units which perform ancillary activities within enterprises. Then – regarding the non-ancillary legal units – the algorithm groups the legal units which perform the same main activity. Actually, this algorithm tends to produce “KAUs of the enterprise group” instead of enterprises, again units which are conceptually not included in the unit model of European statistics.

One line of research could be to investigate the impact which these issues have on short term statistics: do STS aggregates based on KAUs of legal units or of enterprise groups differ from STS aggregates based on legal units? The counterpart is legal units this time, since for smaller enterprises their KAUs are allowed to be approximated by legal units for STS purposes. But this would be rather an academic by-product. There is also the very relevant question: what is the bias we get if SBS aggregates are based on KAUs of legal units or of enterprise groups instead of enterprises?
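The “bottom up” procedure described under Issue 3 can be sketched in a few lines of Python. This is only an illustration of the two steps named in the text (setting aside ancillary legal units, then grouping the remaining legal units of a group by main activity); the `LegalUnit` fields, the `is_ancillary` flag and the example data are assumptions made for the sketch and do not reproduce any production profiling algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LegalUnit:
    unit_id: str
    group_id: str        # enterprise group the legal unit belongs to
    main_activity: str   # NACE class of the main activity
    is_ancillary: bool   # hypothetical flag: unit mainly serves other units of the group


def bottom_up_profile(units):
    """Set ancillary units aside, then cluster the rest by group and main activity."""
    clusters = defaultdict(list)   # (group_id, main_activity) -> unit ids
    ancillary = defaultdict(list)  # group_id -> ancillary unit ids
    for unit in units:
        if unit.is_ancillary:
            ancillary[unit.group_id].append(unit.unit_id)
        else:
            clusters[(unit.group_id, unit.main_activity)].append(unit.unit_id)
    return dict(clusters), dict(ancillary)


# Invented example: one enterprise group with four legal units.
units = [
    LegalUnit("L1", "G1", "29.10", False),
    LegalUnit("L2", "G1", "29.10", False),
    LegalUnit("L3", "G1", "46.90", False),
    LegalUnit("L4", "G1", "69.20", True),
]
clusters, ancillary = bottom_up_profile(units)
```

Every cluster produced this way is homogeneous in its main activity by construction, which is why the output resembles “KAUs of the enterprise group” rather than enterprises in the sense of the definition.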
3. Aiming for harmony or aiming for harmonization?

All EU member states have committed themselves to action plans to implement the enterprise definition for SBS in the coming years. Profiling is or will be executed in various ways in order to find out about the enterprises:

- the European manual “Intensive” Profiling: This means working in direct contact with the management of the enterprise groups and in working relations with profilers of all the Member States in which the enterprise group is active
- the European manual “Light” Profiling: This means using available information e.g. annual reports, internet, press etc. and being in working exchange with profilers of all the Member States in which the enterprise group is active
- the National manual “Intensive” Profiling: This means working in direct contact with the management of an enterprise group
- the National manual “Light” Profiling: This means using available information e.g. annual reports, internet, press etc.
- National Automatic procedures: This means using probabilistic assumptions, typical (e.g.) legal patterns and typical (economic) activities which are programmed in an automatic profiling algorithm
- National Schematic approaches: This means to regard enterprise groups (of a certain degree of simplicity or even all of them) as enterprises, or to regard legal units (with certain size limitations, but not all of them) as enterprises

These six ways to do the job all aim to find out about enterprises and all these enterprises will be units within the national perimeter (for the discussion about approaches to delineate global enterprises, so-called GENs, see e.g. Sturm (2015)). By the first two approaches Eurostat collects about 300 profiling cases in the whole of the EU, most of them providing information for one, some of them for two of the reference years from 2012 to 2017, on average perhaps 40-70 cases per reference year. Not all of the Member States are participating in this European Profiling (European Profiling meaning collaborative profiling according to the Eurostat methodology). The future of these two approaches is uncertain since they are very resource-intensive, relying considerably on Eurostat grants. The other four approaches are conducted stand-alone in the Member States of the EU. Their future is quite certain since Eurostat has made it clear that the application of the enterprise definition in SBS is not to be disputed any longer (Eurostat, 2015). To the author’s knowledge, many Member States combine some of the four approaches for sub-populations of their SBS strata. As the sameness of approaches between the Member States cannot be guaranteed, in the worst case SBS units will be received from 2 (EU) + 4 × 28 (MS) = 114 profiling approaches. The rationale behind Eurostat urging the Member States to apply the enterprise definition seems to be twofold. Firstly, only SBS figures that are based on the same statistical unit are comparable.
So far there exist only figures based on a mixture of legal units, of enterprises and of something – in the view of the author – similar to KAUs. Secondly, only SBS figures based on the appropriate statistical unit are relevant. So, the aim is to produce all over Europe SBS figures based on the same and the appropriate statistical unit, namely the enterprise, in order to have comparable and relevant SBS figures. In the light of the issues described in the preceding section and in the light of the multitude of profiling approaches described in this section, we have to realize that we are about
to switch from one situation, in which we use fuzzy real data for the production of SBS figures, to another state in the near future where we use different fuzzy data. Will this result in more consistency? Probably none of these states is close to the reality that we would like to measure. But we lack a reliable benchmark. In the coming years we will likely be provided with analyses of the amount of change in published statistics due to the change of concepts. But we should be very careful about interpreting the amount of change as an indication of the size of the improvement. In order to quantify the causes of the change in figures it would be helpful to have some insight into the extent to which the ingredients of the “2+4×28 Pandora’s box” will be applied for preparing SBS figures in the future. Questions to be addressed include: Which methods will be applied to which sub-sets of the EU-28-SBS-population? What is the contribution of the European Profiling-300-population? Besides profiling there is of course the potential multitude of approaches of SBS statisticians on how to produce figures for the variables about the enterprises. But this is beyond the business register job and therefore beyond the scope of this chapter.
4. Is the enterprise definition settled for good?

Legal units are used in the definition of the enterprise as a means to describe the enterprise: an enterprise is “the smallest combination of legal units that is an organizational unit (…)” (European Union, 1993). In the course of the unit debate of 2009-2015 it was argued that legal units should be dealt with differently to the current definition, in a proposed revised definition of the enterprise. Conceptually the current definition provides – with some imperfections in wording – a hierarchy: legal unit – enterprise – enterprise group. One or more legal units act as an enterprise and a multitude (more than one) of enterprises can form an enterprise group. In 2014/2015 a proposal for a modified definition was developed which proposed to do things differently. It has been proposed in working versions of the Task Force on Statistical Units that an enterprise may consist of one or more legal units but legal units may also be split and the parts of them may be apportioned to different enterprises. Additionally, the proposal offered to settle an ambiguity about the one-to-many relationship between enterprises and enterprise groups. Whereas linguistically the term enterprise group suggests that this entity comprises more than one enterprise, conceptually this was not the intention when the enterprise group was defined as a unit type. Conceptually the enterprise group comprises a set of legal units which are, by direct or indirect
control relations, controlled by a legal unit which holds the position of the so-called group head. The Task Force proposed to describe explicitly that an enterprise can correspond to either a single legal unit not controlled by any other legal unit, an enterprise group as a set of legal units under common control, or an autonomous part of an enterprise group. All in all, that proposal for a modified definition would have allowed enterprises which are built from parts of legal units, and it would have explicitly mentioned the case where a whole enterprise group is formed of one enterprise. Both situations singled out in that proposed definition of 2014/2015 can be seen as very important. This is for two reasons. Firstly, the two situations can be observed in reality, so they exist and should therefore be covered by the definition to make the definition cover the reality. Secondly, they may be very helpful for the practical delineation of enterprises by the method of top-down profiling. As already described, this approach uses enterprise groups as the starting point in order to identify the enterprises within the enterprise groups. This work can be done without the need to find out the relations between enterprises and legal units, so the change of the definition would be helpful for practical work. The second element of the proposal for a modified definition deals with this situation by the option that an enterprise may turn out to be an autonomous part of an enterprise group. This case offers that the relation of the enterprise to legal units may remain undetected if this knowledge is not required for practical reasons. In the meantime, empirical findings are being collected which prove the relevance of the two aspects which are dealt with in the 2014/2015 proposal. These findings are emerging from manual profiling cases. 
Such findings of course do not come from automatic profiling algorithms – since these algorithms produce by necessity only output which is consistent with the model which forms the basis of the algorithm. Manual profiling on the other hand, executed by human brains, tests the applicability of definitions when profilers detect situations in reality which conflict with the definition. By the end of 2017, empirical evidence from about 140 manual profiling cases in Germany became available. About 23 of these cases produce enterprises that are equal to the enterprise group. For a subset of 90 cases it has been examined whether splitting of legal units should be applied in order to delineate the enterprises appropriately. This has been so in 28 cases. In 20 of these a predominant legal unit has been detected. As described in section 2, this results in incorrect figures about the main and secondary activities of the enterprises, and therefore in incorrect figures of SBS by NACE if legal units are not split and attributed to different enterprises. In eight cases legal units have been detected that carry out
considerable ancillary activities for several enterprises. This results in incorrect cost structures for the SBS figures, since the ancillary activities will not be appropriately apportioned to “enterprises” as long as the ancillary legal units are not split and attributed to different enterprises. It will also result in incorrect SBS aggregate figures by NACE, since ancillary legal units which cannot be attributed to complex enterprises will be used as “enterprises” themselves.
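The case counts reported above can be turned into shares with a few lines of arithmetic. The sketch below uses only the figures quoted in the text (the 23 is given there as “about 23”); the variable names are chosen for illustration, and treating the 20 and 8 cases as a complete split of the 28 is an observation that the numbers add up, not something the text states.

```python
cases_total = 140             # manual profiling cases available by the end of 2017
equal_to_group = 23           # enterprise equal to the enterprise group ("about 23")
examined_for_splitting = 90   # subset examined for the need to split legal units
splitting_needed = 28         # cases where splitting was found appropriate
predominant_legal_unit = 20   # of these: a predominant legal unit was detected
shared_ancillary_units = 8    # of these: ancillary units serving several enterprises

share_equal_to_group = equal_to_group / cases_total
share_splitting = splitting_needed / examined_for_splitting
subcounts_add_up = predominant_legal_unit + shared_ancillary_units == splitting_needed

print(f"Enterprise equals group: {share_equal_to_group:.1%} of all cases")
print(f"Splitting needed: {share_splitting:.1%} of examined cases")
print(f"Sub-counts add up to the 28 cases: {subcounts_add_up}")
```

On these figures, roughly one case in six yields an enterprise equal to the enterprise group, and splitting of legal units was judged necessary in just under a third of the examined cases.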
5. Conclusions

Recent years have proved once more that dealing with statistical unit issues is time consuming, laborious and intellectually challenging. At the same time, they have proved that a sound understanding of statistical unit issues is essential for the production of comparable and relevant statistical figures. The three issues discussed in this chapter were intended to illustrate that we are at a promising stage to adjust and develop our conceptual framework of business statistics. Empirical work proves that the unit issues are not purely academic but of practical relevance – statistical figures will change. There remains a good deal to do to bring these issues to the statistical practitioners as well as to the users of statistics and to convince them that it is a change for the better. Aiming for harmonisation should clearly push our ambition more than aiming for harmony. Globalisation requires statisticians to work hard on the reliability and comparability of figures which still mainly result from very different national statistical systems. But the differences in our ways of working will become more and more obvious and more and more questioned if we statisticians cannot give evidence that we provide the best figures we can manage.
References

ENBES (2014). The unit problem in business statistics. https://statswiki.unece.org/display/ENBES/ENBES+Workshop%2C+2014+%28Geneva%29+-+The+Unit+Problem+in+Business+Statistics.
ESSnet ESBRs (2016). ESSnet on a European System of Interoperable Statistical Business Registers (ESBRS) – Phase 1, Work Package 3, Block 2: Stabilising profiling. https://ec.europa.eu/eurostat/cros/system/files/essnet-esbrs1_del-wp3-a1.1_methodological-report-profiling.pdf.
ESSnet Profiling (2014). Methodology of profiling. Report of the Work package B of the ESSnet on profiling large and Complex MNEs: Conceptual framework, methodology, rules and standards.
European Union (1993). Council Regulation (EEC) 696/93 of 15 March 1993 on the statistical units for the observation and analysis of the production system in the Community. Official Journal of the European Communities No L 76/1. 1993.
Eurostat (2014). The statistical units model (Version: 15 May 2014), version presented to the Business Statistics Directors Group Meeting 24 June 2014.
Eurostat (2015). Notice of intention of the Business Statistics Directors Groups and the Directors of Macroeconomic Statistics on the consistent implementation of Council Regulation (EC) No 696/93 on statistical units, Drafted by the Eurostat Task Force “Statistical Units”. Adopted by the ESS Directors of Business Statistics (BSDG) and Macroeconomic Statistics (DMES). June 2015.
Sturm, R. (2014). Revised definitions for statistical units – conception and practical aspects. Paper presented at the ENBES Workshop, Geneva, November 10, 2014.
Sturm, R. (2015). Revised definitions for statistical units – methodology, application and user needs. The main conceptual issues of the “units discussion” of the years 2009-2014. Statistika 95, 55-63.
Sturm, R. and Redecker, M. (2016). Das EU-Konzept des Unternehmens. Wirtschaft und Statistik 3, 57-71.
CHAPTER 4 HOW TO IMPROVE THE QUALITY OF THE STATISTICS BY COMBINING DIFFERENT STATISTICAL UNITS?
OLIVIER HAAG1
Abstract The French statistical business register system at the National Statistical Office of France contains statistical units of different types and the links between them. This complex system is needed to produce the most relevant business and economic statistics. In France, the key statistical units are the legal unit, enterprise and enterprise group. This chapter begins by highlighting the impact of the change from legal unit to enterprise as the statistical unit on the basis of which the French economy is statistically described. Then it briefly presents the different statistical unit types and shows their relevance for economic analyses, followed by an overview of the different French business registers, which store data about these different units and contain the links between them. Finally, it presents some methodological challenges arising from using an observation unit different to the statistical unit.
1. Introduction

Until recently, in France, business statistics have mainly been produced based on legal units as the statistical unit type. This was the case for all the steps in the production process, like sampling, data collection, data editing, estimation, and dissemination. But since 2010 a new approach has been in development to take into account the enterprise, the economic unit defined for statistical purposes (EEC, 1993) that in practice is defined through profiling, to produce the French structural business statistics. However, as in France the legal unit is the best observation unit, Institut National de la Statistique et des Études Économiques (INSEE), the National Statistical Office of France, has had to set up a new system that allows an observation unit different to the statistical unit. In section 2, using two examples, we sum up the reasons for this change. The shift from legal unit to an economic unit (group or enterprise) provides a far better view of the concentration of the French economy. In sections 3 and 4, we present the French framework in terms of statistical units and business registers. Finally, in sections 5 and 6 we present some methodological topics and challenges in the production of French structural business statistics stemming from the fact that the observation unit is different to the statistical unit.

1 Institut national de la statistique et des études économiques, 88 avenue Verdier, 92120 Montrouge, France. Email: [email protected].
2. Impact of the choice of statistical unit: two examples

2.1. Impact on the results by enterprise size

Prior to the changes initiated in 2010, INSEE used to equate the enterprise with the legal unit, and the whole system of business statistics was based on the legal unit. Equating the legal unit to the enterprise is not appropriate for complex business structures. If carried out, such equating poses at least two problems:

- An enterprise is an economic concept and a legal unit is a legal, administrative or fiscal concept. The legal unit corresponds well to an enterprise when it is not controlled by another legal unit. This is the case for 94% of the legal units. However, those uncontrolled legal units represent only 30% of business value added. For other legal units, equating them to an enterprise is not appropriate as it does not give the right picture of what is going on.
- Equating the legal unit with the enterprise gives a misleading description of the concentration of the French economy. Large groups with 10,000+ employees have on average 170 legal units in France. Reasoning with legal units leads to underestimation of the concentration of the French economy.

INSEE has therefore decided to move from a definition based on the legal unit towards a more appropriate statistical definition: the enterprise (EEC, 1993).
Improving quality of statistics by combining statistical units
This new concept is now used in the production of business statistics. Figure 4-1 compares two presentations of the French economy: one based on the legal unit (LeU) type and the other on the enterprise (ENT) unit type. Four enterprise categories are defined as follows:

- Micro-enterprises employ fewer than 10 people and neither their annual turnover nor balance sheet total exceeds €2 million: a little over 3,000,000 enterprises;
- SMEs (small and medium-sized enterprises) employ fewer than 250 people, and either their annual turnover is less than €50 million or their balance sheet total is less than €43 million: slightly over 135,000 enterprises;
- ETIs (intermediate-sized enterprises) employ fewer than 5,000 people, and either their annual turnover is less than €1.5 billion or their balance sheet total is less than €2 billion: about 5,000 enterprises;
- Large enterprises are those that do not fall into any of the previous categories: 245 enterprises.

As shown in Figure 4-1, the enterprise presentation results in more concentrated distributions of the economic indicators than the legal unit presentation. The weight of micro-enterprises and SMEs is reduced to the benefit of large enterprises. For instance, in the legal unit presentation the large enterprises’ share of exports is 22% whereas in the enterprise presentation it is 52%. This result is partly due to the fact that an enterprise group can be organized in small, specialized legal units. For example, one large enterprise group can create micro legal units to export its production, to register its fixed assets, and so on. In the legal unit presentation, the data of these legal units are counted among the micro-enterprises but in the enterprise presentation they are counted among the large enterprises.
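The four size categories can be written as a short classification function. The sketch below follows the thresholds quoted above; the function name and argument names are illustrative, and testing the categories from smallest to largest (so that each category excludes the smaller ones) is an assumption about how the definitions are meant to nest.

```python
def size_category(employees, turnover_eur, balance_sheet_eur):
    """Classify an enterprise by size, testing the categories from smallest to largest."""
    # Micro: fewer than 10 people, and neither turnover nor balance sheet above EUR 2 million
    if employees < 10 and turnover_eur <= 2e6 and balance_sheet_eur <= 2e6:
        return "Micro"
    # SME: fewer than 250 people, and turnover below EUR 50 million or balance sheet below EUR 43 million
    if employees < 250 and (turnover_eur < 50e6 or balance_sheet_eur < 43e6):
        return "SME"
    # ETI: fewer than 5,000 people, and turnover below EUR 1.5 billion or balance sheet below EUR 2 billion
    if employees < 5000 and (turnover_eur < 1.5e9 or balance_sheet_eur < 2e9):
        return "ETI"
    # Large: everything that falls into none of the previous categories
    return "Large"


# Invented examples
examples = [
    size_category(5, 1e6, 1e6),
    size_category(200, 60e6, 40e6),
    size_category(200, 60e6, 50e6),
    size_category(6000, 1e9, 1e9),
]
```

Applied to legal units and to enterprises in turn, a function of this kind gives the two presentations compared in Figure 4-1: the same thresholds, but different units being classified.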
2.2. Impact on the results by industry

Changing to the enterprise view has an impact on the breakdown of turnover by NACE code too. Table 4-1 shows, for 2014, the effect on NACE breakdown of whether the legal unit or enterprise is used, and if the enterprise then whether the accounts of legal units within the enterprise are consolidated or not.
[Figure 4-1: stacked bar chart, vertical axis from 0% to 100%. For each of six indicators (No. of units, No. of employees, Added value, Gross profit, Balance sheet, Export) a pair of columns, LeU and ENT, is broken down by unit size: Large, Intermediate, SME, Micro.]
Figure 4-1: Breakdown of some indicators of business demography and the performance of businesses in France by unit size in 2014, in two presentations: based on legal units (LeU), left in each pair of columns, and on enterprises (ENT), right in each pair of columns.
The first pair of columns in the table body is the turnover calculated per NACE code of the legal unit (LeU) and its percent contribution to the total, the second pair of columns is the corresponding statistics calculated at the NACE code of the enterprise (ENT), and the third pair of columns is the consolidated turnover and its percent contribution calculated at the NACE code of the profiled enterprise. As the table shows, a change from one unit type to another changes the distribution of the contributions of the sectors to total turnover. Large manufacturing groups comprise many affiliates and subsidiaries in their core businesses. However, they also often set up separate affiliates to perform sales/marketing or support functions, which in the legal unit view are classified in the service sector. In the enterprise view, the inclusion of these legal units in the enterprises’ dominant NACE leads, for instance, to an increase of about 5 percentage points in the manufacturing sector’s share of total turnover (from 29.9% to 34.7%). Thus, incorporating service-sector affiliates and subsidiaries increases the weight of manufacturing and construction. We believe that the enterprise view gives a more accurate view of the French economy that allows the French government to set up the best possible economic policy.
Table 4-1: Breakdown of turnover (in B€) by NACE code.
                                      Turnover by NACE code of:
Activity sector                          LeU      %     ENT      %   ENT consolidated      %
Manufacturing                            697   29.9     809   34.7                737   33.9
Construction                             110    4.7     113    4.8                108    5.0
Wholesale and retail trade              1122   48.1    1049   45.0                980   45.1
Information and telecommunication         99    4.2      94    4.0                 91    4.2
Financial and Insurance                   20    0.9       0    0.0                  0    0.0
Real estate                               36    1.5      36    1.5                 35    1.6
Professional, scientific, technical,
  administrative and support services    191    8.2     169    7.2                159    7.3
Public administration, education,
  human health and social work            27    1.2      29    1.2                 28    1.3
Other                                     31    1.3      34    1.5                 33    1.5
Total                                   2333    100    2333    100               2171    100
In Table 4-1, two types of effects can be observed:

- A reallocation effect, which arises because economic variables (e.g. turnover, exports, etc) are distributed by enterprises’ characteristics instead of legal units’ characteristics. If we take the example of a trade legal unit which belongs to a manufacturing group, in the legal unit view (column “LeU”) its turnover is included in the wholesale and retail trade sector but in the enterprise view (column “ENT”) its turnover is included in the manufacturing sector.
- A “pure” consolidation effect. The consolidated turnover is the aggregate of all turnover generated by a parent company and its majority-owned subsidiaries, after elimination of intercompany transactions. The intra-group flows have no economic interest because they do not take place in the market. That is why, for economic analysis, the consolidated turnover is more appropriate than the sum of the turnover of all the legal units of the group.

The differences between the first two pairs of columns show the “reallocation effect”. The difference between the last two pairs of columns shows the “pure” consolidation effect.

We have underlined in this first part that we need the enterprise unit type to produce the most relevant statistics. But one difficulty is that data are not systematically available for this new statistical unit. For example, we do not have economic variables at the enterprise group level. To address this issue, we need to consider the observation unit and the statistical unit differently. For the first time, we have to take into account two different types of units to produce structural business statistics. We need to know the links between these two types of units and to be able to create different frames. The statistical business register is the backbone of this new process as described in detail in the following sections.
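The two effects can be reproduced on a toy example. The sketch below is illustrative only: the two legal units, their turnover and the intra-group sales are invented, and the field names (`sector`, `enterprise`, `intra_group`) are assumptions rather than variables of any INSEE system.

```python
from collections import defaultdict

# Invented legal units belonging to one profiled enterprise.
legal_units = [
    {"id": "LeU1", "sector": "Manufacturing", "enterprise": "ENT1",
     "turnover": 100.0, "intra_group": 30.0},  # 30 is sold to LeU2 inside the enterprise
    {"id": "LeU2", "sector": "Wholesale and retail trade", "enterprise": "ENT1",
     "turnover": 40.0, "intra_group": 0.0},
]
# Dominant sector of each enterprise, assumed to be known from profiling.
enterprise_sector = {"ENT1": "Manufacturing"}


def turnover_by_sector(units, enterprise_sector):
    """Return turnover by sector in the LeU view, the ENT view and the consolidated ENT view."""
    leu_view = defaultdict(float)
    ent_view = defaultdict(float)
    ent_consolidated = defaultdict(float)
    for unit in units:
        sector_of_enterprise = enterprise_sector[unit["enterprise"]]
        leu_view[unit["sector"]] += unit["turnover"]
        # Reallocation: the same turnover, booked under the sector of the enterprise
        ent_view[sector_of_enterprise] += unit["turnover"]
        # Consolidation: intra-group flows are eliminated
        ent_consolidated[sector_of_enterprise] += unit["turnover"] - unit["intra_group"]
    return dict(leu_view), dict(ent_view), dict(ent_consolidated)


leu_view, ent_view, ent_consolidated = turnover_by_sector(legal_units, enterprise_sector)
```

In this toy case the reallocation effect moves 40 from trade to manufacturing while leaving the total at 140, and the consolidation effect then lowers the total to 110. Table 4-1 shows the same pattern: the LeU and ENT totals are identical (2333) and only the consolidated total is lower (2171).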
3. The French statistical unit legal framework

The French national legislative framework distinguishes three main unit types: the legal unit; the local unit; the enterprise group. The subsections that follow focus on each of these unit types and on a fourth statistical unit type: the enterprise. The enterprise has no administrative existence but is essential for producing statistics that reflect the economic reality as accurately as possible.
3.1. The legal unit

The legal unit is an administrative unit. In France, it is mandatory to be registered as a legal unit to perform an economic activity. A legal unit can be a legal or a natural person. This unit is the backbone of INSEE’s system of data collection, since these units are registered in a national business register called SIRENE (see below) and their identifiers are shared by all the French administrations (tax, employment, customs, etc). Moreover, the legal unit has to declare economic information to the tax administration (turnover, total assets, balance sheets, amount of investment, etc). Additionally, according to French law, the tax administration must make all this information available to INSEE in order to reduce the administrative burden on companies. Further, the French statistical law of 1951 obliges the legal unit to provide data for mandatory surveys. Therefore, the French structural business survey and short term statistics survey, for instance, are collected at the legal unit level. Until 2014, the legal unit was used for the production of business statistics. The legal unit was regarded as an enterprise for the structural statistics and as a kind of activity unit for the short-term statistics. But this situation is now changing with the introduction in INSEE of the enterprise as a unit type, and Subsection 3.4 below will explain why and how.
3.2. The local unit

According to the European regulation on statistical units, the local unit is an enterprise or part thereof situated in a geographically identified place (EEC, 1993). In France the local unit is simultaneously an administrative unit and a statistical unit. The legal framework for the local unit is the same as for the legal unit. The only difference lies in the type of data collected about these two different unit types. The local unit is used as the observation, production and dissemination unit for:

- employment statistics, because the French administration collects information at this level;
- statistical surveys about energy usage, waste generation, expenditure to protect the environment, and so on.
3.3. The enterprise group

According to the European regulation on statistical units, an enterprise group is an association of enterprises bound together by legal and/or financial links (EEC, 1993). In France the enterprise group is simultaneously an administrative unit and a statistical unit. But, in contrast to the two previous unit types, there is no obligation for the enterprise group to respond either to administrative requests or to statistical surveys. Only the largest groups have to draw up consolidated annual accounts and consolidated annual reports. But these data are consolidated at the world level and cannot be used to calculate national statistics.
That is why, for the moment, French statistics do not use the enterprise group as a production or dissemination unit. But the enterprise group is useful to create a new classification of legal units:

- Legal units that belong to a foreign multinational group (GET-MNE): enterprise groups having at least one subsidiary in France but whose global decision centre (GDC) is abroad (16,000 groups);
- Legal units that belong to a French multinational group (GFR-MNE): enterprise groups having at least one subsidiary in France and whose GDC is in France (about 5,000 groups);
- Legal units that belong to a French-only group (GFR-FRA): enterprise groups having all subsidiaries in France (about 65,500 groups);
- French legal units (IND-FR): legal units not belonging to a group and with their registered office in France (more than 3.7 million legal units).
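The classification above amounts to a small decision rule. The following sketch is one possible reading of it: the function and argument names are hypothetical, the population is assumed to be legal units with their registered office in France, and the condition that a GFR-MNE group has at least one subsidiary abroad is inferred from the contrast with GFR-FRA rather than stated in the text.

```python
def classify_legal_unit(in_group, gdc_country=None, has_foreign_subsidiary=False):
    """Classify a French legal unit by the type of enterprise group it belongs to.

    in_group: True if the legal unit belongs to an enterprise group
    gdc_country: country code of the group's global decision centre (GDC)
    has_foreign_subsidiary: True if the group has a subsidiary outside France (assumed criterion)
    """
    if not in_group:
        return "IND-FR"    # independent French legal unit
    if gdc_country != "FR":
        return "GET-MNE"   # foreign multinational group: GDC abroad
    if has_foreign_subsidiary:
        return "GFR-MNE"   # French multinational group: GDC in France
    return "GFR-FRA"       # French-only group: all subsidiaries in France


# Invented examples
labels = [
    classify_legal_unit(False),
    classify_legal_unit(True, "DE", True),
    classify_legal_unit(True, "FR", True),
    classify_legal_unit(True, "FR", False),
]
```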
3.4. The enterprise

According to the European regulation on statistical units, an enterprise is the smallest combination of legal units that is an organizational unit producing goods or services, which benefits from a certain degree of autonomy in decision-making, especially for the allocation of its current resources (EEC, 1993). An enterprise carries out one or more activities at one or more locations. An enterprise may be a sole legal unit. The enterprise is therefore a statistical unit and not an administrative one. In France, before 2012, INSEE used to equate the enterprise with the legal unit, and the whole system of business statistics relied on the legal unit. But equating the legal unit with the enterprise is not appropriate any more for complex businesses (e.g. groups’ affiliates and subsidiaries). Indeed, they lose their autonomy in decision-making and do not fulfil the European definition of an enterprise. INSEE has therefore decided to move from a definition based on the legal unit towards a more appropriate statistical definition. In order to define the French enterprises and obtain their consolidated accounts, we conduct what is called enterprise profiling. At INSEE, profiling is a method to analyse the legal, operational and accounting structure of an enterprise group at national and world level, in order to establish the statistical units within that group, their links, and the most efficient structures for the collection of statistical data. It concerns all the French parts of all types of groups (GET-MNE, GFR-MNE and GFR-FRA).
It relies on two distinct methods depending on the size and the complexity of the groups as described below (Beguin and Hecquet, 2014): The largest groups present in France, or the most complex ones (those that have a large number of subsidiaries and multiple activities) are profiled manually, through direct meetings between representatives of the groups and the members of a specialised division of INSEE. There are about 55 groups of this kind. Most of these groups have more than 10,000 employees in France. One difficulty of this exercise is to estimate the “certain degree of autonomy in decision-making” to identify different enterprises within the group. We use objective elements to define this autonomy (separate operational segments, different head offices within the enterprise group, etc). The profiler has to obtain the consolidated accounts for each enterprise in the group. The other groups are profiled automatically. There are approximately 80,000 such groups. In these cases we consider the whole group as a single enterprise. In this case, the consolidated accounts are systematically obtained by automatic algorithms (Chanteloup, 2017).
4. The French business register network
4.1 Four business registers
This section briefly presents the four French business registers and their links. For details, see Haag (2016). The French system is based on three main source business registers, each dealing with one type of unit, and a French statistical business register (called SIRUS), concatenating all the different information and serving as the sole basis for all statistical operations requiring a reference sampling frame. The three original source business registers are:
- SIRENE: the administrative register of legal units (LeU 1 to 10 in Figure 4-2) and local units. SIRENE contains more than 10 million active legal units and 12 million local units;
- LIFI: the statistical register of enterprise groups (EG1 and EG2 in Figure 4-2). LIFI contains about 125,000 enterprise groups;
- BCE: the statistical register of enterprises (Ent1 to 6 in Figure 4-2). The BCE contains about 90,000 profiled enterprises, each of which is composed of at least two legal units. These 90,000 enterprises represent about 310,000 legal units.
Figure 4-2: The different statistical units, their links and their business registers.
Figure 4-2 shows the different statistical units managed by the business registers and the links between them.
4.2 SIRENE
SIRENE is an inter-administrative business register created in 1973. It is an exhaustive business register of legal units enabling exchanges of information between different sectors of the state administrations. It contains a single identifier shared by all the registers of the French government (taxes, customs, central bank, etc.). This identifier enables INSEE to perform micro-data linking.
4.3 LIFI
The LIFI statistical business register identifies the enterprise groups and contains the links between the legal units within these groups (core and extended perimeter). The French statistical definition currently in force takes the absolute majority of voting rights as the control criterion for identifying the perimeter of groups.
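The control criterion can be illustrated with a minimal sketch. It assumes that a legal unit belongs to the group when a single unit already in the group directly holds more than 50% of its voting rights; holdings pooled across several group members, and the distinction between core and extended perimeter, are deliberately ignored. The function name, the data layout and the voting shares are hypothetical and are not taken from LIFI.

```python
from collections import deque

def group_perimeter(head, voting_rights, threshold=0.5):
    """Return the legal units controlled by `head`, directly or through a chain.

    voting_rights maps (owner, owned) -> share of voting rights held directly.
    Simplifying assumption: control requires one group member to hold
    strictly more than `threshold` of the voting rights on its own.
    """
    controlled = {head}
    queue = deque([head])
    while queue:
        owner = queue.popleft()
        for (src, dst), share in voting_rights.items():
            if src == owner and share > threshold and dst not in controlled:
                controlled.add(dst)
                queue.append(dst)
    return controlled

# Illustrative (invented) voting shares between legal units.
voting_links = {
    ("LeU1", "LeU2"): 0.80,
    ("LeU2", "LeU3"): 0.55,   # controlled indirectly through LeU2
    ("LeU1", "LeU4"): 0.30,   # minority stake: outside the perimeter
    ("LeU5", "LeU6"): 0.90,   # belongs to a different group
}
perimeter = group_perimeter("LeU1", voting_links)
print(sorted(perimeter))
```

A production register would also need to handle joint holdings and changes of control over time, which this sketch leaves out.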
4.4 The Enterprise Creation Database (BCE): the register of enterprises
The BCE statistical business register identifies enterprises and contains the links between the legal units within them. A French enterprise (the basis for French statistics) is therefore (see 3.3):
- either an independent (non-group) French legal unit;
- or the French footprint of a whole group of legal units;
- or the autonomous part of the French footprint of a group of legal units. In this case an enterprise group is divided into several enterprises. These autonomous parts are obtained by a manual profiling process. One difficulty of this exercise is to estimate the “certain degree of autonomy in decision-making” to identify different enterprises within the group. The profilers use objective elements to define this autonomy (separate financial report, separate executive board, etc.). But this delineation is still somewhat subjective and can be considered as a new component of the unit error. We do not explore it in this chapter.
The latter two types of enterprises mentioned above are more commonly referred to as “profiled enterprises” and are managed by BCE.
4.5 SIRUS
SIRUS stands for “system of identification in the business register of statistical units” and is a statistical business register of statistical units (enterprises, enterprise groups, local units). The main objectives of SIRUS are the following:
- To list groups and enterprises (in the statistical meanings of the terms), and the legal units and establishments that comprise these enterprises. SIRUS records the links between the various statistical units. For all these units, SIRUS also records characteristics that are useful for the creation of a frame, such as turnover, sector classification (NACE code) and salaried headcount, based on input from a multitude of sources (the other business registers, but also statistical surveys). If SIRUS needs other characteristics, there are web services between SIRUS and the other business registers that allow SIRUS to obtain them, but they are not stored in the SIRUS database.
- To provide business statisticians with reference populations. In this way, at a given moment in time and for a given reference period, an enterprise will be allocated to the same reference population in all surveys and will have the same characteristics in all of them.
- To provide new statistical information, in particular the classification of enterprises in four categories (micro-enterprises, small and medium-sized enterprises, intermediate-sized enterprises and large enterprises).
- To manage the statistical cessation of units. This makes it possible to distinguish between a unit that has an economic activity and a unit that is legally active but has no economic substance.
- To record the response burden of statistical surveys, meaning the time spent by enterprises filling in statistical survey questionnaires.
4.6 Their links
As illustrated in Figure 4-3, the main principles of the French business registers network are:
- SIRUS is the core of the system.
- SIRENE makes the links between the statistical and the administrative worlds.
- There is no direct link between the main source business registers. All the flows go from a register to SIRUS and from SIRUS to another register.
- SIRUS makes the links between the register and the statistical world. For instance, the entire frame is constituted by SIRUS.
- SIRUS is always up to date for the creation of legal units because it is updated daily by SIRENE.
Figure 4-3: The French business register network
5. Combining different statistical units
As previously mentioned, the legal unit is the best observation unit but cannot always be considered as the best statistical unit. Conversely, the profiled enterprise is the best statistical unit but cannot be used directly as an observation unit. Indeed, the enterprise groups are under no legal obligation to respond to statistical surveys or to provide consolidated accounts to the tax administration. That is why, for a business survey, the survey designer must now ask two questions:
- What is the statistical unit of my survey?
- What is the best observation unit for my survey?
This part explains how the statistical business register makes it possible to link these two different units, and what new issues the statistician will face.
5.1 A top down approach for delineation of enterprises
The French profiling process used to identify enterprises within an enterprise group is based on a top down approach:
- In manual profiling, the profiler asks the group to define the enterprise perimeter of legal units within the group.
- In automatic profiling, the entire group is considered as an enterprise.
In these two cases the enterprise group is the starting point to delineate the enterprise, explaining why we can talk about a top down approach.
5.2 A bottom up approach for consolidation of the data collected
We distinguish two types of profiling methods for the consolidation of accounts as follows:
- In manual profiling, there are different types of consolidation:
  - A bottom up approach. In this case the profiler obtains the internal flow between the legal units of an enterprise.
  - A top down approach. In this case, the profiler obtains the enterprise accounts directly from the group.
  - A mixed method. The profiler obtains the value of the non-additive variables (like turnover, purchases, equity, debts and receivables, dividends …) from the group. This information generally comes from the International Financial Reporting Standards concept. The values of additive variables (like value added, employment, fixed assets etc.) are calculated by adding the legal units’ data.
- In automatic profiling, a bottom up approach is used. The intra-flow for the additive variables is calculated by an algorithm (Béguin, 2013).
In most cases, as the enterprise group cannot be directly used to obtain, for instance, tax information, the legal unit is retained as the observation unit. To obtain the data at the enterprise level it is therefore necessary to consolidate the legal units’ data, which is why we can talk about a bottom up approach.
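The bottom up consolidation can be sketched as follows. This is only an illustration under stated assumptions: additive variables are summed over the legal units, and turnover (a non-additive variable) is obtained by removing sales between legal units of the same enterprise from the gross sum. The function name, the data layout and all figures are hypothetical; the sketch does not reproduce the INSEE algorithms cited above.

```python
def consolidate(legal_units, intra_flows, additive=("value_added", "employment")):
    """Consolidate legal-unit data to the enterprise level (bottom up).

    legal_units: {unit_id: {"turnover": ..., "value_added": ..., "employment": ...}}
    intra_flows: {(seller_id, buyer_id): amount sold inside the enterprise}
    """
    totals = {}
    # Additive variables: the enterprise value is the plain sum.
    for var in additive:
        totals[var] = sum(unit[var] for unit in legal_units.values())
    # Turnover is non-additive: internal sales must not be counted.
    gross_turnover = sum(unit["turnover"] for unit in legal_units.values())
    internal = sum(
        amount
        for (seller, buyer), amount in intra_flows.items()
        if seller in legal_units and buyer in legal_units
    )
    totals["turnover"] = gross_turnover - internal
    return totals

# Invented example: legal unit A sells 30 to legal unit B of the same enterprise.
units = {
    "A": {"turnover": 100, "value_added": 40, "employment": 10},
    "B": {"turnover": 60, "value_added": 25, "employment": 5},
}
flows = {("A", "B"): 30}
enterprise = consolidate(units, flows)
print(enterprise)
```

In this toy case the consolidated turnover is 130 rather than the gross 160, while value added and employment are simple sums.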
6. The impact on the frames and the survey process
6.1 Two frames for one survey
Since the statistical units are now different from the observation units for the profiled enterprises, the survey design can be seen as a two-stage cluster sampling (Gros and Le Gleut, 2018, Chapter 8 in this volume). As a cluster, an enterprise is randomly selected and then all the legal units within this enterprise are included in the sample. But the cost constraint is still based on the number of legal units surveyed, which is now random. For this new methodology of the survey sampling design two frames are necessary:
- one frame composed of enterprises, which allows the sample to be selected;
- one frame composed of legal units and the links between the legal units and the enterprises, which allows the specification of legal units that have to receive a questionnaire.
These two frames are now delivered by SIRUS.
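The use of the two frames can be sketched in a few lines. The sketch assumes simple random sampling of enterprises without replacement, which is a simplification chosen for illustration and not the design described by Gros and Le Gleut; the frame contents and names are invented.

```python
import random

def select_sample(enterprise_frame, links, n, seed=None):
    """Select n enterprises, then take every legal unit linked to them.

    enterprise_frame: iterable of enterprise identifiers (first frame)
    links: {enterprise_id: [legal_unit_id, ...]} (second frame)
    """
    rng = random.Random(seed)
    sampled_enterprises = rng.sample(sorted(enterprise_frame), n)
    # Every legal unit of a selected enterprise receives a questionnaire.
    legal_units = [leu for ent in sampled_enterprises for leu in links[ent]]
    return sampled_enterprises, legal_units

# Invented frames: four enterprises made up of seven legal units.
frame_links = {
    "Ent1": ["LeU1", "LeU2", "LeU3"],
    "Ent2": ["LeU4"],
    "Ent3": ["LeU5", "LeU6"],
    "Ent4": ["LeU7"],
}
sampled, to_survey = select_sample(frame_links.keys(), frame_links, n=2, seed=1)
print(sampled, len(to_survey))
```

The number of enterprises is fixed at two, but the number of questionnaires ranges from two to five depending on which enterprises are drawn, which is the random cost mentioned above.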
6.2 A new method of data editing
As the unit to be verified in the data editing process is different from the observation unit, it is necessary to reconsider the data editing process of INSEE surveys (Fizzala, 2018, Chapter 15 in this volume). Indeed, the verification process checks the quality of the enterprise data. In the case of inconsistency, the clerks have to check the legal unit data and if necessary contact the legal unit to edit an apparently suspicious response. This new process makes the clerk’s job more complex, and new tools need to be developed in order to make it easier.
6.3 A new source of error
The automatic profiling is based on algorithms whose goal is to estimate the intra-enterprise flows. These flows should not be included in the produced statistics because they have no economic meaning (the fees charged for these exchanges within the group are not at market prices). But these algorithms are based on hypotheses, which could be wrong in some cases. Besides, it is sometimes difficult for the clerks to obtain good values for these flows because the legal units that they call may:
- not know the full extent of the group;
- not know the relations between the other legal units of the group;
- answer for an extent within the group different from the one in SIRUS, and so on.
Thus, even if we obtain a value of intra-enterprise flow from a legal unit, it can be wrong and so generate a new source of error. But this error is less serious than the one made by completely disregarding internal flows.
7. Conclusion
With the globalisation of economies and the increasing importance of the enterprise group in the French economy, it is not possible today to use only the legal units to produce all the business statistics. We have to take into account the enterprise to produce consistent and relevant business statistics. But given the importance of reducing the statistical burden, it is crucial to gather administrative data, for instance via micro-data linking. This information is abundant for the legal unit but not for the enterprise group. We therefore have to consider two different types of statistical units for conducting a survey: the legal unit as the observation unit, and the enterprise as the statistical unit. In these circumstances, the role of the statistical business register is crucial, since it is the only way to link these different kinds of units and allow them to be followed up over time. For INSEE, even though the business register already enables this new approach, efforts will still have to be made to further improve the quality of the statistics produced. For instance, INSEE is going to continue the work on profiling by:
- increasing the number of groups that are manually profiled;
- increasing the quality of the consolidation algorithm;
- setting up a new survey to obtain the intra-flows of the large groups that cannot be manually profiled owing to a lack of resources.
References
Béguin, J-M. (2013). Calculation of the main SBS characteristics for the enterprises equal to the enterprise groups. In Report on a method to automatically compute SBS on enterprises defined as an enterprise group in its whole. Available at https://ec.europa.eu/eurostat/cros/system/files/Deliverable%202_Lot1_Task2.pdf (accessed 2018-06-26).
Béguin, J-M. and Hecquet, V. (2014). Profiling in France: Implementation and results. 24th Meeting of the Wiesbaden Group on Business Registers, Vienna. Available at http://www.statistik.at/wcm/idc/idcplg?IdcService=GET_PDF_FILE&dDocName=077950 (accessed 2018-07-03).
Chanteloup, G. (2017). The calculation of the automatically profiled enterprises characteristics profiling: a new and better way to apprehend the globalization. Group of experts on Business Registers, Paris. Available at https://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.42/2017/France_ENG.pdf (accessed 2018-07-03).
EEC (1993). Council Regulation (EEC) No 696/93 of 15 March 1993 on the statistical units for the observation and analysis of the production system in the Community. Available at https://publications.europa.eu/en/publication-detail/-/publication/1ea18a1a-95c2-4922-935c-116d8694cc40/language-en (accessed 2018-07-03).
Fizzala, A. (2018). Adaptations of winsorization caused by profiling. In Lorenc, B., Smith, P.A., Bavdaž, M., Haraldsen, G., Nedyalkova, D., Zhang, L.-C. and Zimmermann, T. (2018) (eds.). The unit problem and other current topics in business survey methodology, pp 213-227. Newcastle upon Tyne: Cambridge Scholars.
Gros, E., and Le Gleut, R. (2018). The impact of profiling on sampling. In Lorenc, B., Smith, P.A., Bavdaž, M., Haraldsen, G., Nedyalkova, D., Zhang, L.-C. and Zimmermann, T. (2018) (eds.). The unit problem and other current topics in business survey methodology, pp 91-105. Newcastle upon Tyne: Cambridge Scholars.
Haag, O. (2016). Profiling: a new and better way to apprehend the globalization. Q2016, Madrid. Available at http://www.ine.es/q2016/docs/q2016Final00033.pdf (accessed 2018-07-03).
CHAPTER 5
ISSUES WHEN INTEGRATING DATA SETS WITH DIFFERENT UNIT TYPES
ARNOUT VAN DELDEN1
Abstract
Output of official statistics is increasingly based on integration of different data sources. The present chapter provides an overview of the kind of issues that occur when one tries to compile new publications, in a reasonably short period of time, that concern integrating data sets with different unit types or with unit types that are of a composite nature. We conducted an inventory of possible issues by studying twelve different cases where data of different unit types have been integrated. We also used experiences from previous studies. We structured the issues in a linkage and data integration framework with six state types for both the units and the variables. Based on the inventory and the framework, we identified 22 different issues. Nine of those are related to the use of different unit types. We believe that the list of identified issues will aid other national statistical institutes in anticipating the problems they may be confronted with when integrating different data sets.
1. Introduction
Traditionally, the output of National Statistical Institutes (NSIs) consists of a set of pre-defined, fixed output tables, using a well-regulated production process often based on a single input source. There are two on-going changes to this traditional way of producing output. First, statistical output is increasingly based on multiple input sources, or at least NSIs are moving in that direction (Citro, 2014). Second, NSIs are
1 Department of process development and methodology, Statistics Netherlands, 2490 HA, The Hague, The Netherlands. Email: [email protected].
producing more tailor-made output, based on requests from governmental organisations such as municipalities and ministries. Both changes require that different data sources can easily be linked, and that output can be constructed in a short period of time.
As a first step to enable such changes at Statistics Netherlands (SN), a system of social statistical data sets (SSD) has been developed since 1996 (Bakker et al., 2014). This system facilitates linkage of persons, households, buildings and organisations. In order to link data sets easily, and for confidentiality reasons, anonymised identification numbers are assigned. Extending such an approach to economic data is more complicated, since unit types in economic data are very diverse. Since 1978, SN has developed a general business register (GBR) integrating various administrative data sources. At first, data sources from the chambers of commerce and trade organisations were used. Since 1994, data on relations between legal and tax units, maintained by the tax authority, has also been an important input source. From 2003 to 2006, SN attempted to develop an Economical Statistical Database (Heerschap and Willenborg, 2006; Hoogland and Verburg, 2006), but did not succeed because (a) few administrative data sets were available at the time and (b) it was difficult to combine data sets with different unit types at a micro-level. In 2016 a renewed project was started in which SN aims to combine economic data sources with each other and with social data. It is worth noting that the distinction between economic and social data is no longer clear-cut. For instance, the GBR contains more than 1 million sole proprietor businesses and they can also be found within the SSD as persons.
In the Netherlands, an interrelated system of base registers has been developed by the government (European Commission, 2015, pp. 27–30; van der Valk and Spooner, 2014) that is very helpful for SN to link different kinds of data sources. Still, SN would also like to combine data that is not part of this base register system. Recently, different kinds of data sets have been combined at SN, giving insight into a number of problems during this activity. The present chapter aims to give an overview of the kind of issues that might occur when integrating data sets of different unit types.
The remainder of this chapter is organized as follows. Section 2 introduces twelve case studies that were analysed for issues when integrating sources. To structure those issues, we first present a framework for linkage and data integration in section 3. In section 4 we describe the issues. Next, in section 5, the framework is discussed. Finally, section 6 gives concluding remarks.
2. Case studies
Twelve case studies were selected in which data sets with different unit types have been combined. A description of those case studies can be found in Van Delden (2017). The case studies varied in the unit types that were involved in the different data sets. We distinguished four groups, namely data sets combining (see Table 5-1):
- administrative and statistical business unit types;
- administrative, statistical business unit types and natural persons;
- administrative, statistical business unit types and web sites; and
- different regional unit types.
In addition to these four groups, there were two case studies with only data on natural persons. Those two case studies were added because they involved data sets without unique linkage keys.
The case studies also varied in the statistical perspective of the data integration. Most of the case studies were output-driven. By output-driven we mean that one starts by defining an intended statistical output, in terms of the definition of the population and the variables. Subsequently, one tries to produce that output from available data sources and if that is not possible, one might conduct a survey to collect data for that purpose. Some of the case studies, however, were more data-driven. By data-driven we mean that one starts by combining available sources and then explores what kind of possibly relevant output can be produced with that data. There are also situations with an intermediate perspective, for instance when the targeted population is given (by an external party) and one aims to provide economic indicators of performance of those units with already available data sets.
For each of the twelve case studies we interviewed one or more informants. We did not use a formal interview guide, but in each of the interviews the following topics were discussed:
- What is the intended output and how is it produced?
- Which unit types are involved?
- What are the difficulties that arise during the linkage process?
- Is there any methodology or tool that is missing at the moment which could support the process?
- Is there some written documentation about the methodology used?
Table 5-1. Overview of the case studies

Case study | Unit types involved | Starting point

Admin and statistical business unit types
Global Value Chain | Enterprise Group, Enterprise, Legal Person | Data-driven
Bankruptcies | Enterprise, Legal Person | Output-driven

Admin and statistical business unit types & natural persons
Administrative Unit Base | Natural Person, Administrative unit types, Statistical Business Unit types | Data-driven
Agricultural Census | Natural Person, Base Tax Unit, Enterprise, Reporting Unit | Output-driven
Family Enterprises | Natural Person, Website, Enterprise Group | Output-driven
Self-employed | Natural Person, Legal Person, Enterprise | Data-driven

Admin & statistical business unit types & web sites
Centre of Policy studies | Multiple studies, e.g. Enterprise, Website; Enterprise, Units of trade organisations | Output-driven
Top Sectors | Enterprise, Units of trade organisations, Website | Output-driven
Trade Organisations | Enterprise, Unit of trade organisations, Website | Output- / data-driven

Different regional unit types
Energy Consumption | Building Unit, Address | Output-driven

Natural persons
Usual Residence | Natural Person | Output-driven
System of Social Statistical Databases | Natural Person | Output- / data-driven
After each interview, we wrote a report that was checked for correctness by the informants. Those reports can be found as the appendices in Van Delden (2017). Based on the full set of reports we analysed which kind of issues occurred, which of those issues were related to differences in unit types and to what extent new methodology is needed to cope with those issues. We limit ourselves in the current paper to the first two points.
3. Framework
To structure the findings, we developed a framework to ensure a rather comprehensive overview of the kinds of issues that might occur. We refer to this framework as the linkage and data integration (LDI) framework. Moreover, the LDI framework offers the opportunity to verify whether there are issues that are likely to occur but that were not mentioned by the informants. We refer to the latter as hidden issues.
The starting point for the LDI framework was the total error framework in the case of combining administrative and survey data, originally designed by Bakker (2011) and later extended by Zhang (2012). This total error framework consists of two phases: the first phase concerns the processing of a single data source and the second phase concerns the integration of different data sources. We used the second phase as our starting point. However, we have adapted this second phase for several reasons. The main reason was that the LDI framework should capture the causes of the possible linkage errors. In Zhang’s framework the second phase moves directly from ‘target concept’ to ‘harmonised measure’ whereas we needed more detail in that part. Further, we wanted an explicit distinction between data and concepts. To that end, we used the business architecture model Sherwood Applied Business Security Architecture (SABSA), which mainly gives an IT perspective (TOGAF-SABSA, 2011).
We have derived the LDI framework with two kinds of perspectives in mind. Firstly, similarly to the survey life-cycle model in Groves et al. (2004) and subsequently in Zhang (2012), we used an output-driven perspective, thus with an intended output in mind. This coincides with the perspective of most of our case studies. Given the output that one aims to produce, one may have some potentially relevant data sources to base the estimates upon. At SN there is a Data Service Centre (DSC) in which many data sets with their corresponding metadata are stored (Struijs et al., 2013).
Secondly, we have focused on the situation where one integrates sources with different unit types. In section 5 we discuss how the LDI
framework can be used in case of a data-driven perspective and when the unit types in the sources do not differ. Within the LDI framework, we distinguish six state types (see Figure 5-1), that apply to both the units and the variables going from separate sources to the statistical output from the integrated set. We will first briefly mention the different state types. Thereafter we apply these state types to the units and variables involved in the integration of different sources.
Figure 5-1. Linkage and data integration framework
The first state type concerns the concepts of units and variables, which is referred to as the conceptual layer in the business architecture model. The terms referring to the business architecture model are shown on the left side of Figure 5-1, marked by the curly brackets. The second state type concerns the operationalisation of the concepts, that is, the rules to make the concepts operational in practice; the actual realisation is not included. The operationalisation fits into the logical layer (or information layer) of the business architecture model, since it describes the information that is held in the source (not the data themselves). The third state type describes the actual realisations in terms of units and data values. Here one arrives at the physical layer in terms of the business architecture. The same holds for the remaining state types. The fourth state type concerns the relations between the units in the different data sets: which units are identical and should be linked? It also concerns logical relations that may exist between the variables within and across the data sets. An example of a logical relation is that a certain variable should be larger than or equal to the sum of a set of other variables. The fifth type concerns the derivation from actual source units and values to their statistical counterparts. The sixth state type concerns the estimation resulting in the output. Within each of the first five state types we distinguish between two states: one state refers to the representation side, with units and populations, and the other one to the measurement side with variables and their values. The LDI framework also has a time dimension, which stands for the states at different moments in time. The vertical arrows in Figure 5-1 show the order in which the states are passed starting from the conceptual state type and ending with the output. 
The representation side and the measurement side can partly be passed in parallel, but there are two clear dependencies which are marked with the horizontal arrows. The state “related measures” refers to variables in different linked data sets, which implies that the data needs to be linked first. Likewise, “derived measures” requires that the statistical target units have been derived. In practice, the actual processing flow may differ for specific applications, that is, states may be combined or states may be re-used from other production processes. If needed, one can draw a tailor-made flow-chart for a specific situation. In the current paper however, we restrict ourselves to a generic framework that can be applied in different specific situations.
On the representation side the following states are passed:
- Conceptual sets. One starts by defining the target population and variables that one is interested in. Next, one can look into the metadata of potential sources at the statistical office or outside their office. When data and metadata are stored separately (Struijs et al., 2013), this can be done before having access to the actual data. An important question is to what extent the source and target definitions coincide with each other. For instance, source unit types may have a one-to-one, one-to-many, many-to-one or many-to-many relationship with the target unit type.
- Operational sets. This concerns the practical rules to derive units, in line with the concepts, in the actual data. For instance, a unit may be part of an administrative data source when its size passes a certain threshold. Notice that these practical rules can be studied without having access to the actual data. Data owners for instance can say how concepts have been made operational in their source.
- Obtained sets. There may be differences between the obtained units in a source after applying the operational rules to the actual data and those obtained from error-free data, for instance because errors in the identification variables may occur in the actual data.
- Linked set. After linkage of the units in different data sources one obtains the linked set. The preferred linkage method depends on the outcomes of the previous states. For instance, one will use a one-to-one linkage method only when one expects that both data sets concern the same unit types and have an overlapping population.
- Derived set. In this state, the statistical units and the corresponding population are derived. Often this set is obtained in a GBR. For this state, it is crucial that the relations between the units are known.
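The four relationship types mentioned under the conceptual sets can be detected mechanically once a table of links between source and target units is available. The following minimal sketch assumes such a link table exists as a list of (source unit, target unit) pairs; the function name and the example identifiers are hypothetical.

```python
from collections import defaultdict

def relationship_type(links):
    """Classify the relationship between source units and target units.

    links: iterable of (source_unit, target_unit) pairs.
    Returns 'one-to-one', 'one-to-many', 'many-to-one' or 'many-to-many',
    read as <source side>-to-<target side>.
    """
    targets_per_source = defaultdict(set)
    sources_per_target = defaultdict(set)
    for source, target in links:
        targets_per_source[source].add(target)
        sources_per_target[target].add(source)
    # 'many' on the source side: some target unit is fed by several source units.
    source_side = "many" if any(len(s) > 1 for s in sources_per_target.values()) else "one"
    # 'many' on the target side: some source unit maps to several target units.
    target_side = "many" if any(len(t) > 1 for t in targets_per_source.values()) else "one"
    return f"{source_side}-to-{target_side}"

# Invented example: two tax units belong to the same enterprise.
tax_to_enterprise = [("T1", "E1"), ("T2", "E1"), ("T3", "E2")]
print(relationship_type(tax_to_enterprise))  # many-to-one
```

Such a check could inform the choice of linkage method in the linked set state, although in practice the link table itself may contain errors.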
Having obtained the derived sets, the variables in the linked set need to be integrated. On the measurement side the following states are passed:
- Conceptual measures. One compares the concepts of the variables with those of the intended output. As with the conceptual sets, this can be done on the basis of the metadata of the relevant sources. An important question is how the concepts of the source variables are related to the target variables. Examples of such relations are: a) they are identical, b) the source variables are a proxy of the target variable with considerable random and systematic measurement errors, c) the target variable can be obtained from the source variable with ad-hoc derivation rules, d) the target variable can be estimated from the source variable by statistical methods. Further, even when the definitions of source and target variable are the same, adjustments to the measures may be needed when the source unit type for which the variable is measured differs from the target unit type (see the first two issues on the measurement side, section 4).
- Operational measures. This concerns the operationalisation of the variables. For instance, certain activities may be exempted from VAT obligations. This operational information may be obtained from the data owners.
- Obtained measures. The actual values may differ from the true ones. For instance, sales of economic activities that are subject to a zero per cent tax tariff may be underreported by companies to the tax office.
- Related measures. One needs to consider relations between the variables in different linked data sets. Variables may have a logical relationship with each other or the same variables may be observed in different linked data sets. Note that while the previous three states can be passed parallel to the same state types on the representation side, the state ‘related measures’ requires that the units in the data have been linked together at the level of the source units.
- Derived measures. In the previous state, the measures were still at the level of the source units. In the current state one has derived the measures at the level of the statistical units, that is, those units that need to be used in order to produce the required statistical output. This state presupposes that the statistical units are derived.
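A logical relation of the kind mentioned for the related measures, where one variable should be larger than or equal to the sum of a set of other variables, can be checked on linked records as in the following sketch. The variable names and values are hypothetical and serve only to illustrate the rule.

```python
def violates_sum_rule(record, total, components, tolerance=0.0):
    """Return True when record[total] is smaller than the sum of the components.

    A small tolerance can absorb rounding differences between sources.
    """
    return record[total] + tolerance < sum(record[c] for c in components)

# Invented linked records: total turnover from one source,
# domestic and export sales from another.
linked_records = [
    {"unit": "E1", "total_turnover": 500, "domestic_sales": 300, "export_sales": 150},
    {"unit": "E2", "total_turnover": 200, "domestic_sales": 180, "export_sales": 60},
]
suspicious = [
    r["unit"]
    for r in linked_records
    if violates_sum_rule(r, "total_turnover", ["domestic_sales", "export_sales"])
]
print(suspicious)  # ['E2']
```

A violation does not indicate which source is wrong; it may equally point to a linkage error on the representation side.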
Finally, output is generated based on the integrated micro data, requiring an estimation method. Notice that the LDI framework does not account for a separate data analysis/editing state type (for instance leading to the states ‘Corrected set / Corrected measures’). In the presented LDI framework, each process of obtaining a state may include detection and correction of errors.
4. Issues
Based on the responses of the informants, we found issues in ten of the eleven states of the LDI framework, the exception being the state ‘derived set’. These issues are given in Table 5-2. That triggered the question: are there really no issues in the derived set? To answer that question, we studied the interview reports of the case studies for implicit decisions that were taken during the process of producing those statistics and we mapped those decisions onto the LDI framework. In this way we found a number of hidden issues. To explain better how we traced those hidden issues, we give an example for a case study on energy use of buildings. Those buildings are classified by type of service (school, hospital, NACE sectors). Administrative data on this energy use are per address (of a gas or electricity meter) and are linked to address information in a real-estate register. The informant explained that sometimes an address of an energy meter links to the address of an office block with multiple establishments, where these establishments may have different economic activities. These economic activities are classified according to a NACE Rev.2 code. The total energy use of this energy meter was attributed to the NACE code of the largest establishment within the office block (section 6.10 in Van Delden, 2017). The informant explained that this solution did not affect the results at an aggregated level. In economic statistics it is so common to assign a ‘main activity’ to a statistical object that the informant no longer experienced this choice as an issue. The hidden issue, however, is that when the categories of a classification variable are available at base unit level (establishments), whereas one applies this classification to a composite unit (the office block), this gives a conflict at the state type ‘concepts’. In fact, other solutions are also possible: for instance, one could try to model a relationship between energy use and NACE code, and use that to divide the energy use of that office block over multiple NACE codes. So, one has made an implicit pragmatic choice for a certain solution rather than seeking (new) methodology to handle this issue.
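The two treatments of the office block can be contrasted in a small sketch. It is illustrative only: the establishment records, the `size` field and the proportional split are assumptions, and the split by size is a simple stand-in for the modelled relationship between energy use and NACE code mentioned above, not the method used in the case study.

```python
from collections import defaultdict

def attribute_to_largest(energy_use, establishments):
    """Assign the whole meter reading to the NACE code of the largest establishment."""
    largest = max(establishments, key=lambda e: e["size"])
    return {largest["nace"]: energy_use}

def attribute_proportionally(energy_use, establishments):
    """Split the meter reading over NACE codes in proportion to establishment size
    (hypothetical stand-in for a modelled energy-use relationship)."""
    total_size = sum(e["size"] for e in establishments)
    shares = defaultdict(float)
    for e in establishments:
        shares[e["nace"]] += energy_use * e["size"] / total_size
    return dict(shares)

# Hypothetical office block with one meter and three establishments.
office_block = [
    {"nace": "69", "size": 60},
    {"nace": "62", "size": 30},
    {"nace": "56", "size": 10},
]

by_largest = attribute_to_largest(1000.0, office_block)
by_share = attribute_proportionally(1000.0, office_block)
```

Both functions preserve the meter total, which is why the choice may be invisible at an aggregated level while still changing the breakdown by NACE code.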
4.1 Issues on the representation side When an NSI uses a set of units from a third party, there may be a discrepancy regarding the targeted population on the conceptual level. SN obtains a list of units receiving an agricultural survey from a third party, that is a mixture of units with and without a market orientation. For the purposes of (international) comparability, SN would like to limit the target population to the market-oriented units (present in our GBR) whereas the third party wishes to include the full set. Further, there may be a discrepancy regarding the identity of units in the data. For instance, SN receives a membership list of units active in catering and finds units with a distribution store on that list. Should the output that is produced for that unit concern the whole distribution store, or only the part active in catering?
A third issue is the discrepancy between identification variables and unit type. For instance, SN collects agricultural survey data and the sampled units provide contact information. Part of the reported values of the identification variables do not link to an agricultural enterprise but to an overarching holding. Another part links to persons rather than to businesses; this occurs when a respondent provides a personal address rather than a business address. A fourth issue is that when one links data sets based on non-unique identifiers it will often happen that not all the identification variables agree. The question then arises: what is the probability of a (true) match? For instance, one collects identification variables (name, phone number) from business websites that one tries to link to enterprises of a GBR. One should then use a linkage methodology that accounts for the composite nature of the GBR and allows for many-to-many relationships. Finally, when deriving statistical units from administrative units, one needs information on ownership relations between those administrative units. Missing or erroneous relations between units, for instance due to time-delays of delivered information, may lead to wrongly derived statistical units.
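The question about the probability of a true match, raised under the fourth issue, can be made concrete with a minimal sketch in the style of the Fellegi-Sunter model of probabilistic record linkage. This is a technique brought in for illustration, not one named in the chapter. The agreement probabilities `m` (among true matches) and `u` (among non-matches) and the prior are invented, fields are assumed conditionally independent, and the sketch scores a single candidate pair, so it does not address the many-to-many relationships of a composite GBR.

```python
import math

def match_probability(agreements, m, u, prior):
    """Posterior probability that a candidate pair is a true match,
    assuming the identification variables agree independently."""
    log_odds = math.log(prior / (1 - prior))
    for field, agrees in agreements.items():
        if agrees:
            log_odds += math.log(m[field] / u[field])
        else:
            log_odds += math.log((1 - m[field]) / (1 - u[field]))
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical parameters for identifiers scraped from business websites.
m = {"name": 0.9, "phone": 0.8}     # P(agree | true match)
u = {"name": 0.01, "phone": 0.001}  # P(agree | non-match)
prior = 0.001                       # share of candidate pairs that are true matches

p_both = match_probability({"name": True, "phone": True}, m, u, prior)
p_name_only = match_probability({"name": True, "phone": False}, m, u, prior)
```

With these invented numbers, agreement on both variables gives a high match probability, while agreement on the name alone leaves the pair far more likely to be a non-match.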
4.2 Issues on the measurement side
One may have to aggregate or disaggregate the concepts of the source data to obtain the targeted ones. Consider for instance the energy use example in the introductory part of section 4, where one aims to classify energy use per m² of buildings by type of service. When there are multiple services within a building with a single energy supply point, the first question to address is: how does one define the type of service? For now, assume that, for an object with multiple services, one chooses as a concept the set of services present in that object. The issue then arises that one needs data at a more detailed level. The opposite situation might also occur. For instance, suppose that one collects data on bankruptcies of legal units and one aims to publish the total financial impact of bankruptcies on the economy. Debts of a legal unit may be reclaimed to business partners, implying that by summing up the debts of all bankrupt legal units one would overestimate the total. One needs to transform the reported values to a less detailed level, in this case by consolidating the values (e.g. correcting for internal flows). A third issue on the measurement side deals with missing data. Consider the situation in which the targeted statistical units each consist of one or more of the units that are in the administrative data. Further, one has to publish before the administrative data are completely available. One then has to decide which unit type is used to impute missing values: the administrative or the statistical unit type.
Table 5-2. List of issues in different states (prefix * refers to hidden issues).
4.3 Issues related to estimation (output)
Concerning output produced from combined sources, two issues are relevant. The first issue is: can one determine to what extent output quality is affected by linkage errors? The second issue is: can one relate output variables to each other that are based on different unit types? For instance, a statistical office may have produced output on job vacancies by occupation. Assume that one knows that the number of job vacancies of certain occupational types may depend on the policy of management boards of large enterprise groups to outsource IT services. The issue now is: how can one relate the output on “job vacancies” to that on “enterprise groups” in a meaningful way? We explain what we mean by relate through an example in social statistics. A person has a “part of” relationship to a household, so one can make a cross table of person characteristics by household characteristics. Could one, in economic statistics, make a cross table of occupation type by enterprise type, using the fact that a job (for a certain occupation) is offered by a business that is in turn part of an enterprise group (of a certain type)? For such a cross tabulation to be meaningful, at least two things are required. First, we must be able to express variation in management policies in a clear enterprise group classification. Second, we must be able to measure and collect data on management policies for different enterprise groups in the population.
5. Discussion of the framework We have derived a LDI framework with different unit types from an output-driven perspective. The main purpose of this LDI framework is to support NSIs that plan to link different data sources, since it helps to find the causes of (potential) errors during this process. We will now discuss two aspects of this framework: a) To what extent can it be generalized to the situation where data sets with the same unit types are linked? And b) Can it also be used from a data-driven perspective? Linking the same unit types. In this situation, it is most likely that the source unit types coincide with the statistical units. Under this condition, simplifications occur in two state types. First, in the state type ‘concepts’ there are no conceptual differences between the unit types, so on the representation side we know that this is a one-to-one linkage between the units in the data sets. This implies that on the measurement side one does not have the issues ‘aggregate or disaggregate concepts’. Second, the state type ‘derivation’ may be skipped, since that concerns the transition on the representation side and on the measurement side from the source unit type to the statistical unit type. When the sources are based on the same unit type, but that unit type differs from the statistical unit type, then the LDI framework can be used without modifications. An example of this situation is found in the French statistical system where both survey and administrative data are collected at the level of the legal unit, (Haag, 2018, Chapter 4 in this volume). In the French system, enterprise-based statistics are subsequently generated bottom-up from the legal units data. In terms of the processing flow one might first use a one-to-one link among the units in the sources and then go through the last two states where data are transformed to statistical units.
Data-driven perspective. We believe that the LDI framework is also useful from a data-driven perspective, but then it should be used in a different way. With a data-driven perspective, one starts by linking the data sets together. One can either link the data sets directly to each other, or one might link both data sets to a statistical business register. Now assume that the data sets contain different variables and the unit types in the data sets differ. The relations between the variables in the different data sets can then only be analysed in a meaningful way by taking one unit type as the ‘reference’ unit type, for example the more aggregated unit type. For instance, if one would like to compare income at person level from administrative data with household expenditure data from a survey, one would first have to derive income at household level. In that sense one needs to make a choice about a ‘target’ unit type for the combined set. Further, one needs information on the relationships between the unit types, in case they differ. At SN, for this purpose a general system is being developed, that is treated in the case study ‘administrative unit base’ (Van Delden 2017). Using the linked set, one then explores the relationships between the measures obtained, which refers to the state “related measures”. One tries to understand the relations obtained within the data set and aims to verify that they are correct. We illustrate this further by using an example. About ten years ago, the suitability of VAT declarations for the use of turnover estimates was explored at SN (Braaksma, 2009). The study investigated two options: a) directly aggregating the VAT data without linking to a business register, using the VAT-units as the statistical units, and b) linking VAT data to GBR using enterprises as statistical units. Results of the options (a) and (b) were compared with turnover from a structural business statistics sample survey (SBS). 
This revealed a number of deficiencies in the approaches (a) and (b). For instance, with option (b), for many economic sectors the VAT totals were much larger than the SBS totals. This turned out to be due to two forms of over coverage in the VAT declarations: (i) some VAT-units were involved in transit rather than production activities, and (ii) some VAT-units were fiscal representatives of sales by foreign companies. These two error types were later corrected by means of derivation rules. The LDI framework can be used to analyse the potential sources of such errors. In terms of the LDI framework: the units involved in transit activities as well as the fiscal representatives did not belong to the “conceptual set” of Dutch statistics on production. Also, the study mentioned above revealed that the economic activity code in the VAT data is operationalised as an old version of the NACE code (this refers to
the operational measures). Furthermore, there were errors in the measures obtained for the NACE code due to a limited quality control of this variable (which refers to the state obtained measures).
6. Conclusion In an inventory of a number of case studies, a large set of issues was reported that occur when integrating different data sets. In addition, some hidden issues were identified. Those issues may have been overlooked by the informants because they have found ways to cope with them in practice, or because they have accepted them and believe that the issues cannot be handled differently. Some of the issues occur when combining data with different unit types. We do not claim that our inventory resulted in an exhaustive list of problems. We assume that it is not even possible to arrive at an exhaustive list, because new ways of combining sources may be developed in the future that may come with their own issues. Moreover, we did not intend to find an exhaustive list, we rather wanted to reveal the current challenges in data linkage with different source types and to what extent new methodology is needed to face those challenges. A description of the methodological needs was beyond the scope of the current paper and can be found in Van Delden (2017). Additionally, we believe that the LDI framework and our inventory are useful for NSIs that plan to combine sources with different unit types. By considering the different states of the LDI framework and by taking notice of the issues presented in our chapter, they can anticipate the problems that they may be confronted with.
Acknowledgements
The work reported in this chapter utilises ideas of a number of colleagues at SN, on whose behalf the author has written the chapter. The author thanks Daniela Ichim, Johan Lammers, Boris Lorenc and Paul Smith for their useful comments on earlier versions of the chapter and for their suggestions, which improved the chapter considerably and broadened the applicability of the work.
References
Bakker, B.F.M. (2011). Micro-integration: State of the art. In: ESSnet on Data Integration, report on WP 1, State of the art on statistical methodologies for data integration, 77–107. Available at https://ec.europa.eu/eurostat/cros/content/wp1-state-art_en (accessed 2018-02-07).
Bakker, B.F.M., van Rooijen, J. and van Toor, L. (2014). The system of social statistical datasets of Statistics Netherlands: An integral approach to the production of register-based social statistics. Statistical Journal of the IAOS 30, 411–424.
Braaksma, B. (2007). Redesign of the chain of economic statistics in the Netherlands. Seminar on Registers in Statistics - methodology and quality, 21-23 May, 2007 Helsinki. Available at https://www.stat.fi/registerseminar/sessio3_braaksma.pdf (accessed 2018-02-07).
Citro, C.F. (2014). From multiple modes for surveys to multiple data sources for estimates. Survey Methodology 40, 137–161.
Delden, A. van (2017). Issues when integrating data sets with different unit types. CBS Discussion paper 2017-05. Available at https://www.cbs.nl/en-gb/background/2017/23/issues-when-integrating-data-sets-with-different-unit-types (accessed 2018-02-07).
European Commission (EC) (2015). eGovernment in the Netherlands. Version January 2015. Available at https://joinup.ec.europa.eu/elibrary/factsheet/egovernment-netherlands-january-2015-v170 (accessed 2018-02-07).
Groves, R.M., Fowler, F.J., Couper, M., Lepkowski, J.M., Singer, E. and Tourangeau, R. (2004). Survey methodology. New York: Wiley.
Haag, O. (2018). How to improve the quality of the statistics by combining different statistical units? In Lorenc, B., Smith, P.A., Bavdaž, M., Haraldsen, G., Nedyalkova, D., Zhang, L.-C. and Zimmermann, T. (2018) (eds.). The unit problem and other current topics in business survey methodology, pp 31-46. Newcastle upon Tyne: Cambridge Scholars.
Heerschap, N. and Willenborg, L. (2006). Towards an integrated statistical system at Statistics Netherlands. International Statistical Review 74, 357–378.
Hoogland, J.J. and Verburg, I. (2006). Handling inconsistencies in integrated business data. Paper presented at the Conference of European Statisticians, work session on data editing, Bonn, Germany, 25-27 September 2006. Available at https://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.44/2006/wp.10.e.pdf (accessed 2018-02-07).
Struijs, P., Camstra, A., Renssen, R.R. and Braaksma, B. (2013). Redesign of statistics production within an architectural framework: The Dutch experience. Journal of Official Statistics 29, 49–71.
TOGAF-SABSA (2011). TOGAF and SABSA Integration. How SABSA and TOGAF complement each other to create better architectures. A white paper published by the Open Group. Available at https://www.slideshare.net/SABSAcourses/sabsa-togaf-integration-white-paper (accessed 2018-02-07).
Valk, F. van der and Spooner, R. (2014). Completing the register. GeoConnexion International Magazine May 2014, 23–26. Available at https://www.vicrea.nl/site/assets/files/1287/completing_the_register.pdf (accessed 2018-02-07).
Zhang, L.-C. (2012). Topics of statistical theory for register-based statistics and data integration. Statistica Neerlandica 66, 41–63.
CHAPTER 6 PRODUCING BUSINESS INDICATORS USING MULTIPLE TERRITORIAL DOMAINS
DANIELA ICHIM1
Abstract
At least since the Commission Regulation No 251/2009, if not before, National Statistics Institutes disseminate structural business statistics using cross-classifications of administrative territorial partitions with other breakdowns. This chapter addresses several issues that might arise when producing business statistics using different territorial partitions, that is, administrative and labour market areas. The latter are introduced as functional territorial partitions derived by means of clustering algorithms which take into account travel-to-work flows. In this study the geographical location of local units of enterprises is exploited in order to express a possible link between enterprises and territory. Two strategies, a headquarters-based one and a territory-based one, are introduced and compared. The first is equivalent to an enterprise-level strategy; the latter was implemented by imputing the values of economic indicators at local unit level following a top-down approach. This study finds that the imputation methods could be further improved by considering different aspects of the enterprises’ activities, for instance the organisational structure or employment characteristics.
1 Istituto Nazionale di Statistica - Istat, via Cesare Balbo 16, 00184, Rome, Italy. Email: [email protected].
1. Introduction
In Europe, the dissemination of structural business statistics (SBS) is implemented following (Council Regulation, 2009). Each Member State disseminates information on businesses’ economic accounts by given
breakdowns. The main economic indicators at enterprise-level are: number of employees, turnover, labour cost, value-added at factor costs, and so on, while the three predefined breakdowns are given as cross-classifications of administrative territorial boundaries, that is, NUTS2 level of the NUTS classification (Eurostat, 2013), principal economic activity, following the NACE Rev.2 classification (Eurostat, 2008), and size class. While the SBS regulation specifies the main output related to the performance of national economic systems, local and/or national stakeholders may require different outputs in order to design, adopt and monitor territorial policies. Indeed, National Statistical Institutes are requested to provide increasingly more detailed economic information at finer geographical breakdowns. Such information may be derived by exploiting the common features of the main actors of any economic system, that is, enterprises, and territorial partitions. This chapter investigates several issues related to the dissemination of economic indicators using different geographical levels, for instance NUTS-based or local labour market areas. In section 2, an overall description of the commonly used territorial partitions is given focusing on labour market areas. Section 3 briefly describes the Frame-SBS, the Istat system designed for collection and dissemination of the principal economic indicators. Section 3 also describes some territorial aspects of the Italian enterprises. Section 4 illustrates the two approaches followed in this preliminary study for the regionalisation of economic indicators, that is, a territory-based and a headquarter-based approach. Section 5 discusses three data-driven proportional allocation methods that operationalise the territory-based approach. Section 6 illustrates the results obtained when applying the territory-based approach to Italian enterprise data. 
Finally, section 7 draws some conclusions and indicates some directions for further research.
2. Territorial partitions – labour market areas The territorial partitions commonly used in official statistics may be classified into two main categories: administrative and functional. The first group includes those partitions governed by the NUTS classification (Eurostat, 2013), that is, regions (NUTS2), provinces (NUTS3) and local authority units (LAU), which in Italy are the municipalities. The second group includes those partitions which correspond to a functional governmental scope or definition. Examples of such territorial partitions include those based on the degree of urbanisation, rural-urban areas or
labour market areas (LMA). Of these functional territorial partitions, only the latter is discussed in this chapter. By definition, a labour market area (LMA) is an economically integrated spatial unit within which residents can find jobs within a reasonable commuting distance. The methodologies developed at the European level generally derive LMAs by means of clustering algorithms using commuting flows available at the finest territorial level, see Coombes et al. (1986), for example. In Italy, the latest release of the LMAs is based on the commuting flows registered in the Population Census 2011 (Franconi et al., 2016). These commuting flows are collected at the LAU level. Consequently, the LMAs represent a partition of the Italian LAUs. As the clustering algorithm implemented by the Italian National Institute of Statistics (Istat) is not constrained by any rule defined in terms of administrative boundaries, the LMA partition and the NUTS-based partitions are not necessarily hierarchically related. The boundaries of the Italian LMAs are available at http://www.istat.it/it/archivio/142676. Using the 2011 Population Census data, the Italian territory was partitioned into 611 LMAs. 56 of the LMAs contain LAUs belonging to different regions and 185 LMAs contain LAUs belonging to different provinces. About 69.7 % of the LMAs are spread over a single province, 26.7% of the LMAs are spread over two provinces and 3.6% of the LMAs are spread over more than two provinces. 90.8 % of the LMAs are spread over a single region, 8.8% of the LMAs are spread over two regions and two LMAs are spread over more than two regions. In official statistics, LMAs are defined for the purposes of monitoring labour force characteristics and related topics. Using the LMAs, Istat disseminates updated information on population, employment and unemployment, education, and so on. 
In Italy, an additional usage of LMAs is to support a law from 2012 which introduces a strategic plan built on the identification of complex and non-complex industrial crisis areas (Decretolegge, 2012). These areas are defined using labour market productivity as one of the characteristics, thus providing an example of national stakeholders’ interest in a combined usage of different territorial partitions and economic indicators.
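The LMA counts and percentages quoted above can be cross-checked against each other, and the non-hierarchical relation between LMAs and regions can be illustrated on a toy partition. In the sketch below the LAU table (identifiers `L1`..`L4`, `R1`, `R2`, `M1`, `M2`) is invented; only the figures 611, 56 and 185 come from the text.

```python
from collections import defaultdict

def regions_per_lma(lau_table):
    """Count how many distinct regions each LMA touches, given one record per LAU."""
    spans = defaultdict(set)
    for lau in lau_table:
        spans[lau["lma"]].add(lau["region"])
    return {lma: len(regions) for lma, regions in spans.items()}

# Hypothetical LAUs: LMA M2 groups LAUs from two regions, so the LMA partition
# is not nested within the regional (NUTS2) partition.
toy_laus = [
    {"lau": "L1", "region": "R1", "lma": "M1"},
    {"lau": "L2", "region": "R1", "lma": "M1"},
    {"lau": "L3", "region": "R1", "lma": "M2"},
    {"lau": "L4", "region": "R2", "lma": "M2"},
]
spans = regions_per_lma(toy_laus)

# Consistency check of the published figures quoted in the text.
n_lma, multi_region, multi_province = 611, 56, 185
pct_single_region = round(100 * (n_lma - multi_region) / n_lma, 1)
pct_single_province = round(100 * (n_lma - multi_province) / n_lma, 1)
```

The two rounded shares agree with the 90.8% and 69.7% reported for LMAs lying within a single region and a single province, respectively.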
3. Structural Business Statistics at Istat
Since 2011 structural business statistics in Italy has been based on a statistical information system called Frame-SBS where enterprise-level data on economic aggregates are registered. The population of interest covered by the Frame-SBS is derived from the Istat Business Register (ASIA). The statistical unit in Frame-SBS is the enterprise, as defined in the Council Regulation (1993), that is, “the smallest combination of legal units that is an organizational unit producing goods or services, which benefits from a certain degree of autonomy in decision-making, especially for the allocation of its current resources. An enterprise carries out one or more activities at one or more locations. An enterprise may be a sole legal unit”. The main economic variables (e.g. turnover, changes in stocks, labour costs, wages and salaries, value added, etc.) are obtained directly from integrated administrative and/or fiscal data sources covering about 95% of the population of interest. Thus, Frame-SBS may be considered as an exhaustive register. The use of different types of registers and survey data in projects on the regionalisation of statistical indicators is discussed further in Franconi et al. (2017). In Frame-SBS the information on the structure and economic performance of enterprises is consolidated at headquarters level. As an enterprise may have several local units (LoUs), which may be located in different LAUs/provinces/regions, the production of economic indicators at different territorial levels represents a complex task. A possible link between enterprises and territory is given by the geographical location of their LoUs. In this study, the LoU is defined as “an enterprise or part thereof (e.g. a workshop, factory, warehouse, office, mine or depot) situated in a geographically identified place” (Eurostat, 2017). The enterprises were classified according to the distribution of their LoUs over different territorial partitions. Obviously, enterprises with a single LoU may only be located in a single LAU/province/region/LMA. Conversely, enterprises with multiple LoUs may develop their business/activities either in a single territorial unit or in multiple LAUs/regions/LMAs.
Table 6-1 shows some indicators of Italian enterprises’ demography and performance, broken down by their geographical distribution. The reference year is 2012. Further, it is well known and well documented that economic indicators generally have extremely skewed distributions, see, for example, Kloek (2011) or Alleva (2014). In other words, small percentages of (large) enterprises correspond to high percentages of total turnover, total value-added or exports, and so on. As the largest enterprises generally have more LoUs, a strong relationship is also expected between economic performance indicators and the territorial distribution of enterprises. In Table 6-1 it may be noticed that only 4.9% of Italian enterprises have multiple LoUs, but these enterprises are extremely significant from the economic point of view. For example, the turnover of the 0.7% of enterprises having LoUs in different regions is about 36% of the total turnover of Italian enterprises. As may be observed, enterprises with LoUs located in different provinces or different LMAs are characterised by a similar skewed distribution. The value-added and the labour cost variables show a similar behaviour, too.
Table 6-1: Territorial distribution of some indicators of Italian enterprises’ demography and performance in 2012: “1” – within a single territorial unit; “>1” – across multiple territorial units.
                                        Number of regions   Number of provinces   Number of LMAs    Total
                                          1        >1          1        >1          1        >1
Enterprises with one local unit
  % Enterprises                          95.1      -          95.1      -          95.1      -       95.1
  % Employees                            64.8      -          64.8      -          64.8      -       64.8
  % Turnover                             46.2      -          46.2      -          46.2      -       46.2
  % Value-added                          51.1      -          51.1      -          51.1      -       51.1
  % Labour cost                          44.8      -          44.8      -          44.8      -       44.8
Enterprises with multiple local units
  % Enterprises                           4.2      0.7         3.5      1.4         3.1      1.8      4.9
  % Employees                            15.7     19.5        11.4     23.8         9.3     25.9     35.2
  % Turnover                             17.5     36.3        12.3     41.5        10.3     43.5     53.8
  % Value-added                          16.6     32.3        11.6     37.3         9.6     39.3     48.9
  % Labour cost                          20.5     34.7        14.2     41.0        11.6     43.6     55.2
4. Territory-based versus headquarters-based approach
In this section two computational approaches are introduced: a headquarters-based approach and a territory-based approach. The first one relies on the assumption that an enterprise can be equated with the LoU where its headquarters is located. Thus, the economic indicators can be estimated by simply considering the headquarters’ geographical locations. A territory-based approach would instead consider the geographical location of each LoU. As Table 6-1 suggests, the dissemination of economic performance information based exclusively on the headquarters’ geographical location might not be adequate for a sound representation of characteristics of the local economy. To illustrate this issue, in Figure 6-1 a basic relationship between territory and the LoUs of the enterprises is sketched. Let X be the statistic of interest, for instance number of employees, turnover or value-added. When using a headquarters-based approach, $X_A^h = X_{11} + X_{12} + X_{21} + X_{31} + X_{32}$ and $X_B^h = 0$ would be the estimated statistics corresponding to the territorial units A and B, respectively. By contrast, when applying a territory-based approach, the values of the two statistics would change to $X_A^t = X_{11} + X_{12} + X_{21} + X_{31}$ and $X_B^t = X_{32}$, respectively.
Figure 6-1: Distribution of local units across territory.
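The two aggregation rules of this example can be written out as a minimal sketch. The numeric values of X and the choice of which LoU hosts each headquarters are assumptions made for illustration. The only constraint taken from the text is that all three headquarters lie in territory A, since the headquarters-based total for B is zero.

```python
from collections import defaultdict

# One record per local unit; "x" is the (hypothetical) LoU-level value of X.
local_units = [
    {"enterprise": 1, "lou": "11", "territory": "A", "hq": True,  "x": 10.0},
    {"enterprise": 1, "lou": "12", "territory": "A", "hq": False, "x": 5.0},
    {"enterprise": 2, "lou": "21", "territory": "A", "hq": True,  "x": 7.0},
    {"enterprise": 3, "lou": "31", "territory": "A", "hq": True,  "x": 4.0},
    {"enterprise": 3, "lou": "32", "territory": "B", "hq": False, "x": 6.0},
]

def territory_based(units):
    """Sum X over the territory in which each LoU is located."""
    totals = defaultdict(float)
    for u in units:
        totals[u["territory"]] += u["x"]
    return dict(totals)

def headquarters_based(units):
    """Assign every LoU's X to the territory of its enterprise's headquarters."""
    hq_territory = {u["enterprise"]: u["territory"] for u in units if u["hq"]}
    totals = defaultdict(float)
    for u in units:
        totals[hq_territory[u["enterprise"]]] += u["x"]
    return dict(totals)

x_h = headquarters_based(local_units)
x_t = territory_based(local_units)
```

Both rules return the same national total; they differ only in how that total is distributed over A and B.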
Consequently, when developing economic indicators using multiple territorial breakdowns, different characteristics of the production process, including territorial distribution features, should be taken into account. In this work, the unambiguous relationship between enterprises’ LoUs and the LAUs is exploited. Following the above definition of local units (Eurostat, 2017), each LoU is located in a unique LAU. If the economic indicators were available at the LoU level, then these statistics could be computed at LAU level, too. Subsequently, they could be computed for any other LAU-based geographical level, for instance NUTS or LMA. Unfortunately, the representation of economic indicators at LoU level is not always a straightforward task, owing either to the data collection characteristics or to the management strategies of enterprises. Indeed, some economic indicators may not be registered at LoU level at all. While the number of employees and labour cost may be quite easily measured at LoU level (and hence at LAU level, too), other indicators like turnover, value-added or exports are generally accounted for and registered at enterprise level only, as in the Italian Frame-SBS, even if an enterprise might achieve its objectives (organisational or productive) through the collaboration of its LoUs.
5. Regionalisation of economic indicators
5.1 Goal
While keeping in mind the possible extensions to multiple territorial breakdowns, the main objective of this work is to investigate the estimation of the economic indicators associated with each labour market area while controlling the published totals. The latter constraint stems from the SBS Regulation, which requires the production and dissemination of economic indicators at regional level. In this preliminary study we consider value-added as the economic indicator. Three data-driven allocation methods from enterprise level to LoU level are proposed in Section 5.3.
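As a minimal sketch of what a top-down allocation that controls totals can look like, the function below distributes an enterprise-level value over its LoUs in proportion to an allocation key. The choice of the number of employees as the key, the identifiers and the figures are assumptions for illustration and are not necessarily one of the three methods of Section 5.3.

```python
def allocate_value_added(enterprise_value_added, lou_keys):
    """Split an enterprise total over its LoUs proportionally to a non-negative key,
    so that the LoU values add up to the enterprise total."""
    total_key = sum(lou_keys.values())
    if total_key <= 0:
        raise ValueError("allocation key must have a positive total")
    return {
        lou: enterprise_value_added * key / total_key
        for lou, key in lou_keys.items()
    }

# Hypothetical enterprise with two LoUs and employees as the allocation key.
employees = {"31": 40, "32": 60}
allocated = allocate_value_added(500.0, employees)
```

Because the LoU values sum to the enterprise value, any aggregation over LAUs, LMAs or regions remains consistent with the enterprise-level total.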
5.2 Data

Aiming at linking the economic and territorial dimensions of enterprises, this case study explores the opportunities offered by two additional Istat registers:

ASIA-UL: registering some structural information (principal economic activity and geographical location) and detailed information on employment (number of employees and their socio-occupational characteristics) at LoU level;
RACLI: registering detailed information on labour costs at LoU level.

For the experimental analyses conducted in this project, the enterprises with at least one employee and having a positive value-added were selected from the Frame-SBS register. The dataset contained information about 1.5 million enterprises. The variables registered in ASIA-UL and RACLI were integrated at enterprise level. Due to the existence of unique ID codes for both enterprises and LoUs, the integration process was a one-to-one deterministic record linkage. The dataset used in this study contained about 1.8 million records, each corresponding to a LoU. When the original information was consolidated at the enterprise level only, it was replicated for each associated LoU. The following variables were used in this exploratory phase:
enterprise ID
local unit ID
NACE - enterprise
NACE - local unit
number of employees - enterprise
number of employees - local unit
labour cost - enterprise
labour cost - local unit
value-added - enterprise.
For each enterprise with multiple LoUs, the integrated dataset also contained information about which LoU was the enterprise’s headquarters.
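The integration step described above (deterministic linkage on unique IDs, with enterprise-level values replicated on each LoU record) might look like the sketch below. The record layout, the field names and the example values are assumptions made for illustration; they are not the actual structure of Frame-SBS, ASIA-UL or RACLI.

```python
def integrate(enterprises, local_units):
    """Attach enterprise-level variables to every LoU record.

    enterprises: dict mapping an enterprise ID to a dict of enterprise-level variables
    local_units: list of dicts, each holding 'enterprise_id', 'lou_id' and LoU-level variables
    """
    records = []
    for lou in local_units:
        # Deterministic link: exact match on the unique enterprise ID.
        enterprise = enterprises[lou["enterprise_id"]]
        record = dict(lou)
        for name, value in enterprise.items():
            # Enterprise-level values are replicated on each associated LoU.
            record[name + "_enterprise"] = value
        records.append(record)
    return records


# Hypothetical example: one enterprise with two LoUs.
enterprises = {"E1": {"value_added": 1200.0, "employees": 40}}
local_units = [
    {"enterprise_id": "E1", "lou_id": "E1-1", "employees": 30, "is_headquarters": True},
    {"enterprise_id": "E1", "lou_id": "E1-2", "employees": 10, "is_headquarters": False},
]

dataset = integrate(enterprises, local_units)
```

The result has one record per LoU, which mirrors the statement that the study dataset has more records (LoUs) than enterprises. A LoU whose enterprise ID is missing from `enterprises` raises a `KeyError` in this sketch rather than being silently dropped.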
5.3 Allocation methods

LMAs could be considered as unplanned estimation domains. Consequently, different methodological approaches stemming from a small area estimation framework could be applied in order to estimate value-added at LMA level; see Saei and Chambers (2003), for example. This approach might, however, not be completely satisfactory, since it would provide a solution only for a single additional territorial domain, for instance the LMA. In order to design the estimation of economic indicators using multiple geographical domains, the finest territorial unit, that is, the LAU, should be considered. Small area-based estimation strategies will nevertheless be investigated in future research.

In order to estimate the value-added at LMA level, it should be sufficient to estimate the value-added for each local unit (LoU) located within the LMA. In such a methodological process, besides the obvious data availability constraint, there are three main issues to be tackled: theoretical background, allocation method and definition of criteria for quality evaluation.

Firstly, it should be assessed whether the allocation of economic measures/indicators to local units (LoUs) is supported by any economic theory. A top-down allocation of economic measures from an enterprise to its LoUs depends on the nature and economic meaning of the estimated variable. Indeed, the LoUs may be involved in different stages of the production process. Thus, it may not be guaranteed that value-added, for example, could really be assigned to each LoU. Examples of such LoUs are those devoted to logistics or to management processes. Further, the heterogeneity of the management and production strategies adopted by enterprises might be an additional issue to take into consideration. Indeed,
activities like exports/imports or investments may be more selective and might involve only a few specialized LoUs. In conclusion, being bound by their meanings, different economic indicators might need different conceptual and methodological allocation approaches.

Secondly, the statistical method used to allocate an economic indicator to each LoU should be evaluated. Many methods can be envisaged, but they generally depend on data availability, see also Eurostat (2013). In this preliminary work, three proportional methods were investigated for the value-added allocation:

a) each LoU is assigned an equal value-added,
b) each LoU is assigned a value-added proportional to its number of employees,
c) each LoU is assigned a value-added proportional to its labour cost.

Methods a and b rely only on the information registered in ASIA-UL while method c is based on the information included in RACLI. While method a assumes that LoUs are equally involved in the production process, methods b and c may be viewed as labour-based approaches, thus assuming a positive relationship between value-added and labour intensity. Compared with method b, method c also incorporates, in an extremely simplified manner, a positive correlation between value-added and work specialisation. Indeed, it is implicitly assumed that the involvement of more specialized work/workers simultaneously generates higher levels of labour cost and value-added.

Thirdly, a criterion to evaluate the results should be identified in advance. As benchmarking indicators are not easily available, the identification of a meaningful criterion is not a simple task. For the current work, based on Council Regulation (2009), coherence with the headquarters-based regional estimates was chosen as a starting point. Further studies may evaluate other criteria, for instance based on auxiliary information available through additional surveys (see Smith et al. (2003) for an example).
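The three proportional methods a, b and c can be written as one proportional split with different weights. The sketch below is an illustration only: the function names and the example figures are hypothetical, and the equal-split fallback for all-zero weights is an assumption of the sketch, not something the chapter specifies.

```python
def proportional_split(total, weights):
    """Split `total` across units in proportion to non-negative `weights`."""
    weight_sum = sum(weights)
    if weight_sum == 0:
        # Assumption of this sketch: equal split when every weight is zero.
        return [total / len(weights)] * len(weights)
    return [total * w / weight_sum for w in weights]


def allocate_value_added(value_added, employees, labour_costs, method):
    """Allocate an enterprise's value-added to its LoUs with method 'a', 'b' or 'c'."""
    if method == "a":      # equal shares
        weights = [1] * len(employees)
    elif method == "b":    # proportional to the number of employees
        weights = employees
    elif method == "c":    # proportional to labour cost
        weights = labour_costs
    else:
        raise ValueError("method must be 'a', 'b' or 'c'")
    return proportional_split(value_added, weights)


# Hypothetical enterprise with three LoUs.
value_added = 1200.0
employees = [10, 30, 20]
labour_costs = [500.0, 900.0, 600.0]

shares_a = allocate_value_added(value_added, employees, labour_costs, "a")
shares_b = allocate_value_added(value_added, employees, labour_costs, "b")
shares_c = allocate_value_added(value_added, employees, labour_costs, "c")
```

Whatever the weights, the LoU values sum to the enterprise total (up to floating-point rounding), so enterprise-level value-added is preserved; only its territorial distribution changes between methods.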
6. Analysis of the results

As methods b and c strongly depend on the workforce, an initial comparison of headquarters-based and territory-based approaches was performed using the number of employees as the statistic of interest. Consider a given territorial unit, for instance a region or an LMA. When using the headquarters-based approach, the number of employees, EMP_headq, is computed as the total number of employees of the
enterprises whose headquarters are located in the considered territory. Alternatively, the territory-based approach estimates the number of employees, EMP_terr, as the total number of employees working in the LoUs which are located in the considered territory. Table 6-2 shows the main descriptive statistics of the relative errors (EMP_headq - EMP_terr) / EMP_headq × 100 computed over the territorial units corresponding to a given partition. It should be noted that the headquarters-based approach relies on the information registered in Frame-SBS while the territory-based approach may be derived only from the ASIA-UL register. As the same total number of employees is registered in both Frame-SBS and ASIA-UL for each active enterprise, the large differences shown in Table 6-2 are exclusively due to the two approaches. Furthermore, the estimation of the total number of employees is not based on any particular economic or statistical assumption. Consequently, when using more complex economic indicators, for instance value-added, large differences between the two approaches should be expected, too.

For each territorial partition, Table 6-2 generally shows negative values. This characteristic is due to the distribution of headquarters across the Italian territory. Indeed, for example, only 25% of regions host 51% of the headquarters; half of the regions host 79% of the headquarters.

As already stated, any allocation of value-added at LoU level would allow the straightforward derivation of the value-added at LMA level or province level. In contrast, the preservation of headquarters-based totals at regional level is not guaranteed. Once each LoU is assigned a value-added, the territory-based approach previously described may be applied. Then the differences from the headquarters-based approach may be computed at regional level (or using a different territorial partition).
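The comparison of the two approaches can be illustrated with a small sketch that computes EMP_headq, EMP_terr and the relative error for each territory. The input layout (one tuple per LoU), the function name and the example data are hypothetical; territories hosting no headquarters are skipped here because the relative error is undefined when EMP_headq is zero, which is a choice of this sketch.

```python
from collections import defaultdict


def relative_errors(lous):
    """Return {territory: (EMP_headq - EMP_terr) / EMP_headq * 100}.

    lous: iterable of (enterprise_id, territory, employees, is_headquarters)
    """
    enterprise_total = defaultdict(float)   # employees per enterprise
    hq_territory = {}                       # territory of each enterprise's headquarters
    emp_terr = defaultdict(float)           # territory-based totals
    for enterprise_id, territory, employees, is_hq in lous:
        enterprise_total[enterprise_id] += employees
        emp_terr[territory] += employees
        if is_hq:
            hq_territory[enterprise_id] = territory

    emp_headq = defaultdict(float)          # headquarters-based totals
    for enterprise_id, total in enterprise_total.items():
        emp_headq[hq_territory[enterprise_id]] += total

    return {
        t: 100 * (emp_headq[t] - emp_terr[t]) / emp_headq[t]
        for t in emp_headq
        if emp_headq[t] != 0
    }


# Hypothetical data: E1 has its headquarters in North and a second LoU in South.
lous = [
    ("E1", "North", 60, True),
    ("E1", "South", 40, False),
    ("E2", "South", 50, True),
]
errors = relative_errors(lous)
```

In this toy example North is credited with all 100 employees of E1 under the headquarters-based approach but hosts only 60 of them, so its relative error is positive, while South shows a negative value.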
While Table 6-2 compares three territorial partitions, Table 6-3 compares the proportional allocation methods a, b and c with the headquarters-based approach.

Table 6-2. Difference between estimated number of employees when applying the headquarters-based and territory-based approaches. Values in %.

Territory    Min      Q1       Median   Mean     Q3      Max
Regions      -22.6    -11.8    -8.3     -6.8     -3.9    18.0
Provinces    -38.7    -13.5    -9.1     -9.2     -5.0    24.8
LMAs         -29.0    -13.6    -8.3     -8.6     -3.9    30.7

Table 6-3. Estimation of value-added at regional level. Comparison of the territory-based approach (methods a, b and c) with the headquarters-based approach. Values in %.

Method                   Min      Q1       Median   Mean     Q3      Max
a (equal)                -82.5    -35.9    -21.3    -23.8    -7.9    41.4
b (prop. employees)      -59.6    -26.0    -15.6    -16.7    -7.4    34.4
c (prop. labour cost)    -54.6    -24.6    -14.5    -15.5    -6.9    33.3

Similar to the number of employees in Table 6-2, the
differences illustrated in Table 6-3 are quite large. The largest differences are generated by method a while the smallest differences are obtained when applying method c. According to the preliminary evaluation criterion used in this work, that is, the preservation of the totals mentioned in Council Regulation (2009), method c seems to be preferable. However, as previously indicated, this evaluation criterion might not be completely satisfactory. The enterprise-to-LoU allocation methods might be further developed and improved by taking into account additional characteristics of the production process. For example, one might further detail the labour cost by different job and occupational features, for instance type of contract, tasks, gender, experience, level of educational attainment, skills, and so on.

In Figure 6-2, a map of the differences generated by method c is shown. As expected, Lombardia (the region whose capital is Milan) is the region corresponding to the greatest difference. This feature is due to the fact that about 18% of the Italian enterprises have their headquarters located in this region, since it offers closeness to national authorities and important infrastructural developments.

In order to achieve coherence with value-added totals at regional level, different methodologies might be applied at macro or micro level. In further studies a comparison of possible strategies aimed at reaching the previously mentioned coherence will be performed. For example, iterative fitting methods could be applied at the LAU level in order to minimize the differences with respect to the estimates at regional level. Additionally, balancing methods could be applied for the derivation of economic indicators at LoU level (see Stone et al. (1942) and other studies in the area of balancing national accounts). These methods seem quite promising since they may be constrained to preserve given marginal totals within predefined confidence intervals.
Figure 6-2: Percentage differences at regional level between headquarters-based and territory-based approaches. Estimated variable: value-added. Black = positive difference, dark gray = between -10 and 0, light gray = less than -10.
7. Conclusions and further studies

In this chapter several issues related to the production of economic indicators using different geographical levels were highlighted. A strategy to estimate structural business statistics at labour market area level was sketched. As it distributes the values from enterprise level to local unit level, the proposed methodology may be considered a top-down approach.
The relationship between economic indicators, allocation method and data availability was also discussed. Further methodological improvements might require the use of different types of information. For example, small area estimation techniques might be studied in order to exploit different types of auxiliary information. Additional developments might concern the tuning of the allocation method in order to better exploit some available information. Two directions of research may already be indicated. Firstly, the use of structural information on enterprises and their LoUs could improve the top-down allocation model. Examples of potential additional information are given by both the similarity between LoUs and enterprises with a single local unit and the principal economic activity (NACE classification) of LoUs. Secondly, as the model introducing employee characteristics performed better than the other two, additional information on employment characteristics might provide further improvements. Socio-demographic (age, gender, nationality, and so on) and contract-related (type and duration of the working contract, experience, skills) characteristics are examples of employment features that may be included in the allocation model. Finally, the choice of evaluation criteria is surely worth investigating. Indeed, meaningful comparisons with additional surveys might be planned. Furthermore, evaluation criteria based on similarity measures between LoUs and enterprises with a single local unit might also be derived and analysed.
Acknowledgements

The views expressed in this chapter are the author’s and have not been reviewed to reflect an official position of Istat.
References

Alleva, G. (2014). Integration of business and trade statistics: limitations and opportunities. DGINS conference 2014 “Towards global business statistics”, Riga, September 2014.
Coombes, M.G., Green, A.E., and Openshaw, S. (1986). An efficient algorithm to generate official statistics report areas: the case of the 1984 Travel-to-Work Areas in Britain. Journal of the Operational Research Society 37, 943-953.
Council Regulation (1993). No 696/93 Statistical units for the observation and analysis of the production system in the Community. Official Journal of the European Union L 076, 30/03/1993 P. 0001 – 0011. Available at: http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31993R0696:EN:HTML
Council Regulation (2009). No 251/2009 [Structural business statistics]. Official Journal of the European Union L 086, 31/3/2009 P 170 – 228. Available at: https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32009R0251&from=EN.
Decreto-legge (2012). Misure urgenti per la crescita del Paese. 22 giugno 2012, n. 83. Available at: http://www.agid.gov.it/sites/default/files/leggi_decreti_direttive/dl-22-giugno-2012-n.83_0.pdf (accessed 2018-01-30).
Eurostat (2008). NACE Rev.2 Statistical classification of economic activities in the European Community. Luxembourg: Publications Office of the European Union.
Eurostat (2013). Manual of regional accounts. Luxembourg: Publications Office of the European Union.
Eurostat (2015). Regions in the European Union. Nomenclature of territorial units for statistics NUTS 2013/EU-28. Luxembourg: Publications Office of the European Union.
Eurostat (2017). Statistics explained. Luxembourg: Publications Office of the European Union. Available at http://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:Local_unit (accessed 2018-02-16).
Franconi, L., D'Alo', M. and Ichim, D. (2016). Istat implementation of the algorithm to develop Labour Market Areas. http://www.istat.it/en/files/2016/03/Description-of-the-LabourMarketAreas-algorithm.pdf.
Franconi, L., Ichim, D., and D'Alo', M. (2017). Labour Market Areas for territorial policies: tools for a European approach. Statistical Journal of the IAOS 33, 585-591.
Kloek, W. (2011). What makes business statistics different? European Establishment Statistics Workshop EESW2011, Neuchâtel, September 2011.
Saei, A. and Chambers, R. (2003). Small area estimation under linear and generalized linear mixed models with time and area effects. S3RI Methodology Working Papers M03/15. Southampton: S3RI, University of Southampton.
Smith, P., Pont, M. and Jones, T. (2003). Developments in business survey methodology in the Office for National Statistics, 1994–2000. The Statistician 52, 257-295.
Stone, R., Champernowne, D.G. and Meade, J.E. (1942). The precision of national income estimates. The Review of Economic Studies 9, 111-125.
CHAPTER 7 IMPROVING THE EFFICIENCY OF ENTERPRISE PROFILING
JOHAN LAMMERS1

1 Department of process development and methodology, Statistics Netherlands, P.O. Box 4481, 6401 CZ Heerlen, The Netherlands. Email: [email protected].

Abstract

Producing economic statistics implies profiling of economic actors: enterprise groups and enterprises. As a process, profiling can be labour-intensive. Improving the efficiency of profiling is part of cost reduction in a statistical office. This chapter describes the way efficiency improvements are worked out at Statistics Netherlands. As a consequence, these time savings offer the means to extend the coverage of the business register and to improve its quality.

1. Introduction

Over the past three years, Statistics Netherlands has invested in Lean Six Sigma (Smekens and Zeelenberg, 2015) as a method to improve its statistical processes. The method uses the Define, Measure, Analyse, Improve and Control (DMAIC) approach (Smekens and Zeelenberg, 2015). In this chapter we discuss one of the many Lean Six Sigma projects that were carried out. The project had the objective of improving the efficiency of the profiling of large enterprises in the Dutch Statistical Business Register. Profiling is used to determine the structure, core attributes and dynamics of those enterprises. The profiling activity is standardised and centralised within Statistics Netherlands to support statistical coordination. For large enterprises, profiling is labour-intensive. The intended gain in efficiency was necessary to finance cost reduction measures in Statistics Netherlands,
for the extension of the coverage of the business register and for improvement of the effectiveness of profiling. The objective of the project was to reduce the number of hours worked on profiling in the traditional way, taking the savings as a mixture of budget reduction, profiling scope increments and better communication about profiles with users. The ambition of the process owner was a reduction of 45%. The project was initiated by the idea that the profiling process, as it was executed during the past 10 years, could be more effective and efficient. The statistical producers’ needs for profiling were not sufficiently known and statistical producers were not familiar enough with the products and services offered by profiling. Some initial analysis (an internal study by Lammers and Paulussen from 2014) using process mining (van der Aalst, 2011) had already indicated opportunities to improve. This chapter describes the main results from the project. First, profiling will be described as a part of the chain of production of economic statistics. Then, the analysis of the profiling process and the proposed and implemented improvements are presented. The concluding section summarises and discusses the main findings from this project. The Lean Six Sigma approach is quite general and can be applied to improve all kinds of statistical processes. The case study on profiling is a showcase to illustrate this.
2. Profiling as a part of the chain of production of economic statistics

Profiling is a method of analysing the legal, operational and accounting structure of an enterprise group at national and world level, in order to establish the statistical units within that group, their links and the most efficient structures for the collection of statistical data. The structure, main attributes (like the industry classification) and the dynamics of enterprises are processed. Profiling does not include consolidation of accounts. Statistical departments use this information for sampling, data collection, consolidation, editing, estimation and integration.

Changes in the statistical business register (SBR) influence the outcomes of economic statistics: for example, births, mergers and cessations of large enterprises. A thorough and timely analysis of this influence is necessary to produce statistics of good quality. Larger enterprises have more influence. Therefore, and because manual profiling is costly, profilers mainly evaluate changes in the most influential enterprises. Profiling the 350 ‘most complex’ enterprise groups in the Netherlands requires 4 profilers. For the next segment of 950 enterprise groups an additional staff
of 7-8 is needed. Both segments together are referred to as Top-P enterprise groups. The scope of the project was limited to the second segment of 950 enterprise groups within the Top-P. The policy at Statistics Netherlands (CBS) was to manually evaluate all changes in Top-P enterprises.

The owners of the chain of production of economic statistics demand an effective and efficient use of profiling. To determine the specific needs of this group of internal customers, we organized a ‘Customer Arena’ workshop. In this workshop the customers, the statistical departments, are in the lead. They discuss their needs for profiling products and services: what the current products are, whether statistical departments are familiar with them, what the departments think of their quality, which improvements they would like, and so on. The producers, the profiling staff, form the audience of this arena. Their main task is to listen to their customers and to ask clarifying questions. The result is a better mutual understanding of these needs. The ‘Customer Arena’ showed that customers are satisfied with the current service, but that there is still room for further improvement:

Spend time and effort on profiling cases with the largest statistical relevance. Important enterprises and important changes should get more profiling attention. The relevance of a specific change will not be the same for different economic statistics. Nonetheless, the relevance can be determined initially by identifying changes in the number of enterprises within a specific cell of size class and industrial activity, or by changes in the structure of an enterprise (group). A subsequent action will be to elaborate the criteria of relevance further.

Improve the communication of major changes (timeliness, clarification of the profiling procedure, standardized reports). The most important outputs from profiling to support statistical processing are profiling reports (elaborating the structure and describing the main underlying arguments), change reports (describing the causes and the impact of a specific change) and letters to inform the main stakeholders (statistical users, data collection staff and respondents) about the new structure.

Involve statistical departments when necessary. Statisticians can contribute additional knowledge about the units, and they want to be prepared in time for relevant changes.
3. Process analysis and improvements

In this section, an analysis of the process of manual evaluation of results from the automatic profiling is presented. For the analysis of the profiling process, process mining (van der Aalst, 2011) is applied. This technique uses process metadata generated by applications: in this case mainly the Dutch SBR application. With this information the ‘de facto’ process can be shown.

The manual profiling process starts after SBR sources have been processed. It evaluates automated profiling results, which have identified statistical units, their attributes, relations, and dynamics (changes). The automatic results are assessed against a set of criteria to determine whether the case should be evaluated manually. Less than 1% of the cases enter the manual profiling process. In manual profiling, the profiler analyses the statistical events from automatic profiling at micro-level by comparing the information with other sources (accounts, press releases, information from business statistics) and evaluates the statistical impact. When the profiler agrees with the information, the SBR is updated with it; otherwise the profiler can change the data. If necessary, the profiler takes care of the documentation. A change or a group of interrelated changes in the SBR is termed a statistical event.

Figure 7-1 presents the result of a process mining analysis of the manual profiling process for statistical events in the first half of 2016. The scheme visualizes the main process steps (boxes) and the main flows between process steps (single-headed arrows). The colours of the boxes and the thickness of the arrows indicate the number of cases (statistical events) involved. Only the main elements are visualized to reduce the complexity of the scheme. All these events apply to the enterprise groups in Top-P in the Netherlands.
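The kind of counts that such a process map displays (how often each step occurs, and how often one step directly follows another within a case) can be derived from an event log as sketched below. This is a generic directly-follows count, one of the simplest process mining summaries; it is not claimed to be the tool or algorithm used at Statistics Netherlands, and the log layout and activity names are hypothetical.

```python
from collections import Counter, defaultdict


def directly_follows(event_log):
    """Count activities and directly-follows transitions per case.

    event_log: iterable of (case_id, timestamp, activity) tuples
    """
    traces = defaultdict(list)
    for case_id, timestamp, activity in event_log:
        traces[case_id].append((timestamp, activity))

    activity_counts = Counter()     # would drive the box colours
    transition_counts = Counter()   # would drive the arrow thickness
    for steps in traces.values():
        ordered = [activity for _, activity in sorted(steps)]
        activity_counts.update(ordered)
        transition_counts.update(zip(ordered, ordered[1:]))
    return activity_counts, transition_counts


# Hypothetical log: three statistical events (cases) with made-up step names.
log = [
    (1, 1, "evaluate"),
    (2, 1, "evaluate"), (2, 2, "change"),
    (3, 1, "evaluate"), (3, 2, "change"), (3, 3, "document"),
]
activities, transitions = directly_follows(log)
```

Each case here is one statistical event; the counters correspond to the numbers attached to the boxes and arrows of a process map.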
Figure 7-1. Process for profiling statistical events on Top-P Dutch SBR, January - June 2016.

The policy at CBS has been that profilers are to evaluate all changes concerning these units manually. Figure 7-1 presents a process model that shows how often the steps and their sequences were carried out in the data from the first half of 2016. Such a model facilitates identification and quantification of process performance and is the basis for planning improvements. The figure shows that in the six months 4435 statistical events were evaluated by profilers. In 1235 events, SBR information created during automatic profiling was changed by the profilers, of which 268 (21.7%) events involved multiple changes. 1158 events with changes involved only making a change to the automatic profiling information, while 77 (1235 - 1158) involved first creating a documentation entry (this flow is not visualised in Figure 7-1). Hence, for 3200 (4435 - 1235) events the information was evaluated to be correct. In 639 cases (14%), one or more documents were produced to support or explain the manually evaluated information from automatic profiling in the SBR. For 46 cases, documents were produced after the implementation of the change. The documents are meant to communicate the new situation to the enterprises and the statistical departments. Finally, there were 2866 events (65%) that led to
no change and no documentation entry. (The difference from the 2882 events in Figure 7-1 consists of events that were documented after implementation. Of the 46 such events, 16 were not changed, so 2882 - 16 = 2866 events were neither changed nor documented.)

Table 7-1. Types of change in the business population and whether they are deemed to have a statistical impact or not.

Type of change: Births or cessations, EG or ENT
Description: At least one birth or one cessation of an enterprise group (EG) or an enterprise (ENT).
Impact: Yes

Type of change: Births or cessations, LU
Description: At least one birth or one cessation of a legal unit (LU). The population of EG and ENT is not changed, only the structure in terms of legal units changes.
Impact: Yes

Type of change: Births or cessations of control relations
Description: At least one birth or one cessation of a control relation between legal units. The population of EG, ENT and LU is not changed, only the structure in terms of legal units changes.
Impact: Yes

Type of change: Changes of relevant attributes SBR
Description: Size class, activity code, institutional sector code or main legal unit of at least one EG or ENT changes. The population of EG, ENT and LU is not changed.
Impact: Yes

Type of change: Change of number of persons employed in ENT
Description: Only the number of persons employed for at least one of the enterprises (ENT) changes. Population, size class of ENT, and other relevant attributes are not changed.
Impact: No

Type of change: Changes of other attributes
Description: Only other attributes (like a telephone number, URL, email, etc.) change.
Impact: No

Type of change: Changes of metadata
Description: The content of the statistical population is not changed, only metadata (timestamps, source indicators) are altered.
Impact: No
Discussions in the ‘Customer Arena’ have shown that the statistical relevance of a change is important. In the current process, all statistical events that occur in Top-P enterprise groups are considered to be relevant. To analyse this relevance, the events are classified by the type of change in the business population. Seven types of change are distinguished in Table 7-1. Figure 7-2 shows the frequencies of the types of changes induced by statistical events. Statistical events with births or cessations of entities alter the population of large entities for business statistics. Generally, this has a large impact on produced statistics. The same applies to changes of relevant attributes (for example size class, activity, etc). The impact of the last three types of changes in the histogram is (very) limited and, in line with the needs of the customers (staff of business statistics and national accounts), will be considered as cases without impact. These types of changes without impact concern 3147 events (71%)!
[Figure 7-2: bar chart; vertical axis scale 0 to 1600; one bar per type of change: births or cessations EG/ENT; births or cessations LU; births or cessations of control relations; changes of relevant attributes SBR; only number of persons employed ENT; changes of other attributes; changes of metadata.]

Figure 7-2. Statistical events on Top-P, profiled in January - June 2016, by type of change.
Continuing with the analysis, the statistical events from January to June 2016 were divided into two groups: events with impact and events without impact. For both groups the profiling process was analysed in the way presented in Figure 7-1.
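The split into events with and without impact follows from Table 7-1 and can be expressed as a simple rule. In the sketch below the type labels are hypothetical identifiers, and the rule that an event takes the most significant type among its changes is an interpretation of the table's descriptions ("at least one ...", "only ... changes"), not a documented feature of the Dutch SBR application.

```python
# Change types of Table 7-1, ordered from most to least significant,
# with the impact flag taken from the table. Labels are hypothetical.
CHANGE_TYPES = [
    ("birth_cessation_eg_ent", True),
    ("birth_cessation_lu", True),
    ("birth_cessation_control", True),
    ("relevant_attribute", True),
    ("persons_employed", False),
    ("other_attribute", False),
    ("metadata", False),
]


def classify_event(changes):
    """Return (type, has_impact) for the most significant change in an event.

    changes: set of change-type labels observed in one statistical event
    """
    for change_type, has_impact in CHANGE_TYPES:
        if change_type in changes:
            return change_type, has_impact
    raise ValueError("event contains no recognised change type")


def split_by_impact(events):
    """Split {event_id: set of changes} into two lists of event IDs."""
    with_impact, without_impact = [], []
    for event_id, changes in events.items():
        _, has_impact = classify_event(changes)
        (with_impact if has_impact else without_impact).append(event_id)
    return with_impact, without_impact


# Hypothetical events.
events = {
    "ev1": {"metadata", "birth_cessation_lu"},
    "ev2": {"other_attribute", "metadata"},
    "ev3": {"persons_employed"},
}
impact, no_impact = split_by_impact(events)
```

Under this rule an event that combines a legal-unit birth with a metadata update is treated as having impact, whereas an event that only touches contact details and metadata is not.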
Figure 7-3. Process for profiling statistical events with statistical impact on Top-P Dutch SBR, January - June 2016.
Figure 7-3, presenting the profiling of events with impact, indicates that 44% (566/1288) of the events lead to a change in the SBR, while 26% (338/1288) lead to documenting the event. Because of the impact, more documentation, or at least more communication with the statistical departments, would be expected. We found that 45% (582/1288) of the events were profiled without any change to the original structure and without documentation. For these events the results from the automatic profiling of the SBR are accepted.
Figure 7-4. Process of profiling statistical events without statistical impact on Top-P Dutch SBR, January - June 2016.
Figure 7-4 presents the results of profiling statistical events without a statistical impact. It shows that 10% (301/3147) of the events without a statistical impact are documented, which indicates overprocessing as such documentation can only be used to evaluate the automated processing. Further, 21% (669/3147) of the events without impact are changed. This
may indicate overprocessing as these events are deemed to have no statistical impact. But on the other hand, these false positives (indicating a need for change in events with no statistical impact) may be necessary to avoid false negatives (not detecting a change with statistical impact) in the automatic profiling process. If performed, an evaluation of quality of the automatic change detection may offer a trigger to improve the criteria to assess whether to evaluate manually. In 79% (100%-21%) of the events involving no statistical impact the profilers did not change the business register. Manual profiling of cases with no statistical impact is not satisfying the needs of the customers and should be treated as overproduction.
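The shares quoted for the two groups can be re-derived from the reported counts. The snippet below only repeats that arithmetic; the dictionary keys and the helper name are hypothetical, while the counts are those given in the text.

```python
def share(part, total):
    """Return part/total as a percentage rounded to the nearest integer."""
    return round(100 * part / total)


# Event counts reported for January - June 2016.
with_impact = {"total": 1288, "changed": 566, "documented": 338, "neither": 582}
without_impact = {"total": 3147, "changed": 669, "documented": 301}

all_events = with_impact["total"] + without_impact["total"]

shares_with_impact = {
    key: share(value, with_impact["total"])
    for key, value in with_impact.items() if key != "total"
}
shares_without_impact = {
    key: share(value, without_impact["total"])
    for key, value in without_impact.items() if key != "total"
}
share_of_events_without_impact = share(without_impact["total"], all_events)
```

The two group totals add up to the 4435 events evaluated in the period. In the group with impact the three counts sum to more than 1288, which is consistent with some events being both changed and documented.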
4. Process improvements

The analysis indicates that a substantial part of the process may be identified as waste while other parts appear to be under-attended. The profiling process should be rebalanced to comply better with the needs of the users, the statistical departments. Overproduction in manual profiling concerns almost 400 events as a monthly average. Avoiding this work corresponds to 30% less effort for manual profiling.

Separately from this analysis, the profilers indicated that they spend considerable time waiting for applications to start up and close. Profilers use various applications simultaneously (the business register application, a dedicated Top-P application, the Internet, MS Office, Adobe, etc.). Tests and estimations showed that 6-8% of profiling capacity is spent on such waiting. The expectation is that 3-4% of profiling capacity can be gained with proper technical improvements.

Other causes, like overprocessing, increase the potential efficiency gain by a further 10%. These causes are not documented in this chapter but correspond to unnecessary complexity of the manual profiling, where postponed checks cause extra loops in the process.

Taking these three sources together, about 45% of profiling capacity (300-350 hours every month) can be gained from the traditional way of profiling, and a part of this will be used to improve the service. In order to achieve all these improvements, changes in policy, the register application, the instructions and, hence, in the profiling process are necessary. The policy at CBS was to manually evaluate all changes in Top-P enterprises. This Lean Six Sigma project showed that this should be
restricted to all changes with a statistical impact. Furthermore, the documentation during profiling has to be better aligned with the statistical needs: this requires a decision tree to decide what (and when) types of documentation are useful/necessary and when this is not the case. Three types of documents are relevant: profiling reports to explain the logic applied to the profile, change reports to illustrate the nature of a change and its impact, and structure reports to communicate with the enterprises for consolidation and reporting purposes. The readability and usability of the documents have been improved by standardization (using templates, created in cooperation with the statistical departments). The application should be able to support the identification of events without statistical relevance and to prevent these events from being profiled manually. In the second quarter of 2017 the Dutch SBR-application was changed and from the beginning of July 2017 the new way of profiling was implemented. In early July 2017, the first results of this change became available and they were in line with the expectations: a 45% gain in efficiency (of which 30% was caused by overproduction, and 10% by overprocessing)! The costs for this project and for the implementation of the improvements will be paid back by the benefits within 4 months. Since the changes considered are those that have been judged to have no statistical impact (see above), no investigation of the aggregate effect on the outputs was judged to be needed.
5. Conclusions This project showed that investing in improvements with the use of Lean Six Sigma methods may be worthwhile. Within the project three causes were found for the current inefficiency of the profiling-process: overprocessing (the process contains unnecessary steps and loops, i.e. no value is added), overproduction (more profiling is performed than needed by customers) and waiting. For these causes, we worked out improvements and partly tested them using experiments. The experiments showed that it is possible to construct better rules to filter the cases for manual evaluation by profilers. The proposed improvements were acceptable and realistic, and sufficient to realize the 45% efficiency gain. The improvements were implemented in the first half of 2017. To validate the effectiveness of the measures, the efficiency will be measured and managed during the subsequent 12 months.
We managed to improve the efficiency of manual profiling drastically, which creates room to further improve profiling services. By the application of Lean Six Sigma on this process we gained knowledge about the process (especially in the stages Define, Measure and Analyse). Involvement of customers and profilers is necessary to get adequate information and commitment. The availability of sufficient metadata about the process of profiling is very helpful in measuring performance, identifying and quantifying waste and opportunities to improve. Open issues to be studied further include the quality of the automatic profiling process and its relation to the manual profiling process, as well as the impact of proposed changes in the profiling process on the quality of the produced (output) statistics.
References
van der Aalst, W.M.P. (2011). Process mining, discovery, conformance and enhancement of business processes. Berlin and Heidelberg: Springer-Verlag.
Smekens, M. and Zeelenberg, K. (2015). Lean six sigma at Statistics Netherlands. Statistical Journal of the IAOS 31, 583-586.
CHAPTER 8 THE IMPACT OF PROFILING ON SAMPLING: HOW TO OPTIMISE SAMPLE DESIGN WHEN STATISTICAL UNITS DIFFER FROM DATA COLLECTION UNITS
EMMANUEL GROS1 AND RONAN LE GLEUT1
Abstract Business statistics in France and other countries are transitioning from legal unit-based statistics to enterprise-based statistics. However, data collection units for most surveys remain the legal units whereas the statistical units are now the enterprises. The sampling design in this new paradigm can be seen as a two-stage cluster sampling: enterprises are selected with a probabilistic mechanism and then all legal units within those enterprises are included in the sample. Cluster sampling has two well-known drawbacks: loss of precision due to the similarity of units in a cluster, and lack of control of the sample size. This chapter presents how the sampling design of the French structural business survey was optimized, in order to obtain sufficiently precise estimates at the enterprise level under a constraint pertaining to the number of surveyed legal units.
1 Institut national de la statistique et des études économiques, 88 avenue Verdier, 92120 Montrouge, France. Email: [email protected], [email protected].
1. Introduction and context In many countries of the European Union, business statistics are undergoing great changes. In France, for instance, business surveys conducted by INSEE (the French National Statistical Institute), which were previously based only on the observation of legal units that have a juridical definition, are now increasingly based on the economic notion of the enterprise. An enterprise is the smallest combination of legal units that is an organisational unit producing goods or services with a certain degree of autonomy. The use of this statistical unit as the reporting unit has become necessary to reflect the globalization of the economy more correctly in statistics (Haag, 2018, Chapter 4 in this volume).
The sample design of the surveys in this new paradigm can be seen as a two-stage cluster sampling. Enterprises are selected with a probabilistic mechanism, and then all legal units within those enterprises are included in the sample. A well-known drawback of cluster sampling is a loss of precision due to the similarity of units in a cluster, but two features tend to attenuate this expected loss of precision:
- more than 95% of enterprises consist of a single legal unit;
- the legal units of an enterprise may have different activities.
However, a further drawback of cluster sampling is that we cannot completely control the final sample size. If, at the enterprise level, we keep the sampling rates used for a survey design at the legal unit level, this may decrease the number of primary units drawn, while increasing the number of legal units to be surveyed. This paper presents how the sample design of the French structural business survey was optimized, in order to obtain sufficiently precise estimates at the enterprise level under a constraint pertaining to the number of surveyed legal units.
To produce statistics on enterprises, an important methodological operation of profiling is ongoing at INSEE. This operation consists of defining economic enterprises within groups, because for independent legal units the juridical definition of the legal unit corresponds to the economic definition of the enterprise.
To do this, at INSEE a manual delineation of enterprises is performed within complex business groups, whereas an automatic profiling algorithm is applied for other groups, which considers that each group defines an enterprise. This profiling operation has a major impact on the process that produces structural business statistics in accordance with the European Structural Business Statistics (SBS) regulation. The process, called ESANE from the French “Élaboration des Statistiques Annuelles d’Entreprise”, is based on a major founding principle: use administrative data in an intensive way, in conjunction with survey data (Brion and Gros, 2015).
1.1. The French structural business surveys In the ESANE system, two structural business surveys are used to produce structural business statistics:
- the ESA (Annual Sectoral Survey), whose scope covers trade, construction, services and transport. The sample is very large, with almost 116,000 legal units surveyed each year in Metropolitan France;
- the EAP (Annual Production Survey), whose scope covers manufacturing industry. The sample is composed of about 35,000 units in Metropolitan France.
One of the main purposes of these two surveys is to determine the main activity of a company by breaking down its turnover into activities (sectoral classification). Until reference year 2015, these two surveys had stratified simple random samples of legal units, drawn independently.
1.2. The SBS European Regulation The SBS Regulation (EEC, 1993) covers industry, construction, distributive trades and services. It prescribes statistics which describe the structure, conduct and performance of businesses across the European Union according to the NACE activity classification. These statistics can be broken down to a very detailed sectoral level, as well as according to the size of enterprises. In order to comply with this regulation, business statistics produced by ESA and EAP surveys will be based on the economic notion of enterprise instead of the juridical definition of a legal unit.
2. Method Since the statistical units (enterprises) are now different from the data collection units (legal units), the sample design of the surveys related to the reference year 2016 can now be seen as a stratified cluster sampling. Each enterprise, as a cluster, is selected using a probabilistic mechanism, and then all legal units within this enterprise are included in the sample.
2.1. Definition of the take-all strata We keep the same historical thresholds for the number of employees and turnover, but modulate them by a coverage rate of the turnover to reach in each business sector. Enterprises composed of more than 20 legal units,
with more than 200 employees or with more than 50M€ turnover are automatically included in the take-all strata. In order to decrease the number of legal units in the take-all strata, we add a cut-off rule within each enterprise. This rule avoids surveying legal units with a very low turnover, which are assumed to have only one activity. This results in about 70,000 legal units in take-all strata: 40,000 for ESA (which leaves approximately 76,000 legal units to be sampled from take-some strata) and 30,000 for EAP (which leaves only approximately 5,000 legal units to be sampled from take-some strata).
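As an illustration only, the inclusion rule for the take-all strata can be written as a simple predicate. The Python sketch below uses the fixed thresholds quoted above as defaults; the class `Enterprise`, the function `is_take_all` and its parameter names are hypothetical, and neither the sector-specific modulation by a turnover coverage rate nor the cut-off rule within enterprises is reproduced.

```python
from dataclasses import dataclass


@dataclass
class Enterprise:
    """Hypothetical record holding the three criteria quoted in the text."""
    n_legal_units: int
    employees: int
    turnover_meur: float  # turnover in millions of euros


def is_take_all(e, max_legal_units=20, max_employees=200, max_turnover_meur=50.0):
    """True if the enterprise exceeds at least one of the three thresholds."""
    return (
        e.n_legal_units > max_legal_units
        or e.employees > max_employees
        or e.turnover_meur > max_turnover_meur
    )
```

An enterprise lying exactly on all three thresholds is not selected by this rule, since the text speaks of "more than" in each case.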
2.2. Stratification and domains of interest The take-some strata are defined by crossing the five-digit business sector of the French activity classification (a sub-classification of the European four-digit classification) with the number of employees in each enterprise, in nine classes. Two sets of domains of interest are considered:
- the business sectors of the French five-digit activity classification;
- the intersection between the three-digit business sectors and the number of employees in four size classes: [0-9], [10-49], [50-199] and [200 and more].
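The two sets of domains of interest can be illustrated by a small helper that maps an enterprise to its domain in each set. This is only a sketch: the function names and the representation of sector codes as strings are assumptions, and the nine size classes used for stratification are not listed in the text, so they are not reproduced here.

```python
def size_band(employees):
    """Map a number of employees to one of the four size classes of the text."""
    if employees < 0:
        raise ValueError("number of employees must be non-negative")
    if employees <= 9:
        return "[0-9]"
    if employees <= 49:
        return "[10-49]"
    if employees <= 199:
        return "[50-199]"
    return "[200 and more]"


def domains_of_interest(sector5, sector3, employees):
    """Return the domain of an enterprise in each of the two sets.

    sector5 and sector3 are the five-digit and three-digit sector codes,
    passed in explicitly so that no code format has to be assumed.
    """
    return sector5, (sector3, size_band(employees))
```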
2.3. Allocations The allocations of the sample to strata are calculated using a Neyman allocation on the turnover available in the business register of the enterprises, integrating local constraints on precision on domains of interest (Koubi and Mathern, 2009). The advantage of this algorithm in comparison to the classical Neyman allocation is that we can add the constraint of a maximum local CV in the domains of interest. Since data are still collected from legal units, the survey cost depends on the number of legal units surveyed (116,000 units for ESA and 35,000 for EAP). Therefore, we extend the algorithm presented by Koubi and Mathern to the optimal allocation, by introducing costs into the optimisation problem. If we denote by $y_k$ the turnover of enterprise $k$, $\hat{t}_{y\pi}$ the Horvitz-Thompson estimator for the total of turnover, $S^2_{y,h}$ the empirical variance of $y_k$ in stratum $h$, $N_{LU}$ the number of legal units to be drawn in the scope of one survey (ESA or EAP), $N_{LU,k}$ the number of legal units of enterprise $k$ in the same scope, $n_h$ the number of enterprises to survey, $N_h$ the number of enterprises and $f_h = n_h / N_h$ the sampling rate in stratum $h$, $C_h = \bar{N}_{LU,h} = \frac{1}{N_h} \sum_{k \in U_h} N_{LU,k}$ the cost, i.e. the mean number of legal units per enterprise in stratum $h$, $D$ the whole range of domains of interest and $CV_{loc}$ the local precision we expect, we have to solve:

$$\min_{n_1, \dots, n_H} V_p\left[\hat{t}_{y\pi}\right] = \sum_{h=1}^{H} N_h^2 \, \frac{1 - f_h}{n_h} \, S^2_{y,h}$$

subject to the constraints:

$$\begin{cases} \sum_{h=1}^{H} C_h n_h = N_{LU} \\ n_h \le N_h \\ \max_{d \in D} CV_d \le CV_{loc} \end{cases}$$

As we cannot combine the two different sets of domains of interest (the one related to five-digit business sectors of the French activity classification, the other one related to the intersection between the three-digit business sectors and the number of employees in four size classes) in the same Neyman allocation, we calculate both and compare them (see Section 3).
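To make the role of the cost term concrete, the following Python sketch computes the classical cost-constrained optimal allocation, in which $n_h$ is proportional to $N_h S_{y,h} / \sqrt{C_h}$, rescaled so that $\sum_h C_h n_h = N_{LU}$ and capped at $n_h \le N_h$. It deliberately omits the local CV constraints, so it is not the extended Koubi and Mathern algorithm used for the surveys; the function and argument names are assumptions made for the illustration, and strictly positive standard deviations and costs are assumed.

```python
import math


def cost_constrained_allocation(N, S, C, budget):
    """Optimal allocation with costs, without local precision constraints.

    N[h]: number of enterprises in stratum h
    S[h]: standard deviation of turnover in stratum h (assumed > 0)
    C[h]: mean number of legal units per enterprise in stratum h (assumed > 0)
    budget: expected number of legal units to survey (N_LU)
    Returns (possibly non-integer) numbers of enterprises n[h] per stratum.
    """
    H = len(N)
    n = [0.0] * H
    capped = [False] * H
    remaining = float(budget)
    while True:
        free = [h for h in range(H) if not capped[h]]
        denom = sum(N[h] * S[h] * math.sqrt(C[h]) for h in free)
        if not free or denom == 0:
            break
        for h in free:
            n[h] = remaining * (N[h] * S[h] / math.sqrt(C[h])) / denom
        over = [h for h in free if n[h] > N[h]]
        if not over:
            break
        for h in over:
            # The stratum becomes take-all; its cost is removed from the budget.
            n[h] = float(N[h])
            capped[h] = True
        remaining = budget - sum(C[h] * n[h] for h in range(H) if capped[h])
    return n
```

In practice the resulting values would still have to be rounded to integers, which is one reason why the realised cost only matches the budget approximately.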
2.4. Variability of the number of legal units to survey We introduced in the preceding subsection a mean cost per stratum in the Neyman allocation, which leads to surveying $N_{LU}$ legal units on average. But the number of legal units to be drawn remains random and varies from one sample to another:

$$\hat{N}_{LU} = \sum_{h=1}^{H} \sum_{k \in S_h} N_{LU,k}$$

We can rewrite this quantity as the Horvitz-Thompson estimator of the variable $z_k$:

$$\hat{N}_{LU} = \sum_{h=1}^{H} \sum_{k \in S_h} N_{LU,k} = \sum_{h=1}^{H} \sum_{k \in S_h} \frac{z_k}{\pi_k} = \hat{t}_{z\pi}$$

with $z_k = \pi_k N_{LU,k}$ and $\pi_k = n_h / N_h$, which is an unbiased estimator of $N_{LU}$.
In the case of stratified simple random sampling, the variance of this estimator can finally be expressed as:

$$V_p\left[\hat{N}_{LU}\right] = \sum_{h=1}^{H} n_h (1 - f_h) \, S^2_{N_{LU},h} \quad \text{with} \quad S^2_{N_{LU},h} = \frac{1}{N_h - 1} \sum_{k \in U_h} \left(N_{LU,k} - \bar{N}_{LU,h}\right)^2$$
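The formulas of this subsection translate directly into a few lines of Python. The sketch below is illustrative: the function name and the data layout (one list of legal-unit counts per stratum) are assumptions, and the 95% interval uses a normal approximation, which the text does not state explicitly.

```python
import math
from statistics import mean, variance


def legal_unit_count_summary(strata, n):
    """Expectation, variance and approximate 95% interval of the number of legal units drawn.

    strata[h]: list of N_LU,k for all enterprises k of stratum h
    n[h]: number of enterprises drawn in stratum h (1 <= n[h] <= N_h)
    """
    expected = 0.0
    var = 0.0
    for lu_counts, n_h in zip(strata, n):
        N_h = len(lu_counts)
        expected += n_h * mean(lu_counts)  # n_h * C_h
        if n_h < N_h:  # take-all strata contribute no variance
            var += n_h * (1 - n_h / N_h) * variance(lu_counts)
    half_width = 1.96 * math.sqrt(var)
    return expected, var, (expected - half_width, expected + half_width)
```

Strata in which every enterprise has a single legal unit contribute nothing to the variance, and neither do take-all strata.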
2.5. Efficiency boundaries In order to control the local precision on the two sets of domains of interest, we calculate, for each set of domains, the minimum number of enterprises that should be drawn in order to satisfy the last constraint in the minimisation problem of subsection 2.3, for different values of the CVloc parameter. We also calculate the global CV that we would obtain for each CVloc value. For a given number of enterprises to survey nent, we call the allocations (n1,…,nH) that cannot lead to a better local precision without a deterioration of the global precision the efficiency boundary. We can represent this boundary in a plot with the maximum (i.e. worst) local CVs on the x-axis and the global CVs on the y-axis. The plot in Figure 8-1 represents the efficiency boundary for the first set of domains of interest: the five-digit business sectors of the French classification. As we could expect, the Neyman allocation without local constraints on precision (represented here with a cross) is a flat optimum. The global precision gets worse if one chooses very strong local precision. We can see that the best local precision without a considerable deterioration of the global precision could be a local CV of 5% for ESA and 2% for EAP. The plot in Figure 8-2 represents the efficiency boundary for the second set of domains of interest: the intersection between the three-digit business sectors and the employee size classes. We can see that the best local precision without a noticeable deterioration of the global precision would be a CV of 8% for ESA and 11% for EAP.
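Each point of such a plot pairs a maximum local CV with a global CV for one allocation. As a rough sketch of how one point could be computed, assuming each stratum lies within exactly one domain and all domain totals are positive, consider the following Python function; the dictionary keys and function names are hypothetical.

```python
import math


def stratum_variance(N_h, n_h, S2_h):
    """Variance contribution of one stratum under stratified simple random sampling."""
    return N_h ** 2 * (1 - n_h / N_h) * S2_h / n_h


def boundary_point(strata, n):
    """Return (maximum local CV, global CV) for the allocation n.

    strata: list of dicts with keys 'domain', 'N', 'S2' (variance of turnover)
            and 'total' (turnover total of the stratum).
    """
    var_by_domain = {}
    total_by_domain = {}
    for s, n_h in zip(strata, n):
        d = s["domain"]
        v = stratum_variance(s["N"], n_h, s["S2"])
        var_by_domain[d] = var_by_domain.get(d, 0.0) + v
        total_by_domain[d] = total_by_domain.get(d, 0.0) + s["total"]
    local_cvs = [math.sqrt(var_by_domain[d]) / total_by_domain[d] for d in var_by_domain]
    global_cv = math.sqrt(sum(var_by_domain.values())) / sum(total_by_domain.values())
    return max(local_cvs), global_cv
```

Evaluating this function on the allocations obtained for a range of $CV_{loc}$ values would trace out a curve of the kind described above.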
Figure 8-1: Efficiency boundary for the five-digit business sectors of the French classification. [Plot not reproduced: global CV (y-axis) against maximum local CV (x-axis), with one curve each for EAP and ESA.]
Figure 8-2: Efficiency boundary for the intersection between the three-digit business sectors and the number of employees per enterprise. [Plot not reproduced: global CV (y-axis) against maximum local CV (x-axis), with one curve each for EAP and ESA.]
3. Results 3.1. Number of enterprises to draw For the first set of domains of interest (five-digit business sectors), in order to get the best local CVs and the expected number of legal units to survey on average (116,000 legal units for ESA and 35,000 for EAP), we have to draw nent,1 = 109,900 enterprises, with 27,000 enterprises from EAP's scope and 82,900 enterprises from ESA's scope. These numbers result from the optimisation algorithm in subsection 2.3. In this algorithm, the parameter controlling the size of the sample is the expected number of legal units that we want to select, and the number of enterprises to draw is the result of the optimisation. For the second set of domains of interest (three-digit business sectors crossed with the employee size classes), we have to draw nent,2 = 109,500 enterprises, with 27,000 enterprises from EAP's scope and 82,500 enterprises from ESA's scope. We also calculated the mean of these two allocations, in order to get a “mix” between the best local precision on each set of domains of interest. We will discuss the results of this approach in Section 3.3. This “mixed” allocation leads to drawing nent,mix = 109,700 enterprises, with 27,000 enterprises from EAP's scope and 82,700 enterprises from ESA's scope. All these values for nent (nent,1, nent,2, nent,mix) lead to the selection of approximately 35,000 legal units from EAP's scope and 116,000 legal units from ESA's scope on average.
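The “mixed” allocation is simply the element-wise mean of the two allocations. A minimal Python sketch, with a hypothetical function name, is applied here to the scope-level totals quoted above rather than to the actual stratum allocations, which are not given in the text:

```python
def mixed_allocation(alloc_1, alloc_2):
    """Element-wise mean of two allocations defined on the same strata."""
    if len(alloc_1) != len(alloc_2):
        raise ValueError("both allocations must cover the same strata")
    return [(a + b) / 2 for a, b in zip(alloc_1, alloc_2)]


# Scope-level totals [EAP, ESA] for the two sets of domains of interest.
mixed = mixed_allocation([27_000, 82_900], [27_000, 82_500])
print(mixed)  # [27000.0, 82700.0]
```

The two components sum to the 109,700 enterprises reported for the mixed allocation.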
3.2. Variability of the number of legal units to survey If we now consider the variability of these results, using the formula in Section 2.4 for the variance of the number of legal units to be drawn, we can see in Table 8-1 that the variability is very low in general and for each survey's scope. The results are approximately the same for all the nent described above. This is mainly due to the large size of the take-all strata, and to the fact that all the enterprises composed of more than 20 legal units are included in these take-all strata. We present here the results for the “mixed” allocation.

Table 8-1: Number of enterprises to draw (in thousands) and confidence intervals for the number of legal units to survey

| | EAP | ESA | Total |
|---|---|---|---|
| $n_{ent,mix}$ | 27.0 | 82.7 | 109.7 |
| $E_p[\hat{N}_{LU}] = N_{LU}$ | 35.0 | 116.0 | 151.0 |
| $CI_{95}(N_{LU})$ | [34.97, 35.03] | [115.84, 116.16] | [150.83, 151.17] |

Another result we have to check is the number of legal units that would be drawn in aggregated sectors (construction, trade, manufacturing industry, energy, transport, services, etc.). In fact, the legal units drawn in a sample are processed by different teams, depending on the business sector. We have to check whether there are substantial changes in the number of legal units per aggregated sector, compared to the previous sampling design of ESA and EAP, where the samples of these two surveys were independently drawn by stratified simple random sampling of legal units. The result is that this survey design at the enterprise level increases the number of legal units to process in the trade activities and decreases this number in the service activities. This remains true for all the values of nent obtained above. The variability of these results in each aggregated sector is very low. We also notice that this survey design leads, compared to the previous sampling design of ESA and EAP, to surveying slightly more legal units with 1 to 5 employees, and slightly fewer legal units with 30 to 49 employees, similarly for all three nent. The variability of these results in all enterprise size strata is also very low. All these observed differences from the previous sampling designs of ESA and EAP are due to the fact that in the new sampling design, we perform a stratified simple random sampling of clusters of legal units (the enterprises) in order to maximise the accuracy of the estimates at the enterprise level, whereas in the previous sampling design, the legal units were directly drawn by simple random sampling with an accuracy objective for estimates at the legal unit level. So both the sampling design and the accuracy objective for the computation of the allocation have changed, which explains the variations observed in the structure of allocations between the samples of legal units under the new and the previous sampling designs.
3.3. Precision at the enterprise level As explained in Sections 3.1 and 3.2, all three nent give approximately the same results for:
- the number of legal units in aggregated sectors;
- the number of legal units in each enterprise size stratum;
- the variability of the number of legal units to be drawn.
In order to find the best allocation among nent,1 (the allocation considering the five-digit business sectors as the domains of interest), nent,2 (the allocation considering the intersection between the three-digit business sector and the employee size classes as the domain of interest), and nent,mix (the mix between nent,1 and nent,2), we calculate in Table 8-2 the precision at the enterprise level in the two sets of domains of interest:
- five-digit business sector with the Neyman allocation;
- intersection between the three-digit business sector and the employee size classes with the Neyman allocation (without the take-all strata of units with more than 200 employees in this domain of interest).

Table 8-2: Distribution of local CVs of the total of turnover at the enterprise level, by allocation and domain of interest. Columns 1 to 3 of the table body use the five-digit business sectors as domains of interest; columns 4 to 6 use the three-digit business sectors × employee size bands.

| Levels | Five-digit, $n_{ent,1}$ | Five-digit, $n_{ent,2}$ | Five-digit, $n_{ent,mix}$ | Three-digit × size, $n_{ent,1}$ | Three-digit × size, $n_{ent,2}$ | Three-digit × size, $n_{ent,mix}$ |
|---|---|---|---|---|---|---|
| 100% Max | 5.0% | 74.4% | 23.1% | 89.3% | 11.0% | 43.1% |
| 90% | 5.0% | 9.0% | 6.3% | 20.8% | 11.0% | 12.5% |
| 75% Q3 | 5.0% | 4.9% | 4.4% | 9.2% | 8.0% | 8.9% |
| 50% Median | 2.0% | 2.0% | 2.0% | 4.2% | 4.6% | 4.2% |
| 25% Q1 | 0.9% | 0.8% | 0.8% | 0.1% | 0.2% | 0.2% |
| 10% | 0.2% | 0.1% | 0.2% | 0.0% | 0.0% | 0.0% |
| 0% Min | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
As we could have expected, the “mixed” allocation seems to do better on both domains of interest at the same time (columns 3 and 6 of the table body), in comparison to the precision we obtain if the domain of interest used to calculate the Neyman allocation ex ante is different from the domain of interest ex post (columns 2 and 4). This is especially true for the 10% of domains of interest with the highest local CVs (the 90% and Max rows of the table).
On the other hand, the “mixed” allocation degrades the precision if the domains of interest are the same as the domain of interest used to calculate the Neyman allocation (columns 1 and 5 of the table body). Indeed, the maximum local CV is 5% for the five-digit business sectors and 11% for the intersection between the three-digit business sector and the employee size bands, as we could expect from the precision seen in Section 2.5. However, this degradation concerns only the 10% of domains of interest with the highest local CVs. Moreover, the differences of precision between these three allocations concern only the domains of interest of the last quartile of the distribution of the local CVs. For all three nent, the value of the third quartile is close to 5% for the five-digit business sectors and 8-9% for the intersection between the three-digit business sectors and the employee size classes.
3.4. Precision at the legal unit level Some users of the business data (the National Accounts for example) still use the information at the legal unit level. In this case, the structure of the enterprise is not taken into account and the weight of a legal unit corresponds to the weight of the enterprise it belongs to. In this context, some legal units with similar characteristics have different weights, which would lead to a higher weight dispersion. We compare in Table 8-3 the precision at:
- the legal unit level with the new survey design using the “mixed” allocation $n_{LU,mix}$;
- the legal unit level using the allocation of the 2015 ESA and EAP survey designs $n_{LU,2015}$ (without the take-all strata of units with more than 200 employees in this domain of interest).
If we denote by $y_k$ the turnover of the legal unit $k$, $\hat{t}_{y\pi}$ the Horvitz-Thompson estimator for the total of turnover at the legal unit level, $n_h$ the number of enterprises to survey, $N_h$ the number of enterprises and $f_h = n_h / N_h$ the sampling rate in stratum $h$, $Y_g = \sum_{k \in g} y_k$ the sum of the turnover of the legal units of an enterprise $g$, $\bar{Y}_h = \frac{1}{N_h} \sum_{g \in U_h} Y_g$ the empirical mean of $Y_g$ in stratum $h$, the variance of $\hat{t}_{y\pi}$ with the two-stage cluster sampling is obtained using the following formula:

$$V_p\left[\hat{t}_{y\pi}\right] = \sum_{h=1}^{H} N_h^2 \, \frac{1 - f_h}{n_h} \, S^2_{Y_g,h} \quad \text{with} \quad S^2_{Y_g,h} = \frac{1}{N_h - 1} \sum_{g \in U_h} \left(Y_g - \bar{Y}_h\right)^2$$
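The variance formula above only involves enterprise-level totals, which the following Python sketch makes explicit. The function name and the nested-list data layout are assumptions chosen for the illustration, not part of the production system.

```python
from statistics import variance


def cluster_variance(strata, n):
    """Variance of the estimated turnover total when whole enterprises are drawn.

    strata[h]: list of enterprises of stratum h, each enterprise being the
               list of turnovers y_k of its legal units
    n[h]: number of enterprises drawn in stratum h (1 <= n[h] <= N_h)
    """
    total = 0.0
    for enterprises, n_h in zip(strata, n):
        N_h = len(enterprises)
        if not 1 <= n_h <= N_h:
            raise ValueError("each stratum needs 1 <= n_h <= N_h")
        if n_h == N_h:
            continue  # take-all stratum: no sampling variance
        # Y_g: sum of the turnover of the legal units of enterprise g
        Y = [sum(legal_units) for legal_units in enterprises]
        total += N_h ** 2 * (1 - n_h / N_h) * variance(Y) / n_h
    return total
```

Because every legal unit inherits the weight $N_h / n_h$ of its enterprise, only the dispersion of the enterprise totals $Y_g$ within each stratum matters here, not the dispersion of individual legal units.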
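Table 8-3: Distribution of local CVs for the total of turnover at the legal unit level depending on the survey design and the domain of interest. Columns 1 and 2 of the table body use the five-digit business sectors as domains of interest; columns 3 and 4 use the three-digit business sectors × employee size bands.

| Levels | Five-digit, $n_{LU,mix}$ | Five-digit, $n_{LU,2015}$ | Three-digit × size, $n_{LU,mix}$ | Three-digit × size, $n_{LU,2015}$ |
|---|---|---|---|---|
| 100% Max | 14.9% | 47.4% | 38.3% | 48.5% |
| 90% | 5.9% | 7.5% | 10.6% | 12.8% |
| 75% Q3 | 3.9% | 3.8% | 7.3% | 5.4% |
| 50% Median | 2.1% | 1.8% | 3.3% | 1.3% |
| 25% Q1 | 0.9% | 0.6% | 0.6% | 0.0% |
| 10% | 0.2% | 0.1% | 0.0% | 0.0% |
| 0% Min | 0.0% | 0.0% | 0.0% | 0.0% |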
The distribution of the local CVs for the total of turnover at the legal unit level with the “mixed” allocation (columns 1 and 3 of the table body) is similar to the precision we currently have with the survey design at the legal unit level (columns 2 and 4 of the table body), for both sets of domains of interest. Indeed, the precision at the legal unit level with the new survey design is better than with the current survey design in 50% of cases, and worse in the other 50% (see Figure 8-3). However, the two-stage cluster sampling leads to a better precision (i.e. lower local CVs) for the domains with the highest local CVs under the current survey design (e.g. more than 40%). This surprising result is due to the fact that the allocation of the survey design for legal units was optimised quite a long time ago, using a “simple” Neyman allocation without taking into account any domain of interest in the process. So, even if we are now performing, from the legal unit point of view, a cluster sampling, the fact that we take the target domains into account during the optimisation process allows us to avoid having some domains with very poor precision of the estimates.
Figure 8-3: Local CVs for the total of turnover at the legal unit level by five-digit business sectors (left) and for the intersection between the three-digit sectors and the employee size bands (right) depending on the survey design.
4. Conclusion and future work In this study, we assessed the impact of changes on statistical inference in business surveys that are now based on the sampling of enterprises instead of legal units. Our aim was to optimize the survey designs in the resulting two-stage cluster sampling in order to have good precision of estimators while respecting the constraint of surveying a limited number of legal units. The definition of the take-all strata leads us to consider approximately the same number of legal units in the take-all part of the sample as in 2015. The variability of the number of legal units to be drawn in the take-some strata is small for all allocations considered. However, the different allocations yield a different precision at the enterprise level. The variance of the estimator resulting from this optimised survey design was similar to the current one. To improve the stratification of the survey design (see 2.2), one could define an optimal categorization of the number of employees per enterprise using the Dalenius-Hodges method (Dalenius and Hodges Jr, 1959), the geometric method proposed by Gunning and Horgan (2004), or the Lavallée-Hidiroglou method (Lavallée and Hidiroglou, 1988). The latter method could also be applied to find an optimal threshold of the turnover in each activity for the definition of the take-all strata. Instead of using equal weights for the two designs in the calculation of the “mixed” allocation (see 3.1 and 3.2), it would be advisable to find optimal factors as discussed in Merly-Alpa and Rebecq (2016). We can also use algorithms for sample allocation that allow optimization on several domains of interest at the same time, such as the allocation methods developed by Falorsi and Righi (2015).
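Of the stratification methods mentioned above, the geometric method is the simplest to sketch: stratum boundaries are taken in geometric progression between the smallest and largest values of the stratification variable. The Python function below illustrates that idea under the assumption that the variable is strictly positive, so enterprises with zero employees would need separate treatment; the function name is hypothetical and this method was not applied in the study.

```python
def geometric_boundaries(x_min, x_max, n_strata):
    """Stratum boundaries in geometric progression from x_min to x_max.

    Returns n_strata + 1 boundaries k_0 = x_min, ..., k_H = x_max,
    with a constant ratio k_h / k_(h-1) between consecutive boundaries.
    """
    if x_min <= 0 or x_max <= x_min or n_strata < 1:
        raise ValueError("need 0 < x_min < x_max and at least one stratum")
    ratio = (x_max / x_min) ** (1 / n_strata)
    return [x_min * ratio ** h for h in range(n_strata + 1)]
```

For example, four strata between 1 and 16 employees would have boundaries close to 1, 2, 4, 8 and 16.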
Between the selection of the sample and the dissemination of the results, the “perimeter” of an enterprise (i.e. the legal units that belong to the enterprise) can change. For example, a legal unit of enterprise A can belong to another enterprise B one year later, or can become an independent legal unit. This problem can be seen as a particular case of indirect sampling: the sample is drawn in a population of enterprises (with a certain “perimeter”) which differs from the population of interest (at the stage of producing the estimates), but which is linked to this one via its legal units. In this context, the generalised weight share method proposed by Deville and Lavallée (2006) would allow us to handle this problem. Finally, this paper focuses mainly on the survey design. This is the first step of the production, but ensuring the quality of the surveys requires a lot of post-survey treatments, such as non-response weight adjustment (Brion and Gros, 2015), calibration and winsorization of outliers (Deroyon, 2015). All these methods are widely known and discussed, but their application to this survey, while using the economic structure of enterprises, needs to be studied. An issue of particular interest, which will have to be treated in the future, is the question of the correlation between non-response of legal units within enterprises.
References
Brion, P. and Gros, E. (2015). Statistical estimators using jointly administrative and survey data to produce French structural business statistics. Journal of Official Statistics 31, 589-609.
Dalenius, T. and Hodges Jr., J.L. (1959). Minimum variance stratification. Journal of the American Statistical Association 54, 88-101.
Deroyon, T. (2015). Traitement des valeurs atypiques d’une enquête par winsorization – application aux enquêtes sectorielles annuelles. Acte des Journées de Méthodologie Statistique de 2015. Available at http://jms.insee.fr/files/documents/2015/S10_5_ACTE_V2_DEROYON_JMS2015.PDF.
Deville, J.C. and Lavallée, P. (2006). Indirect sampling: The foundations of the generalized weight share method. Survey Methodology 32, 165-176.
EEC (1993). Council Regulation 696/93 of 15 March 1993 on the statistical units for the observation and analysis of the production system in the Community. Official Journal of the European Communities 46, 1-11.
Falorsi, P.D. and Righi, P. (2015). Generalized framework for defining the optimal inclusion probabilities of one-stage sampling designs for multivariate and multi-domain surveys. Survey Methodology 41, 215-236.
Gunning, P. and Horgan, J.M. (2004). A new algorithm for the construction of stratum boundaries in skewed populations. Survey Methodology 30, 177-185.
Haag, O. (2018). How to improve the quality of the statistics by combining different statistical units? In Lorenc, B., Smith, P.A., Bavdaž, M., Haraldsen, G., Nedyalkova, D., Zhang, L.-C. and Zimmermann, T. (eds.). The unit problem and other current topics in business survey methodology, pp 31-46. Newcastle upon Tyne: Cambridge Scholars.
Koubi, M. and Mathern, S. (2009). Résolution d’une des limites de l’allocation de Neyman. 10èmes Journées de Méthodologie Statistique, Paris. Available at: http://jms.insee.fr/files/documents/2009/829_1JMS2009_S22-03_KOUBI-ACTE.PDF.
Lavallée, P. and Hidiroglou, M.A. (1988). On the stratification of skewed populations. Survey Methodology 14, 35-45.
Merly-Alpa, T. and Rebecq, A. (2016). Optimisation d’une allocation mixte. 9ème colloque francophone sur les Sondages, Gatineau. Available at: http://paperssondages16.sfds.asso.fr/submission_52.pdf.
CHAPTER 9 COPING WITH NEW REQUIREMENTS FOR THE SAMPLING DESIGN IN THE SURVEY OF THE SERVICE SECTOR
THOMAS ZIMMERMANN1, SVEN SCHMIEDEL1 AND KAI LORENTZ1
Abstract As a consequence of a decision taken by the Federal Administrative Court of Germany in March 2017, the sampling scheme in the structural survey in the service sector had to be revised. To cope with the court decision, we consider alternative sampling designs, which should avoid take-all strata as much as possible to spread the response burden more evenly among the target population. The results from our study reveal that we need take-all strata in the group of large enterprises to facilitate precise estimates using the currently applied Horvitz-Thompson estimator. As all the sampling designs considered lead to a reduction of take-all strata compared to the old design in place until 2016, the issue of an appropriate estimation method also needs to be addressed. In our paper, we therefore explore potential sources of auxiliary information that can be incorporated at the estimation stage. Specifically, we consider calibration estimators and illustrate advantages and disadvantages of different sources of auxiliary information using data from the 2014 survey.
1 Destatis – German Federal Statistical Office, Gustav-Stresemann-Ring 11, 65189 Wiesbaden, Germany. Email: {Thomas.Zimmermann, Sven.Schmiedel, Kai.Lorentz}@destatis.de.
1. Introduction

The German structural survey in the service sector (SiD) is an annual survey providing relevant information on medium-term developments and structural changes. It covers the following sections of the NACE classification of economic activity (Eurostat, 2008): transportation and storage (section H), information and communication (section J), real estate (section L), professional, scientific and technical activities (section M), administrative and support services (section N) and division S 95, which comprises repair of computers and personal and household goods. The survey is conducted as a sample survey with an overall sampling fraction of 15% of the units in the population.

Until 2016, the units in the sampling frame were stratified according to their NUTS1 region (16 states), combinations of the first four digits of their NACE classification (NACE4 hereafter, 83 sectors) and size classes measured by turnover or, alternatively, the number of employees (8 classes). In a second step, sample sizes were determined such that precise estimates could be obtained for total turnover in stratum groups defined by cross-classifying the NUTS1 regions with the NACE4 classes, while efficient estimates were also obtained for higher levels of aggregation.

Because of a decision taken by the Federal Administrative Court of Germany in March 2017, however, the sampling design of the SiD had to be revised. Specifically, the court required that the response burden be spread as evenly as possible and that take-all strata be accepted only if they are imperative for obtaining estimates which are sufficiently representative (Heranziehung zur Dienstleistungsstatistik, 2017). The court decision rests on the German law regulating the service sector statistics, which does not explicitly provide for take-all strata in the sampling plan.
On the contrary, the law requires that sampled units be able to rotate out of the survey as soon as possible, a condition which cannot be fulfilled in take-all strata.

It should be noted that the judges’ interpretation of a representative sample may differ from the standard definition in statistics. In statistics, a sample is deemed representative if it is selected by a probability sampling mechanism. However, this is hardly what the judges had in mind. Hence, we interpret their requirement to mean that the results from the sample should be sufficiently accurate. Note that while these requirements could be addressed by incorporating additional constraints when minimizing the loss function used in the sample allocation, the precision of the resulting estimates is likely to deteriorate: the sampling variance of a take-all stratum is zero and, owing to the skewed distribution of turnover, the population variance within the take-all strata typically exceeds the population variance within other strata by far. Hence, once such strata are sampled rather than completely enumerated, the sampling variance of the stratum groups increases dramatically.

So, for the new sample drawn in autumn 2017, the sampling plan was revised to cope with the above-mentioned court decision. The stratification scheme changed slightly in order to be coherent with the tables of results which have to be delivered to Eurostat. Therefore, the size classes are now defined by the number of employees. To avoid take-all strata as far as possible, the sample allocation to strata also changed. However, for the top size classes, it quickly became clear that take-all strata had to be allowed; otherwise the expected quality of the results would have been too poor.

The paper is organised as follows. Section 2 is devoted to the old approach, which was applied in the SiD until 2016. Section 3 presents the steps taken in finding a new sampling design. In Section 4, we elaborate on alternative estimation methods that might be of interest in the future, as they may to some degree alleviate the potential efficiency losses due to the new design. Finally, Section 5 offers concluding remarks and gives an outlook.
2. The old approach Formally, the sample sizes in the strata were obtained until 2016 by minimizing the maximum of the weighted expected coefficients of variation using a Horvitz-Thompson (HT) estimator for total turnover among stratum groups g = 1, …, G:
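$$
F = \max_{g=1,\dots,G} W_g^q \, \mathrm{CV}(\hat{Y}_g)
  = \max_{g=1,\dots,G} \frac{W_g^q}{Y_g} \sqrt{\sum_{h \in g} N_h^2 \left( \frac{1}{n_h} - \frac{1}{N_h} \right) S_h^2}
$$

under the constraints $m_h \le n_h \le M_h$ for all $h$ and $\sum_h n_h = n$.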
Here, $W_g$ is a measure of importance associated with stratum group $g$, $\mathrm{CV}(\hat{Y}_g) = \sqrt{\mathrm{Var}(\hat{Y}_g)} / Y_g$ denotes the expected coefficient of variation in stratum group $g$ and the exponent $q$ is a constant between 0 and 1. Furthermore, $n_h$, $N_h$ and $S_h^2$ refer to the sample size, number of units and population variance of the variable of interest in stratum $h$, respectively. Additionally, $m_h$ and $M_h$ denote the box constraints for the sample sizes in the strata, while $n$ indicates the maximum overall sample size. When applying the sample size allocation for the SiD, we used $m_h = 3$ and $M_h = N_h$ for all $h$ as well as $W_g = Y_g$ and $q = 0.2$. As the population of businesses is very heterogeneous with respect to turnover and the heterogeneity is most pronounced for the largest enterprises, minimizing $F$ gives rise to very different sampling fractions $n_h / N_h$ among the strata. Furthermore, many take-all strata are obtained, especially in the group of large enterprises. Since the actual turnover is only estimated by the survey, the turnover variable in the sampling frame was used to compute the sample sizes.

Objective functions similar to the one above have been used for a long time at the German Federal Statistical Office (Destatis), since they ensure that the expected coefficients of variation are approximately inversely proportional to $W_g^q$ (Schäffer, 1961). Moreover, $F$ can be shown to be a special case of the generalized power allocation developed by Hohnhold (2010), who extends the concept of the power allocation due to Bankier (1988). In the case of generalized power allocations, the loss function to be minimized is given by:

$$
F_{p,q} = \sum_{g=1}^{G} \left( W_g^q \, \mathrm{CV}(\hat{Y}_g) \right)^p
$$

where $2 \le p < \infty$. Hence, minimizing $F_{p,q}$ amounts to minimizing the $p$-norm of the vector of the weighted coefficients of variation, i.e. $\left( W_g^q \, \mathrm{CV}(\hat{Y}_g) \right)_{g=1,\dots,G}$. Setting $p = 2$ yields the well-known power allocation due to Bankier. Following Hohnhold (2010), we can also consider the case $p = \infty$, where the loss function becomes

$$
F_{\infty,q} = \max_{g=1,\dots,G} W_g^q \, \mathrm{CV}(\hat{Y}_g),
$$

which is the maximum norm of the vector $\left( W_g^q \, \mathrm{CV}(\hat{Y}_g) \right)_{g=1,\dots,G}$ and also the loss function considered in the SiD survey.

Destatis developed the OptAlloc SAS macro (a corresponding set of R functions also exists), which implements power allocation methods for all $p$-norms, from $p = 2$ to $p = \infty$ (the above-mentioned Schäffer approach). It can incorporate graduations (the above-mentioned W’s, in our context chosen as turnover totals for stratum groups) raised to some exponent $q$, where $q = 0$ targets equal relative standard errors for turnover totals for all stratum groups and $q = 1/2$ is equivalent to the optimal Neyman allocation method, minimizing the expected standard error for the overall total value over all the stratum groups.
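To make the quantities in this section concrete, the following Python sketch evaluates the expected coefficient of variation of one stratum group and the resulting loss for a given allocation. It is only an illustration under stated assumptions: the function names `expected_cv` and `loss` and all numbers are hypothetical, and the code does not reproduce the OptAlloc macro, which additionally searches for the allocation that minimizes the loss.

```python
import numpy as np


def expected_cv(N, S2, n, Y_g):
    """Expected CV of the HT total in one stratum group under stratified
    simple random sampling without replacement."""
    N, S2, n = (np.asarray(a, dtype=float) for a in (N, S2, n))
    variance = np.sum(N**2 * (1.0 / n - 1.0 / N) * S2)
    return float(np.sqrt(variance) / Y_g)


def loss(cvs, W, q, p=np.inf):
    """Loss of the weighted CVs: their maximum for p = inf,
    the sum of their p-th powers for finite p."""
    weighted = np.asarray(W, dtype=float) ** q * np.asarray(cvs, dtype=float)
    if np.isinf(p):
        return float(weighted.max())
    return float(np.sum(weighted**p))


# Hypothetical stratum group with two strata; the second one is take-all
# (n_h = N_h), so it contributes nothing to the variance.
cv_1 = expected_cv(N=[100, 10], S2=[4.0, 400.0], n=[10, 10], Y_g=1200.0)
cv_2 = 0.10  # assumed CV of a second, smaller stratum group

# Importance weights W_g = Y_g, as in the SiD allocation.
F_q02 = loss([cv_1, cv_2], W=[1200.0, 300.0], q=0.2)  # exponent used until 2016
F_q0 = loss([cv_1, cv_2], W=[1200.0, 300.0], q=0.0)   # equal-CV target
```

With q = 0 the weights drop out and the loss is simply the largest CV among the stratum groups, which is why this choice targets equal relative standard errors.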
3. A new sampling design In the following, we present various approaches to account for the court decision by a new sampling mechanism and analyse their impact on the expected quality of the results from the SiD.
3.1 Background

The following points have to be addressed for an appropriate assessment:

1. Establish a criterion which determines whether the results of the sample can be assessed as sufficiently “representative”, which is a requirement of the court decision.
2. Declare domains of interest, for which accurate results have to be determined.
3. Compare different variants of the sampling plan to find a suitable method which yields accurate results and has an acceptable response burden.

With respect to point 1, the two relevant legal bases of the annual service statistics, the EU regulation on structural statistics (EC, 2008) and the German Services Statistics Act (Bundesministerium der Justiz und für Verbraucherschutz, 2000), do not include any specification regarding accuracy. Hence, no legal regulation is applicable. Instead, we decided to follow the standard practice in sampling by comparing unbiased estimation strategies in terms of their relative standard error. In German official statistics, results are typically not published whenever the relative standard error exceeds 15%, while results with a relative standard error between 10% and 15% are parenthesized. Therefore, we consider results to be sufficiently accurate if their relative standard error is less than 10% (15%).

In practice, this accuracy has to be achieved by the realised sample. However, since the variables of interest are collected from the sample and are therefore only available for the subset of sampled elements, we decided to compare the different sampling designs in terms of their accuracy for the turnover variable, which is included in the sampling frame. Hence, the actual sampling error in the realised sample will be higher than what we calculate here. In addition, there are important variables, such as value added, investments, number of employees and many more, which are more or less strongly correlated with turnover, the variable used for allocating the sample to the strata.
Therefore, our requirement to ensure the accuracy
of the results of the sample is that the expected relative standard error for the turnover-proxy should be below 5% or at least not exceed 10%.

Regarding point 2, the national legal basis does not specify domains of interest. The EU regulations contain obligations that must be met by the SiD. These requirements include three dimensions: estimates according to NACE Rev. 2, six employee size classes and a regional classification. Together these dimensions contribute to three domains of interest which are presented in Table 9-1.

Table 9-1: Domains of interest within the EU (regulation 295/2008)

Domain of interest   Definition
1                    Results at the national level up to the four-digit level (n = 113) of NACE Rev. 2
2                    Results at the national level up to the three-digit level (n = 67) of NACE Rev. 2 for employee size classes (n = 6)
3                    Results up to the two-digit level (n = 26) of NACE Rev. 2 cross-classified with the 16 federal states (3A), or with the 38 administrative districts (NUTS2) (3B)
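The publication convention described in Section 3.1 (suppress above 15%, parenthesize between 10% and 15%) can be written as a small rule. The sketch below is illustrative only: the function name and the returned labels are hypothetical, and the treatment of values lying exactly on a threshold is an assumption, since the text does not specify it.

```python
def publication_status(rse):
    """Classify an estimate by its relative standard error (as a fraction).

    Assumption: the thresholds 0.10 and 0.15 themselves fall into the
    'parenthesized' band; the text leaves the boundary cases open.
    """
    if rse > 0.15:
        return "not published"
    if rse >= 0.10:
        return "parenthesized"
    return "published"


# Hypothetical relative standard errors for three domains.
statuses = [publication_status(r) for r in (0.04, 0.12, 0.20)]
```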
3.2 Specification of different sampling designs We have calculated various potential sampling designs for the SiD for the year 2014. The goal was to develop a sampling design which fully meets the EU commitments with reliable results, and additionally reduces the number of enterprises in take-all strata compared with the current situation. Parameters determining the sampling design are the stratification of the sampling frame and the allocation of the total sample size (15% of the selection basis yielding n = 192,110). We switched the stratification from eight size classes mainly based on turnover to six size classes based on the number of employees to align the stratification with the domains of interest required by the EU. Additionally, the stratification now uses the fully differentiated NACE4-classes in contrast to the combined NACE4 classes used earlier. This caused an increase from 86 to 113 economic sectors. We did not change the stratification by federal states.
Furthermore, we also applied a number of changes to the parameters of the loss function F. In order to reduce the number of take-all strata for strata with only a small number of units, the minimum number of units per stratum mh was set to two rather than three. We also changed the exponent q governing the relative importance of the stratum groups from q = 0.2 to q = 0. This change has a number of important implications, as the loss function F aims to achieve equal relative standard errors for the stratum groups with q = 0. While this is expected to lead to more accurate estimates for stratum groups with smaller total turnover compared to the old approach, the quality of aggregated estimates is likely to decrease. This decision was taken as the estimates for the stratum group level gained importance with the EU regulation.

In the following, we briefly summarise the different variants which we considered:

1. This variant represents the 'baseline': stratification by regions (NUTS1), economic sectors (NACE4) and employee size classes according to the EU regulation. The allocation is somewhat similar to the old sampling mechanism, as there is no restriction on the take-all strata. Several parameters are adjusted in the following variants.
2. Stratification of the sample according to economic sectors (NACE4) and employee size classes; take-all strata are not allowed.
3. Stratification by economic sectors (NACE4) and employee size classes; take-all strata in the largest employee size class are allowed.
4. Stratification by economic sectors (NACE4), employee size classes and NUTS1; take-all strata in the largest employee size class are allowed.
5. Same stratification as in variant 4, but take-all strata are not allowed.
6. Stratification by economic sectors (NACE3), three aggregated employee size classes and NUTS1 regions; take-all strata in the largest employee size class are allowed.
We estimate the relative standard errors of turnover totals under the different sampling schemes by a design-based simulation study with R = 1,000 Monte Carlo replications. In every replication, we drew one sample according to each of the six variants specified above and used that sample to compute HT estimates for the domains of interest 1 and 3A detailed in Table 9-1. Note that in the case of variants 1, 4 and 5 every
domain under both domains of interest is obtained as a union of strata with a fixed sample size, that is, the domains are planned (Lehtonen and Veijanen, 2009). As we use the population variances in the strata to determine the sample size allocation, we could directly estimate the resulting precision without a simulation. However, unplanned domains for the other variants exist, e.g. variant 6 implies unplanned domains on the NACE4-classes. To facilitate a comparison between the variants, we decided to report the results from a simulation study for all sampling design variants. Specifically, we compare the different variants by the relative root mean squared error (RRMSE) of the point estimates. The RRMSE for the estimator Yˆd of the total in domain d, Yd is defined as:
$$
\mathrm{RRMSE}(\hat{Y}_d) = \sqrt{\frac{1}{R} \sum_{r=1}^{R} \left( \frac{\hat{Y}_d^{r} - Y_d}{Y_d} \right)^2}
$$

where $\hat{Y}_d^{r}$ denotes the estimated domain total in the r-th replication.
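The design-based simulation described above can be sketched in a few lines of Python. This is a simplified illustration and not the code used for the study: the function `simulate_rrmse`, its arguments and the toy data in the usage note are hypothetical, and simple random sampling without replacement within strata is assumed.

```python
import numpy as np


def simulate_rrmse(y, stratum, domain, n_h, R=1000, seed=0):
    """Monte Carlo RRMSE of HT domain totals under stratified simple
    random sampling without replacement.

    y, stratum, domain: one entry per frame unit.
    n_h: dict mapping each stratum label to its sample size.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    stratum = np.asarray(stratum)
    domain = np.asarray(domain)
    domains = np.unique(domain).tolist()
    true_totals = np.array([y[domain == d].sum() for d in domains])
    units = {h: np.flatnonzero(stratum == h) for h in np.unique(stratum).tolist()}

    squared_rel_err = np.zeros(len(domains))
    for _ in range(R):
        estimates = np.zeros(len(domains))
        for h, idx in units.items():
            sample = rng.choice(idx, size=n_h[h], replace=False)
            weight = len(idx) / n_h[h]  # design weight N_h / n_h
            for k, d in enumerate(domains):
                estimates[k] += weight * y[sample][domain[sample] == d].sum()
        squared_rel_err += ((estimates - true_totals) / true_totals) ** 2
    return dict(zip(domains, np.sqrt(squared_rel_err / R)))
```

For example, with `y = [1, 2, 3, 4, 10, 20]`, `stratum = [0, 0, 0, 0, 1, 1]` and `domain = [0, 0, 1, 1, 1, 1]`, the allocation `{0: 4, 1: 2}` makes both strata take-all and returns an RRMSE of zero for every domain, whereas `{0: 2, 1: 2}` leaves domain 0 as an unplanned domain with a random sample size and a positive RRMSE.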
3.3 Results

The results for the relative standard errors for turnover totals for the domains of four-digit NACE codes at the national level are shown in Figure 9-1, while the results at the state level are depicted in Figure 9-2. In both graphs, the solid horizontal line indicates an RRMSE of 10%, while the dotted line indicates an RRMSE of 5%. In addition to the precision of the different sampling variants, we compared the number of take-all strata and the number of enterprises within those strata, as the court decision required the possibility for enterprises to rotate out of the sample. This information is presented in Table 9-2.

It is obvious from both graphs that the results of variant 1 are sufficiently accurate for the overall federal and the federal state level. However, both the number of take-all strata and the number of units in these take-all strata are high, although considerably lower than in the old sampling design. Variants 2 and 3 both attempt to abandon take-all strata where possible. In variant 2, the maximum sampling fraction was set to 95% in order to rule out take-all strata, with the exception of strata in which Nh ≤ 2. However, variant 2 leads to results which are inadequate for NACE4 classes at the federal level and for the NACE2 classes at the state level. Therefore, take-all strata are essential to the SiD from the perspective of Destatis.
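The box constraints implied by the variants can be illustrated as follows. This is a sketch under assumptions: the function `box_constraints` is hypothetical, and rounding the 95% cap down to the next integer is an assumption, as the text does not state how fractional upper bounds were handled.

```python
def box_constraints(N_h, allow_take_all, m=2, max_percent=95):
    """Return (lower, upper) bounds for the sample size in a stratum of size N_h.

    m is the minimum number of sampled units per stratum. When take-all
    strata are not allowed, the upper bound is capped at max_percent of N_h
    (rounded down, an assumption), but never falls below the lower bound.
    """
    lower = min(m, N_h)
    if allow_take_all:
        return lower, N_h
    upper = max(lower, (max_percent * N_h) // 100)
    return lower, upper


# Strata with N_h <= 2 remain take-all even when the cap applies.
bounds = [box_constraints(N, allow_take_all=False) for N in (1, 2, 3, 100)]
```

Under these assumptions a stratum with a single unit or with two units is always completely enumerated, which matches the exception mentioned for variant 2.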
Figure 9-1: RRMSEs for domain of interest 1 with different sampling schemes.
Figure 9-2: RRMSEs for domain of interest 3A with different sampling schemes. 57 values with RRMSE > 0.5 are not plotted. 21 occurred in variant 2, 35 in variant 3 and 1 in variant 5.
Table 9-2: Information about take-all strata. (The variant “Old” refers to the sampling design used for the SiD 2014.)

Variant   Number of strata   Number of take-all strata   Number of enterprises in take-all strata
1         7,985              3,229                       20,108
2         649                21                          29
3         649                117                         3,931
4         7,985              2,341                       6,950
5         7,985              1,731                       2,396
6         2,431              670                         3,107
Old       8,890              4,145                       72,495
In variant 3, take-all strata were allowed in the largest employee size class of each economic sector, which, owing to the skewness of the population, is the most important size class for the accuracy of the results. The results for the domain of interest 1 are satisfactory, but this is not the case for the domain of interest 3A. An explanation for the poor performance is that, owing to the omission of the states as a stratification variable, the states are unplanned domains with random sample sizes. Design-based estimation methods, however, are known to suffer from high variances with unplanned domains. In variants 2 and 3, CVs exceeding 100% occurred in some of the domains for domain of interest 3A. They are not depicted in Figure 9-2, as the y-axis was limited to a CV of 50% to increase readability.

Therefore, the states were reintroduced as a stratification variable in variant 4 to achieve better results. This improves the results vis-à-vis variant 3, but leads to a significant increase in the number of take-all strata and their units. However, compared to variant 1, which allows take-all strata in all size classes, a loss in precision can be seen. Variant 5 is similar to variant 4 but does not allow for take-all strata in the largest size class. This leads to an increase of the CV for both domains of interest. In variant 6, the stratification is less differentiated compared to variant 4. This results in a deterioration of the quality of the estimates for both domains of interest. Most notable is the poor performance of estimates for NACE4 classes, which emerge as unplanned domains under this variant.
4. An investigation of alternative estimation methods After studying new sampling designs, we now turn our attention to calibration estimation methods incorporating auxiliary information. We chose to investigate them as they promise to compensate to some extent for efficiency losses due to the new sampling design. Our analysis is conducted using SiD data from 2014.
4.1 Data sources and auxiliary information in calibration estimation Two important issues regarding calibration estimation (see Deville and Särndal (1992)) are the choice of auxiliary variables which are calibrated to their population totals and the source of the totals if there is more than one. We first address the latter issue and note that an obvious data source for auxiliary information is the sampling frame itself. The sampling frame used for the SiD 2014 is based on the Unternehmensregister (business register) for the reporting year 2013, which was the latest data available when the sample was drawn. In the process of compiling the sampling frame for the SiD, the responsible section for service statistics applies various checks of the business register data using subject matter experience. This includes the removal of units which are likely to be inactive, and therefore, the sampling frame for the SiD comprises fewer units from the relevant sections than the business register. Interestingly, the phenomenon of over-coverage of the target population in the business register was not detected for total turnover and the total number of employees, respectively. This finding indicates that the adjustments in the processing of the sampling frame mostly affect smaller enterprises. The second data source is the business register for 2014, which refers to the same reporting year as the SiD. The choice of auxiliary variables is crucially important to facilitate accurate point estimates using calibration estimation. This is most easily seen for the linear GREG estimator, which can be derived from a calibration approach as well. If the residuals from the associated regression model are small, highly accurate point estimates can be obtained. Hence, the auxiliary variables should be chosen to be highly predictive for the variables of interest. 
Since many of the most important target variables in the SiD relate to turnover, the number of employees and the number of enterprises, we decided to concentrate on these variables.
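The link between calibration and the linear GREG estimator can be illustrated with a short sketch. It assumes the chi-square distance of Deville and Särndal (1992) with unit scale factors; the function name `linear_calibration` and the toy data in the usage note are hypothetical and are not taken from the SiD.

```python
import numpy as np


def linear_calibration(d, X, totals):
    """g-weights of the linear (GREG) calibration estimator.

    d: design weights, X: matrix of auxiliary variables (one row per
    responding unit), totals: known population totals of the columns of X.
    The calibrated weights w = d * g satisfy X' w = totals.
    """
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    totals = np.asarray(totals, dtype=float)
    M = X.T @ (d[:, None] * X)                  # sum_k d_k x_k x_k'
    lam = np.linalg.solve(M, totals - X.T @ d)  # Lagrange multipliers
    return 1.0 + X @ lam
```

As a usage example, with `d = [10, 10, 10, 10]`, rows `x_k = (1, 2), (1, 4), (1, 6), (1, 8)` and totals `(45, 230)`, the weights `d * g` reproduce both totals exactly. Because the g-weights are linear in the auxiliary variables, nothing prevents individual weights from becoming negative when the totals are far from the HT estimates, for example with totals `(45, 350)` in the same toy data.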
4.2 Misclassification and nonresponse

As opposed to standard HT estimation, with calibration estimation only those responding units in the sample for which the vector of auxiliary information is available can contribute to the estimates. Thus, it seemed worthwhile to study whether the auxiliary information was missing for some units in the SiD survey 2014.

Table 9-3: Percentage of missing values after matching auxiliary information to the SiD. For column descriptions, see text.

SC   SF-TUR   BR-TUR   SF-EMP   BR-EMP
1    81.4     71.6     0.0      8.0
2    1.3      8.6      85.9     8.3
3    1.6      5.9      68.2     4.8
4    0.7      3.3      43.4     2.8
5    0.9      2.8      24.7     2.0
6    1.0      2.4      15.3     1.6
7    0.8      2.0      8.5      1.7
8    0.3      1.2      3.5      1.7
Table 9-3 displays the percentage of missing values of potential calibration variables after matching auxiliary information to the survey data for the different size classes (SC), based on the reported turnover in the sampling frame. Columns beginning with SF refer to auxiliary information taken from the sampling frame. Columns with BR refer to variables obtained from the business register 2014. Moreover, TUR indicates the variables for turnover and EMP the variables for the number of employees. The large fraction of units with missing auxiliary information for turnover in size class 1 is explained by its design to comprise units with missing or very small values for the reported turnover in the sampling frame. Moreover, large shares of missing information for the number of employees in the sampling frame are evident. A possible explanation for this finding is that the current estimation procedure does not incorporate auxiliary information at the estimation stage and, since the size classes used for stratification were based on turnover, there was no need to establish the number of employees in the sampling frame. Except for the number of employees as obtained from the sampling frame, the share of missing values for enterprises in size classes 5 and higher is below 3%. Additionally, the percentage of missing values tends to decrease as the size class increases. Hence, the availability of auxiliary information is less of an issue for larger enterprises.
Furthermore, the analysis of the survey data showed that 15,206 enterprises were misclassified, i.e. they were sampled but do not belong to the population of interest. This is a common situation in business surveys targeted to specific sectors, as the sector in which an enterprise operates may change over time (Brion and Gros, 2015). This issue is also relevant for the non-sampled units, but the information about whether a unit is misclassified is only known for the sampled part. Thus, to avoid calibrating to inflated population totals, we fitted a misclassification model using the survey data in order to delete units from the auxiliary data. Specifically, the logistic regression model, estimated by pseudo maximum likelihood, included the size class, NUTS1, NACE4 and an interaction term between size class and NUTS1 as covariates. We then used the coefficients from this model to predict the probability of misclassification for the units in the auxiliary data files. Finally, a unit was treated as misclassified when the realization of a uniform random variable between 0 and 1 took a value less than its predicted probability of misclassification.

Another issue in the SiD survey is unit nonresponse. In our analyses, we first applied a nonresponse adjustment, which was then followed by a calibration step. This procedure is known as the two-step approach to calibration, in contrast to the one-step approach where the nonresponse adjustment is done together with the adjustment to known population totals (Haziza and Lesage, 2016). We did not pursue the one-step approach, as Haziza and Lesage (2016) showed that it can be risky: the choice of the distance function to be minimised corresponds to an implicitly assumed parametric nonresponse model. All further analyses were conducted using responding units that are correctly classified. This left us with a sample size of 158,312 enterprises and helped to ensure the comparability of our analyses with current practice at Destatis.
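The random deletion step for the auxiliary data can be sketched as below. The predicted probabilities are taken as given here (in the study they come from the logistic misclassification model); the function name `flag_misclassified` and the example values are hypothetical.

```python
import numpy as np


def flag_misclassified(p_hat, seed=0):
    """Flag a unit as misclassified when a uniform(0, 1) draw falls below
    its predicted misclassification probability."""
    rng = np.random.default_rng(seed)
    p_hat = np.asarray(p_hat, dtype=float)
    u = rng.uniform(0.0, 1.0, size=p_hat.shape)
    return u < p_hat


# Hypothetical predicted probabilities for five units in the auxiliary file.
p_hat = np.array([0.0, 0.05, 0.30, 0.80, 1.0])
flags = flag_misclassified(p_hat)
kept = np.flatnonzero(~flags)  # units retained for computing calibration totals
```

Each unit is thus removed with a probability equal to its predicted value, so that the expected number of deleted units equals the sum of the predicted probabilities.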
4.3 Results An overview of the different models is given in Table 9-4. Here, the second column identifies the calibration constraints used. As an example, the first model uses calibration constraints on the number of enterprises obtained from the sampling frame for breakdowns at the NUTS1 and the NACE2 level and for the size classes. Additionally, constraints on total turnover are used for the same breakdowns. Besides, these calibration constraints correspond to a linear regression model with dummy variables for NUTS1- and NACE2-levels and size classes in addition to the turnover variable, where interaction terms are included for turnover with each of these dummies and the inverse inclusion probabilities adjusted for
nonresponse are used as regression weights. The third column in Table 9-4 indicates the sample size that was used to fit the models, i.e. the first model was fitted using 153,699 observations. Finally, columns 4 and 5 indicate the adjusted R2 from a linear regression of the variables turnover and number of employees, respectively, on the covariates. For each specification, we first applied GREG-weighting and then, in cases where some negative weights were obtained, we additionally applied the raking ratio approach.

Table 9-4: Comparison of the models

Model   Calibration constraints                           n         R2 (TUR)   R2 (EMP)
SF-1    N for NUTS1, NACE2, SC (all from SF)              153,699   0.951      0.837
        Total TUR for NUTS1, NACE2, SC (all from SF)
SF-2    N for NUTS1, NACE2, SC (all from SF)              153,699   0.864      0.353
        Total TUR for NUTS1 (all from SF)
BR-1    N for NUTS1, NACE2, SC (all from SF)              152,872   0.909      0.844
        Total TUR for NUTS1, NACE2, SC (all from BR)
BR-2    N for NUTS1, NACE2, SC (all from SF)              152,342   0.905      0.967
        Total TUR and EMP for NUTS1 (all from BR)
BR-3    N for NUTS1, NACE2 (all from SF)                  152,342   0.905      0.967
        Total TUR and EMP for NUTS1 (all from BR)
The first two models use only auxiliary information from the sampling frame. Furthermore, these models do not include the number of employees among the auxiliary variables as this information was missing for a substantial share of units (see Table 9-3). We see that incorporating additional interaction effects between turnover and NACE2 levels as well as turnover and size classes increases the adjusted R2 for both dependent variables.
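The raking ratio step that was applied whenever GREG-weighting produced negative weights can be sketched as follows. The sketch assumes the multiplicative (exponential) distance of Deville and Särndal (1992) and solves the calibration equations by Newton's method; the function name `raking_weights`, the stopping rule and the toy data in the usage note are hypothetical choices, not the implementation used at Destatis.

```python
import numpy as np


def raking_weights(d, X, totals, max_iter=50, tol=1e-10):
    """g-weights of the raking ratio (exponential) calibration estimator.

    Solves sum_k d_k exp(x_k' lam) x_k = totals for lam by Newton's method.
    The resulting g-weights exp(x_k' lam) are positive by construction.
    """
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    totals = np.asarray(totals, dtype=float)
    lam = np.zeros(X.shape[1])
    scale = max(1.0, float(np.max(np.abs(totals))))
    for _ in range(max_iter):
        g = np.exp(X @ lam)
        w = d * g
        gap = X.T @ w - totals            # violation of the calibration equations
        if np.max(np.abs(gap)) < tol * scale:
            return g
        jacobian = X.T @ (w[:, None] * X)
        lam = lam - np.linalg.solve(jacobian, gap)
    raise RuntimeError("raking did not converge")
```

With design weights `d = [10, 10, 10, 10]`, rows `x_k = (1, 2), (1, 4), (1, 6), (1, 8)` and totals `(45, 230)`, the routine returns strictly positive g-weights that reproduce both totals. The explicit error on non-convergence is a design choice of this sketch: the iteration is not guaranteed to converge for every set of constraints.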
Additionally, we studied models incorporating auxiliary information taken from the business register. These models are labelled BR-1, BR-2 and BR-3. They still contain calibration constraints for the number of enterprises which have been obtained from the sampling frame. This modelling decision was taken because, in the process of compiling the sampling frame, various adjustments were applied to the business register data, so that the number of relevant enterprises for the SiD survey is accurately reflected in the sampling frame (see Section 4.1). A major difference amongst the three models utilising information from the business register is that BR-1 does not include the number of employees as a covariate, while models BR-2 and BR-3 use this variable. Including this variable clearly increases the adjusted R2 for the number of employees. However, model BR-1 contains more detailed information with respect to turnover aggregates and yields a slightly higher adjusted R2 for turnover.

Moreover, it is interesting to study the behaviour of the associated g-weights. Under both models BR-1 and BR-2 a number of negative GREG weights occurred, while the weights under all other models were strictly positive. A comparison of the g-weights is shown in Figure 9-3, where the g-weights for model BR-2 have been computed using the raking ratio approach. We tried the same for model BR-1, but the algorithm did not converge. Hence, we excluded this model from subsequent analyses. It is interesting that models SF-1 and SF-2 both yielded an acceptable range of adjustment weights, where only one unit for SF-1 received a g-weight less than 0.5. In general, this is a desirable result, as very large and very small
Figure 9-3: Distribution of the g-weights.
Table 9-5: Deviation from nonresponse adjusted HT estimates for the SiD 2014 in %

Variable   SF-1    SF-2    BR-2   BR-3
TUR        -0.84   -0.88   0.38   0.60
EMP        -0.36   -1.38   1.95   2.02
g-weights may lead to an increase in the variance (Hedlin et al., 2001). With respect to the models BR-2 and BR-3 a wider range of g-weights was observed. Besides, the benefits of a more parsimonious model can be seen, as models BR-3 and SF-2 yield a narrower range of weights than models BR-2 and SF-1, respectively.

Furthermore, we were also interested in the quality of the associated estimates for totals of turnover and the number of employees, respectively. We first looked at the deviations that emerge if one of the four calibration estimators were applied instead of the nonresponse-adjusted HT estimator. The results for Germany as a whole are given in Table 9-5. While there are some deviations from the currently applied method, they are small and the calibration estimates seem plausible.

We then compared the accuracy of the different calibration estimators to the nonresponse-adjusted HT estimator using the SiD survey for the domain of interest 1 introduced earlier. (We also analysed the domains of interest given by partition 3A, which yielded similar results.) This leaves us with a planned domain structure. Figure 9-4 shows the coefficients of variation (CV) that would have resulted for total turnover. It shows that the nonresponse-adjusted HT estimates are accurate in most of the domains. Nevertheless, improvements to the adjusted HT estimates can be realised by employing calibration estimators. Among the different calibration estimators, those which use turnover information from the sampling frame clearly outperform those using turnover from the business register.

Figure 9-4: CVs for total turnover in domain 1 using SiD 2014.

The CVs for the total number of employees are shown in Figure 9-5. The nonresponse-adjusted HT estimates are accurate for the vast majority of the domains. Interestingly, the estimators SF-1 and SF-2, which do not include the number of employees from any auxiliary data source among the covariates, were no better than the adjusted HT estimate. However, the BR-2 and BR-3 estimators, which incorporate information about the number of employees from the business register, produced domain estimates with a consistently higher precision than the adjusted HT. Finally, the y-axis in Figure 9-5 was truncated to a CV of 60 per cent to increase readability. Hence, two outliers of the SF-2 method, where the CV takes values of 117 and 150 per cent, are not depicted.

Figure 9-5: CVs for total number of employees in domain 1 using SiD 2014. Two outlying CVs are not plotted.
5. Discussion and conclusion

We have presented two different approaches to cope with the new requirements for the SiD survey. Our first approach aimed to find a new sampling design which avoids take-all strata as much as possible. Our results show that take-all strata are required to produce accurate estimates
in the SiD. The decision of the Federal Administrative Court does not mean that take-all strata have to be excluded categorically. Rather, take-all strata are allowed if they are necessary to produce reasonably accurate results. Our simulation study clearly indicates that this is the case. Moreover, our analysis reveals a trade-off between the number of take-all strata and the accuracy of the results: fewer take-all strata lead to less accurate results. Besides, all the different variants lead to a reduction of the total number of take-all strata and units therein compared to the old approach.

The second approach that we explored is to incorporate auxiliary information at the estimation stage by means of calibration estimation. Our results from applying this to SiD data 2014 are promising, as we observed reductions in the CV of the domain estimates in comparison to the HT estimates. The potential gains from using a calibration estimator in a future SiD survey might be larger than the ones obtained in our application, where we had a large number of take-all strata in which the sampling variance of the HT is zero by construction.

Nevertheless, there are a few issues which should be addressed if calibration estimators were to be applied in practice. Firstly, careful scrutiny should be employed in the editing and preparation process of the auxiliary data sources. One example in this regard is to find a suitable method for imputing missing values of key variables such as turnover or the number of employees. Secondly, more emphasis could be laid on selecting the auxiliary information for the assisting models. While we followed a rather intuitive reasoning here, a more sophisticated approach could be pursued by applying model selection tools and diagnostic checks. A few ideas in this regard might be obtained from the study by Hedlin et al. (2001). Finally, the choice of the assisting model in terms of the functional form and the level at which it is estimated is important. We applied a more-or-less standard linear GREG estimation approach with multiple auxiliary variables. Our results indicate a benefit of including both turnover and the number of employees among the covariates to achieve greater precision for both domains of interest. However, our models were calibrated at a fairly aggregate level to avoid large variations in the g-weights with multiple covariates. A potential alternative which could yield similar or even better results is to consider separate ratio estimation within stratum groups. Many NSIs employ a ratio estimator in business surveys (cf. Smith et al., 2003) as it explicitly models heteroscedasticity and enables strictly positive g-weights for covariates taking positive real numbers.
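The remark on strictly positive g-weights can be illustrated with a minimal sketch. It assumes a single auxiliary variable with a known total in one stratum group; the function name `ratio_g_weights` and all numbers are hypothetical and are not taken from the SiD survey. Under the ratio estimator every design weight is multiplied by the same factor, the known auxiliary total divided by its HT estimate, so the factor is positive whenever the auxiliary values are positive.

```python
def ratio_g_weights(design_weights, x_sample, x_total):
    """g-weights of a ratio estimator within one stratum group.

    Every unit receives the same factor g = X / X_hat_HT, where X_hat_HT is
    the Horvitz-Thompson estimate of the auxiliary total X.
    """
    x_hat_ht = sum(d * x for d, x in zip(design_weights, x_sample))
    g = x_total / x_hat_ht
    return [g] * len(design_weights)


# Hypothetical stratum group with three sampled businesses.
d = [10.0, 10.0, 5.0]        # design weights
x = [120.0, 80.0, 400.0]     # auxiliary turnover of the sampled units
g = ratio_g_weights(d, x, x_total=4400.0)

# The calibrated weights g_k * d_k reproduce the known auxiliary total.
calibrated = [gk * dk for gk, dk in zip(g, d)]
check = sum(w * xk for w, xk in zip(calibrated, x))
```

In this example the HT estimate of the auxiliary total is 4000, so every g-weight is close to 1.1 and the calibrated weights sum the auxiliary variable back to 4400.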
References

Bankier, M.D. (1988). Power allocations: Determining sample sizes for subnational areas. The American Statistician 42, 174-177.

Brion, P. and Gros, E. (2015). Statistical estimators using jointly administrative and survey data to produce French structural business statistics. Journal of Official Statistics 31, 589-609.

Bundesministerium der Justiz und für Verbraucherschutz (2000). Gesetz über Statistiken im Dienstleistungsbereich (in German). Available at https://www.gesetze-im-internet.de/dlstatg/BJNR176510000.html (accessed 13 Feb 2018).

Deville, J.C. and Särndal, C.-E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association 87, 376-382.

EC (2008). Regulation (EC) No. 295/2008 of the European Parliament and of the Council of 11 March 2008 on the structural business statistics. Available from http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2008:097:0013:0059:EN:PDF (accessed 13 Feb 2018).

Eurostat (2008). NACE Rev. 2 Statistical classification of economic activities in the European Community. Luxembourg: Office for Official Publications of the European Communities. Available from http://ec.europa.eu/eurostat/documents/3859598/5902521/KS-RA-07015-EN.PDF (accessed 13 Feb 2018).

Haziza, D. and Lesage, É. (2016). A discussion of weighting procedures for unit nonresponse. Journal of Official Statistics 32, 129-145.

Hedlin, D., Falvey, H., Chambers, R. and Kokic, P. (2001). Does the model matter for GREG estimation? A business survey example. Journal of Official Statistics 17, 527-544.

Heranziehung zur Dienstleistungsstatistik (2017). BVerwG 8 C 6.16 (Bundesverwaltungsgericht 2017). Available from http://www.bverwg.de/de/150317U8C6.16.0 (accessed 18 Apr 2018).

Hohnhold, H. (2010). Generalized power allocations. Wiesbaden: Destatis. Available on request from the authors.

Lehtonen, R. and Veijanen, A. (2009). Design-based methods of estimation for domains and small areas. In D. Pfeffermann and C. R. Rao (Eds.), Handbook of Statistics: Vol 29B (pp. 219-249). New York: Elsevier.

Schäffer, K.A. (1961). Planung geschichteter Stichproben bei Vorgabe einer Fehlerabstufung. Allgemeines Statistisches Archiv 45, 350-361.

Smith, P., Pont, M. and Jones, T. (2003). Developments in business survey methodology in the Office for National Statistics, 1994–2000. The Statistician 52, 257-295.
CHAPTER 10

SAMPLING COORDINATION OF BUSINESS SURVEYS AT STATISTICS NETHERLANDS

MARC SMEETS¹ AND HARM JAN BOONSTRA¹
Abstract

This chapter describes the methodology of the coordinated sampling system for business surveys that is used by Statistics Netherlands. The sampling system coordinates the samples in a given group of surveys with a common sampling frame where both stratified cross-sectional and rotating panel designs can be combined. The sampling coordination is realised by keeping a survey burden value for each enterprise in the sampling frame and the actual memberships of the panels in the group. In addition to this, every enterprise has a permanent random number in order to guarantee the randomness of the drawn samples. In this way the total survey burden is evenly spread among the enterprises, where the spread of survey burden is both over time and the surveys in the group. In this chapter some methodological aspects of the Dutch sampling system are discussed, such as the combination of cross-sectional and rotating panel designs, the conditions on the stratification and how population dynamics are taken into account. Finally, several aspects of the spread of survey burden are examined in a simulation study where it is also shown that no selectivity is introduced in the drawn samples.
¹ Department of Methodology, Statistics Netherlands, 6401 CZ, Heerlen, The Netherlands. Email: [email protected], [email protected].

1. Introduction

The requirement to coordinate survey samples is commonly faced in official statistics. The purpose of sampling coordination is to control the overlap between successively drawn samples in a dynamic population, see
for instance Ohlsson (1995) and Nedyalkova et al. (2009). Two types of coordination are generally distinguished. In the case of positive coordination a fixed positive overlap between the samples is achieved in order to obtain precise estimates of change. The samples then form a panel. A panel should always be kept representative for the population, which means that it should be updated to account for population dynamics. In addition, for rotating panels a part of the panel, specified by the rotation fraction, is periodically renewed. In the case of negative coordination the overlap between the samples is minimised in order to realise an even spread of the survey burden of the population units. Challenges in dealing with sampling coordination are to balance the precision of estimates of change and the spread of survey burden, to prevent selectivity in the drawn samples due to dependency between the coordinated samples and population dynamics and to respect the sampling designs of the surveys. In December 2014 Statistics Netherlands brought a coordinated sampling system into use for business surveys with the main purpose to centrally perform the sampling for Dutch business surveys. Another purpose of the sampling system is to spread the total survey burden as evenly as possible over the enterprises by sampling coordination. This chapter focuses on the methodology of the sampling system, based on the former EDS system (van Huis et al., 1994). The spread of survey burden is realised in a given group of surveys with a common sampling frame by keeping a survey burden value for every enterprise in the sampling frame. The survey burden values represent the total accumulated survey burden of the enterprises. This results in a spread of survey burden both over time and over the surveys in the group. At this moment the sampling system performs the sampling for 17 business surveys, among which are the Structural Business Statistics and the Short Term Statistics. 
Coordination of samples can be applied in a group of surveys, possibly including both stratified cross-sectional and rotating panel designs. So, the system combines positive and negative sampling coordination. The spread of the survey burden in the group is optimal when all the surveys use the same stratification (basic stratification), but under some restrictions it is allowed to depart from the basic stratification by using substrata. Section 2 describes the sampling algorithm and discusses under which conditions substrata can be used and how the algorithm takes account of population dynamics like deaths, births and (basic) stratum movers. Section 3 presents the results of a simulation study in which several aspects of the spread of survey burden are examined and where it is shown that no
selectivity is introduced in the drawn samples. Section 4 gives some concluding remarks and discusses some future challenges.
2. Sampling algorithm

This section describes the underlying sampling algorithm of the coordinated sampling system. Let $\mathcal{G}$ be a group of surveys with a common sampling frame $U$, in which both stratified cross-sectional and rotating panel designs may occur. It is assumed that the surveys in $\mathcal{G}$ use the same basic stratification. The use of substrata is discussed in subsection 2.2. The algorithm produces a simple random sample $s_h$ of $n_h$ units in every basic stratum $h = 1, \ldots, H$. The size of the stratum population $U_h$ is denoted $N_h$ for $h = 1, \ldots, H$. For every panel $p$ in $\mathcal{G}$ panel rotation is performed with a user-defined and stratum-dependent rotation fraction $v_h$. And finally, for every survey $l$ in $\mathcal{G}$ a stratum-dependent weight $W_{hl} \geq 0$ is available representing the survey burden caused by this survey.

The idea of the algorithm is to keep a vector of parameters $(R_k, B_k, I_{kp})$ for every unit $k \in U$, with $R_k \in [0,1]$ a unique permanent random number (PRN), $B_k$ the total cumulative survey burden and $I_{kp} \in \{0,1\}$ the panel membership indicator for every panel $p \in \mathcal{G}$. A sample is drawn by selecting the first $n_h$ units according to some specified ordering of the parameters $(R_k, B_k, I_{kp})$. Before the first draw of a sample in $\mathcal{G}$, the sampling parameters $(R_k, B_k, I_{kp})$ are initialised as follows. For every unit $k \in U$ a unique random number $R_k$ is uniformly and independently drawn from $[0,1]$ and $B_k = I_{kp} = 0$ for every $p \in \mathcal{G}$. Algorithm 1 produces a sample for a cross-sectional survey or the first sample for a panel survey and Algorithm 2 produces a subsequent sample for a panel survey and performs panel rotation.

Algorithm 1. Draw of a cross-sectional survey $l \in \mathcal{G}$

For every basic stratum $h = 1, \ldots, H$:
1. Sort all units $k \in U_h$ by (i) $B_k$ (increasing) and (ii) $R_k$ (increasing).
2. Select the first $n_h$ units within this ordering. These units form the sample $s_h$.
3. For every $k \in s_h$, let $B_k = B_k + W_{hl}$.
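The three steps of Algorithm 1 can be sketched for a single basic stratum as follows. This is a minimal illustration rather than the production system: the function name, the dictionary representation of a unit and the example stratum of ten units are assumptions made for the sketch.

```python
import random


def draw_cross_sectional(stratum, n, weight):
    """Algorithm 1 in one basic stratum; returns the sampled units.

    Each unit is a dict with keys 'id', 'prn' (R_k) and 'burden' (B_k).
    """
    # Step 1: sort by cumulative burden, then by PRN (both increasing).
    ordered = sorted(stratum, key=lambda u: (u["burden"], u["prn"]))
    # Step 2: the first n units in this ordering form the sample.
    sample = ordered[:n]
    # Step 3: add the survey weight to the burden of every sampled unit.
    for unit in sample:
        unit["burden"] += weight
    return sample


# Hypothetical stratum of 10 units with burden initialised to zero.
rng = random.Random(2017)
stratum = [{"id": k, "prn": rng.random(), "burden": 0.0} for k in range(10)]

first = {u["id"] for u in draw_cross_sectional(stratum, n=3, weight=1.0)}
second = {u["id"] for u in draw_cross_sectional(stratum, n=3, weight=1.0)}
```

Because the units of the first draw carry a higher burden afterwards, the second draw selects three different units. This is the negative coordination that the burden ordering is meant to achieve.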
Algorithm 2. Subsequent draw of a panel survey $p \in \mathcal{G}$

For every basic stratum $h = 1, \ldots, H$:
1. Sort all units $k \in U_h$ by (i) $I_{kp}$ (decreasing), (ii) $B_k$ (increasing), (iii) $R_k$ (increasing).
2. Let $m_h$ be the number of units in the panel. If rotation is applied (mostly periodically), define $u_h = (v_h m_h)$, otherwise $u_h = 0$. Remove the $u_h$ last units with $I_{kp} = 1$ from the panel.
3. Adjust the panel to get the required sample size $n_h$:
   - If $m_h - u_h < n_h$, add the first $n_h - (m_h - u_h)$ units with $I_{kp} = 0$ to the panel.
   - If $m_h - u_h > n_h$, remove the extra $m_h - u_h - n_h$ units from the panel (last units with $I_{kp} = 1$).
   - If $m_h - u_h = n_h$, no adjustment of the panel is needed.
4. Let $I_{kp} = 0$ for every unit $k$ that is removed from the panel and $I_{kp} = 1$ for every unit $k$ that is added to the panel. For every $k$ with $I_{kp} = 1$ let $B_k = B_k + W_{hp}$.
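A sketch of Algorithm 2 for one basic stratum and one panel is given below. The function name and the dictionary keys are hypothetical, and rounding the number of rotated units with Python's `round` is an assumption of this sketch, since the text does not state how a non-integer number of units is handled.

```python
import random


def draw_panel(stratum, n, weight, rotation_fraction=0.0):
    """One draw of Algorithm 2 in a single basic stratum.

    Units are dicts with keys 'id', 'prn' (R_k), 'burden' (B_k) and
    'in_panel' (0 or 1). Returns the units in the panel after the draw.
    """
    # Step 1: panel members first, then burden and PRN (both increasing).
    ordered = sorted(stratum,
                     key=lambda u: (-u["in_panel"], u["burden"], u["prn"]))
    members = [u for u in ordered if u["in_panel"] == 1]
    others = [u for u in ordered if u["in_panel"] == 0]

    # Step 2: rotate out the last members (the rounding is an assumption).
    n_rotate = round(rotation_fraction * len(members))
    keep = len(members) - n_rotate
    removed = members[keep:]
    members = members[:keep]

    # Step 3: bring the panel to the required size n.
    if len(members) < n:
        members = members + others[:n - len(members)]
    elif len(members) > n:
        removed = removed + members[n:]
        members = members[:n]

    # Step 4: update the indicators, then add burden to current members.
    for unit in removed:
        unit["in_panel"] = 0
    for unit in members:
        unit["in_panel"] = 1
        unit["burden"] += weight
    return members


rng = random.Random(1)
stratum = [{"id": k, "prn": rng.random(), "burden": 0.0, "in_panel": 0}
           for k in range(10)]
first = {u["id"] for u in draw_panel(stratum, n=4, weight=1.0)}
second = {u["id"] for u in draw_panel(stratum, n=4, weight=1.0,
                                      rotation_fraction=0.5)}
```

With a rotation fraction of 0.5 and a panel of four units, two units leave the panel and two units with the lowest burden enter it, so the two successive panels share exactly two units.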
2.1 Taking into account population dynamics

Changes in economic activity or reorganisations of the enterprises may lead to population dynamics in terms of deaths, births and (basic) stratum moves of the units $k \in U$. Deaths can easily be removed from the sampling frame. In order to prevent births and stratum movers from being systematically over- or underrepresented in the drawn samples, these units must be indistinguishable from the existing units with regard to $(R_k, B_k, I_{kp})$. In other words, in every stratum $h$, births, stratum movers and existing units must have the same joint distribution of $(R_k, B_k, I_{kp})$. Before every draw of a sample in $\mathcal{G}$ suitable values of $(R_k, B_k, I_{kp})$ are therefore assigned to the births and stratum movers. For births a new PRN $R_k$ is uniformly selected from $[0,1]$ and values for $B_k$ and for $I_{kp}$, for every panel $p \in \mathcal{G}$, are assigned that are appropriate to $R_k$. The appropriate values of $B_k$ and $I_{kp}$ are obtained by copying them from the existing unit in the stratum whose PRN is closest to $R_k$. For stratum movers the relative position is taken over to the new stratum according to a user-defined ordering of $(R_k, B_k, I_{kp})$. The relative position of a unit with rank $i$ according to the specified ordering of stratum $h$ with
$N_h$ units is defined as $i/(N_h + 1)$. This is realised by copying the values $(B_k, I_{kp})$ from the existing unit in the new stratum which is closest to the relative position of the stratum mover according to the same ordering. A new and unique PRN $R_k$ is randomly chosen from the interval of PRNs determined by adjacent units of this existing unit. Note that in case of stratum movers the value of $R_k$ is in fact not permanent anymore. By the ordering of $(R_k, B_k, I_{kp})$ it can be specified whether or not the spread of survey burden has to be taken into account and whether it is more important to account for the cumulative survey burden or the panel membership of the stratum mover. Possible orderings are: (1) only by $R_k$, (2) by $B_k$ and $R_k$ or (3) by $I_{kp}$, $B_k$ and $R_k$.
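The assignment of parameters to births and stratum movers can be sketched as follows. The function names, the dictionary keys and the small example stratum are hypothetical; the sketch only covers the choice of the existing unit from which the burden and panel membership are copied, and leaves out the drawing of the new PRN for a stratum mover.

```python
def relative_position(rank, stratum_size):
    """Relative position i / (N_h + 1) of the unit with 1-based rank i."""
    return rank / (stratum_size + 1)


def params_for_birth(new_prn, stratum):
    """Copy burden and panel membership from the unit with the closest PRN."""
    donor = min(stratum, key=lambda u: abs(u["prn"] - new_prn))
    return {"prn": new_prn,
            "burden": donor["burden"],
            "in_panel": donor["in_panel"]}


def donor_for_mover(position, new_stratum, ordering):
    """Existing unit in the new stratum closest to the mover's position."""
    ordered = sorted(new_stratum, key=ordering)
    size = len(ordered)
    best_rank = min(range(1, size + 1),
                    key=lambda i: abs(relative_position(i, size) - position))
    return ordered[best_rank - 1]


# Hypothetical stratum with three existing units.
stratum = [
    {"prn": 0.10, "burden": 2, "in_panel": 1},
    {"prn": 0.55, "burden": 0, "in_panel": 0},
    {"prn": 0.90, "burden": 1, "in_panel": 0},
]

birth = params_for_birth(0.60, stratum)

# A mover that had rank 2 among 4 units in its old stratum, using
# ordering (2) from the text: by burden and then by PRN.
position = relative_position(2, 4)
donor = donor_for_mover(position, stratum,
                        ordering=lambda u: (u["burden"], u["prn"]))
```

Here the birth copies its values from the unit with PRN 0.55, and the mover, whose relative position is 0.4, is matched to the second unit in the burden ordering of the new stratum.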
2.2 The use of substrata

It frequently occurs that surveys want to exclude specific enterprises in a basic stratum from being sampled or, on the contrary, select specific enterprises with probability 1. It is possible to depart from the basic stratification by using substrata. In order to prevent selectivity being introduced in the drawn samples with respect to the substrata, the parameters $(R_k, B_k, I_{kp})$ must always be assigned at the level of the basic strata, while the sampling is done per substratum.

For cross-sectional surveys, every basic stratum $h$ can be divided into $J_h$ substrata $h_j$ for $j = 1, \ldots, J_h$, with different sampling fractions. By applying the first two lines of Algorithm 1 to every substratum $h_j$ a sample of size $n_{h_j}$ is obtained. Then $B_k$ is updated for the first $n_h$ units in basic stratum $h$, where the units are sorted by $B_k$ and $R_k$ (both increasing) and

$$n_h = \sum_{j=1}^{J_h} n_{h_j}.$$

As a consequence, the burden is increased for some of the nonsampled units in the basic stratum $h$ and no burden is applied to some sampled units. This implies that by using substrata the spread of survey burden is suboptimal. Note that it is not allowed to add burden to the units that are actually sampled, because then the survey burden $B_k$ would be updated at the level of the substrata. In order to prevent selectivity it is necessary that the parameters $(R_k, B_k, I_{kp})$ are always updated at the level of the basic strata.

For panels, both parameters $B_k$ and $I_{kp}$ must be defined at the level of basic stratum $h$, while at the same time it must be possible to derive the panel from $I_{kp}$ in every substratum of $h$. This is realised by allowing a restrictive form of substratification for panels, where basic stratum $h$ is divided into at most three substrata, one with an arbitrary sampling fraction
$f_{h_1}$ (main substratum) and the others with fractions 0 and 1. Let the panel indicator $I_{kp}$ represent an imaginary panel in basic stratum $h$. Sampling and rotation of this imaginary panel and updating the parameters $(R_k, B_k, I_{kp})$ is done by Algorithm 2 with $n_{h,\mathrm{im}} = f_{h_1} N_h$. Here $n_{h,\mathrm{im}}$ is the sample size of the imaginary panel in basic stratum $h$. The real panel can be derived from the (imaginary) panel indicator by taking the units in the main substratum with $I_{kp} = 1$, supplemented with all units in the substratum with fraction 1. The sample size of the real panel in basic stratum $h$ is given by $n_h = f_{h_1} N_{h_1} + N_{h_2}$, where $h_2$ is the substratum with fraction 1.
3. Simulation results

By a simulation study we show that the sampling algorithm coordinates the samples in a given group $\mathcal{G}$ of (imaginary) surveys without introducing selectivity in the drawn samples. Furthermore, we show that the survey burden is evenly spread over the units in the sampling frame. The tests are performed by simulating a series of (monthly) draws from an artificial population $U$ with simulated population dynamics. Suppose that $U$ is divided into 5 basic strata. At the beginning of the simulation $U$ consists of 100,000 units. Table 10-1 displays the sizes of the basic strata at the beginning of the simulation. Births, deaths and stratum movers are simulated in such a way that $U$ remains sufficiently stable over time.

Table 10-1: Population sizes of basic strata at beginning of simulation.

| Stratum $h$ | 1 | 2 | 3 | 4 | 5 | Total |
|---|---|---|---|---|---|---|
| $N_h$ | 83,288 | 12,688 | 3,187 | 675 | 162 | 100,000 |

Group $\mathcal{G}$ contains one cross-sectional survey and two panels. 250 draws are simulated. The surveys in $\mathcal{G}$ have $U$ as a common sampling frame and the basic stratification as the common stratification. We suppose that the
surveys have the same weights $W_1 = W_2 = W_3 = 1$. Table 10-2 displays the survey designs and the sampling fractions per basic stratum.

Table 10-2: Surveys in $\mathcal{G}$ with sampling fractions per basic stratum. The rotation fractions are applied yearly (yr$^{-1}$) or monthly (mo$^{-1}$).

| | Survey 1 | Survey 2 | Survey 3 |
|---|---|---|---|
| Panel | No | Yes | Yes |
| Frequency | Year | Month | Month |
| Rotation | | 0.1 (yr$^{-1}$) | 0.2 (mo$^{-1}$) |
| Sampling fraction stratum 1 | 0.03 | 0.02 | 0.01 |
| Sampling fraction stratum 2 | 0.06 | 0.04 | 0.05 |
| Sampling fraction stratum 3 | 0.10 | 0.08 | 0.60 |
| Sampling fraction stratum 4 | 0.15 | 0.12 | 0.80 |
| Sampling fraction stratum 5 | 0.30 | 0.25 | 1.00 |
The use of substrata and the effect of using different weights is not examined in this simulation. The existence of possible selectivity in the drawn samples is examined by computing the realised sampling fractions in every stratum, where the full population and the subpopulations of births and stratum movers are considered separately. The fractions and the corresponding margins of error at 95% confidence are computed by

$$f_{i,h} = \frac{\sum_t n_{i,h,t}}{\sum_t N_{i,h,t}} \quad \text{and} \quad m_{i,h} = 2\sqrt{\frac{f_h (1 - f_h)}{\sum_t N_{i,h,t}}} \quad \text{for } i \in \{\text{full}, \text{births}, \text{movers}\}.$$

Here $n_{i,h,t}$ is the number of units drawn in stratum $h$ at time $t$ for subpopulation $i$ with size $N_{i,h,t}$ and $f_h$ is the sampling fraction according to the design. The results for the three surveys are given in the tables 10-3, 10-4 and 10-5. All fractions turn out to be within the confidence intervals for the full stratum, the births and the stratum movers. From this we conclude that averaged over time no selectivity is introduced in the samples. Also over time there are no trends visible in the realised fractions, which is for example shown in Figure 10-1 for survey 1 and stratum 1. The spread of survey burden in $\mathcal{G}$ is investigated by computing the total survey burden for every unit, that is, the number of times a unit is drawn by one of the surveys in $\mathcal{G}$ during the whole simulation. Figure 10-2 shows the total survey burden per stratum for the units that exist during the whole
Table 10-3: Mean of realised sampling fractions in simulation for survey 1.

| Stratum $h$ | Fraction $f_h$ | Fraction $f_{\mathrm{full},h}$ (95% margin) | Fraction $f_{\mathrm{births},h}$ (95% margin) | Fraction $f_{\mathrm{movers},h}$ (95% margin) |
|---|---|---|---|---|
| 1 | 0.03 | 0.0301 (0.0003) | 0.0288 (0.0047) | 0.0317 (0.0124) |
| 2 | 0.06 | 0.0600 (0.0009) | 0.0488 (0.0168) | 0.0546 (0.0127) |
| 3 | 0.10 | 0.0999 (0.0023) | 0.0667 (0.0516) | 0.0909 (0.0355) |
| 4 | 0.15 | 0.1506 (0.0059) | 0.1250 (0.1785) | 0.1127 (0.0848) |
| 5 | 0.30 | 0.2989 (0.0152) | 0.1429 (0.3464) | 0.3846 (0.2542) |
Table 10-4: Mean of realised sampling fractions in simulation for survey 2.

| Stratum $h$ | Fraction $f_h$ | Fraction $f_{\mathrm{full},h}$ (95% margin) | Fraction $f_{\mathrm{births},h}$ (95% margin) | Fraction $f_{\mathrm{movers},h}$ (95% margin) |
|---|---|---|---|---|
| 1 | 0.02 | 0.0201 (0.0001) | 0.0198 (0.0011) | 0.0223 (0.0101) |
| 2 | 0.04 | 0.0399 (0.0002) | 0.0392 (0.0040) | 0.0366 (0.0105) |
| 3 | 0.08 | 0.0801 (0.0006) | 0.0721 (0.0136) | 0.0697 (0.0320) |
| 4 | 0.12 | 0.1200 (0.0016) | 0.1173 (0.0486) | 0.0986 (0.0771) |
| 5 | 0.25 | 0.2499 (0.0042) | 0.3214 (0.1157) | 0.4615 (0.2402) |
Table 10-5: Mean of realised sampling fractions in simulation for survey 3.

| Stratum $h$ | Fraction $f_h$ | Fraction $f_{\mathrm{full},h}$ (95% margin) | Fraction $f_{\mathrm{births},h}$ (95% margin) | Fraction $f_{\mathrm{movers},h}$ (95% margin) |
|---|---|---|---|---|
| 1 | 0.01 | 0.0100 (0.0000) | 0.0102 (0.0008) | 0.0092 (0.0072) |
| 2 | 0.05 | 0.0500 (0.0002) | 0.0521 (0.0045) | 0.0530 (0.0117) |
| 3 | 0.60 | 0.5999 (0.0011) | 0.5965 (0.0245) | 0.5958 (0.0578) |
| 4 | 0.80 | 0.8003 (0.0019) | 0.8268 (0.0598) | 0.8028 (0.0949) |
| 5 | 1.00 | 1.0000 (0.0000) | 1.0000 (0.0000) | 1.0000 (0.0000) |
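The realised fractions and margins of error reported in Tables 10-3 to 10-5 follow from pooled counts over the draws. The sketch below shows the computation, with the margin taken as twice the square root of the design fraction times its complement divided by the pooled subpopulation size; the function names and the counts are hypothetical, since the chapter does not report the underlying counts.

```python
from math import sqrt


def realised_fraction(drawn_per_period, size_per_period):
    """Pooled realised fraction: total drawn divided by total size."""
    return sum(drawn_per_period) / sum(size_per_period)


def margin_of_error(design_fraction, size_per_period):
    """Approximate 95% margin based on the design fraction f_h."""
    pooled_size = sum(size_per_period)
    return 2 * sqrt(design_fraction * (1 - design_fraction) / pooled_size)


# Hypothetical subpopulation: 7 births over two draws, 1 of them sampled,
# in a stratum with design fraction 0.30.
sizes = [3, 4]
drawn = [0, 1]
f_births = realised_fraction(drawn, sizes)
m_births = margin_of_error(0.30, sizes)
```

With these hypothetical counts the realised fraction rounds to 0.1429 and the margin to 0.3464, which happen to coincide with the entries for births in stratum 5 of Table 10-3. It illustrates how wide the margins become when a subpopulation contains only a handful of units.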
Figure 10-1: Realised sampling fractions in the simulation for survey 1 in stratum 1. The subscripts b and m stand for births and stratum movers.
simulation and are not stratum movers. Stratum 5 is not shown, because of the small number of units in this stratum. Due to the larger fractions in strata 3 and 4, the total survey burden in these strata is larger than in strata 1 and 2. In strata 3 and 4 the values are close to each other, implying that most of the units experience the same survey burden. In strata 1 and 2 the total survey burden is concentrated around two values. In both strata the maximal survey burden is 120, the expected length of stay in the panel of survey 2. It implies that in strata 1 and 2 units that are drawn in this panel
Figure 10-2: Total survey burden in $\mathcal{G}$ for units that exist during the whole simulation and are not stratum movers.
hardly end up in the other surveys. From this, it can be concluded that the total survey burden is evenly spread across the units in the population.

There are some other aspects of the spread of survey burden that could be examined. One could examine the lengths of stay in a panel in order to check that units do not stay too long in the panel given the rotation fraction. The spread of the burden over the surveys in $\mathcal{G}$ could be investigated by computing the length of the survey-free periods of the units in $U$, that is, the number of successive periods that a unit is not drawn for any of the surveys in $\mathcal{G}$. Another aspect is how often units are drawn for several surveys in $\mathcal{G}$ at the same time. Here the results are given for the survey-free periods in the simulation. Table 10-6 shows the lengths of the survey-free periods for every stratum. Note that in stratum 5 there are no survey-free periods, because of the fraction 1 for survey 3. In stratum 1 all units have long survey-free periods. In the other strata the minimal survey-free period is only 1 month, which is caused by the larger sampling fractions in combination with the panels.

Table 10-6: Lengths of survey-free periods (months) for units that are drawn at least twice in the simulation and are not stratum movers.

| Stratum $h$ | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| 1 | 167 | 180 | 190 | 189.5 | 200 | 211 |
| 2 | 1 | 24 | 68 | 53.1 | 75 | 236 |
| 3 | 1 | 2 | 2 | 2.6 | 3 | 52 |
| 4 | 1 | 1 | 1 | 1.0 | 1 | 6 |
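The lengths of survey-free periods summarised in Table 10-6 could be derived from the draw history of a unit along the following lines. This is a sketch under an assumed definition: a survey-free period is counted as the number of months strictly between two successive draws, and gaps of length zero are not counted as periods. The function name and the example history are hypothetical.

```python
def survey_free_periods(draw_months):
    """Lengths of the survey-free periods between successive draws.

    `draw_months` lists the months (as integers) in which the unit was
    drawn for at least one survey in the group.
    """
    months = sorted(set(draw_months))
    gaps = [later - earlier - 1 for earlier, later in zip(months, months[1:])]
    # Consecutive draws leave no survey-free month and are skipped.
    return [gap for gap in gaps if gap > 0]


# Hypothetical unit drawn in months 3, 4, 5, 30, 31 and 200.
periods = survey_free_periods([3, 4, 5, 30, 31, 200])
```

For this unit the sketch returns two survey-free periods, of 24 and 168 months.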
4. Conclusions and future work This chapter discusses the methodology of the current coordinated sampling system for business surveys at Statistics Netherlands. The sampling system coordinates the sampling in a given group of surveys, where both stratified cross-sectional and rotating panel designs can be combined. The system also takes into account population dynamics. For stratum movers it allows a choice whether or not to take account of the cumulative survey burden and whether it is more important to account for the cumulative survey burden or the panel membership of the stratum mover. By coordination of the sampling the total survey burden is spread as evenly as possible over the enterprises. The spread of survey burden is
optimal when the surveys in the group use the same basic stratification. Cross-sectional surveys can depart from the basic stratification by defining substrata with different sampling fractions. For panels at most three substrata can be defined in every basic stratum, one with an arbitrary sampling fraction and the others with fractions 0 and 1. The extension to a general substratification for panels is one of the future challenges, but the restricted form of substratification will often suffice in practice. By a simulation study it is shown that sampling in the group can be coordinated without introducing selectivity in the drawn samples. By examining the total survey burden of the enterprises in the simulation, it is shown that the survey burden is evenly spread over the enterprises. Also some other aspects of the spread of survey burden are examined, such as the lengths of stay in the panel and the lengths of survey-free periods during the simulation. The main purpose of the further development of the sampling system is to support the sampling of more business surveys. Therefore, it has to be investigated whether sampling coordination can be applied to other sampling designs, like cluster sampling and probability proportional to size sampling, and under what conditions surveys with different sampling designs can be combined.
References

van Huis, L.T., Koeijers, C.A.J. and de Ree, S.J.M. (1994). EDS, sampling system for the central business register at Statistics Netherlands. Internal report, Department of Statistical Methods, Statistics Netherlands. Available at https://statswiki.unece.org/download/attachments/117774189/EDS%20EN.PDF.

Ohlsson, E. (1995). Coordination of samples using permanent random numbers. In Cox, B.G., Binder, D.A., Chinnappa, B.N., Christianson, A., Colledge, M.J. and Kott, P.S. (eds.), Business Survey Methods, pp. 153-169. New York: Wiley.

Nedyalkova, D., Qualité, L. and Tillé, Y. (2009). General framework for the rotation of units in repeated survey sampling. Statistica Neerlandica 63, 269-293.
CHAPTER 11

SAMPLE COORDINATION AND RESPONSE BURDEN FOR BUSINESS SURVEYS: METHODOLOGY AND PRACTICE OF THE PROCEDURE IMPLEMENTED AT INSEE

EMMANUEL GROS¹ AND RONAN LE GLEUT¹
Abstract

This chapter gives an overview of the method presently used at INSEE, the National Statistical Institute of France, for coordinated sampling in business surveys. This method is based on the use of a coordination function which transforms Permanent Random Numbers and has the property of preserving uniform probability. This function changes with each selection, depending on the desired type of coordination: negative coordination to foster the selection of units that have not already been selected in recent surveys, or positive coordination to maximize the overlap between coordinated samples. This method takes into account the cumulative response burden over several samples, and can be used with surveys based on different kinds of units with stratified simple random sampling.
¹ Institut national de la statistique et des études économiques, 88 avenue Verdier, 92120 Montrouge, FRANCE. Email: [email protected], [email protected].

1. Introduction

Each year, the official statistical system carries out a significant number of business and establishment surveys. In Scholtus et al. (2014), the authors explain that the “main objectives of sample coordination are to obtain comparable and coherent statistics, high precision in estimates of change
over time and to spread the response burden evenly among the businesses”. The objective of the negative coordination of samples is to foster, when selecting a sample, the selection of businesses that have not already been selected or have been selected as few times as possible in recent surveys, while preserving the unbiasedness of the samples. This coordination helps to reduce the statistical burden on small businesses by spreading the response burden over different units; large businesses, above a certain threshold, are systematically surveyed in most surveys. On the other hand, positive coordination aims at maximizing the overlap between the samples, either to obtain a panel sample or once again with the aim of reducing the statistical burden, in this case by reducing the number of questions in questionnaires through using information already collected in previous surveys.

We present here the sampling coordination method used at the National Statistical Institute of France (INSEE) since the end of 2013. This method belongs to the family of sample coordination procedures based on Permanent Random Numbers (PRN, see for instance Scholtus et al., 2014), and is based on the notion of coordination functions (Hesse, 2001). These functions, defined for each unit and each new draw taking into account the past response burden of each unit, transform permanent random numbers so as to meet the objective of negative or positive coordination. After a review of the main principles of the method limited to the case of stratified simple random sampling (section 2), we present the results of simulation studies assessing the properties of this coordination method (subsection 3.1). Then, we focus on how the method allows the coordination of samples relating to surveys based on different kinds of units, for example legal units and local units (subsection 3.2).
Finally, in section 4, we address two potential drawbacks of this coordination procedure – feedback bias and incompatibility with systematic sampling – and present options chosen on both issues.
2. The procedure of sampling coordination for business surveys implemented at INSEE

We present here the main principles of the method, detailed in Guggemos and Sautory (2012), limited to the case of stratified simple random sampling, which is the sampling design most frequently used at INSEE for business surveys. This method was first proposed by Hesse (2001).
2.1 Coordination functions and sample selection

The concept of a coordination function plays an essential role in the method. A coordination function $g$ is a measurable function from $[0,1]$ onto itself, which preserves uniform probability: if $P$ is the uniform probability on $[0,1]$, then the image probability $P^g$ is equal to $P$. It means that for any interval $I = [a, b]$ included in $[0,1]$:

$$P^g(I) \overset{\mathrm{def}}{=} P\left[g^{-1}(I)\right] = P(I) = b - a$$
The length of the inverse image of any interval under $g$ equals the length of this interval: a coordination function preserves the length of intervals (or unions of intervals) by inverse image.

Each unit $k$ of the population is given a permanent random number $\omega_k$, drawn according to the uniform distribution on the interval $[0,1]$. The draws of the $\omega_k$ are mutually independent. We consider a sequence of surveys $t = 1, 2, \ldots$ ($t$ refers to the date and the number of the survey), and we denote by $S_t$ the sample corresponding to survey $t$. Suppose that one has defined for each unit $k$ a “wisely chosen” coordination function $g_{k,t}$ which changes at each survey $t$.

Drawing of a sample

The drawing of the sample $S_t$ by stratified simple random sampling is done by selecting, within each stratum $(h,t)$ of size $N_{h,t}$, the $n_{h,t}$ units associated with the $n_{h,t}$ smallest transformed numbers $g_{k,t}(\omega_k)$, $k = 1, \ldots, N_{h,t}$.
Proof

The $N_{h,t}$ random numbers $\omega_k$ associated to the $N_{h,t}$ units of the stratum have been independently selected according to the uniform probability on $[0,1]$, denoted $P$. Since we have $P^{g_{k,t}} = P$ for each $k$, the $N_{h,t}$ numbers $g_{k,t}(\omega_k)$ are also independently selected according to $P$. Then, using a well-known result, see for instance Tillé (2006, p. 50), the $n_{h,t}$ smallest values $g_{k,t}(\omega_k)$ give a simple random sample of size $n_{h,t}$ in the stratum.
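The selection rule of this subsection can be illustrated with a minimal Python sketch. It is not INSEE's implementation: the function names `shift_coordination` and `draw_stratum_sample` are hypothetical, and the coordination function used here, a circular shift of the unit interval, is only an assumed toy example of a map that preserves the uniform probability on [0,1].

```python
import numpy as np

def shift_coordination(a):
    """Toy coordination function (assumption): circular shift of [0, 1)."""
    return lambda omega: (omega + a) % 1.0

def draw_stratum_sample(prn, g_funcs, n):
    """Return the indices of the n units with the smallest transformed numbers."""
    transformed = np.array([g(w) for g, w in zip(g_funcs, prn)])
    return np.argsort(transformed, kind="stable")[:n]

# Hypothetical stratum of 4 units with fixed permanent random numbers.
prn = np.array([0.9, 0.1, 0.5, 0.3])

# One coordination function per unit, as in the method described above.
identity = [shift_coordination(0.0)] * len(prn)
shifted = [shift_coordination(0.5)] * len(prn)

sample_identity = draw_stratum_sample(prn, identity, n=2)
sample_shifted = draw_stratum_sample(prn, shifted, n=2)
```

With the identity, the two units holding the smallest permanent random numbers are selected; with the shift, the ordering of the transformed numbers changes and so does the sample, while each transformed number remains uniformly distributed.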
2.2 Construction of a coordination function from the cumulative response burden

For the drawing of a given sample of a survey $t$, the coordination function $g_{k,t}$ takes into account the cumulative response burden $\Gamma_{k,t-1}$ of a unit $k$ to meet the objective of negative or positive coordination. For negative coordination, $g_{k,t}$ is defined so that the higher the cumulative response burden of a unit $k$, the higher the number $g_{k,t}(\omega_k)$. Indeed, a sample is drawn by selecting the units with the smallest numbers $g_{k,t}(\omega_k)$, i.e. the ones with the smallest cumulative response burden. For positive coordination, we attribute a negative response burden in order to get a small transformed number $g_{k,t}(\omega_k)$.

The coordination functions of previous surveys $g_{k,1}, \ldots, g_{k,t-1}$ form the basis for calculating the cumulative response burden $\Gamma_{k,t-1}$. For each previous survey, we calculate the response burden of a surveyed business $k$, $\gamma_{k,1}, \ldots, \gamma_{k,t-1}$, that depends on the coordination functions:

$$\begin{array}{c} \gamma_{k,1}\big(g_{k,1}(\omega_k)\big) \\ \vdots \\ \gamma_{k,t-1}\big(g_{k,t-1}(\omega_k)\big) \end{array}$$
Then, we define the cumulative burden for a unit $k$ up to survey $t-1$, $\Gamma_{k,t-1}$, which allows us to determine the coordination function for the current sample drawing:

$$\Gamma_{k,t-1} = \sum_{u \le t-1} \gamma_{k,u}\big(g_{k,u}(\omega_k)\big) \;\longrightarrow\; g_{k,t}(\omega_k)$$
For more details on the formulas and the way to define the different functions (coordination function, cumulative response burden function, etc.), see Guggemos and Sautory (2012).
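To make the idea concrete, here is a simplified Python sketch of how a coordination function could be derived from a cumulative response burden. It rests on assumptions made for illustration only: the cumulative burden of a unit is taken to be a step function on `L` equal-length sub-intervals of [0,1], and the coordination function permutes these sub-intervals so that those with the lowest burden are sent to the left. The name `make_coordination_function` and the numerical values are hypothetical; the actual construction is the one described in Guggemos and Sautory (2012).

```python
import numpy as np

def make_coordination_function(burden_by_interval):
    """Build g from a step-function cumulative burden (simplifying assumption).

    burden_by_interval[l] is the cumulative burden the unit would carry if its
    permanent random number fell in [l/L, (l+1)/L).
    """
    burden = np.asarray(burden_by_interval, dtype=float)
    L = len(burden)
    order = np.argsort(burden, kind="stable")   # sub-intervals by ascending burden
    rank = np.empty(L, dtype=int)
    rank[order] = np.arange(L)                  # destination slot of each sub-interval

    def g(omega):
        l = min(int(omega * L), L - 1)          # sub-interval containing omega
        offset = omega * L - l                  # position inside that sub-interval
        return (rank[l] + offset) / L           # pure translation: lengths are preserved
    return g

# Hypothetical unit: it carries a burden of 1 whenever omega falls in [0, 0.25).
g = make_coordination_function([1.0, 0.0, 0.0, 0.0])
burdened = g(0.10)    # omega in the burdened sub-interval is pushed to the right
unburdened = g(0.30)  # omega in an unburdened sub-interval is pulled to the left
```

Because each sub-interval is only translated, the function preserves the uniform probability, which is the defining property of a coordination function. Assigning a negative burden to a sub-interval would send it to the far left, which is how positive coordination fits into the same sketch.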
3. Assessment of the coordination procedure and coordination between surveys based on different kinds of units

A first empirical assessment of the procedure was made by Guggemos and Sautory (2012) on simulated data. The results of these simulations were very satisfactory: the coordination method proved to be both highly efficient (compared to independent draws) – in terms of response burden allocation over the population units – and remarkably robust with regard to the parameters of the different sampling plans – sampling rates, differences of stratification between the surveys, overlapping of the survey scopes, response burden assigned to each survey, etc.

This first study was supplemented by Kevin Rosamont-Prombo during an internship at INSEE in 2012. Firstly, additional and more complete tests on simulated data were conducted, and their results confirmed those obtained previously. Secondly, a first test on real data was performed, based on the Information and Communication Technologies (ICT) surveys from 2008 to 2012. The results were once again satisfactory in terms of response burden allocation over the population units. They also showed that the coordination procedure could be used in concert with the method used at INSEE for the management of rotating samples in business surveys (see for instance Demoly et al., 2014).

The simulations and results that follow enrich and complement the earlier work in two ways:
- A full-scale test of the procedure on real data in order to assess its operational feasibility and its performance in a production situation.
- A test of a “multilevel” coordination procedure allowing sample coordination between surveys based on different kinds of units (legal units and establishments, for example).
3.1 Full-scale test on real data: coordinated drawing of 20 legal unit samples

To assess the operational feasibility of the coordination procedure, as well as its properties in terms of response burden allocation over the population units in a production situation, we conducted a full-scale simulation study on real data. The simulations consisted of draws of 20 legal unit samples, which corresponds approximately to the number of samples to be drawn in two years:
- starting with the 2008 Annual Sectoral Survey (ESA), which thus constitutes the survey initiating the sequence of coordinated draws in our simulations;
- then performing, in chronological order, the draws of the 19 other legal unit samples (see the list below) following the production scheme in these two years:
  o respecting the sampling designs used during the actual drawings of these surveys: stratification criteria, allocations, positive coordination of the sample of the retail outlet survey with the ESA 2009 sample, etc.
  o each sample being negatively coordinated with all of the previous ones.

The 19 legal unit samples used in this simulation are the following: Information and Communication Technologies 2010, 2011 and 2012, Housing Maintenance and Improvement Work Index 2010 and 2011, Labour Force Activity and Employment Conditions surveys on enterprises with fewer than 10 persons employed 2010, 2011 and 2012, Annual Sectoral Surveys 2009, 2010 and 2011, Retail Outlets 2010, New Enterprises Information System 2010, Continuing Vocational Training Survey 2011, Community Innovation Survey 2010, Companies and Sustainable Development Survey 2011, survey on the energy quality for the construction companies 2012, survey on “Global Value Chains” 2012, survey on ICT usage in enterprises with fewer than 10 persons employed 2012.
A sequence of 20 independent draws was also carried out, for comparison in assessing the efficiency of the coordination process in terms of response burden allocation over the population units. From an operational standpoint (using a standard PC with an Intel Celeron G1610 2.60 GHz processor and 4 GB of RAM), no problems with the coordination procedure emerged:
- Computation time remained reasonable: about 8 hours for the complete sequence of 20 coordinated draws.
- The same goes for storage requirements: all the permutations required for defining the coordination functions involved in the drawings took up only 6 GB.

In terms of efficiency of the sampling coordination method, there is, as expected, a far better response burden allocation over the population units when the draws are negatively coordinated. Table 11-1 shows the distribution of the population units depending on the number of samples in which they have been selected – that is, the distribution over the population of the cumulative response burden, defined by the variable “number of selections”. As the coordination does not affect the take-all strata, they were excluded from the calculations of cumulative response burden, in order to be able to assess the quality of the procedure on its real scope.

Table 11-1: Allocation of the cumulative response burden, except take-all strata, according to the sampling scheme.

Cumulative response burden,   Frequency according to the sampling scheme   Difference between coordinated
except take-all strata        Independent draws     Coordinated draws      and independent draws
0                             3,981,423             3,952,718              -28,705
1                               391,840               445,402               53,562
2                                30,494                 9,084              -21,410
3                                 3,670                   606               -3,064
4                                   374                     9                 -365
5                                    18                     0                  -18
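The figures of Table 11-1 can be cross-checked with a few lines of Python. The sketch below only re-uses the published frequencies; the helper names `total_selections` and `multi_selected` are introduced here for illustration.

```python
# Frequencies from Table 11-1 (units outside take-all strata).
selections = [0, 1, 2, 3, 4, 5]
independent = [3_981_423, 391_840, 30_494, 3_670, 374, 18]
coordinated = [3_952_718, 445_402, 9_084, 606, 9, 0]

def total_selections(freq):
    """Total number of unit selections over the whole sequence of draws."""
    return sum(s * f for s, f in zip(selections, freq))

def multi_selected(freq):
    """Number of units selected in two samples or more."""
    return sum(f for s, f in zip(selections, freq) if s >= 2)

differences = [c - i for c, i in zip(coordinated, independent)]

print(sum(independent), sum(coordinated))                            # population sizes
print(total_selections(independent), total_selections(coordinated))  # total burden
print(multi_selected(independent), multi_selected(coordinated))      # multi-selected units
```

Both schemes cover the same 4,407,819 units and the same 465,424 selections in total: coordination does not change the overall burden, it redistributes it, bringing the number of units selected at least twice from 34,556 down to 9,699.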
As expected, and in line with the results obtained in previous tests on simulated data and on real data based only on ICT surveys, the distribution narrows around 1, that is, the response burden is spread more evenly: the number of units selected in more than one sample decreases substantially, as does the number of non-sampled units, in favour of a marked increase in the number of units selected in a single sample. Furthermore, it is important to note that this coordination method also allows for positive coordination between surveys: to do this, one has to assign a negative response burden to the survey(s) with which one aims to positively coordinate the survey. In our simulations, we have also assigned
a negative response burden to the sample of ESA 2009 when drawing the sample of the Retail Outlets survey (positive coordination). The results in terms of overlap between the two samples are satisfactory, slightly higher than those observed with the coordination method previously used by INSEE, based on another technique (Cotton and Hesse, 1992; Demoly et al., 2014). Indeed, among the 7,463 units of the ESA 2009 sample that belong to the scope of the Retail Outlets survey, 3,020 are also included in the sample of this second survey with the new coordination procedure, versus 2,885 with the previous one.

Finally, in Table 11-1 the fact that some units are selected in more than one sample is mainly explained by:
- The positive coordination of the sample of the Retail Outlets survey with the sample of the ESA 2009.
- The existence of strata with high sampling rates in some surveys.

Thus, of the 9,084 units present in two samples in the sequence of coordinated draws, 2,909 are due to the positive coordination mentioned above. Of the 6,175 remaining units, 50% belong, in one of the two samples in which they are selected, to strata with a sampling rate greater than 50%, and 45% belong to strata with a sampling rate between 20% and 50%. The last 5% show a limitation of the method, which does not permit a total disjunction between the samples but remains random and unbiased. The same analysis could be done for the units present in more than two samples: positive coordination explains 111 of the 615 units that belong to three samples or more, and high sampling rates explain most of the remaining overlap between the samples.
3.2 Sample coordination between surveys based on different kinds of units

The method allows the coordination of samples relating to surveys based on different kinds of units, for example legal units and local units. This “multi-level” coordination is performed using the following procedure:
- We first define a permanent link between the legal unit and one of its local units – the head office of the legal unit at the time of its creation – and assign to this “principal local unit” the same permanent random number as the legal unit – the PRN of other local units being drawn according to the uniform distribution on the interval [0,1]. Thus, we obtain for each level a set of permanent random numbers following a uniform distribution on [0,1], with a one-to-one link [legal unit ↔ principal local unit] between these two sets (see for instance Scholtus et al., 2014).
- Then, each level is subjected to its own coordination system. This implies in particular the management of coordination functions specific to each level, the coordination between legal unit samples and local unit samples taking place exclusively through the [legal unit ↔ principal local unit] link as follows:
  o When drawing a sample of legal units, coordination with samples of local units is performed by taking the response burden of a legal unit as the cumulative response burden of its principal local unit;
  o Reciprocally, when drawing a sample of local units, coordination with samples of legal units is performed by taking the response burden of principal local units as the cumulative response burden of their legal unit.

It is important to note that we cannot, in the first case, take into account for a legal unit the cumulative response burden of all of its local units, or in the second case divide the cumulative response burden of a legal unit across its local units. Indeed, for a given unit, our procedure takes into account, in the coordination process, not just a cumulative response burden value, but the cumulative response burden function of the unit. And this function makes sense, in terms of coordination, only if it is applied to the permanent random number of its unit. As our multi-level coordination procedure imposes a one-to-one link [legal unit ↔ principal local unit], the cumulative response burden function of a legal unit is only relevant, when drawing a sample of local units, if applied to its principal local unit, which is the only one to have the same PRN as the legal unit. Conversely, the cumulative response burden function of the principal local unit is the only one that can be used when drawing a sample of legal units.
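The assignment of permanent random numbers across the two levels can be sketched in Python as follows. This is an illustrative sketch under assumptions, not the production code: the function `assign_prns`, the identifiers and the dictionary-based data layout are all hypothetical.

```python
import random

def assign_prns(legal_units, local_units, principal_of, seed=2013):
    """Assign PRNs so that each principal local unit shares its legal unit's PRN.

    principal_of maps a legal unit id to the id of its principal local unit.
    """
    rng = random.Random(seed)
    legal_prn = {lu: rng.random() for lu in legal_units}
    principal_ids = set(principal_of.values())
    local_prn = {}
    for legal_id, local_id in principal_of.items():
        local_prn[local_id] = legal_prn[legal_id]   # shared PRN: the one-to-one link
    for loc in local_units:
        if loc not in principal_ids:
            local_prn[loc] = rng.random()           # independent uniform draw
    return legal_prn, local_prn

# Hypothetical register extract: two legal units, three local units.
legal_prn, local_prn = assign_prns(
    legal_units=["A", "B"],
    local_units=["A1", "A2", "B1"],
    principal_of={"A": "A1", "B": "B1"},
)
```

Only the principal local units `A1` and `B1` share a PRN with their legal unit, which is why the cumulative response burden function of a legal unit can only be transferred to that one local unit.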
This constraint of a one-to-one link points to a weakness of our method, for instance when one wants to select a sample through stratified two-stage cluster sampling (see Gros and Le Gleut, 2018, Chapter 8 in this volume, for an application case). Indeed, the procedure was designed for stratified simple random sampling and not for other sampling designs. Another limitation is that if the principal local unit ceases to operate or no longer belongs to the legal unit, the multi-level coordination can no longer be performed. Indeed, the PRN of the legal unit remains the same and must be identical for the legal unit and its principal local unit. In this case, it is not possible to change the principal local unit of a legal unit. It is
important to note that the changes in the structure of a legal unit (restructuring or reorganization of its local units, etc.) are fed into the register thanks to administrative data.

We have assessed the efficiency of this multi-level coordination procedure by incorporating in our simulations 8 local unit surveys in addition to the 20 legal unit surveys previously mentioned. The 8 local unit samples added for this simulation are the following: Labour Cost and Structure of Earnings Annual Survey 2010, 2011, 2012 and 2013, REPONSE survey 2011 (survey on professional relationships and business negotiations), survey on services energy consumption 2012, survey on working conditions 2012, survey of the production of non-hazardous waste 2013.

Three different sampling schemes were used:
- Independent draws of the 28 samples, respecting the sampling designs used during the actual draws.
- Coordinated draws of the 20 legal unit samples on the one hand, and of the 8 local unit samples on the other, but without multi-level coordination.
- Coordinated draws of the 28 samples via the multi-level coordination procedure described above.

We then compared the results of these three strategies in terms of the distribution of the legal units’ response burden, the response burden of the principal local units being taken into account in the cumulative response burden of their legal units. The results in Table 11-2 are in line with our expectations and consistent with the previous ones: compared to independent draws, the strategy of separate coordinated draws leads to a far better response burden allocation over the population units, and this phenomenon is further strengthened when a multi-level coordination procedure is performed.
4. A study of two methodological issues

To conclude this review of the properties of the coordination method, we addressed two methodological issues: the problem of “feedback bias” on the one hand, and the issue of systematic sampling of a sorted file on the other hand.
Table 11-2: Allocation of the cumulative response burden of legal units, except take-all strata, according to the sampling scheme.

Cumulative response     Frequency according to the sampling scheme               Differences between coordinated and independent draws
burden, except          Indep.       “Level by level”    “Multilevel”            Indep. versus         Indep. versus
take-all strata         draws        coord. draws        coord. draws            “level by level”      “multilevel”
0                       4,670,676    4,651,954           4,634,250               -18,722               -36,426
1                         410,016      439,355             474,286                29,339                64,270
2                          40,095       34,824              18,230                -5,271               -21,865
3                           8,072        4,679               4,125                -3,393                -3,947
4                           2,142          813                 737                -1,329                -1,405
5                             578           93                  92                  -485                  -486
6                             121            5                   2                  -116                  -119
7                              20            0                   1                   -20                   -19
8                               3            0                   0                    -3                    -3
4.1 Feedback bias

Feedback bias is a well-known problem which appears in the context of sampling coordination: if we update the sampling frame from a sample A, and then draw in this sampling frame another sample B coordinated with the sample A, this may lead to bias in the results for survey B (see for instance Hesse, 1999; Scholtus et al., 2014). This phenomenon is particularly problematic in the context of a coordination system for business surveys used in production. Indeed, the sampling frames of the majority of business surveys conducted by INSEE are derived from the business register SIRUS, which is regularly updated from the results of different surveys. For example, dead units identified thanks to surveys are deleted from the business register. Another example is the sectoral classification of units in SIRUS – a classification which constitutes a stratification variable in almost all business surveys – which is updated each year based on the results of the Annual Sectoral Survey. Therefore, the establishment of a coordination system for business surveys that can be used in production requires:
- either to prohibit feedback from surveys to the business register, which seems unrealistic because it would mean forgoing the use of all available information;
- or to exclude from the global coordination system the Annual Sectoral Survey (ESA), which is the most important survey used to update the business register. However, insofar as the ESA is the largest business survey in terms of sample size, representing a high response burden, this solution would not be completely satisfactory;
- or to ignore the problem of feedback bias, assuming that it is low enough to be negligible compared to the disadvantages of the two alternatives outlined above.

In order to guide us in choosing between the last two options, we conducted a simulation study to quantify the magnitude of the feedback bias, using data from the SBS production system ESANE (for the French “Élaboration des Statistiques Annuelles d’Entreprise”). More specifically, we performed, on the wholesale trade sector, a sequence of 5,000 independent draws of the Annual Sectoral Surveys between 2008 and 2011, “ESA 2008 → ESA 2009 → ESA 2010 → ESA 2011”, and another sequence of 5,000 draws of these same four surveys with negative coordination. Then, for each strategy, we computed the relative bias for estimators by sector and by size group using tax data available for all units.

Over this period 2008-2011, the main activity of the businesses is updated every year for about 3% of the units in the register: 1% are new companies in the wholesale trade sector, 1.3% are no longer in this sector and 0.7% change their main activity but stay in the wholesale trade sector. It is important to note that most of the changes in the register come from this survey.

Table 11-3 shows the distribution of these relative biases for sector-based estimates concerning the main variables of the ESANE system.
As we can see, carrying out coordinated draws did not appear to induce significant and systematic bias in the estimates compared with a strategy of independent draws, and the magnitude of the feedback bias seems to be small enough to be negligible.

Table 11-3: Mean and distribution of relative bias for sector-based estimates concerning the main variables of the ESANE system in 2011, according to the sampling scheme.
Independent draws No. of Variable Turn-over enterprises Mean 0.0% 0.0% Max 0.3% 0.1% P99 0.3% 0.1% P95 0.1% 0.1% P90 0.1% 0.1% P75 0.0% 0.0% Median 0.0% 0.0% P25 0.0% 0.0% P10 -0.1% -0.1% P5 -0.1% -0.1% P1 -0.2% -0.6% Min -0.2% -0.6% Coordinated draws No. of Variable Turn-over enterprises Mean 0.0% 0.0% Max 0.1% 0.2% P99 0.1% 0.2% P95 0.1% 0.2% P90 0.1% 0.1% P75 0.0% 0.1% Median 0.0% 0.0% P25 0.0% 0.0% P10 -0.1% -0.1% P5 -0.2% -0.1% P1 -0.2% -0.1% Min -0.2% -0.1% Total purchases 0.0% 0.2% 0.2% 0.2% 0.1% 0.1% 0.0% 0.0% -0.1% -0.1% -0.2% -0.2%
Total purchases 0.0% 0.1% 0.1% 0.1% 0.1% 0.0% 0.0% 0.0% -0.1% -0.2% -0.6% -0.6%
0.0% 0.3% 0.3% 0.2% 0.1% 0.0% 0.0% 0.0% 0.0% -0.1% -0.1% -0.1%
Salary
0.0% 0.2% 0.2% 0.1% 0.1% 0.0% 0.0% 0.0% -0.1% -0.1% -0.4% -0.4%
Salary
Value added 0.0% 0.3% 0.3% 0.2% 0.1% 0.0% 0.0% 0.0% -0.1% -0.1% -0.1% -0.1%
Value added 0.0% 0.2% 0.2% 0.1% 0.1% 0.0% 0.0% 0.0% -0.1% -0.1% -0.4% -0.4% Gross op. Accounting Total Total assets profit result liabilities 0.1% 0.0% 0.0% 0.0% 2.3% 2.7% 0.4% 0.8% 2.3% 2.7% 0.4% 0.8% 1.2% 1.2% 0.2% 0.2% 0.3% 0.6% 0.1% 0.2% 0.1% 0.1% 0.0% 0.1% 0.0% 0.0% 0.0% 0.0% -0.1% -0.1% 0.0% 0.0% -0.2% -0.3% -0.1% -0.1% -0.4% -0.7% -0.1% -0.1% -2.4% -3.0% -0.5% -0.6% -2.4% -3.0% -0.5% -0.6%
Gross op. Accounting Total Total assets profit result liabilities 0.0% -0.1% 0.0% 0.0% 6.8% 4.1% 0.9% 1.0% 6.8% 4.1% 0.9% 1.0% 1.0% 2.4% 0.1% 0.1% 0.2% 0.2% 0.1% 0.1% 0.1% 0.1% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% -0.1% -0.1% 0.0% 0.0% -0.2% -0.6% -0.1% -0.1% -1.1% -1.9% -0.3% -0.2% -11.8% -6.7% -1.0% -1.1% -11.8% -6.7% -1.0% -1.1%
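The notion of relative bias summarised in Table 11-3 can be illustrated with a short Monte Carlo sketch in Python. The chapter does not spell out the formula, so the sketch assumes the usual definition (mean of the replicated estimates minus the true value, divided by the true value) and an expansion estimator under simple random sampling; the population is synthetic and all names are hypothetical.

```python
import numpy as np

def relative_bias(estimates, true_value):
    """Relative bias of an estimator from replicated estimates (assumed definition)."""
    return (np.mean(estimates) - true_value) / true_value

rng = np.random.default_rng(0)

# Synthetic population standing in for a variable known for all units
# (the role played by tax data in the study).
y = rng.lognormal(mean=3.0, sigma=1.0, size=1000)
N, n, replicates = len(y), 100, 2000
true_total = y.sum()

# Expansion estimator of the total under simple random sampling without replacement.
estimates = np.array([
    N / n * y[rng.choice(N, size=n, replace=False)].sum()
    for _ in range(replicates)
])
rb = relative_bias(estimates, true_total)
```

For an unbiased design such as this one, the relative bias shrinks towards zero as the number of replicates grows; the comparison in Table 11-3 asks whether coordinated draws, combined with register updates, move it away from zero.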
4.2 Systematic sampling and coordination

Samples of business surveys are almost always drawn according to stratified sampling designs, with equal probabilities within each stratum. Moreover, the drawing of the units within each stratum is frequently done by systematic sampling after sorting units within each stratum according to a given criterion (see for instance Cochran, 1977, p. 205). This sampling procedure – which provides, within each stratum, a distribution of sampled units close to that observed in the sampling frame for the sort criterion – is unfortunately totally incompatible with a coordination procedure based on permanent random numbers. However, as systematic sampling on a sorted file is equivalent to an implicit stratified sampling with proportional allocation, it is possible in coordinated sampling to take into account the criterion previously controlled by the systematic sampling as follows:
1. We first redefine the sorting variable as an additional stratification variable in order to define selection strata (sub-stratification).
2. We then apply the sampling rates computed on the initial stratification to the selection strata in order to define stratum allocations.
3. Finally, we merge, if needed, some of the selection strata in order to avoid strata with an allocation equal to zero.

This procedure leads to a large increase in the number of selection strata, which could affect the quality of the coordination. In order to assess the impact of this “over-stratification” procedure, we performed a simulation study, comparing three sampling schemes: systematic sampling, coordinated draws without over-stratification and “systematic coordinated draws”, that is coordinated draws with over-stratification. Results in Table 11-4 show that increasing the number of strata results in a small deterioration in the quality of coordination.
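The three over-stratification steps can be sketched in Python for a single initial stratum. The sketch is illustrative only: the function name `over_stratify`, the rounding of allocations with Python's built-in `round` and the rule of merging consecutive classes of the sort variable are assumptions, since the chapter does not specify these details.

```python
def over_stratify(substratum_sizes, n_stratum):
    """Allocate a stratum sample across selection strata built from the sort variable.

    Returns a list of (size, allocation) pairs; consecutive classes are merged
    until the rounded allocation is at least 1 (assumed merging rule).
    """
    rate = n_stratum / sum(substratum_sizes)     # step 2: rate of the initial stratum
    selection_strata = []
    pending = 0
    for size in substratum_sizes:                # step 1: one class per sort value
        pending += size
        allocation = round(rate * pending)
        if allocation >= 1:                      # step 3: keep merging while it is zero
            selection_strata.append((pending, allocation))
            pending = 0
    if pending and selection_strata:             # leftover units join the last stratum
        size, _ = selection_strata.pop()
        size += pending
        selection_strata.append((size, max(1, round(rate * size))))
    elif pending:
        selection_strata.append((pending, max(1, round(rate * pending))))
    return selection_strata

# Hypothetical stratum of 100 units, sample of 10, four classes of the sort variable.
result = over_stratify([40, 3, 2, 55], n_stratum=10)
```

In this example the two small classes cannot support an allocation on their own and are merged with the following class. Note that rounding does not guarantee that the allocations always add up to the initial stratum allocation; a production implementation would need an explicit adjustment rule.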
Table 11-4: Allocation of the cumulative response burden, except take-all strata, according to the sampling scheme.

Cumulative response     Frequency according to the sampling scheme                       Differences between coordinated and independent draws
burden, except          Independent          “Simple”             “Systematic”           Independent           Independent
take-all strata         systematic draws     coordinated draws    coordinated draws      versus “simple”       versus “systematic”
0                       630,452              627,016              626,896                -3,436                -3,556
1                        37,029               43,703               43,784                 6,674                 6,755
2                         3,258                  213                  251                -3,045                -3,007
3                           188                    1                    2                  -187                  -186
4                             6                    0                    0                    -6                    -6

5. Conclusion

The sampling coordination method presented in this chapter is shown, via four simulation studies conducted on real data, to be comprehensive and efficient. The new procedure allows both negative and positive coordination, and permits coordination of a sample with any number of previous surveys while differentiating the response burden assigned to each survey. Moreover, coordination of surveys based on different unit types is possible, even if the method presents weaknesses in this context. Indeed, this procedure does not allow the response burden to be divided or cumulated between a legal unit and its local units, and was designed to select samples only through stratified simple random sampling. Furthermore, if the principal local unit of a legal unit ceases to operate or no longer belongs to the legal unit, the multi-level coordination can no longer be performed. In addition, in order to remain random and unbiased, it does not permit a total disjunction between the coordinated samples.

Further simulations on simulated as well as real data (Guggemos and Sautory, 2014; Levieil, 2015) highlight the efficiency of the method, providing important gains in terms of response burden allocation over the population units even with surveys based on different kinds of units. However, a limitation of this study is that the efficiency of the method is only tested relative to independent random draws, rather than to the previous INSEE method (Cotton and Hesse, 1992; Demoly et al., 2014). Some results presented in Levieil (2015) nevertheless indicate that, after one year of use, the new procedure performs slightly better than the method previously used by INSEE, while allowing coordination of a sample with any number of previous surveys (which was not the case for the previous method), but these simulations would need to be extended.

This coordination procedure is also robust with respect to sampling design parameters (Guggemos et al., 2012). Concerning the feedback bias, the simulations showed that the method does not induce significant bias in the estimates when the sampling frame is updated from the results of different coordinated surveys. Finally, increasing the number of selection strata to replace systematic sampling results in a small deterioration in the quality of coordination, which nevertheless remains acceptable. This method has been used operationally at INSEE since the end of 2013.
References

Cochran, W.G. (1977). Sampling Techniques: 3rd edition. New York: Wiley.
Cotton, F. and Hesse, C. (1992). Coordinated selection of stratified samples. Proceedings of Statistics Canada Symposium.
Demoly, E., Fizzala, A. and Gros, E. (2014). Méthodes et pratiques des enquêtes entreprises à l’INSEE. Journal de la Société Française de Statistique 155, 134-159.
Gros, E. and Le Gleut, R. (2018). The impact of profiling on sampling. In Lorenc, B., Smith, P.A., Bavdaž, M., Haraldsen, G., Nedyalkova, D., Zhang, L.-C. and Zimmermann, T. (2018) (eds.). The unit problem and other current topics in business survey methodology, pp 91-105. Newcastle upon Tyne: Cambridge Scholars.
Guggemos, F. and Sautory, O. (2012). Sampling coordination of business surveys conducted by INSEE. Proceedings of the Fourth International Conference of Establishment Surveys, June 11-14, Montréal, Canada.
Hesse, C. (1999). Sampling coordination: A review by country. INSEE working paper E9908. Available at https://www.epsilon.insee.fr/jspui/bitstream/1/5768/1/e9908.pdf.
Hesse, C. (2001). Généralisation des tirages aléatoires à numéros aléatoires permanents, ou la méthode JALES+. INSEE working paper E0101. Available at https://www.epsilon.insee.fr/jspui/bitstream/1/5764/1/e0101.pdf
Levieil, A. (2015). Mise en œuvre et résultats de la nouvelle procédure de coordination des échantillons des enquêtes auprès des entreprises et/ou des établissement. 12èmes Journées de Méthodologie Statistique, March 31 - April 2, Paris, France. jms.insee.fr/files/documents/2015/S23_3_ACTE_V2_LEVIEIL_JMS2015.PDF
Scholtus, S., van de Laar, R. and Willenborg, L. (2014). The Memobust handbook on methodology for modern business statistics. Topic on sample selection. https://ec.europa.eu/eurostat/cros/content/handbookmethodology-modern-business-statistics_en
Tillé, Y. (2006). Sampling algorithms. New York: Springer.
CHAPTER 12 RESPONSE PROCESSES AND RESPONSE QUALITY IN BUSINESS SURVEYS
GUSTAV HARALDSEN1
Abstract

In business surveys, responding typically involves looking for and retrieving information from the company’s finance and personnel systems. The difficulty of the questionnaire completion task is influenced both by cognitive challenges and by access to relevant information. This chapter reports on a study of this response process based on the survey component of the Norwegian Structural Business Statistics. In this study we were able to identify a small group of high-revenue, large enterprises that had a considerably higher response burden than the majority of enterprises. Mismatches between the requested and available information, which may lead to imprecise estimates in sub-groups and sub-categories, seemed to be a major cause of increased burden for these enterprises.
1. Introduction

This chapter is an investigation into the business survey response process, based on process data and a set of response burden questions. Its main purpose is a better understanding of this process. In addition, we discuss the usefulness of the process data and response burden questions that our investigation builds on.

The empirical data used in this investigation are gathered from the 2016 Norwegian Revenue, Cost and Investment Survey. This is the survey component of the Norwegian Structural Business Statistics (SBS). Hence, in the following we will use the abbreviation SBS2016 survey. Approximately 14,000 enterprises are sampled for this survey. Responding is compulsory, and all participants were required to respond using a self-administered web questionnaire. The net sample (non-eligible and non-responding units excluded) in 2016 was 13,367 enterprises.

In the Business Register of Statistics Norway (SSB, after its Norwegian name) enterprises are defined as “the smallest combination of legal entities that make up an organizational unit that produces goods or services and which, to a certain extent, has independent decision-making authority, especially regarding its current resources” (SSB, 2013). In addition to enterprises, local kind of activity units (LKAU) are identified with an establishment number. Establishments are defined as “Locally defined, functional units where the main business activities fall within a particular industry group” (SSB, 2013). We use the term “establishments” rather than LKAU because it refers to how LKAU is operationalized in SSB’s business register. One implication of this definition is that each enterprise consists of at least one establishment.

The SBS2016 survey asked about revenue, costs and investments both at the enterprise and establishment level. If there was only one kind of activity taking place in a single premises, the enterprise and establishment unit were identical, and no establishment breakdown was needed. For enterprises with more than one establishment, details were requested for each establishment. At the time of the survey most of the enterprise totals from economic activities had already been reported to the National Tax Office and were preloaded into the questionnaire. Hence, the main task of the SBS survey participants was to split the totals into sub-totals; first at the enterprise level and next at the establishment level. For the National Accounts, details at the establishment level are of particular interest.

1 Statistics Norway, Oterveien 23, 2225 Kongsvinger, Norway. Email: [email protected]
Social surveys typically collect information stored in respondents’ memories. Next, information processing usually takes place during questionnaire completion. In business surveys, however, we need to take a broader perspective. Here the response process is embedded in a business environment which affects priorities and access to information (Willimack and Nichols, 2010). It is the business management which decides who should respond to business surveys and what priority the task is given. One major challenge in business surveys is that, from a business perspective, surveys are generally considered to be a non-productive cost. Lack of available time, which may reflect low priority by the management, is an important source of burden pointed out by business respondents (Haraldsen and Jones, 2007).
Response processes and response quality in business surveys
In social surveys, the focus at the retrieval stage is on memory challenges. By contrast, retrieval in business surveys typically starts by determining whether relevant information is available in the business’s information system, and if so, where. To collect relevant data the business respondent may need to draw on internal information which is distributed between different administrative sources and among holders of different roles at different times (Lorenc, 2007). Businesses differ in how well their work and finances are documented. Research has shown that the most time-consuming tasks in business surveys usually take place prior to questionnaire completion (Dale and Haraldsen, 2007). Figure 12-1 gives an overview of the survey communication and response processes in business surveys. The survey communication, with information about the survey, should mainly be directed towards the business management, while the questionnaire communicates with the respondent(s). The motivation of the respondent is influenced by priorities made by the management. The difficulty of the completion task is influenced both by cognitive challenges and by access to relevant information.
Figure 12-1: The business survey communication and response process.
Chapter 12
In Section 2 of this chapter we look for characteristics which influenced how fast enterprises responded to the SBS2016 survey request. This investigation leads to a four-fold typology of businesses. Then, in sections 3 and 4, we look more closely at the time spent on the internal information collection and processing and on questionnaire completion in these four kinds of enterprises. In Section 5 we look at the response efficiency, defined as the relationship between the time respondents reported it took to perform these tasks and the time span from when they first logged into the web questionnaire to when they submitted it. In Section 6 we focus on tasks and survey questions that respondents found most burdensome. This leads us to some reflections about risks to data quality. The results are wrapped up in Section 7.
2. Survey response patterns
The structural business statistics describe economic activities and goods and services provided by the businesses. Because of their economic importance, it is particularly important to collect information from high-revenue companies. Businesses participating in the SBS2016 survey had already reported their gross revenue to the National Tax Office and this was available to Statistics Norway before the survey was launched, so we were able to compare the accumulated response rate with the accumulated revenue produced by the enterprises during the data collection (see Figure 12-2). The results suggest that a number of high-revenue companies waited to report until they were reminded in mid-October that participation is obligatory and non-respondents can be fined. One is tempted to conclude that with a responsive design a reminder at the beginning of September would have given a higher response rate among the most important enterprises at an earlier time. However, if the reason for late reporting is that these businesses need to collect and process internal information, pressing for a response may lead to poorer response quality. Insight into the actual and perceived response burdens may therefore be an important element of a responsive design in business surveys. To gain quicker response without reducing the response quality, the first step may be to tailor the questions better to the internal information sources and otherwise improve the measurement instrument rather than to tighten the reminder procedures.
Figure 12-2: Cumulative response rate and revenue of enterprises participating in SBS2016 survey. Percent.
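The comparison behind Figure 12-2 can be illustrated with a minimal sketch. It assumes that each response is available as a pair of submission date and enterprise revenue; the function name `cumulative_curves`, the dates and the revenue figures are invented for illustration and are not taken from the SBS2016 data.

```python
from datetime import date


def cumulative_curves(responses, total_units, total_revenue):
    """Return (day, cumulative response %, cumulative revenue %) in date order.

    responses: iterable of (response_date, revenue) for responding enterprises.
    total_units / total_revenue: totals for the whole net sample (assumed known,
    as revenue had been reported before the survey was launched).
    """
    per_day = {}
    for day, revenue in responses:
        count, rev = per_day.get(day, (0, 0.0))
        per_day[day] = (count + 1, rev + revenue)

    curves = []
    cum_count, cum_revenue = 0, 0.0
    for day in sorted(per_day):
        count, rev = per_day[day]
        cum_count += count
        cum_revenue += rev
        curves.append((day,
                       100.0 * cum_count / total_units,
                       100.0 * cum_revenue / total_revenue))
    return curves


# Invented toy data: three small early respondents, one large late respondent.
sample = [
    (date(2017, 9, 1), 10.0),
    (date(2017, 9, 1), 20.0),
    (date(2017, 9, 15), 30.0),
    (date(2017, 10, 20), 940.0),
]
curves = cumulative_curves(sample, total_units=4, total_revenue=1000.0)
for day, response_pct, revenue_pct in curves:
    print(day, round(response_pct, 1), round(revenue_pct, 1))
```

In the toy data the response curve runs far ahead of the revenue curve until the single large enterprise reports, which is the kind of gap the figure is used to detect.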
To study the results from Figure 12-2 in more detail, we compared response rates between businesses with different levels of total revenue. When doing so, we discovered that we needed a quite detailed categorization to identify the latecomers. While most companies followed the same response pattern, it was only the 5% with the highest total revenue that delivered later than others. There were only 648 such enterprises. Many companies are small. In the SBS2016 survey, 64% of the enterprises sampled had only one establishment, which meant that they only reported at the enterprise level. For the remaining enterprises we should expect that the time it took to complete the questionnaire increased with the number of establishments. As with the revenue, we tried out categorizations with different levels of detail to find the breaking point at which the number of establishments affected the response pattern. There was practically no difference between enterprises of different sizes with fewer than 10 establishments, and there was only a small difference between those with 10-19 establishments and those with 20 or more. Based on these results we constructed four groups of businesses based on the total revenue and number of establishments. Table 12-1 shows this typology. Figure 12-3 shows the response patterns of these four kinds of enterprises.
Table 12-1: Number of establishments, categorised, belonging to enterprises participating in SBS2016, by their total enterprise revenue, categorised. Frequency and percent.

                        Number of establishments
Revenue                 1-9       10+      Total
Top 5%    Frequency     426       222        648
          Percent       3.2       1.7        4.9
Lower     Frequency  12,432       287     12,719
          Percent      93.0       2.2       95.2
Total     Frequency  12,858       509     13,367
          Percent      96.2       3.8      100.0
Figure 12-3: Cumulative response rate by total enterprise revenue and number of establishments. Percent.
Figure 12-3 shows that the skewness of the business population does not only have implications for sampling. Skewness is also reflected in the response rates. It is worth noticing how small the group of enterprises that responded late was. In Table 12-1 there are only 222 enterprises that have 10 or more establishments and belong to the 5% with the highest total revenue. A profiling unit or interviewers could have contacted these enterprises within a short period of time to remind them about the survey, and it would not have been very expensive.
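The four-fold typology of Table 12-1 can be expressed as a simple classification rule. The sketch below is illustrative only: the function names, the handling of ties at the revenue threshold and the toy revenues are assumptions, not the procedure actually used at Statistics Norway.

```python
def top_revenue_threshold(revenues, top_share=0.05):
    """Smallest revenue that still belongs to the top `top_share` of enterprises.

    Assumption: enterprises tied with the threshold value all count as top.
    """
    ranked = sorted(revenues, reverse=True)
    n_top = max(1, round(len(ranked) * top_share))
    return ranked[n_top - 1]


def classify(revenue, n_establishments, threshold):
    """Assign an enterprise to one of the four groups used in Table 12-1."""
    revenue_group = "Top 5%" if revenue >= threshold else "Lower"
    size_group = "10+" if n_establishments >= 10 else "1-9"
    return f"{revenue_group}, {size_group}"


# Invented toy population: 20 enterprises with revenues 1..20.
revenues = list(range(1, 21))
threshold = top_revenue_threshold(revenues)
print(classify(20, 12, threshold))
print(classify(5, 1, threshold))
```

Applied to a sample file, a rule of this kind would identify the small group of high-revenue, multi-establishment enterprises that could be singled out for early personal follow-up.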
3. Preparing response
At the end of the SBS2016 questionnaire we asked the respondents how long it took them to collect the information needed for the questionnaire and how long it took them to complete the questionnaire. We also asked which questions and tasks they perceived as the most difficult and burdensome. These questions are shown in the Appendix. As depicted in the process model presented in Figure 12-1, information can be distributed in different information sources or among people in different positions. Furthermore, where relevant information is to be found may change over time. Individual surveys are snapshots which do not catch changes in information sources unless they are parts of a panel and can be compared. At the time of the survey this leaves us with four options as illustrated in Figure 12-4. The respondent may not need to access other information sources, may need to check with other sources than those at hand, may need help from colleagues or may both need to check with other sources and ask colleagues for help. The percentages given to the left of the different alternatives show how common they were in the SBS2016 survey. They show that close to 60% of the respondents managed on their own, while a little more than 40% needed to collect information from other sources (Multisource), get help from colleagues (Helping hand) or both (Complex). The rather high proportion of respondents who managed on their own obviously reflects that many businesses are small. It may also be of some importance that all questions in this particular survey were about the business finances. Other business surveys often cover a wider range of topics and consequently also call for a wider range of competence. Figure 12-5 shows consequences of the different patterns of internal data collection on the time spent collecting information before the questionnaire was completed. Median time for collecting information by
Figure 12-4: Patterns of collecting information for the SBS2016 survey. Percent.
those who needed to do so was 70 minutes (the question applied only to those who needed to collect information from other persons or sources other than those at hand). The median time used varied from 30 minutes by those who only needed to check with internal records (Multisource) to two hours by those who both needed to check with internal records and needed help from other persons (Complex). The upper quartile for the latter was four hours.
Figure 12-5: Time spent collecting data for the SBS2016 survey by information collection pattern. Median, upper and lower quartile. Minutes.
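A summary of the kind shown in Figure 12-5 can be sketched as follows. The chapter does not state which quartile definition was used, so the "inclusive" method of Python's `statistics.quantiles` is an assumption here, as are the function name and the invented minute values.

```python
from statistics import quantiles


def quartiles_by_pattern(records):
    """Return {pattern: (lower quartile, median, upper quartile)} in minutes.

    records: iterable of (information_collection_pattern, minutes_spent).
    Each pattern needs at least two observations for quantiles() to work.
    """
    grouped = {}
    for pattern, minutes in records:
        grouped.setdefault(pattern, []).append(minutes)

    summary = {}
    for pattern, values in grouped.items():
        lower, median, upper = quantiles(values, n=4, method="inclusive")
        summary[pattern] = (lower, median, upper)
    return summary


# Invented toy data, not SBS2016 results.
records = [
    ("Multisource", 10), ("Multisource", 20), ("Multisource", 30),
    ("Multisource", 40), ("Multisource", 50),
    ("Complex", 60), ("Complex", 90), ("Complex", 120),
    ("Complex", 180), ("Complex", 240),
]
summary = quartiles_by_pattern(records)
print(summary)
```

Reporting the quartiles alongside the median, as the figure does, matters because time-use distributions of this kind are strongly right-skewed.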
Table 12-2: Perceived sources of burden by information collection patterns. Percentage of those who felt the questionnaire burdensome.

Sources of burden    Total   Simple   Multisource   Helping hand   Complex
Wait for info (%)     11.7      2.9           5.5           19.2      25.6
Needed help (%)       27.5      4.1           6.8           53.9      56.9
(N)                   5116     1804           806            677      1096
Note: The overall total (5116) is higher than the sum of the four other totals because some who identified sources of burden did not answer the question about information sources.
Those who felt that responding to the questionnaire was burdensome were asked to specify which tasks were burdensome from a multichoice list of options. Two of these options, waiting for information and needing help, were directly related to the internal data collection prior to questionnaire completion. Table 12-2 shows that need for help was considered a higher burden than waiting for information. More than half of those respondents who felt the questionnaire burdensome and had needed a helping hand or both a helping hand and information from other sources (Complex), pointed at the need for help as one source of burden (see figures in bold). Business managers look at survey requests from a cost-benefit perspective. Because there is normally no immediate, direct or obvious benefit, the management will be tempted to look at cost rather than competence when they decide who should respond to the request. Ironically, this may turn out to be an expensive decision because it leads to a need for help and looking for information sources. Figure 12-6 shows the different patterns of internal data collection in enterprises with different levels of revenue and number of establishments (see Table 12-1). In Figure 12-7 we then give the median time spent on internal data collection in these four groups of enterprises. The enterprises at the top of the revenue scale with 10 or more establishments are distinguished from the rest by the high proportion
Figure 12-6: Patterns of information collection by total enterprise revenue and number of establishments. Percent.
Figure 12-7: Time spent on information collection by enterprise revenue and number of establishments. Median, upper and lower quartile. Hours and minutes.
(42.8%) of respondents who both had to ask for help from others and collect information from other sources (Complex). Moreover, high revenue seems to affect the need for help and additional information sources more than a large number of establishments. One way of showing this is to look at those who did not need any kind of help in Figure 12-6 (Simple) and compare differences while holding either the revenue variable or the number of establishments constant. If we do so, the average difference between enterprises in the top 5% group and those with a lower revenue was ((44.9 – 28.3) + (59.2 – 32.9))/2 = 21.5 percentage points, while the equivalent difference between enterprises with 1-9 establishments and those with 10 or more establishments was ((59.2 – 44.9) + (32.9 – 28.3))/2 = 9.5 percentage points. The median time spent collecting information in enterprises at the top of the revenue scale and with 10 establishments or more was almost six hours. The upper quartile was nine hours. Note that the timescale used in Figure 12-7 is wider than the one used in Figure 12-5. In Figure 12-5 the median time spent collecting information among enterprises with a complex internal data collection procedure was two hours (120 minutes). The results for the top 5%, 10+ group in Figure 12-7 are much higher than this. This shows that this small group of enterprises had a much bigger job to do than the majority of enterprises.
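The two averages quoted above can be reproduced with a short sketch. The assignment of the four Simple percentages to the four groups is inferred from the way the values are paired in the text rather than read directly from Figure 12-6, and the variable names are invented for illustration.

```python
# Share (%) of respondents with the Simple pattern in each group.
# Assumption: cell assignment inferred from the pairings used in the text.
simple_share = {
    ("Lower", "1-9"): 59.2,
    ("Lower", "10+"): 44.9,
    ("Top 5%", "1-9"): 32.9,
    ("Top 5%", "10+"): 28.3,
}

# Revenue difference, holding the number of establishments constant.
revenue_effect = sum(
    simple_share[("Lower", size)] - simple_share[("Top 5%", size)]
    for size in ("1-9", "10+")
) / 2

# Establishment difference, holding the revenue group constant.
establishment_effect = sum(
    simple_share[(revenue, "1-9")] - simple_share[(revenue, "10+")]
    for revenue in ("Lower", "Top 5%")
) / 2

print(revenue_effect, establishment_effect)
```

The unrounded averages are about 21.45 and 9.45 percentage points, which correspond to the rounded 21.5 and 9.5 given in the text.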
4. Responding
We now move on to the process of completing the questionnaire once the relevant information has been gathered. The median time used to complete the SBS2016 questionnaire was 30 minutes. The upper quartile was 60 minutes and the lower quartile 15 minutes. Note that the time it took to complete the questionnaire was less than half the time it took to collect the information needed to complete it (Figure 12-5). Basically, there were three versions of the questionnaire. In enterprises with only one establishment, the establishment level was identical to the enterprise level. Consequently, these enterprises only needed to fill in enterprise information. They filled in the simplest version of the questionnaire which had: an introductory screen which updated the status of the enterprise, three screens covering revenue, costs and investments, one screen checking the industry code, one screen with questions about response burden, and finally, one screen for comments and updating contact information. The first four screens were the main survey screens. Enterprises with more than one establishment, except those in the manufacturing industry and those serving the oil industry, had one extra screen where they were asked to split the main enterprise revenue, costs and investment figures by establishment. Multi-establishment enterprises in the manufacturing
Figure 12-8: Time spent completing the three versions of the SBS2016 questionnaire. Median time, lower and upper quartile. Minutes.
Figure 12-9: Versions of the SBS2016 questionnaire by enterprise revenue and number of establishments. Percent.
industry or serving the oil industry had more questions divided into three such specification screens: one for revenues, one for costs and one for investments. Figure 12-8 shows the median time it took to fill in these three versions of the questionnaire. Figure 12-9 shows which versions of the questionnaire enterprises with high and lower revenue and different numbers of establishments filled in, and Figure 12-10 shows how much time these different kinds of enterprises spent completing the SBS2016 questionnaire. By definition, no enterprise with 10 or more establishments received the shortest version of the questionnaire. The version with 1 breakdown screen was most common in the Lower, 10+ and top 5%, 10+ groups, while the version with 3 breakdown screens was most common in the two top 5% groups. The completion time of high-revenue enterprises with 10 or more establishments stands out. While the overall median time for completing the questionnaire was 30 minutes and the median time for the first three groups in Figure 12-10 varied from 30 minutes to one hour, the median time in the fourth, high-level group was two hours. 50% spent between one and four hours on the questionnaire. This result cannot be explained by the length of the questionnaire these enterprises had to complete, but rather by the challenge of the breakdown tasks present in the longer versions of the questionnaire.
Figure 12-10: Time spent completing the SBS2016 questionnaire by the enterprise revenue and number of establishments. Median time, lower and upper quartile. Hours and minutes.
5. Response efficiency
In this section we will look at the relationship between the total time to collect information and fill in the SBS2016 questionnaire, and the time from when the respondents first logged into the web portal to when they returned the completed questionnaire. The closer these times were to each other, the more efficient the respondents were. If we sum the time the enterprises spent collecting relevant information and completing the questionnaires, the median total time varied from 30 minutes (lower revenue, 1-9 establishments) to seven hours (top 5% revenue, 10+ establishments). This wide variety and the considerable time used by a small minority of the sample is hidden if one just looks at the main figures, dominated by numerous small enterprises. The time results given so far are based on information given by the respondents. But because all questionnaires were computerized, we were also able to collect paradata from the survey. We collected time-stamps when respondents logged in and out of the questionnaire and at different milestones during questionnaire completion. The paradata we use here is the time when the respondent logged into the questionnaire for the first time and the time when he or she submitted the completed questionnaire to Statistics Norway. Table 12-3 shows the result of these calculations. A
Figure 12-11: Total time spent collecting information and completing the SBS2016 questionnaire by the enterprise revenue and number of establishments. Median time, lower and upper quartile. Hours and minutes.
Table 12-3: Time from log-in to log-out by the enterprise revenue and number of establishments. Median time, lower and upper quartile.

                  Lower, 1-9    Lower, 10+      Top 5%, 1-9     Top 5%, 10+
Lower quartile    13 minutes    1 hour 12 min   1 hour 12 min   2 days
Median            47 minutes    3 days          6 days          13 days
Upper quartile    9 days        16 days         18 days         27 days
comparison of the results in this table with the results in Figure 12-11 gives an indication of the response efficiency in different kinds of enterprises. Overall, the median time from logging in to logging out was 58 minutes, while the median time collecting information and filling in the questionnaire was reported to be 60 minutes. These almost identical numbers reflect that the majority were able to fill in the questionnaire straight away and that respondents were pretty good at reporting how long it took them to fill in the questionnaire. When the sample is split into subgroups and we do not only look at the median, but also at the lower and upper quartiles, however, some larger differences are revealed. While the time it took to collect information and complete the questionnaire was
given in hours and minutes, the median number of days seemed a more appropriate measurement of the time-span between logging in and submitting for all but the smallest enterprises. The top 5%, 10+ enterprises are of particular interest. Their median time collecting information and completing the questionnaire was 7 hours, while they typically spent 13 days from first logging in to submitting the questionnaire. In 72% of these enterprises the respondents had to collect data from other sources or be helped by others (Figure 12-6). The period when the cumulative revenue did not follow the cumulative response rate, which we focused on in Figure 12-2, was about 14 days. Considering all
Figure 12-12: Perceived response burdens.
these results together, the top 5%, 10+ enterprises do not appear inefficient. It seems rather that they were delayed by the questionnaire asking for information that was not easily available.
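The comparison made in this section can be sketched as a simple ratio. The chapter describes response efficiency only as the relationship between reported working time and the span from first log-in to submission; expressing it as reported time divided by elapsed time is an assumption made here for illustration, as are the function name and the timestamps.

```python
from datetime import datetime


def response_efficiency(reported_minutes, first_login, submitted):
    """Reported working time divided by elapsed time from log-in to submission.

    A value close to 1 means the questionnaire was completed more or less
    straight away; a value close to 0 means the work was spread over a long span.
    """
    elapsed_minutes = (submitted - first_login).total_seconds() / 60.0
    if elapsed_minutes <= 0:
        raise ValueError("submission must come after the first log-in")
    return reported_minutes / elapsed_minutes


# Invented timestamps matching the medians quoted for the top 5%, 10+ group:
# 7 hours of reported work spread over 13 days.
efficiency = response_efficiency(
    reported_minutes=7 * 60,
    first_login=datetime(2017, 9, 1, 9, 0),
    submitted=datetime(2017, 9, 14, 9, 0),
)
print(round(efficiency, 3))
```

A low ratio alone does not show inefficiency: as argued above, a long elapsed time may simply reflect waiting for information that is not easily available.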
6. Perceived response burden
The respondents were asked if they found the questionnaire burdensome or not (see questions in the Appendix). Those who answered “both effortless and burdensome” or “burdensome” were followed up with a multiple-choice question where they could indicate what they found burdensome. Figure 12-12 shows the general results. More than 50% of the respondents found the questionnaire effortless to complete and were consequently not posed any follow-up question. Just below 40% of the respondents found either some or most questions burdensome. As indicated in Figure 12-12, some of the sources of burden can be linked to different cognitive steps in the question-answer process. 30.6% of those who were asked to specify said that unclear terms were a challenge (comprehend). 31.7% pointed at mismatches between
Figure 12-13: Reasons for response burdens in enterprises which perceived some burden, comparing enterprises belonging to the top 5% total revenue and with 10 or more establishments with other enterprises. Percent of enterprises perceiving at least some burden.
information requests and available information (retrieval) and 41.2% at complicated calculations (judge). 30.7% complained about unfamiliar response options (report). Because top 5%, 10+ enterprises differ so much from the general response patterns, we wanted to identify whether certain challenges were more common in these enterprises than in the others. First, we recognize that as many as 42% of the respondents in top 5%, 10+ enterprises found some questions burdensome and 30% found most questions burdensome. The equivalent numbers in Figure 12-12 were 28.3% and 10.0%. In Figure 12-13 we have then compared the kinds of burden pointed at. Two reasons stand out: mismatch between requested and available information, and complicated calculations. In many statistical institutes, particularly in Scandinavia, obtaining data from administrative records is recommended before direct data collection. This is particularly relevant for data collection from large enterprises, which have a lot to report. The results shown in Figure 12-13 should serve as a warning to those who think replacing surveys with data capture procedures from administrative sources is a straightforward process. Mismatches between the information requested for statistical purposes and what is available in administrative business systems must still be dealt with. Calculations done by the respondents in business surveys must be implemented in the data capture instrument or done during editing. Bavdaž (2010) has suggested a five-point typology which links information availability to likely outcomes. This model was later expanded by distinguishing between availability challenges that have to do with concepts, units or reference periods (Haraldsen 2013). The expanded version is presented in Figure 12-14. Those who found the SBS2016 survey burdensome did not only point at mismatches between the requested and available information, but also at a need for complicated calculations.
This indicates that the requested information was neither non-existent nor inconceivable, but estimable or generable. Hence, what we should expect are solid or rough estimates. In addition to the response burden question, respondents were also asked if they found questions difficult to answer and given the opportunity to comment upon those questions (see Appendix). We carried out a simple content analysis of these comments. In this content analysis we focused on the top 5%, 10+ enterprises and tried to identify whether their problems had to do with the concepts, with the units or with the reference period. 92 of the 222 enterprises in this top 5%, 10+ enterprise group commented upon difficult questions. 36% of these comments pointed at problems with breaking down revenue, costs or investments to the
Figure 12-14: Accessibility and likely outcomes in business surveys.
establishment level (unit problem), but splitting up enterprise figures across sub-categories was also often mentioned (concepts). 23% had problems with breaking down IT investments. In addition, 18% mentioned problems with breaking down investments without specifying what kind of investments were the most troublesome. We guess some of these comments also referred to IT investments. Problems with breaking down costs were mentioned by 18%. 14% mentioned revenue breakdowns. Only one enterprise reported that the reference period, which was the previous year, caused a mismatch problem. The common denominator in these comments is breakdown problems, for both sub-units and sub-categories. Breaking down investments, and in particular investments in different kinds of IT equipment, software and services, was reported to be the most burdensome task. Consequently, this is where we should expect estimates with the poorest quality (‘rough estimates’ in Figure 12-14).
7. Wrapping up
This study gives a glimpse into the complexity of the business response process, which is a combination of organisational and individual processes. This complexity needs to be studied by a stepwise analytical procedure. In this analysis we found our question set about time-use and perceived response burden particularly useful. The analysis points at the importance of taking the skewness of the business population into account, not only during sampling, but also when looking at measurement. When we did so, the sample could be split into a majority with fewer problems than we had anticipated, and a small minority which found the survey rather time-consuming and burdensome. The main cause of this burden was requests for information that was not easily accessible. This applied to questions about enterprise sub-units (establishments); but perhaps more surprisingly, also to questions asking for sub-group breakdowns (investments in particular, but also costs and revenues). The group of enterprises that found the questionnaire most burdensome also belonged to the most influential part of the sample. They were enterprises in the very upper part of the revenue scale and with 10 or more establishments to report from. Note, however, that the most worrying group was so small that its members could easily be contacted in an effort to tailor the measurement instrument to their information systems, or supported by trained interviewers who could help them with their calculations. We think it is by such combinations of quantitative analysis and qualitative development that methods for data collection in business surveys should be improved.
References
Bavdaž, M. (2010). Sources of measurement errors in business surveys. Journal of Official Statistics 26, 25-42.
Dale, T. and Haraldsen, G. (eds.) (2007). Handbook for monitoring and evaluating business survey response burdens. European Commission, Eurostat. Available at http://ec.europa.eu/eurostat/documents/64157/4374310/12-HANDBOOK-FOR-MONITORING-AND-EVALUATINGBUSINESS-SURVEY-RESONSE-BURDEN.pdf
Haraldsen, G. (2013). Quality issues in business surveys. In Snijkers, G., Haraldsen, G., Jones, J. and Willimack, D., Designing and Conducting Business Surveys, pp 83-125. New York: Wiley.
Haraldsen, G. and Jones, J. (2007). Paper and web questionnaires seen from the business respondent's perspective. ICES III, Montreal. Available at https://ww2.amstat.org/meetings/ices/2007/proceedings/ICES2007000259.PDF.
Lorenc, B. (2007). Using the theory of socially distributed cognition to study the establishment survey response process. ICES III, Montreal. Available at http://ww2.amstat.org/meetings/ices/2007/proceedings/ICES2007-000247.PDF.
SSB (2013). Basiskunnskap for tilknyttede registre (Basic register knowledge). Internal report. Oslo: Statistics Norway.
Willimack, D. and Nichols, E. (2010). Hybrid response process model for business surveys. Journal of Official Statistics 26, 3-24.
Appendix
The response burden sequence is dynamic. Respondents who did not need help from others or to check other information sources were only asked how long it took to complete the questionnaire. Also, respondents who found the questions easy to answer and the questionnaire effortless to complete were not asked any follow-up questions. Hence the minimum number of questions was four, while the maximum was 10. It is the maximum version which is shown here. Altinn, which is referred to in the introduction to the questions about perceived response burden, is a common web portal used for all governmental reporting and all business surveys run by Statistics Norway.
CHAPTER 13 PARADATA AS AN AIDE TO QUESTIONNAIRE DESIGN
JORDAN STEWART1, IAN SIDNEY2 AND EMMA TIMM1
1 Data Collection Methodology, Office for National Statistics, Cardiff Road, Newport, NP10 8XG, UK. Email: {jordan.stewart, emma.timm}@ons.gov.uk.
2 Well-being, Inequalities, Sustainability and Environment Division, Office for National Statistics, Cardiff Road, Newport, NP10 8XG, UK. Email: [email protected]

Abstract
The chapter discusses how paradata can be used in future business data collections. It begins by discussing the feasibility of extracting paradata from an electronic questionnaire used in a pilot and the Monthly Wages and Salaries Survey. The rest of the chapter takes this paradata and uses it to improve the design of the online questionnaire. Key indicators for improvement in the design of the questionnaire were the reduction in the potential to trigger error messages and the total number of errors. The findings are discussed in terms of how these criteria were met.

1. Introduction
The recent focus for the Office for National Statistics (ONS) on electronic data collection has presented an opportunity to collect additional data, namely paradata. Paradata can be briefly described thus: “Paradata are automatic data collected about the survey data collection process captured during computer assisted data collection.” (Kreuter, Couper and Lyberg, 2010).
There is still no standard definition of paradata (Kreuter, 2013). The authors’ use of the term “paradata” refers to automated data collected through the online data collection process, such as keystrokes, error messages, average completion times, average time spent on individual pages, and the pages on which respondents exit the survey. These automated paradata are termed “audit trails” by Snijkers and Morren (2010). In this chapter “paradata” also refers to manually collected data or process data, which can be obtained through respondent phone call records and notes taken on individual respondents by ONS staff. The analysis of paradata can support continuous improvement of questionnaires, associated systems and the wider services that support them. This chapter will give an overview of work on questionnaire design using paradata analysis on the Monthly Wages and Salaries Survey (MWSS). It is split into three sections following this introduction. Section 2 presents the background context of a pilot study which was conducted into the possibility of using electronic questionnaires as a mode of data collection. Section 3 discusses a feasibility study investigating the use of paradata taken from the pilot study to improve questionnaire design. Section 4 focuses on how the analysis of paradata from live surveys was used to improve the design of the MWSS questionnaire prior to it being dispatched to the full sample of respondents. Hopefully, this section will be of use to anybody who is attempting to undertake paradata analysis for the first time.
2. Background context
In 2015, ONS conducted a pilot study of online data collection in the MWSS to study whether there would be a significant mode effect when compared to the paper version of the survey. The pilot study demonstrated that online respondents did not provide substantially different figures to those who completed on paper. It also showed that response rates did not differ between online and paper modes of data collection. Indeed, online respondents submitted their questionnaire returns more quickly. The pilot study lasted for 6 months, equating to 6 collection periods, and at the peak of the study 5,000 respondents received the online version of the MWSS compared to around 10,000 who received the paper survey in the same period. Respondents to the pilot study had not been in the sample for the paper version of MWSS for 2 years and would not have to complete another paper version of MWSS for 2 years after the online pilot, to avoid forcing respondents to switch modes repeatedly in a short space of time. Individual respondents in the full MWSS sample generally complete 60
Paradata as an aide to questionnaire design
questionnaires, which equates to one questionnaire a month for five years. Following ONS procedures, every month most respondents who have completed 60 questionnaires are rotated out of the sample and replaced with new respondents. Businesses with over 1,000 employees remain in the sample for MWSS. Therefore, the survey is encountered by new respondents every month. This allowed the researchers to investigate the impact of questionnaire changes following paradata analysis on new respondents.

The electronic questionnaire (eQ) was designed in such a way that respondents could select which pay pattern (PP) their company used to pay their staff – weekly, calendar monthly, four-weekly or five-weekly. Based on their answer to the PP question they were routed to the corresponding sections, skipping any they had not selected. In many cases respondents selected more than one PP. If, for example, they selected weekly and monthly pay patterns, the respondent would have seen the full range of weekly questions first, followed by the monthly questions.

One of the aims of the pilot study was to analyse and utilise the by-products of the online questionnaire. For the pilot study an in-house paradata collection tool was developed to collect the audit trails from the online version of MWSS. Generally, audit trails may include information collected at different levels of aggregation: the server side and the client side. The client side includes access to information such as mouse actions, link utilisation, timestamps and keystrokes. The server side can collect information on the type of device, browser and operating system utilised by the respondent (Heerwegh, 2011). At the outset of the feasibility study the paradata collection tool was relatively limited. A key indicator missing was information on error messages. Error messages were triggered if a respondent attempted to undertake a restricted action; for example, entering a set of dates which far exceeded the range of one month.
All error messages appeared on the screen in a red box informing the respondent of the error and how to correct it. The parameters for errors are agreed and set with the Process, Editing and Imputation team in ONS, along with subject matter experts.
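To make the idea of a restricted action concrete, the following is a minimal sketch of a date-range check of the kind described above. The 31-day limit, the function name and the message wording are assumptions chosen for illustration; they are not the parameters agreed within ONS.

```python
from datetime import date
from typing import Optional

# Assumed limit for illustration only; the real parameters are set by ONS teams.
MAX_PERIOD_DAYS = 31


def check_period(start: date, end: date) -> Optional[str]:
    """Return an error message if the reporting period is invalid, else None."""
    if end < start:
        return "The end date must not be before the start date."
    # Count both the start and the end day as part of the period.
    if (end - start).days + 1 > MAX_PERIOD_DAYS:
        return "The period entered is longer than one month. Please check the dates."
    return None
```

A check of this kind returns a message for display rather than raising an exception, which matches the on-screen error box described in the text.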
3. Feasibility study

In order to understand how best to use paradata for future online surveys a two-stage (quantitative and qualitative) feasibility study into the paradata obtained via respondents’ completion of the MWSS was conducted. The specific aims in running the study were to examine whether paradata could: define and identify “outlier” cases,
Chapter 13
enable the identification of question/survey/“system”-specific issues, be used to understand the respondent path from starting the survey through to submission, and be triangulated with other data sources to enhance understanding of respondent paths.

One of the first tasks of the feasibility study was to create a map of a potential process for using paradata analysis to improve questionnaire design. Figure 13-1 shows the conceptual diagram which was used to visualise the process. The oval shapes in Figure 13-1 represent the individual sources of paradata which could potentially be used to identify an outlier or even the reasons for a respondent’s “problematic journey” in a submitted questionnaire. This could be found in the online paradata produced by their journey through the questionnaire. They may have contacted ONS through the telephone helpline, in which case their call would have been recorded and coded to their specific problem. They may have left comments about their experience of completing the questionnaire, which would have been recorded in the Respondent Feedback Database. Additional notes from staff who dealt with respondent phone calls were also recorded, adding more context to the calls – including the reason why the respondent was struggling to answer a question or complete the survey. A final source of paradata was the validation information provided to the authors after the analysis of the returned questionnaires, which included information on how many validation errors had occurred per item. This source of paradata was essential because there were no sources of client side paradata referring to answer changes available during the pilot.

Once problems had been uncovered, alternative designs for wording, functionality or webpage design could be suggested and tested through cognitive interviewing or usability testing techniques. This stage would be repeated until a suitable solution was found.
3.1 Response profiles

The level of detail of paradata obtained during the pilot study was not particularly high, so it was not possible to infer the respondent’s route through the questionnaire with sufficient accuracy by observing which pages they accessed.
Figure 13-1: Conceptual diagram of paradata analysis
Another key piece of information we focused on was “Session Identity”. The paradata tool collected information on the number of sessions it took an individual respondent to complete the questionnaire. However, a high number of sessions did not necessarily equate to a problem with questionnaire design. There are several reasons why respondents may have completed the questionnaire over numerous sessions (e.g., interruptions, leaving to find more information, etc.). Instead we focused on which page tended to cause most respondents to end a session (i.e., at an aggregate level) based on the assumption that questions with high drop-off rates are potentially flawed in some way. This initial project was the first step in using paradata. One of the ancillary aims of the project was to be able to specify additional functionality for the paradata tool. The paradata of a random sample of 300 pilot MWSS eQ were selected based on every 10th return as received by ONS. The aim was to identify how respondents completed the eQ; if we could understand respondent behaviour we could begin to identify and anticipate where problems might arise within the questionnaire. Snijkers and Morren (2010) conducted a study with a far larger sample size, around 40,000 respondents, finding that the audit trails created provided a useful insight into how respondents completed an online
questionnaire. The authors did not have the capacity to carry out such a thorough study with the online MWSS but managed to find distinct respondent types in the pathways they analysed. In our study five main types of respondent were identified through patterns that emerged when analysing across respondents. The following images are simply to help visualise each type of respondent:

1. Completers: Completed the questionnaire linearly (i.e., in the way that it was designed). Respondent behaviours included: a) completing the questionnaire in a single session; b) completing it over numerous sessions, but always returning to the point where they broke off; and c) spending a long period of time on a single page before continuing (Figure 13-2). Unbroken lines in this and the next four figures signify single sessions; broken lines signify the end of one session and the beginning of another; a “break”.
Figure 13-2: Illustration of the response journey of a completer-type
2. Checkers: Completed the questionnaire as far as the “summary” page at the end of the questionnaire, then clicked either the web browser’s “previous” button or the eQ’s “previous” button back to a specific page (Figure 13-3). It was assumed they were either checking or editing their responses before finishing the questionnaire.
Figure 13-3: Illustration of the response journey of a checker-type
3. Attentive Checkers: As per “Checker”; but describes respondents who used the navigation bar located on the left side of every page of the questionnaire, to move back to a specific page rather than the “previous” button (Figure 13-4). If a respondent used the navigation bar to check and edit responses this would reduce the amount of time spent completing the questionnaire because they would not have to manually click back through every page to arrive at their desired question. The word “attentive” is used here to indicate that the
Figure 13-4: Illustration of the response journey of an attentive checker-type
respondent saw the navigation bar on each page of the eQ and understood its purpose.

4. Familiarisers: This type of respondent did not seem to follow a particular pathway through the questionnaire, moving backwards and forwards, seemingly at random (Figure 13-5). One could infer that respondents were attempting to familiarise themselves with the questions before completing.
Figure 13-5: Illustration of the response journey of a familiariser-type
Figure 13-6: Illustration of the response journey of a learner-type
5. Learners: The questionnaire was designed in such a way that respondents could select which pay pattern (PP) their company used to pay their staff – weekly, calendar monthly, four weekly or five weekly. They could select more than one PP, so if they selected weekly and monthly the respondent would have seen the full range of weekly questions first followed by monthly questions. This structure seemed to encourage a learning type of behaviour. Respondents familiarised themselves to begin with (e.g., moving back and forth through the weekly PP questions), but then progressed through the rest of the questionnaire linearly (e.g., through monthly PP questions) and completed the questions at a faster pace (Figure 13-6).

Table 13-1 demonstrates the proportions within the sample that were categorised into each “type”, in order of most common – “Other” covers paths which fitted none of the identified user types. One of the main findings of this paradata analysis was the small number of Attentive Checkers. This told us that very few respondents noticed or chose to use the navigation bar in its current position. An important recommendation is therefore to redesign and improve the navigation bar to encourage more respondents to use it to move about the eQ more easily and encourage quicker completion.

Table 13-1: Number of users who fitted each “user type”

          Completer  Familiariser  Checker  Learner  Attentive Checker  Other  Total
Number          192            58       33        8                  3      6    300
%              64.0          19.5     11.0      2.5                1.0    2.0  100.0
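The respondent types above were identified by the authors through inspection of patterns in the audit trails. Purely as an illustration of how such patterns might be expressed as rules, the sketch below assigns a type to an ordered list of visited page numbers. The function name, the input format and every threshold (in particular the "first half of the journey" rule for Learners) are assumptions made for this example and are not the authors' method; the sketch also has no rule for the “Other” category.

```python
from typing import List


def classify_journey(pages: List[int], summary_page: int, used_navbar: bool) -> str:
    """Assign one illustrative respondent type to an ordered list of page visits."""
    # Positions where the respondent moved to an earlier page than the previous one.
    back_moves = [i for i in range(1, len(pages)) if pages[i] < pages[i - 1]]
    if not back_moves:
        # Never moved backwards, including when resuming where they broke off.
        return "Completer"
    if pages[back_moves[0] - 1] == summary_page:
        # First backward move starts from the summary page: checking behaviour.
        return "Attentive Checker" if used_navbar else "Checker"
    if back_moves[-1] < len(pages) // 2:
        # Assumed rule: back-and-forth only in the first half, then linear.
        return "Learner"
    return "Familiariser"
```

For example, `classify_journey([1, 2, 3, 4, 5, 3, 4, 5], summary_page=5, used_navbar=False)` is treated as a Checker, because the first backward move starts from the summary page.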
Another finding was that respondents move from their initial type to the Completer type after their first month in the sample for MWSS. The questionnaire was structured so that one question was presented per page; respondents could only see the following questions by clicking the “next” button; this was an issue unique to online data collection. During the first completion of the MWSS eQ the one question per page design meant
respondents were not sure what questions they would be confronted with; respondents tended not to have this problem after their first completion. However, it was important to consider ways to improve a respondent’s experience during the first completion of MWSS. Whilst identifying respondent types proved to be quite useful, without more detailed qualitative information we could only make assumptions about why respondents were encountering problems. Further investigation using the other sources of paradata was needed.
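The aggregate drop-off analysis described earlier in this section (which page most often ends a session) can be sketched as follows. The input format, the page names and the function name are hypothetical; the pilot tool's actual export format is not reproduced here.

```python
from collections import Counter
from typing import Dict, Iterable


def drop_off_rates(session_end_pages: Iterable[str],
                   page_visits: Dict[str, int]) -> Dict[str, float]:
    """Share of visits to each page on which a session ended."""
    ends = Counter(session_end_pages)
    # Pages that never ended a session get a rate of 0.0 (Counter returns 0).
    return {page: ends[page] / visits
            for page, visits in page_visits.items() if visits > 0}


# Made-up example data: last page of each unfinished session, and visits per page.
rates = drop_off_rates(
    ["pay_pattern", "pay_pattern", "summary"],
    {"intro": 10, "pay_pattern": 10, "summary": 4},
)
```

Dividing by the number of visits, rather than reporting raw counts, avoids flagging a page simply because every respondent has to pass through it.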
3.2 Identifying interesting cases

After identifying how respondents completed surveys, the second part of the analysis involved exploring possible causes for identified behaviours through the analysis of additional paradata. Respondents’ survey paradata were supplemented with other paradata held by ONS in business contact databases using the Reporting Unit Reference (RU Ref), a unique identifier for each business selected to complete ONS surveys. These databases contain descriptions of the call records from occasions when respondents contact ONS to request help. Using the linked paradata, it was possible to examine potential causes of respondent behaviours while completing the MWSS online. This method enabled the researchers to design potential solutions to the problems raised by respondents.

Case 1

A respondent showed a pattern of moving through the survey, reaching the last question, and then, at a later date, going back through all of the questions from the beginning, before submitting. By matching these paradata to ONS call centre records, it was found that the respondent had called to request a specimen questionnaire. The respondent explained that to complete the questions within the survey, they required data from different HR teams within their business. The only way for the respondent to know what data they needed to provide was to enter false data into the questionnaire to view all the relevant questions. When they had collated the data, the respondent completed and submitted the survey in a single session. This insight demonstrated how ONS could reduce burden on respondents by providing an option to preview the questionnaire. To test different solutions an ONS qualitative research project was scoped to fully understand what information business respondents need prior to accessing the survey. The outcome of the research is discussed in section 4.
Case 2

A respondent called ONS to say that they had both weekly and monthly waged staff and were unsure about how to answer for both these categories online because it looked different from the paper version of MWSS. Comparing the qualitative data from the respondent contact databases with the online survey paradata, it emerged that the respondent was moving backwards and forwards through the questionnaire pages without submitting. The respondent was advised over the telephone that they were able to select all the applicable categories they needed to report. The proposed solution to this particular problem was to position the guidance “Please select all that apply” closer to the question and answer fields.

Case 3

A respondent called to explain that they were stuck on the “Pay Pattern” page and could not progress through the survey despite answering the question. When checking the respondent contact database against the online survey paradata it was clear that they had not been able to progress from the Pay Pattern page in two separate collection periods. Initially, it was assumed that there was a technical problem with the respondent’s computer or the software they were using. However, when it was observed through the paradata that they had the most up to date web browser, it started to become clear that the respondent had been confused by the hard edit check on the page; they had failed to see the “Confirm” button which appeared on the top right of the screen after the respondent had clicked “Next”. This button was added to ensure respondents did not progress without selecting the correct pay patterns. The solution to this problem was to reposition the “Confirm” button and place it in a more obvious position on the page closer to the “Next” button.
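The linkage used in these cases, matching audit trails to call records through the RU Ref, can be sketched as a simple keyed join. The field names (`ru_ref`, `note`, `pages`), the function name and the example values are hypothetical and do not reflect the structure of the ONS databases.

```python
from collections import defaultdict
from typing import Dict, List


def link_by_ru_ref(audit_trails: List[Dict], call_records: List[Dict]) -> List[Dict]:
    """Attach the notes of all call records to the audit trail with the same RU Ref."""
    notes_by_ref = defaultdict(list)
    for record in call_records:
        notes_by_ref[record["ru_ref"]].append(record["note"])
    # Audit trails without any matching call keep an empty list of notes.
    return [{**trail, "call_notes": notes_by_ref.get(trail["ru_ref"], [])}
            for trail in audit_trails]


# Made-up example values, not real reporting units.
trails = [{"ru_ref": "A001", "pages": [1, 2, 3, 1, 2, 3]},
          {"ru_ref": "A002", "pages": [1, 2, 3]}]
calls = [{"ru_ref": "A001", "note": "Requested a specimen questionnaire"}]
linked = link_by_ru_ref(trails, calls)
```

Keeping unmatched audit trails in the result makes it easy to see which unusual journeys have no call record to explain them.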
3.3 Pilot study outcomes

One of the successes of the feasibility study into paradata analysis was the speed with which it was possible to identify problems with questionnaire design or webpage design. While these problems can be identified in questionnaire design literature, the fact that paradata analysis picked these problems up validates this method as a procedure for improving survey design. Problems in an online survey could be quickly identified using
paradata; analysis of paper questionnaire data could take weeks or months to identify problems. One tentative outcome of the feasibility study was the potential to save resources. If we could implement a good solution to any problem in the online questionnaire, fewer respondents would have to be called back to confirm that the data they had submitted were correct, improving validation of data. By using paradata to improve questionnaire design we could also reduce respondent burden as well as save staff time and money on the MWSS. The next section of the chapter is concerned with testing these predictions.

While the feasibility study was generally useful there were some drawbacks. One was the format of the paradata generated by the tool which had been developed. This was an Excel spreadsheet in which a new row was created every time a respondent accessed a new page, leading to a spreadsheet with an unfeasibly large number of rows. This meant that the paradata analysis was incredibly laborious and would become a near impossible task when the full sample of respondents was required to complete online. The number of error messages triggered by respondents across the online questionnaire would have been an invaluable resource and one which would have provided a much clearer picture of a respondent’s difficulty with completing MWSS.

A key recommendation of the feasibility study was to find a more suitable program which could be tailored to suit the authors’ paradata analysis needs. Google Analytics was chosen as the software to analyse paradata, because it could collect detailed paradata from both the client and server sides, which suited the needs of multiple teams within ONS, not just those interested in questionnaire design. A bespoke paradata analysis tool with the sole purpose of improving questionnaire design would not have been cost effective.
To overcome any confidentiality issues, paradata were not collected on individual respondents, as they had been in the pilot study; rather, the total numbers of audit trails were recorded. Despite not being able to collect audit trails at the individual respondent level any longer, it is now possible to see the overall completion process for all respondents in a far more simplified way using Google Analytics. To find out why certain phenomena are being observed in the audit trails, the manually processed paradata are used to understand which problems individual respondents are encountering. Previously, the researchers would have to match individual respondent audit trails with their reference numbers in the manually processed paradata. This was a far more time-consuming method which would not have provided as much detail.
4. Improving MWSS questionnaire design

In this section we will discuss the design issues with the online version of MWSS, which were identified using paradata analysis in the pilot stage, and which solutions were implemented when online collection was extended to the full sample. After the survey had moved to the full sample ONS used Google Analytics to obtain audit trails and help with analysis. The images in this section will demonstrate how the questions looked online during the pilot stage and how they changed for the full online survey with the use of paradata analysis.

Only after the pilot study had finished was it possible to obtain the numbers of errors triggered across the online survey. Coupling the number of errors with other paradata such as timestamps, session identity and telephone records gave a much clearer picture of the problems with the design of the questionnaire. Reducing the number of errors triggered across the survey was a key indicator of quality. Indeed, the ideal use of paradata is the one posited by Couper (2008), that is, “preventing errors in the first place reduces the need for error messages”. This statement was always taken into consideration when the authors attempted to design solutions to the problematic questions highlighted here.

The use of paradata analysis to improve questionnaire design was not directly focused on reducing perceived response burden. Nor was it our goal to simplify the concepts or variables that were asked for on MWSS; respondents would still need to provide the same data. However, based on the example of Case 1, the researchers attempted to reduce the time respondents spent obtaining the information needed to complete the questionnaire by providing a preview page, as shown in Figure 13-7. This was a feature also requested by many respondents during usability testing for new functions for MWSS. A respondent could see further information on each topic if they clicked the “show” button.

It was found that the Familiarisers and Learners would move to being Completer-types when they were able to view the contents of the questionnaire up-front, and this facility has since become available for all online business surveys administered by ONS. After the survey had gone live to the full sample of respondents the audit trail showed that over 1,000 individual clicks were recorded on the preview page during the first month, suggesting this was a useful function for respondents.
Figure 13-7: Preview page before MWSS online questionnaire
4.1 Pay pattern question

As mentioned in Case 3, respondents tended to struggle with selecting all pay patterns which applied to their business. This caused them to move backwards and forwards through the questionnaire trying to find all the applicable pay patterns, increasing response burden and forcing them to use the telephone helpline. Respondents tended to miss the “Please select all that apply” guidance above the question (Figure 13-8) and only selected the first option which applied to their business, when in actuality their business could have had two or more pay patterns. While an “edit check” would appear at the top of the screen in a blue box asking respondents to confirm their answers were correct, respondents tended to misunderstand its purpose and clicked
Figure 13-8: The Pay Pattern question on the MWSS pilot study
“Confirm” to move on as quickly as possible. The purpose of the confirmation button was to ensure respondents had selected all the applicable pay patterns.
Figure 13-9: Question asking respondents to confirm their business’s pay pattern on the final version of MWSS
To overcome this issue for the full version of MWSS a page dedicated to confirming pay patterns was added. While this meant respondents had to visit more pages and slightly increased the time taken to complete the survey, it was decided that forcing them to check that they had selected all the pay patterns which applied would save time overall. If they selected the correct answers at the beginning they would not have to go back through the questionnaire checking and editing their answers. ONS did not have to call respondents who had missed some of their applicable pay patterns to obtain these figures over the telephone, thereby saving the organisation and respondents time and money. In the last round of paper collection, the first-time clearance rate for errors on MWSS was 75.04%; in the first month of online data collection it was 88.30%, and it remained at this level in the following months. Effective questionnaire design and clearly worded error messages caused respondents to make fewer errors.
4.2 Significant changes question

The opportunity to analyse the numbers of error messages triggered across the pilot study identified the questions which caused the most problems for respondents. One such question was the significant changes question. This question asked respondents to provide reasons why there was a significant difference in the number of paid employees compared to last month for whichever pay pattern they had selected in a previous question. The question guidance instructed respondents to “select all [reasons] that apply” (see Figure 13-10). Common errors during the completion of this question included respondents selecting mutually exclusive options: more temporary workers and fewer temporary workers, more overtime and less overtime, no significant change and any other option. Across these three error types there was a total of 107 error messages triggered by respondents. It was hypothesised that respondents selected mutually exclusive options because they had selected multiple pay patterns in a previous question and believed this question applied to all of their pay patterns, rather than one. The potential to trigger three different errors by selecting mutually exclusive answer options showed there was a problem with the question design.
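The three error types described above can be expressed as a small consistency check on the selected options. This is a sketch only: the option labels are paraphrased from the text, and the function name and message wording are assumptions rather than the checks implemented in the eQ.

```python
from typing import Iterable, List

# Pairs of options that contradict each other (labels paraphrased from the text).
EXCLUSIVE_PAIRS = [
    ("more temporary workers", "fewer temporary workers"),
    ("more overtime", "less overtime"),
]
NO_CHANGE = "no significant change"


def find_conflicts(selected: Iterable[str]) -> List[str]:
    """Return one message per mutually exclusive combination that was selected."""
    chosen = set(selected)
    errors = []
    for first, second in EXCLUSIVE_PAIRS:
        if first in chosen and second in chosen:
            errors.append(f"'{first}' and '{second}' cannot both be selected.")
    if NO_CHANGE in chosen and len(chosen) > 1:
        errors.append(f"'{NO_CHANGE}' cannot be combined with any other option.")
    return errors
```

A multiple-choice question that needs this many cross-checks is a candidate for redesign, which is the conclusion the authors draw from the paradata.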
Figure 13-10: Pilot study version of significant changes question
Alternatives to problem questions were designed and tested through cognitive interviewing and usability testing methods. The final design split the response options into two questions with “yes” and “no” response options. The first question asked if there had been any significant changes to their employees and contained routing. Respondents would only see the question asking for more detail if they selected “yes”. The new designs made it very difficult for a respondent to trigger an error message. The first question is shown in Figure 13-11. One feature that aimed to reduce confusion among respondents was the highlighting of the pay pattern in the question. In Figure 13-11 “weekly paid employees” is highlighted in an attempt to ensure respondents are not thinking about any other pay patterns covered by the response.
4.3 The most common error - value not entered

The most commonly triggered error on the MWSS pilot study was triggered whenever a respondent attempted to progress through the survey without entering an answer. The survey requires respondents to enter a zero if they do not have any figures to provide for the question. The error was featured on every question in the survey, which accounts for its higher rate; it was triggered 1,553 times across the pilot study.
Figure 13-11: Part of the newly designed significant changes question
After the pilot study it was assumed that some of the error messages were not clear enough. The error message for the pilot stated “This field is mandatory” when respondents did not provide an answer. The wording of the error message was changed to make the cause of the error more explicit. However, this did not reduce the number of errors triggered across the online questionnaire after it had been sent to respondents to complete. Table 13-2 shows the number of error messages taken from the Google Analytics tool for the July 2017 return for MWSS, the first month where 100% of the sample was scheduled to complete the eQ. The most common errors on the survey became:

1. “Please enter a value, even if the value is 0”
2. “The value you have entered is invalid. Please enter a numeric value.”

To find the root of the problem other sources of paradata were consulted; this included the telephone call records described earlier in the chapter. Using these paradata it became clear that when first-time respondents triggered an error for not entering a value, they called ONS to explain their problem. After analysing call records, it became clear that respondents wished to be
Table 13-2: Number of error messages in the first month and final month of the study.

Month of Survey     July   November
Number of Errors   8,487      3,665
able to enter a decimal point and a minus symbol for certain questions. One of the questions asked respondents to provide a percentage for new pay rates and respondents wanted to be able to enter, for example, 1.2% rather than a whole percentage. They also wanted to be able to enter a minus symbol to signify a reduction in pay. Allowing respondents to enter decimal points and minus symbols meant increasing the accuracy of the data they provided. In November 2017, a few months after introducing these functions to the online version of MWSS, the number of errors triggered had markedly reduced, as shown in Table 13-2.

As well as reducing the number of error messages over the course of four reporting periods, there was a notable drop in the average time taken to complete MWSS online. For the first month where 100% of the sample was required to complete the survey online the average time taken was 8 minutes and 36 seconds. Four reporting periods later, after making changes to the questionnaire based on evidence from paradata analysis, the average time taken to complete the survey had been reduced to 8 minutes and 6 seconds. Respondents spend most of their time searching for the relevant information when attempting to complete a business survey. The time spent entering the figures is only a fraction of their overall completion time (Haraldsen 2018, Chapter 12 in this volume). However, paradata analysis helped us identify a solution to reducing the time spent searching for figures by introducing a preview page stating the relevant information needed to complete the survey. Over 800 clicks were recorded on the preview page in November 2017, suggesting that respondents find this feature helpful when collecting data to complete the survey. When these factors are accounted for, the reduction in time spent completing the survey is a success. As MWSS is a relatively simple survey the potential gains from paradata analysis could be greater for longer and more complex surveys.
Using paradata analysis to improve questionnaire design could reduce the number of error messages triggered and improve the quality and reliability of the statistics produced.
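The change described in this section, accepting decimal points and minus symbols while still requiring an explicit zero, can be sketched as a small input parser. The two messages are those quoted in the text; the function name and the use of Python's `decimal` module are assumptions for illustration and say nothing about how the eQ itself is implemented.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional, Tuple

EMPTY_MSG = "Please enter a value, even if the value is 0"
INVALID_MSG = "The value you have entered is invalid. Please enter a numeric value."


def parse_answer(raw: str) -> Tuple[Optional[Decimal], Optional[str]]:
    """Return (value, None) for a valid number, or (None, error message)."""
    text = raw.strip()
    if text == "":
        return None, EMPTY_MSG
    try:
        value = Decimal(text)  # accepts forms such as "1.2" and "-3"
    except InvalidOperation:
        return None, INVALID_MSG
    if not value.is_finite():
        # Reject special values such as "NaN" or "Infinity".
        return None, INVALID_MSG
    return value, None
```

With this sketch, `parse_answer("-1.2")` yields a value and no error, whereas an empty field and a non-numeric entry map to the two messages listed above.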
5. Concluding remarks

The main conclusion from the investigation into paradata as an aide to questionnaire design is that audit trails may identify where respondents have problems, but that additional, manual paradata are needed to identify what the problem is. Consequently, they can both be used to identify possible solutions. The focus on error messages triggered during completion led to attempts to reduce the number of errors triggered and provided an indicator of success. Reducing the frequency of triggered errors reduces the time respondents need to complete the questionnaire.

In order to complement the quantitative paradata produced by the online survey it was important to study the qualitative paradata produced by the telephone call records and staff notes. These sources of paradata provided essential context as to why errors were triggered. When coupled with the number of errors triggered on a particular question it is possible to understand how flawed the design is and what kind of solution might be implemented. Preventing error messages by simplifying question design is more desirable than allowing respondents to become familiar with the questionnaire challenges over time, as was the case with paper questionnaires. The new, simplified question designs still collected the same data but made it more difficult to trigger an error message.

Paradata analysis for online surveys is fast and cheap, provided one has access to a suitable paradata collection tool, such as Google Analytics. It can also be proactive; a solution can be implemented, monitored and improved upon in weeks. For example, it is possible to know in days if respondents are not clicking on a link to guidance for a question and if more errors are triggered on this question than others. A solution can be suggested, tested and implemented soon after discovering the problem.
However, paradata analysis cannot be used to aid a respondent’s understanding of a concept - it can only highlight a problem. Cognitive interviewing and usability testing are still very important for providing the correct solution to these problems. It is important to stress that the examples highlighted in this chapter were thoroughly tested before being implemented and the correct solutions were found after a number of iterations.
References

Couper, M. (2008). Designing effective web surveys. New York: Cambridge University Press.
Haraldsen, G. (2018). Response processes and response quality in business surveys. In Lorenc, B., Smith, P.A., Bavdaž, M., Haraldsen, G., Nedyalkova, D., Zhang, L.-C. and Zimmermann, T. (2018) (eds.). The unit problem and other current topics in business survey methodology, pp 155-176. Newcastle upon Tyne: Cambridge Scholars.
Heerwegh, D. (2011). Internet survey paradata. In Das, M., Ester, P. and Kaczmirek, L. (eds.) Social and behavioural research and the Internet. New York: Routledge.
Kreuter, F. (2013). Improving surveys with Paradata: Introduction. Pp. 112 in Kreuter, F. (ed.) Improving Surveys with Paradata: Analytic Uses of Process Information. Hoboken, New Jersey: Wiley.
Kreuter, F., Couper, M. and Lyberg, L. (2010). The use of paradata to monitor and manage survey data collection. Proceedings of the Section on Survey Research Methods 2010, 282-296.
Snijkers, G. and Morren, M. (2010). Improving web and electronic questionnaires: The case of audit trails. Q2010, Helsinki, Finland. Available at: http://www.websm.org/uploadi/editor/1362484759Snijkers Morren_2010_The_Case_Of_Audit_Trails.pdf (accessed 2018-03-05).
CHAPTER 14 STUDYING THE IMPACT OF EMBEDDED VALIDATION ON RESPONSE BURDEN, DATA QUALITY AND COSTS
BORIS LORENC1, ANDERS NORBERG2 AND MAGNUS OHLSSON2
Abstract
Data validation is known to be a resource intensive step for producers of business statistics. Further, it commonly includes recontacts with data providers, thus also adding to their respondent burden. An increase in the amount of embedded validation (also called data provider validation) in an electronic data collection tool may have several benefits. It may reduce production validation, but it may also reduce the overall respondent burden and increase the data quality due to the availability in that setting of the data on the basis of which the data provider can carry out the validation. As a first step in finding an optimal balance between embedded validation and production validation, we studied experimentally the impact of an increased level of embedded validation in a business survey conducted by Statistics Sweden. While the impact of the experiment was limited, the results indicate that an increase in embedded validation may carry trade-offs and that further knowledge of the validation response process is needed.
1 Bright Lynx LLC, Vaarika 1-1, 10614, Tallinn, Estonia. Email: [email protected]. 2 Department for Development of Processes and Methods, Statistics Sweden, Box 24300, 104 51 Stockholm, Sweden. Email: {anders.norberg, magnus.ohlsson}@scb.se.
1. Introduction
Data validation, or data editing (UNECE, 2000; 2013), is an essential step in collecting and processing the data that form the basis for producing aggregated statistics. In official statistics production, validation is a resource intensive step tending to consume a considerable proportion – as much as 40 percent in business surveys (Granquist and Kovar, 1997, p.418) – of the total cost of a statistics product. Validation performed by the statistics producer after receiving the data (henceforth: production validation) may result in subsequent recontacts with the data provider, who is asked to verify a certain reported value that deviates from what the statistics producer considers a set of valid or expected values or value relations. A data provider would be recontacted on a separate occasion, perhaps several weeks after the original data submission. In that setting, the data provider would likely not have as direct an access to the source records, nor the same awareness of the task, definitions, and similar, as in the original setting of data provision. Depending on the availability for the data provider of the information to be verified (Beatty and Herrmann, 2002), for instance whether the information is simply accessible within the business's systems or not (Lorenc, 2007; Bavdaž, 2010a), the data provided in a verification recontact risk either exerting a considerably higher burden or being of a lower quality if a response shortcut is used, compared to the same data verification performed during primary data collection.
Thus, alerting the data provider during the data collection to deviations that, if not considered, would be likely to cause a validation recontact later on may have two related positive consequences:
1. It occurs in the course of the data collection itself, so the data provider likely has direct access to the information that enables verifying and editing; subsequent recontacts generally happen outside the data provision setting.
2. It occurs automatically from the point of view of the data collector, so that the data collector does not need to engage directly with the data provider; this leads to a reduction in the use of data verification resources on the data collector side.
We refer to this as embedded validation: the electronic data collection instrument contains embedded edit (or validation) rules that, if violated, trigger the instrument to inform the data provider of a deviation in form (decimal separator, number of decimal places, character set used, etc) or content (magnitude of the entered value, inconsistency between entered
values, etc) from an expectation of the data collector. The validation is thus performed by the data provider, who is asked to review and – if needed – edit the originally supplied value while still in the data provision setting. Two comments on the terminology we use:
1. There is a shift from editing to validation for the broader activity of “detection of actual or potential errors” (sub-process 5.3 in GSBPM 5.0; UNECE, 2013). We use the latter term, validation, throughout the text. The former term would then apply to “correction activities that actually change the data” (sub-process 5.4 of GSBPM). In embedded validation that happens too, so the sub-processes 5.3 and 5.4 are performed iteratively. Our choice of the latter term seems to be justified by GSBPM's wording “some elements of validation may occur alongside collection activities, particularly for modes such as web collection” (ibid.).
2. An alternative to embedded validation could be interactive validation, as the latter term involves an interaction between the data collection instrument and the data provider. However, a similar term, interactive editing, has already been used for the computer-assisted manual editing on the producer side (De Waal et al., 2011, p. 15), something that we would include in production validation.
Embedded validation is thus expected to result in data of better quality, as it has been verified in a more suitable setting than a recontact; it has for the same reason been obtained with less respondent burden; and obtained at a lower cost as the statistics producer has invested fewer resources in production validation having assigned a part of the validation to the data provider. Prior to the experiment reported in this chapter, Statistics Sweden had little empirical evidence regarding the impact of embedded validation on data quality, response burden and cost.
While the outlined theoretical expectations might hold regarding the benefits of introduction or an increased level of embedded validation on the data quality, response burden and cost, it may also have adverse effects. Increased embedded validation may be experienced by the data providers as an obstacle to data provision and thus lead to larger unit nonresponse, to shortcuts in providing the requested data, to decreased acceptance of embedded validation features, to increase in perceived response burden, and therefore not achieve the envisioned data quality improvements. If the adverse outcomes prevail, the amount of production validation may remain unchanged. In particular, without measuring what the impact on the outcomes would be, the appearance of improvement through automation (the second perceived
positive consequence above) from just moving an increasing number of edit checks to embedded validation may lead to misplaced but ultimately detrimental trust in the changed routine. To provide empirical evidence of impacts and potential improvements in its processes, Statistics Sweden initiated a project in 2016 to investigate the effects of an increase in the number of embedded validation rules in one of its business surveys that already had some embedded validation. The motivation for conducting the experiment lay in the pursuit of an optimal balance between embedded validation and production validation. In the remainder of the chapter, we describe a split-half experiment where the treatment group was given a version of the data collection instrument where the level of embedded validation was increased in some ways compared to the version given to the control group (subsection 2.2). A number of outcome variables were measured, aiming to cover the aspects where an impact was expected or feared: nonresponse, response burden, amount of production validation and attitude towards embedded validation (subsection 2.3). The experiment was analysed using generalised linear models (subsection 2.4), the results of which are presented in section 3. Section 4 contains some discussion and proposals for further topics in need of research or methodological attention towards utilising the embedded validation better.
2. Method
2.1 Context
The vehicle for the experiment was the reference year 2016 round of the annual survey “Wage and salary structures in the private sector” (SLP, from its name in Swedish). In this survey, the majority of the data are from external sources (trade union organisations); however, one part is directly collected from a probability sample of businesses (legal units) using a web-based questionnaire. SLP’s validation process is similar to a number of other surveys within Statistics Sweden and other NSIs and consists of some embedded validation (a number of validation rules implemented in the data collection instrument), some production (micro) data validation (validation rules verified manually by the production personnel), and an output (or macro) validation review (Scholtus, 2014). Some NSIs also use an automatic validation step that generally precedes the manual production validation; in SLP this is only used for the variable category of personnel.
Violation of some of the production validation rules may lead to recontacting the business, as per a detailed specification to the validation personnel. A business would generally be recontacted within a week or two of the data provision. However, some validation rules for SLP are already implemented in the data collection instrument. When violated, such validation rules trigger a notification to the data provider, a validation message. Depending on whether the rule is implemented as hard or soft (Scholtus, 2014), the data provider in the former case will need to, or in the latter case may, confirm, comment, or edit the provided information on the spot. But this creates additional work, and the data providers may also view the validation messages requiring their attention as hindering their progress towards completing the data provision in as short a time as possible. The experiment was designed to fill in some of the missing components in working towards an optimal level of embedded validation in the survey: one that would lead to a reduced overall respondent burden at the same time as reducing the amount of production validation, while also keeping the data provider motivated to participate in the survey and provide data of high quality. Formally, this is a four-variable optimisation problem, and in working towards finding an optimal solution a statistics producer will need to specify a satisfactory relation between the four variables.
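The distinction between hard and soft rules described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the rule names, the limits (1 to 60 hours, a monthly-to-weekly ratio between 3.5 and 5.0), the message texts and the `review` helper are invented for illustration and are not SLP's actual rules or Statistics Sweden's instrument. Following the wording of this subsection, a violated hard rule blocks submission until the data provider has reacted (confirmed, commented or edited), whereas a violated soft rule only shows a message.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Rule:
    name: str
    check: Callable[[Dict[str, float]], bool]  # True when the record passes
    message: str                               # validation message shown to the data provider
    hard: bool                                 # hard: provider must react; soft: provider may react


def review(record: Dict[str, float], rules: List[Rule],
           acknowledged: Set[str]) -> Tuple[bool, List[str]]:
    """Return (may_submit, validation_messages) for one questionnaire record."""
    may_submit, messages = True, []
    for rule in rules:
        if rule.check(record):
            continue
        messages.append(rule.message)
        # A violated hard rule blocks submission until the data provider has
        # confirmed or commented on it (editing the value makes the rule pass).
        # A violated soft rule only produces a message.
        if rule.hard and rule.name not in acknowledged:
            may_submit = False
    return may_submit, messages


# Hypothetical rules and limits, for illustration only.
RULES = [
    Rule("weekly_hours_range",
         lambda r: 1 <= r["weekly_hours"] <= 60,
         "Weekly working time is expected to be between 1 and 60 hours.",
         hard=True),
    Rule("monthly_vs_weekly_ratio",
         lambda r: r["weekly_hours"] > 0
         and 3.5 <= r["monthly_hours"] / r["weekly_hours"] <= 5.0,
         "Monthly working time looks inconsistent with weekly working time.",
         hard=False),
]

suspicious = {"weekly_hours": 80, "monthly_hours": 320}
print(review(suspicious, RULES, acknowledged=set()))
print(review(suspicious, RULES, acknowledged={"weekly_hours_range"}))
```

In this sketch the number of messages returned corresponds to what the chapter later counts as validation flags, and each acknowledgement would correspond to a data provider comment.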
2.2 Study design
The experiment was embedded in SLP utilising a split-half experimental design with two conditions:
- control: standard level of embedded validation in SLP,
- treatment: increased level of embedded validation in SLP.
The businesses sampled for the 2016 SLP were first stratified on size (7 categories), activity (6 categories), whether located in Stockholm county or not, whether sampled for the 2015 SLP or not, whether responded in the 2015 SLP or not (conditional on being sampled in the 2015 SLP), and data provision mode – file or web questionnaire – chosen for the 2015 SLP (conditional on being a respondent in 2015 SLP). Then, within each stratification cell a sampled business was assigned to one of the two conditions using systematic sampling: every second element in the list of businesses ordered by their ID number was assigned to the treatment condition and the remainder to the control condition. After identification and removal of overcoverage, there were 3,676 businesses in the sample, roughly half of them (1,845) assigned to the treatment condition and the rest (1,831) to the control condition. Data
collection for this survey started at the beginning of October 2016. By the closure of data collection for the experiment in December 2016, 2,500 businesses had responded (1,258 in the treatment and 1,242 in the control), giving an average response rate of 68%. In addition to providing a web-based questionnaire for data submission, the data collection instrument accepts data files formatted to a prespecified template. It is the data providers’ choice whether to use the web-based questionnaire or to submit a file. In the 2016 round of SLP, 1,792 businesses submitted data using the questionnaire while 708 businesses used the file. As the technical possibilities for embedded validation varied somewhat between these two modes, for analysis in this experiment we used data from the web-based mode only.
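The split-half assignment described in this subsection can be sketched as follows. This is an illustration under assumptions rather than the procedure actually run: the function name `split_half` and the input layout are hypothetical, and the chapter does not state which element of each ordered list was the first to go to the treatment condition, so the sketch simply assigns the even positions.

```python
from collections import defaultdict


def split_half(businesses):
    """Assign businesses to conditions within stratification cells.

    businesses: iterable of (business_id, cell) pairs, where cell identifies
    the stratification cell. Returns a dict business_id -> condition.
    """
    by_cell = defaultdict(list)
    for business_id, cell in businesses:
        by_cell[cell].append(business_id)

    assignment = {}
    for ids in by_cell.values():
        # Order by ID number and give every second business the treatment.
        # Starting with the control condition is an assumption of this sketch.
        for position, business_id in enumerate(sorted(ids), start=1):
            assignment[business_id] = "treatment" if position % 2 == 0 else "control"
    return assignment


sample = [(103, "A"), (101, "A"), (102, "A"), (201, "B"), (202, "B")]
print(split_half(sample))
```

Because the alternation restarts in every cell, the two groups are balanced on all stratification variables up to one business per cell.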
2.3 Experimental manipulation
The treatment consisted of increased embedded validation of four validation rules among the around 30 existing embedded validation rules in the data collection instrument for SLP. The project team surmised that in order to be useful as an embedded validation rule, a validation rule would have to have these properties:
a) have a high hit rate (UNECE, 2000) when triggered in production validation, that is, often lead to a changed value after verification; information on hit rate was obtained through process data on production validation in the previous round of SLP;
b) lead to recontact with data provider in the course of verification (i.e. be a costly component of both production validation and respondent burden);
c) admit a clear validation message to the data provider as to what the issue might be with the originally submitted value.
Using these criteria, four validation rules were manipulated in the treatment condition: a range rule for the weekly working time and a ratio rule for monthly versus weekly working time were added, and two range rules – for occupation code and for hours worked in the reference month – were turned from soft rules to hard rules. For reasons extending beyond the scope of this chapter, it was deemed that this level of increase was sufficient for a first experiment on embedded validation at Statistics Sweden.
Explanatory variables. The following explanatory variables were available. They refer to a business sampled for the 2016 round of SLP:
1. Whether the sampled business was assigned to the treatment or control group.
2. Business’s size (in terms of number of full-time equivalent employees), 7 categories.
3. Business’s area of activity, 6 groups of NACE codes.
4. Whether the sampled business was included or not in the 2015 SLP sample.
5. Whether the sampled business was a respondent or nonrespondent in the 2015 SLP (given that it was included in the 2015 SLP sample).
The primary focus of the analysis is to estimate the effect of the experimental manipulation (item 1 in the list of explanatory variables, treatment/control group). Such an analysis answers the question of what impact the experimental variation had on the dependent variables – what happens with these variables when the amount of embedded validation is increased. Inclusion of the other available explanatory variables shows the relative importance of these variables, the experimental manipulation included, in explaining the dependent variables. It answers the question of which variables explain the dependent variables best. While of research interest for understanding business response behaviour, in practice none of the other variables – the experimental manipulation excluded – are under the statistics producer’s direct control and thus somewhat less relevant for the present study. Therefore, while these variables have been included in the models, we do not report these model estimates in full as they are outside of the scope of the present study.
Dependent variables. These variables refer to a sampled business and its participation in the 2016 round of SLP:
1. Whether the sampled business participated (i.e. whether it provided data) or not, a measure of data quality (Particip16).
2. Number of embedded validation rules that the sampled business triggered, which were then flagged to the business with a validation message, a measure of data provision work (#ValFlags).
3. Number of data provider comments that a business submitted in addressing the flagged embedded validation messages, a measure of the amount of data provision work (#ProviderComments).
4. Number of production validation comments that are written by the editing staff in the process of validating the data to serve as a note on how a particular edit flag has been treated, a measure of production validation work (#ProdValidComments).
5. Number of recontacts with the sampled business, a measure of production validation work and data provision work (#ProviderRecontacts).
6. Length of time that the data provider within the business estimated that it took to provide the data in the primary data collection round (taken from the question that generally follows submission of data to Statistics Sweden), a measure of respondent burden (EstimTime).
7. Preference for embedded validation versus recontacts later on (one question of an optional three question survey prepared for the experiment and presented after the SLP form had been submitted) (PrefEmbValid).
2.4 Analysis
Seven generalised linear models (Dobson, 2002) of the form
$$E(y) = g^{-1}(x'\beta)$$
were fitted to the data, with the linear predictor $x'\beta$ and a link function $g(\cdot)$ which varied with the dependent variable analysed: logistic for a binary outcome, Poisson for a count, and the identity for lognormal data (e.g. logarithm of time). Specifically, the glm function of the stats package in R (R Core Team, 2017) was used. To choose predictors in the models among the explanatory variables listed in subsection 2.3, an automated procedure provided by the step function of the stats package was used. If the procedure did not include the experimental manipulation variable in the model, then we included it ourselves as the focus was on studying the impact of this variable. To evaluate the models, a Pseudo-$R^2$ (McFadden, 1974) was calculated for each of them, using the expression:
$$\text{Pseudo-}R^2 = 1 - \frac{\ln \hat{L}(M_{\text{Full}})}{\ln \hat{L}(M_{\text{Intercept}})}$$
where $\hat{L}(M_{\text{Full}})$ is the maximised likelihood of the fitted model and $\hat{L}(M_{\text{Intercept}})$ that of the intercept-only model.
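As a worked illustration of the Pseudo-R2 expression, the sketch below evaluates it for the simplest possible case: a logistic model of participation with the treatment indicator as the only predictor. In that special case no iterative fitting is needed, because the fitted probabilities equal the observed response proportions in each group. The input is the response counts quoted in subsection 2.2, used here purely as illustrative numbers; the models reported in the chapter were fitted with R's glm and include further covariates, so this sketch is not expected to reproduce the published estimates. The helper names are hypothetical.

```python
import math


def bernoulli_loglik(successes, n):
    """Maximised log-likelihood when all n units share one fitted probability."""
    p = successes / n
    loglik = 0.0
    if successes > 0:
        loglik += successes * math.log(p)
    if successes < n:
        loglik += (n - successes) * math.log(1 - p)
    return loglik


def logit(p):
    return math.log(p / (1 - p))


# (responded, sampled) per condition, as quoted in subsection 2.2.
treatment = (1258, 1845)
control = (1242, 1831)

# Full model: intercept plus treatment indicator, one probability per group.
loglik_full = bernoulli_loglik(*treatment) + bernoulli_loglik(*control)
# Intercept-only model: a single common probability.
loglik_intercept = bernoulli_loglik(treatment[0] + control[0],
                                    treatment[1] + control[1])

# The treatment coefficient is the difference in log-odds between the groups.
beta_hat = logit(treatment[0] / treatment[1]) - logit(control[0] / control[1])
pseudo_r2 = 1 - loglik_full / loglik_intercept

print(f"beta_hat = {beta_hat:.4f}, pseudo-R2 = {pseudo_r2:.6f}")
```

With a single predictor that barely separates the groups, the two log-likelihoods are almost equal and the Pseudo-R2 is close to zero; it grows only when the added predictors improve the fit over the intercept-only model.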
3. Results
This section presents the impact of the treatment condition on the relevant dependent variables. The values of the estimates of the regression coefficients pertaining to the explanatory variable treatment/control from the models of subsection 2.4 are presented in Table 14-1. They show that the experimental manipulation, in turn:
1. did not impact the response rates negatively;
2. increased the number of validation messages that data providers were exposed to;
3. increased the number of comments that the data providers submitted in the process of validating their data;
4. did not reduce the number of comments that the validation personnel generated in the process of validating the data;
5. did not reduce the number of recontacts with the businesses;
6. did not increase the perceived length of time it took to provide the data for SLP; and
7. did not reduce the data providers’ preference for embedded validation.
It should be noted, with respect to the items numbered 6 and 7 in the preceding list, that the perceived length of time and preference data – coming from an optional survey presented to the data providers after they submitted their SLP data – have fewer observations than the other dependent variables, n=587.

Table 14-1: Modelling the impact of the experimental manipulation (x = {Control (0), Treatment (1)}) on the dependent variables. For each model, the dependent variable y, the estimate $\hat{\beta}$ of the regression coefficient, the p-value for the test of the hypothesis that the true $\beta$ equals 0, and the model’s Pseudo-R2 are presented. (Values marked with * pertain to those coefficient estimates that at $\alpha$ = 0.05 are significantly different from 0.)

y                      $\hat{\beta}$    p        Pseudo-R2
Particip16              0.014           0.845    0.030
#ValFlags               0.271*          0.000    0.299
#ProviderComments       0.273*          0.002    0.171
#ProdValidComments      0.016           0.815    0.173
#ProviderRecontacts     0.008           0.946    0.066
Log(EstimTime)          0.015           0.763    0.154
PrefEmbValid           -0.124           0.578    0.045
Summarising the results, the production validation work has not been significantly reduced in the treatment condition. The treatment did increase the work load of data providers (number of validation flags that have been presented to them and the number of comments they wrote), but that did not reduce the number of recontacts with the businesses. The data providers’ participation in the survey was not different in the treatment condition than in the control, nor has there been any significant impact on
the data providers’ estimate of the length of time taken or their preference for embedded validation.
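To give a sense of the size of the two significant effects, the coefficients can be converted to multiplicative effects. The sketch below assumes that the count models use a log link, which is the default for the Poisson family in R's glm; the chapter does not state the link explicitly, so this interpretation is an assumption rather than a reported result.

```python
import math

# Treatment coefficients for the two count outcomes, taken from Table 14-1.
coefficients = {"#ValFlags": 0.271, "#ProviderComments": 0.273}

# Under a log link, exp(beta) is the ratio of expected counts,
# treatment versus control, holding the other predictors fixed.
rate_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, ratio in rate_ratios.items():
    print(f"{name}: rate ratio {ratio:.2f}, i.e. about {100 * (ratio - 1):.0f}% more")
```

Under this assumption, data providers in the treatment group met roughly 30 percent more validation messages, and wrote roughly 30 percent more comments, than those in the control group.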
4. Conclusion
4.1 Discussion
In general, the treatment that added two and strengthened two embedded edits, among the approximately 30 embedded edits already in the data collection instrument, seems to have been too limited to achieve the intended improvements. These existing embedded validation rules, as well as the production validation rules, may have necessitated recontacts independently of the four experimental embedded validation rules. While a further increase in embedded validation can be considered, based on the existing data it cannot be said whether that would still leave the response rate and the high preference for embedded validation intact, or whether these would start to deteriorate. The current experiment cannot give a decisive answer, but it suggests prudence: the coefficient for preference for embedded validation was negative, although not significantly different from 0. We believe that the preference will also depend on whether data providers perceive the validation messages as helpful or distracting.
4.2 Further research and applied topics
Based on these results, we believe that experiments with additional increases in embedded validation are needed to cast further light on their impact on the variables of interest. Having worked on this study leads us to suggest that further work should be dedicated to the following topics.
Total structured validation approach. While there are processes in place to improve data quality through validation at various stages (embedded in data collection, automatic on input data, production validation, macro validation), guidelines and principles overarching these stages are needed because presently these are conducted practically independently of each other and without measures of data quality improvement at each stage. An overarching approach could for instance be embedded in a total survey error framework, and also include burdens and costs. Further, a structured formal representation of such a comprehensive approach to validation would be useful. From such a structured account, it would for instance be possible to work out what effects changes in a specific embedded validation rule would have on recontacts.
Usability studies of embedded validation. While we could not find any published empirical evidence, we assume that data providers’ acceptance of the embedded validation develops over the course of the data provision, depending on whether the triggered validation messages are seen by the data providers as supportive or distracting towards completing their task. Merely increasing the proportion of embedded edits does not guarantee that the theoretical promises of embedded validation will be fulfilled. One first needs a better grip on the role and impact of validation messages in the process. For understanding that, qualitative studies might also prove useful.
QDET for validation messages. Another area seemingly lacking a better understanding concerns relations between the different validation rules and their respective validation messages. We note – based on an informal inquiry among colleagues – that validation messages for embedded validation are mostly not constructed and evaluated in a questionnaire design, evaluation and testing (QDET) framework. They seem to be implemented by the statistics production team and worded as extensions of the production validation rules that they are built upon. Based on a sample that we have seen, these validation messages tend to be terse and use some perhaps insufficiently clear wording (“Only numbers allowed”, “Both overtime compensation and overtime pay are filled in”, “Fixed salary is the same as variable salary”). Thus, we suggest that questionnaire evaluation should extend to evaluation of the validation messages used in the data collection process.
Data availability and quality in recontacts. That we included only nonresponse as a measure of data quality can be seen as a weakness of the present study. However, we were not able to find in the published literature results of others’ research indicating that data quality variables were improved by embedded editing either.
We thus assume that this is yet another topic in need of further attention. We would especially encourage more research on the topic of whether a validation recontact (at a later point in time) about complex, not easily available data points improves the data quality or not. An edited data point complies with the data collector’s validation rules and is thus not likely to draw attention further down the statistics production line. We believe that several of the above areas can best be investigated in the context of a cognitive approach to survey methodology (CASM), expanded to business surveys (Bavdaž, 2010b; Haraldsen, 2013; Lorenc, 2007; Willimack and Nichols, 2010). As the core steps of comprehension, recall, judgment and response apply to the data providers' processing of questions, they also in fact apply to their processing of the validation messages. Theoretical understanding and empirical research on this topic
are largely missing in survey methodology: design and evaluation of validation messages seem presently to be kept in an area of their own, outside any CASM or QDET framework developed for questionnaires, questions and response alternatives.
Acknowledgements
This paper builds on a project by Statistics Sweden entitled “Embedded editing in SIV [Statistics Sweden’s Data Collection Tool]”. The project group consisted of Anette Björnram, Pia Hartwig, Boris Lorenc, Anders Norberg, and Magnus Ohlsson. The chapter’s first author was at Statistics Sweden at the time of participating in the project. The opinions expressed in this chapter are the authors’ and are not to be seen as reflecting the position of Statistics Sweden or the project group. The authors are indebted to Gustav Haraldsen and Paul Smith for valuable comments on an earlier draft that greatly improved the chapter. Any remaining shortcomings should be attributed to the authors only.
References
Bavdaž, M. (2010a). Sources of Measurement Errors in Business Surveys. Journal of Official Statistics 26, 25-42.
Bavdaž, M. (2010b). The multidimensional integral business survey response model. Survey Methodology 36, 81-93.
Beatty, P. and Herrmann, D. (2002). To answer or not to answer: decision processes related to survey item nonresponse. In Groves, R.M., Dillman, D.A., Eltinge, J.L. and Little, R.J.A. (eds.). Survey nonresponse pp. 71-87. New York, NY: Wiley.
Dobson, A.J. (2002). An introduction to generalized linear models. Second edition. Boca Raton, FL: Chapman and Hall/CRC.
Granquist, L., and Kovar, J. (1997). Editing of survey data: how much is enough? In Lyberg, L., Biemer, P., Collins, M., de Leeuw, E., Dippo, C., Schwarz, N. and Trewin, D. (eds.). Survey measurement and process quality pp. 415-435. New York, NY: Wiley.
Haraldsen, G. (2013). Quality issues in business surveys. In Snijkers, G., Haraldsen, G., Jones, J. and Willimack, D. Designing and conducting business surveys, chapter 3. New York, NY: Wiley.
Lorenc, B. (2007). Using the theory of socially distributed cognition to study the establishment survey response process. Proceedings of the 3rd International Conference on Establishment Surveys, Montréal, Québec, Canada, June 18-21, 2007. Available at http://ww2.amstat.org/meetings/ices/2007/proceedings/ICES2007-000247.PDF (accessed 2018-02-18).
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (ed.), Frontiers in econometrics pp. 105-142. New York, NY: Academic Press.
R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Scholtus, S. (2014). Statistical Data Editing - Main Module. In Memobust Handbook on Methodology of Modern Business Statistics. Available at https://ec.europa.eu/eurostat/cros/content/statistical-data-editing-mainmodule-pdf-file_en (accessed 2018-02-08).
UNECE (2000). Glossary of terms on statistical data editing. Available at http://unstats.un.org/unsd/EconStatKB/Attachment326.aspx?AttachmentType=1 (accessed 2018-03-08).
UNECE (2013). Generic statistical business process model (Version 5.0, December 2013). Available at http://www1.unece.org/stat/platform/display/GSBPM/GSBPM+v5.0 (accessed 2018-02-18).
de Waal, T., Pannekoek, J. and Scholtus, S. (2011). Handbook of statistical data editing and imputation. New York: John Wiley and Sons.
Willimack, D. and Nichols, E. (2010). A hybrid response process model for business surveys. Journal of Official Statistics 26, 3-24.
CHAPTER 15 ADAPTATIONS OF WINSORIZATION CAUSED BY PROFILING
ARNAUD FIZZALA1
Abstract
The sampling design of the French Structural Business Statistics survey has changed starting with the reference year 2016. It no longer samples legal units but enterprises. Data is still collected on legal units, with the following rule: when an enterprise is selected, then all legal units within this enterprise will be surveyed. This paper develops the adaptation of winsorization (influential values treatment) to this new context. We show that winsorization with the Kokic and Bell thresholds, applied as if the sampling were a stratified sampling of legal units, seems to be the best option to deal with influential values. As alternatives, we tested methods based on conditional bias, but they led in our context to poorer results, and with some problems to solve to make them operational.
1. Introduction
The French Structural Business Statistics (SBS) production system, known as ESANE, has two main uses:
- production of statistics based on the European SBS regulation;
- estimation of businesses’ contributions to GDP for the national accounts.
1 Institut national de la statistique et des études économiques (INSEE), 88 avenue Verdier, CS 70058, 92541 Montrouge Cedex, France. Email: [email protected].
The system is based on a mix of exhaustive administrative fiscal data and data obtained on a random sample of the business population (Brion and Gros, 2015). The sampling design of the French SBS survey has changed starting with the reference year 2016 (Gros and Le Gleut, 2018, Chapter 8 in this volume). It no longer samples legal units but enterprises. An enterprise is defined by law as the smallest combination of legal units that is an organisational unit producing goods or services with a certain degree of autonomy (EEC, 1993). Data is still collected on legal units, with the following rule: when an enterprise is selected, then all legal units within this enterprise will be surveyed. Since the statistical units (enterprises) differ from the data collection units (legal units), the sample design can be seen as a two-stage cluster sampling. Before the change in the sampling design, and for the 2016 reference year as well, SBS results were computed on the population of legal units, based on data produced at the legal unit level. For the next editions, SBS results will be computed on the population of enterprises, based on “enterprise data”. Therefore, the SBS survey’s post-collection treatments (non-response, influential units and calibration treatments) have to be adapted to handle the new sampling design. This chapter covers the adaptation of the treatment of influential units by winsorization to this new context. The paper is organized as follows: in section 2 we present the current treatment of influential values based on winsorization with the Kokic and Bell (1994) threshold. In section 3 we describe alternative methods based on conditional bias (Favre Martinoz et al. 2015, 2016). In section 4 we present the simulation study comparing the different methods. We conclude in section 5 with a summary and some comments.
2. Current treatment of influential values

Economic variables with highly skewed distributions are very common in business surveys. In this context, influential unit problems frequently occur. In this paper, we assume that measurement errors (gross errors, unit errors, etc.) have already been detected and corrected at the editing stage. Influential values are typically very large but “true”, and their presence in the take-some strata instead of the take-all strata tends to make classical estimators very unstable. The aim of influential value treatment is to limit their impact, by reducing their weights or their values, which leads to estimators that are more stable but potentially biased.
Adaptations of winsorization caused by profiling
Winsorization is the method used in the French SBS survey to treat influential values. In the case of stratified random sampling it is based on the determination of thresholds in each sampling stratum above which large values are reduced. More precisely, the values $y_i^w$ of the variable of interest $y$ after (type 2) winsorization are defined by:

$$
y_i^w =
\begin{cases}
y_i & \text{if } y_i < K_h \\[4pt]
\dfrac{n_h}{N_h}\, y_i + \left(1 - \dfrac{n_h}{N_h}\right) K_h & \text{if } y_i \geq K_h
\end{cases}
$$
with: $K_h$ the threshold in the stratum $h$; $n_h$ the number of units selected in the sample in the stratum $h$; $N_h$ the number of units in the sampling frame in the stratum $h$. The winsorized estimator is then defined as an expansion estimator of the winsorized variable:

$$
\hat{t}_y^{\,w} = \sum_{h=1}^{H} \frac{N_h}{n_h} \sum_{i=1}^{n_h} y_i^w
$$
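As an illustration, the type 2 winsorization rule and the corresponding expansion estimator can be sketched in a few lines of Python. This is a minimal sketch, not the production code of the survey: the function names, the dictionary layout of a stratum and the numerical values are all hypothetical, and the threshold is simply given rather than computed with the Kokic and Bell method.

```python
from typing import Sequence


def winsorize_type2(y: float, K_h: float, n_h: int, N_h: int) -> float:
    """Type 2 winsorized value of y in a stratum with threshold K_h."""
    if y < K_h:
        return y
    f_h = n_h / N_h  # sampling fraction in stratum h
    return f_h * y + (1.0 - f_h) * K_h


def winsorized_total(strata: Sequence[dict]) -> float:
    """Expansion estimator of the total of the winsorized variable."""
    total = 0.0
    for s in strata:
        n_h = len(s["y"])
        y_w = [winsorize_type2(y, s["K"], n_h, s["N"]) for y in s["y"]]
        total += s["N"] / n_h * sum(y_w)
    return total


# Hypothetical stratum: 4 sampled units out of 40, threshold K_h = 100.
strata = [{"N": 40, "K": 100.0, "y": [10.0, 20.0, 30.0, 500.0]}]
ht_total = sum(s["N"] / len(s["y"]) * sum(s["y"]) for s in strata)
w_total = winsorized_total(strata)
print(ht_total, w_total)
```

In this toy stratum only the unit with value 500 is modified: its weighted contribution becomes $y_i + (N_h/n_h - 1) K_h$, so the unit counts for itself with its true value and represents the non-sampled units with the threshold value.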
Unlike the Horvitz-Thompson estimator, which is unbiased, the winsorized estimator is downwards-biased in estimating the total $t_y$ of the variable of interest $y$. On the other hand, its variance depends on the variance of the winsorized variable $y^w$ in each stratum, which is by construction smaller than the variance of the original variable $y$. So, winsorization carries a bias-variance trade-off: it will be efficient if more variance is removed than bias introduced, that is, if the mean square error of the winsorized estimator is lower than that of the Horvitz-Thompson estimator. The choice of the thresholds $K_h$ is crucial, as it determines the quality of the winsorization procedure. The problem of choosing these thresholds has been studied by Kokic and Bell (1994), who proposed a method to determine, under some assumptions, optimal thresholds, that is, thresholds that minimize the estimated mean square error of the winsorized estimators. Until the reference year 2015, INSEE applied winsorization on the legal units’ turnover values, available for all legal units in the sampling frame from administrative information (especially fiscal data) (Deroyon, 2015). More specifically, winsorization was applied for the estimation of the total
turnover by activity (3-digit NACE codes) using thresholds obtained with the Kokic and Bell method. After that, the weights of the winsorized units were modified so that the effect of winsorization could be “transferred” to the other variables. The winsorized weights were defined as

$$
w_i^w = w_i \, \frac{y_i^w}{y_i}
$$

with $w_i$ the initial weight.

It should be noted that the Kokic and Bell method was developed in a stratified sampling framework. As INSEE now uses a two-stage cluster sampling, there is no guarantee that the method holds. If the sampling design were still a stratified sampling of legal units, each legal unit in a stratum (defined with the legal units' characteristics) would have the same weight. To evaluate how much the situation differs with the new sampling design, we have calculated the distribution of the sampling weights of legal units by stratum, where strata are those that would apply if the sampling design were still a stratified sampling of legal units, but with weights resulting from the new sampling design. There are approximately 1,600 strata in our data. The results are presented in Table 15-1. In the table, each observation is one stratum. As can be seen in the table, 95% of the strata have a coefficient of variation of the legal unit weights less than 59.5%. 75% of the strata have a range of the legal unit weights less than 50. In each stratum, at least 47% of the legal units have the same weights.

Table 15-1: Distribution of sampling weights of legal units, by stratum. For details see text.

| Quantile | CV (%) | Range | Frequency of the mode (%) |
|---|---|---|---|
| 100 | 401.7 | 797 | 100 |
| 99 | 118.1 | 265 | 100 |
| 95 | 59.5 | 153 | 100 |
| 90 | 38.7 | 96 | 100 |
| 75 | 20.7 | 50 | 99 |
| 50 | 11.9 | 20 | 97 |
| 25 | 6.1 | 9 | 91 |
| 10 | 2.0 | 1 | 84 |
| 5 | 0.0 | 0 | 80 |
| 1 | 0.0 | 0 | 68 |
| 0 | 0.0 | 0 | 47 |
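The three per-stratum indicators summarised in Table 15-1 can be computed along the following lines. This is only a sketch under stated assumptions: the function name and the example weights are hypothetical, and the population standard deviation is used for the coefficient of variation, which the text does not specify.

```python
from collections import Counter
from statistics import mean, pstdev


def weight_dispersion(weights):
    """CV (%), range and frequency of the mode (%) of the weights in one stratum."""
    # Assumption: CV based on the population standard deviation.
    cv = 100.0 * pstdev(weights) / mean(weights)
    rng = max(weights) - min(weights)
    # Number of units sharing the most frequent weight value.
    mode_count = Counter(weights).most_common(1)[0][1]
    mode_freq = 100.0 * mode_count / len(weights)
    return cv, rng, mode_freq


# Hypothetical stratum in which most legal units share the weight 10.
cv, rng, mode_freq = weight_dispersion([10, 10, 10, 10, 2, 30])
print(round(cv, 1), rng, round(mode_freq, 1))
```

Applying such a function to every stratum and taking quantiles of each indicator across strata would give a table with the same layout as Table 15-1.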
We see that in all but one of the strata, the majority of the units in a stratum have the same weights. We also see that the remaining units can have very different weights. As a large proportion of units have the same weights within each stratum, we can expect good results with the Kokic and Bell method applied as if the sample were a stratified sample of legal units. Below, we write “Kokic and Bell estimator” for the winsorized estimator with Kokic and Bell thresholds computed as if the sampling design were a stratified sampling of legal units. To confirm this intuition, we conducted a simulation study based on 1000 replications of the new sampling design. We compared the “Kokic and Bell estimator” to the classical Horvitz-Thompson estimator and to some robust estimators based on conditional bias. Indeed, conditional bias methods are robust estimation methods that are more general than the Kokic and Bell winsorization and can be adapted to any sample design without a simplifying hypothesis. The study is based on the SBS data for the reference year 2015. The results of the simulation study are presented after introducing methods based on the conditional bias.
3. Robust estimators based on conditional bias

The formal framework of robust estimation based on conditional bias is described in Favre Martinoz et al. (2015, 2016). We give an overview of the approach, to facilitate understanding of the simulation study results. The robust estimator introduced in these papers minimizes, in the class of estimators of the form $\hat{t}_y^R = \hat{t}_y + \delta$, the conditional bias of the most influential unit in the respondent population. Formally, the robust estimator for a variable $y$ is:

$$
\hat{t}_y^R = \hat{t}_y - \frac{1}{2}\left(B_{\min} + B_{\max}\right)
$$

with: $\hat{t}_y$ the Horvitz-Thompson estimator of the total of $y$; $B_{\min}$ and $B_{\max}$ the minimum and maximum conditional bias in the respondent population.
The conditional bias associated with the unit i is a measure of its influence. It is the deviation from the population total we would observe if we computed the mean of the Horvitz-Thompson estimators over all samples containing unit i:
$$
B_i = E_p\left(\hat{t}_y \mid I_i = 1\right) - t_y
$$
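Once the conditional biases of the responding units are available, the robust estimator of this section is a one-line correction of the Horvitz-Thompson estimate. The sketch below assumes the biases have already been computed; the function name and the numbers are purely illustrative.

```python
def robust_total(ht_total: float, cond_biases) -> float:
    """Horvitz-Thompson total corrected by half the sum of the extreme conditional biases."""
    b_min, b_max = min(cond_biases), max(cond_biases)
    return ht_total - 0.5 * (b_min + b_max)


# Hypothetical values: one unit has a much larger conditional bias than the others.
estimate = robust_total(5600.0, [5.0, 20.0, 1500.0])
print(estimate)
```

When all conditional biases are positive, as in this example, the correction lowers the estimate; a negative sum of the extreme biases would raise it instead.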
For the simulation study, we considered two phases in the sampling process: the first phase is the sampling of the legal units, the second phase is the “selection” of the responding legal units. The second phase is modelled as a Poisson sampling. It is a classical model for non-response in survey methodology studies. That way, the conditional bias takes into account the sampling design and non-response modelling. We test two versions of the first phase selection which lead to two different robust estimators based on conditional bias:
1 – Stratified Poisson sampling of enterprises;
2 – Stratified simple random sampling of enterprises.
Version 1 does not correspond to the SBS sampling design but is easier to implement in the operational phase.

We adopt the following notation:
- $\pi_{1i}$ and $\pi_{1ij}$: first-order and second-order inclusion probabilities of legal units in the first phase;
- $\pi_{2i}$ and $\pi_{2ij}$: first-order and second-order inclusion probabilities of legal units in the second phase;
- $\pi_{1E}$: first-order inclusion probabilities of enterprises in the first phase;
- $r_i$: legal unit $i$’s response probability;
- $m_h$: number of enterprises sampled in stratum $h$ in the first phase;
- $M_h$: number of enterprises in the sampling frame $U$ in stratum $h$;
- $y_E$: total turnover of legal units with the same activity as $i$ which are part of enterprise $E$;
- $t_{yh}$: total turnover of legal units with the same activity as $i$ which are part of enterprises in the sampling frame in stratum $h$.

The conditional bias of unit $i$, for an arbitrary design in the first phase and Poisson sampling in the second phase, is (Favre Martinoz et al., 2016, p.1022):
$$
B_i = \sum_{j \in U} \left( \frac{\pi_{1ij}}{\pi_{1i}\,\pi_{1j}} - 1 \right) y_j + \frac{1}{\pi_{1i}} \left( \frac{1}{\pi_{2i}} - 1 \right) y_i
$$
In version 1, we distinguish three situations ($E$ is the enterprise containing the legal unit $i$):
a) $j = i$, so $\pi_{1ij} = \pi_{1i} = \pi_{1E} = \dfrac{m_h}{M_h}$;
b) $j \neq i$ and $j \in E$, so $\pi_{1ij} = \pi_{1i} = \pi_{1E} = \dfrac{m_h}{M_h}$;
c) $j \neq i$ and $j \notin E$, so $\pi_{1ij} = \pi_{1i}\,\pi_{1j}$.
So, we have:

$$
B_i^1 = \left( \frac{M_h}{m_h r_i} - 1 \right) y_i + \left( \frac{M_h}{m_h} - 1 \right) \left( y_E - y_i \right)
$$

In version 2, we distinguish four situations:
a) $j = i$, so $\pi_{1ij} = \pi_{1i} = \pi_{1E} = \dfrac{m_h}{M_h}$;
b) $j \neq i$ and $j \in E$, so $\pi_{1ij} = \pi_{1i} = \pi_{1E} = \dfrac{m_h}{M_h}$;
c) $j \neq i$, $j \notin E_i$ and $E_j \in h$, so $\pi_{1ij} = \dfrac{m_h (m_h - 1)}{M_h (M_h - 1)}$;
d) $j \neq i$, $j \notin E_i$ and $E_j \notin h$, so $\pi_{1ij} = \pi_{1i}\,\pi_{1j}$.
So, we have:

$$
B_i^2 = \left( \frac{M_h}{m_h r_i} - 1 \right) y_i + \left( \frac{M_h}{m_h} - 1 \right) \left( y_E - y_i \right) + \left( \frac{M_h}{m_h}\,\frac{m_h - 1}{M_h - 1} - 1 \right) \left( t_{yh} - y_E \right)
$$

Unlike version 1, in version 2 the conditional bias depends on the level of $y_E$ in the stratum. With stratified sampling, selecting $E$ reduces the odds of selecting the other enterprises in the stratum. That is not the case with Poisson sampling.

Remark: We can see that

$$
B_i^2 = B_i^1 + \left( \frac{M_h}{m_h}\,\frac{m_h - 1}{M_h - 1} - 1 \right) \left( t_{yh} - y_E \right).
$$
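The two closed-form conditional biases can be transcribed directly into code, which makes the difference between the two versions easy to explore numerically. The following is a minimal sketch with hypothetical function names and input values; it takes the stratum quantities as given rather than deriving them from a sampling frame.

```python
def cond_bias_v1(y_i, y_E, r_i, m_h, M_h):
    """Version 1: stratified Poisson sampling of enterprises."""
    inv_pi = M_h / m_h  # inverse of the enterprise inclusion probability
    return (inv_pi / r_i - 1.0) * y_i + (inv_pi - 1.0) * (y_E - y_i)


def cond_bias_v2(y_i, y_E, t_yh, r_i, m_h, M_h):
    """Version 2: stratified simple random sampling of enterprises."""
    # The extra factor is negative or zero whenever m_h <= M_h.
    factor = (M_h / m_h) * (m_h - 1) / (M_h - 1) - 1.0
    return cond_bias_v1(y_i, y_E, r_i, m_h, M_h) + factor * (t_yh - y_E)


# Hypothetical legal unit: turnover 100 in an enterprise with turnover 150,
# stratum total 1000, response probability 0.8, 5 enterprises sampled out of 50.
b1 = cond_bias_v1(100.0, 150.0, 0.8, 5, 50)
b2 = cond_bias_v2(100.0, 150.0, 1000.0, 0.8, 5, 50)
print(b1, b2)
```

With non-negative turnover values the version 2 bias is never larger than the version 1 bias, and it can become negative when the rest of the stratum is large compared with the enterprise of the unit considered.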
4. Simulations

To evaluate the quality of the estimators, we selected 1000 samples with the new sampling design and estimated the total turnover for each of the 207 activities (3-digit NACE codes) with the estimators previously presented. As turnover was available in the sampling frame, the true total was also known. In the real SBS results, however, the data collected by the survey are used to compute a more accurate value for each legal unit’s activity. In the simulations presented here, we used the activity available for all units in the sampling frame. Next, we calculated the mean square error (MSE) of an estimator X for an activity, where $\hat{t}_{yX}^{(k)}$ denotes the estimate obtained from the $k$-th simulated sample:

$$
\mathrm{MSE} = \frac{1}{1000} \sum_{k=1}^{1000} \left( \hat{t}_{yX}^{(k)} - t_y \right)^2
$$
To make the value of the MSE more interpretable, we divided it by the MSE of the Horvitz-Thompson estimator using the same units to obtain a relative mean square error. If this is larger than 100%, the robust estimates are less accurate than the usual expansion estimator. Otherwise, they are able to increase estimates’ accuracy:
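$$
\mathrm{MSER} = \frac{\dfrac{1}{1000} \sum_{k=1}^{1000} \left( \hat{t}_{yX}^{(k)} - t_y \right)^2}{\dfrac{1}{1000} \sum_{k=1}^{1000} \left( \hat{t}_{yHT}^{(k)} - t_y \right)^2}
$$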
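The two accuracy measures translate into a few lines of Python. The sketch below uses hypothetical names and four made-up replications per estimator instead of 1000; it only illustrates the arithmetic.

```python
def mse(estimates, true_total):
    """Mean square error of a list of simulated estimates around the true total."""
    return sum((t - true_total) ** 2 for t in estimates) / len(estimates)


def relative_mse(estimates, ht_estimates, true_total):
    """MSER in percent: MSE of an estimator relative to the Horvitz-Thompson MSE."""
    return 100.0 * mse(estimates, true_total) / mse(ht_estimates, true_total)


true_total = 100.0
ht = [60.0, 90.0, 110.0, 140.0]   # hypothetical: unbiased on average but volatile
rob = [85.0, 92.0, 96.0, 99.0]    # hypothetical: biased downwards but stable
mser = relative_mse(rob, ht, true_total)
print(mse(ht, true_total), mse(rob, true_total), mser)
```

In this toy case the treated estimator has an MSER far below 100%, which is the situation in which the treatment of influential values pays off.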
The sampling design of the French SBS survey consists of take-all and take-some strata. Influential value treatment only concerns the take-some strata (see section 2). Below, in order to simplify the study and to highlight the differences between the methods tested, estimates are limited to the take-some strata frame only. As take-some strata are defined by crossing the fine activity (5-digit NACE codes) and the number of employees, the domains called activity (3-digit NACE codes) each cover several strata. The results are presented in Table 15-2. In the table, each observation is one activity, and there are 207 activities in our data. As can be seen in the table, in half of the activities MSER is less than 67% with the Kokic and Bell estimator, less than 83% with the robust estimator V1, and less than 78% with the robust estimator V2.
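Table 15-2: Distribution of MSER (%) by activity. For details see text.

| Quantile | Kokic and Bell | Robust V1 | Robust V2 |
|---|---|---|---|
| 100 | 100 | 131 | 141 |
| 99 | 100 | 108 | 100 |
| 95 | 88 | 101 | 95 |
| 90 | 84 | 98 | 92 |
| 75 | 77 | 93 | 87 |
| 50 | 67 | 83 | 78 |
| 25 | 43 | 61 | 59 |
| 10 | 16 | 39 | 39 |
| 5 | 10 | 31 | 29 |
| 1 | 1 | 24 | 22 |
| 0 | 1 | 22 | 19 |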
The Kokic and Bell estimator performed the best even though the sampling design is not a stratified random sampling. Robust estimators based on conditional bias had good results too: version 2 was better than Horvitz-Thompson in more than 95% of the activities. The advantage of the Kokic and Bell estimator is likely to be linked to the fact that the aim of robust estimation based on conditional bias is to minimize the influence of the most influential unit, whereas winsorization aims to minimize the MSE, which is also the indicator of quality used in the table. Doing so, the robust estimator based on conditional bias does not attain the minimum MSE, but it could have obtained a lower MSE than the Kokic and Bell estimator because the latter assumes a stratified simple random sampling, an assumption which is not fulfilled here. Winsorized estimators and robust estimators based on conditional bias are potentially biased. Therefore, we present in Table 15-3 the relative bias of each estimator by activity for the turnover variable:

$$
\mathrm{BR} = \frac{\displaystyle \sum_{k=1}^{1000} \frac{\hat{t}_{yX}^{(k)}}{1000} - t_y}{t_y}
$$
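The relative bias can be computed from the same simulated estimates as the MSE, and the standard decomposition of the MSE into squared bias plus variance shows how the two indicators are linked. The sketch below uses hypothetical names and made-up values.

```python
def relative_bias(estimates, true_total):
    """Relative bias in percent of a list of simulated estimates."""
    mean_est = sum(estimates) / len(estimates)
    return 100.0 * (mean_est - true_total) / true_total


def bias_and_variance(estimates, true_total):
    """Bias and variance across replications (MSE = bias**2 + variance)."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - true_total
    variance = sum((t - mean_est) ** 2 for t in estimates) / n
    return bias, variance


# Hypothetical replications of a downwards-biased estimator; true total 100.
rob = [85.0, 92.0, 96.0, 99.0]
rb = relative_bias(rob, 100.0)
bias, variance = bias_and_variance(rob, 100.0)
print(rb, bias ** 2 + variance)
```

A negative relative bias is therefore compatible with a low MSE as long as the reduction in variance outweighs the squared bias.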
In the table, each observation is one activity. As can be seen in the table, in half of the activities the relative bias is at or below -5% for the Kokic and Bell estimator, at or below -6% for the robust estimator V1, and at or below -4% for the robust estimator V2. The relative bias of 201% is observed in an activity with a very influential unit: its turnover is more than half of the total turnover of the activity. The atypical value of the relative bias is therefore due to this very atypical unit. Also, the number of simulations may have to be increased for this activity to correctly evaluate the relative bias. We have not investigated this because other indicators lead us to prefer the Kokic and Bell estimator.

Table 15-3: Distribution of relative bias (%) by activity for the estimation of the total of turnover. For details see text.

| Quantile | Kokic and Bell | Robust V1 | Robust V2 |
|---|---|---|---|
| 100 | 0 | 0 | 201 |
| 99 | 0 | -1 | 0 |
| 95 | -1 | -1 | -1 |
| 90 | -1 | -2 | -1 |
| 75 | -3 | -3 | -3 |
| 50 | -5 | -6 | -4 |
| 25 | -7 | -12 | -8 |
| 10 | -13 | -23 | -14 |
| 5 | -24 | -30 | -21 |
| 1 | -52 | -61 | -33 |
| 0 | -69 | -77 | -48 |

First, apart from the atypical activity discussed above, the results show that all the estimators are systematically downwards-biased. Indeed, as we have seen in the previous sections, the treatment of influential values leads, in the majority of practical cases, to reducing the largest values of turnover in the stratum concerned, so it leads logically to under-estimated totals. The three estimators perform similarly in terms of the relative bias.

Second, the relative bias seems to be important in some NACE activities, but this result must be put into perspective for at least two reasons:
- The relative biases presented here were calculated on the take-some strata only. They are much lower when the full sample is used.
- In the complete process of estimation used to build the French SBS results, a calibration on the turnover variable is done after winsorization, thus correcting the bias of the estimation of total turnover.

To evaluate if the Kokic and Bell estimator is systematically better than the robust estimators based on conditional bias, Table 15-4 shows the distribution of the ratio between the robust estimators’ MSE and the Kokic and Bell estimator’s MSE. In the table, each observation is one activity. As can be seen in the table, the ratio between the robust estimator V1 MSE
and the Kokic and Bell estimator MSE is 1.1 or more in 90% of the activities. The results show that the Kokic and Bell estimator performed better than the robust estimators (V1 or V2) in more than 90% of the activities. The reasons for this are probably twofold:
- as we have already seen, the majority of the legal units in a stratum have the same weights;
- the aim of robust estimation based on conditional bias is to minimize the influence of the most influential unit whereas winsorization aims to minimize the MSE, which is also the indicator of quality that we use in this study.

Table 15-4: Distribution of the ratio between the robust estimators MSE and the Kokic and Bell estimator MSE by activity. For details see text.

| Quantile | Robust V1 / Kokic and Bell | Robust V2 / Kokic and Bell |
|---|---|---|
| 100 | 27.6 | 27.5 |
| 99 | 23.0 | 22.8 |
| 95 | 3.6 | 3.5 |
| 90 | 2.3 | 2.2 |
| 75 | 1.5 | 1.4 |
| 50 | 1.3 | 1.2 |
| 25 | 1.2 | 1.1 |
| 10 | 1.1 | 1.0 |
| 5 | 1.0 | 1.0 |
| 1 | 0.5 | 0.6 |
| 0 | 0.3 | 0.4 |
In light of these results, we conclude that winsorization with the Kokic and Bell thresholds, calculated as if the sampling were a stratified sampling of legal units, is the best option among those investigated to deal with influential values. To evaluate the impact of this winsorization on variables other than turnover, we computed the MSER for the estimators of totals of other variables with the weights after winsorization on turnover as described in section 2. The other variables were:
- value added;
- investments;
- number of legal units.
Table 15-5: Distribution of MSER (%) of estimators of total with winsorized weights by activity. For details see text.

| Quantile | Turnover | Value added | Investments | Number of legal units |
|---|---|---|---|---|
| 100 | 100 | 100 | 100 | 124 |
| 99 | 100 | 100 | 100 | 120 |
| 95 | 88 | 99 | 100 | 108 |
| 90 | 84 | 97 | 100 | 105 |
| 75 | 77 | 90 | 99 | 102 |
| 50 | 67 | 81 | 92 | 100 |
| 25 | 43 | 64 | 68 | 99 |
| 10 | 16 | 31 | 28 | 96 |
| 5 | 10 | 20 | 12 | 93 |
| 1 | 1 | 3 | 4 | 90 |
| 0 | 1 | 0 | 0 | 82 |
The results are presented in Table 15-5. In the table, each observation is one activity. As can be seen in the table, in half of the activities MSER is less than 67% for turnover, less than 81% for value added, less than 92% for investments, and less than 100% for number of legal units. This shows that even on investments, which is a variable with low correlation with turnover, winsorization improves the estimators. For the number of legal units, winsorization had a mixed effect: MSER was below 100% in about half of the activities and above 100% in the other half, with a maximum deterioration of about 25% (MSER of 124%).
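The way winsorization of turnover is transferred to other variables through the weights can be sketched as follows. This is an illustrative example, not the Esane implementation: the function name and all values are hypothetical, and leaving the weight unchanged for units that are not winsorized is an assumption made here so that units with zero turnover cause no division problem.

```python
def winsorized_weight(w, y, y_w):
    """Winsorized weight w * y_w / y; the weight is kept when the unit is not winsorized."""
    if y_w == y:  # unit below the threshold (includes y == 0): weight unchanged
        return w
    return w * y_w / y


weights = [10.0, 10.0, 10.0, 10.0]
turnover = [10.0, 20.0, 30.0, 500.0]
turnover_w = [10.0, 20.0, 30.0, 140.0]   # hypothetical winsorized turnover
value_added = [4.0, 8.0, 10.0, 100.0]    # hypothetical second variable

w_w = [winsorized_weight(w, y, yw) for w, y, yw in zip(weights, turnover, turnover_w)]
va_total = sum(w * z for w, z in zip(w_w, value_added))
n_units = sum(w_w)  # estimated number of legal units with winsorized weights
print(w_w, va_total, n_units)
```

In this toy stratum the estimated number of legal units drops from 40 to 32.8, which gives some intuition for why a count variable does not necessarily benefit from weights winsorized on turnover.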
5. Conclusion

Winsorization using Kokic and Bell thresholds, applied as if the sampling were a stratified sampling of legal units, was the best option, among those investigated, to deal with influential values for the French SBS survey. It performed the best on estimation of turnover and other variables including value added and investments. This result is likely related to the outcome statistic chosen for the simulation study, which was the mean square error, because Kokic and Bell thresholds are designed precisely to minimize the mean square error of the estimator of the total of the winsorized variable. This choice of statistic is coherent with the aim of the French SBS survey, which is to estimate totals of variables that are mostly strongly correlated with turnover. If the goal
of the survey, and so the outcome statistic of the simulation study, had been different, the conclusion might have been different. In this study, the totals were estimated using the expansion estimator. This kind of estimator is well-known and easy to study but it is not the kind of estimator used to produce the French SBS results. Instead, French SBS final estimators are calibrated on turnover and number of legal units by activity, and mixed with exhaustive administrative data (Brion and Gros, 2015). Further developments of this work will take into account the full estimation process, and study ways to adapt the robust estimation methods presented in this paper to ratio, regression or calibration estimators.
References

Brion, P. and Gros, E. (2015). Statistical estimators using jointly administrative and survey data to produce French structural business statistics. Journal of Official Statistics 31, 589-609.

Deroyon, T. (2015). Traitement des valeurs atypiques d’une enquête par winsorization – application aux enquêtes sectorielles annuelles. Acte des Journées de Méthodologie Statistique de 2015. Available at http://jms.insee.fr/files/documents/2015/S10_5_ACTE_V2_DEROYON_JMS2015.PDF.

EEC (1993). Council Regulation (EEC) 696/93 of 15 March 1993 on the statistical units for the observation and analysis of the production system in the Community. Available at http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31993R0696:EN:HTML (accessed 2018-03-03).

Favre Martinoz, C., Haziza, D. and Beaumont, J-F. (2015). A method of determining the winsorization threshold, with an application to domain estimation. Survey Methodology 41, 57-77.

Favre Martinoz, C., Haziza, D. and Beaumont, J-F. (2016). Robust inference in two-phase sampling designs with application to unit nonresponse. Scandinavian Journal of Statistics 43, 1019-1034.

Gros, E. and Le Gleut, R. (2018). The impact of profiling on sampling. In Lorenc, B., Smith, P.A., Bavdaž, M., Haraldsen, G., Nedyalkova, D., Zhang, L.-C. and Zimmermann, T. (2018) (eds.). The unit problem and other current topics in business survey methodology, pp 91-105. Newcastle upon Tyne: Cambridge Scholars.

Kokic, P.N. and Bell, P.A. (1994). Optimal winsorizing cut-offs for a stratified finite population estimator. Journal of Official Statistics 10, 419-435.
Appendix: Practical problems with robust estimators

The form of the robust estimators based on the conditional bias, $\hat{t}_y^R = \hat{t}_y - \frac{1}{2}\left(B_{\min} + B_{\max}\right)$, which is the Horvitz-Thompson estimator with an additional term, is not easy to use in production, because current computer programs work with estimators with a “linear form”, that is $\hat{t}_y^R = \sum_{i \in s} w_i\, y_i^R$.
Favre Martinoz et al. (2016, p.1025) mention a method to transform the robust estimator into a linear form. It is based on a constant c, as:
$$
y_i^R = y_i - \frac{B_i - \psi_c(B_i)}{w_i}, \quad \text{with } \psi_c(B_i) = \operatorname{sign}(B_i) \times \min\left(|B_i|, c\right).
$$
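The transformation into a linear form can be sketched as below. The function names and all numerical values, including the constant c, are hypothetical; how c is determined is described in the cited paper and is not reproduced here.

```python
import math


def psi(b, c):
    """Truncation function: sign(b) * min(|b|, c)."""
    return math.copysign(min(abs(b), c), b)


def robust_values(y, w, B, c):
    """Modified values y_i^R = y_i - (B_i - psi_c(B_i)) / w_i."""
    return [yi - (bi - psi(bi, c)) / wi for yi, wi, bi in zip(y, w, B)]


# Hypothetical sample of two units; only the second has |B_i| > c.
y = [10.0, 500.0]
w = [10.0, 10.0]
B = [90.0, 4500.0]
y_R = robust_values(y, w, B, c=1000.0)
print(y_R)
```

The first unit keeps its value because its conditional bias is below c in absolute value, while the value of the second unit is pulled down.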
We can see from this $\psi_c$ form that only units with the largest conditional biases (greater in absolute value than $c$) will obtain a $y_i^R$ different from $y_i$.

The French SBS system Esane currently works with winsorized weights. As a consequence, the winsorization of turnover has an impact on the estimators of other variables. There are two reasons for using this method:
- winsorization of turnover will benefit estimators of the variables correlated with turnover;
- accounting links are preserved, which is not the case if each variable is winsorized separately.

To go from $y_i^R$ to $w_i^R$, one can use the relation:

$$
w_i^R = w_i \, \frac{y_i^R}{y_i}
$$

But this is not possible when $y_i = 0$. This problem appears, in a third of the activities, at least once among our 1000 replications.

An alternative method is described in the paper of Favre Martinoz et al. (2015, p.75-76). The aim is to determine the value $K$ such that:

$$
y_i^R =
\begin{cases}
y_i & \text{if } w_i y_i \leq K \\[4pt]
\dfrac{K}{w_i} & \text{if } w_i y_i > K
\end{cases}
\qquad \text{and} \qquad \hat{t}_y^R = \sum_{i \in s} w_i\, y_i^R
$$

The advantage of this method is that the targets are the highest values of $w_i y_i$, so the units with $y_i = 0$ will not be concerned. But there is a drawback because of the cases where $\hat{t}_y^R > \hat{t}_y$. With this method, we have, by construction, $y_i^R \leq y_i$ and so $\hat{t}_y^R \leq \hat{t}_y$. If $-\frac{1}{2}\left(B_{\min} + B_{\max}\right) > 0$, $K$ is not calculable and the method does not work. It does not happen with version 1 (conditional biases with Poisson sampling are positive), but it happens, in half of the activities, at least once among our 1000 replications with version 2. We are still working on ways to overcome this limitation of the conditional bias methods and to develop a method to translate robust estimate adjustments into new estimation weights for each sampled unit.
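To make the limitation concrete, the search for K can be written as a small routine that returns nothing when the target exceeds the Horvitz-Thompson estimate. This is one possible sketch, assuming non-negative values and a non-negative target; the function name and data are hypothetical and the routine is not taken from the cited paper.

```python
def find_K(w, y, target):
    """Find K such that sum(min(w_i * y_i, K)) equals the target robust total.

    Assumes non-negative y and target. Returns None when no such K exists,
    in particular when the target is larger than the Horvitz-Thompson total.
    """
    z = sorted((wi * yi for wi, yi in zip(w, y)), reverse=True)
    if not z:
        return None
    total = sum(z)
    if target > total:
        return None      # capping can only lower the estimate
    if target == total:
        return z[0]      # no unit needs to be capped
    tail = total
    for k in range(1, len(z) + 1):
        tail -= z[k - 1]                 # sum of the contributions left uncapped
        K = (target - tail) / k          # cap applied to the k largest contributions
        lower = z[k] if k < len(z) else 0.0
        if lower <= K <= z[k - 1]:
            return K
    return None


w = [10.0, 10.0, 10.0, 10.0]
y = [10.0, 20.0, 30.0, 500.0]
K = find_K(w, y, target=4847.5)       # hypothetical robust total below the HT total
no_K = find_K(w, y, target=6000.0)    # target above the HT total of 5600
print(K, no_K)
```

The second call illustrates the drawback described above: when the robust estimate is larger than the Horvitz-Thompson estimate, no threshold can reproduce it.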
CHAPTER 16 BIG DATA PRICE INDEX LI-CHUN ZHANG1
Abstract

Use of so-called big data for Official Statistics requires new theory and methodology. Based on a classification of the relevant forms of big data price collection, we outline and discuss some key issues that need to be resolved in the theory of index numbers, due to the use of big data instead of sample data collected by the traditional survey approach.
1. Introduction

There is currently a surge of effort to harness data from administrative sources and other forms of “big data”. The point made by Di Zio et al. (2017) pertains to all such non-designed data:

“On the one hand, there is an established tradition to use administrative data for the creation of sampling frames and for providing auxiliary information in data collection, editing, imputation and estimation. On the other hand, more and more often, the data from administrative sources are being transformed directly into statistical data, either in place of purposefully collected census or survey data, or to be combined with the latter on an equal footing in estimation.”
Secondary statistical uses of administrative and big data raise many new methodological questions for data collection, processing and analysis. See for instance Hand (2018) and Zhang (2018) for a recent discussion of some of the statistical challenges involved in using administrative and transaction data. The focus of this chapter is price indices based on big data sources. To this end one may envisage the Consumer Price Index (CPI) as a primary example; see IWGPS (2004) for a comprehensive coverage of the CPI methodology. Based on a classification of the relevant forms of big data price collection, we outline and discuss some key issues that need to be resolved in the theory of index numbers, due to the use of big data instead of sample data collected by the traditional survey approach. See also Deshaies-Moreault (2018, Chapter 17 in this volume) and Noc (2018, Chapter 18 in this volume), which cover some other relevant aspects of big data price index production.

1 S3RI/University of Southampton, UK and Statistisk sentralbyrå, Norway. Email: [email protected]
2. Three forms of big data price collection

Whatever the commonality between them, there are some important differences between big data and administrative data collection.

Firstly, big data are sometimes said to be “found” rather than collected. It may be the case that the data are not ‘reported’ or ‘surveyed’ from the owner directly, such as when scraping the web for prices. Moreover, unlike the case of administrative data, one may need to obtain big data from a third party, who provides the services that have generated the data but is not necessarily the original or exclusive owner of the data. Examples are obtaining flight ticket price data from the Amadeus ticketing platform instead of the various airline companies, or bank transaction data from the Nets payment platform instead of the banks themselves. These new forms of data collection can create challenges related to the process responsibility, potential conflicts of interest, and extra complications at the testing and development stage of a new statistical output, among others. The data do not always represent the target concept: for example, web scraped data present a price for a product, but give no guarantee that a transaction at that price actually takes place, whereas conceptually a price index should be based on transaction prices.

Secondly, extracting and transforming big data may require different technical and operational know-how compared to the traditional data sources including administrative data. For instance, the big data may be unstructured or organic, such as images or webpages, so that one cannot simply use standard software as in the case of reading data stored in a list or an array. Instead some algorithmic program needs to be developed and certain ‘parameters’ tuned and decided upon, the choice of which may affect which data one obtains or ‘observes’.
As another relevant example, data are sometimes ‘dumped’ on the statistical office in their totality, because the data provider prefers not to spend any extra resources on data preparation. It then takes effort and time to find one’s way out of such ‘messy’ data, which may contain lots of irrelevant or useless bits.

Thirdly, the big data may be lacking appropriate unit identification or relevant characteristics or metadata that would enable matching or combination with other datasets, in order to correctly define the statistical units and measures, or to enhance the quality and scope of the separate datasets. For instance, in the context of price data, the same product may have different coding in the source over time, so that it appears to be several different products. Moreover, the characteristics or metadata associated with the product may be too sparse or too coarse to allow one to match them, which otherwise might have been possible despite the coding in the source having changed over time.

To help the discussion of theoretical index formula issues later, we propose in Table 16-1 a classification of three forms of big data collection. It may be noticed that each form of collection has different strengths and weaknesses, as well as potential gains and risks with respect to the general differences to administrative data collection discussed above.

Table 16-1: Three forms of big data price collection, with examples.

| | Disaggregated | Aggregated |
|---|---|---|
| Unsupervised | Web Scraping | N/A |
| Supervised | Scanner Data | Enterprise Reporting |
A typical example of an unsupervised and disaggregated form is web scraping of price data. The technique has been explored for a range of topics, such as flight tickets, hotel bookings, online retail of food, clothes, electronics, and so on. It is unsupervised insofar as the owner of the source does not engage in the specification and provision of the data that are being collected. In fact, sometimes the source owner may not even be aware that the data are being scraped from their web pages. It is disaggregated because the prices are observed for specific items. This includes the case when multiple prices are collected and averaged over a given period (say, a month) for the same item. Next, scanner data provide an example of a supervised and disaggregated form of price data collection. It is supervised because, unlike web scraping, there exists a clear specification of (or agreement about) the price data and metadata to be delivered. For instance, this may be the unit-value price of an item over a calendar week, where an item is identified by
the GTIN (Global Trade Item Number) and the outlet. It is still disaggregated because, just as in the case of web scraping, the prices are observed for specific items of transaction. Finally, supervised and aggregated price data collection is still at the beginning of its development. In this form, the data provider may be asked, for instance, to deliver either a unit-value or a weighted average price of a set of products (or services), which are deemed to be subject to price-induced substitution effects. An example of such a set of products can be rice of various producers and types from a retailer. In practice, this type of collection may become available because the retailer has a suitable stock-keeping code system, as is the case for the food market in Australia and Canada. In theory, a group of items may belong to a so-called homogeneous product in the terminology of National Accounts. It is an aggregated form of data collection because, unlike with the scanner data, the prices are no longer observed for specific items of transaction, but for specified groups of items whose definition can have both theoretical and practical underpinnings. If such aggregated price data collection is feasible, it would not only simplify the post-collection processing, but also provide an opportunity for improvement because it uses information that otherwise might not reach the statistical office. On the other hand, there is also a risk associated with such pre-processing prior to data delivery, the control of which by the statistics producer is not as tight as in-house processing of data collected using the supervised and disaggregated form.
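The unit-value price mentioned above is simply total expenditure divided by total quantity over the period. A minimal sketch follows, with a hypothetical function name and made-up transactions for a single item and outlet.

```python
def unit_value(transactions):
    """Unit-value price: total expenditure divided by total quantity.

    `transactions` is a list of (price, quantity) pairs for one item in one period.
    """
    expenditure = sum(p * q for p, q in transactions)
    quantity = sum(q for _, q in transactions)
    return expenditure / quantity


# Hypothetical week: 10 units sold at 2.00, then 30 units sold at 1.50 on promotion.
week = [(2.0, 10), (1.5, 30)]
uv = unit_value(week)
simple_mean = sum(p for p, _ in week) / len(week)
print(uv, simple_mean)
```

The unit value is lower than the simple average of the two prices because most units were sold at the promotional price, which is the kind of quantity information that scanner data provide and web scraping does not.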
3. Some issues for index formula theory

For the present discussion let us consider a common index formula:

$$
P(0,t) = \sum_{g} w_g \, P_g(0,t),
$$
that is, from period 0 to $t$, where $g = 1, \ldots, G$ are the (budget) elementary groups of items, $w_g$ is the group weight or expenditure share, and $P_g(0,t)$ the corresponding elementary price index, which is calculated based on the prices at 0 and $t$ from, say, $n_g$ items in the elementary group. In particular, the expenditures of the $n_g$ items are not required or used, and the items are matched (i.e. the same) for the two periods. It is therefore commonly referred to as the matched-item approach. This is one of the three scenarios of price index production depicted in Table 16-2, which characterises the traditional approach in the past. One may refer to it as the setting of small data and small formulae. It is small data, because the sample of prices constitutes only a minute proportion of
the entire consumption universe. It is a small formula because, being based on unweighted price comparisons of the matched items, it in principle cannot handle the fact that both the target universe and the expenditure shares of the consumption items are changing continuously, that is, it does not fully capture the reality.

Table 16-2: Three scenarios of price index production.

             Small Formula   Big Formula
Small Data   Past            N/A
Big Data     Present         Future
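As a rough illustration of the matched-item approach behind the index formula above, the following Python sketch computes the elementary indices from matched items only and then combines them with the group weights. It is a minimal sketch under stated assumptions: the function names, item identifiers, prices and weights are hypothetical, and the unweighted geometric mean of price relatives (the Jevons formula) is assumed as the elementary formula, which the text above does not prescribe.

```python
from math import exp, log


def elementary_index(prices_0, prices_t):
    """Unweighted index over the items priced in both periods.

    Assumption: the geometric mean of price relatives (Jevons) is used
    as the elementary formula; the chapter does not prescribe one.
    """
    matched = sorted(set(prices_0) & set(prices_t))  # matched items only
    if not matched:
        raise ValueError("no matched items in this elementary group")
    mean_log_relative = sum(
        log(prices_t[i] / prices_0[i]) for i in matched
    ) / len(matched)
    return exp(mean_log_relative), matched


def matched_item_index(groups, weights):
    """P(0,t) as the weighted sum of the elementary indices P_g(0,t)."""
    total_weight = sum(weights.values())  # normalise the weights to sum to 1
    index = 0.0
    for g, (prices_0, prices_t) in groups.items():
        p_g, _ = elementary_index(prices_0, prices_t)
        index += weights[g] / total_weight * p_g
    return index


# Hypothetical data: item id -> price, per elementary group and period.
groups = {
    "rice": ({"A": 2.00, "B": 3.00, "C": 2.50},   # period 0
             {"A": 2.20, "B": 3.30, "D": 2.80}),  # period t: C left, D is new
    "bread": ({"E": 1.50, "F": 2.00},
              {"E": 1.50, "F": 2.00}),
}
weights = {"rice": 0.4, "bread": 0.6}

print(round(matched_item_index(groups, weights), 4))
```

In this toy example the disappearing item C and the new item D never enter the calculation, and no item-level expenditure is used, which is exactly the limitation of the small formula in a dynamic universe.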
The uptake of scanner data and web scraping price data has since made it clear that one can no longer close one’s eyes and adopt a static approach to the dynamic universe. One has thus entered the present scenario (Table 16-2) of big data and small formulae. The latter is the case in the sense that the search for fully dynamic index formulae is an on-going effort. There are renewed discussions and debates about the basic properties of an index; see for instance Ivancic et al. (2011), Diewert and Fox (2017), Chessa et al. (2017), Zhang et al. (2017). There are some tentative but not ideal methods to deal with the lack of quantity and characteristics in the unsupervised disaggregated form of data collection; see for instance De Haan and Krsinich (2014), Krsinich (2016). There are some sensible but not yet definitive suggestions to deal with the fact that the observed consumption items do not match one-to-one over time; see for instance Dalén (2001, 2017), De Haan (2001), Chessa (2016). Examples of some difficulties can be given to illustrate the point.

1. Due to the sheer amount of data and the lack of sufficiently detailed metadata, it is unrealistic to identify all the replacement items in the available data, as one has done for the small price sample in the past, where a new item is deemed to replace an old item in terms of function or utility but has a different GTIN code to start with.
2. At least partly for the same reason, the hedonic approach would be infeasible for quality adjustment in most big data situations, even though it has sometimes proved to be useful for this purpose in the past.
3. Despite the availability of item quantity in the scanner data, it would seem necessary to introduce an intermediate level of aggregation between the items and the elementary group they belong to, in order to better capture the substitution effects and the target concept of homogeneous products. But it is unclear at this stage how to accomplish this either in theory or in practice.
4. On the one hand, a price index is likely to become more volatile than what one is accustomed to, as a result of incorporating all the available quantity data. On the other hand, it is practically undesirable to the users if an index fluctuates more than usual around its trend, even when much of the fluctuation could be defended theoretically. Currently, empirical criteria are lacking for judging and achieving a reasonable balance in this respect.

To resolve these issues, a big formula price index needs to satisfy two overriding requirements.

(a) At least in principle it should be able to accommodate all available items in a dynamic item universe, even though it is still possible for some data to be excluded from the final index compilation due to outlier controls or other justified considerations.
(b) It must make it possible for one to keep the cost of quality adjustment due to unmatched items at a sustainable level.

As a by-product of the big formula, it is likely that certain pre-aggregation of the price and quantity data can be specified and carried out, before the resulting data are delivered to the statistical office. One would thus move to the future scenario of big data and big formulae in Table 16-2, where supervised and aggregated enterprise reporting can replace some of the present scanner data collected in the disaggregated form. A motivation for the data provider may be that the pre-aggregation provides a form of disclosure control, when it is easy to follow the specification of pre-aggregation and thereby eliminate the need to release the relevant sensitive business data.
4. Closing remarks

The traditional survey sampling approach for official statistics is unsustainable in many situations, including price statistics production, due to the combined pressure of increasing survey non-compliance and ever greater demand for richer and faster statistics. Administrative and transaction data, or other forms of big data, provide a tangible option but also raise many statistical challenges that need to be resolved, in order for official statistics to maintain its scientific foundation and its public perception as an objective and trustworthy information provider.
References

Chessa, A.G. (2016). A new methodology for processing scanner data in the Dutch CPI. Eurostat review of National Accounts and Macroeconomic Indicators 1, 49-69.
Chessa, A.G., Verburg, J. and Willenborg, L. (2017). A comparison of price index methods for scanner data. Paper presented at the fifteenth Ottawa Group meeting, Eltville, Germany. Available at https://www.bundesbank.de/Redaktion/EN/Downloads/Bundesbank/Research_Centre/Conferences/2017/2017_05_10_ottawa_group_07_1_paper.html?__blob=publicationFile.
Dalén, J. (2001). Statistical targets for price indexes in dynamic universes. Paper presented at the sixth Ottawa Group meeting, Canberra, Australia. Available at http://www.ottawagroup.org/Ottawa/ottawagroup.nsf/home/Meeting+6/$file/2001 6th Meeting - Dalén Jörgen - Statistical targets for price indexes in dynamic universes.pdf.
Dalén, J. (2017). Unit values in scanner data and some operational issues. Paper presented at the fifteenth Ottawa Group meeting, Eltville, Germany. Available at https://www.bundesbank.de/Redaktion/EN/Downloads/Bundesbank/Research_Centre/Conferences/2017/2017_05_10_ottawa_group_08_1_paper.html?__blob=publicationFile.
De Haan, J. (2001). Generalized Fisher price indexes and the use of scanner data in the CPI. Paper presented at the sixth Ottawa Group meeting, Canberra, Australia. Available at http://www.ottawagroup.org/Ottawa/ottawagroup.nsf/home/Meeting+6/$file/2001 6th Meeting - de Haan Jan - Generalised Fisher Price Indexes and the Use of Scanner Data in the CPI.pdf.
De Haan, J. and Krsinich, F. (2014). Scanner data and the treatment of quality change in nonrevisable price indexes. Journal of Business and Economic Statistics 32, 341-358.
Deshaies-Moreault, C., Harper, B. and Yung, W. (2018). Analysis of scanner data for the consumer price index at Statistics Canada. In Lorenc, B., Smith, P.A., Bavdaž, M., Haraldsen, G., Nedyalkova, D., Zhang, L.-C. and Zimmermann, T. (2018) (eds.). The unit problem and other current topics in business survey methodology, pp 237-251. Newcastle upon Tyne: Cambridge Scholars.
Di Zio, M., Zhang, L.-C., and de Waal, T. (2017). Statistical methods for combining multiple sources of administrative and survey data. The Survey Statistician 76, 17-26.
Diewert, E.W. and Fox, K.J. (2017). Substitution bias in multilateral methods for CPI construction using scanner data. Paper presented at the fifteenth Ottawa Group meeting, Eltville, Germany. Available at https://www.bundesbank.de/Redaktion/EN/Downloads/Bundesbank/Research_Centre/Conferences/2017/2017_05_10_ottawa_group_07_2_paper.html?__blob=publicationFile.
Hand, D.J. (2018). Statistical challenges of administrative and transaction data (with discussion). Journal of the Royal Statistical Society, Series A 181, 555-605.
Ivancic, L., Fox, K.J. and Diewert, E.W. (2011). Scanner data, time aggregation and the construction of price indexes. Journal of Econometrics 161, 24-35.
IWGPS (2004). Consumer Price index manual: theory and practice. International Labour Organization. http://www.ilo.org/public/english/bureau/stat/guides/cpi/index.htm.
Krsinich, F. (2016). The FEWS index: fixed effects with a window splice. Journal of Official Statistics 32, 375-404.
Noč Razinger, M. (2018). Surveys on prices at the Statistical Office of the Republic of Slovenia. In Lorenc, B., Smith, P.A., Bavdaž, M., Haraldsen, G., Nedyalkova, D., Zhang, L.-C. and Zimmermann, T. (2018) (eds.). The unit problem and other current topics in business survey methodology, pp 253-266. Newcastle upon Tyne: Cambridge Scholars.
Zhang, L.-C. (2018). Discussion of “Statistical challenges of administrative and transaction data” by D.J. Hand. Journal of the Royal Statistical Society, Series A 181, 579-580. https://doi.org/10.1111/rssa.12315.
Zhang, L.-C., Johansen, I. and Nygaard, R. (2017). Testing unit value data price indices. Paper presented at the fifteenth Ottawa Group meeting, Eltville, Germany. https://www.bundesbank.de/Redaktion/EN/Downloads/Bundesbank/Research_Centre/Conferences/2017/2017_05_10_ottawa_group_09_1_paper.html?__blob=publicationFile.
CHAPTER 17

ANALYSIS OF SCANNER DATA FOR THE CONSUMER PRICE INDEX AT STATISTICS CANADA

CATHERINE DESHAIES-MOREAULT1, BRETT HARPER1 AND WESLEY YUNG1
Abstract

The Consumer Price Index (CPI) produced by Statistics Canada is an indicator of the change in consumer prices, which is commonly used as a proxy for inflation in Canada. Statistics Canada is exploring alternative data sources for its statistical programs, including scanner data for price statistics, and has recently begun to receive point of sale scanner data from a major Canadian food retailer. The use of scanner data in price indexes poses various challenges, such as product classification, informatics infrastructure, product stability over time and how to appropriately incorporate these data into price indexes (that is, the choice of index formula, aggregation method, and so on). This paper discusses some of these challenges but focuses on the impact of using different product definitions, index formulas and aggregation techniques on the resulting calculated price index. The impact is illustrated by comparing an index mimicking the current CPI approach to others that use scanner data to a fuller extent. The paper also discusses similarities and differences observed by comparing the different approaches and which ones better serve the CPI.
1 Statistics Canada, 100 Tunney’s Pasture Driveway, Canada, K1A 0T6. Email: {catherine.deshaies-moreault, brett.harper, wesley.yung}@canada.ca.
1. Introduction

Consumer Price Indexes (CPIs) have been produced for many years as a means to measure consumer price changes. For instance, Statistics Canada has been publishing a Canadian CPI since 1913. Since its inception, the Canadian CPI has been based on the principle of a fixed basket of goods and services with updates to reflect changes in products and consumer behavior. With advances in technology, in particular the collection of electronic point of sale data, national statistics organizations around the world are now looking at how best to use these data in their price index programs. Point of sale data, or scanner data as they are commonly called, are a rich source of information, as they track sales and quantities of all items sold in retail stores through their Universal Product Code (UPC) in North America, or Global Trade Item Number (GTIN) in Europe. Several countries (for example, the Netherlands, Sweden, Norway and Switzerland) have already incorporated some retail scanner data into their CPIs using various approaches. One big challenge of using scanner data is the choice of index methodology. Additional challenges to using scanner data in CPIs include tracking commodities through time, product relaunches and substitutions (for more on these challenges, see Zhang et al. (2017) or Dalén (2017)). For the most part, traditional CPIs have been constructed using bilateral indexes, which compare prices and quantities for a group of commodities for a current period to the prices and quantities for the same commodities for a base period. Given that these indexes compare characteristics for the same commodities, they do not use all of the information available from scanner data. For instance, scanner data contain information on all products sold and the quantities sold, which would allow for the calculation of product expenditure weights and more frequent updates to basket expenditure weights. 
With this additional information, many national statistics organizations are looking at moving from unweighted fixed basket methods based on a limited number of goods to a dynamic basket of goods weighted by expenditure shares. Multilateral indexes have been proposed as alternatives to traditional bilateral indexes but there is no consensus as to which index is the best. For more on these indexes and the challenges which remain see Chessa et al. (2017) and Diewert and Fox (2017). In an attempt to distinguish between possible indexes, Zhang et al. (2017) proposed several tests to evaluate the index formulae with the goal of identifying which approach(es) are best suited for scanner data.
The focus of this paper is on the impact that different product definitions, index formulas or aggregation techniques might have on the resulting calculated price index. The work in this paper is purely experimental and should not be taken as concrete plans for how Statistics Canada will use retail scanner data. The current Canadian CPI concepts will be covered in section 2. Section 3 will introduce the scanner data available to Statistics Canada, with some advantages and challenges. The results obtained from different implementations of scanner data will be presented in section 4, followed by a discussion in section 5.
2. Current Canadian CPI concepts

The purpose of the CPI is to act as an indicator of changes in consumer prices experienced by Canadian households (International Labour Organisation et al., 2004). In Canada, a fixed basket approach is used to compare, over time, the cost of goods and services purchased by consumers. By holding the quantity and quality of the goods and services fixed, the index reflects only pure price changes. The price movements of the items in the basket are weighted according to their expenditure shares, which come from Statistics Canada’s Survey of Household Spending (SHS) and are updated biennially to more accurately represent changing consumer preferences. In order for price collectors to know what to collect, Representative Products (RPs) are defined in such a way as to represent the price change of products similar in terms of price movements. Price collectors identify products that correspond to the RP descriptions, and try to price the exact same item every month. The RPs are found below Elementary Aggregates (EAs), which represent the lowest aggregation level within the aggregation structure for which SHS (or basket) weights are available. A graphic representation of the CPI aggregation structure is presented in Figure 17-1. In order to calculate the price index within each EA, a geometric mean of product price relatives is calculated using the Jevons formula. Higher levels of aggregation, such as basic classes, intermediate aggregates and major classes, are all obtained by using the Lowe (or Laspeyres type) formula with the basket weights. At the top of this aggregation is the All-items CPI.
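For reference, the two formulas named above can be written out in their standard textbook form. The notation is introduced here for illustration and is not taken from the chapter: $p_i^0$ and $p_i^t$ are the prices of product $i$ in the base and current periods, $n$ is the number of priced products in the EA, $q_j^b$ is the basket quantity of aggregate $j$ from a weight reference period $b$, and $P_j(0,t)$ is the index of aggregate $j$ at the level below. How exactly the basket weights are price-updated in the Canadian CPI is not stated in the text above, so the share form below should be read as the generic Lowe construction only.

```latex
% Jevons index within an elementary aggregate with n matched products
P_J(0,t) = \prod_{i=1}^{n} \left( \frac{p_i^t}{p_i^0} \right)^{1/n}

% Lowe index: cost of a fixed basket q^b at period-t prices relative to
% its cost at period-0 prices
P_{Lo}(0,t) = \frac{\sum_j p_j^t \, q_j^b}{\sum_j p_j^0 \, q_j^b}

% Equivalent share form used for aggregation, with the lower-level index
% P_j(0,t) playing the role of the price relative p_j^t / p_j^0
P_{Lo}(0,t) = \sum_j s_j^{0b} \, P_j(0,t),
\qquad
s_j^{0b} = \frac{p_j^0 \, q_j^b}{\sum_k p_k^0 \, q_k^b}
```

The Jevons formula needs no product-level weights, which is why products are equi-weighted within each EA, whereas the Lowe formula brings in the basket weights from the EA level upwards.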
3. Statistics Canada and Scanner Data

Statistics Canada has obtained access to some retail scanner data and is actively negotiating access to other data sources. Under the current arrangement, weekly data consisting of revenue from sales, quantity sold,
the Stock Keeping Unit (SKU) code, the UPC, the store identification and an item description that can be used for classification are received. The SKU code is an internal identification code used by retailers, which is typically at a higher level than the UPC. The data used in this study covered a period of 29 months. As one can imagine, scanner data in general are a rich source of information with considerable potential for creating new and perhaps more frequent indices in the future, while also improving the current process. Obvious advantages include a reduction of collection costs and respondent burden, as well as a reduction of sampling variance due to having a census of products. Less obvious advantages include more representative basket weights based on quantities sold and revenue from sales, quicker identification of new products and the ability to incorporate product weights. The effect of the differences in the source of basket weights on the index is a focus of this paper.
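To make the idea of product weights concrete, the following Python sketch derives expenditure shares from scanner revenue and compares an equi-weighted geometric mean of price relatives with a share-weighted one (the latter is often called a geometric Laspeyres index when base-period shares are used). This is only an illustration under stated assumptions: the product identifiers, revenues and price relatives are hypothetical, and the chapter does not say that this particular weighted formula was used in the study.

```python
from math import exp, log


def expenditure_shares(revenue):
    """Share of each product in total revenue (a proxy for expenditure)."""
    total = sum(revenue.values())
    return {k: v / total for k, v in revenue.items()}


def geometric_mean_index(relatives, shares=None):
    """Geometric mean of price relatives; equal weights if shares is None."""
    if shares is None:
        shares = {k: 1 / len(relatives) for k in relatives}
    return exp(sum(shares[k] * log(relatives[k]) for k in relatives))


# Hypothetical base-month revenue and price relatives for three products.
revenue_0 = {"upc_1": 9000.0, "upc_2": 900.0, "upc_3": 100.0}
relatives = {"upc_1": 1.00, "upc_2": 1.10, "upc_3": 1.50}

shares = expenditure_shares(revenue_0)
unweighted = geometric_mean_index(relatives)
weighted = geometric_mean_index(relatives, shares)
print(f"unweighted: {unweighted:.4f}, weighted: {weighted:.4f}")
```

In this toy example the large price increase belongs to a product with a tiny revenue share, so the weighted index stays much closer to 1 than the equi-weighted one.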
Figure 17-1: CPI Aggregation Structure. [Diagram, from top to bottom: All-items; Food & Non-alcoholic Beverages, Household Operations, Clothing & Footwear, ...; Intermediate Class; Basic Class; Elementary Aggregate (finest level where weights are available); Representative Products (products collected, equi-weighted within each EA).]
While scanner data appear to have lots of potential, they are not without challenges. Those related to product classification and informatics infrastructure are well known (see for example Deshaies-Moreault and Emond, 2016), as well as those related to tracking a product through time and the consequences of identifying replacements or not (see Chessa, 2016). In addition, scanner data may offer multiple UPCs for a particular product which would have to be aggregated before being used in the CPI. These two dimensions are discussed in Deshaies-Moreault (2016). Additional dimensions are geography, stores and basket (elementary aggregates and above), which are the usual aggregation dimensions. The challenges of interest for this paper are the many ways the data can be aggregated and how the data can be incorporated in price indexes. For instance, scanner data offer multiple product prices per month and have to be aggregated in order to be integrated into the monthly CPI.
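One simple way to obtain a single monthly price from weekly records is sketched below in Python: revenue and quantity are pooled over the weeks of a month and over the UPCs that share an SKU, and the monthly unit value is total revenue divided by total quantity. This is a minimal sketch, not the procedure used in the study: the record layout, the identifiers and the rule that a week belongs to the month in which it starts are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def monthly_unit_values(records):
    """Aggregate weekly scanner records to monthly unit values.

    Each record is (week_start, store, sku, upc, revenue, quantity).
    Records are pooled over weeks and over the UPCs that share an SKU;
    the unit value is total revenue divided by total quantity.
    Assumption: a week belongs to the month in which it starts.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # key -> [revenue, quantity]
    for week_start, store, sku, _upc, revenue, quantity in records:
        key = (week_start.year, week_start.month, store, sku)
        totals[key][0] += revenue
        totals[key][1] += quantity
    return {
        key: revenue / quantity
        for key, (revenue, quantity) in totals.items()
        if quantity > 0  # skip keys with no units sold
    }


# Hypothetical weekly records for one store and one SKU with two UPCs.
records = [
    (date(2017, 3, 6), "store_1", "sku_apple", "upc_a", 200.0, 100.0),
    (date(2017, 3, 13), "store_1", "sku_apple", "upc_a", 150.0, 100.0),
    (date(2017, 3, 13), "store_1", "sku_apple", "upc_b", 250.0, 100.0),
    (date(2017, 4, 3), "store_1", "sku_apple", "upc_a", 210.0, 100.0),
]

unit_values = monthly_unit_values(records)
for key in sorted(unit_values):
    print(key, round(unit_values[key], 2))
```

Changing the grouping key (for example dropping the store, or keeping the UPC instead of the SKU) gives the other aggregation choices along the product, store and geography dimensions mentioned above.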
4. Analysis and results

4.1 Methodology

Basic classes studied. To simplify the analysis, categories of products that can easily be mapped to the current Canadian CPI basket were identified for further analysis; after a quick overview, the basic classes under Fresh Fruits were chosen: Apples, Oranges, Bananas and Other fresh fruits. These classes of products are also interesting to study due to the seasonal nature of fruits. All results are presented at the Canada level with the base month index equal to 100. Note that the base month is not identified for confidentiality reasons.

Influential Observations. Some extreme price movements were observed in the data, most likely a result of mistakes in the unit of measurement. Given the goals of the study, it was decided to remove the top and bottom 2.5% of price movements so as not to overly influence the index. Clearly, in a production environment, these mistakes would be corrected and not disregarded. Also, products with less than $10 in total sales in a month (these products total