WORK, WORKERS, AND WORK MEASUREMENT

by ADAM ABRUZZI
Professor of Industrial Engineering, Stevens Institute of Technology

COLUMBIA UNIVERSITY PRESS
MORNINGSIDE HEIGHTS · NEW YORK · 1956
COPYRIGHT © 1956 COLUMBIA UNIVERSITY PRESS, NEW YORK
PUBLISHED IN GREAT BRITAIN, CANADA, INDIA, AND PAKISTAN BY GEOFFREY CUMBERLEGE, OXFORD UNIVERSITY PRESS: LONDON, TORONTO, BOMBAY, AND KARACHI
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 56-8159
MANUFACTURED IN THE UNITED STATES OF AMERICA
To ADAM, my son, and ADAMO, my father
Preface
My Work Measurement is now nearly four years old—time for it to be growing up. Work Measurement did what it set out to do, which was to present new ways of measuring work. Work Measurement also said some things about work and workers, but it did not say what work is worth. Some insist that Work Measurement should have said what work is worth; others insist that Work Measurement must have said this and, because it must have said this, must have claimed more than it delivered. But measure and worth are different things, quite different things, and if Work Measurement had really claimed this, it would have claimed to give more than any book is capable of giving.

Work, Workers, and Work Measurement does speak of ways of assessing the worth of work and the worth of workers as well. Add to this ways of measuring work, and the result is a theory about work, the theory being submitted here.

The book is in three parts. In the first part there is the setting: "Work Measurement: Theory, Practice, and Fact." This part is not simply about work measurement if we mean by the term just that. This part is about work measurement if we mean by the term how work is considered; this may mean some measuring but, much more often, this means a few quick glances at a timepiece so as to nudge value judgments over into numbers.

In the second part there is the substance: "A Work Measurement Theory: Procedure, Application, and Results." This part is mostly about measuring work, and it is as new as Work Measurement is new, for that is where it came from. There have been some additions and revisions, and the mode of presentation has been changed, with the emphasis shifted from formulas and technical detail to pattern and interpretation.
In the third part setting and substance are brought together: "The Theory of Human Work: Beliefs, Codes, and Observations." This part is about what authorities on work believe and what people at work do; which things codes explain and which things observations deny, which things can be practiced and which things must be sensed; what a theory of work can cover and what a theory of work cannot consider.

Work, Workers, and Work Measurement is concerned with people and things, both at work, and it is therefore concerned both with science and with art. The social scientist, who is also concerned with both these things, is especially invited to look into the first part, where science and art are found entangled, and the third part, where science and art are blended together. The second part is mostly concerned with science, though this, too, turns out to be pretty much an art.

New citations, like footnotes, are few in number. I hope the citations themselves will be considered sufficient acknowledgment of their usefulness. There remain the old citations and, since they are used here just as they were in Work Measurement, I acknowledge the courtesy of the following publishers in permitting quotations from their works: D. Van Nostrand Company; United States Department of Agriculture Graduate School; Appleton-Century-Crofts, Inc.; and the Controller of H. M. Stationery Office.

Influences often are elusive in character, and the most important turn out the most difficult to trace. In large part, Work, Workers, and Work Measurement is an answer to the persistent and vigorous questions of my graduate students at Stevens, to be sure a little late for note-taking. There also are many workers and union men, management men and engineers, teachers and scholars, all friends: these men helped shape the book, in many cases without either their knowledge or consent. There is, for instance, Mr. Carter, the No. 1 man on a great tandem mill: he showed how it is that the nimble fingers of machines must always yield to the nimble arts of man.

There are others who helped, these with both knowledge and consent. Catherine Abruzzi was foremost among these, a kind but rigorous critic. She was most patient, as well. To decipher often obscure sentences, she wisely invoked a maxim from her days with college German: "Be calm and look for the verb." Esther R. Lawrence, with customary professional care, transcribed a longhand piece of the manuscript that frequently verged on the unreadable. Veronica Kelleher, transcribing a dictated piece of the manuscript, performed her task well, perhaps slightly too well. For example, it was my view that a worthy worker is the worker who produces a worthy output. To her, a worker, the weary worker is the worker who produces a worthy output.

ADAM ABRUZZI

Castle Point, Hoboken, N. J.
February, 1956
Contents
PART I. Work Measurement: Theory, Practice, and Fact

1. Some Basic Problems of Inference  3
2. The Nature of Time Study  18
3. The Validity of Rating Procedures  30
4. Work Measurement in Practice  44

PART II. A Work Measurement Theory: Procedure, Application, and Results

5. The Problems of Time Measurement  61
6. The Nature and Function of Process Standardization  77
7. Production Rates in the Short Term  87
8. Interpreting Short-Term Studies  106
9. Production Rates in the Long Term  118
10. Interpreting Long-Term Studies  141
11. Designing Studies of Production Activity  147
12. Measuring and Estimating Delays  166

PART III. The Theory of Human Work: Beliefs, Codes, and Observations

13. The Nature of Standard Data Systems  195
14. Standard Element Estimates and Related Problems  205
15. Standard Motion Estimates and Related Problems  224
16. The Fundamental Character of Work  241
17. The Nature of Work Fatigue  260
18. Fatigue, Skill, and Work  277
19. Postscript and Salutation  289

BIBLIOGRAPHY  303
INDEX  311
Tables

1. Results Using the Tolerance Limit Approach  36
2. Typical Rating Results in a Factory Operation  39
3. Comparative Data on the Marsto-Chron and Stop-Watch  71
4. Ratio-Test Results on the Data for Operator 1D  91
5. Production-Rate Estimates Based on Two Studies on Operator 13A  102
6. Production-Rate Data for Comparing Operations in the Two Plants  106
7. The Relation between Skill and Consistency  108
8. The Production-Rate Characteristics of Three Novices  109
9. The Relation between Delays and Net Production Rates  110
10. Data Showing How a Reduction in Delays Affects Production Performance  112
11. The Period Breakdown Used in the Studies of Grand Stability  120
12. Test Results on the Daily Means for Operation 4  135
13. Test Results on the Daily Means for Operation 6  137
14. Test Results on the Daily Means for Operation 7  139
15. Test Results on the Pooled Daily Means of the Seven Operations  140
16. The Relation between Level and Consistency in Grand Production Rates  143
17. The Period Breakdown Used in the Delay Study in Plant B  175
18. The Period Breakdown Used in the Delay Study in Plant A  179
19. Delay Estimates and Allowances in Plant A  182
20. Criteria Reported for Defining Operation Elements  197
21. Independence-Test Results on Grouped Elements in Operation 3  210
22. The Summarized Data for the Cases in Which Independence Was Not Established  211
23. The Summarized Data for the Cases in Which Independence Was Established  212
24. The Relation between Level and Consistency of Element-Group Production Rates  216
25. Element Data of Operator 3H Showing How Relative Consistency Decreases with Mean Time  222
26. The Motion Descriptions and Times for the First Two Cycles in Study A  226
27. The Wink-Counter Readings in Motion Study A  227
28. Data from Three Motion Studies Showing the Relation between Level and Consistency  235
Figures
1. The Percentage of Operation Standardized in Advance by Survey Respondents  78
2. A Comparison of the Relative Output under Three Types of Wage Payment Plans  80
3. The Mean and Range Charts on Local Cycle Times for Operator 1D  89
4. The Mean and Range Charts on Local Cycle Times for Operator 1H  92
5. The Mean and Range Charts on Local Cycle Times for Operator I*A, Showing the Separate and Combined X̄, R, and Limit Values  97
6. The Mean and Range Charts on Local Cycle Times for Operator 21  98
7. The Mean and Range Charts on Local Cycle Times for Operator 11A, Showing the Separate and Combined X̄, R, and Limit Values  100
8. The Mean and Range Charts on Local Cycle Times for Operation 21  104
9. The Group Mean and Range Charts for Operation 1, Showing Both the Outer 2s* and the Inner 3-sigma Limits for the Means  123
10. The Mean and Range Charts on the Data for Operation 1 Arranged According to the Operator  128
11. The Mean and Range Charts on the Data for Operation 1 Arranged According to the Work Period, Showing Both the Outer 2s* and the Inner 3-sigma Limits for the Means  129
12. The Group Mean and Range Charts for Operation 2, Showing Both the Outer 2s* and the Inner 3-sigma Limits for the Means  130
13. The Mean and Range Charts on the Data for Operation 2 Arranged According to the Operator  131
14. The Mean and Range Charts on the Data for Operation 2 Arranged According to the Work Period, Showing Both the Outer 2s* and the Inner 3-sigma Limits for the Means  132
15. The Group Mean and Range Charts for Operation 3, Showing Both the Outer 2s* and the Inner 3-sigma Limits for the Means  133
16. The Group Mean and Range Charts for Operation 4, Showing Both the Outer 2s* and the Inner 3-sigma Limits for the Means  134
17. The Pooled Means of Operators 4A and 4C Arranged According to the Garment Size  136
18. The Mean and Range Charts for Operation 6, Showing Both the Outer 3s* and the Inner 3-sigma Limits for the Means  138
19. Chart for Estimating the Number of Observations Required to Obtain Maximum Confidence Intervals of ±5 Percent for Two Common Coefficient-of-Variation Values  161
20. Charts on the Total Delay Percentages Obtained in Plant B, Plotted by the Period and by the Day  177
21. Chart on the Total Delay Percentages Obtained in Plant A, Plotted by the Period  180
22. Charts on the Unavoidable and Personal Delay Percentages Obtained in Plant A, Plotted by the Period  181
PART I
Work Measurement: Theory, Practice, and Fact

CHAPTER ONE
Some Basic Problems of Inference
There are many fields, including classical industrial engineering, in which writers and practitioners alike seem compelled to claim that what they are doing is scientific. The most popular method of proving this claim is to make liberal use of the label "scientific" in either the "regular" or the "king" size. In industrial engineering the regular-size label is constantly being attached to work measurement, job evaluation, cost control, and many other techniques. If regular-size labels are not sufficiently persuasive, the king-size label, "scientific management," is attached to the over-all management function in the hope that it will persuade the recalcitrant.

To be fully effective this technique of proof by proclamation requires that the proclamation be trumpeted at frequent intervals so that acceptance may be achieved by noise level rather than by verifiable content. Proof by proclamation must decree that results can be verified but only "if properly applied" by "properly trained observers" using "proper methods." It must decree that the results can be stated in multi-decimal terms arrived at by arbitrary, whimsical, or shallow methods. It must decree that assumptions and hypotheses be either not mentioned at all or phrased in terms of pontifical gibberish. Also, of course, it must decree that objectives be phrased in terms that can neither be contested nor have any concrete meaning.

The technique of proof by proclamation has been found highly useful in popularizing certain political and other doctrines. It has been found equally useful by the trumpeters of industrial engineering doctrines. This book takes the view that this approach is useful only in the sense of providing economic and other advancement for the trumpeters. New approaches are needed, particularly in the area of work measurement, to extend the area of usefulness of industrial engineering to include the non-trumpeting segment of society.

VIEW ON INFERENCE-MAKING IN THIS FIELD
The problem of inference-making in industrial engineering has been conscientiously examined by a number of writers. In the main the results are quite disappointing because they simply provide a set of descriptive and innocuous rules that any experienced investigator would follow as a matter of course. The principal objection to most of these rules is that they fail to take into account the principles of modern experimental inference. There are some exceptions, notably in the works of Walter Shewhart and a sparse group of other writers.1 Shewhart himself has made a number of fundamental contributions while developing the theory of statistical quality control; this is certainly an important, though a rather lonely, example of an industrial engineering area with a respectable theory of inference.

It is Shewhart's view that applied science has more demanding requirements than pure science. This is because the results of applied science are usually used as a basis for empirical action, often with important economic consequences. "Invariably," Shewhart adds, "each practical rule of action, so far as it has been adopted as a result of reasoning, is based upon some abstract concept or group of concepts." But it is essential that this rule of action be verified by experimental means before it can be finally accepted. The fact is that a scientific statement can have meaning only if it is capable of verification by a method specified in advance.

Shewhart also discusses a problem that is basic to the approach adopted in this book, namely, the extent to which statistical methods can be used in applied science. Here he underscores the importance of developing empirical rather than statistical criteria for deciding whether statistical methods can justifiably be applied in a particular area.

Theoretical and Applied Statistics. The distinction between theoretical and applied statistics cannot be overemphasized in view of the widespread misconception on the subject.2 There is a profound difference between formal theory and the application of formal theory. In theoretical work a certain statistical population may be assumed to exist a priori without regard to whether that population can actually be found in the real world. In applications, however, no such assumption can be made; the existence or the nonexistence of a statistical population is the most fundamental question faced by an investigator. Once this question is resolved, the remaining questions become relatively simple. With an affirmative answer, the investigator's task becomes straightforward and rather mechanical; he simply follows the appropriate techniques to arrive at his conclusions. If the answer is negative, he cannot use statistical methods in any meaningful way unless, of course, a statistical population can be brought into being by empirical action.

This fundamental question cannot be answered by statistical means, just as no mathematical theorem can be used to answer the question of whether a predecessor postulate is correct. The postulate is required to prove the theorem, and, if the postulate is wrong in some sense, it follows that the theorem is wrong. It is scarcely appropriate, then, to use the theorem to evaluate the postulate. In the same way, statistical methods cannot be used to determine whether a statistical population exists because they are based on the assumption that such a population already exists.

1 Shewhart, Statistical Method from the Viewpoint of Quality Control.
2 See Abruzzi, "A Reply to Davidson's Review of Work Measurement," The Journal of Industrial Engineering, IV, No. 3 (1953), 18-22.
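How such an empirical criterion might look in modern code is sketched below. The control-chart check is Shewhart's device; the cycle-time figures, the subgroup size of five, and the stability rule shown here are hypothetical illustrations, not data or procedures taken from this book's studies.

```python
# A minimal sketch of Shewhart's empirical criterion: a mean-and-range
# control chart used to ask whether hypothetical cycle-time data behave
# like a single stable statistical population.

cycles = [1.02, 0.98, 1.05, 0.97, 1.01, 0.99, 1.04, 1.03, 0.96, 1.00,
          1.06, 0.95, 1.02, 1.01, 0.98, 1.03, 0.97, 1.05, 0.99, 1.02]
n = 5                                        # subgroup size
subgroups = [cycles[i:i + n] for i in range(0, len(cycles), n)]

means = [sum(s) / n for s in subgroups]
ranges = [max(s) - min(s) for s in subgroups]
xbar = sum(means) / len(means)               # grand mean
rbar = sum(ranges) / len(ranges)             # mean range

# Standard control-chart constants for subgroups of five: A2, D3, D4.
A2, D3, D4 = 0.577, 0.0, 2.114
x_limits = (xbar - A2 * rbar, xbar + A2 * rbar)
r_limits = (D3 * rbar, D4 * rbar)

stable = (all(x_limits[0] <= m <= x_limits[1] for m in means)
          and all(r_limits[0] <= r <= r_limits[1] for r in ranges))
print("a stable population may be assumed" if stable
      else "no single population exists; statistical methods do not apply")
```

If the means or ranges escape the limits, the investigator's first task is the empirical one of removing assignable causes of variation, not the statistical one of computing estimates.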
KEY ASPECTS OF MODERN EXPERIMENTAL INFERENCE

A number of principles and rules of modern experimental inference—perhaps the most important ones—have already been touched upon. Certain other basic principles and rules proposed by philosophers of science should also be mentioned in order to obtain a more comprehensive view of modern experimental inference.

C. West Churchman, for example, suggests that a finite number of observations cannot give a complete solution to a scientific problem.3 Like Shewhart and certain others, he points out that to have a scientific result it must be possible to determine the error of observation, at least within limits. This leads naturally to the view that a formal theory of statistics is necessary, though not sufficient, for progress in science.

Morris R. Cohen and Ernest Nagel also give some attention to these problems.4 They maintain that scientific investigations must begin with a careful definition of the problem; the relevant material must then be separated from the irrelevant. This eventually leads to hypotheses that are capable of predicting in a well-defined manner what will happen, i.e., that are capable of verification. Initially, of course, hypotheses can be framed only in qualitative terms. But after refinement they should be capable of being expressed in quantitative terms, with the ultimate goal a single hypothesis stated in numerical terms.

3 Churchman, Theory of Experimental Inference.
4 Cohen and Nagel, An Introduction to Logic and Scientific Method.

Social Values in Science. One of the most important findings of modern experimental inference is concerned with the role of social and other values in science. It is clear, for example, that the very decision to follow the scientific method requires a judgment involving concepts of value. Later chapters will show that value judgments, whether developed by individuals or by groups, also enter into scientific work in other crucial though often subtle ways. This constitutes an absolute denial of the classical notion that scientists are interested exclusively in an abstract search for truth. One of the reasons why classical industrial engineering fails to perform its functions adequately is that this notion is still popular in the field. Paradoxically, social and other values intervene here in a much more decisive way than they do in the physical sciences.

Constructing Models and Measurement Scales. All inference-making must originate from models and measurement scales. In classical industrial engineering, models and measurement scales are usually arbitrary and vague if, indeed, they are explicitly considered. A brief survey of the problems of model and scale construction is therefore particularly appropriate.

Scientific models fall into two categories: formal models and material models. The distinction between these two types of models is not clear-cut since, for example, a material model is always related to some formal model. In both cases the model represents a complicated system in terms of a simplified system capable of being manipulated by available experimental and analytical methods.

Formal models must, of course, have a mathematical structure. In the past this has been distorted to imply that a discipline can be scientific only to the extent that its phenomena can be described by mathematical "laws." But a formal model is simply a convenient though perhaps elegant device; its usefulness depends on how well it can predict what will happen in some area in the absence of complete information.
The problem of designing appropriate scales of measurement is another basic problem of inference-making, yet it has received only skimpy attention from philosophers of science. This is probably because formal general guides are next to impossible in an area where each situation requires unique treatment. In industrial engineering—and industrial engineering is certainly not alone in this—this has too often led to the use of numerical measurement scales for variables that do not even have sharply distinguishable characteristics.

S. S. Stevens is one of the few who have done some spadework in this field.5 He defines the process of measurement "as the assignment of numerals to objects or events according to rules." Numerical scales of measurement, he adds, can be constructed only when a unique correspondence (in mathematical terms, an isomorphism) can be established between certain characteristics of objects or events and the properties of the number system. This is far from a complete story on the subject, but it does point up the basic requirement of a meaningful measurement scale.

5 Stevens, "On the Theory of Scales of Measurement," Science, CIII (1946), 677-80.

There is no necessity, however, for measurements to be numerical. In fact, numerical scales are possible only within the framework of a sophisticated theory. With many variables, the degree of differentiation available is still crude and only primitive qualitative scales can be applied. This means in effect that the variables in question are incompletely defined. Fortunately, this perplexing problem can largely be circumvented in the area of work measurement because the fundamental variable is time. The time variable can be defined here so as to minimize the problem of poorly-defined variables. How this may be done is discussed in detail in later sections of this book.
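A small illustrative sketch, hypothetical and not from the book, may make Stevens' point concrete: numerals assigned as mere labels or ranks do not inherit the arithmetic of the number system, while a time measurement does.

```python
# Hypothetical sketch of Stevens' requirement: numerals are meaningful
# only to the extent that an isomorphism with the number system holds.

# Numerals as labels for operation types: arithmetic on them is empty.
operation_code = {"pressing": 1, "sewing": 2}
# operation_code["pressing"] + operation_code["sewing"] == 3 is true of
# the numerals but corresponds to nothing in the operations themselves.

# Numerals as ranks (say, an observer's skill ratings): only order
# carries over; sums and averages of ranks have no guaranteed meaning.
skill = {"novice": 1, "average": 2, "expert": 3}
assert skill["expert"] > skill["novice"]

# Numerals from timing an operation (seconds): differences and ratios
# now mirror real relations among the events measured.
t_first, t_second = 12.0, 24.0
assert t_second / t_first == 2.0   # "twice as long" is empirically meaningful
assert t_first + t_second == 36.0  # elapsed times add as numbers add
```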
SPECIFICATIONS FOR MAKING INFERENCES

It may well be argued that the foregoing material is essentially descriptive. This is not at all a damaging criticism; all that can be expected on these subjects is a philosophical orientation in the form of a set of guiding rules or specifications. These specifications are necessary ingredients of modern experimental inference. But they cannot provide the decisive "intuitive" component that gives richness and depth to an investigation; this must always be supplied by the investigator himself.
It is in this setting that a set of specifications is presented here as a guide to the minimum requirements of inference-making. These specifications are used later on in the book to help determine the validity of classical work measurement procedures. They were also used in planning and analyzing the experiments used as a framework for a proposed theory of work and work measurement.

The initial requirement in a scientific inquiry is to define the fundamental concepts, in so far as it is possible, in operational terms. This means specifying a sequence of operations which enables investigators to interpret the concepts in the same way. This requirement is especially pertinent in industrial engineering where operational definitions are distinguished by their scarcity.

The process of developing definitions is clearly not a simple one-step process. Ultimately a full definition can be developed only in terms of a sequence of investigations leading to ever-sharper definitions. Strictly speaking, this means that a problem cannot be fully defined until it is fully solved, for definitions must themselves be verified in terms of the specifications of which they are part. These remarks, it might be added, apply with equal force to each of the other specifications considered below.

The objectives of an investigation must also be well defined. In doing this it is desirable to distinguish component objectives—sociological, physiological, psychological, and technological. The failure to distinguish these component objectives and take them directly into account may lead to serious difficulties like those which arise, for example, in classical work measurement. Also, if the problem is an estimation problem, it is necessary to specify the accuracy and precision requirements of the investigation. Similarly, if the problem involves testing hypotheses, it is necessary to specify the risks of making incorrect judgments.

Both formal and material models will ordinarily be needed in the course of most investigations. These models should be selected so their assumptions tally as closely as possible with the real-life characteristics of the problem. In the process, it may be desirable to make a pilot investigation, particularly when the assumptions do not seem clearly established. This is also particularly desirable when some preliminary work is needed on measurement methods and scales. This is not a pressing problem in the work measurement case since, as indicated, the time variable can be manipulated to provide the answers to most questions of this kind. It hardly seems necessary to add that measurement capabilities determine the accuracy and precision that will be obtained.

Selecting models means in effect selecting testing and estimating procedures. Plan and analysis should always be considered together. It should be kept in mind, though, that results often suggest changes in experimental structure. This reminder illustrates, among other things, that the steps involved in inference-making are sequential and interactive.

Investigations can be divided into three categories. The first category comprises laboratory experiments which are primarily useful, for example, in comparing different measuring instruments. Investigations of the second category can be characterized as situational investigations. Investigations in this category will most often be needed in industrial engineering areas, such as work measurement, where experimental results depend directly on the environment in which they are made. A laboratory investigation, for example, can hardly be expected to reveal much about the performance characteristics of workers in a factory environment. The third category combines the first two categories: pilot investigations are conducted in the laboratory; then final investigations are conducted in the situational environment.

The results are also subject to certain specifications. In all cases results should be accompanied with a statement explaining how they can be verified by other observers. What this means is that to the extent possible the process of verification should be independent of the observer. If the results are quantitative in nature, their principal statistical properties should be reported. This makes it possible to compare the results of different observers.

Precision and accuracy are the two most important statistical properties of estimates. Precision or—to use the more connotative term—consistency refers to the degree of variation in the observed data. This is an important characteristic since the precision of an estimate defines the potential range of future observations made under the same measurement conditions. It is impossible to specify this range with complete certainty, and this is taken into account by stating the probability with which other observations might be expected to fall within certain limits. Although precision and accuracy are often related, a high degree of precision does not guarantee that an estimate is correct or accurate.
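Put in modern terms, a precision statement of this kind might be computed as in the sketch below; the observations and the two-sigma, roughly 95 percent convention are hypothetical assumptions of the sketch, not prescriptions of the text.

```python
import math

# Hypothetical cycle times (minutes) observed under fixed conditions.
obs = [1.01, 0.98, 1.04, 1.00, 0.97, 1.03, 0.99, 1.02, 1.01, 0.98]

n = len(obs)
mean = sum(obs) / n
s = math.sqrt(sum((x - mean) ** 2 for x in obs) / (n - 1))  # sample std. dev.

# Precision statement: the range within which future observations made
# under the same measurement conditions may be expected to fall, with a
# stated probability (about 95 percent if the variation is normal).
lo, hi = mean - 2 * s, mean + 2 * s
print(f"estimate {mean:.3f} min; future observations in ({lo:.3f}, {hi:.3f})"
      " with probability near 0.95")

# Note: a narrow interval (high precision) says nothing about accuracy;
# a biased instrument yields tight limits around the wrong value.
```

The same data could be highly precise and still inaccurate, which is why the discussion turns next to the "standard" measurement method.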
Correctness in an absolute sense is, of course, impossible except in conceptual terms. In real life the correctness or accuracy of an estimate can only be determined by defining as correct the results obtained from a so-called "standard" measurement method. Like precision, accuracy is concerned with the potential range of future observations, again defined in probability terms. The main difference, however, is that the range now refers to some function of the observed data, using the data obtained by the "standard" measurement method as a base. This function is intended to show how closely the results obtained by the "everyday" measurement method can be expected to correspond to the "true" value of the standard measurement method.

Similar requirements exist when test results rather than estimates are obtained. In particular, each test result should be accompanied by a statement of the chances that a stated hypothesis is actually true though concluded to be false. Conversely, it should be accompanied, whenever possible, by a statement of the chances that a stated (alternative) hypothesis is false though concluded to be true.

The assumptions of a formal or material model are never fully satisfied in practice. Estimates and test results are only approximations—a point overlooked by many investigators who act as though their results are exact in a probability sense. Indeed, a result may be significant in a statistical sense but nonsignificant in an empirical sense. This point, too, is overlooked by some investigators who consider a result to be empirically meaningful simply because it is statistically significant. The meaning of a result depends on its empirical consequences; though statistical significance is necessary, it is far from a sufficient guide as to empirical action.

To avoid incorrect and perhaps purely self-serving inferences, the estimation (and testing) function, which defines experimental results, must be distinguished from the evaluation function, which defines the empirical meaning of the results. This is particularly important in industrial engineering where many difficulties flow from the failure to make this distinction.

Another specification is often cited in the literature. This specification suggests that a cooperative pool of scholars in related fields is extremely desirable. On the surface this sounds like an excellent suggestion. Unfortunately, scholars are rarely willing or even able to subdue their vested interests and viewpoints in favor of broad interdisciplinary goals. Among other obstacles, jargon barriers and professional chauvinism must be dissolved before broad goals can begin to be understood. It seems more promising for scholars in one field to try to digest related information available in other fields. Specialists in one field who take the trouble to become well informed about related fields are much more likely to make progress than a group of querulous specialists devoted to professional segregation.
DESIGNING AN INVESTIGATION

To fully explore the specifications of inference-making would require a work of encyclopedic proportions; even then little would be added to the outline given here. Specifications should and can only be guideposts defining the framework for making inferences. Detail would simply prompt investigators to concentrate on the framework rather than the structure.

It is much more useful to look into the question of how specifications could be implemented so as to guarantee sharp results.6 This seems an insurmountable problem since the decisive component of successful experimental work is unique to the investigator. The manner in which that component is applied by an investigator varies with different problems, which is why it cannot readily be described in ordinary language. This so-called "intuitive" component of inference-making implies, paradoxically, that the scientific method is ultimately an art.

This also helps to explain why it is only after an investigation has been successfully completed that the diffuse notes of the investigator can be organized into a systematic structure. Appending a systematic structure to an investigation does not mean that the investigation actually followed a neat series of steps. It simply means that the investigation is capable of being described in terms of a neat series of steps, and that its results are capable of verification in the manner described above. This is all that the process of experimental inference demands. Over and above this, a systematic structure is developed in response to the demands of both the investigator and the academic community. This simplifies the task of interpretation, and, probably at least equally important, it satisfies the aesthetic feeling that investigations should have a systematic structure.

6 See Abruzzi, "Problems of Inference in the Socio-Physical Sciences," The Journal of Philosophy, LI, No. 19 (1954), 537-49.
What this really means is that there is no theory on how to construct theories. Some guidance can be obtained, to be sure, by observing how established theories have been constructed. Somewhat more guidance might be obtained if more investigators were to publish the actual paths taken by their investigations and the reasons why.

The Role of Objectives. One of the specifications refers to the need for having well-defined objectives. This process is much more subtle than it appears at first glance, especially in industrial engineering work. A primary objective of inference-making is to achieve a high degree of precision and accuracy in as wide an area as possible. Stated in these broad terms this objective is common to all sciences since it is generally agreed that precision is a desirable inferential goal.

It is true that all investigations involve making basic value judgments about what is desirable. But there is an essential distinction between the investigations of the physical scientist and those of, say, the industrial engineer. The physical scientist need not be directly concerned with the social problems generated by his results. Also, he need not be concerned with the social problems arising from the direct intervention of human behavior except in so far as it affects the process of observation. The result is that the physical scientist can devote himself to the relatively narrow technical problem of achieving accuracy and precision. An unfortunately large number of physical scientists have concluded from this that theirs are the only "true" sciences. The fact is that theirs are the only sciences in which problems are simple enough to require just technical skills.

The situation is completely different for the industrial engineer, who cannot confine his attention to the objective of precision and accuracy. He also must be directly and intimately concerned with obtaining results which take into direct account the objectives of individuals and social groups. Just a brief look at these social objectives is necessary to understand the weight of this problem. They are variable rather than fixed, interactive rather than independent, sometimes complementary but more often contradictory, and they are usually vaguely defined. These social objectives also change in number and weight with respect to time and environment. The result is that for the industrial engineer the objective of precision and accuracy is not an unrestricted goal. This objective is explicitly and sharply restricted by the need to take into account the complicated behavior patterns of human beings as individuals and in groups.

Application to Work Measurement. It is possible to approach the problems of work measurement primarily from the viewpoint of the physical sciences: it would be necessary simply to find procedures insuring a maximum amount of precision and accuracy. Ultimately, however, this would require such an overwhelming change in workers' behavior patterns that they would essentially be stripped of their human characteristics—at least at the workplace. Later chapters show that this is impossible, but even if it were possible, it could be considered socially undesirable. Classical work measurement theory was developed along these lines, and this is precisely why it fails to be a satisfactory theory. The needs, abilities, and purposes of workers were simply not taken into account except in a superficial sense.

On the other hand, these problems could also be considered exclusively in social terms. In that case the goal of accuracy and precision would be shunted aside in order to give workers complete freedom in determining how, when, and with what they work. This seems to be the feeling of all too many social scientists who have made a fashion of flogging modern industrial technology and extolling the largely imaginary virtues of preindustrial eras. They argue that humanity was then happily occupied doing exactly what it pleased. Apparently this sort of happiness means doing without adequate health, food, shelter, goods, services, and leisure—some might interpret this as doing nothing as one pleased. It seems paradoxical, too, that the social scientists who advance this viewpoint are rarely enthusiastic about giving up the benefits of technology including, it should be noted, the privilege of flogging it.

Such extreme viewpoints are the result of the apparently basic human desire to find simple solutions to complicated problems. They represent a child-like hope that acting as though a problem were simple will make it so. They fail to recognize that what is required in this field are procedures which would make possible a useful degree of precision and accuracy though, concededly, with a well-defined limitation on restrictions of human freedom of action.

The final decision on how such contradictory goals are to be weighted must ultimately rest with the society. This decision involves the fundamental use of an evaluation process imposed on the investigator at the beginning of his investigation and applied to his results at its conclusion.
The industrial engineer can aid in the process of evaluation by showing the society what social and other costs are involved in attempting to satisfy certain alternative goals. It then becomes his function to develop procedures that best satisfy the goals selected.

DESIGNING AN INVESTIGATION IN WORK MEASUREMENT
Most observers familiar with this subject would probably agree that man-controlled operations exhibit a greater degree of variation than any other type of operation. This intuitive conclusion is strongly supported by experimental studies. S. Wyatt and P. M. Elton, for example, arrive at the following conclusion: the degree of variation in production rates reflects the relative influence of the machine and the worker.7 These students also show that the greatest degree of variation is found in man-controlled operations. This is because production rates in such operations are heavily influenced by human or behavioral variables introduced by the worker. The situation is quite different with machine-controlled operations. Here the influence of behavioral variables is sharply restricted so that the production rates are dominated by mechanical variables introduced by the machine. Also, machines are always constructed so as to have a relatively small range of variation. The net result is that machine-controlled operations have inherently a much smaller degree of variation than man-controlled operations.

This line of reasoning can readily be extended to cover other types of operation. Here the degree of variation falls between the minimum of purely machine-controlled operations and the maximum of purely man-controlled operations.

Implications. It follows that man-controlled operations provide a maximum opportunity for the expression of variation in production rates. This variation would be expected to include trends and other forms of nonrandom behavior, some of which would not turn up when the worker does not dominate the production rate.

The implications of this are profound. The key point is that other types of operation would exhibit some, but not all, of the variation forms exhibited by man-controlled operations. From an analytical viewpoint they can be considered special cases of man-controlled operations. Procedures for handling the variations exhibited in man-controlled operations should therefore be directly useful for handling the variations exhibited in other types of operation. On the other hand, procedures developed for operations partially controlled by machines would not necessarily be applicable to man-controlled operations.

This means that concepts and procedures developed from studies of man-controlled operations should have direct application in all types of operation. But it does not follow that specific procedural details can be applied to all types of operation. In later chapters, for example, certain criteria of stability are developed for operations in the garment industry. These criteria are based upon the characteristics of the operations and the empirical requirements of the industry. The same criteria would not necessarily apply to operations in other industries unless they had similar characteristics and requirements. Concepts and procedures must always be adapted to the peculiar characteristics and requirements of the immediate plant and industry.

7 Elton, An Analysis of the Individual Differences in the Output of Silk-Weavers; Wyatt, "Some Personal Factors in Industrial Efficiency," Human Factor, VI (1932), 2-11.
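One modern shorthand for this degree of variation is the coefficient of variation. The sketch below uses hypothetical cycle times, not Wyatt's or Elton's data, to illustrate the comparison between man-controlled and machine-controlled operations.

```python
import math

def coefficient_of_variation(times):
    """Relative consistency: standard deviation as a fraction of the mean."""
    n = len(times)
    mean = sum(times) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in times) / (n - 1))
    return s / mean

# Hypothetical cycle times (minutes) for two kinds of operation.
man_controlled = [1.10, 0.85, 1.30, 0.95, 1.20, 0.90, 1.25, 1.05]
machine_controlled = [1.01, 1.00, 1.02, 0.99, 1.00, 1.01, 0.99, 1.00]

print(f"man-controlled:     CV = {coefficient_of_variation(man_controlled):.1%}")
print(f"machine-controlled: CV = {coefficient_of_variation(machine_controlled):.1%}")
# The man-controlled series shows far greater relative variation,
# reflecting the behavioral variables introduced by the worker.
```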
Selecting Plants and Operations. The ladies' garment industry was selected for primary study because its operations are clearly man-controlled. The function of most of the machinery used in the pressing and sewing operations given particular attention here is to supply workers with power and mechanical advantage. Except for this the workers exercise complete control over production rates.

Another advantage of this industry was that two plants with widely different levels of production and managerial control were readily available for direct study. In the first plant most production and managerial-control functions were performed in a haphazard and unsystematic manner. The second plant, on the other hand, had a comprehensive program of worker selection and training, along with a complete production-control program. Here, indeed, the production process and the managerial process were regulated and controlled both earnestly and effectively.

The two plants also differed in the way they handled production-rate and wage-payment problems. In the first plant there were piece rates of pay with sewing operations and hourly rates of pay with pressing operations; the actual pay rate in both cases was fixed by bargaining agreements. In the second plant, production standards were developed for all operations by classical work measurement or—to use the popular term—time study procedures. These standards were used as a base for a piece-rate incentive plan applying to all workers. Motion study procedures also were extensively used in the same plant for the purpose of improving work methods.

Studies in these plants make up the core of the experimental material considered in this book. However, a sizeable number of laboratory and other studies was also made, some of them after Work Measurement was published.

The Problem of Cooperation. Many observers have emphasized that, in making fundamental studies on problems of work and work measurement, it is highly desirable to have the cooperation of both management and labor. The ladies' garment industry proved to be an excellent choice from this standpoint, too. The International Ladies' Garment Workers' Union (ILGWU), which represents the workers in the industry, enjoys especially cordial relations with industry management. This made it possible for the union—through William Gomberg, the Director of Management Engineering—to enlist the active cooperation of both management and the workers for the studies. As a professional student of this field, Gomberg also became personally interested in the study program, providing direct assistance in the form of equipment and data.

Management gave the experimental program much more than just token support. Both plants went so far as to rearrange work schedules when required by the program. In the second plant all the time study records and other technical information collected by the resident Methods Department were put at the disposal of the investigators. The bilateral assistance offered by the parties reached down to all levels, and it was undoubtedly instrumental in having a successful program.

Even under these enviable circumstances, however, considerable time was spent explaining the objectives of the study program to the workers. These explanations soon satisfied the workers that the studies had a research objective and would not affect production requirements or earnings. This created a rather unexpected counterproblem. Certain workers became enthusiastic about participating in the research program and began to give uncharacteristic performances. Fresh explanations were required to impress upon these workers the need for unbiased performances.

It seems scarcely necessary to add that this problem was fully overcome only when the investigators were accepted as expected members of the work environment, i.e., when they no longer seemed to the workers to be investigators. A rough-and-ready criterion of acceptance here was the fact that the principal investigator was invited on a number of occasions to take on non-investigative functions. One such function was acting as the Santa Claus at a plant Christmas party. But perhaps his acceptance at the party was not quite as complete as it might have been; his role was exclusively that of dispenser.
CHAPTER TWO
The Nature of Time Study
Many critics, notably Eric Farmer, suggest separating the function of estimating production rates from the function of setting wage-payment rates.1 The essence of their argument is that otherwise it is impossible to determine the validity of the estimating procedures.

Essentially the same viewpoint is taken by certain writers on classical work measurement or time study. They, too, argue that the question of production standards should be separated from the question of rates of pay. The function of the time study engineer is simply to arrive at time standards, while rates of pay should be established through collective bargaining and considerations external to the work measurement process.

These views are supported by certain experimental evidence, such as that of S. Wyatt, F. G. L. Stock, and L. Frost.2 These observers show, for example, that production rates become stabilized at a level that depends on the strength of the incentive plan.

BARGAINING ASPECTS
The writers who advocate separating the two functions do not rule out using production standards as a basis for wage-payment rates. They believe that the two functions can be separated by simply forbidding the time study engineer from setting rates of pay. Ample evidence exists, however, to show that these functions must be separated in a much more complete sense. This is because workers adopt protective practices whenever they feel that rates of pay depend directly on production rates. One such practice is to protect lenient or "loose" production standards and rates of pay by manipulating output so that it has an essentially fixed relationship to the standards.
1 Farmer, Time and Motion Study.
2 Wyatt, Stock, and Frost, Incentives in Repetitive Work.
Marvin Mundel corroborates this view. "Many standard times function well," he says, "even though they are incorrect, because the workers have learned it is advantageous to have them function." Also, workers "complain sufficiently on the standards which are difficult to achieve so that the standards are revised, and restrict their output on the easy ones so that the inequity there is undisclosed."3 Practices like these, Mundel argues, are sufficiently widespread so that many inherently poor time study procedures appear to function well.

The intimate relation between production standards and rates of pay also prompts workers to resist the very process of time study. It has been repeatedly observed, for example, that a working crew will modify its work activities at the mere hint of a study.

Confirmatory evidence is supplied by the factory studies considered in this book. Some workers expressed strong opposition to the studies until they became completely convinced that the results would not adversely affect their earnings. Opposition disappeared only after enough time had elapsed to satisfy all the workers that this was actually the case. This reaction took place despite the fact that the studies were made under an extremely favorable experimental climate. This suggests that workers retain a residual fear and distrust of being observed, even under optimal circumstances. It would seem almost impossible, then, to obtain unbiased results when time studies are made for the ultimate purpose of establishing rates of pay.

The Informal Bargaining Process. Workers' behavior over time studies can be classified as a component of informal bargaining, with the objective of obtaining favorable standards and rates of pay. This form of bargaining proceeds in two stages. In the first stage workers present a biased impression of the job requirements. In the second stage they protect lenient or "loose" standards and rates, while they protest stringent or "tight" standards and rates by a number of devices, including grievance procedures.

It would be most unrealistic to expect time study engineers not to take this into account in developing their results. They have a ready opportunity, for example, to counterbalance real or imagined slowdowns and other biased practices by introducing counter-biases while rating performances. Rating is but one of many aspects of time study that almost beckon to be used for this purpose.

3 Mundel, Systematic Motion and Time Study, p. 156.
It would also be most unrealistic to expect that the engineer has no devices at his disposal for protesting lenient production standards and rates of pay. Time was when this protest was accomplished by blunt rate-cutting. This device has largely been abandoned in favor of the presumably more palatable device of introducing changes into the operation and, in the process, making the standards and rates more stringent. This, too, is bargaining behavior; the objective, of course, is to obtain standards and rates favorable to management. The Formal Bargaining Process. When workers are formally organized, their attitudes and behavior toward time study are taken into explicit account in the development of the union's official attitudes and behavior. Supporting evidence is supplied by the booklets published by the United Electrical Workers Union (UE) and by the United Automobile Workers Union (UAW).< These booklets are intended as guides for union stewards in their negotiations with management, each containing a detailed list of the controversial aspects of time study. They also stipulate that local unions should not participate in time study activity but should instead reserve the right to bargain collectively on individual studies, including their results. I t follows that unions are well aware of the shortcomings of time study. This awareness puts them in a very favorable bargaining position. They can then accept lenient production standards and rates of pay, reserving challenges for the others. The general viewpoint of the cited booklets and other works by labor spokesmen is that time study is primarily useful as a means of shrinking the areas of disagreement between management and labor. Gomberg sums up this viewpoint with the comment that "thus far modern industrial time study techniques can make no claims to scientific accuracy. They are at best empirical guides to setting up a range within which collective bargaining over production rates can take place." 5 This viewpoint suggests that labor's acceptance of time study is nothing more than a conditional acceptance. The condition is that the primary function of time study is to supply information for collective 4 U. E. Guide to Wage Payment Pian», Time Study, and Job Evaluation; The CJO Looks at Time Study. ' Cornberg, A Trade Union Analysis of Time Study, p. 170,
UAW-
T H E N A T U R E O F T I M E STUDY
21
bargaining. Labor's acceptance of time study, therefore, is based on the conviction that it is not scientific. This, of course, helps justify the argument that time study can be effective only in a collective bargaining framework. Specific Contractual Provisions. Both the UE and UAW booklets strongly urge local unions to obtain contractual safeguards on key aspects of time study. A comprehensive report on this subject has been made by the Bureau of Labor Statistics.6 This report points out that unions are generally allowed to appeal production standards; also, they usually reserve the right to bargain collectively on time study procedures and results. The report also cites 83 separate contract provisions intended to safeguard labor's interests concerning time study. A representative group includes the following: (1) an operation must be restudied at the request of the employee or the union; (2) employees must be informed about a new production standard within 24 hours of its application; (3) management and labor shall cooperate in selecting the operators to be studied; (4) no time study shall take less than one-half hour in order to assure representative results under normal conditions; (5) delay and fatigue allowances are to be established, with 5 percent for personal delays, and 2 percent to 12 percent for fatigue depending on the nature of the operation. To protect its interests management also obtains contractual safeguards in certain cases. The most common provision of this kind is fairly vacuous; it essentially pleads for reasonable and just performance by union members. Another common provision is that workers who consistently produce at a rate below the production standard are to be transferred. Their usually close relation to production standards prompts unions also to insist on contractual safeguards regarding rates of pay. This relation also explains why almost all the pay provisions cited in the Bureau's report are concerned—at least implicitly—with time study. The most important of the 157 distinct provisions of this kind specify that: (1) the company agree to consult with the union prior to changing the piece rate; (2) any employee may initiate a grievance about any piece-work price he considers inequitable; (3) piece rates can be changed • Collective Bargaining Provisions: Incentive Wage Provisions; Time Studies and Standards of Production.
22
WORK MEASUREMENT
only if a change is made in the method of manufacture, and any new time study and change in the rate of pay be applicable only to the affected elements; (4) improved worker skills shall not be the basis for changing established piece rates except by mutual consent; (5) piece rates shall include reasonable personal, fatigue, and machine allowances, and they shall enable an average worker to earn at least 25 percent above the base rat« (that is, the rate corresponding to the production standard). Implications of the Bargaining Process. These contract provisions represent the third level of bargaining between management and labor when rates of pay are based on production rates. On the second level formal bargaining also takes place, with the unions exercising a general review function regarding time study procedures and results. On the first or local level informal bargaining takes place between the worker and the engineer. Considered together these bargaining activities prove that time study is certainly not a scientific process but is a direct component of bargaining processes. It seems clear, then, that time study must be made essentially independent of rate setting if it is to perform the basic function of estimating production rates. A historical footnote on this subject is reported by Harold Davidson.7 He shows that the informal bargaining process between workers and time study engineers began immediately after time study was introduced in the late nineteenth century. The formal bargaining process, however, did not become well established until much later. This is only to be expected since labor unions were neither well organized nor privileged in the prevailing social climate of mechanistic codes. The intellectual climate of the time—much like the social—was also rigidly mechanistic. That climate nurtured the development and acceptance of time study doctrines rather than their critical examination. Fixing Dominating Variables. The estimating function can be made independent of the rate-setting function in just one way: production rates must be observed during a period when the wage-payment plan remains fixed. Also, the estimates obtained are valid only under this condition; new estimates would be needed to take into account the changed environment induced by a change in the wage-payment plan. This particular problem is mentioned because of the dominant effect 7
of wage-payment plans on work performance. But fixing the wage-payment plan does not guarantee that the estimating procedure will give valid results. It also is necessary to fix all other dominant variables. This can be demonstrated with a simple example. The production rates obtained from one type of machine will differ in general from the production rates obtained from another type of machine. Estimates obtained for one type of machine are of little value after a second type has been introduced.

According to the literature, most time study writers and practitioners apparently recognize the need to fix dominant variables—with the important exception of variables concerned with human behavior, such as the wage-payment plan. This is why they say that the physical aspects of the production process must be standardized before making a time study. But they fail or refuse to recognize that the wage-payment plan (and other dominant behavioral variables) has at least as great a role in standardizing as the ones they agree should be fixed. The net result is that standardizing physical aspects does only part of the required job of standardization.

Dominant variables are, of course, fixed in all fields of pure and applied science. In the language of experimental inference, dominant variables must be fixed to have the essentially constant conditions needed for making precise estimates. Estimates have meaning only when conditions remain essentially the same. But this would decidedly not be the case if, for example, some dominant behavioral variable, such as the wage-payment plan, did not remain fixed. In the framework of work measurement theory, the process of fixing dominant variables can be defined as a process of gross standardization. How this process ties in with other aspects of standardization is described in Chapter 6.

Estimation and Evaluation. Another fundamental reason for fixing dominant behavioral variables is to separate the problems of estimation from the problems of evaluation. The process of setting production standards must involve a fundamental evaluation component over and above an estimation component. That time study is a direct component of a bargaining process is proof enough of that. Decisive supporting evidence is supplied in Chapter 4, where it is shown that many time study difficulties arise because the estimation function is not distinguished in any way from the evaluation function.
MAJOR SHORTCOMINGS OF TIME STUDY
Difficulties of Verification. Some time study writers argue that production standards must be reasonably accurate; their basis is that many actual production rates are in the neighborhood of the standard rates. The fact is that the concept of accuracy has no place in discussions of production standards. Production standards are the direct result of an evaluation process involving decisions regarding what ought to be rather than what is. Such decisions can never be accurate except in the sense that a certain measure of agreement can—at least sometimes—be found among observers on what a production standard ought to be. Even under this agreement criterion, however, the claim of "accuracy" is not too well founded: numerous production standards are found "inaccurate" upon rechecking, sometimes by as much as 50 percent.8
The focal argument here is that workers regulate their production rates during the time study so that the standard will be lenient; they also regulate their production rates after the study so that it has a fixed relationship to the standard. By these means the workers can easily make the standards appear accurate when it is in their interest. Under present conditions, then, production rates are based on social and other vocational interests of workers, both as individuals and as groups. The relationship between these rates and standard rates can therefore give no evidence whatever about either the validity of the standard rates or about the procedures used to develop them. This is all the more true where formal bargaining takes place along the lines suggested above.
8 See, for example, Golden and Ruttenberg, The Dynamics of Industrial Democracy; Presgrave, The Dynamics of Time Study; Whyte, "Incentives for Productivity: The Bundy Tubing Case," Applied Anthropology, VII (1948), 1-16.

Specific Shortcomings. A wide variety of competitive and frequently contradictory time study procedures are recommended in the literature. In the trenchant words of the UAW booklet, these procedures are "occasioned by nothing more solid than the desire to be different and, thereby, achieve exclusiveness and recognition."9 In any event the very existence of numerous competitive procedures amounts to an admission that a generally acceptable theory does not exist.
9 The UAW-CIO Looks at Time Study, p. 20.
It is natural to find under these conditions that numerous competitive assumptions, definitions, and claims are made about time study procedures and results. The only characteristic they seem to have in common is that none has been verified by experimental means. There could hardly be any other result since the assumptions, definitions, and claims are often so empty of content that it is impossible to subject them to verification. Louden gives a typically empty definition when he says: "A fair standard is what we can expect a normal worker to do in the performance of a given job. . . . This standard is to be set on a fair, honest, and equitable basis, not requiring killing exertion, but at the same time requiring a normal day's output."10 The fatal weakness of this definition is that it fails to specify how to identify the thing being defined. Louden's definition of "a fair standard" is subject to a multitude of interpretations, including what is meant by a "normal worker." Definitions like this should be contrasted to the more meaningful operational definitions described in the first chapter. A suggestive example is provided by Shewhart who defines statistical control in the following terms:

A phenomenon will be said to be controlled when, through the use of past experience, we can predict, at least within limits, how the phenomenon may be expected to vary in the future. Here it is understood that prediction within limits means that we can state, at least approximately, the probability that the observed phenomenon will fall within the given limits.11

10 Louden, "Management's Search for Precision in Measuring a Fair Day's Work," Advanced Management, VII (1942), 27-31.
11 Shewhart, Economic Control of Quality of the Manufactured Product, p. 6.
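Shewhart's definition is operational in exactly the sense required: it can be carried out. The following sketch is a modern illustration only; the data and the three-sigma convention are assumptions for the example, not anything drawn from Shewhart's text.

# A minimal sketch of Shewhart's definition in computational form: use past
# observations to set limits within which future observations are predicted
# to fall with a stated (approximate) probability. The data and the 3-sigma
# convention here are illustrative assumptions.
past = [20.1, 19.8, 20.4, 20.0, 19.7, 20.3, 20.2, 19.9]

n = len(past)
mean = sum(past) / n
sigma = (sum((x - mean) ** 2 for x in past) / (n - 1)) ** 0.5

lower, upper = mean - 3 * sigma, mean + 3 * sigma
print(f"prediction limits: [{lower:.2f}, {upper:.2f}]")

# The phenomenon is "controlled" in Shewhart's sense only if future
# observations keep falling inside these limits at the expected rate.
future = [20.0, 20.5, 19.6]
for x in future:
    print(x, "within limits" if lower <= x <= upper else "outside limits")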
The Criterion of Acceptance. There is only one reason why competitive and even contradictory time study procedures—complete with competitive assumptions, definitions, and claims—can coexist. The reason is that time study is essentially a component of a bargaining process. In that framework the actual procedure is of negligible consequence and its assumptions, definitions, and claims are of even less consequence. All that matters is whether the production standards and the earning opportunities provided are considered acceptable by the parties. If they are, the procedure is considered acceptable regardless of its details. This is the only sense in which production standards can be
"accurate," and claims of accuracy simply mean that certain standards have been accepted. In view of the workers' bargaining behavior on the matter, acceptance and, hence, "accuracy" must really mean relatively lenient standards. The naïve use of the acceptance criterion can readily be shown to be unsatisfactory. According to this criterion, for example, crystal-ball gazing—which has a strong and perhaps hereditary resemblance to certain time study techniques—would be a perfectly reasonable procedure for measuring work.

The Role of Judgment. It is frequently argued that some of the variables in time study work cannot be measured. This argument is claimed to justify the use of judgment in connection with these variables. What is really being argued here is that certain aspects of time study are purely evaluative. This is particularly true of the procedures used for rating worker performance. Rating is intended to establish what production rates ought to be; it has nothing to do with measurable variables in a scientific sense. Many difficulties indeed flow from this unannounced and unhappy wedding between a meek estimation function and an overbearing evaluation function. Even if measurable variables were being considered, there is no a priori reason to believe the results of subjective measurements would be either accurate or precise. The evidence from fields such as industrial psychology, where subjective measurements apparently must often be made, shows that they are neither sufficiently accurate nor sufficiently precise to be considered scientific. This certainly does not mean that judgment is never used in scientific work—a great deal depends on the investigator's judgment even in sciences as esteemed as physics. However, the rules of experimental inference do demand that estimating procedures be independent of the observer. Worker-performance rating definitely does not meet this criterion.

The Pooled Judgment Approach. Another common argument is that the value of judgment is somehow improved by pooling the judgments of several observers. This argument likewise is not concerned with judgments used in an estimation sense. The judgments in question are used to evaluate what production rates ought to be in terms of the value systems of certain individuals. Pooling several of these judgments can
only serve the purpose of investing the final judgment with the mantle of authority by numbers; it can in no way transform an evaluation process into an estimation process. Even if the judgments do refer to measurable variables, the argument is still shallow. The observers in a pool usually have similar vocational backgrounds so that they will usually have similar judgments. The net result is that pooled judgment is only slightly, if at all, better than individual judgment. The only time when a pooled judgment is useful in measurement work is when each judgment represents an independent and random sample from a population of judgments. The pooled result can then be considered to yield an accurate or, in statistical terms, an unbiased estimate of some "true" judgment, i.e., the mean value of the population of judgments. Even so, the precision of estimate, which refers here to the degree of variation among the judgments, would be expected to be quite poor.

An Example of Contradictory Conclusions. It has already been remarked that the unhappy wedding between estimation and evaluation is responsible for much difficulty in time study. Quite often this difficulty takes the form of contradictory conclusions about time study practice. An instructive example is provided by the views of Phil Carroll and William Schutt on the comparative merits of the so-called "snap-back" and "continuous" methods of taking stop-watch observations. These writers agree that the snap-back method involves an error of observation due, among other things, to the mechanics of returning the watch hand to zero after each operation element. The implications of this in terms of accuracy and precision are presented in Chapter 5. To this point the evidence obtained is scientific in the sense that certain conclusions can be derived according to the rules of experimental inference. It then becomes appropriate to evaluate the results in terms of their empirical implications. Carroll's view is that this error is unimportant compared to the error introduced by the rating procedure.12 The fact that Carroll makes a meaningless comparison between estimates and evaluations is of no immediate consequence here. The point here is that Carroll's view, if it were to make sense, is itself the result of an
evaluation, not an estimate. Schutt, too, makes an evaluation when he concludes that the snap-back error is large enough to justify giving up this method of measurement.13
12 Carroll, Time Study for Cost Control, p. 70.
13 Schutt, Time Study Engineering, p. 54.

Time Study as an Art. Because of such problems, some time study writers plead that time study is an art rather than a science. This view is often offered in tandem with the collateral view that certain fundamental tools of the scientific method, particularly statistical procedures, cannot be applied to work measurement. This view would be acceptable if it could be interpreted as an admission that time study is essentially an evaluation process, with a few pallid attempts at estimation thrown in. This is not the case, however; every writer who takes this position also takes the openly contradictory position that highly accurate results can be achieved with time study. The fact is that scientific method certainly can be applied in the work measurement area—once it is recognized that estimation problems must be distinguished from evaluation problems.

Equating Minuteness with Accuracy and Precision. The shortcomings of time study are brought into sharp focus by considering the problem of time measurement. In general, time study writers believe that more minute measurements mean sharper results. But the value of measurement depends on accuracy and precision, not minuteness. The accuracy and precision of observations should be compared to the stipulated level of accuracy and precision in the immediate problem. This—not the minuteness of the measurement method—determines whether measurements are satisfactory.

THE COMPONENTS OF TIME STUDY
Thus far, time study procedures have been considered from an overall viewpoint. The more specific aspects of time study are given direct attention in later chapters where they may readily be compared to their counterparts in the proposed theory. The one exception is rating, for which there is no direct counterpart in the proposed theory. This subject is therefore treated in a self-contained manner in the next two chapters; these chapters also provide a useful background for the subject matter of later chapters. It is desirable in any case to summarize the principal steps followed
in time study practice. These steps are common to all time study procedures, though they do differ in details, which turn out to be no more important than buttons on coat sleeves. The first step is process standardization. When standardization is considered achieved, the mechanics of making a study are defined; this includes selecting a worker for study, selecting a measuring instrument and method, and selecting a recording instrument. It is also customary to define the operation components or elements to be considered and to determine the number of readings needed. After the readings are taken, the data are usually summarized in terms of some measure of central tendency, such as an arithmetic average or mean. Except for cases where certain delays are also looked into, these steps make up the estimation component of time study.

The next step is to transform the summarized data by a rating procedure. The result is presumed to represent what production performance ought to be in terms of a so-called "normal" worker. The data are then transformed again by adding allowances for fatigue and delays of different types. These allowances are sometimes based on estimates, but, much more often, they are based on nothing more than bargaining agreements and codes. The arithmetic of these transformations is sketched at the end of this chapter.

This book also considers the rather popular standard data systems for developing production standards. These systems need not be defined here since they are exhaustively treated in later chapters. The same comments apply to the classical theory of work; this also is given a self-contained treatment in later chapters.

From time to time, reference will be made in the chapters which follow to a field survey of the views and procedures of practitioners.14 The survey material indicates to what extent the procedures used differ from those formally recommended in the literature, and why. It also gives the views of practitioners on the problems of implementing both formally recommended and intuitively developed procedures.
14 Abruzzi, "A Survey of American Time Study Practices," Time and Motion Study, II, No. 1 (1952), 11-25.
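The arithmetic of the steps just summarized can be shown in a brief sketch. Every number, element name, and allowance value below is hypothetical; the sketch only traces the sequence of transformations: element readings are averaged, the means are transformed by a rating factor, and allowances are added.

# Hypothetical illustration of the time study arithmetic summarized above.
# All values are invented for the illustration.
element_readings = {
    "element 1": [16.3, 14.3, 19.0],   # hundredths of a minute
    "element 2": [10.0, 8.8, 9.8],
}
rating_factor = 1.20        # an observer's judgment: 120 percent of "normal"
personal_allowance = 0.05   # e.g., 5 percent for personal delays
fatigue_allowance = 0.07    # e.g., within a 2 to 12 percent fatigue range

standard_time = 0.0
for element, readings in element_readings.items():
    mean_time = sum(readings) / len(readings)   # the estimation component
    rated_time = mean_time * rating_factor      # the evaluative transform
    standard_time += rated_time

# Allowances are applied as a final multiplicative transform.
standard_time *= 1.0 + personal_allowance + fatigue_allowance
print(f"production standard: {standard_time:.1f} hundredths of a minute")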
CHAPTER THREE
The Validity of Rating Procedures
Since the rating component of time study performs an evaluation function, it cannot possibly be a scientific process. The claims of scientific validity, however, have been so well advertised that almost everyone in this area has examined rating as though it were a scientific activity.1 Then, too, rating is admitted, even by its most steadfast adherents, to be the weakest component of time study. Investigating an activity admitted to be weak because of its judgment base—even when that judgment purports to perform an estimation function—is particularly appealing, if only because the investigator has an ample opportunity to exercise his own judgment. In this chapter, rating procedures will be examined in terms of the definitions, assumptions, and results set forth in the literature.2 This provides a springboard for the next chapter which examines the real nature of rating and its role in work measurement. To simplify the discussion, the two most fashionable rating procedures will be considered: the so-called LMS leveling procedure and the "effort" rating procedure.3 This is really no limitation; most of the argument will apply with equal force to the dozen or so other published rating procedures in current use.
1 See, for example, Gomberg, A Trade Union Analysis of Time Study; and Davidson, Functions and Bases of Time Standards.
2 See Abruzzi, "Time Study Rating Procedures," The Journal of Industrial Engineering, V, No. 1 (1954), 5-7, 17.
3 See Abruzzi, "A Survey of American Time Study Practices," Time and Motion Study, II, No. 1 (1952), 11-25.

THE MECHANICS OF RATING PROCEDURES
The LMS Leveling Procedure. S. M. Lowry, H. B. Maynard, and G. J. Stegemerten, the originators of the LMS leveling procedure, consider
four factors as relevant: skill, effort, consistency, and conditions. Like most other aspects of the rating approach, these factors are defined subjectively and vaguely. Thus effort is defined "simply as the will to work. . . . It is not related to the amount of foot-pounds of work exerted during a given period, but rather to the zest or energy with which the task at hand is undertaken."4 The authors also define, again subjectively and just as vaguely, six classes for each of the four factors. They state, for example, that average effort is a little better than fair effort and a little poorer than good effort. These classes are assigned arbitrary numerical values which, upon close inspection, exhibit a definite and rather strong bias toward low ratings. The value assigned a worker with super-skill, for example, is closer to the value for a worker with average skill than is the value assigned a worker with poor skill.
4 Lowry, Maynard, and Stegemerten, Time and Motion Study, p. 216.

One of the chief assumptions made by these authors is that the entire human population has a normal distribution with respect to the sum of these four factors, i.e., when their numerical point values are added. There is no supporting evidence for this; nor is there any for the related assumption that the extremes in this population have a ratio of about 6 to 1. The authors then make the equally unsupported assumption that the industrial subpopulation also has a normal distribution with respect to (the sum of) these factors, but with the much smaller ratio of 2.76 to 1 between the extremes. The justification given is that the industrial subpopulation is made up entirely of individuals with marked ability to do their jobs. This is not only an unsupported assertion; it also directly contradicts the eligibility requirements established by the authors for industrial workers. According to their classification system a worker who makes many errors, lacks coordination, is a misfit for the job, obviously kills time, resents suggestions, appears lazy, and is unable to think for himself would be accepted as a legitimate member of the working population. Such a worker can hardly be considered to have marked ability for doing any kind of job.

In their summary the authors remark that, with proper application and understanding of its underlying principles, their procedure gives satisfactory results. This illustrates one of the most popular forms of proof by proclamation. In this form an unsatisfactory result is equated
to misapplication, and every problem of verification is defined away, including the problem of how to determine what is a satisfactory result.

The "Effort" Rating Procedure. By far the most popular rating procedure is Presgrave's "effort" rating procedure.5 This procedure formally considers just the single factor of relative pace. This factor is evaluated on a percentage scale, with 100 percent representing the "normal" pace. It might well be noted that Presgrave rejects skill as a rating factor on the ground that skill is difficult to define and measure. This argument suggests that irrelevant simplicity is more important than relevant difficulty; this odd approach to problems of inference means, among other things, that action should precede understanding.

In developing his procedure Presgrave, too, assumes a priori that there is a normal distribution of industrial workers. However, the Presgrave distribution refers to the relative pace of the workers, and it is presumed to have a ratio of 2.25 to 1 between extremes. In this case the ratio value was developed in large part from data on certain nonvocational characteristics gathered by David Wechsler, which will be considered a little later on.6 Like certain other time study writers, Presgrave says that it would be ideal to rate—relative pace, in his case—by using statistical procedures. But since none seems to exist, he reconciles himself to the use of subjective procedures. Despite this he agrees with almost all other time study writers in believing that rating results can be "accurate." His view is that ratings can be "accurate" to within ± 5 percent.

BASIC ASSUMPTIONS AND PROBLEMS
The Ratio between Extremes. Both the LMS leveling procedure and the Presgrave "effort" rating procedure make the fundamental claim that, between extremes, the ratio of workers' production rates is quite limited: 2.76 to 1 in the former case and 2.25 to 1 in the latter. Presgrave essentially supports his 2.25 to 1 ratio value by citing Wechsler's evidence. Presgrave also criticizes certain data published by Clark Hull indicating that the ratio between extremes in production rates for the average vocation ranges up to 4 to 1.7 On the basis of these
criticisms he proceeds to ignore these data. But he does accept Hull's other ratio values, which are close to those reported by Wechsler. Presgrave also exhibits data on the production rates of 120 experienced lathe operators during a single day's trial. He considers this case, having a ratio of 2.04 to 1 between extremes, as sufficient industrial evidence to justify adopting the 2.25 to 1 ratio value.

The occasional time study writer who endorses neither the LMS leveling procedure nor the Presgrave "effort" rating procedure does endorse the viewpoint that the range of human capacities, including vocational characteristics, is extremely limited. Mundel, for example, asserts that "with any physical attribute, the ratio between the best and the worst is seldom greater than 2 to 1 with few cases of even these extremes."8 In their procedure Lowry, Maynard, and Stegemerten also postulate that vocational characteristics have a limited range, with the cited ratio of 2.76 to 1 between extremes. In their case, however, none of the basic data has been published; the only "proof" offered is that the procedure was developed in some apparently darkroom manner from the results obtained by 175 observers.
5 Presgrave, The Dynamics of Time Study, pp. 70-107.
6 Wechsler, The Range of Human Capacities.
7 Hull, Aptitude Testing.
8 Mundel, Systematic Motion and Time Study, pp. 131-32.
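One reason a ratio between sample extremes is fragile evidence can be shown with a small simulation. The log-normal population and all its parameters below are assumptions chosen only for illustration; nothing here reproduces Presgrave's or the LMS data.

# Illustrative simulation: the observed ratio between sample extremes grows
# with sample size, so a single day's data from 120 operators cannot fix a
# population ratio. The log-normal population here is an assumed stand-in.
import random

random.seed(1)

def extreme_ratio(n: int) -> float:
    rates = [random.lognormvariate(0.0, 0.15) for _ in range(n)]
    return max(rates) / min(rates)

for n in (30, 120, 1000):
    trials = [extreme_ratio(n) for _ in range(200)]
    print(f"n = {n:4d}: average extreme ratio {sum(trials) / len(trials):.2f}")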
of Measurement
Scales.
I t is c e r t a i n l y n o t difficult t o
construct scales of measurement so that some relatively constant ratio between extremes, such as 2 to 1, will apply to a number of variables— all that is really necessary is to assign the number 2 to the upper extreme value and the number 1 to the lower extreme value. This procedure may have advantages in certain situations. All measurement scales involve an arbitrary choice of endpoints and, ultimately, units. But in time study rating primary interest is not in the extremes as such. It is always assumed, at least implicitly, that the scale is linear so that, for example, a worker with a skill rating of 160 will have 1.6 times as much skill as a worker with a skill rating of 100. This means, among other things, that an interval of a certain size must include the same skill content all along the scale. In more formal language, the units of a linear scale must be multiplicative so that a rating of 150 represents three times the content of a rating of 50. The units must also be additive so that a rating of 155 is equal to the sum of a rating of 100 and a rating of 55. These requirements together insure equal contents for equal intervals. I t is useful to consider some simple examples of linear and nonlinear • Mundel, Systematic Motion and Time Study, pp. 131-32.
34
WORK
MEASUREMENT
scales. The endpoints of even a simple foot rule must ultimately be arbitrary. But what is defined as two feet is, in fact, twice as long as what is defined as one foot. Also, one can add a reading of one foot and a reading of two feet to arrive at a reading of three feet. It is also clear that equal scale increments correspond to equal lengths, no matter where they appear on the rule. Thus the foot rule does meet the requirements of a linear measurement scale. The major league baseball player who bats .400, on the other hand, is concededly a great, deal more than twice as good as one who bats .200. In practice, this is recognized by the fact that the first player generally becomes a baseball immortal while the second may soon find himself in another profession. Similarly, an individual with an intelligence quotient of 150 is not 1.5 times as intelligent—if this is indeed the trait measured—as one with an intelligence quotient of 100. This is recognized by those who understand that the numbers in question are just convenient empirical devices rather than linear quantities. The basic difficulty, of course, is that the intelligence increment in the ten-point interval from 150 to 160, say, is much greater than the increment in the ten-point interval between 90 and^lOO. The performance rating situation is remarkably similar to the baseball and intelligence quotient situations. It is not difficult to construct scales for factors like skill or relative pace using the ratio of 2 to 1 between extremes; this is essentially what is done in the ease of batting averages and intelligence quotients. There is no evidence, however, that a worker with a skill rating of, say, 180 possesses 1.8 times as much skill as one with a skill rating of 100. Nor is there any evidence that rating scales meet the additive requirement of linear measurement scales. In this case equal intervals do not correspond to equal contents. On the contrary, this book shows that superior workers have entirely different performance characteristics from inferior workers. These differences are so great, in fact, that it may be impossible to compare these two kinds of workers in any quantitative sense, let alone relate their performances in a simple linear manner. Many of the differences are so minute and subtle that they cannot be detected by the simple visual means usually employed in the rating process. The general practice of setting up simple linear relationships to compare the performance characteristics of workers must therefore be considered arithmetic gibberish, sufficient in itself to condemn all current rating procedures.
THE VALIDITY OF RATING PROCEDURES
35
For rating procedures to be worthy of attention, it would be necessary to construct factor scales demonstrated to have the required linear properties. Only then would it make sense to use rating procedures in the manner described. It might well be repeated that it would first be necessary to develop definitions of rating factors that begin to make it possible to use numerical scales. Characteristically, these questions have received a trivial amount of attention in time study literature. The Tolerance Limit Approach. Even with well-developed definitions and measurement scales, there is a decisive shortcoming in using the ratio between extremes to define population endpoints (for human characteristics). It assumes that sample extremes are exactly the same as the extremes of the underlying population. This is definitely not the case. Sample size plays a fundamental role here, just as it does in all problems of estimation. This question, however, is completely overlooked by Wechsler and Hull and also by those using their results. There are formidable mathematical difficulties in developing a theory relating sample extremes to population extremes. However, an adequate theory for present purposes is available in the tolerance limit approach developed by S. S. Wilks.9 According to this approach the question asked is: How large a sample is needed to be sure that the probability is a that a minimum proportion b of the population will fall between a pair of sample extremes? In practice, a decision must first be made on the magnitude of a, using as a criterion the empirical consequences of arriving at an incorrect conclusion. Although any desired probability value can be used, the two common values of 99 percent and 95 percent seem suitable for the present problem. Another decision that must be made is the desired value for b; this decision also must be based on empirical factors. In the present problem it seemed desirable to have almost complete population coverage, and a value of 98 percent was selected. Certain data collected by Hull and Wechsler were subjected to the tolerance limit technique.10 These data were deliberately selected so as • Wilks, "On the Determination of Sample Sizes for Setting Tolerance Limits," Annals of Mathematical Statistics, X I I (1941), 9 4 - 9 5 ; also, "Statistical Prediction with Special Reference to the Problem of Tolerance Limits," Annals of Mathematical Statistics, X I I I (1942), 4 0 0 - 4 0 9 . 10 Wechsler, The Range of Human Capacities, Appendix B, Tables 9 - 1 5 ; Hull, Aptitude Testing, Table 7, p. 35.
36
WORK
MEASUREMENT
to point up the primary weakness of the ratio-between-extremes approach, and they gave the results shown in Table 1. TABLE 1. RESULTS USING THE TOLERANCE LIMIT APPROACH WECHSLER
DATA
Μ Ι Ν Ί Μ Γ Μ P R O P O R T I O N ('>) C O V E R E D
Characteristic
Sample Size Calcium in Spinal Fluid 49 Hemoglobin Content 40 94 Pulse Rate
a = 90%
Ratio of Extremes 1.16 to 1 1.25 to 1 1.66 to 1
a = 99%
85% 82% 92% HULL
81% 78% 90%
DATA
MINIMUM PROPORTION (b)
Productivity Index Cane I Case II Case III
Sample Size 26 32 32
Ratio of Extremes 1.5 to 1 1.5 to 1 1.8 to 1
a = 95% 83% 86% 86%
COVERED
α = 99% 79% 82% 82%
Two b values are reported in each case. With hemoglobin content, for example, the first b value means essentially that the probability is 95 percent that at least 82 percent of all people have hemoglobin contents falling between the reported extremes of 14.39 and 18.03 (grams per 1,000 cubic centimeters). (This statement is not strictly correct according to the formal theory of statistics; this is not an important question, however, in the present discussion.) According to the criterion value of 98 percent, the actual b values obtained for the characteristics considered are uniformly unacceptable. The table also suggests that sample size is a decisive factor in whether an observed b value is acceptable since, for example, larger sample sizes correspond to larger b values. Indeed, the primary reason why the ratio values fail to be acceptable is that the sample sizes are too small. Sample sizes of 500 or more would be needed to insure a minimum b value of 98 percent with an a value of 99 percent. It follows that the ratio values reported by Hull and Wechsler and relied on so heavily by Presgrave and others are quite likely incorrect. In the case of Lowry, Maynard, and Stegemerten, the data presumably obtained have not been published, so it was impossible to look into their 2.76 to 1 ratio value. The development which they present, however, indicates that this ratio also is quite likely incorrect. Since these ratio
T H E VALIDITY OF RATING PROCEDURES
37
values are used as a base for the corresponding rating procedures, the rating procedures also must be considered to have doubtful validity. To make valid inferences of the kind required, the tolerance limit approach should be applied to a comprehensive body of data. These data should meet all the requirements of that approach, particularly with regard to sample size. The data should also be based primarily on characteristics associated with industrial production rates rather than with unrelated human characteristics. The Assumption of Normality. Another basic difficulty of rating procedures is the assumption of normality. Even if the measurement scales were linear, there is no reason to believe that the corresponding distributions would be normal. Ρresgrave's assumption of normality is not based on any direct study of relative pace, the single characteristic formally considered in his rating procedure. It seems to be based instead on the Wechsler data, though, as Wechsler himself points out, these data are not normally distributed. The apparently darkroom approach used by Lowry, Maynard, and Stegemerten implies that their assumption of normality is even more questionable. But these questions cannot be treated in such a cavalier manner, for the validity of the results depends directly on the distribution assumptions that can reasonably be made. Procedures of analysis, such as the tolerance limit procedure, can be applied in any case, but unwarranted or unsupported assumptions effectively destroy the usefulness of any result. T H E CONSISTENCY AND ACCURACY OF RATING
Rating Consistency. Time study writers are quite ready to quarrel about which rating procedures should be endorsed. But they exhibit remarkable agreement with regard to the "accuracy" of these procedures, which is usually claimed to be of the order of ± 5 percent. Another point of remarkable agreement is that none of the claims is supported by independent evidence. For example, time study writers and practitioners alike rarely, if ever, make a distinction between accuracy and consistency.11 One of the consequences of this is that the claims of "accuracy" have nothing to do with accuracy but are only concerned with consistency. The claims always refer to the ability of one or more observers to 11 See Abruzzi, "A Survey of American Time Study Practices," Time and Motion Study, II, No. 1 (1952), 11-25.
duplicate rating results in repeated trials—this is actually what is meant by consistency. Also, the results depend on the observer so that there can be no such thing as the consistency of a rating procedure. Consistency must therefore refer to the application of a procedure; it can have meaning only when observer and procedure are considered as joint components of an observation method.
11 See Abruzzi, "A Survey of American Time Study Practices," Time and Motion Study, II, No. 1 (1952), 11-25.

These facts prepare the way for evaluating experimental results on rating. One set of results has been reported by H. B. Rogers who tested the rating consistency of a group of observers.12 Rogers had each observer time and then rate several test subjects performing a simple card-dealing laboratory operation. Rogers developed a simple test of consistency for these data which indicated that many of the ratings had an error potential of 10 percent or more. His conclusion was that claims of rating consistency are not very well founded.
12 Rogers, "Practical Analysis of Time Variables for Standard Time Data," in Proceedings of the Time and Motion Study Clinic, Nov., 1941, pp. 10-19.

A number of other students dispute the claims made on rating consistency. J. H. Quick, W. J. Shea, and R. E. Koehler, for example, describe a study in which 100 time study engineers from different plants and industries were asked to develop a production standard from a film record of a simple bolt-and-washer assembly.13 The result was that 62 different standards were recommended, with a total variation of 61 percent.
13 Quick, Shea, and Koehler, "Motion-Time Standards," Factory Management and Maintenance, CIII, No. 5 (1945), 97-108.

A large number of tests for rating consistency was also made during the investigations leading up to this book. These tests were intended to determine the consistency of experienced time study engineers, as individuals and as groups, in using the "effort" rating procedure. Table 2 gives a typical set of results on two components or elements of a garment-sewing operation; they represent the data obtained by two engineers observing two operators within a period of two days. The data are given in units of a hundredth of a minute for ease of presentation, a policy followed throughout the book except when otherwise specified. One of the principal claims made is that rated times will not usually vary by more than ± 5 percent on a single operation.
TABLE 2. TYPICAL RATING RESULTS IN A FACTORY OPERATION

                           ELEMENT 1                 ELEMENT 2
Engineer   Operator   Mean Time   Rated Time   Mean Time   Rated Time
   1          1         16.3        20.4         10.0        12.5
   1          1         14.3        15.7          8.8         9.7
   1          1         19.0        22.8          9.8        11.8
   1          2         20.2        24.2         11.1        13.3
   1          2         22.7        27.2         11.1        13.3
   2          1         18.5        23.1          9.3        11.6
   2          2         20.6        22.7         10.0        11.0
   2          2         19.7        23.6         11.4        13.7
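The kind of check described in the next paragraph is simple enough to sketch. The tuples below transcribe Table 2 as (engineer, operator, mean time, rated time); grouping by engineer and operator is one plausible reading of the ± 5 percent claim, not the author's own computation.

# Check of the plus-or-minus 5 percent claim against Table 2.
from collections import defaultdict

table_2 = {
    "element 1": [(1, 1, 16.3, 20.4), (1, 1, 14.3, 15.7), (1, 1, 19.0, 22.8),
                  (1, 2, 20.2, 24.2), (1, 2, 22.7, 27.2),
                  (2, 1, 18.5, 23.1), (2, 2, 20.6, 22.7), (2, 2, 19.7, 23.6)],
    "element 2": [(1, 1, 10.0, 12.5), (1, 1, 8.8, 9.7), (1, 1, 9.8, 11.8),
                  (1, 2, 11.1, 13.3), (1, 2, 11.1, 13.3),
                  (2, 1, 9.3, 11.6), (2, 2, 10.0, 11.0), (2, 2, 11.4, 13.7)],
}

for element, rows in table_2.items():
    groups = defaultdict(list)
    for engineer, operator, mean_time, rated_time in rows:
        groups[(engineer, operator)].append(rated_time)
    for (engineer, operator), rated in sorted(groups.items()):
        center = sum(rated) / len(rated)
        spread = 100.0 * (max(rated) - min(rated)) / center
        print(f"{element}, engineer {engineer}, operator {operator}: "
              f"rated times vary over {spread:.1f} percent")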
A simple analysis of the tabulated results shows that neither engineer met this criterion even in rating just one operator. The first engineer was somewhat poorer in this respect partially, it would appear, because he had two more opportunities to be inconsistent. Also, the rated time values of the two observers vary widely, both with a single operator and with the two operators.

The table shows further that the rated times are closely related to the corresponding mean times. This contradicts the fundamental thesis of rating proponents who insist that rated times represent the times to be expected from a hypothetical "normal" worker. If this were true, rated times should be independent of the mean times. Another point is that the variations in the mean times are somewhat smaller than the variations in the rated times. This is surely an odd result in view of the insistent claims that rating procedures reduce observed times to a unique value. Rating procedures may not only yield inconsistent results; they apparently introduce an additional source of inconsistency into the time study process.

The other studies in the series were made on a variety of operations in the garment industry. In general, the studies were concerned with from 2 to 12 operators; the ratings were made by 2 to 6 time study engineers; and the operations were made up of 2 to 12 components or elements. The results obtained were essentially the same as those suggested by Table 2. As might be expected, however, the degree of inconsistency per element increases rapidly as the number of operators and engineers increases. Also, no two observers agreed within 5 percent on all the elements of a given operation. Though always present, the relationship between mean times and rated times was not consistent either in degree or in type from one study to another. The only generalization
that could be made on this question is that, in general, larger mean values were associated with larger rated values.

A somewhat similar piece of experimental evidence was obtained by L. Cohen and L. Strauss.14 Film studies were made of 21 experienced operators working on a man-controlled operation. Each operator was rated by three trained observers using the LMS leveling procedure. When their pooled rating judgment was applied to the mean times, the rated times were found to vary from 59.04 to 124.70 film units (1/16 of a second). This hardly confirms the claim that rating procedures erase differences in operator performance and result in a unique value characteristic of a "normal" worker. The pooling process, of course, eliminated differences in judgment among the observers. If observer differences had been taken into account, the variations in the rated results would have been even more dramatic.
14 Cohen and Strauss, "Time Study and the Fundamental Nature of Skill," Journal of Consulting Psychology, X (1946), 146-53.

Though weighty, this evidence does not necessarily imply that consistent ratings cannot be achieved, for it is certainly possible, at least conceptually, to train a group of observers to evaluate performances in the same manner. The evidence does imply that often consistency is not achieved, even when a single observer evaluates a single operator. In any event it is meaningless to speak about the inherent consistency of a rating procedure. A rating procedure can only be as consistent as those who use it; it is impossible to do more than determine the consistency of a single observer or a group of observers in evaluating one or more operators.

Rating Accuracy. Even if it could be established that rating procedures are consistent, this would not be sufficient to make them valid. There remains the crucial question of the accuracy of rating. A reasonable amount of diligence, accompanied by vocational pressures, could persuade any group to evaluate performance in essentially the same way. This certainly does not mean that the evaluation is a correct or an accurate one. But no evidence has ever been produced showing that rating procedures are accurate. This is quite understandable: accuracy is rarely even defined in current literature. To examine rating accuracy, it is necessary to have an independent criterion to validate the rating factors
used. An example from the physical sciences will help establish this point. It is well known that tensile strength can be estimated from hardness and density. Tensile strength values can therefore be used as the independent criterion for determining the accuracy of estimates from hardness and density values. With rating, however, a non-removable dilemma is posed because the only independent criterion that could possibly exist is the performance characteristics of the hypothetical "normal" worker on the given operation. If this value were known, there would be no need to take data. If it is unknown, there is no way to determine accuracy.

These remarks can best be understood by referring back to the tensile strength example. Tensile strength can be measured directly; it is estimated from hardness and density simply because of empirical factors like economy and ease. If tensile strength were not measurable, however, it would be impossible to determine how well tensile strength values could be estimated from hardness and density values.

Even if the dilemma could be resolved, a major problem would still exist with respect to the LMS leveling procedure. This can be shown by considering the required estimating equation:

Independent Criterion = a(Skill) + b(Effort) + c(Consistency) + d(Conditions).
The important point is that the coefficients in this equation depend on the relationships that exist among the factors. In particular, the coefficient values can be unity only if the rating factors are mutually independent. But Lowry, Maynard, and Stegemerten admit this is not the case. They state that "it is not customary to find a given degree of skill accompanied by an effort more than two classes higher or lower than the skill class, or vice versa." They also state that "operators of high skill usually work more consistently than less skilled operators," and that consistency "must be weighed in the light of the skill and effort of the operator."15 This means that the recommended procedure of simply adding the values for skill, effort, consistency, and conditions, i.e., assuming that each of the coefficient values is unity, is completely unjustified.
15 Lowry, Maynard, and Stegemerten, Time and Motion Study, pp. 217, 225, 242.
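The force of this point can be illustrated numerically. The data below are synthetic inventions; they only show that when factor scores are correlated, nothing pushes estimated coefficients toward the unit values that simple addition assumes.

# Toy illustration with invented data: if an independent criterion existed,
# the coefficients a, b, c, d would have to be estimated, and with correlated
# factors nothing forces the estimates toward unit weights.
import numpy as np

rng = np.random.default_rng(0)
n = 200
skill = rng.normal(0.0, 1.0, n)
effort = 0.7 * skill + 0.3 * rng.normal(0.0, 1.0, n)        # correlated
consistency = 0.6 * skill + 0.4 * rng.normal(0.0, 1.0, n)   # correlated
conditions = rng.normal(0.0, 1.0, n)                        # independent

# A hypothetical criterion that happens not to weight the factors equally.
criterion = 1.0 * skill + 0.2 * effort + 0.5 * consistency + 1.0 * conditions

X = np.column_stack([skill, effort, consistency, conditions])
coefficients, *_ = np.linalg.lstsq(X, criterion, rcond=None)
print(coefficients)   # recovers (1.0, 0.2, 0.5, 1.0), not (1, 1, 1, 1)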
Even if an independent criterion could be developed, there would still be no way to establish the accuracy of a rating procedure. The explanation is much like the one given with respect to rating consistency; it would never be possible to do more than determine the accuracy of one or more observers in using a rating procedure. If the problem involved more than one operation, a study would have to be made of the accuracy of the observers for each operation, assuming always that the required independent criterion could be developed.

Quantitative Definitions. Some students, notably Gomberg, have suggested defining the "normal" worker in quantitative terms. Gomberg's view is that "the population concept from which a normal is to be derived must be related to the actual distribution of productivities as they exist, subject to some improvement in the low sectors of the distribution. . . ."16 Though commendable, these suggestions would not solve the problem of accuracy since the fundamental dilemma would still exist; the remarks made on that subject apply regardless of how the "normal" worker is defined. While a quantitative definition is usually preferable to a subjective definition, there would be no need to make a time study if the performance characteristics of the "normal" worker could be established in any manner.
16 Gomberg, A Trade Union Analysis of Time Study, p. 133.

The problem of developing a quantitative definition also deserves some comment. The task would be monumental from a technical standpoint and completely useless from an empirical standpoint. The statistical work alone would be staggering since it would ultimately involve sampling the work of industrial workers in all industrial environments. At best, the result would define normal performance as some function of the mean production rate estimated for this global population. Such a definition would be of no practical value. It would have little, if any, relation to specific operations in specific plants. It would also require frequent modification to keep pace with changes in technology, human skills, and so forth, and the monumental task would have to be repeated at frequent intervals.

Even if these problems could be overcome, the results obtained would not have the properties desired. The concepts of "normal" worker and "normal" performance are intended to define what ought to be rather than what is. This means that the data would ultimately have to be transformed by value judgments on what is considered desirable. The fundamental problem of rating would thus reappear so that the process
of definition would lead back to the original problem, though by a completely impractical and roundabout route. The difficulty here, as in so many problems, arises because the estimating function is confused with the evaluation function. It is true, of course, that quantitative estimates can be highly useful in work measurement. But estimates by themselves can never provide complete answers to questions of value; these questions must ultimately be answered by applying a process of evaluation. It is the specific purpose of the next chapter and, implicitly, of the book, to bring out the meaning of estimation and evaluation processes as they apply to work measurement problems.
CHAPTER FOUR
Work Measurement in Practice
The preceding chapter establishes that rating procedures have no scientific validity. This has led to a rather frenzied search for so-called "objective" rating procedures which are presumed to minimize or eliminate the weaknesses of earlier and more popular procedures. Though they have achieved some vogue, a close examination of procedures such as those proposed by L. A. Sylvester and Marvin Mundel shows that they have the same fundamental weaknesses as their predecessors.1 Sylvester disguises the basic value judgments involved by invoking certain rather naïve statistical notions.2 Mundel's procedure also can be shown to be invalid in a scientific sense, though it does give careful consideration to a number of objective determinants of work performance.
1 Sylvester, The Handbook of Advanced Time-Motion Study; Mundel, Motion and Time Study.
2 See Abruzzi, "A Review of The Handbook of Advanced Time-Motion Study," Industrial and Labor Relations Review, IV, No. 2 (1951), 307.

THE TRUE FUNCTION OF RATING
The Specification-Setting Function. The fact that all rating procedures fail to be valid in a scientific sense strongly suggests that time study rating may well be intended to perform a nonscientific function. This is indeed the case, and the function turns out to be a purely evaluative function. Rating procedures are intended to transform observed time study data to a unique value. This value is considered to represent the production rate that ought to be realized by a so-called "normal" worker presumed to have certain idealized performance characteristics. It follows that rating is intended to set up production standards or—to use a
more connotative term—production specifications. The concept of "ought" is basic to all evaluation procedures, which means that all specifications, including those derived from rating procedures, must ultimately be based on value judgments on what is desirable. That remains true even though only objective factors or parameters are considered. An evaluation must always be made as to the factors considered relevant in particular situations and how they are to be weighted.

Implications of These Facts. All claims referring to the scientific validity of rating procedures, "objective" or otherwise, are therefore completely irrelevant. The persistent and almost defiant claims of the validity of rating can therefore only mean that the claimants are either ignorant of the true situation or that they are attempting to camouflage it. To the extent that it exists, the attempt at camouflage is presumably based on the desire to smother labor challenges. Essentially the same comments can be made about time study procedures for establishing fatigue and other allowances; they, too, turn out to have an evaluative function, again with the primary objective of arriving at specifications. These allowances need not be considered directly here, however, since the arguments made with respect to rating apply to them with equal force—though perhaps not with equal intensity.

PROCEDURES FOR SETTING SPECIFICATIONS
Procedures in Other Fields. There is a fundamental difference between rating and the procedures used for setting specifications, say, in quality control work. The difference is that in quality control work the specification-setting procedures are detached in so far as possible from estimating procedures. A set of sample results might be used, for example, to estimate the percent defective in a product. In applying estimation procedures for this purpose no essential value judgments are made and, in particular, no specification is established. In this case the estimation procedure is therefore independent, and it is only when some estimate has been obtained that the specification-setting procedure begins. The specification is usually designed to take into account cost and other value parameters considered locally relevant. However, it may and often does take into account earlier estimates of product quality so that estimation and evaluation procedures may and do interact. But the interaction is sequential rather than simultaneous;
a specification may legitimately affect a later estimate and an estimate may legitimately affect a later specification. It may be desirable because of, say, assembly requirements to stipulate that there be a maximum percent defective for a given product. This involves making value judgments on a number of questions, particularly as to what is acceptable quality. If, however, estimates show that the production system is incapable of yielding the required quality, the specification might be changed to allow a greater percentage of defectives. On the other hand, practical situations exist where this alternative is not feasible. For example, with certain ammunition products it is imperative to minimize hazards by having a near-perfect quality level. In that event an existing specification might have to be retained and the required quality achieved by other means, such as 100 percent inspection. This shows that specifications are usually dominated by factors external to actual estimates, though, it might be repeated, earlier estimates are often used in their development and revision. Once established, a specification is intended to determine the acceptability of product quality as estimated by an independent procedure. The Situation with Rating. In rating, the dominant evaluation component of time study, the situation is completely different. Here the evaluation procedure is wedded to the estimation procedure. In fact, rating is considered a basic part of the inferential process and, as such, is quite often used to rationalize utterly inadequate estimating techniques. It is also common practice to rate the mean times for individual operation elements. Later chapters show that time study yields mean time values with little, if any, precision. Any manipulation of these values is without meaning, particularly when the manipulation is not part of an estimation procedure. Applying rating procedures to element data has another fundamental weakness. One of the principal results established in this book is that element time values are not usually independent. Elements, therefore, cannot be considered as distinct entities; rating them as though they were is simply arithmetic nonsense. There is another fundamental difference between rating and other evaluation procedures. In any given rating procedure there are a limited number of arbitrary factors with equally arbitrary weighting systems. This means that, if a given rating procedure is applied in two different operating situations, the same factors and weighting systems
would have to be used, even though they probably would not be equally appropriate in both situations. This is to be contrasted with the approach used in other fields where—as the quality control examples show—the factors and weights reflect the needs of the local environment and, indeed, the particular situation.

CHANGES NEEDED IN RATING VIEWPOINT
Since setting specifications is its real function, rating should be modified in approach and procedure so as to do this in an optimal manner. To do this, it will be necessary in work measurement to separate rating from the estimation component. The grossly inadequate estimation procedures now used would then be revealed to be just that—grossly inadequate. New estimation procedures will have to be adopted, capable of yielding estimates that are precise and accurate.

Thus far the specification-setting process in work measurement is similar to the processes in other industrial areas. The similarity ends here, however, because of the peculiar status of production specifications in the industrial society. This may best be seen by comparing the behavioral aspects of work measurement and quality control. Basically, the worker doesn't affect product quality nearly as much as he affects rate of production. This is largely due to the fact that the technical aspects of work are designed so as to insure well-defined and limited quality ranges. On the other hand, a worker exercises at least a limited influence, and often a dominant one, on production rates.

The worker's welfare is affected to a much greater degree by production specifications than by quality specifications. He is unhappily affected by a production specification that is stringent or "tight," while he is happily affected by a specification that is lenient or "loose." The situation is greatly intensified by the usually intimate relation between production specifications and earnings; thus favorable specifications often yield a substantially greater earning opportunity than unfavorable specifications. This is why workers, as individuals and as formal groups, adopt definite behavioral stances with respect to time study and its results. This also is why time study is reduced to a component of bargaining between management and labor.

The Role of Labor. The result is that workers will take an extremely keen interest in production specifications and in the predecessor rating
procedures. Since specifications are ultimately based on value judgments, workers can justifiably argue that their representatives also have a right to develop a set of specifications. Labor's specifications will naturally be based on factors and weighting systems that tend to satisfy labor objectives, just as the specifications developed by management tend to satisfy management objectives. It is customary, however, for labor's representatives to adopt the rating factors used by management because weights and scores can be sufficiently manipulated to suit labor objectives. Labor's representatives may well decide not to avail themselves of the right to develop a specification when, as in most cases, this turns out to be in their best interest.

To sum up, both management and labor are vitally concerned with production specifications and how they are established. The fact that specifications are the product of value judgments makes it clear that each party has the right to develop its own judgments and specifications if it so chooses. That each party can exercise that right is the reason why time study specifications can be considered from the viewpoint of the theory of games.3

SYSTEMATIC BIASES IN RATING PRACTICE
It has repeatedly been stated here that time study is actually a component of bargaining, primarily because of the evaluative nature of rating. Confirmatory evidence was obtained in a series of experiments based on an industrial film used for rater training. This film was shown to two sets of observers: (1) five time study engineers with at least one year of experience; (2) ten other observers with no previous work measurement experience. The observers used the Presgrave "effort" rating procedure to evaluate the workers in 25 distinct work scenes.4

The Results and Their Implications. The data were analyzed with the aid of two indexes. The first was constructed by comparing the observed ratings to certain so-called "true" ratings arrived at by a group of expert time study engineers. The index proper was constructed by taking the difference between the observed ratings and the "true" rating, and dividing the result by the "true" rating.5

3 See, for example, McKinsey, Introduction to the Theory of Games.
4 Presgrave, The Dynamics of Time Study, pp. 76-107.
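Both indexes are simple to compute. The following sketch, in modern notation, shows how index values of this kind might be derived; the ratings in it are hypothetical and are not the experimental data reported in the text.

```python
# Illustrative computation of the two rating indexes described above.
# The ratings below are hypothetical, not the experimental data.

def accuracy_index(observed, true):
    """Average of (observed - "true") / "true", as a percentage."""
    diffs = [(o - t) / t for o, t in zip(observed, true)]
    return 100.0 * sum(diffs) / len(diffs)

def consistency_index(ratings):
    """Standard deviation of a set of ratings around their mean."""
    mean = sum(ratings) / len(ratings)
    var = sum((r - mean) ** 2 for r in ratings) / (len(ratings) - 1)
    return var ** 0.5

true_ratings = [100, 110, 95, 105]   # "true" ratings set by the expert group
observed     = [92, 101, 88, 97]     # one group's ratings of the same scenes

print(accuracy_index(observed, true_ratings))  # negative: on the low side
print(consistency_index(observed))             # smaller value: more consistent
```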
This index would be called an index of accuracy by those who consider "true" ratings of the kind described to be an independent criterion of accuracy. Since, however, rating is an evaluation procedure, the index is used here simply as a convenient base for comparing the results of the two groups of observers. The index has no significance beyond that of simplifying the presentation; the conclusions would have been exactly the same even if the "true" readings had not been introduced.

In a typical experiment the average index value turned out to be -9.7 percent for the experienced observers and -4.18 percent for the inexperienced observers. The experienced observers had lower ratings, on the average, in 16 out of the 25 work scenes. The experienced observers therefore had relatively lower ratings than the inexperienced observers, with both groups on the low side of the "true" ratings. The final conclusion is that the experienced observers had a uniformly greater bias in the direction favorable to management than the inexperienced observers.

The second index was developed by computing the standard deviation of the ratings for each group of observers around the corresponding mean value. This gives an index of consistency whose numerical value was 6.69 for the experienced observers and 10.68 for the inexperienced observers. After allowing for the unequal number of observers in the two groups, the experienced observers were clearly more consistent than the inexperienced observers.

In view of the evaluative nature of rating, the two indexes are simply indicators of the degree and uniformity of judgment bias, i.e., prejudice in favor of a certain result. The results therefore mean that the experienced observers had developed a more uniform and a greater bias in favor of management than the inexperienced observers. The immediate explanation is that, in practice, the experienced observers had individually found it advisable to use rating judgments favorable to management. Apparently many practitioners are at least intuitively aware of the value-judgment base of rating as well as the ultimate bargaining nature of time study.

Two other results obtained in these experiments can be interpreted in a similar manner. The fact that extremely high ratings were avoided, particularly by experienced observers, suggests an attempt to avoid production specifications highly favorable to workers. A somewhat less pronounced tendency to avoid low ratings likewise suggests an attempt
to avoid specifications so unfavorable to workers that they would be almost sure to protest.

Added Evidence of Systematic Biases. A number of other experiments were made with the same rating films. However, in these cases the observers were asked to use the "effort" rating procedure in two stages. They were first asked to rate the workers in the various work scenes run at the customary speed of 960 "frames" a minute. After a few minutes they were asked to rate the same work scenes run at a speed of about 770 "frames" a minute.

The most striking result was that, for the second running, the work scenes were rather uniformly rated downward by the individual observers in the experienced group. However, the ratings were not reduced nearly to the degree expected on the basis of the reduced speed of projection. The experienced observers apparently had developed stereotyped images of what relative pace should be. These images were not distorted by the first running at normal projection speed. The same images persisted during the second running, even though the observers had been told to reformulate their concept of "normal" performance in terms of a target work scene. This helps confirm the view that experienced observers develop rating biases that are relatively firm and uniform.

The rating images of the inexperienced observers, on the other hand, were much less firm. They found it easier to modify the concept of "normal" pace during the second running and, on the average, their ratings were rather close to what would be expected in terms of the reduced speed of projection. Also, their ratings varied group-wise to a somewhat greater degree than the ratings of the experienced observers. Once again the rating biases of inexperienced observers turned out less firm and less uniform than the rating biases of experienced observers.

In the second running, two of the experienced observers reduced their ratings substantially more than seemed warranted on the basis of the reduced speed of projection. They explained this by saying that they were attempting to avoid consistently high ratings resulting from a "carry-over" effect. The conscious attempt to allow for a recognized source of bias in one direction thus led to bias in the other direction.

THE NATURE AND OUTCOME OF THE "GAME"
These experimental results help to show that time study is indeed capable of being described within the framework of the theory of games.
To develop this thesis it will be necessary to motivate and make certain assumptions. The first assumption is that the estimation component of time study yields constant mean time values. Disputes rarely arise over observed time values, and, in the sense of acceptance, mean time values can be considered as constants with no real effect on the specification-setting game as it is currently played.

Other Assumptions. The games description for setting production specifications is primarily intended to put the bargaining process involved into a convenient conceptual framework. This brings out certain aspects of the process which would otherwise be obscure; it also gives a simplified and sharp account of what actually happens during the bargaining. But these advantages can be gained only at the expense of making a number of simplifying assumptions.

The first is that the bargaining process is essentially the same regardless of the local situation. This, of course, is not completely true since each local bargaining environment has unique problems. The second assumption is that the bargaining process remains constant over time. This, too, is not completely true since the outcome of earlier bargaining situations in a given environment will directly affect the strategies used in a current bargaining situation. The third assumption is that the bargaining process can be considered a zero-sum two-person game. In essence, this means that a constant amount of monetary and other rewards is available to the parties, and one party gains only at the expense of the other. Like the others, this assumption is only approximately satisfied in real life. However, all three simplifying assumptions are realistic in the sense that they do represent the general bargaining situation with respect to time study specifications.

The First Game. Two principal games can be distinguished under current time study conditions. The first occurs when management's strategy is to act as though time study is scientific, while knowing that it is not. Labor's strategy is to act as challenger and critic, also while knowing that time study is not scientific. Such being the case, management must make the first move of establishing a specification. In doing this it will attempt to introduce a bias in its own favor. However, this cannot be so overt that labor is likely to reject the result out of hand. The result is that management will not always be able to attain the result it seeks. This is especially true in view of the difficulty of arriving at uniform rating results. The
outcome is that certain specifications will be relatively more favorable to workers.

Labor's move is not overt but conditional; its move depends on which outcome appears to be true. If a given specification appears relatively favorable to the workers, its move is to accept rather than to challenge; if not, it challenges by instituting either a formal or an informal grievance. In the event of a challenge, the final outcome depends on the relative bargaining strength of the two parties but, in general, an adjustment is made so that the specification is more acceptable to the workers. In the long run, then, the final outcome of the game under the stated conditions is generally favorable to labor.

It would seem advantageous for management to invite labor to develop simultaneously a set of ratings and specifications so that labor, too, would have to make an overt and unconditional move in the game. In that way the long-run outcome would probably not be so heavily weighted in labor's favor. A frank avowal that rating and specification-setting are not scientific would itself be of advantage to management since it would prevent successful challenges simply on that account.

The only time when it might be advantageous for management to claim its specifications are scientific is when labor does not know that the claim is false. The claim may then be sufficiently persuasive to prompt unquestioning acceptance of management's results, and consistent biases would favor management in the long run. That strategy, however, is extremely risky since it may easily backfire when labor becomes aware of the true situation.

The Second Game. The second principal game occurs when labor's strategy is to seek and enjoy the right to establish a competitive set of specifications—but only after a challenge. This strategy represents the minority viewpoint of certain labor spokesmen who are generally aware that time study is ultimately a component of the bargaining process. Management again adopts the strategy of claiming its specifications are scientific, and it makes the initial move of developing a set of specifications. Labor again makes a conditional move. It accepts specifications that are relatively favorable to the workers, just as it does in the first game. However, labor's alternative move is to develop counter-specifications when management's specifications are relatively unfavorable to the workers.

When this happens the procedures and parameters used by labor are
usually the same as those used by management. But the rating weights and scores are manipulated so that the labor specification is more favorable to the workers. The two specifications are then used as a base for a second game where the parties resolve their differences by accepting what is usually a compromise specification.

Advantages of the Non-overt Challenger Strategy. The principal difference between the two games is that labor's challenge in the first game is not overt and takes place at the specification level, while its challenge in the second game is overt and takes place at the time study level. There are some distinct advantages in the non-overt challenger strategy. Since the same procedures are used by both parties, labor specifications may not differ much from management specifications, especially when labor's specification writers have been trained by classical manuals with a management-oriented viewpoint. Labor's representatives also may not fully understand the time study process, and they may unwittingly accept certain aspects as unchallengeable. The result might well be compromise specifications that are not completely acceptable to the workers.

Even with the best training and the most earnest intentions, however, the value-judgment base of the time study process will inevitably lead to an imbalance among specifications so that some are relatively less favorable to the workers than others. It is perhaps equally inevitable that the workers directly affected will consider their representatives largely responsible.

It is also true that many workers have adopted the stereotyped view that time study is primarily intended to serve management's purposes. They look upon time study activity by their representatives with suspicion and even resentment. Because of this labor representatives lose considerable prestige and worker support when they agree to participate in the time study process—even to the limited extent of overt challenge. It is largely because of this that a growing number of labor unions have adopted the non-overt challenger strategy, especially after having had experience with alternative strategies.5

5 See, for example, Smith, Technology and Labor.

A PROPOSED BARGAINING STRUCTURE

External Conditions. Bargaining games over production specifications can readily be designed which are capable of yielding greater long-run
advantages to both management and labor. Describing how such a game might be played is one of the primary objectives of this book.

As in the present games, it is convenient in formulating a proposed game to consider the external conditions essentially fixed. That assumption is clearly not satisfied in a dynamic sense since general economic conditions, industry conditions and many other external factors will have a decisive effect on the relative strategies of the parties and, ultimately, on the outcome of the game. The attitudes of the general public, both as consumers and as the seat of public opinion, will likewise have an effect on the relative strategies of the parties and the outcome of the game. This means that public policy will also have to be considered by the parties in their bargaining behavior.

Indeed, if a fully dynamic description were required, the proposed bargaining game might well be expanded into a multi-person game. The external society, in terms of consumer or government agencies, could readily be included as active players in such a game. However, primary interest here is in individual plays of the game, i.e., individual bargaining situations between management and labor. For this essentially static case it is reasonable to assume that external conditions are essentially constant so that the game may be confined to the two immediate parties.

Rationalizing New Internal Conditions. The internal conditions for the proposed game will be quite different from the conditions currently prevailing. The principal difference is that the estimation component of work measurement will be detached from the evaluation component. The estimation process will then have to yield stable estimates with a well-defined degree of precision and accuracy.

The estimates obtained in time study are of trivial importance compared to the evaluations made in rating. This helps explain why time study estimates are considered to be essentially constant and why estimates have received scant attention in current games. This book shows, however, that time study estimating procedures yield results that are not only not constants but may even fail to have much meaning. The final result is estimates that are blindly favorable either to management or to labor.

On the other hand, precise and accurate estimates would be of incalculable advantage to both parties. Such estimates would remove blind estimation biases from specifications and, in the process, supply the
parties with highly useful information. True, even precise and accurate estimates are variables in the individual case. But this is of no ultimate consequence because valid estimates do act as constants in the long run.

There are other important reasons for separating the estimation function from the evaluation function and, hence, for developing precise and accurate estimates. The industrial society needs precise and accurate estimates of production rates. That need is becoming increasingly acute as industrial technology moves toward an ever-greater degree of integration. The result is that the production activity of a plant cannot be planned and scheduled in an optimal manner unless such estimates are available for the component processes.6 Indeed, the absence of such estimates may well lead to a breakdown of the production activity.

Optimal production planning and scheduling is not the only reason why estimates with these properties are needed. They are also indispensable for estimating and controlling the costs of labor, materials and products, and, indeed, for the optimal performance of most other managerial functions.

The Two Evaluation Processes. The proposed game requires two distinct though not unrelated evaluation components. In the first, management makes use of the estimates obtained for essentially managerial purposes, such as planning and scheduling production activities. Here the estimates help management to determine whether changes are necessary in the production activity in order to satisfy certain managerial objectives. In this context the estimates provide highly valuable guides to managerial action in terms of what might be defined as managerial production specifications. It stands repeating that estimates interact sequentially with specifications in obtaining an optimal end-result. A similar process currently takes place in which time study data are used as a base. The results are far from optimal, however, precisely because time study data are not capable of yielding estimates with the required properties, if indeed they can be said to yield estimates in any meaningful sense.

The second evaluation component of the proposed game takes place when management and labor jointly apply the estimates to help establish what have thus far been called production specifications. These specifications might well be redefined as labor production specifications, thus differentiating them from managerial production specifications.

6 See, for example, Salveson, "A Mathematical Theory of Production: Planning and Scheduling," The Journal of Industrial Engineering, IV, No. 1 (1953), 3-6, 21-23.
It is this evaluation component that is to be substituted for the current bargaining games between the parties.

Specific Rules and Strategies. Within the restrictions imposed by the external society, the basic rules and strategies of the game proper should be established by the parties themselves. Certain guaranteed annual wage proposals indicate, for example, that determinants other than direct production units might come to be used as a primary basis for establishing rates of pay. Indeed, this appears to be an inevitable result of the widespread and rapidly-growing movement toward highly integrated industrial operations. Rates of pay would then not be a direct factor in individual games over labor production specifications though, of course, they would be in the much broader game of basic negotiation between management and labor.

This question is considered explicitly here because a close relationship between rates of pay and production rates has a decisive effect on the application of any work measurement theory. If that relationship is minimized, it would tend to stabilize the behavioral aspects of the work environment and yield the essentially constant conditions needed for making valid long-term estimates.

But even if the relationship were not minimized, the proposals made in this book would still apply. There would be one problem: new estimates would have to be obtained every time the wage-payment plan—or any other dominant variable—were changed in a fundamental way. Such a change would greatly intensify the sequential interaction that always exists between performance and specification. The function of the new estimates would be to measure the effect of that change in what becomes essentially a new working environment.

To return to the immediate question, local management and labor would, once the basic rules have been established, design specific strategies for playing the game. There is an almost infinite number of possible strategies, depending on the relationship between the parties, the nature of plant and industry, the economic and social strength of the parties, and so forth. It is quite clear, however, that labor will be interested in maximizing the occupational factors that enhance the workers' welfare. Management, in turn, will be interested in maximizing profits, production rates, and other factors that enhance its welfare.
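The zero-sum framing can be made concrete with a small payoff matrix. The sketch below is purely illustrative: the strategies and the payoff figures are hypothetical, and the maximin and minimax computations merely show how each party's guaranteed outcome would be located under the stated assumptions.

```python
# A hypothetical zero-sum specification game. Rows are labor strategies
# (accept, challenge); columns are management strategies (tight
# specification, loose specification). Entries are illustrative payoffs
# to labor; under the zero-sum assumption management receives the
# negative of each entry.
payoff_to_labor = [
    [-2, 3],   # labor accepts
    [ 1, 2],   # labor challenges
]

# Labor's maximin: the best of the row minima (its guaranteed floor).
labor_floor = max(min(row) for row in payoff_to_labor)

# Management's minimax: the smallest column maximum (the ceiling on
# what it can be forced to concede).
management_ceiling = min(max(col) for col in zip(*payoff_to_labor))

print(labor_floor, management_ceiling)  # equal here, so pure strategies suffice
```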
A good deal more might be said about this relatively unexplored subject. It would seem desirable, for example, for management and labor to adopt strategies of collateral action that would increase the immediate monetary and other awards available to them. This would have the effect of generating an enlarged reward pool so that the game becomes a positive-sum game rather than a zero-sum game. This would make it possible in turn for each party to obtain a larger share than it is able to obtain under the essentially adversary strategies currently used. Approaches of this kind have, in fact, led to essentially that result when they have been applied on a broad level.7

7 See, for example, Lincoln, Lincoln's Incentive System.

Though tempting, an excursion into this subject is out of place here. But it is appropriate to remark that management and labor have only begun to visualize the rich opportunities made available by a collateral approach to problems of this kind. It is not the most trivial of the indictments of time study that it is largely responsible for incentive situations that act as incentives to restrictive practices and adversary strategies. This is only to be expected of a point of view which holds that minds and hands can be procured by applying an archaic and ignoble doctrine of economic survival.

The Two Alternatives. It is not the privilege of any writer—if indeed one could be that persuasive—to impose on society any theory that vitally affects its welfare. The decision as to which theories are acceptable must rest with the society itself, particularly the groups and organizations immediately concerned. It is emphatically within a writer's domain, however, to indicate the consequences that seem to flow from alternative theories and courses of action.

In work measurement the first alternative is to continue using time study procedures. One basic fact should be made unequivocally clear, however, if the industrial society does decide to retain time study procedures in essentially their present form. It should be clearly understood that such procedures are scientific in no sense; they are and should be labeled as components of the bargaining process between management and labor.

It is sometimes argued that time study procedures should be retained because they involve less time and difficulty than alternative procedures. If this argument were taken at face value, it would mean that pure
guesswork is preferable to time study procedures. The fact is that information cannot be purchased on any bargain counter, no matter how persistent the advertising.

The real question is whether precise and accurate estimates are really needed. All indications are that the industrial society has answered yes to this question by promoting or, at least, acquiescing in the rapidly expanding movement toward integrated production systems. Such systems simply cannot be made to work without information of that kind. It is pure fantasy to expect that complicated technologies and their rewards can be procured without payment. The currency of that payment is valid estimates.

To repeat, the real question is whether industry is to continue to improve its technology so that society can enjoy potentially unlimited material and other rewards. This requires scrapping what would then clearly be obsolete information-gathering procedures. That is the issue and it must not be obscured by those—and there are all too many—both in management and in labor who have a direct interest in retaining time study more or less in its present form. Nor should the issue be obscured by those who have made a vogue out of decrying the deficiencies of time study and, with even greater vehemence, decrying any attempt to overcome those deficiencies. Their plea is that work measurement problems can only be solved by what is presumed to be a pragmatic approach; this always seems to mean an approach like time study modified just enough to suit their prejudices.
PART II

A Work Measurement Theory: Procedure, Application, and Results

CHAPTER FIVE

The Problems of Time Measurement
The time measurement problems under the proposed theory are essentially the same as those existing in time study. Data on these problems were obtained from a large number of experiments. In keeping with the policy outlined in the Preface, however, technical material and direct experimental data will be minimized in the interest of providing a self-contained presentation in essentially descriptive form. Readers interested in technical and experimental details should consult Work Measurement.

STOP-WATCH MEASUREMENT METHODS
Both time study and the proposed work measurement theory make fundamental use of the stop-watch for time observations. There are two common methods of using the stop-watch. In the continuous method the watch is allowed to run continuously while the observer takes readings. In the snap-back method, on the other hand, the watch hand is returned to zero at the end of each element.

Representative Views. There are widely divergent views in time study literature on which of these methods is preferable. In Chapter 2, for example, it is pointed out that Carroll advocates the snap-back method on the basis of what turns out to be invalid reasoning.1 On the other hand, Schutt advocates the continuous method, primarily because the snap-back method involves observational errors.2 The same view is held by most other time study writers, along with most of the practitioners canvassed by the survey described earlier.3

1 Carroll, Time Study for Cost Control, p. 70.
2 Schutt, Time Study Engineering, p. 54.
3 Abruzzi, "A Survey of American Time Study Practices," Time and Motion Study, II, No. 1 (1952), 11-25.
Their basic arguments are that: (1) the snap-back method often yields consistently high or consistently low readings; (2) the error in the elapsed study time may be quite large; (3) observers are tempted to adopt the biased practice of repeating values already observed in making observations of the same operation component.

A convincing argument in favor of the continuous method is presented by Lichtner.4 He feels that since a worker moves continuously from one motion to another, it is not good practice to time each element separately in the manner prescribed by the snap-back method. Full confirmatory evidence for this view is given in Chapter 14, which shows that only independent operation components can be considered as distinct units in the sense required by the snap-back method.

In addition to the completely untenable Carroll argument, the minority advocates of the snap-back method counter by arguing that: (1) it does away with the clerical time required to transcribe readings obtained by the continuous method; (2) delays can readily be disregarded; (3) transposed operation components can easily be recorded; (4) the observer can decide, on the basis of the consistency of the values already obtained, when to stop taking readings.

The Need for Further Inquiry. The first argument is rather trivial, and the others are without merit. Delays and transposed components yield highly useful information about work characteristics and they should therefore be recorded as they occur. Fortunately, few writers and practitioners accept the fourth argument. It is simply a gross and intolerable perversion of experimental procedure to suggest that the number of observations should depend on the consistency of the data already obtained, especially when that consistency is invited to be a manipulated consistency.

It would be reasonable to ask why the subject needs further inquiry when the issue seems to be settled in favor of the continuous method. There are a number of reasons why such an inquiry is desirable. A measurement method may have advantages which are unrecognized even by its proponents. This view takes on special import here because, until recently, there has been little experimental evidence on the comparative merits of the two stop-watch methods. Recent experimental work also takes in all the significant problems involved in making stop-watch observations.

4 Lichtner, Time Study and Job Analysis, p. 168.
This makes it possible to look at the problems of time measurement as a whole.

Experimental Data and Its Implications. Rogers is prominent among those who have looked into the snap-back method. His results are based on an extended series of laboratory experiments using time study students as observers.5 In his view the snap-back method promotes carelessness and a tendency to anticipate readings. Rogers also observed definite tendencies to overestimate or underestimate readings; indeed, the elapsed study times in individual cases were off by 5 to 20 percent in each direction.

In the main Rogers's results have been verified and extended by the more recent experiments. In these experiments, however, there were no overestimates of the total elapsed time. This seems quite plausible, for in a snap-back study a certain amount of time must be expected to be lost. The most reasonable explanation for the Rogers overestimates is that his observers were neophytes with substantial positive biases in reading the stop-watch.

Many of the recent experiments were made in the laboratory on certain assembly operations; these operations ranged from a simple bolt-and-nut assembly to rather elaborate toy and fuze assemblies. In the bolt-and-nut assembly case there were two observers, with one year and ten years of experience, respectively. In the other experiments there were about as many experienced observers as neophytes. Film records also were taken in certain cases; these supplied standard time values as criteria of accuracy and consistency.

The observed differences between experienced observers and neophytes were primarily differences in degree rather than kind. For example, the neophytes had a good deal more manipulative difficulty in applying the snap-back method, especially with short operation components or elements. Many of them also had a somewhat greater degree of bias in estimating element times. In some cases these biases were consistent enough to cut down or add to the mechanical loss in the total elapsed time. However, the elapsed study time was always greater with the continuous method. Differences among the experienced observers had a similar pattern, although, as might be expected, there was a much smaller spread.

5 Rogers, "Making the Stop Watch Observation," in Proceedings of the Time and Motion Study Clinic, Nov., 1941, pp. 12-13.
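The mechanics of this cumulative time loss are easy to simulate. The sketch below assumes a fixed loss for each snap-back reset; the element times and the size of the loss are hypothetical, chosen only to show why a snap-back study must understate the elapsed time while a continuously running watch cannot.

```python
import random

random.seed(1)

# Hypothetical operation: three elements, times in hundredths of a minute.
def draw_cycle():
    return [random.gauss(4.0, 0.4), random.gauss(4.0, 0.5), random.gauss(12.5, 1.5)]

RESET_LOSS = 0.4   # assumed time lost per snap-back reset, in hundredths

true_total = 0.0
snap_back_total = 0.0
for _ in range(7):               # seven cycles: about 20 element readings
    for element_time in draw_cycle():
        true_total += element_time
        # Each snap-back reading misses the interval spent returning
        # the hand to zero, so every element reading runs short.
        snap_back_total += max(element_time - RESET_LOSS, 0.0)

# The continuous method reads a running watch, so its elapsed time agrees
# with the true total (apart from rounding, ignored here).
print(round(true_total, 1), round(snap_back_total, 1))
print("cumulative loss:", round(true_total - snap_back_total, 1))
```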
More Specific Results. Because experimental differences were primarily a matter of degree, the results can be summarized in terms of three of the experiments on the bolt-and-nut assembly operation. This operation was made up of three elements, each less than 5 hundredths of a minute long, adding up to a cycle time of about 12 hundredths. The experimental situation was such as to accentuate the difficulty of using the snap-back method.

In the first experiment the snap-back method, as used by the observer with one year's experience, missed two cycles completely, along with several elements and delays. On the other hand, the observer with ten years' experience had no appreciable manipulative difficulty with the continuous method. However, the snap-back results were so unsatisfactory that a proposed series of statistical tests was abandoned.

In the second experiment the observers interchanged methods with the result that there was no manipulative difficulty. A series of statistical tests was run here, first on independence. This showed the time values for the operation elements to be mutually independent in each case, though with a somewhat stronger indication in the snap-back case. In the other tests, the differences between the mean cycle times were found to be somewhat smaller than the differences between the mean element times. The tests also showed that the two methods, in the hands of these two observers, had essentially the same degree of consistency with both elements and cycles.

The third experiment in the series was made under essentially the same conditions, and it confirmed the results obtained in the second experiment. The mean element times and, to a considerably stronger degree, the cycle times were consistently smaller in the snap-back case. There was a cumulative time loss, in a study lasting 225 hundredths, of fully 28 hundredths. This is a dramatic example of the substantial cumulative error inherent in the snap-back method—even in comparatively short studies of about 20 readings.

Again there was, in the snap-back case, a somewhat greater degree of independence among element times—a conclusion confirmed by most of the other experiments. This means that the snap-back method destroys the basic (time) relationships that usually exist among operation elements. The principal factor, of course, is the time loss resulting from the repeated process of returning the watch hand to zero. It is also true
that some observers appeared to succumb to the ever-present temptation to repeat the same readings for a given element. In fact, the observed degree of independence seemed to be a direct function of the existence and magnitude of this sort of bias.

Another result was that, with both stop-watch methods, the extremely short first element had nearly constant time values. This kind of result was obtained in a large number of experiments, illustrating a general experimental fact: constant readings appear when the operation component—whether an element or a motion—is not much larger than the smallest recordable measurement unit (usually one hundredth of a minute in the case of the stop-watch).

Over-all Conclusions. Considered as a body the experiments show that the snap-back method has two immediate and overt disadvantages. It requires much more manipulative ability, as indicated, for example, by the difficulty of recording delays and extremely short elements. It also leads to substantial underestimates of the elapsed study time.

The more subtle disadvantages of the snap-back method were brought out by statistical test. Mean element and cycle times are underestimated in most cases, and this is ultimately why the degree of independence seems greater with the snap-back method. Somewhat smaller percentage differences exist between the two methods at the cycle level, largely because element underestimates and overestimates tend to balance out. Since mean time values are indicators of accuracy, these results imply that the snap-back method is less accurate than the continuous method.

The balancing effect at the cycle level seems to be a uniform characteristic of stop-watch studies, continuous as well as snap-back. Since there is no gap in a time sense between elements, the balancing effect is usually much more prominent in the continuous case. The balancing effect is a dramatic example of another apparently general experimental fact: in a sequential series of observations biases are interdependent rather than independent. This does not always lead to a canceling effect, as it does in the time measurement case. It often shows up as a systematic set of biases in the same direction—a situation often encountered in, say, merit rating where it is sometimes labeled as a "halo effect."

To return to the immediate question, the two stop-watch methods have essentially the same degree of consistency. This is not invariably true, however, for the snap-back method sometimes appears to give
more consistent results than the continuous method. This is especially likely to occur with short elements and inexperienced observers, suggesting that the consistent time losses of the snap-back method tend to reduce variability. An auxiliary explanation is the fact that some snap-back observers have a tendency to report the same readings for a given element. These explanations are supported by the results of experiments in which time records were obtained on film. Here the snap-back results seemed also to have a "greater" consistency than the consistency exhibited by the film time data.

In essence, then, the snap-back method transforms the stop-watch into a measuring instrument with fundamentally different properties. According to the experimental results the snap-back method can be justified only with relatively long operation elements or cycles previously shown to be independent. A certain degree of bias in the accuracy sense and error in the consistency sense must be expected even then; thus it is never advantageous from a strictly measurement viewpoint to use the snap-back method.

LENG'S STUDIES AND THEIR IMPLICATIONS
An informative piece of laboratory research on time measuring instruments has been done by Richard Leng.6 Leng used operations constructed by cementing together film strips made up of elements lasting from 2.50 to 10 hundredths of a minute. These synthetic film operations were then projected at a constant speed of 960 "frames" a minute; this gave standard times for evaluating the observed results. The observation team was made up of nine observers having from six months to ten years of stop-watch experience. The observers were also trained for a short time on the use of the marsto-chron and the wink-counter, the two other instruments considered.7 Actual observations were taken on three separate days, with each observer rotating from one instrument to another.

Leng's Primary Conclusions. The stop-watch data, which were obtained by the continuous method, showed that the statistical distribution of an element was a function of both its size and its neighbors. The element distributions were generally not symmetrical and sometimes had two modes or peaks, especially with short elements.

6 Leng, Observational Error and Economy in Time Study.
7 The construction and use of the marsto-chron and the wink-counter are described in Morrow, Time Study and Motion Economy, pp. 88-90.
On the other hand, the marsto-chron readings were uniformly symmetrical, while the wink-counter readings usually had only one mode and were frequently symmetrical.

Another set of conclusions was derived after the more than two thousand readings obtained with each instrument were pooled. The pooled data showed that the standard deviation of the stop-watch readings was much greater than that of the wink-counter readings, and that the standard deviation of the wink-counter readings was much greater than that of the marsto-chron readings. Leng concluded from this that the marsto-chron is more than 6 times as consistent as the wink-counter and more than 33 times as consistent as the stop-watch. Comparative accuracy was then evaluated by using an index based on the average difference between the observed means and the film values. According to this index the stop-watch was much less accurate than the wink-counter, but the marsto-chron gave by far the most accurate results.

A Review of Leng's Data. A close view of Leng's summarized data shows that in using the stop-watch the nine observers had widely different mean and standard deviation values. This also applies to the wink-counter data, though the variations here were somewhat smaller. In both cases the differences accounted for at least 50 per cent of the total variation. With the marsto-chron, however, the nine observers had almost identical readings. This means that with the stop-watch and the wink-counter the standard deviation was heavily inflated by pooling but with the marsto-chron the standard deviation was only slightly inflated by pooling. The result was that Leng used greatly inflated index values in the first two cases but not in the third. What he took to be an index of consistency was in effect an index of observer differences.

These essentially statistical findings can readily be explained in empirical terms. Leng's observers had widely different degrees of experience with the stop-watch—an instrument whose effectiveness directly depends on the observational and manipulative ability and, hence, the experience of the observer. But the marsto-chron requires just a simple mechanical tapping of two keys so that degree of experience has scarcely any effect on the result. The wink-counter is much like the stop-watch in the sense that it, too, requires that readings be taken and recorded
in a dynamic manner. But no manipulative ability is required with the wink-counter so that much less is demanded of the observer.

These facts also explain Leng's conclusions on accuracy. Thus the mean values obtained by inexperienced stop-watch observers were quite biased. These biases had a pronounced effect on the pooled mean value for the stop-watch and made it seem inaccurate. But there were only slight differences among the marsto-chron mean values, and the pooled mean value turned out to be more nearly unbiased and, hence, accurate. A similar argument would show why the wink-counter yields a pooled mean value whose accuracy falls between that of the stop-watch and that of the marsto-chron. Leng's index of accuracy, therefore, did not measure the inherent accuracy of the instruments, but the joint accuracy of observers using the instruments.

Implications. Leng's data really show that observer differences will be greater with the stop-watch than with the marsto-chron, with wink-counter differences falling somewhere in between. This is an extremely useful finding; it shows that measurement accuracy and consistency depend to a much greater extent on observers than on inherent instrument characteristics, if indeed such characteristics can ever be fully discovered.

Viewed on an individual observer basis, Leng's data do suggest that the marsto-chron is more accurate and more consistent than the wink-counter and the stop-watch, especially for short elements. The differential, however, decreases rapidly with the larger elements and becomes quite small with Leng's largest element of 10 hundredths. Modified in this manner Leng's findings generally agree with the findings suggested by the experiments considered a little later on in this chapter.

It should also be recalled that the operations considered by Leng were completely synthetic. The same element structure was repeated identically from cycle to cycle, and there were neither delays nor any of the other dynamic factors to be expected in industrial operations. The observers could readily have memorized the element structure and introduced a bias into their results. The absence of dynamic factors also means, of course, that Leng's results do not reflect measurement performances to be expected under factory conditions.
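The inflation effect just described can be demonstrated directly. In the sketch below the within-observer scatter is identical for every observer and only the observer means differ; the figures are hypothetical, but pooling the readings inflates the standard deviation in just the way the review of Leng's data suggests.

```python
import random
import statistics

random.seed(2)

WITHIN_SD = 0.5                          # assumed scatter of any one observer
observer_means = [4.0, 4.8, 5.6, 6.4]    # hypothetical observer biases differ

readings_by_observer = [
    [random.gauss(m, WITHIN_SD) for _ in range(50)] for m in observer_means
]

# Each observer, taken alone, shows only the within-observer scatter.
for readings in readings_by_observer:
    print(round(statistics.stdev(readings), 2))

# Pooling mixes the between-observer differences into the total scatter,
# so the pooled standard deviation greatly exceeds any individual one.
pooled = [x for readings in readings_by_observer for x in readings]
print("pooled:", round(statistics.stdev(pooled), 2))
```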
FURTHER LABORATORY STUDIES
It should be clear by now that the effectiveness of a measuring instrument ultimately depends on how it is used and by whom. For that reason it is preferable to speak of a measurement method which is defined not only by the instrument but by the way it is used. The wink-counter, for example, is appreciably more accurate and consistent in a camera field than it is in a visual field. It can be read to the nearest two-thousandths of a minute in a camera field and probably only to the nearest hundredth visually.

To obtain more information about comparative measurements, an impressive number of experiments were run as part of the time measurement research program considered in this book. These experiments were made on the marsto-chron and the stop-watch (using the continuous method). In most cases, film records of wink-counter readings also were obtained as time standards. The use of the wink-counter in a camera field made it possible to expose 100 feet of film at one time. Even then only an extremely small number of cycles could be observed, and just 16 cycles could be recorded with the extremely short bolt-and-nut assembly. Even fewer cycles were recorded with the other operations, many of which were also studied in the snap-back experiments.

Typical Results. As before, the variations in results from experiment to experiment were largely a matter of degree. For that reason it is unnecessary to consider more than a single set of results. These results were obtained with a bolt-and-nut assembly which was made up of three elements similar to those considered earlier, but lasting this time about 25 hundredths of a minute. In this set of studies wink-counter film readings were obtained, sometimes with marsto-chron readings, sometimes with stop-watch readings, and sometimes with both.

The studies showed that numerous short delays, though detected by the film, were not detected either by the marsto-chron observers or by the stop-watch observers. Longer delays were detected, but they were overestimated or underestimated by the stop-watch observers and, to a somewhat smaller degree, by the marsto-chron observers. This suggests that what is identified as a delay depends upon the mode of observation and that it is difficult to determine the endpoints of a delay when an observer's judgment intervenes.
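How short a delay a method can detect follows directly from its reading unit. The sketch below assumes a delay of 0.3 hundredth of a minute and simply rounds the running readings to each instrument's smallest unit; the figures are hypothetical, but they show why a delay plainly visible on film can vanish entirely from visual stop-watch readings.

```python
# Hypothetical readings, in hundredths of a minute. A short delay of
# 0.3 hundredth occurs between two elements of 4.2 and 3.9 hundredths.
events = [("element 1", 4.2), ("delay", 0.3), ("element 2", 3.9)]

def observe(events, unit):
    """Round the running total to the instrument's smallest unit and
    report each event as the difference between successive readings."""
    total, last_reading, out = 0.0, 0.0, {}
    for name, duration in events:
        total += duration
        reading = round(total / unit) * unit
        out[name] = round(reading - last_reading, 4)
        last_reading = reading
    return out

# Wink-counter in a camera field: readable to about 0.2 hundredth
# (two-thousandths of a minute); stop-watch read visually: 1 hundredth.
print(observe(events, 0.2))   # the delay survives as a distinct reading
print(observe(events, 1.0))   # the delay is absorbed into its neighbors
```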
In the experiments simply comparing the stop-watch with the wink-counter, there were two observers, each having one year of stop-watch experience. In all cases the larger elements had more symmetrical distributions, with the cycles having the greatest degree of symmetry. Instrument-wise, the wink-counter data was substantially more symmetrical than the stop-watch data. Symmetry, then, is a direct function of both the measurement method and the size of component measured. This suggests that relative sensitivity rather than absolute sensitivity is the decisive factor. Relative sensitivity can be defined as the ratio between the size of the thing being measured and the basic measurement unit.

The experiments also supplied evidence on the related subject of apparently constant times. The second element, which lasted about 5 hundredths of a minute, appeared to have essentially constant stop-watch times but clearly symmetrical wink-counter times. This, too, is the result of the much greater relative sensitivity of the wink-counter.

The film records were then used to compare the results of the two stop-watch observers. It turned out that one of the observers had consistent underestimates with the second element and equally consistent overestimates with the third element. However, these biases balanced out sufficiently well to make the cycle times comparable to the cycle times of the wink-counter. The same conclusion applies to the second observer, though his biases were somewhat less pronounced. Apparently stop-watch observers must be expected to have essentially unique biases in defining element endpoints and also in making observations, especially with short elements. This shows that the mechanics of the balancing process depends on the observer. The balancing of biases at the cycle level in both cases is added evidence that biases in sequential observations are interdependent. In each case the element times were found by statistical test to be mutually dependent.

The two stop-watch observers also had about the same degree of accuracy as the wink-counter for the first element (as well as for the complete cycle); in these cases the observers could be considered unbiased. The consistency of the two stop-watch observers was equivalent to the consistency of the wink-counter for all elements. This dramatizes the fact that differences among alternative measurement methods are usually confined to accuracy differences resulting from observer and
other biases. With minor exceptions alternative measurement methods turn out to have essentially the same consistency.

Another experiment that deserves direct attention was made on the same operation: the observations were made by a stop-watch observer with one year of experience and a marsto-chron observer with no previous time measurement experience. A set of standard time values was again obtained by making film records of wink-counter readings. The numerical results are given in Table 3; the unit is one hundredth of a minute in accordance with the general policy stated in Chapter 3.

TABLE 3. COMPARATIVE DATA ON THE MARSTO-CHRON AND STOP-WATCH

              WINK-COUNTER                  MARSTO-CHRON               STOP-WATCH
          1      2      3    Cycle       1     2     3    Cycle      1    2    3   Cycle
        4.40   3.95  12.65   21.00     7.1   3.4   9.3   20.6        4    6   12    21
        4.00   3.30  10.75   18.05     6.3   4.9  10.2   19.9        5    4   11    20
        3.90   4.60  12.75   21.25     5.0   3.5  13.1   23.0        6    6   12    23
        4.15   4.40   8.00   16.55     4.8   4.4  11.4   19.7        6    6    7    18
        4.45   3.05  11.40   18.90     5.8   7.0  11.3   21.5        6    4   11    21
        3.95   3.95  14.70   22.50     4.7   6.3  12.1   23.8        5    5   14    24
        3.75   4.20  11.70   19.65     4.5   4.7  12.5   23.3        4    7   12    23
        3.40   2.90  13.85   20.15     3.8   3.9  13.6   22.1        5    4   13    22
        5.20   3.15  11.05   19.40     5.5   7.2  11.8   21.2        6    4   12    21
        3.80   4.15  12.15   20.10     4.3   3.8  10.4   21.9        6    6   11    22
        3.60   2.80  14.50   20.90     3.9   4.4  15.1   22.8        4    6   13    22
        3.95   3.50  15.15   22.60     4.7   5.0  15.7   24.8        5    4   16    25
        3.65   4.25  13.95   21.85     4.2   6.0  14.8   24.0        6    6   15    25
        3.55   5.30  15.25   24.10     4.4   6.0  15.9   25.3        4    6   17    26
        4.05   3.90  14.75   22.70     4.8   7.4  13.7   25.9        3    7   14    24

  X̄    3.99   3.83  12.83   20.65    4.92  5.00 12.73   22.65     4.73 5.07 12.67 22.47
  s     0.41   0.68   2.05    1.95    0.78  1.34  2.05    1.90     0.82 1.01  2.39  2.10
Formulas and Their Interpretation. The symbols X̄ and s refer, respectively, to the mean and standard deviation values. They were computed from the formulas:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \qquad \text{and} \qquad s = \sqrt{\frac{\sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2}{n - 1}},$$

where $X_i$ refers to the ith observation, and $n$ to the number of units in the sample.
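In modern terms these are the ordinary sample mean and sample standard deviation. As a check on the procedure, the sketch below applies the formulas to the wink-counter readings for element 1 of Table 3 and also computes the coefficient of variation (s/X̄) that serves later in this chapter as an index of relative consistency.

```python
import statistics

# Wink-counter readings for element 1 of Table 3 (hundredths of a minute).
element_1 = [4.40, 4.00, 3.90, 4.15, 4.45, 3.95, 3.75, 3.40,
             5.20, 3.80, 3.60, 3.95, 3.65, 3.55, 4.05]

x_bar = statistics.mean(element_1)   # sample mean
s = statistics.stdev(element_1)      # sample standard deviation (n - 1 divisor)
cv = s / x_bar                       # coefficient of variation, s / X-bar

print(round(x_bar, 2), round(s, 2), round(cv, 2))
```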
It should be emphasized that the standard deviation is not a clear-cut indicator of measurement consistency. The standard deviation in measurement work represents the sum of two components considered to have additive effects: the inherent variation of the element measured and the inherent variation of the measurement method. Under the additive assumption, however, the inherent variation of the element measured can be considered a constant for different measurement methods. The result is that standard deviation values do represent the relative consistency of different measurement methods.

Implications of the Tabulated Data. According to the table the cycle times and the times for the first two elements were overestimated by both the marsto-chron observer and the stop-watch observer. On the other hand, these observers had essentially the same standard deviation values as the wink-counter, suggesting a common degree of consistency. The measurement methods also became relatively more consistent (in terms of the ratio between s and X̄) with the third element and with the complete cycle.

Here the element data were independent only in the case of the wink-counter. This result is similar to a result obtained in comparing the snap-back and continuous stop-watch methods. Seemingly element independence depends to some extent upon the measurement method, especially with short elements.

General Implications. In other laboratory experiments the stop-watch and marsto-chron had essentially the same characteristics. These experiments confirm that the accuracy of a measurement method depends on how it is used and by whom. They also reveal that the marsto-chron has essentially the same accuracy in the hands of an inexperienced observer as the stop-watch in the hands of an experienced observer. Also, stop-watch observers must be expected to have consistent positive or negative biases with elements and delays, especially short ones. This means that the accuracy of the stop-watch varies with the observer.

All measurement methods, including the snap-back method, improve in accuracy as the size of the component increases. Relative consistency also improves with larger components, but at a somewhat faster rate than accuracy. There is an even more fundamental difference between the results on accuracy and consistency: there seem to be no biases with respect to consistency. The consistency of all the measurement
methods considered, which covers a large number of stop-watch observers, is about the same with individual components.

The experiments also show that the degree of independence depends in part upon the measurement method used, especially for short elements. This means that it may not be desirable to consider elements in making studies intended primarily to estimate mean cycle times—a point considered in more detail in Chapter 14. Another and related result tallies with a result obtained by Leng. The statistical distribution of readings becomes more symmetrical with a more sensitive measuring instrument and with larger elements.

Time Studies from Film. In another series of laboratory experiments, time studies of films were compared with time studies made by the same observers during the filming. It turned out that the element data in the film time studies were generally less consistent than the element data in the live time studies. On the other hand, the mean element times were essentially equivalent.

One reason for poorer consistency in the film studies was that the observers did not have the sound and other physical signals available in the live studies. Shrinking the workplace to two dimensions aggravated the situation. The observers had to fix their attention on physical guideposts in the workplace and at the same time try to study the work activity. This frustrating requirement did not exist in the live situation. Here the work activity could be viewed as part of a full, three-dimensional field including physical guideposts. The live situation also provided sound and other signals, such as shadows, which enabled the observers to scatter their attention to some extent.

This is added evidence that the efficiency of measurement depends on the measurement method. What is more, the efficiency of measurement also depends on how the thing being measured is presented to the observer. Observers seem to construct a measurement system to fit the experimental environment in which the measurement is being made.

FACTORY AND OTHER SUPPLEMENTARY STUDIES
Certain other experiments were made for the purpose of comparing the marsto-chron and the stop-watch under factory conditions. These experiments were performed in one of the two garment plants where the basic studies considered in this book were made. The production
characteristics of the operation and operator (1D) are described in Chapters 7 and 9. In the experiments, the marsto-chron data were obtained by an experienced outside observer, while the stop-watch data were obtained by a staff time study engineer with six months' experience.

In the initial experiment, readings were obtained for each of the nine elements in the operation, which was about one-half minute long. There were two significant observational factors in this experiment. The observers jointly made a number of preliminary stop-watch studies in order to synchronize element endpoints. Any consistent biases could then be attributed to the use of the instruments. Also, the work of the operator was almost completely free from delays. This simplified the observation problem somewhat, especially for the stop-watch observer. The stop-watch results would therefore not be expected to differ as much from the marsto-chron results as they otherwise might.

The marsto-chron readings turned out much more symmetrical than the stop-watch readings and, in both cases, the degree of symmetry was greater for longer elements. A related result is that the standard deviation values increased under the same conditions. The relation between degree of symmetry and standard deviation values is highlighted by the extremely short elements. The stop-watch readings for the ninth element, for example, were somewhat more uniform than the marsto-chron readings. This element also had the smallest pair of standard deviation values, with the stop-watch value somewhat smaller than the marsto-chron value.

In spite of these differences the two measurement methods were remarkably similar in consistency. They also had roughly equivalent degrees of accuracy, especially at the cycle level. It follows that the marsto-chron and the stop-watch have essentially the same accuracy and consistency potential in a favorable observational environment, assuming that the observers have some experience and use essentially the same element endpoints. This study also gave confirmatory evidence that (apparent) constancy is a function of the sensitivity of measurement, thus clinching the argument against the claim of constant element (and motion) times presented, for example, by Lowry, Maynard, and Stegemerten.3

3 Lowry, Maynard, and Stegemerten, Time and Motion Study, p. 96.

A follow-up experiment was then made in which the first four elements were pooled into a single measurement unit. As anticipated this
simplified the observation process, and the two measurement methods turned out to have almost exactly the same accuracy and consistency. Thus, the difficulty of the observation problem directly influences the accuracy and consistency of a measurement method.

Another informative experiment in this supplementary series was made in the laboratory on the bolt-and-nut assembly operation described earlier. Stop-watch readings were obtained by three observers, each with at least one year's experience. Despite some differences in element times, the three observers had an equivalent degree of cycle-time accuracy. They also had essentially the same consistency, with hardly any discernible difference at the cycle level. This is added evidence that differences among stop-watch observers show up primarily in terms of element biases which generally balance out at the cycle level.
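The balancing of element biases at the cycle level can be illustrated with a small simulation. This is not the experiment itself; the element times, biases, and error magnitude below are invented for the purpose.

    # Two observers read the same three elements with characteristic
    # element biases that sum to about zero; their element means differ,
    # but their mean cycle times (the sums of the elements) agree.
    import random
    random.seed(1)

    true_elements = [0.10, 0.15, 0.25]                 # minutes
    biases = {"observer A": [+0.02, -0.02, 0.00],
              "observer B": [-0.02, +0.01, +0.01]}

    for name, bias in biases.items():
        cycles = []
        for _ in range(200):
            readings = [t + b + random.gauss(0, 0.01)
                        for t, b in zip(true_elements, bias)]
            cycles.append(sum(readings))
        print(name, "mean cycle =", round(sum(cycles) / len(cycles), 3))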
A BASIS FOR A TIME MEASUREMENT THEORY
A basis for a time measurement theory can be abstracted from the experimental studies. A primary point is that (apparently) constant readings are obtained when the measuring instrument is comparatively insensitive. This is related to the fact that symmetry increases both with a more sensitive measuring instrument and as the size of element increases. In turn, standard deviation values increase, and coefficient-of-variation (s/X̄) values decrease under the same conditions.

Another significant fact is that the relative accuracy of a measurement method increases as elements increase in size. The relative consistency also increases under the same conditions, but at a much faster rate. Mechanically, a stop-watch used in the snap-back sense becomes less accurate, while the wink-counter used in a camera field becomes more accurate.

With elements the observer has a decisive effect on accuracy and some effect on consistency, but he has little effect on cycle accuracy and consistency. The observer's presence is primarily felt in terms of characteristic positive and negative biases at the element level which balance out at the cycle level.

Accuracy and consistency also depend on the operation and the operator. For simple operations an inherently inferior instrument may be as useful as an inherently superior one. Operators whose performances involve few delays simplify the observation procedure and may bring about the same result.

The Stop-Watch and the Marsto-Chron. According to the experiments,
there isn't much difference between the stop-watch and the marsto-chron when the stop-watch is used by an experienced observer. The marsto-chron is distinctly superior only with very short elements. However, as Chapter 14 shows, this is a trivial advantage since short elements are not usually independent and should therefore not be considered in making time measurements. It is true that the marsto-chron is a somewhat more effective instrument in the hands of an inexperienced observer, but this advantage is more apparent than real. An inexperienced observer would not usually be capable of doing an acceptable job of observation, where recording what happens is ultimately much more important than recording times.

Several practical factors must also be considered in choosing instruments. The marsto-chron requires a great deal of time for transcribing tape readings. Also, delays and other dynamic work factors cannot conveniently be recorded, even by an experienced observer. This means that the marsto-chron is not nearly as versatile an instrument as the stop-watch; probably its most useful field of application lies in laboratory and other special-purpose work.
CHAPTER SIX
The Nature and Function of Process Standardization
Most time study writers agree that the process should be standardized before studies are made. R. L. Morrow, for example, suggests an over-all study of all the operations in a given work sequence.1 The study is used to decide on sequence, equipment, and other key workplace characteristics, using as a guiding criterion the postulate that all unnecessary movements and materials handling should be eliminated. Morrow's belief is that planning the physical characteristics of work in this manner will yield maximum output along with acceptable quality. In common with most other time study writers, Morrow makes a particular point of specifying that the so-called "one best way" of performing the immediate operation should be a basic component of the standardization process.

Ralph Barnes's views on the subject are essentially the same as Morrow's. He, too, recommends that work methods, materials, tools, and working conditions be standardized before studies are made and standards are established.2 By and large Barnes feels that a detailed study should be made, including a minute analysis of the work method, but he concedes that a cursory study may sometimes be sufficient for practical purposes.

These are typical of the views held by the great majority of time study writers who insist on some kind of advance standardization.
1 Morrow, Time Study and Motion Economy, pp. 96-97.
2 Barnes, Motion and Time Study, p. 280.
[Bar chart showing the number of respondents in each category of percentage standardized in advance: 100, 90-99, 75-89, 50-74, 25-49, 10-24, 0-10.]
FIGURE 1. THE PERCENTAGE OF OPERATIONS STANDARDIZED IN ADVANCE BY SURVEY RESPONDENTS
In the distinct minority are writers like Presgrave, who feel that standardization is not absolutely necessary, and that it is often necessary to make studies before standardizing work methods.3

STANDARDIZATION IN PRACTICE
Despite the formal recommendations, the process of standardization has generally been reported to be rather superficial. As long ago as 1916, for example, Robert Hoxie reported that studies were made with little or no previous standardizing.4 Recent evidence on this question is provided by the results of a survey showing that, in practice, the amount of standardizing varies widely, particularly with respect to work methods.5 The views of 51 respondents are summarized in Figure 1, which shows that just about half of them made it a practice to standardize operations before making studies. The explanations offered by the other half are that: (1) it is impossible to standardize all operations in advance; (2) standardization can be attained through time study; (3) standardization is too costly; (4) it is only necessary to standardize key operations.
3 Presgrave, The Dynamics of Time Study, pp. 128-29.
4 Hoxie, Scientific Management and Labor.
5 Abruzzi, "A Survey of American Time Study Practices," Time and Motion Study, II, No. 1 (1952), 11-25.
Limitations of Standardizing Criteria. Quite aside from their rather limited application, the criteria of standardization are uniformly vague and descriptive. This explains why wide differences of opinion exist, ranging from the comprehensive standardizing suggested by Morrow to Presgrave's willingness to make studies without any overt standardizing. Wide differences of opinion also exist among writers and practitioners as to just what constitutes standardization; these differences are pointed up by the results of the survey. The criteria reported are too numerous to record here, but it is possible to put them into a list of concededly descriptive qualitative categories: (1) all obvious improvements have been made; (2) the rules of motion economy have been applied to methods, equipment, and materials; (3) operations are running smoothly and with a minimum number of delays; (4) production rates are running close to the time study standards; (5) quality standards are being met; (6) supervisory judgment holds that the process is standardized.

Three respondents also made the significant comment that the process of standardizing is continuous, and that complete standardization is an ideal that can never be realized. Six others agreed essentially with Presgrave; their view was that it is often impractical to standardize before making studies though it might be desirable in an abstract sense.

The principal difficulty with the recommended criteria is that they lead to many different opinions on the need for standardizing and what this implies. The difficulty can be overcome in just one way. The criteria must be based on procedures that are essentially independent of the observer, and they must be formulated in operational terms rather than vague and descriptive terms. They should also take into full account the nature and objectives of the plant and its operations.

SUGGESTIVE FINDINGS AND PROPOSALS
In considering the results of the widely publicized Hawthorne experiments, T. N. Whitehead hints at the importance of stable work patterns in industrial operations.6 One of his findings, for example, was that workers seemed to have persistent work patterns when assigned to operations with essentially the same work content. Whitehead also finds that the production rates of experts were more stable than the production rates of novices. This finding is supported by the work of J. Loveday and S. H. Munro, who report that highly skilled workers had more uniform production rates than other workers.7

6 Whitehead, The Industrial Worker, I, 63-73.
[Curves of relative output (first week = 100 percent) plotted against weeks, one curve for each wage payment plan; the curve labels are only partly legible, apparently Time Rate, Bonus, and Piece Rate.]
FIGURE 2. A COMPARISON OF THE RELATIVE OUTPUT UNDER THREE TYPES OF WAGE PAYMENT PLANS (FROM WYATT, STOCK, AND FROST, P. 5)
Numerous other investigators of the British Industrial Fatigue (later Health) Research Board have reported similar results in laboratory and factory studies. Isabel Burnett, for example, found that workers had remarkably persistent production-rate characteristics over a long period of time.8

Wyatt, Stock, and Frost went one step further.9 They showed that, with a given wage-payment plan, production rates become stabilized at a characteristic level. This level remains unchanged as long as the work situation remains essentially the same. The actual results obtained with three wage-payment plans are summarized in Figure 2. Each reading on the chart represents the mean weekly output of workers assigned to certain packing and weighing operations. The weekly output values are presented in relative terms, using a base value of 100 percent for the first week's output.
7 Loveday and Munro, Preliminary Notes on the Boot and Shoe Industry.
8 Burnett, An Experimental Investigation of Repetitive Work.
9 Wyatt, Stock, and Frost, Incentives in Repetitive Work.
Elton's Hypothesis. In another Research Board report Elton focalizes the findings of the earlier investigators. Elton looked into the time required by each of twelve weavers to complete a series of warps on the same kind of cloth. Despite a small number of readings for each weaver, Elton was able to conclude that "considering the large number of causes at work which affect output the consistency of these figures, which have been taken at random and have not been specially selected, is remarkable." This led Elton to formulate an extremely suggestive hypothesis. "The rates of production of individual weavers," he says, "are fairly consistent, and when a substantial departure is made from a worker's average rate of production it may reasonably be inferred that she has had to contend with particularly bad work (or exceptionally good), or that her normal working capacity has been stimulated or depressed."10

Though framed in descriptive and incomplete terms this is a noteworthy hypothesis. Elton clearly had an intuitive notion of the significance of uniformity in production rates. He failed to suggest, however, how specific criteria can be developed for deciding just how much variation constitutes "a substantial departure from a worker's average rate of production."

Statistical Control Concepts. Elton might well have framed his hypothesis in terms of the statistical control concepts developed by Shewhart, had they been available at the time of his studies.11 The first hint that this might be desirable was given by Irving Lorge in a brief paper suggesting the wider application of statistical methods in this field.12 This hint was not implemented in any way, however, until Gomberg looked into the question in somewhat more definite though essentially conceptual terms.13

Gomberg's contribution was to point out that work measurement problems have a direct analogy in the product quality problems considered by Shewhart. He also raised some pertinent questions on whether Shewhart's concepts could be successfully transplanted to production-rate problems. Among these is the fundamental question of whether it would be possible to develop experimental procedures and criteria for direct application in this field.

10 Elton, An Analysis of the Individual Differences in the Output of Silk-Weavers, p. 11.
11 See, for example, Shewhart, Statistical Method from the Viewpoint of Quality Control.
12 Lorge and Haggard, A Physiologist and a Psychologist Look at Wage-Incentive Methods.
13 Gomberg, A Trade Union Analysis of Time Study, pp. 30-49.
Gomberg's own view was that this would not be the case, and that work measurement procedures would continue to be used in more or less their present form as aids to collective bargaining.

STATISTICAL STABILITY AND PROCESS STANDARDIZATION
Though suggestive, these findings and proposals have serious defects. They fail to point out that a unique correspondence exists between the statistical concept of stability and the empirical concept of standardization. They likewise fail to suggest procedures for bringing about stable production rates.

The process of standardization can be divided into three distinguishable but interrelated levels. The first level is touched on in Chapter 2: it is suggested there that dominant production variables must be fixed or standardized to insure essentially constant estimating conditions. This level of standardization may be characterized as primary or gross standardization, and it covers both dominant physical variables and dominant human or behavioral variables. Primary standardization in this sense is usually achieved by industrial engineering techniques in the case of physical variables, and by bargaining and other codifying techniques in the case of behavioral variables.

The other two levels of standardization jointly define secondary standardization. These levels are concerned with the more subtle aspects of the work situation. This suggests that these levels cannot be studied by empirical means alone but require a rather sophisticated analytical approach, including a fundamental use of statistical procedures. It cannot be emphasized too strongly, however, that any approach is useless unless a certain amount of primary standardization exists. Inferences about minute matters are simply impossible unless those matters can be considered to take place in an environment that has some semblance of stability.

It is only when primary standardization can be assumed that it becomes possible to determine whether production rates behave as though secondary standardization also exists. This determination requires a sequential network of estimates and evaluations. The estimates are obtained directly from data; and the results are evaluated in terms of empirical criteria of secondary standardization. These criteria also have the important function of determining when secondary standardization does not seem to exist, in which case the data are scanned for suggestive clues on the underlying
cause or causes. The sequential and iterative process of data-gathering, estimation, and evaluation is then continued until secondary standardization is finally achieved. The theory and application of this process form the core of the proposed work measurement theory.

Criteria of Secondary Standardization. It bears repeating that the process of primary standardization must take into account dominant behavioral variables as well as dominant physical variables. The criteria for the two secondary levels of standardization also must take into account both physical and behavioral variables; these might well be defined as ordinary variables to distinguish them from the dominant variables involved in primary standardization.

For secondary standardization it is necessary to look into the statistical properties of production rates and find criteria for answering the fundamental questions: (1) When can the observed variations be attributed exclusively to random causes? (2) When must these variations be attributed to nonrandom or assignable causes, such as Elton's "substantial departures"? Without such criteria it is impossible to determine whether secondary standardization exists. This means, in statistical terms, that it is impossible to make precise estimates. The explanation for this is suggested in the opening chapter: statistical procedures should not be applied unless empirical criteria have shown that a stable population exists.

Criteria for distinguishing between random-cause and assignable-cause situations perform a dual function. When they indicate a chance-cause situation, the samples are said to be random samples from a stable population. In Shewhart terminology this is defined to be a state of statistical control. When they indicate an assignable-cause situation, the samples are said to be nonrandom samples, and a stable population cannot be assumed. In Shewhart terminology this is defined as a situation in which statistical control is absent.14

14 These questions are considered in some detail in Shewhart, Statistical Method from the Viewpoint of Quality Control.

These situations also can be defined within the standardization framework. This seems to be the most useful and general framework of reference because the two situations ultimately refer to a process of standardization in all fields of application. It is particularly useful and suggestive in the field of work measurement, where the concept of standardization is explicitly recognized to be of central importance. In this framework the first situation implies that there is secondary
standardization, and the second situation that there is not. This establishes the unique and complete correspondence between stability in the statistical sense and standardization in the empirical sense.

The variables affecting product quality are much more limited in number and range of variation than the variables affecting production rates. The principal reason for this is that the behavioral variables introduced into the work environment by the worker affect production rates to a much greater degree than they affect product quality. This is clearly related to the fact that worker welfare, particularly in the form of earnings and general work status, is much more closely allied to production rates, though what is cause and what is effect here is unclear. The result is that the philosophical approach found effective in the product quality area simply would not be capable of handling work measurement problems. The fundamental role of behavioral variables in the work measurement area makes the problem of standardizing infinitely more complex.

Local and Grand Stability. Applications of statistical concepts to work measurement problems do have something in common with applications to product quality problems. Successful application depends in both cases on having standardized the work situation in some sense. Also, the objective in both cases is to bring about a state of secondary standardization. Here the analogy ends. The ultimate reason is that little attention needs to be paid to behavioral variables in the product quality case, while a great deal of attention must be paid to behavioral variables in the work measurement case. This difference shows up at the primary level, where dominant behavioral variables must be fixed along with dominant physical variables. Similarly, it is sufficient to consider just one secondary level in the product quality case. That level is concerned with long-term aspects of quality standardization. In the work measurement case, however, it is also vital to explore and standardize the short-term characteristics of production rates, where the effect of behavioral variables is felt in full and undiluted form.

The two secondary levels in work measurement applications can be characterized as grand standardization and local standardization; their statistical counterparts can be characterized as grand stability and local stability. In studies of grand stability, production rates are examined in
terms of small samples taken from homogeneous production lots or strata defined under the assumption that the essentially constant conditions of primary standardization prevail. The basic objective of these studies is to determine whether the long-term characteristics of production rates are stable and, if so, to determine their statistical properties.

In studies of local stability, production rates are examined in terms of a continuous or, at least, an intensive series of samples taken over a relatively short period, most often a shift or a day. The basic objective here is to look into the short-term characteristics of production rates. Though this is not strictly necessary, it is usually desirable to study the performance of only one worker at a time. This gives a close-up view of the effect of behavioral variables on production rates. Studies of local stability therefore supply information which complements the information supplied by studies of grand stability.

The ideal situation would be to make both kinds of studies and obtain essentially complete information about production-rate characteristics; this would also make it possible to arrive at a more comprehensive state of standardization. It might be advisable, however, for economic or other reasons, to make only one type of study. This would be true, for example, when the cost of making local studies is greater than the value assigned to the information obtained. This is apparently the case in product quality applications. Here it is usually not considered economical to look into the short-term performance characteristics of individual workers with respect to quality. The same reasoning might also apply in certain work measurement applications. In this case, however, it would not be a routine decision, since the short-term performance characteristics of individual workers are a much more decisive determinant of production rates than of quality.

The Relation among Levels of Standardization. There is another and more compelling reason for preferring grand studies to local studies when a choice must be made. Grand stability represents long-term standardization, which is more basic than the short-term standardization represented by local stability. For that reason grand stability must be established or assumed before local stability can be established. If not, production rates might well be stable within a given day, but there would be no assurance that this would be the situation on other days. Though they are distinct, the three levels of standardization make up
an inferential staircase. Primary standardization must be assumed for grand standardization, and grand standardization must be assumed for local standardization. Working backward, local studies can be used to determine whether grand, and even primary, standardization is being maintained at an acceptable level; similarly, grand studies can be used to determine whether primary standardization is being maintained at an acceptable level.
CHAPTER SEVEN
Production Rates in the Short Term
The studies considered in this chapter were made within individual shifts, and they covered from four to six hours of continuous operation in most cases. The observations were made on consecutive small lots, each made up of from four to six dozen garments. Element readings and cycle readings were taken by the continuous stop-watch method, with separate time entries being made of delays. It bears repeating here that experimental data and other technical material are minimized in the interest of an essentially descriptive presentation; readers interested in technical material will find it in Work Measurement.

AN OUTLINE OF ANALYTICAL PROCEDURE
The Sample Size. The first step in analysis was deciding on the sample size, a question on which the quality control field gives a highly suggestive clue. A sample size of four units is ordinarily recommended there on the ground that this provides an optimal balance between two desirable but contradictory goals. The first goal is to minimize the chances of absorbing significant variations within small samples; this goal demands small sample sizes. The second goal is analytical simplicity, which includes having normally distributed sample means; this goal demands large sample sizes. Samples of three units were adopted for most of the local or short-term studies because they place a greater emphasis on the first goal. Such an emphasis is needed because significant changes often take place quite rapidly in local production rates. Another argument for using samples of three units is that, in most cases, only a comparatively small number of observations could be obtained.

This sampling plan was used with cycle times and, in some cases, also
with element times. But samples were never constructed from production items in different lots; this preserved their homogeneity and helped to insure estimates that were both accurate and precise.

Specific Criteria of Local Stability. A full-scale discussion on criteria of statistical stability is given in a later section of the book. It is sufficient to point out here that the 3-sigma limit criterion, conventionally used in product quality applications, turned out to be an optimal choice for making studies of local stability in the work situations considered. The actual limit criteria were computed from the formulas

X̄ ± A₂R̄

for the stability (or control) charts for means, and

D₃R̄ and D₄R̄
for the stability (or control) charts for ranges. In these formulas X̄ represents the mean of the observed sample means, while R̄ represents the mean of the corresponding ranges. The A₂, D₃, and D₄ values for samples of three units were obtained from tables readily available in quality control literature.1

Other Statistical and Empirical Problems. These steps do supply a framework for making studies of local stability, but numerous other statistical and empirical questions must also be considered to fill them out. Some of these questions can best be understood in terms of illustrative examples; hence they will be discussed in connection with studies where they were specific factors. Other questions of a broader nature, particularly on application and interpretation, are discussed in later sections devoted to the general subject of stability.
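A minimal sketch of the limit computation described above, for samples of three units; the readings are invented, and the A2, D3, and D4 constants are the standard tabled values for samples of three.

    # 3-sigma stability (control) limits from sample means and ranges.
    A2, D3, D4 = 1.023, 0.0, 2.574        # tabled constants for n = 3

    samples = [[44, 46, 43], [45, 44, 47], [43, 45, 44], [46, 44, 45]]
    means  = [sum(s) / len(s) for s in samples]
    ranges = [max(s) - min(s) for s in samples]

    x_bar = sum(means) / len(means)       # mean of the sample means
    r_bar = sum(ranges) / len(ranges)     # mean of the sample ranges

    print("means chart: UCL =", round(x_bar + A2 * r_bar, 2),
          " LCL =", round(x_bar - A2 * r_bar, 2))
    print("range chart: UCL =", round(D4 * r_bar, 2),
          " LCL =", round(D3 * r_bar, 2))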
INITIAL STUDIES IN PLANT A
The first set of studies was made in Plant A, which employed up-to-date procedures of production and managerial control. There were about nine hundred production employees on a full-time basis, mostly women engaged in a variety of sewing operations. These employees were covered by a liberal collective bargaining contract, and they otherwise enjoyed an extremely cordial and cooperative relationship with management. It seemed clear from these and other factors in the work environment that primary standardization had been achieved with respect to both types of dominant variables.
1 See, for example, Burr, Engineering Statistics and Quality Control.
[Mean chart and range chart for local cycle times plotted against sample number; the legible values include a mean-chart lower limit of 38.5 and, on the range chart, R̄ = 20.1, UCL = 53.3, and LCL = 0.]
FIGURE 8. THE MEAN AND RANGE CHARTS ON LOCAL CYCLE TIMES FOR OPERATION 21
locally stable. There was some doubt, however, about the stability of the means on account of a series of five high points, one above the upper limit criterion. There was also some evidence of a positive correlation among the means; this was confirmed by a significant ratio-test value of 1.30.

This example helps to establish that a limited number of samples does not give conclusive information regarding statistical stability. In such cases more confidence is justified in a conclusion of an assignable cause than in a conclusion of stability. Thus in the present case, it could well be argued that four high points, added to one outside point, justify the conclusion of an assignable cause.

Other Studies. A number of other hand-pressing operations were also studied. The studies showed that there was local stability in only 50 percent of the cases. Ratio tests were also made on the original readings
in each case. By and large the test results indicated the presence of positive correlation. Three representative results are 0.12 for Operation 23, 1.44 for Operation 24, and 1.61 for Operation 25. The first two of these are clearly significant.
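The ratio test is not defined in this chapter. Assuming it is the mean-square-successive-difference ratio, for which values near 2 are expected from random series and markedly smaller values point to positive correlation (in line with the values just quoted), the computation would run as follows; the readings are invented.

    # Ratio of the mean square successive difference to the variance.
    def ratio_test(xs):
        m = sum(xs) / len(xs)
        msd = sum((xs[i + 1] - xs[i]) ** 2
                  for i in range(len(xs) - 1)) / (len(xs) - 1)
        var = sum((x - m) ** 2 for x in xs) / len(xs)
        return msd / var

    trending = [40 + 0.5 * i for i in range(20)]   # strongly correlated
    print(round(ratio_test(trending), 2))          # well below 2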
CHAPTER EIGHT
Interpreting Short-Term Studies
According to conventional limit criteria, the production rates for almost all the operations of Plant A were found to be locally stable with respect to both means and ranges. On the other hand, 50 percent of the operations of Plant B were locally stable, and in most of the remaining 50 percent, the means were responsible for the instability. Apparently local stability can exist in individual operations even when the production and managerial control procedures are poor. However, the existence of local stability on a broad basis does seem to depend on how effective the control procedures are. This experimental result tallies with the axiom that some measure of primary or gross process standardization, including control procedures, is needed to establish a standardized situation at the local level. In practice, this means that valuable information can be obtained from local (and grand) studies on whether there is gross standardization and, if so, whether it can be considered optimal.

Indicators of Optimality. The question of whether the standardizing process is optimal can be evaluated in a relative sense with the aid of X̄ and ŝ estimates. This is illustrated by the data in Table 6, which gives estimates for two operations in each plant. The operations were selected on the basis of having comparable X̄ values. This normalizing process was used because consistency of performance turns out to be a function of level of performance.
TABLE 6. PRODUCTION-RATE DATA FOR COMPARING OPERATIONS IN THE TWO PLANTS

                                     Plant A          Plant B
Operation                          3       2        22      21
X̄ estimate                        83.5    121.9     97.1    115.4
ŝ estimate                         4.7      6.3      6.1     11.9
Coefficient of variation (%)       5.6      5.2      6.3     10.3
To view the problem in more fully normalized terms, coefficient-of-variation estimates are also given in the table.

It should be emphasized that normalizing in terms of magnitude does not make the operations fully comparable. Many other factors would have to be taken into account for the normalization to be fully effective. Despite this unavoidable confounding, however, the coefficient values for the two Plant B operations are substantially greater than those for the two Plant A operations. This suggests that consistency of performance is an inverse function of the extent of standardization in the work environment.

Temporal Characteristics. In several of the local studies the mean cycle times had a tendency to increase as the work day progressed. There also was some tendency for the consistency of the cycle times to decrease under the same conditions. These characteristics were not nearly as pronounced or as general, however, as they have been reported to be by certain investigators. These investigators include May Smith and J. Goldmark (and others) with respect to level of performance and James Weinland with respect to consistency of performance.1 In the absence of any other identifiable factor, the difference between the two sets of results must be attributed to the fact that delays were not included in the present studies. Time of day apparently does not have nearly as important an effect on direct work activity as is generally believed; what it does affect is the number and intensity of delays. The study of net production rates therefore has the vital empirical advantage of isolating factors responsible for variation in work performance.

Consistency of Performance. With net production data it becomes possible to look into a hypothesis of Loveday and Munro, which is that highly skilled workers have more consistent (gross) production rates than unskilled workers.2 To do this a representative set of Plant A studies has been divided into two groups. Three studies whose means and ranges exhibited noticeable upward shifts make up Group 1; three studies with little evidence of such shifts make up Group 2. The work experience of the operators is recorded in Table 7.

1 Smith, Some Studies in the Laundry Trade; Goldmark and others, Studies in Industrial Psychology: Fatigue in Relation to Working Capacity: 1. Comparison of an Eight-Hour Plant and a Ten-Hour Plant; Weinland, "Variability of Performance in the Curve of Work," Archives of Psychology, Vol. XIV, No. 8 (1927).
2 Loveday and Munro, Preliminary Notes on the Boot and Shoe Industry.
The table also gives the corresponding X̄, ŝ, and ŝ/X̄ estimates, i.e., coefficient-of-variation estimates.

TABLE 7. THE RELATION BETWEEN SKILL AND CONSISTENCY

GROUP 1
Experience (months)       12       2        3
X̄                         55.5     170.2    76.5
ŝ                         4.5      7.3      4.7
ŝ/X̄ (%)                   8.1      4.3      6.1

GROUP 2
Operator                  1D       1H       3L
Experience (months)       30       19       18
X̄                         44.3     44.0     79.3
ŝ                         3.3      3.5      4.2
ŝ/X̄ (%)                   7.4      8.0      5.3
It is clear, of course, that length of experience is neither a quantitative nor a complete indicator of skill. But it can be assumed that the two groups of operators belong in qualitatively different skill categories comparable to those considered by Loveday and Munro. Under this assumption two conclusions seem clear: (1) the relatively unskilled operators in Group 1 exhibit marked upward shifts in means and ranges; (2) the more highly skilled operators in Group 2 exhibit little evidence of shifts. With these findings the Loveday and Munro hypothesis that skilled workers have a greater absolute consistency can be extended to net production rates.

The present findings bring out an added and more subtle fact: coefficient-of-variation values depend much more on the operation than they do on skill (as indicated by length of experience). However, they do seem to correspond to skill rankings within a given operation, a point considered in more detail in Chapters 17 and 18.

Comparing the Performances of Novices. Studies on local stability can
also be used to look into the related finding of Isabel Burnett and C. A. Mace that novices differ in their mean (gross) production rates.3 Specific studies were made on Operation 13 and on three operators, each with only one month's experience; among them is Operator 13A, whose work is considered in Chapter 7. In all three cases the production rates were found locally stable in chart studies; this justifies making a comparison of their production-rate characteristics.

3 Burnett, An Experimental Investigation of Repetitive Work, and Mace, Incentives: Some Experimental Studies.
TABLE 8. THE PRODUCTION-RATE CHARACTERISTICS OF THREE NOVICES

                        OPERATOR
                  13A      13B      13C
Readings (n)      126      128      144
Sizes             34       36       36
X̄ estimate        93.3     141.8    99.4
ŝ estimate        4.8      8.0      6.5
ŝ/X̄ (%)           5.14     5.50     6.54
The estimates given in Table 8 demonstrate that novices do differ substantially in their net production rates and that the differences show up both in level and in consistency. The table also shows that relative consistency, as measured by the coefficient values, is essentially the same for the three workers. This supports the view that relative consistency appears to be more closely related to the operation than it is to the worker.

DELAYS AND LOCAL PRODUCTION RATES
Delays and Productivity. Isolating delays from production rates also makes it possible to examine a hypothesis suggested by Elton and by Wyatt, Stock, and Frost.4 These observers report that workers who produce more (in the gross sense) also have fewer and shorter delays than other workers. To look into this hypothesis, the relevant data for a representative group of operators from Plant A have been recorded in Table 9. The ratios of the published production standards to the mean cycle times are used as a rough-and-ready index of the relative productivity of workers on different operations. Though not completely satisfactory for this purpose, the ratio values do seem to provide usable guides, especially since they essentially correspond to the degree of experience of the operators. According to this criterion the proportion and duration of delays seem directly linked to productivity. In particular, operators with relatively low productivity values had a far greater proportion of delays and spent much more time unproductively.

4 Elton, An Analysis of the Individual Differences in the Output of Silk-Weavers, and Wyatt, Stock, and Frost, Incentives in Repetitive Work.

TABLE 9. THE RELATION BETWEEN DELAYS AND NET PRODUCTION RATES

                                          OPERATOR
                           1D     1H     --     --     13A    13B    13C
Experience (months)        30     19     6      5      1      1      1
Readings (n)               212    258    146    129    126    128    144
X̄ estimate                 44.3   44.0   121.9  63.6   93.3   141.8  99.4
Standard (X*)              95.0   95.0   214.0  132.0  161.0  161.0  161.0
Ratio (X* to X̄)            2.14   2.16   1.76   2.08   1.73   1.14   1.62
Unavoidable delays
  per garment              0.12   0.17   0.23   0.22   0.24   0.31   0.31
Personal delays
  per garment              0.05   0.05   0.08   0.11   0.30   0.34   0.52
Total delays per garment   0.17   0.22   0.31   0.33   0.54   0.65   0.83
Average time lost
  (minutes per garment)    0.05   0.05   0.11   0.15   0.17   0.18   0.34
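The indexes in Table 9 are simple quotients; a sketch of the arithmetic for the 13A column follows, with the raw delay counts back-figured from the published rates and therefore only assumed.

    # Productivity and delay indexes of the kind tabulated in Table 9.
    standard, mean_cycle = 161.0, 93.3     # units as printed in Table 9
    garments = 126
    unavoidable, personal = 30, 38         # assumed raw delay counts

    print("ratio (standard to mean cycle) =", round(standard / mean_cycle, 2))
    print("unavoidable delays per garment =", round(unavoidable / garments, 2))
    print("personal delays per garment    =", round(personal / garments, 2))
    print("total delays per garment       =",
          round((unavoidable + personal) / garments, 2))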
The stated hypothesis with respect to gross production rates can therefore be extended to net production rates.

Related Conclusions. Table 9 also suggests a number of sharper conclusions. It shows that productivity is inversely related to personal delays, i.e., delays within complete worker control, and that these delays can be minimized by expert workers. To some extent expert workers also seem to be able to reduce unavoidable delays. Apparently unavoidable delays are partially within the control of the worker, but only in a preventive sense. The occurrence of unavoidable delays cannot easily be traced to the worker, but a reduced incidence of unavoidable delays can. An expert sewing-machine operator, for example, can reduce the incidence of broken needles even though there must be some broken needles.

The fact that unavoidable delays can be prevented in some instances suggests that work performance improves sequentially. Personal delays, because they are completely within the control of the worker, are much easier to eliminate, and improvement in this area can be expected to take place somewhat earlier than in the area of unavoidable delays. With unavoidable delays the worker must modify the external characteristics of the work environment by exercising preventive skills. These skills are of a higher order than those involved in eliminating self-imposed delays, and they would be expected to develop later in the learning process. For example, broken needles must always be expected, but only
a minimum number of personal delays need be expected of an experienced worker. Indeed, Table 9 indicates that this is exactly the case; experienced workers are able to reduce personal delays to a much greater degree than unavoidable delays.

In later chapters considerable evidence is presented showing that the proportion, duration, and type of delays encountered is largely a function of the operator's work characteristics, at least with man-controlled operations. A striking example is provided by Operator 13C, whose delay data are summarized in Table 9. A survey of the original data sheets showed that she had numerous characteristic delays; these included flexing fingers, looking at fingernails, and periodically riffling through the pile of finished items. In fact, 34 of the 130 delays counted for this operator were of this nature; this helps to explain the extremely high proportion of personal delays in her case.

The example clearly indicates that a program intended to minimize delays and, eventually, to optimize production should concentrate on the delay habits of individual workers, particularly with respect to personal delays. Formal training programs cannot do much, however, about reducing unavoidable delays. The ability to do this must be developed from an informal and perhaps partly unconscious learning process, such as when work procedures created by expert workers are adopted by other members of a work group. This learning process is an aspect of skill development; it is based on the skill potential of individual workers, defined in terms of their abilities and developed in terms of their purposes.

This whole argument is based on the assumption that the delays considered here are undesirable. This is established by the fact that when their incidence is reduced the result is superior work performance. But delays or, more properly, non-productive activities can also be desirable in the sense of stabilizing the total work activity. Such desirable delays, so to speak, would be introduced into the work activity rather than removed. This subject is discussed more thoroughly in Chapter 16.

This fundamental distinction is not usually made in the literature, where delays are considered generally undesirable. The conclusion reached here does extend the hypothesis developed by Elton and by
Wyatt, Stock, and Frost, but only in the sense that it is concerned with undesirable delays. The same comment applies to the finding of Barnes, Perkins, and Juran considered immediately below.5 None of these investigators made a distinction between undesirable and desirable delays. They would have had no way to do this, even if they had been so inclined, because they considered only gross production rates. Also, they paid no attention to the question of stability, and, without this, sharp conclusions cannot be obtained.

5 Barnes, Perkins, and Juran, A Study of the Effect of Practice on the Elements of a Factory Operation, p. 71.

The Delay Situation with Novices. On the basis of a series of laboratory studies, Ralph Barnes, J. S. Perkins, and J. M. Juran found that the improvement in (gross) production rates by novices is primarily the result of the ability to reduce delay time. This finding was checked by looking into two sets of data on Operator 13A covering an interval of four months; these data are reproduced in Table 10.

TABLE 10. DATA SHOWING HOW A REDUCTION IN DELAYS AFFECTS PERFORMANCE

                                       EXPERIENCE
                                One Month    Five Months
Readings (n)                    126          53
X̄ estimate                      93.3         72.4
Unavoidable delays
  per garment                   0.24         0.13
Personal delays per garment     0.30         0.19
Average time lost
  (minutes per garment)         0.17         0.15
The evidence here is too limited to yield a decisive conclusion. But it does confirm that improved performance is associated with a reduction in the proportion and duration of delays, both personal and unavoidable. This is a somewhat more informative finding than that of the Barnes group. It is based on net production rates observed under factory conditions, and it shows that both personal and unavoidable delays are involved. Even more significant is the fact that the proportion of delays is reduced much more substantially than their duration. This must mean that initial improvement is largely the result of eliminating short and
more subtle delays. These are precisely the delays, both personal and unavoidable, that are likely to be within the worker's control.

Table 10 also shows that, when delays are separated out, sharp estimates can be made of the absolute amount of improvement in net production rates. In the present case the difference between 93.3 and 72.4 represents the improvement component.

REGULATION OF PERFORMANCE
Regulation of Output. Table 9 brought out the fact that many of the operators in Plant A were highly productive, some of them even having productivity values exceeding 200 percent. Clearly the common practice of overtly depressing output to protect lenient rates and conditions was not followed in this plant. Yet there is considerable evidence that the workers in the plant did plan and regulate output. For example, one operator stated that she and her companion operator had decided that each of them would produce an average of six lots of garments daily. Also, they never finished more than three lots in the afternoon, although they did produce as many as seven lots on some days. It was explained that, with this production policy, a satisfactory earning level could be achieved without undue exertion.

The mechanics of output regulation is revealed by an incident involving another operator. When preparations were being made for a study, this operator was working on an operation different from the one scheduled for study. She became extremely disturbed when asked to make a change. The department supervisor later said that most operators carefully planned daily work schedules in terms of a definite number of lots on a definite operation. They also paced themselves during the day according to these schedules. They objected vigorously when asked to change operations, claiming that this disturbed their schedules and threw them "off their stride."

These are typical examples of how workers regulate output according to ability and purposes. Regulation, then, is a direct outcome of the stance adopted by individual workers in what might be characterized as a game between them and the work environment. This game is an extension of the much more overt game between workers and time study engineers on production standards. The present game is much more subtle, with the work environment acting as the worker's adversary. Although it is probably not consciously developed, the stance of
the worker represents an attempt to arrive at an optimum balance between his requirements and those of the work environment. In this framework what is called restriction of output is simply a special and overt case of regulation of output. It occurs when the game with respect to production standards dwarfs in importance the game between the worker and the general work environment. Regulation becomes restrictive when the worker's dominant work requirement is to protect a lenient production standard (or to protest a stringent one).

Empirical Evidence. The hypothesis that workers ordinarily regulate output in this manner is supported by a large body of empirical evidence. For example, they slacken their work pace when they are ahead of their daily quota and quicken their pace when they fall behind. Workers also adjust their pace with respect to smaller production units, such as lots and even sublots. The studies of local stability show, for example, that many workers in the two plants produced the last portion of a lot at whatever pace was needed to complete the lot according to schedule. Numerous workers, especially in Plant A, divided lots into sublots of about a dozen garments each, which were then worked on as integrated units. Others worked in terms of even smaller sublots. Worker 27, for example, picked up about a dozen garments at a time and worked on them in units of four each.

A related phenomenon was common in both plants. When difficulty was encountered because of, say, poor materials, the worker would attempt to compensate for the lost time by speeding up on the following group of garments. Work pace was adjusted even within individual cycles. Numerous cases were observed in which a prolonged delay in one part of the cycle prompted the worker to exceed the usual pace in subsequent parts of the cycle.

Changes in Work Method. The studies of local stability also showed that most workers vary work methods from time to time. In some cases, for example, it was discovered that two sewing "runs" were used to complete a sewing component though one "run" was usually sufficient. In other cases workers would vary the method by introducing apparently extraneous elements, such as moving the pile of finished garments slightly or brushing away little pieces of almost nonexistent lint. Operator 13A gave a slight tug at the "plush roll" (used in her operation) once every four or five garments, presumably to make it taut. Similarly, Operator 13B periodically snipped thread ends and pulled the garment
taut prior to sewing. In these, as in most other cases, the elements introduced were essentially unique to individual operators; each of them had a different way of introducing changes in work method. Changes in work methods were much more common and less subtle in Plant B because there the basic work methods were essentially developed by the workers themselves. While repositioning a garment for further pressing, for example, certain hand pressers occasionally kept the iron in the hand instead of putting it back on the stand. Numerous workers in both plants also varied the method of putting away finished garments. In Plant A, for example, the prescribed method was to pick up a new garment before putting the old one away. Certain workers, however, seemed to have psychological difficulty with this method, and they would complete one cycle before beginning the next one.

Statistical Evidence. Considered collectively, the empirical evidence clearly shows that, in man-controlled operations, serial correlations are quite likely to exist, both among individual readings and among groups of readings. The direction and intensity of correlation depend upon the performance characteristics of individual workers, varying from time to time even for the same worker. These conclusions are supported by the often significant ratio-test results. The correlations were particularly strong in Plant B because changes in methods were more frequent and more overt there. The test results also confirm that the correlations were fixed neither in direction nor in intensity.

PROBLEMS OF APPLICATION
Key Empirical Factors. The factors to be considered in making applications depend upon the characteristics of the immediate work environment. Another way of saying this is that it is impossible to write prescriptions for applications that will be useful in all work environments. It is possible, however, to obtain some guidance by examining how key factors in man-controlled operations can be taken into account in designing studies.

It seems clear from the local studies that serial correlations must be expected in man-controlled operations. These correlations could be eliminated only if work activity could be controlled completely by external means. It will become quite evident later, if it isn't already, that
complete external regulation is impossible. Even if it were possible, it seems highly doubtful whether complete external regulation would be at all desirable. A realistic regulating program requires that workers be given a certain amount of latitude in planning and developing work activities, if for no other reason than to take advantage of their skill potentials. In immediate terms this means that direct empirical action with regard to man-controlled operations should not be taken just because correlations exist. Criteria of stability are ordinarily considered to be criteria for taking direct action; in this sense the results of ratio and other tests of correlation should not be used as criteria of stability. They are indicators of stability, however, in that they show how workers plan and organize their work methods so as to stabilize production performance. In this sense they do provide valuable information about work practices which can be used for planning long-term action, such as training programs on work methods.

Selecting Criteria of Local Stability. The primary criteria of local stability, defined in the direct-action sense, should be based on the local work environment, particularly its economic and other requirements. In the present studies, for example, conventional limit criteria of product quality applications were tentatively adopted as primary criteria of local stability. These criteria seem to fit the characteristics and requirements of man-controlled operations of the kind considered. Extensive application is the acid test for any set of criteria; extensive application shows whether in the long run the criteria actually yield the results desired.

Criteria of stability should minimize the costs of making what statisticians call the two types of error.6 A Type I error is saying that stability does not exist when it actually does. A Type II error is saying that stability exists when it actually does not. In more descriptive language, the first error is looking for imaginary trouble, and the second error is not looking for real trouble.

6 Error considerations in statistical work are described in Hoel, Introduction to Mathematical Statistics.

Criteria of local (and grand) stability cannot minimize both errors at the same time because they are mutually contradictory. However, with effective criteria an acceptable balance can be obtained between the costs of making these errors, where cost is defined both in a behavioral
sense and in an economic sense. This is done by attaching empirically relevant weights to the risks of error.

Selecting Significance Levels. In ratio and other statistical tests there is the question of selecting a significance level, which is simply a common way of stating the risk of Type I error. In the ratio-test case a Type I error is made when random readings are considered correlated. A Type II error is made when correlated readings are considered random. The problem is that, while the risk of Type I error can readily be defined, the risk of Type II error usually is difficult to pin down in numerical terms. However, the risk of Type II error is related to the risk of Type I error (that is, the significance level), for an increase in the risk of Type I error decreases the risk of Type II error. It is also possible, and sometimes preferable, to reduce the risk of Type II error by increasing the sample size while retaining a given significance level.

When the second alternative is not feasible, the significance level can be manipulated to obtain an acceptable balance between the two types of errors. In the studies considered here, for example, the significance level was established at 10 percent. This rather large risk was taken in order to minimize the risk of making a Type II error. This is a particularly useful policy in research investigations, where reducing the risk of accepting a false hypothesis, that is, making a Type II error, is generally more important than reducing the risk of rejecting a true hypothesis, that is, making a Type I error.

An Inferential Problem. In many local studies the presence of a correlation was suggested by inspecting the data. When checked with ratio tests the inspection judgments were generally confirmed. There are both formal and empirical objections to testing data which prompted a testing of the hypothesis. The most important objection is that the test then becomes little more than a formal device for confirming the hypothesis. Nevertheless, this procedure can be of considerable experimental value, especially in exploratory research work. In such cases, though, conclusions should be considered only tentatively established until confirmed by fresh evidence.
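The balance between the two types of error can be illustrated numerically. Assuming normally distributed sample means and 3-sigma limits on a means chart, the Type II risk of missing a shift in the process mean falls as the sample size grows, while the Type I risk stays fixed by the limits; the figures below are illustrative only.

    # Type II risk of failing to detect a shift of d sigma-units in the
    # mean, for a means chart with 3-sigma limits and samples of n units.
    from math import erf, sqrt

    def phi(z):                            # standard normal distribution
        return 0.5 * (1 + erf(z / sqrt(2)))

    d = 1.0                                # shift, in sigma units
    for n in (3, 4, 9):
        beta = phi(3 - d * sqrt(n)) - phi(-3 - d * sqrt(n))
        print("n =", n, " Type II risk =", round(beta, 3))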
CHAPTER NINE

Production Rates in the Long Term
Most studies on work problems performed by investigators outside of time study have been concerned with the grand characteristics of gross production rates. In these studies, the basic production unit considered varies widely, though always in accordance with the nature of the operations. Elton, for example, uses a warp as the basic production unit in his study on silk weavers.1 In W. D. Evans's study of the productivity of cigar rollers, the production unit was the number of cigars rolled in a forty-hour week.2 In his study on hosiery workers, Joseph Tiffin used still another basic production unit: dozens of pairs produced per hour.3

Although the importance of this has been recognized by such investigators as these, the need for examining the grand characteristics of production rates is rarely considered in time study work. A painstaking search through time study literature revealed only two articles in which this question is even mentioned. The first was written by A. L. Kress, who points out that the "practice of taking a ten or fifteen minute series of readings can hardly be dignified by the term time study."4 "It would be good policy," he adds, "to take one study between nine-thirty and ten-thirty and another between three and five o'clock." The other article, by Ralph Langley, stresses the need for representative data.5 The article suggests that a large number of observations be taken at various hours of the day to achieve greater accuracy in time study work.
1 Elton, An Analysis of the Individual Differences in the Output of Silk-Weavers.
2 Evans, Individual Productivity Differences, p. 7.
3 Tiffin, Industrial Psychology, p. 7.
4 Kress, "Methods of Observation," Bulletin of the Taylor Society, XIII (1928), 141-43.
5 Langley, "Notes on Time Studies," Industrial Engineering, XIII (1913), 385-86.
ESTABLISHING GRAND STABILITY
In Chapter 6 the suggestion was made that production rates be examined on a grand basis when the variability among days and weeks is significant in either a statistical or an empirical sense. Many situations exist, for example, where only a relatively small proportion of the total variation in production rates is due to short-term factors. In such cases it might be unwise to make studies of local stability, though it would ordinarily pay to make studies of grand stability.

Defining Homogeneous Lots. In making studies of grand stability it is necessary to define a homogeneous production lot or stratum. The definitions developed for product quality applications are a valuable guide here; they show that only items produced under essentially the same conditions can be included in a homogeneous lot or stratum. In terms of the concepts of Chapter 6, a suitable definition might be that for a lot to be considered homogeneous, it must be standardized in the gross sense.

Though suggestive, this statement does not supply a procedure for determining whether lots are homogeneous. A final answer to the question can be obtained only after the fact by examining the results obtained from studies of grand stability. The existence of homogeneous lots can be considered verified if grand stability exists; it can then be concluded by arguing back that gross standardization also exists. However, the lot size must be determined before making studies of grand stability. It is reasonable in the beginning to define lots empirically in terms of strata suggested by accumulated evidence and other a priori considerations. Such a definition, however, is finally accepted only when homogeneity has been fully established by actual studies of grand stability. This is another example of the sequential nature of the process of inference-making and, in particular, the process of developing full definitions.

In making a decision on lot size, just as in all other problems of inference-making, two contradictory goals must be reconciled. A large lot is economical in terms of inspection and analysis, while a small lot is more likely to be homogeneous. An optimal result is achieved by finding the largest lot size for which homogeneity can be assumed.

The importance of homogeneity in sampling and analysis can readily be demonstrated. If a sample is taken from a single component of a
nonhomogeneous lot, it will not represent the other components of the lot and will therefore be biased. If a single sample is taken from different components, it will include items with different mean and standard deviation values. Biased results will again be obtained, particularly in the form of an inflated estimate of variability. A homogeneous lot, however, will give unbiased results with any kind of sample. This means, among other things, that the sample units can all be taken at the same time. The result is a minimal cost of observation and a maximal opportunity for detecting assignable causes of variation.
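The inflation of variability can be illustrated with a small simulation. All figures below are invented: two components with the same spread but different mean cycle times are pooled, and the standard deviation of the pooled sample is several times that of either component.

    import random
    import statistics

    random.seed(1)

    # Two hypothetical components of a nonhomogeneous lot: equal spread,
    # different mean cycle times (values invented for illustration).
    component_a = [random.gauss(50, 3) for _ in range(100)]
    component_b = [random.gauss(70, 3) for _ in range(100)]

    pooled = component_a + component_b  # one sample cutting across components

    print(statistics.stdev(component_a))  # about 3
    print(statistics.stdev(component_b))  # about 3
    print(statistics.stdev(pooled))       # about 10 -- inflated by the mean gap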
THE NATURE OF THE STUDIES

For reasons given in Chapter 7, it was possible to make studies of grand stability only in Plant A. Seven key operations were selected for study, including Operations 1 through 3, on which studies of local stability also were made. These operations were selected to cover relatively old and relatively new operations and, also, groups of from one to twelve workers. In that way a representative picture was obtained of the operations and operating conditions existing in the plant.

The Lot Size and Sampling Procedure. The sampling procedure provided that a sample be taken on one operator selected at random from each operation. This process was duplicated at random times in each of the four daily production lots or strata considered homogeneous by the plant's Methods Department. These strata had the added virtue of corresponding to the work periods defined by earlier investigators, such as H. G. Maule.6 The duration of these periods and the terms often used to designate them are given in Table 11. However, these terms refer to certain characteristics of gross production rates. Using a similar breakdown in the studies of grand stability made it possible to determine whether net production rates have the same characteristics.

TABLE 11. THE PERIOD BREAKDOWN USED IN THE STUDIES OF GRAND STABILITY

Period   Duration             Characteristic
1        9:00-10:30 A.M.      Morning spurt
2        10:45-11:55 A.M.     Morning decline
3        1:00-2:30 P.M.       Afternoon spurt
4        2:45-4:30 P.M.       Afternoon decline

6 Maule, "Time and Movement Study in Laundry Work," Human Factor, X (1936), 351-59.
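The sampling plan just described (one randomly selected operator, observed at a random time within each of the four periods of Table 11) can be sketched in a few lines. The period boundaries come from Table 11; the minute-based representation, the function name, and the operator codes are illustrative assumptions, not the plant's actual scheduling device.

    import random

    # Work periods from Table 11, in minutes past midnight.
    PERIODS = {
        1: (9 * 60,       10 * 60 + 30),   # 9:00-10:30 A.M., morning spurt
        2: (10 * 60 + 45, 11 * 60 + 55),   # 10:45-11:55 A.M., morning decline
        3: (13 * 60,      14 * 60 + 30),   # 1:00-2:30 P.M., afternoon spurt
        4: (14 * 60 + 45, 16 * 60 + 30),   # 2:45-4:30 P.M., afternoon decline
    }

    def daily_schedule(operators):
        """One random operator at one random time in each period."""
        schedule = []
        for period, (start, end) in sorted(PERIODS.items()):
            minute = random.randint(start, end)
            schedule.append((period, "%02d:%02d" % divmod(minute, 60),
                             random.choice(operators)))
        return schedule

    # Hypothetical operator codes for a single operation.
    print(daily_schedule(["1A", "1B", "1C", "1D"]))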
In mid-morning and in mid-afternoon an official rest period of 15 minutes was made available to the workers. These rest periods helped to provide logical boundaries for the four work periods. To obtain results representing characteristic performances, there were no observations between 8:00 and 8:45 A.M. and between 4:30 and 5:00 P.M. The first interval corresponds to the time when the day's work was being organized, the second to the time when repair and clean-up work was being done.

The Sample Size. A sample size of five units was adopted in the studies of grand stability. Like the lot size problem, the sample size problem entails finding an optimal balance between two contradictory goals. Large sample sizes, it will be recalled, have desirable analytical properties, such as means with an approximately normal distribution. Small samples have the advantage that significant variations in production rates, even within a small group of units, can readily be detected. This problem, however, is not nearly as important in grand studies as it is in local studies, principally because successive samples no longer represent adjacent production items made by a single operator. For that reason greater weight was given to analytical advantages and the somewhat large sample size of five units was adopted.

Empirical Factors. The proposed sampling procedure differs markedly from time study procedure, which ordinarily consists of taking a continuous but limited set of readings on just one worker at some arbitrary time. This made it necessary to explain the sampling studies to supervisors and workers alike. The workers seemed quite receptive to the sampling procedure; they were especially pleased with the short time required to take a sample. This was one of the reasons little attention was paid to the observer after the studies were under way. On the basis of this experience, most workers can be expected to be more receptive to sampling studies than to time studies. Receptivity and bias go hand in hand, and the sampling procedure also minimizes biases. These biases take on positive and negative forms introduced, sometimes unconsciously, by workers who know their work is going to be observed. Another focal advantage of the proposed procedure is that it gives representative results for the operation as a whole under diverse working conditions.

In the beginning the sampling procedure is distinctly more difficult to apply than the time study procedure. This is to be expected since the
observer must become thoroughly familiar with all the operations in the series to be covered. An excellent way for the observer to familiarize himself with the situation at hand is to make a pilot study covering a day or two, especially when several operations in the series have similar components. Once the operation components have become familiar, the sampling procedure has several important advantages: (1) the observer is less likely to make observational errors in a small sample; (2) the observer has a better opportunity to study work methods; (3) there is little danger of taking observations under nonrepresentative conditions.

The studies proper covered a five-week interval. In that interval observations were made on sixteen days, with every working day included at least once. On certain days and during certain work periods, however, some of the operations were not on the production schedule; the result was an unequal distribution of samples. At the time of the studies, the plant was making permanent changes in styles. Also, standard broadcloth and satin fabrics were gradually being replaced by nylon and other light fabrics needed for the seasonal summer trade. Because of this, many workers were being reassigned to different operations; this made it necessary to modify the formal sampling schedule and observe some workers somewhat more often than others.

STUDIES ON OPERATION 1
A full set of 64 samples (of five readings each) was taken on Operation 1 which, it will be recalled, was made up of 9 elements. These samples represented the performances of eight operators, 1A through 1H, working on garments ranging in size from 32 to 42. All operators used a standard power sewing machine except Operator 1A, who used a high-speed machine. Though element times were also recorded, only cycle times were analyzed in the studies of grand stability. In the present case the cycle means and ranges are plotted in time order in Figure 9; the operators are identified by letter, and the two samples from Operator 1A are marked separately.

The limit values on the range chart were computed from the conventional (3-sigma) formula given in Chapter 7. Only 2 of the 64 ranges fall above the upper limit criterion, both representing the work of a novice, Operator 1F.
[FIGURE 9. THE GROUP MEAN AND RANGE CHARTS FOR OPERATION 1, SHOWING BOTH THE OUTER 2s* AND THE INNER 3-SIGMA LIMITS FOR THE MEANS]
Under these conditions the ranges can be considered stable, indicating that the operators in the group had essentially the same degree of (cycle) consistency.

Criteria for Evaluating Means. The inner limits on the chart for means were also computed from the conventional (3-sigma) formula given in Chapter 7. These limits clearly are useless as criteria of grand stability. According to them most of the plotted means would represent assignable variations. To use these limits as a direct guide to empirical action would certainly be unrealistic; it would mean, among other things, reconstructing the work characteristics of most workers.

Conventional limit criteria are based on sample ranges, which yield an estimate of variability within individual samples. Such an estimate is utterly unsuitable for developing criteria of stability in cases where widely varying inter-sample performance levels must be expected. Worker differences, for example, show up among samples, and this can only be taken into account by using limit criteria based on variability among samples.

The Problem in Product Quality Applications. Problems of this kind also turned up in certain product quality situations. There the problem was solved by using limit criteria based on an estimate of variability
developed from differences between successive means. The basic formula is

$$\bar{\bar{X}} \pm k s^*,$$

where

$$s^{*2} = \frac{\delta^2}{2} \qquad \text{and} \qquad \delta^2 = \frac{1}{n-1}\sum_{i=1}^{n-1}\left(\bar{X}_{i+1} - \bar{X}_i\right)^2.$$
In this formula k is the multiple of s* considered appropriate in the immediate environment; in the present studies k was given a value of two for reasons outlined below. The δ² formula, it should be recalled, is identical in form with the numerator of the ratio test described in Chapter 7. Here it gives an estimate of variability based on differences between successive means.

The use of this approach is illustrated by an ordnance problem in which Leslie Simon encountered a situation similar to the one described here.7 The ranges were found to be quite stable, but 10 to 40 percent of the sample means fell outside the conventional limits. An investigation revealed what Simon called a "systematic error" among samples. The "systematic error" was the result of a number of external factors, such as changes in wind velocity, which could not readily be eliminated. With the somewhat less stringent limits offered by the s* approach, the "systematic error" was permitted to remain in the process as an expected source of variation.

Another application of the s* approach was made in Australian industry, again in an ordnance problem.8 Essentially the same kind of situation was encountered here, with more than 50 percent of the plotted means falling outside the conventional limits. It was found that a "systematic error" was being introduced among days. Removing the causes of that "systematic error" would have required an extremely costly revamping of the process; this, too, was permitted to remain in the process by the less stringent s* limits.9 Only one sample mean fell outside the s* limits; this represented a source of variation over and

7 Simon, "Application of Statistical Methods to Ordnance Engineering," Journal of the American Statistical Association, XXXVII (1942), 313-24.
8 Quality Control Lectures, pp. 138-40.
9 An alternative to the s* method is the (less efficient) method of mean moving ranges described in Hoel, "The Efficiency of the Mean Moving Range," Annals of Mathematical Statistics, XVII (1946), 475-82.
above what was expected because of the "systematic error," and it was therefore removed.

The Problem in Work Measurement. In the work measurement area the "systematic error" among samples is largely due to the widely varying mean (cycle) times of different workers. This helps explain why successive sample means, as illustrated in Figure 9, also vary widely. There is still the question, however, of why the s* approach is better than other approaches also taking the variation among sample means into account. It might be appropriate, for example, to use limit criteria based on the formula

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\bar{X}_i - \bar{\bar{X}}\right)^2.$$

In form this formula is identical with the denominator of the ratio-test formula but now, of course, it is applied to sample means. The reason the s* approach was adopted in the studies of grand stability is that, in most of the operations, a positive correlation was found among sample means. In that case the s* approach yields a somewhat smaller estimate of variability than its principal competitor, the S approach. The s* approach thus yields somewhat less stringent limit criteria than the conventional R̄ approach but somewhat more stringent limit criteria than the S approach. This means that the s* method permits some inter-sample variation to remain in the process, but not nearly as much as the alternative S method. In statistical language this says that, given a certain risk of Type I error, s* limits involve a smaller risk of Type II error than S limits.

By arguing along similar lines the S approach can be shown to be preferable when the correlations are predominantly negative. In that case the S value would be somewhat smaller than the s* value, and it would lead to more stringent limit criteria and a smaller risk of Type II error. In practice, a negative correlation would not often be expected in work measurement problems. It could well happen, however, such as when samples on extremely good operators are regularly followed by samples on extremely poor operators. But even this might have no ultimate significance. A positive correlation could easily exist among work periods, for example, and the correlations might well balance out.
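In computational terms the s* criteria are mechanical. The sketch below follows the formula as reconstructed above (δ² from successive differences between sample means, s* taken as the square root of δ²/2, and limits at the grand mean plus or minus k s*), with k = 2 as in the present studies; the sample means fed to it are invented for illustration.

    import math

    def s_star_limits(sample_means, k=2):
        """Limit criteria based on successive differences between means."""
        n = len(sample_means)
        grand_mean = sum(sample_means) / n
        delta_sq = sum((sample_means[i + 1] - sample_means[i]) ** 2
                       for i in range(n - 1)) / (n - 1)
        s_star = math.sqrt(delta_sq / 2)
        return grand_mean - k * s_star, grand_mean + k * s_star

    # Invented cycle-time means, for illustration only.
    print(s_star_limits([57.0, 61.5, 55.2, 63.0, 58.8, 60.1]))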
The sampling procedure can have one of three possible effects on the extent and direction of correlation. It can introduce a negative correlation, as in the example just cited. This may be difficult to avoid if, say, just one good operator and one poor operator are assigned to an operation. The sampling procedure can also introduce a positive correlation when samples are taken on successively better or poorer operators, i.e., when a trend is created. This also may be difficult to avoid under certain operating conditions. Ideally, the sampling procedure should contribute neither to a positive correlation nor to a negative correlation. This ideal is approximately satisfied whenever samples can be taken at random from a large number of operators with well-distributed mean production rates. In any event the correlation contributed by the sampling procedure must be considered in evaluating the total correlation situation.

The Immediate Situation. In the present studies the sampling procedure apparently had all three effects at different times. However, the positive correlation due to periods and days dominated in most cases, so that the final result was either a positive correlation or no correlation. This helps to justify the use of limit criteria based on the s* approach.

The next problem was to find a suitable value for the multiplier k in the s* limit formula. In general, a decision on this question narrows down to a choice between the integers two and three, which correspond to 5 percent and 1 percent risks of making a Type I error. The integer two was selected here because it was felt that a 5 percent risk could be taken of concluding incorrectly that stability did not exist, i.e., making a Type I error. This made it possible to minimize the generally more important risk in research studies; this is the risk of concluding incorrectly that stability did exist, i.e., making a Type II error.

Results on Operation 1. The sample means obtained from Operation 1 gave an s* value of 8.94 and 2s* limits of 41.2 and 77.0. Figure 9 shows that the 2s* limits are much more realistic than the conventional limits, with only one exceptional sample mean. The variation among sample means can thus be considered essentially as the result of two principal components. The first represents random variations within samples; the second represents an expected correlation among sample means. According to the 2s* limit criteria, empirical action should be reserved for the fourth sample, which cannot be completely attributed to these components.
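The reported limits can be checked by direct arithmetic. Taking the grand mean for Operation 1 as 59.1, the X̄ estimate later reported in Table 16, gives

$$\bar{\bar{X}} \pm 2s^* = 59.1 \pm 2(8.94) = 59.1 \pm 17.9,$$

which reproduces the published limits of 41.2 and 77.0 to the rounding shown.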
The ratio test was then applied to the sample means within days, but neither the ratio value for individual days nor the ratio value for the entire series turned out significant. Even so, a certain amount of positive and negative correlation did seem to exist. A number of ratio values were substantially smaller than 2, the value expected when there is no correlation, and a number were somewhat larger.10

Evaluating Individual Workers. The charts in Figure 9 are primarily intended to look into the question of grand stability for the operation as a whole. However, though workers are considered as random, internal variables in the group situation, a substantial amount of information can still be recovered about the long-term production characteristics of individual workers. To be sure, that information is not sharp, because it is confounded with the effect of other variables, such as materials and work periods. Also, any time order relationships that may exist in the performances of individual workers are largely obscured, since the samples were not taken in systematic order according to workers. This must be expected, however, since sharp estimates about internal variables cannot be obtained when the primary objective is to look into the question of grand stability for the operation as a whole.

With these qualifications it turns out that a respectable amount of information can be obtained by rearranging the group data according to the worker. That procedure was applied to the data for Operation 1, with the results shown in Figure 10. On this chart the sample results for each operator are plotted according to the sequence in which they were observed; the single sample values obtained on Operators 1B, 1D, and 1E are included for the sake of completeness.

On the chart for ranges there are only minor differences among the R̄ values of the operators; this result turned out to be characteristic of all the operations covered. More immediately, the uniformity of the R̄ values made it possible to average them in computing a set of conventional limits. On the chart for means there are substantial differences among the X̄ values of the operators. On the basis of these differences, separate limit criteria were computed for each operator, using individual X̄ values and the group R̄ value.

10 The application of an additive test for the existence of serial correlation in situations of this kind is described in Abruzzi, Work Measurement; the theory of the test is given in Baines, Methods of Detecting Non-randomness in a Given Series of Observations.
[FIGURE 10. THE MEAN AND RANGE CHARTS ON THE DATA FOR OPERATION 1 ARRANGED ACCORDING TO THE OPERATOR]
According to these limits the existence of grand stability is doubtful in a number of cases. In view of confounding effects, however, this conclusion must be considered tentative rather than decisive. It is clear from Figure 10 that conventional limit criteria are somewhat more realistic for individual operators than they are for a group of operators, largely because inter-worker variability is eliminated. It must always be remembered, however, that the confounding effect of materials, work periods, and so forth might be strong enough in particular cases to justify using less stringent criteria even for individual operators.

The Effect of Work Periods. The effect of work periods on production rates has been studied by a number of investigators, particularly S. Wyatt and E. H. Henshaw, P. Holman, and J. N. Langdon.11 These studies show that various work periods in the day have significantly different (gross) production rates, and that the (cycle) mean and variability both increase during the late afternoon. To check the situation with net production rates, the sample data obtained in the grand studies were rearranged according to the work period.

11 Wyatt, Variations in Efficiency in Cotton Weaving, and Henshaw, Holman, and Langdon, Manual Dexterity: Effects of Training.
[FIGURE 14. THE MEAN AND RANGE CHARTS ON THE DATA FOR OPERATION 2 ARRANGED ACCORDING TO THE WORK PERIOD, SHOWING BOTH THE OUTER 2s* AND THE INNER 3-SIGMA LIMITS FOR THE MEANS]
A study of Figures 11 and 14 shows that the means for Operation 2 vary much more than the means for Operation 1. To a large extent this is the result of the much larger grand mean value for Operation 2. Except for the question of magnitude, however, the main conclusions are the same in both cases. For example, the mean times for the second and fourth periods were, respectively, greater than the mean times for the first and third periods; this establishes that net production rates sometimes do decline in late morning and late afternoon.

STUDIES ON OPERATION 3
A full set of 64 samples was taken on Operation 3, which had 10 elements. These samples represented the performances of 12 operators, 3A through 3L, working on garments ranging in size from 32 to 40.
[FIGURE 18. THE MEAN AND RANGE CHARTS FOR OPERATION 6, SHOWING BOTH THE OUTER 3s* AND THE INNER 3-SIGMA LIMITS FOR THE MEANS. The legend distinguishes extra samples (EX) and the broadcloth or satin, nylon, and net fabrics.]
Six of the ratio values are significantly small. Some others are sufficiently close to the low critical values to justify the conclusion of a strong positive correlation. The three factors, therefore, did make a substantial contribution to the total variation. This result also suggests that planning and regulating activities show up more sharply in the case of one worker than in the case of several. In any event the expected component of variation among samples here dwarfs the expected component of variation within samples. For that reason it was decided to adopt a less stringent multiplier and use the integer 3 as the k value in developing s* limit criteria.

Other Studies and Tests. The 64 sample means and ranges obtained are plotted in time order in Figure 18, which also identifies size and style. According to the 3s* limits, which are given together with the conventional limits, the sample means could be considered stable. No adjustments were necessary in the criteria for ranges, where conventional limits again proved satisfactory. These limits show that the ranges were essentially stable, though two of the 64 range values do fall above the upper limit.
Since there was just one operator, it was a comparatively simple matter to make a number of relatively unconfounded studies. For example, the data for Style A were analyzed in terms of garment size. Conventional limits were developed, using the R̄ value for that style, and applied as criteria of stability. The ability to apply conventional limits to individual styles confirms that inter-style differences did make a strong contribution to the total variation. According to these criteria the means for sizes 32 and 34 were not quite stable, though the ranges were uniformly stable. The X̄ values for the four sizes were almost identical, however, and there were only small differences among the R̄ values. The four sizes could readily be considered equivalent for empirical purposes, especially when the confounding effect of periods, days, and so forth is taken into account.

STUDIES ON OPERATION 7
A full set of 64 samples was obtained on Operation 7, also made up of only two elements. The samples were obtained on a single operator working on four different garment sizes and four different styles. The means and ranges were plotted on charts and identified according to size and style. In this case the ranges could be considered essentially stable; only two points fell above the upper conventional limit. Following the precedent of Operation 6, 3s* limits were used on the chart for means. In terms of these criteria the means also were concluded to be essentially stable.

Evidence supporting the use of 3s* limits was again obtained from ratio tests on the period means within days; the pertinent data are recorded in Table 14.
TABLE 14. TEST RESULTS ON THE DAILY MEANS FOR OPERATION 7

Day    δ²      s²      Ratio
1      4.42    2.76    1.60
2      2.00    1.43    1.40
3      7.18    3.48    2.06
4      7.72    4.87    1.59
5      2.48    2.20    1.13
6      1.82    6.49    0.28*
7      3.48    2.55    1.36
8      18.38   6.36    2.89
9      0.58    0.97    0.60*
10     4.08    2.28    1.79
11     0.58    6.12    0.09*
12     0.38    0.20    1.90
13     3.26    1.59    2.05
14     3.80    8.07    0.47*
15     4.34    4.71    0.92
16     0.68    11.88   0.06*

* Significant ratio values.
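The ratio column of Table 14 is simply δ² divided by s² within each day, as the short check below confirms from the tabled values.

    # (delta-squared, s-squared) pairs for days 1 through 16, from Table 14.
    table_14 = [(4.42, 2.76), (2.00, 1.43), (7.18, 3.48), (7.72, 4.87),
                (2.48, 2.20), (1.82, 6.49), (3.48, 2.55), (18.38, 6.36),
                (0.58, 0.97), (4.08, 2.28), (0.58, 6.12), (0.38, 0.20),
                (3.26, 1.59), (3.80, 8.07), (4.34, 4.71), (0.68, 11.88)]

    for day, (delta_sq, s_sq) in enumerate(table_14, start=1):
        print(day, round(delta_sq / s_sq, 2))  # reproduces the Ratio row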
Five ratio values are significantly small, and most of the others are small enough to warrant a general conclusion of a strong positive correlation. This result justifies the use of 3s* limit criteria for reasons given in the case of Operation 6.

STUDIES ON POOLED DAILY MEANS
Ratio tests also were made on the pooled daily means of each of the seven operations considered. The object of these tests was to establish whether there was a positive correlation among these means. If so, there would be another strong argument for the s* approach in evaluating sample means in the studies of grand stability. The basic data were obtained by pooling the means for the four periods in each day. This pooling process was justified by the fact that almost all the charts on period means supported a conclusion of stability. The results are shown in Table 15.

TABLE 15. TEST RESULTS ON THE POOLED DAILY MEANS OF THE SEVEN OPERATIONS

Operation          1      2      3      4      5      6      7
Days covered (n)   16     15     16     15     15     16     16
Ratio              1.61   1.86   2.36   1.10*  0.90*  1.25   0.68*

* Significant ratio values.
There are three cases with significantly small ratio values, and all except one of the others are smaller than the integer 2. This implies that there was a strong positive correlation among days. The correlation, being a substantial component of the expected variation among means, gives added and decisive support to the use of the s* limit approach.
CHAPTER TEN

Interpreting Long-Term Studies
According to the studies of grand stability, a substantial component of the total variation in mean (cycle) times is contributed by inter-sample variation. This was why limit criteria were chosen in which the inter-sample component is allowed to remain in the process as an expected component. The specific use of the s* approach was based upon the fact that a predominantly positive correlation was found to exist. Under these conditions limit criteria based on the s* estimate of variability involve a smaller risk of Type II error than limit criteria based on the alternative S estimate.

Although conventional limits were unsuitable for evaluating the means on group charts, they proved acceptable for evaluating the ranges. Thus, the workers in a group have essentially the same consistency of performance, though they have widely different levels of performance. This is similar to, though much more significant than, the findings of other observers, such as B. R. Philip, who looked into this question in terms of gross production rates.1 Philip's studies, however, were confined to the laboratory, and they were concerned with a simple tapping task performed by students with presumably high motivation. The present finding, on the other hand, is based on studies of net production rates where delays were separated out. Also, these studies were made under factory conditions, on representative industrial operations, and on typical industrial workers.

1 Philip, "Studies in High Speed Continuous Work: I. Periodicity," Journal of Experimental Psychology, XXIV (1939), 499-510.
OTHER FINDINGS AND HYPOTHESES
Standardization of Work Methods. The results on means give added evidence that workers regulate their performance levels in terms of individual abilities and purposes. Similarly, the uniform consistency of the workers in a group suggests that this performance characteristic is a function of the gross work method used in an operation. Two factors seem to be at work: (1) a formal training process on work methods; (2) an informal training process where methods developed by certain workers are picked up by the others of a group. This is given strong support by related views presented in Chapters 7 and 8, suggesting that consistency of performance becomes stabilized somewhat earlier than level of performance, and that consistency, at least in a relative sense, depends much more heavily on the operation than on the worker. Altogether, these views have a highly important implication: uniform consistency in a homogeneous group of workers is a statistical indication that the gross aspects of work method have been standardized.

The studies of grand stability, together with certain studies of local stability, show that differences among workers at the cycle level show up primarily in terms of level of performance. This means that, within a gross and common work method, the minute aspects of work method are planned and organized by each worker. Work pace depends on the skill with which this is done, and differences in performance level simply reflect different modes of planning and organizing. (Chapter 8 shows that the organizing process also includes the handling of personal delays and even unavoidable delays.) When there is stability, consistency and level of performance refer to two different levels of standardization in the area of work methods. In the first case standardization is in the large, and it is common to all the workers in a homogeneous group. In the second case standardization is in the small, and it is unique to individual workers.

Inter-operation Relations between Level and Consistency. The studies on grand stability also bring out two relations between level and consistency of performance on an inter-operation basis. This is shown in Table 16.
TABLE 16. THE RELATION BETWEEN LEVEL AND CONSISTENCY IN GRAND PRODUCTION RATES

Operation      7      6      5      4      1      3      2
X̄ estimate     9.9    12.8   24.4   45.5   59.1   88.1   144.3
s estimate     1.7    2.0    2.7    2.9    3.8    5.1    6.0
s/X̄ (%)        17.2   16.0   11.1   6.4    6.4    5.8    4.2
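The relative-consistency row of Table 16 is just 100 s / X̄, as the check below shows; only Operation 6 differs slightly (15.6 against the tabled 16.0), presumably because the tabled s value is itself rounded.

    # (operation, X-bar estimate, s estimate) triples from Table 16.
    table_16 = [(7, 9.9, 1.7), (6, 12.8, 2.0), (5, 24.4, 2.7),
                (4, 45.5, 2.9), (1, 59.1, 3.8), (3, 88.1, 5.1),
                (2, 144.3, 6.0)]

    for operation, x_bar, s in table_16:
        print(operation, round(100 * s / x_bar, 1))  # the s/X-bar (%) row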
Apparently, s values, which measure absolute consistency, increase as the mean increases. At the same time the s/X̄ values, which measure relative consistency, decrease, though at a gradually declining rate. There seems to be a practical upper limit on relative consistency which is approached with operations having large X̄ values—provided, of course, they are standardized in the grand stability sense. In any event, production rates do become more variable when the mean production rate increases. Relative consistency also seems to be a function of the duration of the operation, but this increases rapidly under the same conditions. Similar results were obtained in the studies on comparative measurement methods, indicating that these two relations apply quite generally to work measurement data. Added evidence of this is provided by results of studies on elements and motions described in later chapters.

Work Patterns over Time. The studies on grand stability show that, except for the first two operations, the (net) mean production rates for the four work periods were about the same. Although less evidence was obtained on this, the studies also show that there was no substantial difference between the production rates of morning and afternoon. These results lend little support to the hypothesis of earlier investigators, such as Smith.2 This hypothesis, it will be recalled, states that the (gross) mean production rate increases substantially in late morning and late afternoon, with an over-all upward tendency as the work day progresses. In general, the studies of grand stability also fail to support the related hypothesis of Weinland and others that the variability of (gross) production rates increases substantially in late afternoon.3

The principal experimental difference between the present and the earlier findings is that the production rates considered here do not include delays. As Chapter 8 suggests, the reported differences in gross

2 Smith, Some Studies in the Laundry Trade.
3 Weinland, "Variability of Performance in the Curve of Work," Archives of Psychology, Vol. XIV, No. 8 (1927).
production rates must, therefore, be due to an increased incidence of delays.

These results also can be considered in terms of the findings on standardization of work methods. Viewed in these terms they indicate that workers, as individuals and as groups, do have stable work methods over time, both in the gross sense and in the minute sense. But stability is achieved and maintained only because there are sufficient opportunities for changes of activity, through delays, rest periods, and so forth, to prevent work methods from becoming unstabilized. To have stable work methods it is apparently necessary to establish a complementary and balanced relation between systematic work activities and nonsystematic work activities—a subject explored in detail in later chapters, particularly Chapter 18.

Redefining Lot Size. Another result of the grand studies is that lot or stratum size could be redefined to cover a full day's production. The rationale is that the various periods turned out to have essentially the same production-rate characteristics. This is an added example of the fact that definitions, such as of lot size, have a sequential character and should be finally accepted only after being verified in experimental terms.

Serial Correlations in Grand Studies. The positive correlations found in the studies of grand stability have two important implications. In an analytical sense they help to justify the s* approach for establishing limit criteria. In an empirical sense they provide useful information about long-term production-rate characteristics. A significant correlation among days, for example, brings out that the performance characteristics of an operation are integrated in a long-term pattern; they are certainly not made up of additive pieces that are essentially independent from one day to the next.

ADVANTAGES OF ESTABLISHING STABILITY
Establishing Grand Stability. Each of the advantages outlined in Chapters 7 and 8 for establishing local stability has a direct counterpart in the case of grand stability. One of the most important results of establishing grand stability, for example, is the possibility of making precise estimates, although the estimates now apply to the operation as a whole. These estimates can be used to help decide whether the process is standardized at an acceptable level. These estimates also can be used to
determine whether, in terms of cycle performance, the work method itself is standardized, both in a gross sense and in a minute sense.

Establishing Stability. It would be redundant to review all the other advantages of making grand studies. The most important advantage is making criteria available for deciding whether the process is standardized at one or more of the three levels. Local stability, it will be recalled, indicates that there is secondary standardization in the local and short-term sense, while grand stability indicates that there is secondary standardization in the grand and long-term sense.

In summary form the main advantages of looking into the question of stability are that: (1) experimental criteria are made available for isolating assignable causes interfering with the process of standardization; (2) with statistical stability and standardization, precise estimates can be made as guides to production performance and production policy. Production policy-making means making decisions as to whether the process is standardized at an acceptable level. If this is not the case, fundamental changes will be needed to bring about a more nearly optimal state of standardization. If well planned, studies of stability also provide a respectable amount of information about internal aspects of a process, such as work methods, the nature of output regulation, and so on; this information is likewise useful in establishing production policy.

Detecting Changes in Production Rates. In time study work, changes in method, material, or design create a problem because there is no systematic way to take them into account in the standards. This is why the problem is handled arbitrarily in practice, with different time study writers recommending procedures for its solution which are frequently contradictory. Carroll asserts, for example, that whenever the method is revised the production standard should be revised, though he doesn't show exactly how.4 On the other hand, Mundel suggests making occasional check time studies so that the effect of cumulated improvements can be incorporated into the standard. Mundel's criterion is that there should be a change in a production standard whenever the mean production rate is reduced by 5 percent or more, though he also fails to specify exactly how.5

Studies of grand stability solve the problem by the simple and eco-
4 Carroll, Time Study for Cost Control, p. 36.
5 Mundel, Systematic Motion and Time Study, p. 177.
nomical device of taking additional chart samples. The criteria of grand stability, supplemented where needed by appropriate tests, can then be used to decide whether there has been a significant change—not just a significant improvement. It is possible in this way to determine not only whether there has been a change in the level of performance, but also whether there has been a change in the consistency of performance. The question of what to do about the standards is settled in a later evaluation process; it should have no bearing on whether there has been a change in production rates.

Production Specifications. Specification-setting processes are thoroughly explored in Chapter 4, but it cannot be emphasized too strongly that production specifications, whether managerial or labor, are based on an evaluation process. The evaluation process must be distinguished from the estimation process. To be sure, prior estimates can be used to help develop or revise specifications but, at any given time, specifications should be compared to estimates to determine whether a process is acceptable.

The principal value of sharp estimates in the specification-setting process is that they indicate what is attainable. Too many specifications are attainable only in the mind of the specification-writer. S. Wyatt, J. N. Langdon, and F. G. L. Stock report, for example, that from 14 to 48 percent of the available work time in certain machine-feeding operations which they studied was not utilized, primarily because the workers were unable to keep up with the machines.6 This problem could readily have been avoided if realistic estimates had been available before worker assignments were made.

6 Wyatt, Langdon, and Stock, The Machine and the Worker: A Study of Machine-Feeding Processes.
CHAPTER ELEVEN

Designing Studies of Production Activity
Statistical stability is not a fixed state to be achieved once and for all by applying a simple and fixed set of rules. It is possible to define an indefinitely large number of levels of statistical stability and, of course, standardization. In practice, however, an optimal level is selected for a given problem after all the statistical and empirical factors are taken into account. An absolute state of stability is just as much an idealization as an absolute state of standardization.

Idealization is no problem for the mathematician, who can always postulate well-defined populations with well-defined properties. For him the postulate eliminates the fundamental problem, and he has no trouble showing that samples would also have well-defined properties. In real life, however, stability must always be less than absolute, which is the same as saying that it must be defined in empirical terms. An empirical definition is useful exactly to the extent that the real-life situation can be made to correspond to some idealized model. The criterion of success is the ability to make estimates with an acceptable degree of precision, while retaining the essential character of the real-life situation.

Once the basic postulate is established, the question of number of samples is solved in the ideal case by asking for infinitely many. The symbol for infinity may solve the problem for the mathematician, but it is no solution for the practitioner. He would like to hang on to the mathematician's coattails and specify an extensive number of samples, but he must also hang on to the fact that samples are costly. His refuge is intuition and experience. Experience in product quality applications indicates that a satisfactory judgment on stability can be
made with 25 samples. Intuition, aided by a modest number of applications, suggests the same rule-of-thumb for studies of local stability, where the consequences of incorrect judgments on stability are neither immediate nor deep. But the consequences of studies of grand stability are immediate and deep. This implies that at least 40 samples should be obtained before an initial judgment on grand stability is made.

The Continuity Problem. Even after stability is established, it is desirable to take periodic samples. By this means the process of attaining stability becomes continuous and self-correcting, yielding what can be considered the closest approximation to the idealized situation. In that process the frequency of sampling can gradually be reduced as more confidence can be placed in the conclusion of stability; the result is that standardization is maintained at an acceptable level with minimal costs.

Maintaining stability charts on a continuous basis requires that decisions be made on a number of problems, all centering around the frequency of sampling. These problems are solved by taking samples according to the expected frequency of assignable variations, and even more fundamental changes, as determined by recent chart histories. This suggests that additional samples should be taken somewhat more often on novices than on experienced operators. Additional samples should be taken in any case, for workers seem to keep learning even after extensive experience.1 However, the rate of improvement decreases rapidly with time, and the sampling interval can gradually be lengthened so that, eventually, only an occasional sample is needed.

Preserving Time Order. Different limit criteria were used in Chapters 7 and 9, but in all cases the sample data were arranged in time order. This is a natural consequence of the fact that in work measurement the fundamental parameter is time. It is time relationships that make it possible to identify assignable causes requiring direct action, and it is also time relationships that give information of long-term value, such as the nature of output regulation. These advantages simply would not exist if sample data were arranged according to magnitude.

Despite this fact, it is rather common practice to arrange data according to magnitude and make certain "goodness of fit" tests. Such tests, however, are completely useless for evaluating the question of stability. Like all tests which have a formal basis, "goodness of fit" tests are based

1 See, for example, Burnett, An Experimental Investigation of Repetitive Work, and Flügel, Practice, Fatigue, and Oscillation.
on the assumption that stability already exists. They can only show whether the stability takes a certain given form, which is really a trivial question once stability can be assumed.

Selecting Limit Criteria. In theory, limit criteria should be based upon the risks that can be tolerated of making the two types of error. In stability chart work, however, the risk of Type II error can be defined in numerical terms only when stability can be considered to exist. Here the assumption that stability exists is just as crippling as it is in the "goodness of fit" case. Assumptions also must be made about the nature of the underlying population and the specific alternatives to stability being considered. When all these assumptions can be made, limit criteria based on nearly exact risks of Type I and Type II errors can be developed.2 These criteria, it must be emphasized, are primarily useful for detecting well-defined departures from stability once it has been established; they are not useful for determining whether stability exists initially. Indeed, their only advantage over "goodness of fit" tests is that they do take into account time order.

The only practical recourse is to develop limit criteria which will result in an optimal balance between the costs of the two types of errors in the event that stability exists, and which will offer an optimal amount of information in the event that it doesn't. This means that limit criteria are evaluative rather than estimative in nature. To be sure, evaluations can and should make use of estimates, but the estimates must always be considered in terms of a value system. It makes empirical sense, for example, to choose a set of criteria that gives the smallest Type II error with a given Type I error. This is why s* limits were selected over S limits in studies of grand stability. It is also possible to manipulate the Type I error to reduce the Type II error; this may be desirable in certain situations, such as in exploratory research.

The Sample Size Question. Even where it can be defined, achieving a stipulated risk value for Type II error involves manipulating the sample size. With a fixed Type I error, for example, the risk of Type II error can only be reduced by using larger samples. However, as is so often the case with formal facts, this is not of much use in practice. It turns

2 See, for example, Scheffé, "Operating Characteristics of Average and Range Charts," Industrial Quality Control, V, No. 6 (1949), 13-18.
out that assignable causes may develop so rapidly in real-life problems—and production-rate problems are certainly no exception—that they might readily be absorbed, except in small samples. As soon as stability has been established, such rapid changes may be expected to occur much less often. This makes it possible to use a larger sample size and take advantage of a reduced risk of Type II error. Paradoxically, the problem of Type II error becomes relatively unimportant then, so the sample size or, what ultimately amounts to the same thing, the sampling frequency can usually be reduced under these conditions. The same comments apply to the proposals for detecting departures from stability; departures would not often be expected after stability is established.

Similar reasoning applies in deciding on lot or stratum size. It seems prudent in the beginning not to make the lot large for fear that it might not be homogeneous. When stability has been established, however, the lot can often be increased in size without much danger. In the studies of grand stability, for example, the lots were originally defined in terms of work periods. It turned out that discretion here was the poorer part of valor, and that the basic lot might well have covered a full work day.

Measures of Central Tendency. Though a number of sample estimators are available for the purpose, the sample mean has been consistently used here as a measure of central tendency or, more formally, to estimate the population mean. Sample means have the advantage of an approximately normal distribution except with small samples, and even with small samples the X̄ value is normally distributed when the population itself is normal. It is often pointed out that the sample mean also has the advantage of being unbiased, which means essentially that it yields estimates that are too low as often as estimates that are too high. This is really a trivial advantage; it is no trouble at all to make any legitimate sample estimator unbiased.

The most important advantage of using sample means is realized when the population itself has an approximately normal distribution. In that case it is an efficient estimator, which is a statistical term meaning in effect that the sample mean comes closer to the population mean than any other sample estimator. Under the assumption of normality, the median can also be used to estimate the population mean, though it is less efficient than the sample mean. The efficiency differential is not important with small samples, however, so that the median might well
be selected in such cases because it is obviously simpler to compute.3

Measures of Variability. The same considerations are involved in choosing among measures of variability or, more formally, estimators of population variability.4 The sample range is made to order for studies of stability, where sample sizes are usually small. The sample range is then almost as efficient as the sample standard deviation and, of course, it is much easier to compute. The sample range yields to the sample standard deviation, however, with somewhat larger sample sizes, such as those used in studies of measurement methods. The sample standard deviation is then markedly more efficient and more stable (in the statistical sense) than the sample range.5
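These efficiency comparisons lend themselves to a quick simulation. The sketch below, an illustration under arbitrary assumptions about sample size and population rather than a demonstration, draws repeated samples from a normal population and compares the sampling spread of the mean with that of the median; the median's larger spread reflects its lower efficiency.

    import random
    import statistics

    random.seed(2)

    def estimator_spread(estimator, n, trials=2000):
        """Standard deviation of an estimator over repeated normal samples."""
        values = [estimator([random.gauss(0, 1) for _ in range(n)])
                  for _ in range(trials)]
        return statistics.stdev(values)

    for n in (5, 25):
        print(n,
              round(estimator_spread(statistics.mean, n), 3),
              round(estimator_spread(statistics.median, n), 3))

Substituting a range-based estimate of dispersion for the median in the same harness would illustrate the range-versus-standard-deviation comparison in the paragraph above.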
THE PROBLEM OF ACCURACY

The problem of accuracy of estimate is not considered in most stability chart work or, for that matter, in much of statistical literature. This is probably because accuracy of estimate is ultimately an experimental problem rather than a statistical problem. Yet statistical theory, particularly sampling theory, does provide extremely useful guides for designing experimental procedures that will yield accurate results.6

Accuracy and Bias. The experimental concept of accuracy has a formal analogue in the statistical concept of unbiasedness. To insure an unbiased result the theory of statistics suggests that samples be obtained which represent all the significant strata or homogeneous subdivisions present in the problem being investigated. In the present studies of grand stability intra-day strata were taken into account by defining four work periods, each of which was assigned the same number of sample units. Inter-day strata also were taken into account by sampling each work day to approximately the same extent.

Accuracy and Stratification. The process of stratification can theoretically be extended in almost infinitely many directions. In the studies of

3 The comparative advantages of the mean and the median are considered in Barnard, The Use of the Median in Place of the Mean in Quality Control Charts.
4 Common measures of central tendency and variability are compared in Shewhart, Economic Control of Quality of the Manufactured Product.
5 See, for example, Pearson and Hartley, "The Probability Integral of the Range in Samples of n Observations from a Normal Distribution," Biometrika, XXXII (1942), 301-10.
6 See, for example, Hansen, Hurwitz, and Madow, Sample Survey Methods and Theory.
grand stability, for example, it might have been possible to differentiate different classes of workers according to some criterion, and then provide for an appropriate representation of each class in the sampling procedure. In the same way it might have been possible to distinguish sewing machines according to manufacturer and so forth. In practice, however, the process of isolating strata should be extended only to the point where the degree of bias in the results is reduced to an empirically defined minimum. This, too, is an optimization problem in which the desired goal of minimizing bias must be weighed against the contradictory but also desirable goal of minimizing sampling cost and difficulty. How this is done in practice is illustrated in Chapter 12, where nonhomogeneous operations are considered in a single (grand) delay study. This chapter also shows that period and other strata need not be given equal weight in the sampling schedule if they are considered of unequal importance according to some appropriate weighting system.

Period and other time strata have a second important function. That function is illustrated by the studies of grand stability, where time strata were used to take into account the temporal effects of other and well-defined variables, such as machine type, which also might have been stratified. In these cases the decision not to stratify was primarily based on economic considerations. But there are many poorly defined variables also having differential time effects, variables which are difficult to measure or even define, such as fatigue, morale, teamwork, and so forth. These variables cannot readily be stratified and, hence, they can only be taken into account by means of time strata. Concededly, little information is made available about individual effects, but it is possible with sufficiently sensitive time strata to take these variables into cumulative account in making estimates. The result is that estimates are essentially unbiased in the sense of representing all variables with a significant time component, measurable or otherwise. Continued sampling, even after an initial conclusion of stability, helps to insure that in long-run estimates there is a minimum amount of bias in this sense. Thus, the process of stratification is primarily experimental rather than formal; to be successful, stratifying must be based on an intimate knowledge of the nature and purposes of the immediate application.

Systematic Errors. There is one important aspect of the problem that
is purely a function of empirical considerations. That is the question of systematic measurement and other experimental errors. This question can be answered only when alternative measurement and, more generally, alternative experimental methods are available. In the work measurement area alternative measurement methods are readily available for evaluating systematic measurement errors. But other systematic errors, such as bias introduced by workers, can, at best, be said to be minimized in any work measurement investigation. The procedure followed in the studies of grand stability, for example, can be considered to yield accurate results in the sense that systematic errors are small, at least with respect to the magnitude of the estimators used, i.e., the mean cycle time values.

Stratification and Precision. Designing sampling procedures that identify and weight empirically significant strata also has a fundamental statistical advantage: it yields an estimate of population variability that is smaller, often by a substantial amount, than the estimate obtained from an ordinary random sampling approach.7 This is important in the sense that the final estimates are more precise than they would otherwise be.
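As a numerical illustration of this advantage, here is a minimal sketch comparing the variance of a stratified estimate of the mean with that of a simple random sample of the same total size. The stratum weights, standard deviations, means, and sample sizes are invented for illustration; they are not taken from the studies.

```python
# Variance of a stratified estimate of the mean vs. a simple random sample
# (SRS) of the same total size. All numbers are hypothetical.

weights = [0.25, 0.25, 0.25, 0.25]  # W_h: fraction of the day in each stratum
sigmas = [1.2, 1.5, 1.8, 2.5]       # s_h: within-stratum standard deviations
means = [5.2, 5.8, 6.1, 6.9]        # m_h: stratum means (hundredths)
n_per = [80, 80, 80, 80]            # n_h: observations allotted per stratum

# Stratified sampling: Var(mean) = sum over strata of W_h^2 * s_h^2 / n_h.
var_strat = sum(w**2 * s**2 / n for w, s, n in zip(weights, sigmas, n_per))

# SRS: the population variance also carries the spread of the stratum means.
grand_mean = sum(w * m for w, m in zip(weights, means))
pop_var = sum(w * (s**2 + (m - grand_mean) ** 2)
              for w, s, m in zip(weights, sigmas, means))
var_srs = pop_var / sum(n_per)

print(f"stratified variance of the mean: {var_strat:.5f}")  # 0.01030
print(f"SRS variance of the mean:        {var_srs:.5f}")    # 0.01147
# Stratifying removes the between-stratum component from the sampling error,
# which is why its estimate of variability is the smaller of the two.
```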
THE JOB SHOP PROBLEM
The procedures for evaluating stability and for making sharp estimates are tailor-made for repetitive operations. A repetitive operation really means that there is an indefinitely large reservoir of items which are measurable in terms of the same units. This is exactly what is needed to make effective use of sampling procedures, particularly in estimating the long-term characteristics of a work situation. It is a simple matter to make long-term estimates for repetitive operations. These estimates define certain long-term properties of an operation, and a repetitive operation provides the long-term data that make this possible.

A Simple Job Shop Problem. To analyze a job shop situation it is necessary to give it the characteristics of a repetitive operation. This can best be seen by developing some examples. An extremely simple case arises when the same product is made on different types of machines. This problem could be handled with a sampling procedure assigning

7 This subject is discussed in Hoel, Introduction to Mathematical Statistics, pp. 225-28.
observations to the different machine types in proportion to their empirical importance. If they produce essentially the same number of items in a base time period, the assignment might be made in terms of the number of machines of each type. If not, the assignment might still be made in terms of the number of machines of each type, but weighted by an estimate of the number of items produced in a base time period. In any event the approach amounts to a straightforward extension of the procedure used in developing the period strata in the studies of grand stability. In this simple example the same measurement unit, the time required to produce one item, can be used for all strata. However, no subgroup of machine types in the example is capable of producing enough items to satisfy the related requirement of an indefinitely large reservoir. By equalizing the production differences among the machine types, stratifying brings this about. Sampling procedures can then be applied with just as much assurance and just about as much value as though there truly were a repetitive operation.

An Intermediate Job Shop Problem. A more complicated kind of job shop problem arises when there are distinct product differences. An example is when different operations make the same product but at varying production rates, and the output of all the operations is pooled into a single stream. Two alternatives are available in this situation, each leading to essentially the same result. The operations could be considered as strata to be sampled according to importance, which would be determined here by estimates of their mean production rates. Except for technical details this is what was proposed in the case of the different machine types. This method thus stratifies in the sense that the operations are weighted and the measurement unit is held constant. An alternative method is normalizing, where the operations are held constant and the measurement unit is weighted. In the present example this is done by weighting the original measurement unit, the absolute time required to produce one item, in terms of the estimated mean times of the operations. Stratifying and normalizing use a similar weighting system; in the first case the weighting is done with respect to the operation, and in the second case the weighting is done with respect to the measurement unit.

To a large extent, whether to stratify or normalize depends on the empirical characteristics of the problem. However, stratifying does have
one important advantage: it yields estimates with a more well-defined precision. This means that stratification is preferable in cases where it is simple to apply, for example, in the multi-machine-type problem.

A Complicated Job Shop Problem. The job shop problem becomes increasingly complicated as the operations being considered and their properties become more and more dissimilar and as the production runs become shorter. Stratifying might then become so difficult to apply that its formal advantage over normalizing would be dissipated. If the operations being considered vary substantially in quality characteristics, for example, speed of production and degree of mechanical control, the measurement unit would probably have to be normalized by giving appropriate weight to each of these factors. No general rule can be laid down for this except that, to be successful, normalizing should be based on the characteristics and requirements of the immediate environment. The normalized production index would also have to have the property of linearity so that the index values for the various operations could legitimately be added and multiplied. Normalizing will then yield a sufficiently large number of units with a common measurement base to permit effective use of sampling procedures. It must always be remembered, however, that there is a price for using the normalizing approach; that price is the loss of some sharpness in defining the precision of estimate.
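To make the two weighting schemes concrete, here is a minimal sketch, with invented numbers, of how observations might be allotted under stratifying and how raw readings might be rescaled under normalizing. The operation names, rates, and mean times are hypothetical.

```python
# Two ways of giving job-shop operations a common base (hypothetical data).

rates = {"op_a": 120.0, "op_b": 80.0, "op_c": 40.0}  # items per hour
total_obs = 600  # total observations the study can afford

# Stratifying: weight the OPERATIONS, keep the measurement unit constant.
# Observations are allotted in proportion to estimated importance.
total_rate = sum(rates.values())
allocation = {op: round(total_obs * r / total_rate) for op, r in rates.items()}
print(allocation)  # {'op_a': 300, 'op_b': 200, 'op_c': 100}

# Normalizing: weight the MEASUREMENT UNIT, keep the operations pooled.
# Each raw reading (time per item) is divided by its operation's estimated
# mean time, so all readings share a common, dimensionless base.
mean_times = {op: 60.0 / r for op, r in rates.items()}  # minutes per item

def normalize(op: str, reading_minutes: float) -> float:
    """Express a raw time reading as a multiple of the operation's mean time."""
    return reading_minutes / mean_times[op]

print(normalize("op_a", 0.55))  # 1.1: ten percent above op_a's 0.50 mean
print(normalize("op_c", 1.65))  # 1.1: ten percent above op_c's 1.50 mean
```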
THE FAILURE OF TIME STUDY PROCEDURES
Earlier chapters show that time study is essentially a component of bargaining between management and labor. They also show that time study is grossly inadequate either for the purpose of standardizing the process or for the corollary purpose of developing precise estimates. In most cases, it will be recalled, a time study consists of taking a limited number of successive readings over an arbitrary period of time, usually on a single worker, also selected arbitrarily. At best the estimating component of time study consists of taking a quick look at the production rates of one worker. This makes time study a local study in form though not in content for, in taking this look, no attention is or could be paid to establishing stability or developing precise estimates. But time study is intended to be used for developing long-term production specifications, not for the purpose of looking into local production rates. This simply cannot be done from local studies
even if well designed, which means that time study doesn't estimate much of anything, let alone what it is supposed to estimate.

Claims of Accuracy. Despite this fact no one seems to have any great qualms about claiming that time study results are accurate. In many cases the claims are phrased in terms of a numerical range, such as ± 5 percent, in the face of the fact that accuracy and precision are rarely defined or even differentiated in the literature. Wherever it is mentioned, accuracy is naively thought to be equivalent to multi-decimal computation. The most fundamental objection to the accuracy claims is that they refer to standards or specifications, not estimates; the concept of accuracy simply cannot be applied to specifications in any field. To believe that a specification can be accurate is to believe that value systems can be accurate—a belief that would deny that there is value in human purpose. This view can only be supported by decree whereby human purpose is denied by postulating, paradoxically, what human purpose should be.

To woo those who do not readily accept verbal decrees, some of the claims are dressed up in numerical terms. Frederick Shumard, for example, claims that rating—which is an evaluation procedure, pure and simple—is accurate to within ± 7 percent.8 Even the arithmetic is bad in this case, for he joins the crowd in claiming that the final standards are accurate to within ± 5 percent. It is scarcely necessary to add that no evidence is ever brought forward to support such claims and, indeed, none ever could be.

Critical Views. Time study has been vigorously attacked almost since it was introduced by Taylor around the turn of the century.9 In 1915, Hoxie made the first large-scale investigation of the subject by looking into applications in some 30 plants. His conclusions are aptly summarized in the comment that:

far from being the invariable and purely objective matters that they are pictured, the methods and practices of time study . . . are the special sport of individual judgment and opinion, subject to all the possibilities of diversity, inaccuracy, and injustice that arise from human ignorance and prejudice.10

Hoxie went on to charge that time study is "actually incapable of yielding the exact result claimed, and that, throughout, process and

8 Shumard, A Primer of Time Study, pp. 22, 76.
9 Taylor's views and proposals are discussed in Taylor, Scientific Management.
10 Hoxie, Scientific Management and Labor, pp. 46-50.
results are peculiarly dependent on human judgment and prejudice." Hoxie also points out that arbitrary judgment is exercised in a score of steps in the time study process, including: (1) the criteria for deciding whether the production process is standardized; (2) the mode of selecting the worker to be observed; (3) the criteria for discarding "abnormal" readings; (4) the method used to summarize the observed data.

Among other early critics of time study were Frank and Lillian Gilbreth, who did most of the pioneer work on motion study.11 Among other things, they criticized the arbitrary choice of subject-workers, the arbitrary number of observations, and the arbitrary discarding of readings. These views have been essentially restated by more recent observers, such as Ryan, who considers all time study procedures to be essentially subjective.12 Ryan also objects to time study because it does not define the relation between subject-workers and other workers.

In another critique Richard Uhrbrock points out that "the difficulty of discussing [time study] problems thoroughly is complicated by the fact that [time study writers and practitioners] do not ordinarily publish adequate data in support of their assumptions." He also asks, "By what method do engineers determine the variability of working conditions, flow of materials, and quality of materials characteristic of a given operation?"13

Of the numerous labor spokesmen who have looked into these questions, Gomberg makes the most incisive and detailed comment.14 His principal conclusions are that: (1) subjective methods are used to select subject-workers; (2) the precision of reading and recording data is undetermined; (3) the number of readings usually taken is not large enough to obtain reliable estimates; (4) there is no justification for discarding "abnormal" readings simply because of magnitude; (5) most of the methods used to summarize observed data are invalid.

Recently Davidson has published a comprehensive study of time study in which he raises essentially the same questions that other critical students raise.15 But he goes considerably further and considers in detail

11 F. B. Gilbreth and L. M. Gilbreth, "Time Study and Motion Study as Fundamental Factors in Planning and Control; an Indictment of Stop-Watch Study," Bulletin of the Taylor Society, VI (1921), 91-135.
12 Ryan, Work and Effort, pp. 210-19.
13 Uhrbrock, A Psychologist Looks at Wage-Incentive Methods, pp. 4-5.
14 Gomberg, A Trade Union Analysis of Time Study.
15 Davidson, Functions and Bases of Time Standards.
various time study systems, from Taylor's original proposals to the very latest trivial variation of those proposals.

This summary clearly establishes that time study is hardly fit to be classified as an estimating procedure. Indeed, Hoxie's critique is enough to establish this. But it is still useful to go into particulars, if only to acquire a fuller appreciation of the proposed procedures and the purposes they serve. Before doing this it should again be pointed out that time study is imprisoned, though presumably without sentence, by the management-labor bargaining process. This largely takes the form of contract provisions specifying the manner in which time study is to be applied. Some of these provisions refer to questions considered by the critical students; they attempt to protect workers from being penalized because of the essentially evaluative nature of time study. For example, the UAW booklet recommends contract provisions stipulating that (1) no element is to be considered whose mean time is less than 6 hundredths; (2) each time study should include at least 20 observations or last at least one-half hour; (3) the mean value shall be used to summarize observed data.16

PRECISION OF ESTIMATE
Sample-Size Recommendations. Many critical students have called attention to the fact that the number of observations taken in time study work is inadequate and, in any event, is not systematically determined. There is, indeed, a wide variety of prescriptions on the subject. Shumard, among others, recommends that the number of readings be determined by the observer according to the nature of the operation.17 He qualifies this recommendation by saying that about ten readings should be taken under most circumstances. Walter Holmes makes a more specific, though equally arbitrary, recommendation.18 He considers 2 to 6 readings enough for long operation cycles, and 10 readings enough for other cases.

A presumably statistical recommendation is made by Martin Wiberg. He suggests that a time study should continue "until at least five and not more than ten repetitions of any one of the time values are obtained."19

16 The UAW-CIO Looks at Time Study, p. 30.
17 Shumard, A Primer of Time Study, pp. 124-25.
18 Holmes, Applied Time and Motion Study, pp. 147-48.
19 Wiberg, The Work-Time Distribution, p. 15.
This recommendation suffers from the fatal defect, among other defects, that it openly invites the observer to be biased.

These recommendations can hardly be given serious attention in view of the sampling considerations outlined in the chapters on statistical stability. Some of the recommendations might be acceptable for defining the sizes of single samples in a series. They are utterly inadequate for finding out anything about the grand characteristics of production rates; single samples of the kind recommended simply cannot give any information about long-term characteristics, such as whether there is stability and standardization.

Sample Sizes and Confidence Intervals. The precision problem can best be brought out with the aid of an example. Grand stability will be assumed established in an operation having an X̄ estimate of 6.00 hundredths and an S estimate of 2.00. A single sample of 25 units will also be assumed; this is as large as most of the samples recommended in time study literature. A third assumption is that a 5 percent risk could be taken of making an incorrect statement about the population mean value. Under these assumptions an interval is computed from the formula
X̄ ± 2(S/√n).
In the present case this gives 6.00 ± 2(2.00/√25), with final limit values of 5.20 and 6.80. The population mean value could therefore be said to fall in the interval between 5.20 and 6.80 hundredths, with 6.00 hundredths as the most likely value. As it stands, however, the statement is incomplete, for it does not take into account the fact that an estimating statement can never be absolute. The 5 percent risk value leads to the following qualification: estimating statements of the kind given will be correct approximately 95 percent of the time if the same estimating procedure were applied to an indefinitely large number of distinct situations.

The Meaning of Confidence Statements. In statistical terminology this is called a "confidence statement" since it refers to the confidence that can be placed in the stated interval. The interval itself, whose limits in the example are 5.20 and 6.80, is called the "confidence interval," and it defines the precision of estimate or the estimating error. Using R to
denote the risk value adopted, 1 − R represents the degree of confidence that can be attached to the stated interval. Its numerical value, which is 95 percent in the example, is called the "confidence coefficient." The multiplying factors used in practical problems are usually obtained from tables of the normal distribution since, it will be recalled, sample means have an approximately normal distribution except for small sample sizes. With a confidence coefficient of 95 percent, the tables give the multiplying factor 2, used in the formula. Another common confidence coefficient is 99 percent, which corresponds to the multiplying factor 3. These two values are considered to satisfy most empirical requirements, though multiplying factors and, hence, estimating intervals can be computed for any other confidence coefficient.20

The Acceptability of an Interval. In the studies of grand stability the sampling schedule called for 64 samples of 5 units each, a total of 320 observations. If that many observations had been assumed in the example, the confidence limits would have been 5.78 and 6.22. This interval is much shorter than the interval obtained with a single sample of 25 units, but this doesn't fully establish that single samples of that size are unacceptable for estimating purposes. This question can be answered only by some criterion of what is considered an acceptable estimating interval. Such a criterion should be developed in terms of the requirements of the immediate situation, though it does seem reasonable to assume that a minimum confidence coefficient of 95 percent would usually be required, along with a maximum interval of ± 5 percent. According to this criterion the estimating interval obtained from a single sample of 25 units would clearly not be acceptable.

Figure 19 gives a chart intended to simplify decisions of this kind. (In single-sample situations the chart refers to sample size, but in multi-sample situations it should be interpreted as referring to the total number of observations.) This chart can be used to determine whether, with a given S/X̄ estimate, the total number of observations is sufficient to yield an acceptable confidence interval. The coefficients considered are the two common ones of 95 percent and 99 percent. With the estimates used in the example, the chart quickly verifies that 25 readings is far from sufficient to insure a maximum interval of ± 5 percent.

20 Confidence intervals are discussed in Hoel, Introduction to Mathematical Statistics.
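The arithmetic behind these interval statements is simple enough to set out in full. The sketch below uses only the quantities defined in the text (the X̄ and S estimates, the sample size, and the multiplying factor); it is illustrative and not part of the original studies.

```python
import math

def confidence_interval(x_bar: float, s: float, n: int, factor: float = 2.0):
    """Interval x_bar +/- factor * (s / sqrt(n)); factor 2 for a 95 percent
    coefficient, factor 3 for 99 percent."""
    half_width = factor * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# The example in the text: X-bar = 6.00 hundredths, S = 2.00, n = 25.
print(confidence_interval(6.00, 2.00, 25))    # (5.2, 6.8)

# With the 320 observations of the grand stability studies instead:
lo, hi = confidence_interval(6.00, 2.00, 320)
print(round(lo, 2), round(hi, 2))             # 5.78 6.22
```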
[Figure 19 appears here: a chart plotting the required number of observations against the coefficient of variation, S/X̄, in percentages.]

FIGURE 19. CHART FOR ESTIMATING THE NUMBER OF OBSERVATIONS REQUIRED TO OBTAIN MAXIMUM CONFIDENCE INTERVALS OF ± 5 PERCENT FOR TWO COMMON CONFIDENCE-COEFFICIENT VALUES
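Where the chart itself is not at hand, the relation behind it can be computed directly by inverting the interval formula. The following sketch reconstructs the chart's logic under that reading; the function name and the rounding rule are of course not part of the original.

```python
import math

def required_observations(cv: float, factor: float, half_interval: float = 0.05) -> int:
    """Smallest n with factor * cv / sqrt(n) <= half_interval, where cv is the
    coefficient of variation S/X-bar as a fraction and factor is the normal
    multiplier (2 for a 95 percent coefficient, 3 for 99 percent)."""
    return math.ceil((factor * cv / half_interval) ** 2)

cv = 2.00 / 6.00  # the example's coefficient of variation, about 33 percent
print(required_observations(cv, 2))  # 178: "well over 150" at 95 percent
print(required_observations(cv, 3))  # 400: "well over 300" at 99 percent
```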
Figure 19 also can be used to determine how many observations should be taken to insure an acceptable interval, assuming that stability can be considered established. An estimate of the coefficient of variation is now required; this is ordinarily available from earlier studies on the same or similar operations. Using the same illustrative data, with a 95 percent coefficient there would have to be well over 150 observations, and with a 99 percent coefficient there would have to be well over 300 observations. This establishes that the sample sizes recommended in time study literature are hopelessly inadequate for estimating purposes, let alone for establishing whether stability exists. On the other hand, the 300-odd
observations taken in the studies of grand stability would yield acceptable estimating intervals for most coefficient-of-variation values likely to be encountered in practice. Studies of grand stability therefore have a double virtue: they establish whether stability exists and they yield acceptable estimating intervals.

Planning Studies from This Viewpoint. In the chapters on statistical stability certain recommendations are made on sample sizes and the minimum number of samples. These recommendations are based on the assumption that primary interest is in attaining stability and standardization, with secondary interest in obtaining precise estimates. In that case sampling would ordinarily be continued after an initial judgment on stability so that, ultimately, estimates of a high degree of precision are obtained.

In certain cases, however, it might be desirable to have a sufficient number of samples to insure that the initial estimates will have a predetermined degree of precision. In that case Figure 19, or the formula from which it was developed, can be used to estimate the total number of observations required. This may, of course, involve taking more than the minimum number of samples recommended in the chapters on statistical stability. Then, if stability can be established, the estimates obtained will have the required degree of precision.

The procedure has not been illustrated in the case of production rates simply because the emphasis there is on the question of stability. The requirement of representative long-run results will ordinarily insure an acceptable degree of precision, especially when sampling is continued after it is concluded that stability exists. In (grand) delay studies, however, chart studies are often terminated after the initial evaluations and estimates are made. What is even more important, these studies are usually based on qualitative data which require many more observations than quantitative data to achieve a given level of precision. For this reason it is desirable to base the question of observations on the precision of estimate; the required procedure is illustrated in Chapter 12.

OTHER ASPECTS OF CURRENT PROCEDURES
Estimators. Time study literature gives a wide variety of recommendations to those searching for measures of central tendency, including the mean, the median, the mode, and various others, such as the so-called
"typical time" recommended by Gillespie.21 The trouble with these recommendations is that the theory of pure competition does not apply to measures of central tendency, if indeed it applies to anything real. The choice is heavily restricted when the measure is intended to be an estimator, the only legitimate function it can perform. This restricts the choice to the sample mean, with the median an acceptable alternative in certain cases. The other measures recommended by time study writers simply have no value.

Even when an acceptable measure of central tendency is used, variability is not given even token consideration. There is no way under these conditions to define precision of estimate—another indication that time study writers will have nothing to do with estimation. Apparently estimates would just slow up the process of developing "accurate" production standards.

The Problem of "Abnormal" Readings. The general impatience with estimation processes also shows up in the "abnormal" readings doctrine. This doctrine postulates that certain extreme values are unsuitable, presumably because they interfere with the neat packet of numbers considered desirable. Typically, Lowry, Maynard, and Stegemerten write that an "abnormal" reading is one which is extremely high or low and therefore easy to pick out.22 To make the discarding process more palatable, some writers use numerical criteria for this purpose. Schutt, for example, regards readings differing from the average by more than 25 percent as unacceptable, apparently on the ground that production rates can be made uniform by pencil work if they are not suitably uniform in real life.23

There simply is no justification for discarding readings just because of magnitude, especially in the indiscriminate manner recommended in many texts. This practice is nothing more than an ill-disguised attempt to impose properties on data which they do not have, such as near-perfect uniformity. This gets rid of the problem of data interpretation, and it also gets rid of the essence of the matter. In the bargaining framework of time study this practice is also an invitation to bias. It is simpler to justify a relatively stringent standard when high readings have been ruled out of existence.

21 Gillespie, Dynamic Motion and Time Study, p. 83.
22 Lowry, Maynard, and Stegemerten, Time and Motion Study, pp. 228-29.
23 Schutt, Time Study Engineering, pp. 58-59.
Presgrave's handling of the problem is far more honest and, as it turns out, far more sensible. His view is that

any recording that is beyond the normal range can not be ignored with safety, in spite of the common practice of discarding extremes. Unusually low timings are especially open to conjecture, since they may provide the key to the whole study. Unusually high readings . . . also may be an indication of conditions that must be corrected.24

The procedure of discarding "abnormal" readings is always applied to elements, though time studies are usually intended to establish cycle standards. Later on it will be shown that no useful purpose is served by considering elements when dealing with cycle problems. This reduces the question to how to define "abnormal" cycle readings and what to do about them. The answer is uncluttered with nonsense. "Abnormal" readings do have meaning, great meaning, but they must be defined as the assignable variations detected by criteria of stability. However, assignable variations are defined on the basis of experimental data rather than personal whim. Also, assignable data are used rather than discarded, since they provide valuable information about the work situation, such as whether it can be considered standardized. This is presumably what is behind the Presgrave suggestion.

The Question of Economy. There can be no doubt that time studies cost far less than the proposed sampling procedures, but there also can be no doubt that this is because time studies don't estimate anything. Cost must always refer to purpose and result, and, if the purpose is to attain precise results, there can be no choice but to use the proposed procedures. If numbers without content were desired, there would be really no reason to prefer time study to crystal-ball gazing unless, of course, stop-watches are cheaper than crystal balls.

Economy, of course, is not a trivial matter, and it is useful to look into ways of reducing unit costs. The unit costs of the proposed procedures can be reduced by the practical device of scheduling observation tours to cover several operations. Unit costs can also be reduced by the formal device of covering a large number of operations in a single study using, where necessary, stratifying and normalizing procedures. However, the information obtained from an omnibus study about individual operations must not be expected to be nearly as sharp as the
24 Presgrave, The Dynamics of Time Study, p. 185.
information obtained from individual studies. Economy can never be attained without giving up quality—quality of information in this case. If economy is vital, it can be attained with procedures that perform a useful function. Economy can never be attained with procedures whose chief function seems to be inviting trouble.
CHAPTER TWELVE

Measuring and Estimating Delays
In general, the term "non-productive activities" is felt to be synonymous with the term "delays." It turns out, however, that certain non-productive activities—perhaps most of them—are desirable because they are needed for developing optimal work methods. The evidence for this view is summarized in later chapters, particularly Chapters 16 through 18, which are directly concerned with that problem.

There are, of course, undesirable non-productive activities. In Chapter 8, for example, production performance is shown to be a direct function of the nature, frequency, and duration of certain delays. These delays were indeed undesirable since ability to reduce them was associated with superior production rates.

The two-sided character of non-productive activities suggests using the generic term "non-productive activities" instead of the term "delays." That would remove the unrealistic "one best way" connotation that these activities are uniformly undesirable and should be eliminated. The term "delays" is so deeply imbedded in the jargon of the field, however, that it will continue to be used in this book as though it were a synonym for "non-productive activities."

MAKING STUDIES OF DELAYS
Complementary Relation to Productive Activities. Non-productive activities and productive activities are complementary and additive components making up the total work activity. They are also complementary in the more significant sense that superior performance in the one case is usually associated with superior performance in the other. This suggests that non-productive activities can be analyzed from the same viewpoint as productive activities. It suggests, in particular, that the
time parameter also be used as the fundamental parameter for non-productive activities. This is exactly what was done in developing estimates of the rates of non-productive activities using, however, sampling procedures that differ somewhat from those used in developing estimates of production rates.

The complementary relation also suggests that non-productive activities can be studied either on a local basis, on a grand basis, or both. The objectives of local and grand studies of non-productive activities are exactly the same as they are in the case of production rates. The primary objective is to establish whether non-productive activities are standardized at a grand level and at a local level, always assuming that they are already standardized in a primary (or gross) sense. A second and related objective is to provide precise estimates of the rates of different types of non-productive activity.

There is one major difference between studies of productive activities and studies of non-productive activities. That is a difference of emphasis. In studies of non-productive activities, it is usually considered desirable to concentrate on the estimation question rather than the standardization question. Since a fairly high level of precision is usually required, there will be a sufficient number of samples for a decisive judgment about stability and, hence, standardization.

Local Studies. It may, of course, be desirable to look into the short-term and local aspects of delays (considered in the sense defined above). This is especially true when there is a plan to study local production rates. Local delays and local production rates are complementary, and local studies, such as those described here, are capable of looking into both questions at the same time.

Grand Studies on Long Technical Delays. Two types of studies can be made on long-term or grand delay characteristics. Long-term estimates may be needed about the duration of relatively long technical delays and, implicitly, about the amount of time required to restore production activity. No absolute criterion can be given for a long technical delay, but there is an empirically useful criterion: a long technical delay is one which has inter-cycle effects and for which the remedy is technical, immediate, and overt.

Delays of this kind would not be expected nearly as often as more subtle delays occurring within an operation cycle. They call for the use of a quantitative approach rather than a qualitative approach, such as
the one applied below to more subtle delays. Much more information is available per observation from quantitative methods than from qualitative methods; this is an especially important advantage in view of the fact that long technical delays usually do not have a high frequency of occurrence.

There are a number of reasons for not going into the required procedure. Long technical delays are not a long-term problem in the usual sense, since remedial steps are ordinarily taken with each occurrence. Direct remedial action is, in fact, the only realistic procedure for situations where there is a wide variety in delay type and intensity. In such cases analysis based on a normalized measurement unit would generally be too cumbersome to be of much use. When variations in type and intensity do not create a difficult measurement problem, the procedures for making studies of grand stability can be directly applied. Certain adjustments would have to be made, of course, to fit the characteristics and requirements of the local work environment. These adjustments would be primarily concerned with defining strata and measurement units. But these are problems of application rather than problems of procedure; they should be capable of ready solution if the suggestions given in this book are followed.

Grand Studies of Other Delays. The second type of delay study is concerned with delays other than long technical delays. This type of study is much more useful empirically because the delays in question are neither so individualistic as to defy straightforward analysis nor so overt in their implications as to warrant immediate remedial action. Yet these delays usually account for most of the time used unproductively; indeed, they often take up a substantial proportion of the total available work time. Studies of the grand characteristics of these more subtle delays are particularly important from a practical standpoint.

It is possible to consider both types of delays in a single study and, in practice, they usually are (that is, where procedures like those proposed in this book are applied). This is an acceptable procedure. But it remains true that there are two distinct problems involved, and they should be given separate attention whenever possible. The studies considered here are concerned solely with rather subtle delays although, with some modification, they can be applied to situations where it seems desirable to study the two types of delays together.

The Qualitative Approach. There are a number of important reasons
why these studies should be qualitative rather than quantitative, which means that results refer to percentage rather than duration. Time is still the fundamental parameter, though it is now considered from the standpoint of frequency of occurrence rather than duration. One immediate advantage of the qualitative approach is that in most cases a large number of observations can readily be obtained. This advantage far outweighs the advantage of obtaining more information per observation by using the quantitative approach. Also, the cost and difficulty of making qualitative observations is much smaller than the cost and difficulty of making quantitative observations.

Another and certainly not trivial advantage of the qualitative approach is that delays difficult to handle in quantitative terms are easily handled in qualitative terms. It is relatively simple to define a common qualitative measurement unit for different types of delays. That is what makes it possible to obtain an indefinitely large number of observations, and it also explains why the two types of delays considered can be handled in the same qualitative study.

TIPPETT'S PROCEDURES AND RESULTS
The Sampling Approach. The problem of obtaining long-term estimates of delay percentages has been investigated by L. H. C. Tippett.1 Tippett's researches culminated in a set of procedures for handling that problem. As a statistician, Tippett apparently was not aware of many of the empirical problems in estimating delay percentages, such as the need to distinguish between local and grand characteristics of delays. Despite this, Tippett's work represents a significant advance in the field of work measurement.

Although the original motivation for this choice seems to have been primarily economic, it was Tippett who first suggested that the qualitative approach be used instead of the quantitative approach. He also recommended that observations be taken in a random manner over a protracted interval. This is to be contrasted to classical procedures, such as the "interruption study" (considered below), which requires that quantitative observations be taken over a continuous but limited period. The result is that the estimates obtained do represent the long-term

1 Tippett, "A Snap-Reading Method of Making Time-Studies of Machines and Operatives in Factory Surveys," Journal of the Textile Industry, XXXVI (1935), 51-70.
characteristics of delays, thus overcoming one of the most important objections to classical procedures. It might be added that Tippett was not aware of the fundamental distinction between the two approaches since, for example, he used "interruption study" results as a standard of accuracy.

Tippett's procedure was to take momentary observations in each of six identifiable work periods in the cotton industry, where the basic applications were made. Tippett also was aware of the need to minimize the unconscious and the conscious tendency of workers to anticipate observations and, more generally, the need to schedule the observations in each work period at random. The sampling schedule was also planned so that systematic work stoppages, such as official rest periods, would be excluded. For making observations a "snap-shot" technique was recommended, but it was recognized that this could lead to the problem of observer bias. To prevent this the observer was cautioned to make observations as soon as the work stations came into view.

According to Tippett a single stoppage should be counted just once, even though the formal sampling schedule might require it to be observed several times. This suggestion, it turns out, does involve accepting slightly biased estimates of delay percentages. But the slight bias is of little consequence; the important fact is that level of precision can be established by the Tippett proposal, but not by alternative procedures. Repeated observations of the same stoppage are not (statistically) independent, and independent observations are necessary to establish precision of estimate. This would be no problem if—as it is recommended here—only subtle delays are considered in the studies.

A fundamental component of the Tippett procedure is the provision that the required number of observations be computed in advance. The procedure for doing this is quite similar to a procedure described in Chapter 11. That procedure explains how, with production-rate data, the number of observations needed to obtain certain confidence intervals can be computed. The specific procedure in the delay case is illustrated by the case histories given a little later on in this chapter. This, too, is intended to insure confidence intervals of a predetermined size in estimating, in this case, underlying delay percentages.

The Analysis. Tippett applied this sampling procedure to certain weaving and spinning operations in eleven cotton mills in England. In
making observations the observer recorded whether a given operator (or machine) was productive or non-productive; in the latter case he also reported the cause. The percentages of delays of various types were then computed and analyzed.

The method of analysis was to arrange the data into frequency groupings and run a "goodness of fit" test. This test is intended to determine how well the frequencies obtained matched the frequencies expected from the binomial distribution.2 (The binomial distribution is the formal distribution type considered appropriate for qualitative data of the kind considered here.) A second series of tests was made to determine whether different groups of looms in a given mill had essentially the same delay characteristics.3 From these tests Tippett concluded that the binomial distribution actually did fit the observed data in most applications and described "the variations admirably" in some.

The need to determine whether the data could be considered binomially distributed is due to a principle that cannot be emphasized too strongly. Precise estimates would be impossible unless some well-defined distribution or, what amounts to the same thing, a standardized situation can be assumed to exist. As Chapter 11 points out, however, the "goodness of fit" approach does not provide an answer to the question, principally because it does not take into account fundamental time order relationships.

Applications of the Tippett Approach. The Tippett sampling procedure for estimating long-term delay percentages is being adopted by a fast-growing number of practitioners. Recognizing the value of sampling procedures constitutes a fundamental and welcome change in viewpoint. But there are many weaknesses and even distortions of acceptable practice in the applications. There is hardly any recognition of the role of stability and standardization. Also, only meager attention is paid to the question of precision of estimate, or to the need to develop empirically significant strata. The outcome is that the applications do consider the long-term aspects of delays, but the results are certainly not nearly as sharp as they should be.

It seems unfortunate that valuable concepts, such as those proposed by Tippett, are usually mishandled when they acquire some vogue.

2 "Goodness of fit" tests of this kind are described in Hoel, Introduction to Mathematical Statistics, pp. 194-95.
3 These tests are often called tests of the "binomial index of dispersion," and are described in Hoel, Introduction to Mathematical Statistics, pp. 195-97.
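For readers who wish to see the mechanics of such a test, here is a minimal sketch of the second kind of test mentioned above, the "binomial index of dispersion" (see footnote 3). The counts are invented, and the sketch shares the limitation noted in the text: it pays no attention to time order.

```python
# Binomial index of dispersion: do k samples of size n show more variation in
# their delay counts than binomial sampling alone would produce? The counts
# below are invented for illustration.

counts = [3, 1, 4, 2, 6, 2, 3, 5, 1, 4]  # stoppages seen on each of k tours
n = 24                                    # work stations observed per tour
k = len(counts)

p_hat = sum(counts) / (n * k)             # pooled delay fraction
expected = n * p_hat                      # expected stoppages per tour
chi_sq = sum((c - expected) ** 2 for c in counts) / (expected * (1 - p_hat))

print(f"p_hat = {p_hat:.3f}, chi-square = {chi_sq:.2f}, degrees of freedom = {k - 1}")
# Compare chi_sq with the tabled chi-square point for k - 1 degrees of freedom;
# a large value indicates non-binomial variation among the tours. Note that
# even a comfortable fit says nothing about time order, which is the weakness
# discussed in the text.
```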
It also seems unfortunate that certain appliers seem compelled to claim that the concepts originated in their applications. The only originality involved is the coining of terms like "ratio-delay" and "work sampling."

THE STABILITY CHART PROCEDURE
Tippett's recommendations on sampling can be used without essential change. But the "goodness of fit" tests used by Tippett for analysis have serious disadvantages. The principal disadvantage is that statistical procedures are used to determine whether statistical procedures can be applied in the first place. Such tests simply are not appropriate for establishing whether statistical stability exists in the sense, for example, that a binomial distribution can be considered to apply to long-term delay characteristics.

The stability chart approach does not have this fatal weakness.4 In fact, it has essentially the same advantages with delays as it does with production rates. These advantages include: showing whether assignable causes exist in the observed data; bringing out other time order relationships that might exist; and permitting assignable causes to be identified quickly. By these means stable and standardized delay characteristics can be brought about when they do not exist in the beginning.

4 The chart approach described here was originally proposed in Abruzzi, "Delay Allowances by Statistical Methods," Columbia Engineering Quarterly, I, No. 1 (1948), 6-8, 23.
CASE HISTORY: PLANT B
A study of delays was first made in Plant B. It covered a total of 145 different work stations. These work stations included the hand-pressing operations considered in Chapter 7, along with all hand-sewing operations, regular sewing-machine operations, and special sewing-machine operations.

Computing the Required Number of Observations. When primary interest is in precise initial estimates, the first problem is to determine the required number of observations. As in making production-rate estimates, the maximum confidence interval and the confidence coefficient must be determined in advance, using as a guide the empirical requirements of the immediate situation. Where an advance estimate of the coefficient of variation was required before, however, an advance
estimate of the total delay percentage is required here because of the qualitative (binomial) nature of the problem.

In the absence of an explicit statement by management, the confidence interval and confidence coefficient questions were decided on an ad hoc basis in the Plant B application. A 99 percent confidence coefficient was adopted together with a maximum confidence interval of ± 3 percent; these values were considered to represent typical estimation requirements on delays.

An advance estimate of total delay percentage should, if possible, be based on accumulated plant information on, for example, related operations. In the present case the advance estimate of 15 percent for all delays was provided by time study engineers with extensive experience on similar operations in other plants. When such advance information is either unavailable or unreliable, an acceptable working estimate can be obtained from pilot or preliminary delay studies.

With the three required values known, the total number of observations can be computed from the formula
MCI = 3√(p(1 − p)/N).
Here MCI, p, and N refer, respectively, to the maximum confidence interval, the advance estimate of the total delay percentage, and the total number of observations. It is more convenient for present purposes to rewrite the formula as
N = 9p(1 − p)/(MCI)².
This gives

N = 9(.15)(.85)/.0009 = 1,275

in the present case.
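The same computation can be set out in a few lines, using only the values quoted in the text (p = .15, MCI = ± 3 percent, and the multiplier 3 for a 99 percent coefficient); the sketch is illustrative only.

```python
def required_delay_observations(p: float, mci: float, factor: float = 3.0) -> float:
    """N = factor^2 * p * (1 - p) / mci^2 for a qualitative (binomial) study.
    factor is the normal multiplier: 3 for a 99 percent confidence coefficient,
    2 for 95 percent; p is the advance estimate of the total delay fraction."""
    return factor**2 * p * (1 - p) / mci**2

print(round(required_delay_observations(0.15, 0.03)))  # 1275, as in the text
```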
At least 1,275 observations would have been required to insure, with only a 1 percent risk of error, that the confidence interval obtained would not be greater than ± 3 percent. The formula applied here is based on the assumption that the data will turn out to be stable, which means in this kind of application having a binomial distribution. The formula also assumes that the normal distribution gives a satisfactory approximation of the binomial distribution;
this assumption can safely be made in this kind of problem, since the p value is usually greater than .10 and the N value is usually more than 500.5

In this example the integer 3 was used as a multiplying factor. This multiplier is obtained from tables of the normal distribution, and it corresponds to a confidence coefficient of 99 percent. The other common confidence coefficient is 95 percent which, as Chapter 11 illustrates, corresponds to the multiplier 2. These two confidence coefficient values should satisfy most empirical requirements, although a multiplier corresponding to any other desired confidence coefficient value can be obtained from normal tables.

The Sampling Schedule. It is usually good practice to plan on taking more than the computed number of observations because the original p estimate may be somewhat off. There is no general rule for the size of the safety margin since this depends on the unknown size of the error in the p estimate. In most situations, however, a 10 percent margin should be sufficient. In the present case there seemed to be no need for a safety margin because the interval and coefficient requirements had been developed on an ad hoc basis and were not intended for direct empirical use. In fact, 1,200 observations were taken rather than the computed 1,275.

The 1,200 observations were assigned equally to the ten successive days covered by the study; the 120 daily observations were then assigned equally to the five work periods defined in Table 17. The day and period strata defined here are, of course, time strata in the sense described in Chapter 11. The uniform distribution of samples among the strata implies that individual days and periods were considered of equal importance with respect to delays.

The period strata defined in Table 17 were based on the criterion of homogeneity used in developing the period strata for the studies of grand stability. There is one rather obvious difference; the criterion was applied this time to delay characteristics rather than to production-rate characteristics. Empirical evidence of homogeneity was obtained from the time study engineers mentioned above; they advised that on the basis of their experience the five periods defined in Table 17 represented delay strata

5 The normal approximation of the binomial distribution is described in Hoel, Introduction to Mathematical Statistics, pp. 45-50.
to be expected within single days. When advance evidence like this is incomplete or unreliable, a pilot or preliminary study modeled along the lines of the main study should provide the needed information.

TABLE 17. THE PERIOD BREAKDOWN USED IN THE DELAY STUDY IN PLANT B

MORNING
Period 1    8:00- 9:30
Period 2    9:30-11:00
Period 3    11:00-12:30

AFTERNOON
Period 4    1:00- 2:30
Period 5    2:30- 4:00
Planning Observation Tours. The observer assigned to make the study had had little experience either with observation techniques or with industrial operations. Once he had been instructed, however, he seemed to have little difficulty in using the prescribed observation procedure. The instructions included a list of the delays of the unavoidable delay category, such as thread breaks, and also of the personal delay category, such as talking, which he was likely to encounter.

The observer was also given a box of 145 numbered discs, each corresponding to a single work station, from which he drew 24 discs at random for each observation tour. The process, however, was modified in the following manner: the observations in each tour were assigned to the different types of operation in proportion to their number. This insured that the samples would be representative as well as random. The particular time within each work period for making the tour also was selected at random. This was to prevent the workers from acting in anticipation of the observations; it also avoided giving a disproportionate amount of attention to particular time segments within work periods.

The Preliminary Study. As in production-rate cases, pilot or preliminary delay studies serve to familiarize the observer with the sampling procedure and the operating situation. In the delay case they also provide information about the specific delays likely to be encountered and, where required, about the p value and the definition of period strata. In this case the preliminary study was made over a two-day interval. It revealed that certain workers shifted from one type of operation to another; that other workers did not report until afternoon; and that
still other workers were sent home early because materials were unavailable.

Another discovery in the preliminary study was that it was difficult to decide whether talking should always be considered a delay. To settle the problem, talking was defined as a delay only when there was a clear work stoppage, though all cases of talking were recorded. This example points up the need and the difficulty of establishing clear-cut definitions of specific types of delays.

Some workers were also rather curious and even apprehensive about the impact of the study on work requirements and rates of pay. This reaction was similar to the reaction of certain workers, particularly in Plant A, to the studies on production rates. A careful program of explanation was necessary in both cases to reassure the workers on these questions. The effectiveness of this program is brought out by entries in the observer's notebook indicating that little attention was paid to the tours of observation during the main delay study. Among other advantages, preliminary studies help to minimize conscious and unconscious biases of workers unfamiliar with delay studies and their objectives.

Viewed as a whole, the preliminary study showed that the circumstances under which the main study would be made were far from ideal. This meant that a full-scale analysis would be of questionable value especially since, as another concession to expediency, the observation program had to be crammed continuously into ten work days. A study plan was therefore adopted which brought out essential points but did not bother with subtleties.

The Chart Results. The total delay percentages were computed by periods, and they were plotted as shown in Figure 20. The criteria of stability tentatively adopted were the conventional limits used in product quality applications for qualitative data of this particular kind. The actual limit criteria were computed from the formula

p̄ ± 3√(p̄(1 − p̄)/n).
Here p̄ refers to the observed mean delay fraction of .124 (12.4 percent), and n refers to the sample size, of 24 units in the case of periods and 120 units in the case of days.6

6 Chart studies of this nature are discussed, for example, in Burr, Engineering Statistics and Quality Control.
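A minimal sketch of this limit computation, using the values just quoted, reproduces the limits shown in Figure 20 (a negative lower limit is taken as zero, as is conventional for percentage charts).

```python
import math

def p_chart_limits(p_bar: float, n: int):
    """Conventional three-sigma limits for a fraction-delayed chart."""
    sigma = math.sqrt(p_bar * (1 - p_bar) / n)
    lower = max(0.0, p_bar - 3 * sigma)  # a negative lower limit is set to zero
    upper = p_bar + 3 * sigma
    return lower, upper

p_bar = 0.124
for n in (24, 120):  # 24 units per period sample, 120 per day
    lo, hi = p_chart_limits(p_bar, n)
    print(f"n = {n:3d}: LCL = {100 * lo:4.1f} percent, UCL = {100 * hi:4.1f} percent")
# n =  24: LCL = 0.0, UCL = 32.6 (the period chart)
# n = 120: LCL = 3.4, UCL = 21.4 (the day chart; Figure 20 shows 21.5)
```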
[Figure 20 appears here: two control charts of the total delay percentages, the upper plotted by the period (about 50 successive periods, p̄ = 12.4, LCL = 0) and the lower plotted by the day (10 successive days, p̄ = 12.4, UCL = 21.5, LCL = 3.4).]

FIGURE 20. CHARTS ON THE TOTAL DELAY PERCENTAGES OBTAINED IN PLANT B, PLOTTED BY THE PERIOD AND BY THE DAY
Figure 20 shows that the period data were not stable. The fifth period had substantially greater percentage values than the other periods, including one sample point above the upper limit. The chart for periods also shows that there were substantial differences among the delay percentages for the various periods.

The accompanying chart gives the results for the pooled daily delay percentages. Almost all the points fall close to the p̄ value of .124, making it seem that there was an exceptionally high order of stability. What actually happened, however, was that substantial differences among the period values were absorbed by the pooling process. This had two immediate results. The extreme period percentages canceled out to give daily percentages close to the mean percentage. This also gave rise to an inflated estimate of variability and, eventually, to exceptionally wide limit values. The net result was an erroneous impres-
sion of stability which can be traced to the pooling of nonhomogeneous data.

The period chart results actually showed that the delay percentages for the fifth period and, to a lesser extent, the first period could be considered the result of assignable causes. Investigation revealed that a great deal of time was spent in early morning and late afternoon either waiting for materials or putting them away. Also, a substantial proportion of all the delays recorded was the result of problems in scheduling and materials handling. It would have paid the plant to look into these problems, which result from poor production control. By eliminating the assignable causes, the plant would have been able to stabilize delays at an economically acceptable level and, in the process, reduce the percentage of delay time, perhaps substantially.
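The masking effect of pooling is easy to demonstrate numerically. The sketch below simulates period delay counts in which the fifth period runs well above the others and then pools them into daily percentages. The underlying fractions are invented and are not the Plant B data.

```python
import random

random.seed(3)
# Hypothetical underlying delay fractions for the five periods of each day;
# period 5 is given an assignable excess, as in the Plant B study.
period_p = [0.16, 0.10, 0.09, 0.10, 0.25]
n_per_period, days = 24, 10

for day in range(1, days + 1):
    counts = [sum(random.random() < p for _ in range(n_per_period))
              for p in period_p]
    period_pct = " ".join(f"{100 * c / n_per_period:4.1f}" for c in counts)
    daily_pct = 100 * sum(counts) / (n_per_period * len(counts))
    print(f"day {day:2d}: periods {period_pct} -> pooled {daily_pct:4.1f}")
# The period figures swing widely, with period 5 persistently high, yet the
# pooled daily figures cluster near the overall mean: a spurious look of
# stability produced entirely by pooling nonhomogeneous strata.
```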
CASE HISTORY: PLANT A
In Plant A a similar study was made, covering seven days spaced over a period of a month. An intermittent study like this has an important advantage over a more compact study like the one made in Plant B. All the significant delay characteristics likely to occur in a plant's operations can be taken into account, thus insuring unbiased and, ultimately, accurate estimates.

In this study there were 310 work stations representing the four operating divisions of the plant. The operations were all of the same general type, and they were performed on sewing machines differing only with respect to minor attachments. The result was that these operations were much more homogeneous than the operations considered in Plant B.

In Plant A the p estimate was supplied by the Methods Department in terms of allowance data. A 16 percent time allowance was granted the workers as follows: 1 percent for unavoidable delays (exclusive of lot changes), 10 percent for fatigue, and 5 percent for personal delays. There was also a special allowance of 0.21 minutes per dozen garments for delays associated with lot changes. Under the conservative assumption that the mean (cycle) time for these operations was 0.50 minutes, this was equivalent to 3.5 percent of the available working time. With these facts as a rough guide, it was decided that a p value of 20 percent would be adopted, primarily to determine how well the allowance would tally with the facts.
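The conversion of the lot-change allowance into a percentage, and the rough total behind the 20 percent working value, can be verified in a few lines. Only figures quoted in the text are used; the final comparison is an interpretation, not a statement from the study.

```python
# Converting the Plant A allowance data into an advance delay estimate.
lot_change_min_per_dozen = 0.21  # special allowance, minutes per dozen garments
mean_cycle_min = 0.50            # assumed mean (cycle) time per garment

# 0.21 minutes per dozen is 0.0175 minutes per garment; against a 0.50-minute
# cycle this is 3.5 percent of the available working time, as the text states.
lot_change_pct = 100 * (lot_change_min_per_dozen / 12) / mean_cycle_min
print(f"lot-change allowance: {lot_change_pct:.1f} percent")  # 3.5

# Adding the 16 percent blanket allowance (1 + 10 + 5) gives roughly 19.5
# percent, consistent with the 20 percent working value that was adopted.
print(f"rough total: {16 + lot_change_pct:.1f} percent")      # 19.5
```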
MEASURING AND ESTIMATING
DELAYS
179
The supervisor of the Methods Department, for whom the estimates were being made, decided what the estimation requirements would be. His decision was that the estimate of the total delay percentage should have a maximum confidence interval of ± 4 percent, using a confidence coefficient of 99 percent. These values were substituted into the computing formula given above; this gave an Ν value of 900 units. The Sampling Schedule. I t has been suggested that a safety factor be used in deciding on the actual number of observations to be taken. An ample safety margin seemed desirable in the present case in order to make doubly sure of precise results. This led to the decision of taking 1,470 observations instead of the computed 900. The allocation plan assigned 210 observations to each of the seven days, and 30 observations to each of the seven work periods defined in Table 18. TABLE 18.
T H E P E R I O D BREAKDOWN U S E D
I N T H E DELAY STUDY I N PLANT
A
MORNING
Period Period Period Period
8:45- 9:30 9:30-10:30 10:45-11:15 11:15-11:55
1 2 3 4 AFTERNOON
Period 5 Period 6 Period 7
1 M - 2:30 2:45- 4:00 4:00- 4:55
The period breakdown was developed by the plant's Methods Department from information on file regarding delay characteristics. Among other advantages a relatively detailed breakdown, such as this sevenperiod breakdown, provides an excellent opportunity for detecting any significant delay strata within single days. The 15-minute breaks in mid-morning and mid-afternoon represent the two formal rest periods. Preparing for the Main Study. A list of potential delays was abstracted from previous studies of local stability. These delays were then defined, with the advice of the Methods Department, as either unavoidable or personal. Personal delays were defined as those completely within the control of the worker, while unavoidable delays were defined as those not completely within the control of the worker. No attempt was made in either case to distinguish between desirable and undesirable delays.
180
A WORK M E A S U R E M E N T
THEORY
UCL-42.1
Ρ • 20.1
LCL'0.9
14
21
SUCCESSIVE
28
33
PERIODS
42
49
F I G U R E 2 1 . C H A R T ON T H E T O T A L DELAY P E R CENTAGES OBTAINED I N P L A N T A, P L O T T E D BY T H E P E R I O D
However, unavoidable delays were classified as delays associated with repairs, delays associated with lot changes, and miscellaneous delays. The observations in each period were made in the same manner as in the delay study in Plant B. In this case, however, the observations were assigned according to the number of work stations in the four operating divisions. The divisions would then be given a representative weight if they turned out to have different delay percentages. In Plant A the pilot or preliminary study lasted only one day since the observer had become familiar with the operations in earlier studies of local stability. The pilot study showed that there would be little technical difficulty in making the main study. However, there was a problem with worker reaction, even though the workers were familiar both with the observer and with the objective of the study. A certain amount of added explanation was necessary to put them at ease in a performance sense. The Chart Results. Figure 21 gives the total delay percentages for the various work periods. According to conventional limit criteria the delay percentages could be considered stable; on this basis a confidence interval was computed for the underlying delay percentage. With a 99 percent confidence coefficient the computing formula becomes
MEASURING AND ESTIMATING DELAYS
181
UNAVOIDABLE DELAYS UCL«33¿
55
30 23 η Iti J 20 tg isl
" " Ρ·Ι4.3~
Id
10 · ·
*
*
14
20
21
PERSONAL
« · · · ·
33
LCL'O 42 49
DELAYS
UCL>I8.C
28
ui 10 υ e α. 3
-P'&S
kl
0
14 21 28 33 SUCCESSIVE FERIOOS
LCL'O 42 49
FIGUBE 22. CHARTS ON THE UNAVOIDABLE AND PERSONAL DELAY PERCENTAGES OBTAINED IN PLANT A, PLOTTED BY THE PERIOD
where N * refers to the total number of observations actually taken, and ρ refers to the mean observed delay fraction. The actual confidence interval turned out to be 20.1 db 3.1 percent, a range safely within the advance requirement of ± 4 percent. The unavoidable and personal delay percentages are given in Figure 22, which shows that in both cases stability could be assumed. Confidence intervals were then computed, again for a confidence coefficient of 99 percent. The result was 14.3 ± 2.7 percent for unavoidable delays, and 5.5 ± 1 . 8 percent for personal delays. Charts were constructed also for each of the three classes of unavoidable delays. In each case a confidence interval was computed on the strength of a conclusion of stability. These intervals were 7.1 ± 2.7 percent, 3.5 ± 1 . 4 percent, and 4.0 ± 1 . 5 percent, respectively, for miscellaneous delays, lot^change delays, and repair delays.
182
A WORK MEASUREMENT
THEORY
Implications. To permit ready comparison the study results are recorded in Table 19 side by side with the corresponding allowances. An immediate advantage of the studies is that they provide estimates in terms which define their range of error. It can be stated, for example, that in this case the underlying total delay percentage would fall between 23.2 percent and 17.0 percent, with the most likely value being 20.1 percent. The confidence coefficient of 99 percent is interpreted to mean that such a statement would be correct about 99 percent of the time if similar estimating procedures were repeatedly applied. TABLE
19.
DELAY E S T I M A T E S A N D A L L O W A N C E S I N P L A N T
Study Results Unavoidable 14.3 Miscellaneous 7.1 Repairs 4.0 Lot changes 3.5 Personal 6.5 Total
± 2.7 ± 2.0 ±1.5 ± 1.4 ± 1.8
20.1 ± 3.1
Allowances Unavoidable All except lot changes
A
4.5 1.0
Lot changes Personal Fatigue
3.5 5.0 10.0
Total
19.5
The plant allowances were clearly far from realistic, as might be expected from the fact that they were largely obtained from published tables. The allowance for unavoidable delays is particularly unrealistic, and it turns out that the fatigue allowance was just an allowance for unavoidable delays. On the numerical face of it, the personal delay allowance seems to be adequate. However, this allowance was primarily intended to give the workers sufficient time to take care of physiological and other personal needs; it was not intended to take into account work delays. The personal delay percentage obtained in the study shows that· much of the allowance was used up by work delays. A situation like this should be corrected by a program of training if the delays are undesirable, or by a more liberal allowance if they are not. The management of the plant apparently had only a fuzzy notion of the kind of delays that took place and how much time they consumed. Such situations arc not uncommon, even though they can be extremely costly from both an operating viewpoint and an economic viewpoint. Estimates like the ones obtained in this study help in avoiding these situations because they give information that Li at once realistic and sharp. Such information is indispensable in uncovering delay factors
MEASURING AND ESTIMATING DELAYS
183
which use up a disproportionate amount of time, and which should be made the focus of a systematic correction program. In standardization terms this means that more nearly optimal standardization could be achieved with respect to delays. Supplementary Information. Grand delay studies and grand production-rate studies are alike in another sense: they provide a respectable amount of information about internal variables, such as workers. The data here could have been used, for example, to determine whether certain operators had an exceptionally high number of delays. It must always be remembered, however, that information about internal variables is always confounded with the effects of other variables. On the strength of the conclusion of stability, a formal test was made to check the chart indication that the period strata had equivalent delay percentages. In order to minimize confounding, the effect of days was separated out in working out the experimental design.7 It turned out that the periods did have equivalent delay percentages and that the periods did not constitute distinct strata, at least in terms of delay characteristics. Results of this kind have immediate as well as long-term value. In the present case, for example, future delay studies could be planned without considering period strata. It might be added that the nonsignificance of the period strata in this case tallies with the result obtained by Tippett in similar tests on delays in certain cotton mill operations. THE KEY ADVANTAGES OF GRAND DELAY STUDIES
Grand delay studies have exactly the same objectives and advantages as grand production-rate studies. In delay terms the key advantages are: (1) showing whether delay percentages are stable and, hence, whether the delay situation is standardized; (2) providing guides to corrective action if stability does not exist; (3) supplying precise percentage estimates of the time used up by different types of delays in a stable delay situation; (4) providing a mechanism for determining whether delays are standardized at an optimal level and, if not, suggesting the nature of the corrective action required. The analogy between the two kinds of grand studies carries over to 7 The test details are given in Abruzzi, Work Measurement, design is described in Snedecor, Statistical Methods, p. 448.
p. 268; the underlying
184
A WORK MEASUREMENT
THEORY
sampling advantages. Sampling minimizes unconscious and conscious biases introduced by workers while reacting to the process of observation. Delay studies also enjoy two added advantages because observations are made qualitatively. Hardly any manipulative ability is demanded of the observer, though he should be familiar with the operations and operating conditions. The qualitative approach is also quite inexpensive in the sense that many operations can be observed in a short time. The Versatility of the Procedures. The studies considered here are concerned solely with the long-term characteristics of delays. But the same procedures can be used to excellent advantage in many other significant problems related to production. Among these problems are: (1) determining the optimum number of machines to be assigned to a worker; (2) estimating the amount of attention required by the machines in a related group, such as a battery; (3) estimating the number of machines or work stations required at various stages in straight-line processes, such as assembly lines. The same procedures can also be applied to problems not immediately related to production, such as time used up in different phases of materials handling.8 Underscoring its versatility the sampling procedure has been successfully applied in a study of psychological behavior.® Applications in Local Studies. In an earlier section it is recommended that local studies of delays be made together with local studies of production rates. This implements the basic recommendation made in Chapter 6, which is that local studies should be made on a continuous series of production items produced by individual workers. The basic recommendation is intended to insure a maximum amount of information about how workers plan and organize both the productive and the nonproductive aspects of work activity. However, it may sometimes be advisable for economic or other reasons to make local studies on groups of workers, and even groups of operations, rather than on individual workers. In that event the study plan might well be modeled after the grand delay studies. But this would be profitable only when interest is concentrated either on production rates or on delays, for separate sampling studies would have to be made if both areas have to be covered. 8 See, for example, Schaeffer, "Observation Ratios," Factory Management Maintenance, X C I X (1941), 58-59, 159. • Vernon, The Assessment of Psychological Qualities by Verbal Methods.
and
185
MEASURING AND ESTIMATING DELAY8
This is neither as economical nor as informative as a single continuous study covering both areas. With these reservations the procedure for making grand delay studies may readily be adapted for making local studies. Essentially this was done in a local delay study recently made in two English factories covering several groups of workers. The approach used was the same as the approach illustrated by the two case histories.10 A Review of the Advantages of Qualitative Studies. The versatility of the sampling procedure and, for that matter, the estimating procedure in grand delay studies flows from the basic fact that the measurements are qualitative. Many variables cannot be considered readily in quantitative terms, but they can in qualitative terms. The qualitative approach makes up for being relatively less sharp in an information sense by being more versatile than the quantitative approach in a measurement sense. However, the qualitative approach does require many more observations for the same amount of information, which means that an indefinitely large reservoir of observable items must be available in order for the approach to work. This is no great problem in practice, however, since the percentage measurement unit normalizes out potential measurement differences among operations, and the process of stratifying normalizes out other important differences. Further adjustments are usually unnecessary when the qualitative approach is used, even with a complicated job shop situation. This is because qualitative units, such as percentages, have a universality which quantitative units, such as absolute time values, do not have. Relatively straightforward stratification is usually sufficient to take into account any job shop characteristics that might be involved. DESIGNING STUDIES OF DELAYS
Chapter 11 shows how studies of production rates can be designed according to the viewpoint taken in this book. The same design characteristics apply to studies of delays except for the adjustments involved in passing from a quantitative approach to a qualitative approach. This means that only key design questions need to be reviewed, primarily to show how they are treated in a qualitative framework. In the two case histories conventional limits based on the binomial distribution were used as primary criteria of stability. These limits, 10
Williams, Avoidable Delays among Women Workers on Light
Work.
186
A W O R K MEASUREMENT THEORY
however, would not necessarily be selected in all grand delay studies. I t might be appropriate, for example, to use the integer 2 as the multiplying factor in the formula for computing the limits. I t might be desirable, for example, to minimize the risk of Type II error even at the expense of increasing the risk of Type I error. It should again be emphasized, however, that the criteria finally selected should yield an optimal result in terms of costs of error in repeated application. There were a large number of operations in each of the two case histories. This has the advantage of giving sharp estimates for the delay situation as a whole. While it also gives a respectable amount of information about the delay characteristics of individual operations, that information can't help but be confounded. If sharp estimates are required about individual operations, the proposed procedures can still be applied, but, of course, a substantially greater number of observations would be needed to cover the same number of operations. There is an advantage of considering small groups of operations, and even individual operations. The problem of obtaining representative samples is greatly simplified, as indeed are many other problems which are bothersome in multi-operation and, hence, confounded studies. I t must always be assumed, of course, that the number of items available for sampling is sufficient to permit valid stability judgments and precise estimates. The problem of defining a homogeneous lot or stratum for sampling purposes is the same with delays as it is with production rates. In the cases described here this problem was solved by dividing the work day into the periods considered empirically significant. The importance of a correct decision on such questions is brought out by the first case history, where extremely biased results would have been obtained if samples had been taken on a daily basis rather than on a period basis. Stratifying. In many cases it will be economically undesirable or empirically difficult to make delay studies covering only homogeneous operations. Both factors came into play in the two case histories. In the first case history there were several groups of nonhomogeneous operations, and stratifying was done in terms of the number of work stations in each type of operation. In the second case history the operating divisions were suspected of having significantly different delay percentages, and stratifying was done in terms of the number of work stations in each operating division.
MEASURING AND ESTIMATING DELAYS
187
Stratifying usually should solve the problem when nonhomogeneity of this kind is suspected; with percentages it is rarely necessary to normalize the measurement unit. The result will be unbiased and, ultimately, accurate estimates. Where it is actually needed, stratifying also gives an estimate of variability that is smaller, sometimes substantially, than the estimate ordinarily obtained without stratifying.11 This means, of course, that the final estimates have more precision. Correct strata development is more important in delay studies than in production studies because delay studies are often terminated after the initial samples have been taken. But this doesn't mean that (grand) delay studies cannot be usefully extended into the future. Taking additional samples has the same advantages here as in (grand) productionrate studies: they insure that the delay structure remains standardized at an acceptable level, and that the estimates obtained become progressively less biased and more precise. In any event it is not enough to take into account inter-period strata. Significant inter-day and even inter-week strata must also be expected in most cases, and they, too, must be taken into account. In the second case history, for example, the seven study days were distributed over an interval of about one month for exactly that reason. THE FAILURE OF CLASSICAL DELAY PROCEDURES
Production-rate problems and delay problems are treated in essentially the same way in the proposed theory. Time study procedures also treat these two kinds of problems in essentially the same way, but the treatment is a failure. The Evaluative Base of ike Procedures. Like the time study process itself, the time study approach to delays is primarily evaluative. Delay studies are primarily intended to develop specifications (they are usually called allowances in delay cases), and any estimation that is done is subservient to an evaluation process. This gives time study two major evaluation components. In addition to the rating of production performance, a subject given a thorough going-over in Chapters 3 and 4, there also is what might well be called the rating of delay performance. These evaluations are applied in sequence so that a production standard actually represents the time that 11 The formal reasoning involved here is considered in Hoel, Introduction matical Statistics, pp. 225-28.
to Mathe-
188
A WORK MEASUREMENT
THEORY
ought to be required for production after it has been modified by a factor based on the time that ought to be required for delays. The completely unjustifiable procedure of multiplying value judgments in different dimensions seems to disturb no one in time study circles, but it is sufficient to justify the immediate rejection of classical delay allowances and production standards. As might be expected from this, once the evaluation component is stripped away, there is little or nothing left in classical delay studies that even remotely resembles an estimation component. For example, the personal delays considered in time study literature and practice are not work delays, but delays allowed for physiological and other needs not immediately associated with work. Work delays originating with the worker are not usually provided for, presumably because workers should not encounter such delays. This wholly unrealistic view is considered in Chapter 16 along with its ancestor, the "one best way" view of work. The essentially evaluative nature of delay allowances is obvious from the views of time study writers. For example, Carroll asserts, presumably on the basis of a private value system, that fatigue and personal (need) delays should account for about 20 percent of the total work time in most industrial operations.12 Similarly, Shumard recommends certain tabulated though equally arbitrary allowance values for what he calls "rest factors," which are presumed to cover fatigue and personal (need) delays.13 These include an allowance of 2.5 percent for personal (need) delays for male workers and 4 percent for female workers. However, he does recommend direct studies of delays for which the worker is not held accountable. Holmes, too, recommends a fixed range of allowance values for personal (need) delays, which he believes should account for 3 to 5 percent of the available work time in most cases.14 He also suggests a fixed range of allowance values for special delays, such as tool maintenance. Barnes recommends a similar allowance range of 2 to 4 percent for personal (need) delays, but he believes that fatigue is such a negligible factor in most modern operations that it need not be considered in time study work.15 This sample makes clear that time study writers recommend essen,s
Carroll, Time Stvdy for Cost Control, pp. 08-100. " Shumard. A Primer of Time Study, pp. 242-45. 14 Holmes, Applied Time and Motion Study, p. ISO. " Barnes, Motion and Time Study, pp. 278-S0.
MEASURING AND ESTIMATING DELAYS
189
tially arbitrary allowance values to cover fatigue and personal (need) delays. These allowances are based on nothing more formidable than private value systems, since no discernible estimation process is involved. Eloquent confirmatory proof is provided by the use of fatigue allowances in the face of the fact that fatigue has not yet been fully defined. "Interruption Studies." Some time study writers do recommend making direct "interruption studies" to develop allowances for unavoidable delays for which the worker is not held accountable. In essence, these studies are large-scale versions of ordinary time studies except that what is recorded now is the gross amount of time used up by productive activity and by different delays. The delays considered, however, are generally long and, therefore, do not usually include, for example, intracycle delays.16 Like ordinary time studies, "interruption studies" are made on a continuous basis and usually cover only one worker. There is a difference in the length of the study period, which is usually much longer in the case of "interruption studies." But the analogy reappears in the recommendations on length of study which in both cases are based on arbitrary judgments. In the delay case the recommendations range from two to four hours at one extreme to a week or two at the other, with an eighthour study considered sufficient on the average. Survey respondents also offered a wide variety of recommendations on how long "interruption studies" should last.17 There were 38 specifications, which fall into general categories as follows: (1) 12 specified eight hours or an entire shift; (2) 6 specified a maximum of eight hours; (3) 9 specified from eight hours to a full day; (4) 8 specified from several shifts to a week; (5) 3 specified from two weeks to a month. Also, 11 respondents did not make specific recommendations, arguing that the length of study should be determined by the operations and nature of the delays encountered. One respondent freely admitted that the governing criterion for him was to satisfy workers and supervisors that the studies were long enough. Bargaining Aspects of Allowances. Allowances represent standards for delays. Like production standards, of which they are a component part, allowances are ultimately controlled by the bargaining of management and labor. Thus the process of developing allowances is at once a small1 4 The procedure for making "interruption studies" is described in Morrow, Time Study and Motion Economy, pp. 171-72. 17 See Abruzzi, " A Survey of American Time Study Practices," Time and. Motion Study, II, No. 1 (1952), 11-25.
190
A WORK MEASUREMENT THEORY
scale replica of time study and an integral component of time study. Allowances like these may have the external appearance of validity because they are sometimes published in handbook form. But they must ultimately be based on value judgments on what is equitable. This is put into sharp focus by the fact that fatigue and personal needs can only be taken into account in terms of some ultimately bargained code; they cannot be measured in any decisive way. I t is true that labor may not participate overtly in setting the allowances. But labor does become directly involved as a challenger and critic, just as it often does with final production standards. It accepts fatigue and other allowances regardless of origin when this seems advantageous; otherwise it makes a challenge. The bargaining activity exhibited with production standards ia reproduced by the bargaining activity ultimately exhibited with delay allowances. The only important differences are in scale, intensity, and degree of codification. Bargaining over allowances is subsidiary in scale and intensity to bargaining over production standards, which include allowances. Allowances also have acquired a certain stature and immunity by having been incorporated in a traditional and semi-rigid code of interindustry working rules. Labor sometimes finds it desirable to put contractual restrictions on whatever studies are used in establishing allowances. The UAW booklet provides a useful example; it urges local unions to insist that "interruption studies" last at least eight hours.18 These restrictions are imposed in much the same manner and for much the same reason that contractual restrictions are put on parent time studies. The fact that there aren't nearly as many challenges on delay allowances as there are on production standards may mean that allowances are more lenient than standards. It may also mean that allowances are more difficult to challenge because they have become institutionalized. The most likely explanation, however, is that challenges on production standards are sufficient to obtain labor's objectives in this area. In that event labor needn't become involved in what turns out to be mere detail. There is no point in making an issue of allowances if the same objectives can be realized by making an issue of standards. The fact that labor passes on allowances, at least indirectly, proves that bargaining takes place even though this may not be formalized. » The UAW-CI O Look* at Time Study, p. 30.
191
MEASURING AND ESTIMATING DELAY8
There are numerous instances, though, where the allowance question is formally recognized. The Bureau of Labor Statistics, for example, has reported a number of contract provisions specifying delay and fatigue allowances.19 Also, the UAW booklet recommends contract provisions specifying that (1) at least 6.7 percent be allowed for personal (need) delays; (2) at least 6.7 percent be allowed for other types of delay; (3) at least 10 percent be allowed for fatigue. The Estimation Component. When direct delay studies are made, it is possible to isolate an estimation component. Again there is an analogy to parent time studies since the defects of the estimation component are exactly the same in both cases. For example, these studies are concerned solely with the local delay characteristics of individual workers; they simply cannot provide any useful information about grand delay characteristics. Nor do these studies pay any attention to minute delays which are more significant, at least in the work methods sense, than prolonged delays. The best that can be said for these studies is that they give a fragmentary view of certain kinds of local delays exhibited by specially selected workers. There is hardly any point in belaboring the fact that these studies fail to estimate much of anything, local or grand. It is sufficient to say that no attention is paid either to the problem of stability or to the problem of precision of estimate. The Question of Economy. There is one notable difference between the delay situation and the production-rate situation. The procedures proposed for dealing with delays cost far less than the corresponding classical procedures. This is due to the speed with which qualitative measurements can be made and the ability to incorporate many operations in one study in an essentially straightforward manner. The proposed procedures supply information about delay characteristics that is not and cannot be supplied by classical procedures. The two sets of procedures are therefore not comparable. Yet it is gratifying to know that it is much less costly to use procedures which yield valid judgments about stability and precise estimates than to use classical procedures which yield neither. An ounce of good theory is always more practical than a pound of poor practice. " Collective Bargaining Standards of Production.
Provisions:
Incentive Wage Provisions;
Time Studies
and
PART
III
The Theory of Human Work: Beliefs, Codes, and Observations
CHAPTER
THIRTEEN
The Nature of Standard Data Systems
Both the objectives and the basic procedure for developing standard data systems were originally defined by Frederick W. Taylor in the early part of the century.1 In essence, Taylor recommended that each operation be divided into component elements and timed with a stopwatch. Studies were to be made on specially selected and usually skilled operators. The time values obtained were to be recorded, indexed, and then used to determine the time required to perform other operations made up of similar elements. Taylor's belief and, it turns out, fundamental error was that element times could be subtracted from particular operations and added to others in a simple arithmetic manner. A Closer View of Procedure. Time study writers seem compelled to decorate a procedure with glossy detail, and then claim that they have invented a new procedure. This is the case with standard data systems which are actually all based on the procedure originally proposed by Taylor. The only change of any consequence—and this is just an extension in the area of application—has been to develop systems for motions as well as for element combinations of motions. The first specific step in developing element standard data systems is, of course, to define the elements. Element time values are then obtained by conventional time study procedures, and these values are usually summarized by computing some measure of central tendency. Thus far the estimation component of time study is used and, for what that is worth, estimates are being obtained. The Evaluation Process. At this point the evaluation component appears and creates havoc in exactly the same manner as it does when time study is used directly to develop production standards. What happens 1
Taylor, "The Present State of the Art of Industrial Management," Transactions,
American
Society of Mechanical
Engineers,
X X X I V (1912), 1 1 9 9 - 1 2 0 0 .
196
THE THEORY OF HUMAN WORK
is that the summarized element times are rated. Whatever estimation value the results may have had vanishes in a process which amounts to setting production standards for the elements in question. The result is that standard data systems involve adding production standards, not estimates. Since production standards must be based on value judgments, this implies that a linear scale can be constructed for value judgments which permits them to be added and multiplied with impunity. This is outand-out nonsense since value judgments, by their very definition, are not measurable. Otherwise, they could be brought within the framework of estimation theory and would cease to be value judgments at all. It follows that element standard data systems (this applies with equal force in the case of motion standard data systems) are patently absurd on a simple a priori basis. The only inquiry that makes sense is looking into the estimation component of standard data systems to see whether they are capable of yielding time estimates that can legitimately be added and multiplied. Element Definitions. Many element definitions, most of them vague and arbitrary, are recommended and used in practice. A typical example is given by Morrow who recommends that elements be as small as possible for standard data work.2 Barnes's recommendation is equally unsatisfactory; he says that elements should be defined in such a way as to permit standard data systems to be developed.8 These recommendations are reflected in the views offered by practitioners. 4 There were 60 practitioners who reported using element breakdowns in their work, and in 57 cases the breakdowns were used to help develop standard data systems. The criteria reported are summarized in Table 20 along with the number of times cited. Claimed Advantages of Element Systems. The principal argument for element standard data systems is economy, for only an occasional time study is presumably needed after a system has been developed. But the whole approach stands or falls on the parent and twin assumptions that production standards developed from a standard data system are both accurate and mutually consistent. 1
Morrow, Time Study and Motion Economy, p. 228. Barnes, Motion and Time Study, p. 288. 4 See Abruzzi, "A purvey of American Time Study Practices,"' Study, II, No. 1 (1952), 11-25. 3
Time and Motion
197
T H E NATURE OP STANDARD DATA SYSTEMS TABLE 2 0 .
CRITERIA REPORTED FOR DEFINING OPERATION
ELEMENT*
Criterion
Times Cited
Natural and definite "breaks," e.g., sound, motion, etc. Breakdown into constant and variable element groups Smallest elements consistent with accuracy Observer's judgment Breakdowns compared to those of similar jobs Breakdowns compared to those on standard tables Other criteria
15 15 14 14 5 5 5
In Carroll's view the property of consistency is the most valuable property of standard data systems.6 This leads him to proclaim that this approach has no disadvantages, and that it can be applied to all types of work except creative mental work. The claim of consistency is even supported by Gomberg who is otherwise suspicious of standard data systems.· Gomberg's acceptance is apparently based on the bargaining advantage that consistency would give workers, for he writes that "standard data at least reduces to writing an implied bargain between the workers and the management." In view of what is said in this book about production standards and evaluation processes, it simply could not be true that standard data systems are capable of being either accurate or consistent. These properties can only have meaning with respect to estimates obtained from a process of measurement; they cannot have meaning with respect to results obtained from a process of evaluation. Though standard data systems do "reduce to writing," there is no way of pinning down just what is being written about. Time study writers are always prepared to quibble about trivial details, but never about basic problems. It is not surprising to find essentially closed ranks on the claim that element standard data systems are superior to motion standard data systems. A few writers do have some evidence for taking this position. Mundel's position, for example, is based on the negative fact that in certain laboratory studies the times required to perform adjacent motions were interrelated.7 Gillespie takes essentially the same position as Mundel and for much « Carroll, Time Study for Cost Control, pp. 18-32. • Gomberg, A Trade Union Analysis of Time Study, pp. 158-59. * Mundel, Systematic Motion and Time Study, p. 178.
198
T H E THEORY O F HUMAN WORK
the same reason. 8 He feels that it is much more reasonable to use elements as primary units since they are made up of unified groups of related motions. I t must be kept in mind, however, that the Mundel and Gillespie argument can have meaning only if the problem is viewed strictly as an estimation problem. The reported advantages of element standard data systems have apparently persuaded many practitioners; the percentage rate of application, according to a survey made by Barnes, increased from 69 percent in 1945 to 78 percent in 1948.® An even greater percentage was reported in the later survey considered here, where 57 out of 66 respondents reported using a certain amount of standard data in their work.10 Claimed Advantages of Motion Systems. The other side of what amounts to a quarrel over area of application is taken by a number of writers. Their view is that motion standard data systems are more accurate than e l e m e n t s t a n d a r d d a t a système.
Robert MacLatchie, for example, asserts that rating makes time study incapable of producing accurate results.11 He proposes that "motion time" standards be used instead, but he conveniently forgets that they also involve a rating process. Quick, Shea, and Koehler take the same position as MacLatchie. 12 They cite examples of the wide discrepancy in the results obtained by time study engineers in rating work films but they, too, are conveniently forgetful. There is at least an advertising difference between the two kinds of systems, for almost every published motion system is claimed to have universal application. The single exception is the system published by Barnes who, with unique modesty, indicates that the system should be applied only to assembly operations.13 In addition to the claim of universal application, each published motion system is claimed to be suptv ri or to all others because of what are considered distinctive features. These always turn out to be nothing more than window dressing. A rather naïve example is provided by A. B. Segur who asserts that * Gillespie, Dynamic Motion and Time Study, p. 77. * These survey results are contained, respectively, in Barnes, Work Measurement Manual, and Barnes, Industrial Engineering Survey. 10 Abruzzi, "A Survey of American Time Study Practices," Time and Motion Study, II, No. 1 (1952), 11-25. 11 MacLatchie, "Wage Incentives from Motion Time Standards," in Proceedings of the Time and Motion Study Clinic, Nov., 1945, pp. 59-65. 12 Quick, Shea, and Koehler, "Motion-Time Standards," Factory Management and Maintenance, CHI, No. 5 (1945), 97-108. " Barnes, Motion and Time Study, pp. 333-44.
THE NATURE OF STANDARD DATA SYSTEMS
199
he has developed a system using 17 so-called fundamental motions. This system is based upon the claim that the time required by experts to perform a fundamental motion is constant within reasonable limits.14 By defining 24 fundamental motions Holmes develops an entirely different system." In turn, Quick, Shea, and Koehler maintain that their system is more satisfactory than other systems because it takes into account distance, control, weight, and body member.16 Quite recently another system has been published by H. B . Maynard, G. J . Stegemerten, and J . L. Schwab.17 They claim that this system provides both time standards and a procedure for improving work methods. The fact that these claims are mutually contradictory has apparently not been much of a handicap, for this particular system has achieved quite some vogue. The "It Works" Criterion. The most persistent claim made for each kind of system is that "it works." This claim is neither surprising nor convincing. The same claim is made with respect to standards derived from ordinary time studies; it is invalid in both cases, primarily because "it works" means only that there is a certain measure of acceptance. Refuting the claim that "it works" for a standard data system simply requires that the arguments presented in Chapters 2 through 4 on the subject be extended to element or motion production standards. For example, admittedly competitive and even contradictory motion systems can coexist only in a framework where acceptance may be granted or withheld on the basis of the results obtained in local application. This is certainly not the same thing as having validity from an estimation standpoint. Other Verification Claims. In some cases the "it works" criterion takes the form of "test" verification. The "tests" consist of comparing (cycle) standards developed from some standard data system to standards developed from direct time studies. This is a completely meaningless activity for a number of reasons. Both sets of standards are based on value judgments, and value judgments can be made to match. Matched results simply indicate that the same judgment pattern is used in both cases. This is exactly what hap14 Segur, "Motion Time Analysis," in Proceedings of the Time and Motion Study Clinic, Nov., 1938, pp. 40-48. 15 Holmes, Applied Time and Motion Study, p. 217. " Quick, Shea, and Koehler, "Motion-Time Standards," Factory Management and Maintenance, CIII, No. 5 (1945), 97-108. 17 Maynard, Stegemerten, and Schwab, Methods-Time Measurement.
200
T H E T H E O R Y OF HUMAN W O R K
pens, for the ratings in matching "tests" are usually made by the same observers or, what amounts to the same thing, observers trained in the same rating tradition. Even under these extremely biased circumstances the match is poor in many cases. Even observers with essentially the same background have difficulty in developing a uniform judgment pattern and, like the workers they observe, must be expected to have "abnormal" readings. Nothing fundamental would be accomplished even if matching "tests" were uniformly successful. To be able to match standards is to succeed only in developing firm and uniform biases. There simply is no way of determining the accuracy of production standards arrived at in any manner—unless absolute value systems can be decreed into existence by proclamation. This is another way of saying that tests can have meaning only when they are applied to estimates. In that case precision becomes meaningful and, if at least two independent estimating procedures are available, accuracy also becomes meaningful. This is the viewpoint from which standard data systems will be examined here. The question that must be asked is whether the estimation component of standard data systems is capable of yielding estimates that can legitimately be added and multiplied; nothing else is relevant. I t becomes little more than a diversion, then, to look into the structure of the "tests" used in the few cases where there has been any attempt to describe them. A typical example is given by Maynard, Stegemerten, and Schwab in what turns out to be an abortive attempt to show that their standard data results do match time study results. It is of more than passing interest that these authors admit that rating probably brought the two sets of results closer together, though they are either unable or unwilling to see that this makes the entire matching process invalid.18 Quite aside from this, however, they find it convenient to pool the percentage difference values for the pairs of standards obtained on each of 27 motions. The pooling was done on an arithmetic basis, and it resulted in a mean difference of 1.85 percent. This, it is claimed, proves that their standard data system gives valid results. But arithmetic pooling is a process without meaning, and in this case it concealed the fact that the individual percentage differences ranged '« Ibid., pp. 36, 217.
T H E NATURE O F STANDARD DATA SYSTEMS
201
from —14.3 percent to +13.3 percent. A much more meaningful index would be obtained by computing the mean absolute difference. This turns out to be 7.37 percent, a far cry from 1.85 percent. CRITICAL V I E W S AND E X P L O R A T O R Y
EXPERIMENTS
Incompleteness of the Critiques. Looking into critiques of standard data systems can be of value. Unfortunately, the critiques do not recognize the fundamental need for separating their estimation component from their evaluation component. For that reason the critiques themselves do not have meaning unless they are only considered to apply to the estimation component of standard data systems. Interpreting the Critical Views. The first major critical notice of the standard data approach is found in Hoxie's comprehensive critique of time study published in 1916.19 In his view element standard data systems introduce another unscientific and possibly unjust factor into time study work. Hoxie also made what turns out to be the remarkably accurate prediction that only with machine-controlled operations would standard data systems have any usefulness. In recent years this question has received a great deal of critical consideration. Ryan, for example, questions what he considers to be the fundamental assumption of the standard data approach, i.e., that the time required to perform a given motion or element is the same regardless of the other motions or elements in an operation.20 That is not, however, the assumption that is made. The real assumption is that the time that ought to be required to perform a given motion or element is the same regardless of the other motions or elements in an operation. Ryan's question does have meaning if, as indicated, it is applied to the estimation component of standard data systems. In that context Ryan's added comment also becomes meaningful. It is impossible, he says, for standard data systems to have predictive value if their fundamental assumption (as modified) is invalid. Ryan also looks into Segur's claim that motion times are constant within reasonable limits. Segur's full claim, however, refers to experts which again brings in an evaluation component. Ryan should be talking exclusively about time estimates. Then his comments make sense. He states that Segur's claim is inconsistent with experimental evidence— " Hoxie, Scientific Management and Labor, pp. 51-52. » Ryan, Work and Effort, pp. 232-35.
202
THE THEORY OF HUMAN WORK
including, it might be pointed out, the not inconsiderable evidence given in this book—showing that workers differ greatly in production rates. Edwin Ghiselli and Clarence Brown also make the error of considering the fundamental assumption of standard data systems to be that each movement in an operation is independent of all the others. 21 When the assumption is formulated correctly, their added comment acquires meaning. They emphasize that an individual works on an operation as a totality, and that every part of an operation affects every other part. T o support this view these writers present the results of a simple experiment on a key-tapping operation. This experiment showed that eliminating two of the movements did not reduce the cycle time as much as would be expected from the fundamental assumption (as modified). As a labor spokesman, it will be recalled, Gomberg considers the (claimed) property of consistency to justify the use of element standard d a t a systems. 22 Gomberg does stress, however, that the fundamental assumption of the (estimation component of the) standard data approach has not been substantiated in experimental terms. As Gomberg puts it, there has been no evidence that "the elements of which a job is composed make up an additive set." Gomberg also points to certain exploratory experiments indicating t h a t motion times depend on adjacent motions. This, together with inconsistencies among motion standard data systems, persuades him that such systems are "very dangerous to use." At least one time study writer seems aware of some of the questions considered here. T h a t writer is William Lichtner who wrote that the time required to complete one element depends on the time required to complete the preceding element. 23 The explanation was that workers move continuously from one movement to another. Apparently Lichtner did not visualize the full implications of this, for he had no hesitancy about endorsing the standard data approach. Survey Results. According to survey results many practitioners do not fully accept the claims of time study writers on standard data systems. Indeed, they seem to be intuitively familiar with some of the limitations. 24 When asked, for example, whether there was any risk of error " Ghiselli and Brown, Personnel and Industrial Psychology, pp. 26S-70. " Gomberg, A Trade Union Analysis of Time Study, pp. 156, 159-60. " Lichtner, Time Study and Job Analysis, p. 168. 14 See Abruzzi, "A Survey of American Time Studv Practices," Time and Study, II, No. 1 (1952), 11-25.
Motion
T H E NATURE OF STANDARD DATA SYSTEMS
203
in using standard data systems, 46 replied that there was and only 14 that there was not. Of the first group 19 provided percentage estimates ranging from 3 percent to 50 percent. Six others stated that production standards based on standard data systems have a maximum error potential of ± 0 . 0 2 minutes. In all other cases the estimates of error were given in qualitative terms; these estimates ranged from very small to very large. The 14 who maintained that standard data standards are free of error contended that this could be accomplished by proper judgment in applying the data. Practitioners were also asked whether removing an element from an operation would affect other element times; 50 reported that it would and 8 reported that it would not. The chief reasons given by those in the first group are that: (1) the tempo, the rhythm, or the motion pattern changes; (2) the relationships among elements change; (3) changes in direction, distance, or weight affect element times; (4) workers form persistent work patterns; (5) exhausting elements affect the times for other elements. There also were 58 replies to the question of whether a change in sequence would affect the element times. Again 50 said yes and 8 said no. The yeses took the position that substantial production changes, particularly when work methods are improved, may accompany changes in sequence. Apparently many practitioners intuitively recognize the existence of relationships among elements, and even some of the causative factors. By and large, they agree with the critical students on the limitations of standard data systems. But practitioners and critics alike miss the central point, which is that the only component of standard data systems that can have experimental meaning is the estimation component. Exploratory Experiments. In 1939, Barnes and Mundel made a number of laboratory experiments on a simple task : pick up and transport, position, and insert (pins into bushings with beveled h o l e s ) W h e n the hole size was changed, an expected change took place in the time for positioning and inserting. But there also was a change in the pick-up and transport times. This would not have happened if motion times were inde" Barnes and Mundel, A Study in Simultaneous Hand Motions.
204
T H E THEORY OF HUMAN
WORK
pendent, and the authors concluded that the time required to perform a motion must be affected by adjacent motions. In the following year Barnes, Perkins, and Juran obtained similar results in another series of experiments. They reported that "seemingly, the elements within one cycle compensate one another to avoid unduly long or short cycle time." 26 These two sets of results were used by Mundel as a negative argument for concluding that element standard data systems are superior to motion standard data systems. They were also used by Gomberg to support the speculation that element times might not be additive. Limitations of these Results. The simple study made by Ghiselli and Brown completes this extremely small group of laboratory studies, all indicating that operation components are interdependent. Although they do have some exploratory value, these experiments have serious empirical and statistical limitations. They were made in the laboratory; the tasks were extremely simple ; the subject-workers were highly motivated; and the results only refer to (gross) mean times. The basic data were not evaluated for statistical stability, nor was there any attempt to look into other important aspects of the situation. The problems involved in appraising the estimation component of standard data systems are much more complex and far-reaching than either the critical studies or the experimental studies suggest. To give just one example, it is important not only to consider whether operation components are independent but also to determine how such components can be defined so that they actually will be independent. These problems are taken up in the next two chapters, which also describe the experimental results obtained and their implications. M Barnes, Perkins, and Juran, .4 Study a Factory Operation, pp. 5 9 - 6 0 .
of the Effect of Practice
on the Elements
of
CHAPTER
FOURTEEN
Standard Element Estimates and Related
Problems
Most of the experiments on problems related to what should be defined as standard element estimates were made on data taken from local studies, primarily in Plant A. In most cases stability was established to exist, which meant that formal statistical tests could be applied with confidence. TESTS OF INDEPENDENCE ON BASIC ELEMENTS
The Test Procedure. Tests of independence are somewhat complicated in theory and quite tedious in application. Interested readers are referred to Work Measurement for technical details; the emphasis here will be on results and implications.1 A test of independence might well be termed an omnibus test; it determines whether the elements in an operation considered as a group are mutually independent. The approximate probability of independence was reported in all cases, with a 5 percent level used as a significance criterion. Results on Opération 1. Two tests of independence were made on Operation 1 which, it will be recalled, was made up of 9 elements. These tests were made on representative samples of data from the studies of local stability on Operators ID and 1H. The probability of independence turned out in both cases to be less than .1 percent, a clear indication that the elements were correlated. Inspection of the test data showed that the element structures of the two operators differed greatly in nature and intensity of correlation. This means that the nature and intensity of the relationships among 1 See Abruzzi, Work Measurement, pp. 129-37, for procedure and application, and Wilka, Mathematical Statistics, for theory.
206
T H E THEORY O F HUMAN WORK
elements are largely a function of the operator—a view confirmed by most of the other studies made. Results on Other Operations. A test of independence was also made on Operation 1*; three tests were made on Operation 2; two testa were made on Operation 3; and four tests were made on Operation 13. In all but two cases the probability of independence turned out to be less than .1 percent. For operators working on the same operation, the nature and intensity of the relationships among elements were again found to vary substantially. In the case of Operator 3H, however, the probability of independence was fully 60 percent, which means that the elements here could be considered independent. Another exception was a test result on data for Operator 3L which had a probability value of 7 percent. However, in a second test made on data obtained just one day later, the probability value was less than .1 percent. This pair of back-to-back test results suggests that the relationships among elements for the same worker does not remain constant over time—even one as experienced as Operator 3L, who had 18 months' experience. Other Representative Results. Tests of independence were made in numerous other cases, using data obtained from the same operations as well as others. With one exception the probability of independence was less than .1 percent. A follow-up study of the test data reaffirmed the view that the relationships among elements largely depend on the operator. The existence of independence was verified just once. This was with an operation in Plant Β made up of just three elements, one with a mean time of about 3.0 hundredths. This suggests that number and magnitude have an important effect on the independence of elements. R E L A T E D EMPIRICAL EVIDENCE AND IMPLICATIONS
RELATED EMPIRICAL EVIDENCE AND IMPLICATIONS

As originally defined, the elements in these operations were not independent, though the element definitions were developed by experienced time study engineers. The criteria used were essentially those enumerated in Table 20, which indicates, upon review, that some of the criteria are intended to provide signals to aid the observer. Apparently this is not quite the same thing as providing logical element endpoints. The results also suggest a relationship between independence and the number and the magnitude of the elements, and variation in the relationships among elements with different operators and, with time, even with the same operator. This extends the finding presented in Chapter 8 that the relationships among cycles depend on the operator and do not remain constant even for one operator.

Suggested Hypothesis. In Chapter 8 also, a great deal of statistical and empirical evidence is presented on the nature and intensity of correlations among cycles. These correlations are explained in terms of the hypothesis that workers plan and regulate production rates to suit abilities and purposes. The present results show that there also are significant correlations among elements. The hypothesis referring to cycles can therefore be extended to cover elements, and relationships among elements can also be considered to reflect worker planning and regulating. The primary factors are again abilities and purposes, though probably with a shift in emphasis from overt social needs to more subtle behavioral needs.

Evidence from Plant A. Some evidence supporting this view has already been presented. For example, a prolonged delay in one part of a cycle prompts the worker to quicken the pace in a subsequent part of the cycle; workers have different delay potentials in terms of number, type, and duration; workers vary the way they perform certain elements; workers introduce changes in work methods from time to time. Many other examples of regulation were observed and, in almost all cases, the characteristic reported was peculiar to one or a limited number of operators working on a given operation.

Operator 1/1, for example, picked up both the left and right backs of a garment simultaneously; this was contrary to the prescribed method defined in Chapter 7. Operator 1/ picked up the eye-end before picking up the label, though the prescribed method required that they be picked up together. The same operator did somewhat more straightening than the other operators in the eighth element.

A particularly significant phenomenon was observed when Operator 1F was assigned to work on a lot of garments requiring no labels. She had become so accustomed to the usual element pattern that she instinctively reached for an imaginary label while performing the fifth element. This movement would have been eliminated if it had been independent of the movement of reaching for the eye-end.

Operator 2I was found to modify the work method in a number of ways. She "toyed" with the stack of finished units while putting away
a newly completed unit. In performing elements 3 and 6, she used two sewing "runs" instead of the usual one. Also, unlike other operators, she guided the garment with her fingers while performing the sixth element.

A number of unique characteristics were observed in the work method of Operator 3N. In performing elements 1 and 10, for example, there was a definite pause not characteristic of the other operators. This operator also placed finished units into two stacks instead of the usual one. Operator 3M, on the other hand, started a stitching movement during the second element rather than the more customary third element. She was also observed to pause perceptibly at the end of each cycle.

Operator 13A included a clipping movement in element 10 while pulling the garment away from the sewing machine, though this was supposed to be done in the following element. This operator also laid down the scissors used in element 8 immediately after completing the element. Sometimes Operator 13B followed this procedure, too, but more often she laid down the scissors at the end of the cycle. A third variation was used by Operator 13C; she always held the scissors until the end of the cycle.

Operator 13C was also observed to work in terms of groups of elements within which individual elements were difficult to identify. She paused slightly before starting each of these element groups, as though in preparation. Many of these pauses were accompanied by a garment inspection so superficial that it seemed as though she were using the inspection to rationalize the pause.

The example from the work of Operator 13C is just one of the many ways in which an essentially non-productive activity was rationalized by concealing it within an apparently productive activity. The rationalizing process is a facet of the close interrelation between productive activities and non-productive activities. It also gives dramatic evidence that workers plan their work activities in such a highly individualistic and intuitive manner that they sometimes—perhaps unconsciously—attempt to hide activities that might be considered unacceptable.2

2 See also Williams, Avoidable Delays among Women Workers on Light Work, pp. 13-14.

Evidence from Plant B. The work methods in Plant B would be expected to be much more individualized than those in Plant A; they were for the most part determined by the workers. This turned out to be exactly the case. Operator 22A, for example, reversed the order of two
pressing elements in certain cycles. He also pressed the entire center seam in one element, though it was more common to press the under part of the seam separately. Operators 26A and 28A also exhibited unique work characteristics, particularly in positioning garment sleeves and in pressing collars.

Implications. There seems to be no doubt that workers organize operation elements into an integrated and essentially unique cycle pattern. Workers do not consider work in terms of individual elements but in terms of integrated groups of elements. This was brought home to the observers in a convincing way; they always found it necessary, so to speak, to relearn an operation with each operator.

The general implication is that, as currently defined, many elements do not make up logical operation components, particularly in man-controlled operations. This does not rule out the possibility of developing such components, though it does seem to make the prospect dim. The prospect lies in arranging elements into groups that absorb the bulk of the correlations within the groups.

TESTS OF INDEPENDENCE ON ELEMENT GROUPS
The Grouping Procedure. The grouping criteria used were independence-test data together with empirical evidence on the relationships among elements. To the extent that it was wisely done, grouping would be expected to confirm the relationships among elements. The most important objective, though, was to determine whether independent operation components could be constructed. Independent components would have the property of linearity, which would make it possible to add or multiply estimates.

Results on Operation 1. The nine elements of Operation 1 were combined into five element groups, and fresh tests of independence were made on the (grouped) data for Operators 1D and 1H. With probability values of 45 percent and 8 percent, respectively, independence could be considered achieved in both cases, a sharp reversal of the results obtained with the original elements.

Results on Other Operations. The original elements were also grouped in every other case where a test of independence had already been made. The results were essentially the same, with probability values ranging from 14 percent to 99 percent in most cases. There were a few exceptions, however, as in the case of Operator 2I, where the probability value after
grouping was only 1 percent. On the other hand, the grouping process did yield independent components for Operator 2F, where the probability value turned out to be 52 percent.

A highly suggestive set of results was obtained with Operation 3, where the ten original elements were combined into seven groups. The probability values for the grouped data are given in Table 21, along with the corresponding values for the original elements.
TABLE 21. INDEPENDENCE-TEST RESULTS ON GROUPED ELEMENTS IN OPERATION 3

                                        OPERATOR
                                   3H        3L        3N
Readings used (n)                  53        51
Probability value
  With element groups              56%       99%
  With elements                    60%
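The grouping step itself can be sketched in the same illustrative spirit; the particular grouping shown is hypothetical, not the one actually used in the studies. Element times are pooled by summing within each cycle, and the omnibus test sketched earlier is then rerun on the grouped data.

    import numpy as np

    def group_elements(element_times, groups):
        # groups: a list of tuples of element (column) indices; times within
        # each group are summed cycle by cycle to form the grouped component.
        return np.column_stack(
            [element_times[:, list(g)].sum(axis=1) for g in groups])

    # For example, nine elements might be combined into five groups and retested
    # (an illustrative grouping only):
    # grouped = group_elements(times, [(0, 1), (2,), (3, 4, 5), (6,), (7, 8)])
    # probability = independence_probability(grouped)  # from the earlier sketch

If the grouping has absorbed the bulk of the correlations within the groups, the probability returned for the grouped data should rise above the 5 percent criterion, as it did in most of the cases reported here.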