Psychological testing: principles and applications [Sixth edition, Pearson New International Edition] 1292040025, 1269374508, 9781292040028, 9781269374507

For sophomore/junior-level courses in Psychological Testing or Measurement. Focuses on the use of psychological tests to

English Pages ii, 486 pages: illustrations; 28 cm [491] Year 2013;2014

Table of contents:
1. Tests and Measurements
2. Defining and Measuring Psychological Attributes: Ability, Interests, and Personality
3. Testing and Society
4. Basic Concepts in Measurement and Statistics
5. Scales, Transformations, and Norms
6. Reliability: The Consistency of Test Scores
7. Using and Interpreting Information About Test Reliability
8. Validity of Measurement: Content and Construct-Oriented Validation Strategies
9. Validity for Decisions: Criterion-Related Validity
10. Item Analysis
11. The Process of Test Development
12. Computerized Test Administration and Interpretation
13. Ability Testing: Individual Tests
14. Ability Testing: Group Tests
15. Issues in Ability Testing
16. Interest Testing
17. Personality Testing
Appendix: Forty Representative Tests
Appendix: Ethical Principles of Psychologists and Code of Conduct
References
Index

Pearson New International Edition

Psychological Testing: Principles and Applications
Kevin R. Murphy, Charles O. Davidshofer
Sixth Edition

Pearson Education Limited Edinburgh Gate Harlow Essex CM20 2JE England and Associated Companies throughout the world Visit us on the World Wide Web at: www.pearsoned.co.uk © Pearson Education Limited 2014 All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without either the prior written permission of the publisher or a licence permitting restricted copying in the United Kingdom issued by the Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS. All trademarks used herein are the property of their respective owners. The use of any trademark in this text does not vest in the author or publisher any trademark ownership rights in such trademarks, nor does the use of such trademarks imply any affiliation with or endorsement of this book by such owners.

ISBN 10: 1-292-04002-5 ISBN 10: 1-269-37450-8 ISBN 13: 978-1-292-04002-8 ISBN 13: 978-1-269-37450-7

British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library Printed in the United States of America

Tests and Measurements

The term psychological test brings to mind a number of conflicting images. On the one hand, the term might make one think of the type of test so often described in television, movies, and the popular literature, wherein a patient answers questions like, “How long have you hated your mother?” and in doing so reveals hidden facets of his or her personality to the clinician. On the other hand, the psychological test might refer to a long series of multiple-choice questions such as those answered by hundreds of high school students taking college entrance examinations. Another type of “psychological test” is the self-scored type published in the Reader’s Digest, which purports to tell you whether your marriage is on the rocks, whether you are as anxious as the next fellow, or whether you should change your job or your lifestyle. In general, psychological tests are neither mysterious, as our first example might suggest, nor frivolous, as our last example might suggest. Rather, psychological tests represent systematic applications of a few relatively simple principles in an attempt to measure personal attributes thought to be important in describing or understanding individual behavior. The aim of this text is to describe the basic principles of psychological measurement and to describe the major types of tests and their applications. We will not present test theory in all its technical detail, nor will we describe (or even mention) all the different psychological tests currently available. Rather, our goal is to provide the information needed to make sensible evaluations of psychological tests and their uses within education, industry, and clinical practice. 
From Chapter 1 of Psychological Testing: Principles and Applications, Sixth Edition. Kevin R. Murphy, Charles O. Davidshofer. Copyright © 2005 by Pearson Education, Inc. All rights reserved.

The first question that should be addressed in a psychological testing text is, “Why is psychological testing important?” There are several possible answers to this question, but we believe that the best answer lies in the simple statement that forms the central theme of this text: Tests are used to make important decisions about individuals. College admissions officers consult test scores before deciding whether to admit or reject applicants. Clinical psychologists use a variety of objective and
projective tests in the process of choosing a course of treatment for individual clients. The military uses test scores as aids in deciding which jobs an individual soldier might be qualified to fill. Tests are used in the world of work, both in personnel selection and in professional certification and licensure. Almost everyone reading this text has taken at least one standardized psychological test. Scores on such a test may have had some impact on an important decision that has affected your life. The area of psychological testing is therefore one of considerable practical importance. Psychological tests are used to measure a wide variety of attributes—intelligence, motivation, mastery of seventh-grade mathematics, vocational preferences, spatial ability, anxiety, form perception, and countless others. Unfortunately, one feature that all psychological tests share in common is their limited precision. They rarely, if ever, provide exact, definitive measures of variables that are believed to have important effects on human behavior. Thus, psychological tests do not provide a basis for making completely accurate decisions about individuals. In reality, no method guarantees complete accuracy. Thus, although psychological tests are known to be imperfect measures, a special panel of the National Academy of Sciences concluded that psychological tests generally represent the best, fairest, and most economical method of obtaining the information necessary to make sensible decisions about individuals (Wigdor & Garner, 1982a, 1982b). The conclusions reached by the National Academy panel form another important theme that runs through this text. Although psychological tests are far from perfect, they represent the best, fairest, and most accurate technology available for making many important decisions about individuals. Psychological testing is highly controversial. 
Public debate over the use of tests, particularly standardized tests of intelligence, has raged since at least the 1920s (Cronbach, 1975; Haney, 1981; Scarr, 1989).¹ An extensive literature, both popular and technical, deals with issues such as test bias and test fairness. Federal and state laws have been passed calling for minimum competency testing and for truth in testing, terms that refer to a variety of efforts to regulate testing and to increase public access to information on test development and use. Tests and testing programs have been challenged in the courts, often successfully.

Psychological testing is not only important and controversial, but it is also a highly specialized and somewhat technical enterprise. In many of the natural sciences, measurement is a relatively straightforward process that involves assessing the physical properties of objects, such as height, weight, or velocity.² However, for the most part, psychological attributes, such as intelligence and creativity, cannot be measured by the same sorts of methods as those used to measure physical attributes. Psychological attributes are not manifest in any simple, physical way; they are manifest only in the behavior of individuals. Furthermore, behavior rarely reflects any one psychological attribute, but rather a variety of physical, psychological, and social forces. Hence, psychological measurement is rarely as simple or direct as physical measurement. To sensibly evaluate psychological tests, therefore, it is necessary to become familiar with the specialized methods of psychological measurement.

This chapter provides a general introduction to psychological measurement. First, we define the term test and discuss several of the implications of that definition. We then briefly describe the types of tests available and discuss the ways in which tests are used to make decisions in educational, industrial, and clinical settings. We also discuss sources of information about tests and the standards, ethics, and laws that govern testing.

¹ Special issues of American Psychologist in November 1965 and October 1981 provide excellent summaries of many of the issues in this debate.
² Note, however, that physical measurement is neither static nor simple. Proposals to redefine the basic unit of length, the meter, in terms of the time that light takes to travel from point to point (Robinson, 1983) provide an example of continuing progress in redefining the bases of physical measurement.

PSYCHOLOGICAL TESTS: A DEFINITION

The diversity of psychological tests is staggering. Thousands of different psychological tests are available commercially in English-speaking countries, and doubtlessly hundreds of others are published in other parts of the world. These tests range from personality inventories to self-scored IQ tests, from scholastic examinations to perceptual tests. Yet, despite this diversity, several features are common to all psychological tests and, taken together, serve to define the term test. A psychological test is a measurement instrument that has three defining characteristics:

1. A psychological test is a sample of behavior.
2. The sample is obtained under standardized conditions.
3. There are established rules for scoring or for obtaining quantitative (numeric) information from the behavior sample.

Behavior Sampling

Every psychological test requires the respondent to do something. The subject’s behavior is used to measure some specific attribute (e.g., introversion) or to predict some specific outcome (e.g., success in a job training program). Therefore, a variety of measures that do not require the respondent to engage in any overt behavior (e.g., an X-ray) or that require behavior on the part of the subject that is clearly incidental to whatever is being measured (e.g., a stress electrocardiogram) fall outside the domain of psychological tests.

The use of behavior samples in psychological measurement has several implications. First, a psychological test is not an exhaustive measurement of all possible behaviors that could be used in measuring or defining a particular attribute. Suppose, for example, that you wished to develop a test to measure a person’s writing ability. One strategy would be to collect and evaluate everything that person had ever written, from term papers to laundry lists. Such a procedure would be highly accurate, but impractical. A psychological test attempts to approximate this exhaustive procedure by
collecting a systematic sample of behavior. In this case, a writing test might include a series of short essays, sample letters, memos, and the like.

The second implication of using behavior samples to measure psychological variables is that the quality of a test is largely determined by the representativeness of this sample. For example, one could construct a driving test in which each examinee was required to drive the circuit of a race track. This test would certainly sample some aspects of driving but would omit others, such as parking, following signals, or negotiating in traffic. It would therefore not represent a very good driving test. The behavior elicited by the test also must somehow be representative of behaviors that would be observed outside the testing situation. For example, if a scholastic aptitude test were administered in a burning building, it is unlikely that students’ responses to that test would tell us much about their scholastic aptitude. Similarly, a test that required highly unusual or novel types of responses might not be as useful as a test that required responses to questions or situations that were similar in some way to those observed in everyday life.

Standardization

A psychological test is a sample of behavior collected under standardized conditions. The Scholastic Assessment Tests (SAT), which are administered to thousands of high school juniors and seniors, provide a good example of standardization. The test supervisor reads detailed instructions to all examinees before starting, and each portion of the test is carefully timed. In addition, the test manual includes exhaustive instructions dealing with the appropriate seating patterns, lighting, provisions for interruptions and emergencies, and answers to common procedural questions. The test manual is written in sufficient detail to ensure that the conditions under which the SAT is given are substantially the same at all test locations.
The conditions under which a test is administered are certain to affect the behavior of the person or persons taking the test. You would probably give different answers to questions on an intelligence test or a personality inventory administered in a quiet, well-lit room than you would if the same test were administered at a baseball stadium during extra innings of a play-off game. A student is likely to do better on a test that is given in a regular classroom environment than he or she would if the same test were given in a hot, noisy auditorium. Standardization of the conditions under which a test is given is therefore an important feature of psychological testing.

It is not possible to achieve the same degree of standardization with all psychological tests. A high degree of standardization might be possible with many written tests, although even within this class of tests the conditions of testing might be difficult to control precisely. For example, tests that are given relatively few times a year in a limited number of locations by a single testing agency (e.g., the Graduate Record Examination Subject Tests) probably are administered under more standard conditions than are written employment tests, which are administered in hundreds of personnel offices by a variety of psychologists, personnel managers, and clerks. The greatest difficulty in standardization, however, probably lies in the broad class of tests that are administered verbally on an individual basis. For example, the Wechsler Adult Intelligence Scale (WAIS-III), which represents one of the best individual tests of intelligence, is administered verbally by a psychologist. It is likely that an examinee will respond differently to a friendly, calm examiner than to one who is threatening or surly. Individually administered tests are difficult to standardize because the examiner is an integral part of the test. The same test given to the same subject by two different examiners is certain to elicit a somewhat different set of behaviors. Nevertheless, through specialized training, a good deal of standardization in the essential features of testing can be achieved. Strict adherence to standard procedures for administering various psychological tests helps to minimize the effects of extraneous variables, such as the physical conditions of testing, the characteristics of the examiner, or the subject’s confusion regarding the demands of the test.

Scoring Rules

The immediate aim of testing is to measure or to describe in a quantitative way some attribute or set of attributes of the person taking the test. The final, defining characteristic of a psychological test is that there must be some set of rules or procedures for describing in quantitative or numeric terms the subject’s behavior in response to the test. These rules must be sufficiently comprehensive and well defined that different examiners will assign scores that are at least similar, if not identical, when scoring the same set of responses. For a classroom test, these rules may be simple and well defined; the student earns a certain number of points for each item answered correctly, and the total score is determined by adding up the points. For other types of tests, the scoring rules may not be so simple or definite. Most mass-produced standardized tests are characterized by objective scoring rules.
In this case, the term objective should be taken to indicate that two people, each applying the same set of scoring rules to an individual’s responses, will always arrive at the same score for that individual. Thus, two teachers who score the same multiple-choice test will always arrive at the same total score. On the other hand, many psychological tests are characterized by subjective scoring rules. Subjective scoring rules typically rely on the judgment of the examiner and thus cannot be described with sufficient precision to allow for their automatic application. The procedures a teacher follows in grading an essay exam provide an example of subjective scoring rules. It is important to note that the term subjective does not necessarily imply inaccurate or unreliable methods of scoring responses to tests, but simply that human judgment is an integral part of the scoring of a test.

Tests vary considerably in the precision and detail of their scoring rules. For multiple-choice tests, it is possible to state beforehand the exact score that will be assigned to every possible combination of answers. For an unstructured test, such as the Rorschach inkblot test, in which the subject describes his or her interpretation of an ambiguous abstract figure, general principles for scoring can be described, but it may be impossible to arrive at exact, objective scoring rules. The same is true of essay tests in the classroom; although general scoring guidelines can be established, in most cases, it is
difficult to describe the exact rules that are used in scoring each essay. Thus, two similarly trained psychologists reading the same Rorschach protocol or two teachers reading the same essay generally will not score it in identical ways. However, most psychological tests are designed so that two examiners confronted with the same set of responses will give similar scores. A measure that does not meet this criterion cannot be considered a satisfactory example of a psychological test. The existence of a set of rules and procedures, which may be general or implicit, for scoring responses to a test is of vital importance in psychological testing. Imagine a personality test in which the method of scoring is left totally to the whim of the person who administers the test. If someone were to take this test and ask three different psychologists to score it, he or she would probably receive three totally different scores. It is likely that the test scores would tell you more about the psychologist who scored the test than they would about the person who took it. Since, in this case, test scores would bear no valid relation to the behavior of the person taking the test, it is impossible to regard such a test as an accurate or a useful measure of some stable characteristic or attribute of the person taking the test.
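
As a minimal sketch of what an objective scoring rule looks like when written out, the Python fragment below applies one fixed rule to a set of multiple-choice answers. The answer key, the one-point-per-item rule, and the name `score_responses` are assumptions invented for illustration; they do not come from any actual test.

```python
# Hypothetical answer key for a five-item multiple-choice test (illustrative only).
ANSWER_KEY = {1: "b", 2: "d", 3: "a", 4: "c", 5: "b"}
POINTS_PER_ITEM = 1  # assumed: every item is worth one point


def score_responses(responses, key=ANSWER_KEY, points=POINTS_PER_ITEM):
    """Return the total score: points for each item answered correctly.

    Items that are missing from `responses` (left blank) earn no points.
    """
    return sum(points for item, correct in key.items()
               if responses.get(item) == correct)


# One examinee's answers: item 4 is wrong and item 5 was left blank.
responses = {1: "b", 2: "d", 3: "a", 4: "a"}

# Two scorers who apply the same rule to the same responses must agree.
score_from_teacher_a = score_responses(responses)
score_from_teacher_b = score_responses(responses)
print(score_from_teacher_a, score_from_teacher_b)
```

Because the rule leaves nothing to the scorer's judgment, it could be applied automatically, which is the sense of objective used above. A subjective rule, such as the one a teacher follows in grading an essay, cannot be reduced to a lookup of this kind.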

TYPES OF TESTS

Most psychological tests can be sorted into three general categories:

1. Tests in which the subject performs some specific task, such as writing an essay, answering multiple-choice items, or mentally rotating images presented on a computer screen.
2. Tests that involve observations of the subject’s behavior within a particular context.
3. Self-report measures, in which the subject describes his or her feelings, attitudes, beliefs, interests, and the like.

Tests of Performance

The first and most familiar category of tests is one in which subjects are given some well-defined task that they try their best to perform successfully and in which the test score is determined by the subject’s success in completing each task. Cronbach (1970) refers to this type of test as a “test of maximal performance.” The defining feature of a performance test relates to the intent or the state of mind of the person taking the test. It is assumed in performance testing that the examinee knows what he or she should do in response to the questions or tasks that make up the performance test and that the person being tested exerts maximum effort to succeed. Thus, performance tests are designed to assess what a person can do under the conditions represented by the testing situation.

Standardized tests of general mental ability, referred to as intelligence tests, are among the best exemplars of this type of test. Here, subjects may respond to hundreds of multiple-choice items, and the test score is determined by the number of items answered correctly. Tests of specific abilities, such as spatial ability or mechanical comprehension, are also examples of this type of test. Tests of more specific skills or proficiency, such as a biology test or a music test, also fall into this category.

A number of tests ask the respondent to perform some physical or psychomotor activity. In a typical computerized psychomotor ability test, the examinee might have to use the joysticks to keep a cursor in contact with a specific spot that moves randomly on a screen; performance on this type of task can be assessed in terms of total contact time, number of times contact was broken, or some combination of these factors. Other examples of complex physical performance tests include flight simulators and even the road test you had to take before getting your driver’s license. Even video games could be thought of as psychomotor performance tests.

Behavior Observations

Many psychological tests involve observing the subject’s behavior and responses in a particular context. To assess salespeople’s competence in dealing with problem customers, many stores recruit observers to come into their stores at random intervals and observe and record each salesperson’s behavior. They may even play the role of a troublesome customer and record the salesperson’s technique (or lack of it) for handling different sorts of problems. This type of test differs from a performance test in that the subject does not have a single, well-defined task that he or she is trying to perform. In fact, in this type of test, the subject may not even know that his or her behavior is being measured. Thus, rather than assessing maximal performance on some well-defined and well-understood task, behavioral observation tests assess typical behavior or performance within a specific context (Cronbach, 1970).

Observations of typical performance or behavior are used in measuring a variety of attributes, ranging from social skills and friendship formation to job performance. Interviews, including both clinical interviews and employment interviews, can be thought of as behavior observation methods. Although an applicant may indeed try to “do his best” in an employment interview, the task confronting that applicant is relatively unstructured, and useful data may be obtained by observing the applicant’s behavior in this potentially stressful dyadic interaction.

Systematic observations of behavior in naturalistic situations are particularly useful in assessing attributes such as social skills or adjustment. For example, a school psychologist who wishes to assess a child’s ability to relate well with his or her peers might observe, perhaps indirectly, that child’s interaction with other children in a variety of structured and unstructured situations (e.g., in the classroom, in a playground) and might systematically record the frequency, intensity, or direction of several types of critical behaviors. Similarly, the behavior of a patient in a mental ward might be systematically observed and recorded in an effort to assess his or her responses to specific treatments.

Self-Reports

The final class of test includes a variety of measures that ask the subject to report or describe his or her feelings, attitudes, beliefs, values, opinions, or physical or mental state. Many personality inventories can be regarded as self-report tests. This category also includes a variety of surveys, questionnaires, and polls.

Self-reports are not necessarily taken at face value. A hypothetical personality test might include an item like “Artichokes are out to get me.” The fact that a person endorses (or fails to endorse) such an item may have little to do with his or her true feelings regarding artichokes but may provide valuable information about the respondent’s state of mind. Likewise, the subject’s indication that he or she prefers going to museums to reading magazines may, by itself, provide little information. However, if we knew that all successful opticians preferred magazines, whereas successful stockbrokers always preferred museums, we would be in a position to predict that in certain respects that person’s interests more closely matched those of stockbrokers than those of opticians. A number of measurement techniques contain features of both behavior observations and self-reports. For example, interviews may include questions dealing with the respondent’s thoughts, opinions, or feelings; this is particularly true for clinical interviews. The two methods may not always yield comparable conclusions; a person who describes himself or herself as timid and withdrawn may nevertheless exhibit aggressive behaviors in a variety of settings. However, the two techniques will often yield comparable and complementary information about an individual.
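
The optician and stockbroker illustration above amounts to comparing a respondent's stated preferences with the preferences typical of each occupation. The Python sketch below shows one simple way such a comparison might be made, by counting agreements. The item names, the occupational profiles, and the function `match_counts` are all hypothetical and invented for illustration; real interest inventories use far more items and empirically derived keys.

```python
# Hypothetical preference profiles: for each item, the option that successful
# members of the occupation are assumed to prefer (invented for illustration).
OCCUPATION_PROFILES = {
    "optician": {"leisure": "magazines", "setting": "indoors", "pace": "steady"},
    "stockbroker": {"leisure": "museums", "setting": "indoors", "pace": "fast"},
}


def match_counts(answers, profiles=OCCUPATION_PROFILES):
    """Count, for each occupation, how many answers agree with its profile."""
    return {
        occupation: sum(answers.get(item) == preferred
                        for item, preferred in profile.items())
        for occupation, profile in profiles.items()
    }


# One respondent's self-reported preferences.
answers = {"leisure": "museums", "setting": "indoors", "pace": "fast"}

counts = match_counts(answers)
closest = max(counts, key=counts.get)  # occupation with the most agreements
print(counts, closest)
```

Note that no single answer is interpreted at face value here; an answer carries information only through its known association with a group, which is the point of the museum and magazine example.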

TESTS AND DECISIONS: USES OF PSYCHOLOGICAL TESTS

Origins of Testing

The roots of psychological testing can be traced back to ancient times. Over 3,000 years ago, the Chinese used written tests in filling civil service posts, and systematic testing of both physical and mental prowess was employed by the ancient Greeks. However, as Table 1 illustrates, the systematic development of psychological tests is a relatively recent phenomenon. The theory and technology that underlie modern psychological testing have, for the most part, been developed within the last 100 years.

Table 1
SOME MILESTONES IN THE DEVELOPMENT OF TESTS

1000 B.C.       Testing in Chinese civil service
1850–1900       Civil service examinations in the United States
1900–1920       Development of individual and group tests of cognitive ability, development of psychometric theory
1920–1940       Development of factor analysis, development of projective tests and standardized personality inventories
1940–1960       Development of vocational interest measures, standardized measures of psychopathology
1960–1980       Development of item response theory, neuropsychological testing
1980–Present    Large-scale implementation of computerized adaptive tests

At the turn of the century, experimental psychologists such as Cattell and biologists such as Galton and Pearson contributed to the development, construction, and evaluation of psychological tests (Hilgard, 1989). Developments in the theory of intelligence, together with refinements of statistical techniques such as factor analysis, also contributed to the growth of testing. The strongest impetus for the development of psychological tests, however, has been practical rather than theoretical. Consistently throughout the history of testing, major test types have been developed to meet specific practical needs. The history of intelligence testing provides the clearest example of the influence of practical considerations on the development of psychological tests. The first successful systematic tests of general intelligence, developed by Binet and Simon, were designed to assist the French Ministry of Public Instruction in assessing and classifying mentally retarded students. Binet’s tests require a highly trained examiner and are administered on a one-to-one basis. Although these tests provided a reliable method of assessing mental ability, they were not suitable for large-scale testing programs. America’s sudden entry into World War I, which brought about a large influx of recruits who needed to be classified according to mental ability, pointed to the need for tests that could be administered to large groups of people. The U.S. Army’s attempts to classify recruits led directly to the development of group tests of general mental ability, most notably Army Alpha and Army Beta. Today, the Armed Services Vocational Aptitude Battery, a standardized group-administered test of several abilities, represents the most widely administered test of its kind. A combination of practical and theoretical concerns has led to several developments in intelligence testing. 
Most notable is the development of computerized adaptive testing, in which the test is tailored to the individual respondent. Computerized tests take advantage of advances in technology to adapt tests to the responses and capabilities of the individuals examined. The history of personality testing provides a similar example of how the practical need to make decisions about individuals has fostered test development. The major impetus for the development of a structured measure of psychopathology was, again, the need to screen large numbers of recruits during World War I (P. H. Dubois, 1970). Although developed too late to serve this function, the Woodward Personal Data Sheet became a prototype of other self-report measures. The development of objective tests such as the Minnesota Multiphasic Personality Inventory and projective tests such as the Rorschach inkblot test, though not tied to such large-scale decision problems, nevertheless stemmed from the need to make systematic decisions about individuals. Yet another similar example can be drawn from the field of scholastic testing. Before the advent of standardized tests, college admissions procedures were typically subjective, cumbersome, and highly variable from one school to another. As the number of students applying to college increased, the need for selection procedures that were fair, accurate, and economical became apparent. Programs initiated by the College Entrance Examination Board led to the development of large-scale scholastic tests such as the SAT and the Graduate Record Examinations (GRE). These tests are


extensions of the group-testing techniques developed by the U.S. Army and represent the most extensive civilian applications of psychological testing.

Educational Testing. Educational institutions must make admissions and advancement decisions about millions of students every year. Standardized mass-produced tests, such as the SAT, are widely used in admissions, whereas decisions to advance or hold back students, particularly at the elementary and junior high school levels, are likely to be made largely on the basis of locally constructed classroom tests.

Although admissions and advancement are the most obvious applications of tests in educational settings, they are not the only applications. For example, intelligence tests are likely to be included in the assessment of students for placement in a variety of special education programs. School psychologists might use a number of performance tests and behavior observation tests in diagnosing learning difficulties. Guidance counselors might use vocational interest measures in advising students. Tests are used by schools and localities in needs assessment and in evaluating the effectiveness of different curricula. Test results may be used in allocating funds to different schools for remedial education. Finally, the publication of average test results for each school in a district is often part of a general drive for accountability in education.

At one time, certification of a student as a bona fide high school graduate was largely the responsibility of the individual school. Today, minimum competency tests are used in many states and communities for the purpose of certifying graduates. A student who successfully completes 4 years of high school but who fails such a test may now receive a certificate of attendance rather than a diploma.

Personnel Testing. Ability tests and tests of specific skills are widely used in personnel selection in both the public and the private sectors.
A high score on a written test is necessary to qualify for many jobs in the local, state, and federal governments. Ability tests are also used widely in the military, both for screening new volunteers and for the optimal placement of individual soldiers according to their skills or abilities.

A variety of formal and informal training programs is carried out at the workplace. Tests are used to assess training needs, to assess an individual worker’s performance in training, and to assess the success of training programs. Tests might also be used as part of a management development and career counseling program. At regular intervals, supervisors are called on to assess the job performance of their subordinates; such supervisory performance appraisals can be thought of as behavior observation measures. Similarly, interviews and managerial assessment tests can be thought of as applications of behavior observation testing to personnel evaluation.

Clinical Testing. Clinical psychologists employ a wide variety of tests in assessing individual clients. The most common stereotype of a clinical psychologist suggests an extensive use of diagnostic tests. Indeed, a number of objective and projective personality tests and diagnostic tests are widely used by clinical psychologists. However, clinical testing is by no means limited to personality tests.


Clinical psychologists also employ neuropsychological test batteries. For example, perceptual tests are quite useful in detecting and diagnosing many types of brain damage. Many clinical psychologists receive specialized training in administering individual intelligence tests. Not only do these tests provide data regarding the intellectual performance of individuals, but they also represent a valuable opportunity to observe the subject’s behavior in response to a challenging situation. Thus, an intelligence test represents a specialized version of the most common type of clinical measurement, the clinical interview. In some respects, clinical testing is unique in that the clinician is often an integral part of the testing experience. Thus, in clinical testing the clinician must be regarded as a measurement instrument.

TESTING ACTIVITIES OF PSYCHOLOGISTS

The consumers of tests represent a large and varied group, ranging from individual counselors to school principals and personnel administrators. Although there is considerable overlap in the ways in which psychologists and other consumers use these tests, in some ways, psychologists’ uses of the tests are specialized. The testing activities of psychologists are of considerable interest to psychology students, and it is useful to describe briefly some of the uses of tests by different types of psychologists.

Psychological assessment is common in several of the fields of applied psychology, although the types of assessment devices and techniques vary widely across different fields. Table 2 lists typical assessment activities for several fields of applied psychology; detailed descriptions of many of these assessment activities are presented in Wise (1989). It is important to note two things. First, the table is not comprehensive. Practicing psychologists in each field may carry out a variety of assessment activities not listed here. Second, there is considerable overlap among the assessment activities of the different fields. For example, psychologists in any of the five fields listed in Table 2

Table 2. TYPICAL ASSESSMENT ACTIVITIES FOR SEVERAL FIELDS OF APPLIED PSYCHOLOGY

Field                                  Typical assessment activities

Clinical Psychology                    Assessment of intelligence
                                       Assessment of psychopathology

Counseling Psychology                  Assessment of career interests
                                       Assessment of skills
                                       Assessment of social adjustment

Industrial/Organizational Psychology   Assessment of managerial potential
                                       Assessment of training needs
                                       Assessment of cognitive and psychomotor ability

School Psychology                      Assessment of ability and academic progress
                                       Assessment of maturity and readiness for school
                                       Assessment of handicapped children

Neuropsychology                        Assessment of brain damage
                                       Evaluation of Alzheimer’s syndrome patients


might use tests of general intelligence in their assessments; the tests themselves may, however, vary across fields.

There are two differences in the ways in which individuals with training and those without training in psychology typically use these tests. First, psychologists who use tests in their work are more likely than other consumers of tests to have some training in measurement theory. Psychologists are, therefore, often more sophisticated consumers who recognize the limitations of the tests they use. They are better able to effectively use the results of test research. Second, many of the tests used by psychologists require a greater degree of professional judgment in their interpretation than is required for tests used by a wider audience.

There is an ongoing debate within the profession of psychology about the competencies needed to administer and use psychological tests. In particular, questions have been raised about the use of psychological tests by individuals who are not licensed psychologists. In our view, there is no compelling argument in favor of restricting the use of psychological tests to licensed psychologists, nor is there a good argument that holding a license automatically qualifies one to use and interpret all psychological tests. In many states, licensure is restricted to psychologists involved in the mental health areas, and individuals are not always required to demonstrate expertise in psychological testing or test theory to obtain a license as a psychologist. Different types of tests require different competencies (Eyde, Robertson & Krug, 1993), and psychologists in the different fields listed in Table 2 might develop expertise in the types of tests used in that field, without developing competencies in other areas of testing.
The Ethical Principles of Psychologists and Code of Conduct (reproduced in Appendix: Ethical Principles of Psychologists and Code of Conduct) clearly states that a psychologist will limit his or her practice to areas in which he or she has relevant training or expertise, and psychologists who abide by that code are likely to limit their use of tests to the specific areas in which they can demonstrate some competency. The range of tests that is available (and the range of skills needed to administer, score, and interpret these tests) is so vast that very few individuals will have the competencies needed to wisely use all types of psychological tests.

INFORMATION ABOUT TESTS

The great diversity of psychological tests is both a help and a hindrance to the potential test user. On the positive side, tests are available to meet a wide variety of practical needs, from personnel selection to neurological diagnosis. Unfortunately, many tests might be used in any specific setting, and they vary considerably in cost and quality. Furthermore, new tests are always being developed, and research that points to unsuspected flaws or novel applications of established tests is constantly coming out. To make sensible choices regarding specific tests, the user must know what types of tests are available for specific purposes and must have access to current research involving those tests, as well as to critical evaluations of each test. Fortunately, the Mental Measurements Yearbooks, originally edited by O. K. Buros, fulfill these functions. The Mental Measurements Yearbooks have been issued at somewhat irregular intervals over a period of more than 50 years. Together, the Mental Measurements Yearbooks


provide detailed information regarding most of the tests currently available in English. The yearbook entry for each test indicates the publisher, available forms, and price of the test, together with a comprehensive bibliography of published research involving that test. Most important, most tests are critically reviewed by a number of experts. These reviews point out the technical strengths and weaknesses of each test, thereby representing an invaluable evaluation of each test. In the event that a test is judged technically deficient, reviewers are encouraged to indicate whether another commercially available test does a better job of measuring the specific attributes targeted by the test under review.

A similar multivolume reference work, entitled Test Critiques, contains reviews of a large number of tests. The first volume of this series was released in 1984 (Keyser & Sweetland, 1984), and several additional volumes have been released since then. Since Buros’s death, the task of compiling the yearbooks has been taken over by the Buros Institute, which is affiliated with the University of Nebraska.

Figure 1 shows a typical Mental Measurements Yearbook entry, which describes a popular personality inventory. This entry is quite short because no evaluative review is attached. Other entries, especially those for widely used tests, might run several pages and might include the comments of several reviewers.

In addition to the Mental Measurements Yearbooks, Buros compiled a comprehensive bibliography of all commercially available tests published in English-speaking countries. Tests in Print V and the Thirteenth Mental Measurements Yearbook are the latest volumes in the series begun by Buros. Buros also compiled Intelligence Tests and Reviews (1975a), Personality Tests and Reviews (1975b), and Vocational Tests and Reviews (1975c), which contain additional material that had not appeared in any Mental Measurements Yearbook.
Although by no means as comprehensive as the yearbooks, a list of representative tests, grouped according to their major application (e.g., occupational tests, clerical), is presented in Appendix: Forty Representative Tests. Research on psychological testing is published regularly in several professional journals. Several journals deal primarily with test theory, or psychometrics, whereas other journals deal primarily with testing applications. A list of some of the professional journals that represent primary outlets for research on psychological testing is presented in Table 3. It is important to note that the classification of each journal as primarily psychometric or applications-oriented is by no means ironclad; both applied and theoretical articles appear in all these journals.

STANDARDS FOR TESTING

Given the widespread use of tests, there is considerable potential for abuse of psychological testing. A good deal of attention has therefore been devoted to the development and enforcement of professional and legal standards and guidelines for psychological testing. The American Psychological Association (APA) has taken a leading role in developing professional standards for testing. The most general document dealing with psychological testing is the “Ethical Principles of Psychologists and Code of Conduct,”


Personal Orientation Inventory. Grades 9–16 and adults; 1962–74; POI; 12 scores; time competence, inner directed, self-actualizing value, existentiality, feeling reactivity, spontaneity, self-regard, self-acceptance, nature of man, synergy, acceptance of aggression, capacity for intimate contact; 1 form (’63, 8 pages); manual with 1980 supplementation (’74, 56 pages); profile (’65, 2 pages); separate answer sheets (Digitek, IBM 1230, NCS, hand scored) must be used; 1983 price data; $10.50 per 25 tests; $5.50 per 50 Digitek, IBM 1230, or NCS answer sheets; $5.75 per 50 hand scored answer sheets; $10 per set of IBM or hand scoring stencils; $4 per 50 profiles; $2.75 per manual; $3.75 per specimen set; Digitek or NCS scoring service, $.95 or less per test; (30–40) minutes; Everett L. Shostrom; EdITS/Educational and Industrial Testing Service.*

See T3:1789 (98 references); for an excerpted review by Donald J. Tosi and Cathy A. Lindamood, see 8:641 (433 references); see also T2:1315 (80 references); for reviews by Bruce Bloxom and Richard W. Coan, see 7:121 (97 references); see also P:193 (26 references).

Test References
1. Abernathy, R. W., Abramowitz, S. I., Roback, H. B., Weitz, L. J., Abramowitz, C. V., & Tittler, B. The impact of an intensive consciousness-raising curriculum on adolescent women. PSYCHOLOGY OF WOMEN QUARTERLY, 1977, 2, 138–148.
2. Reddy, W. B., & Beers, T. Sensitivity training . . . and the healthy become self-actualized. SMALL GROUP BEHAVIOR, 1977, 8, 525–532.
3. Ware, J. R., & Barr, J. E. Effects of a nine-week structured and unstructured group experience on measures of self-concepts and self-actualization. SMALL GROUP BEHAVIOR, 1977, 8, 93–100.
4. MacVanc, J. R., Lange, J. D., Brown, W. A., & Zayat, M. Psychological functionin