English Pages [543] Year 2020
Individual Differences and Personality
Individual Differences and Personality provides a student-friendly introduction to both classic and cutting-edge research into personality, mood, motivation and intelligence, and their applications in psychology and in fields such as health, education and sporting achievement. Including a new chapter on ‘toxic’ personality traits, and an additional chapter on applications in real-life settings, this fourth edition has been thoroughly updated and uniquely covers the necessary psychometric methodology needed to understand modern theories. It also develops deep processing and effective learning by encouraging a critical evaluation of both older and modern theories and methodologies, including the Dark Triad, emotional intelligence and psychopathy. Gardner’s and hierarchical theories of intelligence, and modern theories of mood and motivation are discussed and evaluated, and the processes which cause people to differ in personality and intelligence are explored in detail. Six chapters provide a non-mathematical grounding in psychometric principles, such as factor analysis, reliability, validity, bias, test-construction and test-use. With self-assessment questions, further reading and a companion website including student and instructor resources, this is the ideal resource for anyone taking modules on personality and individual differences. Colin Cooper lectured in Psychology at Queen’s University Belfast for 20 years before emigrating to Canada. He is the author of several books on personality, intelligence and psychological measurement and of numerous research articles. He was elected president of the International Society for the Study of Individual Differences from 2021–2023.
Individual Differences and Personality Fourth Edition Colin Cooper
Fourth edition published 2021 by Routledge 2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN and by Routledge 52 Vanderbilt Avenue, New York, NY 10017 Routledge is an imprint of the Taylor & Francis Group, an informa business © 2021 Colin Cooper The right of Colin Cooper to be identified as author of this work has been asserted by him in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988. All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. First edition published by Hodder Education 2010 Third edition published by Routledge 2015 British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library Library of Congress Cataloging-in-Publication Data A catalog record has been requested for this book ISBN: 978-0-367-18109-3 (hbk) ISBN: 978-0-367-18111-6 (pbk) ISBN: 978-0-429-05957-5 (ebk) Typeset in Sabon by Apex CoVantage, LLC Visit the companion website: http://www.routledge.com/cw/cooper
To Wesley
Contents
Preface
1 Introduction to individual differences
2 Personality by introspection
3 Measuring individual differences
4 Depth psychology
5 Factor analysis
6 Broad trait theories of personality
7 Emotional intelligence
8 Reliability and validity of psychological tests
9 Narrow personality traits
10 Sub-clinical and antisocial personality traits
11 Biological, cognitive and social bases of personality
12 Structure and measurement of abilities
13 Ability processes
14 Personality and ability over time
15 Environmental and genetic determinants of personality and abilities
16 Psychology of mood and motivation
17 Applications in health, clinical, counselling and forensic psychology
18 Applications to education, work and sport
19 Performing and interpreting factor analyses
20 Constructing a test
21 Problems with tests
22 Integration and conclusions
Appendix A Correlations
Appendix B Code of fair testing practices in education
Author Index
Subject Index
Preface
This book is all about personality, intelligence, mood and motivation – the branch of psychology that considers how (and, equally importantly, why) people are psychologically very different from one another. It also considers how such knowledge can be applied to real-world problems (e.g., identifying psychopaths, discovering what leads children to perform well in school) and considers how individual differences may be measured using psychological tests and other techniques. The book is written for students in psychology departments who are taking courses such as individual differences, personality, intelligence and ability, psychometrics or the assessment of individual differences. However, students in other disciplines (such as education) and at other levels (e.g., postgraduate courses in occupational/organisational psychology) are also likely to find much to interest them – as is anyone else who is interested in the nature, causes and assessment of personality and intelligence, or who uses or develops psychological tests. It has five distinguishing features.
• It is ‘student friendly’ and introduces material as simply as possible, keeping jargon and mathematical detail to a minimum.
• It offers a critical evaluation of theories, methods and experiments. There are two reasons for this. First, many years ago I learned from a book which described many theories but gave little indication about what their merits and problems were. This was unsatisfying, as it did not show which theories were dead-ends (or why), and did not show how the field was developing. Secondly, in ‘leading by example’ I hope that my critiques will encourage readers to learn actively – to get into the habit of thinking about what they read, and relating it to what they already know, rather than just memorising facts. This ‘deep learning’ will make the material easier to remember, as well as developing important skills of evidence-based evaluation.
• It takes readers on a journey – from older (though sometimes still relevant and influential) theories through to modern research and its applications in the real world.
• It covers abilities, mood, motivation, test development and psychometric methods (such as factor analysis, reliability, validity, test bias) as well as personality theories. There are plenty of other textbooks covering personality theory, but students graduating in psychology should surely also know something about the psychology of abilities, mood, motivation and so on – especially as many of these are important when using psychology to address real-life problems. In addition, without a little psychometric knowledge it is impossible to understand modern personality theories or identify and use appropriate questionnaires and tests for research projects or other purposes.
• It includes ‘self-assessment questions’ and exercises in each chapter to ensure that key concepts have been grasped and to encourage active learning, together with a range of resources on the companion website which also support learning and teaching.
The book consists of sixteen chapters of psychological content (theories, applications of those theories, etc.) interspersed with six methodological chapters. The chapters are meant to be read in order, as that way a methodological chapter is encountered before the psychological chapter which relies on it. Thus although the ordering of chapters may seem unusual, I believe it is necessary to link them in this way. Rest assured, though: this is not a psychometrics textbook, and readers will search in vain for the mathematical formulae that are fundamental to this branch of psychology. Given psychology students’ famous dislike of matters mathematical, I have instead tried to provide a conceptual understanding of the measurement issues.
This edition has been substantially revised and updated throughout in the light of recent research. Two new chapters have been added – one on further applications of individual differences research to real-life issues, and one on toxic personality traits, such as the ‘Dark Triad’ of psychopathy, narcissism and Machiavellianism. My criteria for including theories and methodologies should be made explicit. I have included theories if they are well supported by empirical evidence, important for historical reasons, currently popular, or useful in applied psychology. I have not included much information about the development of personality and ability, social learning theory or constructivist models, since these are traditionally taught in developmental and social psychology courses.
I must thank my editors Eleanor Taylor, Sarah Gore and Alison Macfarlane at Routledge, together with Sarah Fish, as well as the panel of anonymous reviewers who gave excellent feedback on draft chapters. The omissions and errors which remain are entirely my own fault.
Finally I must thank Wesley for his forbearance whenever I scuttled to the basement with a distracted look on my face to continue writing . . . and our second generation of cats (Lucky and PussPuss) for coming down to play with me. Colin Cooper London, Ontario June 2020
Chapter 1 Introduction to individual differences
Introduction
Most branches of psychology examine how people (or animals) behave in different settings or under different experimental conditions: they assume that people are all much the same. Thus when developmental psychologists talk about ‘stages of development’, the tacit assumption is that all children develop in broadly similar ways. Likewise, social psychology produces theories to explain why people in general may show obedience to authority, prejudice and other group-related behaviours. Cognitive psychologists have shown that people recognise the meaning of a word more quickly when it is preceded by a semantically related ‘prime’. Physiological psychologists often assume that everyone’s nervous system has much the same sort of structure and will operate in much the same way. So much of psychology involves finding rules that describe how people in general behave. Yet this is only part of the story, for there is also significant variation between people. Some of this variation is random; people will behave differently on different occasions for reasons which are not clear. However, some of the variation will be systematic. Some individuals are more obedient to authority or more prejudiced than others. Some people will recognise all types of words quickly while others will take far longer. Some will be more anxious than others, whatever the situation. The psychology of individual differences seeks to understand two things. It first explores the ways in which people vary psychologically – using terms such as ‘anxiety’ or ‘intelligence’. Second, it develops theories to explain how and why such variations in behaviour come about. Do these individual differences in anxiety or intelligence arise from the way in which children developed, their genetic makeup or the family environment which they experienced during childhood? Or are variations in physiological makeup, such as the extent and depth of the wrinkles on the surface of the brain or the amount of activity in the autonomic nervous system, linked to any of these characteristics? The possibilities are endless.
Given that the aim of psychology is to describe, explain and predict the behaviour of organisms (people), it is clearly necessary to understand the setting in which the behaviour occurs, the general laws and the relevant individual differences. For example, in order to predict whether a jailed psychopath will re-offend if released, it is necessary to understand both the general law (the statistical probability that a jailed psychopath taken at random will re-offend), and individual differences that are related to re-offending behaviour (for example, the extent to which the person shows remorse or empathy for their victims). This book focuses on individual differences.
Learning outcomes
Having read this chapter you should be able to:
• Discuss why it is important to study individual differences, and how it differs from other areas of psychology.
• Outline how individual differences may be discovered.
• Appreciate why it is necessary to determine how people vary (‘structural models’) and the lower-level processes which cause these variations to occur (‘process models’) to have a proper understanding of individual differences.
I believe that there are four main reasons for studying individual differences:
• It is of interest in its own right. Most of us believe that personality, intelligence and so on are important characteristics of people when making friends and seeking partners – perhaps enduring music which we dislike, boring parties and dating apps rather than simply marrying the person next door. But how should we conceptualise their personality? What characteristics of a person should we look for?
• Psychological tests are useful in applied psychology. The study of individual differences almost invariably leads to the publication of psychological tests. These measure abilities, knowledge, personality, mood and many other characteristics. They are of immense value to educational, occupational and clinical psychologists, teachers, nurses, careers counsellors and others who may want to diagnose learning difficulties, dyslexia or outstanding mental ability or seek to assess an individual’s suitability for promotion, level of depression or suitability for a post that requires enormous attention to detail. The proper use of psychological tests can thus benefit both society and individuals.
• Tests are useful ‘dependent variables’ in other branches of psychology. Psychologists make extensive use of psychological tests when conducting experiments. A clinical psychologist may suspect that feelings of hopelessness often lead to suicide attempts. In order to test this hypothesis, it is obviously necessary to have some way of measuring hopelessness and by far the simplest way of doing so is to look for an appropriate psychological test. Cognitive psychologists studying the link between mood and memory (‘state dependent memory’) must be able to assess both mood and memory in order to be able to test whether a particular theory is valid and so they need sound mood questionnaires and tests or experimental tasks measuring memory.
• Other branches of psychology can predict behaviour better when they consider individual differences. Other branches of psychology rely on broad laws to predict behaviour, for example ‘behaviour therapy’, in which the principles of conditioning are used to break some undesirable habit. The therapist may know that a certain percentage of his or her patients may be ‘cured’ by this technique, but is unlikely to be able to predict whether any one individual is more or less likely than average to benefit from the therapy. However, it might well be found that the effectiveness of a particular type of treatment is affected by the individual’s personality and/or ability – a treatment that is successful in some individuals may be much less successful in others. By taking such individual differences into account, statistical tests become more sensitive. Instead of using analysis of variance to determine whether patients who are given behaviour therapy for a particular problem tend to show fewer symptoms than those who are assigned to a control condition, it is far better also to measure relevant personality and ability traits. One can then perform what is known as an ‘analysis of covariance’ instead. This shows whether the personality and ability traits are related to the number of symptoms shown and whether the two groups differ in the number of symptoms shown. It can show whether individual differences and/or the experimental condition influence behaviour.
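The contrast between the two analyses can be sketched numerically. What follows is a minimal illustration rather than a full analysis of covariance: it fits ordinary least-squares models by hand and reports only the estimated coefficients, with no significance tests. The data, the variable names (`group`, `trait`, `symptoms`) and the helper functions `solve` and `fit` are all invented for the example, and the scores are noise-free so that the arithmetic is easy to check.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: 0 = control condition, 1 = behaviour therapy.
group = [0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical personality trait score for each patient.
trait = [2, 4, 6, 8, 3, 5, 7, 9]
# Invented symptom counts, constructed as 10 + trait - 3 * group.
symptoms = [12, 14, 16, 18, 10, 12, 14, 16]

# Group-only model: the comparison a two-group analysis of variance makes.
anova_like = fit([[1, g] for g in group], symptoms)

# Group plus covariate: the model behind an analysis of covariance.
ancova_like = fit([[1, g, t] for g, t in zip(group, trait)], symptoms)

unadjusted_effect = anova_like[1]  # difference between the raw group means
adjusted_effect = ancova_like[1]   # group difference at equal trait scores
trait_slope = ancova_like[2]       # change in symptoms per trait point

print(unadjusted_effect, adjusted_effect, trait_slope)
```

In this invented data set the therapy group happens to score slightly higher on the trait, so the raw difference in group means (about -2 symptoms) understates the difference between patients with equal trait scores (about -3). A real analysis would also test whether each coefficient differs reliably from zero.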
Main questions
Any attempt to understand the nature of individual differences must really address two quite separate questions. The first concerns the nature of individual differences – how individual differences should be conceptualised. There is a wide range of answers to this question, as will be seen in the following chapters. Indeed, it has been suggested that personality does not exist and that how we behave may be determined entirely by the situations in which we find ourselves rather than by anything ‘inside us’ and the evidence for such claims must be scrutinised carefully. The second important question concerns how and why individual differences in mood, motivation, ability and personality arise. It should be clear that research into the ‘how’ of individual differences can really only start once there is general agreement about their structure. It would be a waste of time to perform experiments in order to try to understand how ‘sociability’ (or ‘creativity’, ‘depression’, ‘the drive for achievement’, etc.) works if there is no good evidence that sociability is an important dimension of personality in the first place. Thus studies of processes must logically follow on from studies of structure. Process models of individual differences address questions such as the following. Why should some children perform much better than others at school? Why should some people be shy and others outgoing? Why do some individuals’ moods swing wildly from depression to elation and back again? Why are some individuals apparently motivated by money to the exclusion of all else? We cannot hope to answer all of these questions in the following chapters, but we shall certainly explore what is known about the biological (and to some extent the social) processes that underlie personality, mood, ability and motivation. There is, however, one problem. Unless it is possible to measure individual differences accurately, it will be completely impossible either to determine the structure of personality, intelligence, etc. or to investigate its underlying processes. The development of good, accurate measures of individual differences (a branch of psychology known as psychometrics) is an absolutely vital step in developing and testing theories about the nature of individual differences and their underlying processes. For this reason, this book contains several chapters (3, 5, 8, 19–21) that focus on measurement issues.
How can we discover individual differences?
What sort of data should we use to discover individual differences? This is not an easy question to answer, for there are several possibilities.
Clinical theories
Several theories have grown out of the experiences of clinical psychologists, who realised that the ways in which they conceptualised ‘abnormal behaviour’ (particularly conditions such as anxiety, depression and perhaps schizophrenia) might also prove useful in understanding individual differences in the ‘normal’ population. Some have probably been rather quick to do this. Freud, for example, saw rather small samples of upper-middle-class Viennese women (many of whom showed symptoms that are so unusual that they do not appear in modern diagnostic manuals), refused to believe some of what they told him (such as memories of sexual abuse) and built up an enormous and complex theory about the personality structure and functions of humankind in general. Some clinically based theories are discussed in Chapter 2, and Freud’s contribution is covered in Chapter 4.
Studying individuals in detail
Many people claim to have a rather good understanding of ‘what makes others tick’ – for members of their families and close friends, at any rate. For example, we may believe that we know through experience how to calm down (or annoy) others to whom we are close and may feel that we have a good, intuitive understanding of the types of issue that are important to them, thereby allowing us to ‘see the world from their point of view’ and predict their behaviour. For example, we all have some intuitive feeling about when to mention difficult issues to those close to us. Perhaps getting to understand individuals in this way should be the mainstay of individual difference research? There are several difficulties with this approach, even if it can be proved that it leads to accurate prediction of behaviour. First, it will (presumably) take a long period of time to know anyone well enough to be able to make good predictions about their behaviour. Second, it is not particularly scientific, as it will be difficult to quantify anything, or say precisely how one intuits that the other person will behave. Third, the vagaries of language will make it very difficult to determine whether different people operate in different ways. Two people could describe the same characteristic in an individual in two quite different ways and it would be impossible to be sure that they were referring to precisely the same characteristic. However, the greatest problem of all is self-deception. It is very easy to overestimate how well one can predict someone else’s behaviour and there is good evidence that most observers will see and remember the 1% of behaviours that were correctly predicted and ignore or explain away the 99% of predictions that were incorrect – a phenomenon called ‘confirmatory bias’. Davies (2003) discusses this in the context of personality assessment.
Armchair speculation
If one has made good, unbiased observations of how individuals behave in many situations, it might be reasonable to generate and test some hypotheses about behaviour. For example, you may notice that some individuals tend to be anxious and jumpy, worry, lose their temper more easily than most and so on. That is, the observer may notice that a whole bundle of characteristics seem to vary together and suggest that ‘anxiety’ (or something similar) might be an interesting aspect of personality. Of course, there are likely to be all sorts of problems associated with such casual observations. The observations may simply be wrong, or they may fail to take account of situations. For example, the people who were perceived as being anxious might all have been in some stressful situation – it may be that the situation (rather than the person) determines how they react. Moreover, the ideas may be expressed so vaguely that they are impossible to test, as in Plato’s observation that the mind is like a chariot drawn by two horses. Literature contains several testable hypotheses about personality, for example when Shakespeare speaks through Julius Caesar:
Let me have men about me that are fat,
Sleek-headed men and such as sleep o’ nights,
Yond’ Cassius has a lean and hungry look.
He thinks too much; such men are dangerous.
(Shakespeare: Julius Caesar I:ii)
This quotation suggests some rather interesting (and potentially empirically testable) process models of ‘dangerousness’.
Scientific assessment of individuals using mental tests
Because of the problems inherent in other approaches discussed already, many psychologists opt for a more scientific approach to the study of personality and other forms of individual difference. One popular approach involves the use of statistical techniques to discover consistencies in behaviours across situations and to determine which behaviours tend to occur together in individuals. The raw data of such methods are either behaviours (such as performance on test items that require certain skills or knowledge), ratings of behaviour (obtained by trained raters who note well-defined behaviours, and so are very different from the clinical approach just mentioned) or self-ratings on questionnaires that are constructed using sound statistical techniques. Much care is taken to ensure that the measurements are both accurate and replicable.
A major problem with this approach was flagged by Michell (1997). The problem is that most research in individual differences uses questionnaires or tests, as is discussed in Chapter 3 – and Michell argues that the numbers which these tests produce do not constitute proper scientific measurement. A typical questionnaire item might be ‘do you suffer from “nerves”?’ and respondents are asked whether they strongly agree, agree, are neutral, disagree or strongly disagree with the statement. This item might be scored by awarding five points for ‘strongly agree’, down to one point for ‘strongly disagree’; the higher the score on this item, the more neurotic the individual is thought to be. The problem is that these numbers do not behave the same way as measurements of length, weight, etc. For example, you should be concerned about whether someone who agrees with the item (scoring 4) is twice as neurotic as someone who disagrees (scoring 2). If you think that they are, remember that using numbers between 1 and 5 to represent the various responses was arbitrary; I could have chosen the numbers 0 to 4, in which case someone who agreed with the item (scoring 3) would have a score which is three times as large as someone who disagreed with it (scoring 1) . . . It is good to worry about such issues, because most psychologists – even professionals and learned researchers – do not, and tend to ignore Michell’s work in the hope that it will somehow go away. We return to these issues in Chapter 3.
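The arithmetic behind this worry can be checked directly. The short sketch below scores the same two responses under both numbering schemes; the function names `score` and `ratio` are invented for the illustration and are not part of any published scoring procedure.

```python
# The five response options, ordered from least to most agreement.
RESPONSES = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def score(response, lowest):
    """Score a response with consecutive integers starting at `lowest`."""
    return RESPONSES.index(response) + lowest


def ratio(first, second, lowest):
    """Ratio of two item scores under a given numbering scheme."""
    return score(first, lowest) / score(second, lowest)


ratio_1_to_5 = ratio("agree", "disagree", lowest=1)  # 4 / 2
ratio_0_to_4 = ratio("agree", "disagree", lowest=0)  # 3 / 1

gap_1_to_5 = score("agree", 1) - score("disagree", 1)  # 4 - 2
gap_0_to_4 = score("agree", 0) - score("disagree", 0)  # 3 - 1

print(ratio_1_to_5, ratio_0_to_4)  # the ratio depends on the arbitrary scheme
print(gap_1_to_5, gap_0_to_4)      # the gap in scale points does not
```

The order of the responses and the gap in scale points survive the change of numbering, whereas the ratio does not. Whether even the gaps are psychologically meaningful is a separate question that this sketch cannot answer.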
Summary
This brief chapter has introduced the general area of individual differences and has suggested that the topic is worth studying because of its inherent interest, the many practical applications of tests designed to measure individual differences, the need to measure individual differences in order to test theories in other branches of psychology and the ability to make more accurate predictions from theories that consider both individual differences and the impact of experimental interventions on behaviour. We have also considered some methods for studying individual differences, each with its own advantages and drawbacks, and have introduced the distinction between structural models (‘how do people differ?’) and process models which try to explain how these individual differences arise and operate.
References
Davies, M. F. (2003). Confirmatory Bias in the Evaluation of Personality Descriptions: Positive Test Strategies and Output Interference. Journal of Personality and Social Psychology, 85(4), 736–744.
Michell, J. (1997). Quantitative Science and the Definition of Measurement in Psychology. British Journal of Psychology, 88, 355–383.
Chapter 2 Personality by introspection
Introduction
We start by looking at the simplest possible way of measuring personality – asking people about themselves and others, in the hope that they will be honest enough (and insightful enough!) to provide useful data. The theories of George Kelly and Carl Rogers are included in this book because they are both simple and highly influential in counselling psychology. They suggest that we should understand personality by asking how people view themselves and others. Rogers preferred to use clinical interviews and questionnaires to explore this whilst Kelly developed a technique known as the ‘repertory grid’ to understand how each individual sees other important people (or things) in their life. This technique has recently been used to understand how experts classify objects and develop computer programs (‘expert systems’) to help machines make such decisions – e.g., identify tumours from diagnostic scans. It is therefore of considerable practical use. However, we identify several problems which suggest that asking people to introspect may not be the best way of advancing our understanding of individual differences.
Learning outcomes
Having read this chapter you should be able to:
• Outline what is meant by a phenomenological theory.
• Administer and interpret a repertory grid to discover an individual’s personal constructs.
• Start to critically evaluate Kelly’s theory, identifying its strengths and limitations.
• Describe and discuss the contribution of Carl Rogers and client-centred therapy.
• Consider the strengths and weaknesses of Rogers’ theory, and the problems of measuring personality by exploring individuals’ self-concepts.
The theories of George Kelly and Carl Rogers are both phenomenological in nature – they focus on understanding each person as an individual, rather than classifying people into groups (‘she is an extravert’) or comparing people with others (‘she is considerably more sociable than the average person’). More specifically, these theories suggest that a person’s personality can be understood by knowing how they understand and experience the world, both emotionally and cognitively. Thus both Kelly and Rogers focus on how the individual perceives him- or herself and other people. The aim is to understand each individual’s unique view of the world, through exploring his or her thoughts, feelings and beliefs. This is perhaps the most obvious and simple way of investigating personality, which is why we consider these theories first. Both Rogers and Kelly maintain that our subjective view of ourselves, other people and events is of paramount importance. Since we can all choose to view events from a variety of standpoints (losing one’s job may be viewed as a personal rejection, a minor inconvenience or a great opportunity to change one’s lifestyle), any such theory of personality must focus on how we view and interpret events. Kelly suggested that understanding how a person views themselves and other people is the key to understanding their personality. Because of this belief in the ability of the individual to communicate his or her thoughts, feelings and experiences and potentially to alter ‘unhelpful’ views of the world, the theories described in this chapter have had a substantial impact on counselling and clinical psychology. Rogers’ theory, in particular, is almost indistinguishable from the practice of ‘client-centred therapy’ that is the cornerstone of counselling techniques; we return to this in Chapter 17.
Self-assessment question 2.1 What is meant by the phenomenological approach to personality study?
An introduction to George Kelly’s personal construct theory
George Kelly trained as a clinical psychologist and ran a mobile psychological service in Kansas in the 1930s. He had a background in psychoanalytical theory, but gradually became convinced that his clients were ‘paralyzed by prolonged drought, dust-storms and economic concerns, not by overflowing libidinal forces’ (Rykman, 1992). He felt that what was lacking from traditional theories of personality was any account of the way in which individuals conceptualised, or tried to make sense of, the world. Indeed, such theories appeared to ignore what seemed obvious to Kelly – that people strive to understand what is happening in their lives and to predict what will happen next. He noted that:
A typical afternoon might find me talking to a graduate student at one o’clock, doing all those familiar things that thesis directors have to do – encouraging the student to pinpoint issues, to observe, to become intimate with the problem, to form hypotheses either inductively or deductively, to make some preliminary test runs, to relate his data to his predictions, to control his experiments so that he will know what led to what, to generalize cautiously and revise his thinking in the light of experience. At two o’clock I might have an appointment with a client. During that interview I would not be taking the role of scientist but rather helping the distressed person work out some solutions to his life’s problems. So what would I do? Why, I would try to get him to pinpoint issues, to observe, to become intimate with the problem, to form hypotheses either inductively or deductively, to make some preliminary test runs, to relate outcomes to anticipations, to control his ventures so that he will know what led to what, to generalize cautiously and to revise his dogma in the light of experience. (Kelly, 1955)
Kelly’s cognitive theory thus suggests that each person operates rather like a scientist – making observations (of people), using these observations to formulate rules that explain how the world works, then applying those rules to new people and observing whether they behave as expected. If the rules do indeed seem to explain behaviour, the ‘model’ thus created is useful. If not, it needs to be refined or discarded in favour of an alternative. Different individuals will develop quite different models to predict how others will behave – a principle that Kelly termed ‘constructive alternativism’. The theory therefore focuses on how we perceive others and not (necessarily) on how they really are. As a result of this we each develop our own informal theories about how others behave – ‘people who laugh a lot are probably sad and unhappy inside’, ‘bigoted people are not in touch with their own feelings’ and so on. Each of these theories may be useful in helping us to understand how certain individuals will react in certain situations and we are constantly trying to determine the types of person for whom these theories make accurate predictions and the settings in which the predictions work. For example, we test our theory of bigotry when we meet new people by trying to gauge how bigoted and how in touch with their feelings each person is and estimating whether there is a negative correlation between these two characteristics.
I am unashamedly enthusiastic about Kelly’s work, which is in danger of being forgotten. It offers an excellent way of understanding how we each try to make sense of the world in idiosyncratic ways – something which the trait theories discussed in Chapters 6, 7, 9 and 10 are not designed to do. Understanding people as individuals, rather than comparing them to other people, seems to be hugely important for clinical work, and it will become apparent that some of the techniques developed by Kelly have applications in many branches of psychology and other disciplines. Hence Kelly’s theory is covered in more depth than is currently fashionable.
Why different people develop different systems of constructs
There are several possible reasons why different individuals develop and use different constructs – why we view the world in different ways. First, not everyone is interested in predicting the same set of outcomes. A salesperson will have a financial interest in distinguishing those who will make a purchase from those who are browsing and will therefore probably notice features of people’s behaviour that are not at all obvious to the rest of us. Second, people differ in the way in which they use language – by ‘antisocial’ I may mean ‘quiet and reserved’, but someone else may mean ‘aggressive or destructive’. Equally, two people who use different words to describe a characteristic might be referring to precisely the same thing – one person’s ‘aloof’ might refer to exactly the same psychological characteristics as another person’s ‘shy’. Third, people’s own experiences, backgrounds and values will influence the way in which they construe behaviours. Suppose two individuals witness a third placing a traffic cone on top of a lamppost. One might construe the person as ‘criminal’ (as opposed to law abiding) and another might construe them as ‘high spirited’ (as opposed to boring).
Self-assessment question 2.2 Why do people generally use different constructs to describe the same element(s)?
When construct systems fail
Our predictions about people’s behaviour are not always accurate. Suppose we believe that the three ideal ingredients for a pleasant night’s socialising involve other people whom we construe as talkative, relaxed and generous. We would expect a stranger whom we construed in this manner to be good company. Suppose that the prediction is completely wrong, and the night turns out to be a disaster. Faced with a discrepancy between their mental model of what should happen and the actual behaviour and feelings of all concerned, the individual doing the construing has several courses of action open to them:
• They can re-evaluate the person with regard to each of the characteristics. Perhaps the stranger was not quite as generous/talkative/relaxed as was first thought. This implies that the model for the kind of person who makes good company may still be correct, as it was the initial construing of the stranger’s character that was faulty.
• Perhaps the whole ghastly evening shows that some other characteristics (such as extreme political views or a penchant for practical joking) can also influence whether a person makes good company. The model can be extended to take these constructs into account. The stranger would not have performed well against some of these new criteria and so this accounts for the terrible experience.
• They may conclude that the model is a failure and that they need to consider quite different characteristics of people in order to predict their behaviour.
Abandoning one’s mental model of how people work and having to construct a new one is thought to be associated with feelings of threat and anxiety, as we shall see.
Discovering personal constructs: the repertory grid
Kelly suggests that we each classify objects and people (‘elements’) by means of ‘personal constructs’. These can be represented by two terms that have opposite meanings for that individual, e.g., good/bad, lively/boring, closed-minded/exciting. He argues that we assess people by estimating their position on each of these constructs. You will notice from the third example (closed-minded/exciting) that constructs do not have to be opposites in the sense of dictionary definitions. Four interesting questions immediately arise:
• What constructs does an individual use? In other words, what aspects of other people seem to matter most to this individual? What characteristics does he or she find useful in attempting to predict how others will behave? Are there any obvious features that simply do not seem to be considered? Are there any very idiosyncratic constructs?
• How does the individual construe certain key ‘elements’ in his or her life? In particular, how does the person construe him- or herself, as well as their family and their friends? The technique offers an important way of discovering how the person views themselves.
• What are the relationships between the constructs? It is quite possible that the constructs will be correlated. For example, we may find that a patient views ‘lively’ (rather than ‘dull’) people as ‘untrustworthy’ (rather than ‘honest’) – which may open up all kinds of fruitful avenues for clinical enquiry.
• What are the relationships between the elements? For example, do heterosexual men choose partners whom they construe in much the same way as they do their mothers? What if a woman construes her husband in much the same way as she construes sex criminals?
The repertory grid
The repertory grid technique can help to address all of these issues. The basic idea is very simple and you will find it useful if you try the technique on yourself or a friend, using a grid similar to that shown in Figure 2.1, only with about 12–14 columns and rows. The participant is given a ‘role title’ (e.g., ‘brother’) and is asked to write down the name of one real person (‘element’) who occupies this role in their life – e.g., ‘Aaron’. This is then repeated for about a dozen role titles. They would typically include the participant, their mother, father, partner, a particular brother, a particular sister, someone whom the participant dislikes, someone for whom they feel sorry, their boss at work, a close friend of the same sex, a well-liked teacher, etc. and are chosen so as to cover the main relationships in the participant’s life, with disliked as well as liked individuals featuring on the list. All that matters is that the person knows each of the elements well enough to have formed an opinion about them. It is important to appreciate that the elements are real individuals – not stereotypes – and each person should appear only once.
The participant taking the grid shown in Figure 2.1 has filled in the names of five appropriate individuals under their ‘role titles’ in the first row of the grid. They are then given just three of these names (a ‘triad’), which are chosen more or less at random, and are asked to think of one important psychological way in which any one of these individuals differs from the other two. For example, they might see one individual as being ‘mean’ – this is the ‘emergent pole’ of the first construct. They would then be asked to describe the opposite of ‘mean’. It could be ‘generous’, ‘helpful’ or many other possibilities; it does not have to be a logical opposite in the sense of a dictionary definition.
However, let us suppose that our participant decides that the opposite of ‘mean’ is ‘generous’ – this is the ‘contrast pole’ of the construct. The words describing the emergent and contrast poles of the construct are entered into the first row of the repertory grid, as shown in Figure 2.1. The process is then repeated using another three elements, until sufficient constructs have been generated or until the participant is unable to produce any new constructs. Few people can produce more than a dozen or so different constructs. If someone finds it difficult to think of a new construct, it is useful to encourage them to group the elements differently. Rather than looking for a construct which differentiates one’s boss from oneself and one’s brother, one could reflect on how one differs from one’s brother and one’s boss, how one’s brother differs from one’s boss and oneself and so on.
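The elicitation step described above (draw a triad more or less at random, then consider each way of singling out one element from the other two) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the role titles and most of the names are hypothetical, and the helper names `draw_triad` and `regroupings` are invented for this sketch rather than taken from any published procedure.

```python
import random

# Hypothetical role titles and the names a participant might supply for them.
elements = {
    "self": "Me",
    "mother": "Joan",
    "father": "Peter",
    "brother": "Aaron",
    "boss": "Mark",
    "close friend": "Sam",
}


def draw_triad(names, rng):
    """Pick three distinct elements more or less at random."""
    return rng.sample(names, 3)


def regroupings(triad):
    """List each way of singling out one element from the other two."""
    return [(odd, tuple(n for n in triad if n != odd)) for odd in triad]


rng = random.Random(0)  # fixed seed only so that the example is repeatable
triad = draw_triad(list(elements.values()), rng)

# Each prompt invites the participant to name an emergent pole and then its contrast.
for odd_one_out, pair in regroupings(triad):
    print(f"In what important way does {odd_one_out} differ from {pair[0]} and {pair[1]}?")
```

Listing all three regroupings of the same triad mirrors the advice to group the elements differently when a participant struggles to think of a new construct.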
Figure 2.1 Example of a repertory grid
Two points need to be borne in mind when encouraging someone to reveal their constructs. First, they should be encouraged to produce constructs that are psychological characteristics and not (for example) physical features (e.g., tall vs short, male vs female) or anything to do with an individual’s background or habits (e.g., poor vs wealthy, drinker vs teetotaller). The second important principle is that each construct should be different from those given previously. Gentle questioning is necessary to ensure that the participant feels that this is the case.
Once the participant’s constructs have been discovered, it is necessary to find out how they view (‘construe’) each of the elements. There are several ways of doing this. The simplest is to ask the participant to rate each element on a five-point scale. A rating of 1 indicates that the phrase in the left-hand column (the ‘emergent pole’) of the construct describes the element very accurately – for the first construct in Figure 2.1 this is ‘mean’. A rating of 5 indicates that the phrase in the right-hand column (the ‘contrast pole’ of the construct) describes the element very accurately – ‘generous’ in this case. A rating of 2 would show that the element was fairly mean, a rating of 4 that they were fairly generous, and a rating of 3 that they were of average generosity. Each person or ‘element’ is rated in this way and one then moves on to the next construct. The repertory grid eventually looks something like that shown in Figure 2.2.
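The five-point rating scheme just described can be made concrete with a small sketch in which constructs are rows and elements are columns. The ratings below are invented for illustration and are not the values in Figure 2.2; the names `GRID`, `describe` and `construal` are likewise assumptions of this sketch.

```python
# Hypothetical completed grid: each construct (emergent pole, contrast pole)
# holds one rating from 1 to 5 for every element, in the order given in ELEMENTS.
ELEMENTS = ["Me", "Joan", "Peter", "Aaron", "Mark"]
GRID = {
    ("mean", "generous"): [4, 5, 2, 3, 1],
    ("happy", "miserable"): [2, 1, 4, 3, 5],
}

# 1 = emergent pole fits very accurately ... 5 = contrast pole fits very accurately.
LABELS = {
    1: "very {e}",
    2: "fairly {e}",
    3: "about average (between {e} and {c})",
    4: "fairly {c}",
    5: "very {c}",
}


def describe(construct, rating):
    """Turn a numerical rating on one construct into words."""
    if rating not in LABELS:
        raise ValueError("ratings must be whole numbers from 1 to 5")
    emergent, contrast = construct
    return LABELS[rating].format(e=emergent, c=contrast)


def construal(name):
    """Describe how the participant construes one element on every construct."""
    column = ELEMENTS.index(name)
    return {c: describe(c, ratings[column]) for c, ratings in GRID.items()}


for construct, words in construal("Mark").items():
    print(construct, "->", words)
```

Reading down a column in this way gives the participant's view of one person; reading along a row shows how one construct is applied to everybody.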
Self-assessment question 2.3 What is meant by:
1. a construct
2. a role title
3. an element
4. a triad?
Figure 2.2 Completed repertory grid
Kelly argued that the development and elaboration of a construct system is a vitally important aspect of mental health. If an individual employs few constructs, then they will simply not have the cognitive apparatus for perceiving the fine distinctions between people that are necessary to predict their actions. Because of this they may feel anxious; they simply cannot understand how people behave. At the other extreme, an individual could have a vast and unwieldy system of constructs that are not well organised and that can provide conflicting suggestions as to how the other person should behave. Instead, Kelly considered that a system of constructs should be relatively small, but constantly refined.
Analysis of repertory grids
Whole books have been written about how to analyse these grids and the possibilities are virtually endless. Simply looking at the grids can help one to obtain a general impression of how an individual construes him- or herself, and some clinical work relies on this approach. However, most analyses explore two main areas – patterns of similarities between the constructs and similarities between the elements.
Similarities between the constructs are interesting because they reveal the ‘stereotypes’, which are discussed later. You can see this for yourself from the grid shown in Figure 2.2. If an element has a rating of 1 or 2 on the ‘happy vs miserable’ construct, then they also tend to have a rating of 1 or 2 on the ‘lively vs quiet’ construct. The same is true for ratings of 4 and 5. This shows that the individual perceives happy people as being lively and miserable individuals as being quiet, which is really rather an interesting finding. The obvious way to analyse these relationships is to correlate the rows of the matrix together.
The second way of analysing grids is to look for similarities between its columns. This shows whether the participant sees two elements as being similar. For example, look at the columns representing the participant’s father (Peter) and their boss (Mark) in Figure 2.2. You can see that this participant tends to place these two individuals in similar positions on each of the constructs. When one has a rating of 1 or 2 the other is likely to have a similar rating. This shows that the participant regards these two individuals as having much the same type of personality. However, there is a problem when correlating elements. A large positive correlation indicates that two columns are almost perfectly linearly related, not that they are the same. For example, if two elements are given ratings of 1, 2, 2, 1 and 2, 5, 5, 2 on four constructs, these elements will correlate perfectly, although it is clear that the ratings given to the elements are very different. A statistic called the ‘congruence coefficient’ gets round this problem.
Some psychologists perform much more complex analyses on the data from repertory grids. One popular technique, called ‘factor analysis’, will be discussed in Chapter 5. Other researchers use a similar technique called cluster analysis. My view is that most such analyses should be avoided unless they are based on elements and constructs which are supplied to people, as discussed below. Otherwise the small amount of data produced by a single person’s repertory grid means that each of the correlations between the elements (or constructs) has a large amount of error associated with it – so that a small change to one of the numbers in a grid will have quite a dramatic effect on the size of the correlations and thus on the results from the factor or cluster analysis.
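The problem with correlating elements can be checked numerically using the two sets of ratings given above. The sketch below computes the ordinary (Pearson) correlation and Tucker's congruence coefficient, which is the sum of cross-products divided by the square root of the product of the two sums of squares. Expressing each rating as a deviation from the scale midpoint of 3 before computing congruence is an assumption of this sketch, not something the text specifies, and the function names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: cross-products of deviations from each column's own mean."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


def congruence(x, y):
    """Tucker's congruence coefficient: like a correlation, but without subtracting means."""
    return sum(a * b for a, b in zip(x, y)) / sqrt(
        sum(a * a for a in x) * sum(b * b for b in y)
    )


MIDPOINT = 3  # assumed reference point: the middle of the five-point scale

a = [1, 2, 2, 1]  # ratings of the first element on four constructs
b = [2, 5, 5, 2]  # ratings of the second element on the same constructs

r = pearson(a, b)
phi = congruence([v - MIDPOINT for v in a], [v - MIDPOINT for v in b])
mean_gap = sum(abs(p - q) for p, q in zip(a, b)) / len(a)

print(f"Pearson r = {r:.2f}")
print(f"Congruence (midpoint-centred) = {phi:.2f}")
print(f"Mean absolute difference in ratings = {mean_gap:.1f}")
```

Under these assumptions the correlation is perfect while the congruence coefficient is zero and the ratings differ by two scale points on average, which illustrates why a high correlation between two columns should not be read as showing that the elements are construed identically.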
If one supplies the elements and constructs which each person must use, the responses of several people may be combined. This means that the correlations between elements (or correlations between constructs) can be based on the responses of several people, so that the correlations will be more accurate than if they were based on just one person’s data (see Chapter 8).
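One simple way of combining several people's responses, when everyone has rated the same supplied elements on the same supplied constructs, is to stack the ratings into one long list per construct and correlate those lists. The sketch below does this for invented data; the stacking approach, the data and the names `RATINGS` and `pooled` are all assumptions made for illustration, and other ways of combining grids are possible.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of ratings."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


# Hypothetical data: three people rate the same four supplied elements
# on the same two supplied constructs (1-5 scale).
RATINGS = {
    "person_1": {"happy-miserable": [1, 2, 4, 5], "lively-quiet": [2, 1, 5, 4]},
    "person_2": {"happy-miserable": [2, 1, 5, 4], "lively-quiet": [1, 2, 4, 5]},
    "person_3": {"happy-miserable": [1, 3, 4, 5], "lively-quiet": [1, 2, 5, 5]},
}


def pooled(construct):
    """Stack every person's ratings on one construct into a single list."""
    return [r for person in RATINGS.values() for r in person[construct]]


x = pooled("happy-miserable")
y = pooled("lively-quiet")
r_pooled = pearson(x, y)

# The pooled correlation rests on 12 pairs of ratings rather than 4.
print(f"{len(x)} pairs of ratings, pooled r = {r_pooled:.2f}")
```

Because the pooled correlation is based on three times as many pairs of ratings as any single grid, changing one rating has a much smaller effect on it.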
Applications of repertory grids
Repertory grids are useful in several branches of psychology. An occupational psychologist might invite a worker to identify people who filled the roles of ‘productive colleague’, ‘lazy colleague’, ‘someone who works hard but ineffectively’, ‘good manager’, etc. They can thus discover how workers construe colleagues who perform well – something that may be fed into the personnel selection system.
It is possible to specify some or all of the elements for a repertory grid, rather than letting a person choose their own. For example, a therapist who wanted to understand a person’s self-concept might include ‘myself as I would like to be’, ‘myself as I am now’, ‘myself as my family see me’, ‘myself five years ago’ amongst the elements. One who wanted to understand a client’s addiction might include ‘me before drinking’, ‘me after drinking’, ‘how I will be after quitting drink’. Supplying elements also means that the technique has many applications outside personality study. For example, the elements might be paintings, and a psychologist with an interest in experimental aesthetics might present a person with three paintings and ask how one differs from the other two to understand how people classify works of art. Or the same thing could be done with political attitudes, by choosing well-known politicians as elements.
It is also possible to supply people with the constructs so that the grid merely involves them rating each element on each (supplied) construct. It seems to me that this detracts from the main advantage of personal construct psychology – its ability to reveal each person’s unique way of construing their world. However, it has some merits when grids are used by researchers (not clinicians) who seek to explore the relationships between elements and/or constructs, as described above.
The technique may be used to elicit experts’ knowledge which forms the basis of ‘expert system’ computer programs (e.g., Chao et al., 1999). Stephens and Gammack (1994) wanted to design a system to help insurance underwriters (risk assessors who are not medical experts) to assess the severity of an illness. How should they categorise illnesses in order to do so? What do they need to find out to construct the ‘knowledge base’? Experienced underwriters were given a repertory grid in which the various medical conditions were the elements; after being given the names of three illnesses at random, one expert’s first construct was ‘terminal vs treatable’, the next construct was ‘indicative of something more serious vs limited risk’ and the third ‘chronic vs acute’. Other experts gave similar constructs, and so representing these illnesses in terms of these three constructs provides the information which underwriters need to know in order to assess risk. Of course different categorisations would be used by other types of experts, such as physiotherapists or physicians; the whole point of using repertory grids is that they represent how information can be usefully structured for different purposes rather than (for example) classifying diseases in terms of their biological origins or the similarity of their symptoms.
Self-assessment question 2.4 A theatre manager wants to attract more students to performances, and so wants to find out how students generally classify the 12 plays and shows which will be staged next year so that they can write promotional literature which resonates with the students. You have access to a random sample of 50 students. How might you explore how students typically categorise these plays and shows?
Emotion
Core constructs are those which describe our selves, and since it is important that we should be able to understand and predict our own behaviour, any disruption of these constructs is associated with several unpleasant emotional experiences. Guilt is experienced when we construe ourselves as behaving in ways that are inconsistent with our previous self-construction. Suppose someone who rates themselves as ‘honest’ on the core construct ‘honest/dishonest’ steals from a shop in order to keep in with their peer group. They will need to re-evaluate their position along the ‘honest/dishonest’ construct, and will consequently experience guilt.
Feelings of threat may arise when one realises that one’s core constructs are about to change radically. For example, anyone would feel threatened when diagnosed with some terrible illness (e.g., Alzheimer’s disease) which affects one’s lifestyle, as this means that ‘self’ would have to be radically reconstrued. Fear is similar, but involves things about which we know less. I know something about the symptoms of Alzheimer’s disease, but relatively little about the effects of being bitten by a rattlesnake – I feel threatened by the former and fear the latter. Children feel threatened if they feel that their parents may separate, but they fear ghosts.
Hostility arises when one continues to try to use a construct that is manifestly not working – our hostile actions attempt to coerce others into giving us evidence that our construing is, after all, correct. Some football supporters whose team has tumbled to the bottom of the league will want to continue to construe their team as ‘excellent’ (as opposed to useless) and will react with hostility whenever they realise this is inappropriate.
As mentioned above, anxiety is felt whenever it is realised that one’s construct system is unable to deal with events – that is, that the events which occur fall outside the range of convenience of the construct system. Feelings of anxiety may well then be followed by attempts to extend the construct system in order to understand the novel situation. For example, if a person has few constructs that are useful in helping them to predict how people will behave in groups, it should follow that they experience anxiety in group situations which will prompt them to develop more constructs to cover group behaviour.
Kelly’s view of aggression is, surprisingly, diametrically opposed to his view of hostility. Whereas hostility is seen to be the result of clinging to constructs that simply fail to predict events, aggression accompanies experiments that are performed to check the validity of one’s construct system. We may literally ‘play games’ with people to see if they react as expected, or test out our construct system in new settings, by performing new activities or by assuming some other persona. This constant testing of the construct system is a vitally important aspect of personal growth, since it is only through finding the weaknesses of one’s construct system that one can usefully develop it. Chiari (2013) gives a useful account of the place of emotion within personal construct theory.
Critical thinking about Kelly’s theory
Kelly’s theory is not currently popular in psychology, although it is excellent for understanding how an individual person views their world. It is also useful in many other areas of psychology which need to explore how an individual (or group of people) discriminates between several different objects or people. Although I am enthusiastic about this approach, I shall briefly mention some possible problems.
• Kelly’s theory relies on language to describe the constructs and this can be problematical. It would be a brave therapist who claimed she could view her world through the construct system of another person, whose subtle nuances of meaning may elude her.
• The technique is not useful for comparing individuals. If a company wants to recruit salespeople, then sociability is one obvious requirement – the organisation wants a salesperson who will go and talk to potential customers. While it would be possible to attempt to understand the phenomenal (personal) world of each applicant and to find out from the self-rating on the repertory grid how outgoing each person perceives him- or herself to be, there is absolutely no guarantee that the self-reports will be accurate. I can think of several rather dull individuals who are firmly convinced that they have a first-rate sense of humour! There is no guarantee whatsoever that people’s views of how they (and others) behave are accurate.
• Third, if a person lacks a particular construct, we cannot tell whether they have a ‘blind spot’ for that characteristic, or whether they are simply bad at judging people on the characteristic. A person’s construct system develops to help them predict the behaviour of themselves and others. Thus a construct will only appear in their repertoire if (a) it is useful in predicting behaviour, and (b) the individual is able to assess people’s position on the construct with accuracy. If the person doing the construing makes extremely bad judgements when attempting to assess others’ levels of ‘trustworthy vs devious’, then the construct will clearly be useless for helping them to predict behaviour. If they could only learn how to evaluate people better it might be extremely helpful. Finding that a person lacks a particular commonly used construct could either mean that they have a ‘blind spot’ for that characteristic or that they are unable to accurately assess people on it.
• There are also several areas that are not covered by Kelly’s theory. There is no good theory of personality development apart from the belief that we all seek to expand and extend our construct systems. How, when and why we learn to construe is something of a mystery.
• Some may also feel that it is all rather mechanistic – classifying individuals and making coldly logical decisions about how they may behave (with emotions popping up as a sort of by-product) does not sound like a very warm, human activity.
• And given that personal experience plays such an important part in this theory, it is slightly odd that we rarely seem to perceive ourselves using constructs consciously.
• What are the criteria that an individual uses to decide that their construct system requires revision? Do different people use different criteria? What determines how this is done (e.g., the introduction of new constructs or the re-evaluation of certain elements)?
• How could one determine whether Kelly’s view of ‘man-the-scientist’ is correct? The key thing about any scientific theory is that it should be capable of being shown to be incorrect. For example, Newton’s laws may be tested by dropping objects of different masses and timing how long they take to hit the ground, or noting what happens when billiard balls collide. If objects fail to behave as expected, then the theory needs modification. It is difficult to see how Kelly’s theory could be tested like this. How could one ever show that people do not use personal constructs? The theory may ‘feel right’ to individuals, but, as we argue in Chapter 3, subjective reactions are really a rather poor way of establishing the worth of a theory.
• It is difficult to make specific predictions on the basis of the theory. If a person construes himself as being tough, hostile and determined to get his own way, then it seems reasonable to suspect that he may be somewhat aggressive – but how? Engaging in heated debate, partner beating, playing sport? The theory cannot tell us.
It should be clear if you have tried administering and interpreting another person’s repertory grid that phenomenology is both the greatest strength and the greatest weakness of Kelly’s theory. What we need in order to evaluate this theory are hard, empirical studies of precisely how ordinary people decide that their construct systems are failing and then revise them – that is, we need to check whether our construct systems are indeed used and revised as Kelly suggested.
The person-centred theory of Carl Rogers
Whereas Kelly believed that people form hypotheses and behave analytically in forging a useful construct system, Carl Rogers focused closely on the meanings of the feelings experienced by each individual. As we shall see in Chapter 4, a Freudian would dismiss emotional experiences as mere by-products of some more fundamental unconscious conflict and a personal construct psychologist would regard them as interesting indicators of how well the construct system was working. For Rogers, the subjective experience of reality was by far the most interesting source of data. According to Rogers’ view, a psychologically healthy individual is one who is in touch with their own emotional experiences, who ‘listens’ to their emotions and can express their true feelings openly and without distortion.
Rogers was a clinical psychologist by training. His theories developed gradually from the 1940s to the 1980s and the basic principles of his approach have been adopted enthusiastically by counsellors and clinical psychologists who believe that a careful, empathic study of what a person thinks and feels is the most valuable way of gaining an insight into their nature. He believed that open, honest communication between individuals is an exceptionally rewarding experience for both parties, and his method of therapy is primarily concerned with creating a safe, supportive environment in which the client feels comfortable enough to remove any barriers and to explore his or her true feelings about him- or herself and others. For Rogers, therapy is not about a therapist ‘doing things to’ a client but instead about giving the client the space and support to verbalise their thoughts and feelings and eventually to work out their own solutions.
The most important concept in Rogers’ personality theory is the ‘self’. This concept sprang from his clinical work, in which clients would use expressions such as ‘I’m not my usual self today’, implying that they had a unified mental picture of themselves which could be consciously examined and evaluated. This is known as the ‘self-concept’. Rogers’ clinical work further led him to believe that his clients basically wanted to get to know their true self – to stop behaving as they felt they ought to behave and just fulfilling the demands of others:
As I follow the experience of many clients in the therapeutic relationship which we endeavour to create for them, it seems to me that each one is raising the same question. Below the level of the problem situation about which the individual is complaining – behind the trouble with studies or wife or employer, or with his own uncontrollable or bizarre behavior, or with his frightening feelings – lies one central search. It seems to me that at the bottom each person is asking ‘Who am I, really? How can I get in touch with this real self, underlying all my surface behavior? How can I become myself?’ (Rogers, 1967)
Having done so, individuals become less defensive and more open to their own attitudes and feelings and can see other people and events as they truly are, without using stereotypes. The individual also gets to like him- or herself, and to stop fearing their emotions:
Consciousness, instead of being the watchman over a dangerous and unpredictable lot of impulses, of which few can be permitted to see the light of day, becomes the comfortable inhabitant of a society of impulses and feelings and thoughts, which are discovered to be very satisfactorily self-governing when not fearfully guarded. (Rogers, 1967)
When he or she becomes aware of the self, the individual becomes much less dependent on the views and opinions of others. What matters now is to do what seems right for them, rather than looking to others for approval. ‘Unconditional positive regard’ is thought to be of paramount importance in allowing the individual to explore him- or herself. By this is meant a relationship with another person in which the individual feels that it is they themselves who are valued and loved. This love would hold whatever terrible thing they were to do (or admit to in the course of exploring their self-concept) – it is not conditional on how they behave.
In addition to the self-image, we can all envisage several other types of ‘self’, e.g., ‘self as I was when I left school’ and ‘self as I would like to be’, and it can sometimes be useful to consider the ways in which these memories and ideals differ from the current view of the self.
Self-assessment question 2.5 What is the self-concept? How can it be discovered?
The suggestions just discussed have implications for therapy and Rogers (1959) listed several conditions that should be met if the therapist is to be able to help the client to explore their self-concept. These requirements are as follows:
• that the client and therapist are in psychological contact – that is, that they are ‘on the same wavelength’ and are honest with themselves and with each other.
• that the client’s behaviour is not at odds with his or her self-image (a situation known as ‘a state of incongruence’ which results in feelings of anxiety). They are not trying to be something which they are not. This seems to resemble Kelly’s notion of guilt, discussed above.
• that the therapist is congruent – that is, that the therapist’s self-image accurately reflects the way in which he or she behaves. They are not ‘role-playing being a therapist’.
• that the therapist shows unconditional positive regard for the client. The client must feel that the therapist values them as a person whatever they choose to do; their liking and respect for the client is not conditional on the client behaving in a particular way (e.g., giving up drinking, or achieving any other therapeutic goal).
• that the therapist experiences an empathic understanding of the client’s internal frame of reference – that is, that he or she can sense the person’s feelings, and view things from the client’s perspective.
• that the client perceives the therapist’s unconditional positive regard and empathic understanding – the therapist successfully communicates his or her unconditional regard and empathy to the client.
Thus, in Rogerian (or ‘client-centred’) therapy, the therapist may at times disclose his or her own feelings towards the client, even if these may sometimes be negative; not doing so would mean that the client–therapist relationship lacked honesty. However, care will be taken to ensure that the client realises that this is just the therapist’s impression of the client and does not indicate that there is anything wrong with the client. Rogers believed that the self-concept developed as a result of the child’s interaction with their parents (particularly whether they were perceived to love the child unconditionally, or whether their love was conditional on the child behaving in a certain way), and as a result of feedback from other people in childhood and later life. He further suggested that people behave in a manner that is consistent with their self-concept. For example, someone who views him- or herself as shy and retiring will try to behave in a manner consistent with this view. It is almost as if the individual uses the ‘self-concept’ as a kind of Delphic oracle – a
source to be consulted in times of uncertainty that will tell the individual how they should behave. We try to ensure that our behaviour is congruent with our self-image. Whenever an incongruence does arise (for example, when someone who views himself as mild mannered loses his temper), feelings of tension and anxiety are experienced that are most unpleasant. In fact, these feelings are so unpleasant that the person may try to distort reality (e.g., by telling himself that he was an innocent victim of someone else’s aggression), thereby denying any incongruence between his self-image and his behaviour. The problem is that such distortions make it difficult for the person to perceive his true self: it is much better, in the long run, for an individual to recognise and accept the behaviour than to use ‘defences’ such as this. More recent work suggests that the self-image may not be as consistent as Rogers imagined, and this interacts with culture, so that (for example) in Japan people’s self-image is more variable than in the West (Kanagawa et al., 2001). Understanding how the self-concept develops (for example, as a consequence of feedback from others) is a core aspect of social psychology; as such it will not be considered here.
Rogers suggests that each individual ‘has one basic tendency and striving – to actualize, maintain and enhance the experiencing organism’ (Rogers, 1951). Self-actualisation is a term that basically describes how we each fulfil our human potential – we learn to know, accept and like our true selves, we can form deep relationships with others, we can be sensitive to others’ needs and view others as individuals (rather than in terms of stereotypes) and we can learn to break free from the demands of conformity. The term (coined by Abraham Maslow) is highly abstract and it is not at all obvious how it can be assessed. 
However, the idea that we become wiser, more sensitive, more in touch with our true selves and less concerned with convention as we get older does have some intuitive appeal. What is less obvious is that this is a ‘basic tendency and striving’ – it could, arguably, be a mere by-product of having had more experience of other people as one ages. The acid question is whether Rogerian therapy is effective – and if so, for which types of problem. There is a substantial literature which typically shows that Rogerian (or ‘person-centred’) therapy is effective for many conditions – but so too are other techniques (Stiles et al., 2008; Laska et al., 2014). Such issues are beyond the scope of this book.
Measuring self-concept by the Q-sort
Measuring attitudes to the self by clinical interview can be a long, difficult process. However, there are other techniques for measuring attitudes towards the self, ideal self, etc. The Q-sort is a measure designed by Stephenson (1953) to do just this. The client is given a pile of approximately 100 cards, on each of which is a self-descriptive statement such as ‘I usually like people’, ‘I don’t trust my emotions’, ‘I am afraid of what other people may think of me’ and so on. The client is asked to sort these into about five piles, ranging from statements that are most characteristic of them to those that are least characteristic. The number of cards to be placed in each pile is usually specified. The placement of each card is recorded and the exercise is repeated under other conditions, e.g., after several sessions of therapy or after asking the client to sort the cards according to his or her ‘ideal self’ rather than actual self. It is a simple matter to track changes in the way in which the cards are sorted and thus to build up a picture of how the self-concept is evolving, or of the precise ways in which the self is seen to be rather different from the ideal self; these can be usefully explored during the therapeutic interview. A clinician could make up their own cards and this might be useful for exploring particular issues that may have been raised or hinted at, but not fully explored. There also seems to be no reason why the items in any standard personality questionnaire could not be transferred
on to cards and administered in this way, but the tendency is to use a standard set of items (the California Q-set) to cover most of the standard feelings about the self. Block (1961) gives full details on the use of the Q-sort technique in various settings. This measure provides a quick method of allowing individuals to express their feelings about themselves in a quantifiable manner.
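The bookkeeping involved in comparing two sorts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: piles are numbered 1 (least characteristic) to 5 (most characteristic), the first three statements are the examples quoted above while the other three are invented, and the function names `pearson` and `compare_sorts` are hypothetical. Summarising the similarity of two sorts with a correlation coefficient is one common choice, not something prescribed by the text, and a real Q-sort would use around 100 cards with a fixed number per pile.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def compare_sorts(first, second, min_shift=2):
    """Compare two Q-sorts of the same cards.

    Each sort maps a card's statement to its pile number.
    Returns the correlation between the two sorts and the cards whose
    pile changed by at least `min_shift` piles.
    """
    cards = sorted(first)
    r = pearson([first[c] for c in cards], [second[c] for c in cards])
    moved = {c: (first[c], second[c]) for c in cards
             if abs(first[c] - second[c]) >= min_shift}
    return r, moved


# Hypothetical data: pile 1 = least characteristic, pile 5 = most characteristic.
actual_self = {
    "I usually like people": 4,
    "I don't trust my emotions": 5,
    "I am afraid of what other people may think of me": 5,
    "I am a calm person": 1,        # invented statement
    "I make friends easily": 2,     # invented statement
    "I often feel tense": 3,        # invented statement
}
ideal_self = {
    "I usually like people": 5,
    "I don't trust my emotions": 1,
    "I am afraid of what other people may think of me": 1,
    "I am a calm person": 5,
    "I make friends easily": 4,
    "I often feel tense": 2,
}

r, moved = compare_sorts(actual_self, ideal_self)
print(f"actual/ideal correlation: {r:.2f}")
for card, (actual_pile, ideal_pile) in moved.items():
    print(f"{card}: actual pile {actual_pile}, ideal pile {ideal_pile}")
```

A low or negative correlation would suggest that the client sees themselves as rather different from their ideal self, and the list of cards that moved shows where the discrepancies lie. The same function could be applied to sorts made before and after several sessions of therapy.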
Measuring self-concept by the Twenty Statements Test
Another simple way of discovering someone’s self-concept is to ask them – and this is what the Twenty Statements Test (the TST; Rees and Nicholson, 1994) achieves. It simply asks people to say what sort of person they are, by answering the question ‘I am . . .’ in 20 different ways, to find out how the person views themselves; ‘I am a socialist’, ‘I am an activist’, ‘I am a rebel’, for example, shows a rather different set of priorities from someone who puts ‘I am a person who enjoys learning’, ‘I am a deep thinker’, ‘I am smart’ at the top of their list. Like the Q-sort it provides data which are qualitative (rather than numbers with which one can perform exciting statistical analyses), and like the Q-sort there are few ways of checking whether what the person says is correct. Are they perhaps ‘putting on a front’ to try to impress the person who asks them to complete the test, for example?
There is some tension between social psychologists and individual difference psychologists over such issues. Those who study individual differences focus on discovering whether personality, abilities, etc. influence behaviour. Social psychologists are generally more interested in how people view themselves and others (prejudice, etc.) and how people’s perceptions of situations affect their behaviour. A social psychologist would thus explore the formative experiences and cultural factors which led our student to view themself as a left-of-centre activist in certain settings; a personality psychologist would be more interested in discovering which personality characteristics lead certain individuals to behave in a rebellious way in most situations. Here the person’s view of themselves is thought to be a consequence of their personality rather than the reason for their behaving in a particular way. 
For example, a personality trait of rebelliousness might cause someone to view themselves as a socialist, rebel and activist, whilst intelligence might cause someone to view themselves as a smart intellectual who enjoys learning. There are a number of problems with this argument as stated here; it makes little sense to observe behaviour and explain behaviour using the same term, for example. You may have noticed that I said that someone was a rebel because they were rebellious, which makes little sense. We address these issues in Chapter 6, and show that they are not a major issue for personality theory. Dunlop (2015) gives a good overview of some of these issues from the point of view of a social psychologist.
A little integration
Thus far we have looked at how people view themselves (and other people, in the case of Kelly) by:
• seeing where they position themselves along various personal constructs
• noting how they describe themselves when writing their ‘self characterisation’
• exploring the self-concept through clinical interview
• using the Q-sort to discover which characteristics a person feels describe them well, and which are unlike them
• using the Twenty Statements Test to discover how people view themselves.
In each case the aim is to try to understand the person’s self-concept, which may be very useful if the person is undergoing therapy or counselling, the idea being that the therapist can understand how the individual views themselves and (perhaps) explore some inconsistencies, important areas which are not mentioned, or (in fixed role therapy) discovering how it feels to act as though one was at the opposite end of one or more constructs – for example when a timid person tries to be confident. The problem is that this sort of data relies on introspection and self-insight, and it is difficult to know how accurate a person’s insights are.
Accuracy of self-reports
It is not hard to find instances where people have unrealistic views of themselves – even when they are trying to be honest. For example, men tend to overestimate their intelligence whereas women underestimate theirs (Beloff, 1992) and we probably all know people who think they are the life and soul of the party whilst everyone else views them as just being loud and obnoxious. Vazire and Carlson (2011) discuss such blind-spots, and suggest that self-reports alone are unlikely to reveal the truth about an individual’s personality. During therapy, the therapist will explore any discrepancies between the way the person views themselves and their behaviour, and they will give the client feedback about how they appear to others, which may help them alter their self-concept. But if we try to use self-reports as a basis for understanding personality, then inaccuracies in self-reports become a major problem. It is always possible that people will be deluded when describing their selves, and so it is unwise to rely solely on how they describe themselves when studying personality.
This is not so much of a problem for social psychologists who tend to be interested in attitudes, of which attitudes towards the self is one. They explore how perceptions, social interactions and other environmental variables influence people’s views of themselves (e.g., Pilarska and Suchanska, 2015), and frequently suggest that how a person views themselves determines their behaviour. Their focus of interest tends to be why a person believes something about themselves (for example, what causes them to feel that they are anxious), rather than establishing whether they really are more anxious than other people. 
Whilst it may be very useful to understand what leads to certain perceptions of oneself (for example, exploring whether someone feels that being bullied in childhood has made them anxious), these analyses are performed on individuals, and so it is difficult to know whether the attributions made by one person will generalise to other people. The same objective experience may be interpreted by different people in different ways. For this reason most psychologists who study individual differences focus on how people actually do behave, rather than how they see themselves. So rather than noting that a person describes themselves as ‘anxious’ and trying to identify environmental events which caused this, a psychologist with an interest in individual differences would try to discover what (if anything) is meant by ‘anxiety’, whether it really exists, and (if so) how it should be measured. They will then try to measure it and determine empirically if a particular person is more or less anxious than average. The phenomenological approach, trying to understand each individual’s view of themselves, simply cannot provide such information.
Qualitative rather than quantitative data
The methods described above tend to give qualitative data: descriptive words, rather than anything which can be represented by numbers. If one person says that they are ‘anxious’, another that they are ‘nervy’ and a third that they are ‘twitchy’, do they all mean exactly the
same thing? It is hard to tell for sure. Some would argue that this is a disadvantage of these methods, for this makes it difficult to compare people numerically. Even if people use the same word, and we are prepared to accept that they feel that it means the same thing to both of them, we cannot quantify the extent to which they feel the characteristic. Without being able to quantify the extent to which each person is anxious, eccentric, intelligent, etc. it is difficult to test hypotheses using quantitative methods. Rightly or wrongly, psychologists and other professionals often attempt to quantify personality – for example to say that ‘this person is more anxious than 80% of the population’ – and this involves comparing people on the same characteristic (e.g., anxiety) rather than exploring how each individual views themselves or construes the world. The theories discussed in Chapter 6 onwards therefore attempt to compare people, rather than focusing on them as individuals.
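A statement such as ‘more anxious than 80% of the population’ is a comparison between one person’s score and the scores of a reference group. The following Python sketch shows the arithmetic under some loud assumptions: the scores are invented, the function name `percentile_rank` is hypothetical, and the ‘population’ is represented by just ten people, whereas a usable reference group would need to be far larger and representative.

```python
def percentile_rank(score, reference_scores):
    """Percentage of the reference group scoring strictly below `score`."""
    below = sum(1 for s in reference_scores if s < score)
    return 100.0 * below / len(reference_scores)


# Hypothetical anxiety scores from a (far too small) reference group.
reference_group = [12, 15, 17, 18, 20, 21, 23, 25, 28, 31]

# A person scoring 27 scores higher than 8 of the 10 people in the group.
print(percentile_rank(27, reference_group))  # prints 80.0
```

The calculation only makes sense if everyone has been measured on the same characteristic using the same scale, which is precisely what the qualitative methods described above do not provide.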
Attitudes and self-esteem
Attitudes are really just evaluations of people, objects or concepts – and so we can develop as many attitudes as there are nouns in the dictionary. We may have attitudes to things which we have not encountered (Martians or dragons, for example), and so it is hardly surprising that you have formed some sort of opinion – attitude – about what sort of person you are. This is what we discussed earlier as the ‘self-concept’. We can also develop attitudes to our ‘ideal self’, ‘future self’ and suchlike, and it is tempting to suggest that the study of personality should just involve identifying all of the important objects in our lives and measuring the strengths of these attitudes. There may be some problems with doing so.
• The first is that attitudes are, by definition, subjective. My attitude to broccoli is almost certainly different to yours, even though we are both evaluating exactly the same thing. So we are not explaining any real property of broccoli when we explore our attitudes to it, but just giving a subjective opinion of it which is neither correct nor incorrect. Likewise, my evaluation of my ‘self’ may not tell us anything useful about what I am really like. It is a subjective opinion, rather than necessarily being based on reality. It could be argued that this is a problem, as scientific theories should try to measure how people are, not how they think they are.
• The second difficulty is that this approach is not very parsimonious. Given that there are potentially billions of objects, concepts and people in the world, and we might have an attitude about each one, understanding personality by measuring the strengths of our attitudes to each of them would be a truly massive undertaking.
• The third problem is that attitudes change – although some change more than others (Krosnick et al., 1993) – and social psychology has explored factors leading to the development and change of attitudes in considerable detail; see for example Olson and Maio (2003) or virtually any other social psychology textbook. Describing personality by means of a vast list of subjective attitudes which constantly shift would thus be a formidable task.
However we do not abandon attitudes entirely. Chapter 10 identifies some attitudes which vary together from person to person. For example, some people tend to value organised religion, believe in capital punishment, prefer not to be spontaneous, dislike the idea of legalised drugs and so on. Clusters of attitudes which tend to vary together from person to person can be viewed as a personality trait (‘conservatism’ in this example).
Notwithstanding the problems outlined above, several questionnaires have been developed to measure self-esteem – the extent to which a person likes their self. These may allow self-esteem to be quantified, and are widely used in social psychology; one test, the ‘Self-Esteem Scale’ (Rosenberg, 1965, 1989), is cited well over 4000 times in the literature. The items in this ten-item scale are very similar to each other and basically just ask ‘do you like yourself?’ in ten different ways. We argue in Chapter 6 that this may be a problem. It is important to determine whether self-esteem causes people to behave in a certain way or experience particular emotions, or whether some other variable (e.g., personality, or having a job or relationship) tends to boost self-esteem and also influence a person’s behaviour/feelings. Social psychologists tend to assume that self-esteem drives feelings and behaviour, but it is important to establish this empirically. This work is important because although self-esteem is a subjective variable, scores on self-esteem scales might be able to predict more objective characteristics, such as how depressed a person is. Sowislo and Orth (2013) show that low self-esteem seems to predict levels of depression later in life; it is not the case that depression causes someone’s self-esteem to become eroded. Self-esteem does seem to have some predictive power, much as Rogers suggested – although it takes some fairly sophisticated analyses to demonstrate this.
Self-assessment question 2.6 How might you test whether the Q-sort gives an accurate assessment of self-concept?
Evaluation
It is difficult to evaluate Rogers’ theory, since it is more of a philosophical view of the person than a formal, concrete psychological theory. It evolved from Rogers’ own experiences as both therapist and man, rather than from any formal empirical data that can be independently scrutinised. In addition, it may not be a scientific theory in that it would be difficult to show that the self-concept does not exist. We all think about work colleagues, family members, mass murderers and heads of state from time to time and make some attempts to understand why they act in the various bizarre ways in which they do: it would be amazing if we never thought about ourselves at all. Likewise, perhaps I have an ‘integrated self-concept’ (see myself as a single individual) since this is what I am. Thus it could be argued that we are all bound to have an integrated self-concept, much as Rogers described. A great deal of what Rogers says about the development of self-concept and the search for the true self certainly feels true to many people. However, as we discuss in Chapter 3, this does not necessarily mean that it is correct. Many people presumably feel convinced that they have a good chance of winning a fortune whenever they buy a lottery ticket or back a horse. More empirical evidence is needed to back up these claims.
The next problem is the source of data. In Rogers’ theory (along with that of Kelly) there is a total reliance on self-report; the clients’ self-insights are assumed to be correct. What if the client is unwilling or unable to introspect in detail, perhaps because they do not want to take part in the assessment or they have poor verbal skills or problems dealing with abstract concepts?
That said, the theory has many strengths. Like Kelly’s personal construct theory, it attempts to explain how the whole person functions, rather than concentrating on just one or two aspects of personality. It has also made some very specific suggestions about the necessary conditions for therapeutic change and some old empirical studies have shown that the client’s experience of warmth, unconditional positive regard and empathy are indeed crucial factors in determining the success of some forms of psychotherapy (Truax and Mitchell, 1971). However, the problem is that many of the concepts are rather vaguely defined – the accurate assessment of ‘empathy’ and ‘genuineness’ is a nontrivial problem and patients cannot agree as to what the terms mean (Bachelor, 1988). Nonetheless, some aspects of the theory can be tested, e.g., the assumption that psychological health is marked by congruence between self and experience.
Rogers and Kelly’s theories are similar in many respects. They are both phenomenological; they focus on a person’s subjective view of the world and themselves. Both are weak on developmental theory. Both theories are based on clinical experience and try to help people understand how they view their world. However, they differ drastically in terms of therapy. While Rogers focuses on helping the client grow, mature, discover and accept their true self, Kelly’s method of fixed role therapy encourages people to try out new selves (e.g., becoming spontaneous rather than cautious) to see whether this improves their ability to predict behaviour.
Suggested additional reading
Walker and Winter (2007) and Winter (2013) provide useful overviews of Kelly’s work, although many readers will want to consult some of the original sources that they cite. Benjafield (2008) gives a historical and philosophical view of Kelly’s theories and places them in context. Personal construct theory is developed in the two volumes of Kelly (1955), the first three chapters of which are reprinted as Kelly (1963). http://kellysociety.org contains some useful resources, as does the old site http://www.pcp-net.org/encyclopaedia/main.html. Other sources include Fransella and Thomas (1988) and the work of Neimeyer and Neimeyer (1990, 1992) together with papers in the Journal of Constructivist Psychology and the International Journal of Personal Construct Psychology. Chiari (2013) gives a good account of the links between personal construct theory and emotion. Those who are interested in the application of repertory grid techniques to software engineering might like to consult Edwards et al. (2009) or Tan and Hunter (2002). Caputi and Viney (2012) give another good account of grid methodology.
Rogers (1967) provides a humble and very personal account of his theory and clinical methods that makes compelling reading. Other useful sources include Rogers et al. (1989) and the chapter by Raskin and Rogers (1989). Papers by Bozarth and Brodley (1991), and two reprints of Rogers’ early papers (Rogers, 1992a, 1992b) together with Anderson (2001) will appeal to those who are interested in the therapeutic implications of these theories, whilst the Journal of Humanistic Psychology covers all aspects of Rogers’ theories and therapies. The obvious source of information on Q-sort methodology is Block’s (1961) old book The Q-Sort Method in Personality Assessment and Psychiatric Research which is available online, and also his account of a large longitudinal study which used Q-sort methodology to tame an unruly mass of qualitative data (Block, 1971). There are also plenty of papers demonstrating the utility of the Q-sort methodology in different fields (e.g., van Ijzendoorn et al., 2004).
Answers to self-assessment questions
SAQ 2.1 This involves trying to discover and understand each person’s unique view and interpretation of the world in which they live. For example, when examining family dynamics, a phenomenologist would be interested in the implications of a child’s belief that her mother was cruel and unkind and not particularly interested in whether the mother really was unkind.
SAQ 2.2 I suggest three main reasons. First, different observers may be trying to predict different outcomes and so may pay attention to rather different aspects of people’s behaviour. Second, the vagueness of language makes it possible that two observers might use two different phrases when they are actually referring to precisely the same characteristic in another person. Third, all kinds of social values are likely to colour the way in which we interpret behaviour.
SAQ 2.3
1. A construct is a psychological dimension that an individual finds useful when classifying a certain set of elements (usually people).
2. A role title is a type of person with whom the individual is likely to be well acquainted (e.g., a brother). A wide range of role titles is used to ensure that the individual considers a wide range of elements (people) when completing the grid.
3. An element is a particular individual who fills one of the role titles (e.g., John, who is ‘a brother’).
4. A triad is a group of three elements used to elicit constructs by asking the individual to name one psychological way in which any one of the elements in the triad is very different to the other two.
SAQ 2.4 Using the names of the 12 plays as elements, elicit personal constructs from each student in turn by choosing three plays at random and asking each
person how one differs from the other two. Repeat until no new constructs can be found; the number of constructs will probably be different for each person. Look at the constructs and try to identify which ones are most common; these are the terms which the theatre manager might be well advised to stress in their promotional material. Examining the average rating given to each element (play) on each of these constructs will show which constructs best describe each play. (Of course the canny manager will choose not to mention a negatively toned construct such as ‘boring’, even if everyone agrees that it describes a particular play wonderfully well.) As each student rates the same set of elements, it would also be possible for an enthusiastic researcher to learn to perform a cluster analysis or factor analysis in order to identify groups of plays which the students view as being similar.
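The counting and averaging steps in this answer might be sketched as follows in Python. Everything specific here is an assumption made for illustration: the ratings are invented, only two plays and three students are shown, ratings are taken to be on a 1–5 scale, and it is assumed that similar constructs elicited from different students have already been pooled under a common label (which in practice requires some judgement). The function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# ratings[student][construct][play] = rating from 1 (not at all) to 5 (very much).
# Invented data: each student only rates plays on the constructs they produced.
ratings = {
    "student_1": {
        "funny": {"Play A": 5, "Play B": 2},
        "thought-provoking": {"Play A": 2, "Play B": 4},
    },
    "student_2": {
        "funny": {"Play A": 4, "Play B": 1},
    },
    "student_3": {
        "funny": {"Play A": 5, "Play B": 3},
        "thought-provoking": {"Play A": 1, "Play B": 5},
    },
}


def construct_frequency(ratings):
    """Count how many students produced each construct."""
    counts = defaultdict(int)
    for constructs in ratings.values():
        for construct in constructs:
            counts[construct] += 1
    return dict(counts)


def mean_ratings(ratings, construct):
    """Average rating of each play on one construct, over students who used it."""
    by_play = defaultdict(list)
    for constructs in ratings.values():
        for play, rating in constructs.get(construct, {}).items():
            by_play[play].append(rating)
    return {play: mean(values) for play, values in by_play.items()}


print(construct_frequency(ratings))
print(mean_ratings(ratings, "funny"))
```

The most frequently used constructs suggest which terms matter to this audience, and the mean ratings indicate which plays those terms describe best.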
SAQ 2.5 The self-concept is our view of our personality, abilities, background, etc., as it really is, and not as it is modified by the demands of society or our own aspirations. It is a coherent mental image of the type of person that we really are, deep down. It can best be explored in the context of a supportive relationship with another person.
SAQ 2.6 There is no simple way of testing this. After all, the individual is the only person who can really comment on some of his or her deepest, most personal feelings. However, it might be possible to attempt some experiments. One approach could involve giving the Q-sort twice, after a gap of a week or two, and checking that the person produces roughly the same arrangement of cards. If there is a marked discrepancy (and the client cannot think of any reason why their self-image may have altered), then it may be unwise to proceed with the technique. Alternatively, the cards (and particularly those towards the extremes of the distribution, which best and least describe the individual) could be put into a questionnaire and other people (e.g., partner, parents, close friends) could be asked to rate the client on each quality, thereby showing whether the client’s self-image was consistent with other people’s impressions. Or the therapist could be asked to describe the person’s self-concept, and this could be compared with the results of the Q-sort.
References
Anderson, H. 2001. Postmodern Collaborative and Person-Centred Therapies: What Would Carl Rogers Say? Journal of Family Therapy, 23, 339–360.
Bachelor, A. 1988. How Clients Perceive Therapist Empathy: A Content Analysis of ‘Received’ Empathy. Psychotherapy, 25, 227–240.
Beloff, H. 1992. Mother, Father and Me: Our IQ. The Psychologist, 5, 309–311.
Benjafield, J. G. 2008. George Kelly: Cognitive Psychologist, Humanistic Psychologist, or Something Else Entirely? History of Psychology, 11, 239–262.
Block, J. 1961. The Q-Sort Method in Personality Assessment and Psychiatric Research. Springfield, IL: Thomas.
Block, J. 1971. Lives through Time. Berkeley, CA: Bancroft Books.
Bozarth, J. D. & Brodley, B. T. 1991. Actualization: A Functional Concept in Client-Centered Therapy. Journal of Social Behavior and Personality, 6, 45–59.
Caputi, P. & Viney, L. L. 2012. Personal Construct Methodology. Malden, MA: Wiley.
Chao, C.-J., Salvendy, G. & Lightner, N. J. 1999. Development of a Methodology for Optimizing Elicited Knowledge. Behaviour & Information Technology, 18, 413–430.
Chiari, G. 2013. Emotion in Personal Construct Theory: A Controversial Question. Journal of Constructivist Psychology, 26, 249–261.
Dunlop, W. L. 2015. Lives as the Organizing Principle of Personality Psychology. European Journal of Personality, 29, 353–362.
Edwards, H. M., McDonald, S. & Young, S. M. 2009. The Repertory Grid Technique: Its Place in Empirical Software Engineering Research. Information and Software Technology, 51, 785–798.
Fransella, F. & Thomas, L. F. (eds.) 1988. Experimenting with Personal Construct Psychology. London: Routledge.
Kanagawa, C., Cross, S. E. & Markus, H. R. 2001. ‘Who Am I?’ – the Cultural Psychology of the Conceptual Self. Personality and Social Psychology Bulletin, 27, 90–103.
Kelly, G. A. 1955. The Psychology of Personal Constructs. New York: Norton.
Kelly, G. A. 1963. A Theory of Personality. New York: Norton.
Krosnick, J. A., Boninger, D. S., Chuang, Y. C., Berent, M. K. & Carnot, C. G. 1993. Attitude Strength – One Construct or Many Related Constructs? Journal of Personality and Social Psychology, 65, 1132–1151.
Laska, K. M., Gurman, A. S. & Wampold, B. E. 2014. Expanding the Lens of Evidence-Based Practice in Psychotherapy: A Common Factors Perspective. Psychotherapy, 51, 467–481.
Neimeyer, G. J. & Neimeyer, R. A. (eds.) 1990. Advances in Personal Construct Psychology: A Research Annual. Greenwich, CT: JAI Press.
Neimeyer, R. A. & Neimeyer, G. J. (eds.) 1992. Advances in Personal Construct Psychology. Greenwich, CT: JAI Press.
Olson, J. M. & Maio, G. R. 2003. Attitudes in Social Behavior. In T. Millon & M. J. Lerner (eds.) Handbook of Psychology Vol. 5. Hoboken, NJ: Wiley.
Pilarska, A. & Suchanska, A. 2015. Self-Complexity and Self-Concept Differentiation – What Have We Been Measuring for the Past 30 Years? Current Psychology, 34, 723–743.
Raskin, N. J. & Rogers, C. R. 1989. Person-Centred Therapy. In R. J. Corsini & D. Wedding (eds.) Current Psychotherapies (4th Edn). Itasca, IL: F. E. Peacock.
Rees, A. & Nicholson, N. 1994. The Twenty Statements Test. In C. Cassell & G. Symon (eds.) Qualitative Methods in Organizational Research: A Practical Guide. New York: Sage.
Rogers, C. R. 1951. Client-Centered Therapy. Boston, MA: Houghton-Mifflin.
Rogers, C. R. 1959. A Theory of Therapy, Personality and Interpersonal Relationships, as Developed in the Client-Centered Framework. In S. Koch (ed.) Psychology: A Study of a Science. New York: McGraw-Hill.
Rogers, C. R. 1967. On Becoming a Person. London: Constable.
Rogers, C. R. 1992a. The Necessary and Sufficient Conditions of Therapeutic Personality Change. Journal of Consulting and Clinical Psychology, 60, 827–832.
Rogers, C. R. 1992b. The Processes of Therapy. Journal of Consulting and Clinical Psychology, 60, 163–164.
Rogers, C. R., Kirschenbaum, H. & Henderson, V.-L. (eds.) 1989. The Carl Rogers Reader. Boston, MA: Houghton-Mifflin.
Rosenberg, M. 1965. Society and the Adolescent Self-Image. Princeton: Princeton University Press.
Rosenberg, M. 1989. Society and the Adolescent Self-Image, Rev. Edn. Middletown, CT: Wesleyan University Press.
Rykman, R. M. 1992. Theories of Personality. Belmont, CA: Wadsworth.
Sowislo, J. F. & Orth, U. 2013. Does Low Self-Esteem Predict Depression and Anxiety? A Meta-Analysis of Longitudinal Studies. Psychological Bulletin, 139, 213–240.
Stephens, R. A. & Gammack, J. G. 1994. Knowledge Elicitation for Systems Practitioners – A Constructivist Application of the Repertory Grid Technique. Systems Practice, 7, 161–182.
Stephenson, W. 1953. The Study of Behavior. Chicago: University of Chicago Press.
Stiles, W. B., Barkham, M., Mellor-Clark, J. & Connell, J. 2008. Effectiveness of Cognitive-Behavioural, Person-Centred, and Psychodynamic Therapies in UK Primary-Care Routine Practice: Replication in a Larger Sample. Psychological Medicine, 38, 677–688.
Tan, F. B. & Hunter, M. G. 2002. The Repertory Grid Technique: A Method for the Study of Cognition in Information Systems. MIS Quarterly, 26, 39–57.
Truax, C. B. & Mitchell, K. M. 1971. Research on Certain Therapist Interpersonal Skills in Relation to Process and Outcome. In A. E. Bergin & S. L. Garfield (eds.) Handbook of Psychotherapy and Behaviour Change. New York: Wiley.
Van Ijzendoorn, M. H., Vereijken, C., Bakermans-Kranenburg, M. J. & Riksen-Walraven, J. M. 2004. Assessing Attachment Security with the Attachment Q Sort: Meta-Analytic Evidence for the Validity of the Observer AQS. Child Development, 75, 1188–1213.
Vazire, S. & Carlson, E. N. 2011. Others Sometimes Know Us Better Than We Know Ourselves. Current Directions in Psychological Science, 20, 104–108.
Walker, B. M. & Winter, D. A. 2007. The Elaboration of Personal Construct Psychology. Annual Review of Psychology, 58, 453–477.
Winter, D. A. 2013. Still Radical after All These Years: George Kelly’s The Psychology of Personal Constructs. Clinical Child Psychology and Psychiatry, 18, 276–283.
31
Chapter 3 Measuring individual differences
Introduction
The previous chapter mentioned the use of a questionnaire to measure self-esteem, but it did not go into any detail about how questionnaires and tests are constructed, used and evaluated. If we are unable to actually measure people’s levels of self-esteem, anxiety, intelligence or other characteristics accurately, then the study of individual differences can never be scientific or quantitative; it will be difficult to test theories, or to assess these characteristics for the purpose of diagnosis, individual guidance or personnel selection. So before studying other theories it seems sensible to pause and to discuss how we can go about measuring individual differences; more detail about designing and using tests and questionnaires is given in Chapters 20 and 21. This chapter thus provides an introduction to psychometrics, the branch of psychology that deals with the measurement of individual differences. It introduces the concepts of trait and state. Various types of psychological test are outlined and the interpretation of individual scores through the use of norms is discussed. Finally, some guidance is given as to how to select a test and use it ethically.
Learning outcomes
Having read this chapter you should be able to:
• Discuss what is meant by psychometrics.
• Show an understanding of what an ‘operational definition’ is, and the distinction between operational definitions and theoretical constructs.
• Explain the differences between ability traits, personality traits and attainments.
• Evaluate the advantages and disadvantages of measuring personality using Q-data, Q’-data, L-data and T-data, and projective tests.
• Explain how and why scores on psychological tests behave differently from physical measurements, and how norms may give meaning to test scores.
• Outline Michell’s objections to treating responses to test items as if they were real numbers.
• Show that you can locate a test.
One of the most important distinctions between psychology and other disciplines that also claim to give insight into the ‘human condition’ is that of measurement. Students of literature are happy to make suggestions about the personality, motives and moods of figures such as Hamlet, Hannibal or Hagar and although these contributions may be scholarly, they are not scientific in that they cannot ever be shown to be wrong. This is where psychology differs radically from other methods of understanding how humans function. While great literature, religious doctrines, therapists’ explanations, old wives’ tales, psychoanalytical interpretations of the causes of neurosis, armchair speculation and ‘common sense’ may provide some accurate and useful insights, it is perfectly possible that some or all of these are simply incorrect. We certainly cannot tell what is true just by our emotional reactions.
A favourite trick of academics is to ask students to complete personality questionnaires that are then taken away for computerised scoring. The next week, the students are given a talk about the ethics of testing, are handed sealed envelopes each holding a computer-generated analysis of their personality and asked to rate how accurate they find it. In my experience, most students are amazed by how insightful and accurate the results are and astounded by the depth of these insights. Asking members of the class to compare their descriptions is the kindest way of showing that everyone in the class has received precisely the same personality assessment! The point is that one simply cannot trust one’s emotional judgements to sort out fact from fiction. The fact that the results obtained from some personality test ‘feel right’ to an individual is not an adequate criterion. This, of course, is unsurprising – given that we are all human beings, it will be possible to make some broad descriptions about how human beings in general behave or feel, which is a far cry from the scientific assessment of individual differences. Instead, we need to develop more rigorous techniques.
Psychometrics
Psychometrics is the branch of psychology concerned with the scientific measurement of individual differences and six chapters of this book are devoted to psychometric principles. This is because the accurate assessment of individual differences, using proper psychological tests or other techniques, is vital for a proper, scientific study of the discipline. It involves finding an ‘operational definition’ for each theoretical term: a score on some questionnaire, test or some other instrument which is thought to reflect the level of some theoretical concept.
It is vital to appreciate that a person’s level of some theoretical concept is not the same as their score on a test – because making any operational definition involves a lot of arbitrary decisions. In the nineteenth century, Francis Galton surmised that ‘intelligent’ people might be able to react faster than less intelligent people – a theory that has some value in so far as it suggests that intelligence may be linked to the speed with which our nervous systems can process information, as discussed in Chapter 13. To test this hypothesis, it is necessary to measure some individuals’ reaction times and also their intelligence; the analysis involves correlating these two scores together. Without effective measures (operational definitions) of both intelligence and reaction time, the hypothesis is untestable.
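To make ‘correlating these two scores together’ concrete, the sketch below computes a Pearson correlation between two sets of scores. The function name `pearson_r` and the six pairs of scores are invented purely for illustration; they are not data from Galton or from any real test.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length (at least 2 values)")
    mx, my = mean(xs), mean(ys)
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for six people (invented for illustration only).
reaction_time_ms = [310, 295, 340, 280, 360, 325]
test_score = [104, 110, 98, 118, 92, 101]

r = pearson_r(reaction_time_ms, test_score)
print(round(r, 2))  # negative here: faster reactions go with higher test scores
```

With these made-up numbers the correlation is strongly negative, which is the pattern Galton’s hypothesis would predict; whether real data behave this way depends entirely on the quality of the two operational definitions.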
An operational definition of reaction time
What might a task designed to measure reaction time look like? The important thing is to design the task so that performance reflects reaction time rather than anything else. One could program a computer to present a shape somewhere on a screen and time how long it takes the person to push the space bar on the keyboard when they see it. This sounds simple – but a lot of decisions need to be made which will probably influence the accuracy of the measure. For example:
• Should we measure reaction time just once, or average a person’s responses from several trials? (Many trials – this will reduce the influence of random errors)
• Should we give practice trials at the start of the experiment? (Does the literature show that performance improves over time? If so, we should, otherwise individual differences in rate of learning to perform the task might be confused with reaction time. We need to ensure that everyone is performing at their peak level.)
• Should the shape be presented in the same position each time? (Yes, for otherwise the task will also measure the speed at which people can scan the area of the screen, which might be different from reaction time.)
• Should the time between trials always be the same – for example, should the next shape appear 1 second after the person makes a response? (Certainly not – they will anticipate it. A random interval might be better. Perhaps between 1–3 seconds? If it is much longer, people might get bored . . .)
• How can outside distractions (noises, etc.) be minimised?
There are many other issues to be considered, too. How many trials should be given? 10? 1000? Should there be rest breaks, and if so who should choose when they happen? Should we assume that very long reaction times (several seconds) reflect a lapse of attention and so should be dropped from the analysis? If so, what should the ‘cut-off’ point be? Should we use a tone instead of a shape? Does the size, colour or location of the shape on the screen matter? Should the participant be warned by a tone a few seconds before each trial, so that they can focus their attention? If so, should this warning interval be random (e.g., between 1 and 3 seconds) to prevent people anticipating when the shape will appear? Is the context important – for example, would it be better to measure reactions to brake lights in a driving simulator? It is clear that plenty of arbitrary decisions are made even when developing a simple task like this.
The bottom line is that if two people develop tasks measuring ‘reaction time’ these tasks are likely to be rather different – and whilst both are (hopefully) influenced by reaction time to some extent, neither of them will correspond perfectly to the theoretical concept. One of the main aims of psychometrics is to develop tests which correlate substantially with the theoretical concepts which they are designed to measure.
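Some of the design decisions listed above can be written down explicitly, which makes their arbitrariness easy to see. The sketch below is one possible set of choices, not a recommended procedure: the function names, the five practice trials, the 2000 ms cut-off and the example reaction times are all assumptions made for illustration.

```python
import random
from statistics import mean


def make_intervals(n_trials, low_s=1.0, high_s=3.0, seed=None):
    """Random waiting time (in seconds) before each trial, so that the
    appearance of the shape cannot be anticipated."""
    rng = random.Random(seed)
    return [rng.uniform(low_s, high_s) for _ in range(n_trials)]


def score_reaction_times(rts_ms, n_practice=5, cutoff_ms=2000):
    """Average the reaction times after discarding practice trials and
    any very long responses (treated here as lapses of attention).

    Returns the mean reaction time and the number of trials dropped
    for exceeding the cut-off."""
    scored = rts_ms[n_practice:]                      # ignore practice trials
    kept = [rt for rt in scored if rt <= cutoff_ms]   # trim lapses of attention
    if not kept:
        raise ValueError("no valid trials left after trimming")
    return mean(kept), len(scored) - len(kept)


intervals = make_intervals(n_trials=20, seed=1)

# Invented reaction times (ms): five practice trials, then seven scored trials.
rts = [520, 480, 450, 430, 410,
       300, 310, 295, 4200, 305, 290, 315]
mean_rt, n_dropped = score_reaction_times(rts)
print(mean_rt, n_dropped)
```

Changing `n_practice` or `cutoff_ms` changes each person’s score, which is exactly why two researchers’ ‘reaction time’ measures will rarely be identical.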
An operational definition of intelligence
The same principles apply when developing a scale to measure intelligence. You may like to consider whether it is even appropriate to try to measure ‘intelligence’, as some authors (Gardner, 1993; Guilford, 1967) have argued that there are many quite distinct (uncorrelated) abilities, and so combining them to form a measure of ‘intelligence’ is meaningless. We return to this issue below, and in Chapter 12. However, assuming that it is appropriate to measure intelligence, a test doing so will probably present people with a wide selection of different puzzles to solve – perhaps measuring how many are solved correctly in a certain time. It would be sensible to include a wide range of puzzles – perhaps measuring various forms of memory, inductive and deductive reasoning, visualisation, and so on. Examples of such items are given in Chapter 12. Should any puzzles involve language, such as anagrams or measures of vocabulary? If so, this may pose problems for those whose first language is not English, or people with dyslexia. How many puzzles should be used, and when do fatigue and boredom set in? Should the person write the answer, or choose the correct answer from a list? It is also necessary to ensure that the difficulty of the puzzles is appropriate for the age and ability of the people being tested; the instructions need to specify whether or not people should guess if they are unable to solve a puzzle, and whether it is permissible to skip an item if one is unable to solve it.
The importance of psychometrics
If there is no obvious relationship (correlation) between operational definitions of intelligence and reaction time (which is what Galton found) this could be either because there is no link between intelligence and reaction time, or because in reality there is such a link but it cannot be detected because one or both of the concepts have inadequate operational definitions. The intelligence test and/or the method of measuring reaction times may be flawed. One aim of psychometrics is to develop sound operational definitions of theoretical concepts (‘intelligence’, ‘anxiety’, ‘self-concept’, etc.). (In fact, Galton’s means of measuring reaction time was somewhat crude, as were his attempts at assessing intelligence; see Chapter 13 to discover what we now know about this relationship.) The extent to which a test measures what it is supposed to is known as its validity and we shall return to this concept in Chapter 8. The key point is that we need highly accurate psychological tests in order to provide operational definitions of abstract terms. Without such tests to provide solid operational definitions of psychological concepts, theories degenerate into mere speculations of zero scientific value.
The second main use of psychometrics is to discover how we should describe personality, ability, mood and motivation, using a psychometric technique called factor analysis. Without understanding the basic principles of this (which will be covered in Chapter 5) it is impossible to grasp how these theories have developed, their strengths and weaknesses and whether a particular theory is based on sound empirical evidence.
These issues are important because psychological tests are very widely used. Researchers in many branches of psychology use tests and questionnaires to measure concepts such as ‘working memory’, ‘anxiety’ and ‘prejudice’. Occupational psychologists and personnel managers use tests to assess the potential of job applicants. Educational psychologists may use tests to detect learning difficulties, difficulties in using language, etc. Medical psychologists may use a questionnaire to identify individuals who may show early signs of dementia; there are many other applications, too, as described in Chapters 17 and 18. It is vitally important that test users understand how these instruments are constructed, how to identify those which will produce accurate scores for a particular application, how they should be administered, scored and interpreted and the importance of assessing measurement error, bias and other important and potentially litigious issues.
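The point made at the start of this section, that a real link between two concepts may go undetected when the operational definitions are poor, can be illustrated with a small simulation. Everything in it is an assumption chosen for illustration: the ‘true’ correlation of 0.5, the treatment of a poor measure as a true score plus random error, and the function names `pearson_r` and `simulate`.

```python
import random
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(n_people, true_r, noise_sd, seed):
    """Correlation between two observed scores when each observed score is
    a true score plus random measurement error with the given SD."""
    rng = random.Random(seed)
    obs_a, obs_b = [], []
    for _ in range(n_people):
        a = rng.gauss(0, 1)                                         # true level of concept A
        b = true_r * a + sqrt(1 - true_r ** 2) * rng.gauss(0, 1)    # correlated concept B
        obs_a.append(a + rng.gauss(0, noise_sd))                    # imperfect measure of A
        obs_b.append(b + rng.gauss(0, noise_sd))                    # imperfect measure of B
    return pearson_r(obs_a, obs_b)


r_good = simulate(5000, true_r=0.5, noise_sd=0.2, seed=42)  # accurate measures
r_poor = simulate(5000, true_r=0.5, noise_sd=2.0, seed=42)  # crude measures
print(round(r_good, 2), round(r_poor, 2))
```

Under these assumptions the crude measures yield a much smaller correlation than the accurate ones, even though the underlying relationship is identical in both runs.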
Self-assessment question 3.1
a. What is an operational definition?
b. Try to think what you might use as operational definitions of French-speaking ability and meanness.
Most readers of this book will use psychological tests in some shape or form, either as dependent variables for testing psychological theories or in applied psychology, and everyone will need to understand some basic principles of psychological measurement in order to grasp modern theories of ability and personality. Given the prevalence and importance of psychological tests in academic and applied psychology, it is embarrassing to admit that many are simply not worth the paper that they are printed on. Some patently useless tests are slickly marketed and widely used by the unwary and so one aim of this book is to teach you enough about the basics of psychological measurement to allow you to choose a test that is likely to provide a suitable operational definition for the concepts that you wish to assess.
Traits and states
Most psychological tests measure ‘traits’ of one kind or another. Traits are simply useful descriptions of how individuals generally behave. For example, ‘sociability’ is generally regarded as a trait, since few people are the life and soul of the party one day and a virtual recluse the next. Since individuals tend to have a characteristic level of sociability we term it a trait. There are plenty of others.
Self-assessment question 3.2
Try to decide which of the following are (probably) traits:
a. musical ability
b. hunger
c. liberality of attitudes
d. rage
e. good manners.
It is conventional to group traits into three classes, namely abilities, attainments, and personality traits. Ability traits are concerned with a person’s level of cognitive performance in some novel area, in which they have not been previously trained, and so where the emphasis is on thinking rather than knowledge. Examples of ability traits might include reading maps, solving mental arithmetic problems, solving crossword clues, visualising what shapes would look like if they were rotated, understanding a passage of prose or coming up with creative ideas. These refer to thinking skills (rather than knowledge) either in areas that are not explicitly taught (e.g., visualisation of rotated shapes) or in areas in which everyone can be presumed to have had the same training (e.g., being taught to read and write). Abilities are related to future potential (i.e., thinking skill in a particular area) rather than to prior achievement or knowledge.
Measures of attainment are of lesser interest to psychologists, other than educational psychologists. They measure how well an individual performs in a certain area, following a course of instruction. School examinations are an example of attainment tests. If students attend lessons and read and memorise the textbooks, they should be able to achieve a perfect mark in a knowledge-based attainment test. Levels of attainment are specific to a particular area. If a pupil has a first-rate knowledge of British social history in the nineteenth century, it is not possible to say whether or not she knows anything about modern economic theory, seventeenth-century history or anything else. The scores on attainment tests cannot be assumed to generalise to other areas of knowledge. The distinction between abilities and attainment becomes a little more blurred at university level, where students are required to search out references, and think about their implications and make a coherent argument when writing an essay. Here abilities, personality traits and motivational factors will also play a part. The essay mark will, in part, measure attainment, but will also be influenced by motivation, the student’s ability to express themself and so on.
Personality traits instead reflect a person’s style of behaviour.
Words such as ‘extraverted’, ‘punctual’, ‘shy’ or ‘anxious’ all describe how (rather than how well) a person usually behaves. Each person is thought to have a particular characteristic level of extraversion, etc. and the way they behave is thought to depend both on their level of this trait and the situations they find themselves in; even the bubbliest extravert is unlikely to tell risqué jokes during a funeral service. Nevertheless, like abilities, these traits may be useful in helping us to predict how individuals will probably behave most of the time, and which people are likely to show a greater degree of extraverted behaviour than others in a particular situation.
Table 3.1 Similarities and differences between abilities, attainments and personality traits
Columns: Ability trait | Attainment | Personality trait
Rows: Measures how well a person can perform some task | Measures thinking skills | May predict behaviour in other areas (e.g., at work)
Cattell (e.g., 1957) also argues that it is necessary to consider two types of ‘state’. Unlike traits, states are short-lived, lasting for minutes or hours rather than months or years. Moods (or emotions, as the distinction is not clear) refer to transient feelings such as fright following a near miss when driving, or joy or despair on learning one’s examination results. He also identifies ‘motivational states’ – forces that direct our behaviour. For example, the basic biological drives (food, sex, etc.) and socially directed motives (such as caring for our family, socialising) can direct our behaviour, but only for a short time. After we have eaten, or visited our boring relatives, we do not feel the urge to do so again for a while. So these, too, are states, not traits.
How might we measure individual differences in practice? We consider the measurement of states in Chapter 16 and so the following section will focus on personality and ability traits using psychological tests. The key point about all psychological tests is that the individuals taking them should have precisely the same experience, no matter who administers the test or in which country it is given. Great care has to be taken to ensure that the testing situation is standardised. Time limits must be strictly followed, the test instructions must be given precisely in accordance with directions and no variation should be made to the format of the question booklet or answer sheet lest performance be affected. The way in which the test is scored (and interpreted) also has to be precisely explained and followed rigidly.
Measuring traits
Some tests are designed to be administered to just one person at a time – for instance, those involving equipment and/or where there is a need to build up a good testing rapport. Using the Wechsler Intelligence Scale for Children to diagnose dyslexia is an obvious example. Some tests are administered to individuals online, which creates a whole set of problems. Is the person taking the test who they say they are? Are they using a fast computer with a good screen in a quiet environment? Who do they ask if they do not understand something? Although personality tests can be easily assembled using sites such as SurveyMonkey, ability tests are more problematical as they often involve pictures and diagrams. Thus it is still reasonably common to administer tests and questionnaires to groups of people in computer suites, or to use paper-and-pencil questionnaires.
Ability tests
Ability tests are simply samples of problems, all of which are thought to rely on a particular mental ability. For example, a test designed to assess memory might consist of some puzzles involving memory for lists of numbers or letters (e.g., ‘179384615’), memory for pairs of words (e.g., ‘happy elephant’) or a myriad other things. But once again, plenty of arbitrary decisions are being made. Should the numbers be recalled immediately or after a time interval of minutes, hours or days? Should participants be told that they will be asked to recall them later (in which case they will try to memorise them) or should the recall test come as a surprise? And should the person be asked to recall the items, or recognise them (e.g., ‘were you earlier shown the words “joyful elephant”?’).
However, it is not possible merely to put together a set of items and call it a test. For example, there is absolutely no guarantee that all of these items do, in fact, measure the same underlying ability – it is possible that there are several different types of memory, and that a person who is able to recall long strings of numbers a few seconds after they are presented may not be particularly good at recognising pairs of words after a long time interval. It is important, therefore, to bear in mind that the thing that one wants to assess may not exist. There may not be a single ‘memory ability’ which can be assessed using a test. Instead it may be necessary to measure several different aspects of memory, each using a different type of task. One of the most important aims of psychometrics is therefore to develop tests and questionnaires whose items measure a single trait. In the above example, it would be easy to perform a statistical analysis (‘factor analysis’, described in Chapters 5 and 19) in order to discover whether all of these tasks measure a single ‘memory ability’.
Some ability and attainment tests involve participants choosing the correct answer from a selection of several possibilities; these are known as multiple-choice tests. Some may ask participants to do something to solve the problem. This could involve writing the answer, solving a jigsaw, or listening to a string of numbers and repeating them backwards, for example. These are known as free response tests. In addition, some tests (‘power tests’) give participants almost unlimited time to solve a set of problems, whilst others (‘speeded tests’) impose time limits, so that most participants will not finish within the time allocated.
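Returning to the memory example, the question of whether several tasks measure ‘a single memory ability’ starts from the correlations between the tasks. The sketch below computes only that correlation matrix; it is not a factor analysis, which is described in Chapters 5 and 19. The task names and the scores for six people are invented for illustration.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(task_scores):
    """Correlation between every pair of tasks, keyed by (task, task)."""
    names = list(task_scores)
    return {(a, b): pearson_r(task_scores[a], task_scores[b])
            for a in names for b in names}


# Invented scores for the same six people on three hypothetical memory tasks.
scores = {
    "digit_span": [7, 5, 9, 6, 8, 4],
    "letter_span": [6, 5, 8, 6, 7, 4],
    "word_pair_recognition": [12, 14, 12, 10, 13, 11],
}

matrix = correlation_matrix(scores)
for (a, b), r in matrix.items():
    if a < b:  # print each pair of different tasks once
        print(f"{a} vs {b}: {r:.2f}")
```

In this made-up data set the two span tasks correlate highly with each other but hardly at all with word-pair recognition, the kind of pattern that would suggest more than one kind of memory is being measured.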
Personality questionnaires: Q’-data and Q-data
Perhaps the most obvious way of measuring personality is through self-report questionnaires. Most readers have probably encountered some already: they usually contain questions or statements with several possible answers (see Table 3.2, for example). More detailed examples are given in Chapter 6 and subsequent chapters. It is important to bear in mind two points about such tests. First, it is not possible simply to devise a few questions, decide how the responses should be scored and then administer the test. The reasons for this will be covered in some detail in Chapters 8 and 21, but the basic issue is that there is no guarantee that the questions that you ask will measure the trait that you expect. Instead, it is necessary to perform some statistical analyses to check this.
The second important point is that, just because someone ticks a box in a questionnaire in a certain way, this does not imply that we can accept their response at face value. To borrow an example from Cattell (1973), suppose someone strongly agreed with an item in a questionnaire that said ‘I am the smartest man in town’. One can treat this piece of information in two ways. The first would be to assume that what the person said is accurate and credit them with high intelligence. Cattell terms this Q’-data. A response is treated as Q’-data if the psychologist chooses to believe that the person has made a true, accurate observation about themself. This is common practice in social psychology, and we have already encountered it in the context of Rogers’ personality theory. It is however regarded as rather dangerous by those who work in the field of individual differences. Cattell argues that lack of self-insight, deliberate attempts to distort responses and other variables considered in Chapter 21 make Q’-data of little value for a scientific model of personality.
Table 3.2 Typical items from four questionnaires measuring different aspects of personality
1. How sociable are you? Very | Fairly | Average | Not very | Not at all
2. The courts are too lenient when passing sentence. Strongly agree | Agree | Neutral/unsure | Disagree | Strongly disagree
3. I have some undesirable habits. Yes | No
4. I enjoy the feeling of silk against my skin. Very much | Slightly | Neutral | Not very much | Not at all
The second approach pays no heed to the apparent meaning of the answer, but only to its pattern of relation to other things. Responses to questionnaires are regarded as ‘box-ticking behaviours’ for statistical analysis, rather than as true, insightful descriptions of how the individual behaves. Cattell calls such responses Q-data. This is an important distinction that is often misunderstood both by psychologists and those who are exposed to psychological tests. Psychologists do not believe that people are making true statements about themselves when they answer items in personality tests. They do not need to believe respondents when the responses are analysed statistically. Suppose that a firm wants to use some kind of test to identify those applicants who will develop into highly successful sales staff. They may ask existing staff to complete a questionnaire and discover that successful sales staff strongly agree that ‘they are the smartest person in town’, while the less effective sales staff do not. Thus it would be reasonable just to use this question as part of the selection procedure. No one is interested in whether the applicants are all really the smartest individuals in town (logically they could not all be!), so the response is treated as Q-data rather than as Q’-data. Techniques for developing tests along similar lines are discussed in Chapters 5, 19 and 20.
This also highlights an important difference between ability/attainment traits and all other questionnaire measures of personality, mood or motivation. People can only obtain a high score on ability or attainment tests by knowing or working out the correct answers. Items in personality, mood and motivation scales, however, do not actually measure performance. Instead, they ask people to introspect about how they behave. This means that the answers may not be accurate for several reasons. For example, if a person is asked to rate themselves on a five-point scale as to how nervous they feel in lifts (1 = totally relaxed, 5 = terrified), they may misremember their true feelings, try to paint a favourable impression of themselves or wrongly assume that, as everybody feels totally panic stricken in a lift, their experience of only moderate terror shows they are much more relaxed than the norm and should thus get a rating of 2 rather than 4. It is possible to detect deliberate faking, and we discuss this in Chapter 21. Some questionnaires include a ‘lie scale’ – a list of common but socially undesirable peccadilloes embedded in the personality questionnaire, for example ‘Did you ever cheat at a test in school?’ Someone who admits to few of these is either a saint, out of touch with how they really behave or deliberately distorting their responses.
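The logic of a lie scale can be sketched in a few lines. This is only an illustration of the idea described above: apart from the school-test item, the item labels are hypothetical, and the threshold of two admissions is an arbitrary assumption rather than a value taken from any published questionnaire.

```python
# Hypothetical lie-scale items: common but socially undesirable behaviours.
# Only the first echoes the example in the text; the others are invented.
LIE_ITEMS = [
    "cheated_at_school_test",
    "told_a_white_lie",
    "been_late_for_appointment",
    "gossiped_about_a_friend",
]


def lie_scale_score(responses, lie_items=LIE_ITEMS):
    """Count how many of the lie-scale behaviours the person admits to."""
    return sum(1 for item in lie_items if responses.get(item) == "yes")


def flag_possible_distortion(responses, lie_items=LIE_ITEMS, min_admissions=2):
    """Flag a respondent who admits to fewer behaviours than the (arbitrary)
    threshold; they may be distorting their answers."""
    return lie_scale_score(responses, lie_items) < min_admissions


candid = {"cheated_at_school_test": "yes", "told_a_white_lie": "yes",
          "been_late_for_appointment": "yes", "gossiped_about_a_friend": "no"}
saintly = {"cheated_at_school_test": "no", "told_a_white_lie": "no",
           "been_late_for_appointment": "no", "gossiped_about_a_friend": "no"}

print(lie_scale_score(candid), flag_possible_distortion(candid))
print(lie_scale_score(saintly), flag_possible_distortion(saintly))
```

A flag of this kind cannot say which of the three explanations applies (saint, lack of self-insight or deliberate distortion); it only marks the questionnaire as one to interpret with caution.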
Ratings of behaviour: L-data
The third main way of assessing personality or ability comes from ratings of behaviour from other people. Raters can be carefully trained how to classify behaviours according to a particular checklist. This checklist should ideally relate to behaviour, rather than to general impressions; ‘number of minutes spent per day using social media’ rather than ‘usage of social media is low/average/high’. If they then follow individuals around for a long period (weeks, rather than days) in a wide variety of situations, these ratings of how people behave in their everyday life may give useful insights into their personality or abilities. Cattell terms this ‘L-data’ (for ‘life record’). Techniques such as the Child Behaviour Checklist (Achenbach and Rescorla, 2001) and similar measures designed for adults are useful in quantifying personality. Those checklists which are designed for children in residential care are completed by caregivers; examples are given at https://aseba.org/samples-of-forms/. They are designed to tap variables such as anxiety and memory problems as well as various types of problem behaviour, such as those discussed in Chapter 10.
Ratings of behaviour are often used to assess personality during interviews and other selection exercises and there are probably some important characteristics (e.g., leadership quality, social skills) that are difficult to assess by other means. However, because behaviour is observed only for a short time in such settings (and in situations which may not be typical), such assessments are often not accurate (Cronbach, 1994).
Objective tests: T-data
The fourth form of evidence stems from analyses of the behaviour of individuals (generally in laboratory situations) who are either unaware of which aspect of their behaviour is being measured or are physically unable to alter their response. These tests should therefore overcome the principal objection to personality questionnaires, which is that responses can easily be faked. For example, Cattell and Warburton (1967) suggested that highly anxious people are likely to fidget more than others. In order to measure fidgeting (and hence anxiety), they fitted a special chair with microswitches and left it in a waiting room outside the laboratory. Volunteers arriving to take part in the experiment sat in the chair and were kept waiting for some minutes – not realising that their behaviour was being measured, and that sitting in the chair was the experiment.
These days it is possible that job applicants or those undergoing neurological evaluations might be asked to play a computer game; however, unlike conventional games, various cognitive processes such as reaction-time or decision-making processes may be recorded (Valladares-Rodriguez et al., 2016). Or a text-based adventure game might perhaps assess personality – for example by putting the player in situations where they are asked to choose between several alternatives, each of which represents a different personality dimension – for example, seeking help from others or running away (McCord et al., 2019). This ‘gamification’ of psychological measures is a new development, but one which seems to show promise.
Exercise
You may wish to discuss the ethical problems of this sort of approach to measuring personality. How different is measuring behaviour without the person’s knowledge to observing their behaviour in the street, for example? Is it ethical to ask questions where they are unaware of the implications of answering a question in a certain way? If so, would this prevent all forms of personality assessment? Is it ever ethical to draw inferences about a person’s personality (e.g., via questionnaires) without their knowing what you are assessing and how?
More recently, social psychologists have tried to measure attitudes using ‘Implicit Association Tests’ (Greenwald et al., 1998). These claim to show prejudice by revealing attitudes which people do not realise they have, by measuring how long it takes people to associate some words with some groups of people. Unfortunately, evidence is accumulating to show that, although still incredibly popular, such objective tests have a number of unrecognised problems (Singal, 2017; Forscher et al., 2019).
Cattell termed the data from all such experiments ‘T-data’ (since they arose from objective tests) and argued that they should form an excellent basis for measuring personality and ability, since they are entirely objective. Since individuals do not know which aspect of their behaviour is being measured or are unable to manipulate their responses, such tests are difficult to fake. However, it is very difficult to devise and develop suitable objective tests and few objective personality tests have so far been found to be of much practical use.
Projective tests of personality
‘Projective tests’ provide a fifth source of evidence about personality. In these tests, individuals are presented with some ambiguous, unclear or completely meaningless stimuli and are assumed to reveal their personalities, experiences, wants, needs, hopes, fears, etc., when describing these. The Rorschach inkblots are probably the most famous projective test. In this test, participants are shown a series of inkblots, rather similar to that shown in Figure 3.1, and are asked to describe in their own words what they ‘see’ in them. (Be warned – replying ‘an inkblot’ is classed as pathological!) Their responses are scored according to any one of about three main scoring systems and are supposed to reveal hidden depths of personality.
The really odd thing about these scoring systems is that they have few compelling links to any mainstream psychological theory. For example, they code whether the person pays attention to the whole stimulus or just part of it, and whether they report movement. The problem is that no theory of which I am aware specifies why movement or interpretation of only part of the stimulus should reflect personality. It rather looks as if these aspects of performance are coded because they are easy to measure, rather than because they measure anything of psychological value. Because of this and the Byzantine complexity of the scoring schemes, such tests simply do not work and are now rarely used. However, ‘multiple-choice’ projective tests (in which respondents choose responses from a list, rather than describing the pictures in their own words) may perhaps be of some value (Holmstrom et al., 1994).
Figure 3.1 Example of an inkblot, similar to those used in Rorschach’s test
Self-assessment question 3.3
What is or are:
a. Q-data
b. L-data
c. Q’-data
d. T-data
e. projective tests?
Scoring tests
Every test must have some clearly defined technique for converting an individual’s responses into some kind of score. Details of how to administer, score and interpret the test will almost always be given in the test manual. This is generally a substantial booklet that contains other information that may be useful in assessing the merit of the test, although some tests rely on journal articles to provide this information. In the vast majority of cases (projective tests being the only real exception), the score will be a number, since the test tries to quantify each person’s level of the ability, state or personality trait which the scale measures.
Multiple-choice ability tests are the easiest to score. In the vast majority of cases, one point will be awarded for each correct answer. Sometimes, in an attempt to prevent guessing, one point will be deducted for each incorrect answer. Items that have not been attempted almost invariably score zero. These tests are generally scored either by computer, by entering the responses into a spreadsheet (Cooper (2019) provides some sample spreadsheets which can easily be edited to score specific tests) or by using templates. These templates are acetate sheets that are positioned over the top of the answer sheet and which clearly indicate the correct answer for each item. The beauty of scoring multiple-choice tests is that it is almost 100% accurate and requires no subjective judgements. Cattell (never one to resist a neologism) calls such scoring schemes highly ‘conspective’ (literally, ‘looking together’), meaning that different scorers will arrive at the same conclusion, as opposed to essays, for example.
Scoring free responses from ability tests can be problematic. Much depends on the quality of the test manual and the skill of the person administering the test.
For example, suppose that in a comprehension test a child is asked to define the meaning of the word ‘kitten’ and they reply that it is ‘a type of cat’ – an answer that is neither completely correct nor totally wrong. Good test manuals will provide detailed instructions, with examples, to show how such answers should be scored; such directions are arbitrary, but at least ensure that scorers behave consistently. Where actions are timed (e.g., the amount of time to solve a jigsaw is recorded), the test manual will show how many points to award for a particular solution time; a puzzle which is solved within one minute might earn five points, one which is solved within three minutes might earn four points, etc.
Most personality, mood and motivation scales require participants to say how well a particular statement describes them or another person. Item 1 in Table 3.2 (‘How sociable are you?’) measures the personality trait of Extraversion, and the test manual might specify that answering ‘very’ is given a score of 5, ‘fairly’ gives 4 points, and so on – with ‘not at all’ earning 1 point. Such items are known as Likert scales. If another item in the scale was phrased in the reverse direction, such as ‘How much do you enjoy being alone?’, strongly agreeing with this item would result in a score of 1 point and strongly disagreeing would score 5 points. To calculate a person’s total score on a scale, one simply adds up their points for the items which measure the same trait. Most personality scales urge participants to answer all of the questions and so there should be no missing data.
The scoring of most projective tests is very complex, which is another reason why they are now rarely used. Features such as ‘degree of movement’, ‘originality’ (unusualness of response) and ‘response latency [which] may reveal struggling social interactions’ are coded. Some scoring systems do not even attempt to quantify responses, but instead produce qualitative data which are not really amenable to statistical analysis. Those who wish to use these tests professionally have to serve an apprenticeship under the guidance of an experienced user of the test in order to appreciate fully the intricacies of the scoring system, and learn how to apply it. Even so, the level of conspection of most projective tests is lamentably low: two different people are likely to come to vastly different conclusions when interpreting the same set of responses, a point that was made with some force by Eysenck (1959). However, there seems to be no good reason why responses to projective tests cannot be coded objectively, using some form of content analysis – that is, specifying a large list of characteristics (e.g., ‘mentions any nonhuman animal’), each to be coded as present or absent.
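The scoring rules for multiple-choice ability tests and Likert scales described in this section are simple enough to sketch in a few lines of Python. The following is only an illustration and is not taken from any test manual: the function names, the answer key and the example responses are all hypothetical, and a real test should always be scored according to its own manual.

```python
from typing import Optional, Sequence


def score_multiple_choice(
    responses: Sequence[Optional[str]],
    answer_key: Sequence[str],
    penalise_guessing: bool = False,
) -> int:
    """Score a multiple-choice ability test (hypothetical helper).

    One point is awarded per correct answer and omitted items (None)
    score zero. If penalise_guessing is True, one point is deducted
    for each incorrect answer.
    """
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    total = 0
    for given, correct in zip(responses, answer_key):
        if given is None:  # item not attempted
            continue
        if given == correct:
            total += 1
        elif penalise_guessing:
            total -= 1
    return total


def score_likert_scale(
    responses: Sequence[int],
    reverse_keyed: Sequence[bool],
    n_points: int = 5,
) -> int:
    """Add up Likert responses (1..n_points), flipping reverse-keyed items."""
    if len(responses) != len(reverse_keyed):
        raise ValueError("responses and reverse_keyed must have the same length")
    total = 0
    for response, is_reversed in zip(responses, reverse_keyed):
        if not 1 <= response <= n_points:
            raise ValueError(f"response {response} is outside 1..{n_points}")
        # On a 5-point item a reverse-keyed response of 1 counts as 5, and so on.
        total += (n_points + 1 - response) if is_reversed else response
    return total


# Invented data: a four-item ability test and a two-item Extraversion scale.
key = ["b", "d", "a", "c"]
answers = ["b", "a", None, "c"]  # two correct, one wrong, one omitted

print(score_multiple_choice(answers, key))                          # 2
print(score_multiple_choice(answers, key, penalise_guessing=True))  # 1
print(score_likert_scale([5, 1], reverse_keyed=[False, True]))      # 10
```

Because every step is mechanical, two scorers running the same procedure on the same answer sheet must arrive at the same total, which is what Cattell means by a highly conspective scoring scheme.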
The problem of quantification
The scoring guidelines for ability and personality scales outlined in the previous section may sound much like common sense. It may seem blindingly obvious that items should be scored as described above, and that scores on a scale should be calculated by adding together people’s scores on items which measure the same trait or state. We are all so accustomed to this ‘common sense’ procedure that it is easy to overlook its problems. It is easy to forget that when numbers are assigned to responses to items, they do not behave in the same way as numbers assigned to points on a ruler or thermometer. Someone who ‘agrees’ that they enjoy parties (scoring 4 on this Extraversion item) is not twice as sociable as someone who says that they ‘disagree’ with it (scoring 2). The numbers assigned to these answers do not behave in the same way as numbers on a ruler, for example, where a line whose length is 4 is twice as long as a line whose length is 2. Indeed, the whole idea of awarding scores of 5, 4, 3, 2 and 1 to the various answers is arbitrary. Why not use 11, 9, 6, 2, 0 or any other arbitrary set of descending numbers? Will treating the data as ordinal numbers solve the problem? That certainly forces the responses to each item to produce the numbers 1–5, but it does not solve the problem that someone who scores 4 is not necessarily twice as sociable as someone who scores 2.
Michell (1997) identifies a more fundamental problem, too. This is that we assume that a characteristic such as Extraversion is quantitative – i.e., that it can be represented using numbers. This need not be the case at all.
In conceptualizing an attribute as quantitative a scientific hypothesis is proposed. There is no logical necessity that any attribute should have this kind of structure. Hence, accepting this hypothesis is speculative . . . (Michell, 1997)
Determining whether an attribute (such as Extraversion) is quantitative is not straightforward, but Michell’s point is that psychologists usually rush ahead and assume that it is, without any evidence whatsoever; he calls psychometrics a ‘pathology of science’ (Michell, 2004) for that reason. Michell’s work is not easy to read unless one has a background in philosophy and/or measurement theory, but it has broad implications. If Michell is correct (and I have seen no convincing refutations of his position) then psychology will never be able to use tests to develop proper quantitative theories, akin to the gas laws in chemistry or Newton’s laws of motion in physics. This is a very fundamental, far-reaching problem, which is generally ignored by researchers. You should perhaps worry about it. The last few pages of Barrett (2016) give an excellent summary of the issues together with some useful references, as does Cooper (2019).
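The arbitrariness of the numbers assigned to response options can be made concrete with a small sketch. The two respondents and their answers below are invented purely for illustration; the two coding schemes are the 5, 4, 3, 2, 1 and 11, 9, 6, 2, 0 sets mentioned earlier in this section. The sketch shows only that total scores, and even the rank order of two people, can depend on which descending set of numbers is chosen; it does not attempt to capture Michell’s deeper argument.

```python
# Two equally "descending" ways of coding the same five response options.
# Keys are the response options (5 = 'very' ... 1 = 'not at all').
CODING_A = {5: 5, 4: 4, 3: 3, 2: 2, 1: 1}
CODING_B = {5: 11, 4: 9, 3: 6, 2: 2, 1: 0}


def total_score(responses, coding):
    """Add up the numbers that a coding scheme assigns to each response."""
    return sum(coding[response] for response in responses)


# Hypothetical answers of two people to the same four items.
person_x = [5, 2, 2, 2]
person_y = [3, 3, 3, 1]

for label, coding in (("5-4-3-2-1", CODING_A), ("11-9-6-2-0", CODING_B)):
    x = total_score(person_x, coding)
    y = total_score(person_y, coding)
    print(f"{label}: X = {x}, Y = {y}")

# Prints:
# 5-4-3-2-1: X = 11, Y = 10
# 11-9-6-2-0: X = 17, Y = 18
```

Under the conventional coding X scores higher than Y, whereas under the alternative coding Y scores higher than X, even though neither person has changed a single answer.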
Norms
EXERCISE
Suppose that Jane completes a 20-item test of musical ability and gives correct responses to 15 items. What can be concluded about her musical ability?
The answer is simple – nothing. You may possibly have thought that since there were 20 items in the test and Jane answered more than half of them correctly, this would indicate that her score was above average, but, of course, it does not, for in almost all cases, a person’s score on a test depends on the level of difficulty of the test items. The items may have been so trivially easy that 99 out of 100 children might have obtained scores above 15 on this test, in which case Jane would be markedly less musical than other children of her age. This once again shows that the numbers representing test scores simply do not behave in the same way as numbers used in the physical sciences.
To interpret the meaning of an individual’s test score it is usually necessary to use norms. Tables of test norms simply show the scores of a large, carefully selected sample of individuals. For example, the test might be given to 2035 children aged between 96 and 99 months, ensuring that the sample contains roughly equal numbers of males and females, that they are sampled from different regions of the country (in case some regions are more musical than others) and that the proportion of children from ethnic and other minorities is consistent with that in the general population. A frequency distribution of these scores can be drawn up in a similar way to that shown in Table 3.3. The first column shows each possible test score, the second column shows the number of children in the sample who obtained each score, the third column shows the number of children who obtained each score or less and the fourth column shows this figure as a percentage – a figure known as the percentile. Many test manuals show the percentile scores, making it a simple matter to interpret an individual’s test score. It is merely necessary to look across and discover that 62% of the children scored 15 or less on the test. Sometimes, however, percentiles are not shown.
Table 3.3 Norms for a test of musical ability, based on a (hypothetical) random sample of 2035 children aged 96 months

Score   Number of children      Number of children with     Percentile
        with this score         this score or lower
0       3                       3                           100 x 3/2035 = 0.15
1       2                       3 + 2 = 5                   0.25
2       6                       3 + 2 + 6 = 11              0.54
3       8                       19                          0.93
4       8                       27                          1.33
5       13                      40                          1.97
6       17                      57                          2.80
7       23                      80                          3.93
8       25                      105                         5.16
9       33                      138                         6.78
10      57                      195                         9.58
11      87                      282                         13.86
12      133                     415                         20.39
13      201                     616                         30.27
14      293                     909                         44.67
15      357                     1266                        62.21
16      270                     1536                        75.48
17      198                     1734                        85.21
18      126                     1860                        91.40
19      100                     1960                        96.31

If these scores follow a normal (bell-shaped) distribution whose mean and standard deviation are known, it is still straightforward to estimate the percentage of the population having a score as low as any particular test score. For example, the average of the scores shown in Table 3.3 is 14.47 and the standard deviation is 4.978. Suppose that we want to estimate what percentage of the population has a score of 15 or below. To do this, simply calculate:

$$z = \frac{15 - 14.47}{4.978} = 0.106$$
A table of the standard normal distribution (found in almost any statistics book) will then show the proportion of individuals having a score lower than this value, which is 54%. This figure is similar (but not quite identical) to the one that we read directly from Table 3.3. The discrepancy arises because we assumed that the scores follow a normal distribution, whereas, in fact, they do not quite do so. However, this approach can be useful if you know the mean and standard deviation, but do not have access to the full table of norms.
Children’s performance on cognitive tasks increases with age, so when evaluating a child’s score on a test it is essential to compare their scores to those of other children of the same age. Norms for these tests are typically gathered for 3-month age intervals (e.g., 7 years 6 months, 7 years 9 months, etc.) to facilitate this.
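The two ways of interpreting a score of 15 described above (reading the percentile from the table of norms, or estimating it from the mean and standard deviation) can be sketched as follows. The counts are copied from the second column of Table 3.3 for scores 0 to 19, and the mean and standard deviation are the values quoted in the text; the function names are hypothetical, and `statistics.NormalDist` (Python 3.8 or later) simply stands in for the printed table of the standard normal distribution.

```python
from itertools import accumulate
from statistics import NormalDist

# Number of children obtaining each score from 0 to 19 (Table 3.3, column 2).
COUNTS = [3, 2, 6, 8, 8, 13, 17, 23, 25, 33,
          57, 87, 133, 201, 293, 357, 270, 198, 126, 100]
SAMPLE_SIZE = 2035  # size of the (hypothetical) normative sample


def percentile_from_table(score: int) -> float:
    """Percentage of the normative sample scoring `score` or lower."""
    cumulative = list(accumulate(COUNTS))  # column 3 of Table 3.3
    return 100 * cumulative[score] / SAMPLE_SIZE


def percentile_from_normal(score: float, mean: float, sd: float) -> float:
    """Estimate the same percentage by assuming normally distributed scores."""
    z = (score - mean) / sd
    return 100 * NormalDist().cdf(z)  # area under the standard normal curve below z


print(round(percentile_from_table(15), 2))               # 62.21
print(round((15 - 14.47) / 4.978, 3))                    # 0.106
print(round(percentile_from_normal(15, 14.47, 4.978)))   # 54
```

The gap between the two results (about 62% against about 54%) is the discrepancy discussed in the text: the second figure is only as good as the assumption that the scores are normally distributed with the stated mean and standard deviation.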
Most test manuals contain several different tables of norms, for example those collected in different countries, separate norms for each sex and (almost invariably in the case of ability tests) at different ages. All that is necessary is to choose the table that is the most appropriate for your needs, ensuring that it is based on a large sample (a minimum of several hundred) and that care has been taken to sample individuals properly. It is also important to consider legal issues. For example, it may be illegal to use different tables of norms for males and females.
Self-assessment question 3.4 Why are different norms used for different ages in ability tests?
Finally, I ought to mention that tables of norms are only necessary for the interpretation of one individual’s score on a test. Researchers will usually just want to correlate scores on a test with scores on other variables (e.g., correlating intelligence with head size) or compare the test scores of two groups (e.g., using a t-test to determine whether males and females are equally musical). Then it is not only unnecessary to convert the scores to percentiles, but also bad practice, for you will find that the original distribution of scores is much more bell-shaped than that of the corresponding percentiles, and a bell-shaped distribution is an assumption of most statistical techniques.
Obtaining and using tests
Commercial tests
Commercial psychological tests such as the Wechsler Intelligence Scale for Children (Wechsler et al., 2016) and the NEO-Personality Inventory (NEO-PI(R)) are expensive to develop and can only be bought and used by qualified individuals. Imagine what would happen if a photocopy of a commercial intelligence test reached student common rooms. Groups of students would while away some time by trying to solve the items and (given that it is a group effort with unlimited time) might well succeed in solving most items in the test. If some of these people were then given the same test as part of a screening process for employment, those who had had previous experience of the test might well be able to remember at least some of the correct answers and so would score higher than they ‘should’ do, reducing the effectiveness of the test in selecting the best applicants.
One way of finding commercial tests such as these is to search test publishers’ catalogues and websites. However, these invariably describe tests in glowing terms and so will not necessarily help you to identify a good (appropriate and accurate) test. The best way to identify a good commercially published test is to consult the Mental Measurements Yearbooks. These weighty volumes, available both in print and at http://buros.org/, were introduced by Oscar Buros in 1938 to provide a ‘consumer guide’ to commercially published psychological tests. They contain indices listing tests by type, by name and by author, but the real value of these volumes lies in the critical reviews of tests, which have been written by psychometric specialists. These will sometimes state, quite bluntly, that a particular test is so severely flawed that it should be avoided or only be considered for research purposes.
Even the best test is useless if it is not administered, scored and interpreted correctly. The second reason for making these tests available only to qualified users is to ensure that they will be administered and interpreted properly. It is important to adhere to the standard instructions, practice examples and so on so that participants fully understand what they are expected to do. Time limits must be carefully observed, and scoring instructions followed rigorously. It is also necessary to ensure that ethical standards are maintained – that the test which is chosen is appropriate for the particular application and sample of people, that any feedback given to participants is accurate and that the test results are kept confidential. This is another reason why test publishers will only sell tests to users who have demonstrated competence in test use – normally by attending a course of instruction approved by the test publisher.
In the UK, the British Psychological Society oversees a three-tier system of licensing test use in psychology. At the lowest level are test administrators, who are qualified only to administer certain group-administered tests under the direction of a more highly qualified psychologist. The next level involves training in the selection, scoring and interpretation of certain ability tests. The highest level of certification involves demonstrable skill in choosing, administering and interpreting the results from personality tests. Other countries have adopted similar policies, and the International Test Commission’s ‘guidelines on test use’ are an invaluable resource. They may be found at https://www.intestcom.org/ and provide broad principles covering the selection of appropriate tests, guidance for their administration, their scoring and the interpretation of results. They should be consulted in conjunction with guidelines produced by one’s national psychological association.
Non-commercial tests
A great many tests are not published commercially but appear on the internet or as appendices to journal articles. The beauty of these tests is that they are almost always free to use; the disadvantage is that the responsibility for evaluating a test’s suitability and accuracy falls firmly on the test user. Some such tests will be patent rubbish. Others may be inappropriate for use in some cultures, age ranges or applications. Thus the suitability of any ‘free’ test for any application should be determined by searching journal articles for evaluative studies, reading the ITC and other guidelines and consulting Chapters 5, 8 and 19–21 of this book, or alternatives.
There are several ways of finding free-to-use tests. The Educational Testing Service has a database of commercial and non-commercial tests. Its URL at the time of writing is https://www.ets.org/test_link. It is possible to search for tests measuring a particular trait (e.g., ‘visualization’; note that it requires American spellings of search terms). It shows brief descriptions of tests with references to where they may be obtained, but not the test items themselves.
The International Personality Item Pool at http://ipip.ori.org contains a huge range of personality scales which are free to use for any purpose. Several of them are provided as free alternatives to expensive commercial products. Unlike the ETS database, this website includes the test items, scoring directions and (sometimes) links to norms and information about the accuracy of the scales in some contexts.
If the name of a test or its author is known, it is possible to search for a test using a database such as PsycINFO or Web of Science (not a simple internet search). Search for the name of the test or its author in the title or topic; very often the paper containing the test will be the author’s most cited work. If the original paper cannot be found, papers by other authors which report the ‘factor structure’ of the test will often reproduce the items. Beware copyright laws, however. The reproduction and use of items published in journal articles is something of a grey area.
Finding a test is easy; determining whether it is likely to be useful is far harder and more time consuming, as there is nothing like a Mental Measurements Yearbook to guide you. It is vital that you consider dispassionately whether a test is really going to suit your needs. It is necessary to review the literature oneself to determine this. Issues of reliability, validity and bias (discussed in Chapters 8 and 21) are of particular importance.
Summary
Having read this chapter, you should appreciate the need to develop operational definitions of psychological concepts in order to assess individuals or test hypotheses. You should be able to describe what is meant by the terms ‘trait’ and ‘state’, describe the main types of mental test, discuss how individual test scores may be interpreted through the use of norms, outline the ethical principles associated with test use and show some appreciation of Michell’s devastating criticisms of all forms of psychological assessment using tests and questionnaires.
Suggested additional reading
If you have not already done so, you should consult the International Test Commission guidelines on test use at https://www.intestcom.org (together with the guidelines produced by your national psychological society or association) in order to learn about standards of testing procedure. You may find it interesting to browse through the Mental Measurements Yearbooks and search out some reviews, for example reviews of the Rorschach test, Sixteen Personality Factor test, Lüscher Colour Test, Adjective Checklist and/or the Wechsler Adult Intelligence Scale. You may also enjoy visiting http://ipip.ori.org to see what some typical test items look like or to follow a link to test your own personality using the IPIP-NEO online.
Answers to self-assessment questions
SAQ 3.1
a. Any score from a psychological test or other device that is used as if it measured some theoretical concept.
b. A person’s score on a test of appropriate difficulty consisting of whatever items are thought to make up the concept of ‘French-speaking ability’, for example items measuring knowledge of colloquialisms and vocabulary and ratings of quality of accent and fluency. The measurement of meanness is a much more difficult proposition. Direct self-report questions will probably not work – who would admit to this quality? Asking if any regular payments to a charity are made from bank accounts may be more useful. Ratings by friends may be a possibility, as may ‘objective tests’ (e.g., asking for a loan, or observing behaviour in the vicinity of a beggar). However, most of these measures could indicate ‘poverty’ instead of ‘meanness’.
SAQ 3.2 a, c and e are probably traits, as they will be fairly stable over time and across situations. Hunger and rage (b and d) are probably states, as one’s level of hunger falls after feeding, and anger probably occurs in response to some event and declines thereafter.
SAQ 3.3
a. Responses to questionnaire items that are analysed in ways that do not assume the truth of what the respondent is saying.
b. Ratings of behaviour.
c. Responses to questionnaires that are analysed as if the respondents are making some true comment about themselves. For example, if a person strongly agrees that they are happy, the psychologist would conclude that they are, indeed, happy.
d. Data from objective tests of personality – tests whose purpose is hidden from the test participant or in which the test participant physically cannot alter his or her response.
e. Personality tests in which an individual is asked to describe or interpret an ambiguous stimulus (e.g., a picture, an inkblot or a sound). The rationale for such tests is that the way in which the stimuli are interpreted will provide some insight into the individual’s personality, needs and experience. Scoring the responses to such tests is notoriously difficult, as Eysenck has shown.
SAQ 3.4 Because abilities tend to increase with age. In order to determine whether a particular child is performing markedly better or worse than ‘the average child’, it is important to compare them with other children of the same age. Almost all abilities increase with age as a result of physiological maturation and education. Suppose that a test had a table of norms that covered a fairly broad range of ages (e.g., children aged 6–10). It would not be possible to use these to interpret a child’s score, since an ‘average’ (or median) score could indicate a 6 year old who was performing much better than most 6 year olds, an 8 year old of average ability or a 10 year old whose performance was below average. The best ability tests give norms at 3-month intervals (e.g., one table for children aged 74–76 months, another table for children aged 77–79 months, etc.).
References
Achenbach, T. M. & Rescorla, L. A. 2001. Manual for the ASEBA School-Age Forms & Profiles. Burlington, VT: University of Vermont, Research Center for Children, Youth, & Families.
Barrett, P. 2016. Electrophysiology, Chronometrics, and Cross-Cultural Psychometrics at the Biosignal Lab: Why It Began, What We Learned, and Why It Ended. Personality and Individual Differences, 103, 128–134.
Cattell, R. B. 1957. Personality and Motivation Structure and Measurement. Yonkers: New World.
Cattell, R. B. 1973. Personality and Mood by Questionnaire. San Francisco: Jossey-Bass.
Cattell, R. B. & Warburton, F. W. 1967. Objective Personality and Motivation Tests: A Theoretical Foundation and Practical Compendium. Urbana: University of Illinois Press.
Cooper, C. 2019. Psychological Testing: Theory and Practice. London: Routledge.
Cronbach, L. J. 1994. Essentials of Psychological Testing. New York: Harper-Collins.
Eysenck, H. J. 1959. Review of the Rorschach Inkblots. In O. K. Buros (ed.) Fifth Mental Measurement Yearbook. New Jersey: Gryphon Press.
Forscher, P. S., Lai, C. K., Axt, J. R., Ebersole, C. R., Herman, M., Devine, P. G. & Nosek, B. A. 2019. A Meta-Analysis of Procedures to Change Implicit Measures. Journal of Personality and Social Psychology, 117, 522–559.
Gardner, H. 1993. Frames of Mind (2nd Edn). London: Harper-Collins.
Greenwald, A. G., McGhee, D. E. & Schwartz, J. L. K. 1998. Measuring Individual Differences in Implicit Cognition: The Implicit Association Test. Journal of Personality and Social Psychology, 74, 1464–1480.
Guilford, J. P. 1967. The Nature of Human Intelligence. New York: McGraw-Hill.
Holmstrom, R. W., Karp, S. A. & Silber, D. E. 1994. Prediction of Depression with the Apperceptive Personality Test. Journal of Clinical Psychology, 50, 234–237.
McCord, J.-L., Harman, J. L. & Purl, J. 2019. Game-Like Personality Testing: An Emerging Mode of Personality Assessment. Personality and Individual Differences, 143, 95–102.
Michell, J. 1997. Quantitative Science and the Definition of Measurement in Psychology. British Journal of Psychology, 88, 355–383.
Michell, J. 2004. Item Response Models, Pathological Science and the Shape of Error – Reply to Borsboom and Mellenbergh. Theory & Psychology, 14, 121–129.
Singal, J. 2017. The Creators of the Implicit Association Test Should Get Their Story Straight. New York Intelligencer. New York, NY.
Valladares-Rodriguez, S., Perez-Rodriguez, R., Anido-Rifon, L. & Fernandez-Iglesias, M. 2016. Trends on the Application of Serious Games to Neuropsychological Evaluation: A Scoping Review. Journal of Biomedical Informatics, 64, 296–319.
Wechsler, D., Pearson Education Inc. & Psychological Corporation 2016. WISC-V: Wechsler Intelligence Scale for Children. London: NCS Pearson PsychCorp.
Chapter 4 Depth psychology
Introduction
This chapter discusses psychoanalysis and some other theories which emphasise unconscious mental processes. Although they are a century old, and rarely feature in modern personality textbooks, these theories are included here for several reasons.
When considering the work of Kelly and Rogers, and the use of personality questionnaires, we have assumed that self-reports can tell us something useful about behaviour. This may not be the case. Modern neuroscience (e.g., Dehaene et al., 2014) shows that brains respond to stimuli of which we are unaware, and of which we cannot become aware by switching attention. Conscious awareness comes fairly late in the process of perceiving and thinking. It is possible that we are experts at deceiving ourselves, making self-reports of little use. Psychoanalytic theories focus on unconscious determinants of behaviour, and so it might be worthwhile scrutinising these theories in case they have something useful to offer.
These theories never quite go away. When they hear that you are studying psychology, many readers’ friends and family will ask ‘are you psycho-analysing me now?’ – and so it seems appropriate to outline what psychoanalysis is, and whether it is well thought of. In addition, these theories still hold sway in areas such as social work and literary criticism. Thus although they have largely disappeared from psychology courses, readers may encounter them in other contexts and so a little critical evaluation might be useful.
They are unusual in that they try to explain many aspects of human behaviour. As well as outlining a theory for the development and functioning of personality, the theories can (supposedly) also explain why we enjoy music, art and literature, and point to the origins of religious belief, humour, warfare and aggression. They claim to show how clinical syndromes such as depression and schizophrenia develop and operate, and how such problems may be cured through therapy. This is a far cry from the rather narrow views of personality outlined in Chapter 2; if psychoanalytic theories can explain just some of these phenomena, then they merit study.
They also force us to think about what makes a ‘good’ theory of personality. For example, what sort of data should a theory be based on? Are clinical insights sufficient to develop a theory? Is it important that results should be replicable? How complex can a theory become before it stops being useful? Is quantification desirable? Should we assume that findings from clinical patients generalise to the rest of us? Psychoanalytic theory showed that all these issues are important, and therefore shaped the direction of future research in personality, mood and motivation.
Learning outcomes
Having read this chapter you should be able to:
• Show an appreciation of how psychoanalytic theories were developed, and how psychoanalysis supposedly cures neurosis.
• Explain terms such as ego, id, superego, defence mechanism, repression, Oedipus complex, inferiority complex, mandala and Jung’s psychological orientations.
• Develop an opinion about whether the models developed by Freud and other analysts were actually correct.
• Discuss what makes a good personality theory, with reference to the work of Rogers, Kelly, Freud, Adler and Jung.
Freud’s theories
Sigmund Freud (1856–1939) was the founder of psychoanalysis – a collection of theories which evolved over his lifetime; his theory of aggression only developed in 1920, for example. There is no substitute for reading translations of Freud’s original works, especially as several of them were written to educate non-expert readers, and some suggestions are given at the end of this chapter. They are remarkably easy to read.
The theories assume that there is no clear distinction between abnormal behaviour (e.g., those diagnosed with schizophrenia and depression) and normal behaviour. People with such diagnoses just have more extreme levels of the characteristics which ‘normal’ people show, and the mechanisms which cause them are thought to be the same. This is now known as the ‘continuity hypothesis’, and there is good support for the basic idea (Claridge, 1985).
Freud began studying medicine in Vienna in 1873. In the nineteenth century, psychiatric disorders that had no obvious physiological basis were dismissed out of hand as being impossible. Some patients who showed symptoms such as memory failure, paralysis, nervous tics (twitching), tunnel vision, blindness, loss of speech and convulsions were diagnosed as suffering from various forms of hysteria: that is, symptoms without a good physiological cause. Freud’s interest in psychiatry began in the 1880s when he travelled to Paris to study with Charcot, the celebrated neurologist. Charcot was researching how hypnotic suggestion could be used to implant a suggestion in the mind of a normal volunteer who would then perform some activity (or experience some feeling, such as a numb leg) even after being released from the hypnotic trance.
Freud saw some similarity between the behaviour of Charcot’s hypnotised subjects and that of patients who suffered from hysteria. In both cases, the behaviour or symptom had no physical cause – the numbing of the leg resulting from a hypnotic suggestion had a purely psychological origin. In addition, subjects who were hypnotised were unaware of the source of their behaviour or motivations to perform certain actions, in much the same way that hysterical patients claimed to be completely ignorant of the origins of their symptoms. Thus Freud speculated that some kind of mental event may explain the symptoms of hysterics. As both the hypnotised volunteer and the hysterical patient seem to have no memory for such an event, it must be located in an area of the mind that is not normally accessible to consciousness, but from where it can nevertheless influence behaviour and bodily and mental symptoms. Perhaps hysterical symptoms might be cured through appropriate psychological manipulations in much the same way as a hypnotic suggestion can be cancelled through the instructions of the hypnotist?
On returning to Vienna, Freud set himself up as a specialist in the treatment of ‘nervous diseases’ and collaborated with Josef Breuer to publish Studies in Hysteria, which is the basis of Freud’s theory of psychoanalysis. They reported some remarkable cures of hysterical symptoms through the use of hypnosis. Patients were hypnotised and were then encouraged to believe that their symptoms would disappear. Freud then probed for information about the causes of the symptoms, and considered that the careful probing of the patients’ reminiscences (rather than hypnotic suggestion) was the ultimate cause of these cures.
He claimed that:
Each individual symptom immediately and permanently disappeared when we had succeeded in bringing clearly to light the memory of the event by which it was produced, and in arousing its accompanying affect [‘emotion’], and when the patient had described that event in the greatest possible detail and had put the affect into words. (Freud and Breuer, 1893/1964)
This is the first description of the technique known as free association. It involved the patient’s reclining on a couch, while the psychoanalyst asked her to say the very first thing that came to mind in response to occasional words spoken by the psychoanalyst. These words would sometimes reflect themes from the patient’s dreams, for reasons which will become apparent below. The patient was particularly urged not to omit any detail, no matter how trivial, unimportant, embarrassing or irrelevant it appeared to be. The psychoanalyst paid careful consideration to these reminiscences and noted in particular those occasions when the patient found it difficult to produce an association in response to a word or phrase. This might indicate that some psychological mechanism, then known as a repression, was interfering with the patient’s ability to respond. As the patient had been urged to say whatever came to mind, a failure to produce an association was not due to self-censorship. Instead, it suggested
that there was some unpleasant, emotionally charged memory that had been pushed into an unconscious part of the mind and that might be the cause of the hysterical symptom. Freud claimed that his patients were eventually able to recall events (often from childhood) that had been completely forgotten for many years and that had been associated with some strong and unpleasant emotional experience. When they both recalled the painful event and re-experienced the emotion that was felt at the time, the physical symptom simply disappeared. Freud claimed to have cured symptoms such as an inability to drink, paralysis, blindness, nervous tics and loss of memory using this technique, which he termed psychoanalysis. Here is a description of one of Breuer’s early attempts to probe the unconscious mind using hypnosis in an attempt to cure a young woman who used to enter trance-like states, similar to some forms of epilepsy: It was observed that, when the patient was in her states of ‘absence’ [altered personality accompanied by confusion], she was in the habit of muttering a few words to herself which seemed as if they arose from some train of thought that was occupying her mind. The doctor, after getting report of these words, used to put her into a kind of hypnosis and repeat them to her so as to induce her to use them as a starting point. The patient . . . in this way reproduced in his presence the mental creations which had been occupying her mind during the ‘absences’ and which had betrayed their existence by the fragmentary words which she had uttered. They were profoundly melancholy phantasies – ‘daydreams’ we should call them – sometimes characterised by poetic beauty, and the starting point was as a rule the position of a girl at her father’s sickbed. When she had related a number of these phantasies, she was as if set free, and she was brought back to normal mental life . . . 
It was impossible to escape the conclusion that the alteration in her mental state which was expressed in the ‘absences’ was a result of the stimulus proceeding from these highly emotional phantasies. (Freud, 1925/1959 section IV)
Self-assessment question 4.1 Try to explain the following terms: a. free association b. resistance c. repression.
Analyses of dreams, ‘slips of the tongue’, wit and similar everyday experiences led Freud to conclude that the mental structures that he deduced from the analysis of hysterical patients were universal. Thus his theory sought to become a general theory of personality, rather than an explanation for hysterical behaviour alone.
A psychoanalytic model of the mind
As memories and fantasies can be dragged reluctantly back into consciousness:
• It seems that there are two main areas of the mind – one that is susceptible to conscious scrutiny through introspection and another whose contents are normally unconscious.
• It is also important to discover how the contents of the unconscious region of mind influence behaviour, as in the case of hysterical symptoms.
• Finally, it is necessary to understand how memories pass from the conscious to the unconscious regions of the mind and why memories that have been transferred into the unconscious region usually stay there, except in the case of psychoanalytical treatment.
Freud thus suggested that the mind consists of the ego, which is conscious and capable of logical, rational thought. The other, unconscious, area of mind he called the id. This contains the basic instincts (sex and aggression), satisfaction of which leads to feelings of pleasure and which therefore motivate behaviour – the ‘pleasure principle’. He also suggested that memories and thoughts that have become associated with these instincts have also been pushed into the id. The instincts, memories and fantasies in the Freudian id were thought to influence day-to-day behaviour and experience, without the individual’s being aware that they even exist.
The analysis of hysterical patients led Freud to a conclusion that shocked Vienna. The unconscious memories and fantasies of his proper, well-to-do female patients were almost invariably sexual. Psychoanalysis revealed that the emotional content of a thought or memory (or the emotion that became attached to it through association, a process now known as classical conditioning) determined its destiny. Painful memories or unacceptable thoughts (e.g., sexual fantasies) were banished to the unconscious mind (the id), where they joined the seething lusts and aggressive drives that are found there and somehow strived to force themselves back into consciousness. The id therefore contains two basic instincts, namely eros, also known as the life instinct, which is satisfied through sexual activity, and a self-destructive force known as thanatos or the death instinct, which can also be turned outwards against the world, resulting in aggressive or sadistic behaviour. Freud believed that we always seek to satisfy these deep-seated (but unconscious) instincts for sex and aggression – the pleasure principle.
The id demands their immediate and complete satisfaction, no matter how inappropriate, impractical or unreasonable this may be and without consideration of social conventions or other people. Freud called this primary process thought, which is also characterised by its illogicality and tolerance of ambiguity. For example, the id would have no difficulty in simultaneously desiring and wishing to destroy a person. Melanie Klein claimed to have identified terrifying destructive and aggressive fantasies in very young infants, which are thought to be typical of the way in which the id operates.
Freud believed that the ego splits off from the id as the child grows older, as a result of its experiences of the outside world through sight, touch, hearing and the other senses. A part of the mind becomes conscious, aware of the outside world, self-aware and capable of rational thought. The basic aim is still the satisfaction of the drives from the id, but these can be assuaged far more effectively through postponement and the use of social conventions. Satisfying the sexual instinct immediately and completely with the first person one encountered (as the id would demand) would probably not be a good idea. It would result in a prison sentence and so would fail to provide a long-term solution for most people. It would also run counter to the moral standards of one’s conscience (the superego), which is the third main area of the mind. It will be far more effective in the long term to act rationally under the control of the ego and develop a relationship. There are also other ways in which the drives can be satisfied, such as by reading or other activities (e.g., art, humour, gastronomy), in which basic drives are being satisfied in socially acceptable ways. These are sometimes known as derivatives of the original drives, that is, ideas that are connected to the basic instincts of sex and aggression, but whose satisfaction
is more acceptable to the ego. Fenichel (1946) claimed that most neurotic symptoms are derivatives of the two basic instincts. Childhood experiences are thought to be all important for adult development. According to Freud, childhood is a battleground between the two major instincts and the attempts of parents to ensure that they are satisfied in socially acceptable ways. Derivatives of the sexual and aggressive instincts are formed and the ego develops defence mechanisms.
Defence mechanisms: battles in the unconscious
It is necessary to understand why some painful memories are ‘forgotten’, why they result in a physical symptom and why the symptom disappears when the memory is relived. Freud noted with interest that the key forgotten memories tended to be mentioned but immediately dismissed as unimportant by the patient. Only as a result of careful listening and prompting by the analyst could they eventually be recalled. This suggested that there were some forces (of which the patient was unaware) that tried to keep the painful memory out of consciousness. They were called defence mechanisms and over a dozen of them have been proposed by Freud and his followers. The most well known is called repression and this term is used interchangeably with ‘defence mechanism’ in Freud’s early writings. It implies a purposeful forgetting of an emotion-laden wish or memory. However, once it reaches unconsciousness it constantly tries to re-enter consciousness with the result that there is a perpetual battle taking place between the wish (which has been repressed into the id) and the defence mechanisms. The more memories an individual banishes into the id, the stronger the defence mechanisms that are required to keep them there. Thus everyone is to some extent neurotic, because we have all dealt with painful memories by forcing them into unconscious regions of the mind, from where they continue to try to influence our feelings and behaviour.
When a repressed wish or memory tries to re-enter consciousness, anxiety levels supposedly rise and this triggers the defence mechanisms into operation. Disguised and unrecognisable substitutes for the original wish may, however, sneak into consciousness as daydreams, fantasies – or as neurotic symptoms. At night the defence mechanisms relax and the contents of the id can infiltrate the ego in a disguised form, often as symbols in dreams. Many of these are sexual in nature – for example, Freud (1900) claimed that the penis may be represented by sticks, daggers, telescopes, watering cans, fishing rods, feet, etc.
Other defence mechanisms include isolation, in which an instinct or memory may be allowed into consciousness because the all-important emotion associated with the drive is stripped away: aggressive instincts may be neatly satisfied through becoming a surgeon. Projection involves attributing one’s own unacceptable fantasies, impulses, etc. on to other people. An aggressive person may see others as being aggressive. Displacement of object involves the expression of a sexual or aggressive wish, but against a substitute person, animal or object. Reaction formation occurs when one’s ego reacts as if a threat (from the id) is always present. Thus someone who has strong sexual urges may show the extreme prudery which is obvious in anti-pornography campaigners. If one decides that it is one’s duty to constantly look for signs of filth and depravity, then one is (reluctantly, of course) exposed to rather a lot of it – which conveniently satisfies the demands of the id, while keeping the ego in ignorance of one’s true motives.
There are more defence mechanisms than those discussed here, several of which were suggested by followers of Freud. Descriptions of some of the more popular ones may be found in Kline (1984) or in any of the sources mentioned at the end of this chapter.
Self-assessment question 4.2 a. What is a ‘defence mechanism’? b. Suppose that a patient is able to remember some horrific incidents from her childhood, but is able to talk about the events in a detached, unemotional manner. Which defence mechanism may she be using? c. Suppose that a person uses two defence mechanisms to keep his sexual instincts at bay, namely projection and reaction formation. Try to work out how this may lead him to feel about someone of the same sex whom he (unconsciously) loves, remembering that in Freud’s day this feeling would have been unacceptable to the ego.
The structure of the mind is rather as shown in Figure 4.1. This shows the conscious and unconscious areas of the mind – and also a grey area known as the ‘pre-conscious’ which contains information of which one is unaware that can potentially become conscious if one attends to it, or thinks hard. Note that the id is entirely unconscious. The sexual and aggressive drives and their derivatives try to bubble up into the ego. Anxiety is thought to be triggered when they do so and this prompts the operation of defence mechanisms – though when we sleep these mechanisms relax somewhat so that the ego may become aware of some of the id content, albeit in disguised (sanitised) form, such as symbols, so as to not trigger the defence mechanisms.
Figure 4.1 Freud’s structure of mind
Perceptual defence: measuring repression in the laboratory?
The defence mechanism of repression seems to be very similar to the phenomenon of perceptual defence as studied by Bruner and Postman (1947). They claimed that individuals find it harder to recognise emotionally threatening words than neutral or positively toned words in laboratory studies. A typical experiment might involve the establishment of ‘recognition thresholds’ for various words, which are basically the minimum duration for which a word has to be exposed for it to be correctly recognised. The words would be displayed for a very brief period using a piece of equipment called a tachistoscope. The person being tested would guess what the word was. If they guessed incorrectly, the word would be shown again for a slightly longer duration, this process being repeated until it was correctly identified. It was found that words with negative associations (e.g., ‘whore’, ‘rape’, ‘death’) had to be shown for longer periods of time than other words in order to be correctly recognised; they had longer recognition thresholds. This suggested that some type of psychological mechanism monitored the emotional content of the words as they were being perceived and could in some way affect the visibility of the words in an attempt to keep threatening words from being consciously perceived.
Such studies raised a stream of protests and alternative explanations, ranging from philosophical objections to the idea of a homunculus – a ‘little man inside the head’ who monitors what enters consciousness – to sound methodological criticisms of the early experiments. The old perceptual defence experiments were flawed because they failed to control for word length and word frequency, both of which can affect the ease with which words are recognised. Short, common words are recognised much more rapidly than long or unusual words. Furthermore one would have to be very certain that the word ‘penis’ really was present before calling it out to one’s lecturer, whereas one would guess neutral words quite freely. This alone would result in the ‘taboo word’ appearing to have a higher recognition threshold than the neutral word (‘response suppression’). There are other problems with perceptual defence experiments, some of which are discussed by Brown (1961) and Dixon (1981).
Since the paradigms were developed by experimental psychologists, the emphasis has been on proving that the perceptual defence effect exists, rather than on examining its correlates. Thus while many studies claim to show (or not to show!) that the emotional meaning of a stimulus is related to its detectability, hardly any of them examine individual differences. It would be simple to calculate scores indicating how pronounced the perceptual defence effect is for each individual, for example by subtracting the individual’s mean threshold for neutral words from their threshold for threatening words. One could then test whether individuals who show a massive difference are anxious, rated by their therapists as ‘repressors’, and in other ways examine the correlates of perceptual defence. Such studies are rare, but see Watt and Morris (1995).
These days cognitive scientists are happy to accept that the emotional words and pictures of which a person is apparently unaware can influence both their physiological reactions and their cognitions (e.g., Hendler et al., 2003; Gibbons, 2009). The finding that our nervous systems react to the emotional content of stimuli of which we are unaware is less controversial than it has ever been.
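The individual-differences score described above (mean recognition threshold for threatening words minus mean threshold for neutral words) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the participants, the neutral words, the threshold values in milliseconds and the function name `defence_score` are invented for illustration and do not come from any published study.

```python
from statistics import mean

def defence_score(thresholds_ms, word_type):
    """Mean threshold for threatening words minus mean threshold for neutral words.

    thresholds_ms: word -> recognition threshold (ms) for one participant.
    word_type:     word -> "threat" or "neutral".
    A larger positive score suggests a more pronounced perceptual defence effect.
    """
    threat = [t for w, t in thresholds_ms.items() if word_type[w] == "threat"]
    neutral = [t for w, t in thresholds_ms.items() if word_type[w] == "neutral"]
    return mean(threat) - mean(neutral)

# Hypothetical stimulus classification (illustrative only).
word_type = {"death": "threat", "rape": "threat",
             "table": "neutral", "chair": "neutral"}

# Hypothetical recognition thresholds (ms) for two invented participants.
participants = {
    "P1": {"death": 60, "rape": 70, "table": 40, "chair": 50},
    "P2": {"death": 45, "rape": 45, "table": 45, "chair": 45},
}

scores = {pid: defence_score(t, word_type) for pid, t in participants.items()}
for pid, score in scores.items():
    print(pid, score)
```

Such scores would only be interpretable if the threatening and neutral words were matched for length and frequency, and if response suppression were ruled out, for the reasons given above. The scores could then be correlated with measures such as anxiety.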
Neurotic symptoms and their cure
Freud identified several different types of neurotic symptom in his patients. These include anxiety hysteria, in which excessive anxiety is attached to certain people or situations, anxiety neurosis, in which the feeling of anxiety is powerful but diffused and not linked to any particular individuals or settings, and conversion hysteria, in which unconscious fantasies are acted out as emotional outbursts. Freud believed that physical illnesses such as muscle pains, breathing irregularities and even short-sightedness could sometimes be traced back to psychological causes – the so-called organ neuroses. Depression, the choice of unusual sexual objects, compulsive actions, addictions and compulsions (such as shoplifting) are also perceived as having an underlying psychological cause.
Despite the highly varied nature of these symptoms, Freudians believe that they have several features in common. First, as Fenichel (1946) reminds us, the patient experiences some unusual symptom – a feeling of anxiety, a compulsion to steal, bodily pains or whatever – but is unable to determine its cause through introspection. Second, Freudians believe that all these symptoms have their roots in some conflict between the id and the ego, which has activated defence mechanisms. They therefore stem from the blocking of one of the powerful instincts by the ego. The exact form that the symptom takes depends on childhood experiences and the techniques the infant used to deal with threats from the id. Third, the symptom is always thought to have some special sexual or aggressive significance to the individual. Why might a man become sexually excited by viewing and touching women’s shoes? If he is (unconsciously) afraid of being castrated, heterosexual intercourse may arouse anxiety because it reminds him that women do not possess a penis.
A foot may symbolise a penis, so it may prove less threatening to obtain sexual excitement from a woman’s shoes than from sexual relationships with real women. Foot fetishists endow women with a symbolic penis in an attempt to reduce castration anxiety (Fenichel, 1946). Alternatively, castration fears may emerge as a phobia, as in the case of ‘Little Hans’, who became anxious that a horse would bite him. The ‘big horse’ was, of course, the boy’s father, and Hans’s unconscious fear was that the father would cut off his penis. This is an example of Freud’s making an extraordinarily large leap from what Hans said to its theoretical meaning. According to Freud, all symptoms thus result from some powerful emotion, wish or thought which is repressed or otherwise defended against. These are usually sexual and often involve the memory of childhood sexual abuse. Freud took the conventional view of his time that abuse was unlikely to have occurred and he proposed that the ‘memory’ of these events instead revealed a fantasy about something for which the child longed: Almost all my women patients told me that they had been seduced by their father. I was driven to recognise in the end that these reports were untrue, and so came to understand that hysterical symptoms are derived from phantasies and not from real occurrences. (Freud, 1932) These ‘screen memories’ are really fantasies that are superimposed on to our early memories and so become indistinguishable from them. This is why Freud formed the view that some form of sexual instinct is present even in children and that the ‘blocking’ of this instinct could result in neurotic symptoms later in life. Psychoanalysis seeks to undo the defence mechanisms that lock a particular thought or wish in the unconscious. It will release it into consciousness, which should also remove the symptom that emerged as a derivative of the original impulse. 
Thus the therapist must first find a link between the symptom and some unconscious mental process, for example through
noting ‘blockages’ in the free association process or free associating to dream contents. Once the analyst has a clear picture of the source of the neurotic symptom, this underlying fear or wish is explained to the patient, together with examples of the defence mechanisms used and the nature of the derivative(s) of the original impulse. The patient will generally object that what the analyst said never took place: their objections are brushed aside by the analyst because they represent a feeble attempt to use defence mechanisms to avoid recognising the truth. This is in stark contrast to the therapeutic techniques discussed in Chapter 2, which assume the patient is able to articulate something of value. When the ego eventually faces up to these childhood memories and fears and appreciates the nature of the resistance that has been applied, there is no need to defend further, as the ego is fully aware of the ‘threat’. Thus all of the derivatives of the threat should disappear, including the troublesome symptoms.
Freud’s theory of infantile sexuality and adult character traits
Freud believed that the newborn infant is essentially id – demanding total and immediate satisfaction of all instinctual demands. Freud (1923/1955) postulated that, in the early years of life, the sexual instinct was satisfied through oral contact with objects – exploring, sucking and later biting the nipple and other objects. Following the oral phase, the focus of sexual satisfaction shifts to the anus during potty training. Presenting the parents with ‘gifts’ (or, alternatively, withholding the faeces) at this stage is the child’s first real opportunity to exercise control over the environment and faeces are thought to have strong symbolic links to money, babies and the penis. Phrases such as ‘stinking rich’, ‘filthy lucre’, ‘rolling in it’ and ‘where there’s muck there’s money’ do, after all, crave explanation.
The penis and clitoris become the focus of the libido during what is known as the ‘phallic’ phase, but at the age of about 5 years young boys suffer the ‘Oedipus complex’ and the ‘castration complex’ – supposedly the most important and traumatic event in any man’s life. The Oedipus complex is named after the hero of Sophocles’ play who unknowingly kills his father and then commits incest with his mother. It occurs when the young boy lusts after his mother and so views his father as a sexual rival. He realises through observation that girls do not possess a penis and will come to a terrible conclusion – that a girl’s penis has been cut off by a jealous father. Anxious to avoid this catastrophe, the young boy will repress all sexual feelings for his mother, and will ‘identify’ with his father. Identification is a form of defence mechanism into which threatening objects are incorporated – in this instance, the father’s moral values are adopted to form the ‘superego’ or moral conscience. As girls do not possess a penis they blame their mother for this inadequacy and identify with their father in order to have a penis at their disposal (the ‘Electra complex’). Following these traumas, both sexes enter what is known as the ‘latency period’ in which there is no focus of sexual activity, then puberty (the ‘genital phase’).
The degree of satisfaction or frustration that the infant encounters at each psychosexual stage is thought to determine his or her adult personality type. Kline (1981) compiled some lists of the adult characteristics that Freud and his followers have linked to overindulgence or frustration at various stages of psychosexual development. Those fixated at the oral incorporation phase are optimistic, dependent and curious and would love soft and milky foods. Oral sadists would be bitter, hostile, argumentative and jealous. Those fixated at the anal phase would become obstinate, excessively tidy and stingy, and would enjoy dealing with money (symbolic faeces). Why else would anyone choose to become an accountant or a tax official?
Self-assessment question 4.3 What might go through a psychoanalyst’s mind if a male patient: a. said that they loved their mother b. hesitated when producing an association to the word ‘stick’ c. said that they woke up with a start, feeling anxious d. seemed to be very dependent, optimistic and curious, and came to the session drinking a milk shake e. screamed and shouted at them, and said that the analyst was talking rubbish when claiming that something happened in their childhood?
Later developments
Psychoanalysis has not stood still since the time of Freud and developments such as object relations theory (Fairbairn, 1952) and Lacanian psychoanalysis (Lacan and Wilden, 1968) have proved popular with some, even though their basic tenets seem even more abstract and less open to scientific scrutiny than does the work of Freud. For example, Lacan’s approach is deeply philosophical and rooted in the meaning of language; discourses on ‘the real’ (in a lecture Lacan gave in 1953) for example seem to produce few hard propositions that are amenable to scientific scrutiny. Fairbairn stresses the importance of relationships (either with other people or one’s own thoughts and fantasies) in development. His view was that we are motivated not by the drives from the id, but by relationships, and suggests that the unconscious mind develops from the ego (rather than vice versa) when repression of intolerable guilty thoughts effectively splits the ego into several parts. I know of no attempts to test such theories empirically and suspect that their proponents would be generally unsympathetic to the idea of doing so.
Carl Jung Carl Jung, a Swiss psychiatrist, corresponded and worked with Freud between 1905 and 1913. He developed an alternative approach to psychoanalysis, one which emphasised the role of mysticism, fantasy, symbolism and spirituality in human behaviour and which played down the role of sexual instincts as a motivating force. Jung believed that there were two, quite different, aspects of the unconscious mind. The ‘personal unconscious’ held memories that had been repressed or forgotten – but without the life and death instincts that so dominate the Freudian id. The personal unconscious played a minor role in Jung’s theories. Pride of place was given to the collective unconscious – a storehouse of collective memories that is (somehow) passed down from generation to generation and which contains the wisdom and experiences of humankind. Although Jung maintained that we are not aware of this region of our mind, he maintained that it predisposes us all to be sensitive to certain images and themes in life. For example, whereas Freud’s view of symbols was limited to a rather crude set of analogies between sexual organs and various objects (guns and penises, for example), Jung was fascinated by shapes called mandalas (see Figure 4.2). These are abstract shapes, representing wholeness or completeness: they are circular, fairly symmetrical and have signifcance in both Buddhist and Hindu religions where they signify wholeness or the universe: ‘It is the path to
the centre, to individuation . . . I knew that in finding the mandala as an expression of the self I had attained what was for me the ultimate’ (Jung, 1963).

Figure 4.2 An example of a mandala

As well as the mandala, Jung proposed that various types of character are represented in this collective unconscious. They are known as archetypes. Thus he suggests that we are predisposed to recognise characters such as the hero, the wise old man, the caring mother, the shadow (the darker, evil side of the self), the anima (the feminine side of a man) or the animus (the masculine side of a woman), and the persona (the mask we present to the world, hiding our true self behind it). This might explain why similar characters play a part in myths and legends from around the world – although, of course, a simpler explanation is that mothers feature large in myths and legends because we are drawing on the personal experience of having had a mother and not because of any inherited predisposition at all. It is very difficult to see how one could possibly tell whether this part of Jung’s theory is correct. Neither is there any obvious physiological mechanism that would allow memories to be passed on directly from generation to generation, thereby perpetuating the archetypes.

Jung paid much attention to apparent coincidences and believed that visions and dreams could be genuinely informative, rather than just representing some horrible sexual urge from the id. For example, in 1913 and 1914 Jung experienced the following vision on two occasions:

I saw a monstrous flood covering all the northern and low-lying lands between the North Sea and the Alps. When it came up to Switzerland I saw that the mountains
grew higher and higher to protect our country. I realized that a frightful catastrophe was in progress. I saw the mighty yellow waves, the floating rubble of civilization, and the drowned bodies of uncounted thousands. Then the whole sea turned to blood. (Jung, 1963)

The First World War began shortly afterwards. This is an example of synchronicity – the idea that two events, one of which is external and the other internal (the war and the vision), occur at the same time without there being any clear causal link between them. Rather than treating the vision as a psychotic episode, Jung suggests that visions and dreams carry important messages from the unconscious and should be interpreted using symbolism.

According to Jung, some people focus more on events external to themselves: other people, events in the world and so on. He coined the term ‘extraverts’ for these individuals and contrasted them with ‘introverts’, who are much closer to their own inner self and who are far more interested in inner, spiritual and psychological experiences than in the outer world. This idea resurfaces in modern trait theories of personality (Chapter 6), although it is important to note that Jung regarded introversion/extraversion as a type (someone is either extraverted or introverted) whereas trait theories regard it as a continuum (there are degrees of extraversion).

Neuroses, according to Jung, came about because of a flawed relationship with another person, adverse life events or some similar cause. For example, having a dominating mother may affect one’s ability to form relationships with other women. Neuroses were not the result of repressed sexual desires bubbling up from the id.
Instead, such a relationship may affect the individuation process – the period of psychological maturation during which people learn to embrace and understand their unconscious mind (e.g., through dream interpretation), become aware of archetypes such as the shadow and anima/animus, and ultimately understand themselves as whole people.

Jung likewise identified four different types of thought, which he called ‘psychological orientations’:

[W]e must have a function which ascertains that something is there (sensation); a second function which establishes what it is (thinking); a third function which states whether it suits us or not, whether we wish to accept it or not (feeling), and a fourth function which indicates where it came from and where it is going (intuition). (Jung, 1963)

The Myers-Briggs personality type indicator is a questionnaire which claims to categorise people on this basis. According to this theory people are classed as either sensing or intuitive, thinking or feeling, judging or perceiving, and extraverted or introverted, and there are thus 16 (2 x 2 x 2 x 2) types of people. Although this questionnaire is used by some personnel managers, its theoretical basis is extremely weak (what is the evidence that these orientations exist?) and there are a number of rather devastating critiques (e.g., Stein and Swan, 2019; Hunsley et al., 2015; Lorr, 1991). It will not be mentioned again.

If Freud’s ideas are difficult to test, Jung’s are almost impossible to scrutinise scientifically. First, many of the concepts are vague in the extreme. For example, it is not entirely clear what represents a mandala and what does not. Second, we have already observed that the notion of archetypes is almost impossible to falsify. Even if it could be shown that archetypes such as the ‘wise old man’ or ‘all-embracing mother’ exist and influence our psychological
development, it is not clear whether these figures stem from the depth of our collective unconscious or from our memories of our own parents that have accumulated through childhood. Third, there is no evidence that the interpretations of dreams, visions and other inner experiences, or the accounts offered to explain the origins of neurosis, are either accurate or consistent. Would two Jungian analysts come to the same interpretation? There appears to have been no attempt to find out in the research literature. Even if the interpretations were consistent, how could one tell whether or not they were actually correct? Most of Jung’s theories are therefore problematic from a scientific point of view.
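The question of whether two analysts would reach the same interpretation is, in principle, an empirical one. As a minimal sketch of how it might be examined, the Python fragment below computes Cohen’s kappa, a widely used index of agreement between two raters that corrects for the agreement expected by chance. Everything specific here is an assumption made for illustration: the ratings are invented, the three category labels are arbitrary, and the function name `cohens_kappa` is hypothetical. No such study is reported in this chapter.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must rate the same, non-empty set of cases")
    n = len(ratings_a)
    # Proportion of cases on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1 - expected)


# Invented data: the archetype each (hypothetical) analyst saw in ten dreams.
analyst_1 = ["shadow", "shadow", "anima", "mother", "mother",
             "anima", "shadow", "mother", "anima", "shadow"]
analyst_2 = ["shadow", "anima", "anima", "mother", "shadow",
             "anima", "shadow", "mother", "mother", "shadow"]

kappa = cohens_kappa(analyst_1, analyst_2)
print(round(kappa, 2))  # 0.55
```

With these made-up ratings the analysts agree on 7 of the 10 dreams, chance agreement is 0.34, and kappa is about 0.55. A value of 1 would indicate perfect agreement and a value near 0 agreement no better than chance. Note that even a high kappa would show only that the interpretations were consistent, not that they were correct.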
Alfred Adler

Adler was another disciple of Freud’s who, like Jung, rejected Freud’s theories because of their emphasis on sexual motivation. A Marxist doctor who worked with working-class patients, Adler (see Adler and Jelliffe, 1917) noticed that someone with a physical disability might strive to excel in an area in which one might have thought success would be impossible. For example, people with artificial legs have climbed Everest (Wainright, 2006); Winston Churchill stuttered but sought out a career that involved extensive public speaking and oratory (Begbie, 1921). Adler himself had been crippled in childhood, which may be part of the reason for his interest in the area. Whatever the reason, Adler believed that the main motivating factor was not sexual, but a striving for superiority. This model, in which personality has its roots in social factors rather than biological drives, ties in with his Marxist beliefs.

The principle can apply to psychological factors as well as physical ones. We all start as weak, powerless children and seek to develop our skills. If this does not happen (when a child does not perform well at school or at sports, for example), an inferiority complex can develop if, because of a poor self-image, the person introspects too much and obsesses about this issue. Such individuals will feel that they are a failure. By way of contrast, some others who obsess about it may develop a superiority complex, whereby they cover up for their perceived psychological weaknesses and ‘lord it’ over others. Both are thought to be pathological.

Adler’s model of personality, individual psychology (Adler, 1930), takes a holistic view of the individual, rather than breaking the person down into components such as the id and the ego. Indeed,

• society (friendships, humanitarian interests)
• work (cooperative activity benefiting others)
• love (caring more for another than oneself)
are thought by Adler to be the most important forces in life; note that there is no mention of unconscious conflicts here, and Adler stressed the importance of parenting in teaching the child how to respect others and develop in a pro-social way. Each person is thought to have some sort of idealised goal which is established early in life and which we each try to achieve – either through cooperating with others, or by manipulating and dominating them. The latter approach is thought to lead to neurosis.

Adler’s approach to assessment included clinical interviews, in which he was alert to signs of an unhealthy need to dominate others (e.g., siblings) during childhood. This led him to theorise that birth order was all-important in shaping personality, the first-born feeling resentful of later arrivals and tending to bully them. Unfortunately we now know conclusively that birth
order does not influence personality (Rohrer et al., 2015) and so Adler’s assertion that such childhood experiences mould adult personality is open to doubt.

Adlerian therapy is a cooperative exercise, very different from the Freudian model. Patients are urged to make explicit their goals and to develop new ones if necessary which will allow them to enhance society, rather than focusing on themselves. It is clear that many of the concepts used by Adler are vague and poorly defined. It is therefore difficult to develop sound operational definitions of many of the terms he used and therefore his theories, too, are difficult to test empirically.
Critique

What sources of data are useful in evaluating these theories? We discount patients’ and therapists’ accounts of the effectiveness of treatment, as neither of these will be objective; having invested thousands of pounds (or a lifetime’s work) in psychoanalysis, it would be difficult to admit to oneself that it is completely useless (‘cognitive dissonance’). Scientific studies of the effectiveness of psychoanalysis offer another possibility, although proper control groups are needed, since if patients do get better this may be because of ‘spontaneous remission’, cognitive dissonance or perhaps just having someone taking an interest in them.

However, evaluating psychoanalysis is not sufficient to test the veracity of Freud’s theories. Farrell (1981) has observed that Freudian theory is not a single theory, but a collection of loosely linked subtheories. Thus even if psychoanalysis is shown to be of little value, other aspects (e.g., psychosexual personality types, structure of mind, defence mechanisms) may possibly be correct. Some aspects of Freud’s theories are clearly of central importance, since they are used as explanations in virtually all the subtheories. If it could be shown that behaviour is completely unaffected by unconscious factors (i.e., that the id is an unnecessary concept), that there is no evidence that defence mechanisms filter memories or perceptions on the basis of their emotional content or that the Oedipus complex does not exist, then many aspects of Freudian theory would collapse. Hence it makes sense to concentrate on such issues.

Several books summarise the hundreds of studies that have been performed to scientifically test some aspects of Freud’s theories, for example, Fisher and Greenberg (1996). Kline (1981) is one of the better ones because it is evaluative.
Clearly, if an experiment has been poorly designed (perhaps lacking control groups, valid tests or a large sample of participants) and/or is based on a misunderstanding of Freudian theory (for example by asking children which parent they prefer in an attempt to discover unconscious Oedipal wishes), then its failure to show anything interesting says nothing about the merits of Freudian theory. Even though Kline’s book reports several experiments that do appear to give some support to some aspects of the theory, very few of them appear to have been successfully replicated by independent research groups. For example, Hammer’s (1953) work on symbolism showed that when men who were about to be surgically sterilised were asked to draw a house, a tree and a person, their trees were more often shown as being cut down and their houses lacked chimney pots when compared with control groups – a finding which was explained in terms of sexual symbolism. But no one appears to have replicated this work. Likewise a finding that people sweated more (showing anxiety) when exposed to symbols of death rather than to other shapes (Meissner, 1958) does not appear to have been replicated. One cannot have confidence that these are solid, reliable effects. Most of the literature described here is elderly, because these days very few psychologists see any point in attempting to test Freud’s theories.
Lessons to be learned

We can learn a lot about what a ‘good’ theory of personality should look like by examining the theories discussed above. Perhaps a good theory should provide the following:

1. Public data. No one other than the patient and analyst knew what went on in the psychoanalysis sessions, as recording or filming was not allowed. We therefore have no firm evidence that the patients behaved or responded as the analysts claimed. This does not imply that the analysts were mistaken or cheating somehow – but the problem is that we cannot show conclusively that events took place as the analysts claimed. If there are problems replicating this work (for example, identifying the roots of hysterical symptoms) it is not possible to go back and check how a particular set of inferences were drawn from the interaction between the patient and the analyst. At least when questionnaires are used to measure personality there is a record of precisely how each person responded. If there are later queries about the inferences drawn from these responses (how the questionnaires are scored, and what these scores mean) it is possible to go back and re-analyse the original data. This is simply not possible if no record is kept.

2. Replicable results. If the theory is correct, then any properly trained analyst should be able to draw the same conclusion as Freud, if they are given patients with similar symptoms. The problem is that few, if any, studies explored this in any proper scientific way, even when psychoanalysts were being trained. We do not know how consistent analysts are. This is analogous to scoring tests – drawing inferences about theoretical constructs from observed behaviour (responses to items). It is easy for different people to draw the same inferences about people from questionnaires, yet difficult to do so from their responses to projective tests.

3. A sound basis of evidence. Drawing inferences about how people in general behave from tiny samples of mainly female, upper-class people with mental health issues may be unwise. The rest of us may behave quite differently. Finding that obsessional neurosis is linked to the defence mechanism of isolation (for example) in just one or two patients does not imply that this is invariably the case – even if it is true in these few instances. Large, representative samples of people are needed if one wishes to develop a theory which applies to everyone.

4. Clear concepts which have sound operational definitions. Whilst some concepts (e.g., the anal character, striving for superiority) may be operationally defined using questionnaires and other techniques as described by Kline (1981), it is unclear how one could ever go about measuring others. How might you measure the death instinct, or the strength of a person’s Oedipal complex, for example? The only available method is to rely on the claims of analysts. By contrast, it is easy to create operational definitions for terms using questionnaires or rating scales; to assess sociability one just needs to write items which measure different aspects of the trait.

5. Testable hypotheses. Cynicism may (a) be an acceptable way of expressing aggressive feelings about an object or (b) stem from a strong sexual desire that is converted into its opposite through the defence mechanism of reaction formation. A cynical view could therefore indicate unconscious feelings of either love or hatred. A theory which is capable of explaining either outcome may be of little value, for it will be impossible to refute it, to show that it is incorrect. It is generally agreed that refutability is a key aspect of any scientific theory.

6. Predictions, not just post-hoc explanations. Freud’s theories were modified over the years in the light of new data; whenever the theory did not work it was made more complex so that it fitted the new observations. It is easy constantly to modify a theory so that it fits past behaviour like this; it is much harder to devise a theory which will predict how a new individual will behave, or how a person will behave in future. However, as we will see in Chapters 17 and 18, modern theories of personality and abilities are rather good at this.

7. Simple explanations. Oedipal conflicts, psychosexual development, superego strength, repressed complexes and many other variables (of which the individual is frequently unaware) are thought to interact to influence behaviour. Complex theories such as these are difficult to test. Occam’s Razor is a principle which suggests that theories should be as simple as possible; psychoanalysis was a complex theory (involving unconscious motivation, etc.) from the outset. Contrast that with the simple theory that individual differences in intelligence come about because some people’s nerves transmit information faster than other people’s (Reed and Jensen, 1991).

8. Practical effectiveness. If psychoanalysis is designed to cure neuroses, then any failure to do so must cast some doubt on the underlying theory. Freudian psychoanalysis is generally not an effective form of therapy (Eysenck, 1957) and some of the patients whom Freud claimed to have cured remained highly disturbed (Sulloway, 1979).
Depth theories in context

Chapter 2 introduced some very simple theories which generally relied on self-reports, and assumed that people are able and willing to describe their behaviour accurately. The theories discussed in this chapter go to the other extreme, and assume that nothing that a person says can be taken at face value because we all practise self-deception in order to protect our egos from harm. That said, the great complexity of these theories, and their somewhat shaky evidential basis, are major problems for the approaches discussed in this chapter. Furthermore, their use of unconscious mental processes makes the theories extraordinarily difficult to test. Perhaps it is better to base theories on sound observation of behaviour, coupled with a healthy scepticism about whether self-reports are necessarily accurate – a compromise between the approaches discussed in Chapter 2 and the present chapter. This approach might allow theories to be tested and disproved in some cases, which is seen as a necessary step in developing a scientific basis for individual difference research; such approaches will be considered from Chapter 6 onwards.

However, the nagging doubts raised by Michell (1997) remain. How can personality psychology hope to be ‘scientific’ if its operational definitions of theoretical terms are not proper numerical measurements? Has personality theory pretended to be more scientific than it really is in order to distance itself from the qualitative models of the psychoanalysts? Is current research scientism, rather than science? It is good to worry about these things as you read on.
Suggested additional reading

Most of the references in this chapter are embarrassingly old, for little is now being written about classical psychoanalytic theory. There really is no substitute for consulting the excellent and highly readable accounts written by Freud for non-specialist audiences. The shortest summary is the Five Lectures on Psycho-Analysis (Freud, 1910/1957) with the Introductory Lectures on Psychoanalysis (Freud, 1917/1964) offering slightly more detailed treatment, and
the New Introductory Lectures on Psychoanalysis (Freud, 1932) providing an updated and extended summary of Freud’s main theories. There are also some useful summaries of Freud’s theories, such as those by Brown (1964) or Stevens (1983), but it would be a shame to consult these without first reading some original Freud. Kline’s (1981) Fact and Fantasy critically discusses the methodological problems associated with the various experiments and tries to ignore the technically flawed literature when drawing some broad conclusions about how well Freud’s theories have stood up to empirical testing. Other useful texts are Erdelyi’s (1985) Psychoanalysis: Freud’s Cognitive Psychology (but only for those with a background in cognitive psychology) and Fisher and Greenberg’s (1996) Freud Scientifically Appraised, which covers much of the more recent work, although sometimes less critically than one might expect.

Jung’s theory does not seem to generate so many research papers: those that are published in the Journal of Analytical Psychology are usually deeply theoretical and/or clinically based and written by practitioners who are largely committed to the broad principles of Jung’s theories. They tend not to be based on empirical evidence and so do not really allow for the scientific evaluation of Jung’s theories; they do not (perhaps cannot) seek to falsify the basic tenets of Jungian theory. For example, how could one research synchronicity? That said, the Society for Analytical Psychology offers a number of short papers for download on their website (https://www.thesap.org.uk/) which may be of interest.

Adlerians publish in the Journal of Individual Psychology and once again, the material tends to be clinically based or suggests how the theory could be applied in other settings (e.g., education). But to my eye at any rate, it is all rather uncritical.
Answers to self-assessment questions

SAQ 4.1

a. Free association is one of the main features of psychoanalysis. The patient agrees simply to say whatever comes into his or her mind, no matter how trivial, fleeting or shocking it may be. The analyst speaks a word and observes the patient’s response and in particular notes any difficulty in producing an association. Such difficulties may suggest that the word is linked to some traumatic memory and so an unconscious mental process (called a ‘repression’ in Freud’s early work) strives to keep all associations with this event out of consciousness.

b. A resistance is an unconscious force that prevents the patient from producing free associations with words which touch on unresolved conflicts. The resistance prevents these associations from becoming conscious – it is the force that opposes free association.

c. Repression is one example of a defence mechanism. It is a process by which a memory, wish or impulse (generally sexual in nature) is kept out of conscious awareness.
SAQ 4.2

a. Any psychological mechanism whose aim is to prevent the contents of the id from reaching the ego.

b. Isolation.

c. The feeling of love would be turned to hate (by reaction formation), so that the unconscious feeling ‘I love him’ becomes ‘I hate him’. Projection turns this into ‘he hates me’. Freud suggested that these two defence mechanisms occurred in homosexuals, who were supposed to feel paranoid for this reason.
SAQ 4.3

a. It is not obvious what the analyst would make of this, though it would not be taken as evidence for some sort of Oedipal fixation because the ego is aware of these feelings. If there were an unresolved Oedipal conflict, the patient would use defence mechanisms to keep it unconscious, and so the patient would be unaware of the fact that he longed to sleep with his mother.

b. This would indicate that some sort of defence mechanism was at work, inhibiting the free association process. The analyst would probably want to probe childhood memories, examine other ‘blockages’ in association and so on to try to find out what the significance of the stick was to the individual.

c. The defence mechanisms which keep the id contents from entering consciousness have relaxed too much, causing anxiety. ‘[The defence mechanisms] can no longer perform its function of preventing an interruption of sleep, but assumes instead the other function of promptly bringing sleep to an end. In doing so it is merely behaving like a conscientious night-watchman, who first carries out his duty by suppressing disturbances so that the townsmen may not be woken up, but afterwards continues to do his duty by himself waking the townsmen up, if the causes of the disturbance seem to him serious and of a kind that he cannot cope with’ (Freud, 1900).

d. It sounds as if the individual may be fixated at the oral passive stage of development.

e. Again, the defence mechanisms are trying to keep the ‘truth’ revealed by the psychoanalyst out of consciousness.
References

Adler, A. 1930. The Science of Living. London: George Allen & Unwin Ltd.
Adler, A. & Jelliffe, S. E. 1917. Study of Organ Inferiority and Its Psychical Compensation: A Contribution to Clinical Medicine. New York: The Nervous and Mental Disease Publishing Company.
Begbie, H. 1921. The Mirrors of Downing Street: Some Political Reflections. London: G.P. Putnam’s Sons, the Knickerbocker Press.
Brown, J. A. C. 1964. Freud and the Post Freudians. Harmondsworth: Penguin.
Brown, W. P. 1961. Conceptions of Perceptual Defence. British Journal of Psychology Monograph Supplements, 35.
Bruner, J. S. & Postman, L. 1947. Emotional Selectivity in Perception and Reaction. Journal of Personality, 16, 69–77.
Claridge, G. S. 1985. Origins of Mental Illness. Oxford: Blackwell.
Dehaene, S., Charles, L., King, J. R. & Marti, S. 2014. Toward a Computational Theory of Conscious Processing. Current Opinion in Neurobiology, 25, 76–84.
Dixon, N. F. 1981. Preconscious Processing. New York: Wiley.
Erdelyi, M. H. 1985. Psychoanalysis: Freud’s Cognitive Psychology. New York: Freeman.
Eysenck, H. J. 1957. The Dynamics of Anxiety and Hysteria; An Experimental Application of Modern Learning Theory to Psychiatry. New York: Frederick A. Praeger.
Fairbairn, W. R. D. 1952. Psychoanalytic Studies of the Personality. London: Routledge and Kegan Paul: Tavistock Publications.
Farrell, B. A. 1981. The Standing of Psychoanalysis. Oxford: Oxford University Press.
Fenichel, O. 1946. The Psychoanalytic Theory of Neurosis. London: Routledge & Kegan Paul.
Fisher, S. & Greenberg, R. P. 1996. Freud Scientifically Appraised. New York: John Wiley.
Freud, S. 1900. The Interpretation of Dreams. London: Hogarth Press.
Freud, S. 1910/1957. Five Lectures on Psychoanalysis. Harmondsworth: Penguin.
Freud, S. 1917/1964. Introductory Lectures on Psychoanalysis. Harmondsworth: Penguin.
Freud, S. 1923/1955. Historical and Expository Works on Psychoanalysis: Two Encyclopedia Articles: Libido Theory. Harmondsworth: Penguin.
Freud, S. 1925/1959. An Autobiographical Study. Harmondsworth: Penguin.
Freud, S. 1932. New Introductory Lectures on Psychoanalysis. Harmondsworth: Penguin.
Freud, S. & Breuer, J. 1893/1964. Studies in Hysteria. Harmondsworth: Penguin.
Gibbons, H. 2009. Evaluative Priming from Subliminal Emotional Words: Insights from Event-Related Potentials and Individual Differences Related to Anxiety. Consciousness and Cognition, 18, 383–400.
Hammer, E. F. 1953. An Investigation of Sexual Symbolism: A Study of HTPs of Eugenically Sterilised Subjects. Journal of Projective Techniques, 17, 401–415.
Hendler, T., Rotshtein, P., Yeshurun, Y., Weizmann, T., Kahn, I., Ben-Bashat, D., Malach, R. & Bleich, A. 2003. Sensing the Invisible: Differential Sensitivity of Visual Cortex and Amygdala to Traumatic Context. Neuroimage, 19, 587–600.
Hunsley, J., Lee, C. M., Wood, J. M. & Taylor, W. 2015. Controversial and Questionable Assessment Techniques. In S. O. Lilienfeld, S. J. Lynn & J. M. Lohr (eds.) Science and Pseudoscience in Clinical Psychology (2nd Edn). New York; London: Guilford Press.
Jung, C. G. 1963. Memories, Dreams, Reflections. London: Collins, Routledge & Kegan Paul.
Kline, P. 1981. Fact and Fantasy in Freudian Theory. London: Methuen.
Kline, P. 1984. Personality and Freudian Theory. London: Methuen.
Lacan, J. & Wilden, A. 1968. The Language of the Self: The Function of Language in Psychoanalysis. Baltimore, MD: Johns Hopkins Press.
Lorr, M. 1991. An Empirical Evaluation of the MBTI Typology. Personality and Individual Differences, 12, 1141–1145.
Meissner, W. W. 1958. Affective Response to Death Symbols. Journal of Abnormal and Social Psychology, 56, 295–299.
Michell, J. 1997. Quantitative Science and the Definition of Measurement in Psychology. British Journal of Psychology, 88, 355–383.
Reed, T. E. & Jensen, A. R. 1991. Arm Nerve Conduction Velocity (NCV), Brain NCV, Reaction Time and Intelligence. Intelligence, 15, 33–47.
Rohrer, J. M., Egloff, B. & Schmukle, S. C. 2015. Examining the Effects of Birth Order on Personality. Proceedings of the National Academy of Sciences of the United States of America, 112, 14224–14229.
Stein, R. & Swan, A. B. 2019. Deeply Confusing: Conflating Difficulty with Deep Revelation on Personality Assessment. Social Psychological and Personality Science, 10, 514–521.
Stevens, R. 1983. Freud and Psychoanalysis. Milton Keynes: Open University Press.
Sulloway, F. 1979. Freud, Biologist of the Mind. London: Burnett.
Wainright, M. 2006. A Double Amputee, Several Grandparents and a Playboy Model . . . It’s Been a Busy Week on Everest. The Guardian, 20 May, p. 3.
Watt, C. A. & Morris, R. L. 1995. The Relationships among Performance on a Prototype Indicator of Perceptual Defense Vigilance, Personality, and Extrasensory Perception. Personality and Individual Differences, 19, 635–648.
Chapter 5 Factor analysis
Introduction

This chapter may come as something of a surprise, and it is certainly very different from the previous one. The reason why it is here is that all modern theories of individual differences (personality, intelligence, mood and motivation) are based on a statistical technique called factor analysis, and so to properly understand how these theories developed (together with their strengths and limitations) it is necessary to have a basic grasp of what factor analysis is, and how it operates. The theories described in Chapters 6, 7, 9, 10, 12 and 16 all draw on the methods outlined in this chapter.

The chapter introduces you to the logic of factor analysis in a way which involves little or no statistical knowledge; it is just necessary to know what a correlation is. (If you are unsure, read Appendix A.) It also introduces all the terms which you will encounter when reading books or research papers which use factor analysis to develop models of individual differences. It does not go into the detail which you need to actually perform factor analysis, or check that others have performed one properly; this is covered in Chapter 19.

Although we introduced the term ‘trait’ in Chapter 3, traits have not featured greatly in the theories discussed thus far. However, there has been some mention of them. Chapter 4 introduced the idea of the ‘anal character’, for example – and pointed out that Freud claimed that obstinacy, orderliness and meanness are three characteristics which tend to occur together in people (i.e., are all correlated together) and so form a trait. Kline’s questionnaire which was designed to measure the anal character did indeed find that people who said that they were stubborn also said that they kept their room tidy and avoided unnecessary expenses. Even though the items look different, people tend to respond to them in similar ways. The three items define a trait.
Factor analysis is a statistical technique which can be applied to self-reports, observations or ratings of behaviour in order to identify traits and develop scales to measure them, and so it is the basis for many modern theories of personality and cognitive abilities.
Factor analysis
Learning outcomes
Having read this chapter you should be able to:
• Show an understanding of how correlations between variables can be represented geometrically, and how factors can be positioned to describe the data.
• Explain terms such as factor matrix, factor loading, orthogonal solution, oblique solution, communality, unique factor, eigenvalue, principal components analysis, hierarchical analyses.
• Outline the difference between principal components analysis and factor analysis.
• Discuss how factor analysis may be used to identify personality and ability traits, free of any theoretical expectations.
‘Factor analysis’ can refer to two rather different statistical techniques. Exploratory factor analysis is the older (and simpler) technique and forms the basis of this chapter and the first section of Chapter 19. Confirmatory factor analysis and its extensions (sometimes known as ‘structural equation modelling’, ‘latent variable analysis’ or ‘LISREL models’) are useful in many areas other than individual differences and are particularly popular in social psychology. A brief outline of this technique is given at the end of Chapter 19. Authors do not always make it clear whether exploratory or confirmatory factor analysis has been used. If you see the term ‘factor analysis’ in a journal, you should assume that it refers to an exploratory factor analysis.

Suppose that, in the interests of science, you were to measure the following six variables (V1–V6) from a random sample of (say) 200 fellow students in the bar of your university or college late one evening:
• V1 body weight (kg)
• V2 degree of slurring of speech (rated on a scale from 1 to 5)
• V3 length of leg (cm)
• V4 volubility of talking (rated on a scale from 1 to 10)
• V5 length of arm (cm)
• V6 degree of staggering when attempting to walk in a straight line (rated on a scale from 1 to 3).
It seems likely that V1, V3 and V5 will all vary together, since large people will tend to have long arms and legs and be heavier. These three items all measure some fundamental property of the individuals in your sample, namely their size. Similarly, it is likely that V2, V4 and V6 will all vary together – the amount of alcohol consumed in the bar is likely to be related to slurring of speech and talkativeness and to difficulty in walking in a straight line. Thus although we have collected six pieces of data, these variables measure only two ‘traits’, namely size and drunkenness. If you wanted to describe a person to a friend, you could just say that they were ‘small and sober’ rather than having to describe them on each of the six different characteristics listed; factor analysis shows that it is legitimate to summarise the data like this without losing much information.
Exploratory factor analysis therefore does two things:
• It shows how many distinct psychological factors are measured by a set of variables. In the last example, there are two factors (size and drunkenness).
• It shows which variables measure which constructs. In our example, it might well show that V1, V3 and V5 measure one factor and that V2, V4 and V6 measure another, quite different factor.

If the items relate to personality (e.g., ‘do you keep your desk tidy’ ...) the factors will be personality traits. Scores on entire tests (rather than test items) can also be factor analysed. The imprecision of language and the different theoretical backgrounds of researchers mean that the same psychological construct may be given several different names by different researchers. Do six different tests that are marketed as assessing anxiety, neuroticism, negative affect, ego strength, emotional instability and low self-actualisation measure six different aspects of personality or do they overlap? If these tests are given to a sample of people, factor analysis of their scores will show whether or not they all measure the same thing.
Discovering personality factors
An enormous number of words can be used to describe personality, mood, emotions and abilities; Allport and Odbert (1936) found over 4500 words in the dictionary that could be used to describe personality alone. We will see in Chapter 6 that it is possible to identify the main personality traits by taking a sample of these words and asking people to rate how well each of these words describes them. Crucially, this approach does not require any sort of theoretical input from the psychologist, other than to try to identify the personality traits which are revealed by the computer program. This is in stark contrast to the theories considered in Chapters 2 and 4, which required psychologists to infer which personality traits were present (‘self-concept’, ‘oral character’, etc.) from clinical interviews.

It certainly looks as if this is an objective, scientific means of gathering data. The responses that people give when rating themselves are public and verifiable. The analysis can be repeated in other samples, to check for replicability. It is possible to show that a trait which one researcher identified is not found in another group, and so the technique produces results which are falsifiable (and hence perhaps scientific). In short, this approach to measuring personality by searching for traits overcomes a great number of the issues identified at the end of Chapter 4.

The technique is not restricted to test items or test scores. It would be possible to factor analyse reaction times from various types of cognitive test in order to determine which (if any) of these tasks were related. Alternatively, suppose that one took a group of schoolchildren who had no specific training or practice in sport and assessed their performance in 30 sports by any mixture of coaches’ ratings, timings, mean length of throw, percentage of maiden overs obtained, goals scored or whatever performance measure is most appropriate for each sport.
The only proviso is that each child must compete in each sport. Factor analysis would reveal whether individuals who were good at one ball game tended to be good at all of them, whether short-distance and long-distance track events formed two distinct groupings (and which events belonged to which group) and so on. Thus, instead of having to talk in terms of performance in 30 distinct areas, it would be possible to summarise this information by talking in terms of half a dozen basic sporting abilities (or however many abilities the factor analysis revealed).
Exploratory factor analysis by inspection
Table 5.1 shows a six-item questionnaire. Suppose that six students were asked to answer each question, using a five-point rating scale as shown in the table. Table 5.2 shows how each student answered each question; these numbers show the extent to which each student agreed with each item (a ‘Likert scale’).
Table 5.1 Six-item personality questionnaire
                                                   Strongly                               Strongly
                                                   agree     Agree   Neutral   Disagree   disagree
1 I enjoy socialising                                 1        2        3         4          5
2 I often act on impulse                              1        2        3         4          5
3 I am a cheerful sort of person                      1        2        3         4          5
4 I often feel depressed                              1        2        3         4          5
5 It is difficult for me to get to sleep at night     1        2        3         4          5
6 Large crowds make me feel anxious                   1        2        3         4          5
EXERCISE
Table 5.2 shows the responses which six students gave when they completed the questionnaire shown in Table 5.1. Looking at the numbers in Table 5.2 alone, try to see whether there is any overlap between any of the six items – and if so, which.
Table 5.2 Responses of six students to the items in Table 5.1
             Q1   Q2   Q3   Q4   Q5   Q6
Stephen       5    5    4    1    1    2
Ann           1    2    1    1    1    2
Paul          3    4    3    4    5    4
Janette       4    4    3    1    2    1
Michael       3    3    4    1    2    2
Christine     3    3    3    5    4    5
Table 5.3 Correlations between the six items in Table 5.2
        Q1       Q2       Q3       Q4       Q5       Q6
Q1    1.000
Q2    0.933    1.000
Q3    0.824    0.696    1.000
Q4   –0.096   –0.052    0.000    1.000
Q5   –0.005    0.058    0.111    0.896    1.000
Q6   –0.167   –0.127    0.000    0.965    0.808    1.000
You may have observed that individuals’ responses to questions 1, 2 and 3 (Variables 1–3) tend to be similar. Stephen tends to agree with all three, Ann tends to disagree with them, while the others feel more or less neutral about them. These are rough approximations, of course, but you can see that no one who rates him- or herself as ‘1’ or ‘2’ on one of these three questions gives him- or herself a ‘4’ or ‘5’ on one of the others. This may suggest that enjoying socialising, acting on impulse and having a cheerful disposition tend to go together and so these three items form a factor – a personality trait. Much the same applies to items 4 to 6. Again, people such as Stephen or Ann who give themselves a low rating on one of these three questions also give themselves a low rating on the other two questions, while Christine gives herself high ratings on all three questions. Thus there seem to be two clusters of questions in this questionnaire. The first group consists of questions 1, 2 and 3 and the second group consists of questions 4, 5 and 6.

However, spotting these relationships by eye is difficult. Had the order of the columns in Table 5.2 been rearranged, these relationships would have been difficult or impossible to identify by just looking at the responses. Fortunately, however, a statistic called the correlation coefficient makes it possible to determine whether individuals who have low scores on one variable tend to have low (or high) scores on other variables. A brief summary of correlational methods is given in Appendix A, which should be consulted now, if necessary. Table 5.3 shows the correlations calculated from the data in Table 5.2. The detailed calculation of these correlations is not shown, as statistics books such as Howell (2012) explain this in detail. These correlations confirm our suspicions about the relationships between the students’ responses to questions 1 to 3 and questions 4 to 6.
Variables 1 to 3 correlate strongly together (0.933, 0.824 and 0.696, respectively) but hardly at all with questions 4 to 6 (–0.096, etc.). Similarly, variables 4 to 6 correlate highly together (0.896, 0.965 and 0.808, respectively), but hardly at all with questions 1 to 3. Thus it is possible to see from the table of correlations that variables 1 to 3 form one natural group and variables 4 to 6 form another – that is, the questionnaire actually measures two factors, one consisting of the first three questions and the other consisting of the final three questions. While it is easy to see this from the correlations in Table 5.3, it should be remembered that the correlations shown there are hardly typical:
• The data were constructed so that the correlations between the variables were either very large or very small. In real life, correlations between variables would rarely be larger than 0.5, with many in the range 0.2–0.3. This makes it difficult to identify patterns by eye.
• The questions were ordered so that the large correlations fell next to each other in Table 5.3. If the questions had been presented in a different order, it would not be so easy to identify clusters of large correlations.
• Only six questions were used, so there are only 15 correlations to consider. With 40 questions there would be 40 × 39 / 2 = 780 correlations to consider, making it much more difficult to identify groups of intercorrelated items.

There are several other problems associated with performing factor analysis ‘by eye’, not least of which is that different people might well reach different conclusions about the number and nature of the factors – the whole process is rather unscientific. Fortunately, however, well-known mathematical methods can be used to identify factors from groups of variables that tend to correlate together and even the very largest factor analyses can now be performed on a desktop computer. Several statistics packages can be used to perform factor analysis, including R, SPSS, SYSTAT and SAS or the excellent and free factor analysis program by Lorenzo-Seva and Ferrando (2006). To understand how computers can perform this task, it is helpful to visualise the problem in a slightly different way and adopt a geometrical approach.
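For readers who would like to check the correlations for themselves, the following is a minimal sketch (not the procedure used to prepare Table 5.3) that recomputes them from the responses in Table 5.2. It assumes Python with the NumPy library; the names `responses`, `correlations` and `n_correlations` are invented for this illustration.

```python
import numpy as np

# Responses from Table 5.2: one row per student, one column per question (Q1..Q6).
responses = np.array([
    [5, 5, 4, 1, 1, 2],  # Stephen
    [1, 2, 1, 1, 1, 2],  # Ann
    [3, 4, 3, 4, 5, 4],  # Paul
    [4, 4, 3, 1, 2, 1],  # Janette
    [3, 3, 4, 1, 2, 2],  # Michael
    [3, 3, 3, 5, 4, 5],  # Christine
])

# rowvar=False tells NumPy that each column (question) is a variable.
correlations = np.corrcoef(responses, rowvar=False)
print(np.round(correlations, 3))


def n_correlations(n_items: int) -> int:
    """Number of distinct correlations among n_items variables."""
    return n_items * (n_items - 1) // 2


print(n_correlations(6), n_correlations(40))
```

The second function simply counts the distinct pairs of items, which is why the number of correlations to inspect grows so quickly as items are added.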
A geometric approach to factor analysis
It is possible to represent correlation matrices geometrically. Variables are represented by straight lines of equal length, which all start at the same point. These lines are positioned such that the correlation between the variables is represented by the cosine of the angle between them. The cosine of an angle is a trigonometric function that can either be looked up in tables or computed directly by all but the simplest pocket calculators – you do not need to know what cosines mean, just where to find them. Table 5.4 shows a few cosines to give you a general idea of the concept. Remember that when the angle between two lines is small, the cosine is large and positive. When two lines are at right angles, the correlation (cosine) is zero. When the two lines are pointing in opposite directions, the correlation (cosine) is large and negative.

It is but a small step to represent entire correlation matrices geometrically. A line is drawn anywhere on the page representing one of the variables – it does not matter which one. The other variables are represented by other lines of equal length, all of which fan out from one end of the first line. The angles between the variables are, by convention, measured in a clockwise direction. Variables which have large positive correlations between them will fall close to each other, as Table 5.4 shows that large correlations (or cosines) result in small angles between the lines. Highly correlated variables literally point in the same direction, variables with large negative correlations between them point in opposite directions and uncorrelated variables point in completely different directions. Figure 5.1 shows a simple example based on three variables. The correlation between V1 and V2 is 0.0; the correlation between V1 and V3 is 0.5 and the correlation between V2 and V3 is 0.867.
The correlation between V1 and V2 is represented by a pair of lines of equal length joined at one end; as the correlation between these variables is 0.0 the angle between them corresponds to a cosine of 0.0; that is, 90° (Table 5.4). The correlation between V1 and V3 is 0.5 corresponding to an angle of 60° and that between V2 and V3 is 0.867 (30°), so V2 and V3 are positioned as shown.
Table 5.4 Table of cosines for graphical representation of correlations between variables
Angle (in degrees)    Cosine of angle
        0                  1.000
       15                  0.966
       30                  0.867
       45                  0.707
       60                  0.500
       75                  0.259
       90                  0.000
      120                 –0.500
      150                 –0.867
      180                 –1.000
      210                 –0.867
      240                 –0.500
      270                  0.000
      300                  0.500
      330                  0.867
Figure 5.1 Correlations between three variables and their geometric representation
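Instead of looking values up in a table of cosines, the conversion between correlations and angles can be computed directly. The sketch below assumes Python and uses only its standard `math` module; the function names are invented for this illustration. It converts the three correlations used for Figure 5.1 into angles and checks that the resulting lines can lie on a flat page.

```python
import math


def angle_from_correlation(r: float) -> float:
    """Angle in degrees (0 to 180) whose cosine equals the correlation r."""
    return math.degrees(math.acos(r))


def correlation_from_angle(degrees: float) -> float:
    """Correlation represented by two lines separated by the given angle."""
    return math.cos(math.radians(degrees))


# Correlations used for Figure 5.1.
r_v1_v2, r_v1_v3, r_v2_v3 = 0.0, 0.5, 0.867

a12 = angle_from_correlation(r_v1_v2)  # about 90 degrees
a13 = angle_from_correlation(r_v1_v3)  # about 60 degrees
a23 = angle_from_correlation(r_v2_v3)  # about 30 degrees
print(round(a12), round(a13), round(a23))

# In this example the lines fit on a flat page because the largest angle
# is (near enough) the sum of the other two.
fits_on_page = abs(a12 - (a13 + a23)) < 0.5
print(fits_on_page)
```

Had one of the three correlations been different, the final check would fail, which is another way of saying that one line would have to stick out of the page.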
Self-assessment question 5.1
Figure 5.2 shows the geometric representation of the correlations between five variables. Use Table 5.4 to try to answer the following questions:
a. Which two variables have the highest positive correlation?
b. Which variable has a correlation of zero with V3?
c. Which variable has a large negative correlation with V3?

EXERCISE
Without looking ahead, try to sketch out roughly how the correlations between the six test items shown in Table 5.2 would look if they were represented geometrically.
It may have occurred to you that it is not always possible to represent correlations in two dimensions (i.e., on a flat sheet of paper). For example, if any of the correlations in Figure 5.1 had been altered to a different value, one of the lines would have to project outward from the page to some degree. This is no problem for the underlying mathematics of factor analysis, but it does mean that we cannot perform factor analysis simply by drawing diagrams.

Figure 5.3 is a fairly good approximation of the data shown in Table 5.3. Ignoring lines F1 and F2 for now, you can see that the correlations between variables 1, 2 and 3 shown in this figure are all very large and positive (that is, the angles between these lines are small).
Figure 5.2 Geometrical representation of correlations between five variables
Figure 5.3 Approximate geometrical representations of the correlations shown in Table 5.3
Similarly, the correlations between variables 4 to 6 are also large and positive. As variables 1 to 3 have near zero correlations with variables 4 to 6, variables 1, 2 and 3 are at 90° to variables 4, 5 and 6. Computer programs for factor analysis essentially try to ‘explain’ the correlations between the variables in terms of a smaller number of factors. It is good practice to talk about ‘common factors’ rather than just ‘factors’ – later on we will encounter ‘unique factors’ and it is safest to avoid confusion. In this example, it is clear that there are two clusters of correlations, so the information in Table 5.3 can be approximated by two common factors, each of which passes through a group of large correlations. The common factors are indicated by long lines labelled F1 and F2 in Figure 5.3. It should be clear that, from measuring the angle between each common factor and each variable, it is also possible to calculate the correlations between each variable and each common factor. Variables 1, 2 and 3 will all have very large correlations with factor 1 (in fact variable 2 will have a correlation of nearly 1.0 with factor 1, as factor 1 lies virtually on top of it). Variables 1, 2 and 3 will have correlations of about zero with factor 2, as they are virtually at right angles to it. Similarly, factor 2 is highly correlated with variables 4 to 6 and is virtually uncorrelated with variables 1 to 3 (because of the 90° angle between these variables and this factor). Do not worry for now where these factors come from or how they are positioned relative to the variables, as this will be covered in Chapter 19. In the example just examined, the two clusters of variables (and hence the common factors) are at right angles to each other – what is known technically as an ‘orthogonal solution’, a term that you should note. However, this need not always be the case. Consider the correlations shown in Figure 5.4. 
Figure 5.4 Correlations between six variables showing two factors at right angles

Here it is clear that there are two distinct clusters of variables, but it is equally clear that there is no way in which two orthogonal (i.e., uncorrelated) common factors, represented here by lines F1 and F2, can be placed through the centre of each cluster. It would clearly make sense to allow the factors to become correlated and to place one common factor through the middle of each cluster of variables. Factor analyses where the factors are themselves correlated (i.e., not at right angles) are known as ‘oblique solutions’. The correlations between factors form what is called a ‘factor pattern correlation matrix’. Try to remember these terms – it will be helpful when you come to interpret output from factor analyses, or read books or papers which use factor analysis. When an orthogonal solution is performed, all the correlations between the various factors are zero. (A correlation of zero implies an angle of 90° between each pair of factors, which is another way of saying that the factors are independent.)

It is possible to show all the correlations between every item and every common factor in a table, called the ‘factor matrix’ or sometimes the ‘factor structure matrix’. These correlations between items and common factors are usually known as ‘factor loadings’. By convention, the common factors are shown as the columns of this table and the variables as the rows. The values in Table 5.5 were obtained by looking at the angles between each common factor and each variable in Figure 5.3 and translating these (roughly) into correlations using Table 5.4.
Table 5.5 Approximate factor matrix from Figure 5.3 showing the loadings of each variable on each factor
Variable    Factor 1    Factor 2
V1            0.90        0.10
V2            0.98        0.00
V3            0.90       –0.10
V4            0.10        0.85
V5            0.00        0.98
V6           –0.10        0.85
Self-assessment question 5.2
Without looking back, try to define the following terms:
a. oblique solution
b. factor loading
c. factor structure matrix
d. orthogonal solution
e. factor pattern correlation matrix.
The factor matrix The factor matrix is extremely important, as it shows three things. From it one can tell which variables are represented by each factor, how well the factors (taken together) represent a particular variable, and how important each factor is.
Which variables make up each common factor
This can be seen by noting which variables have loadings that are more extreme than some arbitrary cutoff. By convention, the cutoff value is 0.4. The easiest way to see the variables that ‘belong’ to a factor is thus to underline those that have loadings higher than 0.4 or less than –0.4. Thus in Table 5.5, one would deduce that factor 1 is a mixture of V1, V2 and V3 (but not V4, V5 or V6, as their loadings lie between –0.4 and 0.4). Likewise, factor 2 is a mixture of V4, V5 and V6. Thus the factor matrix can be used to give a tentative label to the common factor.

For example, suppose that 100 ability items were given to a sample of people, and each item was scored as to whether it was answered correctly (1) or incorrectly (0). These 100 items could then be correlated together and factor analysed and it might be found that the variables that had substantial loadings on the first common factor were concerned with spelling, vocabulary, knowledge of proverbs and verbal comprehension, while none of the other items (mathematical problems, puzzles requiring objects to be visualised, memory tests, etc.) showed large loadings on the factor. Because all of the high-loading items involve the use of language, it seems reasonable to call the first common factor ‘verbal ability’, ‘language ability’ or something similar. Be warned, however, that there is no guarantee that labels given like this are necessarily correct. It is necessary to validate the factor exactly as described in Chapter 8 in order to ensure that this label is accurate. However, if the items that define a common factor form a reliable scale that predicts teachers’ ratings of language ability, correlates nicely with other well-trusted tests of verbal ability and does not correlate at all strongly with other measures of personality or ability, it seems likely that the factor was correctly identified.
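The ‘underlining’ procedure is easy to automate. The following is a small sketch, assuming Python, that applies the 0.4 cutoff to the approximate loadings in Table 5.5; the dictionary layout and the function name `salient_variables` are invented for this illustration.

```python
# Approximate loadings from Table 5.5: variable -> (factor 1, factor 2).
loadings = {
    "V1": (0.90, 0.10),
    "V2": (0.98, 0.00),
    "V3": (0.90, -0.10),
    "V4": (0.10, 0.85),
    "V5": (0.00, 0.98),
    "V6": (-0.10, 0.85),
}

CUTOFF = 0.4  # the conventional cutoff mentioned in the text


def salient_variables(loadings, factor_index, cutoff=CUTOFF):
    """Variables whose loading on one factor is more extreme than the cutoff."""
    return [
        name
        for name, row in loadings.items()
        if abs(row[factor_index]) > cutoff
    ]


factor1_variables = salient_variables(loadings, 0)
factor2_variables = salient_variables(loadings, 1)
print(factor1_variables)
print(factor2_variables)
```

Taking the absolute value means that large negative loadings are picked up as well as large positive ones.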
You will remember that the square of the correlation coefficient (i.e., the correlation coefficient multiplied by itself) shows how much ‘variance’ is shared by two variables – or in simpler language, how much they overlap. Two variables with a correlation of 0.8 share 0.8² or 0.64 of their variance. (Consult Appendix A if this is unfamiliar ground.) As factor loadings are merely correlations between common factors and items, it follows that the square of each factor loading shows the amount of overlap between each variable and the common factor. This simple finding forms the basis of two other main uses of the factor matrix.
Amount of overlap between each variable and all of the common factors
If the common factors are at right angles (an ‘orthogonal’ solution), it is simple to calculate how much of the variance of each variable is measured by the factors, by merely summing the squared factor loadings across factors. (When the common factors are not at right angles, life becomes more complicated.) It can be seen from Table 5.5 that 0.90² + 0.10² (= 0.82) of the variance of Variable 1 is ‘explained’ by the two factors. This amount is called the communality of that variable.

A variable with a large communality has a large degree of overlap with one or more common factors. A low communality implies that all of the correlations between that variable and the common factors are small – that is, none of the common factors overlaps much with that variable. This might mean that the variable measures something that is conceptually quite different from the other variables in the analysis. For example, one personality item in among 100 ability test items would have a communality close to zero. It might also mean that a particular item is heavily influenced by measurement error or by extreme easiness or difficulty: for example, an item that is so easy that everyone obtained the correct answer, or one so ambiguously phrased that no one could understand the question. Whatever the reason, a low communality implies that an item does not overlap with the common factors, either because it measures a different concept, because of excessive measurement error, or because there are few individual differences in the way in which people respond to the item.
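The communality calculation can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, and it is valid only for orthogonal factors; the function name `communality` and the second, low-communality item are invented for the example.

```python
def communality(row_loadings):
    """Sum of squared loadings of one variable across orthogonal common factors."""
    return sum(loading ** 2 for loading in row_loadings)


# Variable 1 in Table 5.5 loads 0.90 on factor 1 and 0.10 on factor 2.
v1_communality = communality([0.90, 0.10])
print(round(v1_communality, 2))

# A hypothetical item that barely overlaps with either common factor.
odd_item_communality = communality([0.05, -0.10])
print(round(odd_item_communality, 4))
```

The same function can be applied to each of the remaining rows of Table 5.5.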
Relative importance of the common factors
It is possible to calculate how much of the variance each common factor accounts for. A common factor that accounts for 40% of the overlap between the variables in the original correlation matrix is clearly more important than another that explains only 20% of the variance. Once again it is necessary to assume that the common factors are orthogonal (at right angles to each other). The first step is to calculate what is known as an eigenvalue for each factor by squaring the factor loadings and summing down the columns. Using the data shown in Table 5.5, it can be seen that the eigenvalue of factor 1 is 0.90² + 0.98² + 0.90² + 0.10² + 0.00² + (–0.10)² or 2.60. If the eigenvalue is divided by the number of variables (six in this instance), it shows what proportion of variance is explained by this common factor. Here factor 1 explains 0.43 or 43% of the information in the original correlation matrix.
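The eigenvalue calculation for factor 1 can be sketched in the same way. The example assumes Python and orthogonal factors, as in the text; the function names are invented for this illustration and only the factor 1 column of Table 5.5 is used.

```python
# Loadings of V1..V6 on factor 1, taken from Table 5.5.
factor1_loadings = [0.90, 0.98, 0.90, 0.10, 0.00, -0.10]


def eigenvalue(column_loadings):
    """Sum of squared loadings down one column of an orthogonal factor matrix."""
    return sum(loading ** 2 for loading in column_loadings)


def proportion_of_variance(column_loadings):
    """Eigenvalue divided by the number of variables in the analysis."""
    return eigenvalue(column_loadings) / len(column_loadings)


factor1_eigenvalue = eigenvalue(factor1_loadings)
factor1_proportion = proportion_of_variance(factor1_loadings)
print(round(factor1_eigenvalue, 2), round(factor1_proportion, 2))
```

Whereas a communality sums squared loadings across a row, an eigenvalue sums them down a column.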
Self-assessment question 5.3
Try to define the terms eigenvalue and communality. Then look back at Table 5.5 and:
a. calculate the communalities of variables 2, 3, 4, 5 and 6
b. calculate the eigenvalue of factor 2
c. work out what proportion of the variance is accounted for by factor 2
d. work out, by addition, the proportion of variance accounted for by factors 1 and 2 combined.
Before we leave the factor matrix, it is worth clarifying one point that can cause confusion. Suppose that one of the factors in the analysis has a number of loadings that are both large and negative (e.g., –0.6, –0.8), some that are close to zero (e.g., –0.1, +0.2), but no large positive loadings. Suppose that the items with large negative loadings are items such as ‘are you a nervous sort of person?’ and ‘do you worry a lot?’, where strong agreement is coded as ‘5’ and strong disagreement with the statements as ‘1’. The large negative correlations imply that the factor measures the opposite of nerviness and tendency to worry, so it may be tentatively identified as ‘emotional stability’ or something similar. While it is perfectly acceptable to interpret factors in this way, it may sometimes be more convenient to reverse all the signs of all the variables’ loadings on that factor. Thus the loadings mentioned would change from –0.6, –0.8, –0.1 and +0.2 to +0.6, +0.8, +0.1 and –0.2. It is purely a matter of convenience whether one does this, as SAQ 5.4 will reveal. However, if you do alter the signs of all factor loadings you should also:
• change the sign of the correlation between the ‘reversed’ factor and all the other factors in the factor pattern correlation matrix
• change the sign of all the ‘factor scores’ (discussed later) calculated from the factor in question.
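The bookkeeping involved in reversing a factor can be illustrated with a short sketch, assuming Python with NumPy. The first column below uses the four loadings mentioned in the text; the second column and the factor scores are hypothetical values invented purely for the example.

```python
import numpy as np

# Loadings of four items on two orthogonal factors. Column 0 holds the values
# from the text; column 1 is hypothetical.
loadings = np.array([
    [-0.6, 0.1],
    [-0.8, 0.0],
    [-0.1, 0.7],
    [0.2, 0.6],
])

reflected = loadings.copy()
reflected[:, 0] *= -1  # reverse the sign of every loading on the first factor

# Squared loadings are unchanged, so communalities and eigenvalues are too.
same_communalities = np.allclose(
    (loadings ** 2).sum(axis=1), (reflected ** 2).sum(axis=1)
)
same_eigenvalues = np.allclose(
    (loadings ** 2).sum(axis=0), (reflected ** 2).sum(axis=0)
)
print(same_communalities, same_eigenvalues)

# Hypothetical factor scores of three people on the first factor: these must
# be reversed as well, so that a high score still means the same thing.
factor_scores = np.array([1.2, -0.4, 0.3])
reflected_scores = -factor_scores
print(reflected_scores)
```

Because only squared loadings enter the communalities and eigenvalues, reversing a factor changes how it is described but not how much variance it explains.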
Self-assessment question 5.4
a. Use Table 5.4 to represent graphically the sets of correlations between one factor (F1) and two variables (V1 and V2) shown in Table 5.6.
b. Next, change the sign of the correlation between the variables and F1 and replot the diagram.
c. From this, try to explain how reversing the sign of all of a factor’s loadings alters the position of a factor.
Factor analysis and component analysis
When you completed SAQ 5.3 (d), you will have noticed something rather odd. The two common factors combined only account for about 83.8% of the variance in the original correlation matrix. Similarly, all the communalities are less than 1.0. What happened to the ‘lost’ 16% of the variance?
Table 5.6 Correlations between two variables and one factor
        F1       V1       V2
F1    1.000
V1   –0.867    1.000
V2   –0.867    0.500    1.000
Factor analysis is essentially a technique for summarising information – for making broad generalisations from detailed sets of data. Here, we have taken the correlations between six variables, observed that they fall into two distinct clusters and so decided that it is more parsimonious to talk in terms of two factors rather than the original six variables. In other words, the number of constructs needed to describe the data has fallen from six (the number of variables) to two (the number of common factors). Like any approximation, this one is useful but not perfect. Some of the information in the original correlation matrix has been sacrificed in making this broad generalisation. Indeed, the only circumstance under which no information would have been lost would have been if V1, V2 and V3 had shown an intercorrelation of 1.0 (likewise for V4, V5 and V6) and if all the correlations between these two groups of variables had been precisely zero. Then (and only then!) we would have lost no information as a result of referring to two factors rather than six variables. This, then, is the first part of the explanation of the missing variance. It is a necessary consequence of reducing the number of constructs from six to two.

Suppose, however, that instead of extracting just two factors from the correlations between the six variables, one extracted six factors (all at right angles to each other and hence impossible to visualise). Since there are as many factors as there are variables, there should be no loss of information. You might expect that the six factors would explain all the information in the original correlation matrix. Whether or not this is the case depends on how the factor analysis is performed. There are two basic approaches to factor analysis. The simplest, called ‘principal components analysis’, assumes that six factors can indeed fully explain the information in the correlation matrix.
Thus when six factors are extracted, each variable will have a communality of precisely 1.0 and the factors will between them account for 100% of the variation between the variables.
Principal components analysis
More formally, the principal components model states that, for any variable being analysed:

total variance = common factor variance + measurement error

Suppose we construct a test in which ten items inquire about worries and negative moods: ‘my moods vary a lot’, ‘I feel agitated in confined spaces’, ‘I get embarrassed easily’, etc., with each item being answered on a five-point scale ranging from ‘describes me well’ to ‘describes me badly’. Suppose that we know that all these items form a single factor which we call ‘neuroticism’. The principal components model therefore states that the way a person answers an item depends only on their level of the trait that is measured by all the items (here there is only one trait: neuroticism) plus some measurement error. Measurement error in this context might reflect the fact that some people’s minds may wander, that they may make a mistake when using the rating scale or anything else implying that their response to the item is not completely accurate. The model implies that when as many factors are extracted as there are variables, these common factors can explain all of the information in the correlation matrix.
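The claim that a full set of components reproduces all of the information can be checked numerically. The sketch below assumes Python with NumPy and extracts components through an eigendecomposition of the correlation matrix of the Table 5.2 responses, which is one standard way of performing a principal components analysis; the variable names are invented for this illustration.

```python
import numpy as np

# Responses from Table 5.2 (rows: students, columns: Q1..Q6).
responses = np.array([
    [5, 5, 4, 1, 1, 2],
    [1, 2, 1, 1, 1, 2],
    [3, 4, 3, 4, 5, 4],
    [4, 4, 3, 1, 2, 1],
    [3, 3, 4, 1, 2, 2],
    [3, 3, 3, 5, 4, 5],
], dtype=float)
R = np.corrcoef(responses, rowvar=False)

# Eigendecomposition of the symmetric correlation matrix.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]                   # largest component first
eigenvalues = np.clip(eigenvalues[order], 0.0, None)    # remove tiny negative rounding errors
eigenvectors = eigenvectors[:, order]

# Component loadings: eigenvector elements scaled by the square root of the eigenvalue.
loadings = eigenvectors * np.sqrt(eigenvalues)

# With all six components retained, every communality is 1.0.
full_communalities = (loadings ** 2).sum(axis=1)
print(np.round(full_communalities, 3))

# With only the first two components retained, some variance is lost.
two_component_communalities = (loadings[:, :2] ** 2).sum(axis=1)
proportion_first_two = eigenvalues[:2].sum() / len(eigenvalues)
print(np.round(two_component_communalities, 3), round(proportion_first_two, 3))
```

With only six respondents, at least one eigenvalue is zero, so in this tiny data set fewer than six components already carry all of the information; real samples are much larger.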
Factor analysis
The assumption that anything that is not measured by common factors must just be measurement error is rather a strong one. Each test item may have some morsel of ‘unique variance’ that is unique to that item, but which cannot be shared with other items. The extent to which people fear enclosed spaces may not be 100% determined by their level of neuroticism: people may vary in the extent to which they show a specific fear of being shut in, perhaps as a result of childhood experiences. So to predict how a person would respond to the item, you would need to consider their level of neuroticism, their level of this specific fear of being enclosed and random measurement error. The extent to which the item measures the specific fear, unrelated to the personality factor(s) that run through all the items, is reflected in the size of its unique factor variance. The factor analysis model thus assumes that for any variable or item being factor analysed:

total variance = common factor variance + unique factor variance + measurement error

The amount of unique factor variance in an item is reflected in its communality. The more variance an item shares with the other items in the test, the higher are its correlations with both them and the common factors. Therefore the higher its communality will be. (Remind yourself how communality is calculated in order to check this.) An item with a substantial amount of unique factor variance will therefore never have a communality as high as 1.0, even if the factor analysis extracts as many factors as there are variables. This is because each item has some ‘unique factor variance’ associated with it: some people are just more (or less) afraid of enclosed spaces than you would expect from knowing their level of neuroticism.
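A small simulation may help to make the three sources of variance concrete. The sketch below assumes Python with NumPy and an entirely hypothetical item whose weights (0.7 for the common factor, 0.5 for the specific fear) are invented so that the three variance components sum to 1.0; nothing here comes from real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people = 100_000

neuroticism = rng.standard_normal(n_people)    # common factor
specific_fear = rng.standard_normal(n_people)  # unique to this one item
error = rng.standard_normal(n_people)          # random measurement error

# Hypothetical weights, chosen so that the variance components sum to 1.0.
w_common, w_unique = 0.7, 0.5
w_error = np.sqrt(1 - w_common ** 2 - w_unique ** 2)

item = w_common * neuroticism + w_unique * specific_fear + w_error * error

common_variance = w_common ** 2   # 0.49: the item's communality
unique_variance = w_unique ** 2   # 0.25: unique factor variance
error_variance = w_error ** 2     # 0.26: measurement error

total_variance = item.var()
loading = np.corrcoef(item, neuroticism)[0, 1]
print(round(total_variance, 2), round(loading, 2), round(loading ** 2, 2))
```

The item's loading on the common factor is close to 0.7, so its communality is close to 0.49; the remaining variance is split between the unique factor and measurement error, which is why the communality cannot reach 1.0.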
Self-assessment question 5.5
Suppose children are given a 20-item attainment test to assess their knowledge of the geography of Austria, following a course of instruction. Each question has two possible answers, with a point being awarded for each correct answer and a point deducted for each incorrect answer. What would you expect to influence a child’s performance on this test,
a. assuming the principal components model
b. assuming the factor model?
The test includes some items that involve naming Austrian towns, lakes and rivers. Which of the following behaviours would lead to random errors of measurement, and which might contribute to unique factor variance?
c. guessing when one does not know the answer
d. having previously gone on holiday to Austria
e. having a keen interest in fishing
f. getting confused by the answer sheet and marking an answer in the wrong box
g. being absent from some of the classes on which the test is based
h. Suppose that incorrect answers were not penalised. In that case, would guessing (rather than leaving some answers blank) affect unique factor variance, or error variance?
It follows that factor analysis is a more complicated process than component analysis. Whereas component analysis merely has to determine the number of factors to be extracted and how each variable should correlate with each factor, factor analysis also has to estimate (somehow) what the communality of each variable would be if as many factors as variables were extracted. Subtracting this figure from 1.0 shows how much unique variance is associated with each variable. The good news is that, in practice, it does not seem to matter too much whether one performs a component analysis or a factor analysis, as both techniques generally lead to similar results. In fact, the authorities on factor analysis can be divided into three groups. Some believe that factor analysis should never be used. Leland Wilkinson fought to keep factor analysis options out of his SYSTAT statistics package although commercial pressures eventually won. Contrariwise, some maintain that a form of factor analysis is the only justifiable technique for analysing data from personality or ability items or tests (e.g., Carroll, 1993) and, finally, some pragmatists argue that, as both techniques generally produce highly similar solutions, it does not really matter which of them one uses (Tabachnick and Fidell, 2018). One slightly worrying problem is that the loadings obtained from component analyses are always larger than those that result from factor analyses, as the former assume that each variable has a communality of 1.0, whereas the latter computes a value for the communality that is generally less than 1.0. Thus the results obtained from a component analysis always look more impressive (i.e., have larger common factor loadings) than the results of a factor analysis. 
This has clear implications for many rules of thumb, such as regarding factor loadings above 0.4 (or less than –0.4) as being ‘salient’ and disregarding those between –0.39 and +0.39, but these have not really been addressed in the literature. It is also vitally important that authors of papers should state clearly whether they have followed the factor analysis or component analysis model. Authors sometimes mention ‘factor analysis’ in the text even though they have performed principal components analysis.
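The difference between component and factor loadings can be illustrated numerically. The following Python sketch is not taken from the text: it assumes four hypothetical items whose loadings on a single common factor are 0.8, 0.7, 0.6 and 0.5, builds the correlation matrix that these loadings imply, and extracts the largest component (with 1.0 in the diagonal) and the largest factor (with the communalities in the diagonal). Real programs have to estimate the communalities, whereas here they are simply assumed to be known, and the function name first_loadings is invented for the illustration.

```python
import math

def first_loadings(matrix, iterations=200):
    """Loadings on the largest factor/component, found by power iteration."""
    size = len(matrix)
    vector = [1.0 / math.sqrt(size)] * size      # unit-length starting guess
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(size))
                   for i in range(size)]
        eigenvalue = math.sqrt(sum(x * x for x in product))
        vector = [x / eigenvalue for x in product]
    # A loading is the eigenvector element scaled by the root of the eigenvalue
    return [math.sqrt(eigenvalue) * x for x in vector]

true_loadings = [0.8, 0.7, 0.6, 0.5]             # hypothetical values
n = len(true_loadings)

# Correlations implied by one common factor: r_ij = loading_i * loading_j.
# Component analysis keeps 1.0 in the diagonal (communality assumed to be 1).
component_matrix = [[1.0 if i == j else true_loadings[i] * true_loadings[j]
                     for j in range(n)] for i in range(n)]
# The factor model replaces each 1.0 with the communality (loading squared).
factor_matrix = [[true_loadings[i] * true_loadings[j] for j in range(n)]
                 for i in range(n)]

component_loadings = first_loadings(component_matrix)
factor_loadings = first_loadings(factor_matrix)

for c, f in zip(component_loadings, factor_loadings):
    print(f"component {c:.3f}   factor {f:.3f}")
```

In this sketch the factor loadings reproduce the values used to generate the correlations, whereas each component loading comes out somewhat larger, which is why a single cut-off such as 0.4 does not mean quite the same thing under the two models.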
Hierarchical analyses
Suppose that one correlates 20 variables together and performs a factor analysis. Suppose that six factors are found – and that these factors are themselves correlated. It is possible to factor analyse the correlations between these factors to obtain what are known as ‘higher order’ factors. And indeed if these higher order factors are correlated, it might be possible to factor analyse the correlations between these . . . and so on, until (a) all of the factors are uncorrelated, or (b) only one factor is found, when the process stops. It is easiest to understand this by looking at Figure 5.5. The bottom of this diagram shows some variables (squares). A factor analysis shows that some of these have large loadings on
Figure 5.5 A hierarchical factor analysis of 16 variables, showing three first-order and one second-order factors
some factors; the leftmost five variables have large loadings on one factor, the next six variables have large loadings on a second factor, the next three variables have large loadings on a third factor, and the remaining two variables do not have large loadings on any factors. They have small communalities. In real life, the first five variables might represent arithmetic problems, the next six items might be vocabulary items, the final five items might measure memory. The reason why only three of the memory items form a factor needs to be investigated; it might be because they are too hard or too easy (so that no one or everyone answers them correctly) or because they do not measure memory at all. So the ‘first-order factors’ represent arithmetic skill, vocabulary and memory. If it is found that these first-order factors are correlated, factor analysing the correlations between these three factors might reveal one ‘second-order factor’ which might be called general intelligence or something similar. Hierarchical models are very common in personality and ability research, as will become apparent in the next chapter.
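The second-order step can be sketched for the simplest case. The Python example below is an illustration only: the three correlations between the arithmetic, vocabulary and memory factors are invented, and the function name second_order_loadings is hypothetical. It assumes that all three correlations are positive and that exactly one second-order factor accounts for them, in which case each correlation is the product of two loadings and the loadings can be solved for directly.

```python
import math

def second_order_loadings(r12, r13, r23):
    """Loadings of three first-order factors on one second-order factor.

    If a single higher-order factor explains the correlations, then
    r12 = a*b, r13 = a*c and r23 = b*c, so a*a = r12*r13/r23, and so on.
    """
    a = math.sqrt(r12 * r13 / r23)
    b = math.sqrt(r12 * r23 / r13)
    c = math.sqrt(r13 * r23 / r12)
    return a, b, c

# Invented correlations between the three first-order factors
arithmetic, vocabulary, memory = second_order_loadings(r12=0.56, r13=0.48, r23=0.42)
print(round(arithmetic, 2), round(vocabulary, 2), round(memory, 2))
```

With more than three first-order factors the loadings are no longer determined this simply, and the correlations between the factors would be factor analysed in the usual way.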
Uses of factor analysis
Factor analysis has three main uses in psychology.
Test construction
First, it may be used to construct tests. For example, 50 items might be written more or less at random; we do not have to assume that they all measure the same trait or state. The items would then be administered to a representative sample of several hundred individuals and
scored (in the case of ability tests) so that a correct answer is coded as 1 and an incorrect answer as 0. Responses that are made using rating scales (as with most personality and attitude questionnaires) are simply entered in their raw form – one point if box (a) is endorsed, two if (b) is chosen, etc. The responses to these 50 items are then correlated together and factor analysed. The items that have high loadings on each factor measure the same underlying psychological construct and so form a scale. Thus it is possible to determine how to score the questionnaire in future simply by looking at the factor matrix. If items 1, 2, 10 and 12 are the only items that have substantial loadings on one factor, then one scale of the test should be the sum of these four items and no others. It is likely that some items will fail to load substantially on any of the factors (i.e., show low communalities). This could happen for a number of reasons: in the case of ability tests, the items could be so easy (or hard) that there is little or no variation in people’s scores. Personality items may refer to uncommon actions or feelings where there is again little variation – e.g., ‘there are occasions in my life when I have felt afraid’, an item with which everyone is likely to agree. Or items may fail because they are severely influenced by measurement error or because they measure something different from any of the others that were administered. Test constructors do not usually worry about why, precisely, items fail to work as expected. Items that fail to load on a factor are simply discarded without further ado. Thus factor analysis can show at a stroke:
• how many distinct scales run through the test
• which items belong to which scales (so indicating how the test should be scored)
• which items in a test should be discarded.
Each of the scales then needs to be validated as described in Chapter 8 to ensure that we have properly understood what it measures. For example, scores on the factors may be correlated with scores on other questionnaires, used to predict educational success, etc.
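The way in which a factor matrix determines scoring can be sketched in a few lines of Python. Everything in this example is invented for illustration: the loadings, the person’s responses, the function names build_scales and score, and the 0.4 cut-off (a rule of thumb rather than a fixed standard). The sketch also assumes that all salient loadings are positive; an item with a large negative loading would need to be reverse-scored before being added to a scale.

```python
SALIENT = 0.4   # rule-of-thumb cut-off; an assumption, not a fixed standard

# Hypothetical rotated loadings: item number -> loadings on factors 1 and 2
loadings = {
    1: [0.62, 0.05],
    2: [0.55, -0.10],
    3: [0.08, 0.71],
    4: [0.12, 0.15],   # loads on neither factor
    5: [0.02, 0.48],
}

def build_scales(loadings, cutoff=SALIENT):
    """Group items by the single factor on which they load saliently."""
    scales, discarded = {}, []
    for item, row in loadings.items():
        salient = [k for k, value in enumerate(row, start=1)
                   if abs(value) >= cutoff]
        if len(salient) == 1:
            scales.setdefault(salient[0], []).append(item)
        else:
            discarded.append(item)   # no salient loading, or more than one
    return scales, discarded

def score(responses, items):
    """Scale score = sum of the responses to the items on that scale."""
    return sum(responses[item] for item in items)

scales, discarded = build_scales(loadings)
responses = {1: 4, 2: 5, 3: 2, 4: 3, 5: 1}   # one person's invented answers
totals = {factor: score(responses, items) for factor, items in scales.items()}
print(scales, discarded, totals)
```

Here items 1 and 2 form one scale, items 3 and 5 form a second, and item 4 is discarded because it loads on neither factor.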
Data reduction
The second main use of factor analysis is in data reduction, or ‘conceptual spring cleaning’. Huge numbers of tests have been developed to measure personality from different theoretical perspectives and it is not at all obvious whether these overlap. If we take six scales that measure personality traits, it is useful to know how much overlap there is between them. Do they really each assess quite different aspects of personality? At the other extreme, might they perhaps all measure the same trait under different names? Or is the truth somewhere in the middle? To find out it is simply necessary to administer the test items to a large sample, then factor analyse the correlations between the items. This will show precisely what the underlying structure truly is. For example, two factors may be found. The first factor may have large loadings from all the items in tests 1, 5 and 6. All the substantial loadings on the second factor may come from items in tests 2, 3 and 4. Hence it is clear that tests 1, 5 and 6 measure precisely the same thing, as do tests 2, 3 and 4. Any high-flown theoretical distinctions about subtle differences between the scales can be shown to have no basis in fact and any rational psychologists seeing the results of such an analysis should be forced to think in terms of two (rather than six) theoretical constructs – a considerable simplification. (It is also possible to factor analyse the scores on the tests rather than responses to individual items when performing this sort of analysis. However, this makes the assumption that the tests were properly constructed in the first place, which may not be the case.)
Refining questionnaires
The third main use of factor analysis is for checking the psychometric properties of questionnaires, particularly when they are to be used in new cultures or populations. For example, suppose that the manual of an Australian personality test suggests that all the odd-numbered items form one scale, while all the even-numbered items form another. When the test is given to a sample of people in the UK and the correlations between the items are calculated and factor analysed, two factors should be found, with all the odd-numbered items having substantial loadings on one factor and all the even-numbered items loading substantially on the other. If this structure is not found it shows that there are problems with the questionnaire, which should not be scored or used in its conventional form.
Summary
It is easy to see why factor analysis is so important in individual differences and psychometrics. The same statistical technique can be used to construct tests, resolve theoretical disputes about the number and nature of factors measured by tests and questionnaires, and check whether tests work as they should or whether it is legitimate to use a particular test within a different population or a different culture. This chapter has provided a basic introduction to the principles of factor analysis. It has left many questions unanswered, including the following:
• How does one decide how many factors should be extracted?
• How can computer programs actually perform factor analysis?
• How can we position factors through the centre of clusters of variables using purely mathematical methods?
• What types of data can usefully be factor analysed?
• How should the results obtained from factor analytical studies be interpreted and reported?
These and other issues will be explored in Chapter 19.
Suggested additional reading
Eysenck’s ancient paper on the logical basis of factor analysis (Eysenck, 1953) is still well worth reading, while Child (2006) offers another ‘student-friendly’ basic introduction to the topic.
Answers to self-assessment questions
SAQ 5.1
a. The smallest angle between any pair of variables in Figure 5.2 is that between V1 and V2. Hence these variables are the most highly correlated.
b. The angle between V3 and V2 is approximately 270° (moving clockwise). Table 5.4 shows that this corresponds to a correlation of 0.
c. The angle between V5 and V3 is approximately 210°, which corresponds to a correlation of –0.87.
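The angles and correlations quoted in these answers are consistent with the correlation being the cosine of the angle between two lines. Assuming that this is the relationship tabulated in Table 5.4 (the table itself is not reproduced here), a short Python sketch can generate the values; the function name correlation_from_angle is invented for the illustration.

```python
import math

def correlation_from_angle(degrees):
    """Correlation represented by the angle between two lines (its cosine)."""
    value = round(math.cos(math.radians(degrees)), 3)
    return value + 0.0   # adding 0.0 turns a rounded -0.0 into plain 0.0

for angle in (0, 30, 60, 90, 150, 180, 210, 270, 300, 330):
    print(f"{angle:3d} degrees -> r = {correlation_from_angle(angle):.2f}")
```

Because the cosine is the same whether an angle is measured clockwise or anti-clockwise, pairs such as 150° and 210°, or 60° and 300°, represent the same correlation.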
SAQ 5.2
a. An oblique solution is a table of factor loadings in which the factors are not at right angles, but are themselves correlated together.
b. A factor loading is the correlation between a variable and a factor.
c. The factor structure matrix is a table showing the correlations between all variables and all factors.
d. An orthogonal solution is a table of factor loadings in which the factors are all uncorrelated (i.e., at right angles to each other).
e. A factor pattern correlation matrix is a table which shows the correlations between all of the factors in a factor analysis. For an orthogonal factor analysis, all of the correlations between factors will be 0 (as they are independent). For oblique solutions, the correlations will generally not be zero.
SAQ 5.3
The eigenvalue associated with a factor is the sum of the squared loadings on that factor, calculated across all of the variables. The communality of a variable is the sum of the squared loadings on that variable, calculated across all of the factors.
a. The communality of variable 2 is 0.98² + 0² = 0.9604. The communalities of variables 3, 4, 5 and 6 are likewise 0.82, 0.7325, 0.9604 and 0.7325, respectively.
b. The eigenvalue of factor 2 is 0.10² + 0.0² + (–0.10)² + 0.85² + 0.98² + 0.85², or 2.4254.
c. As there are six variables, factor 2 accounts for 2.4254/6 = 0.4042 of the variance between them.
d. The worked example in the text shows that factor 1 accounts for 0.43 of the variance. As the factors are orthogonal, factors 1 and 2 combined account for 0.43 + 0.4042 = 0.834 of the variance.
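The arithmetic in this answer can be checked with a few lines of Python. The loadings are those quoted in the answer (the six loadings on factor 2, and the two loadings of variable 2); the function names are invented for the illustration.

```python
def sum_of_squares(values):
    """Sum of squared loadings."""
    return sum(value ** 2 for value in values)

def communality(loadings_of_variable):
    """Squared loadings of one variable, summed across all factors."""
    return sum_of_squares(loadings_of_variable)

def eigenvalue(loadings_on_factor):
    """Squared loadings on one factor, summed across all variables."""
    return sum_of_squares(loadings_on_factor)

factor2 = [0.10, 0.0, -0.10, 0.85, 0.98, 0.85]   # loadings of variables 1-6
variable2 = [0.98, 0.0]                          # loadings on factors 1 and 2

proportion = eigenvalue(factor2) / len(factor2)  # share of the total variance

print(round(communality(variable2), 4))
print(round(eigenvalue(factor2), 4))
print(round(proportion, 4))
```

Dividing the eigenvalue by the number of variables works because each standardised variable contributes a variance of 1.0 to the total.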
SAQ 5.4
a. To tackle this question, you can see from Table 5.4 that the angle which corresponds to a correlation of −0.867 is either 150° or 210° and for 0.5 the angle is 60° or 300°. (Why two possibilities for each? Well, it depends
if you move clockwise or anti-clockwise when calculating the angles between the factor and each variable.) If these are plotted, the only two possible sets of lines that will represent these correlations are shown at the top of Figure 5.6. (It does not matter if you have drawn the figure rotated in some other direction: the position of the lines on the page is arbitrary. The angle between the lines is the critical thing.) Note that you cannot put both variables on top of each other at an angle of 150° or 210° to the factor, for then the angle between them would be zero which corresponds to a correlation of 1.0 not the value of 0.5 shown in Table 5.6.
b. Changing the correlations means that the angles between the variables and the factor change to 30° and 330°, as shown at the bottom of Figure 5.6.
Figure 5.6 Answer to self-assessment question 5.4
c. Reversing all of the loadings between the variables and a given factor in effect causes the factor to have a correlation of –1.0 with its previous
position. Table 5.4 shows that this corresponds to its pointing in the opposite direction (180°) to its previous position. Large negative correlations between a variable and a factor imply that the factor points away from the variables which have the largest correlation with it. Reversing the signs of all of the correlations reverses the direction of the factor so that it passes through the cluster of variables.
SAQ 5.5
a. The principal components model assumes that performance on the test is determined by just geographical knowledge (plus some random error; a child who knows only 20 facts about Austria may be lucky enough to see 15 of these appear on the test paper, whereas another who also knows 20 facts may see just five of them represented in the test).
b. The factor model accepts that performance on the 20-item test will be influenced by geographical knowledge, random error plus unique variance: that is, some children will have better (or worse) knowledge about Austria than you would expect, given their performance on other geography tests.
c. Random errors of measurement: some children will guess correctly (their score will be overestimated) and some will guess incorrectly (their score will be underestimated).
d. Unique variance: these children may be able to draw on their holiday memories, and so will know more than you would expect them to.
e. Unique variance: it is possible that fishing magazines may contain articles about fishing in Austrian rivers or lakes, and if so children who have read these may have a better knowledge of these aspects of Austrian geography than one would expect.
f. Random error. Some of the wrongly-marked answers will be correct, and some incorrect, and so this will not systematically increase children’s performance.
g. Unique variance, as children who missed the classes may not know some of the information, and may therefore not perform to their normal level.
h. If incorrect guessing was not penalised, children who guessed answers (rather than leaving them blank) would have a higher score, as some of the guesses will probably be correct. Guessing, rather than leaving items blank, would affect unique variance – children who guess achieve better scores than they should.
References
Allport, G. W. & Odbert, H. S. 1936. Trait Names: A Psycho-Lexical Study. Psychological Monographs, 47, whole issue.
Carroll, J. B. 1993. Human Cognitive Abilities: A Survey of Factor-Analytic Studies. Cambridge: Cambridge University Press.
Child, D. 2006. The Essentials of Factor Analysis. London: Continuum.
Eysenck, H. J. 1953. The Logical Basis of Factor Analysis. American Psychologist, 8, 105–114.
Howell, D. C. 2012. Statistical Methods for Psychology. Belmont, CA: Wadsworth Publishing.
Lorenzo-Seva, U. & Ferrando, P. J. 2006. Factor: A Computer Program to Fit the Exploratory Factor Analysis Model. Behavior Research Methods, 38, 88–91.
Tabachnick, B. G. & Fidell, L. S. 2018. Using Multivariate Statistics. New York: Pearson.
Chapter 6 Broad trait theories of personality
Introduction
In our everyday lives, we often explain people’s behaviour through ascribing to them characteristics that are consistent across both time and situations (so-and-so is ‘nervy’ or ‘talkative’, for example). Trait theory develops this idea scientifically, using factor analysis to discover the main ways in which people differ from one another and to design tests to measure these traits. It also checks whether people’s behaviour really is consistent. Trait theories are now widely believed to be the most useful means of studying personality – although there is less agreement about precisely which trait theory is the most appropriate. Several questionnaires have been developed to provide reliable and valid measures of the main personality traits, many of which have been found to have a biological basis, as discussed in Chapters 11 and 13.
Learning outcomes
Having read this chapter you should be able to:
• Discuss the relevance of the lexical hypothesis for discovering personality traits, and discuss the problems of circularity and the use of scales which have items of very similar meaning.
• Outline Cattell’s model of personality and discuss its strengths and problems.
• Chart the development of various five- and six-factor models of personality, and show familiarity with the main factors which have been found.
• Describe Eysenck’s and Gray’s theory-driven model of personality.
• Consider criticisms of trait theory that are sometimes levelled by social psychologists and learning theorists.
The personality theories outlined in Chapters 2 and 4 attempted to understand the nature of personality and its underlying processes through detailed analyses of individuals. The pattern of free associations generated by Freud’s small sample of patients led to elaborate and speculative theories about the structure of mind, child development, motivation and other aspects of personality structure and process. Kelly sought to understand each person’s unique pattern of personal constructs – the unique cognitive framework that each individual uses to anticipate and model events in the world. Rogers’ theory is still less analytical, viewing people as whole individuals who grow and develop over time, coming to like and accept themselves as they are. One problem that is common to all three theories is the difficulty of measuring personality. ‘Self-actualisation’ remains a vague abstraction. There is no clearly valid way of assessing unconscious mental processes such as the extent of an individual’s Oedipal fixation. Moreover, while the use of repertory grids may be particularly useful for understanding the number and nature of the constructs used by individuals, it cannot easily allow individuals to be compared. Three individuals might complete three different repertory grids, but ultimately there is no easy way of deciding how, precisely, their personalities differ. This is a problem, for in order to discover whether (for example) strict parenting affects personality in adulthood, it is necessary to measure all adults’ personality in the same way. It may, therefore, be useful to approach the problem from another direction. Instead of considering each individual’s phenomenological world (as Kelly and Rogers did), it may be better to develop techniques that will allow people’s personalities to be compared, that is, to map out the main ways in which people differ from one another. 
Once the main dimensions of personality have been established, it should be possible to develop tests to show the position of each person along each of these dimensions – thus allowing people’s personalities to be compared directly. Trait theories of personality follow precisely this approach. They assume (then later test) that there is a certain constancy about the way in which people behave – that is, they propose that behaviour is to some extent determined by certain characteristics of the individual, and not entirely by the situation. This seems to tie in well with personal experience. We very often describe people’s behaviour in terms of adjectives (‘bossy’, ‘timid’) implying that some feature of them, rather than the situations in which they find themselves, determines how they behave. (We shall discuss the role of situations in a little more detail later.) Describing people in terms of adjectives may sound a little reminiscent of Kelly’s theory and the self-theories encountered in Chapter 2; after all, one use of the repertory grid is to find out which individuals are construed in similar ways. However, the crucial difference lies in the type of data used. According to Kelly and the self-theorists, an individual’s perception of other people is what is important – it does not matter whether these perceptions are accurate. However, trait theories attempt to map out the ways in which people really do differ from one another and so depend on accurate techniques for measuring personality, abilities and attainments.
The basic aims of trait theories are therefore simple:
• To discover the main ways (dimensions) in which people differ. Once this is achieved then it is necessary to develop valid tests to measure these traits. One can then describe a person’s personality merely by noting their scores on all of these personality dimensions, in much the same way that the position of a ship may be defined by specifying its latitude and longitude.
• To check that scores on these dimensions do, indeed, stay reasonably constant across situations – for, if not, situations must determine behaviour, and there is no such thing as personality.
• To discover whether people’s scores are reasonably constant over their lifespan, as discussed in Chapter 14.
• To discover how and why these individual differences come about – for example, whether they are passed on genetically, through crucial events in childhood (as Freud would have us believe), through observing others (as suggested by Bandura) or because of something to do with the biology of our nervous systems.
• If traits are not socially determined, it would be interesting to see if the same traits are found in different cultures.
The present chapter deals with the first two issues and (briefly) the last one. A discussion of personality processes is left until Chapters 11 and 13 and changes over the lifespan in Chapter 14.
Factor analysis applied to personality
You will recall from Chapter 3 that the main purpose of measuring traits is to describe how an individual will behave most of the time – hence personality descriptions such as ‘John is an anxious person’ may lead us to suspect that John will be more likely than most to leap into the air when startled, to worry about examinations, to be wary of bees and so on. Using one word – ‘anxious’ – allows us to predict how John will probably behave in a whole range of situations. But how should one decide which words one should pluck from the dictionary in order to describe personality? The first problem is that there are just too many words. Allport and Odbert (1936) were the first to explore this ‘lexical approach’ to discovering personality and found over 4500 words that described personality traits. It would clearly be impossible to assess people on all of them. Second, if different investigators choose different adjectives, it will be difficult to prove that these mean the same thing – for example, if one individual says that ‘Elizabeth is retiring’, while another says that she is ‘shy’, are they describing precisely the same personality characteristic? It is impossible to be certain just by looking at the meaning of the words, since language tends to be imprecise, with many adjectives having subtle nuances and/or multiple meanings (perhaps you thought that Elizabeth was coming to the end of her working life). Thus different people may use different words to describe the same feature of personality and may perhaps use the same words to describe different aspects of behaviour. This does not bode well for accurate measurement! Third, it is possible to develop scales that describe trivial aspects of behaviour, while leaving important aspects of behaviour unmeasured. 
For example, it would probably be easy to develop a scale to measure the extent to which individuals feel paranoid, even though paranoia is probably not an important feature of most people’s behaviour.
Fourth, this approach only lets us describe personality. It cannot indicate what might cause people to have different personalities. A ‘circular definition’ involves deducing the presence of a trait from some behaviour and then using it to explain that behaviour. You might feel pain from swollen tonsils, visit your physician and be told that your throat hurts because you have tonsillitis. Except, of course, that your physician has just translated your symptom into Latin (tonsillitis = swollen tonsils) and offered it to you as an explanation. Similarly, in the previous paragraph we noted that Elizabeth was quiet and retiring and deduced from this that she is less extraverted than other people. It is very tempting to then try to explain these behaviours by means of the trait of extraversion – to assert that the reason why Elizabeth avoids parties is because she is low on the trait of extraversion. To avoid falling into this trap it is vital to ensure that traits are far broader than the behaviours that are used to define them. ‘Tonsillitis’ can be used as an explanation only if the doctor understands that it implies an acute viral or bacterial infection that will probably last a few days and might perhaps be treated with penicillin. Extraversion is a useful trait if (and only if) it has broader implications than a liking for parties, etc. If it could be shown to have a genetic basis, to be heavily influenced by some actions of one’s parents or related to some aspect of brain or neuronal functioning, then we could be confident that the trait of extraversion is a real characteristic of the individual. The behaviour that we observe (the swollen tonsils, the lack of socialising) arises because of some real biological, social, developmental, chemical, cognitive, genetic or other peculiarity of that person. It is a ‘marker’ for some broad ‘source trait’ that leads the person to avoid social gatherings and is not just a convenient label to summarise a person’s behaviour. 
Without such evidence, traits are only convenient descriptions of behaviour – they cannot be used to explain it. Finally, it is worth mentioning that factor analysis makes some assumptions about the data being analysed – particularly that the relationship between two variables (items) can be described by a straight line rather than a U-shape or some other curve. If this is not the case, then it makes little sense to calculate a correlation, or perform a factor analysis. Should researchers check this? Yes. Do they? No, because it would involve looking at thousands of graphs.
Discovering the origins of personality
How, then, can we go about discovering these ‘source traits’ – these broad dispositions that cause people to behave in certain ways? Chapter 5 showed that if we factor analyse various pieces of data from people who have been drinking alcohol to different extents, the technique tells us that one factor corresponds to the alcohol-induced behaviours (slurring, staggering, etc.). This obviously represents true causal influences on behaviour: the three behaviours have a common origin (alcohol in the bloodstream) and this will affect other behaviours too – for example, blood pressure and vomiting. The logic of this approach to discovering source traits is simple – though rarely spelt out in detail in textbooks. We know that most things which influence how people behave, think or feel affect more than one aspect of behaviour, thought or feeling. The flu virus makes me feel tired, raises my temperature, depresses my appetite, etc. Depression makes someone feel ‘down’ – but also affects their cognitive processes, sleep habits, appetite, and so on. Alcohol affects a person’s co-ordination, feeling of well-being, tiredness and a host of other things. And we know that genetic variations lead to not one but several very different changes in behaviour and/or physiology; those who have the genetic mutation which causes phenylketonuria typically show light coloured skin, cognitive impairment, eczema, behavioural
problems and a host of other symptoms. Identifying groups of symptoms like this is how physicians diagnose a particular disease. Determining whether a person’s lethargy is associated with feeling down or a raised temperature will guide the physician in deciding whether they have depression or influenza. And this is precisely what factor analysis also does. If we measure a wide range of quite different behaviours in a large sample of people and perform a factor analysis, the factors represent groups of behaviours which rise and fall together from person to person – just as the disease symptoms did in the medical examples. Each of these personality or ability traits may therefore have a common cause – and data need to be gathered to determine what this is. For example, it is found that adults who enjoy socialising with other people also tend to be dominant, cheerful, enjoy telling jokes, are happy-go-lucky and full of energy. This finding is interesting because there is no logical reason why people who are dominant have to be optimistic (etc.) – the terms look as if they are quite different, but factor analysis shows that they tend to correlate together. Having established that this factor exists, it is necessary to discover what causes people to display these characteristics. Are genetic variations responsible, for example? Is it something to do with the way in which parents bring their children up? Could it be something to do with levels of dopamine (a neurotransmitter) in parts of the brain? It is possible to develop and test plenty of theories such as this, and some of these will be considered in Chapter 11. However the remainder of this chapter will focus on the structure of human personality – the main traits which have been discovered. Some other traits are discussed in Chapters 7, 9 and 10. 
Before leaving this section I should mention that the use of factor analysis to discover ‘source traits’ was originally proposed by Cattell (e.g., 1973) and not everyone would be happy with the claim that factor analysis can identify true causal influences.
• It is logically possible that some characteristic of the person (whether it is physiological, connected with child-rearing or other experience) might influence just one aspect of behaviour. As it affects just one aspect of behaviour, it could not appear as a factor because it does not correlate with anything else. Can you think of any intervention, drug, gene or other influence which only affects one aspect of a person’s behaviour? I cannot, but this does not mean that they do not exist.
• In addition, just because groups of correlated variables have a common cause in the examples quoted above, perhaps it does not follow that they always do. If this is the case, factor analysis might not be effective in identifying main personality traits.
• Some social psychologists might be unhappy with the idea that any questionnaire can ever reveal anything about real behaviour; perhaps each person’s reality is unique, so items such as ‘do you enjoy lively parties?’ mean very different things to different people. This makes it nonsensical to correlate people’s scores. We try to avoid this by designing questionnaires which inquire about behaviour to reduce the amount of self-insight required (see Chapter 20), but it is a possible difficulty.
• Finally, Michell’s (1997) work reminds us that it is dangerous to treat the numbers arising from Likert scales (or any other rating scale) as representing a quantity. It may therefore be inadvisable to perform statistical analyses on them. Most researchers ignore this issue.
To summarise: in order to find personality traits it is simply necessary to measure a range of variables (behaviours, ratings or self-reports) from a large number of people, correlate these variables together and carry out a factor analysis as described in Chapter 5. The factors that emerge from this analysis will show which behaviours tend to occur together and these are termed ‘personality traits’. If people’s scores on these traits can be shown to be influenced by genetic, biological, social, developmental or cognitive variables, then we can be confident that a true ‘source trait’ has been identified, which can then be used to explain behaviour. For an alternative view, see Buss and Craik (1983).
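The ‘measure, correlate, look for groups’ logic can be made concrete with a small simulation. The following Python sketch is only an illustration under stated assumptions: the six variable names, the two simulated causes, the loadings of 0.8 and the grouping threshold of 0.3 are all invented for the example, and the crude grouping step is a stand-in for a proper factor analysis (Chapter 5), not a replacement for it.

```python
import math
import random

rng = random.Random(0)          # fixed seed so the simulation is repeatable
N_PEOPLE = 500
# Hypothetical variable names; the first three share one simulated cause,
# the last three share another.
NAMES = ["sociable", "dominant", "cheerful", "anxious", "tense", "guilt-prone"]


def simulate_person():
    """One person's six scores: each is a common cause plus unique noise."""
    cause_a = rng.gauss(0, 1)
    cause_b = rng.gauss(0, 1)
    causes = (cause_a, cause_a, cause_a, cause_b, cause_b, cause_b)
    return [0.8 * c + 0.6 * rng.gauss(0, 1) for c in causes]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def group_variables(r, threshold=0.3):
    """Crudely group variables that are linked by |r| above the threshold."""
    groups = []
    unassigned = list(range(len(r)))
    while unassigned:
        group = [unassigned.pop(0)]
        frontier = list(group)
        while frontier:
            i = frontier.pop()
            linked = [j for j in unassigned if abs(r[i][j]) > threshold]
            for j in linked:
                unassigned.remove(j)
                group.append(j)
                frontier.append(j)
        groups.append(sorted(group))
    return groups


data = [simulate_person() for _ in range(N_PEOPLE)]
columns = list(zip(*data))      # one tuple of scores per variable
r = [[pearson(ci, cj) for cj in columns] for ci in columns]

for group in group_variables(r):
    print([NAMES[i] for i in group])
```

Variables that share a simulated cause end up in the same group, which is all that the correlational step can show; deciding what the common cause actually is requires the kind of further evidence described above.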
Narrow and broad traits
One difficult problem with factor analysis is deciding which variables to measure in the first place. Fairly obviously, which factors are found depends on what is measured; if sociability, dominance, cheerfulness, etc. are not assessed, then the factor analysis cannot possibly reveal a factor of Extraversion. Determining which variables to assess is therefore of crucial importance, and there are three main possibilities.
• The remainder of this chapter looks at broad personality traits – those which are developed from measuring as many varied aspects of behaviour as possible. The aim of researchers who follow this approach is to include in the factor analysis virtually all characteristics of people which could possibly be measured. The domain of items is as broad as possible.
• Narrow personality traits arise when someone wants to focus on a more specific, albeit theoretically-driven trait. For example, Kline’s work on the ‘anal character’ involved asking questions which tapped individual differences in obstinacy, orderliness and meanness. The aim of this type of research is not to map out the entire structure of personality, but just to see if a narrower sample of items form a trait. We return to these narrow traits in Chapters 9 and 10. The narrower domain of items is determined by some theory.
• Finally, some researchers take narrowness to an extreme – and essentially ask the same question over and over, making the domain of items trivially narrow. Someone who agrees that they ‘continue to punish a person who has done something that I think is wrong’ must surely also agree that ‘I continue to be hard on others who have hurt me’ for these items mean almost exactly the same thing. If a person understands the meaning of the words, they must surely answer them in the same way, which means that the items must correlate and must form a factor. There is no point in even gathering data to demonstrate this. The two items above are from a genuine test (which I do not cite out of kindness to its authors) and there are a surprising number of such questionnaires in everyday use. I do not see how they can measure traits at all (Cooper, in press), although many psychologists use them uncritically.
Self-assessment question 6.1 What are the four main problems involved in selecting trait descriptors?
The lexical hypothesis and Cattell’s theory
In the previous section, I suggested (somewhat glibly) that, in order to discover the main broad personality factors, it is merely necessary to measure huge numbers of behaviours, correlate them together and factor analyse the results. The problem is, of course, that deciding how to quantify human behaviour is a remarkably difficult task. Measures of behaviour often contain dependencies. For example, it is not easy for a person to eat, drink and talk simultaneously (though I have known several who try) so if an individual is eating, they will not be drinking
or talking. We shall note in Chapter 19 that such dependencies can play havoc with factor analyses – they can produce entirely spurious factors. Cattell (1946) argued that, instead of measuring behaviour, we should focus on adjectives. Specifically, he argued that every interesting aspect of personality would probably have been observed during the course of evolution and a term describing it would have entered the language. Thus the dictionary should contain words that describe every conceivable personality characteristic – generally in the form of adjectives such as ‘happy’, ‘bad tempered’, ‘anxious’, ‘uptight’, ‘phlegmatic’ and so on.
Suppose a sample of these adjectives were to be extracted from a dictionary. It would probably be the case that some of these words mean the same as others – ‘anxious’ and ‘uptight’, for example. Wherever this happens, one or other of the adjectives could be dropped. This would produce a sample of words that describe different aspects of personality. Suppose now that all of these words were put into a questionnaire (so that people could rate themselves on each adjective) or that trained raters assessed people’s behaviour on the basis of these adjectives. It would then be possible to collect data from a large sample of people (several hundred, at least) and use factor analysis to determine which of these ratings tend to group together. The factors that emerge should be the main personality ‘source traits’.
The assumption that the factor analysis of correlations between adjectives that differ in meaning will reveal all of the main personality traits is sometimes referred to as the ‘lexical hypothesis’. All one does is measure a sample of people on as diverse a range of items as possible, and let factor analysis identify the source traits. The approach does not require any theory about the number or nature of such traits; the entire process is mathematically driven.
Cattell drew heavily on the work of Allport and Odbert (1936) when attempting to identify the adjectives (trait names) which described personality. These authors extracted a list of adjectives that could be used to describe people from the 1925 edition of Webster’s New International Dictionary; this list of adjectives is sometimes known as the ‘personality sphere’. Cattell then asked English scholars to eliminate synonyms from this list in order to avoid the problem of interdependent variables; this took the total number of adjectives down to 171. Unfortunately, it was simply not feasible to perform a factor analysis on 171 variables using hand calculators and the primitive computers of the time, and so Cattell reduced the number of variables down to 45 by eliminating some variables which, although not judged as synonyms, proved to be highly correlated with other variables (Cattell, 1957). He also added in the names of some personality traits identified by other theorists, to ensure that nothing was left out. Thus the original list of 4500 adjectives was reduced to 171 and then to 45. These terms were used to rate the behaviour of a group of people, whereupon the correlations between the 45 trait descriptors were computed and factor analysed, yielding some 23 factors.
The assessment of behaviour through ratings was time consuming. Furthermore, there may be some personality traits (such as feelings and attitudes) that, except in pathological cases, may not be obvious to someone who is rating behaviour. Someone may feel frightened when walking down a lonely road at night, but step out bravely, and so will not appear to be frightened. The range of situations in which people are observed will inevitably be limited, so that someone who is grumpy when they get out of bed (for example) might never be rated thus. Moreover, the very presence of the rater may alter some behaviours – our pedestrian may not feel as anxious as usual because a friend with a clipboard is close by.
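The idea of dropping variables which are highly correlated with variables that have already been retained can be sketched in a few lines of Python. This is a toy illustration rather than a reconstruction of Cattell’s procedure: the adjectives, the correlations and the cut-off of 0.8 are all invented, and any pair not listed is simply assumed to be uncorrelated.

```python
def prune_redundant(names, correlations, threshold=0.8):
    """Keep a variable only if it is not highly correlated with one already kept.

    `correlations` maps a frozenset of two names to their correlation;
    pairs that are not listed are treated as uncorrelated (an assumption
    made purely to keep the example short).
    """
    kept = []
    for name in names:
        redundant = any(
            abs(correlations.get(frozenset((name, other)), 0.0)) >= threshold
            for other in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Invented correlations between five adjectives.
adjectives = ["anxious", "uptight", "cheerful", "happy", "orderly"]
correlations = {
    frozenset(("anxious", "uptight")): 0.85,
    frozenset(("cheerful", "happy")): 0.90,
    frozenset(("anxious", "cheerful")): -0.20,
}

print(prune_redundant(adjectives, correlations))
```

Note that the outcome of a greedy rule like this depends on the order in which the variables are considered, which is one reason why different researchers starting from the same word list need not end up with the same reduced set.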
Cattell therefore also attempted to measure personality traits by methods other than observers’ ratings – most notably through the use of questionnaires and ‘objective tests’. The hope is that many of the personality factors identified in ratings will also appear when responses to questionnaires and objective tests are analysed.
EXERCISE
The crucial assumption of the lexical hypothesis is that single adjectives in the dictionary can describe all types of behaviour. Can you think of any patterns of behaviour that are not described by a single word? You might like to discuss some with your friends, to ensure that you all agree that the patterns of behaviour do actually exist. To get you started, I cannot think of any English word that describes taking pleasure in others’ misfortunes (as in the German Schadenfreude). If you can think up many such patterns of behaviour, this may indicate that the lexical hypothesis is unlikely to be able to cover the full range of behaviours (or, of course, it may just be that the behaviours that you notice in other people are illusory – you may think that you notice characteristics that do not, in fact, exist).
Self-assessment question 6.2 How might you tell whether a personality factor derived from a self-report scale is the same as a factor that appears in ratings of behaviour?
The 16PF and other tests
Cattell’s most famous tests for measuring personality are the Sixteen Personality Factor Questionnaires, the 16PF (Cattell and Cattell, 1995; Cattell et al., 1970; Cattell and King, 2000). I refer to these questionnaires in the plural because there is a whole series of different forms of these self-report questionnaires that yield scores on 15 personality traits plus intelligence. An ‘industrial version’ was published in 2000; otherwise the fifth edition (1993 in the USA, 1995 in the UK) is the most recent, with scales that have somewhat less measurement error (are ‘more reliable’) than in previous editions. However, the correlation between some of the scales in the fifth edition and their supposedly equivalent predecessors is sometimes less than 0.4!
The scales of the 16PF are shown at the top of Table 6.1. Two words are given to describe each trait and these show the characteristics of a low and high scorer. However, it is important to remember that these are traits (not personality types) and so the questionnaire does not categorise someone as either ‘warm’ or ‘reserved’, for example. Instead, for each person, the questionnaire will produce a score that will usually be somewhere between the two extremes. Each scale in the 16PF is identified by a ‘source trait index’. These start at A, B, etc. but there are some gaps; the 16PF does not measure Factor D, for example. This is because Cattell found that some of the traits which he identified in ratings of behaviour could not be measured through self-report questionnaires. For example, how could you write questionnaire items to measure modesty? Anyone who said they were modest would not be! Contrariwise, some traits may possibly show up in self-report questionnaires but not through behaviour ratings, etc. and these are prefixed Q in Table 6.1. So ‘radical/conservative views’ (Q1) can
Table 6.1 Personality traits measured by the 16PF questionnaire, plus seven other traits identified in questionnaires or ratings but not both

| Source trait index | In Q-data | In L-data | Description | Typical item: do you . . . |
|---|---|---|---|---|
| A | ✔ | ✔ | Warm/reserved | Know how to comfort others |
| B | ✔ | ✔ | Intellect | Learn quickly |
| C | ✔ | ✔ | Ego strength | Am relaxed most of the time |
| E | ✔ | ✔ | Assertive/cooperative | Take charge |
| F | ✔ | ✔ | Cheerful/sober | Am the life of the party |
| G | ✔ | ✔ | Conscientious/expedient | Try to follow the rules |
| H | ✔ | ✔ | Socially bold/shy | Start conversations |
| I | ✔ | ✔ | Sensitive/self-reliant | Cry during movies |
| L | ✔ | ✔ | Suspicious/trusting | Distrust people |
| M | ✔ | ✔ | Imaginative/practical | Enjoy wild flights of fantasy |
| N | ✔ | ✔ | Socially shrewd/artless | Reveal little about yourself |
| O | ✔ | ✔ | Guilty/self-assured | Feel crushed by setbacks |
| Q1 | ✔ | ✔ | Radical/conservative | Enjoy hearing new ideas |
| Q2 | ✔ | ✔ | Self-sufficient/affiliative | Want to be left alone |
| Q3 | ✔ | ✔ | Controlled/impulsive | Like order |
| Q4 | ✔ | ✔ | Tense/tranquil | Get irritated easily |
| Factors found in ratings or questionnaires, but not both | | | | |
| D | | ✔ | Insecure excitability (conduct disorder) | Bubble over with ideas |
| J | | ✔ | Zestful | Enjoy leading group activities |
| K | | ✔ | Mature socialisation/boorishness | Interested in socially important issues |
| P | | ✔ | Sanguine casualness | Rarely strays into fantasy |
| Q5 | ✔ | | Group dedication | Enjoy projects into which I can throw all energy |
| Q6 | ✔ | | Social panache | Good at inventing justifications when in wrong |
| Q7 | ✔ | | Explicit self-expression | Usually have a clear idea of what to do next |
supposedly be detected only via questionnaires – and it is reasonable to claim that this would not be obvious to someone who just observed one’s behaviour. The bottom part of Table 6.1 shows the traits which were identified in either ratings or questionnaires – but not in both. Cattell relished inventing novel names for factors: his Factor H was termed ‘Threctia/Parmia’, for example. I cannot see the point in reproducing these names here, and neither can most authors. This is the reason why different authors who describe Cattell’s 16PF factors often use different words. It is therefore always much safer to use the ‘source trait index’ numbers and letters when describing a 16PF scale (‘Factor C’ for example) rather than descriptive words.
Cattell developed other questionnaires, too. There are variants of the 16PF designed for adolescents (the High School Personality Questionnaire), 8 to 14 year olds (the Children’s Personality Questionnaire), 6 to 8 year olds (the Early School Personality Questionnaire) and 4 to 6 year olds (the Preschool Personality Questionnaire). There is some variation in the number and nature of the factors measured by these questionnaires, which Cattell (e.g., 1973) explains by suggesting that some traits develop later than others. He also explored personality in clinical groups and developed the Clinical Analysis Questionnaire, which measures (among other things) some seven distinct factors of depression.
More controversially, Cattell tried to develop ‘objective tests’ to measure personality. Some were just bizarre, and included ‘amount of laughter at jokes’, ‘willingness to play practical jokes’, ‘readiness to imitate animal sounds’, ‘distance covered in a brass finger maze with and without electric shock’ (Cattell and Kline, 1977). The ‘Objective Analytic Test Battery’ (Cattell and Schuerger, 1978) claimed to measure a number of personality traits, some of which should have been the same as those in the 16PF.
These ‘objective tests’ measured personality in ways which were not obvious to the participants who took part. For example, participants completed an attitude questionnaire, the questions of which were totally irrelevant. The test was scored by noting the extent to which people chose the middle (neutral) option rather than the extremes – which might show whether they tended to have strong opinions about issues. Cattell also tried to measure personality by finding how many factors were needed to explain individual differences in humour, in the hope that some of these would also reflect personality. The Humour Test of Personality (Cattell and Tollefson, 1966) presented participants with 104 pairs of jokes and asked them to decide which was the funnier. (The jokes were all terrible, which made this task much less fun than it sounds.) The Music Preference Test (Cattell and Eber, 1954) did the same thing with music.
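Scoring the attitude questionnaire in this way ignores what the questions are about and looks only at how often the neutral option is chosen. A minimal Python sketch follows; the five-point response format and the example responses are assumptions made for the illustration, as the scoring details of the actual test are not given here.

```python
def middle_response_rate(responses, scale_points=5):
    """Proportion of answers that fall on the middle option of the scale.

    Assumes answers are coded 1..scale_points and that the number of
    scale points is odd, so that there is a single neutral option.
    """
    if scale_points % 2 == 0:
        raise ValueError("an even-numbered scale has no single middle option")
    middle = (scale_points + 1) // 2
    return sum(1 for answer in responses if answer == middle) / len(responses)


# Hypothetical answers to eight items on a 1-5 scale.
answers = [3, 3, 1, 5, 3, 2, 3, 4]
print(middle_response_rate(answers))  # 0.5: half of the answers are neutral
```

A low proportion would be taken as a sign that the person tends to hold strong opinions, whatever the content of the items.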
Overview of Cattell’s work
It is probably fair to say that Cattell was more interested in determining the number and nature of the main personality factors through statistical analysis than exploring their origins through the methods of experimental psychology. Although he did a little work exploring whether some of his personality factors were influenced by genetics or the environment in which children grew up, he developed his own method for performing such analyses which turned out to be flawed (Loehlin, 1965). So the process of discovering how and why individuals develop different levels of each of these factors was largely left for others to attempt.
It seems from the previous section that Cattell’s work was huge in scope. He clearly understood the need to sample a wide range of items in order to discover the broad personality source traits, and developed an amazing array of questionnaires and other techniques for measuring them. So why consider any alternative? The problem is that no one apart from Cattell and his colleagues can find anything approaching 16 factors in the 16PF. If the questionnaires do, indeed, measure 16 distinct personality traits, it should be possible to
administer these to a large sample of people, correlate the items together, perform the factor analysis as described in Chapter 19 and discover that the items load on 16 factors precisely as claimed by Cattell. The literature on this really is definitive – for whatever reason, the 16PF simply does not measure anything even approaching 16 factors (Barrett and Kline, 1982a; Byravan and Ramanaiah, 1995; Howarth, 1976; Matthews, 1989; Wells and Good, 1977). Most of the factors which Cattell studied simply do not exist, and so there is no point in even trying to explore their origins. This means that the 16PF should not be scored according to the instructions in the handbook, as each ‘scale score’ on the questionnaire is a mixture of quite different traits. Items simply do not belong to the scales which the test manual claims. Everyone who has been using the test has been scoring it in a meaningless fashion. Despite this fundamental problem, the test is still widely used, particularly in occupational psychology, which is a source of concern.
Cattell’s attempts to measure personality by means of ‘objective tests’ were, if anything, even less successful. Kline and Cooper (1984) showed that the factors in the ‘Objective Analytic Test Battery’ resolutely refused to emerge as expected and that the test appeared to measure ability rather better than it assessed personality. It completely failed to measure any of the expected personality factors. The IPAT music test is just as bad (Kline, 1973).
Cattell (Cattell, 1986; Cattell and Krug, 1986) disagreed with the studies that failed to demonstrate the expected structure in the 16PF. He argued that everyone else had failed to perform factor analyses properly – an opinion with which most psychometricians would disagree. The simplest way to rebut the criticisms would be for Cattell to produce a clear, 16-factor matrix arising from his studies with the 16PF.
The 1970 handbook for the 16PF did not include such a table and when one was eventually produced (Paul Barrett, personal communication) it did not show the clear factors that everyone expected. The reasons for these problems are uncertain. Recall that the 16PF was constructed through factor analyses performed by hand – an enormously laborious and error-prone process. It may be that errors crept in here. But the problem is that the raw data which Cattell analysed were never made available; others could not re-analyse them to see if they came to the same conclusions. It seems that there are fewer personality traits than suggested by Cattell and that all the tests designed to measure Cattell’s personality traits are flawed. Given these problems, it is hardly worthwhile discussing the nature of Cattell’s factors or considering the evidence for the validity of the 16PF. So whilst Cattell’s work is important in that he showed how work in this area could usefully proceed, his implementation had major issues. Instead, we shall consider a more modern lexical approach to personality measurement.
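Checking whether a test ‘factors as claimed’ boils down to comparing each item’s loadings with the scoring key. The sketch below shows one simple way of doing this in Python. The item names, the loadings and the key are entirely made up, and the rule of assigning each item to the factor on which it has its largest absolute loading is just one convenient convention, not the procedure used in any of the studies cited above.

```python
def misplaced_items(loadings, scoring_key):
    """Return the items whose largest absolute loading is NOT on their keyed factor.

    `loadings` maps each item to its list of factor loadings;
    `scoring_key` maps each item to the index of the factor (scale)
    on which the test handbook says it should be scored.
    """
    problems = []
    for item, row in loadings.items():
        strongest = max(range(len(row)), key=lambda factor: abs(row[factor]))
        if strongest != scoring_key[item]:
            problems.append(item)
    return problems


# Hypothetical two-factor solution for four items.
loadings = {
    "item1": [0.62, 0.05],
    "item2": [0.55, -0.10],
    "item3": [0.08, 0.48],
    "item4": [0.51, 0.12],   # keyed on the second factor, loads on the first
}
scoring_key = {"item1": 0, "item2": 0, "item3": 1, "item4": 1}

print(misplaced_items(loadings, scoring_key))  # ['item4']
```

If many items turn up in the list, adding together the items on each scale as the handbook instructs will produce scores which mix together quite different traits.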
Self-assessment question 6.3 When a test is factor analysed, why is it important to check that the items that form factors correspond to the instructions for scoring the test that are given in the test handbook?
The ‘Big Five’ models of personality
Tupes and Christal
Several researchers have developed slightly different models of personality based on five (rather than 16) factors. Tupes and Christal (1961/1992) were military psychologists who
became interested in reapplying the behaviour rating scales for personality devised by Cattell and described in the previous section. As before, the aim was to factor analyse the correlations between the adjectives to reveal the fundamental dimensions of personality. They asked several groups of 25–30 Air Force trainee officers who all lived and trained together to rate each other on 35 of the variables that Cattell used. The average rating given to a person by the other 24–29 members of the group on each variable was calculated and it was argued that this would be more accurate than just using one rater, as Cattell did. By using large samples of people (790 in one study), more modern methods of factor analysis and a computer to perform the analysis, they hoped to discover a factor structure that was more accurate and hopefully more replicable than Cattell’s. They repeated this analysis in eight samples (including students and older officers) to check whether the same factors emerged consistently from each sample – which they did.
Tupes and Christal consistently found just five personality factors rather than Cattell’s 16, as shown in Table 6.2. Unlike Cattell, Tupes and Christal seem to have found some highly replicable personality factors, most of which also seem to make psychological sense, although the fifth (‘Culture’) factor is harder to interpret just by looking at its large loadings. The only real problem is that Passini and Norman (1966) discovered that when people were asked to rate the behaviour of complete strangers (whose behaviour they had not observed), the same factors emerged! It is therefore possible that the raters used by Tupes and Christal may not have recorded how people actually behaved, but instead how they thought the people being rated should behave (Lassiter and Briggs, 1990). Thus the factors may be based on stereotypes, rather than accurate assessments of behaviour.
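Averaging the ratings that each person receives from everyone else in the group is easy to express in code. The Python sketch below does this for a single rating variable; the group of four people and their ratings are invented (the real groups contained 25–30 trainees), and the layout of the data is just a convenient assumption.

```python
def mean_peer_ratings(ratings):
    """Average rating received by each person from the other group members.

    `ratings[rater][target]` holds the rating that `rater` gave to `target`
    on one variable; the diagonal (self-ratings) is ignored.
    """
    n = len(ratings)
    means = []
    for target in range(n):
        received = [ratings[rater][target] for rater in range(n) if rater != target]
        means.append(sum(received) / len(received))
    return means


# Hypothetical ratings of 'talkativeness' in a group of four people.
talkativeness = [
    [None, 4, 2, 5],
    [3, None, 2, 4],
    [4, 5, None, 5],
    [5, 3, 2, None],
]

print(mean_peer_ratings(talkativeness))
```

Repeating this for each of the 35 variables gives one row of averaged scores per person, and it is the correlations between these averaged scores which are then factor analysed.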
Norman and Goldberg
Because of the problems with Cattell’s work, Norman (1967; available online from ERIC at https://eric.ed.gov/?id=ED014738) started again from scratch, re-visiting and reducing the Allport and Odbert list of trait descriptors. He compiled a list of 2800 terms that could describe personality, describing in detail how and why the original list of terms was reduced. These were sorted into 75 clusters of words having similar meanings. Goldberg (1990) took 1710 of Norman’s trait descriptors and persuaded a group of college students to rate themselves on each term. He then calculated a score for each person on each
Table 6.2 Five factor model of Tupes and Christal (1961/1992)

| Factor | Terms from rating scale |
|---|---|
| Surgency or Extraversion | Talkativeness, Frankness, Adventurousness, Assertiveness, Sociability, Energetic |
| Agreeableness | Good-Natured, Not Jealous, Emotionally Mature, Mildness, Cooperativeness, Trustfulness |
| Dependability | Orderliness, Responsibility, Conscientiousness, Perseverance |
| Emotional Stability | Not Neurotic, Placid, Poised, Not Hypochondriacal, Calm, Emotionally Stable, Self-sufficient |
| Culture | Cultured, Esthetically Fastidious, Imaginative, Socially Polished, Independent-Minded |
of the 75 ‘Norman Clusters’. These scores were then factor analysed, and five factors emerged once again. The factors were very similar to the Tupes and Christal traits shown in Table 6.2. There were two main differences:
• Conscientiousness replaced Dependability. This probably makes sense, as the Tupes and Christal work was mainly performed on officers who needed to rely on each other; perhaps dependability is not such an important characteristic for college students, and so does not appear in the ratings which they make.
• Intellect replaced Culture. The ‘Culture’ factor was always hard to interpret, as Tupes and Christal themselves acknowledged. The ‘Intellect’ factor identified by Goldberg reflects interest in intellectual activities, creativity, curiosity and insight, which makes far more sense. It also somewhat mirrors Cattell’s finding of an ‘intelligence’ factor (Factor B) in the 16PF.
When similar studies have been performed in other languages, broadly similar factors have emerged – although there have been some inconsistencies, as discussed by Ashton et al. (2004a) amongst others. These authors also tried extracting six factors (rather than the usual five) from their English-language data, and found that the sixth factor seemed to be one of honesty-humility – a factor which is also found in several other languages (Ashton et al., 2004a). Their HEXACO model is discussed below.
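Scoring people on clusters of near-synonyms, rather than on individual adjectives, can be sketched as follows. The two clusters, the adjectives in them and the seven-point self-ratings are invented for the illustration, and taking the mean of the ratings within a cluster is simply an assumption about how such a score might be formed.

```python
from statistics import mean


def cluster_scores(self_ratings, clusters):
    """Average a person's self-ratings over the terms in each cluster."""
    return {
        cluster: mean(self_ratings[term] for term in terms)
        for cluster, terms in clusters.items()
    }


# Hypothetical clusters of near-synonyms and one person's self-ratings (1-7).
clusters = {
    "talkativeness": ["talkative", "chatty", "outgoing"],
    "orderliness": ["tidy", "organised"],
}
self_ratings = {"talkative": 6, "chatty": 7, "outgoing": 5, "tidy": 2, "organised": 3}

print(cluster_scores(self_ratings, clusters))
```

With 75 clusters each person ends up with 75 scores, and it is the correlations between these cluster scores (rather than between the individual adjectives) which are factor analysed.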
Costa and McCrae
The strongest proponents of a five-factor model are Paul Costa and Robert McCrae (Costa and McCrae, 1976). They began by using cluster analysis (a rough and ready approximation of factor analysis) to investigate correlations between the scales (not the items!) in Cattell’s 16PF. They found two clear clusters of scales that appeared to measure ‘Extraversion’ (Cattell’s A, F, H and -Q2 factors) and ‘Neuroticism’ (-C, O, Q4 and L), plus a tiny third cluster (Cattell’s I and M factors). More items were added in order to give more breadth to the third factor (Costa and McCrae, 1978) which was then termed Openness to Experience and supposedly reflected the ways in which people deal with novel events, such as fantasy, aesthetics, feelings and actions.
They then decided, quite arbitrarily, that each of these three factors should be measured by six aspects of behaviour that they termed ‘facets’. These represent lower level forms of behaviour which, taken together, define the factor. For example, ‘angry hostility’ is one facet of Neuroticism – it could be assessed by expressions such as ‘I am quick to anger’, ‘it doesn’t take a lot to make me mad’ and so on. In other words, Costa and McCrae developed a hierarchical model rather like that shown in Figure 5.6. The personality items are at the bottom of the hierarchy. The first-order factors represent the facets, and the second-order variables represent the personality factors.
There are some obvious problems with this work. First, there is no earthly reason why there should be precisely six facets per factor. This seems to be a completely arbitrary decision which does not flow from the analysis of the data. Factor analysis might show that some factors have only four facets whilst others have nine; it is surely an empirical issue rather than one to be decided by fiat. Second, recall that Costa and McCrae began by analysing the scales in the 16PF.
Earlier on we mentioned that there is excellent evidence that the items in the 16PF do not form 16 scales, and so it is nonsensical to score the test as Cattell recommended. Doing so will result in scale scores which are mixtures of quite different traits. Correlating and factor analysing these scale scores thus makes little sense. It is far better to factor analyse items, or clusters of items which are near synonyms, which is what Goldberg (1990) did.
Table 6.3 Costa and McCrae’s five personality factors

| Trait | Description of someone scoring high |
|---|---|
| Openness | Imaginative, moved by art, emotionally sensitive, novelty seeker, tolerant |
| Conscientiousness | Competent, orderly, dutiful, motivated to achieve, self-disciplined, thinks before acting |
| Extraversion | Warm, gregarious, assertive, active, excitement seeker, positive emotions |
| Agreeableness | Trusting, straightforward, altruistic, cooperative, modest, tender minded |
| Neuroticism | Anxious, angry, hostile, depressed, self-conscious, impulsive, vulnerable |
Third, even if the 16PF had formed scales as Cattell suggested there would still be problems. Costa and McCrae correlated and factor analysed the scores on each of Cattell’s 16 scales of the 16PF (A, B, C, E, etc.). These scores correspond to the first-order factors in Figure 5.6; Costa and McCrae were essentially analysing the correlations between Cattell’s 16 first-order factors, and not the correlations between the 184 items in the 16PF. They were looking for higher-order factors, although they never made this clear. Finally, it should be noted that the items used to measure the facets were often very similar in meaning – near synonyms, in fact. There is nothing wrong with this, as long as one does not try to interpret scores on the facet as if it were a source trait; the problem is that some authors do just this (e.g., Walton et al., 2018).
Undeterred by all this, Costa and McCrae developed 6 × 3 = 18 sets of eight items, which eventually managed to form factors much as they were designed to. However, by this stage all attempts to keep to a lexical sampling model had been abandoned – the items were designed to measure a particular set of facets and factors. Two further factors were later added to this three-factor model in order to allow Goldberg’s two factors of ‘Agreeableness’ and ‘Conscientiousness’ to be measured, although some violence was done to Goldberg’s five-factor model in the process. No one argued about the basic factors of Extraversion and Neuroticism, but Goldberg’s third factor (previously identified as ‘intellect’) was altered so that it more closely resembled ‘openness to experience’. The five factors shown in Table 6.3 (acronym OCEAN) constitute the currently popular five-factor model of personality, whose factors are usually measured using the NEO-PI(R) questionnaire (Costa and McCrae, 1992b) or Goldberg’s free alternative at http://ipip.ori.org.
Considerable work has now gone into researching the NEO-PI(R) factors, generally by:
• establishing their existence in a variety of languages with varying degrees of success. Early work seemed to show that the factors appeared in several cultures (McCrae and Costa, 1997) but more recently De Raad et al. (2010) show that only Extraversion, Agreeableness and Conscientiousness seem to occur in the 12 languages they examined, and there seems to be little evidence for the model in largely-illiterate Bolivians (Gurven et al., 2013). This is important, as if the five factors are truly source traits which represent the ways that humans in general behave, they should appear in all cultures.
• correlating them with other questionnaires.
• determining whether scores on the traits predict behaviour; for example, sexual behaviour (Allen and Walter, 2018).
• examining the origins of these factors and facets; that is whether they are influenced by genes, family environment or extra-familial influences (Jang et al., 1998).
• checking the structure of the test by factor analysing or correlating the items or facets.
Facets that supposedly belong to quite different factors can turn out to be quite highly correlated, as shown in the NEO-PI(R) manual. For example, assertiveness (a facet of Extraversion) correlates –0.42 with self-consciousness (a facet of Neuroticism). This correlation is actually larger than four out of the five correlations between assertiveness and the other facets of Extraversion! And this is not an isolated example. A related problem is that some of the five factors are themselves highly correlated. Block (1995) mentions a correlation of –0.61 between Neuroticism and Conscientiousness. This paper is well worth reading if one is at all sceptical about the merits of the five-factor model. In fairness I should mention that some studies find that the correlations between the facets do fit the five-factor model as expected (e.g., Aluja et al., 2005).
Variants of the five-factor model
HEXACO
The HEXACO model of personality sprang from Michael Ashton’s discovery that in cross-cultural studies of the lexical basis of personality, six (rather than five) factors often emerged (Ashton et al., 2004b), and that an honesty-humility factor may need to be added to the Big Five traits. In addition, some of the facets seemed to differ. The HEXACO model built on these cross-cultural studies by measuring those traits and facets which were best replicated across cultures. Thus although most of the HEXACO traits have similar names to the Big Five traits, there are usually differences in their makeup, as the facets are different. The four facets that define each trait are those which appear most consistently across cultures. Lee and Ashton (2004) describe the origin and structure of the HEXACO Personality Inventory (HEXACO-PI) in some detail; the revised version of this inventory can be downloaded free (for academic use) from http://hexaco.org. It measures Honesty/humility, Emotionality, Extraversion, Agreeableness, Conscientiousness, and Openness. The HEXACO model is becoming popular, with good reason. Its factor structure emerges cleanly and consistently (Lee and Ashton, 2018), and it shows predictive power in some real-life settings (e.g., Anglim et al., 2018; Burtaverde et al., 2017).
Exercise
Spend 15 minutes measuring your own personality at http://hexaco.org/hexaco-online to see what a modern personality test looks like. You may want to try to work out which items belong to the six HEXACO scales as you take the test. Do you agree with the results? If you feel comfortable doing so, ask a close friend to rate your personality using the rating form of the questionnaire on the same site. Are the results similar to the self-report?
Table 6.4 The six main HEXACO factors.
Scale | Brief description | Facets
Honesty-Humility | Uninterested in manipulating others, wealth or social status | Sincerity, fairness, greed-avoidance, modesty
Emotionality | Similar to Neuroticism; anxious, fearful and empathetic | Fearfulness, anxiety, dependence, sentimentality
eXtraversion | Sociable, confident and enthusiastic | Self-esteem, social boldness, sociability, liveliness
Agreeableness | Forgiving, compromising and without a quick temper | Forgivingness, gentleness, flexibility, patience
Conscientiousness | Careful, thoughtful, well-organised and disciplined | Organisation, diligence, perfectionism, prudence
Openness | Imaginative, inquisitive, curious and unconventional | Aesthetic appreciation, inquisitiveness, creativity, unconventionality
The scales measured by the HEXACO model are shown in Table 6.4.
Zuckerman’s alternative Big Five model
Marvin Zuckerman has long been interested in sensation seeking as an aspect of personality and developed the Zuckerman-Kuhlman Personality Questionnaire (Zuckerman, 2002) to measure this and other personality traits. Rather than trying to sample the whole range of trait descriptors (which would ensure that all traits were identified, if the assumptions of the lexical model are correct) Zuckerman based his questionnaire on the analysis of items from 46 published scales including his own Sensation Seeking Scale. So it is perhaps unsurprising that a sensation seeking factor (‘Impulsive Sensation Seeking’) emerged. Neuroticism appeared much as expected, as did Sociability (Extraversion). Impulsive Sensation Seeking was the virtual opposite of Conscientiousness and Aggression-Hostility proved to be very highly (negatively) correlated with Agreeableness. The only real difference between the models was in the fifth factor: Zuckerman’s factor of Activity did not resemble Openness to Experience in the NEO-PI(R). This model is not widely used; it is mentioned in case readers encounter it elsewhere.
Cloninger’s model
Cloninger’s model of personality was based on a model of how people adapt to novelty, reward and punishment. It is highly influential within psychiatry (Cloninger, 1987; Cloninger et al., 1993) measuring seven traits, viz. novelty seeking, persistence, harm avoidance, reward dependence, self-directedness, self-transcendence and cooperativeness. De Fruyt et al. (2000) show that each of these correlates at least 0.4 with one of the NEO-PI(R) scales, and that a combination of NEO scales could predict some of the scores on the Cloninger scales, although it is clear that the two systems are not equivalent. This is perhaps hardly surprising given their different origins; Cloninger’s traits sprang from theoretical and clinical considerations, rather than an attempt to discover source traits. Again, this model of personality is rarely used by psychologists who prefer to base their inferences on non-clinical groups.
The work of Eysenck and Gray
Whereas Cattell and the Five Factor theorists’ approach to exploring broad personality traits was data-driven – a ‘bottom-up’ process relying on statistical analyses rather than theory – Hans Eysenck advocated a ‘top-down’ method of analysis in which likely personality traits were identified from the clinical and experimental literature. He spent 50 years investigating the three main aspects of personality, namely Introversion vs Extraversion, Neuroticism vs emotional stability, and Psychoticism (or tough mindedness) vs tender mindedness. The first two terms, in particular, have a long history. Eysenck and Eysenck (1985) have shown that these appeared in the 18th and 19th century and we noted in Chapter 4 that Extraversion featured in the writings of Carl Jung during the 1920s. They are of course currently aspects of the various five-factor models. The basic premise of all these writers is that there are two basic dimensions of personality, rather as shown in Figure 6.1. The vertical line in Figure 6.1 represents the personality dimension of emotional instability, or ‘Neuroticism’. Highly neurotic individuals are moody, touchy and anxious, whereas those low on Neuroticism are relaxed, even tempered and calm. The second dimension is that of Extraversion. Extraverts are hearty, sociable, talkative and optimistic individuals – the types who insist on talking to you on train journeys. Introverts are reserved, pessimistic and keep themselves to themselves. The fact that these two lines are at right angles suggests that these two dimensions of personality are independent (uncorrelated). Figure 6.1 also shows the four ‘Galen types’ (melancholic, choleric, phlegmatic and sanguine) relative to these axes and around the outside are descriptions of behaviours that might be observed from various combinations of these dimensions.
Figure 6.1 Personality dimensions of Neuroticism and Extraversion (reproduced from Nyborg, 1997, p. 18, with permission). The vertical axis runs from UNSTABLE (top) to STABLE (bottom) and the horizontal axis from INTROVERTED (left) to EXTRAVERTED (right). The adjectives around the outside fall into four quadrants:
Melancholic (unstable, introverted): moody, anxious, rigid, sober, pessimistic, reserved, unsociable, quiet
Choleric (unstable, extraverted): touchy, restless, aggressive, excitable, changeable, impulsive, optimistic, active
Phlegmatic (stable, introverted): passive, careful, thoughtful, peaceful, controlled, reliable, even-tempered, calm
Sanguine (stable, extraverted): sociable, outgoing, talkative, responsive, easygoing, lively, carefree
Thus someone who is neither particularly stable nor unstable, but very introverted, might be described as quiet or passive, a stable extravert might be described as responsive and easygoing, and so on. Figure 6.1 does not show Eysenck’s third major dimension of personality, which was added to the theory in 1976. Eysenck’s dimensions of Extraversion and Neuroticism could not differentiate between people with schizophrenia and those with no psychiatric diagnosis. A third dimension, Psychoticism, was introduced to the model in an attempt to achieve this. Individuals scoring high on Psychoticism would be emotionally cold, cruel, risk taking, manipulative and impulsive. They are more likely than low scorers to develop schizophrenia (Chapman et al., 1994). Those scoring low on Psychoticism would be warm, socialised individuals. Psychoticism is thought to be largely uncorrelated with either Extraversion or Neuroticism and so Eysenck’s model of personality can thus be represented by three personality dimensions at right angles to each other – Extraversion, Neuroticism and Psychoticism. These traits are measured using the Revised Eysenck Personality Questionnaire (EPQ-R) or an alternative at http://ipip.ori.org. Unlike some of the other questionnaires considered earlier, the items in Eysenck’s scales do form three clear factors precisely in accordance with expectations (Barrett and Kline, 1982b; Eysenck et al., 1985).
Eysenck has thus concentrated on two of the personality traits that emerge from the five-factor model together with a dimension of Psychoticism that he finds to be appreciably negatively correlated with Costa and McCrae’s ‘Agreeableness’ and ‘Conscientiousness’. According to Eysenck (1992) Goldberg found a correlation of –0.85 between Psychoticism and these two measures (combined), indicating that (low) Agreeableness and Conscientiousness may well be components of Psychoticism, rather than factors in their own right; Heaven et al. (2013) also found these variables to be substantially correlated.
Jeffrey Gray devised a personality theory which was very similar to Eysenck’s, except that the factors representing Extraversion and Neuroticism were rotated through 45 degrees in Figure 6.1. The axis passing from ‘peaceful’ to ‘excitable’ was known as ‘impulsivity’, and the one from ‘responsive’ to ‘sober’ was termed anxiety. Gray’s work was based on animal models of reward and punishment and is important because it led to reinforcement sensitivity theory, discussed in Chapter 9.
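Gray’s 45-degree rotation can be made concrete with a little arithmetic. The sketch below is purely illustrative and is not Gray’s own scoring method: it assumes that Extraversion and Neuroticism are expressed as uncorrelated standard scores, and it takes the direction of each rotated axis from the adjectives quoted above (‘peaceful’ to ‘excitable’ for impulsivity, ‘responsive’ to ‘sober’ for anxiety). The function name `gray_axes` and the example scores are hypothetical.

```python
import math


def gray_axes(extraversion, neuroticism):
    """Rotate (Extraversion, Neuroticism) standard scores through 45 degrees.

    Assumed sign convention, read off Figure 6.1:
      impulsivity runs from stable introvert to unstable extravert,
      anxiety runs from stable extravert to unstable introvert.
    """
    root2 = math.sqrt(2.0)
    impulsivity = (extraversion + neuroticism) / root2
    anxiety = (neuroticism - extraversion) / root2
    return impulsivity, anxiety


# Hypothetical people; scores are in standard-deviation units.
people = {
    "unstable extravert": (1.0, 1.0),
    "unstable introvert": (-1.0, 1.0),
    "stable extravert": (1.0, -1.0),
}

for label, (e, n) in people.items():
    impulsivity, anxiety = gray_axes(e, n)
    print(f"{label}: impulsivity={impulsivity:+.2f}, anxiety={anxiety:+.2f}")
```

Because this is a rigid rotation, no information is gained or lost: each person sits at the same distance from the origin in both systems, and only the axes used to describe their position change.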
A general factor of personality?
Given that the five factors of the Big Five model are somewhat correlated, some researchers have suggested that the correlations between these factors should themselves be factor analysed to produce a ‘higher order’ factor of personality, rather as described in Chapter 5. Musek (2007) found that such a factor consisted of a mixture of Extraversion, Agreeableness, Conscientiousness and Openness to Experience, together with low Neuroticism. But what does it mean? One possibility is that it measures social desirability (the tendency to present oneself in a good light when completing personality questionnaires) as high scores on the general factor (high Extraversion, etc.) are socially valued. However the correlation between scores on the general factor and a scale measuring social desirability is only in the order of 0.2 to 0.25 (Schermer and Vernon, 2010), suggesting that the general factor is not just social desirability. Instead it has been suggested that the general factor may reflect ‘social effectiveness’ (van der Linden et al., 2016) by which is meant making oneself popular and likeable. This paper is well worth reading for its review of the literature in the area. Given that there seems to be evidence linking Eysenck’s Psychoticism factor to low Conscientiousness and low Agreeableness, it might be worthwhile exploring whether the general factor of personality corresponds to low Psychoticism in Eysenck’s model of personality. However this does not seem to have been attempted yet. Although research on the general factor of personality is currently popular, the exact meaning of the factor is still somewhat unclear, and it is not entirely obvious that the same general factor of personality is found when different questionnaires are analysed. However the literature here gets somewhat technical, as Revelle and Wilt (2013) have identified a number of technical and conceptual problems with analysing and interpreting the general factor of personality. It is best described as ‘work in progress’.
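To see how a general factor can arise from correlated factors, consider the rough sketch below. It is not a reanalysis of any published data: the correlation matrix is invented for illustration, and the first principal component (found here by simple power iteration) is used as a convenient stand-in for a proper higher-order factor analysis. The names `R` and `first_component` are hypothetical.

```python
import math

# Invented correlations between the five factors (illustration only).
# Order: Extraversion, Agreeableness, Conscientiousness, Openness, Neuroticism.
NAMES = ["E", "A", "C", "O", "N"]
R = [
    [1.00, 0.25, 0.25, 0.25, -0.30],
    [0.25, 1.00, 0.25, 0.25, -0.30],
    [0.25, 0.25, 1.00, 0.25, -0.30],
    [0.25, 0.25, 0.25, 1.00, -0.30],
    [-0.30, -0.30, -0.30, -0.30, 1.00],
]


def first_component(matrix, iterations=200):
    """Return (eigenvalue, loadings) for the largest principal component.

    Power iteration: repeatedly multiply a vector by the matrix and
    rescale it to unit length until it settles on the dominant eigenvector.
    """
    size = len(matrix)
    vector = [1.0] * size
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(size))
                   for i in range(size)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # Rayleigh quotient of a unit-length vector gives its eigenvalue.
    eigenvalue = sum(
        vector[i] * sum(matrix[i][j] * vector[j] for j in range(size))
        for i in range(size)
    )
    if vector[0] < 0:  # the overall sign is arbitrary; make E positive
        vector = [-x for x in vector]
    loadings = [x * math.sqrt(eigenvalue) for x in vector]
    return eigenvalue, loadings


eigenvalue, loadings = first_component(R)
print(f"eigenvalue = {eigenvalue:.2f}")
for name, loading in zip(NAMES, loadings):
    print(f"{name}: {loading:+.2f}")
```

With these made-up correlations the four ‘desirable’ factors load positively and Neuroticism loads negatively on the first component, which is the pattern described above. The example says nothing about what such a component means psychologically.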
Dissenting voices
Before we leave this chapter, it is necessary to consider some criticisms that have been levelled against all types of trait theory. Mischel (1968) and other social psychologists have argued that it is situations, not traits, that determine behaviour. They argue that although we choose to believe that individuals behave in consistent ways (corresponding to personality traits), such beliefs are at odds with the data. However, three pieces of evidence show that this view is untenable.
• If personality were determined purely by situations, personality traits simply would not exist and so measures of these traits (such as scores on personality questionnaires) would be unable to predict any aspects of behaviour. However plenty of evidence shows that personality traits can predict a range of behaviours (e.g., Paunonen, 2003; Graham et al., 2017).
• If personality were determined by situations, then personality could not be influenced by our genetic makeup. Several traits are, as described in Chapter 15.
• Finally, studies such as those by Conley (1984) and Locke et al. (2017) show a considerable consistency in behaviour over time and/or situations. Indeed, Conley has shown that after correcting for measurement error, the consistency of personality traits over time is of the order of 0.98.
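The phrase ‘correcting for measurement error’ in the last point can be illustrated with the classical correction for attenuation, in which an observed correlation is divided by the square root of the product of the two reliabilities. The sketch below assumes that this standard formula is the kind of correction meant; the figures are invented for illustration and are not Conley’s data, and his analysis may well have differed in detail. The function name `disattenuate` is hypothetical.

```python
import math


def disattenuate(observed_r, reliability_x, reliability_y):
    """Classical correction for attenuation.

    Estimates the correlation between two error-free scores from the
    observed correlation and the reliability of each measure.
    """
    return observed_r / math.sqrt(reliability_x * reliability_y)


# Invented example: a retest correlation of 0.72 on a scale whose
# reliability is 0.80 on both occasions.
corrected = disattenuate(0.72, 0.80, 0.80)
print(round(corrected, 2))
```

The point of the example is simply that an unreliable questionnaire makes a trait look less stable than it really is, so quite modest observed correlations can correspond to very high consistency once measurement error is allowed for.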
Self-assessment question 6.4
Using the data in Table 6.5, plot a graph showing how each individual’s level of anxiety varies from one situation to another. What would you expect this graph to look like if:
a. situations, not personality, determine behaviour
b. personality, not situations, determines behaviour
c. behaviour is determined by the interaction of situation and personality?
Which of these best resembles your graph?
Other theories such as learning theory and social learning theory have been proposed as alternatives to traits, for if our actions are determined by our reinforcement history and/or the behaviour of our role models, then personality traits certainly cannot be said to cause behaviour. One way of addressing this issue is to consider the effects of such processes on personality traits. If role modelling and reinforcement techniques are all important in determining how a child or adult behaves, one would expect brothers and sisters to have similar personalities, by virtue of their having been brought up according to the same family ethos. To anticipate Chapter 15, the influence of this ‘shared environment’ on children’s personalities is minuscule and there is enormous variation in personality within families. This makes it unlikely that such learning theories will be of value as explanations for behaviour.
A third criticism of trait psychology comes from social psychologists with an interest in the ‘construction process’ (Hampson, 1997). Rather than viewing personality as a characteristic of the individual, these theorists delve into the processes whereby personality attributions are made – that is, how behaviours are identified, categorised and attributed to either the person (‘John is a snappy person’) or the situation (‘John is under stress’). Personally, I have no problem with this work, except that it is social psychology rather than anything to do with individual differences! For if we define traits as being real, internal characteristics of the individual, then it matters not one jot how others view them. Studying the social construction process may be useful in fostering awareness of some measurement problems inherent in personality questionnaires, but once again the acid question is whether or not scores on personality questionnaires can predict anything. If they can, then it seems to me that there is more to personality than this.
Table 6.5 Hypothetical anxiety scores of three people in five situations
Person | In lectures | Coffee with friends | In a statistics practical class | Watching a horror film | Reading Dickens
Person 1 | 8 | 7 | 18 | 14 | 6
Person 2 | 3 | 2 | 16 | 4 | 2
Person 3 | 7 | 8 | 17 | 8 | 7
Summary
This chapter has covered a lot of ground and has outlined the most promising approaches to the measurement of personality. You should now be able to define traits and discuss the problem of ‘circularity’ in using traits to describe behaviour, and discuss the logic of using the lexical hypothesis for sampling personality traits from the language – you may have your own views on the adequacy of this method. Five-factor models are currently hugely popular, although I have far more regard for Goldberg’s model and the HEXACO six-factor model than Costa and McCrae’s work, as the former two approaches involved fewer ad-hoc modifications of the model. The ‘take home’ message from this chapter is that if you want to measure a person’s personality, you should use a five- or six-factor model – perhaps the HEXACO model, whose questionnaire is free to download and use from hexaco.org, and which has been widely used and translated. There is general agreement that Extraversion and Neuroticism are two important aspects of personality. Conscientiousness and Agreeableness are also found consistently across cultures, although Eysenck argues that these might be better represented as components of Psychoticism. The status of Intellect (or ‘Openness’, or ‘Culture’, depending which five-factor model one follows) is far less clear. Research into the general factor of personality is currently popular, though there is less-than-perfect consensus about what the factor means, which is a problem. You may also wish to consider whether the totally atheoretical approach to discovering personality source traits outlined in this chapter is entirely adequate, or whether (and why?) there are any important personality traits which do not seem to be captured by the factor models discussed here. Finally, you should be familiar with some criticisms of trait theory that are sometimes levelled by social psychologists and learning theorists.
Suggested additional reading
The best solution would be to consult the papers and books cited in the text, perform a literature search, browse the journals Personality and Individual Differences, the Personality section of Journal of Personality and Social Psychology, the Journal of Research in Personality, the European Journal of Personality, or Journal of Personality, or visit William Revelle’s excellent website, the ‘Personality Project’ (https://personality-project.org) at Northwestern University. Old papers by Costa and McCrae and Eysenck (Costa and McCrae, 1992a; Eysenck, 1992) are still well worth reading for two conflicting accounts of the advantages of the five-factor model over the three-factor model, while Matthews et al. (2009) provide an excellent account of the nature and background of trait theories of personality.
Answers to self-assessment questions
SAQ 6.1 First, there are a great many adjectives in the language and so it is difficult to know how to choose between them. Second, if different investigators happen to choose different samples of words, it is going to be difficult to ascertain whether their tests actually measure the same underlying psychological dimensions of personality. Third, it is difficult to ensure that all potential personality traits have been examined – that is, that the model is comprehensive. Finally, it is necessary to show that any concept (trait) measured by a set of adjectives is broader in scope than the adjectives that are used to measure it, before it can be used as an explanation for behaviour.
SAQ 6.2 There are two possibilities, both of which require that the same large (n > 100) sample of people have their behaviour assessed by raters and fill in the questionnaire. One can either correlate the two measures together or factor analyse the correlations between all of the items (both ratings and questionnaire items). Factoring is the better approach, as this will also identify any items that fail to measure the trait(s) in question.
SAQ 6.3 The factor analysis shows which items measure the same underlying trait. Suppose that a ten-item test is factored, and the analysis shows that items 1 to 5 load on one factor and items 6 to 10 load on another quite different factor. Obviously the test should be scored so that items 1 to 5 form one scale and items 6 to 10 form another. If, by way of contrast, the test handbook recommends adding together scores on items 1, 2, 3, 7, 8 and 9, the resulting total score will be a mixture of two quite different traits – and quite meaningless. (Consider, for example, how you would interpret a person’s score from a test in which five of the items measured nervousness and five items measured vocabulary – it is impossible.)
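The scoring logic in SAQ 6.3 can be sketched in a few lines. Everything in the example is hypothetical: the loadings are invented to match the pattern described in the answer, the 0.4 cut-off is a commonly used but arbitrary convention, and the function names are made up for illustration. All of the invented loadings are positive, so no reverse-scoring is needed here.

```python
# Invented factor loadings for a ten-item test: (factor 1, factor 2).
# Items 1-5 load on factor 1 and items 6-10 on factor 2.
LOADINGS = {
    1: (0.71, 0.05), 2: (0.65, 0.10), 3: (0.58, -0.02), 4: (0.62, 0.08),
    5: (0.55, 0.12), 6: (0.04, 0.68), 7: (0.09, 0.61), 8: (-0.03, 0.73),
    9: (0.11, 0.57), 10: (0.06, 0.66),
}


def build_scales(loadings, threshold=0.4):
    """Assign each item to the factor on which it loads most strongly.

    Items whose largest absolute loading is below the threshold are
    left out, since they do not clearly measure either trait.
    """
    scales = {}
    for item, item_loadings in loadings.items():
        strongest = max(range(len(item_loadings)),
                        key=lambda k: abs(item_loadings[k]))
        if abs(item_loadings[strongest]) >= threshold:
            scales.setdefault(strongest + 1, []).append(item)
    return scales


def score(responses, items):
    """Sum one person's responses over the items that form a scale."""
    return sum(responses[item] for item in items)


scales = build_scales(LOADINGS)

# A hypothetical person who endorses every factor-1 item and no factor-2 item.
responses = {item: 1 for item in range(1, 6)}
responses.update({item: 0 for item in range(6, 11)})

print(scales)
print(score(responses, scales[1]), score(responses, scales[2]))
print(score(responses, [1, 2, 3, 7, 8, 9]))  # the handbook's mixed scale
```

Scored correctly, this person is at the top of one scale and the bottom of the other. The mixed scale gives a middling total that could equally have come from someone with the opposite pattern, which is why it cannot be interpreted.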
SAQ 6.4
a. Very jagged – each person’s score would vary enormously from situation to situation, and there would be no differences in the average score of each person (computed across situations).
b. The graphs would be entirely flat – people would yield precisely the same score in every situation.
c. A mixture of the two.
Your graph is best described by (c).
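For readers who would like to check this answer numerically rather than by eye, the sketch below works through the Table 6.5 scores. It assumes that the table is read row by row, so that each row is one person and the columns follow the order in which the situations are listed; the person labels are not part of the original data. The ‘residual’ is simply what remains of each score once that person’s average and that situation’s average have been taken into account.

```python
# Scores from Table 6.5 (assumed layout: one row per person, columns in
# the order the situations are listed in the table).
SITUATIONS = ["In lectures", "Coffee with friends", "Statistics practical",
              "Horror film", "Reading Dickens"]
SCORES = [
    [8, 7, 18, 14, 6],   # first person
    [3, 2, 16, 4, 2],    # second person
    [7, 8, 17, 8, 7],    # third person
]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


person_means = [mean(row) for row in SCORES]
situation_means = [mean(column) for column in zip(*SCORES)]
grand_mean = mean([value for row in SCORES for value in row])

# Residual: what is left after removing the person and situation effects.
residuals = [
    [value - person_means[p] - situation_means[s] + grand_mean
     for s, value in enumerate(row)]
    for p, row in enumerate(SCORES)
]

print("Person means:", [round(m, 2) for m in person_means])
for name, m in zip(SITUATIONS, situation_means):
    print(f"{name}: {m:.2f}")
for p, row in enumerate(residuals, start=1):
    print(f"Person {p} residuals:", [round(r, 2) for r in row])
```

The person means differ, so the pattern is not (a). The situation means differ too, so it is not (b). Some residuals are also clearly non-zero (the first person’s score for the horror film, for instance), which is what an interaction between person and situation looks like.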
References
Allen, M. S. & Walter, E. E. 2018. Linking Big Five Personality Traits to Sexuality and Sexual Health: A Meta-Analytic Review. Psychological Bulletin, 144, 1081–1110.
Allport, G. W. & Odbert, H. S. 1936. Trait Names: A Psycho-Lexical Study. Psychological Monographs, 47, whole issue.
Aluja, A., García, O., García, L. F. & Seisdedos, N. S. 2005. Invariance of the ‘NEO-PI(R)’ Factor Structure across Exploratory and Confirmatory Factor Analyses. Personality and Individual Differences, 38, 1879–1889.
Anglim, J., Lievens, F., Everton, L., Grant, S. L. & Marty, A. 2018. HEXACO Personality Predicts Counterproductive Work Behavior and Organizational Citizenship Behavior in Low-Stakes and Job Applicant Contexts. Journal of Research in Personality, 77, 11–20.
Ashton, M. C., Lee, K. & Goldberg, L. R. 2004a. A Hierarchical Analysis of 1,710 English Personality-Descriptive Adjectives. Journal of Personality and Social Psychology, 87, 707–721.
Ashton, M. C., Lee, K., Perugini, M., Szarota, P., De Vries, R. E., Di Blas, L., Boies, K. & De Raad, B. 2004b. A Six-Factor Structure of Personality-Descriptive Adjectives: Solutions from Psycholexical Studies in Seven Languages. Journal of Personality and Social Psychology, 86, 356–366.
Barrett, P. & Kline, P. 1982a. An Item and Radial Parcel Factor Analysis of the 16PF Questionnaire. Personality and Individual Differences, 3, 259–270.
Barrett, P. & Kline, P. 1982b. Personality Factors in the Eysenck Personality Questionnaire. Personality and Individual Differences, 1, 317–333.
Block, J. 1995. A Contrarian View of the Five-Factor Approach to Personality Description. Psychological Bulletin, 117, 187–215.
Burtaverde, V., Chraif, M., Anitei, M. & Dumitru, D. 2017. The HEXACO Model of Personality and Risky Driving Behavior. Psychological Reports, 120, 255–270.
Buss, D. M. & Craik, K. H. 1983. The Act Frequency Approach to Personality. Psychological Review, 90, 105–126.
Byravan, A. & Ramanaiah, N. V. 1995. Structure of the 16PF 5-Ed from the Perspective of the Five-Factor Model. Psychological Reports, 76, 555–560.
Cattell, R. B. 1946. Description and Measurement of Personality. New York: World Book Company.
Cattell, R. B. 1957. Personality and Motivation Structure and Measurement. Yonkers: New World.
Cattell, R. B. 1973. Personality and Mood by Questionnaire. San Francisco: Jossey-Bass.
Cattell, R. B. 1986. The 16 PF Personality Structure and Dr. Eysenck. Journal of Social Behavior and Personality, 1, 153–160.
Cattell, R. B. & Cattell, H. E. P. 1995. Personality Structure and the New 5th-Edition of the 16PF. Educational and Psychological Measurement, 55, 926–937.
Cattell, R. B. & Eber, H. W. 1954. The IPAT Music Test of Personality. Champaign, IL: Institute for Personality and Ability Testing.
Cattell, R. B., Eber, H. W. & Tatsuoka, M. M. 1970. Handbook for the Sixteen Personality Factor Questionnaire. Champaign, IL: IPAT.
Cattell, R. B. & King, J. 2000. 16PF Industrial Version (Revised). Henley-on-Thames: The Test Agency.
Cattell, R. B. & Kline, P. 1977. The Scientific Analysis of Personality and Motivation. London: Academic Press.
Cattell, R. B. & Krug, S. E. 1986. The Number of Factors in the 16PF: A Review of the Evidence with Special Emphasis on Methodological Problems. Educational and Psychological Measurement, 46, 509–522.
Cattell, R. B. & Schuerger, J. M. 1978. Personality Theory in Action. Champaign, IL: IPAT.
Cattell, R. B. & Tollefson, D. L. 1966. The IPAT Humor Test of Personality. Champaign, IL: Institute for Personality and Ability Testing.
Chapman, J. P., Chapman, L. J. & Kwapil, T. R. 1994. Does the Eysenck Psychoticism Scale Predict Psychosis? – a 10-Year Longitudinal-Study. Personality and Individual Differences, 17, 369–375.
Cloninger, C. R. 1987. A Systematic Method for Clinical Description and Classification of Personality Variants – A Proposal. Archives of General Psychiatry, 44, 573–588.
Cloninger, C. R., Svrakic, D. M. & Przybeck, T. R. 1993. A Psychobiological Model of Temperament and Character. Archives of General Psychiatry, 50, 975–990.
Conley, J. J. 1984. The Hierarchy of Consistency: A Review and Model of Longitudinal Findings on Adult Individual Differences in Intelligence, Personality and Self-Opinion. Personality and Individual Differences, 5, 11–25.
Cooper, C. 2019. Pitfalls of Personality Theory. Personality and Individual Differences, 151, Article no. 109551.
Costa, P. T. & McCrae, R. R. 1976. Age Differences in Personality Structure: A Cluster-Analytic Approach. Journal of Gerontology, 31, 564–570.
Costa, P. T. & McCrae, R. R. 1978. Objective Personality Assessment. In M. Storandt, I. C. Siegler & M. F. Elias (eds.) The Clinical Psychology of Aging. New York: Plenum.
Costa, P. T. & McCrae, R. R. 1992a. 4 Ways 5 Factors Are Basic. Personality and Individual Differences, 13, 653–665.
Costa, P. T. & McCrae, R. R. 1992b. NEO-PI(R) Professional Manual. Odessa, FL: Psychological Assessment Resources.
De Fruyt, F., Van De Wiele, L. & Van Heeringen, C. 2000. Cloninger’s Psychobiological Model of Temperament and Character and the Five-Factor Model of Personality. Personality and Individual Differences, 29, 441–452.
De Raad, B., Barelds, D. P. H., Levert, E., Ostendorf, F., Mlacic, B., Blas, L. D., . . . Katigbak, M. S. 2010. Only Three Factors of Personality Description Are Fully Replicable across Languages: A Comparison of 14 Trait Taxonomies. Journal of Personality and Social Psychology, 98, 160–173.
Eysenck, H. J. 1992. 4 Ways 5 Factors Are Not Basic. Personality and Individual Differences, 13, 667–673.
Eysenck, H. J. & Eysenck, M. W. 1985. Personality and Individual Differences. New York: Plenum Press.
Eysenck, S. B., Eysenck, H. J. & Barrett, P. 1985. A Revised Version of the Psychoticism Scale. Personality and Individual Differences, 6, 21–29.
Goldberg, L. R. 1990. An Alternative ‘Description of Personality’: The Big-Five Structure. Journal of Personality and Social Psychology, 59, 1216–1229.
Graham, E. K., Rutsohn, J. P., Turiano, N. A., Bendayan, R., Batterham, P. J., Gerstorf, D., . . . Mroczek, D. K. 2017. Personality Predicts Mortality Risk: An Integrative Data Analysis of 15 International Longitudinal Studies. Journal of Research in Personality, 70, 174–186.
Gurven, M., Von Rueden, C., Massenkoff, M., Kaplan, H. & Vie, M. L. 2013. How Universal Is the Big Five? Testing the Five-Factor Model of Personality Variation among Forager-Farmers in the Bolivian Amazon. Journal of Personality and Social Psychology, 104, 354–370.
Hampson, S. 1997. The Social Psychology of Personality. In C. Cooper & V. Varma (eds.) Processes in Individual Differences. London: Routledge.
Heaven, P. C. L., Ciarrochi, J., Leeson, P. & Barkus, E. 2013. Agreeableness, Conscientiousness, and Psychoticism: Distinctive Influences of Three Personality Dimensions in Adolescence. British Journal of Psychology, 104, 481–494.
Howarth, E. 1976. Were Cattell’s Personality Sphere Factors Correctly Identified in the First Instance? British Journal of Psychology, 67, 213–230.
Jang, K. L., McCrae, R. R., Angleitner, A., Riemann, R. & Livesley, W. J. 1998. Heritability of Facet-Level Traits in a Cross-Cultural Twin Sample: Support for a Hierarchical Model of Personality. Journal of Personality and Social Psychology, 74, 1556–1565.
Kline, P. 1973. The IPAT Music Preference Test of Personality in Great Britain: A Study of Validity. British Journal of Projective Psychology & Personality Study, 18, 27–29.
Kline, P. & Cooper, C. 1984. A Construct Validation of the Objective-Analytic Test Battery (OATB). Personality and Individual Differences, 5, 323–337.
Lassiter, G. D. & Briggs, M. A. 1990. Effect of Anticipated Interaction on Liking – An Individual Difference Analysis. Journal of Social Behavior and Personality, 5, 357–367.
Lee, K. & Ashton, M. C. 2004. Psychometric Properties of the HEXACO Personality Inventory. Multivariate Behavioral Research, 39, 329–358.
Lee, K. & Ashton, M. C. 2018. Psychometric Properties of the HEXACO-100. Assessment, 25, 543–556.
Locke, K. D., Church, A. T., Mastor, K. A., Curtis, G. J., Sadler, P., Mcdonald, K., . . . Ortiz, F. A. 2017. Cross-Situational Self-Consistency in Nine Cultures: The Importance of Separating Influences of Social Norms and Distinctive Dispositions. Personality and Social Psychology Bulletin, 43, 1033–1049.
Loehlin, J. C. 1965. Some Methodological Problems in Cattell’s Multiple Abstract Variance Analysis. Psychological Review, 72, 156–161.
Matthews, G. 1989. The Factor Structure of the 16PF: 12 Primary and 3 Secondary Factors. Personality and Individual Differences, 10, 931–940.
Matthews, G., Deary, I. J. & Whiteman, M. C. 2009. Personality Traits (3rd Edn). Cambridge: Cambridge University Press.
McCrae, R. R. & Costa, P. T. 1997. Personality Trait Structure as a Human Universal. American Psychologist, 52, 509–516.
Michell, J. 1997. Quantitative Science and the Definition of Measurement in Psychology. British Journal of Psychology, 88, 355–383.
Mischel, W. 1968. Personality and Assessment. New York: Wiley.
Musek, J. 2007. A General Factor of Personality: Evidence for the Big One in the Five-Factor Model. Journal of Research in Personality, 41, 1213–1233.
Norman, W. T. 1967. 2800 Personality Trait Descriptors: Normative Operating Characteristics for a University Population. Ann Arbor: University of Michigan, Department of Psychological Sciences.
Nyborg, H. 1997. The Scientific Study of Human Nature: Tribute to Hans J. Eysenck at Eighty. Oxford: Pergamon.
Passini, F. T. & Norman, W. T. 1966. A Universal Conception of Personality Structure? Journal of Personality and Social Psychology, 4, 44–49.
Paunonen, S. V. 2003. Big Five Factors of Personality and Replicated Predictions of Behavior. Journal of Personality and Social Psychology, 84, 411–424.
Revelle, W. & Wilt, J. 2013. The General Factor of Personality: A General Critique. Journal of Research in Personality, 47, 493–504.
123
Broad trait theories of personality
Schermer, J. A. & Vernon, P. A. 2010. The Correlation between General Intelligence (g), a General Factor of Personality (Gfp), and Social Desirability. Personality and Individual Differences, 48, 187–189. Tupes, E. C. & Christal, R. E. 1961/1992. Recurrent Personality Factors Based on Trait Ratings. Journal of Personality, 60, 225–251. Van Der Linden, D., Dunkel, C. S. & Petrides, K. V. 2016. The General Factor of Personality (Gfp) as Social Effectiveness: Review of the Literature. Personality and Individual Differences, 101, 98–105. Walton, K. E., Pantoja, G. & McDermut, W. 2018. Associations between Lower Order Facets of Personality and Dimensions of Mental Disorder. Journal of Psychopathology and Behavioral Assessment, 40, 465–475. Wells, W. T. & Good, L. R. 1977. Item Factor Structure of the 16PF. Psychology: A Journal of Human Behavior, 14, 53–55. Zuckerman, M. 2002. Zuckerman-Kuhlman Personality Questionnaire (ZKPQ): An Alternative Five-Factorial Model. In B. De Raad & M. Perugini (eds.) Big Five Assessment. Seattle, WA: Hogrefe and Huber.
124
Chapter 7 Emotional intelligence
Introduction
Chapter 6 discussed the broad personality traits which were discovered by exploring the ‘lexical hypothesis’ – the idea that if a personality characteristic exists, then words will have been invented to describe it. Thus broad personality traits should emerge when people’s responses to the various words are factor analysed. It was soon recognised that this approach did not seem to reveal all important aspects of personality. This chapter, and Chapters 9 and 10, explore narrow personality traits – traits which are hypothesised to exist, but which did not seem to appear as broad personality factors in the five-factor (or any other) model.
One apparent gap is ‘emotional intelligence’, or ‘EI’. Whilst each of the personality models considered in Chapter 6 included a factor of ‘neuroticism’, ‘anxiety’ or ‘emotional [in]stability’, these refer to feelings of anxiety, depression and other negative emotions; some people are habitually more anxious/depressed/moody than others. It might also be interesting to discover how people use their emotions. Why are some individuals ‘good with people’ – by which I mean able to identify their emotional state, work out the right thing to say and demonstrate social awareness? Likewise, why do some people fly into passionate rages and appear to be at the mercy of their emotions, while others seem to be able to understand and manage their feelings?
The whole area received a massive burst of publicity in 1995 following the publication of Emotional Intelligence by Daniel Goleman, a journalist with a PhD in psychology (Goleman, 1995). It was written for a popular audience and made several far-reaching claims: for example, the sub-title of the book is ‘why it [emotional intelligence] can matter more than IQ’. Because this work is so well known (and often accepted uncritically by non-psychologists), it is important to outline its merits and its shortcomings.
Learning outcomes
Having read this chapter you should be able to:
• Describe and evaluate Goleman’s contribution, and explain the difference between ability emotional intelligence (ability-EI) and trait-EI.
• Outline Mayer and Salovey’s four-branch model of ability-EI, and discuss experimental evidence which may support the model.
• Discuss how ability-EI may be measured using the MSCEIT, STEM and STEU.
• Consider the overlap between ability-EI and general intelligence (g), and the problems that this presents for researchers.
• Discuss Bar-On’s model, and evaluate the claim that it measures 15 aspects of trait-EI.
• Evaluate Schutte’s, and Petrides and Furnham’s models of trait-EI, and their relationship to other aspects of personality.
Goleman’s contribution
Throughout his book Goleman argues that mainstream psychology has paid insufficient attention to recognising and managing emotions. He argues that being able to recognise one’s own and other people’s emotions, and learning how to regulate one’s own emotional state, is important in understanding why the most successful people are not (he claims) always the most intelligent – i.e., why EI is important in everyday life. He suggests that problems in recognising, regulating and using emotions can have important implications for everyday life (e.g., our dealings with others; high school shootings) and points to the importance for psychological and social wellbeing of controlling impulses and negative feelings such as rage or worry. Only then can we reach ‘flow’ – a condition in which we are able to focus completely on what we are doing, with little awareness of events in the outside world; this produces a feeling of great satisfaction or joy.
Goleman presents a number of case studies to support his thesis that emotional intelligence really matters. These include the burglar who ‘saw red’ and bludgeoned two people to death, the child who is skilled at using several techniques to alter their sibling’s emotional state, and people with no social graces who simply lack the self-insight to recognise when they are boring others or behaving inappropriately. Goleman discusses some literature which shows that the limbic system and amygdala are important in determining how we express our emotions, and that cognitive techniques also modify the nature and strength of the emotions that we experience. For example, we may recognise that we are starting to feel upset, so focus on some happy memory.
All this does not sound very different from Eysenck’s theory of personality.
As outlined in Chapters 6 and 11, Eysenck’s factor of neuroticism describes, at one extreme, people who are at the mercy of their negative feelings, such as worry, guilt, anxiety and depression, and mood swings and, at the other, people who are sanguine, in touch with their inner emotional life and emotionally stable. Is this form of ‘emotional intelligence’ the same as neuroticism? Unfortunately, it is impossible to tell this from reading Goleman’s book, as it does not contain a single reference to any of the researchers who have studied anxiety/neuroticism, such as Eysenck, Cattell, Costa or Goldberg. Neither does Goleman give any hints as to how we should go about
assessing this form of emotional intelligence, which makes the theory difficult to test. Without having an operational definition of emotional intelligence one can only rely on case studies and persuasive argument – neither of which is particularly scientific, as the model and its predictions cannot be shown to be false. The clinical approaches discussed in Chapter 4 showed that although case studies may make interesting reading, these alone cannot always reveal psychological truths. Much the same applies to Goleman’s thesis. Although he argues that emotional intelligence is important, his thesis tends to depend on anecdote rather than sound empirical research with good operational definitions of theoretical concepts.
Goleman also sometimes ignores alternative explanations for phenomena. For example, he cites Mischel et al. (1989) on delay of gratification. Here young children are given the choice of taking one small reward (a marshmallow) immediately or being promised that they will receive two when the experimenter returns in 20 minutes. Goleman argues that delay of gratification is an early attempt at emotional control and so children with higher scores on emotional intelligence will be more likely to opt for two marshmallows. However, there is also a substantial literature relating performance on this task to general intelligence (g). A meta-analysis (Shamosh and Gray, 2008) shows that more intelligent children are more likely to delay gratification. Finding that some children delay gratification while others do not is not, on its own, sound evidence for emotional intelligence. Indeed, one could argue that an emotionally aware child who is well aware of the fickle nature of adults should take the marshmallow immediately, given there is no guarantee that the experimenter will ever return – the very opposite of Goleman’s prediction!
Goleman recounts how children who delay gratification when aged 4 also tend to show higher scores on mental ability tests, are better adjusted as adolescents and are far superior as students to those who act on a whim. This might indicate that being able to manage one’s emotions leads to positive outcomes in later life, as Goleman suggests. However, it is dangerous to assume a causal model from correlational data. Mehrabian (2000) factor analysed several measures, including delay of gratification, emotional sensitivity and general intelligence (g). Delay of gratification loaded on the same factor as general intelligence; it did not have a large loading on the emotional intelligence factor. It is therefore more likely that the delay of gratification task is influenced by general intelligence (or vice versa), and that general intelligence (rather than emotional intelligence) influences performance later in life.
It is probably unkind of me to criticise Goleman’s work in this way as his book was written early and was intended for the general public rather than the specialist. The reason I comment on it is to show that:
• as we learned in Chapter 4, a collection of case studies cannot really provide solid evidence for the usefulness of a theory. Proper operational definitions, sound experimental design and the analysis of data from carefully constructed samples of the population are far preferable.
• it is incredibly important to read widely and test for alternative explanations whenever performing research in individual differences. This is something which I have tried to do throughout this book; rather than just listing theories and empirical studies I have tried to evaluate them. Of course some of my evaluations may be quite wrong – but critical evaluation is a useful skill to develop whenever you read a journal article, or a story in the popular press. Are there plausible alternative explanations? Does the literature suggest that any of the alternative explanations are more likely? Applying Occam’s razor, which explanation is the simplest? This is how science progresses, and this ‘deep processing’ of information will also help you understand and remember what you read (Gordon and Debus, 2002).
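The warning about inferring causes from correlational data can be made concrete with a small simulation. The Python sketch below is purely illustrative and is not taken from any of the studies cited. It assumes a hypothetical population in which general intelligence (g) influences both delay of gratification and a later outcome, while delay of gratification has no direct effect on that outcome at all. The variable names, the sample size and the equal weighting of g and random noise are all assumptions made for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the influence of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)   # fixed seed so that the run is repeatable
n = 20000                # hypothetical sample size

# Assumption: g causes both variables; delay does NOT cause the outcome.
g = [rng.gauss(0, 1) for _ in range(n)]
delay = [gi + rng.gauss(0, 1) for gi in g]
outcome = [gi + rng.gauss(0, 1) for gi in g]

r_delay_outcome = pearson(delay, outcome)
r_partial = partial_corr(r_delay_outcome,
                         pearson(delay, g),
                         pearson(outcome, g))

print(f"correlation between delay and outcome: {r_delay_outcome:.2f}")
print(f"same correlation with g held constant: {r_partial:.2f}")
```

Under these assumptions the two measures correlate at about 0.5 even though neither influences the other, and the correlation falls to roughly zero once g is partialled out. A sizeable correlation between delay of gratification and later success is therefore just what one would expect if both simply reflect general intelligence.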
Emotional intelligence: ability trait, personality trait, or both?
The first experimental studies of emotional intelligence took place in the late 1980s when researchers such as Salovey and Mayer (1990) reviewed the literature and started to develop a detailed model of emotional intelligence. They defined emotional intelligence as the ‘ability to monitor one’s own and others’ feelings and emotions, to discriminate among them, and to use this information to guide one’s thinking and actions’ and linked it to the concept of ‘social intelligence’. According to this definition, emotional intelligence is an ability, as it should be possible to evaluate how well a person can monitor their own feelings, discriminate between different emotions in themselves and others, and use information about emotional states to influence their own or others’ behaviour. There may be clear right and wrong answers. Does that grimace indicate amusement, disgust or fear? Should I listen to gentle relaxing music if I want to ‘brainstorm’ and come up with creative ideas? What should you say to calm a customer who is shouting at sales staff?
Other researchers such as Bar-On (1997) and Petrides and Furnham (2000) have argued that this is not the whole story. Being sensitive, genuine, well adjusted, assertive, optimistic, happy, able to inspire trust, and being self-actualised are all qualities that may reflect emotional intelligence. However they are (arguably) not abilities: they are instead descriptions of how a person usually behaves across most situations. They are, in other words, personality traits, reflecting style of behaviour rather than how well a person can perform. Hence it is usual to treat emotional intelligence as both an ability and a personality trait. Some researchers focus on one branch, and some on the other.
When reading the literature on emotional intelligence it is important to bear in mind whether a particular book or research paper refers to ability-EI or what some authors have called ‘trait-EI’, the branch of research which treats EI more like a personality trait. (The term ‘trait-EI’ is unfortunate as ability-EI is also a trait, but the term is now well ingrained in the literature.)
Self-assessment question 7.1 What is the essential distinction between trait-EI and ability-EI?
Ability-EI
Mayer and Salovey (1997) argued that there are four main areas in which people can demonstrate their level of ability-EI: the ‘four-branch model’ shown in Figure 7.1. They claim that scores on these four abilities are correlated so that it is also possible to combine scores in these four areas and use an overall measure of ability-EI. They expanded the theory slightly in 2016 (Mayer et al., 2016).
Figure 7.1 Mayer and Salovey’s four-branch model of ability-EI
Perceiving emotion
Perceiving emotion is perhaps the most obvious way of assessing a person’s ability-EI. For example, how well can a person perceive the emotion in someone’s face, tone of voice, posture, behaviour and so on? Can we tell when others try to deceive us by displaying an ‘emotion’ which they do not really feel? Can we recognise emotions in music and other forms of art? How accurately can we perceive our own emotional states? Some of these are clearly more difficult than others. The literature shows that there are substantial individual differences in people’s ability to ‘read’ the emotional state of others, although these skills are also affected by cultural factors, group membership, etc. (Reich et al., 2001). Factors other than EI seem to matter when recognising emotions.
It is perhaps unsurprising to discover that emotional recognition can be learned; what is more surprising is that it is learned very early in life. Dunn et al. (1991) assessed the extent to which parents, children aged 36 months and a sibling used ‘feeling state terms’ (words such as ‘sad’ or expressions such as ‘that’s disgusting’) when interacting normally at home. The 36-month-old children were followed up and it transpired that there was a significant correlation between the amount of emotional talk at 36 months and these children’s ability to recognise emotions when they were 6 years old – and this did not seem to happen just because some children were more verbally fluent than others. It also seems that social skills are an important predictor of later individual differences in emotional understanding; children seem to develop at different rates but those who have good social skills, good language skills and parents who understand their child’s perspective are better able to understand emotions as they develop (Karstad et al., 2015).
Emotions and thought
The second branch of emotional ability concerns how well people understand the links between emotions and thought. Mayer et al. (2016) suggest that this may include generating emotions in oneself so as to better relate to the experiences of another person, or to improve one’s judgement or memory (calming oneself down before an examination, or before making an important decision, for example). Or we could decide what cognitive operations would be most appropriate given our emotional state; for example, brainstorming and generating new ideas when excited or slightly manic.
There is a considerable literature showing that a person’s emotional state (both natural and induced) influences the way in which they tackle problems and the quality of thought. Isen’s (2001) review demonstrates that positive feelings increase creativity and improve problem solving in both laboratory-style tasks and more practical tasks, such as resolving disputes between people and even making accurate medical diagnoses. It has been suggested (Spering et al., 2005) that false feedback about performance on an intelligence test produces ‘negative affect’ (feelings of sadness – see Chapter 16) and this changes the way in which information is processed, leading to more low-level information gathering and careful analysis. The usual interpretation of this is that our mood may affect our cognitive skills. It is, however, dangerous to infer causality from a correlation. It seems just as possible that when a person thinks they have performed poorly on an intelligence task, they decide that it would be better to slow down, make sure that they have not missed any important information and take care in future tasks; negative affect may have no direct influence on performance. Whatever the fine details of such relationships, someone with high levels of emotional intelligence should be sensitive to the ways in which their moods affect their cognitive processes.
Understanding emotions
The third aspect of ability-EI involves the understanding of emotions – for example, being able to say which emotion one is experiencing, understanding which situations are likely to lead to which emotions, forecasting how a person might feel if their situation changes, or knowing what the emotional impact of a particular event is likely to be for oneself or others. This can be assessed by presenting people with various scenarios and asking them to assess their likely emotional impact.
This ability too seems to have a surprisingly early onset. Denham et al. (1994) asked preschool children (aged approximately 48 months) to identify the emotions shown by four puppets (happy, sad, angry and afraid). The children were asked to describe events that could have made the puppet feel like this. For example, the child might feel that a puppet was happy because it had been given an ice cream. These authors were mainly interested in the origins of such skills (socialisation), but from our perspective the interesting thing is that such young children were able to identify sensible causes of emotion at all. It has also been found that these children’s understanding of emotion predicts their social competence later in life (Denham et al., 2003; Denham, 2018), so individual differences in social understanding emerge at an early age. The presence of autistic traits and the development of theory of mind are both likely to affect performance on this aspect of ability-EI. For adults, this ability is likely to be more analytical: for example, to be able to appreciate the distinction between liking and loving, to understand complex feelings such as awe and to appreciate the possibility of feeling love and hate simultaneously.
Emotion management
The final area of ability-EI is the management of emotion. Sometimes feelings just develop as a consequence of life events (for example, sadness following a tragedy) but at other times it may be possible to manage or manipulate emotions in oneself or in others. For example, an emotionally intelligent individual may know which activities will help them to manage their anger or depression or what to say to another person in order to change their mood in some desired direction. Playing sport or exercising in order to get in the mood to study is one obvious example. This aspect of ability-EI has also been studied in another context by Gross and John (2003), whose Emotion Regulation Questionnaire measures the extent to which people cognitively re-appraise situations (i.e., look at things in a different way) in order to change their mood.
Self-assessment question 7.2 What are the four branches of ability-EI proposed by Mayer and Salovey (1997)?
Measuring ability-EI
The MSCEIT. Ability-EI is usually assessed using the Mayer–Salovey–Caruso Emotional Intelligence Test, the MSCEIT (Mayer et al., 2002), a good description of which is given by Mayer et al. (2003). Figure 7.2 shows a still from a video (McRorie and Sneddon, 2007) recorded as part of a research project designed to capture real (rather than acted) emotions. To test your emotional intelligence, what emotion do you think is being experienced?
Figure 7.2 Spot the emotion (reproduced from McRorie and Sneddon (2007), with permission)
Perceiving emotion is perhaps the most obvious way of assessing a person’s ability-EI. For example, how well can a person perceive the emotion in someone’s face, tone of voice, posture and so on? In Figure 7.2, Lynne Spence (our long-suffering school secretary) was genuinely experiencing great fear – she was teetering on top of a high and (she thought) extremely unsafe stack of crates and the rope round her waist was (she thought) only loosely attached to a tree branch. Unfortunately almost all research in this area asks people to judge emotions from photographs which are posed by actors under the direction of a psychologist – as with Figure 7.3. The first problem is that there is no guarantee that the emotions posed in this way resemble genuine emotions. What the psychologist believes to be a typical expression of fear or joy may not be accurate. Some facial muscles may not be under voluntary control, so the actor may not be physically able to display a realistic emotion. Secondly, viewing a still picture (rather than a video) reduces the information available to the observer and makes the whole exercise rather artificial; we do not look at still photographs when assessing emotions in our everyday lives. A meta-analysis of smiling by Gunnery and Ruben (2016) shows that both of these things can really matter, and it is worthwhile bearing this issue in mind when reading the ability-EI literature.
Figure 7.3 A posed facial expression from Ebner et al. (2010)
Ability-EI can sometimes be assessed like any other ability – by developing a test whose items can each be scored as correct or incorrect – for example, whether a person can correctly identify the emotions being depicted in posed (or ideally naturalistic) photographs or videos. It is also possible to write items to explore how well a person can understand or use their emotions using items such as those shown in Table 7.1. That one is fairly straightforward. Others are harder, such as the item in Table 7.2 which measures how well people know how to use their emotions.
The MSCEIT is not without its controversies. First, how should one determine the ‘correct’ answers to items such as the one shown in Table 7.2? How should one score the items? Two approaches have been followed: consensus scoring and expert scoring. Consensus scoring simply means administering the items to a large sample of people (several thousand) and assuming that the most popular answer is ‘correct’. For example, if most people feel that tension would be a fairly useful emotion to feel when meeting in-laws, people who answered ‘fairly useful’ might be awarded five points. The adjacent answers (‘very useful’ and ‘neutral’) might get four points, ‘not very useful’ three points and ‘not at all useful’ two points. Thus unlike most ability tests the items in the MSCEIT are not scored as ‘correct’ or ‘incorrect’, but rather according to how close they are to the best answer. The problem is that some people may give the most socially desirable response, or rely on stereotypes; there is no guarantee that the popular view is correct.
Table 7.1 Sample item from the MSCEIT (Caruso, 2019)
Tom felt anxious and became a bit stressed when he thought about all the work he needed to do. When his supervisor brought him an additional project, he felt ____. (Select the best choice.)
a) Overwhelmed
b) Depressed
c) Ashamed
d) Self-Conscious
e) Jittery

Table 7.2 A second sample item from the MSCEIT (Caruso, 2019)
What mood(s) might be helpful to feel when meeting in-laws for the very first time?
Mood       Not at all useful   Not very useful   Neutral   Fairly useful   Very useful
Tension    1                   2                 3         4               5
Surprise   1                   2                 3         4               5
Joy        1                   2                 3         4               5
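The consensus scoring scheme described for the ‘Tension’ row of Table 7.2 can be sketched in a few lines of Python. This only illustrates the worked example in the text (five points for the most popular answer, and one point fewer for each step away from it). The real MSCEIT scoring key is commercial and unpublished, so the function names and the responses in the norming sample below are invented for the example.

```python
from collections import Counter


def consensus_key(norm_responses):
    """Return the most popular response (1 to 5) in a norming sample."""
    return Counter(norm_responses).most_common(1)[0][0]


def consensus_score(response, key, max_points=5):
    """Award max_points for the modal answer, one fewer per step away."""
    return max_points - abs(response - key)


# Hypothetical ratings of 'Tension' from ten people in a norming sample
# (1 = not at all useful ... 5 = very useful).
norm_sample = [4, 4, 3, 4, 5, 2, 4, 3, 4, 1]

key = consensus_key(norm_sample)
points = {response: consensus_score(response, key) for response in range(1, 6)}

print("most popular answer:", key)
print("points for each possible answer:", points)
```

With this made-up sample the most popular answer is 4 (‘fairly useful’), so that answer earns five points, its neighbours four points, and so on down to two points for ‘not at all useful’. The sketch also shows the weakness noted in the text: the key is only as good as the majority view of the norming sample.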
Expert scoring involved a panel of 21 expert researchers making essentially the same decision. Roberts et al. (2001) and Maul (2012) identify this as being a problem, and also outline a number of other issues with the MSCEIT; Mayer et al. (2012) respond. Anyone planning to use the MSCEIT could usefully read these papers. As this is a commercial test, the scoring system is not publicly available, which makes life difficult for those who want to check out the psychometric properties of the scales (e.g., their factor structure). One obvious problem with this test is that cultural differences may well creep in; it is probably wise to be wary of using the (US-based) scoring scheme in other cultures. Although there is some disquiet in the literature about the merits of these approaches and concern that they may sometimes produce very different results (Roberts et al., 2001), if just a few items have been wrongly scored as a result of this procedure, this should be detected via the factor analysis/item analysis procedure and so such items are likely to be removed from the test.
There are other problems with this approach which are a little too technical to cover here, although readers who plan to use the MSCEIT should explore them. When the scoring system is improved to eliminate these (Legree et al., 2014), there seems to be little evidence for the four-branch model shown in Figure 7.1. Instead there is just a single factor of ability-EI which correlates very substantially (0.79) with general intelligence. It also seems that the items in the MSCEIT are rather easy for most adults, and that several of its items do not measure emotional intelligence very well (Fiori and Antonakis, 2011).
The STEM and the STEU. MacCann and Roberts (2008) have developed two tests, the Situational Test of Emotion Management (STEM) and the Situational Test of Emotional Understanding (STEU), to measure two aspects of ability-EI; their paper shows that the two scales seem to assess what they claim, which was also the case in the analyses performed by Austin (2010). Furthermore the MacCann and Roberts tests are free to use (rather than being a commercial product like the MSCEIT) and the details of the scoring procedure are freely available – two features which make them extremely attractive to researchers. Typical items from the STEM and STEU are shown in Table 7.3. These two tests have not yet been as widely used as the MSCEIT but seem to have a lot of potential.
Tests of emotional competence. Developmental psychologists have also studied ability-EI in children, which they call ‘emotional competence’. The Test of Emotional Comprehension (Pons and Harris, 2000) is typical of work in this area. It presents children with pictures; a story is read to them and they are asked to choose between several possible emotions (by pointing at a face) to show which emotion a character is experiencing. Some of the variables measured by this test involve theory of mind as well as emotional competence (for example, how does a rabbit feel when it eats a carrot, unaware of a fox hiding behind a tree) but many of the variables measured by this test certainly seem to resemble ability-EI. These include identification of emotions from facial expressions, understanding how events may influence emotions, the regulation of emotion and the possibility of hiding one’s emotional state; there are others, too. A Portuguese study suggests that the items form factors much as they are supposed to (Rocha et al., 2015), but I cannot find any literature which links performance on this scale to the MSCEIT, or which develops a similar measure for adults. This is a shame, as the format looks promising.
Table 7.3 Examples of items from the STEM and STEU (MacCann and Roberts, 2008)
Katerina takes a long time to set the DVD timer. With the family watching, her sister says, ‘You idiot, you’re doing it all wrong, can’t you work the video?’ Katerina is quite close to her sister and family. What action would be the most effective for Katerina?
a) Ignore her sister and keep at the task.
b) Get her sister to help or to do it.
c) Tell her sister she is being mean.
d) Never work appliances in front of her sister or family again.

An irritating neighbour of Eve’s moves to another state. Eve is most likely to feel:
a) regret
b) hope
c) relief
d) sadness
e) joy
The importance of ability-EI
When assessing the importance of ability-EI for making real world decisions about people (e.g., for personnel selection) it is important to consider two distinct issues. First, do people’s scores on a test measuring ability-EI correlate with real world behaviour, such as performance at work or school, or health or emotional problems? An applied psychologist with little interest in theory would need to know nothing more than this. They just want a test which predicts behaviour. However, as scientists we academics also need to understand why such correlations are found. Is it because being able to identify, understand, use and manage emotions is an important new characteristic of people which was previously overlooked? Or is ability-EI really not much more than a mixture of several well-known and well-understood abilities and (perhaps) personality traits?
If it is an ability, does it behave like an ability? It is possible to train emotional competence in children (Sprung et al., 2015), which is interesting for two reasons. First, it shows that emotional competence behaves rather differently from general intelligence, which is resistant to training (Redick et al., 2013). Although training people on how to approach the particular cognitive test which is used can, of course, be effective, training someone to perform well at intelligence tests does not necessarily boost their intelligence. It might just teach them how best to tackle the problems. Second, it raises the question of what a person’s ability-EI tells us. Is it an ability, or is it more of an attainment (as defined in Chapter 3)? Remember that the definition of ability used there was that the problems in ability tests should be unfamiliar; it is not obvious that asking people to identify emotions is a totally unfamiliar task! However this is just my personal view, and many researchers and practitioners regard ability-EI as a cognitive ability (Mayer et al., 1999).
That said, there are some oddities. For example, Elizabeth Austin shows that there is no simple link between ability-EI and performance on other tasks which correlate with intelligence. Intelligent people tend to have faster reactions, as discussed in Chapter 13. Intelligent people are also better able to identify shapes which are presented for shorter periods of time. However there is no link between ability-EI and either of these variables (Austin, 2009; Farrelly and Austin, 2007), suggesting that ability-EI does not behave as one might expect if it were a form of intelligence.
It is also important to check that any correlation between ability-EI and performance does not simply occur because both are influenced by general ability. In other words, it is necessary to ensure that ability-EI adds something over and above general ability. We have already noted that when it is scored differently the MSCEIT correlates substantially with intelligence (Legree et al., 2014). Schulte et al. (2004) administered the MSCEIT alongside a Big Five personality questionnaire and a test of ability, scored it in the conventional way, and found that these scores correlated 0.45 with general cognitive ability (‘general intelligence’). Their analysis showed that scores on the MSCEIT very closely resembled the scores which people obtained on a test of general intelligence combined with measures of the Big Five personality factors. The multiple correlation was over 0.8 and as these authors had a restricted range of ability in their student sample, the value in the general population would probably be even higher. They concluded that the MSCEIT really just measured a mixture of general intelligence and personality. Perhaps being able to assess emotional states is not a distinct ability at all. The importance of ability-EI in applied psychology (education, psychology at work, etc.) will be discussed in Chapter 18.
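The remark about a restricted range of ability in a student sample can be illustrated with the standard correction for direct range restriction (often called Thorndike’s Case 2), which estimates what a correlation would be in a group with a wider spread of scores. The Python sketch below applies it to the 0.45 correlation between the MSCEIT and general ability. This is not a calculation reported in the chapter, and the ratios of population to sample standard deviation are hypothetical, since the text does not give them.

```python
import math


def correct_for_range_restriction(r_restricted, sd_ratio):
    """Estimate the correlation in an unrestricted group.

    sd_ratio is the standard deviation of the selection variable (here,
    general ability) in the wider population divided by its standard
    deviation in the restricted sample. A ratio of 1 means no restriction.
    """
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    numerator = r_restricted * sd_ratio
    denominator = math.sqrt(1 + r_restricted ** 2 * (sd_ratio ** 2 - 1))
    return numerator / denominator


r_students = 0.45  # correlation reported for the student sample

# Hypothetical SD ratios: how much more variable might the population be?
estimates = {ratio: correct_for_range_restriction(r_students, ratio)
             for ratio in (1.0, 1.25, 1.5)}

for ratio, r in estimates.items():
    print(f"SD ratio {ratio:.2f}: estimated correlation = {r:.2f}")
```

If ability scores were half as variable again in the general population as among the students (a ratio of 1.5, chosen only for illustration), the estimated correlation would rise from 0.45 to roughly 0.60. The correction strictly applies to a simple correlation between two variables, so it is offered here only to show the direction and plausible size of the effect, not as a corrected value for the multiple correlation.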
Trait-EI Several models of the personality-like trait emotional intelligence have emerged since the 1980s. Trait-EI deals with sensitivity to one’s own and other people’s emotional state, being able to control one’s emotions, social skills and so on. We will consider just two models in detail – the Bar-On model because it is widely used, and Petrides and colleagues’ Trait Emotional Intelligence Questionnaire (TEIQue) because it is well supported by research. One problem with the trait-EI approach concerns the sampling of items. Precisely what types of question should go into a questionnaire measuring emotional intelligence? Theorists cannot take a global approach as Cattell and the early five-factor theorists did, identifying all the words in the language that could possibly describe emotional intelligence – because by definition, no one knows at the outset what emotional intelligence actually is! Some theorists view traits such as ‘conscientiousness’ and ‘achievement drive’ as components of emotional intelligence, which is odd considering they are major and well-studied traits in their own right. There is a real danger that the concept of emotional intelligence may have been over-extended by ‘pop psychologists’ and those wishing to sell tests to occupational psychologists under this flag of convenience. Many authors seem to draw on Salovey and Mayer’s (1990) list of facets of emotional intelligence (see Table 7.4), which includes several facets that were dropped from their 1997 theory of ability-EI because they did not seem to measure abilities.
Bar-On’s model Reuven Bar-On initially studied factors associated with feelings of wellbeing and developed 15 scales/133 items to measure 15 factors of emotional intelligence. These 15 scales, each of
Table 7.4 Salovey and Mayer’s (1990) original facets of emotional intelligence
Original label | Current label | Abbreviation | Definition | Sample item
Emotion in the self: verbal | Recognition of emotion in the self | RecSlf | Being in touch with one’s feelings and describing those feelings in words | If I am upset, I know the cause of it
Emotion in the self: nonverbal | Nonverbal emotional expression | NvExp | Communicating one’s feelings to others through bodily (i.e., nonverbal) expression | I like to hug those who are emotionally close to me
Emotion in others: nonverbal | Recognition of emotion in others | RecOth | Attending to others’ nonverbal emotional cues, such as facial expressions and tone of voice | I can tell how people are feeling even if they never tell me
Emotion in others: empathy | Empathy | Emp | Understanding others’ emotions by relating them to one’s own experiences | I am sensitive to other people’s feelings
Regulation of emotion in the self | Regulation of emotion in the self | RegSlf | Controlling one’s own emotional states, particularly in emotionally arousing situations | I can keep myself calm even in highly stressful situations
Regulation of emotion in others | Regulation of emotion in others | RegOth | Managing others’ emotional states, particularly in emotionally arousing situations | Usually, I know what it takes to turn someone else’s boredom into excitement
Flexible planning | Intuition versus reason | IvR | Using emotions in the pursuit of life goals; basing decisions on feelings over logic | I often use my intuition in planning for the future
Creative thinking | Creative thinking | CrTh | Using emotions to facilitate divergent thinking | People think my ideas are daring
Mood redirected attention | Mood redirected attention | MRA | Interpreting strong – usually negative – emotions in a positive light | Having strong emotions forces me to understand myself
Motivating emotions | Motivating emotions | MotEm | Pursuing one’s goals with drive, perseverance and optimism | I believe I can do almost anything I set out to do
which is supposed to indicate an aspect of successful emotional functioning, are supposed to form five higher-order factors as shown in Figure 7.4. There is also a short (51-item) version of the test and versions designed for younger participants. Although it claims to be based on a comprehensive survey of the literature, very little of Bar-On’s work has been published in academic journals. It is very much a commercial (rather than an academic) enterprise and the test has reached the marketplace without the usual processes of peer commentary, debate over the soundness of the underlying theory, statistical methods and so on. There are several points of interest about this model. First, the definition of emotional intelligence is very broad. Reality testing, for example, is supposedly the ability to recognise whether one’s impressions accurately correspond with the real world; problem solving is the ability to define and solve problems. It is not at all obvious to me why these are included. Neither is it clear why happiness and optimism are thought to be indices of emotional intelligence. Even if it can be successfully argued that happiness and optimism should be
Figure 7.4 The Bar-On model of trait emotional intelligence
included, my second point is that there could well be explanations other than EI that account for some people being happier than others. Third, some of the concepts seem rather similar: happiness and optimism again, for example. Is it possible to be optimistic without being happy? Fourth, as Palmer et al. (2003) observe, Bar-On’s own empirical evidence for the 15-factor/five-dimension structure of the test is less than convincing; he sometimes suggests that other models fit the data better. Palmer et al. (2003) factor analysed the items of the EQ-i using an Australian sample. Their analyses were methodologically superior to those used by Bar-On, and showed just six (not 15) scales. The most important factor found in their analysis comprised items that should, in theory, have formed two quite different scales: self-regard and happiness items, together with a scatter of items from other scales. The impulse control, problem solving and emotional self-awareness factors were found to emerge cleanly, while items that should have formed two separate flexibility and independence factors all loaded on the same factor. Items that should have formed three distinct factors (interpersonal relationship, social responsibility and empathy) all loaded substantially on one factor, suggesting that no meaningful distinction can be made between these concepts. All this is hardly convincing evidence for the structure of the EQ-i. When just one factor was extracted, most of Bar-On’s items showed substantial loadings on the factor, suggesting that it might be safer to talk in terms of a general factor of EI rather than Bar-On’s 15-factor/five-dimensional model. There appear to have been no later attempts to test the structure of the English version of this scale. It seems that there are real problems with the Bar-On model. The items may measure a single trait of emotional intelligence, but there is scant evidence that the items in the questionnaire measure 15 facets of trait-EI which form five factors.
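The idea of extracting a single factor and inspecting the item loadings can be sketched as follows. This is only an illustration on simulated data, not a reanalysis of the EQ-i: the ten items, the loading of 0.6 and the sample size are arbitrary assumptions, and the first principal component is used as a simple stand-in for a one-factor solution (a proper factor analysis would model each item’s unique variance explicitly).

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_items = 2000, 10

# Assumption: every item reflects one general trait (loading 0.6) plus unique noise.
general = rng.standard_normal(n_people)
noise = rng.standard_normal((n_people, n_items))
items = 0.6 * general[:, None] + np.sqrt(1 - 0.6 ** 2) * noise

# Correlations between items (each column is one item).
corr = np.corrcoef(items, rowvar=False)

# Eigen-decomposition; eigh returns eigenvalues in ascending order,
# so the last column of eigenvectors belongs to the largest eigenvalue.
eigenvalues, eigenvectors = np.linalg.eigh(corr)
first = eigenvectors[:, -1] * np.sqrt(eigenvalues[-1])

# The sign of an eigenvector is arbitrary, so orient the loadings positively.
loadings = first if first.sum() > 0 else -first

print(np.round(loadings, 2))
```

If most items show sizeable loadings on this single dimension, as they do here by construction, a general factor is a plausible summary of the item set.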
Self-assessment question 7.3 What other sorts of experiment might you run to check whether a test claiming to assess emotional intelligence really did assess this trait?
Schutte’s model Schutte et al. (1998) developed a short self-report measure of trait-EI, the Emotional Intelligence Scale, which was designed to assess a single factor of emotional intelligence. Subsequently, others such as Petrides and Furnham (2000) and Saklofske et al. (2003) noted that only 33 of the 62 items which Schutte administered actually showed large loadings on this EI factor; the others were simply dropped from the questionnaire. This makes it difficult to argue that Schutte’s questionnaire measures all aspects of EI. Like Saklofske et al. and others, Keele and Bell (2008) showed that the 62 items actually measured four factors of EI labelled ‘optimism/mood regulation’, ‘appraisal of emotions’, ‘social skills’ and ‘utilisation of emotions’, although some others have occasionally found different structures (Gong and Paulson, 2018). Despite these problems the Schutte scale is still fairly widely used, not least because it is not a carefully protected commercial test; the 33 items are reproduced in the original 1998 article.
Petrides and colleagues’ model Petrides and his colleagues at University College London have been developing and testing a four-factor model of trait emotional intelligence, which they have defined as a constellation of emotional perceptions measured through questionnaires and rating scales (Petrides et al., 2016; Petrides et al., 2007). The model tried to include measures of all of the variables shown in Table 7.1, plus other relevant personality characteristics such as alexithymia (inability to identify and describe emotions). These 15 facets of trait emotional intelligence are (sensibly) not interpreted individually, but are found to form four factors – wellbeing, self-control, emotionality and sociability. These four factors then intercorrelate to produce a second-order factor of trait emotional intelligence. In addition, two further facets influence the second-order general factor of emotional intelligence directly. Table 7.5 shows the detailed structure of this scale. The test which measures these factors, the TEIQue (the Trait Emotional Intelligence Questionnaire; Petrides, 2009), has been extensively peer reviewed, is available in both a long and a short form (153 or 30 items) and has been translated into more than 30 languages. Some sample items are shown in Table 7.5. Best of all, it is free to use (although donations are encouraged), and may be downloaded from http://psychometriclab.com/. This site also contains a number of useful links and offprints. The structure of the TEIQue has received much empirical support from other researchers, for example, Mikolajczak et al. (2007), whilst a wealth of other studies (e.g., Chirumbolo et al., 2019; Laborde et al., 2016) find good support for its structure in different cultures. It seems that if one wants to measure trait emotional intelligence, the TEIQue is the test to use.
It has been found that trait-EI is very closely related to the general factor of personality (tentatively named ‘social effectiveness’) which was outlined in Chapter 6. The evidence for this is reviewed by van der Linden et al. (2016) who conclude that trait-EI ‘centralizes the emotion-related variance that is scattered among the five, supposedly orthogonal, higher-order personality dimensions and augments it by incorporating significant additional variance as reflected in its compelling evidence of incremental validity’. Their analysis of a large body of data revealed a correlation of 0.86 between the general factor of personality and trait-EI as measured by the TEIQue, after correcting for errors of measurement. The two traits are extremely similar. This is all the more impressive as the items in the questionnaires which measure the general factor of personality are not generally similar in meaning to those in the TEIQue. The importance of trait-EI in applied psychology (education, psychology at work, etc.) will be discussed in Chapters 17 and 18.
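The phrase ‘after correcting for errors of measurement’ presumably refers to the standard correction for attenuation, in which an observed correlation is divided by the square root of the product of the two scales’ reliabilities. A minimal sketch follows; the function name `disattenuate` and the numbers are hypothetical and are not taken from van der Linden et al. (2016).

```python
import math


def disattenuate(observed_r, reliability_x, reliability_y):
    """Estimate the correlation between true scores from an observed correlation."""
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return observed_r / math.sqrt(reliability_x * reliability_y)


# Made-up values: an observed correlation of 0.60 between two scales
# whose reliabilities are 0.80 and 0.90.
corrected = disattenuate(0.60, 0.80, 0.90)
print(round(corrected, 2))
```

Because reliabilities are at most 1, the corrected value is never smaller in magnitude than the observed one; the less reliable the scales, the larger the adjustment.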
EXERCISE Scores on the general factor of personality are calculated by combining the traits from the five-factor model of personality. Van der Linden et al. (2016) demonstrated that the general factor of personality is very similar to trait-EI. Consider whether knowledge of people’s scores on trait-EI can be expected to predict behaviour much better than knowing their five personality traits.
Table 7.5 Sample items from the TEIQue. Adapted from Petrides (2009) with permission
Factor | Facet | Description | Sample item (R = reverse-scored)
Well-being | Self-esteem | successful and self-confident | I believe I’m full of personal strengths.
Well-being | Happiness | cheerful and satisfied with their lives | Life is beautiful.
Well-being | Optimism | confident and likely to ‘look on the bright side’ of life | I generally believe that things will work out fine in my life.
Self-control | Emotion regulation | capable of controlling their emotions | When someone offends me, I’m usually able to remain calm.
Self-control | Impulse control | reflective and less likely to give in to urges | I tend to get ‘carried away’ easily. (R)
Self-control | Stress management | capable of withstanding pressure and regulating stress | I’m usually able to deal with problems others find upsetting.
Emotionality | Emotion perception (self and others) | clear about their own and other people’s feelings | I often find it difficult to recognise what emotion I’m feeling
Emotionality | Emotion expression | capable of communicating their feelings to others | Others tell me that I rarely speak about how I feel. (R)
Emotionality | Empathy | capable of taking someone else’s perspective | I find it difficult to understand why certain people get upset with certain things. (R)
Emotionality | Relationships | capable of maintaining fulfilling personal relationships | I generally don’t keep in touch with friends. (R)
Sociability | Social awareness | accomplished networkers with superior social skills | I can deal effectively with people.
Sociability | Emotion management (others) | capable of influencing other people’s feelings | I’m usually able to influence the way other people feel.
Sociability | Assertiveness | forthright, frank, and willing to stand up for their rights | When I disagree with someone, I usually find it easy to say so.
Facets which influence general EI directly | Adaptability | flexible and willing to adapt to new conditions | I usually find it difficult to make adjustments to my lifestyle. (R)
Facets which influence general EI directly | Self-motivation | driven and unlikely to give up in the face of adversity | I tend to get a lot of pleasure just from doing something well.
Summary Two aspects of emotional intelligence (ability-EI and trait-EI) seek to explain how well people can identify and use emotions, and understand why it is that some people seem to show better emotional awareness and sensitivity than others. We briefly discuss some experimental evidence supporting several models of ability-EI and trait-EI together with methods of assessing them. We show that some caution needs to be used when interpreting the results of scales measuring ability-EI, and that the TEIQue appears to be an effective means of measuring trait-EI.
Suggested additional reading You might wish to download the TEIQue from http://www.psychometriclab.com to look at its items. As the norms are not publicly available you will not be able to interpret your score – re-read Chapter 3 if this seems unfamiliar. You could however compare scores with friends, to see who is highest, and perhaps produce a table of norms for your class. Very little of the literature in this area is particularly technical (although some does require a basic grasp of multiple regression techniques) and so any of the references cited in this chapter can safely be recommended. Mayer et al. (2016) spell out some of their more recent thoughts about ability-EI. For a broad overview of the area coupled with a discussion of how EI is related to performance at work, Cartwright and Pappas (2008) is hard to beat. Mayer et al.’s (2008) upbeat overview of the status of EI is also worth reading – although perhaps in conjunction with Schulte et al. (2004), as it is possible that the reason that ability-EI predicts performance is because it is correlated with general cognitive ability, g. (See also Chapter 12 of this book.) Petrides et al. (2016) offer a well-balanced overview of recent developments in the area, while Andrei et al. (2016) present a large meta-analysis based on the TEIQue. The paper by Westfall and Yarkoni (2016) is crucially important when determining whether any measure of EI measures anything over and above general intelligence or the Big Five personality traits, but at the time of writing EI researchers do not appear to have explored the issues which it raises.
Answers to self-assessment questions SAQ 7.1 Trait-EI refers to personal characteristics – personality traits, if you like – that describe how ‘emotionally intelligent’ a person is – but where the assessment method does not require them to demonstrate how good they are at perceiving, managing, regulating or using emotions. Trait-EI is typically assessed by asking someone to rate how well various statements such as ‘It is important for me to keep in touch with my true emotions’, ‘People have told me that I am “genuine”’, ‘I am good at calming other people down’, for example, apply to them. Ability-EI requires people to demonstrate how effectively they can use emotions. For example, someone might be asked to identify which emotions
they would experience in a particular situation or what emotion another person is experiencing in a video clip (where there is a strong consensus from experts as to what the ‘correct’ answers are).
SAQ 7.2 Mayer and Salovey’s four branches of ability-EI refer to: 1. perceiving emotions accurately in others and oneself 2. being able to use emotions to facilitate thought 3. grasping the meaning of emotions, recognising how your actions may affect other people’s emotional state and being able to talk about emotions 4. managing emotions where necessary – e.g., preventing anger leading to road rage.
SAQ 7.3 In order to test whether EI is the cause of a particular relationship – for example, to test whether high levels of EI result in greater happiness – it is necessary to also measure other variables which could affect EI and happiness: for example, the Big Five personality traits and perhaps general intelligence. Then one can perform some statistical analyses that essentially determine how well the Big Five personality traits and g (but not EI) predict happiness. Finally, one can consider whether adding EI to this selection of tests significantly improves the overall level of prediction. If so, then it is clear that the correlation between EI and happiness does not just come about because some other traits (the Big Five and g) influence both happiness and scores on the trait-EI questionnaire. It is instead likely that there is something ‘special’ about EI, which causes it to influence happiness, quite independently of the other traits. Hierarchical regression analysis, path analysis and partial correlations are some of the types of analysis that can be used to test this.
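The two-step logic described in this answer can be sketched as a hierarchical regression. The example below uses simulated data, so every variable name and coefficient is an assumption made purely for illustration: happiness is generated so that it depends on one personality trait, on g and (by construction) a little on EI, and the code compares the proportion of variance explained (R squared) with and without EI among the predictors. A real analysis would also test whether the increase is statistically significant.

```python
import numpy as np


def r_squared(predictors, outcome):
    """R squared from an ordinary least-squares fit that includes an intercept."""
    design = np.column_stack([np.ones(len(outcome)), predictors])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    residuals = outcome - design @ coefs
    return 1 - residuals.var() / outcome.var()


rng = np.random.default_rng(42)
n = 1000

# Simulated predictors: five personality traits and general ability (g).
big_five = rng.standard_normal((n, 5))
g = rng.standard_normal(n)

# Assumption: EI overlaps with personality and g but has some unique variance.
ei = 0.5 * big_five[:, 0] + 0.3 * g + rng.standard_normal(n)

# Assumption: happiness depends on personality, g and (slightly) on EI.
happiness = 0.4 * big_five[:, 0] + 0.2 * g + 0.3 * ei + rng.standard_normal(n)

# Step 1: Big Five and g only. Step 2: the same predictors plus EI.
step1 = np.column_stack([big_five, g])
step2 = np.column_stack([big_five, g, ei])
r2_step1 = r_squared(step1, happiness)
r2_step2 = r_squared(step2, happiness)

print(f"R2 without EI = {r2_step1:.3f}, with EI = {r2_step2:.3f}, "
      f"increase = {r2_step2 - r2_step1:.3f}")
```

Adding a predictor can never lower R squared in a least-squares fit, which is why the size and significance of the increase, rather than its mere existence, is what matters.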
References
Andrei, F., Siegling, A. B., Aloe, A. M., Baldaro, B. & Petrides, K. V. 2016. The Incremental Validity of the Trait Emotional Intelligence Questionnaire (TEIQue): A Systematic Review and Meta-Analysis. Journal of Personality Assessment, 98, 261–276.
Austin, E. J. 2009. A Reaction Time Study of Responses to Trait and Ability Emotional Intelligence Test Items. Personality and Individual Differences, 46, 381–383.
Austin, E. J. 2010. Measurement of Ability Emotional Intelligence: Results for Two New Tests. British Journal of Psychology, 101, 563–578.
Bar-On, R. 1997. Manual for the Bar-On Emotional Quotient Inventory. North Tonawanda, NY: Multi Health Systems.
Cartwright, S. & Pappas, C. 2008. Emotional Intelligence, Its Measurement and Implications for the Workplace. International Journal of Management Reviews, 10, 149–171.
Caruso, D. 2019. Example MSCEIT Items [Online]. Available: https://www.eiskillsgroup.com/example-items/
Chirumbolo, A., Picconi, L., Morelli, M. & Petrides, K. V. 2019. The Assessment of Trait Emotional Intelligence: Psychometric Characteristics of the TEIQue-Full Form in a Large Italian Adult Sample. Frontiers in Psychology, 9. https://doi.org/10.3389/fpsyg.2018.02786
Denham, S. A. 2018. Implications of Carolyn Saarni’s Work for Preschoolers’ Emotional Competence. European Journal of Developmental Psychology, 15, 643–657.
Denham, S. A., Blair, K. A., Demulder, E., Levitas, J., Sawyer, K., Auerbach-Major, S. & Queenan, P. 2003. Preschool Emotional Competence: Pathway to Social Competence? Child Development, 74, 238–256.
Denham, S. A., Zoller, D. & Couchoud, E. A. 1994. Socialization of Preschoolers’ Emotion Understanding. Developmental Psychology, 30, 928–936.
Dunn, J., Brown, J. & Beardsall, L. 1991. Family Talk About Feeling States and Children’s Later Understanding of Others’ Emotions. Developmental Psychology, 27, 448–455.
Ebner, N. C., Riediger, M. & Lindenberger, U. 2010. Faces – A Database of Facial Expressions in Young, Middle-Aged, and Older Women and Men: Development and Validation. Behavior Research Methods, 42, 351–362.
Farrelly, D. & Austin, E. J. 2007. Ability EI as an Intelligence? Associations of the MSCEIT with Performance on Emotion Processing and Social Tasks and with Cognitive Ability. Cognition & Emotion, 21, 1043–1063.
Fiori, M. & Antonakis, J. 2011. The Ability Model of Emotional Intelligence: Searching for Valid Measures. Personality and Individual Differences, 50, 329–334.
Goleman, D. 1995. Emotional Intelligence. New York: Bantam Books.
Gong, X. P. & Paulson, S. E. 2018. Validation of the Schutte Self-Report Emotional Intelligence Scale with American College Students. Journal of Psychoeducational Assessment, 36, 175–181.
Gordon, C. & Debus, R. 2002. Developing Deep Learning Approaches and Personal Teaching Efficacy within a Preservice Teacher Education Context. British Journal of Educational Psychology, 72, 483–511.
Gross, J. J. & John, O. P. 2003. Individual Differences in Two Emotion Regulation Processes: Implications for Affect, Relationships, and Well-Being. Journal of Personality and Social Psychology, 85, 348–362.
Gunnery, S. D. & Ruben, M. A. 2016. Perceptions of Duchenne and Non-Duchenne Smiles: A Meta-Analysis. Cognition & Emotion, 30, 501–515.
Isen, A. M. 2001. An Influence of Positive Affect on Decision Making in Complex Situations: Theoretical Issues with Practical Implications. Journal of Consumer Psychology, 11, 75–85.
Karstad, S. B., Wichstrom, L., Reinfjell, T., Belsky, J. & Berg-Nielsen, T. S. 2015. What Enhances the Development of Emotion Understanding in Young Children? A Longitudinal Study of Interpersonal Predictors. British Journal of Developmental Psychology, 33, 340–354.
Keele, S. M. & Bell, R. C. 2008. The Factorial Validity of Emotional Intelligence: An Unresolved Issue. Personality and Individual Differences, 44, 487–500.
Laborde, S., Allen, M. S. & Guillen, F. 2016. Construct and Concurrent Validity of the Short- and Long-Form Versions of the Trait Emotional Intelligence Questionnaire. Personality and Individual Differences, 101, 232–235.
Legree, P. J., Psotka, J., Robbins, J., Roberts, R. D., Putka, D. J. & Mullins, H. M. 2014. Profile Similarity Metrics as an Alternate Framework to Score Rating-Based Tests: MSCEIT Reanalyses. Intelligence, 47, 159–174.
MacCann, C. & Roberts, R. D. 2008. New Paradigms for Assessing Emotional Intelligence: Theory and Data. Emotion, 8, 540–551.
Maul, A. 2012. The Validity of the Mayer–Salovey–Caruso Emotional Intelligence Test (MSCEIT) as a Measure of Emotional Intelligence. Emotion Review, 4, 394–402.
Mayer, J. D., Caruso, D. R. & Salovey, P. 1999. Emotional Intelligence Meets Traditional Standards for an Intelligence. Intelligence, 27, 267–298.
Mayer, J. D., Caruso, D. R. & Salovey, P. 2016. The Ability Model of Emotional Intelligence: Principles and Updates. Emotion Review, 8, 290–300.
Mayer, J. D. & Salovey, P. 1997. What Is Emotional Intelligence? In P. Salovey & D. J. Sluyter (eds.) Emotional Development and Emotional Intelligence: Educational Implications. New York: Harper Collins.
Mayer, J. D., Salovey, P. & Caruso, D. R. 2002. Mayer–Salovey–Caruso Emotional Intelligence Test (MSCEIT) User’s Manual. Toronto, ON: MHS Publishers.
Mayer, J. D., Salovey, P. & Caruso, D. R. 2008. Emotional Intelligence – New Ability or Eclectic Traits? American Psychologist, 63, 503–517.
Mayer, J. D., Salovey, P. & Caruso, D. R. 2012. The Validity of the MSCEIT: Additional Analyses and Evidence. Emotion Review, 4, 403–408.
Mayer, J. D., Salovey, P., Caruso, D. R. & Sitarenios, G. 2003. Measuring Emotional Intelligence with the MSCEIT V2.0. Emotion, 3, 97–105.
McRorie, M. & Sneddon, I. 2007. Real Emotion Is Dynamic and Interactive. Second International Conference on Affective Computing and Intelligent Interaction (ACII2007) Lisbon, Portugal.
Mehrabian, A. 2000. Beyond IQ: Broad-Based Measurement of Individual Success Potential or ‘Emotional Intelligence’. Genetic Social and General Psychology Monographs, 126, 133–239.
Mikolajczak, M. R., Luminet, O., Leroy, C. & Roy, E. 2007. Psychometric Properties of the Trait Emotional Intelligence Questionnaire: Factor Structure, Reliability, Construct, and Incremental Validity in a French-Speaking Population. Journal of Personality Assessment, 88, 338–353.
Mischel, W., Shoda, Y. & Rodriguez, M. L. 1989. Delay of Gratification in Children. Science, 244, 933–938.
Palmer, B. R., Manocha, R., Gignac, G. & Stough, C. 2003. Examining the Factor Structure of the Bar-On Emotional Quotient Inventory with an Australian General Population Sample. Personality and Individual Differences, 35, 1191–1210.
Petrides, K. V. 2009. Psychometric Properties of the Trait Emotional Intelligence Questionnaire (TEIQue). In C. Stough, D. H. Saklofske & J. D. A. Parker (eds.) Assessing Emotional Intelligence: Theory, Research, and Applications. New York, NY: Springer Science + Business Media.
Petrides, K. V. & Furnham, A. 2000. On the Dimensional Structure of Emotional Intelligence. Personality and Individual Differences, 29, 313–320.
Petrides, K. V., Mikolajczak, M., Mavroveli, S., Sanchez-Ruiz, M. J., Furnham, A. & Perez-Gonzalez, J. C. 2016. Developments in Trait Emotional Intelligence Research. Emotion Review, 8, 335–341.
Petrides, K. V., Pita, R. & Kokkinaki, F. 2007. The Location of Trait Emotional Intelligence in Personality Factor Space. British Journal of Psychology, 98, 273–289.
Pons, F. & Harris, P. 2000. Test of Emotion Comprehension: TEC. Oxford: Oxford University Press.
Redick, T. S., Shipstead, Z., Harrison, T. L., Hicks, K. L., Fried, D. E., Hambrick, D. Z., Kane, M. J. & Engle, R. W. 2013. No Evidence of Intelligence Improvement after Working Memory Training: A Randomized, Placebo-Controlled Study. Journal of Experimental Psychology-General, 142, 359–379.
Reich, J. W., Zautra, A. J. & Potter, P. T. 2001. Cognitive Structure and the Independence of Positive and Negative Affect. Journal of Social and Clinical Psychology, 20, 99–115.
Roberts, R. D., Zeidner, M. & Matthews, G. 2001. Does Emotional Intelligence Meet Traditional Standards for an Intelligence? Some New Data and Conclusions. Emotion, 1, 196–231.
Rocha, A. A., Roazzi, A., Silva, A. L., Candeais, A. A., Minervino, C. A., Roazzi, M. M. & Pons, F. 2015. Test of Emotion Comprehension: Exploring the Underlying Structure through Confirmatory Factor Analysis and Similarity Structure Analysis. In A. Roazzi, B. C. D. Souza & W. Bilsky (eds.) Facet Theory: Searching for Structure in Complex Social, Cultural and Psychological Phenomena. Recife, Brazil: Editora UFPE.
Saklofske, D. H., Austin, E. J. & Minski, P. S. 2003. Factor Structure and Validity of a Trait Emotional Intelligence Measure. Personality and Individual Differences, 34, 707–721.
Salovey, P. & Mayer, J. D. 1990. Emotional Intelligence. Imagination, Cognition, and Personality, 9, 185–211.
Schulte, M. J., Ree, M. J. & Carretta, T. R. 2004. Emotional Intelligence: Not Much More Than g and Personality. Personality and Individual Differences, 37, 1059–1068.
Schutte, N. S., Malouff, J. M., Hall, L. E., Haggerty, D. J., Cooper, J. T., Golden, C. J. & Dornheim, L. 1998. Development and Validation of a Measure of Emotional Intelligence. Personality and Individual Differences, 25, 167–177.
Shamosh, N. A. & Gray, J. R. 2008. Delay Discounting and Intelligence: A Meta-Analysis. Intelligence, 36, 289–305.
Spering, M., Wagener, D. & Funke, J. 2005. The Role of Emotions in Complex Problem-Solving. Cognition & Emotion, 19, 1252–1261.
Sprung, M., Munch, H. M., Harris, P. L., Ebesutani, C. & Hofmann, S. G. 2015. Children’s Emotion Understanding: A Meta-Analysis of Training Studies. Developmental Review, 37, 41–65.
Van der Linden, D., Dunkel, C. S. & Petrides, K. V. 2016. The General Factor of Personality (GFP) as Social Effectiveness: Review of the Literature. Personality and Individual Differences, 101, 98–105.
Westfall, J. & Yarkoni, T. 2016. Statistically Controlling for Confounding Constructs Is Harder Than You Think. Plos One, 11, 22.
Chapter 8 Reliability and validity of psychological tests
Introduction Chapter 3 made the point that in order to measure abstract psychological concepts we need to construct tests or questionnaires and use the scores on these tests or scales as operational definitions of the theoretical terms. For example, we might decide to use the score on the TEIQue as an operational definition of trait emotional intelligence. Clearly, if we make a bad decision when choosing a test, so that scores on the test do not reflect levels of the trait we are trying to measure, the results of any study would be meaningless. Thus we have to find ways of identifying a ‘good’ test, by which we mean one whose scores correlate substantially with the trait we are trying to measure. In practice this normally involves two steps: determining whether a set of items form a trait (reliability) and checking that the trait which the items measure is the one we are trying to measure (validity).
Learning outcomes Having read this chapter you should be able to: l l l l l
Outline the principles of psychological measurement. Discuss the domain sampling model of reliability. Show an appreciation of the meaning and importance of coeffcient alpha. Discuss several forms of validity and give examples of their use. Show an understanding of the relationship between reliability and validity, and why both of them are important characteristic of scales.
147
Reliability and validity of psychological tests
l
Show an understanding of the terms unidimensionality, random errors of measurement, systematic errors of measurement, bias, reliability, true score, domain sampling model, standard error of measurement, coeffcient alpha, validity, face validity, content validity, construct validity, predictive validity and incremental validity.
All measuring instruments, whether measuring physical or psychological characteristics, share three basic principles. The first is that they should assess only one aspect of an object (or person) at a time. The reading on a ruler is influenced by the length of the thing which we wish to measure – not its weight, colour, hardness or anything else. Similarly, scores on a psychological scale need to reflect just the single trait or aptitude which the scale is supposed to measure, and not other things – otherwise it is impossible to determine the meaning of a person’s score. Suppose that a person obtained a middling score on a scale half of whose items measured knowledge of sport and half of which measured their ability to visualise things. You would not be able to tell whether the person knew a lot about sport but could not visualise things at all, whether they were wonderful at visualisation but knew nothing about sport, or whether they performed at an average level in each area. Knowing that the person had a middling score would not really help you to understand them at all. The second principle is that there should be as little random error as possible associated with the measurement. We are all familiar with the problem of trying to take readings from devices such as those in Figure 8.1. These are difficult to read accurately; because the needle is above the dial, the angle of the gauge will affect the reading which we make. And the sundial lacks any fine markings, so we have to estimate whether the time it shows is 2.53 or some slightly different value. The third basic principle is one that we usually take for granted in the physical sciences – that the instrument actually measures what it claims to. We trust the manufacturer’s claim that the dial attached to the piping measures temperature and is uninfluenced by pressure, or anything else.
We assume that the position of the shadow on the sundial represents time of day, and not the season of the year or the humidity of the air. When applied to psychological tests, the principle that a scale should measure only one trait or state is known as ‘unidimensionality’. The principle that scores on a test should be accurate, in the sense that they are little influenced by random errors of measurement, is known as ‘reliability’. The principle that the inferences drawn about the meaning of a particular test score should be correct is known as ‘validity’. Psychometrics is a branch of statistical theory that has been developed to produce tests that are unidimensional, reliable and valid. Before trying to estimate how accurately we can measure psychological concepts, it is also necessary to ensure that the numbers that we measure are suitable for the types of mathematical operation that we perform on them. For example, if you coded the colours of fruit as red = 1, green = 2 and yellow = 3, there is nothing to stop you computing a number from the sum of a plum, a lime and a banana; you could even say that these three fruits are equal to six red apples if you want to. But these mathematical operations make no sense at all: the numbers that you have measured are not suitable for mathematical operations such as this. You may feel that this point is obvious – but as we noted in Chapter 3, Michell (1997) has argued that
Reliability and validity of psychological tests
Figure 8.1 Two dials which are difficult to read accurately
it is inappropriate to compute scores from questionnaires, etc. – or indeed perform any mathematical operations (e.g., adding together scores on several items) without first demonstrating that the data are suitable for this. There is not space to go into these issues here, but they merit thought should you plan to do advanced work in this area.
Random errors of measurement
It is useful to break down the measurement that we obtain into two components – the thing we are trying to measure (the true time, in the case of the sundial) plus measurement error:
reading obtained on any occasion = true time + measurement error
The reason this is useful is that it allows us to develop some statistical principles for reducing the amount of measurement error. In fact, it is possible to deduce much of this from common sense: if we believe that the sundial is accurate then it seems reasonable to assume that the error that influences the readings that we make is random. That is, that we are (a) as likely to overestimate as underestimate the time, and (b) the amount by which we overestimate the time is likely to be similar to the amount by which we underestimate the time on another occasion: the distribution of errors is likely to be symmetric rather than skewed. Finally (c), we are more likely to make a small error (e.g., misreading the time on the sundial by 1 minute) than a large one (misreading it by 5 minutes). The first two assumptions tell us that the mean for the error term is likely to be zero. (If the error term did not have a mean of zero this would imply that we will systematically over- or underestimate the time – as would happen, for example, if the sundial was poorly constructed or positioned so that its readings were always 15 minutes ‘fast’.) Those with a background in the physical sciences will know that even when measurement errors are substantial (such as in reading the sundial), there is a simple method for obtaining a more accurate reading. Assuming that the measurements we make are independent of each other and that the errors form a normal distribution, measuring something on several occasions and averaging these measurements is a sure-fire way of improving the
accuracy of our measurement. This is because some estimates are likely to be a little high, some a little low, but when they are averaged they should be close to the true value. Random measurement errors tend to cancel one another out. So if we ask 71 people to read the sundial (in very quick succession!) and plot a graph (histogram) of their readings, it will look something like Figure 8.2a. We can estimate the true time by averaging the readings; for this example, it is 10.53. To see the distribution of error scores, it is just necessary to subtract this value from each reading, as shown in Figure 8.2b. These errors of measurement form a bell-shaped normal distribution with a mean of zero. The width of the curve (its standard deviation) reflects how much measurement error there is; the wider the curve, the more error there is in reading the dial. In this example, we can be fairly sure that we can read the time to an accuracy of ± 2 minutes or so. We would like the standard deviation of the measurement error to be as close as possible to zero, for this implies that a single reading is likely to be very close indeed to the true time. Can we apply this principle when measuring ability, mood, personality, etc.? Given that repeated measurements reduce the amount of error in any assessment, it would seem logical to construct a test simply by repeating the same item over and over again. In the physical sciences (above the atomic level, at any rate) the act of measuring something does not appreciably influence the quantity being measured. However, the same is not true in psychological tests. Once a person has solved a puzzle in an ability test, they are going to remember the solution if given exactly the same puzzle again. If a person is asked to rate how much they fear the dark, they will simply remember what they answered if asked the same question on a second occasion; the second response will be influenced by the way they responded on the first occasion.
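The claim that random errors cancel when independent readings are averaged can be checked with a small simulation. The sketch below is only an illustration: it assumes that each reading of the sundial is the true time (10.53, expressed as 653 minutes after midnight) plus a normally distributed error with a standard deviation of 1 minute, which is a made-up value loosely based on the ‘± 2 minutes or so’ mentioned above. The function names are hypothetical.

```python
import random
import statistics


def simulate_readings(true_value, error_sd, n_readings, rng):
    """Return n_readings noisy readings: true value plus zero-mean normal error."""
    return [true_value + rng.gauss(0.0, error_sd) for _ in range(n_readings)]


def sd_of_averages(true_value, error_sd, n_readings, n_trials, rng):
    """Estimate the standard deviation of the mean of n_readings over many trials."""
    means = [
        statistics.fmean(simulate_readings(true_value, error_sd, n_readings, rng))
        for _ in range(n_trials)
    ]
    return statistics.stdev(means)


rng = random.Random(0)   # fixed seed so that the sketch is repeatable
TRUE_TIME = 653.0        # 10.53 expressed as minutes after midnight
ERROR_SD = 1.0           # assumed spread of a single reading, in minutes

sd_single = sd_of_averages(TRUE_TIME, ERROR_SD, 1, 2000, rng)
sd_mean_71 = sd_of_averages(TRUE_TIME, ERROR_SD, 71, 2000, rng)

print(f"SD of a single reading:        {sd_single:.2f} minutes")
print(f"SD of the mean of 71 readings: {sd_mean_71:.2f} minutes")
```

Under these assumptions the spread of the averaged reading should be roughly the single-reading spread divided by the square root of 71. The improvement depends entirely on the readings being independent, which is exactly what fails when the same test item is repeated.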
The measurements are not independent if the same item is repeated several times in a test. One reason why it is not possible to construct a test by repeating exactly the same item over and over again is because these measurements
Figure 8.2 Time readings and error distribution from a sundial
will not be ‘locally independent’ of each other, as required by the statistical theory underlying the averaging of scores.
Systematic errors of measurement
There is a second reason why it is inappropriate to repeat the same item over and over in a test. Single items in psychological tests are probably not unidimensional: they do not measure a single trait plus measurement error. The way we respond to an item is influenced by (a) the trait or state we seek to assess, (b) random error and (c) several systematic errors. Systematic errors are simply traits or states other than the one we are trying to assess, and the influence of these needs to be minimised for us to be able to interpret the meaning of scores. Some of these systematic errors may simply be personality or ability traits. For example, the way in which people respond to the item ‘Do you often procrastinate?’ using a seven-point Likert scale will be influenced both by the personality trait of conscientiousness and by their vocabulary, as some people may not know the meaning of the word ‘procrastinate’. To make matters worse, some people may (for reasons best known to themselves) tend to agree with every question which they are asked . . . there are other possibilities, too. These are known as ‘response biases’ because they describe idiosyncratic ways in which people use the response scale. They are discussed in Chapter 21. The key thing to note is that while readings on physical measuring devices (rules, scales, etc.) are generally influenced by only a single characteristic of the thing being assessed (e.g., its temperature, or the time of day), the way in which a person responds to an item in a test will be influenced by a whole bundle of different ability traits, personality traits, moods, motivational factors and response biases, as well as random error. Individual test items are unlikely to be unidimensional.
EXERCISE
Suppose that you asked a student the question ‘do you enjoy drunken parties?’ in a personality questionnaire and they then responded by marking a five-point scale ranging from ‘strongly agree’ to ‘strongly disagree’. Make a list of half a dozen things that might influence the response they mark.
Apart from those variables that are likely to show little variation within the group (such as being able to understand all of the words in the sentence), my list includes the following:
• their level of Extraversion (a personality trait), which is what the item was designed to measure
• the number of parties they have recently attended (their liver may need a rest)
• their age
• their religious beliefs/ethnic background
• social desirability – some students may find it difficult to admit that they would much rather be working in the university library than partying and so would tend to overstate their true liking for parties
• the context in which the question is asked – a potential employer and a psychology student might well obtain different answers to this question
• the student’s impression of what is being assessed – for example, one person may read the question, assume that it is being used to assess whether they have a drinking problem and answer it accordingly, whereas someone else may believe that it measures their level of extraversion and so answer it accordingly
• the way in which a person uses the five-point scale – some individuals use scores of 1 and 5 quite freely, while others never use the extremes of the scale
• a tendency to agree (acquiescence)
• the student’s mood
• random error – if you asked the student the same question a couple of minutes later you might obtain a slightly different result.
Your list probably contains other important variables, too. A whole host of these ‘nuisance factors’ determines how an individual will respond to a single item in a personality test and we shall consider some of these in Chapter 21. Much the same applies to items in ability tests. Performance here might be affected by anxiety, luck in guessing the correct answer, misunderstanding of what is expected, the effects of drugs, social pressures (perhaps copying others, or deliberately underperforming so as not to stand out from the group), perceived importance of obtaining a high mark and so on, as well as ability. We could make a similar case for ratings of behaviour (when aspects of the rater’s personality and sensitivity would also affect the ratings made). Every piece of data collected when assessing individual differences is thus likely to be influenced by a vast number of factors, as shown in Figure 8.3. The variables at the top of Figure 8.3 are systematic influences. One of these is what the item is supposed to assess, while the others are unwelcome systematic errors of measurement. The ‘measurement error’ term at the bottom of the figure refers to random errors of measurement. Figure 8.3 also includes ‘unique variance’. This should be familiar from Chapter 5. Here it means that people enjoy drunken parties to a particular extent – and this cannot be predicted from the other traits. Some people just like or dislike parties for no obvious reason! It is easy to estimate how well an item measures the trait which it is supposed to measure, simply by gathering some data and correlating people’s scores on an item with the best available estimate of the true score.
In practice it is almost impossible to find a personality test item where the personality trait explains more than 20% or so of the variance (corresponding to a correlation with the true score of about 0.45); 80% or so of the variation in scores between people reflects random error, unique variance and traits other than the one the item is supposed to assess.
Self-assessment question 8.1 How might you estimate each person’s true score, so that you can determine how accurately each item measures it?
Figure 8.3 Some variables that may influence a person’s response to one item on a personality inventory (variables shown: agreeing, social desirability, peer group pressure, age, sociability, unique variance and measurement error, each influencing the response to one item in the questionnaire)
This is all rather embarrassing. It seems that it is difficult or impossible to devise items that are pure measures of a trait, since individuals’ responses to a single test item are going to be influenced by a whole host of traits, states, attitudes, moods and luck. It also provides a second reason why it is nonsensical to construct a psychological test by repeatedly administering the same item (or another item which means almost the same thing) over and over again. Unlike random errors, the systematic errors will not tend to cancel one another out when measurements are repeated. Suppose that we write an item which is supposed to measure Extraversion.
Item 1: I love meeting new people at social functions.
We then paraphrase it twice:
Item 2: Parties are great for getting acquainted with people.
Item 3: Getting to know other people at festive occasions is fun.
I think these items are paraphrases because it is not obvious how a person could agree with one of the items yet disagree with one of the others. They all mean much the same thing, and so will all be affected by the same ‘nuisance variables’. I would also guess that two of these will be social desirability (few people will want to admit that they are shy) and social anxiety. Hence responses to all three items will probably be influenced by all three of these variables: extraversion, social desirability and social anxiety. (This can be checked, by correlating responses to each item with scores on other personality traits.)
Let us assume that these items are scored using a Likert scale, with ‘strongly agree’ scoring five points and ‘strongly disagree’ one point. Administering all three of these items to people, rather than just the first one, and averaging their scores on the three items will reduce the influence of random errors of measurement; people who misread the question or did not know the meaning of one of the words are unlikely to make the same mistake three times in succession! However, it should be obvious that a person’s average score on the three questions will still measure a mixture of extraversion, social desirability and social anxiety. Creating tests by using items with similar meanings will not help eliminate systematic errors of measurement. However, there is a way round this problem. Instead of taking multiple measurements using the same item, it is possible to devise several different items measuring extraversion, each of which will probably be influenced by a different set of ‘nuisance factors’. Chapter 6 shows that Eysenck views extraverts as being sociable, optimistic, talkative and impulsive, etc., so it would be possible to phrase questions to measure these variables, too. An item such as ‘do you keep quiet on social occasions?’ would be influenced by a number of nuisance factors, only a few of which would be the same as for the first item. Thus if a questionnaire were constructed from a number of items, each affected by a different set of nuisance factors, the influence of the nuisance factors would tend to cancel out, while the influence of the trait would accumulate.
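The difference between paraphrased items and genuinely different items can be illustrated with a toy simulation. This is a sketch under strong, invented assumptions: each response is simply the trait plus one nuisance factor plus random error, all standard normal and equally weighted. For ‘paraphrased’ items the same nuisance value affects every item; for ‘diverse’ items each item has its own nuisance value. All names are hypothetical.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_people, n_items, shared_nuisance, rng):
    """Return (trait values, mean item score per person).

    Each item response = trait + nuisance + random error.
    If shared_nuisance is True, every item shares one nuisance value
    (paraphrased items); otherwise each item draws its own.
    """
    traits, scale_scores = [], []
    for _ in range(n_people):
        trait = rng.gauss(0, 1)
        common = rng.gauss(0, 1)  # nuisance value shared by paraphrased items
        responses = []
        for _ in range(n_items):
            nuisance = common if shared_nuisance else rng.gauss(0, 1)
            responses.append(trait + nuisance + rng.gauss(0, 1))
        traits.append(trait)
        scale_scores.append(statistics.fmean(responses))
    return traits, scale_scores


rng = random.Random(0)
traits_p, scores_p = simulate(5000, 10, True, rng)
traits_d, scores_d = simulate(5000, 10, False, rng)

r_paraphrased = pearson(traits_p, scores_p)
r_diverse = pearson(traits_d, scores_d)
print(f"Correlation with trait, paraphrased items: {r_paraphrased:.2f}")
print(f"Correlation with trait, diverse items:     {r_diverse:.2f}")
```

In this sketch, averaging ten paraphrased items removes most of the random error but none of the shared nuisance factor, so the scale score tracks the trait less closely than the average of ten items whose nuisance factors differ.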
To give an example of this, in order to stop individual differences in acquiescence (agreeing with items) from being confounded with scores on the trait of interest, it is simply necessary to ensure that about half of the items in a test are phrased so that agreeing with them implies a high score on the trait (‘I enjoy drunken parties’ for extraversion) while half of the items are phrased in the opposite direction (‘I enjoy spending time on my own’). In order to produce a more accurate measure of a trait, we need to average responses to several different items, each of which is influenced by a different set of ‘nuisance factors’. It is important to remember this assumption, as many people write items that are so similar in content that they will be influenced by the same nuisance factors. For example:
Item 1. At times I think I am no good at all.
Item 2. All in all, I am inclined to feel that I am a failure.
Item 3. I feel I do not have much to be proud of.
Item 4. I certainly feel useless at times.
These items are taken from a real test that has over 1000 citations in social psychology, the Rosenberg inventory, which was mentioned in Chapter 2. It is supposed to measure self-esteem, but it seems to be flawed on two counts. First, as the items are paraphrases of each other, people will remember how they answered previous items and are bound to answer them all similarly because they all say the same thing. Second, because they all say the same thing, each will be influenced by the same set of ‘nuisance factors’ (for example, ‘no’ seems to be the socially desirable response for all of them), so when we add together people’s scores on the items, the test score will not be unidimensional – and this is precisely what has been found (Marsh et al., 2010). Contrast these items with the following:
Item 1. I believe that insurance schemes are a good idea.
Item 2. I would take drugs that may have strange effects.
Item 3. I sometimes tease animals.
These three items (from the Psychoticism scale of another real test – the Eysenck Personality Questionnaire) ask genuinely different questions. People’s responses to the first item would not dictate how they must respond to the other two. The first item is scored in the opposite direction to the other two, to reduce the effects of acquiescence. And because the items vary so much in content, each is likely to be affected by a different set of ‘nuisance factors’, meaning that when the scores are added together these will tend to cancel out, thus leading to a test score that measures a single dimension of personality. Psychometricians would say that the items show ‘local independence’, which means that the only reason that they correlate is because they each measure Psychoticism to some extent; answering one item in a particular way does not determine how one has to respond to another item. Many psychologists make their reputations and livings from developing and analysing scales whose items are not locally independent, and the journals are full of such papers and ‘theories’. I think that doing so is seriously misguided for the reasons given above, but this is perhaps a minority view; your lecturer may well disagree with it. However, if I am correct, it means that many scales which are widely used in psychology may not measure traits at all. They just show that people can understand the meaning of the items, and respond consistently. I discuss the issues further in Cooper (2019).
Reliability
The domain sampling model
Suppose that we want to assess spelling ability. As a huge dictionary contains a list of all the words in the language, it would be possible (in theory) to ask people to spell every single word in the language under optimal testing conditions. The score they obtained would be a completely accurate measure of their spelling ability, as it is based on every word in the language (the complete domain of items). It is known as their true score. Of course, administering all these words is impractical; it would take weeks or months to do. It makes sense to try to estimate what people’s true scores on the spelling test would be through giving them tests based on samples of words taken from the dictionary. For example, one could repeatedly stick a pin in the dictionary in order to produce a random sample of words which could be given to a large sample of people. If the test is effective, people’s scores on the test should correlate very substantially with their true scores – the scores that they obtained when given all words in the domain. The square of this correlation between people’s scores on a test and the scores that they obtain if they are given every possible item that could conceivably be written to measure the topic is known as the reliability of a test. It shows how much random measurement error is associated with people’s scores on the test, a reliability of 1.0 indicating that the test is completely free from measurement error, as it correlates perfectly with people’s true scores. A term called the ‘standard error of measurement’ (SEM) reflects this directly. If scores on a test have a standard deviation of s and a reliability of α, the standard error of measurement is given by the formula:
$$\mathrm{SEM} = s\sqrt{1 - \alpha}$$
So if scores on a test correlate 0.8 with the true score (implying a reliability of 0.64: see above) and the standard deviation of its scores is 10, its SEM is 6. The SEM is important since it allows us to infer how close to a person’s true score their test score is: there is only a 0.05 chance that their observed score will be more than about 2 SEMs away from their true score. If a person’s true score is 72 on a test with reliability of 0.64 and a standard deviation of 10, we can be fairly confident that when we measure their score using this test it will lie somewhere
between 72 – (2 × 6) and 72 + (2 × 6) – that is, between about 60 and 84. The ‘2’ is because we know that about 95% of the scores lie plus or minus two standard deviations from the mean of a normal distribution. (Several test handbooks say that if a person’s observed score is 72 their true score lies between 60 and 84. This is not correct; see for example Cooper (2019) if you ever plan to perform such analyses.) This is a very useful technique to remember if you ever plan to interpret an individual’s score on a test – which often happens in applied psychology, when assessing individuals on tests of intelligence, etc. Although the idea of the domain sampling model is most obviously applicable when the number of potential items is finite (e.g., the number of words in the dictionary: the number of two-digit addition problems), the same principles apply even when the number of items that could be written is near infinite. For example, a test measuring anxiety is a sample of all the (many!) items that could conceivably have been written to measure the many aspects of anxiety. A test of mathematical ability is a sample of the near infinite number of mathematical items that could possibly have been written. The concept of reliability is still useful in these cases as (counter-intuitively, perhaps) it is not necessary to actually compute the correlation between the true score and the scores on the test to estimate the reliability of a test. Statistics such as ‘coefficient alpha’ can give us a good approximation of what this correlation would be.
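The arithmetic of the standard error of measurement is easy to reproduce. The following sketch computes the SEM as the standard deviation multiplied by the square root of one minus the reliability, and then the band within which an observed score is expected to fall for a given true score. The function names are hypothetical, and the multiplier of 2 is the rounded value used in the text rather than the more exact 1.96.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = sd * sqrt(1 - reliability), for a reliability between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def observed_score_band(true_score, sd, reliability, z=2.0):
    """Range in which about 95% of observed scores fall for a given true score."""
    sem = standard_error_of_measurement(sd, reliability)
    return true_score - z * sem, true_score + z * sem


# A test correlating 0.8 with the true score has a reliability of 0.8 squared.
sem = standard_error_of_measurement(sd=10, reliability=0.8 ** 2)
low, high = observed_score_band(true_score=72, sd=10, reliability=0.64)

print(f"SEM = {sem:.1f}")
print(f"Observed scores expected between {low:.0f} and {high:.0f}")
```

Note that the band is built around the true score. As the text warns, it should not be read as a statement about where the true score lies given an observed score.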
Coefficient alpha
I earlier showed that while an individual test item is a poor measure of a trait, a much better estimate of the value of a trait can be obtained if we add up the scores on a number of items, each of which measures some different aspect of the trait. Suppose some items are written to measure a certain trait and are then administered to about 200 people. Specialised computer programs (such as the SPSS ‘reliability’ procedure) can be used to calculate a statistic from these data that various authors have referred to as the ‘alpha’, ‘coefficient alpha’, ‘KR20’, ‘Cronbach’s alpha’ or ‘internal consistency’. It is amazingly important. For long tests, coefficient alpha is a very close approximation to the square of the correlation between individuals’ scores on a particular mental test and their ‘true score’ on a trait (Nunnally 1978). Thus a coefficient alpha of 0.7 implies a correlation of √0.7, or about 0.84, between scores on the test and individuals’ true scores, whereas an alpha value of 0.9 implies that the correlation is as large as 0.95. Since the whole purpose of using psychological tests is to try to achieve as close an approximation as possible to a person’s true score on a trait, it follows from this that tests should have a high value of alpha. The key point is that it is a statistic, calculated from data, that can be used to estimate the correlation between the test score and the true score – even if we cannot measure people’s true scores. As you might expect from reading the previous section, coefficient alpha is influenced by two things:
• the average size of the correlation between the test items. Since we assumed in the previous
section that different test items are affected by different ‘nuisance factors’, the only reason why individuals’ responses to any pair of items should be correlated is because both of these items measure the same underlying trait. A large correlation between a pair of items thus indicates that they are both good measures of the trait; a smaller correlation suggests that responses to one or both of the items are substantially influenced by random error or by ‘nuisance factors’ such as social desirability, other personality or ability traits, etc.
• the number of items in the scale. Again, I pointed out that the whole purpose of building a scale from several test items is to try to ensure that the nuisance factors cancel out.
It should hopefully be obvious that the more items there are in a scale, the more likely it is that these nuisance factors will all cancel out. A statistical procedure known as the Spearman–Brown formula (given in any standard psychometrics text) is useful here. It allows one to predict how the reliability of the scale will increase or decrease if the number of items in the scale is changed. A widely applied but theoretically baseless rule of thumb suggests that a test should not be used for any purpose if it has an alpha value below 0.7, and that it should not be used for important decisions about an individual (e.g., for assessment of the need for remedial education) unless its alpha value is above 0.9. Unfortunately, if items are written so that they share the same ‘nuisance factors’ (as in the example of the self-esteem scale given earlier) the correlations between the items will be inflated and the value of alpha will be high. It is up to the person writing the items to ensure that the only possible reason why the responses to any pair of items should be correlated together is because of the underlying personality or ability trait that they are both assumed to measure and not because they are influenced by similar response biases or other traits, or because answering one item in a particular fashion forces them to answer another item in a particular way, just because the items mean much the same thing.
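The text does not give formulae for coefficient alpha or for the Spearman–Brown procedure, so the sketch below uses the standard textbook definitions rather than anything specific to this chapter. Alpha is computed from raw item scores as k/(k − 1) × (1 − sum of item variances / variance of total scores); the ‘standardised’ version depends only on the number of items and their mean intercorrelation, which makes the two influences listed above explicit. The function names and the example values (a mean correlation of 0.3) are invented for illustration.

```python
import statistics


def cronbach_alpha(item_scores):
    """Coefficient alpha from raw data.

    item_scores: list of per-person lists, each holding one score per item.
    """
    k = len(item_scores[0])
    item_vars = [
        statistics.variance([person[i] for person in item_scores])
        for i in range(k)
    ]
    total_var = statistics.variance([sum(person) for person in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def standardized_alpha(mean_r, k):
    """Alpha for k items whose average intercorrelation is mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Effect of the number of items when the mean inter-item correlation is 0.3.
for k in (5, 10, 20):
    print(f"{k:2d} items, mean r = 0.3: alpha = {standardized_alpha(0.3, k):.2f}")

# Doubling a test whose reliability is 0.7.
print(f"Doubled length: {spearman_brown(0.7, 2):.2f}")
```

Because the standardised form is the Spearman–Brown formula applied to the mean inter-item correlation, doubling a 10-item scale gives the same predicted value as computing alpha directly for 20 such items. Neither calculation can tell whether the correlations are high for the right reason, which is the point made above about shared nuisance factors.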
Self-assessment question 8.2 Five test items designed to measure Extraversion were given to a large sample of people. The correlations between the test items were calculated, and are shown in Table 8.1. a. What does the correlation between any pair of items show? b. Which item seems to be the least effective measure of the trait? c. Suppose that you calculated alpha from the correlations shown in Table 8.1 and found that it was lower than 0.7. What might you do?
Table 8.1 Correlations between five hypothetical test items, each of which has been scored such that a high score indicates the extraverted response

          Item 1   Item 2   Item 3   Item 4   Item 5
Item 1     1.0
Item 2    -0.02     1.0
Item 3     0.10     0.28     1.0
Item 4    -0.15     0.31     0.24     1.0
Item 5    -0.12     0.25     0.27     0.36     1.0
It is also important to ensure that the sample of individuals whose test scores are used to compute alpha is similar to the group on whom the test will be used. It is pointless to find an alpha value of 0.9 with university students and then conclude that the test will be suitable for use with the members of the general population, for university students are not a random sample of the population; they are generally young, academically talented, literate and numerate. Again, there is no numerical way to determine whether a test that has a high alpha value in one sample will also work in another group – it is a matter of using common sense. I would certainly be wary about assuming that a personality test, developed using American college students, will work for the general population of the UK (or vice versa), but not everyone shares these misgivings. It is safest to calculate alpha whenever one uses a test, although this does presuppose that a large sample of people will be tested. This is because like any statistic, the value of alpha will vary from sample to sample, so it is highly advisable to calculate confidence intervals for alpha as described by Feldt et al. (1987); a spreadsheet for performing these analyses is given in Cooper (2019). This shows how close the value of alpha calculated from a particular sample is likely to be to its true value – the average alpha which would be obtained if data were obtained from hundreds of different samples. If alpha is calculated from a small sample of people, it can be substantially larger or smaller than its true value, so calculating alpha from small samples is pointless (although several researchers do so). There are also other ways of estimating the reliability of a test; McDonald’s omega, Revelle’s beta and Raykov’s rho are alternative statistics which will occasionally be encountered.
Self-assessment question 8.3 Suppose that a 50-item questionnaire has a coefficient alpha of 0.9. Do you think that this must imply that it measures a single trait?
Alternate-forms reliability
This involves constructing two different tests by randomly sampling items from the domain. For example, two spelling tests, Test A and Test B, may each be constructed by randomly sampling 100 words from the dictionary. As these tests are the same length and are constructed in the same way, you would expect each of the test scores to show the same correlation with the true score, $r_{A,\mathrm{true}} = r_{B,\mathrm{true}}$: the tests will have the same reliability, which is the square of this correlation. We cannot easily measure the correlation between scores on the test and the true score. However, we can administer both versions of the test to the same group of people and work out the correlation between Test A and Test B, $r_{A,B}$. The reliability of either test can be shown to be estimated by $r_{A,B}$. So if two tests are constructed by randomly sampling items from some domain and are found to correlate 0.8 when administered to a large sample of people, we may conclude that the reliability of each test is 0.8.
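The claim that the correlation between two alternate forms estimates the square of each form’s correlation with the true score can be checked by simulation. The sketch below assumes a simple model that is not spelled out in the text: each form’s score is the true score plus independent, normally distributed error, with the error spread chosen (0.75 standard deviations) so that the reliability is 0.64. All names and values are hypothetical.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
N_PEOPLE = 20000
ERROR_SD = 0.75  # assumed, so that reliability = 1 / (1 + 0.75**2) = 0.64

true_scores = [rng.gauss(0, 1) for _ in range(N_PEOPLE)]
form_a = [t + rng.gauss(0, ERROR_SD) for t in true_scores]
form_b = [t + rng.gauss(0, ERROR_SD) for t in true_scores]

r_a_true = pearson(form_a, true_scores)  # not observable in real testing
r_ab = pearson(form_a, form_b)           # observable: alternate-forms reliability

print(f"Correlation of Form A with true score: {r_a_true:.2f}")
print(f"Square of that correlation:            {r_a_true ** 2:.2f}")
print(f"Correlation between Form A and Form B: {r_ab:.2f}")
```

In a real study only the correlation between the two forms can be computed; the simulation simply shows that, under these assumptions, it agrees with the squared correlation that we cannot observe.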
Parallel-forms reliability
This is similar to alternate-forms reliability, except that it involves two tests that have been developed to have items of similar difficulty and so the distribution of scores on the two
versions of the test will be the same. In order to create two parallel forms of a test, items are administered to a large sample of people and pairs of items with similar content and difficulty are identified. For example, both may involve the solution of seven-letter anagrams, the answer being a word with a similar frequency of occurrence in the language. The difficulty of the two items must also be similar; perhaps only about 25% of the sample will be able to solve each one. One item will then be assigned to Form A of the test and the other to Form B. These two tests are marketed separately and (in theory) it should not matter which one is used for a particular application, since care is generally taken to ensure that the two versions produce similar distributions of scores (thus allowing the same tables of norms to be used for each form of the test). If both tests measure the same trait, one would expect a high positive correlation between individuals’ scores on the two forms of the test, just as with alternate-forms reliability. This correlation is known as the parallel-forms reliability. However, as relatively few tests have parallel forms, it is rarely used.
Split-half reliability In the early days of testing, when reliability coefficients had to be calculated by hand, calculating coefficient alpha was time consuming and so a shortcut was developed to estimate alpha. Instead of adding together all of the items in a test to derive a total score, two scores were calculated – one based on the odd-numbered test items and the other on the even-numbered items. These two scores were then correlated together and after applying the Spearman–Brown formula (since the set of the odd or even items is only half as long as the full test), this yielded the split-half reliability. The problem is that there is no compelling reason why one should take the odd and even numbered items. Why not divide the test into three parts rather than two? Or compare the first half with the last half? There are obviously a number of different ways of dividing up the items to perform a split-half analysis and each will probably give a different result – yet there is no logical or statistical reason for preferring one method of subdividing the test to another. Split-half reliability lacks the clear statistical rationale of alpha. There is no good reason to use it today.
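For readers who want to see the arithmetic, the odd/even split and the Spearman–Brown step-up can be sketched as follows. The formula used is the standard Spearman–Brown prophecy formula; the function names and the half-test correlation of 0.6 are hypothetical and chosen purely for illustration.

```python
def odd_even_totals(item_scores):
    """Split one person's item scores into odd- and even-numbered totals.

    Items are numbered from 1, so the odd-numbered items sit at
    list positions 0, 2, 4, ... and the even-numbered ones at 1, 3, 5, ...
    """
    odd_total = sum(item_scores[0::2])
    even_total = sum(item_scores[1::2])
    return odd_total, even_total


def spearman_brown(r_half, length_factor=2.0):
    """Predicted reliability when a test is lengthened by `length_factor`.

    For a split-half analysis the full test is twice as long as each
    half, so the default factor is 2.
    """
    return (length_factor * r_half) / (1.0 + (length_factor - 1.0) * r_half)


# Hypothetical example: the odd and even halves correlate 0.6.
print(odd_even_totals([1, 2, 3, 4]))
print(round(spearman_brown(0.6), 3))
```

A half-test correlation of 0.6 steps up to 0.75 for the full-length test. Splitting the same items in a different way would generally give a different half-test correlation, and hence a different answer, which is the weakness described above.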
Test-retest reliability Test-retest reliability, sometimes known as temporal stability, provides us with another way of estimating the reliability of a test. Whilst the four forms of reliability discussed above try to estimate how well scores on a test correlate with people’s true scores on a trait, test-retest reliability checks whether trait scores stay more or less constant over time. Most tests are designed to measure traits such as extraversion, numerical ability or neuroticism and the definition of a trait stresses that it is a relatively enduring disposition. This implies that individuals should have similar scores when they are tested on two occasions provided that:
• nothing significant has happened to them in the interval between the two tests (e.g., no emotional crises, developmental changes or significant educational experiences that might affect the trait)
• the experience of sitting the test at Time 1 does not influence the scores obtained at Time 2 other than via the trait which the test assesses. For example, we must assume that people cannot remember their answers to individual questions. Otherwise the score at Time 2 will be influenced by memory, as well as the trait which the test assesses.
If a test shows that a child is a genius one month and of average intelligence the next, either intelligence is a state rather than a trait or the test is flawed. Assessment of test-retest reliability typically involves testing the same group of people on two occasions, which are generally spaced at least a month apart (in order to minimise the likelihood of people remembering their previous answers), yet not too far apart (in case developmental changes, learning or other life events affect individuals’ positions on the trait). Test-retest reliability is simply the correlation between these two sets of scores. If it is high (implying that individuals have similar levels of the trait on both occasions) it can be argued that the trait is stable and that the test is likely to be a good measure of the trait. This form of reliability is very different from the others discussed above, as it is based on the total score – it says nothing about how people perform on individual items. A set of items that had nothing in common could still have perfect test-retest reliability. Suppose you asked someone to add their house number, their shoe size and their year of birth on two separate occasions. This total would show impressive test-retest reliability, although the three items correlate zero with one another (implying a coefficient alpha of zero), and adding the items together makes no sense as the ‘test’ that results is not unidimensional.
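The house number, shoe size and year of birth example can be simulated to show how a meaningless total may be perfectly stable yet have an alpha near zero. This is a rough sketch only: the ranges of the three made-up ‘items’, the sample size and the helper names are assumptions, and alpha is computed from the usual item-variance formula.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance(values):
    """Sample variance (n - 1 in the denominator)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def coefficient_alpha(items):
    """Coefficient alpha; `items` holds one list of scores per item."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


rng = random.Random(42)
n_people = 2000
# Three hypothetical, unrelated "items" generated independently.
house = [rng.randint(1, 200) for _ in range(n_people)]
shoe = [rng.randint(3, 12) for _ in range(n_people)]
birth = [rng.randint(1950, 2005) for _ in range(n_people)]

time1 = [h + s + b for h, s, b in zip(house, shoe, birth)]
time2 = list(time1)  # the same facts are reported again on the second occasion

retest_r = pearson(time1, time2)
alpha = coefficient_alpha([house, shoe, birth])
print(round(retest_r, 3), round(alpha, 3))
```

Because nothing about these facts changes between occasions, the test-retest correlation is perfect, whereas alpha hovers around zero because the items were generated independently of one another.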
Self-assessment question 8.4 a. What are the KR20 and the internal consistency of a test? b. Why is it unwise to paraphrase the same item a number of times when designing a test? c. What does the standard error of measurement tell you? d. Suppose you have two tests that claim to assess anxiety. Test 1 has a reliability of 0.81, and Test 2 a reliability of 0.56. What will be the correlation between each of these tests and the true score? What is the largest correlation that you are likely to obtain if you correlate scores on Test 1 with scores on Test 2?
When constructing tests researchers usually report both coeffcient alpha and test-retest reliabilities, as these two statistics tell us rather different things about the scale.
Validity At the start of this chapter, we set out three criteria for adequate measurement. These are that scores should only reflect one property of the thing (person) being assessed (unidimensionality), that they should be free of measurement error (show high reliability) and that the scores on the measuring instrument should reflect the thing that the test claims to measure: a test purporting to measure anxiety should measure anxiety and not something else. The first two of these issues were dealt with in the previous section. The validity of a test addresses the third issue, that of whether scores on a test actually measure what they are supposed to. We have seen that reliability theory can show whether or not a set of test items seem to measure some underlying trait. What it cannot do is shed any light on the nature of that trait, for just because an investigator thinks that a set of items should measure a particular trait, there is no guarantee that they actually do so. In the early 1960s a considerable literature built up concerning the ‘Repression-Sensitization Scale’ (R-S Scale). This scale was designed to measure the extent to which individuals used ‘perceptual defence’ (see Chapter 4) – that is, tended to be less consciously aware of emotionally threatening phrases than neutral phrases when these were presented for very brief periods of time. The items formed a reasonably reliable scale, so everyone just assumed that this scale measured what it claimed to do – it generated a lot of research. Then it was found that scores on this test showed a correlation of –0.91 with a well-established test of anxiety (Slough et al., 1984). As you know, the maximum correlation between two tests is limited by the size of their reliabilities, so a correlation of –0.91 actually implies that all of the variance in the R-S Scale could be accounted for by anxiety. It did not measure anything new at all. This tale conveys an important message. Even if a set of items appears to form a scale, it is not possible to tell what that scale measures just by looking at the items. Instead, it is necessary to determine this empirically by a process known as test validation. Test validation shows whether scores on a test provide a good operational definition of some theoretical trait; for example, whether the Extraversion scale of the NEO-PI test is a good measure of extraversion. A test is said to be valid if it does what it claims to do, either in terms of theory or in practical application. For example, a test that is marketed as a measure of anxiety for use in the general UK population should measure anxiety and not social desirability, reading skill, sociability or any other unrelated traits. 
A test that is used to select the job applicants who are most likely to perform best in a particular occupation really should be able to identify the individual(s) who will perform best. Whilst the reliability of a test can be expressed as a certain number (for a particular sample of individuals), the validity of a test also depends on the purpose of testing. For example, a test that is valid for selecting computer programmers from the UK student population may not be useful for selecting sales executives. A test that is a valid measure of depression when used by a medical practitioner will probably not be valid for screening job applicants, since most individuals will realise the purpose of the test and distort their responses. It follows that reliability is necessary for a scale to be valid, since low reliability implies that the scale is not measuring anything. However, high reliability itself does not guarantee validity, since as shown earlier this depends entirely on how, why and with whom the scale is used. There are five main ways of establishing whether a scale is valid.
Face validity Face validity merely checks that the items in a test or scale look as if they measure the trait of interest. The R-S Scale debacle described earlier shows that scrutinising the content of items is no guarantee that the test will measure what it is intended to. Despite this, some widely used tests (particularly in social psychology) are constructed by writing a few items, ensuring that alpha is high (which is generally the case because the items are paraphrases of one another) and then piously assuming that the scale measures the concept that it was designed to assess. It is vital to ensure that a test has better credentials than this before using it. It is necessary to establish that the test has some predictive power and/or that it correlates as expected with other tests before one can conclude that the test really does measure what it claims. The only real use of face validity is to ensure that the items look vaguely sensible, for if they do not then some participants may not take the test seriously.
Content validity Very occasionally it is possible to construct a test that must, by definition, be valid. For example, suppose that one wanted to construct a spelling test. Since, by definition, the dictionary contains the whole domain of items, any procedure that produces a representative sample of words from this dictionary has to be a valid test of spelling ability. This is what is meant by content validity. To give another example, occupational psychologists sometimes use ‘workbasket’ approaches in selecting staff, where applicants are presented with a sample of the activities that are typically performed as part of the job and their performance on these tasks is in some way evaluated. These exercises are not psychological tests in the strict sense, but the process can be seen to have some content validity. The problem is that it is rarely possible to define the domain of potential test items this accurately. How would one determine the items that could possibly be used to measure creativity, for example? Thus the technique is not often used in ability tests, although it can be useful in constructing tests of attainment, where the assessment shows how many facts a person has assimilated following a course of instruction. Will your instructor assess you by looking at the index of this book and checking that you know the meaning of a sample of these entries? Should you want to see how this works in practice, Hunt et al. (2011) show how a mathematics anxiety scale may be developed; they try to ensure content validity by identifying as wide a range of situations as possible where students are likely to encounter mathematics.
Construct validity One useful way of checking whether a test measures what it claims to assess is to perform thoughtful experiments to determine whether scores on the test behave as they should according to theory and previous studies. Suppose that a test is designed to measure anxiety in UK university students. How might its validity be checked through experiment? The first approach (sometimes called convergent validation) is to check that the test scores relate to other things as expected. For example, if there are other widely used tests of anxiety on the market, a group of students could be given both tests and the two sets of scores may be correlated together. A large positive correlation would suggest that the new scale is valid. A test measuring state anxiety could be administered to a group of students who claim to have a phobia about spiders, before and after showing them a tarantula. If their scores increase, then the test might indeed measure anxiety. It would also be possible to discover whether students who were receiving counselling because of their high trait anxiety obtained higher scores on the test than control groups (e.g., a random sample of students, or students receiving counselling for other reasons). The basic aim of these convergent validations is to determine whether the test scores vary as would be expected on theoretical grounds. Unfortunately, a failure to find the expected relationships might be due to some problem, either with the test or with the other measures. For example, the other test of anxiety may not be valid or some of the individuals who say that they are phobic about spiders may not be. However, if scores on a test do appear to vary in accordance with theory, it seems reasonable to conclude that the test is valid. Studies of divergent validity check that the test does not seem to measure any traits with which it should, in theory, be unrelated. 
For example, the literature claims that anxiety is unrelated to intelligence, socioeconomic status, social desirability and so on. Thus if a test that purportedly measured anxiety actually showed a massive correlation with any of these variables, doubt would be raised as to whether it really measured anxiety at all.
Predictive validity Psychological tests are very often used to predict behaviour and their success in doing so is known as their predictive validity. For example, a test might be given to candidates for a post as a salesperson. The test would have predictive validity if it could be shown that the people with the highest scores on the test tended to make the most sales. This process sounds straightforward, but in practice tends not to be. The first problem is the nature of the criterion against which the test is to be evaluated. Although noting the volume of sales achieved is quite straightforward, many occupations lack a single criterion. A university lecturer’s job is a case in point. Mine involved teaching, administration and research, committee work, the supervision of postgraduate students, providing informal help with statistics and programming, supporting and encouraging undergraduates and so on – the list is a long one and it is not obvious how most of these activities can be evaluated or their relative importance determined. In other cases (e.g., where employees are rated by their line manager), different assessors may apply quite different standards and so a higher score need not necessarily indicate better performance. A second problem is known as ‘restriction of range’. Selection systems generally operate through several stages, for example initial psychometric testing to reduce the number of applicants to manageable proportions, followed by interviews and more detailed psychological assessments of individuals who get through the first stage. Applicants who are eventually appointed will all have similar (high) scores on the screening tests, otherwise they would have been rejected before the interview stage. Thus the range of scores in the group of individuals who are selected will be much smaller than that in the general population. 
This will create problems for any attempt to validate the screening test, since this restricted range of abilities will tend to reduce the correlation between the test and any criterion. There are ways around this (Dobson, 1988), but these two examples show how difficult it can be to establish the predictive validity of a test. There are several designs that can be used to establish the predictive validity of tests. Most obviously, the tests may be administered first and criterion data gathered later, such as when ability tests are used to predict sales performance, or aptitude in piloting an aircraft. Or the criterion data may be gathered retrospectively: for example, a test may be administered to measure adults’ social adjustment and the criterion data (the quality of interaction with peers when they were at school) may be gathered through interviewing parents, teachers, etc. This is sometimes known as ‘postdiction’. Sometimes the criterion data may be gathered at the same time as the test is administered, such as when a firm is considering using a new selection test and decides to see whether it can predict the effectiveness of workers who are already in the organisation. Here the performance measures (e.g., supervisors’ ratings, sales figures) are obtained at the same time that the test is administered, a design that is sometimes referred to as ‘concurrent validity’.
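The effect of restriction of range on a validity coefficient can be illustrated with a short simulation. Everything in this sketch is assumed for the purpose of illustration: the population correlation of 0.6 between test and criterion, the decision to appoint the top 20% of scorers, and the variable names.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)
n_applicants = 10000
population_r = 0.6  # assumed validity of the screening test among all applicants

# Screening-test scores and a criterion that correlates about 0.6 with them.
test = [rng.gauss(0, 1) for _ in range(n_applicants)]
criterion = [
    population_r * t + math.sqrt(1 - population_r ** 2) * rng.gauss(0, 1)
    for t in test
]

# Only applicants in the top 20% on the screening test are appointed.
cutoff = sorted(test)[int(0.8 * n_applicants)]
appointed = [(t, c) for t, c in zip(test, criterion) if t >= cutoff]

r_all = pearson(test, criterion)
r_appointed = pearson([t for t, _ in appointed], [c for _, c in appointed])
print(round(r_all, 2), round(r_appointed, 2))
```

The correlation among those appointed should come out markedly lower than the correlation among all applicants, even though the test works equally well for everyone. In practice only the restricted correlation can be observed, as criterion data exist only for the people who were hired.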
Incremental validity Applied psychologists often want to combine scores on several different tests in order to predict behaviour. They will therefore want to know whether considering an extra variable (perhaps scores on another test, administered to job applicants) will appreciably improve their ability to predict some criterion performance. We will assume that they have access to people’s scores on several selection tests, and that they also know how well each person performs within the organisation. Techniques such as multiple regression (or logistic regression when the criterion has only two values, such as passing or failing a driving test) allow good performance on one scale to compensate for poor performance on another. Multiple regression shows the relative importance of the variables for predicting the criterion, and also yields a statistic called the ‘multiple correlation coefficient’, R, which indicates how well the optimally combined test scores together predict the criterion. It has a sound statistical rationale and shows how the scores on the various scales should be weighted and added together to predict the criterion. For example, analysing the data from 200 job applicants might show that the best possible estimate of a salesperson’s annual turnover is given by the equation
Estimated turnover (in dollars) = 100 000 + 500 × Extraversion score – 130 × Neuroticism score
Once a regression equation like this is set up, it is possible to study the extent to which adding another variable to the selection battery improves the accuracy of the prediction. If adding a new variable (for example, including a test of general intelligence as well as Extraversion and Neuroticism) leads to a useful increase in predictive power (a significant increase in R) then the new variable is said to have incremental validity.
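The regression equation and the idea of an increase in R can be made concrete with a little code. The first function simply evaluates the equation given in the text. The second uses the standard formula for the multiple correlation with two predictors, computed from the three pairwise correlations; the correlations of 0.40, 0.30 and 0.20 are hypothetical, as are the function names, and no significance test is attempted here.

```python
import math


def estimated_turnover(extraversion, neuroticism):
    """Evaluate the illustrative regression equation from the text (dollars)."""
    return 100_000 + 500 * extraversion - 130 * neuroticism


def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation R when a criterion is predicted from two tests.

    r_y1, r_y2: correlation of each test with the criterion.
    r_12: correlation between the two tests.
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# A hypothetical applicant scoring 20 on Extraversion and 10 on Neuroticism.
print(estimated_turnover(20, 10))

# Hypothetical correlations: the existing test correlates 0.40 with the
# criterion, the new test 0.30, and the two tests correlate 0.20.
r_existing_only = 0.40
r_both = multiple_r_two_predictors(0.40, 0.30, 0.20)
gain_in_r = r_both - r_existing_only
print(round(r_both, 3), round(gain_in_r, 3))
```

Whether a gain of this size is worth the cost of administering a second test is a practical judgement. The more highly the new test correlates with the tests already in the battery, the less it can add.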
Self-assessment question 8.5 a. Must a reliable test be valid? b. Must a valid test be reliable? c. What is meant by the construct, content and predictive validity of a test? d. What are ‘convergent validity’ and ‘divergent validity’?
Overview of validity Establishing whether a test is valid is time consuming, difficult – and absolutely vital. The problem is that there is no short answer to questions such as ‘what is the best test for assessing verbal ability’. It all depends on the age and ability of the people to be tested, their background (a test developed for white middle-class samples may well not be valid for minority group members, an issue known as ‘test bias’ – see Chapter 21), the degree of accuracy required, the time available for testing and whether the test is to be given to individuals or groups. Some tests may make fine discriminations between people at the bottom end of the spectrum (useful for detecting dyslexia) but be unable to make fine distinctions between the most able pupils. Some may measure speed of response as well as level of response, for example, determining how many synonyms a person can identify within 5 minutes, while others may give the child as long as they want to do the test. Some tests will require participants to make an open-ended response (e.g., defining the meaning of a word) while others will be multiple choice. So much depends on the purpose of testing that it is rarely possible to recommend a single test for all situations.
Factors which can reduce the validity of tests Several things can affect the validity of test scores; they are discussed in more depth in Chapter 21. One of the more important variables is ‘bias’ – where a test systematically over- or underestimates the scores of group members. For example, a quiz about the rules of sport which contained questions about cricket would likely be harder for Canadians than for people from Pakistan or Australia, as Canadians do not generally play or follow the game. The quiz would systematically underestimate the sporting knowledge of Canadians. Issues of bias become important when measuring the personality or ability of people from different racial, gender or cultural backgrounds. Bias must reduce the validity of a test, as a person’s score on a biased test will be affected by both the level of the trait which the test supposedly measures and by their membership of a certain group (racial, gender, etc.). It is a particular problem if tests are used for personnel selection, as a biased test will systematically under- or overestimate the scores of some applicant groups. Note that although they affect validity, bias and the other problems described below would not necessarily affect the reliability of a test. For example, if members of one group rated themselves one point lower than they should on each item of a Likert scale, this would not affect the correlations between the items (and so would not affect coefficient alpha). Impression management is also important; when they are trying to give a good impression of themselves people may try to make themselves appear more extraverted, agreeable, conscientious and open and less neurotic than they really are. (How do we know this? By asking people to fill in questionnaires under two conditions – e.g., honestly and as if they want to portray themselves in a good light – and noting the difference in means.) If everyone distorted their scores to the same extent it would not pose a problem; the validity of a test will be affected if some people are honest whilst others are less so, because (once again) the score on a test reflects both people’s level of the trait which the test is designed to measure and the extent to which they distort their responses. Finally, people might be genuinely unaware of how they behave. For example, you might think you are an excellent wit, whilst in reality everyone cringes because you insist on telling awful inappropriate jokes. Hence although you genuinely believe that you are completing a questionnaire honestly, your responses may not be accurate, and so this too will affect the validity of the scale.
Summary The reliability of a test is important because it shows how closely the test score resembles a person’s true score on the trait being measured, provided that some assumptions are met. Unfortunately, it is easy to overestimate internal consistency (coefficient alpha) by including items in a test that are paraphrases of one another – an obvious problem that is not sufficiently recognised in the literature. To avoid this, test designers need to check all pairs of test items to ensure that the assumption of local independence seems reasonable. The second form of reliability, test-retest reliability, shows whether scores stay stable over time. If test-retest reliability is low, the test may not measure a trait. It is vitally important to establish the content validity, construct validity and/or predictive validity of a test before using it for any purpose whatsoever. These analyses show how well scores on a particular test serve as an operational definition of some trait or state. A test with low reliability cannot be a valid measure of a trait. However, high reliability does not guarantee high validity.
Suggested additional reading All psychometrics textbooks give accounts of reliability theory and discuss validity, although modern approaches to the topic can become rather technical. The book that psychometricians usually recommend is still Nunnally (1978) as this derives many of the formulae mentioned earlier. Messick (1987) offers a first-rate treatment of test validity which may be downloaded without charge and De Von et al. (2007) give a simple account of test reliability and validity aimed at nurses. Cooper (2019) gives rather more detail about the topics covered here.
Answers to self-assessment questions SAQ 8.1 One can either use their score on another scale measuring the same trait or calculate their total on all the other items in the scale to estimate their true score. It would not be a good idea to correlate each person’s score on an item with the total score on that scale, as that item also contributes to the total score.
SAQ 8.2 a. Assuming that each pair of items is influenced by a different set of nuisance factors, the correlation will show the extent to which the pair of items assesses the trait being measured – Extraversion in this case. b. Since Item 1 shows low correlations with all of the other items, it would appear to be a poor measure of Extraversion. c. The obvious thing would be to write more test items, give the old and new items out to a new sample of individuals (at least 200 of them) and recompute the correlations and alpha. Since coefficient alpha depends in part on the number of items in the test, increasing the number of items will increase alpha. There is also another possibility. We saw in the answer to (b) that Item 1 really is not very good – it shows low correlations with all of the other test items. Removing this item from the test will both increase the average correlation between the remaining items (from 0.206 based on ten correlations to 0.285 based on six correlations) and reduce the length of the test. The first factor will tend to increase alpha and the second will tend to decrease it. Thus it is possible – although not certain – that removing Item 1 may also increase alpha. We shall return to this in Chapter 20.
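The trade-off in answer (c) between a higher average correlation and a shorter test can be explored with the standardised form of alpha, which depends only on the number of items and their mean intercorrelation. This is a sketch under assumptions: the item counts are inferred from the number of correlations quoted (ten correlations imply five items, six imply four), and alpha computed from raw scores also depends on the item variances, which are not given here.

```python
def standardized_alpha(n_items, mean_r):
    """Standardised alpha from the number of items and their mean correlation."""
    return (n_items * mean_r) / (1 + (n_items - 1) * mean_r)


# Five items give 5 * 4 / 2 = ten correlations; four items give six.
alpha_with_item_1 = standardized_alpha(5, 0.206)
alpha_without_item_1 = standardized_alpha(4, 0.285)
print(round(alpha_with_item_1, 3), round(alpha_without_item_1, 3))
```

On these figures the shorter scale has the higher standardised alpha, so the gain in average correlation outweighs the loss of an item. That will not always be so, which is why the answer describes the improvement as possible rather than certain.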
SAQ 8.3 Because coefficient alpha depends on the number of items in a scale and the average correlation between them, it is possible to obtain a large coefficient alpha even if the items in a scale measure two or more uncorrelated factors. This is illustrated in Table 8.2 for a four-item scale; the same average correlation is obtained if all of the items have a modest correlation with each other (i.e., measure one factor) or if the items form two uncorrelated factors – V1&V2 and V3&V4. It is important to appreciate that factor analysis is necessary to show whether a set of items measure a single trait. Coefficient alpha does NOT show this.
Table 8.2 Two very different sets of correlations between items which will produce the same coefficient alpha

Items forming two uncorrelated factors (V1&V2 and V3&V4)
      V1     V2     V3     V4
V1    –
V2    0.8    –
V3    0      0      –
V4    0      0      0.8    –
Average r = 0.267

Items measuring one factor
      V1     V2     V3     V4
V1    –
V2    0.267  –
V3    0.267  0.267  –
V4    0.267  0.267  0.267  –
Average r = 0.267
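The point made by Table 8.2 can be verified directly. The sketch below enters the two correlation matrices from the table and computes the mean inter-item correlation and the standardised alpha for each; using the standardised form of alpha is an assumption made for simplicity, and the function names are illustrative.

```python
def mean_inter_item_r(matrix):
    """Mean of the correlations below the diagonal of a square matrix."""
    k = len(matrix)
    below_diagonal = [matrix[i][j] for i in range(k) for j in range(i)]
    return sum(below_diagonal) / len(below_diagonal)


def standardized_alpha(matrix):
    """Standardised alpha from a correlation matrix."""
    k = len(matrix)
    mean_r = mean_inter_item_r(matrix)
    return (k * mean_r) / (1 + (k - 1) * mean_r)


# Two uncorrelated factors: V1 & V2 go together, as do V3 & V4.
two_factors = [
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
]

# One factor: every pair of items correlates 0.267 (the rounded table value).
one_factor = [[1.0 if i == j else 0.267 for j in range(4)] for i in range(4)]

alpha_two_factors = standardized_alpha(two_factors)
alpha_one_factor = standardized_alpha(one_factor)
print(round(alpha_two_factors, 2), round(alpha_one_factor, 2))
```

The two values agree to two decimal places (the small remaining difference comes from rounding 0.267 in the table), even though the first set of items plainly measures two unrelated things.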
SAQ 8.4 a. Alternative names for the internal consistency or reliability of a test, which I have termed alpha throughout this book. b. Doing so is bound to produce a very high reliability, as the items share unique variance as well as measuring the same trait. c. The SEM shows how accurate an assessment of an individual’s score is likely to be. For example, if one test suggested that a person’s IQ was 100 with an SEM of 3, we would be more confident that the person’s IQ really was 100 than if the test had an SEM of 5. d. √0.81 = 0.9 and √0.56 ≈ 0.75. Suppose that Test 2 was completely reliable. The correlation between Test 1 and Test 2 would then be the same as the correlation between Test 1 and the true score, i.e., 0.9. However, as Test 2 is not perfectly reliable, the correlation will be lower. It can be shown that the largest correlation that one can expect between two tests is the product of the square roots of their reliabilities. In this case, one would not expect the tests to correlate together by more than 0.9 × 0.75 or 0.675. This is how I was earlier able to claim that all of the variability in the R-S Scale could be explained by anxiety, although the correlation between the two scales was only –0.91, not –1.0.
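The arithmetic in answer (d) is easy to reproduce. The two functions below are illustrative helpers with made-up names: one takes the square root of a reliability to give the correlation between a test and the true score, and the other gives the ceiling on the correlation between two tests as the product of those square roots.

```python
import math


def correlation_with_true_score(reliability):
    """Correlation between test scores and true scores."""
    return math.sqrt(reliability)


def max_correlation_between_tests(reliability_1, reliability_2):
    """Largest correlation to be expected between two imperfect tests."""
    return math.sqrt(reliability_1) * math.sqrt(reliability_2)


r_test1_true = correlation_with_true_score(0.81)
r_test2_true = correlation_with_true_score(0.56)
ceiling = max_correlation_between_tests(0.81, 0.56)
print(round(r_test1_true, 3), round(r_test2_true, 3), round(ceiling, 3))
```

Working with unrounded values gives a ceiling of about 0.673; the figure of 0.675 in the answer arises because the square root of 0.56 was rounded to 0.75 before multiplying.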
SAQ 8.5 a. Most certainly not. High reliability tells you that the test measures some trait or state, not what this trait or state is. b. Yes. Although beware if the reliability of a short scale appears to be too high (e.g., ten-item scales with a reliability of 0.9). This may suggest that the same item has been paraphrased several times. c. Construct validity involves relating people’s test scores to scores on other traits, to determine whether the test shows the correlations which would be expected if scores on it were a good operational definition of the concept (trait) that the test supposedly measures. Content validity involves sampling items randomly from some well-defined domain of items, such as all possible two-digit addition problems. Studies of predictive validity relate test scores to some later criterion behaviours – for example, measuring children’s emotional intelligence and relating it to their school performance or level of behavioural problems some years later. d. In construct validation, convergent validity is a measure of whether a test correlates with the characteristics with which it should correlate if it is valid – for example, whether an intelligence test correlates with teachers’ ratings of children’s academic performance. Divergent validity checks that a test shows insubstantial correlations with characteristics to which it should theoretically be unrelated. For example, scores on the intelligence test could be correlated with tests measuring social desirability, various aspects of personality, etc., the expectation being that these correlations will be close to zero.
References
Cooper, C. 2019. Pitfalls of Personality Theory. Personality and Individual Differences, 109551.
Cooper, C. 2019. Psychological Testing: Theory and Practice. London: Routledge.
De Von, H. A., Block, M. E., Moyle-Wright, P., Ernst, D. M., Hayden, S. J., Lazzara, D. J., Savoy, S. M. & Kostas-Polston, E. 2007. A Psychometric Toolbox for Testing Validity and Reliability. Journal of Nursing Scholarship, 39, 155–164.
Dobson, P. 1988. The Correction of Correlation Coefficients for Restriction of Range When Restriction Results from the Truncation of a Normally Distributed Variable. British Journal of Mathematical and Statistical Psychology, 41, 227–234.
Feldt, L. S., Woodruff, D. J. & Salih, F. A. 1987. Statistical-Inference for Coefficient-Alpha. Applied Psychological Measurement, 11, 93–103.
Hunt, T. E., Clark-Carter, D. & Sheffield, D. 2011. The Development and Part Validation of a UK Scale for Mathematics Anxiety. Journal of Psychoeducational Assessment, 29, 455–466.
Marsh, H. W., Scalas, L. F. & Nagengast, B. 2010. Longitudinal Tests of Competing Factor Structures for the Rosenberg Self-Esteem Scale: Traits, Ephemeral Artifacts, and Stable Response Styles. Psychological Assessment, 22, 366–381.
Messick, S. 1987. Validity. Document RR-87-40. ETS Research Report Series. Princeton NJ: Educational Testing Service.
Michell, J. 1997. Quantitative Science and the Definition of Measurement in Psychology – Reply. British Journal of Psychology, 88, 401–406.
Nunnally, J. C. 1978. Psychometric Theory. New York: McGraw-Hill.
Slough, N., Kleinknecht, R. A. & Thorndike, R. M. 1984. Relationship of the Repression-Sensitization Scales to Anxiety. Journal of Personality Assessment, 48, 378–379.
Chapter 9 Narrow personality traits
Introduction The theories described in Chapter 6 sought to map out the whole of personality: that is, discover (and measure) all the important personality traits along which people vary. However, not everyone has approached the study of personality in this way. Some have chosen to focus on one particular trait – perhaps based on something that has been identified from the clinical literature, such as autism or self-concept; we have already encountered Emotional Intelligence which falls into this category. By writing items, factor analysing the correlations between them and validating the scale as described in Chapters 5 and 8 it is possible to determine whether such a trait exists. An important part of this validation process involves establishing how the trait relates to the ‘standard’ personality traits described in Chapter 6 as there is always the risk that a supposedly ‘new’ trait is in fact identical to an existing trait or a mixture of existing traits. There are vast numbers of tests that claim to measure huge numbers of narrow personality traits and choosing which to include in this chapter is inevitably arbitrary. I have included some which are widely cited in the literature, or are of some theoretical or practical importance. Traits which have obvious relevance to clinical psychology or psychopathology are discussed in Chapter 10; the present chapter discusses those which do not have such connections.
Learning outcomes
Having read this chapter and the recommended reading you should be able to:
• Discuss the nature and biological basis of sensation seeking and its relationship to other personality traits.
• Give an overview of Morningness-Eveningness, and comment on its applications and relationship to behaviour and other correlates.
• Outline research into Mindfulness and its applications.
• Form an opinion about some problems with assuming that self-esteem influences behaviour.
Chapter 6 considered models of personality that were intended to identify all the major personality traits and suggest how they should best be assessed. The advantages of this approach are several. First of all, it allows for the proper sampling of variables. The lexical hypothesis (on which Cattell’s model and the five-factor model are based) should, in theory, ensure that all possible traits have been considered when mapping personality. What is more, because synonyms were removed from the list of trait names, we can be reasonably sure that the traits are not trivial. Trivial personality traits – Cattell calls them ‘bloated specifics’ – emerge when the words or phrases used in a personality scale are essentially paraphrases of each other or all refer to the same very narrow aspect of behaviour. If we assume that people behave consistently when answering such a questionnaire, the response they give to one item determines what the responses to the rest of the items must be. There is no need to gather data to prove this point: it follows inevitably from the meanings of the words in the scale. Finding that such items form a factor is not a finding of any scientific interest, for there is no evidence that any such factor represents a source trait. Many colleagues will disagree with this view and narrow traits are widely used in practice; you may well find that your lecturers and other books and journal articles do not see this as being an important issue. It therefore seems fairest to flag that this is a possibly idiosyncratic view at the outset, and let readers make up their own minds.
Self-assessment question 9.1
Suppose that participants were asked to rate how accurately each word or phrase described them. Suppose too that factor analysis showed each item had a loading above 0.4 on the one factor that emerged. How might you investigate whether these items form a ‘bloated specific’?
• certain of myself
• self-assured
• confident
• in control of myself
• unsure of myself (reverse scored)
This chapter and Chapter 10 will introduce some of the many narrow personality traits that have been developed by researchers who are interested in exploring specific aspects of personality, rather than trying to discover its broad structure.
It is quite possible that a ‘narrow’ personality factor lies somewhere between two or more well-established personality traits (e.g., from the five-factor model, or Eysenck’s model of personality). For example, suppose we have some scale that claims to measure a new trait – Trait X. A factor analysis might show that all of the items in that scale form a single factor (i.e., they are unidimensional). However, the location of this factor might mean that people’s scores on Trait X may be predicted if one knows their scores on other traits, such as Extraversion (E) and Neuroticism (N). Suppose that we gave scales measuring Trait X, E and N to a sample of people and found that the correlation between E and N was 0.0, that E correlated 0.866 with Trait X and N correlated 0.5 with Trait X. This shows that people’s scores on Trait X can be entirely predicted by knowing their levels of Extraversion and Neuroticism. How can we tell? You will remember from your statistics courses that the square of the correlation shows how much variance is shared by two variables. Because E and N are uncorrelated, the variance that each shares with Trait X can simply be added together. As 0.866² plus 0.5² = 1.0, a person’s score on Trait X can be predicted with perfect accuracy if one knows their scores on E and N. Another way of looking at it is to point out that the items used to measure Trait X, E and N might form three distinct oblique factors, with Trait X being correlated with both E and N. Finding the position of a factor relative to other, well-established factors is sometimes known as locating a trait within personality factor space. This raises an interesting issue. It is quite possible for a trait to exist – even though it may have zero incremental validity. But it may not be of much practical use, if it does not explain anything more than existing, well-established traits. This is why I discuss the extent to which each trait overlaps with the Big Five personality traits.
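The arithmetic in the Trait X example can be checked with a few lines of Python. This is only an illustrative sketch: the function name is my own, and the formula is the standard two-predictor expression for the squared multiple correlation, which reduces to the simple sum of squared correlations when E and N are uncorrelated, as the example assumes.

```python
def variance_explained(r_x_e, r_x_n, r_e_n=0.0):
    """Proportion of variance in Trait X predictable from E and N.

    r_x_e: correlation between Trait X and Extraversion
    r_x_n: correlation between Trait X and Neuroticism
    r_e_n: correlation between Extraversion and Neuroticism
    """
    # Standard two-predictor squared multiple correlation. When r_e_n is 0
    # this is just r_x_e**2 + r_x_n**2, as in the worked example in the text.
    numerator = r_x_e ** 2 + r_x_n ** 2 - 2 * r_x_e * r_x_n * r_e_n
    return numerator / (1 - r_e_n ** 2)


# Values taken from the example: E and N uncorrelated
r_squared = variance_explained(0.866, 0.5)
print(round(r_squared, 3))

# A hypothetical variation: if E and N correlated 0.3, less of Trait X
# would be predictable from the same two correlations
print(round(variance_explained(0.866, 0.5, r_e_n=0.3), 3))
```

With the figures from the text the proportion of variance explained is 1.0 to three decimal places, which is why Trait X adds nothing beyond E and N in this example.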
Self-assessment question 9.2
What does ‘locating a trait within factor space’ mean and why is it important?
Sensation Seeking
This personality trait has an unusual origin, stemming from military psychologists’ attempts to determine how susceptible individuals would be to ‘brainwashing’ techniques of interrogation supposedly used during the 1960s. Brainwashing involved putting people into environments that were as unstimulating as possible: for example, they might be immersed in a tank of water in a dark, silent room, so receiving no stimulation from touch, sight, hearing, smell or taste. It was found that this was a most unpleasant sensation and people often became disoriented, losing track of time and hallucinating. People in these settings try to stimulate themselves, for example by tapping or singing in order to reduce the monotony of the environment (Jones, 1969). Such activity is not confined to humans. Sensory stimulation acts as a positive reinforcer for animals who are kept in unstimulating surroundings, who will learn to press a bar if this leads to stimulation (Kish, 1966). Thus it seems that people are hungry for stimulation when placed in monotonous environments. This effect may not be limited to unstimulating laboratory studies. There may also be important individual differences in the extent to which people seek out novel, thrilling behaviour. Some people thrive on driving fast, having varied sex lives and forever seeking exciting new experiences (e.g., bungee jumping, parascending) while others may actively avoid novelty and risk and prefer an unstimulating, routine lifestyle, enjoying the same routine at work each day and travelling to the same hotel at the same time each year to read safe, unexciting books. Sensation seeking may thus be a personality trait. Marvin Zuckerman researched individual differences in sensation seeking for over 40 years (Zuckerman, 1979, 1994, 2007). He defined sensation seeking as ‘the seeking of varied, novel, complex and intense sensations and experiences, and the willingness to take physical, social, legal and financial risks for the sake of such experience’ (Zuckerman, 1994) and developed several scales to measure this trait, two of which are reproduced along with norms, scoring keys, reliability and validity data and details of foreign translations (Zuckerman, 1994). The items from Form V of the SSS (Sensation Seeking Scale) are in Zuckerman et al. (1978). Factor analysis of items written to assess sensation seeking suggests that it comprises four distinct facets:
• Thrill and Adventure Seeking involves seeking out frightening physical activity.
• Experience Seeking involves stimulation through travel, the arts and trying alternative lifestyles.
• Boredom Susceptibility implies a dislike of a fixed routine or of dull people.
• Disinhibition involves stimulation from social activities: casual sex, getting drunk or using drugs socially.
The correlations between these facets are all positive, but vary drastically in size depending on which version of the test is being used. For Form IV of the scale the correlations usually range from 0.28 to 0.58, whereas for Form V of the SSS they vary from about 0.14 to about 0.41 (Zuckerman, 1994). Despite this, factor analyses suggest that the factor structure of Form V is satisfactory (Roberti et al., 2003).
Sensation Seeking and the main personality factors
The four subscales from Form V of the SSS and the scale scores from a short form of Eysenck’s EPQ were jointly factored by Glicksohn and Abulafia (1998). These researchers extracted three factors: Extraversion, Neuroticism and Psychoticism. Three of the sensation-seeking subscales (Experience Seeking, Boredom Susceptibility and Disinhibition) all had substantial loadings (above 0.6) on Eysenck’s P (Psychoticism) factor. Thrill and Adventure Seeking loaded on both P and E. A factor analysis of the items in the SSS and EPQ-R-S revealed a similar story and almost the same thing was earlier reported by Zuckerman et al. (1988). Zuckerman et al. (1991) found that all the sensation-seeking scales fell onto the P factor. It is therefore clear that Sensation Seeking is highly correlated with Psychoticism and, to a lesser extent, Extraversion. Zuckerman and Glicksohn (2016) explore the nature of these relationships in more detail, and discuss the possibility that Impulsivity influences both Sensation Seeking and Psychoticism. Figure 9.1 shows Eysenck’s dimensions of Psychoticism and Extraversion (cf. Figure 6.1 for Extraversion and Neuroticism) and how Sensation Seeking relates to these two factors. This figure shows that sensation seeking is not simply a mixture of Psychoticism and Extraversion: if this were the case it would be of little psychological interest, because there would be no need to develop a questionnaire to measure Sensation Seeking; instead, one could just estimate it by (for example) multiplying someone’s Psychoticism score by 3 and adding it to their Extraversion score. Instead, the line representing sensation seeking in Figure 9.1 slants up through the plane of the paper into a third (sensation-seeking) dimension. Sensation Seeking is a dimension of personality that is correlated with P (and to some extent with E) but one that also has some special meaning of its own; scores on a Sensation Seeking scale cannot be perfectly estimated from P and E alone, as shown by Glicksohn and Abulafia (1998).
Figure 9.1 Sensation seeking (SS) located relative to Eysenck’s factors of Extraversion (E) and Psychoticism (P)
Self-assessment question 9.3
What are the main characteristics of sensation seeking?
As Costa and McCrae included excitement-seeking as one of the facets which define their Extraversion factor, and novelty-seeking as a facet of their Openness factor, it is unsurprising that Sensation Seeking correlates with Extraversion and Openness when these are measured using the NEO-PI(R). This is not a finding of much psychological significance as far as I can see – but a mere consequence of asking (essentially) the same questions twice, in the SSS and the NEO-PI(R) scales. Given that Psychoticism is essentially a combination of (low) Agreeableness and (low) Conscientiousness in the Five-Factor model, as discussed in Chapter 6, one might expect Sensation Seeking to correlate with these traits, which it does (Aluja et al., 2003). These relationships are also found with the HEXACO model, although the SSS also correlates with HEXACO Honesty–Humility and Emotionality (de Vries et al., 2009).
Correlates of sensation seeking
Scores on the various sensation-seeking scales have been found to correlate significantly with such a huge range of variables that it is not possible to summarise them all here; summaries are given in Zuckerman (1994, 2007). High sensation seekers take part in risky activities, ranging from climbing Everest to bomb disposal (Breivik, 1996; Glicksohn and Bozna, 2000).
The higher someone’s level of sensation seeking, the more they stimulate themselves through movement when undergoing sensory deprivation (Zuckerman et al., 1966). They are more likely to use drugs and explore sex at an early age (Stanton et al., 2001), have ‘one night stands’ (Kaspar et al., 2016), enjoy abstract art (Furnham and Avison, 1997), drive after using psychoactive drugs (Jamt et al., 2020) and find rude and incongruous jokes funny (Ruch, 1988). In short, they seek out excitement, thrills and novelty. An obvious problem with this work is that some of the items in the sensation seeking questionnaire may relate directly to the criterion behaviour. For example, if we count alcohol as a drug then five of the items in the SSS-V scale relate directly to drug use, with others (e.g., admitting to sometimes doing things that are a little frightening or illegal) having more tangential relevance. If you ask someone whether they take drugs and then include several drug-related questions in a personality questionnaire, then there is bound to be a link between the two measures. To combat this, researchers sometimes omit the drug-related items in the SSS when looking at sensation seeking and drug use, drop the sex-related items when exploring sensation seeking and sexual behaviour or attitudes, and so on. It is worthwhile checking that this sort of precaution has been taken when interpreting any study – not just those involving sensation seeking. For if only five out of 40 items relate perfectly to drug taking while the other 35 have no relationship whatever to it, the correlation that is obtained will be in the order of 0.35–0.40, which can easily be mistaken for a genuine finding instead of a statistical artefact.
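The size of this artefact can be explored with a small calculation. The sketch below is not taken from the studies cited, and the assumptions are mine: every item is standardised, the overlapping items correlate 1.0 with the criterion, the remaining items correlate zero with the criterion and with the overlapping items, and the remaining items correlate with each other to a degree set by the hypothetical parameter r_other. The function name and the values of r_other are illustrative only.

```python
import math


def overlap_correlation(n_overlap, n_other, r_other):
    """Correlation between a total scale score and a criterion.

    Assumptions (illustrative, not from the source):
    - all items are standardised (variance 1);
    - n_overlap items correlate perfectly with the criterion;
    - n_other items are unrelated to the criterion and to the overlapping
      items, but correlate r_other with one another.
    """
    # Perfectly overlapping items all equal the criterion, so their sum is
    # n_overlap times the criterion, with variance n_overlap squared.
    var_overlap = n_overlap ** 2
    # Variance of the sum of the remaining items: item variances plus
    # all pairwise covariances.
    var_other = n_other + n_other * (n_other - 1) * r_other
    # Covariance between the total score and the criterion.
    covariance = n_overlap
    return covariance / math.sqrt(var_overlap + var_other)


for r_other in (0.0, 0.1, 0.2):
    r = overlap_correlation(5, 35, r_other)
    print(f"inter-item r = {r_other:.1f}: scale-criterion r = {r:.2f}")
```

Under these assumptions the spurious correlation depends heavily on how strongly the 35 remaining items correlate with one another. A modest average inter-item correlation of about 0.1 gives a value inside the 0.35–0.40 range mentioned above, whereas completely uncorrelated remaining items would give a larger value. Either way, five overlapping items are enough to produce a correlation that looks substantively interesting.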
Psychobiology of Sensation Seeking
As discussed in Chapter 6, in order to use any trait description (‘sensation seeking’, ‘intelligence’, etc.) as an explanation for why a person behaves in a certain manner, it is necessary to ensure that scores on the trait reflect some fundamental process inside the individual, and are not just an arbitrary attribution ‘in the eye of the beholder’ which does not reflect how people really behave. Establishing this might involve determining whether scores on the SSS are linked to the expression of certain genes or some maturational or social influence. This explains why Zuckerman has spent much time exploring the origins of the sensation seeking trait. If:
• there is a close correspondence between sensation-seeking scores and levels of some hormone, neurotransmitter, etc. and
• this relationship is so strong that scores on the sensation-seeking scale may be used as a cheaper alternative to biologically assaying levels of these chemicals
then it may be reasonable to conclude that Sensation Seeking (which is really the behavioural consequence of certain hormonal levels, or whatever) causes the individual to behave in certain ways – for example, driving recklessly. This is the first time we have explored the possible origins of personality traits; it is an incredibly important issue because we can only really say that we understand a concept (such as a personality trait) when we can explain how it operates. Chapters 11 and 13 therefore follow a similar approach for the main personality traits and abilities. There is evidence that Sensation Seeking is linked to brain activity when viewing mildly unpleasant emotional pictures (Deng and Zhang, 2019) – indicating that this trait is related to brain activity and the processing of emotion. Sensation Seeking is related to low levels of monoamine oxidases (MAOs). These are enzymes that destroy various types of neurotransmitter in the synapse between neurones; monoamine oxidase inhibitors (MAOIs) are chemicals (some being well known as antidepressants) that reduce the depletion of these neurotransmitters. A substantial literature shows that reduced levels of MAO are associated with increased motor activity, social activity and antisocial behaviour (e.g., Klinteberg, 1996); male sex hormones also reduce MAO levels. The behaviours associated with low levels of MAO certainly sound as if they may be related to sensation seeking and this model even explains the sex difference that is commonly observed in sensation-seeking behaviour. Indeed there is a significant negative correlation between MAO levels and scores on sensation-seeking questionnaires (Bardo et al., 1996) and some gene variants which influence MAO levels also influence Sensation Seeking (Thomson et al., 2013). However, although frequently significant, this correlation is not large enough to suggest that MAO levels are the sole determinant of sensation-seeking behaviour and so Zuckerman has expanded the theory to look in more detail at the serotonergic, dopaminergic and noradrenergic systems. Several of these seem to control behaviour relevant to sensation seeking. For example, serotonergic systems are related to behavioural inhibition and dopaminergic systems to seeking out reward and novelty. Zuckerman also weaves in cortisol levels and the interaction between these systems and stress to create a complex theory of the biological basis of sensation seeking (Netter et al., 1996). There is evidence that sensation seeking is linked to genetic makeup (Stoel et al., 2006) although this probably involves small influences from a very large number of genes; there is no single ‘gene for sensation seeking’ (Aluja et al., 2009; Powell and Zietsch, 2011).
Overview
Sensation seeking is a useful concept when it comes to predicting risky behaviour. However, when interpreting such studies, care must be taken to ensure that items in the SSS do not themselves ask about the behaviour with which the SSS is being correlated. The factor analytic results clearly show that sensation seeking is linked to Openness, Psychoticism and the HEXACO factors of Honesty–Humility and Emotionality; it is also linked with Extraversion and Openness in the NEO model because aspects of sensation seeking are used as facets when defining the NEO factors. There is also good evidence for some association between sensation seeking and levels of MAO and neurotransmitters. However, it is less clear whether all these variables can be combined to predict a person’s level of sensation-seeking behaviour accurately.
Morningness
Rumour has it that some people love to get up early, wake quickly and produce their best work early in the day, whereas others are slow to wake up, prefer to stay up late at night and work well in the late evening. These extreme personality types are sometimes known as ‘larks’ and ‘owls’, but most researchers believe that people fall along a continuum – a trait with extreme morningness at one end and eveningness at the other. Unsurprisingly, this trait is of particular interest to those working with circadian rhythms and there is a substantial literature linking this trait of Morningness-Eveningness to various aspects of behaviour. For example, a literature review by Cavallera and Giudici (2008) shows that questionnaire scores have been linked to performance at shift work and various physiological variables such as the time at which body temperature reaches its maximum (which is earlier for ‘larks’ than for ‘owls’) and so on. The 19-item Morningness-Eveningness Questionnaire (Horne and Östberg, 1976) measures the trait by asking questions such as ‘If you got into bed at 11:00 PM, how tired would you be?’ and ‘If you usually have to get up at a specific time in the morning, how much do you depend on an alarm clock?’ The items do not appear to be synonyms and seem to be locally independent, as the way in which one answers one item does not determine how one must answer another. Scores on this questionnaire are supposed to reflect individual differences in circadian clocks, which seem to involve neural oscillators in the ventral hypothalamus (Saper et al., 2005). One may therefore expect this trait to have substantial heritability, and it does (Toomey et al., 2015). Interestingly, these authors also show that the very genes which influence a preference for Eveningness (staying up and working late) also influence depression. It has long been known that ‘evening’ people tend to experience depressed moods (e.g., Hidalgo et al., 2009) and the work of Toomey et al. shows that the reason for this is partly biological; the same genetic variants which make people ‘owls’ also tend to make them depressed. Much of this Morningness-Eveningness literature is rather specialised and not particularly relevant for this book, but it does seem that scores on the various questionnaires relate as expected to physiological responsiveness and behaviour. Several studies have explored the links between Morningness-Eveningness and personality; Lipnevich et al. (2017) summarise these and show that the largest correlation is with Conscientiousness (estimated r = 0.3; larks tend to be more conscientious). However, the other correlations are considerably smaller. Adan et al. (2012) provide an excellent summary of research on the psychology and psychobiology of circadian rhythms, and its applications (e.g., jet lag, shift work).
Rather than just correlating scores on morningness-eveningness questionnaires and sleeping habits with personality, Randler (2008) explored how scores on a morningness-eveningness questionnaire related to sleeping behaviour and personality on weekdays and weekends. This overcomes the problem that some would-be ‘owls’ may be forced to go to bed earlier than they would wish in order to obtain enough sleep to function well at work or college. Randler found a small albeit statistically significant correlation (r = 0.13) between Morningness and the Big Five Agreeableness factor in adolescents; there was no significant relationship in adults. Conscientiousness correlated significantly with Morningness in both adults and adolescents. Several correlations between sleeping habits and personality reached statistical significance, but as this study was based on a large sample (n > 1000) many of the significant correlations are rather small and indicate only a very weak relationship between the variables. The exception is once again Conscientiousness, which correlates 0.23 with hours of sleep taken at weekends, and it also seems that there is a slight tendency for more conscientious individuals to stay in bed for longer at weekends. But the most interesting finding is probably that there is no evidence at all for any relationship between Morningness-Eveningness and the two main personality traits, Extraversion and Neuroticism. Morningness-Eveningness seems to be a narrow trait that correlates somewhat with Conscientiousness, but which is otherwise quite distinct from the main personality factors.
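The point that a large sample can make a very small correlation statistically significant can be illustrated as follows. This is a rough sketch rather than a reanalysis of Randler's data: the sample size of 1000 is an assumption (the text says only that n exceeded 1000), the function name is my own, and the p-value uses a normal approximation to the t distribution, which is reasonable only for samples of this size.

```python
import math


def correlation_significance(r, n):
    """Return the t statistic and an approximate two-tailed p-value for r.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r**2) and, because n is assumed to
    be large, approximates the t distribution by the standard normal.
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    # Two-tailed p-value from the standard normal distribution
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p


# r = 0.13 is the Morningness-Agreeableness correlation quoted in the text;
# n = 1000 is an assumed sample size for illustration.
t, p = correlation_significance(0.13, 1000)
shared_variance = 0.13 ** 2
print(f"t = {t:.2f}, approximate p = {p:.5f}")
print(f"shared variance = {shared_variance:.1%}")
```

On these assumptions the correlation is comfortably significant, yet the two variables share less than 2% of their variance. Statistical significance says that the relationship is unlikely to be zero, not that it is large enough to matter.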
Time perception
Do we live in the past, present or future? Freud famously claimed that neurotics think too much of their past experiences, Kelly argued that we focus on anticipating the future, whilst models of mindfulness (discussed below) suggest that we should try to experience the present.
Hence Zimbardo and Boyd’s (1999, 2008) work on time perception has recently become popular. The Zimbardo Time Perspective Inventory (ZTPI) claims to measure the extent to which people reminisce about bad experiences in the past (‘Past Negative’), reminisce about good experiences in the past (‘Past Positive’), focus on pleasure in the present (‘Present Hedonistic’), feel that they are not in control of their lives (‘Present Fatalistic’) and strive for future goals (‘Future’). These look like narrow personality traits – though once again, inspection reveals that many of the items in its ‘scales’ are virtual synonyms, making them ‘bloated specifics’ rather than personality traits. For example, one scale contains the items ‘I think about the bad things that have happened to me in the past’ and ‘painful past experiences keep being replayed in my mind’. The only scale which does appear to contain diverse items is ‘Present Hedonistic’ with items such as ‘Taking risks keeps my life from becoming boring’ and ‘I do things impulsively’. However, this scale correlates 0.74 with a measure of sensation seeking (Muro et al., 2015). Neither scale has perfect reliability (coefficient alpha) but it can be shown that if the traits could be measured perfectly reliably (i.e., without measurement error) they would correlate 0.9. ‘Present Hedonistic’ orientation is virtually identical to sensation seeking. Despite the narrowness of its scales, there are some correlations between ZTPI scores and behaviour. For example, students who are future-orientated start their assignments earlier than those who live in the present (Zimbardo and Boyd, 1999), which is perhaps hardly surprising. The discovery that those who live in the present are more likely to have risky driving habits is a necessary consequence of the correlation between the ‘present’ scale and sensation seeking noted above; the effect practically disappears (a correlation of 0.35 turns into a partial correlation of 0.17) once sensation seeking is allowed for.
Other researchers become strangely excited about the correlation between living in the present and other known correlates of sensation seeking, such as drug use (e.g., Fieulaine and Martinez, 2010). There is a substantial literature on time perception in other branches of psychology – for example, examining neuropsychological correlates, the influence of emotion on time perception and so on; many of us will have experienced time appearing to slow down during a car accident, for example. The methodology usually requires people to encode a particular time interval (such as the time between two tones, sometimes whilst being given other tasks to do in order to prevent them trying to count the seconds) and then reproduce it – for example, by pushing a button when they think the same interval (or some fraction or multiple of it) has elapsed. Not many such studies use the ZTPI, although Wittman et al. (2011) found quite a large relationship (r = –0.58) between the future perspective score on the ZTPI and accuracy at reproducing a time interval; the quicker that people pushed a button when estimating a time interval, the lower their score on future time perspective. The meaning of this correlation is unclear, however, as Wittman et al. treat the ZTPI as if it were a measure of impulsivity. Others find no relationship between time perspective scores and accuracy of time reproduction (e.g., Siu et al., 2014). Likewise it is not obvious what the correlation between measures of brain activity and scores on the ZTPI implies. Although it is difficult to be enthusiastic about the ZTPI, it is mentioned here for two reasons: first because it is widely used (Zimbardo and Boyd (1999) has been cited over 1000 times) and second because it shows the importance of critically examining the validity of scales, and hence the concepts which they measure – even if you are a student and you read the work of respected psychologists.
Do the items appear similar to items measuring other traits, and if so is there empirical evidence that the traits correlate so substantially that they are essentially identical? Do the items appear to form a ‘bloated specific’ rather than a broad trait? It is worthwhile considering such issues whenever one uses a ‘new’ scale or tries to make sense of the literature.
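Two of the figures quoted earlier in this section, the corrected correlation of 0.9 between ‘Present Hedonistic’ and sensation seeking and the partial correlation of 0.17 for risky driving, rest on standard formulae that are easy to try out. The sketch below uses the usual correction for attenuation and the first-order partial correlation. The reliabilities (0.82 for each scale) and the correlation between sensation seeking and risky driving (0.33) are not given in the text; they are hypothetical values that I have chosen because they reproduce the quoted results, and the function names are my own.

```python
import math


def disattenuate(r_xy, reliability_x, reliability_y):
    """Correlation expected if both scales were perfectly reliable."""
    return r_xy / math.sqrt(reliability_x * reliability_y)


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# Observed correlation of 0.74 from the text; reliabilities of 0.82 are
# hypothetical values chosen for illustration.
corrected = disattenuate(0.74, 0.82, 0.82)
print(round(corrected, 2))

# x = 'present' scale, y = risky driving, z = sensation seeking.
# 0.35 and 0.74 come from the text; 0.33 is a hypothetical value for the
# correlation between sensation seeking and risky driving.
partial = partial_correlation(0.35, 0.74, 0.33)
print(round(partial, 2))
```

With these illustrative inputs the corrected correlation is about 0.9 and the partial correlation about 0.17, matching the values reported above. The calculation shows how little of the link between the ‘present’ scale and risky driving is left once the overlap with sensation seeking is removed.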
Mindfulness
Mindfulness is another popular topic in personality psychology. It stems from some principles of Buddhist philosophy and meditation and is the ability to pay attention to one’s current perceptions, thoughts and feelings – recognising them without trying to evaluate or judge them. Hayes (2003) describes it as achieving a state of ‘dispassionate calm’ where one is aware of both internal feelings and perceptions, but without being distracted by worries, hopes, fears or outside events. It is a state of mind which can be achieved through meditation; the extent to which people are (or say they are) able to perceive their emotions without feeling forced to react to them, accept themselves for who they are and so on is thus arguably a measure of the trait of Mindfulness, if one makes the assumption that mindfulness is a characteristic of the person which pervades their everyday life, rather than just happening during meditation. Several questionnaires have been developed to measure mindfulness – the Freiburg Mindfulness Inventory (Walach et al., 2006) is popular, though there are others, too. Some typical items are shown in Table 9.1.
Exercise
Here are some guidelines for one widely used form of meditation which supposedly enhances mindfulness, from Kabat-Zinn (1990). You may find it useful to try the following. Sitting in a chair in a quiet room ‘we bring our attention to our breathing. We feel it come in, we feel it go out. We dwell in the present, moment by moment, breath by breath . . . And whenever we find that our attention has moved elsewhere, wherever that may be, we just note it and let go, and gently escort our attention back to the breath, back to the rising and falling of our own belly.’
A meta-analysis shows that Mindfulness is related to Neuroticism (ρ = –0.58) and Conscientiousness (ρ = 0.44) from the Big Five model (Giluk, 2009) though somewhat surprisingly its correlation with Openness to Experience is rather smaller (ρ = 0.20). It is also related to Trait Emotional Intelligence (Miao et al., 2018). Practising Zen meditators score half a standard deviation higher on Mindfulness than matched controls (Brown and Ryan, 2003), a significant and fairly substantial effect. Some authors have identified five facets of Mindfulness – for example, Baer et al. (2006) factor analysed the items from a number of mindfulness scales and found the facets shown in Table 9.1. Once again, the problem with this analysis is that many of the items are near-identical in content; for example ‘I drive on “automatic pilot” without paying attention to what I’m doing’ and ‘I drive places on “automatic pilot” and then wonder why I went there’ are two separate (reverse-scored) items in the ‘acting with awareness’ facet. Some or all of the facets may well be bloated specifics, and so I would personally be more comfortable using the general mindfulness factor than focusing on its facets.
Table 9.1 Five facets of mindfulness showing the items with the largest factor loadings from Baer et al. (2006)
Facet | Example items
Non-reactivity to inner experience | I perceive my feelings and emotions without having to react to them. I watch my feelings without getting lost in them.
Observing/noticing/attending to sensations/perceptions/thoughts/feelings | I sense my body, whether eating, cooking, cleaning or talking. I notice how my emotions express themselves through my body.
Acting with awareness | I [do not] break or spill things because of carelessness, not paying attention, or thinking of something else. I [do not] find it difficult to stay focused on what’s happening in the present.
Describing/labelling with words | I’m good at finding the words to describe my feelings. I can easily put my beliefs, opinions and expectations into words.
Non-judging of experience | I [do not] criticise myself for having irrational or inappropriate emotions. I [do not] tend to evaluate whether my perceptions are right or wrong.
The problem with mindfulness is that it is difficult to be sure that what a person says about their experiences actually matches their behaviour. It is possible that some individuals view themselves as self-actualised, mindful, self-aware, and in touch with their emotions, and so will choose to answer items in mindfulness questionnaires accordingly; though they view themselves as being mindful, they may not actually be mindful at all. It is difficult to tell whether they really do behave in the ways they claim, as their experiences are entirely subjective. That said, there is some evidence that scores on mindfulness questionnaires relate to behaviour. Holzell et al. (2011) give an excellent and non-technical account of what some of the psychological and physiological processes associated with mindfulness meditation may be, and show that Mindfulness scores increase following meditation. Interventions designed to enhance mindfulness in children seem to lead to a number of positive educational outcomes, including self-reported empathy, mindfulness, emotional control and social responsibility relative to a control group in a randomised controlled trial (Schonert-Reichl et al., 2015). Peer-ratings also showed some positive outcomes (for example, sharing, being more kind and rule-abiding) although school performance and cortisol levels (an index of stress) did not seem to benefit. But worthy as this approach was, it is possible that ‘demand characteristics’ were at work. Children in the ‘mindfulness’ group also received interventions aimed at ‘changing the ecology of the classroom environment to one in which belonging, caring, collaboration, and understanding others is emphasized to create a positive classroom milieu’ as well as three sessions of meditation training per day. Even if differences in behaviour were found, it is not obvious that these were due to mindfulness training. The other interventions may have influenced behaviour. In addition,
the children may well have guessed that these strange new mindfulness activities were intended to improve the outcome variables, and so rated themselves and each other accordingly. Whilst studies like this may suggest that mindfulness may improve behaviour, they are by no means definitive.

The picture is somewhat clearer for clinical studies, where a range of mindfulness interventions often seem to have a small-to-moderate beneficial effect on levels of self-reported stress, anxiety and depression (e.g., Abbott et al., 2014; Hofmann et al., 2010) and (crucially) also affect physical measures, for example by lowering blood pressure in patients with heart disease. Mindfulness training can even be of some benefit when it is delivered over the internet (Spijkerman et al., 2016), where it is particularly effective at reducing stress; it seems to operate by reducing ‘rumination’ – focusing on negative feelings (Jain et al., 2007). Occupational/organisational psychologists have also used mindfulness interventions to try to reduce work-related stress and anxiety with some success (Bartlett et al., 2019), although this meta-analysis was unable to determine whether performance at work benefitted from the intervention; once again it is possible that any changes are due to demand characteristics, which is a particular problem as several of the studies lacked adequate control groups.

Is Mindfulness a trait, as many researchers seem to assume? The finding that rather short interventions can influence scores on mindfulness questionnaires suggests that Mindfulness may be more of a state than a trait; the modest correlations between measures of mindfulness and other personality traits may also suggest that mindfulness varies over time. Mindfulness seems to need to be trained rather than being an inbuilt characteristic of the person in the way that (for example) Neuroticism, Extraversion or Conscientiousness is. Mindfulness training seems akin to coping mechanisms or relaxation techniques in helping people deal with stress, although the work of Jain et al. (2007) shows that mindfulness training produces different results from relaxation training, as mindfulness training alone seems to reduce rumination. Scores on mindfulness questionnaires may thus indicate the extent to which people use this technique to protect against stress, rather than being a stable personality trait.
Self-Esteem re-visited
We mentioned self-esteem in Chapter 2. It is an enormously popular concept in social psychology, although Kline (2000) has observed that the most commonly used test that measures self-esteem, the Rosenberg Self-Esteem (RSE) Scale (Rosenberg, 1965), is deeply flawed by being a bloated specific. We noted in Chapter 8 that the ten items essentially all ask the same question over and over, which causes problems.

In our experience, even when the RSE items are interspersed with items from other scales, participants frequently write comments such as, ‘I have already answered this question!’ Such reactions may lead participants to skip questions, respond randomly, and engage in other test-taking behaviors that contribute to invalid protocols. (Robins et al., 2001)

These authors go on to demonstrate that a single question – ‘I have high self-esteem’ – is as effective as the longer Rosenberg scale for measuring self-esteem; it shows similarly sized correlations with other variables, such as personality traits. It thus seems that self-esteem is a ‘bloated specific’ – a single item re-phrased in multiple ways – and there must therefore be some question about whether Self-Esteem is a causal influence. Can we ever say that a person acts in a particular way ‘because they have low Self-Esteem’? If it is found that Self-Esteem is very
substantially correlated with other, broader, personality traits it might be safer to explain behaviour using those traits instead. Robins et al. (2001) report the correlations between the Rosenberg Self-Esteem Scale and personality as shown in Table 9.2; their results are broadly in line with other similar studies, such as Watson et al. (2002). Some of these correlations are exceptionally large, and a regression performed on the correlations shown in Watson et al. (2002) – on the companion website – shows that over half the variation in scores on the Rosenberg inventory can be predicted from the Big Five personality factors. After correcting for unreliability of measurement, two-thirds of the variance in self-esteem scores can be explained by the Big Five personality factors.

A generation of us were brought up with the idea that a person’s level of self-esteem caused them to behave in particular ways (e.g., Nadler, 1987) and that certain interventions can boost self-esteem. Put like this, self-esteem does not sound much like a stable personality trait as defined in Chapter 1, as many simple interventions in laboratory and other tasks appeared to boost its levels, whilst life events can apparently decrease it. The correlations with the Big Five personality traits shown above, however, suggest that it stays reasonably constant. Baumeister et al. (2003) argue that:

to show that self-esteem is itself important, then, research would have to demonstrate that people’s beliefs about themselves have important consequences regardless of what the underlying realities are. Put more simply, there would have to be benefits that derive from believing that one is intelligent, regardless of whether one actually is intelligent.

These authors review the literature and identify several cases where self-esteem is not causal, even though it appears to be at first glance. For example, people with high self-esteem rate themselves as being physically attractive (the correlation being as high as 0.85). Perhaps seeing a beautiful face in the mirror each morning enhances one’s self-esteem. However, this is just not so. When attractiveness is rated by others, it shows a correlation of zero with self-esteem. People with high self-esteem think they look wonderful, even when they do not; having good looks does not boost self-esteem. Their analysis shows that the same sort of thing applies to intelligence (people with high self-esteem think they are smart, but need not be), weight (being overweight does not influence self-esteem nearly as much as people think it does) and so on. Thus self-esteem seems to behave rather more like a stable personality trait than social psychologists sometimes suggest – a finding which ties in well with the large correlations shown in Table 9.2. For if self-esteem could easily be altered by life events, it would not correlate substantially with personality.
Table 9.2 Correlations between self-esteem and personality (from Robins et al., 2001)
NEO Extraversion              0.41
NEO Agreeableness             0.23
NEO Conscientiousness         0.28
NEO Neuroticism              –0.70
NEO Openness to experience    0.16
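The idea of running a regression on published correlations, and then correcting for unreliability, can be sketched in a few lines of Python. This is an illustration only, not the analysis on the companion website: the correlations with self-esteem come from Table 9.2, but the correlations among the five scales and all of the reliabilities are invented for the example, so the printed figures should not be read as findings. It relies on the standard result that, for standardised variables, R² equals the vector of predictor-criterion correlations multiplied by the regression weights.

```python
import numpy as np

# Correlations of the five NEO scales with self-esteem, taken from Table 9.2
# (order: Extraversion, Agreeableness, Conscientiousness, Neuroticism, Openness).
r_xy = np.array([0.41, 0.23, 0.28, -0.70, 0.16])

# HYPOTHETICAL correlations among the five scales (invented for illustration).
R_xx = np.array([
    [ 1.00,  0.20,  0.20, -0.30,  0.25],
    [ 0.20,  1.00,  0.25, -0.25,  0.10],
    [ 0.20,  0.25,  1.00, -0.30,  0.05],
    [-0.30, -0.25, -0.30,  1.00, -0.05],
    [ 0.25,  0.10,  0.05, -0.05,  1.00],
])

# HYPOTHETICAL reliabilities of the five scales and of the self-esteem scale.
rel_x = np.full(5, 0.85)
rel_y = 0.88


def r_squared(R_xx, r_xy):
    """Proportion of variance explained, computed from correlations alone."""
    beta = np.linalg.solve(R_xx, r_xy)  # standardised regression weights
    return float(r_xy @ beta)


def disattenuate(R_xx, r_xy, rel_x, rel_y):
    """Divide each correlation by the square root of the product of the
    two reliabilities involved (the classical correction for attenuation)."""
    R_corr = R_xx / np.sqrt(np.outer(rel_x, rel_x))
    np.fill_diagonal(R_corr, 1.0)
    r_corr = r_xy / np.sqrt(rel_x * rel_y)
    return R_corr, r_corr


r2 = r_squared(R_xx, r_xy)
R_c, r_c = disattenuate(R_xx, r_xy, rel_x, rel_y)
r2_corrected = r_squared(R_c, r_c)

print(f"R squared from observed correlations:  {r2:.2f}")
print(f"R squared after correcting for unreliability: {r2_corrected:.2f}")
```

Replacing the invented matrix and reliabilities with published values is all that is needed to repeat the calculation for a real data set.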
Does self-esteem influence behaviour? To determine this it is obviously necessary to control for the influence of the main personality traits. Judge et al. (2002) demonstrated that it is very difficult to differentiate self-esteem from Neuroticism, and pointed out that Eysenck (1990) viewed self-esteem as one of several components of Neuroticism – a view which is consistent with the idea that scales measuring self-esteem are really just a single item blown up into a trait by paraphrasing it several times.

When writing this chapter I had expected to find studies which administered a self-esteem scale alongside some measure of broad personality traits (those from the Big Five, HEXACO or Eysenck models, for example) and some sort of outcome measure which Self-Esteem might be thought to influence, as with the studies of emotional intelligence discussed in Chapter 7. Thus it would be possible to determine whether Self-Esteem has any predictive power over and above that offered by broader traits, such as Neuroticism. These studies are hard to find, though it is possible to re-analyse correlation matrices which are published for some other purposes. An SPSS file containing the correlation matrix from Miciuk et al. (2016) may be found on the companion website; it shows that when predicting people’s feeling that their life has some purpose, the incremental validity of self-esteem is not significant. It does not add anything to the predictive power of measures of the Big Five personality factors.

Many social psychologists feel that people’s self-esteem is important in determining how they behave; the belief that one has a great personality, high intelligence, appalling social skills or whatever is what causes us to behave in a certain way. It does not matter too much if one actually has these characteristics. Personality psychologists instead argue that beliefs are largely irrelevant, and our personality traits (based on behaviour) better explain our actions. The evidence outlined above shows that the test which is widely used to measure general self-esteem is substantially influenced by personality; it is not a pure measure of belief and may not always show incremental validity over the Big Five personality factors for predicting behaviour.
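Incremental validity is usually tested by fitting two regressions, one with the broad traits alone and one with the narrow trait added, and asking whether R² rises by more than chance. The sketch below shows the arithmetic using simulated scores; it does not use the Miciuk et al. (2016) data. All of the weights used to generate the scores are hypothetical, and the outcome is deliberately built from the Big Five alone, so the lack of incremental validity is true by construction here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# SIMULATED standardised Big Five scores (columns: E, A, C, N, O).
big5 = rng.standard_normal((n, 5))

# HYPOTHETICAL self-esteem scores: mostly low Neuroticism plus some Extraversion.
self_esteem = (-0.7 * big5[:, 3] + 0.3 * big5[:, 0]
               + 0.6 * rng.standard_normal(n))

# HYPOTHETICAL outcome that depends on the Big Five only, not on self-esteem.
outcome = (0.4 * big5[:, 0] - 0.5 * big5[:, 3] + 0.2 * big5[:, 2]
           + rng.standard_normal(n))


def r_squared(X, y):
    """R squared from an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid.var() / y.var()


# Step 1: broad traits only. Step 2: broad traits plus self-esteem.
r2_reduced = r_squared(big5, outcome)
r2_full = r_squared(np.column_stack([big5, self_esteem]), outcome)
delta_r2 = r2_full - r2_reduced

# F test for the change in R squared: one added predictor,
# six predictors plus an intercept in the full model.
df_error = n - 6 - 1
f_change = delta_r2 / ((1.0 - r2_full) / df_error)

print(f"R squared, Big Five only:         {r2_reduced:.3f}")
print(f"R squared, Big Five + self-esteem: {r2_full:.3f}")
print(f"Increase in R squared: {delta_r2:.4f}  (F change = {f_change:.2f})")
```

The F value would be compared with the critical value for 1 and 493 degrees of freedom (about 3.9 at the 0.05 level); a smaller value means that self-esteem adds nothing reliable to the prediction.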
Summary
I suggest that for any narrow personality trait to be valuable, it should:
• have a clear theoretical rationale; for example, in the case of Sensation Seeking the trait encompasses a number of different behaviours: thrill-seeking, lack of inhibition, seeking out novel experiences, etc. In the case of some of the more clinically relevant traits to be discussed in Chapter 10, the trait should cover all of the clinical characteristics identified in psychiatry.
• have an effective method of assessing each individual’s position along the trait, without which it is impossible to test any hypotheses associated with the theory.
• be demonstrably distinct from the global personality models, such as those of Eysenck or Costa and McCrae.
• have some literature exploring the processes that cause individuals to differ on their levels of the trait, for example the links between MAOI and Sensation Seeking.

This chapter has considered some personality traits that have been developed for rather specific research purposes. The choice of what to include has inevitably been subjective: traits such as Mindfulness and Morningness are increasingly cited in the literature and it would also be possible to make a case for considering impulsivity, optimism, perfectionism, self-efficacy, rumination and others. However, those that I have included seem to me either to have a
particularly interesting theoretical rationale or to have continued importance. The exception is Self-Esteem, a concept that is hugely popular in social psychology and which is discussed here because its advocates often do not appreciate the problems associated with this trait and its measurement.
Suggested additional reading
Breivik et al. (2019) give an interesting literature review of the importance of sensation seeking in military psychology (risk perception, etc.) and Mann et al. (2015) show how sensation seeking and other variables interact to predict delinquency. Adan et al. (2012) show how and why it is important to study circadian rhythms and ‘morningness’ and the Baumeister et al. (2003) review of the (non)-causal role of self-esteem in predicting behaviour makes some important points. Finally, Van Dam et al. (2018) provide an excellent and very accessible summary of mindfulness research.
Answers to self-assessment questions
SAQ 9.1
Identifying bloated specifics is not easy. The first stage should be to look at the items and determine whether answering one item in a certain way forces a respondent to give a particular answer to another question. If this is widespread, then the scale is probably a bloated specific. Second, if the scale has unusually high reliability given its length (say reliability above 0.80 for a ten-item test) this can be a warning sign that the test is a bloated specific. This is because, if the test items are sampled from a wide domain (as they should be), two items will correlate only because they share common variance due to the trait they both measure; if they measure a bloated specific, they will share specific variance too, which will increase the reliability of the scale. The next step is to determine how scores on this scale relate to other source traits – locating the trait in factor space. It may be the case that scores are closely related to one or more well-known source traits.
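The link between reliability, test length and how strongly the items correlate can be made concrete with the standardised form of coefficient alpha, which depends only on the number of items k and their average inter-item correlation. The short sketch below is an illustration rather than a formal test for bloated specifics; the function names are made up for this example. It shows the average inter-item correlation that a ten-item scale needs in order to reach a reliability of 0.80, and what happens to reliability when items correlate much more strongly than that.

```python
def standardised_alpha(k, mean_r):
    """Standardised coefficient alpha for k items whose average
    inter-item correlation is mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def mean_r_for_alpha(k, alpha):
    """Average inter-item correlation needed for k items to reach alpha
    (the formula above solved for mean_r)."""
    return alpha / (k - (k - 1) * alpha)


# Average inter-item correlation implied by alpha = 0.80 with ten items.
needed = mean_r_for_alpha(10, 0.80)
print(f"Mean inter-item r for alpha = 0.80, k = 10: {needed:.3f}")

# Reliability of a ten-item scale as the items become more alike.
for mean_r in (0.2, 0.3, 0.5, 0.7):
    print(f"mean r = {mean_r:.1f}  ->  alpha = {standardised_alpha(10, mean_r):.3f}")
```

Items that are near-paraphrases of one another tend to correlate far more highly than items sampled broadly from a domain, which is why a short scale with very high reliability deserves a closer look.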
SAQ 9.2
‘Locating a trait within factor space’ means determining how this trait relates to the main source traits described in Chapter 4. It involves administering some measure of the trait alongside standard personality questionnaires, such as the Eysenck Personality Questionnaire (revised) to a good-sized, representative sample of people. You then have several options: you could factor analyse the responses to all of the items in the questionnaires together,
to find whether the items in the novel trait form a distinct factor that is different from E, N and P; or you could use confirmatory factor analysis to test this hypothesis. Either way, this analysis will also show the correlations between the standard personality factors and the novel trait. Or, if your sample is not large enough for factor analysis, you could correlate scale scores together: you might discover that the trait is moderately related to Neuroticism (r = 0.4, for example), but not at all correlated with Extraversion or Psychoticism. If sample size permits, you should use multiple regression (using the novel scale as the dependent variable and E, N and P as independent variables) to extend this analysis and show whether scores on the novel scale are largely predictable from the well-known personality factors. This tells you whether the trait is worth measuring – or whether it is just a mixture of well-studied personality traits.
SAQ 9.3
Sensation Seeking involves individuals seeking stimulation in a variety of ways – through aesthetics and lifestyle choices, avoidance of routine, hedonistic activities such as casual sex and seeking out thrilling physical activities.
References
Abbott, R. A., Whear, R., Rodgers, L. R., Bethel, A., Coon, J. T., Kuyken, W., Stein, K. & Dickens, C. 2014. Effectiveness of Mindfulness-Based Stress Reduction and Mindfulness-Based Cognitive Therapy in Vascular Disease: A Systematic Review and Meta-Analysis of Randomised Controlled Trials. Journal of Psychosomatic Research, 76, 341–351.
Adan, A., Archer, S. N., Hidalgo, M. P., Di Milia, L., Natale, V. & Randler, C. 2012. Circadian Typology: A Comprehensive Review. Chronobiology International, 29, 1153–1175.
Aluja, A., Garcia, L. F., Blanch, A., De Lorenzo, D. & Fibla, J. 2009. Impulsive-Disinhibited Personality and Serotonin Transporter Gene Polymorphisms: Association Study in an Inmate’s Sample. Journal of Psychiatric Research, 43, 906–914.
Aluja, A., García, O. & García, L. F. 2003. Relationships among Extraversion, Openness to Experience, and Sensation Seeking. Personality and Individual Differences, 35, 671–680.
Baer, R. A., Smith, G. T., Hopkins, J., Krietemeyer, J. & Toney, L. 2006. Using Self-Report Assessment Methods to Explore Facets of Mindfulness. Assessment, 13, 27–45.
Bardo, M. T., Donohew, R. L. & Harrington, N. G. 1996. Psychobiology of Novelty Seeking and Drug Seeking Behavior. Behavioural Brain Research, 77, 23–43.
Bartlett, L., Martin, A., Neil, A. L., Memish, K., Otahal, P., Kilpatrick, M. & Sanderson, K. 2019. A Systematic Review and Meta-Analysis of Workplace Mindfulness Training Randomized Controlled Trials. Journal of Occupational Health Psychology, 24, 108–126.
Baumeister, R. F., Campbell, J. D., Krueger, J. I. & Vohs, K. D. 2003. Does High Self-Esteem Cause Better Performance, Interpersonal Success, Happiness, or Healthier Lifestyles? Psychological Science in the Public Interest, 4, 1–44.
Breivik, G. 1996. Personality, Sensation Seeking and Risk Taking among Everest Climbers. International Journal of Sport Psychology, 27, 308–320.
Breivik, G., Sand, T. S. & Sookermany, A. M. 2019. Risk-Taking and Sensation Seeking in Military Contexts: A Literature Review. Sage Open, 9, 13.
Brown, K. W. & Ryan, R. M. 2003. The Benefits of Being Present: Mindfulness and Its Role in Psychological Well-Being. Journal of Personality and Social Psychology, 84, 822–848.
Cavallera, G. M. & Giudici, S. 2008. Morningness and Eveningness Personality: A Survey in Literature from 1995 up till 2006. Personality and Individual Differences, 44, 3–21.
De Vries, R. E., De Vries, A. & Feij, J. A. 2009. Sensation Seeking, Risk-Taking, and the HEXACO Model of Personality. Personality and Individual Differences, 47, 536–540.
Deng, X. M. & Zhang, L. 2019. Neural Underpinnings of the Relationships between Sensation Seeking and Emotion Regulation in Adolescents. International Journal of Psychology, https://doi.org/10.1002/ijop.12649
Eysenck, H. J. 1990. Biological Dimensions of Personality. In L. A. Pervin (ed.) Handbook of Personality: Theory and Research. New York: Guilford Press.
Fieulaine, N. & Martinez, F. 2010. Time under Control: Time Perspective and Desire for Control in Substance Use. Addictive Behaviors, 35, 799–802.
Furnham, A. & Avison, M. 1997. Personality and Preference for Surreal Paintings. Personality and Individual Differences, 23, 923–935.
Giluk, T. L. 2009. Mindfulness, Big Five Personality, and Affect: A Meta-Analysis. Personality and Individual Differences, 47, 805–811.
Glicksohn, J. & Abulafia, J. 1998. Embedding Sensation Seeking within the Big Three. Personality and Individual Differences, 25, 1085–1099.
Glicksohn, J. & Bozna, M. 2000. Developing a Personality Profile of the Bomb-Disposal Expert: The Role of Sensation Seeking and Field Dependence-Independence. Personality and Individual Differences, 28, 85–92.
Hayes, R. P. 2003. Classical Buddhist Model of a Healthy Mind. In K. H. Dockett, G. R. Dudley-Grant & C. P. Bankart (eds.) Psychology and Buddhism from Individual to Global Community. Boston, MA: Springer.
Hidalgo, M. P., Caumo, W., Posser, M., Coccaro, S. B., Camozzato, A. L. & Chaves, M. L. F. 2009. Relationship between Depressive Mood and Chronotype in Healthy Subjects. Psychiatry and Clinical Neurosciences, 63, 283–290.
Hofmann, S. G., Sawyer, A. T., Witt, A. A. & Oh, D. 2010. The Effect of Mindfulness-Based Therapy on Anxiety and Depression: A Meta-Analytic Review. Journal of Consulting and Clinical Psychology, 78, 169–183.
Hölzel, B. K., Lazar, S. W., Gard, T., Schuman-Olivier, Z., Vago, D. R. & Ott, U. 2011. How Does Mindfulness Meditation Work? Proposing Mechanisms of Action from a Conceptual and Neural Perspective. Perspectives on Psychological Science, 6, 537–559.
Horne, J. A. & Östberg, O. 1976. A Self-Assessment Questionnaire to Determine Morningness–Eveningness in Human Circadian Rhythms. International Journal of Chronobiology, 4, 97–110.
Jain, S., Shapiro, S. L., Swanick, S., Roesch, S. C., Mills, P. J., Bell, I. & Schwartz, G. E. R. 2007. A Randomized Controlled Trial of Mindfulness Meditation Versus Relaxation Training: Effects on Distress, Positive States of Mind, Rumination, and Distraction. Annals of Behavioral Medicine, 33, 11–21.
Jamt, R. E. G., Gjerde, H., Furuhaugen, H., Romeo, G., Vindenes, V., Ramaekers, J. G. & Bogstrand, S. T. 2020. Associations between Psychoactive Substance Use and Sensation Seeking Behavior among Drivers in Norway. BMC Public Health, 20, 8.
Jones, A. 1969. Stimulus-Seeking Behavior. In J. P. Zubeck (ed.) Sensory Deprivation: Fifteen Years of Research. New York: Appleton Century Crofts.
Judge, T. A., Erez, A., Bono, J. E. & Thoresen, C. J. 2002. Are Measures of Self-Esteem, Neuroticism, Locus of Control, and Generalized Self-Efficacy Indicators of a Common Core Construct? Journal of Personality and Social Psychology, 83, 693–710.
Kabat-Zinn, J. 1990. Full Catastrophe Living: Using the Wisdom of Your Body and Mind to Face Stress, Pain, and Illness. New York, NY: Delacorte Press.
Kaspar, K., Buss, L. V., Rogner, J. & Gnambs, T. 2016. Engagement in One-Night Stands in Germany and Spain: Does Personality Matter? Personality and Individual Differences, 92, 74–79.
Kish, G. B. 1966. Studies of Sensory Reinforcement. In W. K. Honig (ed.) Operant Behavior. Englewood Cliffs, NJ: Prentice Hall.
Kline, P. 2000. The Handbook of Psychological Testing (2nd Edn). London: Routledge.
Klinteberg, B. 1996. The Psychiatric Personality in a Longitudinal Perspective. European Child & Adolescent Psychiatry, 5, 57–63.
Lipnevich, A. A., Crede, M., Hahn, E., Spinath, F. M., Roberts, R. D. & Preckel, F. 2017. How Distinctive Are Morningness and Eveningness from the Big Five Factors of Personality? A Meta-Analytic Investigation. Journal of Personality and Social Psychology, 112, 491–509.
Mann, F. D., Kretsch, N., Tackett, J. L., Harden, K. P. & Tucker-Drob, E. M. 2015. Person x Environment Interactions on Adolescent Delinquency: Sensation Seeking, Peer Deviance and Parental Monitoring. Personality and Individual Differences, 76, 129–134.
Miao, C., Humphrey, R. H. & Qian, S. S. 2018. The Relationship between Emotional Intelligence and Trait Mindfulness: A Meta-Analytic Review. Personality and Individual Differences, 135, 101–107.
Miciuk, L. R., Jankowski, T. & Oles, P. 2016. Incremental Validity of Positive Orientation: Predictive Efficiency Beyond the Five-Factor Model. Health Psychology Report, 4, 294–302.
Muro, A., Castellà, J., Sotoca, C., Estaún, S., Valero, S. & Gomà-i-Freixanet, M. 2015. To What Extent Is Personality Associated with Time Perspective? Anales De Psicologia, 31, 488–493.
Nadler, A. 1987. Determinants of Help Seeking Behavior – The Effects of Helpers’ Similarity, Task Centrality and Recipients’ Self-Esteem. European Journal of Social Psychology, 17, 57–67.
Netter, P., Hennig, J. & Roed, I. S. 1996. Serotonin and Dopamine as Mediators of Sensation Seeking Behavior. Neuropsychobiology, 34, 155–165.
Powell, J. E. & Zietsch, B. P. 2011. Predicting Sensation Seeking from Dopamine Genes: Use and Misuse of Genetic Prediction. Psychological Science, 22, 413–415.
Randler, C. 2008. Morningness–Eveningness, Sleep–Wake Variables and Big Five Personality Factors. Personality and Individual Differences, 45, 191–196.
Roberti, J. W., Storch, E. A. & Bravata, E. 2003. Further Psychometric Support for the Sensation Seeking Scale – Form V. Journal of Personality Assessment, 81, 291–292.
Robins, R. W., Hendin, H. M. & Trzesniewski, K. H. 2001. Measuring Global Self-Esteem: Construct Validation of a Single-Item Measure and the Rosenberg Self-Esteem Scale. Personality and Social Psychology Bulletin, 27, 151–161.
Rosenberg, M. 1965. Society and the Adolescent Self-Image. Princeton: Princeton University Press.
Ruch, W. 1988. Sensation Seeking and the Enjoyment of Structure and Content of Humor – Stability of Findings across 4 Samples. Personality and Individual Differences, 9, 861–871.
Saper, C. B., Scammell, T. E. & Lu, J. 2005. Hypothalamic Regulation of Sleep and Circadian Rhythms. Nature, 437, 1257–1263.
Schonert-Reichl, K. A., Oberle, E., Lawlor, M. S., Abbott, D., Thomson, K., Oberlander, T. F. & Diamond, A. 2015. Enhancing Cognitive and Social-Emotional Development through a Simple-to-Administer Mindfulness-Based School Program for Elementary School Children: A Randomized Controlled Trial. Developmental Psychology, 51, 52–66.
Siu, N. Y. F., Lam, H. H. Y., Le, J. J. Y. & Przepiorka, A. M. 2014. Time Perception and Time Perspective Differences between Adolescents and Adults. Acta Psychologica, 151, 222–229.
Spijkerman, M. P. J., Pots, W. T. M. & Bohlmeijer, E. T. 2016. Effectiveness of Online Mindfulness-Based Interventions in Improving Mental Health: A Review and Meta-Analysis of Randomised Controlled Trials. Clinical Psychology Review, 45, 102–114.
Stanton, B., Li, X. M., Cottrell, L. & Kaljee, L. 2001. Early Initiation of Sex, Drug-Related Risk Behaviors, and Sensation-Seeking among Urban, Low-Income African-American Adolescents. Journal of the National Medical Association, 93, 129–138.
Stoel, R. D., De Geus, E. J. C. & Boomsma, D. I. 2006. Genetic Analysis of Sensation Seeking with an Extended Twin Design. Behavior Genetics, 36, 229–237.
Thomson, C. J., Carlson, S. R. & Rupert, J. L. 2013. Association of a Common D3 Dopamine Receptor Gene Variant Is Associated with Sensation Seeking in Skiers and Snowboarders. Journal of Research in Personality, 47, 153–158.
Toomey, R., Panizzon, M. S., Kremen, W. S., Franz, C. E. & Lyons, M. J. 2015. A Twin-Study of Genetic Contributions to Morningness–Eveningness and Depression. Chronobiology International, 32, 303–309.
Van Dam, N. T., Van Vugt, M. K., Vago, D. R., Schmalzl, L., Saron, C. D., Olendzki, A., . . . Meyer, D. E. 2018. Mind the Hype: A Critical Evaluation and Prescriptive Agenda for Research on Mindfulness and Meditation. Perspectives on Psychological Science, 13, 36–61.
Walach, H., Buchheld, N., Buttenmuller, V., Kleinknecht, N. & Schmidt, S. 2006. Measuring Mindfulness – The Freiburg Mindfulness Inventory (FMI). Personality and Individual Differences, 40, 1543–1555.
Watson, D., Suls, J. & Haig, J. 2002. Global Self-Esteem in Relation to Structural Models of Personality and Affectivity. Journal of Personality and Social Psychology, 83, 185–197.
Wittmann, M., Simmons, A. N., Flagan, T., Lane, S. D., Wackermann, J. & Paulus, M. P. 2011. Neural Substrates of Time Perception and Impulsivity. Brain Research, 1406, 43–58.
Zimbardo, P. G. & Boyd, J. 2008. The Time Paradox: The New Psychology of Time. London: Rider.
Zimbardo, P. G. & Boyd, J. N. 1999. Putting Time in Perspective: A Valid, Reliable Individual-Differences Metric. Journal of Personality and Social Psychology, 77, 1271–1288.
Zuckerman, M. 1979. Sensation Seeking. London: Wiley.
Zuckerman, M. 1994. Behavioral Expressions and Biosocial Bases of Sensation Seeking. Cambridge: Cambridge University Press.
Zuckerman, M. 2007. Sensation Seeking and Risky Behavior. Washington, DC: American Psychological Association.
Zuckerman, M., Eysenck, S. & Eysenck, H. J. 1978. Sensation Seeking in England and America – Cross-Cultural, Age, and Sex Comparisons. Journal of Consulting and Clinical Psychology, 46, 139–149.
Zuckerman, M. & Glicksohn, J. 2016. Hans Eysenck’s Personality Model and the Constructs of Sensation Seeking and Impulsivity. Personality and Individual Differences, 103, 48–52.
Zuckerman, M., Kuhlman, D. M. & Camac, C. 1988. What Lies Beyond E and N – Factor-Analyses of Scales Believed to Measure Basic Dimensions of Personality. Journal of Personality and Social Psychology, 54, 96–107.
Zuckerman, M., Kuhlman, D. M., Thornquist, M. & Kiers, H. 1991. 5 (or 3) Robust Questionnaire Scale Factors of Personality without Culture. Personality and Individual Differences, 12, 929–941.
Zuckerman, M., Persky, H., Hopkins, T. R., Murtaugh, T., Basu, G. K. & Schilling, M. 1966. Comparison of Stress Effects of Perceptual and Social Isolation. Archives of General Psychiatry, 14, 356–365.
Chapter 10 Sub-clinical and antisocial personality traits
Introduction
This chapter deals with some narrow personality traits where extremely high scores may have some clinical or social significance. Once again the choice of traits is fairly arbitrary, and there are many others which could have been included; the ones I mention in this chapter are currently popular and/or of some theoretical and practical interest. There is also a fair amount of critical evaluation, to encourage you to follow the same sort of approach yourself when reading the research literature.

We start by looking at psychosis-like symptoms, as it has been found that the cognitive and emotional processes which define schizophrenia in extreme cases can also be found in a milder form in those of us who do not have a clinical diagnosis. Other people have problems identifying and using emotions (Alexithymia), and the topic of conservative/authoritarian attitudes will be of interest to those studying fascism and other forms of extremism, as well as individual differences. Finally we consider the ‘dark side’ of human personality, looking at how to identify the psychopaths who walk amongst us, understanding how they came to behave in this way, and exploring other traits (Narcissism and Machiavellianism) which are strongly correlated with psychopathic behaviour.

Example items from commonly used scales have been included wherever possible – to give a feel for the nature of the traits being described, to ‘de-mystify’ some of the technical terms used by researchers and to show where scales can be found, in case they are useful for research purposes or general interest.
Learning outcomes
Having read this chapter you should be able to:
• Discuss the ‘continuity hypothesis’ and form an evidence-based opinion of whether and why Schizotypy is linked to schizophrenia.
• Consider whether attitudes could be regarded as personality traits, and discuss the nature and origins of the authoritarian personality.
• Describe Alexithymia and discuss its relationship to other personality traits.
• Discuss the characteristics, origins and importance of Machiavellianism, Narcissism and Psychopathy – the ‘dark side’ of personality.
Schizotypy Several researchers (e.g., Claridge, 1985; Meehl, 1962) have argued that there is no real discontinuity between ‘normal’ and ‘abnormal’ behaviour; for example those who have depressive personalities may be more likely to suffer bouts of clinical depression than those who do not. Likewise, those who have higher than average scores on anxiety inventories are more likely to develop a full-blown anxiety disorder than are phlegmatic individuals. This idea (sometimes known as the continuity hypothesis) becomes of particular interest to personality theorists when applied to schizophrenia. Schizophrenia is a psychiatric condition that will affect about 1% of us at some time in our lives. Its symptoms can include hallucinations (seeing or hearing things that are not physically present, voices being particularly common), delusions (believing things that are unlikely to be true; for example, that their thoughts are being stolen or that they are being controlled by aliens via televisions), lack of emotion, illogical thought processes and inability to concentrate. The person becomes increasingly more eccentric, isolated and apathetic. (The idea that schizophrenia is anything to do with ‘Jekyll and Hyde’ or multiple personalities is an unfortunate myth that is without foundation.) If the continuity hypothesis holds true for schizophrenia as well as for conditions such as anxiety and depression, it might be possible to fnd milder signs of these symptoms in the general population. It might also be interesting to determine whether this predicts who will develop schizophrenic symptoms in the future. For example, are normal people who believe in fying saucers or conspiracy theories also more likely to show shallow emotional responses, have problems concentrating and occasionally see or hear things that are not really present? 
Schizotypy is the term that refers to people who function fairly normally, but show low levels of the cognitive, emotional and attentional problems that are found in aggrandised form in schizophrenics as discussed by Johns and van Os (2001). The concept of Schizotypy therefore has much in common with Eysenck’s idea of Psychoticism. Both of these traits are thought to predict risk of developing schizophrenia, but researchers such as Claridge have tried to extrapolate all the main symptoms of schizophrenia from the clinical to the non-clinical population. Eysenck focused more on impulsive, cruel and solitary behaviours – with little mention of perceptual distortions or cognitive or attentional problems. Thus although the traits are similar, they are not identical.
Sub-clinical and antisocial personality traits
Measurement of Schizotypy
Several scales have been developed to measure Schizotypy, generally by factor analysing the responses that people make to a set of items that have been designed to measure the concept. The problem is that researchers differ in the way in which they conceptualise it. For example, it is common for researchers to use a ‘magical ideation’ scale (Chapman et al., 1980) as if it measured Schizotypy – yet this really only assesses strength of unusual beliefs. Others show better evidence of content validity, as they were developed by examining the symptoms that constitute the clinical definition of schizophrenia and generating items which may tap milder versions of these symptoms. Tests that have been constructed in this way include Raine’s (1991) Schizotypal Personality Questionnaire (SPQ), the Schizotypal Trait Questionnaire, Form A (Claridge and Broks, 1984) known as the STA, and the Oxford–Liverpool Inventory of Feelings (the O-LIFE; Mason et al., 1995) which adds additional items to the STA. Several correlated facets are usually found when the items on these questionnaires are factor analysed. For example, the 37 items of the STA questionnaire generally produce three correlated factors: ‘unusual perceptual experience’, ‘paranoid ideas’ and ‘magical thinking’. The 104-item O-LIFE questionnaire consists of four facets. One reflects the ‘positive’ (unusual) symptoms of schizophrenia such as strange perceptual experiences and magical thought. Deficits associated with schizophrenia (withdrawal) form another facet, cognitive and attentional problems a third and impulsive nonconformity a fourth. The correlations between these facets are generally substantial and form a higher-order factor of Schizotypy (Mason et al., 1995) as described in Chapter 5.
The full test is printed in Mason and Claridge (2006), with some sample items shown in Table 10.1. The items are not too extreme, so non-clinical groups show a good range of scores when answering them, and factor analyses show that they form facets which in turn form a trait, much as expected. There is thus good evidence that such ‘watered-down’ symptoms of schizophrenia form a trait within the normal population. But is this trait related to schizophrenia?
Table 10.1 Sample items from the O-LIFE measure of Schizotypy (Mason and Claridge, 2006)
Facet | Sample item
Unusual experiences | Have you ever felt you have special, almost magical, powers?
Cognitive disorganisation | No matter how hard you try to concentrate do unrelated thoughts always creep into your mind?
Introvertive anhedonia (withdrawal) | Are you much too independent to really get involved with people?
Impulsive non-conformity | Do you often feel like doing the opposite of what people suggest, even though you know they are right?
Schizotypy and schizophrenia
A huge amount of research has been performed in this area and it is impossible to do it justice in just a few paragraphs. The task is made harder because it is quite difficult to locate people with schizophrenia who do not receive anti-psychotic medication. This makes it difficult to tell whether people with high levels of Schizotypy show the same cognitive and emotional functioning (albeit to a lesser degree) as those diagnosed with schizophrenia; any similarities or differences might be due to the effects of medication. There are two important questions that can be asked:
1. Are high scores on Schizotypy scales linked to schizophrenia – for example, via longitudinal studies or studying the relatives of people diagnosed with schizophrenia? If schizophrenia develops because of biological or family influences, then relatives might have high scores on Schizotypy scales if this trait really is linked to schizophrenia. If they have different origins, one would expect no such links to emerge.
2. Much is now known about which cognitive, attentional and other processes are affected by schizophrenia and about which remain intact. Do individuals who score high on measures of Schizotypy show similar profiles of symptoms and deficits to those diagnosed with schizophrenia?
Knowing the answers to these questions will indicate whether Schizotypy involves the same psychological and physiological mechanisms as schizophrenia, or whether they are distinct. Addressing the first question, Nelson et al. (2013) review the literature showing that both schizophrenia and Schizotypy tend to be inherited as described in Chapter 15, and that families of those diagnosed with schizophrenia tend to have elevated scores on tests of Schizotypy. Whilst this could indicate that either environmental or genetic influences tend to increase levels of Schizotypy and increase the probability of a diagnosis of schizophrenia being made, Fanous et al. (2007) show that some of the genes which predispose people to developing schizophrenia also influence levels of Schizotypy. This is important evidence that the biological mechanisms of Schizotypy resemble those of schizophrenia. Of course, environmental influences will also be vital, and Hatzimanolis et al. (2018) show that a stressful life event (undergoing compulsory military training) can boost levels of Schizotypy, and that this is purely an environmental effect; it is not a consequence of some individuals showing a genetic predisposition for Schizotypy which is activated by the environmental stressor. Claridge (1997) summarises much of the old research literature and Lenzenweger (2006a, 2006b) gives an excellent overview of the development of the concept and the relationship between Schizotypy and schizophrenia. The problem with any longitudinal study to determine whether Schizotypy is linked to the development of schizophrenia in later life is that initial testing would have to be performed quite early (as schizophrenia commonly manifests itself in adolescence) and as less than 1% of the population will ever develop schizophrenia, huge samples would have to be screened.
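The scale of this screening problem can be shown with some simple arithmetic. The sketch below assumes a lifetime prevalence of 1% (the approximate upper figure mentioned above); the target of 50 eventual cases and the function names are purely illustrative assumptions, not values taken from any study.

```python
import math

def expected_cases(sample_size, prevalence=0.01):
    """Expected number of people in a screened sample who will ever develop the condition."""
    return sample_size * prevalence

def sample_needed(target_cases, prevalence=0.01):
    """Smallest sample whose expected number of eventual cases reaches the target."""
    return math.ceil(target_cases / prevalence)

# Illustrative values only: how many cases would a 500-person study expect,
# and how many people must be screened to expect 50 cases?
print(expected_cases(500))
print(sample_needed(50))
```

Because the true prevalence is said to be below 1%, a real study would need an even larger sample than this rough calculation suggests.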
Thus most research uses less precise techniques for determining whether schizotypal personality traits really do predispose people to developing schizophrenia. Kwapil et al. (2013) and Chapman et al. (1994) describe a study in which 500+ individuals were assessed and followed up 10 years later. Those showing unusually high scores on perceptual aberration and magical ideation scales were more likely than controls to show signs of psychoses, have psychotic relatives, show schizotypal symptoms and psychotic-like experiences. The study does provide some suggestion that Schizotypy predicts later psychosis – although it does not show that it predicts full-blown schizophrenia.
Relatives of people diagnosed with schizophrenia (who will therefore have shared some genes and the family environment with them) are sometimes found to have elevated scores on measures of Schizotypy (Mata et al., 2003; Mata et al., 2000). More recent studies have also been performed, but these tend to focus on biological (genetic) determinants and become technical.
Turning to the second type of evidence, it is possible to measure eye movements and therefore discover whether a person moves their eyes smoothly when following a moving object or whether the eyes make a number of jumps (saccades). It is well known that people diagnosed with schizophrenia and their relatives find it harder to follow a smoothly moving dot around the screen than do control groups and this has been used as a marker for proneness to schizophrenia (Iacono et al., 1992). A substantial literature shows that poor ability to follow a moving object smoothly with the eyes is also linked to Schizotypy (Gooding et al., 2000), supporting the hypothesis that the neural processes of Schizotypy and schizophrenia are similar. Those diagnosed with schizophrenia or who score high on Schizotypy also find it difficult to look away from a dot when it suddenly appears on a screen; they find it difficult to inhibit the saccade (Myles et al., 2017). Patterns of cognitive deficits (particularly of memory) associated with schizophrenia are also found to a lesser extent in those with high Schizotypy (Siddi et al., 2017), which once again may perhaps suggest that Schizotypy and schizophrenia are related. Men who are diagnosed with schizophrenia are poor at identifying odours though not at discriminating between them – a finding of some interest, as the brain pathways involved in olfaction are reasonably well understood, and are known to involve brain areas (just behind the eyes) which are also implicated in other tasks at which they perform poorly (Seidman et al., 1991).
Interestingly, men with high scores on Schizotypy questionnaires, together with their relatives, also show impaired odour-naming performance (Rupp, 2010), which once again suggests that Schizotypy and schizophrenia may share common neurophysiological processes. Finally, it has recently been suggested that Schizotypy and schizophrenia are linked to grey matter volume in parts of the brain, and a biochemical mechanism for this has been proposed (Modinos et al., 2018). Taken together, several sources of evidence suggest that Schizotypy and schizophrenia have a similar neurophysiological basis.
Cannabis use in adolescence is associated with a two- to threefold greater chance of developing schizophrenia in later life (Moore et al., 2007; Ortiz-Medina et al., 2018), and the more cannabis that is used, the greater is the risk. There is also a substantial literature which shows that cannabis use in adolescence is associated with higher levels of Schizotypy (Cohen et al., 2011). The important question is what causes what. Does cannabis use cause both Schizotypy and schizophrenia? Do high levels of Schizotypy develop into schizophrenia only in those who have a particular genetic predisposition, or can it happen to any cannabis user? Is it the cannabis or the Schizotypy which causes schizophrenia to develop? Are people with naturally higher levels of Schizotypy more drawn to using cannabis – and if so, why? These issues are difficult to resolve, and older (but eminently readable) reviews such as Compton et al. (2007) are still useful; more recent ones consider the biological (genetic and physiological) mechanisms involved and become rather technical. However the findings once again suggest that Schizotypy and schizophrenia share common mechanisms.
It is well known that those diagnosed with schizophrenia find it hard to ignore irrelevant information; unattended, irrelevant information is more likely to break through into consciousness.
This may result in some ‘positive symptoms’ – for example, they may consciously recognise rhymes for words and say these, even when they are not appropriate or relevant, whereas members of control groups will not consciously notice rhyming associations. One common task (the Stroop task) asks people to read out words from a card as quickly as
possible: these words are all names of colours (e.g., ‘red’, ‘green’) but they are printed in different colours: the word ‘green’, for instance, may be printed in red ink. Negative priming involves two trials where the answer to one trial becomes the irrelevant stimulus for the next trial: for example, ‘blue’ written in green ink followed by ‘red’ written in blue. The participant should read out ‘blue, red’. Those diagnosed with ‘positive symptoms’ (delusions, etc.) show reduced inhibition and so less of a negative priming effect than do controls (Williams, 1996) and so do people with high levels of Schizotypy (Ettinger et al., 2018). There are other similarities in cognitive and perceptual functioning between schizophrenia and Schizotypy too, and Ettinger et al. (2014) provide a useful summary of this literature.
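The structure of a negative priming pair, as described above, can be written down as a small data structure. This is only a sketch of the trial logic in the text (the word is read aloud, the ink colour is the irrelevant stimulus); the class and function names are hypothetical and nothing here reproduces the procedure of any particular study.

```python
from typing import NamedTuple

class StroopTrial(NamedTuple):
    word: str  # colour name printed on the card (what the participant reads out)
    ink: str   # colour of the ink (irrelevant when the task is to read the word)

def is_incongruent(trial):
    """True when the word and its ink colour differ, e.g. 'green' printed in red."""
    return trial.word != trial.ink

def is_negative_priming_pair(first, second):
    """True when the answer to the first trial is the irrelevant ink colour of the second."""
    return first.word == second.ink

# The example from the text: 'blue' in green ink, followed by 'red' in blue ink.
pair = (StroopTrial("blue", "green"), StroopTrial("red", "blue"))
responses = [trial.word for trial in pair]

print(responses)
print(is_negative_priming_pair(*pair))
```

A control pair, such as 'blue' in green ink followed by 'red' in yellow ink, would fail the check because the first answer does not reappear as the second trial's ink colour.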
Self-assessment question 10.1 Why is it important to perform experiments such as those described in the previous paragraphs?
Schizotypy and the Big Five personality traits
Is Schizotypy largely a combination of some of the Big Five personality traits? To answer this question Asai et al. (2011) correlated scores on the O-LIFE with scores on a test measuring the Big Five personality traits and performed a multiple regression. They found that the Big Five personality factors explained over 40% of the person-to-person variation in Schizotypy, the largest correlation being with Neuroticism (r=0.43), followed by Agreeableness (r=–0.25), Openness (r=0.25), Extraversion (r=–0.23) and Conscientiousness (r=–0.15), which tied in with some early studies. It also generally agrees with a meta-analysis which examined differences in levels of personality traits between patients diagnosed with schizophrenia and members of the general population (Ohi et al., 2016). However the patients in this last study were generally medicated whilst the control groups were not, and so although this work too suggests people with high Schizotypy scores generally show a similar pattern of personality traits to those who are diagnosed with schizophrenia, this finding needs to be treated with caution as the antipsychotic drugs may have affected the scores of the patients. It is however clear from the work of Asai et al. that there is some overlap between Schizotypy and the Big Five personality traits – although Schizotypy also has some unique properties of its own.
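To see what the individual correlations imply, each one can be squared to give the proportion of variation in Schizotypy that the trait shares with it when considered on its own. The sketch below simply applies that arithmetic to the correlations quoted above from Asai et al. (2011); the variable names are illustrative. These single-trait figures should not be added together to recover the multiple regression result, because scores on the Big Five traits are typically somewhat correlated with one another.

```python
# Correlations between each Big Five trait and Schizotypy, as quoted in the text.
correlations = {
    "Neuroticism": 0.43,
    "Agreeableness": -0.25,
    "Openness": 0.25,
    "Extraversion": -0.23,
    "Conscientiousness": -0.15,
}

# Squaring a correlation gives the proportion of variance shared by the two variables.
shared_variance = {trait: r ** 2 for trait, r in correlations.items()}

for trait, proportion in shared_variance.items():
    print(f"{trait}: {proportion:.1%} of the variation shared with Schizotypy")
```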
Summary
Schizotypy is an important personality trait because of its association with a major psychosis – schizophrenia – and because it shows that milder symptoms of schizophrenia can be identified in the general population. Rather than being qualitatively different from ‘normal’ behaviour, this evidence suggests that schizophrenia forms a continuum – a trait, if you like – with some of us who show rather low levels of the trait functioning completely normally in society whilst others are out of contact with reality and have delusions, hallucinations, paranoid beliefs, emotional flattening and all the other debilitating symptoms of schizophrenia. The evidence outlined above suggests that many of the cognitive, genetic and physiological mechanisms which are associated with Schizotypy are also those which define schizophrenia, although identifying causal pathways (for example, determining which environmental factors turn a genetic susceptibility towards developing schizophrenia into a full-blown psychotic episode) is a complex, though heavily researched, area of neuropsychology.
Authoritarian and dogmatic attitudes
Should we regard strongly held attitudes as if they were personality traits? A case can be made for treating firmly entrenched beliefs in much the same way as personality traits, if they can be shown to be stable over time and to reflect genuine internal characteristics of the person. Thus a considerable amount of research has been performed to try to understand why and how some people develop extreme conservatism and authoritarian beliefs. Much of this research stemmed from trying to understand the rise in fascism before and during World War II (‘right-wing authoritarianism’) and also the left-wing tyranny of Stalinist Russia. In addition, terms such as ‘dogmatism’ and ‘religiousness’ have been coined; it is clear that factor analysis is needed to determine whether any of these terms are identical, as well as experimental studies to determine what causes people to hold firm, authoritarian beliefs.
Authoritarianism and conservatism
Adorno et al. (1950) developed the first such scale, the items of which were based on common themes which emerged from interviews and questionnaire studies of a wide variety of American men and women. Questionnaires were used to screen people in various ways (e.g., antidemocratic tendencies) and in-depth interviews with those with extreme low and high scores led to the development of the scale, some of whose items are shown in Table 10.2.
Table 10.2 Some items from the F-scale (Adorno et al., 1950)
Facet | Description | Example item
Conventionalism | Rigid adherence to conventional, middle-class values | One should avoid doing things in public which appear wrong to others, even though one knows that these things are really all right.
Authoritarian submission | Submissive, uncritical attitude toward idealized moral authorities of the ingroup | Obedience and respect for authority are the most important virtues children should learn.
Authoritarian aggression | Tendency to be on the lookout for, and to condemn, reject, and punish people who violate conventional values | He is indeed contemptible who does not feel an undying love, gratitude, and respect for his parents.
Anti-intraception | Opposition to the subjective, the imaginative, the tender-minded | Although leisure is a fine thing, it is good hard work that makes life interesting and worthwhile.
Superstition and stereotypy | The belief in mystical determinants of the individual’s fate; the disposition to think in rigid categories | Although many people may scoff, it may yet be shown that astrology can explain a lot of things.
Power and ‘toughness’ | Preoccupation with dominance–submission, strong–weak, leader–follower dimension | No insult to our honor should ever go unpunished.
Destructiveness and cynicism | Generalized hostility, vilification of the human | No matter how they act on the surface, men are interested in women for only one reason.
Projectivity | The disposition to believe that wild and dangerous things go on in the world | To a greater extent than most people realize, our lives are governed by plots hatched in secret by politicians.
Sex | Exaggerated concern with sexual ‘goings-on’ | The sexual orgies of the old Greeks and Romans are nursery school stuff compared to some of the goings-on in this country today, even in circles where people might least expect it.
A person who strongly agrees with such statements is thought to be highly authoritarian or fascistic, and the items shown above also seem to imply highly conservative attitudes – although it will be shown later that this is not necessarily the case. The scale was constructed without using factor analysis, but it has been found that the items form a single factor, much as expected (Krug, 1961). Of course some items now look extreme or dated and some (not shown above) are downright offensive to minority groups. In addition, agreeing with an item always implies authoritarian views whereas it is advisable to ensure that some items are phrased in the opposite direction, as described in Chapter 8. Hence alternative questionnaires have been developed by Wilson and Patterson (1968), Kohn (1972) and Ray (1979) amongst others. Some of these scales expand the scope of the concept somewhat to include ‘Antihedonism’ (the belief that enjoying oneself is somehow wrong), Militarism (for example, supporting nuclear weapons) and Religiosity (belief in the importance of some form of conventional religion, which is not necessarily the same as having religious faith oneself). Authoritarianism is also near-identical to rigid thinking or ‘Dogmatism’ (Rokeach, 1956; Plant, 1960) which is reluctance to admit that one has made a mistake, or to take corrective action. An updated version of Rokeach’s Dogmatism scale may be found in Shearman and Levine (2006); it contains items such as ‘People who disagree with me are usually wrong’. Who could argue with that?
Self-assessment question 10.2 Consider whether Authoritarianism always indicates ‘right-wing’ political attitudes. Is it possible to be an authoritarian anarchist, do you think?
Correlates and origins of authoritarian attitudes
Social Dominance Orientation, the preference for unequal inter-group relations, has been extensively studied by social psychologists and, together with Conservatism, it is a potent predictor of ‘generalised prejudice’ against outgroups (Hodson et al., 2016). It has also been found that correlations between racist attitudes and the Big Five personality factors are entirely mediated by authoritarianism (Duriez and Soenens, 2006). In other words, the only reason that Conscientiousness and Openness correlate with racist attitudes is because both of these traits also measure Authoritarianism to some extent. Where do these attitudes originate? It has been found that the behaviour of preschool children correlates substantially with their levels of Authoritarianism two decades later (Block and Block, 2006). This suggests that authoritarian attitudes do not develop as a result of prolonged exposure to family role models during development. Could they be passed on through the genes? On the face of it this seems unlikely, yet several studies (e.g., Ludeke et al., 2013) show clearly that this is the case. The heritability of authoritarian attitudes is about 0.57; influences outside the family also affect it whilst the influence of the family environment is near-zero. Chapter 15 explains how we know this. Thus studying family dynamics may not be useful if one wishes to understand the origins of authoritarian attitudes, which must come as an unpleasant shock to students of sociology. Still more surprisingly, Authoritarianism/Conservatism is also linked to rather low-level aspects of brain function. Suppose that participants in an experiment are asked to press a button every time that a cross appears on a screen. This is a very simple task to learn and soon becomes automatic. However, suppose that a second condition is now introduced: for example, if a sound is played when the cross appears, the person should not make any response.
Most trials in the experiment involve the cross appearing in the absence of sound (a ‘go’ trial) and just a few involve the inhibition of this response, indicated by the sound (‘no-go’ trials). So from the participant’s point of view, the ‘normal’ response is to push the button when the cross appears, except when the presence of the sound indicates that no response should be made. This task is quite difficult to perform accurately and everyone sometimes pushes the button even though the presence of a sound indicates that they should not do so. Amodio et al. (2007) recorded electrical activity from the brain during this task and focused on the peak of electrical activity which occurred approximately 50ms after the incorrect response (see Figure 10.1). The voltage recorded 50ms after the incorrect response was correlated with authoritarian attitudes (r=–0.59, N=43, p0.7 with three measures of neuroticism and anxiety, and –0.41 with the Extraversion scale of the EPQ. BAS-RI, BAS-RR and BAS-IMP correlated 0.61, 0.46 and 0.5 respectively with the Extraversion scale of the EPQ and –0.30, –0.18 and –0.01 with its Neuroticism scale. To my mind there are two problems with this work. The items forming facets of the BIS and FFFS are frequently rather similar and so it may make little sense to interpret the facets individually. Second, whilst the idea of defensive distance makes intuitive sense in determining
Biological, cognitive and social bases of personality
Table 11.2 Items measuring FFFS and BIS from the RST-PQ (Corr and Cooper, 2016)
System | Facet | Item
FFFS | Flight | I would run fast if I knew someone was following me late at night.
FFFS | Active avoidance | There are some things that I simply cannot go near.
FFFS | Freezing | I am the sort of person who easily freezes up when scared.
BIS | Motor planning interruption | When nervous, I find it hard to say the right words.
BIS | Cautious risk assessment | I worry a lot.
BIS | Obsessive thoughts | My mind is dominated by recurring thoughts.
BIS | Behavioural disengagement | I often find myself ‘going into my shell’.
whether the FFFS leads to attack, freezing or retreat in the face of threat, how can an individual’s defensive distance in a certain setting be ascertained other than by observing whether they attack, freeze or run? It might seem that the same term is being used to both describe the person’s behaviour and to explain it. RST is an important attempt to link a model of animal behaviour to psychopharmacological and physiological structures, and thence to human personality. Suggesting how anxiety, fear and obsessional thinking are linked but distinct emotions seems to me to be one of the major achievements of this approach to understanding personality. The great advantage of RST is that it tries to do more than point to correlations between scores on personality variables and activity in parts of the brain; identifying correlations really does not say much, other than to show that certain brain systems are somehow likely to be related to personality for reasons which are unknown. Instead it suggests a model which tries to explain performance in a particular situation, and argues that personality represents individual differences in the ease with which each of these systems may be activated. It sets out to be a true process model of personality, and one which is being actively developed.
Summary
This chapter has examined theories that view personality as either a social or a biological phenomenon and it should be clear by this stage that neither of the views is exclusively correct. Chapter 15 will reveal that the main personality factors have a substantial genetic component, which suggests that purely social explanations of behaviour are unlikely to be adequate. However, when we delve into the psychophysiological literature, it is equally clear that relatively few experiments provide unequivocal evidence in support of Eysenck’s model of Extraversion. However, the fact that several of the studies do show significant correlations seems to suggest that there may be some underlying truth in the premise that Extraversion is
related to the degree of arousal in the cortex, although two studies that correlate personality with resting metabolic rate in the cortex do not support the theory. Perhaps this is to be expected. The biological theory of Extraversion essentially assumes that the only determinant of whether a person is lively, sociable, optimistic and impulsive is the amount of electrical activity in their ARAS and it really does seem fairly likely that myriad other social, cultural, motivational, cognitive and time-of-day variables that are difficult to assess will also affect people’s behaviour. These uncontrolled variables will, of course, all tend to reduce the strength of the correlation between psychophysiological measures and Extraversion, so the finding that some measures can explain up to about one-seventh of the variability in people’s levels of Extraversion is, perhaps, about as good as could reasonably be expected. Gray’s theory is growing in popularity, although it is not always easy to grasp the nuances of the interactions between the various cns systems and how these may relate to behaviour and/or brain functioning. The basic question is whether one should relate personality directly to neural functioning (Eysenck) or instead identify various systems of behaviour (which form the cns) and develop models of how these relate to both personality and neural functioning. RST is dauntingly complex – as it arguably needs to be – and there are certainly plenty of problems with older versions of RST, as shown by Matthews and Corr (2008). However it seems to be an important way of linking physiology and psychological processes to personality.
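The ‘one-seventh of the variability’ figure mentioned in this summary can be translated into a correlation. For a single predictor, the proportion of variance explained is the square of the correlation, so the size of the correlation is the square root of that proportion. The short calculation below is only a sketch of this arithmetic; the variable names are illustrative.

```python
import math

variance_explained = 1 / 7          # "about one-seventh of the variability"
correlation_size = math.sqrt(variance_explained)

print(f"Proportion of variance explained: {variance_explained:.3f}")
print(f"Corresponding size of correlation: {correlation_size:.2f}")
```

The square root gives only the size of the correlation, not its sign, so the direction of the relationship has to be taken from the studies themselves.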
Suggested additional reading
There is no shortage of good texts and papers on the biological basis of personality. The main problem is that some of them are very technical, while others go into such detail that it is difficult to keep sight of the broad picture. Revelle (2016) and Mitchell and Kumari (2016) can be strongly recommended, whilst Sampaio et al. (2014) spell out the principles of resting state functional connectivity clearly for those with a good biological background. Kanai and Rees (2011) give an excellent and accessible summary of studies which link brain structure to personality (and intelligence). As far as RST is concerned, there are several good summaries of the theory with links to the neuropsychological literature (Corr, 2013; Corr and Cooper, 2016), and Collins et al. (2017) also relate RST to a model of self-regulation which may become important in the future.
Answers to self-assessment questions
SAQ 11.1 There is generally no control group (or sophisticated statistical analysis) to guard against the possibility that the child might have developed in much the same way had it been brought up in a completely different family environment – the child may show certain patterns of personality because of its genetic similarity to its parents. The genes may influence the parents to bring up the child in a certain way and also influence the child’s personality and there may be no causal link between parental behaviour and personality at all. One way round this would be to perform such studies using only adopted children.
SAQ 11.2 Several objections to reductionism can be entertained. First, it may be unwise to point to brain structures as explanations for behaviour unless one knows why, precisely, these brain structures developed in the way that they did. For it is possible that life events (e.g., childrearing practices) can influence the brain structures. In this case, a ‘biological’ theory of personality could be explained by social processes! There is a tendency to assume that the biology of the nervous system is fixed, immutable and thus somehow more fundamental than social theories, although the evidence for this is not always clear. Second, the theories clearly fail to take into account the importance of interactions between personality types and certain environments. It seems quite likely that certain life events will have a profound impact on those who are biologically disposed to react in a certain way. If this is so, then any attempt to explain personality by purely biological (or purely social) processes will clearly be unsatisfactory. Contrariwise, it may be sensible to explore how adequately simple (e.g., purely social or purely biological) models can predict personality before beginning the much more complex task of developing models that are based on interactionism.
SAQ 11.3 There are several possible approaches. The most obvious would be to measure the levels of male hormones, etc., in a sample of ‘normal’ male volunteers and to correlate the levels of such chemicals with their scores on the P scale of the EPQ(R) (or else use a statistical technique called ‘multiple regression’ to determine how much of the variation in the P scale can be predicted by individual differences in the levels of several such chemicals). Consideration could be given to the feasibility (and ethicality) of an intervention study in which some individuals’ levels of such hormones are increased, while a control group would receive a placebo. A one-between and one-within analysis of variance could then establish whether P scale scores increase more in the experimental group. However, this design is probably not feasible (even if it were ethical) because it would presumably need to be carried out over a period of days or weeks. Another approach might be to compare the P scale scores and hormonal levels of normal controls and violent criminals, in the expectation that the group with higher P score would also have higher levels of hormone. However, this approach does not show the size of the relationship between hormones and the P score in the general population and for this reason the first design is probably the better.
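The first design in SAQ 11.3 amounts to computing a correlation between a hormone measure and P scale scores. The sketch below shows that calculation using entirely synthetic, made-up numbers generated inside the code; the variable names, sample size and the strength of the simulated relationship are all assumptions chosen for illustration and do not come from any real data set.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(spread_x * spread_y)

# Synthetic data only: 200 hypothetical volunteers, with P scale scores that
# depend weakly on the hormone level plus a large amount of random variation.
random.seed(1)
hormone_level = [random.gauss(20, 5) for _ in range(200)]
p_scale_score = [0.2 * h + random.gauss(0, 3) for h in hormone_level]

r = pearson_r(hormone_level, p_scale_score)
print(f"r = {r:.2f}, proportion of variance shared = {r * r:.2f}")
```

With several hormones measured at once, the same question would be addressed with multiple regression rather than a single correlation, as the answer above notes.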
SAQ 11.4 We have not had space to discuss this in much detail, but many of the reasons should be fairly clear. First, it allows us to answer two criticisms that are levelled at personality theory by social psychologists, namely that personality theories are ‘circular’ (inferring personality from behaviour and then trying to explain the behaviour by invoking a personality trait) and that they are not genuine properties of the individual, but rather the results of stereotyping, attributional biases, etc. If it can be shown that the main personality traits are linked to behaviour in quite different domains (e.g., patterns of electrical activity in the cortex), then it will be clear that the personality questionnaires are measuring some rather broad property of the organism. Neither can they merely reflect attributional biases, etc. The second reason for studying processes is, of course, scientific curiosity – a desire to understand how and why certain individuals come to display characteristic patterns of personality. Finally, such work may have some important applications. For example, if a ‘negative’ adult personality trait such as neuroticism or psychoticism could be shown to be crucially dependent on some kind of experience during infancy, society might well want to ensure that all children receive special attention at this stage.
SAQ 11.5 a. Eysenck studied the personality dimensions of Extraversion–Introversion and Neuroticism–Stability, whereas Gray studied personality factors that are mixtures of the two, as shown in Figure 11.2. b. Eysenck sought to relate personality directly to brain physiology (e.g., correlating Extraversion with electrical activity in the cortex). Gray suggested that the various structures of the conceptual nervous system (cns) mediate this relationship. The cns reflects ways in which animals behave – e.g., approach or avoidance of novel objects.
SAQ 11.6 a. BAS (impulsivity plus seeking out positive rewards). b. FFFS (whether the chocolate thief attacks the shopkeeper, runs or freezes will depend on how close the shopkeeper is, and whether there is a clear passage to the door). c. BAS, FFFS, BIS (desire for chocolate bar being balanced against fear of getting caught leads to activation of the BIS).
Note 1
Assuming it is a simple sine wave.
Chapter 12 Structure and measurement of abilities
Introduction
The study of individual differences in ability is one of the very oldest areas of psychology and is certainly one of the most applicable. Tests assessing individual differences in mental ability have been of great practical value in occupational, industrial and educational psychology, as will be seen in Chapters 17 and 18. The psychology of ability is one of the four main branches of individual differences (the others being personality, mood and motivation) and the present chapter outlines what is now known (and generally accepted) about the nature and structure of human abilities.
Learning outcomes
Having read this chapter you should be able to:
• Outline Spearman’s contribution to the study of mental abilities.
• Discuss how and why Thurstone’s model differed from Spearman’s, and how a hierarchical model solves the apparent conflict between them.
• Interpret what an IQ of 130 (for example) means.
• Show an appreciation of how abilities may be measured using tests.
• Discuss and critically evaluate Gardner’s and Sternberg’s theories.
Howard Wainer (1987) reminds us that ability testing has a history stretching back 4000 years to when the Chinese used a form of ability testing to select for their civil service. He also notes a biblical reference to forensic assessment of ability (Judges 12:4–6). This is
hardly surprising, for few areas of psychology have proved of such practical usefulness as the capacity to measure human abilities. The accurate identification of which individuals will best be able to benefit from an advanced course of education or which job applicants are likely to perform best if selected (rather than selecting randomly or on the basis of time-consuming and potentially unreliable procedures, such as interviews) brings important financial and personal benefits. Tests of ability are also useful for identifying other forms of potential and problems, for example to identify outstanding musical talent or to help in the diagnosis of dyslexia and some types of brain disease.
We will start by defining some terms. The term ‘mental ability’ is used to describe a person’s performance on some task that has a substantial information-processing component (i.e., that requires thinking, sound judgement or skill) when that person is trying to perform that task as well as possible. For example, writing a sonnet, adding the numbers 143 and 228, designing a building, reading a map, structuring an essay, inventing a joke and diagnosing a fault in a computer are all examples of mental abilities if (and only if) they are assessed when individuals are really trying to do their best. Some other abilities (e.g., pruning a rosebush, running a race) do not require vast amounts of cognitive skill for their successful completion and so are not usually regarded as mental abilities. Neither are tests of attainment, which (as discussed in Chapter 3) are designed to assess how well individuals have absorbed knowledge or skills that have been specifically taught. For example, an attainment test in geography might take a sample of facts and skills that pupils should have learned, for example by sampling items from the syllabus. Tests of ability, by way of contrast, involve thinking rather than remembering and use test items that the individual will not have encountered previously. Or at least the individual will not know that they are to be tested on them. Thus mental abilities should be measured by assessing a well-motivated person who is asked to do their best in performing some task with a substantial cognitive component, the exact nature of which is unfamiliar to them, but for which they have the necessary cognitive skills (e.g., being able to read).
Mental abilities, then, are traits that reflect how well individuals can process various types of information. They reflect cognitive processes and skills and since one of the functions of the education system is to develop some of these (e.g., numeracy, literacy), it is difficult to define what is meant by an ability without taking individuals’ educational backgrounds and interests into consideration. Individual differences in the speed of solving differential equations might be regarded as an important ability for physics students or engineers, but not for many others. Thus it is not really possible to define ability independently of cultural and educational background. For this reason, it is conventional to consider only those mental abilities that everyone can be assumed to have acquired as part of a basic education or which are not formally taught at all. Performance in solving cryptic crossword puzzles, evading income tax, thinking up novel uses for objects, assembling jigsaws or performing simple arithmetic would be regarded as mental abilities. Those abilities that reflect specialised knowledge or training or that are noncognitive in nature would be excluded from the list. Hence being able to identify fungi, sprint, play the sitar, grow championship onions, recite the capitals of the states of the USA or call a touch of Belfast Surprise Major (a very esoteric mental skill involving church bells) would not generally be regarded as mental abilities within the general population.
But how many abilities are there? After all, the number of ‘tasks with a substantial cognitive component, the exact nature of which is unfamiliar’ (our putative definition of a mental ability) is potentially enormous. In fact, it is almost infinite, meaning that it would be impossible in practice ever to understand fully a person’s abilities simply by measuring all of them.
The alternative is to examine the correlations between mental abilities using factor analysis. This might show that there is some overlap between all these tasks measuring mental abilities and with luck this might reduce the number of distinct abilities to a manageable level. So, just as with the psychology of personality, psychometric studies of mental ability attempt to achieve two objectives:
• to establish the basic structure of abilities
• once there is some consensus about the number and nature of the main abilities, to try to understand the nature of the underlying social, biological, cognitive and other processes that cause these individual differences to emerge.
This chapter deals with the basic structure of abilities, and Chapters 13 and 15 address some of the underlying processes.
Structure of human mental abilities: general principles
You will already be familiar with most of the research techniques that are used in the study of mental abilities, as most authorities agree that factor analysis of the correlations between ability test items can reveal the underlying structure of abilities. Therefore, precisely the same techniques that showed up personality characteristics such as Extraversion and Neuroticism should also be able to reveal the structure of abilities. If we measure individuals’ performance on a very wide range of tasks that seem to require thought for their correct solution, correlate their scores on the items (or tests) together and factor analyse this table of correlations, the factors that emerge should represent the main dimensions of ability.
Self-assessment question 12.1 Think creatively for a few minutes about how you might draw up a list of as many abilities as possible.
Which abilities should we analyse?
When studying personality, Cattell used the ‘personality sphere’ concept in an attempt to ensure that all possible descriptions of behaviour were included in factor analyses that were designed to reveal all of the main personality traits. No similar approach has been followed for abilities, presumably because it is recognised that the dictionary may not contain very accurate descriptions of ability. For example, I can think of no commonly used word that describes ability to solve anagrams, sing in tune or show prodigious powers of memory. (Phrases are not really helpful, since they will not be turned up by a search through the dictionary.) This can present problems, since the personality sphere at least offered a starting point for psychologists wishing to map out the whole area of personality. Without any such guidance, it is difficult to ensure that all possible abilities have been considered when trying to develop tests to cover the whole gamut of mental abilities. This can mean that certain mental ability factors will simply not be discovered. For example, if no tests of mathematical performance
are given to a group of people, it is clearly impossible for any sort of mathematical ability factor to emerge when individuals’ responses are intercorrelated and factor analysed.
Cattell’s Ability Dimension Action Chart (ADAC)
Cattell (1971) therefore attempted to develop a taxonomy of abilities, by means of a chart showing what types of ability could logically exist. A simplified version of this Ability Dimension Action Chart (ADAC) is shown in Table 12.1. It views abilities as consisting of three main components. The action domain specifies which aspect of the task is ‘hard’ for the individual. This might be at the input level – that is, the task may require someone to make a fine perceptual discrimination. For example, in a test individuals may be asked to match various extremely similar colours or attend to speech presented to one ear while ignoring speech presented to the other ear. The vast majority of conventional tests of ability instead rely on internal processing and storage – or thinking – to make the items hard. This is the second possible aspect of the action domain. Some tasks primarily require skilled performance (e.g., singing a note in tune, tracking a moving object on a computer screen using a joystick) and so these
Table 12.1 A short form of Cattell’s Ability Dimension Action Chart (ADAC)
Action – involvement of
• input
• internal processing and storage
• output
Content – various areas such as
• verbal
• social
• mechanical
• knowledge
• sensory modality
Process – including
• complexity of relations
• complexity of processing
• amount of remembering required
• amount of knowledge required
• amount of retrieval from memory required
• flexibility/creativity vs convergence
• speed
• fine use of muscle
would fall into the output category. The other two domains are content (the type of material given to the individual, e.g., a mental arithmetic problem or an anagram) and process (which defines which mental processes the individual is supposed to perform in order to solve the problem). The content of the test items can vary considerably, from verbal, mathematical and mechanical problems to social and moral issues, to pitch perception. The processes involved in performing the task can also take a huge number of values. Since it is highly unlikely that many tasks will involve a single process, it is necessary to categorise the degree to which each of them requires the use of memory, the degree of background knowledge needed, speed, the detection of relationships between objects (e.g., ‘cat is to kitten as dog is to ???’) and the amount of information that needs to be processed (e.g., drawing a circle as opposed to drawing a mural). There could be a vast number of these processes, but a list could perhaps be drawn up on the basis of the main findings of cognitive psychology. The problem is, of course, that the categorisation of tests according to this model will not generally be straightforward. How can one know (without performing myriad experiments) how great the memory requirement of a task is relative to its speed? Likewise, as some of the processes are very broad and do not always reflect modern research, we would now need to modify the list considerably – for example, using ‘working memory’, ‘long term storage’, ‘visuo-spatial scratchpad’, etc. rather than just ‘memory’.
Self-assessment question 12.2 Try to classify each of the following tasks in terms of the ADAC chart in Table 12.1:
a. singing the same note as that played on a piano
b. identifying a piece of music
c. learning a part in a play
d. solving a riddle
e. inventing a pun
f. solving this SAQ.
This chart can be useful in that it reminds us that most of the abilities identified so far require an action of ‘internal processing’ – we know rather less about those in the input and output domains. However, the problems arise because it has no clear theoretical rationale (e.g., not drawing the list of processes from cognitive psychology) and it is almost impossible to be sure just by looking at a task which processes are involved in its solution. Thus it is probably safest to conclude that there is no single obvious, automatic technique for discovering all of the behaviours that may be influenced by ability.
Factor analysis and the structure of abilities
There is an important difference in the way in which factor analysis is generally used in the study of personality and ability. Surprisingly, this is one which does not seem to have been explicitly recognised in the literature. It relates to what precisely goes into the factor analysis. You will remember that in the case of personality psychology, the responses to individual test
items are usually fed straight into the factor analysis. Thus, if one factor analyses the correlations between the items in a particular personality questionnaire, the main personality factors that correspond to the scales of the questionnaire should emerge. It is extremely rare to find a study in the area of human abilities that calculates and factor analyses the correlations between individual test items. Almost all studies instead factor analyse the correlations between subscales (as facets of ability scales are usually called) consisting of several items – for example, individuals’ scores on subscales (each consisting of 20 or so items) measuring verbal comprehension, non-literal comprehension of proverbs and ‘odd word out’ (some examples taken from Thurstone, 1938 who pioneered work in this field). These items differ in difficulty and must therefore use different cognitive processes, so are arguably not synonyms or ‘bloated specifics’ as discussed in Chapter 3. Most ability researchers factor analyse the correlations between groups of items, rather than between the items themselves. How these items are grouped together in the first place has important consequences for the results that are obtained. Suppose that a researcher wants to examine the structure of ability, which will include some arithmetical problems. He or she will have to decide whether it is better to include several different, separately timed and separately scored subscales measuring arithmetical ability (e.g., one for addition, one for subtraction, one for multiplication, one for division, one for geometry, one for algebra, one for long multiplication, one for long division, another for set theory, etc.) or whether to make the assumption that these skills all go together and so calculate a total score from a mixture of these items, hoping that it will be a reliable measure of a single ability. In the former case, the researcher determines empirically that these various types of problem will group together to form an ability factor, whereas in the latter case the researcher simply assumes that they do. Thus the factor that is obtained when the correlations between the subscales are analysed is often essentially the same as the score on the scale that is obtained when all of the items are simply put into the same test. If we factor analyse the correlations between a number of subscales, calculate individuals’ scores on this factor and correlate these factor scores with the sum of their scores on the subscales, the correlation would be very high (probably well above 0.9), showing that the factor is almost identical to the simple sum of scores. The point to note is that by factoring subscales one might well end up with factors that are different from those that are obtained by factoring correlations between scales. We shall return to this distinction when we examine the work of Spearman and Thurstone.
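The claim that a factor score is almost identical to the simple sum of subscale scores can be checked with a small simulation. What follows is only a minimal sketch under stated assumptions: the data are simulated rather than real, the five loadings are hypothetical values, and the first principal component of the correlation matrix is used as a simple stand-in for a full factor analysis. The function names are invented for the illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def first_component(corr, iterations=200):
    """Leading eigenvector of a correlation matrix, found by power iteration."""
    k = len(corr)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(a * a for a in w) ** 0.5
        v = [a / norm for a in w]
    return v


def simulate(n_people=500, loadings=(0.8, 0.7, 0.6, 0.7, 0.5), seed=1):
    """Hypothetical subscale scores that all draw on one common ability."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_people):
        ability = rng.gauss(0, 1)
        rows.append([l * ability + (1 - l * l) ** 0.5 * rng.gauss(0, 1)
                     for l in loadings])
    return rows


def factor_vs_sum(rows):
    """Correlate first-component scores with the plain sum of subscale scores."""
    k = len(rows[0])
    cols = [[row[j] for row in rows] for j in range(k)]
    corr = [[pearson(cols[i], cols[j]) for j in range(k)] for i in range(k)]
    weights = first_component(corr)
    # Standardise each subscale before applying the component weights.
    z = []
    for col in cols:
        mean = sum(col) / len(col)
        sd = (sum((a - mean) ** 2 for a in col) / len(col)) ** 0.5
        z.append([(a - mean) / sd for a in col])
    component_scores = [sum(weights[j] * z[j][i] for j in range(k))
                        for i in range(len(rows))]
    sum_scores = [sum(row) for row in rows]
    return pearson(component_scores, sum_scores)


r = factor_vs_sum(simulate())
print(f"correlation between component scores and sum scores: {r:.3f}")
```

Changing the hypothetical loadings or the sample size changes the exact figure, but as long as every subscale draws on the same ability the two sets of scores should stay very close.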
Empirical results
Having decided (rather arbitrarily) whether to factor the correlations between scales or subscales, determining the basic structure of ability is rather straightforward and is fully described in Chapters 5 and 19. All that is necessary is to:
1. administer the test(s) to a large, representative sample of the population
2. score the tests
3. add up individuals’ scores on the various scales or subscales
4. correlate these scores together
5. factor analyse these correlations
6. identify the factor(s) by looking at the variables that have appreciable loadings on them
7. validate the factor(s), e.g., by establishing their predictive and construct validity
8. examine the correlations between the factors – if any are appreciable, repeat steps 4 to 7 to produce ‘second-order’ factors, ‘third-order’ factors, etc., repeating the process until either the factors are essentially uncorrelated, or just one factor remains.
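Steps 4 to 8 can be summarised in matrix form. This is a sketch in standard common-factor notation, none of which appears in the chapter itself: $R$ is assumed to be the matrix of correlations between the scales or subscales, $\Lambda$ the matrix of factor loadings, $\Phi$ the matrix of correlations between the factors and $\Psi$ a diagonal matrix of unique variances.

```latex
R \approx \Lambda \Phi \Lambda^{\top} + \Psi
\qquad \text{and, if the off-diagonal entries of } \Phi \text{ are appreciable,} \qquad
\Phi \approx \Gamma \Phi_{2} \Gamma^{\top} + \Psi_{2}
```

Here $\Gamma$ stands for the loadings of the first-order factors on the second-order factors, and $\Phi_{2}$ for the correlations between those second-order factors. On this reading the process stops when the most recent factor correlation matrix is close to an identity matrix, or when only one factor is left.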
Spearman and Thurstone: one ability or 12?
Charles Spearman (1904) was one of the first to perform an empirical study of the structure of abilities – indeed, he invented the technique of factor analysis for this very purpose. He constructed some rather primitive tests that, he thought, would probably assess children’s thinking ability, and administered these to a sample of Hampshire schoolchildren. He gave them tests of vocabulary, mathematical ability, ability to follow complex instructions, visualisation, matching colours and matching musical pitch. With hindsight, it seems likely that some of the tests would be influenced by formal education and were thus tests of attainment rather than of aptitude; however, the remaining three tests did not measure skills that were explicitly taught in school. He then added up children’s scores on these six scales, correlated their scores together and factor analysed these correlations. Note that Spearman factored the scores between quite different scales, not different subscales. He found just one factor, which he called g (for general ability). This result implied that, if a child performed above average on one of the tests, it was more likely than not that she would perform at an above average level on all the other tests as well. It is as if there is some basic ‘thinking ability’ that determines performance in all areas – some children seemed to excel at everything, some performed poorly at everything, but few excelled in one area and performed poorly in another. Spearman’s work turns out to be incredibly important, and is discussed at length in Chapter 13.
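A single-factor result of this kind is often written as a one-factor model. The notation below is not taken from the chapter and is assumed purely for illustration: $x_{ij}$ is person $i$’s standardised score on test $j$, $g_i$ is that person’s general ability, $\lambda_j$ is the loading of test $j$ on $g$, and $e_{ij}$ is whatever is unique to that test.

```latex
x_{ij} = \lambda_j \, g_i + e_{ij},
\qquad
r_{jk} = \lambda_j \lambda_k \quad (j \neq k)
```

If the unique parts are uncorrelated with $g$ and with each other, the correlation between any two different tests is simply the product of their loadings, so every pair of tests correlates positively whenever all the loadings are positive. The equations only describe the pattern of correlations; they do not in themselves say why that pattern arises.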
Self-assessment question 12.3 On the basis of this analysis, is it legitimate to say that general ability causes someone to perform at similar levels on all tests?
Thurstone (who was based in the USA) soon formed a different opinion, but this is largely because he analysed subscales rather than scales. He administered a much larger selection of tests (60 subscales in all) to a sample of just over 200 university students (Thurstone, 1938). When the intercorrelations between these 60 test scores were factor analysed by hand, 12 distinct ability factors were extracted and rotated orthogonally. Thurstone termed these factors ‘primary mental abilities’ and most of them made good psychological sense. For example, one factor had large loadings from all subscales that involved spatial visualisation, one had large loadings from all the subscales involving the use of language, while yet another subsumed all the subscales requiring numerical skills and so on. Thurstone’s primary mental abilities (PMAs) are listed in Table 12.2. Variants of the tests that Thurstone used to measure these abilities (the Test of Primary Mental Abilities) are still in print and are sometimes used to this day. Thus while Spearman found just one factor, namely general ability, Thurstone seems to have identified a dozen quite independent ability factors. How many ability factors are there? I ought to mention another difficulty with Thurstone’s experiment that more or less guaranteed that he obtained results different from Spearman’s. The problem in using students is that they have most probably been selected on the basis of high levels of general ability. It is unlikely that many would have below average ability. Thus the range of ability in Thurstone’s sample was almost certainly much less than that in the general population. We now know
that one of the effects of this will be to reduce the correlations between the subscales, which will reduce the likelihood of finding a single, all-pervasive factor of general ability.

The debate that resulted from Thurstone’s work was hot and furious and many psychologists were convinced that factor analysis was useless because it seemed to produce such inconsistent results. However, you should be able to understand precisely why the results were so different. The difference arises because Thurstone analysed correlations between subscales, while Spearman analysed correlations between scales. Indeed, some of the ability factors that were extracted from Thurstone’s factor analysis (numerical ability, verbal ability and deductive reasoning/following instructions) look remarkably similar to the scales that Spearman put into his analysis. Thurstone’s analysis was just performed at a lower level than that of Spearman.

When Thurstone’s subscales are examined in more detail, it can be seen that some of them are virtually identical. For example, ‘Reading 1’ and ‘Reading 2’ both measure understanding of proverbs and are identical in format and difficulty; it looks as if a set of items was written and arbitrarily divided between these two subscales. The subscales are thus bound to correlate substantially and bound to form a factor – a finding of little or no psychological interest. No one appreciated at the time that it was important to ensure that the variables being factor analysed were different.

The obvious next step is to examine the correlations between Thurstone’s factors, since, if they correspond to Spearman’s scales, they may correlate together to give g. The problem is, of course, that Thurstone forced his factors to be at right angles to each other (‘orthogonal rotation’) for ease of computation. Since all correlations between these factors are zero, it is not possible to factor analyse these correlations between the primary abilities.

There is another problem, too. Even Thurstone’s work, involving 60 subscales, appeared to miss out some areas of ability. For example, there was nothing that measured how quickly people could come up with ideas, nothing that measured what Sternberg (1985) was to call ‘social intelligence’ (e.g., ‘what’s the “smart” thing to do if your potential father-in-law forbids you to marry his daughter?’), nothing that measured musical skill, judgement, coordination or a whole host of other abilities that may prove to be important.
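The restriction-of-range argument made above (a sample selected for high general ability shows smaller correlations between tests) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the ‘population’, the two subscales and the selection rule (keep everyone with above-average g) are all invented for illustration and are not Thurstone’s data or procedure.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, seed=1):
    """Return (r in the whole sample, r among people with above-average g)."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        g = rng.gauss(0, 1)            # hypothetical general ability
        test_a = g + rng.gauss(0, 1)   # two hypothetical subscales that
        test_b = g + rng.gauss(0, 1)   # share nothing except g
        people.append((g, test_a, test_b))
    full = pearson([p[1] for p in people], [p[2] for p in people])
    # Assumed selection rule: only people with g above the mean are tested.
    selected = [p for p in people if p[0] > 0]
    restricted = pearson([p[1] for p in selected], [p[2] for p in selected])
    return full, restricted


full_r, restricted_r = simulate()
print(f"whole population r = {full_r:.2f}; selected sample r = {restricted_r:.2f}")
```

Under these assumptions the correlation between the two subscales is about 0.5 in the whole population but only about 0.27 in the selected group, which is the kind of shrinkage that would make a single general factor harder to find.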
Table 12.2 Thurstone’s primary mental abilities (Thurstone 1938)

Code | Factor | Brief description
S | Spatial ability | Visualising shapes, mental rotation
V | Verbal relations | Words in context: comprehension, analogies, etc.
P | Perceptual speed | Searching for letters
N | Numerical facility | Addition, subtraction, etc., algebra
W | Word fluency | Tasks dealing with isolated words, e.g., anagrams, spelling
M | Memory | Paired-associate learning and recognition
I | Induction | Finding rules given exemplars, e.g., number series
R | Restriction | Mechanical knowledge, spatial and verbal skills
D | Deduction | Deductive reasoning: applying a rule
. . . plus three factors that could not be easily interpreted
Hierarchical models of ability

Later evidence shows that Thurstone’s analyses were flawed in one important respect. Whereas he believed that the primary abilities were essentially uncorrelated, most subsequent research has revealed that these factors are correlated together to varying extents. It is therefore possible to factor analyse the correlations between the primary abilities. In addition, several investigators extended the range of subscales that were entered into factor analyses and so discovered more and more primary mental abilities.

Cattell’s (1971) contribution is still useful here. His review of the literature identifies some 17 first-order factors that recur in several independent studies and that seem to be acceptably broad in scope. Hakstian and Cattell (1976) later published a test designed to assess 20 primary abilities – the ‘Comprehensive Ability Battery’. The ‘Kit of Factor-Referenced Cognitive Tests’ (Ekstrom et al., 1976, which may be downloaded free of charge) is very similar and there is good reason to believe that these tests measure many of the most important primary mental abilities. These include abilities as diverse as originality (thinking of creative uses for objects), mechanical knowledge and reasoning, verbal ability, numerical ability, fluency of ideas (being able to produce many ideas quickly), perceptual speed (speed in assessing whether two strings of characters are the same or different), spelling and aesthetic judgement (for works of art). Thus Cattell’s analyses confirm and extend Thurstone’s results.

When the correlations between these primary ability factors (‘primaries’) are themselves factor analysed, a number of ‘second-order’ ability factors (‘secondaries’) emerge, rather as shown in Figure 12.1. The lines in this diagram show ‘significant’ factor loadings (those above 0.4, say). The raw data on which the analysis is performed are the scores on a number of subscales, represented by squares at the bottom of the figure. The lozenges at the next level show the factors that emerge when the correlations between these subscales are factor analysed – these are the primary mental abilities. The next level up shows what happens when the correlations between the primary mental abilities are analysed. In this example, verbal ability (V), numerical ability (N) and mechanical knowledge ability (Mk) all have substantial loadings on a second-order factor called Gc, the meaning of which will be discussed shortly. Note that none of the other primary ability factors shown in the figure has a substantial loading on Gc. It might even be possible to factor analyse any correlations between the second-order factors in order to obtain one or more third-order factors. As we move towards the top of the figure, we can see that each factor has an increasingly broad influence. For example, V (the verbal ability primary) affects performance on just three subscales, while Gc affects performance on ten subscales.

While there is generally good agreement about the nature of the main primary ability factors, opinions do differ somewhat about the number and nature of the second- and higher-order abilities. Vernon (1950) proposed that there were two main secondaries – one essentially corresponding to verbal/educational ability (‘v:ed’) and the other representing practical/mechanical abilities (‘k:m’). Both of these were thought to correlate together, giving general ability (Spearman’s g) at the third level. Vernon’s v:ed factor had significant loadings from primaries such as verbal ability and numerical ability, and k:m had loadings from primaries dealing with visualisation and spatial abilities.

Horn and Cattell (1966) and Hakstian and Cattell (1978) identified not two but six second-order factors, a finding essentially replicated by Kline and Cooper (1984) and Undheim (1981), among others. The two most important factors in this model are known as fluid intelligence (Gf) and crystallised intelligence (Gc). Fluid intelligence is the raw reasoning ability that should develop independently of schooling, covering areas such as memory, spatial ability and inductive reasoning, while crystallised intelligence requires knowledge as well and influences primaries such as verbal comprehension and mechanical knowledge. The other four
second-order factors are retrieval (Gr), which is concerned with the speed with which creative ideas are produced, visualisation (Gv), which had been identified as a primary by Thurstone, speed (Gps – a ‘perceptual speed’ factor that measures the speed with which individuals can compare two strings of letters or figures) and memory (Gm). Once again, these factors seem to make good sense.

Further support for this hierarchical model of abilities comes from Carroll (1993) who conducted a massive reanalysis of all published sets of data in the area of cognitive abilities using the most modern statistical techniques then available in order to draw up a comprehensive list of ability factors. The ‘three stratum model’, which he identified as best describing the data, is extremely similar to those discussed, though with the addition of more primary abilities and some more second-order abilities.

Figure 12.1 Part of a hierarchical model of ability, showing one third-order factor (general ability (g)), which influences two second-order factors (crystallised ability (Gc) and retrieval (Gr)), which, in turn, influence primary abilities such as verbal ability (V), numerical ability (N), mechanical ability (Mk), ideational fluency (Fl) and word fluency (W). At the bottom of the pyramid, the squares represent various subscales, e.g., those measuring comprehension, knowledge of proverbs and vocabulary, ability to solve simultaneous equations.
Self-assessment question 12.4 With which of Horn’s second-order factors would you expect the following measures to correlate?
a. Locating one’s whereabouts by map reading when lost in the country.
b. Speed of thinking what can be cooked from two eggs, an onion and a slice of bread.
c. A child’s school examination marks.
d. Proofreading a complicated equation in a nuclear physics journal (i.e., checking whether the printed version is the same as the author’s original typescript).
The main reason for the differing numbers of second-order factors probably lies in the nature of the abilities being measured, although differences in the factoring procedure may also have some effect. If a test battery contains fewer than two subscales measuring creativity or memory skills, then of course no primary or secondaries such as Gr or Gm can emerge, since the creativity or memory tasks would not show a substantial correlation with any other tasks.

The other main point to remember is that fluid intelligence (Gf) bears a very striking resemblance to Spearman’s g and so it is encouraging to see that when a third-order factor analysis was performed (by factoring the correlations between the secondaries), a third-order factor affected all three secondaries (Gf, Gc and Gv) that emerged when a sample of primary abilities was factored (Gustafsson, 1981). Interestingly, Gustafsson’s analysis showed that there was a very large correlation (approx. 0.9) between g and Gf and there is an interesting debate about whether these two factors are identical (Blair, 2006).

It is thus possible to conceptualise ability in several ways. First, and working our way down the hierarchy, the positive correlation between all of the primary abilities and between all the secondaries suggests that Spearman’s factor of general ability (or something very like it) influences all aspects of cognitive performance. This means that it is legitimate to say that ‘Martha is more intelligent than Archie’ for example. Some theorists have suggested that general ability may even be linked to the speed and/or efficiency with which the nervous system operates and some evidence for this claim will be scrutinised in Chapter 13. There are also some ‘clusters’ of abilities that tend to rise and fall together – these are identified by the half-dozen or so main second-order ability factors. It is perhaps unsurprising to find that one of these factors (Gc) is related to school-based knowledge and skills. However, the rest seem to reflect pure thought processes. Then at the next level down there are the individual primary mental abilities, over 20 of which have been discovered. These are thought to be the most basic mental skills.
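To see why uniformly positive correlations between abilities point to a general factor, it may help to look at a toy calculation. The sketch below is illustrative only: the correlation matrix is invented (it is not taken from any study cited here), the four labels are placeholders, and it extracts the first principal component by power iteration rather than using the factor-analytic procedures that the researchers discussed above would have applied.

```python
import math

# Hypothetical correlations between four primary abilities (invented for
# illustration; not taken from any study cited in the text).
LABELS = ["V", "N", "S", "M"]
R = [
    [1.00, 0.50, 0.30, 0.30],
    [0.50, 1.00, 0.30, 0.30],
    [0.30, 0.30, 1.00, 0.40],
    [0.30, 0.30, 0.40, 1.00],
]


def first_component_loadings(matrix, iterations=200):
    """Loadings on the first principal component, found by power iteration."""
    k = len(matrix)
    vec = [1.0 / math.sqrt(k)] * k   # start from a unit-length vector
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        eigenvalue = math.sqrt(sum(x * x for x in product))
        vec = [x / eigenvalue for x in product]
    # A loading is the eigenvector element scaled by sqrt(eigenvalue).
    return [v * math.sqrt(eigenvalue) for v in vec], eigenvalue


loadings, eigenvalue = first_component_loadings(R)
for label, loading in zip(LABELS, loadings):
    print(f"{label}: loading on general component = {loading:.2f}")
print(f"proportion of variance explained = {eigenvalue / len(R):.2f}")
```

With this made-up matrix every ability has a sizeable positive loading on the first component, which is the pattern that a general ability factor would produce. A principal component is only a rough stand-in for a factor, so the numbers should not be read as estimates of anything real.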
Measuring abilities

There are two types of ability tests. Group tests, which are administered via computer or paper-and-pencil, are generally quite short (typically 30 minutes, sometimes much less) and are designed to give a reasonably accurate estimate of one or two abilities. They tend to be used by researchers, employers and others where some measurement error is acceptable. If a test is used to screen applicants for a clerical post it does not matter too much to the employer if a few individuals’ scores are over-estimated slightly whilst a few people’s scores are underestimated (though of course it will matter to the individuals concerned). On the other hand there are some settings where accuracy is all-important. A test which determines whether an individual can understand a criminal charge, and hence whether they are competent to
instruct a solicitor/attorney, clearly needs to be very accurate indeed. Tests used in this context are generally administered by a trained psychologist in a one-to-one situation. This allows the psychologist to motivate the participant to do their best, ensures that they have properly understood what they are being asked to do for each part of the test, ensures that any special needs are taken into consideration and allows rest-breaks to be given when necessary. This may well be necessary, as such individual assessments usually last at least an hour.

Group tests commonly measure general intelligence, primary or second-order abilities, or abilities related to work (such as computer-programming or clerical aptitude). There are several freely available tests to measure primary mental abilities, including the Ekstrom-French-Harman (1976) ‘Kit of Factor-Referenced Cognitive Tests’ and its accompanying manual, which contains details for administering and scoring it. It measures 23 primary abilities, and the whole test takes several hours to complete. The test and its manual can be downloaded from https://www.ets.org/Media/Research/pdf/Kit_of_FactorReferenced_Cognitive_Tests.pdf. The International Cognitive Ability Resource (Condon and Revelle, 2014) is a 60-item test measuring five ability traits. Their paper also includes a short (16-item) test which can be used to measure general ability.

There are many commercial ability scales, including ‘Raven’s Matrices’, a series of tests developed initially by John Raven, and which are available in versions for children (‘Coloured Progressive Matrices’), the general adult population (‘Standard Progressive Matrices’) and more intelligent adults, such as university students (‘Advanced Progressive Matrices’), although these are still commercial copyrighted tests (Raven et al., 2003) which are only available for use by professionals.
Figure 12.2 An item similar to those used in the Raven tests. The item instruction reads ‘Choose the shape which completes the sequence both down the columns and along the rows’, and there are four response options labelled a to d (figure not reproduced).
When constructing a test to measure general ability there are two options. The first (used in the International Cognitive Ability Resource) involves using a variety of different items, to capture the full breadth of the abilities influenced by g. The second option (used by the Raven tests) identifies one scale which has a very large correlation with g and uses just this format. Thus the Raven consists of puzzles rather like the one in Figure 12.2. One advantage of this item format is that it does not use language, so can possibly be used internationally. An alternative involves choosing items representing all the second-order abilities, or all the primary abilities. Such a test will therefore involve a variety of different types of items, which are sometimes grouped together into subscales. Because general intelligence influences performance on all tasks that require thought for their successful completion, devising items to measure g is fairly easy. Table 12.3 shows some examples of items which I wrote for the ‘Test the Nation’ IQ test (Cooper, 2003) which were validated by correlating them with a standard intelligence test.
Table 12.3 Some typical items measuring cognitive abilities

Item | Text | Problem
1 | The answer is one of the letters ‘a’, ‘b’, ‘c’ or ‘d’ | If the last letter of this sentence is c then the answer is b, if it is d the answer is a and if it is anything else it is c.
2 | Which is the odd one out | litre, metre, pint, kilogram
3 | Jon is twice as old as Paul. In ten years he will be one and a half times Paul’s age. | How old is Jon?
4 | Rearrange the letters in ‘opera lane’ to spell one word |
5 | The gloomy student sulked and *o*** as he drove his *o*** to college. | What five-letter word whose second letter is ‘o’ fills both gaps?
6 | This clock has no numbers on the dial and is viewed in a mirror. | What time is it? (clock figure not reproduced)
7 | (item not reproduced) |
Self-assessment question 12.5 What do you think are the advantages of measuring g using items that are of a similar format (as with the Raven items)? What are the advantages of using a mixture of formats (as in Table 12.3)?
Individual tests can only be used by psychologists who have been trained in their administration. They usually consist of 10–20 sets of items, each set measuring a different aspect of ability. Some involve the use of language, and others (e.g., solving jigsaws) do not. This makes it possible to compare a person’s ability on the scales which use language to their performance on the scales which do not; a substantial difference in scores may suggest dyslexia. The advantage of testing people individually is that it is possible to build up a good rapport to encourage people to perform their best, schedule breaks (where this is permitted by the test instructions) and use some types of items which are difficult to administer in group format – for example, putting together coloured blocks to form a particular shape. Examples of widely used tests include the Wechsler Intelligence Scale for Children (Wechsler et al., 2016), the Wechsler Adult Intelligence Scale (Wechsler, 2010), the British Ability Scales (Elliott and Smith, 2011) and the Kaufman ABC (Kaufman and Kaufman, 2004). Such tests are widely used by educational and clinical psychologists and some researchers who need particularly accurate assessments.
IQ

In the early years of ability testing, it was necessary to devise a scale of mental ability that could be easily understood by parents and teachers and so the concept of ‘intelligence quotient’ (IQ) was adopted as an operational definition of general intelligence, g. Confusingly, this term has two quite distinct meanings.

Since most psychological testing was originally performed on schoolchildren, it was initially thought sensible to determine how a child’s performance compared with the average performance of children of other ages. For example, Table 12.4 shows the average scores obtained on an early intelligence test by children of various ages. Suppose that a particular child is aged 74 months (this is their ‘chronological age’ or CA) and they score 30 on the test. Part (a) of Table 12.4 shows that a score of 30 is generally obtained by children aged 77 months, so the child’s mental age is 77 months. The old definition of IQ (sometimes known as the ratio IQ) is:

\[ \mathrm{IQ} = \frac{\text{mental age}}{\text{chronological age}} \times 100 \]

In this example, the child’s IQ is:

\[ \frac{77}{74} \times 100 \approx 104 \]

An IQ of 100 shows that a child is performing at an average level for their age, an IQ greater than 100 shows that they are performing better than average and an IQ less than 100 shows that they are performing below average.
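The ratio IQ calculation can also be written as a short lookup. This is a minimal sketch: the norms are copied from part (a) of Table 12.4, while the function names and the choice to return `None` when no age group has the required average score are assumptions made for illustration.

```python
# Average score by age in months, copied from Table 12.4(a).
AVERAGE_SCORE_BY_AGE_MONTHS = {
    72: 23, 73: 24, 74: 25, 75: 27, 76: 28, 77: 30, 78: 31, 79: 33, 80: 35,
}


def mental_age(score, norms=AVERAGE_SCORE_BY_AGE_MONTHS):
    """Age (in months) whose average score equals `score`, or None."""
    for age, average in norms.items():
        if average == score:
            return age
    return None  # no age group has this average score


def ratio_iq(score, chronological_age_months, norms=AVERAGE_SCORE_BY_AGE_MONTHS):
    """Ratio IQ = mental age / chronological age x 100, or None if undefined."""
    ma = mental_age(score, norms)
    if ma is None:
        return None
    return 100 * ma / chronological_age_months


# The child in the example: aged 74 months, with a score of 30.
print(round(ratio_iq(30, 74)))  # 104
```

A score that matches no age group’s average (70, say) gives `None`, because a mental age cannot be read off the norms at all.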
Table 12.4 Hypothetical scores on an ability test at various ages

(a) Test scores of children aged 72 to 80 months
Age (months) | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80
Average score | 23 | 24 | 25 | 27 | 28 | 30 | 31 | 33 | 35

(b) Test scores of adults aged 15 to 60 years
Age (years) | 15 | 16 | 17 | 18 | 25 | 30 | 40 | 50 | 60
Average score | 62 | 64 | 65 | 66 | 65 | 66 | 65 | 64 | 64
There are several problems with this definition of IQ, which led to its demise. The most obvious drawback is related to the fact that performance on ability tests ceases to increase with age after the late teens. Suppose that someone aged 18 years scored 70 on the test. Since none of the higher age groups has an average score of 70, it is not possible to calculate this person’s mental age or IQ using Table 12.4(b). The ratio IQ was therefore abandoned 40 years ago. It should never be used.

Instead, a new term called the ‘deviation IQ’ was introduced. By definition deviation IQs have a mean of 100 for any particular age group and a standard deviation that is almost always 16 (although two test constructors adopt standard deviations of 15 and 18, respectively). Deviation IQs also (by definition) follow a normal (bell-shaped) distribution and the manuals of many ability tests contain a table showing which scores on the test correspond to which level of IQ. Since deviation IQs follow a normal distribution, it is relatively straightforward to work out how extreme a particular IQ score is. One simply converts the IQ to a z score by subtracting the mean IQ and dividing by the standard deviation – these figures are 100 and 16, respectively, by definition. One then consults the table of areas under the standard normal distribution which can be found in any statistics text. For example, an IQ of 131 corresponds to a z score of

\[ z = \frac{131 - 100}{16} \approx 1.94 \]

and the table of the standard normal integral shows that only about 2.6% of the population will have scores above this level. An IQ of 110 corresponds to a z score of 0.625; about 27% of the population will have scores above this level.

It is only necessary (indeed, it is only statistically legitimate) to convert scores on ability tests into IQs to interpret the meaning of one individual’s score on a test. Most of us will simply calculate t-tests, correlations, etc. based on the raw scores from ability tests.
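The table lookup described above can be reproduced with Python’s standard library. The sketch assumes the mean of 100 and standard deviation of 16 used in the text; the function names are invented for illustration.

```python
from statistics import NormalDist


def iq_to_z(iq, mean=100.0, sd=16.0):
    """Convert a deviation IQ to a z score."""
    return (iq - mean) / sd


def percent_above(iq, mean=100.0, sd=16.0):
    """Percentage of a normally distributed population scoring above `iq`."""
    z = iq_to_z(iq, mean, sd)
    return 100.0 * (1.0 - NormalDist().cdf(z))


for iq in (131, 110):
    print(f"IQ {iq}: z = {iq_to_z(iq):.3f}, "
          f"about {percent_above(iq):.1f}% of people score higher")
```

Run as written, this gives roughly 2.6% above an IQ of 131 and roughly 26.6% above an IQ of 110, close to the figures read from a printed table. Passing `sd=15` or `sd=18` shows how much the answer depends on which test constructor’s convention is being used.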
Self-assessment question 12.6
a. What is the difference between ratio IQ and deviation IQ?
b. How would you respond if a member of the government berated schoolteachers because, despite a supposedly huge increase in schools’ resources, a recent study showed that 50% of children still have IQs below 100 (the same figure as 10 years ago)?
Gardner’s theory of multiple intelligences

Howard Gardner had doubts about the adequacy of factor analysis as a method for revealing ability traits and so he performed an extensive literature search in order to try to find groups of behaviours that varied together regularly. For example, he looked for behaviours that:

• tended to develop at similar ages
• were similarly affected by drugs or damage to a particular part of the brain
• appeared (or disappeared) together in geniuses and people with learning disabilities
• interfered with each other when people were asked to perform two tasks at the same time
• use common sets of symbols (e.g., music, mathematical notation)
• transfer, so that training on one task leads to good performance on a second.
Gardner (1983) identified seven ‘intelligences’ that met most of these criteria. These were linguistic intelligence, musical intelligence, logical/mathematical intelligence, spatial intelligence, bodily/kinaesthetic intelligence, intrapersonal intelligence (self-knowledge) and interpersonal intelligence (quality of interactions with others). ‘Naturalistic intelligence’ (being in touch with nature) and ‘existential intelligence’ (spirituality) have more recently been added to the list. Some of Gardner’s ‘intelligences’ seem to correspond closely to ability traits identified by Cattell, Gustafsson and others; intrapersonal and interpersonal intelligence sound like emotional intelligence discussed in Chapter 7 and where there is not good agreement (e.g., bodily/kinaesthetic or naturalistic intelligence) it may be because these are not generally regarded as important cognitive abilities under the hierarchical model.

There are two difficulties with this approach. First, Gardner assumed that these ‘intelligences’ would be uncorrelated with each other: ‘[T]here exists a multitude of intelligences, quite independent of each other’ wrote Gardner (1993). Amazingly, he had no evidence for making this claim, which flies in the face of a century’s research on the structure of abilities. Second, using Gardner’s definition of ‘intelligence’ at the start of this section, it is clear that the list cannot be complete. For example, sexual behaviours have symbols, appear (and decline) together with age, are all affected by drugs, brain damage, etc. – so why is there no factor of sexual intelligence? It seems to me that this theory does little more than back up what is already known about the structure of abilities, but it has been marketed as something new and radical, which in my view it is not.

There is also the question of how one should best assess Gardner’s intelligences to test his claim that the intelligences are independent of each other and unrelated to g. Several tests have been published that claim to measure Gardner’s intelligences.
Exercise
Think how you might assess one of Gardner’s intelligences – e.g., musical intelligence or interpersonal intelligence.
You might have suggested measuring musical intelligence by measuring how well someone can remember a tune or rhythm, perhaps testing their musical knowledge, asking them to sing in tune, write a tune, identify which of two notes is of higher pitch, analyse a piece of music (e.g., to identify the various components of sonata form) and so on. To measure interpersonal
intelligence you could perhaps ask others to rate how easy a person is to talk to, assess how clearly they can express a viewpoint at a meeting, test how well they can see things from another person’s perspective and so on. If so, you are thinking like a psychologist (which is meant to be a compliment).

Unfortunately, tests to measure Gardner’s multiple intelligences are often not so sophisticated. According to the author’s website, the Teele Inventory for Multiple Intelligences ‘is used in 8,000 school districts throughout the nation and in twenty-five other countries’; one would perhaps expect it to be state of the art. This measure simply shows children or adults pictures of several activities, supposedly representing seven of the intelligences proposed by Gardner (e.g., skateboarding for bodily/kinaesthetic intelligence, listening to music for musical intelligence) and asks them to choose the one that they would prefer to do. There is no evidence whatsoever that the participants are any good at the activities that they claim to prefer; if it measures anything, it may be motivation or interest.

The Multiple Intelligences Development Assessment Scales (MIDAS) developed by C. B. Shearer is a commercial test which (according to the author’s website) ‘has been translated into Spanish, Chinese, Arabic and Korean and has been completed by over 25,000 people world-wide. It is being used in schools from Cambridge, Massachusetts to Portland, Oregon and from Vancouver, British Columbia to Jakarta, Indonesia and beyond’ and is viewed favourably by Gardner. It too fails to assess level of performance objectively, but simply asks people to rate their liking for various activities (and sometimes to self-assess their level of performance) as shown in Table 12.5. It is unsurprising that scores on tests such as these show small correlations with conventional intelligence tests, but that can hardly be taken as good evidence for Gardner’s theoretical model, as personality tests (or random numbers) would show small correlations too.

Two studies have been performed to test Gardner’s theory. Visser et al. (2006a) chose two fairly conventional ability tests to assess each of Gardner’s intelligences: Gardner himself (Gardner, 2006) thought that these authors made ‘a reasonable effort to choose tasks that are central to the domains under examination’, although he complained that the tests that were used were too logical/scientific, a point rebutted by Visser et al. (2006b). Visser et al. (2006a)
found that many of Gardner’s supposedly independent ‘intelligences’ correlated substantially together, and that a clear g factor was found when these tests were factor analysed, although musical and kinaesthetic intelligences did not load on this. The second study (Castejon et al., 2010) is noteworthy because it used ratings of eight aspects of behaviour which were recommended by Gardner – for example, looking at children’s art portfolios (!) as a measure of their spatial intelligence, and assessing variables such as their level of representation and degree of exploration. This paper, too, found that Gardner’s intelligences are substantially intercorrelated and that a hierarchical model (with g at the top) fitted the data far better than Gardner’s model of six uncorrelated intelligences. These studies which test Gardner’s claim that there are six to eight quite independent domains of intelligence seem to show that the whole theory is simply not based on fact.

Gardner’s work has been enthusiastically promoted to teachers and educationalists with children being categorised into ‘types’ on the basis of their scores on these so-called tests of intelligence; school curricula are designed to embrace diversity in that some children are thought to have strengths in interpersonal learning (group work), some learn better through the medium of music or using bodily movement. Is this new or surprising? I remember having a wide variety of lessons (including music, dance, art, ‘nature walks’ and group work) in the 1950s – long before Gardner’s theories developed. What Gardner seems to be advocating is (a) that a diverse curriculum is desirable, and (b) that it is desirable because many children perform markedly better in some areas than others. My problem is that there seems to be no evidence to support Gardner’s second assertion – and several studies which suggest that it is quite wrong. We will return to this in Chapter 18.

Table 12.5 Sample items from Shearer’s MIDAS test of multiple intelligences (Akbari and Hosseini, 2008). Each item is answered on a five-point scale plus a ‘do not know’ option.

Intelligence | Sample item
Musical | Do you spend a lot of time listening to music?
Kinaesthetic | Are you a good dancer, cheerleader or gymnast?
Linguistic | Do you enjoy telling stories or talking about favourite movies or books?
Interpersonal | Are you ever a ‘leader’ for doing things?
Intrapersonal | Do you have a clear sense of who you are?
Naturalistic | Are you good at recognising breeds of pets or kinds of animals?
Spatial | How are you at finding your way around new buildings or city streets?
Math/Logic | Do you have a good memory for numbers such as telephone numbers or addresses?
Sternberg’s triarchic theory of ability

The final theory to be considered here is that of Sternberg (1985). Sternberg argued that conventional ability tests were rather narrow in scope when compared to laypersons’ views of what constituted ‘smart’ or ‘intelligent’ behaviour. He accordingly developed three ‘subtheories’ of intelligence and suggested that any definition of intelligence should take into account the following three things:

1. The context in which ‘intelligent’ behaviour takes place – the result of neglecting this variable is that the only behaviours seen to be relevant to ‘intelligence’ as measured by psychometric tests are those that are important (and can be easily measured) in a classroom testing situation. It is argued that real-world intelligence involves solving ill-defined problems such as how to schedule a busy social life, revise effectively, win when gambling or find a partner. This is known as the contextual subtheory.
2. The experiential subtheory explicitly considers the role of novelty, experience and automaticity in solving the task. (Automaticity is the smooth performance of cognitively complex operations as a result of considerable experience, e.g., driving a car or reading.) It is thought to be linked to creative intelligence, as it involves deciding what problem is to be solved.
3. The componential subtheory considers the underlying mechanisms (processes) of intelligent behaviour, which include ‘metacomponents’ (planning of actions and cognitive strategies for solving tasks) and knowledge acquisition. The more cognitive part of this subtheory will be introduced in Chapter 13. It is thought to be linked to analytical intelligence.
Sternberg’s triarchic theory is an attempt to explain intelligent behaviour in terms of these three subtheories.
Structure and measurement of abilities
A real-life example may help here. Some years ago my neighbour had a large, savage Rottweiler that lived in his garden, lunged against the hedge and barked at (and thoroughly terrified) anyone in our garden. As far as the contextual subtheory is concerned, ‘intelligent’ responses to this situation might include changing the environment (moving house), modifying the environment (persuading the neighbour to take the dog into the house or letting the dog loose in the hope that it will run away) or adapting to it (wearing earplugs when working, becoming friends with the dog, not using the garden so that the dog would not pose a problem, or just realising that after a few months it might not appear to be so scary). The experiential subtheory seeks to understand the novel demands of the situation – both the nature of the situation itself and the possible solutions. The former may include a lack of experience of fierce dogs and less-than-considerate neighbours. The latter may include a lack of experience of confrontational situations. If I had substantial previous experience in any of these areas, my responses would be more automatic (e.g., knowing precisely how to obtain the desired result when approaching the neighbours), so my overall behaviour would appear more polished and more intelligent and would probably be more successful. The componential subtheory would explain how I planned to solve the problem, perhaps gathering relevant information (e.g., legal advice sites, realtors, sources of earplugs), planning a strategy, monitoring the success of this strategy and so on.
The problem is that by defining intelligence so broadly, Sternberg is stepping into the realm of personality and performance. Style of behaviour (which presumably includes problem solving) was, after all, how we initially defined personality.
Thus it comes as no surprise that Sternberg’s focus turned to understanding the relationship between (his theory of) intelligence and personality (Sternberg, 1994), which all seems rather circular. Whether I murder or release the dog, shout at or reason with my neighbour, abandon or defend my right to use the garden, plan my response to the Rottweiler problem or rush straight to a solution, these different approaches certainly sound like personality traits. Sternberg’s theories therefore suggest that as well as the ‘analytic intelligence’ considered in the early parts of this chapter, ‘practical intelligence’ or common sense may be at least as important for predicting behaviour. These practical problems often have more than one possible solution, are ill-defined, and may benefit from knowledge acquired through experience or observation (rather than being explicitly learned). The example he gives is promotion at work; the person who is promoted is not necessarily the best at their job, but is more likely to be the one who knows how ‘the system’ works, and knows how to work it to their advantage.
However, is this a good theory? How else could one act other than adapting oneself, changing the environment or shifting environments? Together they seem to spell out all possible options! The theory must therefore necessarily be correct; it cannot be falsified and thus may not be scientific. Putting that aside, a review of the evidence for ‘practical intelligence’ put forward by Sternberg concludes that it is backed up ‘mostly with the appearance, not the reality, of hard evidence’ (Gottfredson, 2003) based on anecdotes and case studies; Linda Gottfredson’s paper is straightforward, devastating and warmly recommended. There is also good empirical evidence to back up her claim, as Koke and Vernon (2003) found that practical intelligence correlated 0.45 with a test of g and did not predict academic performance any better than did g.
The notion of practical intelligence is appealing, but it may not stand up to scrutiny.
Conclusions
Although it was developed more than a century ago, the idea of general intelligence still seems to be an important indicator of human behaviour – although it is now appreciated that there is
a hierarchy of abilities, rather than just a single g factor. Although several alternatives have been developed over the years, several of these have disappeared and those that remain (Gardner, Sternberg) seem to have major problems. Gardner’s theory does not provide any way of measuring the levels of children’s intelligences, which is a cause for concern; tests which have been developed by others certainly do not look as if they measure any form of ability. In addition, Gardner’s major premise (that his intelligences are independent) is demonstrably incorrect. Sternberg’s theory, like Gardner’s, was propounded in books rather than peer-reviewed publications. Where it has been possible to test Sternberg’s theory, it has not held up well. For example, his measure of ‘practical intelligence’ correlates substantially with a conventional intelligence test. The hierarchical model of abilities is underpinned by a solid bank of empirical data, and looks set to dominate both research and practice for the foreseeable future.
Suggested additional reading
This chapter has concentrated on conveying a basic grasp of hierarchical models of ability, since these are of considerable theoretical and practical importance; more detailed coverage of these and the models of Gardner and Sternberg is given in Cooper (2015), Gardner (1993) and Sternberg (1985) – which should perhaps be read alongside Gottfredson (2003). A trio of papers exploring the validity of Gardner’s theory (and his response to criticisms similar to those which I levelled above) can be warmly recommended (Gardner and Moran, 2006; Waterhouse, 2006a, 2006b). Those who are interested in the development of intelligence will enjoy Schroeders et al. (2016). Although the results section is rather advanced, it can safely be taken on trust. There is also a burgeoning literature on intelligence in other species; Arden and Adams (2016), for example. Scores on IQ tests gradually increased in much of the 20th century – a phenomenon sometimes known as the Flynn effect. Rindermann et al. (2017) describe the phenomenon and list some possible explanations whilst Williams (2013) gives a more detailed review of the research literature. Herrnstein and Murray (1994) contains some extremely controversial interpretations of the correlates of general ability – readers may like to evaluate critically both this work and also the many counterviews that have appeared on the internet. Those who really enjoy controversy might also like to explore the literature which claims that the average intelligence levels in a country can explain why some countries are more economically advanced than others – although the methodology of such studies needs to be carefully scrutinised, in particular the assumption that ‘Western’ intelligence tests produce accurate results in other cultures, and the possible presence of confounding variables (e.g., availability of education). Hunt (2012) is one of the more balanced papers in this area.
It is also worthwhile downloading the Ekstrom et al. test and its manual (https://www.ets.org/Media/Research/pdf/Kit_of_FactorReferenced_Cognitive_Tests.pdf) to see what intelligence items look like.
Answers to self-assessment questions
SAQ 12.1 There are no right or wrong answers to this question, the purpose of which was to encourage thought. Some possibilities are discussed in the next
section. Others might include analysing what actions requiring intelligence seem to be performed by a wide range of people, e.g., by following students, plasterers, unemployed people, pensioners, drivers, doctors, etc. during their working lives and noting either what problems they need to solve or asking them to ‘think aloud’. Alternatively, one can perform a massive literature search, as Carroll (1993) has done, in an attempt to identify and classify all the distinct abilities that have ever been reported in the literature. Another approach would be to start from cognitive psychology and to try to assess individual differences in the types of thing that cognitive psychologists assess, e.g., speed of mental rotation, mental scanning, semantic priming, etc.
SAQ 12.2 I cannot guarantee that these answers are correct – they merely represent my view concerning the type of skills that might be most important in each of the tasks:
a. action – output; content – aural; process – fine use of muscle
b. input, memory and knowledge
c. internal processing, verbal and remembering
d. internal processing, verbal and complexity of relations
e. internal processing, verbal, complexity of relations and creativity
f. internal processing, knowledge, convergence and remembering.
SAQ 12.3 It is probably not sensible to conclude causality because, although the examples at the start of the chapter on factor analysis suggested that factors can sometimes be causal, it is not possible to extrapolate this and argue that any factor must be causal. It is quite possible that things other than ‘general ability’ (possibly quality of education, inability to understand English, anxiety) may cause the observed results. However, if g can be shown to reflect some basic feature of the individual (e.g., the speed with which their nerves work) that cannot easily be assessed by any other means, then it may be permissible to argue causality, and we cover this in Chapters 13 and 15.
SAQ 12.4
a. Gv
b. Gr
c. Gc
d. Gps.
SAQ 12.5 Advantages of using the single format of items include:
• ease of administration; only one set of instructions is needed
• lack of reliance on language, which may make it more fair for dyslexic participants.
Advantages of using many different types of item include:
• less likelihood of boredom
• estimating g by averaging across many different types of items ensures that the estimate will not be contaminated with the ‘unique factor’ (see Chapter 5) associated with one type of item. As each type of item will have a different unique factor associated with it, their influence will disappear when combined. Gignac (2015) argues that these issues present problems for using the Raven tests as a measure of g.
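The point about unique factors cancelling out can be illustrated with a small simulation. This is only a sketch under simple assumptions that are not taken from the text: each score is modelled as g plus an independent unique component of equal variance, and the sample size, number of item types and function names are all hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate(n_people=2000, n_item_types=6, seed=1):
    """Compare one item type with the average of several as estimates of g."""
    rng = random.Random(seed)
    g = [rng.gauss(0, 1) for _ in range(n_people)]
    # Assumed model: score on each item type = g + that type's unique factor.
    scores = [[gi + rng.gauss(0, 1) for gi in g] for _ in range(n_item_types)]
    single = scores[0]
    composite = [sum(s[i] for s in scores) / n_item_types
                 for i in range(n_people)]
    return pearson(single, g), pearson(composite, g)

r_single, r_composite = simulate()
print(round(r_single, 2), round(r_composite, 2))
```

Under these assumptions the composite tracks g more closely than any single item type does, because the independent unique components tend to average towards zero while g is common to every score.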
SAQ 12.6
a. $\text{Ratio IQ} = \dfrac{\text{mental age}}{\text{chronological age}} \times 100$, whereas deviation IQ defines IQ as a normal distribution with a mean of 100 and a standard deviation of (usually) 16.
b. Because of the definition of deviation IQ given earlier (a normally distributed variable), 50% of the children will always be defined as having IQs below 100, even if the underlying raw scores on the test all increase dramatically over the 10-year period. (In addition, the minister ought not to assume that education can improve IQ!)
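The two definitions in this answer can be sketched in a few lines of code. This is an illustration only: the function names, the ages and the raw-score figures are hypothetical, and the deviation IQ is computed in the simplest way, by standardising a raw score against the mean and standard deviation of its own cohort.

```python
from statistics import NormalDist

def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    return mental_age / chronological_age * 100

def deviation_iq(raw_score, cohort_mean, cohort_sd, iq_sd=16):
    """Deviation IQ: position within the cohort, rescaled to mean 100."""
    return 100 + iq_sd * (raw_score - cohort_mean) / cohort_sd

# A child of 8 performing like a typical 10-year-old (hypothetical ages).
print(ratio_iq(10, 8))

# An average child has a deviation IQ of 100 whether the cohort's mean raw
# score is 30 or 50 (hypothetical raw scores).
print(deviation_iq(30, cohort_mean=30, cohort_sd=5))
print(deviation_iq(50, cohort_mean=50, cohort_sd=5))

# Half of a normal distribution with mean 100 lies below 100.
print(NormalDist(mu=100, sigma=16).cdf(100))
```

Because the deviation IQ is defined relative to the cohort, a rise in everyone’s raw scores leaves the proportion of children below 100 unchanged, which is the point made in part b.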
References
Akbari, R. & Hosseini, K. 2008. Multiple Intelligences and Language Learning Strategies: Investigating Possible Relations. System, 36, 141–155.
Arden, R. & Adams, M. J. 2016. A General Intelligence Factor in Dogs. Intelligence, 55, 79–85.
Blair, C. 2006. How Similar Are Fluid Cognition and General Intelligence? A Developmental Neuroscience Perspective on Fluid Cognition as an Aspect of Human Cognitive Ability. Behavioral and Brain Sciences, 29, 109–160.
Carroll, J. B. 1993. Human Cognitive Abilities: A Survey of Factor-Analytic Studies. Cambridge: Cambridge University Press.
Castejon, J. L., Perez, A. M. & Gilar, R. 2010. Confirmatory Factor Analysis of Project Spectrum Activities. A Second-Order g Factor or Multiple Intelligences? Intelligence, 38, 481–496.
Cattell, R. B. 1971. Abilities, Their Structure Growth and Action. New York: Houghton Mifflin.
Condon, D. M. & Revelle, W. 2014. The International Cognitive Ability Resource: Development and Initial Validation of a Public-Domain Measure. Intelligence, 43, 52–64.
Cooper, C. 2003. Test the Nation: The IQ Book. London: BBC Publications.
Cooper, C. 2015. Intelligence and Human Abilities Structure, Origins and Applications. London; New York: Routledge.
Ekstrom, R. B., French, J. W. & Harman, H. H. 1976. Manual for the Kit of Factor-Referenced Cognitive Tests. Princeton, NJ: Educational Testing Service.
Elliott, C. D. & Smith, P. 2011. British Ability Scales. Manual 3, Administration and Scoring Manual. London: GL Assessment.
Gardner, H. 1983. Frames of Mind (1st Edn). New York: Basic Books.
Gardner, H. 1993. Frames of Mind (2nd Edn). London: Harper-Collins.
Gardner, H. 2006. On Failing to Grasp the Core of MI Theory: A Response to Visser et al. Intelligence, 34, 503–505.
Gardner, H. & Moran, S. 2006. The Science of Multiple Intelligences Theory: A Response to Lynn Waterhouse. Educational Psychologist, 41, 227–232.
Gignac, G. E. 2015. Raven’s Is Not a Pure Measure of General Intelligence: Implications for g Factor Theory and the Brief Measurement of g. Intelligence, 52, 71–79.
Gottfredson, L. S. 2003. Dissecting Practical Intelligence Theory: Its Claims and Evidence. Intelligence, 31, 343–397.
Gustafsson, J.-E. 1981. A Unifying Model for the Structure of Intellectual Abilities. Intelligence, 8, 179–203.
Hakstian, R. N. & Cattell, R. B. 1976. Manual for the Comprehensive Ability Battery. Champaign, IL: Institute for Personality and Ability Testing (IPAT).
Hakstian, R. N. & Cattell, R. B. 1978. Higher Stratum Ability Structure on a Basis of 20 Primary Abilities. Journal of Educational Psychology, 70, 657–659.
Herrnstein, R. J. & Murray, C. 1994. The Bell Curve: Intelligence and Class Structure in American Life. New York: The Free Press.
Horn, J. L. & Cattell, R. B. 1966. Refinement and Test of the Theory of Fluid and Crystallised Intelligence. Journal of Educational Psychology, 57, 253–270.
Hunt, E. 2012. What Makes Nations Intelligent? Perspectives on Psychological Science, 7, 284–306.
Kaufman, A. S. & Kaufman, N. L. 2004. Kaufman Assessment Battery for Children (2nd Edn). Circle Pines, MN: American Guidance Service.
Kline, P. & Cooper, C. 1984. The Factor Structure of the Comprehensive Ability Battery. British Journal of Educational Psychology, 54, 106–110.
Koke, L. C. & Vernon, P. A. 2003. The Sternberg Triarchic Abilities Test (STAT) as a Measure of Academic Achievement and General Intelligence. Personality and Individual Differences, 35, 1803–1807.
Raven, J., Raven, J. C. & Court, J. H. 2003. Manual for Raven’s Progressive Matrices and Vocabulary Scales. Oxford: Oxford Psychologists Press.
Rindermann, H., Becker, D. & Coyle, T. R. 2017. Survey of Expert Opinion on Intelligence: The Flynn Effect and the Future of Intelligence. Personality and Individual Differences, 106, 242–247.
Schroeders, U., Schipolowski, S., Zettler, I., Golle, J. & Wilhelm, O. 2016. Do the Smart Get Smarter? Development of Fluid and Crystallized Intelligence in 3rd Grade. Intelligence, 59, 84–95.
Spearman, C. 1904. General Intelligence Objectively Determined and Measured. American Journal of Psychology, 15, 201–293.
Sternberg, R. J. 1985. Beyond IQ. Cambridge: Cambridge University Press.
Sternberg, R. J. 1994. Thinking Styles. In R. J. Sternberg & P. Ruzgis (eds.) Personality and Intelligence. Cambridge: Cambridge University Press.
Thurstone, L. L. 1938. Primary Mental Abilities. Chicago: University of Chicago Press.
Undheim, J. O. 1981. On Intelligence I: Broad Ability Factors in 15 Year Old Children and Cattell’s Theory of Fluid and Crystallised Intelligence. Scandinavian Journal of Psychology, 22, 171–179.
Vernon, P. E. 1950. Structure of Human Abilities. London: Methuen.
Visser, B. A., Ashton, M. C. & Vernon, P. A. 2006a. Beyond g: Putting Multiple Intelligences Theory to the Test. Intelligence, 34, 487–502.
Visser, B. A., Ashton, M. C. & Vernon, P. A. 2006b. g and the Measurement of Multiple Intelligences: A Response to Gardner. Intelligence, 34, 507–510.
Wainer, H. 1987. The First Four Millennia of Mental Testing: From Ancient China to the Computer Age. Research Report 87–34, Educational Testing Service, Princeton, NJ.
Waterhouse, L. 2006a. Inadequate Evidence for Multiple Intelligences, Mozart Effect, and Emotional Intelligence Theories. Educational Psychologist, 41, 247–255.
Waterhouse, L. 2006b. Multiple Intelligences, the Mozart Effect, and Emotional Intelligence: A Critical Review. Educational Psychologist, 41, 207–225.
Wechsler, D. 2010. Manual for the Wechsler Adult Intelligence Scale – Fourth UK Edition (WAIS-IV UK). London: Pearson.
Wechsler, D., Pearson Education Inc. & Psychological Corporation 2016. WISC-V: Wechsler Intelligence Scale for Children. London: NCS Pearson PsychCorp.
Williams, R. L. 2013. Overview of the Flynn Effect. Intelligence, 41, 753–764.
Chapter 13 Ability processes
Introduction
So far, we have assumed that intelligence and abilities are real characteristics of individuals, perhaps related to the way in which neurones operate or to the way in which individuals process information. That is, we have assumed that abilities exist and that they influence behaviour. Not everyone shares this view. For example, Howe (1988) argued that these abilities are a convenient way of describing how people behave, but are essentially ‘constructions’ of the observer, rather than any real property of the individual. According to this view, it would be quite wrong to use general ability as an explanation – for example, to say that a child performs well at school because he or she is intelligent would imply that general ability is some basic property of the individual, rather than just a convenient way of describing their behaviour. This chapter accordingly examines whether intelligence, measured as described in Chapter 12, is related to the structure and function of the nervous system, or whether it is just a social construction of little real interest. If such links can be found, it may be appropriate to use general ability to ‘explain’ behaviour, since general ability may reflect individual differences in the biological processing power of the brain and nervous system – a notion that dates back to Galton (1883) and Hebb (1949).
Learning outcomes
Having read this chapter you should be able to:
• Discuss why measures of mental speed might be related to g, and outline the evidence linking direct and indirect measures of nerve conduction velocity to g.
• Show an understanding of attempts to model g by developing elementary cognitive tasks, and an appreciation of the problems which were found with this approach.
• Form an opinion about the relationship between g and working memory.
• Show a critical appreciation of modern research linking g to brain structure and brain activity.
Chapter 12 made some rather important points. It indicated that almost all human abilities are positively correlated together and that it is possible to describe these correlations either by a single factor of general ability (as suggested by Spearman), by a set of 20 or more ‘primary mental abilities’ (as suggested by Thurstone), by a few ‘group factors’ (as suggested by Burt and Vernon) or, most generally, by a hierarchical model that incorporates all of these features (as suggested by Carroll, Cattell and Gustafsson). This really is quite interesting. It would logically be quite possible for human abilities to be independent of each other – after all, why should abilities as diverse as musical talent, mathematical skill, vocabulary, memory or ability to visualise shapes be interrelated? It is not as if there are any obvious processes (e.g., pieces of knowledge or skills taught through education) that are common to all of them and that could, therefore, account for their interrelationships. However, knowing the structure of abilities does not mean that we understand what these abilities are. For this we need to develop ‘process models’ that describe intelligent behaviour in terms of lower-level biological, social, cognitive or neural processes. Only when we have a good understanding of precisely why some people show different patterns of ability, personality and so on, can we profess to understand why people show different levels of abilities. Thirty years ago the structure of personality and ability was not particularly well understood and journals were full of articles attempting to establish the basic dimensions of ability. As there is now some consensus of opinion about the basic structure of abilities, almost all of the older theories being accommodated neatly by a hierarchical model, the focus of research has shifted. The main aim now is to identify the processes that cause individual differences in ability to appear. 
This chapter focuses on general intelligence, g, rather than any of the second-order or primary abilities discussed in Chapter 12. There are two reasons for this. The first is that as there are rather a lot of primary and second-order abilities, any discussion of their underlying processes would have to be quite superficial to keep this chapter a reasonable length. However, the second reason is more important. A person’s level of g must, by definition, influence their performance on all lower-level abilities. To anticipate Chapters 17 and 18, g seems to be the ‘active ingredient’ which makes lower-level abilities useful. So for example, a test measuring spatial ability might be found to predict performance at work. The question is whether this is because it measures g to some extent, or whether it is because it measures spatial ability. Surprisingly, it turns out that tests almost invariably work because of the g component that runs through each of them. Thus the main reason for considering g in this chapter is that it seems to be a potent predictor of many types of behaviour. Since abilities reflect the processing of information by the nervous system, the logical place to start looking for process models of individual differences is in the biology of the nervous system and in cognitive psychology. Both of these areas can sensibly be regarded as lower level, more fundamental causes of intelligent behaviour – it seems more likely that the way neurones work will affect general ability than vice versa.
When one starts looking into the influence of social processes, personality, etc., the picture becomes much more complex, because it is difficult to be sure which variables are causes and which are effects. For example, suppose that it is found that children who perform poorly at mathematics hate the subject vehemently, whereas those who perform well have much more positive attitudes towards it. We assume that all children have been taught the same syllabus in the same way, and so any individual differences reflect ability, not attainment. Perhaps we can argue that children’s attitudes to a subject are an important indicator of their ability in that field – that is, that attitude influences ability. Thus any process model of ability ought to consider attitudes as important predictors. However, this approach has several drawbacks. It is quite possible that the children’s attitudes towards mathematics will result from their performance rather than vice versa. Achieving plenty of A grades on mathematics tests, receiving approving comments from teachers, etc., might cause children to enjoy the subject. Failing tests, being laughed at by one’s peers and having to take extra tuition might well engender a negative attitude towards the subject. Thus it is just as likely that attitudes result from ability as that attitudes cause ability – which makes it unwise to rely on attitudes when exploring the causes of abilities. There is a third possibility, too – that attitude and performance may both be influenced by some other factor(s), quality of teaching being one obvious variable. These problems are not (quite) insurmountable, given excellent experimental design and statistics, such as confirmatory factor analysis (see Chapter 19). However, it makes sense to explore the simplest routes (biology and cognition) first – it is possible that individual differences in these areas may be able to explain a very sizeable proportion of the individual differences in ability.
If so, there is no need to consider the (methodologically more messy) social and attitudinal approaches. However, if biological and cognitive approaches are unable to explain individual differences in ability, then these other avenues should be explored.
Neural processing and general ability
In the 1980s several theorists suggested that a person’s general ability may, in part, be influenced by the way in which individual neurones in his or her nervous system process information. One theory suggests that high general ability may be a consequence of having nerve cells in the brain that conduct impulses rapidly (Reed, 1984). Another theory suggests that general ability may be related to the accuracy with which information is transmitted from neurone to neurone (Eysenck, 1986; Hendrickson, 1972). The second point probably needs some explanation. Imagine that you are listening to an AM radio tuned to a distant station. It will be much more difficult to hear what is being said than when it is tuned to a nearby station, because the background hisses and crackles interfere with your ability to detect what is being said – they mask the signal. It may be the same with neurones. Perhaps some people’s neurones switch reliably from a very low to a very high rate of firing when stimulated by another neurone. Other people’s neurones may fire quite frequently even when unstimulated or may fire only moderately frequently when stimulated. In such cases, information will not be transmitted very efficiently from one neurone to others. Some upstream neurones will ‘think’ that a neurone has fired when it has not, or they may fail to detect a modest increase in firing rate that corresponds to a real psychological event. These two theories lead to similar predictions. They suggest that people whose neurones conduct information quickly along the axon and/or which transmit information efficiently and accurately across the synapses are likely to be more efficient at processing information
than those whose nerves transmit information slowly or have a low signal-to-noise ratio. Several techniques have been developed to measure the speed and/or efficiency of neural conduction, including direct measurement, inspection time, reaction time and evoked potential studies.
Direct measurement of neural conduction velocity
In principle, such direct measurement is easy. One merely has to apply two electrodes (as far apart as possible) to a single neurone. An electric current is applied to one electrode, to stimulate the neurone. The second electrode is used to time how quickly the impulse travels down the axon of the neurone. If the distance between the two electrodes is measured with a ruler, it is a simple matter to calculate the nerve conduction velocity. If this experiment is performed with a large sample of people whose scores on a test of general ability are also known, then theory would predict that there should be a substantial positive correlation between general ability and nerve conduction velocity (NCV). Vernon and Mori (1992) performed such experiments using nerves in the arm and wrist and found that there was a significant correlation (of the order of 0.42–0.48) between general ability and NCV. Thus it does appear from these studies that intelligent people have nerves that simply transmit information more quickly than those of individuals of lower general ability. Others failed to replicate this finding (Reed and Jensen, 1991; Rijsdijk et al., 1995) although Rijsdijk and Boomsma (1997) found a small relationship (r = 0.15). However, these tasks do not involve any synaptic transmission and so, if the individual differences arise at the synapses, the Vernon and Mori task could not be expected to work. McRorie and Cooper (2001, 2004) hence ran some experiments to determine whether higher ability individuals show faster reflex responses than lower ability ones (latency of knee-jerk reflex and speed of withdrawal from a painful electric shock). This second experiment involved participants pressing a metal button with their finger. After a few seconds, a painful electric shock was delivered between the button and another electrode on the finger; lifting the finger broke the circuit and terminated the shock.
These tasks do involve synaptic transmission (although no conscious thought at all) and there was a link between speed of withdrawal and IQ, supporting the Reed/Eysenck/Jensen theory. Whilst directly stimulating nerves seems to be the obvious way of addressing the link between intelligence and neural processing speed, it has three main problems.
• It can only be applied to nerves in the limbs and body – not those in the brain. This is a problem because peripheral nerves are somewhat different from nerves in the brain (for example, their size and coating of myelin may differ). Results obtained from stimulating peripheral nerves may therefore not generalise to nerve cells in the brain.
• Stimulating a single long peripheral nerve does not involve any synaptic transmission. If this is the key to understanding individual differences in intelligence (as suggested by Hendrickson) then the Vernon and Mori experiment could not be expected to work (although the reflex experiments would).
• These experiments are difficult to perform because accurate temperature control is necessary, which can be very difficult to achieve in practice. Age and the amount of myelin surrounding the peripheral nerve also influence the conduction velocity.
It is therefore common to look at indirect measures of nerve conduction velocity, from examining speed of information processing, nerve activity in the brain, and similar measurements.
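The velocity calculation described at the start of this section (distance between the electrodes divided by the time the impulse takes to travel between them) amounts to a single division. The sketch below is illustrative only: the function name and the example distance and latency are hypothetical, not values from any of the studies cited.

```python
def nerve_conduction_velocity(distance_mm, latency_ms):
    """Return conduction velocity in metres per second.

    Millimetres divided by milliseconds gives metres per second directly,
    so no further unit conversion is needed.
    """
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return distance_mm / latency_ms

# Hypothetical measurement: electrodes 300 mm apart, impulse arrives 5 ms
# after stimulation.
print(nerve_conduction_velocity(300, 5))
```

In a study of the kind described, one such value per participant would then be correlated with that participant’s score on a test of general ability.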
Indirect measurement of neural conduction velocity
Inspection time
Several techniques have been developed to measure NCV using indirect methods. The simplest is a technique known as inspection time (Vickers et al., 1972). This simply measures how long a simple stimulus has to be presented in order to be perceived correctly. For example, suppose that either shape (a) or shape (b) in Figure 13.1 is flashed on to a screen for a few thousandths of a second and a volunteer is then asked whether the longer line was on the left or the right. Shape (c) is presented immediately after (a) or (b) to act as a mask (if you are not familiar with the principles of masking you can safely ignore this detail). It is important to realise that inspection time is a measure of how long it takes to see a stimulus and not of how quickly a person can respond to it – in the inspection time task an individual can take as long as they want to make their decision. The aim of inspection time studies is to determine how long it takes an individual to perceive such stimuli. Since the retina is essentially an outgrowth of the brain, people with accurate and/or rapid neurones should be able to make this simple discrimination more accurately at briefer exposure levels than those with slower nervous systems. There should be a negative correlation between general ability and inspection time. The amount of time a person needs to see the figures can be estimated by repeating the experiment several times (e.g., 50–100 times) for each of several exposure periods. The proportion of correct answers can be noted and plotted on a graph such as that shown in Figure 13.2. It is possible to fit a curve to these data for each person, as shown. At the fastest exposures everybody performs at chance level.
However, as the presentation time increases, it is clear that Person 2 can see which line is longer at short duration; they begin to perform slightly above chance at 0.3 seconds, whilst Person 1 needs the shape to be presented for about 0.4 seconds before they start to perform above chance levels. It is possible to calculate a statistic for each individual to reflect these differences – for example, the exposure time at which there is an 80% chance that they will be able to solve the problem correctly. All one has to do is plot the graphs, drop a line down to the x axis and read off the inspection time for each person. This can be correlated with measures of general ability in order to determine whether inspection time (and hence, indirectly, speed and/or efficiency of neural processing) is related to general ability.
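The ‘read off the 80% point’ step can be sketched in code. This is a simplified illustration rather than the curve-fitting procedure used in published studies: it joins the observed points with straight lines and interpolates between the two exposure durations that bracket the criterion. The function name and all of the data are hypothetical, chosen only to resemble the two people described above.

```python
def inspection_time(durations, proportions_correct, criterion=0.8):
    """Exposure duration at which accuracy first reaches the criterion.

    Uses linear interpolation between neighbouring points; returns None
    if accuracy never rises from below the criterion to at or above it.
    """
    points = sorted(zip(durations, proportions_correct))
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if p0 < criterion <= p1:
            return d0 + (criterion - p0) * (d1 - d0) / (p1 - p0)
    return None

# Hypothetical data: exposure durations in seconds and proportion correct.
durations = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
person_1 = [0.50, 0.50, 0.50, 0.55, 0.75, 0.95]
person_2 = [0.50, 0.50, 0.55, 0.75, 0.95, 1.00]

it_1 = inspection_time(durations, person_1)
it_2 = inspection_time(durations, person_2)
print(it_1, it_2)
```

With these made-up figures Person 2 reaches 80% accuracy at a shorter exposure than Person 1, so Person 2 has the shorter inspection time.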
Figure 13.1 Stimuli for an inspection time task
Ability processes
Figure 13.2 Probability of correctly identifying the ‘longer leg’ of the inspection time fgure at several exposure durations
Dozens of studies show that it is. Most find that the correlations between measures of general ability and inspection time are of the order of –0.3 to –0.5. Several alternative methods of assessing inspection time have also been devised (e.g., McCrory and Cooper, 2005, 2007; Nettelbeck and Kirby, 1983) and they yield broadly similar results to the method described above. The interpretation of such data is, however, a little more controversial. It has been suggested that the correlations between inspection time and general ability may be influenced by attention span or by the use of cognitive strategies, such as flicker detection, which will reveal whether the shorter leg of the masked stimulus was presented to the left or the right (Mackenzie and Bingham, 1985). Some researchers have developed more effective masks that do not allow people to develop strategies for solving the task (Stough et al., 2001), whilst others note that even when these strategies are taken into account by experimental and statistical means, there is still an appreciable correlation between inspection time and g (Egan, 1994). Alternative paradigms which are not affected by these cognitive strategies, such as the ability to detect the apparent direction of a tone briefly presented through earphones (McCrory and Cooper, 2005; Parker et al., 1999), produce similar results to the original experiments, so the finding seems to be robust and genuine. Grudnik and Kranzler (2001) report a meta-analysis of inspection time studies and conclude that the size of the underlying correlation is approximately −0.51 after correcting for reliability and restriction of range.
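To see why a corrected correlation is larger than the raw one, consider the standard correction for unreliability (attenuation), which divides the observed correlation by the square root of the product of the two reliabilities. The sketch below uses invented values and a hypothetical function name. It does not reproduce Grudnik and Kranzler's calculations and leaves out the separate correction for restriction of range.

```python
import math


def correct_for_attenuation(r_observed, reliability_x, reliability_y):
    """Estimate the correlation that would be found with error-free measures."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Invented values: an observed correlation of -0.30 between inspection time
# (reliability 0.80) and an ability test (reliability 0.90).
r_corrected = correct_for_attenuation(-0.30, 0.80, 0.90)
print(round(r_corrected, 3))
```

Because both reliabilities are below 1.0, the corrected value is always further from zero than the observed one.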
Self-assessment question 13.1 Why would it be serious if inspection time were shown to be influenced by cognitive strategies, such as flicker detection?
Overall, it is clear that inspection time has a very substantial link to general intelligence, and that this work has replicated well using a variety of different stimuli. This lends strong support to the theory that general intelligence is linked to the speed or efficiency of neural conduction.

Reaction time

Inspection time estimates the time for which a stimulus has to be presented in order to be correctly recognised, but it is also possible to assess how long it takes an individual to respond to a stimulus. This, too, might be related to general ability – if highly intelligent individuals have neurones that transmit information particularly quickly or accurately, these individuals should be able to respond more rapidly to a signal than people of lower general ability. This idea was first proposed by Galton (1883), who eventually concluded that no such relationship existed. However, the accurate measurement of reaction times was problematic 150 years ago and Jensen and Munroe (1974) re-examined this issue. They measured reaction time using the apparatus shown diagrammatically in Figure 13.3.
Figure 13.3 Jensen and Munroe’s (1974) choice reaction time apparatus
The apparatus consists of eight green lights (here depicted by squares) arranged in a semicircle on a metal panel. Beside each light is a button (denoted by a black circle) with an additional ‘home button’ centred 15 cm away from all of the other buttons. The lights and buttons are connected to a computer that controls the experiment. The task is straightforward and participants are asked to react as quickly as possible. The index finger of the preferred hand is placed on the home button. After a (random) period of a few seconds, one of the lights is illuminated – the ‘target light’. The participant lifts their finger from the home button and presses the button beside this light. Two time intervals are recorded:

• the interval between the ‘target’ light being illuminated and the finger leaving the home button, known as the reaction time (RT)
• the interval between the finger leaving the home button and pressing the target button, known as the movement time (MT).

This is repeated at least 50 times. Measurements are also obtained for different numbers of lights (varying from one to eight), the unused lights and buttons being masked off by metal plates. Hence the apparatus measures simple reaction time (one light) or up to eight-choice reaction time. When one person’s mean reaction times are plotted as a function of the number of lights, a graph similar to that shown in Figure 13.4 is obtained. (Do not worry about why the numbers on the x axis are not equally spaced.) Straight lines may be fitted to this person’s RT and MT graphs as shown and the equations of these straight lines can be obtained by simple algebra. The height of these lines (the ‘intercept’) shows how quickly the participant reacted overall. The slope of the lines shows how much their RT (or MT) changes as the number of potential targets increases. RT typically increases as the number of potential targets increases (a phenomenon known as ‘Hick’s law’), whereas MT stays almost constant. We thus have four measures for each participant, corresponding to the slope and the intercept of the two lines. These measures may be correlated with scores on a test of general ability. It is also possible to measure the variability of each person’s scores within a particular condition, for example, the standard deviation of their scores within the four-light condition. This too may be correlated with g.

Figure 13.4 Reaction time and movement time of a single person as a function of the number of potential targets

The intercept of the RT graph generally shows a significant negative correlation with general ability – in the order of –0.3 (Jensen, 1987a) – whereas the slope does not correlate with general ability. Intelligent people respond more quickly in each condition, but there is no difference in the slope of the lines obtained by intelligent and less intelligent people (Barrett et al., 1986; Jensen, 1982; Beauducel and Brocke, 1993). Variability of a person’s RT also correlates with g (Doebler and Scheffer, 2016), more intelligent people being more consistent. MT shows no correlation with general ability, although the variability in MT often does (Jensen, 1982).

The Jensen apparatus is somewhat large and ‘clunky’; the finger needs to travel a large distance and the buttons have strong springs and need to be pressed hard. More modern apparatus reveals larger correlations between reaction time and g. It also seems that age matters, as the correlation is larger for older groups.
Der and Deary (2017) found a correlation between four-choice RT and g of between –0.44 (aged 30) and –0.53 (aged 69).

Unfortunately, however, reaction times are not easy to work with.

• People occasionally give very long reaction times for no obvious reason, perhaps because their attention wandered. The way in which such data are cleaned up (which involves making an arbitrary decision about how long a reaction time should be before it is removed) can have a massive effect on experimental results.
• Each person produces more short reaction times than long ones. As reaction times are therefore skewed, it arguably makes little sense to average them anyway.
• Variability of RTs is substantially correlated with the mean RTs (typically r=–0.5) and so it is difficult to tell whether the link between RT and g indicates fast or consistent responding.

Rather more complex models of how the brain makes the decision to respond to a stimulus have recently been developed (Ratcliff et al., 2008; Ratcliff et al., 1999). These are based on reaction time data from a two-choice task, but rather than looking at each person’s average RT and its variability they measure rate of acquisition of information and other variables for each person. These can be related to g. Schmiedek et al. (2007) found that rate of acquisition of information correlated 0.72 with working memory; working memory is very similar to g, as discussed below. These methods are a little complex and so will not be considered here, but are recommended to anyone interested in researching this area. It is also possible to estimate the amount of skew which an individual’s reaction times show, separately from their latency and spread, using an ‘ex-Gaussian’ function (Osmon et al., 2018); anyone seeking to analyse reaction-time-like measures would be well advised to consult that paper.

Not everyone would agree that a link between reaction time measures and g necessarily implies that intelligence is related to speed of processing. Longstreth (1984) suggests that the more intelligent participants may be able to devise efficient cognitive strategies for responding quickly (Sternberg’s ‘metacomponents’ mentioned in Chapter 12). The correlations between RT and general ability may thus reflect the use of such strategies rather than anything more fundamental. However, experiments performed in order to test such hypotheses show that these strategies cannot explain the underlying correlation between RT and general ability (Matthews and Dorn, 1989). Reaction time does seem to show a negative correlation with general ability, as expected.
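The slope and intercept measures described for the Jensen task can be obtained with an ordinary least-squares fit. The sketch below is illustrative only. The mean reaction times are invented, `fit_line` is a hypothetical helper, and the code assumes that the x axis is the base-2 logarithm of the number of lights, which is the usual way of expressing Hick's law but is not spelled out in the description of the task.

```python
import math


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented mean RTs (milliseconds) for one person in four conditions.
n_lights = [1, 2, 4, 8]
mean_rt_ms = [300, 330, 360, 390]

# Assumption: Hick's law is linear in log2(number of alternatives).
bits = [math.log2(n) for n in n_lights]

rt_intercept, rt_slope = fit_line(bits, mean_rt_ms)
print(rt_intercept, rt_slope)
```

Repeating the fit for movement time gives the second pair of values, so that each participant ends up with the four measures that are correlated with g.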
Self-assessment question 13.2 a. What is the difference between inspection time and reaction time? b. Name three measures derived from the Jensen task that have been found to correlate substantially with general ability.
Evoked potentials

The fourth way of exploring the link between neural activity and general ability relies on recording the electrical activity from brain cells. If one sticks (quite literally!) one or more electrodes on the surface of the scalp and attaches these to sensitive amplifiers, it is possible to measure electrical activity in the brain – not in individual neurones, but in whole areas of the brain. One can either analyse the activity obtained when the person is sitting quietly, or note how the pattern of activity changes following some stimulus, such as a tone presented through headphones.

Auditory evoked potential (AEP) recordings are, as their name suggests, measures of brain voltages (potentials) that are evoked by a sound. In the simplest experimental design, the participant is seated in a comfortable chair, an electrode is attached to their scalp and a pair of headphones is placed over their head. They are simply told to remain sitting, with eyes closed, and do nothing. They are informed that they will hear clicks or tones through the headphones from time to time, but that these should be ignored. A hundred or so clicks are presented over a period of about 10 minutes. What could be simpler, from a participant’s point of view? The brain’s electrical activity is recorded for a couple of seconds following each click. In practice, the voltages are typically measured every thousandth of a second following the onset of the click and at the end of the experiment each of these 1000 voltages is averaged across the 100 replications (hence it is known as the average auditory evoked potential). The 1000 averaged voltages may then be plotted as a graph, rather as shown in Figure 13.5, which is based on the pioneering work of Ertl and Schafer (1969). This shows time on the x axis and voltage on the y axis. This process therefore produces a different graph for each participant. Several features of these graphs have been linked to IQ.
Hendrickson and Hendrickson (1980) suggested that high IQ participants appear to have longer, more spiky waveforms and this could be assessed simply by measuring the length of the graph over a particular period of time (such as 0.5 seconds). They called this the ‘string measure’ since it was originally measured by putting a piece of string over graphs such as those shown in Figure 13.5 and then straightening it out and measuring its length. Using this technique, they found a correlation of 0.7 between string length and IQ in Ertl and Schafer’s data, though it has proved difficult to replicate this effect. A technically sophisticated study by Barrett and Eysenck (1992) could find no evidence whatsoever for this relationship. There are theoretical problems with the string measure, too. A spiky average trace can only arise if (a) each individual trace contains a lot of high-frequency components and (b) the time course of these components is consistent from trial to trial; if this is not the case the peaks and troughs will tend to average out to produce a straight line. A flat trace could thus arise for two very different reasons – a lack of high-frequency activity or inconsistent time courses. It would surely make sense to analyse each of these variables separately.

Figure 13.5 Average auditory evoked potential recording

If the Reed/Jensen theory is correct, one might expect people with faster-acting nervous systems to show faster reactions to the stimuli – some or all of the large peaks and troughs shown in Figure 13.5 might move to the left for someone with high intelligence, as these are thought to represent the processing of the auditory stimulus. The problem is, however, that one can never be quite sure which neural networks are being picked up by the electrode, and even the decision about where to place the electrode seems arbitrary. It seems possible that two people may have developed different networks of neurons for processing such signals, and this (rather than speed of activity within neurons or fidelity of transmission across the synapses) might well influence the AEP waveform. All in all the technique is rather crude, although a fiercely complex paper by Schubert et al. (2017) recorded evoked potentials from several areas of the brain whilst people solved a variety of cognitive puzzles. They found substantial links between ERP latency (how soon after the stimulus was presented the major peaks and troughs appeared) and general intelligence, and found that the speed at which their participants responded mediated this relationship. It seems that there is some association between speed of information processing in the brain and intelligence.
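The averaging procedure and the theoretical problem with the string measure can both be illustrated with a small simulation. Everything here is an assumption made for the sake of the example: the 'trials' are pure sine waves rather than real recordings, the function names are hypothetical, and the string length is computed in arbitrary units (in practice it depends on how the axes are scaled).

```python
import math


def average_trials(trials):
    """Average the voltage at each time point across all trials."""
    n = len(trials)
    return [sum(samples) / n for samples in zip(*trials)]


def string_length(trace, dt=1.0):
    """Length of the line joining successive points of the trace."""
    return sum(math.hypot(dt, b - a) for a, b in zip(trace, trace[1:]))


def make_trial(shift, n_samples=200, period=20):
    """A toy 'recording': a sine wave delayed by `shift` samples."""
    return [math.sin(2 * math.pi * (t - shift) / period) for t in range(n_samples)]


# Four trials whose peaks line up from trial to trial.
consistent = [make_trial(0) for _ in range(4)]
# Four trials with identical activity but inconsistent timing.
jittered = [make_trial(shift) for shift in (0, 5, 10, 15)]

spiky_length = string_length(average_trials(consistent))
flat_length = string_length(average_trials(jittered))
print(spiky_length, flat_length)
```

Each individual trial in the second set is just as 'spiky' as those in the first, yet the averaged trace is flat and the string is shorter. A short string cannot therefore distinguish a lack of high-frequency activity from inconsistent timing.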
Elementary cognitive tasks

It would be amazing if the kinds of ability described in Chapter 12 were completely unrelated to the types of process studied by cognitive psychologists and so in this section and the next we shall consider some areas of overlap between the two. Carroll (1980) suggested that some of the tasks traditionally studied by cognitive psychologists, such as the Sternberg memory scanning paradigm or the Clark and Chase (1972) sentence verification task, should be related to mental abilities. These experiments were designed to measure the time taken to perform a single elementary cognitive operation (ECO). For example, the Saul Sternberg memory scanning paradigm (Sternberg, 1969) presents a list of numbers or characters for several seconds, after which it disappears and is followed by a ‘target character’. The participant is asked to push one button if the target character was in the preceding list and another if it was not. The reaction time is found to be related to the length of the list. After a little statistical juggling (involving regression lines), it is possible to estimate how long it takes an individual to ‘take in’ the meaning of the target and how long it takes that individual to compare the target with one of the stored representations of a character. Jensen (1987b) used the Sternberg memory scanning experiment and found that comparison time correlated negatively with scores on a test of intelligence, whilst Carroll (1980) showed that it is fairly easy to find small negative correlations (in the order of –0.3) between the time it took people to perform various elementary cognitive operations and intelligence. These studies were important because they showed that intelligence tests (even ones where participants were given as much time as they wanted to solve the items) were related to speed of processing in a wide variety of tasks then being developed by cognitive psychologists. However, this did not really help to develop a theory of human intelligence, other than showing that speed of processing might be important.

The second approach attempts to model solution times to complex tasks by first drawing up a flowchart of the cognitive processes which need to be performed in order to solve a particular type of problem, and then performing experiments to estimate how long a particular person takes to perform each of these ECOs.
Self-assessment question 13.3 Is it possible that some cognitive strategies could affect performance on either of these experiments?
Robert Sternberg (1977) performed some remarkably elegant experiments in this area. He argued that if the sequence of cognitive operations required to perform a fairly complex task were known, and if it was possible to estimate (through experiments such as those described earlier) how long it takes each individual to perform each of these basic cognitive operations, it might be possible to predict quite accurately how long it would take an individual to solve the problem. In other words, one first identifies all the ECOs that are thought to be involved in a particular task and draws up a flowchart to show how these are organised. The next step is to assess how long it takes each individual to perform each of these mental operations, using carefully designed experiments. Having done this, it should be possible to ‘plug in’ the amount of time that it takes an individual to perform each ECO in order to predict how long it should take a person to solve a new problem which involves some or all of the same ECOs, performed in a different sequence. This approach might produce a quantitative, predictive model – a rarity in psychology. Essentially, ECO durations would replace ability traits as the unit of analysis. That was the theory, at any rate.
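The ‘plug in’ logic amounts to adding up one person’s ECO durations in the order given by the flowchart. The sketch below is a deliberately naive illustration, not Sternberg’s model: the operation labels, the durations and the sequence are all invented, and it assumes that the operations are carried out strictly one after another.

```python
def predicted_solution_time(eco_durations_ms, operations):
    """Sum one person's ECO durations over the sequence a task requires.

    Assumption: operations are strictly serial, so their times simply add.
    """
    return sum(eco_durations_ms[op] for op in operations)


# Invented durations (milliseconds) for one person.
person = {
    "encode": 120,
    "infer": 200,
    "map": 150,
    "apply": 180,
    "compare": 90,
    "respond": 60,
}

# A hypothetical flowchart for one problem, written as an ordered list.
task = ["encode", "encode", "infer", "encode", "map",
        "encode", "apply", "compare", "respond"]

predicted_ms = predicted_solution_time(person, task)
print(predicted_ms)
```

A new problem would be represented by a different list of operations, and the same per-person durations would be reused to predict its solution time.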
Much of Sternberg’s work focused on nonverbal analogy problems – for example, analogy items based on simple cartoon characters. These characters differed from each other in the following four basic ways:

• gender – male/female
• height – tall/short
• girth – fat/thin
• colour – shaded/unshaded.

A typical analogy task using these characters would be like that shown in Figure 13.6. You have probably already encountered verbal analogies – for example, ‘cat is to kitten as cow is to bull’ [sic]. Participants would have to decide whether each of these items was true or false. Sternberg moved away from such verbal analogies because he felt that the use of language may complicate the process of solving these problems. Instead, he argued that cartoon characters can be used in the same way. In Figure 13.6 the only thing that differs between the first pair of characters is their sex. Their shading, height and girth are the same. Applying the same rule to the second pair, the fourth character should clearly be a short, thin, unshaded female. However, it is not, so the analogy is false.

Figure 13.6 Example of a schematic analogy task, reading ‘Tall fat unshaded man is to tall fat unshaded woman as short thin unshaded man is to short thin shaded man’ – an incorrect analogy. If correct, the fourth character would be a short, thin unshaded woman

Sternberg argued that six main cognitive processes underpinned performance on these tasks, namely encoding (the recognition of each feature of the cartoon characters), inference (noting which features changed between the first pair of characters, thus inferring the rule to be applied to the second pair of characters), mapping the relationship between the first and third characters, applying the rule to the third character, comparing the fourth character to what it should be and responding appropriately. He drew up several plausible flowchart models for solving these problems on the basis of these elementary cognitive operations. Through the use of some ingenious experiments involving pre-cueing (sometimes showing some of the characters before the others, thereby allowing some of the ECOs to be performed prior to the measurement of the reaction times), he was able to assess the duration of each of these components independently for each person (Sternberg, 1977).

The problem is that, even with a relatively simple task such as this, the number of possible ways of solving the problem becomes enormous. For example, some cognitive operations may be performed in parallel, rather than sequentially. Cognitive strategies for solving the problems can also complicate the picture; for example, if three of the characters are shaded but one is not or if one figure is a different height from the other three, then the analogy must be incorrect. Some (but not all) of the individuals performing this task seem to check for such possibilities before analysing the problem in detail; intelligence did not seem to influence the decision to use a strategy. The one published attempt to replicate this work (May et al., 1987) found that the model(s) proposed by Sternberg really did not fit the data very well. Neither did the estimated durations of Sternberg’s six basic cognitive processes correlate very strongly with other cognitive measures. Although it is an elegant and sophisticated attempt to estimate the duration of elementary cognitive processes, it seems that people use heuristics when deciding how to solve complex problems. There is also the issue of what is meant by ‘elementary’. How can one ever tell that a particular cognitive operation is so basic and hard-wired that it operates in the same way every time, whatever else the organism is doing? Modelling intelligence using low-level tasks from cognitive psychology may not be as fruitful as it once appeared. Higher-level cognitive processes have, however, been linked to intelligence, as discussed below.
Working memory and intelligence

There is currently much interest in the relationship between working memory and intelligence. Working memory, as studied by cognitive psychologists, is known to be closely related to general intelligence (Colom et al., 2008; Engle et al., 1999). A careful meta-analysis by Oberauer et al. (2005) indicates that the correlation is 0.85, showing that the two concepts are virtually identical. However, the meaning of this correlation is less than clear, as discussed below.

Working memory tasks require a person to keep information in short-term storage while performing some other cognitive operation. They may involve showing someone a story consisting of several sentences and asking them to remember both the meaning of the story and the last word in each sentence; they are tested afterwards on both of these aspects. The task requires deep processing of words (to grasp their meaning) and memory (for the last word in each sentence) and the central executive presumably works hard switching resources between the two tasks. Table 13.1 gives an example of an item from such a test.
Table 13.1 A typical item from a test of working memory

Description | Text shown on screen
Passage shown on screen for 30 seconds. Participants are told to remember the meaning of the passage and memorise the last word of each sentence, which is in bold. | It was noon, and the sun beat against the outer walls like an invading army. It was reflected off the cobbles in the old town. It felt as though an oven door had been thrown open. Flies buzzed. No birds sang. The whole world seemed to doze.
Question 1 | Where were the cobbles?
Question 2 | What was the last word of the fifth sentence?
Another task which supposedly measures working memory asks people to listen to a string of numbers – e.g., 1, 4, 9, 2, 7 – and then repeat them backwards. It is thus necessary to both remember the original list and process it in reverse. It correlates with general intelligence, which is perhaps hardly surprising as this ‘digit backwards’ task appears in several intelligence tests (e.g., the WISC-IV) of which it is ‘a high quality subtest’ (Gignac and Weiss, 2015). Perhaps cognitive psychologists are just re-discovering g, calling it ‘working memory’? The reverse digit-span task correlates only about 0.19 with another measure of working memory (Hilbert et al., 2015), which casts doubt on whether it is a ‘pure’ measure of working memory.

Other tasks have been used too. Johnson et al. (2010) describe some of them and report the correlations between them for various age groups. These correlations are usually between 0.25 and 0.3, which raises concern about whether the tasks all measure the same thing. Based on this research (which used a huge sample) it would seem essential to administer several different working memory tasks, as each task has a lot of task-specific variance associated with it which needs to be removed by combining scores on several different tasks. Unfortunately, however, researchers do not always do so.

Whilst combining scores from several different working memory tasks will be effective if each task is affected by a different source of task-specific variance, it will be no use at all if all working memory tasks measure several aspects of cognitive performance – not just working memory. Unfortunately, Diamond’s (2013) conclusion is that many tasks which supposedly measure working memory also measure many ‘executive functions’ which influence problem-solving ability.
Executive functions she defines as:

inhibition [inhibitory control, including self-control (behavioral inhibition) and interference control (selective attention and cognitive inhibition)], working memory (WM), and cognitive flexibility (also called set shifting, mental flexibility, or mental set shifting and closely linked to creativity). From these, higher order EFs are built such as reasoning, problem solving, and planning. (Diamond, 2013)

These executive functions comprise the very skills which are required for the successful solution of unfamiliar problems, which is exactly what intelligence tests entail. It would thus be amazing if tasks which measured executive functions did not (a) resemble the problems in intelligence tests and (b) correlate substantially with them. According to this view, tests which claim to measure working memory may instead measure a whole bundle of cognitive processes. If this is the case, then even if they show massive correlations with g as Oberauer’s work suggests, it is difficult to tell the extent to which the ‘working memory’ part of the bundle is responsible for this correlation.

To summarise so far: is working memory the same as g? The correlations suggest that it might be, although one must always ask how well working memory has been assessed, as some tasks which supposedly measure it are only feebly related to other tasks. It is possible that performance at many ‘working memory’ tasks is also influenced by other executive functions, which may account for their large correlations with tests measuring g. There seems to be a need for cognitive psychologists to develop ‘pure’ measures of working memory, or to demonstrate that the tasks which they currently use are pure measures of the concept.

One way of determining whether working memory is the same as g is to determine whether tests of these two constructs predict behaviour differently.
For example, if tests of working memory and g are given to young children whose school performance is measured some years later, do they predict performance equally well? We will consider such work in Chapter 18, but note in passing that Alloway and Alloway (2010) believe that they show different levels of prediction and so are not identical.

A second approach is to study the neurological underpinnings of both working memory and g. This should reveal whether they involve activity in similar parts of the brain. Unfortunately, the literature is sparse and sometimes seems to focus more on elegant physiological measurement than elegant measurement of working memory. Table 13.2 shows the correlations between three measures from an evoked potential study, and tests of working memory and fluid intelligence. It can be seen that the correlations which working memory and fluid intelligence show with the three measures from an evoked potential study look similar.

Table 13.2 Correlations between fluid intelligence, working memory and three EEG measures. P300 refers to a positive electrical signal 300 milliseconds following a stimulus (Wongupparaj et al., 2018)

Measure | Fluid intelligence | Working memory
Working memory | 0.53 |
P200 latency | –0.17 | –0.19
P300 amplitude for target stimuli | –0.28 | –0.28
P300 amplitude for non-target stimuli | –0.48 | –0.43

In addition, Clark et al. (2017) performed an fMRI scan whilst volunteers completed an intelligence (Raven) or a working memory task. They found that similar areas of the brain were activated by both tasks. Wongupparaj et al. (2018) conclude that intelligence is influenced by executive functions, together with an input from working memory as evidenced by the psychometric and physiological responses. These findings agree well with our earlier suggestion that individual working memory tasks may also measure executive function to a substantial extent, and this combination of factors is what may make them correlate substantially with tests of intelligence.
Intelligence, brain structure and brain activity

Brain size (volume) correlates somewhere between 0.26 and 0.56 with intelligence, the volumes of the frontal and temporal areas being particularly important (Flashman et al., 1997). Grey matter volume (cell bodies) seems more important than white matter volume (axons; Colom et al., 2009). Interestingly, the correlation between brain volume and g is almost entirely due to the same genes affecting both brain size and g (Posthuma et al., 2002), showing that intelligence has a substantial biological basis. There is evidence that the thickness of some areas of the cortex is related to g, and variations in thickness due to ageing are reflected in changes in g (Karama et al., 2011; Tisserand and Jolles, 2003), though of course this does not necessarily imply causation. There does not seem to be much evidence for an overall relationship between the amount of convolution (folding) in the cortex and intelligence, or between the surface area of the cortex and intelligence (Bajaj et al., 2018), despite some earlier suggestions that these were important.
Which parts of the brain are active when we take intelligence tests? If we discover this, and know the functions of various systems of the brain and their interconnections, then it might be possible to develop a physiological model of the processes which underpin g. The literature in this area can get technical, but Basten et al. (2015) provide a clear review of the main issues and research findings, whilst Luders et al. (2009) give an intelligible account of the older literature outlining links between intelligence and brain activity, based on a review of all the literature on MRI, fMRI, PET and other scans and their relationship to tests of psychometric g.

Jung and Haier (2007) reviewed studies which linked intelligence to brain structure and function and found that the literature was surprisingly consistent. It showed that rather large areas of the brain were involved, particularly activity in the grey matter (nerve cells) of the parieto-frontal areas and some bundles of axons connecting the various areas. They suggested that processing begins at parts of the brain (areas within the temporal and occipital lobes) which deal with visual and auditory information, and then passes to areas which perform recognition and higher-order analysis of auditory and visual information. The information then flows to the parietal cortex, and the frontal lobes are involved in generating and testing multiple solutions to the problem. Once the best solution is found, other parts of the brain make whatever response is required. Their paper goes into considerable anatomical detail, but one of the great merits of this work is that it then draws on studies of traumatic brain damage (e.g., shrapnel wounds, surgery, stroke) to test the theory. This shows that damage to the areas which form the basis of Jung and Haier’s model results in changes to intelligence, whereas damage to other areas generally does not.
More recent fMRI scans taken when people solve intelligence tests also show that the frontal and parietal lobes of the brain seem to be particularly important, although Jung and Haier’s fronto-parietal theory of intelligence requires some modification in the light of more accurate scanning methodology (Basten et al., 2015). These authors show that other areas of the brain are also involved, including the area below the cortex, the insular cortex and the posterior cingulate cortex. We encountered resting-state functional connectivity (RSFC) in Chapter 11, where it was used to explore whether individual differences in personality were related to the strength of the neural networks which connect various brain centres. The same sort of approach has been followed with regard to intelligence, with more promising results. For example, Hearne et al. (2016) found that levels of intelligence were related to positive strengths of connections between a number of neural sites (selected on the basis of previous literature and including, but not limited to, the frontal and pre-frontal cortex and parietal areas) when people were just resting. The overall correlation was an impressive 0.38 and they conclude that ‘this pattern of connectivity-intelligence associations is consistent with other findings suggesting that higher global network efficiency is related to higher general intelligence measures and that the across-network connectivity of the fronto-parietal network is critical for fluid intelligence’. This finding that the strength of the connections between distinct regions of the brain is related to intelligence (even when people are not actively solving problems) is certainly consistent with both Jensen’s theory of intelligence and much of the psychophysiological data considered above. A similar finding was reported by Ferguson et al. (2017), who scanned the resting brains of volunteers and predicted each volunteer’s intelligence from the degree of connectivity (synchrony) of many overlapping networks across the brain as a whole. The correlation between measured intelligence and the intelligence levels which were predicted just from the analysis of the fMRI data was 0.24, an impressive feat. We are starting to understand how the connections between various parts of the brain result in a particular level of intelligence.
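To put the two connectivity correlations quoted above on a common footing, it can help to square them: the square of a correlation gives the proportion of variance that the two measures share. The short Python sketch below does only that arithmetic; the function name is illustrative and is not taken from either study.

```python
def shared_variance(r):
    """Proportion of variance two measures share, given their correlation r."""
    return r ** 2


# Correlations reported in the text for resting-state connectivity and intelligence.
reported = [
    ("Hearne et al. (2016)", 0.38),
    ("Ferguson et al. (2017)", 0.24),
]

for study, r in reported:
    print(f"{study}: r = {r:.2f}, shared variance = {shared_variance(r):.1%}")
```

On this reading, resting-state connectivity accounts for roughly 14% and 6% of the variation in measured intelligence in the two studies, which is notable given that participants were not solving any problems while being scanned.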
Summary
This chapter has covered a lot of ground, some of it rather technical. There are, however, some rather clear and consistent findings.
• Individual differences in brain structure (volume, etc.) correlate with g. If such physiological variables are largely under genetic control, rather than being determined by purely environmental factors, this suggests that g is linked to individual differences in the biology of the brain.
• The fMRI studies investigating which areas of the brain are active when people solve intelligence-test problems produce fairly consistent results. Now that neuroscientists are starting to determine the function of various parts of the brain and the pathways that interconnect them, this may allow them to understand the biological processes which underpin g.
• Is g the same as working memory? We cannot really tell, because no one really seems to know whether ‘working memory’ tasks really just measure working memory. If they do, then the two concepts seem to be so highly correlated as to be almost indistinguishable.
• Does performance at elementary cognitive tasks predict g? It is not difficult to find a correlation in the order of 0.3 between a simple cognitive task and a test measuring g. However, modelling g in terms of a sequence of cognitive operations (Sternberg, 1977) does not work, probably because people develop different strategies and heuristics for solving the problems. There is also the problem of determining what these elementary cognitive processes are; how can we show that each task is incapable of being defined in terms of simpler processes?
• Is g linked to speed/fidelity of information processing in the nervous system? Evidence from direct stimulation of peripheral nerves is ambiguous; however, evidence from indirect measures (inspection time, modern reaction-time studies and evoked potential measures) is plentiful. It all suggests that a fast, accurate nervous system underlies g.
Suggested additional reading
Ian Deary’s (2012) summary of intelligence research is comprehensive, even-handed and easy to read; the genetics section can be postponed until Chapter 15. Der and Deary (2017) give a very readable account of the reaction-time literature; the technical detail in the ‘methods’ section can safely be ignored. Inspection time studies seem to have become less popular in recent years, presumably because the relationship between inspection time and g is now so well established as to be uncontroversial. Deary’s (2012) review can be recommended, as can Sheppard and Vernon’s (2008) review of the older literature. Finally, those who are interested in studying whether it is possible to boost intelligence through training exercises might be interested in Hayes et al. (2015).
Answers to self-assessment questions
SAQ 13.1
The whole rationale for correlating inspection time with g is that inspection time is thought to measure a rather simple and basic physiological process, namely the speed and/or accuracy with which information can be transmitted from neurone to neurone. If it is found that performance on the inspection time task is influenced by ‘higher order’ mental processes (such as strategy use or concentration), then it is clear that inspection time cannot be a pure measure of this physiological phenomenon and so will be less useful for testing the theory that intelligence is essentially a measure of how fast and/or accurately nervous systems can transmit information.
SAQ 13.2
a. In an inspection time task the experimenter controls the duration of the stimuli and ascertains the exposure duration at which each individual has a certain probability (e.g., 75% or 90%) of correctly identifying the stimulus. The participant can take as long as they like to make their response. A reaction time task, by way of contrast, requires participants to respond as quickly as possible to a stimulus. In the inspection time task, perceiving the stimulus is the important part of the experiment, whereas in the reaction time task it is responding quickly that is crucial.
b. The following three measures have been found to correlate substantially with general ability:
• The slope of the line obtained when response time is plotted as shown in Figure 13.4 shows a negative correlation with g.
• The intercept (height) of the line obtained when response time is plotted as shown in Figure 13.4 shows a negative correlation with g.
• The variability in the movement time (e.g., the standard deviation of individuals’ movement times for a particular experimental condition) shows a negative correlation with g.
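The definition of inspection time in part (a) can be made concrete with a small calculation. The Python sketch below assumes that one participant's accuracy has been recorded at several fixed exposure durations and estimates, by straight-line interpolation, the duration at which accuracy first reaches the chosen criterion. The function name, the interpolation method and the data are all hypothetical; published studies typically use adaptive staircase procedures or fitted psychometric curves instead.

```python
def inspection_time(durations_ms, proportion_correct, criterion=0.75):
    """Estimate the shortest exposure duration (ms) at which accuracy reaches
    the criterion, interpolating linearly between the tested durations.

    Returns None if the participant never reaches the criterion.
    """
    # Sort the (duration, accuracy) pairs from shortest to longest exposure.
    points = sorted(zip(durations_ms, proportion_correct))
    prev_duration, prev_accuracy = points[0]
    if prev_accuracy >= criterion:
        return float(prev_duration)
    for duration, accuracy in points[1:]:
        if accuracy >= criterion:
            # Accuracy crosses the criterion between the previous and current duration.
            fraction = (criterion - prev_accuracy) / (accuracy - prev_accuracy)
            return prev_duration + fraction * (duration - prev_duration)
        prev_duration, prev_accuracy = duration, accuracy
    return None


# Hypothetical data for one participant: accuracy improves with longer exposure.
durations = [20, 40, 60, 80, 100]
accuracy = [0.50, 0.60, 0.70, 0.85, 0.95]

print(inspection_time(durations, accuracy, criterion=0.75))
print(inspection_time(durations, accuracy, criterion=0.90))
```

Note that nothing in the calculation depends on how quickly the participant responds, which is the key contrast with a reaction time task.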
SAQ 13.3
It does seem possible that cognitive strategies might influence performance on the Sternberg task. Most obviously, the standard instruction to ‘respond as quickly as possible while trying not to make any errors’ may well be interpreted rather differently by various individuals, some of whom will respond slowly in order to avoid making any errors, while others may be content to sacrifice accuracy for speed. There are other possibilities, too. For example, when scanning a list of ‘target letters’ in the Sternberg task, does one continue to scan to the end of the list after detecting the target or does one stop immediately? Is it possible to scan several elements of the list in parallel, rather than serially? The evidence shows that the responses to this task are influenced by such variables. If people use different strategies, some of which are more effective than others, this will reduce the correlation between task-performance and intelligence.
References
Alloway, T. P. & Alloway, R. G. 2010. Investigating the Predictive Roles of Working Memory and IQ in Academic Attainment. Journal of Experimental Child Psychology, 106, 20–29.
Bajaj, S., Raikes, A., Smith, R., Dailey, N. S., Alkozei, A., Vanuk, J. R. & Killgore, W. D. S. 2018. The Relationship between General Intelligence and Cortical Structure in Healthy Individuals. Neuroscience, 388, 36–44.
Barrett, P. T. & Eysenck, H. J. 1992. Brain Evoked Potentials and Intelligence: The Hendrickson Paradigm. Intelligence, 16, 361–381.
Barrett, P. T., Eysenck, H. J. & Lucking, S. 1986. Reaction Time and Intelligence – a Replicated Study. Intelligence, 10, 9–40.
Basten, U., Hilger, K. & Fiebach, C. J. 2015. Where Smart Brains Are Different: A Quantitative Meta-Analysis of Functional and Structural Brain Imaging Studies on Intelligence. Intelligence, 51, 10–27.
Beauducel, A. & Brocke, B. 1993. Intelligence and Speed of Information Processing: Further Results and Questions on Hick’s Paradigm and Beyond. Personality and Individual Differences, 15, 627–636.
Carroll, J. B. 1980. Individual Difference Relations in Psychometric and Experimental Cognitive Tasks. Chapel Hill, NC: The L.L. Thurstone Psychometric Laboratory.
Clark, C. M., Lawlor-Savage, L. & Goghari, V. M. 2017. Comparing Brain Activations Associated with Working Memory and Fluid Intelligence. Intelligence, 63, 66–77.
Clark, H. H. & Chase, W. G. 1972. On the Process of Comparing Sentences against Pictures. Cognitive Psychology, 3, 472–517.
Colom, R., Abad, F. J., Quiroga, M. Á., Shih, P. C. & Flores-Mendoza, C. 2008. Working Memory and Intelligence Are Highly Related Constructs, but Why? Intelligence, 36, 584–606.
Colom, R., Haier, R. J., Head, K., Alvarez-Linera, J., Quiroga, M. A., Shih, P. C. & Jung, R. E. 2009. Gray Matter Correlates of Fluid, Crystallized, and Spatial Intelligence: Testing the P-Fit Model. Intelligence, 37, 124–135.
Deary, I. J. 2012. Intelligence. In S. T. Fiske, D. L. Schacter & S. E. Taylor (eds.) Annual Review of Psychology, Vol 63. Palo Alto: Annual Reviews.
Der, G. & Deary, I. J. 2017. The Relationship between Intelligence and Reaction Time Varies with Age: Results from Three Representative Narrow-Age Age Cohorts at 30, 50 and 69 Years. Intelligence, 64, 89–97.
Diamond, A. 2013. Executive Functions. In S. T. Fiske (ed.) Annual Review of Psychology, Vol 64. Palo Alto: Annual Reviews.
Doebler, P. & Scheffer, B. 2016. The Relationship of Choice Reaction Time Variability and Intelligence: A Meta-Analysis. Learning and Individual Differences, 52, 157–166.
Egan, V. 1994. Intelligence, Inspection Time and Cognitive Strategies. British Journal of Psychology, 85, 305–316.
Engle, R. W., Tuholski, S. W., Laughlin, J. E. & Conway, A. R. A. 1999. Working Memory, Short-Term Memory, and General Fluid Intelligence: A Latent-Variable Approach. Journal of Experimental Psychology – General, 128, 309–331.
Ertl, J. P. & Schafer, E. W. P. 1969. Brain Response Correlates of Psychometric Intelligence. Nature, 223, 421–422.
Eysenck, H. J. 1986. The Theory of Intelligence and the Neurophysiology of Cognition. In R. J. Sternberg (ed.) Advances in the Psychology of Human Intelligence. Hillsdale, NJ: Erlbaum.
Ferguson, M. A., Anderson, J. S. & Spreng, R. N. 2017. Fluid and Flexible Minds: Intelligence Reflects Synchrony in the Brain’s Intrinsic Network Architecture. Network Neuroscience, 1, 192–207.
Flashman, L. A., Andreasen, N. C., Flaum, M. & Swayze II, V. W. 1997. Intelligence and Regional Brain Volumes in Normal Controls. Intelligence, 25, 149–160.
Galton, F. 1883. Inquiries into Human Faculty and Its Development. London: Macmillan.
Gignac, G. E. & Weiss, L. G. 2015. Digit Span Is (Mostly) Related Linearly to General Intelligence: Every Extra Bit of Span Counts. Psychological Assessment, 27, 1312–1323.
Grudnik, J. L. & Kranzler, J. H. 2001. Meta-Analysis of the Relationship between Intelligence and Inspection Time. Intelligence, 29, 523–535.
Hayes, T. R., Petrov, A. A. & Sederberg, P. B. 2015. Do We Really Become Smarter When Our Fluid-Intelligence Test Scores Improve? Intelligence, 48, 1–14.
Hearne, L. J., Mattingley, J. B. & Cocchi, L. 2016. Functional Brain Networks Related to Individual Differences in Human Intelligence at Rest. Scientific Reports, 6, https://doi.org/10.1038/srep32328.
Hebb, D. O. 1949. The Organisation of Behavior. New York: Wiley.
Hendrickson, D. E. 1972. An Integrated Molar/Molecular Model of the Brain. Psychological Reports, 30, 343–368.
Hendrickson, D. E. & Hendrickson, A. E. 1980. The Biological Basis of Individual Differences in Intelligence. Personality and Individual Differences, 1, 3–33.
Hilbert, S., Nakagawa, T. T., Puci, P., Zech, A. & Buhner, M. 2015. The Digit Span Backwards Task: Verbal and Visual Cognitive Strategies in Working Memory Assessment. European Journal of Psychological Assessment, 31, 174–180.
Howe, M. J. A. 1988. Intelligence as Explanation. British Journal of Psychology, 79, 349–360.
Jensen, A. R. 1982. Reaction Time and Psychometric g. In H. J. Eysenck (ed.) A Model for Intelligence. Berlin: Springer-Verlag.
Jensen, A. R. 1987a. Individual Differences in the Hick Paradigm. In P. A. Vernon (ed.) Speed of Information-Processing and Intelligence. Norwood, NJ: Ablex.
Jensen, A. R. 1987b. Process Differences and Individual Differences in Some Cognitive Tasks. Intelligence, 11, 107–136.
Jensen, A. R. & Munroe, E. 1974. Reaction Time, Movement Time and Intelligence. Intelligence, 3, 121–126.
Johnson, W., Logie, R. H. & Brockmole, J. R. 2010. Working Memory Tasks Differ in Factor Structure across Age Cohorts: Implications for Dedifferentiation. Intelligence, 38, 513–528.
Jung, R. E. & Haier, R. J. 2007. The Parieto-Frontal Integration Theory (P-FIT) of Intelligence: Converging Neuroimaging Evidence. Behavioral and Brain Sciences, 30, 135–154.
Karama, S., Colom, R., Johnson, W., Deary, I. J., Haier, R., Waber, D. P., . . . Brain Development Cooperative Group 2011. Cortical Thickness Correlates of Specific Cognitive Performance Accounted for by the General Factor of Intelligence in Healthy Children Aged 6 to 18. Neuroimage, 55, 1443–1453.
Longstreth, L. E. 1984. Jensen’s Reaction Time Investigations: A Critique. Intelligence, 8, 139–160.
Luders, E., Narr, K. L., Thompson, P. M. & Toga, A. W. 2009. Neuroanatomical Correlates of Intelligence. Intelligence, 37, 156–163.
Mackenzie, B. & Bingham, E. 1985. IQ, Inspection Time and Response Strategies in a University Population. Australian Journal of Psychology, 37, 257–268.
Matthews, G. & Dorn, L. 1989. IQ and Choice Reaction Time: An Information Processing Analysis. Intelligence, 13, 299–317.
May, J., Kline, P. & Cooper, C. 1987. A Brief Computerized Form of a Schematic Analogy Task. British Journal of Psychology, 78, 29–36.
McCrory, C. & Cooper, C. 2005. The Relationship between Three Auditory Inspection Time Tasks and General Intelligence. Personality and Individual Differences, 38, 1835–1845.
McCrory, C. & Cooper, C. 2007. Overlap between Visual Inspection Time Tasks and General Intelligence. Learning and Individual Differences, 17, 187–192.
McRorie, M. & Cooper, C. 2001. Neural Transmission and General Mental Ability. Learning and Individual Differences, 13, 335–338.
McRorie, M. & Cooper, C. 2004. Synaptic Transmission Correlates of General Mental Ability. Intelligence, 32, 263–275.
Nettelbeck, T. & Kirby, N. H. 1983. Measures of Timed Performance and Intelligence. Intelligence, 7, 39–52.
Oberauer, K., Schulze, R., Wilhelm, O. & Suss, H. M. 2005. Working Memory and Intelligence – Their Correlation and Their Relation: Comment on Ackerman, Beier, and Boyle (2005). Psychological Bulletin, 131, 61–65.
Osmon, D. C., Kazakov, D., Santos, O. & Kassel, M. T. 2018. Non-Gaussian Distributional Analyses of Reaction Times (RT): Improvements That Increase Efficacy of RT Tasks for Describing Cognitive Processes. Neuropsychology Review, 28, 359–376.
Parker, D. M., Crawford, J. R. & Stephen, E. 1999. Auditory Inspection Time and Intelligence: A New Spatial Localization Task. Intelligence, 27, 131–139.
Posthuma, D., De Geus, E. J. C., Baare, W. F. C., Pol, H. E. H., Kahn, R. S. & Boomsma, D. I. 2002. The Association between Brain Volume and Intelligence Is of Genetic Origin. Nature Neuroscience, 5, 83–84.
Ratcliff, R., Schmiedek, F. & McKoon, G. 2008. A Diffusion Model Explanation of the Worst Performance Rule for Reaction Time and IQ. Intelligence, 36, 10–17.
Ratcliff, R., Van Zandt, T. & McKoon, G. 1999. Connectionist and Diffusion Models of Reaction Time. Psychological Review, 106, 261–300.
Reed, T. E. 1984. Mechanism for Heritability of Intelligence. Nature, 311, 417.
Reed, T. E. & Jensen, A. R. 1991. Arm Nerve Conduction Velocity (NCV), Brain NCV, Reaction Time and Intelligence. Intelligence, 15, 33–47.
Rijsdijk, F. V. & Boomsma, D. I. 1997. Genetic Mediation of the Correlation between Peripheral Nerve Conduction Velocity and IQ. Behavior Genetics, 27, 87–98.
Rijsdijk, F. V., Boomsma, D. I. & Vernon, P. A. 1995. Genetic Analysis of Peripheral Nerve Conduction Velocity in Twins. Behavior Genetics, 25, 341–348.
Schmiedek, F., Oberauer, K., Wilhelm, O., Suss, H. M. & Wittmann, W. W. 2007. Individual Differences in Components of Reaction Time Distributions and Their Relations to Working Memory and Intelligence. Journal of Experimental Psychology – General, 136, 414–429.
Schubert, A. L., Hagemann, D. & Frischkorn, G. T. 2017. Is General Intelligence Little More Than the Speed of Higher-Order Processing? Journal of Experimental Psychology – General, 146, 1498–1512.
Sheppard, L. D. & Vernon, P. A. 2008. Intelligence and Speed of Information-Processing: A Review of 50 Years of Research. Personality and Individual Differences, 44, 535–551.
Sternberg, R. J. 1977. Intelligence, Information Processing and Analogical Reasoning: The Componential Analysis of Human Abilities. Hillsdale, NJ: Erlbaum.
Sternberg, S. 1969. High-Speed Scanning in Human Memory. Science, 153, 652–654.
Stough, C., Bates, T. C., Mangan, G. L. & Colrain, I. 2001. Inspection Time and Intelligence: Further Attempts to Eliminate the Apparent Movement Strategy. Intelligence, 29, 219–230.
Tisserand, D. J. & Jolles, J. 2003. On the Involvement of Prefrontal Networks in Cognitive Ageing. Cortex, 39, 1107–1128.
Vernon, P. A. & Mori, M. 1992. Intelligence, Reaction Times and Peripheral Nerve Conduction Velocity. Intelligence, 16, 273–288.
Vickers, D., Nettelbeck, T. & Willson, R. J. 1972. Perceptual Indices of Performance: The Measurement of ‘Inspection Time’ and ‘Noise’ in the Visual System. Perception, 1, 263–295.
Wongupparaj, P., Sumich, A., Wickens, M., Kumari, V. & Morris, R. G. 2018. Individual Differences in Working Memory and General Intelligence Indexed by P200 and P300: A Latent Variable Model. Biological Psychology, 139, 96–105.
Chapter 14 Personality and ability over time
Introduction
This chapter explores how personality and abilities change over time. It studies whether there are changes from generation to generation, whether interventions designed to change intelligence or personality are successful, and how personality and abilities change as we age.
Learning outcomes
Having read this chapter you should be able to:
• Discuss four meanings of ‘change’ in personality and ability.
• Comment on whether the increase in test scores from generation to generation means that intelligence levels are increasing.
• Discuss how (and why) personality changes over the lifespan.
• Discuss whether personality and ability traits are stable.
• Discuss the effectiveness of interventions to change personality and ability, with particular reference to Head Start programmes designed to boost intelligence.
The issue of change in personality and cognitive abilities is one of the more muddled areas of individual differences, for ‘change’ means different things to different psychologists. It seems that there are four quite different issues running through the literature:
1. Do levels of personality and/or ability vary in same-aged individuals over time? For example, are today’s 20 year olds any more or less neurotic than the 20 year olds in our grandparents’ generation?
2. Do personality and cognitive abilities typically vary much over the lifespan? For example, does neuroticism generally rise or fall at certain ages? Are 40 year olds generally any more or less sociable than 60 year olds, regardless of their year of birth? To keep matters simple I focus on changes in adulthood, rather than childhood, as this is not a developmental psychology text.
3. Are personality and ability stable over time? Is someone who is highly extraverted (relative to their peers) at age 20 also highly extraverted relative to their peers at age 70? This question differs from (2) because the correlation coefficient that is used to investigate this issue ignores any shift in the mean level of the trait over time.
4. How much do specific life events (e.g., psychotherapy, an enhanced early educational environment, experience of war) alter personality and/or abilities?
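The distinction between issues (2) and (3) in the list above can be illustrated numerically. The Python sketch below uses invented extraversion scores for five people tested at two ages; the data, the variable names and the helper function are all hypothetical. In the first scenario everyone's score falls by the same amount, so the mean level changes but the correlation (stability) is perfect. In the second the mean is unchanged but people swap places, so stability is poor.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(ss_x * ss_y)


def mean(scores):
    return sum(scores) / len(scores)


# Hypothetical extraversion scores for the same five people at age 20.
age_20 = [12, 15, 18, 21, 24]

# Scenario A: everyone scores 4 points lower at age 70 (mean-level change only).
age_70_a = [score - 4 for score in age_20]

# Scenario B: the same set of scores, but people have changed places.
age_70_b = [24, 12, 21, 15, 18]

for label, later in [("A", age_70_a), ("B", age_70_b)]:
    change = mean(later) - mean(age_20)
    stability = pearson_r(age_20, later)
    print(f"Scenario {label}: mean change = {change:+.1f}, stability r = {stability:+.2f}")
```

Scenario A gives a mean change of -4 with r = +1.00, whereas Scenario B gives no mean change with r = -0.30, which is why the two questions need different research designs and statistics.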
To further complicate things, experiments need to be designed carefully. They must avoid confusing genuine changes in scores within individuals with differences in scores between generations, and they must avoid administering an unreliable test on two or more occasions and then incorrectly inferring that personality or ability has changed over time.
Self-assessment question 14.1
What sorts of experiment might you design to determine whether a trait changes in the four ways just listed?
There is also the question of whether a change in a test score necessarily implies that the underlying trait has changed. For example, suppose that a group of children was coached intensively on the sorts of problem used in ability tests as described by Hayes et al. (2015): putting sequences of pictures in the correct order, practising defining words for hours and generally honing the specific skills required to perform well on some ability tests. They are then found to perform better than a control group when given an ability test. Does this imply that they are more intelligent? It probably does not, as the ability test is only a valid measure of g when used with naive participants. The enhanced performance is likely to be specific to the sorts of items on which they have been trained and may not generalise even to other IQ tests, let alone to real-world performance (it may not indicate that the individual will be well suited for a demanding college course or will perform well in a job requiring the exercise of general intelligence).
Similar issues may affect personality tests. Society’s values change over time, so the proportion of people agreeing that they believe in marriage (for example) would probably be less than it was a generation ago, making people appear more unconventional. In addition, some words used in personality tests (e.g., ‘wicked’) may have changed their meaning
substantially. It is possible that people will appear to be more unconventional, extraverted or whatever than their parents just because of this.
Changes in lifestyles may also affect performance on ability tests. As a result of computer games, children now have far more opportunity to hone their spatial skills and their hand–eye coordination than they did a generation ago. Might this accidental training artificially inflate their scores on some IQ tests? Although it sounds plausible, this seems not to be the case; playing computer games has a ‘weak to non-existent’ influence on cognitive abilities (Gnambs and Appel, 2017).
The reason why these issues can be controversial centres on the nature–nurture argument to be outlined in Chapter 15. It is sometimes asserted that because a personality or ability trait has a strong genetic component, this makes it somehow immutable and hence interventions (such as enriching the environment or improving thinking skills of disadvantaged children) are unlikely to be effective. This argument is misguided for three reasons. First, I know of no personality or ability trait that has a near total genetic component: in every case, environmental influences, either within or from outside the childhood family, will have some influence on levels of the trait. Second, genes cannot express themselves as behaviours unless they are given a particular environment: for example, the gene which causes the illness phenylketonuria cannot express itself (lead to the disease) unless patients eat or drink a source of phenylalanine. A child whose genetic makeup means that they have an above average predisposition to excel at music is unlikely to do so if reared in an environment where this talent is neither recognised nor developed. Third, it is easy to forget that estimates of heritabilities refer to the population being studied, not to an individual. Just because a trait has a heritability of 0.5 and the common (family) environment has no influence at all on its level in adulthood (e.g., general intelligence), it does not follow that if one takes a particular child and raises it in an enriched (or an impoverished) environment then this will not affect the child’s ultimate level of cognitive ability. It most probably will: the genetic estimates can only speak about the likelihood of what will happen to children given the sorts of environment typically experienced within a particular culture.
Ability change
Do abilities change from generation to generation?
Intelligence tests which are designed for the assessment of individuals need to be carefully normed, as described in Chapter 3. That is, they are given to a large sample which is carefully chosen to be representative of the population of the country in terms of age distribution, gender distribution, ethnicity, socio-economic status, educational background, etc. The distribution of scores is published in the test manual, and is used to convert a person’s score into an IQ as described in Chapter 12. There is excellent evidence that performance on tests measuring fluid intelligence improved from generation to generation in the 20th century (Flynn, 2018). Several sources of evidence show this.
• When the same people are given two versions of an intelligence test on the same occasion (one using old norms, the other using new norms to calculate IQ), their IQ score on the old test is almost always better than their IQ on the new test. This shows that the standard has changed, and that people have got better at solving IQ problems.
• When large organisations (such as the military) give the same test to recruits year after year without re-norming, they notice that the number of people with high IQs seems to rise each year. This too suggests that people have got better at solving these problems.
• The tables of norms used to convert children’s scores into IQs change over time. An old version of a test might show that an 11 year old who gets 48 items correct has an IQ of 100; when the same test is re-normed some years later, a score of 48 might correspond to an IQ of only 93, which once again shows that levels of intelligence have improved over time.
Of course, there are plenty of alternative explanations. The military might simply attract more intelligent applicants as time goes by. However, the same effect was found in countries which have compulsory military service, so this is not the reason. Likewise it is possible that children simply mature earlier, which might be the reason why today’s 11 year old performs better than an 11 year old from 30 years ago. This cannot be the reason either, as the same effect is found when adults are tested. Is it possible that the test relies on school learning, with plenty of items measuring vocabulary, mathematics, etc.? If so, the increase in test performance might just arise because children are receiving a better education nowadays. However, the same effect is found when tests such as Raven’s matrices are used (see Chapter 12), and as these do not involve language, mathematics or anything else which is taught in school, this cannot be the explanation. Indeed the gain in IQ is larger for tests which measure fluid intelligence such as the Raven than for tests which also rely on knowledge and hence measure crystallised intelligence to some extent (Flynn, 1999).
This increase in performance from generation to generation is usually known as the Flynn effect, after James Flynn, a New Zealand political scientist who reported the phenomenon (Flynn, 1984) and analysed it in detail (Flynn, 1987, 1999), though the issue had previously been noticed in the 1940s (Deary, 2000). It means that by today’s standards, our parents and grandparents were (on average) far less intelligent than we are. This is shown in Figure 14.1.
The top of the graph shows the average IQ (always 100) in the year in which a test was most recently normed, for various countries. The lines connect to occasions when the test was previously normed; the y-value shows what the average score (which was an IQ of 100 at the time) would correspond to by the standard of the most recent norming. For example, one test was last normed in the UK in 1994. It was previously normed in 1944. The average score (which corresponded to an IQ of 100 in 1944) only corresponds to an IQ of 73 by the standards of 1994. It can be seen that the effect is both substantial and consistent across several countries. As Flynn observes:
Take a woman with an IQ of 110 who taught for 30 years in the Netherlands [from 1952 to 1982]. As [Figure 14.1] shows, during that period, the Dutch gained 21 IQ points on Ravens. So in 1952, she was brighter than 75% of her senior students; by 1967, they were her equals; by 1982, 75% of them were brighter than she was.
The finding that the Flynn effect is associated with abstract reasoning (tests measuring Gf rather than Gc) suggests that it is not a simple consequence of better education or access to knowledge (e.g., through cheap paperback books, radio, television and the internet). Furthermore, the effect is found in both prosperous and developing countries, although the increase in IQ is more marked in developing countries (Brouwers et al., 2009); these authors show that this is not entirely due to an improving provision of education in developing countries. Several other possible explanations have been suggested. Improved nutrition seems an obvious possibility, but these authors also show that attempts to boost IQ by improving nutrition in Western countries which show the Flynn effect have proved largely ineffective. Might healthier diets or reduced rates of alcohol consumption in pregnancy be the reason? It is hard to tell.
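Flynn’s example of the Dutch teacher, quoted above, can be checked with a few lines of arithmetic. The Python sketch below assumes that IQ scores are normally distributed with a standard deviation of 15 and that the 21-point gain accrued evenly over the 30 years; neither assumption is stated in the quotation, and the function name is illustrative only.

```python
from statistics import NormalDist

SD = 15           # assumed standard deviation of IQ scores
TOTAL_GAIN = 21   # Dutch gain on Raven's between 1952 and 1982 (from the quotation)
START_YEAR, END_YEAR = 1952, 1982


def percent_of_students_brighter(teacher_iq, gain_since_norming, sd=SD):
    """Percentage of current students who outscore the teacher.

    All scores are expressed on the original (1952) norms, so the student
    average has drifted upwards to 100 + gain_since_norming.
    """
    students = NormalDist(mu=100 + gain_since_norming, sigma=sd)
    return 100 * (1 - students.cdf(teacher_iq))


for year in (1952, 1967, 1982):
    # Assumption: the gain accumulates linearly across the 30 years.
    gain = TOTAL_GAIN * (year - START_YEAR) / (END_YEAR - START_YEAR)
    brighter = percent_of_students_brighter(110, gain)
    print(f"{year}: student mean on 1952 norms = {100 + gain:.1f}, "
          f"{brighter:.0f}% of students brighter than the teacher")
```

Under these assumptions roughly a quarter of students outscore the teacher in 1952, about half in 1967 and a little over three-quarters in 1982, which agrees with the figures in the quotation.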
Could increased familiarity with IQ testing lead to a better grasp of how to solve problems in IQ tests? Again, how can one check this? Or might it just be that modern life gives us far more practice at complex problem solving than our parents or grandparents enjoyed? Perhaps we just get more practice at abstract reasoning when trying to work out how to find an effective backup strategy for the data on our computers, why our car will not connect to the household wireless network, or which is the ‘best’ air fare when weighing up price and time. It sounds plausible, and it would probably happen worldwide, but how can one tell whether technological demands lead to better scores on ability tests?
Figure 14.1 Intelligence test scores in five countries as a function of year of birth (from Flynn, 1999)
The Flynn effect seems to be widespread. For example, Ronnlund et al. (2013) found an increase in the fluid ability scores of over a million Swedish recruits between 1970 and 1993, and increases have also been found in Argentina between 1964 and 1998 (Flynn and Rossi-Casé, 2012). The big question is whether the changes in scores on IQ tests reflect a real upwards shift in intelligence (g) over the years, or whether people are just acquiring the skills needed to solve intelligence test puzzles. If it is the former, then you would expect the higher IQ scores to result in better performance at tasks which are known to reflect general intelligence, for example reaction time or inspection-time tasks, or school performance. If the latter, we would not expect higher scores on the tests to result in better performance in other areas;
people may just be learning effective strategies for approaching novel problems, which generalise to the types of problem in IQ tests. This sounds possible, and it would certainly explain why the Flynn effect is greater for tests measuring fluid, rather than crystallised, ability.
It should also be possible to tell whether the Flynn effect represents a real upwards shift in intelligence by looking at the predictive validity of IQ tests. For example, suppose data are available to predict job or school performance from IQ scores on two occasions, some years apart. Are the equations identical? If so, this would indicate that the improved IQ scores on the second occasion are mirrored by better academic or job performance, and so the increases in IQ scores represent a real upward shift in general intelligence. If IQ predicts a lower level of job performance on the second occasion, this would suggest that the Flynn effect affects performance at IQ tests but not general intelligence. Unfortunately, it is hard to tell whether children perform better nowadays, as syllabi and teaching methods will have changed, as will the educational outcomes which are assessed. It is difficult to think of a way in which modern-day educational performance can be directly compared with that of some years ago. It is likewise difficult to think of any jobs which have remained pretty much unchanged for a period of some years, though if any could be identified, this would be a useful way of testing what the Flynn effect really indicates.
However, it should be possible to check whether reaction time and inspection time have improved, as reaction time has been studied for a century, and inspection time for over 40 years. Nettelbeck and Wilson (2004) tested children from the same school in 1981 and 2001, using the same IQ tests and the same equipment to measure inspection time. They found that scores on the intelligence tests increased (the Flynn effect).
Crucially, there was no hint of any change in the average inspection time between 1981 and 2001. The cohort tested in 2001 did not seem to process information faster. This casts some doubt on whether the change in test scores reflects differences in IQ, or just better test-taking skills.
The picture becomes even more complex when reaction time is considered, as it seems likely that reaction times have slowed over the years, rather than speeding up as the Flynn effect might suggest. Evidence for this comes from a meta-analysis by Silverman (2010); to say this work is controversial is an understatement, for there are formidable methodological problems, including a lack of clarity about the apparatus used in the early (19th century) studies, the experimental procedures and the representativeness of the samples (in terms of age, gender and intelligence). We also noted in Chapter 13 that measuring average reaction time is problematical, as it is necessary to ‘clean’ the data to remove extremely long reaction times, and the way in which this is done can affect the mean reaction time which is reported. A meta-analysis by Woodley et al. (2013) tries to correct for these problems, and their graph of estimated reaction time against year of testing is shown in Figure 14.2. This still shows that reaction times seem to be getting gradually longer, an effect which is statistically highly significant. Other approaches find similar results. For example, Madison et al. (2016) tested 7000 Swedish twins using the same apparatus each time (hence overcoming one of the major problems of the meta-analyses) and found reaction times had increased over a 27-year period. The question of why reaction time seems to be increasing is fascinating, but for the present all we need to note is that reaction times do not seem to be speeding up, as the Flynn effect would suggest they should.
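The point that the method of ‘cleaning’ reaction-time data affects the reported mean can be illustrated with a small sketch. The reaction times below are invented for illustration, and the three cut-offs are arbitrary choices rather than values taken from any of the studies cited.

```python
from statistics import mean

# Hypothetical reaction times (milliseconds) for one participant; the two
# very long values stand in for lapses of attention.
reaction_times_ms = [212, 230, 198, 245, 221, 207, 1450, 236, 219, 980, 226, 240]


def trimmed_mean(times, cutoff_ms):
    """Mean of the trials at or below cutoff_ms; slower trials are discarded."""
    kept = [t for t in times if t <= cutoff_ms]
    return mean(kept)


for cutoff in (2000, 1000, 500):
    result = trimmed_mean(reaction_times_ms, cutoff)
    print(f"cut-off {cutoff} ms: mean = {result:.1f} ms")
```

The same twelve trials give three quite different means depending on the cut-off, which is why studies that clean their data in different ways are hard to compare directly.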
It does not seem that either inspection time or reaction time has speeded up, and so the Flynn effect probably does not indicate a genuine shift in intelligence (g). As Flynn (2018) argues, the (largely unknown) environmental influences which boost scores on tests do not seem to have boosted levels of other related variables, such as inspection time. We are just becoming more skilled at taking tests.
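The predictive-validity check discussed earlier in this section amounts to fitting the same regression equation on two occasions and comparing the results. The sketch below uses invented data in which every IQ score on the second occasion is 10 points higher while performance is unchanged, which is the pattern expected if only test-taking skill has improved. The variable names and the 10-point shift are assumptions made for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(slope, intercept, iq):
    """Predicted performance for a given IQ score."""
    return intercept + slope * iq


# Invented data: IQ and a performance rating on the first occasion.
iq_first = [85, 92, 100, 104, 110, 118, 125]
performance_first = [41, 46, 50, 53, 55, 60, 63]

# Second occasion: test scores have risen by 10 points, performance has not.
iq_second = [iq + 10 for iq in iq_first]
performance_second = list(performance_first)

slope1, intercept1 = fit_line(iq_first, performance_first)
slope2, intercept2 = fit_line(iq_second, performance_second)

print(f"first occasion:  performance = {intercept1:.2f} + {slope1:.3f} * IQ")
print(f"second occasion: performance = {intercept2:.2f} + {slope2:.3f} * IQ")
print(f"predicted performance at IQ 100: "
      f"{predict(slope1, intercept1, 100):.1f} then {predict(slope2, intercept2, 100):.1f}")
```

Here the slopes agree but the intercept falls, so an IQ of 100 predicts lower performance on the second occasion. Had performance risen in step with the test scores, the two equations would have been identical.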
Figure 14.2 Estimated reaction time as a function of year of testing (from Woodley et al., 2013). Note: small dots indicate data from small samples (N