English Pages 354 [363] Year 1988
Multivariate Exploratory Data Analysis
The world is not to be narrowed till it will go into the understanding... but the understanding is to be expanded and opened till it can take in the image of the world. —Francis Bacon, The Parasceve
Multivariate Exploratory Data Analysis
A Perspective on Exploratory Factor Analysis
Allen Yates
State University of New York Press
Published by State University of New York Press, Albany
© 1987 State University of New York
All rights reserved
Printed in the United States of America
No part of this book may be used or reproduced in any manner whatsoever without written permission except in the case of brief quotations embodied in critical articles and reviews.
For information, address State University of New York Press, State University Plaza, Albany, N.Y. 12246
Library of Congress Cataloging-in-Publication Data
Yates, Allen, 1943-
Multivariate exploratory data analysis.
Bibliography: p.
Includes index.
1. Factor analysis. 2. Psychometrics. I. Title.
QA278.5.Y38 1987 86-30207
ISBN 0887065384
ISBN 0887065392 (pbk.)
10 9 8 7 6 5 4 3 2 1
To R. DeWAYNE BADGER for his patience, care, and support, which gave me the courage and strength to undertake completion of the book.
and to ROBERT M. PRUZEK for his friendship, his faith in the book, and his willingness always to lend a helping hand.
CONTENTS
Tables  xi
Figures  xiii
Preface  xv
Prologue  1
Part I. Introductory Algebra, Geometry, and Terminology
Chapter 1. The Factor Analysis Model
  Factor Analysis as a Special Case of Multivariate Multiple Regression  9
  The Geometry of Factor Analysis in the Case of an Ideal Simple Structure  20
Part II. Transformation of Primary Factors on the Basis of Thurstone's Original Mathematical Criterion for a Best-Fitting Simple Structure
Chapter 2. Simple Structure Transformation
  Thurstone's Mathematical Criterion for a Best-Fitting Simple Structure  31
  Direct Geomin Transformation for Primary Factor Pattern Simplification  44
  The Geometry of Primary Factor Rotation  49
  The Algebra of Factor Pair Rotation  52
  Solution for the Optimum Angle of Rotation  56
Chapter 3. Practical Application of Direct Geomin Transformation
  A Comparison of Direct Geomin and Direct Quartimin  61
  Hard vs. Soft Squeeze Direct Geomin Transformation Strategies  67
  Resistant Fitting Aspects of Direct Geomin Transformation  74
  Sensitivity of Direct Geomin to Characteristics of the Starting Configuration  77
Part III. Bounding the Test Vector Configuration with Hyperplanes Through Simple Structure Transformation
Chapter 4. Simple Structure Reconsidered
  Beyond Factorial Purity and Ease of Interpretation  81
  The Positive Manifold Assumption  87
  A Source of Prior Information About Test Vector Configuration Location  98
Chapter 5. Bounding Hyperplane Simple Structure Transformation
  Toward Bounding the Test Vector Configuration with Hyperplanes in the Identification of Primary Factors  105
  Distinguishability Weighting  107
  Complexity Weighting and the Associated Factor Contribution Matrix  113
  Factor Size Scaling  119
  Direct Geoplane Transformation  121
Chapter 6. A Global Strategy for Bounding Hyperplane Simple Structure Transformation
  Locating the Central Axis of a Polyhedral Convex Cone of Test Vectors  133
  Transformation to Orthogonal Bounds  138
  Iterative Recomputation of Distinguishability and Complexity Weights  141
  Iterative Reweighting and Factor Size Scaling  142
  An Orthogonal Initial Configuration for Direct Geoplane Transformation  146
Chapter 7. Factorial Invariance and the Global Direct Geoplane Transformation Strategy
  Analysis of Thurstone's Invariant 26-Variable Box Problem  153
  Assessing Factorial Invariance Through Extended Vectors Projection of Direct Geoplane Factors  165
Part IV. Distinguishing Invariant Major Common Factors From Unstable Minor Factors of Group Overlap
Chapter 8. Alternative Approaches to the Problem of Extracting Common Factors
  The Problem of Minor and Doublet Factors in Fitting the Common Factor Model  187
  The Classical Statistical Approach to Fitting the Factor Analysis Model  189
  A Perspective on the Maximum Likelihood Method of Factor Extraction  193
  Convergence of Simple Iterative Techniques for Maximum Likelihood Factor Extraction  198
  Alternatives to the Classical Confirmatory Approach to Fitting the Unrestricted Common Factor Model  209
  Gauss-Seidel and Multidimensional Sectioning Methods of Fitting the Factor Analysis Model  213
  Collinearity-Resistant Fitting of the Common Factor Model in the Presence of Local Dependency Outliers  224
Chapter 9. Resistant Fitting and the Number of Factors Problem: Toward Fully Exploratory Factor Analysis
  Fitting Lord's Highly Clustered Data  233
  Practical Comparison of Collinearity-Resistant Fitting and Direct Geoplane Transformation with Available Approaches to Unrestricted Factor Analysis  252
  Achieving Solution Invariance Through Dimensionality Reduction in Fully Exploratory Factor Analysis  266
  An Alternative to Second Order Factoring  274
  Accommodating Observed Data of High Dimensionality  277
  Resolution of the Psychometric Vs. Statistical Conflict in Fitting the Factor Analysis Model  278
Part V. Fully Exploratory Factor Analysis
Chapter 10. Historical Perspective, Practical Application, and Future Possibilities
  A Synthesis of the Classical Common Factor Theories of Spearman and Thurstone via Multivariate Exploratory Data Analysis  285
  A Second Look at the Primary Mental Abilities  293
  Toward an Understanding of Higher Integrative Mental Functioning  314
Epilogue: Release From the Burden of Confirmation in Fully Exploratory Multivariate Data Analysis  323
Appendix: Summary of Computational Steps  333
Bibliography  337
Index  347
TABLES
1  Factor Analysis of Thurstone's 26-Variable Box Problem  23
2  Five Direct Quartimin and Direct Geomin Transformed Minimum Residuals Factors of Lord's [1956] Data  64
3  Behavior of Direct Geomin Transformation Under Hard and Soft Squeeze Conditions  71
4  Direct Geomin Local Minimum Solution for Thurstone's Box Problem  100
5  Direct Geoplane Primary Factor Solutions for Three Invariant Box Problem Factors  156
6  Direct Geoplane Primary Factor Solutions for Four Box Problem Factors  160
7  Five Direct Geoplane Transformed Collinearity-Resistant Factors of Lord's Data  236
8  Local Dependency Outlier Residuals After Collinearity-Resistant Fitting of Three Factors to Lord's Data  240
9  Three Direct Geoplane Transformed Collinearity-Resistant Factors of Lord's Data  244
10  Three Factor Residuals for Lord's Data: Maximum Likelihood Below Diagonal, Alpha Above Diagonal  256
11  Three Direct Geoplane Transformed Maximum Likelihood Factors of Lord's Data  261
12  Three Direct Geoplane Transformed Alpha Factors of Lord's Data  262
13  Three Direct Quartimin Transformed Collinearity-Resistant Factors of Lord's Data  265
14  Four Direct Geoplane Transformed Collinearity-Resistant Factors of Lord's Data  268
15  Three Direct Geoplane Transformed Reciprocal-Uniqueness-Weighted Collinearity-Resistant Factors of Lord's Data  280
16  Three Direct Geoplane Transformed Reciprocal-Uniqueness-Weighted Collinearity-Resistant Factors of the Thurstone and Thurstone FSI Battery  296
17  Linkages Among Word-Fluency Tests in the FSI Battery  301
18  Four Direct Geoplane Transformed Reciprocal-Uniqueness-Weighted Collinearity-Resistant Factors of the Thurstone and Thurstone FSI Battery  307
19  Three Direct Geoplane Transformed Reciprocal-Uniqueness-Weighted Collinearity-Resistant Factors of the Thurstone Primary Mental Abilities Battery  315
FIGURES
1  The Geometry of Simple Structure in Three Dimensions  25
2  Rotation of Primary Factor Axis  51
3  Hierarchical Cluster Analysis of Lord's Data  62
4  Comparison of Direct Geomin and Direct Quartimin Factor Pattern Distributions  67
5  Behavior of Direct Geomin Transformation Under Hard and Soft Squeeze Conditions  69
6  Comparison of Configurations Among the Common Parts of Test Vectors in Three Possible Situations  91
7  Extended Vectors Plot for Thurstone's Analysis of Brigham's Data  167
8a-d  Extended Vectors Plots of the Table 5 Three-Factor Box Problem Solutions  169-72
9a-d  Extended Vectors Plots for a Table 6 Four-Factor Box Problem Solution  172-76
10a-d  Extended Vectors Plots for a Table 6 Four-Factor Box Problem Solution  177-80
11  Extended Vectors Plot of Three Direct Geoplane Transformed Collinearity-Resistant Factors of Lord's Data  245
12  Extended Vectors Plot of Three Direct Geoplane Transformed Maximum Likelihood Factors of Lord's Data  263
13  Extended Vectors Plot of Three Direct Geoplane Transformed Alpha Factors of Lord's Data  264
14  Extended Vectors Plot of Three Direct Quartimin Transformed Collinearity-Resistant Factors of Lord's Data  266
15 to 18  Extended Vectors Plots for a Four-Factor Direct Geoplane Transformed Collinearity-Resistant Solution for Lord's Data  270, 271, 272-73
19a-b  Extended Vectors Plots of the Table 16 Three-Factor Fully Exploratory Factor Analysis Solution for the Thurstone and Thurstone FSI Battery  298 and 299
20a-d  Extended Vectors Plots for the Table 18 Four-Factor Fully Exploratory Factor Analysis Solution for the Thurstone and Thurstone FSI Battery  309-12
21  Smallest Space Analysis of Selected FSI Tests [After Guttman, 1965, p. 32, Fig. 1]  313
22  Extended Vectors Plot for the Table 19 Three-Factor Primary Mental Abilities Fully Exploratory Factor Analysis Solution  317
PREFACE
The research reported here began nearly two decades ago, when I was a graduate student in the Psychology Department at Bowling Green State University. At that time I decided that something had to be done to make exploratory factor analysis live up to its original promise as a route toward routine discovery of "invariant common factors" that could be taken as hypothetical determinants of observed covariation within any roughly homogeneous behavioral domain. My research on factor analysis was encouraged by my principal advisors, J.P. Scott and Fumiko Samejima, despite little overlap with their own respective areas of interest. It continued in the L. L. Thurstone Psychometric Laboratory at the University of North Carolina, where access to literature on multidimensional scaling broadened my interests and provided a unique perspective on the exploratory analysis of multivariate association. It was at Educational Testing Service, however, that my growing involvement in the application of factor analysis, cluster analysis, and multidimensional scaling coalesced into what is presented here. The catalyst for this synthesis was the exposure to exploratory data analysis methods I received while in Princeton. The enthusiasm of my statistician colleagues for resistant fitting methods, coupled with their skepticism about psychometrics in general (and factor analysis in particular), challenged me to reexamine many accepted features of the classical common factor model. It is fortunate that substantive researchers at E.T.S. and elsewhere were willing to let me apply new methods of factor analysis to their research data. This practical experience helped me to see how far factor analysis had drifted from its original goal of finding underlying causal explanations toward the mere summary of superficial relationships.
My repeated efforts to explain the alternatives to cluster-oriented factor rotation to colleagues only made it obvious to me that the representation of superficial relationships in a final solution could be avoided only if one were to deliberately overlook them from the very beginning, i.e., to accurately fit only the weaker, more diffuse (but more general, more stable, and more invariant) coplanarity relationships among manifest variables while ignoring the stronger but less informative associations that merely reflect superficial collinearities. I soon became convinced that the popular objectives of factor analysis had to be turned upside down right at the stage where the model was being fit to the data, not just in the subsequent stage of simple structure transformation. Eventually, I came to realize that this reversal in perspectives had been at the heart of the Spearman-Thurstone controversy all along and still polarized British and American factor analysts. Some aspects of the research on multivariate exploratory data analysis reported here have been discussed at meetings of the Psychometric Society, and Part II grew out of my dissertation. Readers who are being exposed to fully exploratory factor analysis for the first time are encouraged to read the Prologue and Part V if the following overview does not motivate them to peruse the entire volume:
It can be argued that the classical factor analysis model is basically confirmatory in conception. Its fully exploratory use might therefore be facilitated if the model could be "weakened" somewhat through the incorporation of resistant fitting principles developed within the context of modern exploratory data analysis. The impact of such an innovation on factor analytic practice should prove comparable to the profound effect that Shepard's successful weakening of Torgerson's classical multidimensional scaling model had on the analysis of similarities data when it gave rise to nonmetric scaling methods. Indeed, the outcome of "fully exploratory factor analysis" resembles that of nonmetric multidimensional scaling in that the observed data can generally be represented within an easily visualized space of low dimensionality.
However, the solution retains all the essential features of the strong functional common factor model that facilitate identification of hypothetical determinants of observed covariation. Moreover, the latent common factors so identified are highly likely to remain invariant with respect to changes in test battery composition, because the contaminating influence of "methods" factors tends to be routinely eliminated (in the form of local dependency outlier residuals) by the collinearity-resistant fitting procedure used. Invariant location of the primary factors retained is further assured through their placement at the intersections of test-vector-configuration-bounding hyperplanes in such a way that Thurstone's original mathematical criterion for a best-fitting simple structure is satisfied. Fully exploratory factor analysis holds promise as a route toward synthesis of widely divergent views about the structure of human abilities, as illustrated through reanalysis of Thurstone's Primary Mental Abilities data in Part V.
In closing, I would like to thank the many individuals who in one way or another made it possible for me to write this volume. They are too numerous to mention individually, but I am highly indebted to the following for reviewing the manuscript, in whole or in part: an anonymous reviewer, Peter Bentler, Michael Browne, Bert Green, Jack McArdle, Roderick McDonald, Bob Pruzek, Joseph Royce, Spencer Swinton, Yoshio Takane, and Bary Wingersky.
I also wish to thank Magdalen Braden for copyediting the final manuscript and Byerly Woodward for copyediting portions of the first draft. Of course, the individuals listed are not responsible for any problems that remain, since I used my own judgment in making suggested changes.
Allen Yates
San Antonio, TX
January 31, 1987
PROLOGUE
Early in this century the development of factor analysis procedures by Spearman [1904], Thurstone [1935], and many others set the stage for a good deal of controversy about alternative methods of analyzing the association commonly observed among psychological variables. Although some controversy persists to this day, it is now generally agreed that different factor analytic methods can be profitably applied to the same set of data [cf. Harris, 1967]. Dozens of methods are now available for estimating parameters in the factor analysis model, but it is difficult to determine the impact of a particular methodological choice upon the scientific implications of a given analysis. Instead of asserting that one method is generally optimal, it is commonplace to emphasize the basic similarities between different procedures, to regard them as alternative means for achieving the same scientific objectives, and to make an arbitrary choice among methods on the basis of computer program availability or familiarity. Unfortunately, placing too much emphasis upon similarities among different techniques for exploratory factor analysis can obscure the possibility that different computational methods are best suited for the attainment of quite different scientific objectives. Indeed, one aim of this volume is to demonstrate that rather minor modification of certain mathematical criteria used in exploratory factor analysis can have an important impact upon the scientific utility of the outcome. A choice among alternative methods of factor analysis can be made on computational, statistical, and scientific (substantive-theoretical) grounds. Early workers were preoccupied with computational issues and with the scientific implications of various methods. The methods developed at that time (e.g., centroid extraction, graphical rotation) have been gradually replaced by modern computational routines made feasible by computer technology and advances in numerical analysis.
Similarly, the early scientific issues raised by general factor, bifactor, group factor, and hierarchical factor enthusiasts have been overshadowed by general acceptance of Thurstone's multiple factor model [Solomon, 1969]. With the diminished emphasis on computational and scientific issues in contemporary factor analysis, both statistical and psychometric issues have come to the foreground. Treating factor analysis as a standard statistical problem of estimating parameters in a mathematical model for the population on the basis of sample data, given the assumption of multivariate normality, has led to the maximum likelihood method [Lawley, 1940, 1942, 1943; Jöreskog, 1967]. Similarly, psychometric emphasis upon the reliability of linear composites has led to alpha factor analysis [Kaiser & Caffrey, 1965], while other work on prediction within a domain of variation has led to image analysis methods [Guttman, 1953]. In terms of statistical estimation, measurement, and prediction, modern factor analysis appears to be highly developed and well understood (see, for example, Mulaik [1972]). With ready access to electronic computing, the development of modern factor analytic theory and methodology has proceeded so rapidly and successfully that it is tempting to believe that few major problems remain to be solved. Nevertheless, factor analysis is held in low regard as a method for exploratory data analysis by many statisticians and psychologists [cf. Overall, 1964; Armstrong, 1967; Lykken, 1971; Mukherjee, 1973]. This disenchantment with factor analysis as a tool for exploratory research has contributed to the development and acceptance of alternative methods for the analysis of association, such as cluster analysis and multidimensional scaling. With the declining popularity of exploratory factor analysis, confirmatory alternatives such as path analysis have also been accepted with enthusiasm. The major problem with exploratory factor analysis seems to be that the technique promises to identify routinely the causal determinants of observed behavioral variation.
Whereas most alternative exploratory approaches to the analysis of association simply provide a compact and convenient way of describing manifest relationships among observed variables, factor analysis is designed to reveal the relationships of observed variables to latent, unobserved "factors" that may be regarded as hypothetical determinants or causes of manifest variation. In theory, at least, factor analysis goes beyond summarizing relationships among observed variables to reveal a functional model by which a small number of latent variables could have given rise to a larger set of observed phenomena. Unfortunately, discovering a unique, well-defined, and scientifically meaningful latent influence model that accounts for a given set of observed data can be a monumental task. The objectives of factor analysis rest as much on scientific considerations as on the logic of statistical inference. They derive from the philosophical notion that relatively uncomplicated hypothetical influence models are the most promising route to useful scientific explanation, at least at the initial, exploratory phase of research (the law of parsimony, Occam's razor,* Newton's natura est simplex). As long as the basic factor analysis model of a few latent variables influencing manifest variables in a simple linear way is a reasonable formulation for a given domain, there is some hope that its exploratory use will lead to theoretical insight.
Three major problems are typically encountered in the conduct of exploratory factor analysis: determining the number of factors shared in common by variables in the domain being analyzed, fitting a common factor model of this dimensionality to the observed data, and, finally, transforming the outcome to a "scientifically meaningful" orientation. It is not uncommon for users of exploratory factor analysis to undertake these three tasks as though they were completely unrelated to one another rather than complementary steps in the attainment of one overriding scientific objective. Exploratory factor analysis is apt to be scientifically fruitless, however, unless all three phases of its conduct are jointly designed to bring about the same objective. The overriding objective of solution invariance with respect to domain sampling idiosyncrasies is emphasized in the approach to exploratory factor analysis presented in this volume. Because exploratory factor analysis is usually based upon a relatively small battery of tests and a rather large sample of individuals, the sensitivity of estimated parameters to statistical sampling fluctuations is a minor consideration in comparison to the sensitivity of the overall solution to changes in test battery composition. Each phase of exploratory factor analysis must therefore be evaluated more in relation to the psychometric issue of factorial invariance than to the statistical issue of inference to a population.
The transformation (rotation) phase of exploratory factor analysis is undertaken in an attempt to arrive at a simple, albeit substantively meaningful and theoretically informative, linear influence model relating manifest variables to their hypothetical determinants, the factors. The basis of this search for "simple structure" is a scientific belief in the simplicity and parsimony of natural processes. The discovery of simple structure in psychological data, for example, is theoretically possible as long as unique behavioral manifestations are not generally produced by exactly the same combination of influences and as long as few behavioral outcomes are influenced by every hypothetical determinant under consideration. It is not unreasonable to expect such general conditions to be satisfied within the behavioral domain, but they are extremely difficult to translate into a specific set of restrictions on the parameters within a mathematical model. They do not lead in any obvious way to the degree of model specification required for parameter estimation as it is usually accomplished in confirmatory data analysis. The difficult and seemingly ill-defined problem of factor transformation therefore remains paramount in exploratory analyses, despite the high degree of statistical or psychometric sophistication that can now be achieved in the initial phase of factor extraction (fitting the basic, arbitrarily identified model to observed data).
* According to Russell [1945], William of Occam stated in the fourteenth century, "it is vain to do with more what can be done with fewer." This can be taken to mean that no hypothetical entity should be assumed to be involved if it is not actually required in the scientific interpretation of an observed phenomenon.
In Part II of this volume factor rotation is discussed from the perspective of satisfying Thurstone's original mathematical criterion for a best-fitting simple structure [cf. Thurstone, 1935]. In Part III the objectives of factor transformation are reassessed in the light of a basic assumption about the typical mode of joint action of latent determinants within the domain of natural behavioral variation, an assumption that implies both factorial invariance and simple structure. Although sophisticated statistical and psychometric procedures for fitting the common factor model to observed data have been developed, so far no method of factor extraction has been based upon the joint assumption that observations are sampled from a population of individuals and variables are sampled from a domain of content. Hence, a dilemma exists regarding the proper choice between alternative factor extraction methods. In Part IV we present a possible resolution of this conflict between classical statistical and psychometric approaches to fitting the common factor model. The proposed solution makes use of resistant fitting principles developed within the context of exploratory data analysis [cf. Mosteller and Tukey, 1977]. It is specifically designed to bring about factorial invariance at the stage of fitting the common factor model to observed data.
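The simple structure conditions described above are often checked in practice by counting, for each factor, how many pattern coefficients fall within a narrow band around zero (the factor's "hyperplane"), and by noting which variables have no near-zero coefficient at all. The Python sketch below is one minimal way to do this; the function names, the example pattern matrix, and the tolerance of 0.10 are illustrative assumptions, not values or procedures taken from this volume.

```python
import numpy as np

# Illustrative hyperplane tolerance: |b| < TOL counts as "in the hyperplane".
# The value 0.10 is an assumption chosen for this sketch only.
TOL = 0.10


def hyperplane_count(pattern: np.ndarray, tol: float = TOL) -> np.ndarray:
    """Number of near-zero coefficients in each column (factor) of a pattern matrix."""
    return (np.abs(pattern) < tol).sum(axis=0)


def variables_influenced_by_every_factor(pattern: np.ndarray, tol: float = TOL) -> np.ndarray:
    """Row indices of variables with no near-zero coefficient on any factor.

    Simple structure presumes that few variables fall into this category.
    """
    return np.flatnonzero((np.abs(pattern) >= tol).all(axis=1))


# Hypothetical 6-variable, 3-factor primary factor pattern matrix.
P = np.array([
    [0.70, 0.00, 0.05],
    [0.60, 0.02, 0.00],
    [0.00, 0.80, 0.03],
    [0.04, 0.50, 0.60],
    [0.00, 0.00, 0.70],
    [0.50, 0.40, 0.30],
])

print(hyperplane_count(P))                      # -> [3 3 3]
print(variables_influenced_by_every_factor(P))  # -> [5]
```

The counts depend directly on the chosen tolerance; with a much smaller tolerance only the exact zeros would be counted, which is one reason the width of a hyperplane matters for any criterion built on this idea.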
In addition to the problem of fitting the common factor model to observed data and appropriately transforming the solution thus obtained, there remains the crucial task of determining how many underlying factors are shared by manifest variables within the domain of content analyzed. Both statistical and psychometric solutions to the number of factors problem have been proposed, but these methods usually suggest analyses of different dimensionality for the same set of data. Statistically based estimates of dimensionality tend to depend on sample size, whereas psychometrically based estimates tend to depend on the number of variables included in the analyzed battery. In fact, since both sample size and battery size are arbitrary features of any data set analyzed, they do not reflect the true latent dimensionality of the domain of variation being studied and should not enter into its determination. We see the number of factors problem as but one aspect of the more general problem of making the geometrical distinction between stable (invariant) effects of broad general factors common to the entire domain of variables sampled and unstable (i.e., battery-specific) effects of minor factors of group overlap. This latter distinction is integral to the resistant fitting method proposed in Part IV, so our discussion there concerning factor extraction actually focuses on the number of factors problem as well. Moreover, since effective application of bounding hyperplane simple structure transformation requires that the effects of broad general and invariant common factors be distinguished from those of minor factors of group overlap, the number of factors issue is pivotal to our Part III discussion of rotation strategies. In short, we have proposed a highly integrated overall approach to exploratory factor analysis wherein complementary methods of transformation, fitting observed data, and determining dimensionality have all been designed to bring about the same overriding scientific objective of factorial invariance. In order to arrive at computational techniques that are likely to facilitate the attainment of this objective, both statistical and psychometric considerations are taken into account.
As background for our subsequent discussion of exploratory factor analysis, some conventional notation and terminology related to the factor analysis model are given in Part I. The presentation is at an intermediate mathematical level in that rudimentary matrix algebra is used to relate the factor analysis model to that of multivariate multiple regression, but the more complex algebra of characteristic equations is avoided. Nor is the factor analysis model presented in the most elegant fashion possible. (For a more rigorous definition of the factor analysis model, the reader is referred to Williams [1978].) The aim in Part I is to give only a brief review of theory and an introduction to the system of notation that will be used in subsequent developments.
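To make the multivariate multiple regression reading of the model concrete, the following Python sketch generates standard scores from the matrix form of the model given in Chapter 1, Z' = PF' + UE', and checks two consequences. It assumes, for simplicity, uncorrelated (orthogonal) common factors and mutually uncorrelated unique factors; the primary factors discussed later in the volume may be correlated, and the pattern matrix P and sample size are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 20_000, 2  # individuals, common factors

# Hypothetical n x m pattern matrix P (n = 6 manifest variables).
P = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.7],
    [0.0, 0.6],
    [0.3, 0.5],
])
n = P.shape[0]

# Diagonal unique-factor weights chosen so each manifest variable has unit variance.
u = np.sqrt(1.0 - (P ** 2).sum(axis=1))
U = np.diag(u)

F = rng.standard_normal((N, m))  # common factor scores (N x m), assumed uncorrelated
E = rng.standard_normal((N, n))  # unique factor scores (N x n)

Z = (P @ F.T + U @ E.T).T        # Z' = P F' + U E', transposed back to N x n

# 1. If F were observed, regressing Z on F would recover P
#    (an ordinary multivariate multiple regression).
P_hat = np.linalg.lstsq(F, Z, rcond=None)[0].T

# 2. Under these assumptions the implied correlation matrix is P P' + U^2.
R_model = P @ P.T + U @ U
R_sample = np.corrcoef(Z, rowvar=False)

print(np.round(P_hat, 2))
print(np.abs(R_sample - R_model).max())
```

Because F is never observed in practice, factor analysis has to infer P from the manifest correlations alone; the regression step above is available only in a simulation and serves to show what the pattern coefficients mean.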
In Parts II through V of this volume we have endeavored to present a new perspective on exploratory factor analysis through extensive use of both figural and verbal communication as well as convenient algebraic symbolism. Consequently, much of what we have attempted to communicate can be gleaned even by a reader who is willing to follow only our verbal arguments. Those sections that appear to be too highly saturated with algebra may usually be skipped with impunity. On the other hand, the reader who has good geometrical intuition can avoid becoming ensnared in either the thread of our discourse or the accompanying algebraic sequence simply by focusing upon the figures and their attendant discussion. Of course, the most complete understanding will come to readers who are simultaneously receptive to the figural, symbolic, and semantic aspects of our presentation. It is our feeling that much of the original intuitive geometrical appeal of exploratory factor analysis as a technique of scientific discovery has been diminished over the years by excessive emphasis on the detailed mathematical and statistical aspects of the model. Butler expresses a similar concern as follows:
Although it is in danger of becoming a "has-been" among approaches to multivariate analysis, factor analysis has a power that the statistical approaches simply do not have. I believe that one of the reasons for the decline of factor analysis is that some factor theorists with more mathematical training than the rest of us have been addressing themselves to problems of factor analysis on a technical rather than upon a fundamental level. The result has been an array of models and solutions irrelevant in most cases to the problems and predicaments of those for whom factor analysis is a research tool, not an object of theoretical interest in its own right. The signs of unrest are quite evident. Far too many intelligent students complain that the only basis they have for choosing among the many types of factor analysis and of factor analytic solutions is the prestige of a given theorist or the fact that this or that "maximin" solution has often been used. [1969, pp. 252-253]
Readers who are already very familiar with factor analytic theory are probably algebraically inclined, so it would be wise for them to concentrate on the geometrical ideas discussed in Parts III and IV. For these readers the algebra can be summed up simply by saying that a variation upon criterion (40) on page 46 is minimized in the transformation phase, while a variation upon criterion (189) on page 229 is minimized in the extraction phase of the proposed approach to exploratory factor analysis. Much algebraic detail is presented with respect to the actual mechanics of optimizing these criteria, but that detail is secondary to the overriding concern for the logical implications of the geometry that underlies the solution strategies adopted. Only through an understanding of our verbal and geometrical arguments can the mundane algebra contained herein be brought to light as the embodiment of a unique scientific conceptualization of the objectives of exploratory factor analysis, a conceptualization that unites the collective insights of early visionaries within the field of factor analysis with the practical objectives of modern exploratory data analysts.
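As a rough illustration of the transformation-phase criterion mentioned above, the Python sketch below evaluates a geomin-type quantity: the geometric mean of the squared coefficients in each row of a primary factor pattern matrix, summed over rows. Criterion (40) itself is stated in Part II and is not reproduced here, so the summation over rows and the optional tolerance `eps` (a hypothetical stand-in for a hyperplane width tolerance) are assumptions of this sketch. It also checks a property relied upon in the direct geomin discussion: a row's product or geometric mean of squared coefficients ranks rows in the same order as the sum of the logarithms of their magnitudes.

```python
import numpy as np


def row_geometric_means(B: np.ndarray, eps: float = 0.0) -> np.ndarray:
    """Geometric mean of (b_jk^2 + eps) across the m factors, for each row j.

    eps = 0 gives a zero-width-tolerance form; eps > 0 is a hypothetical
    hyperplane width tolerance that keeps near-zero coefficients from dominating.
    """
    m = B.shape[1]
    return np.prod(B ** 2 + eps, axis=1) ** (1.0 / m)


def geomin_criterion(B: np.ndarray, eps: float = 0.0) -> float:
    """Sum of the row geometric means (smaller values indicate simpler rows)."""
    return float(row_geometric_means(B, eps).sum())


# Hypothetical 3-variable, 3-factor pattern matrix with no exact zeros.
B = np.array([
    [0.70, 0.05, 0.10],
    [0.50, 0.40, 0.30],
    [0.60, 0.02, 0.20],
])

gm = row_geometric_means(B)
sum_logs = np.log(np.abs(B)).sum(axis=1)

# The geometric mean of squares equals exp((2/m) * sum of log|b|), a monotone
# transform of the sum of logs, so both quantities rank the rows identically.
m = B.shape[1]
assert np.allclose(gm, np.exp(2.0 * sum_logs / m))
assert (np.argsort(gm) == np.argsort(sum_logs)).all()

# With eps = 0, one exact zero drives a row's contribution to zero no matter how
# large its other coefficients are; a small eps softens that effect.
row_with_zero = np.array([[0.9, 0.0, 0.9]])
print(row_geometric_means(row_with_zero))            # -> [0.]
print(row_geometric_means(row_with_zero, eps=0.01))
print(geomin_criterion(B), geomin_criterion(B, eps=0.01))
```

The contrast between the zero-tolerance and small-tolerance results for the row containing an exact zero loosely parallels the hard versus soft squeeze strategies discussed in Part II: without a tolerance, coefficients at or very near zero dominate the criterion.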
PART I INTRODUCTORY ALGEBRA, GEOMETRY, AND TERMINOLOGY
1 The Factor Analysis Model
Factor Analysis as a Special Case of Multivariate Multiple Regression
Factor analysis is most readily conceptualized as a multivariate multiple regression problem in which we are concerned with predicting scores on n observed criterion variables from scores on mn individuals, and the task is to make inferences about what common set of latent variables (hypothetical determinants) might account for any covariance shared by the manifest variables. Note that we are interested in accounting for observed associations among manifest variables through reference to latent variables in the form of common factors. It is therefore necessary to allow for some variation in each manifest variable that remains unaccounted for by the m common factors. This is accomplished by introducing n additional latent variables, each of which accounts for the unique, residual variance in its associated manifest variable. The final model is
$$z_{ij} = \sum_{k=1}^{m} p_{jk} f_{ik} + u_j e_{ij} \qquad (1a)$$
in simple summation notation, or
$$Z' = P F' + U E' \qquad (1b)$$
in more convenient matrix notation. The required matrices are defined as follows:
Z (N × n) = {z_ij}
is the observed matrix of standard scores for each of N individuals on n manifest criterion variables;
F (N × m) = {f_ik}
is an unobserved matrix of standard scores for each of N individuals on each of m […] 0 in (44) and therefore disrupting smooth convergence, the results summarized in Table 3 make it clear that this is an effective way of optimizing hyperplane count within any desired range of zero. The problem here seems to be that the mathematical fitting procedure (without inclusion of hyperplane width tolerance) implies acceptance of an error theory to the effect that estimates (factor pattern coefficients) that are smaller in magnitude have smaller standard errors than do estimates that are larger in magnitude.† The multiplicative form of Thurstone's original simple structure criterion (32), as well as our (zero-width-tolerance) modification thereof (40) through use of the geometric mean, is a clue to this implicit error theory. Attempting to minimize either the product or the geometric mean of squared beta weights in a given row of the primary factor pattern matrix is comparable to attempting to minimize the sum of logarithms of the magnitudes of these beta weights. This clearly places great emphasis upon those coefficients that approach zero most closely, and is not a bad idea since it allows the "outliers" within each row (i.e., those coefficients which do not help define hyperplanes) to be largely overlooked. By restricting the degree to which near-zero coefficients in an initial or intermediate configuration are taken to be accurately estimated, however, the soft-squeeze direct geomin algorithm is free to move past local minima as it progresses from coarser to finer definition of hyperplanes across major iteration cycles. Both Thurstone's extended products weighting in (32) and its geometric mean variation in (40) are reasonable criteria to use when searching for near-zero factor pattern coefficients because they tend to overlook such coefficients to the extent that the latter differ from zero. This tendency is so pronounced, in fact, that it must be dampened rather strongly in the initial phases of hyperplane search, in order to avoid fixation on local minima, and must be dampened somewhat even in the final stages of hyperplane search.

† We do not refer here to standard errors of loadings in the usual sense of their being estimates of population parameters, but as estimates of a presumed lack of relationship between variables and factors; only if the true relationship is zero will there be any equivalence.

Resistant Fitting Aspects of Direct Geomin Transformation. It might be felt that a straightforward resistant fitting approach to factor transformation would be workable. In such an approach one would take the value of zero as an a priori estimate of all n × m beta weights in the primary factor pattern matrix—making the most extreme possible use of the notion of scientific parsimony; i.e., postulating that none of the observed variables are influenced by any of the factors. Departure from this a priori hypothesis of ultimate parsimony, albeit nonsensical, would of course be indicated by any nonzero entry in the observed factor pattern matrix. This matrix could then be regarded as a matrix of "residuals", each reflecting lack of fit of the "model" to observed data. The analytic objective would then be to minimize this lack of fit between hypothetical model and observed data. An appropriate criterion to be minimized would thus be the sum of squared "residuals"; i.e., the sum of squared factor pattern coefficients in P. Recognizing that this approach could not lead to a very good overall fit between a priori model and observed data, however, we might attempt only to "fit the fittables" through a method resistant to outliers. In a resistant fitting approach to factor pattern transformation we would regard near-zero beta weights as evidence of "fit" to our a priori model of a lack of relationship between latent factors and manifest variables. Lack of fit in the form of nonzero beta weights could then be tolerated as long as the "residuals" implied differed sufficiently from zero to be regarded as true "outliers".
The use of a weight for each "residual" in the primary factor pattern that decreased smoothly as the corresponding residual increased in magnitude through successive, reweighted least squares iterations could be invoked to accomplish resistant fitting. For instance, a biweight [cf. Mosteller and Tukey, 1977] approach to resistant fitting of hyperplanes can be developed using the weights
(71)
in minimizing the iteratively reweighted least squares function (72)
over all elements of the primary factor pattern matrix P. If the convention is followed that negative weights resulting from (71) are set to zero; i.e., if the weights used in (72) are computed as
then the value of x serves as a cutoff point above which residuals are completely ignored. This possibility leads to a procedure not unlike the one Browne and Kristof [1969] use for oblique procrustes rotation to a partially specified target. Indeed, implementation of the biweight approach to resistant fitting of hyperplanes as outlined above is highly similar to that of the Browne and Kristof method, except that the degree to which any given factor pattern coefficient is to be reduced in size is weighted as a function of the smallness of the corresponding "residual" (i.e., that same factor pattern coefficient) which remains after the previous fitting cycle, rather than being the product of an a priori hypothesis. The biweight resistant fitting approach to factor pattern transformation briefly outlined here is quite feasible and has been used extensively by the author on real data problems in comparison with direct geomin and direct quartimin. Treating all nonzero factor pattern coefficients as "outliers" which deviate from the prior hypothesis of no association between observed variables and factors is an extreme example of the notion of pulling parameter estimates back toward an a priori value of zero. We will therefore refer to the resistant fitting approach to primary factor pattern simplification as "direct pullback" in subsequent discussions. Direct pullback does not get around the problem of having to specify a kind of hyperplane width parameter, since x in the biweight formulation determines the point beyond which outlying factor pattern coefficients (residuals) are completely ignored. Various ad hoc rules, related to the assumption that near-zero factor pattern coefficients are normally distributed and therefore would never be expected to exceed six or so standard deviations, are possible routes to choosing x [cf. Mosteller and Tukey, 1977].
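The reweighting step described above can be sketched numerically. This is a minimal sketch assuming the standard Mosteller and Tukey biweight form, w = (1 − (p/x)²)² for |p| < x and w = 0 otherwise; the exact expressions (71) and (72) are not reproduced in this text, so the function names, the cutoff value, and the example pattern are hypothetical. Only the weighting of the "residuals" is shown; the transformation update that would follow each reweighting cycle in direct pullback is omitted.

```python
import numpy as np


def biweight_weights(P, x):
    """Biweight weights for factor pattern coefficients treated as "residuals".

    Assumed form (standard biweight, not necessarily the book's (71)):
        w = (1 - (p / x)**2)**2  if |p| < x,  else 0,
    so x is the cutoff beyond which a coefficient is ignored as an "outlier".
    """
    u = np.asarray(P, dtype=float) / x
    # Zero weight at and beyond the cutoff; without this truncation
    # (1 - u**2)**2 would start growing again for |u| > 1.
    return np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)


def weighted_pullback_loss(P, w):
    """Weighted sum of squared pattern coefficients (an assumed stand-in for (72))."""
    return float(np.sum(w * np.asarray(P, dtype=float) ** 2))


# Hypothetical 2 x 2 primary factor pattern and cutoff.
P = np.array([[0.05, 0.70],
              [0.60, -0.02]])
w = biweight_weights(P, x=0.30)
print(np.round(w, 3))                # the two large coefficients receive zero weight
print(weighted_pullback_loss(P, w))  # only the near-zero "fittables" contribute
```

Coefficients closer to zero receive weights closer to one, so each subsequent fitting cycle concentrates on pulling the apparent hyperplane members toward zero while ignoring the "outliers".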
This digression concerning the direct pullback approach to factor pattern simplification has been presented because the notion that there should be as many zeros in the factor pattern matrix as possible is often regarded as being equivalent to that of simple structure. Interestingly enough, the author has found that direct pullback does not accomplish the task of distinguishing between factor pattern coefficients which fall in vs. out of hyperplanes as convincingly or as routinely as does direct geomin. Undue fixation on a local minimum initial configuration occurs regularly with direct pullback, and a remedy is not as readily at hand as for direct geomin. (Iteratively reweighted least squares fitting algorithms are characteristically reluctant to move away from any initial subset of "fittables".) The problem of having parameter estimates driven more closely toward an exact zero value than might seem reasonable with error-prone data also occurs with direct pullback. A solution to this problem, just as when it occurs with direct geomin, is to restrict the weighting coefficients (71) to values less than or equal to what would be obtained were a hyperplane width tolerance substituted into (71) for any factor pattern coefficients falling below that value. In short, the direct pullback resistant fitting approach to hyperplane search runs into the same difficulties encountered in implementation of direct geomin hyperplane search, but the way around these obstacles to realistic application of the former technique is not any more straightforward or compelling than for direct geomin. Comparison of the two approaches is nevertheless enlightening because it reveals the inherent resistant fitting aspects of direct geomin transformation. It reinforces our notion that a soft-squeeze approach, with successive reductions in some sort of hyperplane width tolerance between initially coarser and ultimately finer limits as major iteration cycles progress, is a likely requirement in any transformation technique that presumes to separate "signal" from "noise" in the primary factor pattern matrix of beta weights. One distinct advantage that direct geomin has over direct pullback, or any other procedure that attempts to get as many primary factor pattern coefficients within the vicinity of zero as possible, is that criterion (40) is designed to ensure simplification of every row in the factor pattern matrix (i.e., to find at least one near-zero relationship within each row). This is the basic aim of simple structure, and it is only of secondary importance that further simplification (more near-zeros) of each variable occur.
In other words, direct geomin will not tend to oversimplify some variables at the expense of leaving others too highly complex. Our experience with direct pullback revealed a fundamental problem with the notion that hyperplane overdetermination (i.e., the degree to which hyperplanes are populated) should be the primary objective of simple structure transformation. It all too often happens that direct pullback transformation achieves an abundance of near-zero loadings, as well as a clear separation of near-zero from "outlying" parameter estimates, through the device of introducing wide variation in factor "sizes". In other words, the recovery of a large "general" factor and a set of small, bipolar "residual" factors can lead to a pattern with many near-zero loadings, many large loadings, and few loadings of intermediate size. A similar problem of disproportionate factor "sizes" has repeatedly been noted in connection with the quartimax [Neuhaus and Wrigley, 1954] and quartimin [Carroll, 1953] transformation criteria. Indeed, it is mainly in an effort to ensure more equitable spreading of common variance among all factors that the varimax [Kaiser, 1958] and oblimin [Carroll, 1957] variations on the former independent-cluster-oriented criteria are used. Although direct geomin does not seem to suffer from the problem of inequitable distribution of common variance among factors, as does direct pullback, we will consider this issue in Part III in the light of theoretical justification for expecting factors to share approximately evenly in accounting for common variance.

Sensitivity of Direct Geomin Transformation to Characteristics of the Starting Configuration. From the foregoing discussion of resistant fitting aspects of direct geomin transformation we have seen what great emphasis is placed on the near-zero entries in the factor pattern matrix by any variation upon Thurstone's original mathematical criterion for a best-fitting simple structure. The problems to which this emphasis can lead in terms of fixation on a local minimum solution are obvious. Given this feature of the direct geomin simple structure criterion, it would seem quite important to initiate such transformation from an initial configuration that is not too far removed from the final solution desired, even though the soft-squeeze strategy is designed to avoid local minimum problems. Although soft-squeeze direct geomin transformation yielded the superior solution for Lord's highly clustered data discussed earlier in this chapter, that outcome would not have been guaranteed without the use of a rather good initial configuration due to Landahl.
Equally satisfactory results could have been achieved in this case by using the cluster-oriented direct quartimin solution as an initial configuration, due to the highly clustered nature of Lord's data, but it turns out that even soft-squeeze direct geomin can yield quite different final solutions, depending upon what starting configuration is used, when one is dealing with factorially complex data. In Part III we will see, for instance, that different starting solutions yield quite discrepant direct geomin solutions for Thurstone's highly complex 26-variable box problem. Although direct geomin transformation can be a highly effective method of refining any initial factor solution because of its strong emphasis upon the placement of every test vector into at least one hyperplane, the high tolerance for factorial complexity of even local-minimum-avoiding soft-squeeze direct geomin makes that method incapable of accomplishing major shifts in the orientation of factor axes away from their original placement within any given initial configuration. Since most available factor transformation methods that might be used to obtain an initial configuration from which to launch soft-squeeze direct geomin transformation are strongly cluster-oriented, moreover, there is reason to be concerned about how this initial alignment of factor axes with independent clusters might bias any subsequent direct geomin solution. In particular, it would seem important to avoid the initial alignment of primary factor axes with oblique cluster dimensions in order to ensure proper placement of the former at the intersections of test-vector-configuration-bounding hyperplanes through subsequent soft-squeeze direct geomin transformation. It is exactly this consideration that led to the research discussed in Part III.

In Part III an iteratively reweighted variation upon the soft-squeeze direct geomin criterion is developed into a global transformation strategy that promises to place primary factor axes at the intersections of true test-vector-configuration-bounding hyperplanes despite strong clustering of factorially complex tests well within the central region of the test vector configuration, as well as other idiosyncrasies of test battery composition. The latter idiosyncrasies can so grossly distort the outcome of initial cluster-oriented transformation (e.g., in its application to Thurstone's 26-variable box problem) that subsequent application of soft-squeeze direct geomin transformation cannot bring about the major shift in orientation of axes required to move primary factors toward the outer "fringes" of the test vector configuration, where the actual causal determinants of observed covariation can be expected to be found [cf. Thurstone, 1934, p. 30]. (The global strategy for bounding hyperplane simple structure transformation developed in Part III, termed "direct geoplane", can be regarded simply as a way of avoiding reliance upon a cluster-bound or otherwise undesirable local minimum solution as the starting point for soft-squeeze direct geomin transformation. The data-oriented reader may therefore wish to proceed directly to our discussion of the practical performance of this global strategy for bounding hyperplane simple structure transformation in Chapter 7.)
PART III BOUNDING THE TEST VECTOR CONFIGURATION WITH HYPERPLANES THROUGH SIMPLE STRUCTURE TRANSFORMATION
4 Simple Structure Reconsidered

Beyond Factorial Purity and Ease of Interpretation

In Part II it was shown that it is feasible to transform primary factors on the basis of a slightly modified version of Thurstone's original mathematical criterion for the best-fitting simple structure. Not only did this approach succeed in the case of data displaying an independent cluster structure, in that clusters were made to fall at the intersections of hyperplanes even though Thurstone's criterion could technically be satisfied in other ways, but the resulting solution could be seen to be superior to that given by the independent-cluster-oriented direct quartimin procedure. It therefore became apparent that the proposed soft-squeeze direct geomin transformation strategy could be used as an effective means of cleaning up any preliminary solution, including one obtained through conventional cluster-oriented transformation. By tolerating factorial complexity for those variables that do not fall clearly into any single group of collinear variables located through preliminary cluster-oriented transformation, for example, soft-squeeze direct geomin should be able to bring about a clearer separation of "signal" from "noise" in the refined solution. Although direct geomin could become popular as a means of cleaning up independent-cluster-oriented solutions in exploratory factor analysis, this particular application of the method must be discouraged. From our perspective, the proper aim of exploratory factor analysis is not the location of independent clusters of more-or-less collinear variables, but the isolation of invariant hypothetical determinants of manifest covariation at the intersections of hyperplanes that closely and effectively bound the entire test vector configuration.
It would seem, moreover, that soft-squeeze direct geomin transformation could be applied quite successfully as the final stage in the latter process, just as it can be used to refine an independent-cluster-oriented solution. It would only be necessary to start with an initial solution in which factors have been made to fall roughly normal to distinct and well-defined outer boundaries of the test vector configuration.
Extensive experience in the application of soft-squeeze direct geomin transformation in the presence of latent coplanarity rather than latent collinearity effects (as in the near-ideal simple structure represented by Thurstone's box problem) convinced the author that independent-cluster-oriented transformation strategies provide an inadequate base from which to initiate true simple structure transformation via direct geomin. Since most independent-cluster-oriented transformation methods yield distorted results in the presence of factorial complexity [cf. Guilford, 1977] and generally cannot succeed in finding the appropriate solution in the analysis of Thurstone's idealized box problem data [cf. Butler, 1969; Cureton and Mulaik, 1971], they can hardly be expected to provide a useful starting solution for direct geomin transformation. The quest for a useful initial configuration solution in true bounding hyperplane simple structure transformation (which culminates with soft-squeeze direct geomin) led to rather extensive psychometric developments, which will be discussed in the remaining parts of this volume. Not only is it necessary to avoid dependence on independent-cluster-oriented transformation as the starting point, it is also necessary to avoid reliance on conventional methods of fitting the factor analysis model (extracting factors) and deciding upon dimensionality (number of common factors). The direct geomin variation on Thurstone's original mathematical criterion for a best-fitting simple structure nevertheless remains an effective basic procedure for final adjustment of primary factors so that the hyperplanes at whose intersections each factor falls closely and effectively bound the test vector configuration. Before direct geomin transformation can be expected to yield results of real scientific interest, however, a series of preliminary analyses must be brought to successful completion.
Our experience with conventional factor transformation methods during the search for an effective initial configuration routine led to the conclusion that factor analysis as commonly practiced must be regarded simply as one variety of cluster analysis. The objective of most popular factor transformation procedures (albeit imperfectly realized in practice) is to assign each variable to an "independent cluster" of collinear variables. However, once one has obtained cluster-dimensions summarizing manifest relationships among groups of variables, it is not safe to revert to factor analytic reasoning which treats each derived dimension as though it identifies a fundamental determinant of the observed sample variation. Relative collinearity (clustering) of manifest variables can be the joint result of a whole complex of shared determinants and is highly dependent upon test battery composition. Relative cohyperplanarity (simple structure), on the other hand, should not be disturbed by changes in test battery composition. Hyperplanes should theoretically be well defined whether or not any manifest variables are highly correlated because they share exactly the same set of determinants (i.e., cluster), as long as enough variables that are unrelated to each factor are included in an analysis. All that is required for the existence of simple structure, under its original definition, is that few of the observed variables be influenced by all of the hypothetical determinants that are found to be active within a given domain.† Of course, the ease and accuracy with which these hypothetical determinants can actually be identified depends upon the distinctness and definition of any cohyperplanarity which in fact exists in the data. Thurstone unfortunately modified his original verbal criteria for assessing the probable uniqueness of a simple structure configuration [1935, p. 156] to include some aspects of independent cluster structure [1947, p. 335] at a rather late date in his career. This action was taken once he became familiar with the factorial composition of his own data and demanded clearer distinction between factors. Such inconsistency on Thurstone's part is probably the basis for much subsequent confusion of factor and cluster concepts. The result has been erroneous emphasis upon independent cluster structure at the early, exploratory phases of analyses, when there is no basis for judging which variables might be essentially unifactorial. In fairness, it must be pointed out that some of those who originated analytic factor transformation methods were well aware that their algebraic criteria could only be expected to yield satisfactory results for unifactorial variables†† [cf. Carroll, 1953; Saunders, 1953 Note]. Any subsequent abuse of the methods has therefore been due largely to our modern tendency to let available computer programs dictate the course of data analysis rather than basing our choice of methods upon theoretical considerations.
Carroll, for instance, realized that he had proposed a criterion that would yield "only an approximation to simple structure" and was of the "opinion that graphical methods, properly used, still afford the most desirable way of obtaining a final simple structure" [1953, p. 23]. He clearly stated that one of the main disadvantages of what later came to be called the quartimin method is that "The presence of factorially complex tests makes the primary axes more highly correlated than they would be if placed by graphical methods, and may give rise to larger negative projections than would otherwise be the case" [1953, pp. 33, 37]. He went on to recommend the elimination of complex tests, followed by reanalysis, as a possible means of obtaining a satisfactory solution. In a similar vein, Saunders pointed out that factorial complexity often caused problems for his approach to rotation, which soon came to be called quartimax [cf. Neuhaus and Wrigley, 1954], because "It is often much easier to invent or select variables that are loaded by both of a pair of factors than variables loaded by only one or the other of them" [Saunders, 1953 Note, p. 23]. He discussed the conditions under which his method might be expected to fail (i.e., in the absence of independent clustering of factorially pure variables) but, unlike Carroll, felt that an analysis involving "too many variables with the same type of factorial complexity is not so easy to deal with after variables have been selected" [1953 Note, p. 26]. Unfortunately, these warnings by Carroll and Saunders about the limitations of their cluster-oriented techniques for analytic factor transformation went largely unheeded. In the decades since the advent of electronic computing, Thurstone's powerful but subtle concept of simple structure has been wholly sacrificed in the popular quest for "ease of interpretation," even though the latter can be expected only when well-understood variables have been carefully selected to yield an independent cluster structure. The true aim of exploratory factor analysis, on the other hand, is to identify hypothetical determinants of observed variation via causal relations inferred from the data—even if the latter are complex and difficult to interpret.

† The popularity of cluster-oriented approaches to exploratory factor analysis has seriously distorted this general meaning of simple structure as originally formulated by Thurstone.

†† Neuhaus and Wrigley [1954] expressed no awareness that their quartimax criterion tends to handle factorially complex variables ineffectively, however. They were of the opinion that the purpose of rotation was as yet undecided by psychologists, leaving reduction of factorial complexity as good a goal as any. Ferguson [1954] thought, moreover, in his suggestion of an equivalent approach, that he had arrived at the mathematical explication of Thurstone's principle of simple structure.
Exploratory factor analysis requires a strong assumption about what form of functional relationship might prevail between hypothetical determinants and observed variables (linearity), a weak set of restrictions on the general pattern of relationships that might be expected to underlie a particular domain (cohyperplanarity), a good understanding of alternative theoretical bases for expecting manifest association among the variables under consideration, and adequate sampling of variables in line with the foregoing knowledge. Nothing about the process suggests that its outcome will be simple, in the popular sense of the term, or easy to interpret. Whereas cluster-oriented factor analysis (cluster-dimension analysis) aims mainly to summarize manifest associations within various more-or-less independent groups of highly collinear variables, true exploratory factor analysis is designed to reveal hypothetical causal associations between "source" factors and "surface" variables [cf. Cattell, 1946]. The latter associations are not necessarily discernible from zero-order correlations because observed relationships can be the joint result of several factors operating in different (presumably either complementary or contradictory) ways. The simple structure hypothesis states only that for any given observed variable it should be possible to find one or more theoretically interesting hypothetical determinants which do not play any causal role when it comes to variation in that attribute. Such a state of affairs will not generally be revealed by direct investigation of zero-order correlations or by attempting to summarize the latter in terms of oblique cluster-dimensions. Although it sounds almost trivial, detecting a particular variable's lack of relationship to certain factors becomes informative when we consider which other observed variables are influenced by those same factors. By focusing upon the subspace spanned by that set of variables which are unrelated to each respective factor, a "hyperplane" can be located in multidimensional space normal (perpendicular) to the reference axis under consideration. If the hyperplane normal to a given reference axis is found to be highly populated and well defined (i.e., if many cohyperplanar variables within it fan out in all directions except along the reference axis under consideration), then one can attempt to infer the factor's theoretical meaning as a hypothetical determinant of the observed variables which do project onto its axis. Note, however, that such interpretation need in no way prove easy, because any variable that relates to a particular factor can also relate to others in its own unique pattern of complexity. It is actually unlikely that many observed variables will be of unit factorial complexity, since we are seldom lucky enough to arrive at pure measures of most hypothetical determinants that emerge in the exploratory stage of any investigation. By the same token, it is dangerous to conclude that the mere appearance of "factorially pure" measures in any particular analysis indicates anything other than our failure to include enough indicators of other relevant sources of variation in the battery.
For these reasons, accurate interpretation of a factor depends upon our ability to identify a theoretically appealing contrast between that set of variables which lie in its hyperplane and those which do not, rather than upon a cursory search for high-loading "unifactorial" variables. In short, mere ease of interpretation is not the goal of exploratory factor analysis, although it may be the legitimate aim of other techniques such as cluster analysis and multidimensional scaling, where emphasis is placed upon exposing all pairwise linkages among observed variables to direct view. Common factor analysis has immense appeal as an exploratory data analysis technique in that it promises to help reveal functional linkages between theoretical constructs and observed variables, as well as to aid in the identification of relevant theoretical constructs per se. This aim could be accomplished through a cluster-oriented approach only if most observed variables were pure measures of one and only one theoretical construct. Of course, if we could be confident of the latter then there would be little use in doing a factor analysis except to confirm this factorial purity. Even the confirmatory use of cluster-oriented factor analysis must be viewed with caution, however, since recovery of expected clusters tells us only that we have identified several more-or-less independent groups of collinear variables that hold up under replication. It does nothing to assure us that these clusters have any underlying theoretical unity or importance. Thurstone's original objective of hyperplane location has again begun to be accepted as the proper goal of factor transformation [e.g., Cattell and Muerle, 1960; McDonald and Ghartey-Tagoe, 1973 Note; Kaiser and Madow, 1974 Note; Katz and Rohlf, 1974, 1975; Meredith, 1977], despite the past thirty-odd years of infatuation with cluster-oriented analytic rotation methods. Such cluster-oriented transformation methods have been eagerly exploited because they tend to make the outcome of any factor analytic investigation seem meaningful in the superficial sense of being easy to interpret. Little real theoretical insight can generally be gained from such results, however, since only the manifest, surface clustering of variables is clearly revealed by cluster-oriented transformation. Although the solution is often made to approximate independent cluster structure and every variable thus appears as unifactorial as possible, the simplicity is usually deceptive unless a true independent cluster structure is indeed extant in the data. If factorially complex variables are present, as is virtually always the case in exploratory analyses, then "ease of interpretation" in unifactorial terms will only obscure the true factorial nature of the domain observed [cf. Guilford, 1977]. A valid mathematical criterion for use in the search for simple structure in terms of hyperplane location must be highly tolerant of factorial complexity, even if to the extreme that every common factor influences some of the variables.
Individual variables may be "encouraged" to be of less than full factorial complexity, but none must be "forced" toward ultimate factorial simplicity—that outcome has to depend strictly upon the nature of the data itself. Simple structure is actually a hypothesized feature of naturally occurring multivariate data which we hope to detect and locate through an appropriate form of factor transformation. We hope any correspondence will provide theoretical insight into the workings of nature. Thus, simple structure is not a rigid pattern into which we can force the outcome of any factor analysis for the sake of easy interpretation, but a general guiding principle by which we hope to lead our mathematical model to reveal the underlying bases of observed reality. Effective application of this principle must proceed delicately lest we impose structure upon the data which is not really there. Our model must be responsive to the data rather than vice versa, hence the difficulty in replacing the art of subjective graphical rotation to detect simple structure with a rigid computer algorithm. Although objective, the popular cluster-oriented factor transformation programs tend to impose their own brand of simplicity upon any data—even random data in which true simple structure cannot exist [Cattell and Gorsuch, 1963]. On the other hand, our own implementation of Thurstone's original mathematical criterion for a best-fitting simple structure, in the form of the soft-squeeze direct geomin approach, proceeds so gently that it tends not to move too far away from any given starting solution in the process of "assimilating" observed test vectors into preexisting hyperplanes. Although the solution simultaneously shifts in "accommodation" to the observed data, this mainly affects the finer details of hyperplanar structure. It is clear that the latter effect of "fine-tuning" an available set of hyperplanes to reflect the detailed features of observed data could be put to good use in bounding hyperplane simple structure transformation, provided that an adequate initial configuration solution is available. In order to determine how we can arrive at such an a priori orientation for any given set of initial factors, it is necessary to go somewhat beyond the concept of simple structure as elaborated by Thurstone. In short, we must seek a philosophical basis for the initial placement of factor axes so that they will roughly approximate a set of latent determinants that could well have given rise to the test vector configuration at hand in the first place. Clearly this will involve making assumptions about the typical behavior of the class of latent determinants being entertained.

The Positive Manifold Assumption. In his development of the concept of simple structure and its mathematical expression in criterion (32), Thurstone clearly had in mind the notion of an ideal simple structure. His box problem provides an example of a nearly ideal simple structure because it is contrived to do so, but it is legitimate to ask what, if anything, might lead to such an ideal configuration when working with real data.
In fact, rather than dwell on such an idealized and unrealistic example, we prefer to motivate the search for simple structure by showing that a rather parsimonious assumption about the typical mode of joint action of natural causal determinants of behavioral variation implies the production of a test vector configuration similar to the ideal simple structure configuration illustrated in the box problem. Note our use of the term "similar"; it turns out that Thurstone's ideal simple structure is a special case of a more general class of configurations that might be expected to occur given about the weakest set of realistic assumptions we can make about the workings of nature. Short of accepting the extremely chaotic situation in which manifest behavioral variables are related to their latent determinants in a completely unrestricted fashion, about the weakest simplifying assumption we can make about the action of natural causal factors is that they may influence manifest variables either independently or in conjunction, but do not tend to act at odds to one another. This assumption is a more fundamental expression of the principle of scientific parsimony than is the notion of simple structure, although it leads to the latter as a special case.
When dealing with the problem of identifying unobserved or "latent" independent variables that can be regarded as the source of observed covariation among dependent variables, we really have little choice but to assume that these independent variables do not in general produce effects that cancel one another out. If such cancellation of latent determinant effects were typical, then the implications of any observed set of intercorrelations among manifest variables would be ambiguous indeed. In particular, we could not take the absence of manifest association between any two observed behavioral variables to mean that they do not share a common cause—it could be that the lack of manifest association occurs simply because the strong mutual effects of one shared determinant are cancelled out by equally strong but contradictory effects of another shared determinant. Although we cannot deny that precisely this sort of cancellation of independent variable effects may occasionally occur, it must not be admitted as the typical way in which nature works within the behavioral domain, on the grounds of scientific parsimony. Otherwise, we could be faced with a proliferation of mutually contradictory hypothetical determinant effects in order to account for a manifest lack of association between observed behavioral variables! It is simpler and presumably safer to assume that manifestly unrelated phenomena share no hypothetical influences in common unless we encounter convincing evidence to the contrary. This is in line with the usual conception of the law of scientific parsimony: "Entities shall not be multiplied without necessity" [Russell, 1945]. Before pursuing implications of the assertion that latent determinants within the behavioral domain must generally be expected not to act in opposition to one another, let us recall the original reason for postulating the existence of even one hypothetical determinant in the form of a general common factor.
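The cancellation scenario described above is easy to reproduce numerically. The following Python sketch (using NumPy; the loadings are hypothetical values chosen for illustration, not data from this book) gives two scores equal and opposite shares of two orthogonal latent determinants, so that both shared causes leave no trace in their correlation.

```python
import numpy as np

# Hypothetical loadings on two orthogonal latent determinants F1 and F2.
# Score d is pushed up by both; score e is pushed up by F1 but down by F2.
loadings = np.array([
    [0.6, 0.6],    # score d
    [0.6, -0.6],   # score e
])

# With orthogonal factors, the correlation between the common parts of two
# scores is the inner product of their loading rows: 0.36 - 0.36.
implied_r = float(loadings[0] @ loadings[1])

# Simulate standardized scores to confirm that the shared causes cancel out.
rng = np.random.default_rng(0)
n = 100_000
factors = rng.standard_normal((n, 2))
unique_sd = np.sqrt(1.0 - (loadings ** 2).sum(axis=1))   # unit total variance
scores = factors @ loadings.T + rng.standard_normal((n, 2)) * unique_sd
sample_r = float(np.corrcoef(scores, rowvar=False)[0, 1])

print(f"implied r = {implied_r:.3f}, sample r = {sample_r:.3f}")
```

Both determinants influence both scores substantially, yet nothing in the observed correlation reveals it; this is the ambiguity the parsimony argument rules out as typical.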
Although many others had failed to detect a consistent pattern of relationships among measures of intellectual ability, a turn-of-the-century experimental psychologist named Charles Spearman (who happened to live near a village school where he could obtain course grades as well as administer his own sensory discrimination battery) noticed that all such variables supposedly related to intellectual ability were indeed positively correlated. Spearman's initial investigation of these correlations, adjusted for attenuation due to unreliability, clearly suggested to him that "All examination . . . in the different sensory, school or other specific intellectual faculties, may be regarded as so many independently obtained estimates of the one great common Intellective Function" [1904, p. 272]. Through further examination of the pattern of intercorrelations he had detected among measures of intellectual ability, Spearman was able to discern a "hierarchical order" whereby various measures of ability could be ranked with respect to their "saturation" with general intelligence.
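Spearman's two steps, correcting observed correlations for attenuation and then reading saturations off a hierarchically ordered correlation matrix, can be sketched as follows. This is a modern restatement under the one-factor model, not Spearman's own computational procedure; the saturations and reliabilities are hypothetical.

```python
import numpy as np

def disattenuate(r_xy, r_xx, r_yy):
    """Correction for attenuation: observed r divided by sqrt of the two reliabilities."""
    return r_xy / np.sqrt(r_xx * r_yy)

# Hypothetical g saturations (loadings) for four tests, ranked high to low.
g = np.array([0.9, 0.8, 0.7, 0.6])

# One-factor model: off-diagonal correlations are products of saturations.
R = np.outer(g, g)
np.fill_diagonal(R, 1.0)

# Hierarchical order: off-diagonal rows are proportional, so every
# tetrad difference r13*r24 - r14*r23 vanishes.
tetrad = R[0, 2] * R[1, 3] - R[0, 3] * R[1, 2]

# Squared saturation of test 1 recovered from a triad: r12 * r13 / r23.
g1_sq = R[0, 1] * R[0, 2] / R[1, 2]

print(tetrad, g1_sq, disattenuate(0.42, 0.81, 0.64))
```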
All of the controversy that followed upon Spearman's initial postulation of a hypothetical common factor of general intelligence simply led to eventual acceptance of the notion that more than one hypothetical determinant must be invoked in order to account for the pattern of associations commonly observed among measures of intellectual ability. Entertaining more than one hypothetical determinant within any given domain of behavioral variation immediately leads, however, to the issue of determining whether these alternative hypothetical sources of manifest variation should be expected to act in a complementary fashion, independently, or in opposition to one another. Consider Figure 6. In the first section of the figure we have represented geometrically Spearman's notion that all intellectual tasks fall into an orderly hierarchy depending upon the extent to which they are saturated with or influenced by one common hypothetical determinant, g. Since we have chosen to represent only the common part of each test (designated alphabetically) in the figure, they appear as a set of perfectly collinear vectors whose squared lengths are their saturations with g (factor F, in the figure). There are two striking things about the Spearman case, although only one is usually noted. Not only is it remarkable that the test vectors in Figure 6a are collinear, it is just as remarkable that they all project in the same direction from the origin. What the latter implies is that we do not generally encounter tests of intellectual ability on which increased general intelligence has a detrimental effect. Indeed, if such an instance occurred in practice we would not hesitate to conclude that either we have scored the test incorrectly or that the given test does not really belong in the domain of measures of intellectual ability. 
With respect to the latter possibility, however, brief reflection will reveal that there are few, if any, performances on which we would expect general intelligence to have a negative effect, even outside of the domain of intellectual abilities. There may be domains of behavioral variation that are largely independent of general intelligence, however. This line of reasoning brings us back, of course, to the positive manifold assumption with which we began this section on the grounds of scientific parsimony: We do not generally expect hypothetical determinants of manifest behavioral variation to act in opposition to one another.†
† Notice that we are speaking of hypothetical determinants of behavioral variation; i.e., those factors which lead to differential performance of some individuals relative to that of others at any given point in time and/or those factors which yield performance change from one time to another within the same individual. We are not therefore attempting to account for steady-state phenomena of nonvariation such as homeostasis, which might well be attributed to the balance or mutual cancellation of any indeterminate number of opposing forces. Although states of equilibrium can be viewed as the product of feedback control mechanisms operating within the context of organism/environment interaction, implying some balance of opposing forces, this does not do any injustice to the assumption that determinants of most behavioral variation tend not to act in opposition to one another. From the perspective of individual development, all we are saying is that movement from one point of equilibrium to another with respect, say, to some aspect of general intelligence (such as verbal ability) does not generally result in a change in the opposite direction in any other aspect of general intelligence (such as spatial ability). The organism processes information continuously, so its performance on any given task at any given point in time must reflect a balance of sorts between what new information has been gained through processes of "assimilation" and what old information has been discarded through processes of "accommodation." In dealing with intraindividual change, then, the positive manifold assumption simply asserts that the resources of the organism are not ordinarily expected to be so limited that an increment in one capacity implies a decrement in other capacities. Another way of putting this is to say that positive transfer of training is expected to generalize more widely than is negative transfer of training.
Let us now consider the sort of configuration among common parts of test vectors that we could expect to find if more than one latent determinant were active within a given domain of manifest variation. The second section of Figure 6 illustrates what might well be expected if two common general factors underlie a given domain of variation. Again, Figure 6b has two striking features, even though only one is usually noted. (Often neither is noted, since independent clustering is expected to result once the influence of a second causal factor is felt within any domain of variation—as though no factor can influence any manifest variable that already relates to another factor!) Not only is it remarkable that the test vectors in Figure 6b are coplanar (that is, they are determined, up to uniquenesses, by the two factors), it is just as remarkable that no test vectors fall outside of the boundaries set by the two factor axes. The latter consequence, however, follows directly from the notion that neither factor should be expected to have a detrimental effect upon any performance within the broad general domain of manifest variation under consideration. If such were not the case, and any one of the test vectors fell outside of the boundaries defined by the factors, then we would tend to conclude that the test in question is not a consistent indicator of variation within the domain in question, and therefore may not belong in the analysis. What would happen if we allowed hypothetical determinants to act in opposition? The third section of Figure 6 provides the answer; in it we have attempted to present a more-or-less random distribution of test vectors in two-space.
Figure 6. Comparison of configurations among the common parts of test vectors in the case of one common general factor (6a), as postulated by Spearman (1904); in the case of two common factors whose range of joint action is limited by the principle of noncancellation of latent variable effects (6b), in line with the positive manifold assumption; and in the case of a random configuration of test vectors, corresponding to the chaotic situation in which the effect of one latent determinant can be routinely cancelled out by the effect of another latent determinant (6c). Notice that in Figure 6b no test vectors fall outside the boundaries defined by the factors F1 and F2, although it would be perfectly consistent with the principle of noncancellation of latent variable effects were some of the vectors reflected so that they fell on the opposite side of the origin.
Notice that the common parts of scores Zd and Ze are unrelated to one another; however, if the factor axes F1 and F2 were interpreted as hypothetical determinants, then we would be led to believe that score Zd is positively influenced by both factors while score Ze is positively influenced by one factor and negatively influenced by the other—a rather elaborate and unparsimonious way of accounting for the manifest lack of relationship between scores Zd and Ze! Similar inconsistencies in the direction of influence of factors on manifest variables would show up no matter what orientation (rotation) of factor axes we might choose in Figure 6c, however, so it would not be possible to arrive at a parsimonious description of the overall configuration. What the above arguments and illustrations were designed to communicate is that the assumption that hypothetical determinants tend not to act inconsistently (as in Figure 6b) can be expected to hold when we are dealing with manifest associations among scores from a more-or-less internally consistent but multidimensional domain of behavioral variation, as opposed to when we are dealing with associations among an arbitrary collection of variables (as in Figure 6c). If we are dealing with scores from a more-or-less homogeneous domain of behavioral variation—which should be the case if we are looking for common hypothetical determinants of manifest covariation—then we can expect the common parts of all the test vectors to fall within the all-positive and/or the all-negative orthant of a coordinate system whose axes make roughly equal angles with the major principal axis of the test vector configuration. In other words, we should be able to reflect all of the test vectors so that they are not only mutually positively correlated, but also gathered within the same quadrant or orthant of hyperspace. The coordinate axes that define this all-positive orthant, fully containing the configuration of appropriately reflected test vectors, are then likely to be closely aligned with hypothetical determinants that could have given rise to the test vector configuration in line with the positive manifold assumption about the action of hypothetical determinants; i.e., they do not work at odds to one another, but work either independently or in conjunction.
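The reflection argument can be stated as a simple check on a matrix of coordinates (rows are test vectors, columns are coordinate axes). The sketch below, with hypothetical configurations loosely modeled on Figures 6b and 6c, reflects each row so that its coordinates sum to a non-negative value and then asks whether any coordinate remains clearly negative. It tests a fixed set of axes only and does not search over rotations; the tolerance `tol` is an arbitrary choice.

```python
import numpy as np

def reflect_to_positive(coords):
    """Reverse the scoring of any test vector whose coordinates sum to a negative value."""
    signs = np.where(coords.sum(axis=1) < 0, -1.0, 1.0)
    return coords * signs[:, None]

def in_positive_orthant(coords, tol=0.05):
    """True if, after reflection, no coordinate of any test vector is below -tol."""
    return bool(np.all(reflect_to_positive(coords) >= -tol))

# 6b-like: directions between the two axes, two tests scored in reverse.
theta_b = np.deg2rad([5, 20, 40, 60, 80, 85])
config_b = 0.8 * np.column_stack([np.cos(theta_b), np.sin(theta_b)])
config_b[[1, 4]] *= -1

# 6c-like: includes an uncorrelated pair like Zd and Ze, plus other mixed-sign rows.
config_c = np.array([[0.6, 0.6], [0.6, -0.6], [-0.2, 0.7], [0.7, 0.1]])

print(in_positive_orthant(config_b), in_positive_orthant(config_c))
```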
Furthermore, the restriction of an entire test vector configuration to the subspace defined by the all-positive and/or all-negative orthant of a coordinate system implies the existence of a substantial amount of common, shared variance, as might well be expected when dealing with a homogeneous domain of variation. This implies, in turn, that a large component of common variation will occur along the major principal axis of the test vector configuration—an axis passing roughly through the center of the test vector configuration and therefore making roughly equal angles with any system of coordinate axes whose positive and/or negative orthant is highly populated by test vectors. At the beginning of this discussion we implied that Thurstone's notion of an ideal simple structure is a special case of the pattern of relationships one might expect to find among factors and variables given the assumption that hypothetical determinants of manifest behavior do not act inconsistently. Since the box problem, as illustrated in Figure 1, is representative of Thurstone's geometrical conceptualization of simple structure, let us return to it as a special case of the sort of configuration we have been discussing in this section. In our earlier discussion of the box problem we pointed out that the aim of simple structure factor transformation, as presented by Thurstone [1935, 1947], is to identify reference vectors normal to that set of hyperplanes that close in on and effectively bound the test vector configuration. It is as though the test vectors are presumed to be gathered in a quadrant- or orthant-shaped bundle somewhere in hyperspace, and our aim is to bound or envelop them closely with hyperplanes normal to reference vectors through transformation of the coordinate system. Assuming an orthogonal coordinate system for the moment, the aim is to rotate the coordinate frame rigidly until all test vectors fall within the all-positive and/or all-negative orthant of the system. (This could easily be done in Figure 6b, for instance.) Notice that the geometrical process we have just tried to picture could only be expected to yield an orientation of coordinate axes that aids in the identification of hypothetical determinants of the manifest test vector configuration if we subscribe to the assumption that these hypothetical determinants tend not to act at odds with one another. Otherwise, we should not expect to find a boundable bundle of test vectors gathered within any particular potential coordinate system orthant. Indeed, such a constellation of test vectors within any one potential orthant (or two opposing potential orthants) of hyperspace is a great departure from a spherical distribution of those vectors throughout hyperspace (as suggested in Figure 6c). That is why Thurstone regarded the discovery of a positive manifold simple structure configuration as particularly compelling from a scientific perspective [1935, pp. 200-201].
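In two dimensions the rigid rotation described above can be found by brute force. The sketch below is an illustration, not the book's method: it grid-searches the rotation angle and keeps the orientation that maximizes the smallest coordinate after each test vector is reflected toward the positive quadrant. A positive best margin means the whole configuration lies in the all-positive and all-negative quadrants. The configuration is hypothetical, and the search covers 180 degrees because the pair of opposing quadrants repeats with that period.

```python
import numpy as np

def rotation(phi):
    """2-D rotation matrix whose columns are unit axes at phi and phi + 90 degrees."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

def margin(coords):
    """Smallest coordinate after reflecting each row toward the positive quadrant."""
    signs = np.where(coords.sum(axis=1) < 0, -1.0, 1.0)
    return float((coords * signs[:, None]).min())

def best_bounding_rotation(vectors, steps=3600):
    """Grid-search phi in [0, 180) degrees for the rigid rotation with the largest margin."""
    phis = np.linspace(0.0, np.pi, steps, endpoint=False)
    margins = [margin(vectors @ rotation(phi)) for phi in phis]
    k = int(np.argmax(margins))
    return phis[k], margins[k]

# Hypothetical test vectors at 35 to 115 degrees; tests 2 and 4 are reverse-scored.
angles = np.radians([35.0, 50.0, 75.0, 100.0, 115.0])
vectors = 0.8 * np.column_stack([np.cos(angles), np.sin(angles)])
vectors[[1, 3]] *= -1

print(margin(vectors))                 # the original axes do not bound the bundle
phi, best_margin = best_bounding_rotation(vectors)
print(np.degrees(phi), best_margin)
```

Maximizing the margin, rather than accepting any bounding orientation, centers the coordinate frame on the bundle; other tie-breaking rules would be equally defensible for a sketch.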
Although we have been arguing that Thurstone's conception of an ideal simple structure is just a special case of what can be expected when hypothetical determinants tend not to act in a mutually contradictory fashion, it seems that Thurstone himself was only dimly aware of this state of affairs. Although he recognized that a positive manifold simple structure configuration is a very compelling departure from an arbitrary test vector configuration, as mentioned above, he seems not to have realized that allowing factors to be bipolar undermines the basic philosophical justification for expecting simple structure unless that bipolarity takes the form of simple tolerance for reflection in the direction in which manifest variables are scored. In other words, allowing reflection in the direction of scoring of manifest variables means that we can expect both the all-positive orthant and the all-negative orthant to be populated in a coordinate system which has been transformed so that the associated hyperplanes bound the test vector configuration. As indicated earlier, the latter sort of bipolarity does not do any violence to the notion that hypothetical determinants tend not to act at odds to one another. This point was not stressed by Thurstone, however, since he tended to discuss the concepts of simple structure, positive manifold, bounding hyperplanes, and factorial invariance as though they were disconnected topics [cf. Butler, 1969] when, in fact, they all relate simultaneously to the more basic notion that hypothetical determinants do not generally act in a mutually opposing fashion. Returning again to Thurstone's notion of an ideal simple structure, as illustrated by the box problem in Figure 1, we can see that almost all of the test vectors are distributed within clearly defined hyperplanes along the outer boundaries or faces of the configuration. Notice, moreover, that the system of primary factor axes at the lateral edges of the configuration could easily be approximated by an orthogonal coordinate system that is maximally similar to the existing primary factor system in the sense of having each orthogonal axis as highly correlated with its respective primary factor axis as possible. This notion of a set of orthogonal axes maximally correlated with a corresponding nonorthogonal set of axes has been discussed by Green [1952], Gibson [1962], Johnson [1966], Schönemann [1966], Kaiser [1967], Bentler [1968], Hall [1971], and Price and Nicewander [1977]. It is a very simple matter to transform any initial set of primary factors, or the corresponding set of reference vectors, to the required system of "orthogonal bounds" lying nearby. For our present purpose, however, what is important to notice is that a set of mutually orthogonal axes could easily be found for the box problem, by transforming the primary factors to their orthogonal bounds, so that all of the test vectors represented in Figure 1 fell into only one orthant—the all-positive orthant. Given knowledge of the location of primary factors, it is a simple task to transform the latter to the nearest set of orthogonal axes—the orthogonal bounds.
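One standard way to make "maximally similar" precise is to choose the orthogonal matrix that maximizes the sum of cosines between each orthogonal axis and its primary factor; the solution is the orthogonal polar factor of the primary factor matrix, obtainable from its singular value decomposition. The sketch assumes that formulation (the authors cited above differ in detail) and also computes the reference vectors, each normal to the hyperplane spanned by the other primary factors, for a hypothetical pair of primaries 60 degrees apart.

```python
import numpy as np

def orthogonal_bounds(primaries):
    """Orthogonal Q maximizing trace(Q.T @ primaries), i.e. the sum of cosines between
    each orthogonal axis and its unit-length primary factor: Q = U @ Vt from the SVD."""
    u, _, vt = np.linalg.svd(primaries)
    return u @ vt

def reference_vectors(primaries):
    """Unit reference vectors as columns: row i of inv(primaries) is orthogonal to
    every primary except primary i, i.e. normal to the hyperplane of the others."""
    w = np.linalg.inv(primaries)
    return (w / np.linalg.norm(w, axis=1, keepdims=True)).T

def angles_deg(axes):
    """Angle of each column vector, in degrees."""
    return np.degrees(np.arctan2(axes[1], axes[0]))

# Hypothetical primary factors at 15 and 75 degrees (unit columns, 60 degrees apart).
theta = np.radians([15.0, 75.0])
P = np.vstack([np.cos(theta), np.sin(theta)])

Q = orthogonal_bounds(P)
R = reference_vectors(P)
print(angles_deg(P), angles_deg(Q), angles_deg(R))
```

With the configuration centered at 45 degrees, the primaries sit 30 degrees on either side of the center, the orthogonal bounds 45 degrees, and the reference vectors 60 degrees, so the reference vectors meet at an obtuse angle of 120 degrees.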
What is more to the point, however, since we do not start an exploratory factor analysis with any knowledge at all about the location of the primary factor axes, is that if we can manage to find a set of orthogonal coordinate axes which includes all of the common parts of manifest test vectors within its all-positive and/or all-negative orthant, then we are very close indeed to having located the required primary factors per se. This is the mileage gained by making the positive manifold assumption that hypothetical determinants of behavioral variation tend not to act in opposition to one another. We have simply converted the logic to say that success in orienting a reference frame so that every test vector has either all-positive or all-negative projections onto the coordinate axes is likely to have brought the latter into close alignment with any natural causal factors that actually gave rise to such a characteristic type of test vector configuration in the first place, in line with the positive manifold assumption. At this point it is appropriate to point out that the orthogonal bounds, or set of mutually orthogonal coordinate axes that approximate as closely as possible the location of corresponding primary factors, are not too far removed from the primary factors themselves unless the latter are very highly intercorrelated. Indeed, if we assume that acute angles of inclination prevail among primary factors, as should be the case if they span a homogeneous domain of variation and fall at the intersections of hyperplanes that bound a compact configuration of test vectors, then the orthogonal bounds will be located just enough farther out from the center of the test vector configuration than are the primary factors to permit mutual orthogonality. A bit farther out still, we will find the system of reference vectors, which must be inclined to one another at obtuse angles in order to permit the hyperplanes to which they are normal to close in on and envelop or bound the test vector configuration. One very noteworthy feature of such a configuration is the fact that those test vectors closest to the primary factor axes, and therefore the most factorially simple or pure indicators in the test vector configuration, are also the most nearly mutually orthogonal set of test vectors within the configuration. Referring to Figure 1, we can easily see that the three factorially pure measures, h, l, and w, are the most mutually orthogonal triplet of test vectors in the entire box problem configuration. Butler [1964, 1969] has referred to this property of factor-pure measures as their "distinguishability" and has argued for a form of factor analysis that takes the most distinguishable set of observed test vectors to define the location of factors.
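Butler's notion of distinguishability suggests a simple search over subsets of test vectors. The sketch below scores each triplet by the determinant of the matrix of cosines among its normalized vectors (1 for mutually orthogonal vectors, near 0 for nearly dependent ones). The determinant score is a stand-in chosen for this illustration and is not necessarily Butler's own index; the pattern is hypothetical.

```python
import itertools

import numpy as np

def most_distinguishable(vectors, k=3):
    """Return the k-subset of test vectors closest to mutual orthogonality,
    scored by the determinant of their cosine (Gram) matrix."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    best, best_score = None, -1.0
    for idx in itertools.combinations(range(len(unit)), k):
        sub = unit[list(idx)]
        score = float(np.linalg.det(sub @ sub.T))   # 1 when orthogonal, 0 when dependent
        if score > best_score:
            best, best_score = idx, score
    return best, best_score

# Hypothetical three-factor configuration: three nearly pure tests and some mixtures.
tests = np.array([
    [0.9, 0.0, 0.0],
    [0.0, 0.8, 0.1],
    [0.1, 0.0, 0.9],
    [0.6, 0.6, 0.0],
    [0.5, 0.5, 0.5],
    [0.0, 0.6, 0.6],
])
print(most_distinguishable(tests))
```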
Although the latter approach is consistent with our assumption about the mode of joint action of latent determinants within the behavioral domain, it does not make use of all of the information about primary factor location contained in the test vector configuration, since it relies only upon the factorially pure indicators of each factor (if such exist). Given the assumption that hypothetical determinants act either in concert or independently, but not in opposition, it is only necessary (from a geometrical perspective) to locate the expected compact bundle of test vectors in hyperspace and to orient our coordinate axes so that all of these test vectors fall within the polyhedral convex cone defined by the positive and negative orthants of that coordinate system, in order to come very near to aligning the coordinate axes with likely hypothetical determinants. Although even the latter is much more easily said than done, once it is accomplished we are still left with the challenging task of refining the solution to allow for possible correlation among primary factors (hypothetical determinants) through appeal to the notion of simple structure developed by Thurstone. This brings us, finally, to a consideration of the relationship between Thurstone's notion of simple structure and the assumption that hypothetical determinants do not act in opposition. The latter suggests the conditions under which it should be feasible to find an orientation of coordinate axes such that a polyhedral convex cone-shaped bundle of test vectors is made to fall totally within the positive and negative orthants of hyperspace. When such a solution is found, moreover, it suggests that our coordinate axes are approximately aligned with a set of hypothetical determinants that may well have given rise to the observed data in the first place, since such a cone-shaped configuration of test vectors is a highly improbable outcome of pure chance. The notion of simple structure takes us one step further by suggesting that few, if any, observed variables will be found to be influenced by all of the hypothetical determinants or factors which have given rise to the full configuration of test vectors; i.e., it suggests not only that test vectors will fall into a polyhedral convex cone-shaped configuration, but that only the faces of the region so defined will tend to be occupied. The positive manifold assumption limits the presumed range of joint action of latent causal factors, in that mutual cancellation of their effects upon manifest variables is seen as an atypical mode of expression. This state of affairs implies that we should be able to rotate a reference system of coordinate axes of appropriate dimensionality so that the entire configuration of test vectors from any homogeneous domain of variation falls wholly within the all-positive and all-negative orthants of the coordinate system. Thinking, then, in terms of the projections of test vectors onto such an orthogonal reference frame (i.e., onto orthogonal axes normal to hyperplanes which bound the test vector configuration), we would expect to see like-signed projections of any given test vector onto all of the coordinate axes.
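How improbable is such a configuration under pure chance? In two dimensions there is a closed form if a "random configuration" is idealized as independent, uniformly distributed directions, an assumption made only for this sketch. An undirected direction lives on a circle of 180 degrees, and fitting into one pair of opposing quadrants after some rotation means fitting into an arc of 90 degrees, half that circle; by the classical semicircle result this happens with probability n/2^(n-1) for n vectors. The Monte Carlo check uses the fact that points fit into such an arc exactly when the largest cyclic gap between them is at least 90 degrees.

```python
import numpy as np

def exact_prob(n):
    """P(n random 2-D line directions fit in one +/- quadrant pair for some rotation)."""
    return n / 2 ** (n - 1)

def simulated_prob(n, trials=20_000, seed=0):
    """Monte Carlo estimate of the same probability (requires n >= 2)."""
    rng = np.random.default_rng(seed)
    ang = np.sort(rng.uniform(0.0, np.pi, size=(trials, n)), axis=1)
    gaps = np.diff(ang, axis=1)
    wrap = ang[:, 0] + np.pi - ang[:, -1]            # gap across the 0/180 degree seam
    largest_gap = np.maximum(gaps.max(axis=1), wrap)
    return float(np.mean(largest_gap >= np.pi / 2))  # remaining arc spans <= 90 degrees

for n in (4, 6, 10):
    print(n, exact_prob(n), simulated_prob(n))
```

Even with only ten test vectors the chance is under 2 percent, which illustrates why a well-populated cone-shaped bundle is hard to attribute to chance.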
Any test vector in the positive orthant would have positive projections onto all of the coordinate axes and any test vector falling within the negative orthant would have negative projections onto all of the coordinate axes. Since no test vectors would fall outside of the boundaries of these two opposing orthants, we would not find any test vector with coordinates of mixed signs. It is of historical interest to note that conclusions similar to the above were reached by Thurstone before he developed the principle of simple structure per se. In his 1933 presidential address to the American Psychological Association, wherein he introduced his ideas about multiple common factors as an alternative to Spearman's general factor conceptualization, Thurstone speculated as follows about how multiple factor analysis might best be accomplished:
This solution would be to find a set of orthogonal axes through the fringe of the space of mental abilities rather than through the middle of it. The geometrical representation of the solution will probably be as follows. The mental abilities can be represented as points within a cone. The axis of the cone will represent a fictitious central intellective factor. The fundamental abilities which have genetic meaning will be represented by a set of orthogonal elements of the cone in space of as many dimensions as there are genetic factors. All mental tests will then be described in terms of positive orthogonal coordinates, corresponding to the independent genetic factors. Negative loadings will disappear. [1934, pp. 30-31].
In other words, primary factors should be located at the lateral edges or "fringes" of a highly populated polyhedral convex cone-shaped test vector configuration. What the notion of simple structure adds to the above is the idea that many (if not all) test vectors can be expected to fall at or very near the boundaries of the positive and negative orthants of any coordinate system whose axes approximate the location of those latent determinants which presumably gave rise to the configuration of test vectors in the first place. In other words, not only would we expect the projections of any given test vector onto the coordinate axes (read: latent determinants) to be consistent in terms of algebraic sign, but we would also expect at least one, and perhaps more, of these projections to be at or near zero. Indeed, the aim of simple structure transformation is to bound the faces of a presumed polyhedral convex cone of test vectors with hyperplanes so closely that the intersections of those hyperplanes will define the location of a primary factor precisely at each lateral edge of the test vector cone, even if no factorially pure manifest variable happens to be found in that location. This is the best estimate that can be reached, through triangulation in hyperspace, if you will, of the position of any hypothetical determinant that might have participated in giving rise to the associated configuration of test vectors. It is actually a very powerful geometrical notion, provided that the positive manifold assumption that latent determinants do not act at odds to one another is not far from the truth. Since there is no reason to presume that latent determinants in nature will be mutually uncorrelated, we must be prepared to locate primary factors at the intersections of hyperplanes that bound the test vector configuration and are normal to an oblique set of reference axes.
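A criterion that rewards "at least one projection at or near zero" without demanding unifactorial rows is the geomin criterion, stated here in the form common in later literature and software (for example, the `geominQ` criterion of the R package GPArotation): the sum over variables of the geometric mean of squared pattern loadings, each offset by a small constant ε. This sketch assumes that form and an arbitrary ε = 0.01; it is not meant to reproduce criterion (32) or the soft-squeeze procedure exactly, and the rows are hypothetical.

```python
import numpy as np

def geomin(pattern, eps=0.01):
    """Sum over rows of the geometric mean of (squared loading + eps)."""
    m = pattern.shape[1]
    return float(np.sum(np.prod(pattern ** 2 + eps, axis=1) ** (1.0 / m)))

in_hyperplane = np.array([[0.70, 0.60, 0.02]])    # one near-zero projection
partly_complex = np.array([[0.70, 0.60, 0.30]])   # same row, third loading moderate
fully_complex = np.array([[0.50, 0.50, 0.50]])    # substantial on every factor

for label, row in [("in a hyperplane", in_hyperplane),
                   ("partly complex", partly_complex),
                   ("fully complex", fully_complex)]:
    print(f"{label:16s} {geomin(row):.4f}")
```

Dropping a single loading from 0.30 to 0.02 roughly halves that variable's term while the other two loadings are untouched; this is the sense in which the criterion pulls variables into hyperplanes without forcing them toward a single factor.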
This condition need not conflict in any way with the assumption that latent determinants do not tend to act at odds to one another, provided that hyperplanes are not brought inside the effective boundaries of the test vector configuration in order to maximize the number of near-zero loadings in the primary factor pattern matrix. The latter can actually present a problem in the practical application of simple structure transformation with real data, however, since it is not as uncommon as Thurstone might have us believe (in his later work) for manifest variables to be of full factorial complexity; i.e., to have substantial loadings on all factors being entertained. Any such fully complex variable would be located near the central axis of the polyhedral convex cone-shaped test vector configuration, as is clearly the case for a couple of variables even in the 26-variable box problem (see, e.g., the measure of volume, hlw, in Figure 1). Notice from Figure 1 and Table 1 for the box problem that the overwhelming majority of test vectors in this configuration conform to the simple structure hypothesis; i.e., all but the last two variables fall into at least one hyperplane, so there is at least one near-zero coefficient in each of the respective rows of the primary factor pattern matrix. It is certainly not a necessary consequence of the positive manifold assumption about the action of latent determinants that every variable, or even most of the variables, will fall into at least one hyperplane, although any variables that populate the faces of their polyhedral convex cone of occupation will likewise fall into the hyperplanes that closely bound those faces. It is for this reason that we regard Thurstone's notion of simple structure as a special case of the positive manifold assumption about the limited range of joint action of latent determinants in nature—a special case that cannot generally be expected to hold true for all variables in any real, as opposed to idealized, test vector configuration. We must therefore be prepared to tolerate any variables that prove to be of full factorial complexity in the practical application of simple structure transformation, provided that those variables fall within the region bounded by hyperplanes normal to reference vectors in the final solution.
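Whether a variable falls "into at least one hyperplane" is commonly judged by counting near-zero loadings in its row of the primary factor pattern. The sketch below assumes a hyperplane band of ±0.10, a conventional but arbitrary width, and flags rows with no near-zero loading as candidates for full factorial complexity. The pattern matrix is hypothetical, not Table 1.

```python
import numpy as np

def hyperplane_count(pattern, width=0.10):
    """Number of near-zero loadings per variable (|loading| <= width)."""
    return (np.abs(pattern) <= width).sum(axis=1)

# Hypothetical primary pattern: most rows lie in at least one hyperplane;
# the last row, like a volume-type measure, loads on every factor.
pattern = np.array([
    [0.85, 0.03, -0.02],
    [0.05, 0.80, 0.04],
    [0.02, -0.05, 0.90],
    [0.60, 0.55, 0.06],
    [0.45, 0.40, 0.42],
])
counts = hyperplane_count(pattern)
complex_rows = np.flatnonzero(counts == 0)   # variables that fall in no hyperplane
print(counts, complex_rows)
```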
It may be apparent from all of the foregoing that direct application of the principle of simple structure alone can do little to help us locate and place bounding hyperplanes about the faces of a polyhedral convex cone of test vectors. That cone is presumed to lie in some potential, but as yet unknown, pair of opposing orthants of hyperspace. The principle of simple structure can only be expected to be useful if it is invoked to refine the final location of bounding hyperplanes once their approximate location has already been found through some other means. This consideration undoubtedly ranks among the many reasons why we encountered local minimum problems in our early attempts to satisfy Thurstone's original criterion for simple structure (32) by minimizing that function, given that we employed a rather arbitrarily selected starting position from which to transform the factors.
A Source of Prior Information About Test Vector Configuration Location
Even the highly refined and modified soft-squeeze approach to direct geomin transformation, whose performance was illustrated with Lord's data in Part II of this volume, can be found to give different local minimum solutions. This happens provided such minima exist in the data and a solution lying nearby is used as the initial configuration for starting direct geomin iterations. In fact, Butler [1964], Eber [1966], Cureton and Mulaik [1971], and others have noticed that alternative "simple structure" solutions exist for Thurstone's 26-variable box problem, even though this example was specifically contrived to illustrate a near-ideal simple structure configuration. The preferred solution to the box problem presented in Table 1 was obtained using soft-squeeze direct geomin transformation. Alternative and less desirable solutions can also be revealed by direct geomin transformation simply by starting from less favorable initial configurations. We have presented the basics of such an alternative solution in Table 4. Notice that the main things "wrong" with the box problem solution given in Table 4 are twofold. First, the primary factors are more highly correlated than in the preferred solution (Table 1). Second, the most highly distinguishable triplet of test vectors (h, l, and w, the original box dimensions) each show an inconsistent pattern of signs in their loadings across the various factors. Neither of these features does enough violence to the principle of simple structure per se to lead direct geomin away from this solution. Both, however, imply that something might be amiss when seen from the perspective of the positive manifold assumption that latent determinants can seldom be expected to act at odds with one another. Inconsistent loading signs coupled with high interfactor correlations imply that hyperplanes may well have cut inside the bounding faces of the presumed polyhedral convex cone of test vectors.
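The two warning signs named above are mixed loading signs within a row and elevated interfactor correlations. Both are easy to screen for mechanically. The sketch below flags rows whose salient loadings carry both algebraic signs. It also summarizes a factor correlation matrix by its mean absolute off-diagonal element. The salience cutoff, the function names, and the two pattern rows are assumptions for illustration. The correlation matrix uses the values printed for Table 4 with decimal points restored; only their absolute values enter the summary.

```python
SALIENT = 0.10  # hypothetical cutoff separating salient from near-zero loadings


def mixed_signs(row, salient=SALIENT):
    """True if a pattern row has salient loadings of both algebraic signs,
    i.e., the variable behaves as an inconsistent indicator."""
    has_pos = any(a >= salient for a in row)
    has_neg = any(a <= -salient for a in row)
    return has_pos and has_neg


def mean_abs_offdiag(phi):
    """Mean absolute off-diagonal element of a factor correlation matrix."""
    m = len(phi)
    vals = [abs(phi[i][j]) for i in range(m) for j in range(m) if i != j]
    return sum(vals) / len(vals)


# Illustrative rows (assumed values, not read from Table 4).
print(mixed_signs([0.05, 0.60, 0.70]))   # consistent signs
print(mixed_signs([-0.45, 0.70, 0.60]))  # mixed signs

# Factor correlations as printed in Table 4, decimal points restored.
phi_table4 = [[1.00, 0.34, 0.42],
              [0.34, 1.00, 0.36],
              [0.42, 0.36, 1.00]]
print(round(mean_abs_offdiag(phi_table4), 3))
```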
What the Table 4 solution has actually accomplished is to place hyperplanes so that they intersect near clusters of more-or-less collinear variables. This is more in line with an independent cluster approach to achieving "ease of interpretation" of the factor pattern. It is less in line with the notion of bounding the entire test vector configuration with hyperplanes whose intersections then define the location of primary factors. We obtained the solution in Table 4, in fact, by using a targeted independent-cluster solution as the initial starting point for direct geomin iterations. A contrast in the way we obtained these two alternative solutions to the 26-variable box problem suggests a strategy that can be developed into a general means of ensuring that only bounding hyperplanes are converged upon in the practical application of direct geomin simple structure transformation. In the successful analysis of the box problem data, whose outcome is presented in Table 1, we used an orthogonal solution as the initial configuration for direct geomin iterations. That solution is completely arbitrary except for one feature: all axes of this orthogonal starting configuration were made to have equal angles with the major principal axis of the original test vector configuration in common factor space.
Table 4. Direct Geomin Local Minimum Solution for Thurstone's Box Problem
FACTOR PATTERN MATRIX*
Variable                 I    II   III
h                       40    67    68
l                       67    35    60
w                       75    60    38
hl                      19    13    84
hw                      25    82    13
lw                      91    13    08
h²l                     05    39    81
hl²                     44    08    74
h²w                     01    81    35
hw²                     42    81    09
l²w                     87    06    26
lw²                     89    29    09
h/l                     85    87    00
l/h                     85    87    00
h/w                     91    01    91
w/h                     91    01    91
l/w                     01    90    86
w/l                     01    90    86
2h + 2l                 27    03    82
2h + 2w                 28    84    07
2l + 2w                 89    13    10
√(h² + l²)              27    04    80
√(h² + w²)              29    80    06
√(l² + w²)              86    13    13
hlw                     50    39    40
√(h² + l² + w²)         59    31    35
FACTOR CORRELATIONS*
         I    II   III
I      100    34    42
II      34   100    36
III     42    36   100
* decimal points omitted
This general type of transformation was developed by Landahl [1938] because Thurstone [1938] found it useful as a starting point in his extended vectors method of graphical rotation in three-dimensional subspaces. That method is an effective means of locating bounding hyperplanes of the test vector configuration through hand calculations when the number of factors is small. It is not really surprising that Landahl's transformation provides a good point from which to start any search for hyperplanes that bound the test vector configuration. Our presumed compact, polyhedral convex cone-shaped configuration of test vectors is not hidden in some completely unknown and only potentially boundable pair of opposing orthants of hyperspace. On the contrary, it is not difficult at all to localize, at least in one major sense, well before the outset of simple structure transformation. The major principal axis of the test vector configuration must pass roughly along the central axis of the very cone whose faces we wish to bound (and, therefore, define) with hyperplanes normal to a corresponding set of reference vectors. This means that, from the outset of any search for bounding hyperplanes, we can anticipate that these planes will all fall at roughly the same angle of inclination to the major principal axis of the test vector configuration. This holds provided that the configuration occupies a compact polyhedral convex cone-shaped region that is more or less uniformly populated with test vectors. That last condition will be met if two things are true. First, the positive manifold assumption about the joint action of latent determinants must hold essentially true. Second, an effort must have been made to obtain a well-representative sample of manifest variables from the domain of variation of interest.
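One way to construct an orthogonal frame with the property just described uses a Householder reflection. In such a frame, every axis has direction cosine $1/\sqrt{m}$ with the major principal axis. The reflection carries the first coordinate axis onto the equiangular vector $(1, \ldots, 1)/\sqrt{m}$. The sketch assumes loadings are expressed in principal-axes coordinates, so that the major principal axis is the first coordinate axis. The construction and the function names are our illustrative choice. This is not necessarily the formula Landahl [1938] gave.

```python
import math


def equal_angle_frame(m):
    """Return an m x m orthogonal matrix whose columns are unit axes, each
    with direction cosine 1/sqrt(m) to the first coordinate axis (assumed to
    be the major principal axis).

    Built as the Householder reflection H = I - 2 v v^T / (v^T v) with
    v = e1 - u and u = (1, ..., 1)/sqrt(m), so that H e1 = u. Because H is
    symmetric, its first row also equals u.
    """
    if m < 2:
        raise ValueError("need at least two factors")
    u = [1.0 / math.sqrt(m)] * m
    v = [(1.0 if i == 0 else 0.0) - u[i] for i in range(m)]
    vv = sum(x * x for x in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv
             for j in range(m)]
            for i in range(m)]


def is_orthonormal(T, tol=1e-12):
    """Check T^T T = I, i.e., the columns are orthonormal axes."""
    m = len(T)
    for a in range(m):
        for b in range(m):
            dot = sum(T[k][a] * T[k][b] for k in range(m))
            if abs(dot - (1.0 if a == b else 0.0)) > tol:
                return False
    return True


T = equal_angle_frame(3)
print([round(T[0][j], 4) for j in range(3)])  # cosines of each axis to axis 1
print(is_orthonormal(T))
```

Loadings on the new axes are obtained by post-multiplying the principal-axes loading matrix by `T`. Any further orthogonal adjustment that leaves the first row of `T` unchanged preserves the equal angles.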
The hyperplanes bounding a polyhedral convex cone-shaped configuration of test vectors can be expected to be inclined at approximately equal angles to the major principal axis (which passes, in turn, through the approximate centroid of the test vector cone). It follows that the primary factor axes that fall along the lateral edges of that same configuration (at the intersections of hyperplanes) must also be inclined to the major principal axis at approximately equal angles. In the case of orthogonal axes, these are the complements of the angles made by the hyperplanes. That is why Landahl's transformation provides a fairly reasonable positioning of coordinate axes from which to begin the search for simple structure. However, Landahl's transformation is quite arbitrary in all other respects. It could therefore be greatly improved upon, in terms of the aim of fully and effectively bounding the test vector configuration. The improvement would come through successive adjustment of coordinate axes within the $(m-1)$-dimensional subspace that remains unspecified after fixing the relationship of all coordinate axes to the first principal axis. We have only thought briefly about the route this implies for approximately bounding a test vector configuration with coordinate hyperplanes. We see it as a bit too rigid, constrained, and computationally unwieldy to put into actual practice. It does serve, however, as a good geometrical summary of what must ultimately be accomplished before we have in hand an initial starting configuration that can be safely surrendered to the principle of simple structure for final refinement of hyperplane location. The general idea here has two steps. First, transform an orthogonal set of coordinate axes so that they make equal angles with the major principal axis of the test vector configuration, following Landahl. Then spin this system of axes rigidly about the major principal axis until all test vectors in the configuration lie within the boundaries provided by the two opposing orthants of the coordinate frame that are penetrated centrally by the major principal axis. From that point on it should be safe to appeal to the principle of simple structure alone as a means of refining the location of hyperplanes, provided that fully complex variables approaching collinearity with the major principal axis are ignored. These considerations lead to a conclusion about any attempt to locate hyperplanes bounding the faces of a presumed polyhedral convex cone-shaped configuration of test vectors. Such an attempt must make use of the information that the major principal axis provides about the general location of that configuration in hyperspace. This information makes effective use of the notion of simple structure possible. Confining the search for simple structure to those sets of primary factors that make roughly equal angles with the major principal axis of the test vector configuration reduces the danger of becoming trapped in an undesirable local minimum. Such a minimum satisfies Thurstone's criterion only in a local sense, because it is fixed upon nonbounding (hence, noninformative) hyperplanes.
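For three factors, the general idea above can be sketched directly. Build an equal-angle frame and spin it rigidly about the major principal axis. Then keep the spin angle at which the most test vectors fall inside the two opposing orthants (all coordinates of one sign). The sketch makes several assumptions of its own, none of which comes from the text:

- It assumes principal-axes coordinates, with the major principal axis as the first axis.
- It handles only $m = 3$.
- It uses a coarse one-degree grid.
- It runs on synthetic vectors built as nonnegative combinations of a known frame.

All function names are hypothetical. The text itself regards this procedure as too rigid for routine use.

```python
import math


def equal_angle_frame(m):
    """Householder reflection mapping e1 to (1, ..., 1)/sqrt(m): its columns
    are orthonormal axes with direction cosine 1/sqrt(m) to the first axis."""
    u = [1.0 / math.sqrt(m)] * m
    v = [(1.0 if i == 0 else 0.0) - u[i] for i in range(m)]
    vv = sum(x * x for x in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv
             for j in range(m)] for i in range(m)]


def matmul(X, Y):
    """Plain-Python matrix product of nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def spin(T, degrees):
    """Rotate every axis (column of T) rigidly about the first coordinate
    axis (m = 3 only); the first row of T, hence each axis's angle to the
    major principal axis, is unchanged."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    R = [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    return matmul(R, T)


def in_opposing_orthants(coords, tol=1e-9):
    """True if all coordinates share one sign (within tol)."""
    return all(x >= -tol for x in coords) or all(x <= tol for x in coords)


def best_spin(A, T, steps=360):
    """Grid search over spin angles; returns (degrees, number of rows of A
    whose coordinates on the spun axes lie in the two opposing orthants)."""
    best = (0.0, -1)
    for k in range(steps):
        deg = 360.0 * k / steps
        coords = matmul(A, spin(T, deg))
        inside = sum(in_opposing_orthants(row) for row in coords)
        if inside > best[1]:
            best = (deg, inside)
    return best


# Synthetic test vectors (rows, principal-axes coordinates): nonnegative
# combinations of a "true" frame obtained by spinning the equal-angle frame.
T0 = equal_angle_frame(3)
true_frame = spin(T0, 40.0)
weights = [[1, 0, 0], [0, 1, 0], [0, 0, 1],
           [0.6, 0.5, 0], [0, 0.4, 0.7], [0.5, 0.5, 0.5]]
A = [[sum(true_frame[i][j] * w[j] for j in range(3)) for i in range(3)]
     for w in weights]

print(best_spin(A, T0))
```

Because a spin of 120° merely permutes the three equal-angle axes, several angles can tie for the maximum. The search returns the first one it meets.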
Nonbounding hyperplanes of this kind are quite prevalent in real data, as Cattell [1966] points out, and must be carefully and routinely avoided in the practical application of simple structure transformation. Any search for simple structure is an attempt to refine the location of hyperplanes so that they bound the test vector configuration about its faces. It is prudent to carry out such a search on the assumption that factors tend to make roughly equal angles with the major principal axis of the test vector configuration. It is also wise, however, to assume (unless there is convincing evidence to the contrary) that the primary factor axes being sought are not too highly inclined toward the major principal axis of the test vector configuration itself. In other words, there is good reason to presume that primary factors are highly "distinguishable", in the sense that they are mutually nearly uncorrelated and distinct, unless strong evidence from the data contradicts this. Recall from our earlier discussion (and as is evident from Figure 1 for the box problem) that factor-pure indicators are the most highly distinguishable (read: most nearly mutually orthogonal) m-tuplet of test vectors in the entire configuration. Butler [1964] exploits this situation in his "simplest data factors" solution, which gives factorially invariant results when conventional factor transformation methods fail. It stands to reason, then, that primary factors per se must tend to be quite highly distinguishable, falling as they do at the extreme lateral edges of the polyhedral convex cone-shaped configuration of test vectors. As Thurstone put it, each common factor is located at the "fringe of the space of mental abilities" [1934, p. 30]. We can direct our search for primary factors that might satisfy Thurstone's criterion for simple structure (32) primarily to a region about the major principal axis of the test vector configuration. That region is the domain of all orthogonal systems of coordinate axes making equal angles with the major principal axis. Doing so should diminish the probability of converging to a final solution that falls inside the true boundaries of the test vector configuration. This problem was not avoided in the application of soft-squeeze direct geomin transformation to the 26-variable box problem (Table 4), however, because we started the hyperplane search from rather centrally located independent clusters in the initial configuration.
In this chapter we have taken a second look at Thurstone's powerful but subtle principle of simple structure. We have concluded that it is a special case of what might well be expected in terms of factor loading patterns when the positive manifold assumption about the mode of joint action of latent determinants can be made. This assumption implies that the common parts of manifest test vectors will be gathered into a polyhedral convex cone-shaped configuration in hyperspace.
The lateral edges of such a configuration can be identified by locating the intersections of hyperplanes that closely bound its faces. This will then yield a set of primary factors that can be taken as hypothetical determinants of covariation within the domain of manifest variables sampled. It became evident in our discussion that the principle of simple structure per se can do little to ensure that hyperplanes are located at the actual boundaries of a test vector configuration, as opposed to other less informative locations that might also tend to satisfy that principle in a local sense. Fortunately, we know that the major principal axis of the test vector configuration must pass approximately along the central axis of any polyhedral convex cone-shaped region occupied by that configuration. This knowledge suggests a way to confine the search for simple structure to a region of hyperspace that is likely to contain the boundaries of the test vector configuration. This holds given the circumstances prescribed by the positive manifold assumption about the range of joint action of latent determinants within the domain of behavioral variation. It appears, then, that practical application of the principle of simple structure requires explicit use of a priori information about primary factors. They can generally be expected to be highly distinguishable (i.e., approximately mutually orthogonal) and related about equally strongly to the major principal axis of any roughly homogeneous battery of scores being analyzed. Only in the face of convincing evidence from the data can departure from these presumed structural aspects of the solution be tolerated. Otherwise, we run the risk of having no justification for inferring that the intersections of hyperplanes provide a fix on the location of true latent determinants of manifest covariation. The validity of this inference stems from the truth of the positive manifold assumption and is not a property of the principle of simple structure per se. The positive manifold assumption implies the properties of simple structure for the subset of variable vectors that happen to fall at the lateral edges or faces of their expected polyhedral convex cone-shaped distribution in hyperspace. Only for that reason is the principle of simple structure justified as a means of isolating hypothetical determinants. Additional principles must therefore be invoked to ensure that the hyperplanes located through application of the principle of simple structure actually bound the test vector configuration. That is why prior knowledge about the relationship of the major principal axis of the test vector configuration to the hyperplanes that bound that configuration must play an intimate role in the process of simple structure factor transformation. In the next chapter we will present practical suggestions about how to make use of prior information concerning the form of test vector configuration implied by the truth of the positive manifold assumption. The aim is to direct the search for simple structure via the soft-squeeze direct geomin transformation method outlined in Part II to an informative outcome.
5 Bounding Hyperplane Simple Structure Transformation
Toward Bounding the Test Vector Configuration with Hyperplanes in the Identification of Primary Factors
In the foregoing chapter we sought a philosophical basis for the location of primary factor axes that can be taken as hypothetical determinants of manifest covariation. We made a simple assumption about the limited range of joint action of latent determinants within the behavioral domain, to the effect that such determinants tend not to oppose one another. This assumption was seen to imply a characteristic form of test vector configuration. From the geometrical properties of this form of configuration, we found that we could infer the location of any hypothetical determinants that might have given rise to the configuration in the first place. The resulting solution for the location of primary factors could then be expected to exhibit simple structure in a somewhat limited sense. That is, some, but by no means all, of the variables could be expected to relate to fewer than the full set of hypothetical determinants being entertained, in line with criterion (32). The expected solution was seen to have an additional feature: consistency in the algebraic signs of loadings for any given variable across all factors. In many respects this was more important than the presence of one or more near-zero loadings for some of the variables. This implication of the positive manifold assumption (which gives the potential to infer geometrically the location of latent determinants) goes beyond the principle of simple structure. Steps must therefore be taken in the course of simple-structure-oriented transformation to ensure that this additional principle is satisfied.
The principle of positive manifold was certainly taken into account by early practitioners of graphical rotation. The general idea is to make certain that any hyperplanes located through the application of the principle of simple structure actually tend to bound the entire test vector configuration. As was seen toward the end of the last chapter, we have a priori knowledge about the primary factors falling at the intersections of such bounding hyperplanes. They will tend to be highly distinguishable and inclined at roughly equal angles to the major principal axis of the test vector configuration. This chapter shows how soft-squeeze direct geomin transformation can be modified to take account of those principles, in addition to limited simple structure. These principles would characterize the primary factor pattern in any situation where primary factors actually correspond to latent determinants with a limited range of joint action, in line with the positive manifold assumption. It is interesting, by the way, that practical success in developing such an approach to primary factor transformation can lead to a means of determining the extent to which the positive manifold assumption holds true in the analysis of naturally occurring behavioral data. In our experience, the positive manifold assumption is reasonable in a far wider range of behavioral domains than has previously been recognized. Positive manifold has typically been viewed as a reasonable assumption in the intellectual (cognitive) domain. It has not generally been applied in the domain of personality or in other areas where "bipolar" factors are presumed to be the rule. In fact, about the only thing we have noticed in applications of the proposed method to personality ratings [Yates, 1979 Note] is a slightly diffuse positive manifold, in that a few small projections outside the manifold must be tolerated. Any set of primary factors conforming to the model outlined in the previous chapter for the general mode of action of latent determinants is characterized by a number of conditions, in addition to limited simple structure, that hold simultaneously. Nevertheless, the goal of defining a specific algorithm for meeting these conditions demands a somewhat distinct treatment of each aspect of the problem.
We have therefore developed, and will introduce separately, several different modifications of and additions to the soft-squeeze direct geomin approach to simple structure transformation introduced earlier. In practice, however, all of these computational innovations are implemented simultaneously and/or in an iterative sequence. This ensures attainment of the multiple objectives of bounding-hyperplane simple structure transformation alluded to in the foregoing chapter. For convenience, we refer to the overall approach as "direct geoplane" transformation. The name indicates that it uses soft-squeeze direct geomin as a simple-structure-attaining component. It also takes into account the conditions that must be met to ensure that the test vector configuration is bounded by all $m-1$ hyperplanes that intersect along each primary factor axis. This chapter is devoted to the practical application of the theoretical principles discussed in the previous chapter. Our particular solution to the computational tasks involved is clearly only one of many possibilities. The same was true of our development of soft-squeeze direct geomin as a practical means of approaching satisfaction of Thurstone's original mathematical criterion for simple structure (32). Practical aspects of the computational task often demand that scientifically arbitrary issues be settled by an appeal to the aesthetics of design rather than to profound theoretical principles. A good example of this was our choice of the geometric mean in (38) instead of some other choice that might also have been in line with (32). Two considerations matter most in leaving such a choice up to our artistic sensibilities. First, the choice must not conflict with the scientific principles being implemented (e.g., criterion (38) is certainly more in line with (32) than is (34)). Second, our eventual practical experience must show successful performance of the implemented option. There are many different ways to span a space using the same basic principle of suspension, for example, since many additional aesthetic as well as practical considerations can enter into any realistic application of the principle.† The same is true for the principle of noncancellation of latent determinant effects. Our approach to the application of such a principle is only one among many possibilities, but we have made every effort to make our computational approach fully consistent with the scientific principle in mind. Moreover, we have extensively tested alternative modes of implementation on real as well as simulated data in order to optimize both the design and implementation of the proposed solution. Many alternative approaches have thus been entertained and eventually discarded along the way. In fact, it was precisely through this sort of experience in the application of direct geomin transformation that we were led to reconsider the theoretical bases for expecting simple structure in real data. The modified theory was presented in the previous chapter, and the corresponding computational modifications are presented in what follows.
† The progression from Greek beam and column architecture through the Roman and Gothic arches demonstrates the extensive ramifications of various practical discoveries designed to accomplish the same basic goal of spanning space [Bronowski, 1973, pp. 104-113].
Distinguishability Weighting
From our discussion of Butler's [1969] work, as well as from our own perspective outlined earlier, it is clear that the factorially pure tests within any collection of more-or-less homogeneous tests are likely to be highly distinguishable. That is, they are more nearly mutually orthogonal than any other m-tuplet of tests. Of course, a battery may contain very few factor-pure tests, or the factors themselves (and, hence, any factorially pure tests) may be intercorrelated. In general, however, we can make use of the notion of distinguishability to help ensure that simple structure transformation leads to factors that fall at the intersections of hyperplanes bounding the test vector configuration. A convenient and practical way to exploit the notion of distinguishability in simple structure transformation is to combine it with our prior knowledge about primary factors. They should themselves tend to be highly distinguishable (i.e., approximately mutually orthogonal), and they should tend to make approximately equal angles with the major principal axis of the test vector configuration. This presumed structure follows from the assumption of noncancellation of latent determinant effects. Given such a presumed structure, we can assess the distinguishability (i.e., likelihood of being factorially pure) of each manifest variable simply by considering its angular inclination to the major principal axis of the test vector configuration. In particular, the primary factors themselves, and therefore any factor-pure variable, can be expected to have an angular inclination of
$$\cos^{-1}\left(1/\sqrt{m}\right) \qquad (74)$$
to the major principal axis of the test vector configuration. The value $1/\sqrt{m}$ is the direction cosine, relative to the major principal axis, required to make each of a set of mutually orthogonal axes equally inclined toward that major axis. Landahl's [1938] method of transformation employs it for this purpose. We are speaking, for the moment, of an orthogonal reference frame. The angle given in (74) is therefore the complement of the angle at which hyperplanes normal to this system of factor axes must themselves be inclined to the major principal axis of the test vector configuration. This information offers a promising route toward indexing the "distinguishability" of each variable in a battery. The index would measure how closely the variable's inclination to the major principal axis of the test vector configuration approaches the theoretically ideal angle (74) for a factorially pure variable. This even suggests one possible way to make an independent-cluster-oriented criterion locate factors at the intersections of hyperplanes bounding a test vector configuration. Weights could be chosen that would make only the most highly distinguishable test vectors play an active role in the transformation of axes. Of course, success of this approach would depend on the existence of essentially factor-pure variables for every factor. It would also depend on accurate location of the central axis of the test vector configuration and on near-orthogonality of the primary factor axes. Nevertheless, Cureton and Mulaik [1975] found that this basic strategy led to the first successful purely analytic solution for Thurstone's 26-variable box problem using an independent-cluster-oriented criterion (varimax). The distinguishability index suggested by Cureton and Mulaik is designed to provide weights for use in an independent-cluster-oriented criterion for factor transformation. The maximum weight must therefore be applied to those variables making exactly angle (74) with the major principal axis of the test vector configuration. Weighting must then decline relatively smoothly but rapidly as the location of variables departs from what might be expected of factor-pure indicators of orthogonal factors. This results in a more complicated formulation of the problem of weighting variables according to their distinguishabilities than is necessary with a true simple-structure-oriented criterion that tolerates factorial complexity. Along the lines explored independently by Cureton and Mulaik, the author developed a method of weighting variables in terms of their distinguishabilities [Yates, 1974 Note] that is consistent with Thurstone's original mathematical criterion for a best-fitting simple structure. It thus emphasizes equally the role of all those variables that are likely to be located within bounding hyperplanes. This holds regardless of where their indicated complexity falls within the acceptable range. That range runs from one (maximally distinguishable and, therefore, presumably factorially pure) to $m-1$ (related to all but one factor and thus displaying minimally acceptable distinguishability). The role of the Cureton and Mulaik weighting is largely to exclude from consideration during independent-cluster-oriented rotation all variables except those that can be expected on a priori grounds to be collinear with or very near primary factors. In the same way, the role of the weighting we proposed is largely to exclude from consideration during simple structure rotation all variables except those that can be expected on a priori grounds to fall into one or more bounding hyperplanes.
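To put numbers on angle (74) and its complement, the short sketch below tabulates two angles for a few values of $m$. The first is the inclination of a factor-pure variable to the major principal axis, $\cos^{-1}(1/\sqrt{m})$. The second is the complementary angle $90^\circ - \cos^{-1}(1/\sqrt{m})$. That is the inclination of the bounding hyperplanes, and also of a barely distinguishable variable lying centrally within one of them. The function name is our own.

```python
import math


def factor_pure_angle(m):
    """Angle (74) in degrees: inclination to the major principal axis of each
    of m orthogonal axes equally inclined to it (hence of a factor-pure variable)."""
    return math.degrees(math.acos(1.0 / math.sqrt(m)))


for m in range(2, 7):
    pure = factor_pure_angle(m)
    print(f"m={m}: factor-pure {pure:6.2f} deg, "
          f"hyperplane / barely distinguishable {90.0 - pure:6.2f} deg")
```

For three factors, as in the box problem, this gives about 54.74° and 35.26°. For $m = 2$ both angles equal 45°.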
Each weighting approach (Cureton and Mulaik's and our own) makes theoretical sense, given the validity of the positive manifold assumption about the restricted range of joint action of latent determinants. We again prefer, however, to take advantage of all of the information about primary factor location provided by variables that fall into anywhere from one through $m-1$ hyperplanes. This avoids being highly dependent on a few potentially factorially pure variables. Having soft-squeeze direct geomin transformation available makes it possible to employ the notion of variable distinguishability simply to reduce the possible distorting influence of two kinds of variables. The first kind might be of full factorial complexity. The second kind might fall well outside the boundaries of the basic test vector configuration because they are inconsistent indicators of common variation within the roughly homogeneous domain of variables under consideration. A look back at Figure 1 and Table 1 is useful at this point. Notice that variables h, l, and w fall into $m - 1 = 2$ hyperplanes and have a factorial complexity of one. Variables hlw and $\sqrt{h^2 + l^2 + w^2}$ fall into no hyperplane and have a factorial complexity of three. All of the remaining variables fall into only a single hyperplane and have a factorial complexity of $m - 1 = 2$. Notice, however, that the ratio variables (#13 through #18 in Table 1) are inconsistent indicators. They have loadings of mixed signs on the factors that jointly contribute to the domain of volume-related measures on boxes. Hence, these variables could exercise a distorting influence on the overall analysis if they received much weight in transformation. They are actually located outside the boundaries of the test vector configuration that we wish to bound with hyperplanes, as in Figure 1. Likewise, the fully complex variables must receive low weight in the practical application of any criterion such as (32), since they should have nonzero projections onto all factors. Let us now consider how we might arrive at a simple weighting coefficient for any given variable included in a factor analysis. The variable should receive full weight in the application of simple structure transformation only if its presumed complexity ranges between one and $m-1$. We wish to make use of the assumption that the major principal axis of the test vector configuration makes equal angles with orthogonal factors normal to hyperplanes bounding the test vector configuration. This is in line with the notion that test vectors populate positive and negative orthants centered about this principal axis. Our aim, then, is to give unit weight to any variable inclined to the major principal axis of the test vector configuration at an angle within a certain range. One end of the range is the value given in (74), for a fully distinguishable and, therefore, presumably unifactorial variable. The other end is the complement of the angle given in (74). It corresponds to a barely distinguishable variable having the maximum degree of complexity consistent with simple structure. Such a variable is presumably situated in only one hyperplane and projects equally onto all of the $m-1$ remaining factors. As seen in (74), the direction cosine from the major principal axis to a fully distinguishable and possibly unifactorial variable is $1/\sqrt{m}$.
The direction cosine from the major principal axis to a barely distinguishable variable, possibly in the center of a hyperplane normal to a fully distinguishable variable (i.e., normal to a potential factor), is $\sqrt{1 - 1/m}$, since this is the cosine of the complement of the angle given in (74). Our task, then, is to find a coefficient that attains its maximum value whenever the direction cosine from the major principal axis to the corresponding variable falls within the interval from $1/\sqrt{m}$ to $\sqrt{1 - 1/m}$ in absolute value, but that drops toward zero to the extent that the direction cosine in question falls beyond these limits. In order to find the direction cosine from the major principal axis of a test vector configuration to that part of any manifest variable that is represented in common factor space, it is only necessary to normalize implicitly the common part of the variable in question and then determine its new projection onto the major principal axis; i.e., divide the projection of the original variable onto the major principal axis by the square root of its communality (4). If we refer to the resulting direction cosine for the ith variable as $c_i$, then it can be shown that the inequality
$$ c_i^2 (1 - c_i^2) \ge \frac{m - 1}{m^2} \tag{75} $$
holds only when
$$ \frac{1}{m} \le c_i^2 \le \frac{m - 1}{m}; \tag{76} $$
i.e., precisely when the corresponding test vector falls within that range of angular inclinations to the major principal axis of the configuration over which the principle of simple structure should prove most applicable. If the direction cosine $c_i$ falls much below $1/\sqrt{m}$ in absolute value, it indicates that we are dealing with an inconsistent indicator of common variation within the more-or-less homogeneous domain of variation under consideration; a value of $c_i$ equal to zero would correspond to a variable which has so little in common with the domain in question that it does not project onto the major principal axis at all. Of course, the maximum value of the product $c_i^2(1 - c_i^2)$ is reached when $c_i^2 = .5$; i.e., when the variable in question makes an angle of 45° to the major principal axis of the test vector configuration. When m = 2 the right-hand side of (75) becomes ¼, while the upper and lower limits of (76) converge to the value of ½. This is consistent with our knowledge that the hyperplane normal to either orthogonal coordinate axis in two-space is simply the other axis. The two-space example is particularly relevant to our discussion, however, since it indicates the need to develop a "distinguishability" weighting system in which some weight is given to simplification of variables whose cosines of angular inclination to the major principal axis fall somewhat outside of the bounds given in (76). Otherwise, only perfectly orthogonal and factorially pure variables oriented at exactly 45° to the major principal axis satisfy (75) and (76) when m = 2.
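The direction cosine $c_i$ and inequality (75) are simple to compute. The sketch below is minimal and rests on some assumptions. It takes an unrotated principal-axis loading matrix whose first column holds each variable's projection onto the major principal axis. The function names and example loadings are hypothetical. A small tolerance absorbs floating-point rounding exactly at the limits of (76). The sketch also confirms that the inclinations for fully and barely distinguishable variables are complementary.

```python
import numpy as np

def direction_cosines(loadings):
    """c_i: projection on the major principal axis divided by sqrt(communality).

    Assumes an n x m matrix of unrotated principal-axis loadings whose first
    column holds each variable's projection onto the major principal axis.
    """
    L = np.asarray(loadings, dtype=float)
    h = np.sqrt(np.sum(L ** 2, axis=1))      # length of each variable's common part
    return L[:, 0] / h

def in_simple_structure_range(c, m, tol=1e-12):
    """Inequality (75); tol absorbs rounding exactly at the limits of (76)."""
    c2 = np.asarray(c, dtype=float) ** 2
    return c2 * (1.0 - c2) >= (m - 1) / m ** 2 - tol

# Angles to the major principal axis for a fully and a barely distinguishable variable.
m = 3
fully = np.degrees(np.arccos(1 / np.sqrt(m)))         # about 54.7 degrees
barely = np.degrees(np.arccos(np.sqrt(1 - 1 / m)))    # about 35.3 degrees (the complement)

c = direction_cosines([[0.6, 0.0], [0.5, 0.5], [0.3, 0.4]])
flags = in_simple_structure_range(np.sqrt([0.2, 0.4, 0.5, 0.8]), 3)
```

For m = 3, (75) accepts $c_i^2 = .4$ and $.5$. It rejects $.2$ (too close to a factor, or beyond) and $.8$ (too close to the major principal axis).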
Notice that if we simply take the inequality in (75) and multiply both sides by the quantity $m^2/(m - 1)$, we get another inequality,
$$ \frac{m^2 c_i^2 (1 - c_i^2)}{m - 1} \ge 1, \tag{77} $$
which is satisfied whenever (76) holds true; i.e., whenever variable vectors fall precisely within the range of angular inclinations to the major principal axis of the test vector configuration over which Thurstone's original criterion for simple structure (32) can be expected to apply. Notice, moreover, that for the case of m = 2 we might well entertain a distinguishability weighting coefficient corresponding to the left-hand side of (77) as a way of seeing to it that only those variables that fall within the general vicinity of 45° in angular inclination to the major principal axis of the two-dimensional test vector configuration enter into a simple structure transformation criterion. For m = 2 such coefficients could not exceed unity and would fall more and more rapidly toward zero as test vectors departed from 45° of angular inclination to the major principal axis. The functional relation given by the left-hand side of (77) is indicated in the following tabulation of values that result as test vectors are entertained that range away from 45° of angular inclination to the major principal axis in 5° steps until either collinearity with or normality to that axis is reached: 1.000, .970, .883, .750, .587, .413, .250, .117, .030, .000. We can see from the example above for m = 2 that it is quite feasible to arrive at a simple set of weights that could be used in conjunction with a simple-structure-oriented transformation criterion, such as that implemented in direct geomin, in order to focus application of the criterion away from any variables that should not be expected on an a priori basis (i.e., knowing only the projections of variables onto their major principal axis) to fall at or near the boundaries of the test vector configuration.
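The m = 2 tabulation can be reproduced directly from the left-hand side of (77). The following is a short sketch, and the helper name `lhs_77` is hypothetical:

```python
import numpy as np

def lhs_77(c, m):
    """Left-hand side of (77): m^2 c^2 (1 - c^2) / (m - 1)."""
    c2 = np.asarray(c, dtype=float) ** 2
    return m ** 2 * c2 * (1.0 - c2) / (m - 1)

# Angles of inclination to the major principal axis, moving away from 45 degrees
# in 5-degree steps until the vector is collinear with the axis (0 degrees).
angles = np.arange(45, -1, -5)                  # 45, 40, ..., 0
values = lhs_77(np.cos(np.radians(angles)), m=2)
print(np.round(values, 3))
```

When m = 2 the expression reduces to $4c_i^2(1 - c_i^2) = \sin^2 2\theta$, where θ is the angle of inclination. Stepping toward normality (90°) therefore yields the same sequence as stepping toward collinearity (0°).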
The m = 2 example also suggests a simple way to generalize such coefficients to the case of higher-dimensional solutions; i.e., simply use the same formulation implied by the left-hand side of (77), but with truncation of any weights that exceed unity back to that value. This leaves us with the following formulation for variable vector "distinguishability weights" based upon their direction cosines with respect to the major principal axis of the test vector configuration in m-dimensional common factor space:
$$ dw_i = \begin{cases} 1 & \text{if } \dfrac{m^2 c_i^2 (1 - c_i^2)}{m - 1} \ge 1, \\[6pt] \dfrac{m^2 c_i^2 (1 - c_i^2)}{m - 1} & \text{otherwise;} \end{cases} \tag{78} $$
or, algebraically,
$$ dw_i = \min\left\{1,\ \frac{m^2 c_i^2 (1 - c_i^2)}{m - 1}\right\}. $$
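Weights (78) amount to the left-hand side of (77) truncated at unity. This minimal sketch also includes an optional flag for the squaring that the text goes on to entertain. That flag is a practical choice and not part of (78) itself. The function name is hypothetical.

```python
import numpy as np

def distinguishability_weights(c, m, square=False):
    """Weights (78): min{1, m^2 c^2 (1 - c^2) / (m - 1)}.

    square=True applies the optional squaring discussed in the text; it speeds
    the fall-off away from the band given by (76) but is not part of (78).
    """
    c2 = np.asarray(c, dtype=float) ** 2
    w = np.minimum(1.0, m ** 2 * c2 * (1.0 - c2) / (m - 1))
    return w ** 2 if square else w

c_mid = np.cos(np.radians(22.5))        # midway between 45 degrees and the axis, m = 2
print(distinguishability_weights(c_mid, 2), distinguishability_weights(c_mid, 2, square=True))
```

For m = 4, a variable with $c_i^2 = .5$ gives $16(.25)/3 \approx 1.33$ on the left-hand side of (77), so its weight is truncated to 1. A variable with $c_i^2 = .9$ drops to a weight of .48.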
It may, finally, be desirable to square the resulting weights before using them in a criterion such as (40a), both in order to improve continuity of the function of $c_i$ at the point where the left-hand side of (77) exceeds unity when m > 2 and in order to provide for a more rapid decrease in weighting outside the boundaries defined by (76); i.e., where inequality (77) no longer holds because its left-hand side is less than unity. As might be gathered from our m = 2 tabulation of these values as the angle of inclination of a test vector to the major principal axis departs from 45° in 5° steps, even a test vector midway between the 45° location of highest distinguishability and the major principal axis still receives a distinguishability weight of $dw_i = .500$. Squaring the weights in (78) would, of course, result in a much more rapid drop away from unity with deviation from the angle of greatest distinguishability or its complement (i.e., 45° when m = 2). This is a practical issue, however, and will be left unresolved until we have dealt with a closely related question: what can be done to make use of any information provided by the location of those variable vectors that appear to be highly collinear with the major principal axis of the test vector configuration and are, therefore, of full factorial complexity?

Complexity Weighting and the Associated Factor Contribution Matrix. Although those variables that prove to be of full factorial complexity can detract from the utility of the principle of simple structure and must therefore be given low weight in its application, they can help ensure that the bulk of the test vector configuration is actually bounded by the m − 1 hyperplanes that intersect at each primary factor axis.
Recall from our earlier discussion of the transformation proposed by Landahl that we can conceive of the process of bounding any test vector configuration with hyperplanes normal to a set of orthogonal axes (all making equal angles with the major principal axis of the test vector configuration) simply as "spinning" an otherwise arbitrary initial set of such axes about the major principal axis until all test vectors have been captured in the positive and negative orthants surrounding the major principal axis. Although the latter is only a conceptual model of what must be accomplished, it does point out an important role that can be played by highly complex variables in ensuring that the reference frame does not depart too far from its presumed location relative to the major principal axis of the test vector configuration.
Highly complex variable vectors are proxies for the major principal axis itself, but the former have the advantage of being represented in the test vector configuration. Consequently, they can play a more active role in the factor transformation process than can the major principal axis itself. We shall see shortly that they also provide a source of information about the likely location of hypothetical determinants that relates directly to the positive manifold assumption that latent determinants tend to act in such a manner that their effects on manifest variables seldom cancel one another. In our quest for a geometrical fix on the location of hypothetical determinants relative to the test vector configuration to which they might conceivably have given rise in the first place, we must allow for the possibility that hypothetical determinants are themselves intercorrelated. This means that we must be prepared to deal not only with the primary factor pattern matrix of beta weights for predicting standard scores on manifest variables from their hypothetical determinants, the primary factors, but also with the primary factor structure matrix of zero-order correlations between primary factors and observed variables. As far as the principle of simple structure is concerned, we can restrict our focus to the primary factor pattern matrix, since the aim of this principle is to account for each distinguishable variable through regression on fewer than m common factors. When it comes to making certain that any hyperplanes converged upon through simple structure transformation actually bound the test vector configuration, however, we can make good use of information contained in both the primary factor pattern matrix and the primary factor structure matrix. In particular, it is possible to assess the combined (joint as well as direct) effect that a particular factor has upon any given variable through the "weighted validity" [cf.
Conger, 1971 Note] or "associated factor contribution" [White, 1966] coefficient which results when corresponding elements of the primary factor structure matrix and the primary factor pattern matrix are multiplied; i.e., by examining the products, $s_{ij} p_{ij}$, of individual elements in the matrices S and P. Such coefficients are useful for the purpose to which we propose to put them because, being individual components in the implicit sum going onto the diagonal of (7), they add up to a fixed value that is the communality of each variable under consideration. The property of associated factor contribution coefficients (weighted validities) that their sum is equal to the communality for each respective variable sets them quite apart from either factor pattern weights or factor structure projections when it comes to the task of ensuring that primary factor axes do not depart too far from the a priori restraints that they remain approximately mutually orthogonal and continue to make approximately equal angles with the major principal axis of the test vector configuration throughout any initial search for simple structure. Rigidly imposing the latter
restriction would imply the sort of search for bounding hyperplane simple structure illustrated by the image of "spinning" factor axes through the subspace which remains free once their orientation to the major principal axis is fixed. Something similar can be accomplished with a good deal less rigidity and more practicality, however, simply by treating highly complex variables in the test vector configuration as proxies for the major principal axis with which they approach collinearity. Our aim, then, becomes one of keeping the associated factor contribution indices for highly complex variables as nearly equal as possible, since this implies maintenance of the reference frame in the location relative to the major principal axis of the test vector configuration which we desire on a priori grounds. Row elements of the associated factor contribution matrix sum to the communality for each respective variable. The magnitude of the product of a set of such values which have a constant sum will, of course, reach its maximum when they are all equal. The associated factor contributions to any particular variable will be equal, in turn, when that variable vector intersects all primary factor axes at equal angles and the primary factors themselves are all intercorrelated to the same degree (i.e., define a regular hyperpyramid). The latter is, of course, exactly the orientation of factor axes which we would like to maintain with respect to any fully complex variables; i.e., any variables collinear with the major principal axis of the test vector configuration. The contrast between the extended product of factor contributions to a given variable and their constant sum, that variable's communality, is reminiscent of the contrast considered earlier between geometric and arithmetic means of squared factor pattern coefficients. 
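The two properties used above can be checked numerically. First, rows of the associated factor contribution matrix sum to the communalities. Second, for a fixed row sum, the product (or geometric mean) of positive contributions is largest when they are equal. The sketch assumes the standard relations S = PΦ and communality = diag(PΦP′). The pattern and factor correlations are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical primary factor pattern P (variables x factors) and factor correlations Phi.
P = np.array([[0.7, 0.0],
              [0.0, 0.6],
              [0.4, 0.4]])
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])

S = P @ Phi                              # primary factor structure matrix
A = S * P                                # associated factor contributions s_ij * p_ij
communality = np.diag(P @ Phi @ P.T)     # common variance reproduced for each variable
row_sums = A.sum(axis=1)                 # equals the communalities

# For a fixed total, the geometric mean of positive parts peaks when the parts are equal.
h2 = row_sums[2]                         # the complex variable with pattern (0.4, 0.4)
gm_equal = np.sqrt(np.prod([h2 / 2, h2 / 2]))
gm_uneven = np.sqrt(np.prod([0.75 * h2, 0.25 * h2]))
print(A[2], gm_equal, gm_uneven)
```

The third variable makes equal angles with two equally correlated factors, so its two contributions are equal (.208 each). This is the orientation the text wants to preserve for fully complex variables.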
From the immediately foregoing discussion of distinguishability weighting we have seen that the objective of simple structure can be approached through minimizing the sum of geometric means of squared primary factor pattern coefficients (40) for those variables which range in angular inclination to the major principal axis of the test vector configuration from $\cos^{-1}\sqrt{(m - 1)/m}$ to $\cos^{-1}(1/\sqrt{m})$; i.e., whose direction cosines with that axis lie between $1/\sqrt{m}$ and $\sqrt{(m - 1)/m}$ in absolute value. Now we see, moreover, that the aim of keeping factor axes inclined at equal angles to the major principal axis of the test vector configuration, as well as at mutually equal angles to one another, can be approached through maximizing the sum of geometric means of associated factor contribution coefficients for those variables that approach collinearity with the major principal axis of the test vector configuration. The foregoing development is appealing in its symmetry. It provides a way to make optimal use of much of the information in the test vector configuration about the location of bounding hyperplanes, since both the distinguishable variables and the fully complex variables are considered in the proper
light (i.e., simple structure in the first case and configuration bounding/orthant centering in the latter case). Not all variables can be expected to fall either precisely along the central axis or at the faces and lateral edges of any polyhedral convex cone-shaped configuration of test vectors that we wish to bound with hyperplanes via simple structure transformation. Just as was the case when we considered weighting variables in terms of their distinguishabilities in the previous section, then, we must now obtain a weighting coefficient that indicates how highly collinear each variable is with the major principal axis of the test vector configuration. Application of such weights in maximizing the sum of geometric means of associated factor contribution coefficients for individual variables would thus tend to exclude from consideration any variables that depart too far from collinearity with the major principal axis. It is easy to obtain the required weights as a simple function of the direction cosines, $c_i$, introduced in the previous section; i.e., the normalized projection of each variable onto the major principal axis of the test vector configuration. The squared direction cosine from the major principal axis to each variable vector could itself serve as an index of collinearity with the principal axis, because the direction cosine is unity given collinearity and zero given normality. The only reason for not using $c_i^2$ directly as an index of presumed complexity is the desire to exclude inconsistent indicators of common variation, as well as fully distinguishable variables, from consideration in maximizing the geometric mean of elements in the corresponding row of the associated factor contribution matrix. For this reason we might consider a simple function of the direction cosine, $c_i$, that ranges between the value of zero for a maximally distinguishable variable and the value of unity for a fully complex variable; i.e., $\max\{0, (c_i^2 - 1/m)/(1 - 1/m)\}$.
In order to develop a coefficient that is in some sense comparable to, and therefore competitive with, (78) when it comes to deciding whether any given variable should be simplified or made more complex, it is necessary to square the value just suggested. This gives us, algebraically,
$$ cw_i = \left[\max\left\{0,\ \frac{c_i^2 - 1/m}{1 - 1/m}\right\}\right]^2 \tag{79} $$
as a "complexity weight" which ranges from zero for fully distinguishable variables (and those beyond) to unity for fully complex variables. That (79) is comparable to (78) when m = 2 can be seen by considering a variable that falls midway between the major principal axis and the 45° angle of inclination to that axis implied by (74) for a fully distinguishable variable. Substituting the value $c_i = .9239$, the direction cosine from the
major principal axis to a variable vector inclined at 22.5° to that axis, into either (78) or (79) yields the same value of .5000. The latter result seems very appropriate in this case, since our prior information does not favor either method of handling a variable that falls midway between the major principal axis of the test vector configuration and the position expected of a factorially pure variable. We have no information on the basis of which to judge whether such a variable should be simplified or made fully complex. The balanced weighting of such an uninformative variable allows it, in turn, to be effectively ignored in the factor transformation process. There are subtleties in the foregoing line of argument comparing distinguishability weighting (78) and complexity weighting (79) of variables, of course, since the weights in one set apply to the geometric means of squared primary factor pattern coefficients, while the weights in the other set apply to the geometric means of associated factor contribution coefficients. Only in the case of orthogonal factors will these coefficients be identical (i.e., strictly comparable). Likewise, the symmetry in distinguishability and complexity weighting observed for m = 2 breaks down somewhat as higher-dimensional solutions are entertained. In the case of m > 2 there is no clear-cut way to ascertain when a variable is uninformative, since hyperplanes and primary factor axes intersect the major principal axis at different angles of inclination. We must therefore consider other aspects of the complexity weighting method implied by (79) to see if it is justifiable when m > 2. One interesting feature of associated factor contribution coefficients, in addition to the fact that they sum to the communality value for each variable, is that they are all positive in sign for any variable falling inside the boundaries defined by hyperplanes that intersect at a set of mutually positively correlated or orthogonal primary factor axes.
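The m = 2 balance between (78) and (79) at 22.5° can be verified directly. This is a minimal sketch with hypothetical helper names, applying both formulas exactly as written:

```python
import numpy as np

def distinguishability_weights(c, m):
    """Weights (78): min{1, m^2 c^2 (1 - c^2) / (m - 1)}."""
    c2 = np.asarray(c, dtype=float) ** 2
    return np.minimum(1.0, m ** 2 * c2 * (1.0 - c2) / (m - 1))

def complexity_weights(c, m):
    """Weights (79): (max{0, (c^2 - 1/m) / (1 - 1/m)})^2."""
    c2 = np.asarray(c, dtype=float) ** 2
    return np.maximum(0.0, (c2 - 1.0 / m) / (1.0 - 1.0 / m)) ** 2

c_mid = np.cos(np.radians(22.5))   # about .9239: midway between the axis and 45 degrees
print(distinguishability_weights(c_mid, 2), complexity_weights(c_mid, 2))   # both about .5
```

At the extremes, (79) gives 1 for a vector collinear with the major principal axis ($c_i = 1$). It gives 0 for any vector at or beyond the fully distinguishable inclination ($c_i^2 \le 1/m$).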
What this means is that any test vector that relates to a set of mutually positively correlated factors in the manner implied by the positive manifold assumption (i.e., noncancellation of latent determinant effects) will also have all-positive associated factor contributions regardless of the direction in which that variable is scored. Any variable, moreover, that falls outside of one of the hyperplanes bounding a test vector configuration, but remains close enough to the configuration to have like-signed correlations with all factors, will have a negative associated factor contribution coefficient. Such a variable will tend to be brought back inside the hyperplane boundaries by any procedure that maximizes the sum of products of associated factor contribution coefficients for all variables that are of less than full distinguishability. What this means, in effect, is that hyperplanes normal to reference vectors can be kept from cutting inside of the effective boundaries of a test vector configuration by choosing complexity weights, as in (79), that tend to maximize the product of associated factor contributions in proportion to how much the presumed
complexity of any given variable exceeds one. A side effect of this approach is the tendency to maintain relative orthogonality among factors, since all associated factor contribution coefficients are necessarily nonnegative in an orthogonal solution (they reduce to squared pattern coefficients). Any variable that is a consistent indicator of variation within a more-or-less homogeneous domain under investigation should be found to be inclined to the major principal axis of the test vector configuration at least as closely as a factorially pure variable is expected to be (74). Such a variable should also have all-positive associated factor contribution coefficients. This correspondence is another justification for the complexity weights in (79) when m ≥ 2, since any variable that falls closer to the major principal axis than a presumably unifactorial variable (itself at a lateral edge of the polyhedral convex cone-shaped configuration of test vectors) receives some impetus through (79) to have all-positive associated factor contribution coefficients. This last point could bear some elaboration, because we have not yet discussed the mechanics of maximizing complexity-weighted geometric means of associated factor contribution coefficients while simultaneously minimizing distinguishability-weighted geometric means of squared factor pattern coefficients. Suffice it to say, for now, that our use of (42) to accomplish the latter suggests an analogous procedure for the former. A necessary caveat, however, is that the geometric mean weighting coefficients for associated factor contributions analogous to (43) must be based upon component elements that are taken to be nonnegative.
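The sign behavior and the nonnegativity caveat can be illustrated together. The example uses two positively correlated factors and a variable with a slightly negative pattern weight on one of them. Its structure correlations are both positive, but one associated factor contribution is negative. Following the soft-squeeze substitution the text describes, any contribution below the final squared tolerance $t_2^2$ is replaced by $t_2^2$. This is a minimal sketch. The factor correlations, pattern weights, and helper name are hypothetical. The tolerance values (.20 with two halvings) follow the typical settings the chapter mentions.

```python
import numpy as np

def squared_tolerances(t1, cuts):
    """Squared hyperplane width tolerance t1^2 / 4^c for c = 0, 1, ..., cuts."""
    return [t1 ** 2 / 4 ** c for c in range(cuts + 1)]

# Two positively correlated primary factors (hypothetical values).
Phi = np.array([[1.0, 0.5],
                [0.5, 1.0]])
# Pattern weights of one variable: slightly negative on F2, i.e., just outside
# the hyperplane that contains variables with zero weight on F2.
p = np.array([1.0, -0.2])
s = p @ Phi              # its row of the structure matrix S = P Phi: (0.9, 0.3)
a = s * p                # associated factor contributions: (0.9, -0.06)

# Soft-squeeze schedule with t1 = .20 and two cuts: t = .20, .10, .05.
t2_squared = squared_tolerances(0.20, cuts=2)[-1]   # 0.05 ** 2 = 0.0025
a_floored = np.maximum(t2_squared, a)               # negative and tiny values become t2^2
gm = np.exp(np.mean(np.log(a_floored)))             # geometric mean is now defined
```

The contributions still sum to the variable's communality, $p^\top \Phi p = .84$. The negative element means F2 acts as a suppressor for this variable.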
The latter is easily accomplished in a manner highly consistent with (44) in the soft-squeeze approach to direct geomin transformation, simply by substituting the final value to be attained by $t^2$ after as many successive halvings as are deemed worthwhile (i.e., $t_2^2$) for any component element in the numerator of the associated factor contribution analog of (44) which is less than $t_2^2$ (including negative contributions). In practice, we have found the latter procedure very effective with the (squared, hence positive) primary factor pattern coefficients in (44) as well, since it prevents instability due to minor fluctuations in factor pattern coefficients that approach zero very closely. This is especially important when near-null factors are included in an analysis, since the latter otherwise lead to trivial satisfaction of Thurstone's mathematical criterion for simple structure (32). The foregoing discussion has been cryptic, but its implications will be made clear once we express the details algebraically. At this point, our aim is merely to suggest how a geometric mean weighting scheme similar to that developed for primary factor pattern coefficients in connection with (42) can serve to ensure that any variable vector which falls closer to the major principal axis than a presumably unifactorial (maximally distinguishable) variable can be further encouraged to come within the region actually bounded by hyperplanes which intersect at primary factors. Of course, the latter phrasing must be turned around in practice, since such a variable
actually serves as a target to be included within the region enveloped by hyperplanes through the transformation of primary factor axes. The goal of eliminating any negative associated factor contributions to a variable that is presumed to be of greater than unit factorial complexity can be seen to be in line with the positive manifold assumption about the limited range of joint action of latent determinants in an interesting way. Associated factor contributions are weighted validities, as Conger points out, so the occurrence of a negative value implies that the corresponding factor is acting as a negative suppressor with respect to the dependent (manifest) variable in question. By assuming consistency in the manner in which manifest variables relate to their hypothetical determinants, then, the positive manifold assumption is equivalent to the assertion that suppressor variable effects are rare or, at least, not typical in nature. From the perspective of scientific parsimony this would seem to be a safe assumption to maintain until faced with evidence to the contrary. Direct geoplane can thus be viewed as a way of seeking a set of hypothetical determinants that could have given rise to the obtained relationships among manifest variables either through their direct independent action or through joint action which is exclusive of suppressor effects. The foregoing arguments all tend to justify the complexity weighting scheme for maximizing geometric means of associated factor contributions given in (79) for m ≥ 2. We have already seen, for m = 2, that (79) and (78) are nicely balanced in the case of a noninformative variable. The superior performance in practical applications of weighting coefficients (78) and (79), as opposed to powered versions of the same, implies that strong opposing drives toward both simplifying and "complexifying" ambiguous variables have a healthy and stabilizing effect within direct geoplane transformation.
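The two opposing drives can be made concrete with a toy calculation. The first term is a distinguishability-weighted sum of geometric means of squared pattern coefficients, to be kept small. The second is a complexity-weighted sum of geometric means of associated factor contributions, to be kept large. This is an illustrative sketch under stated assumptions and is not the author's criterion. It ignores the factor-size denominators and the pairwise tolerance bookkeeping that the full direct geoplane criterion adds. It also floors every element at $t_2^2$ so that logarithms are defined. The function names and example configuration are hypothetical.

```python
import numpy as np

def row_geometric_means(X):
    """Geometric mean of each row of a strictly positive matrix."""
    return np.exp(np.mean(np.log(X), axis=1))

def opposing_terms(P, Phi, c, t2=0.05):
    """Distinguishability-weighted simplicity term and complexity-weighted term.

    Illustrative numerators only: no factor-size denominators, and every element
    is floored at t2**2 so that logarithms are defined.
    """
    m = P.shape[1]
    c2 = np.asarray(c, dtype=float) ** 2
    dw = np.minimum(1.0, m ** 2 * c2 * (1.0 - c2) / (m - 1))        # weights (78)
    cw = np.maximum(0.0, (c2 - 1.0 / m) / (1.0 - 1.0 / m)) ** 2     # weights (79)
    A = (P @ Phi) * P                                               # s_ij * p_ij
    simplicity = np.sum(dw * row_geometric_means(np.maximum(t2 ** 2, P ** 2)))
    complexity = np.sum(cw * row_geometric_means(np.maximum(t2 ** 2, A)))
    return simplicity, complexity

# Two pure variables and one fully complex variable, orthogonal factors.
P_aligned = np.array([[0.7, 0.0],
                      [0.0, 0.7],
                      [0.5, 0.5]])
Phi = np.eye(2)
c = np.array([1 / np.sqrt(2), 1 / np.sqrt(2), 1.0])   # direction cosines to the major axis

theta = np.radians(20)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])        # orthogonal rotation, for contrast
P_rotated = P_aligned @ R

s_a, x_a = opposing_terms(P_aligned, Phi, c)
s_r, x_r = opposing_terms(P_rotated, Phi, c)
print(s_a - x_a, s_r - x_r)   # the aligned frame gives the smaller difference
```

In the aligned frame the pure variables are simple (small first term). The complex variable has equal contributions (large second term), so the difference is lower than after the 20° rotation.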
A further consideration in the development of practical procedures for simple structure primary factor transformation is the matter of the overall size and importance of individual factors in accounting for variance shared in common among manifest variables. It will be recalled from the foregoing discussion of associated factor contribution coefficients that their sum for each variable is its communality. It follows that their sum across all variables and all factors is a constant that corresponds to the total variance shared in common among the manifest variables. It follows, in turn, that the sum of associated factor contribution coefficients for each respective factor across all variables can be taken as a good index of that factor's overall size or importance in contributing to and accounting for variance shared by these manifest variables.

Factor Size Scaling. We alluded earlier to the notion that latent determinants complying with the positive manifold assumption about their restricted range of joint action should contribute more or less equally to the
variance shared by all manifest variables. This generalization stems from our assumption that the polyhedral convex cone-shaped configuration of test vectors will be rather uniformly populated within its faces and about the central axis if a thorough attempt has been made to sample variables from a roughly homogeneous but multidimensional domain. The principle of simple structure itself rests on the assumption that the boundaries of the test vector configuration are highly populated and well defined. Any remaining fully complex variables that populate the inner regions of the presumed test vector cone will tend to approach collinearity with the major principal axis of the test vector configuration, which presumably is no closer to any one primary factor axis than to another. There is little reason, then, to expect primary factors to differ to any substantial degree with respect to their overall contribution to common variance within the test vector configuration. Each primary factor should simply fall along one lateral edge of a rather uniformly populated polyhedral convex cone-shaped configuration of test vectors, unless, of course, we are entertaining more factors than the number of latent determinants which initially gave rise to the test vector configuration through the principle of mutual noncancellation of effects. In the latter event we would not expect to be able to place m primary factors around an (m − 1)-dimensional polyhedral convex cone of test vectors in such a way that the configuration is closely enveloped and bounded by hyperplanes intersecting along primary factor axes of roughly equal size or importance; a null or near-null factor would have to be entertained in order to effectively reduce the dimensionality of the system to m − 1.
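The factor size index described above is the column sum of the associated factor contribution matrix. One natural way to measure equality of such sizes is the ratio of their geometric to their arithmetic mean, which equals 1 only when all sizes are equal. The sketch below shows how a near-null factor pulls that ratio far below 1. It assumes S = PΦ and positive size indices, and the example matrices are hypothetical.

```python
import numpy as np

def factor_size_indices(P, Phi):
    """Column sums of the associated factor contribution matrix S * P (elementwise)."""
    return ((P @ Phi) * P).sum(axis=0)

def factor_size_equality(P, Phi):
    """Ratio of geometric to arithmetic mean of the factor size indices (<= 1).

    Assumes every size index is positive so that the logarithm is defined.
    """
    sizes = factor_size_indices(P, Phi)
    return np.exp(np.mean(np.log(sizes))) / np.mean(sizes)

identity = np.eye(3)                                   # orthogonal factors: S = P
balanced = factor_size_equality(0.7 * np.eye(3), identity)
near_null = factor_size_equality(np.diag([0.7, 0.7, 0.05]), identity)
print(balanced, near_null)   # about 1.0 for equal sizes; far below 1 with a near-null factor
```

Because the size indices sum to the total common variance, raising their geometric mean toward their (fixed) arithmetic mean is the same as equalizing them.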
We just verged on discussing what proves to be a promising rationale for determining the number of general common factors required to account for those aspects of a test vector configuration that are implied by the positive manifold assumption about the noncancellation of latent variable effects. We will postpone further discussion of the number-of-factors issue, however, since our main interest here is in what can be expected in terms of relative sizes of factors when we are dealing with a factor solution of correct dimensionality; i.e., of dimensionality equal to the number of latent determinants that actually gave rise to the test vector configuration through action consistent with the positive manifold assumption of noncancellation. Given the expectation that latent determinants will make comparable contributions toward shaping the test vector configuration into that characteristic polyhedral convex cone implied by their limited range of joint action (i.e., noncancellation of effects), we have the practical problem of making use of this prior information to direct the course of simple structure transformation. The task, then, is to try to maintain more-or-less uniform sizes for primary factors throughout the initial stages of bounding hyperplane simple structure transformation, on the assumption that we are dealing with a
solution of proper dimensionality. Only in the presence of convincing evidence to the contrary from the data (e.g., the inability to closely bound the test vector configuration with hyperplanes, m − 1 of which intersect along each primary factor axis) should we tolerate wide discrepancies in the degree to which different primary factors contribute to common variance. In the latter case, moreover, it is preferable that as many factors as possible remain nearly equal and large in size while the others tend to vanish. Consider the index of factor size obtained by summing associated factor contribution coefficients across all variables on each factor. These indices sum, in turn, to a constant: the total common variance in the m-dimensional test vector configuration. Equality of factor sizes is therefore attained when the product of these indices is a maximum; i.e., when the geometric mean of individual factor size indices reaches its maximum value, which is the arithmetic mean of the same indices. We can therefore arrive at a convenient coefficient of equality of factor sizes, which is obviously very much in the geomin-geoplane spirit, simply by setting the geometric and arithmetic means of individual factor size indices in ratio to one another; i.e.,
$$ \frac{\left(\prod_{j=1}^{m} \sum_i s_{ij} p_{ij}\right)^{1/m}}{\frac{1}{m} \sum_{j=1}^{m} \sum_i s_{ij} p_{ij}} \le 1. \tag{80} $$
As indicated, this coefficient attains its maximum value of unity when factors make uniform contributions to common variance, in line with prior expectations given consistency in the mode of action of latent determinants and adequate sampling of variables from a domain influenced by m such determinants.

Direct Geoplane Transformation

Recall from our discussion in the previous section that the use of distinguishability weights (78) in minimizing the direct geomin simple structure criterion (40) can be coupled with the use of complexity weights (79) in simultaneously maximizing a sum of geometric means criterion analogous to (40), but involving associated factor contributions instead of squared primary factor pattern coefficients. Incorporating the factor size comparison coefficient on the left-hand side of (80) into this scheme can also be entertained. This three-way attack on the problem of ensuring convergence of simple structure transformation onto true test-vector-configuration-bounding hyperplanes can now be summarized in the basic direct geoplane criterion:
In minimizing (81) we can exploit much of what was presented earlier in connection with minimizing (40), since the latter direct geomin criterion is an essential component of the more complex direct geoplane criterion. Ignoring the denominators in (81) for the moment, since they should tend to be equal upon convergence, notice that we have simply subtracted a complexity-weighted sum of geometric means term involving associated factor contributions from a highly comparable distinguishability-weighted version of (40). The choice of denominator terms then stems directly from the left-hand side of (80); i.e., criterion (81) is being minimized, and we wish to maximize the left-hand side of (80) in order to approach equality of factor size indices. Of course, the denominator of the rightmost term in (81) is a constant, so it plays only a passive scaling role during minimization of the full direct geoplane criterion function. Before proceeding to a discussion of the actual minimization of direct geoplane criterion (81), let us return briefly to the notion of transforming primary factors a pair at a time with allowance for a hyperplane width tolerance value, t, as in (42) through (45) for direct geomin transformation. The arguments which got us from (40) to (42) through (45) for soft-squeeze direct geomin can also get us from (81) to what follows for soft-squeeze direct geoplane, provided that we do not allow hyperplane width considerations to influence the denominators of (81) along the way (the impact would be negligible, in any case, but there is no theoretical reason to entertain hyperplane width effects in the consideration of factor sizes):
where
and where
with
and where
with $a_{il} = \max(t_2^2,\ s_{il}\,p_{il})$,
$$a_{ij} = \max\left(t_1^2/4^c,\ s_{ij}\,p_{ij}\right),$$
$$a_{ik} = \max\left(t_1^2/4^c,\ s_{ik}\,p_{ik}\right);$$
and where distinguishability weights, dw_i, and complexity weights, cw_i, are as defined in (78) and (79), respectively. A bit of clarification of the notation used in (82) through (85) is in order, since an assortment of considerations is represented there. It should be clear that (adjusted) associated factor contributions are symbolized by the letter a with appropriate subscripts. The symbols $t_1$ and $t_2$ are used to designate, respectively, the initial hyperplane width tolerance and the final hyperplane width tolerance reached after all successive halvings employed in the soft-squeeze iterative approach. Ordinarily, $t_1$ will be a value in the vicinity of .20 and $t_2$ will be in the vicinity of .05 (i.e., after a limit of two halvings or cuts), although this is largely a matter of sample size. The exponent c in the denominator of $t_1^2/4^c$ thus refers to the number of halvings or cuts in hyperplane width tolerance, where each cut follows an increase in the direct geoplane function value. Since c = 0 at the outset, the value $t_1^2/4^c$ is equal to $t_1^2$, as desired, at that point; after the first cut $t_1^2/4 = (t_1/2)^2$, and so on. The exponent c thus serves to keep track of the effect of cutting hyperplane width tolerance on the square of that value, since the latter enters directly into the direct geoplane criterion function through (84) and (85). We presume, furthermore, that $t_2^2$ equals the minimum value to be reached through successive hyperplane width tolerance halvings; i.e., $t_2^2 = t_1^2/4^c$ when c reaches the maximum number of cuts employed in a given analysis. The latter is, of course, a function of how close the user wishes to push "near-zero" factor pattern coefficients toward an exact zero value, as discussed in Chapter 3 in Part II. A final note about soft-squeeze factor pair direct geoplane criterion (82) is that asterisks are used to designate those primary factor pattern coefficients and factor structure correlations subject to modification through transformation. As was seen earlier in connection with Figure 2, depicting the pairwise rotation of factor F*k in the Fj Fk plane, both the jth and the kth columns of the primary factor pattern matrix are influenced, but only the kth column of the primary factor structure matrix is subject to change through this type of transformation.
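The soft-squeeze bookkeeping just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's program: the function names are hypothetical, the defaults ($t_1$ = .20 with two cuts, so that $t_2$ = .05) follow the typical values mentioned above, and we read the adjusted contributions as flooring the pair of factors being rotated at the current squared tolerance $t_1^2/4^c$ and every other factor at the final value $t_2^2$.

```python
def squared_tolerances(t1=0.20, max_cuts=2):
    """Squared hyperplane-width tolerances t1**2 / 4**c for c = 0, ..., max_cuts.

    Each cut halves the tolerance t, so its square is divided by four.
    """
    return [t1 ** 2 / 4 ** c for c in range(max_cuts + 1)]


def adjusted_contributions(s_row, p_row, j, k, c, t1=0.20, max_cuts=2):
    """Floor the associated factor contributions s_il * p_il of one variable.

    Hypothetical helper (an assumed reading of the soft-squeeze expressions):
    factors j and k, the pair being rotated, are floored at the current squared
    tolerance t1**2 / 4**c; all other factors are floored at the final squared
    tolerance t2**2 = t1**2 / 4**max_cuts.
    """
    current = t1 ** 2 / 4 ** c
    final = t1 ** 2 / 4 ** max_cuts
    return [max(current if l in (j, k) else final, s * p)
            for l, (s, p) in enumerate(zip(s_row, p_row))]


# Two cuts starting from t1 = .20 end at t2 = .05, i.e. t2**2 = .0025.
print(squared_tolerances())  # approximately [0.04, 0.01, 0.0025]
print(adjusted_contributions([0.6, 0.05, 0.3], [0.5, 0.02, 0.25], j=0, k=1, c=0))
```

Flooring rather than discarding small products keeps every geometric mean finite while still letting coefficients inside the current hyperplane width count as "near zero".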
Given expressions (50), (60), and (61) for transformed factor structure and factor pattern values in single plane rotation, we are in a position to consider minimizing direct geoplane criterion (82) through solution for the optimum angle of rotation of factor F*k in the Fj Fk plane. Recall from our earlier presentation of the direct geomin solution, however, that a change of variable suggested by Jennrich and Sampson is convenient because it avoids the problem of dealing with trigonometric functions of angles and sums of angles. By using ratios of trigonometric terms from the geometric approach to single plane factor rotation we can write expressions for all of the transformed coefficients involved in direct geoplane criterion (82). From (63), (66), and (50) we arrive at an expression for the transformed primary factor structure elements for factor F*k.
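$$s^{*}_{ik} = \frac{\delta\, s_{ij} + s_{ik}}{\lambda}, \qquad (86)$$
where δ denotes the Jennrich and Sampson ratio parameter and λ the normalizing constant whose square is given by (65). From (60) and (66) we get
$$p^{*}_{ik} = \lambda\, p_{ik}, \qquad (87)$$
while getting
$$p^{*}_{ij} = p_{ij} - \delta\, p_{ik} \qquad (88)$$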
from (61) and (63), as used already in (67). Rewriting (82) in terms of the transformed primary factor structure and primary factor pattern elements as just expressed, we get:
Again, just as was the case in (67), we can rewrite the criterion as a function of the ratio parameter alone, since (65) expresses the square of the normalizing constant in terms of the ratio parameter, stemming from the constraint that transformed factor variance remain at unity. In (89), moreover, the normalizing constant appearing in the denominator conveniently cancels. This gives us, then, from (89) and (65):
We can symbolize an equivalent of the denominator of the first term in (90) as
and can then go on to express all of (90) as
where
and, finally,
The task of minimizing (82) = (89) = (90) = (92) can be accomplished by setting the derivative of (92) with respect to the ratio parameter, δ, equal to zero and solving for the value of δ that yields a positive value for the second derivative of the criterion function. This process is a bit more complicated than in the case of direct geomin, however, because of the factor size scaling incorporated into the denominators of (81). The first derivative of (92) with respect to δ can be set equal to zero for solution as follows,
We can notice immediately that the term N1 in (93) is analogous to the direct geomin criterion function as expressed in (67) and (68), the only difference being the use of distinguishability weighting in the summations required for direct geoplane transformation. This means that the term dN1/dδ, with δ the ratio parameter, is simply a weighted version of what we already saw in (69). The term D2 (95) is a constant, and the term D1 (91) is easily computed as the geometric mean of individual factor size indices, as already seen in the numerator of the left hand side of (80). This means that the only unfamiliar terms left to deal with in (96) are dN2/dδ and dD1/dδ. The first derivative of D1 (91) with respect to δ can be viewed from the perspective used earlier in going from (40) to (41); i.e., writing
we can see that the first derivative of the latter with respect to changing factor pattern and factor structure values (which are ultimately functions of the ratio parameter) is:
Recalling (83a), however, we see that the same derivative would result from an alternative expression of denominator D1 as a weighted product of individual factor size indices for factors Fj and Fk; i.e.,
which is why we chose to write it that way in (82). Expanding the terms in (99) corresponding to transformed factor pattern and factor structure coefficients yields, from (86), (87), and (88):
The first derivative of D1 (100) with respect to the ratio parameter is then simply:
We can likewise find the first derivative of N2 with respect to the ratio parameter after expanding those terms in N2 corresponding to transformed factor pattern and factor structure values; i.e., from (94):
The first derivative of N2 (102) with respect to the ratio parameter is then simply:
Substitution of (101) and (103) back into (96) then yields the full first derivative expansion of direct geoplane criterion (82). It is then only a matter of collecting terms to arrive at a polynomial expression for the first derivative of (82) in terms of the ratio parameter δ. Coefficients in the latter will not differ a great deal from those entering into (68) and (69) for the direct geomin criterion. It can be seen from (96) that the term (dN1/dδ)/D1 is simply related to what was already given in (69); i.e., all summations are made to include distinguishability weights, and the resulting polynomial coefficients are divided by D1 with δ = 0, the geometric mean of factor size indices before transformation. It only remains to subtract any polynomial coefficients resulting from the terms N1(dD1/dδ)/D1² and (dN2/dδ)/D2 in (96) from the corresponding distinguishability-weighted and factor-size-scaled coefficients already seen in (69) in order to convert a solution from direct geomin to direct geoplane. We know from (93) and its source in (82) and (81) that the value of N1 at the outset of any given iteration (i.e., with δ = 0) is simply the distinguishability-weighted version of the direct geomin criterion value (40). The term N1(dD1/dδ)/D1² is thus simply the first derivative of D1 with respect to δ (101), scaled by the ratio of the distinguishability-weighted direct geomin criterion value to the squared geometric mean of factor size indices. We can see from (101), moreover, that both terms in the first derivative of D1 with respect to δ include the factor size weighting coefficient Aw (83). The latter can be regarded as having the geometric mean of factor size indices in its numerator (83b), however, which cancels the second power of D1 in the denominator of (96), and which results in the following contributions to direct geoplane first derivative polynomial coefficients due to the second half of the first term in (96):
The contribution of the final term in (96) to the direct geoplane first derivative polynomial expansion in the ratio parameter is simply a scaling of the coefficients in (103) by D2, the arithmetic mean of individual factor size indices (95), which is a constant throughout rotation:
It is clear from (105) that the provision in the direct geoplane criterion (81) for maximizing the complexity-weighted sum of geometric means of associated factor contributions to the communalities of individual variables has an impact only upon the constant and first degree terms in the expansion of the first derivative of (82) as a polynomial in the ratio parameter δ. We have already seen that one outcome of including a distinguishability-weighted and factor-size-scaled version of direct geomin as a term to be minimized in the direct geoplane criterion (81) is simply a distinguishability-weighted and factor-size-scaled version of the direct geomin first derivative polynomial coefficients presented in (68) and (69). To these two results concerning the coefficients of the first derivative polynomial in δ for direct geoplane transformation we must add the impact of factor size scaling seen in (104). Here, again, we see that only the constant coefficient and the coefficient for the first degree term of the polynomial in δ are affected. What the foregoing implies, of course, is that a single computer program can easily be adapted to optimize either the original direct geomin criterion (40) or the full direct geoplane criterion (81). Distinguishability weighting and factor size scaling can be easily applied to the direct geomin polynomial coefficients in (68) and (69). The only terms in the first derivative polynomial expansion in δ that need further adjustment to incorporate the provisions of the full direct geoplane criterion are the constant and first degree coefficients, in line with (104) and (105). Because of the convenient parallels between what we have been calling the direct geomin criterion (40) and the direct geoplane criterion (81), it is no longer necessary or particularly informative to maintain this particular distinction in nomenclature.
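Because the derivation above is algebraically dense, a small numerical sketch may help. The NumPy code below is an illustration under explicit assumptions, not the author's program. It writes the single-plane rotation with our own labels `delta` (the ratio parameter) and `lam` (the normalizing constant that keeps the rotated factor at unit variance). It assumes the criterion has the form N1/D1 − N2/D2 described in the text: distinguishability-weighted geometric means of squared pattern coefficients over the geometric mean of factor size indices, minus complexity-weighted geometric means of associated factor contributions over their arithmetic mean. It also replaces the polynomial root-finding of the text with a plain grid search over `delta`, and reduces the soft-squeeze floors to a single small constant, `floor`. All function names are hypothetical.

```python
import numpy as np


def rotate_pair(P, C, j, k, delta):
    """Oblique rotation of factor k toward factor j in the Fj Fk plane.

    Assumed parameterization: F*_k = (delta * F_j + F_k) / lam, with
    lam = sqrt(1 + 2 * delta * c_jk + delta**2) so that F*_k keeps unit
    variance; all other factors are unchanged. Returns P*, C*, S* = P* C*.
    """
    lam = np.sqrt(1.0 + 2.0 * delta * C[j, k] + delta ** 2)
    T = np.eye(C.shape[0])           # new factor scores are F @ T
    T[j, k] = delta / lam
    T[k, k] = 1.0 / lam
    C_new = T.T @ C @ T
    P_new = P @ np.linalg.inv(T).T   # pattern transforms by the inverse transpose
    return P_new, C_new, P_new @ C_new


def geo_mean(x, floor):
    """Geometric mean over the last axis, flooring entries so logs stay finite."""
    return np.exp(np.mean(np.log(np.maximum(x, floor)), axis=-1))


def geoplane(P, S, dw, cw, size_scaled=True, complexity=True, floor=1e-4):
    """Assumed form of the direct geoplane criterion: N1 / D1 - N2 / D2."""
    sizes = np.sum(S * P, axis=0)                # factor size indices
    d1 = geo_mean(sizes, floor) if size_scaled else 1.0
    value = np.sum(dw * geo_mean(P ** 2, floor)) / d1
    if complexity:                               # geoplane II adds this term
        value -= np.sum(cw * geo_mean(S * P, floor)) / np.mean(sizes)
    return value


def best_delta(P, C, j, k, dw, cw, grid=np.linspace(-1.5, 1.5, 601)):
    """Grid search for the ratio parameter (a stand-in for the polynomial root)."""
    values = []
    for delta in grid:
        P_new, _, S_new = rotate_pair(P, C, j, k, delta)
        values.append(geoplane(P_new, S_new, dw, cw))
    return grid[int(np.argmin(values))]
```

Because `rotate_pair` builds the transformed coefficients from an explicit transformation matrix, one can check numerically that the structure column for factor k, the pattern columns for j and k, and the unchanged structure column for j behave as the text describes, and that the sum of all associated factor contributions (the denominator D2) is unchanged by rotation.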
Consequently, we have come to think of criterion (81) as being made up of a number of different components, some of which are optional. As long as distinguishability weighting and complexity weighting are included in the respective numerators, it is clear that we regard bounding the test vector configuration with hyperplanes as an essential feature of the simple structure solution being sought using a sums-of-geometric-means criterion; hence the name geoplane. Whether or not factor size scaling is incorporated into (81) is optional and does not in itself change the fact that our aim is to bound the test vector configuration with hyperplanes. In fact, improved hyperplane fit (in terms of the number of near-zero elements in the factor pattern matrix) may well result from releasing the pressure toward equality of factor sizes; e.g., if too many factors have been postulated to start with and one or more near-null factors are thus actually required. The factor size scaling option is simply a way of ensuring that common variance is spread more or less equitably among the factors up to the point where it is safe to let simple structure considerations take a predominant role in refining the location of hyperplanes. We must therefore always specify whether or not we are dealing with a factor-size-scaled version of direct geoplane transformation. Finally, the full direct geoplane criterion in (81) can be seen to be made up of two competing terms or components, the first of which is to be minimized and the second of which is to be maximized. Because the first of these components is simply a distinguishability-weighted and (optionally) factor-size-scaled version of the original direct geomin criterion, it is convenient to give it a specific name. We have therefore come to refer to use of the first term in (81) alone as geoplane I and to use of the two terms together as geoplane II. Of course, either of these two criteria can be put into effect with or without factor size scaling, as desired. The motivation for expanding upon direct geoplane nomenclature in the foregoing digression will become clear when we outline a global strategy for bounding hyperplane simple structure transformation in the next chapter.
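To see why releasing the pressure toward equal factor sizes can matter when a near-null factor is needed, the sketch below computes what we take to be the left hand side of (80): the geometric mean of the factor size indices divided by their arithmetic mean. By the arithmetic-geometric mean inequality this ratio never exceeds one and equals one only when all factors contribute equally to common variance. The function name and the example sizes are ours, chosen for illustration.

```python
import math


def factor_size_ratio(sizes):
    """Geometric mean / arithmetic mean of positive factor size indices."""
    m = len(sizes)
    geometric = math.exp(sum(math.log(f) for f in sizes) / m)
    arithmetic = sum(sizes) / m
    return geometric / arithmetic


# The same total common variance (6.0) spread three different ways.
equal = factor_size_ratio([2.0, 2.0, 2.0])       # uniform contributions
uneven = factor_size_ratio([3.0, 2.0, 1.0])
near_null = factor_size_ratio([3.0, 2.9, 0.1])   # one near-null factor
print(equal, uneven, near_null)
```

The near-null case drives the ratio far below one, which is why a factor-size-scaled criterion resists such a solution even when overfactoring makes it the appropriate outcome.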
It so happens that factor-size-scaled geoplane II is a very effective means of moving from any initial starting position for the primary factors (about which we will say more later) to a point where the polyhedral convex cone-shaped test vector configuration is thoroughly enveloped by hyperplanes. The resulting primary factors tend toward mutual orthogonality as well as toward equality in their levels of contribution to common variance. This makes it likely that the m − 1 hyperplanes which intersect along each geoplane II primary factor axis will not cut inside the outer boundaries of the test vector configuration. Given the latter outcome, it may then be expedient to initiate geoplane I transformation, without factor size scaling, in order to define hyperplane location more precisely by applying Thurstone's simple structure criterion only to distinguishable variables. Relaxing the pressure to maintain equality of factor sizes and low interfactor correlations at this point can reveal evidence of overfactoring, through the appearance of one or more smaller factors that are highly collinear with others, and permits final refinement of hyperplane location.
We will see in the next chapter how to arrive at a theoretically appealing initial orthogonal starting configuration for geoplane II transformation, as well as how iterative reweighting of direct geoplane transformation through several successive repetitions can decrease reliance upon the major principal axis of the test vector configuration as the sole basis from which to judge the latent distinguishability and complexity of individual variables.
6 A Global Strategy for Bounding Hyperplane Simple Structure Transformation
Locating the Central Axis of a Polyhedral Convex Cone of Test Vectors
In the foregoing chapter we suggested rather extensive modification of the direct geomin criterion function in order to ensure that Thurstone's original mathematical criterion for a best-fitting simple structure (32) is minimized only through the location of hyperplanes that bound the outer faces of a presumed polyhedral convex cone of test vectors. The theoretical reasons for expecting a polyhedral convex cone-shaped configuration of test vectors had been introduced in the previous chapter, where it had also been shown that the principle of limited simple structure is but one implication of the positive manifold assumption about the mode of joint action of latent determinants in any roughly homogeneous behavioral domain. Mathematical implications of this assumption were taken into account by the modifications to the direct geomin criterion suggested in (81), which was referred to as the direct geoplane criterion. These modifications stemmed from the fact that we can expect the central axis of the presumed latent cone of test vectors to be roughly collinear with the major principal axis of the test vector configuration in common factor space. In this chapter our aim is to direct the line of thinking introduced in the previous chapter toward the development of a global strategy for bounding hyperplane simple structure transformation. This task is complicated mainly by the fact that the major principal axis of any actual configuration of test vectors is likely to be only an imperfect indicator of the true location of the central axis of that polyhedral convex cone whose lateral edges are aligned with latent determinants that actually gave rise to the test vector configuration.
As Thurstone discovered when he first entertained the use of principal axes as the basis for factorial description of a test battery [1932], the principal axes are not invariant with respect even to slight changes in test battery composition. Indeed, his motivation for the later development of the principle of simple structure was the need for some means of determining the factorial composition of any given test that would not vary with changes in test battery composition, as long as variables were sampled from the same domain. Insight into this feature of simple structure can be gained by viewing it as a consequence of the positive manifold assumption about the limited range of joint action of latent determinants. The principle of noncancellation of latent determinant effects implies the production of a polyhedral convex cone-shaped configuration among the common parts of manifest variables. Adequate sampling of variables from a roughly homogeneous domain ensures that, in addition to the interior, all faces of the configuration of test vectors will be highly populated and therefore well defined. It does not matter, then, which particular variables falling along the faces of the cone of test vectors are sampled, as long as enough of them are included in a battery to determine each face when it is ultimately bounded by a hyperplane during simple structure transformation. Since primary factor axes are located at the lateral edges of the cone of test vectors, where hyperplanes bounding the faces of that test vector configuration intersect, the location of primary factors will be essentially invariant with respect to sampling of test vectors as long as enough tests are sampled to determine with accuracy the faces (boundaries) of the test vector configuration. Since a hyperplane in m-space is (m − 1)-dimensional, it takes m − 1 linearly independent test vectors to determine such a hyperplane in the ideal, error-free case. The latter is the justification for Thurstone's [1935, p. 156] second verbal rule for judging the probable uniqueness of an obtained simple structure; i.e., there must be at least m test vectors in each face of the cone-shaped configuration of test vectors for that face to be overdetermined.
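For a concrete picture of this counting argument, take m = 3: a hyperplane through the origin is then an ordinary plane, and two linearly independent test vectors lying in it fix it, since its normal (the direction of the corresponding reference vector) is their cross product. Any further test vector in the same face can only overdetermine the plane. The coordinates below are invented for illustration and are expressed relative to three orthogonal primary factors; the helper names are ours.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def unit_normal(u, v):
    """Unit normal of the plane through the origin spanned by u and v."""
    n = cross(u, v)
    length = math.sqrt(sum(x * x for x in n))
    if length == 0:
        raise ValueError("u and v are collinear and do not determine a plane")
    return tuple(x / length for x in n)


# Two test vectors with zero loadings on the third factor determine the face.
u, v = (0.6, 0.3, 0.0), (0.2, 0.7, 0.0)
normal = unit_normal(u, v)  # points along the third factor axis
# A third vector in the same face has zero projection on the normal.
w = (0.5, 0.5, 0.0)
print(sum(a * b for a, b in zip(w, normal)))  # 0.0
```

A single test vector, or two collinear ones, leaves the plane undetermined, which the `ValueError` branch makes explicit.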
Thurstone's third rule for judging the probable uniqueness of an obtained simple structure solution can be seen as a way of determining whether the lateral edges of its presumed underlying structure are distinctly identifiable; i.e., he attempts to ascertain that some test vector falls near every lateral edge of the latent cone by determining that at least m test vectors fall near an edge of each respective face. Such an inference is uncertain, however, and that quite likely explains why Thurstone eventually began to check for factorially pure indicators of each factor. Of course, Thurstone's first rule for judging the probable uniqueness of an obtained simple structure solution can be taken to mean that every test vector must fall into at least one face of the polyhedral convex cone of test vectors. However, there is no justification for expecting the latter to be true for all test vectors, as Thurstone readily admits [1935, p. 156], since the central region of the cone of test vectors can also be expected to be populated. Any highly complex tests that occupy a region near the central axis of the cone of test vectors are simply of no use in determining a reference frame from the simple structure perspective, so they are ignored in that development. One thing we can take Thurstone to have meant by his first rule for judging the probable uniqueness of an obtained simple structure solution is that enough test vectors must be sampled in any study to ensure that all faces of the presumed latent polyhedral convex cone of test vectors are highly populated. In the actual practice of sampling variables from a multidimensional domain of variation, it is very easy to defeat our original purpose of discovering hypothetical determinants of manifest covariation. Rather than sampling many variables that fall at or near the important lateral edges and faces, we are very likely to concentrate our sampling of variables within a region near the central axis of the polyhedral convex cone we ultimately wish to populate with test vectors. Little more than this could be expected at the outset of any search for latent determinants about which little or nothing is known except that they might play a role in producing covariation within a domain of manifest variables which seem to hang together in some sense or other. To make matters even worse, our attempt to sample from different regions within the same domain of variation often results in concentrations or clusters of near-collinear variables, that is, variables that are simply minor variations on one another due to such superficial manifest facets as format, speededness, or method of measurement. Such an approach is not likely to result in an even distribution of test vectors about the faces of a latent polyhedral convex cone whose lateral edges correspond to hypothetical determinants which, in turn, (should they be identified) might aid us in our theoretical understanding of possible underlying bases for observed variation.
Indeed, given the extreme sensitivity of conventional factor extraction and factor rotation procedures to collinearity among manifest variables, a clustered sampling of variables is likely to lead us to get from a factor analysis nothing more than we have put into it. Not only do we learn nothing new by this approach, but our preconceptions (quite possibly mistaken) are reinforced by the analysis. One of the appealing things about the relative insensitivity of bounding hyperplane simple structure transformation to variation in test battery composition (i.e., the phenomenon of factorial invariance) is that even the biased and incomplete way in which we typically sample variables from any given domain of variation can be expected to lead to the identification of some useful hypothetical determinants, as long as the number of variables sampled greatly exceeds the dimensionality of the underlying system of coplanarity being sought, and as long as we can get around the superficial but distracting problem of extensive collinearity (clustering) at the manifest level. Since the location of the major principal axis of the test vector configuration is quite sensitive to superficial collinearities as well as to other minor variations in test battery composition, a better indicator of the central axis of the latent cone of test vectors is essential for effective implementation of direct geoplane transformation. The issue of how to eliminate the distracting influence of superficial collinearities among manifest variables, as well as the issue of how to assess latent dimensionality with relative freedom from distraction due to biased and incomplete sampling of variables, will be taken up in Part IV, where we will discuss factor extraction or fitting the common factor model to observed data. At this point our concern is that of locating the central axis of a given test vector configuration within common factor space without placing too much reliance upon the major principal axis for that purpose. The positive manifold assumption that latent determinants tend to act either independently or in concert, but not in opposition to one another, leads to the conclusion that the common parts of test vectors can be expected to populate the central region, faces, and lateral edges of an m-dimensional polyhedral convex cone in hyperspace. By taking the further step of assuming that the lateral edges of this cone (which correspond to latent determinants or primary factors) are mutually orthogonal, we find that knowledge of the location of its central axis yields much valuable information about which variables can be expected to comply with Thurstone's original definition of simple structure. Such "distinguishable" variables must be located far enough away from the central axis of the cone to fall into at least one of its faces, but not far enough away from the central axis to fall beyond any of its lateral edges. Were it not for its sensitivity to idiosyncrasies in the sampling of manifest variables, the major principal axis of the test vector configuration would not be a bad indicator of the locus of the highly informative central axis of the presumed latent cone of test vectors.
The major principal axis is usually good enough, in fact, to provide initial estimates of the distinguishability (78) and complexity (79) weights for variables required in direct geoplane transformation. Once we have a solution for the location of primary factors resulting from, say, factor-size-scaled direct geoplane II transformation, however, we are in a position to get improved estimates of variable distinguishabilities and complexities. The possibility of carrying out iteratively reweighted direct geoplane transformation is thus opened to us. Once we have available a solution for the location of primary factors that fall at the intersections of hyperplanes approximately bounding the faces of a configuration of test vectors (the outcome of factor-size-scaled direct geoplane II transformation using distinguishability and complexity weights derived from the projections of variables onto the major principal axis), how can we arrive at a new estimate of the locus of the central axis of the latent cone of interest? What we are looking for, in fact, is the locus of the central axis of the latent cone of all test vectors that could possibly be generated by the true latent determinants of manifest variation which we are seeking if they acted according to the principle of mutual noncancellation of effects. By making the further assumption that these latent determinants are mutually orthogonal and equally influential, we could determine the central axis of the test vector cone they would generate simply as the axis inclined at the same angle of $\cos^{-1}(1/\sqrt{m})$ (74) to all of these primary factors. Remember, now, that factor-size-scaled direct geoplane II transformation is inclined to produce primary factors that do not depart too far from mutual orthogonality and that all tend to be of about equal size or importance in contributing to common variation, assuming that a uniform test vector distribution has been achieved through representative sampling of variables. Were we, then, to find that set of mutually orthogonal factor axes that are as close as possible to the direct geoplane primary factors (e.g., in the sense of having maximal correlation between corresponding factors in the two systems), it would seem to be a trivial matter of using equal direction cosines from these axes to locate the central axis of any uniformly populated cone of test vectors falling wholly within the all-positive and all-negative orthants defined by these orthogonal bounds [Hall, 1971] to the direct geoplane factors. Taking the equiangular centroid, as just suggested, of a set of orthogonal axes placed as closely as possible about a corresponding set of primary factor axes as the new central axis from which to assess the distinguishabilities and complexities of individual variables can then be expected to yield a set of weights that are optimal for recovery of this same orthogonal set of axes as the outcome of successive simple structure transformation, provided that the corresponding hyperplanes do indeed bound the test vector configuration and provided that the principle of simple structure is satisfied for all distinguishable variables.
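The angle in (74) is easy to tabulate. The sketch below assumes the idealized case just described, m mutually orthogonal unit-length factor axes, takes the equiangular centroid as the unit vector with equal direction cosines on those axes, and reports its angle to each axis; the function names are ours.

```python
import math


def equiangular_centroid(m):
    """Unit vector with equal direction cosines 1/sqrt(m) on m orthogonal axes."""
    return [1 / math.sqrt(m)] * m


def central_axis_angle(m):
    """Angle in degrees between the central axis and each orthogonal axis."""
    return math.degrees(math.acos(1 / math.sqrt(m)))


for m in (2, 3, 4, 6):
    print(m, round(central_axis_angle(m), 2))
```

As m grows the central axis sits ever farther from each lateral edge (45 degrees for m = 2, 60 degrees for m = 4), so the cone widens and a variable's angle from the central axis becomes a more informative guide to whether it lies in a face.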
It is therefore reasonable to follow the practice of taking the orthogonal bounds to the available primary factors as a starting configuration for the next iteration of factor-size-scaled direct geoplane II transformation, as well as to use these orthogonal bounds as a way of getting new distinguishability and complexity weights for individual variables. In fact, we have found it a good practice to restart the entire iteratively reweighted soft-squeeze direct geoplane transformation sequence from the orthogonal bounds approximating the outcome of the previous such cycle and to repeat this process until global convergence occurs. Not only does this assure convergence to a good final set of distinguishability and complexity weights, but it means that the final location of primary factor axes is recoverable when starting from the nearest set of orthogonal bounds. The latter makes the appearance of intercorrelations among primary factors quite convincing, since it is clearly required in order to satisfy the principle of simple structure, above and beyond the principle of noncancellation of latent variable effects (the latter principle is satisfied equally well by the factors at their orthogonal bounds); otherwise, the factors would remain orthogonal.
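The orthogonal bounds used in this restart strategy, and the reference vectors with which the next section compares them, can both be computed from the eigendecomposition C = Q L² Q' of the primary factor intercorrelation matrix. The NumPy sketch below is our illustration, not the author's program; the function names and the example C and P are hypothetical, and factor scores F are taken to be standardized so that their covariance matrix is C. The accompanying checks include one comparison against an alternative orthogonalization (a Cholesky factor) to illustrate the property attributed to Kaiser [1967]: the orthogonal bounds have the largest possible sum of correlations with their corresponding primary factors.

```python
import numpy as np


def sym_power(C, power):
    """C**power for symmetric positive definite C, via C = Q L^2 Q'."""
    eigvals, Q = np.linalg.eigh(C)  # eigvals holds the diagonal of L^2
    return Q @ np.diag(eigvals ** power) @ Q.T


def reference_vector_transform(C):
    """C^{-1} D^{1/2} with D^{-1} = Diag(C^{-1}): primary factors to reference vectors."""
    C_inv = np.linalg.inv(C)
    return C_inv @ np.diag(1.0 / np.sqrt(np.diag(C_inv)))


def orthogonal_bounds(P, C):
    """Return C^{1/2} (primary factor / orthogonal bound correlations) and B = P C^{1/2}."""
    root = sym_power(C, 0.5)
    return root, P @ root


C = np.array([[1.0, 0.4, 0.2],
              [0.4, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
P = np.array([[0.7, 0.0, 0.1],
              [0.0, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
root, B = orthogonal_bounds(P, C)
W = sym_power(C, -0.5)  # scores F_b = F @ W are uncorrelated with unit variance
```

Rescaling every principal component to unit variance (the power -1/2) rather than inverting its variance (the power -1 used for reference vectors) is what places the orthogonal bounds between the primary factors and their reference vectors.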
Transformation to Orthogonal Bounds
The process of transforming a set of primary factors to the closest corresponding set of orthogonal axes can be viewed in much the same light as the process of transforming primary factors to the corresponding set of reference vectors. Recall from (28) that the transformation that takes a set of primary factors into the corresponding set of reference vectors is simply the inverse of the matrix of correlations among primary factors, the columns of which have been scaled to yield equations for residual variates of unit variance (i.e., standardized anti-images of the primary factors). If we rewrite (28) in terms of the eigendecomposition of the primary factor intercorrelation matrix, $C = Q L^2 Q'$, we get an expression,
$$G = F Q L^{-2} Q' D^{1/2}, \qquad (106)$$
which suggests what is implicitly going on geometrically when we transform primary factors to the corresponding set of reference vectors. It is obvious that the initial portion of the transformation in (106), which takes the primary factors into their principal components, $FQ$, yields a new set of variates in canonical form with variance-covariance matrix
$$Q' C Q = L^2. \qquad (107)$$
The next portion of the transformation in (106) can be seen, however, to yield a set of canonical variates, $F Q L^{-2}$, which have reciprocal variances relative to the original set of principal components; i.e.,
$$L^{-2} Q' C Q L^{-2} = L^{-2}. \qquad (108)$$
It is these rescaled variates that are then finally taken back from their canonical orientation toward the general vicinity of the original primary factors via the direction cosines relating primary factors to their principal components in $Q'$, in order to get the anti-images or unpredictable parts, $F Q L^{-2} Q' = F C^{-1}$, of the primary factors. The variance-covariance matrix among these unpredictable or residual parts of the primary factors,
$$C^{-1} C\, C^{-1} = Q L^{-2} Q' = C^{-1}, \qquad (109)$$
suggests the final diagonal scaling matrix $\mathbf{D}^{1/2}$, where $\mathbf{D}^{-1} = \mathrm{Diag}\{\mathbf{C}^{-1}\}$. This matrix is used in (106) and (28) as a means of coming out with reference vector scores that have unit variance. Return now to the implicit rescaling in (106). It occurs once primary factors have been transformed to their principal components, and just before the direction cosines in $\mathbf{Q}'$ are applied to linearly recombine these components for each individual variate (factor). We can notice that the rescaling of $\mathbf{FQ}$ by $\mathbf{L}^{-2}$ places great emphasis upon the components associated with the smaller eigenvalues of $\mathbf{C}$. In a sense, then, this rescaling implicitly used in getting reference vectors from primary factors via (106) inverts the roles played by the major and minor principal axes of the distribution of individuals with respect to their common factor scores. An alternative is to equalize the roles played by all axes, in order to obtain a new set of axes occupying an intermediate location between the primary factors and their corresponding reference vectors. Suppose we transform the original primary factors to their principal components, which of course have variance-covariance matrix $\mathbf{L}^{2}$ as in (107). If we then rescale these canonical variates to unit variance so that they are all equally emphasized, we obtain the scores $\mathbf{FQL}^{-1}$, which have the following variance-covariance matrix, as desired:

$\mathbf{L}^{-1}\mathbf{Q}'\mathbf{C}\mathbf{Q}\mathbf{L}^{-1} = \mathbf{L}^{-1}\mathbf{L}^{2}\mathbf{L}^{-1} = \mathbf{I}_m$ (110)

Next we apply the direction cosines in $\mathbf{Q}'$ to return to a set of variates in the vicinity of the original primary factors (and reference vectors, necessarily, since they are linked in pairwise fashion via (29)). This gives an interesting and useful outcome somewhat analogous to (106):

$\mathbf{F}_b = \mathbf{FQL}^{-1}\mathbf{Q}' = \mathbf{FC}^{-1/2}$ (111)
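where $\mathbf{C}^{-1/2} = \mathbf{QL}^{-1}\mathbf{Q}'$ is the inverse of the symmetric square root of $\mathbf{C}$. Since the variance-covariance matrix among the transformed factors in (111) is the identity matrix,

$\mathbf{C}^{-1/2}\mathbf{C}\,\mathbf{C}^{-1/2} = \mathbf{QL}^{-1}\mathbf{Q}'\,\mathbf{QL}^{2}\mathbf{Q}'\,\mathbf{QL}^{-1}\mathbf{Q}' = \mathbf{I}_m$ (112)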
we have come out with a set of orthogonal factors that approximate the location of the original primary factors. These are, of course, the orthogonal bounds discussed by Green [1952], Gibson [1962], Johnson [1966], Schönemann [1966], Kaiser [1967], Bentler [1968], Hall [1971], and Price and Nicewander [1977]. One interpretation of the method of orthogonalizing primary factors suggested in (111) is that we have succeeded in eliminating the original correlation among primary factors in $\mathbf{C}$. That correlation is what initially led to eigenvalues in $\mathbf{L}^{2}$ different from unity, and we have effectively set the latter back to unity, as shown in (110). The outcome in (111) can be reached in several different ways, however. One is solving for the set of scores on orthogonal factors that are maximally correlated (pairwise) with the corresponding primary factors [Kaiser, 1967]. From the latter perspective, it is interesting that the matrix of correlations between the original primary factors and their orthogonal bounds is, from (111),

$\mathbf{C}\,\mathbf{C}^{-1/2} = \mathbf{QL}^{2}\mathbf{Q}'\,\mathbf{QL}^{-1}\mathbf{Q}' = \mathbf{QLQ}' = \mathbf{C}^{1/2}$ (113)

This is a symmetric matrix: the unique symmetric positive definite square root of the original matrix of intercorrelations among the primary factors. The projections of the original variables onto the orthogonal bounds for any particular set of primary factor axes can readily be found from (111). In terms of a transformation of the primary factor structure correlations in $\mathbf{S}$ (5a), they are

$\mathbf{B} = \mathbf{SQL}^{-1}\mathbf{Q}' = \mathbf{SC}^{-1/2}$ (114)

or, equivalently,

$\mathbf{B} = \mathbf{PCQL}^{-1}\mathbf{Q}' = \mathbf{PC}^{1/2}$ (115)
in terms of a transformation of the primary factor pattern weights in $\mathbf{P}$, based upon the known relationship between matrices $\mathbf{P}$ and $\mathbf{S}$ (5b). Return now to the role played by the transformation of primary factors to their orthogonal bounds in iteratively reweighted direct geoplane transformation. Take any given direct geoplane solution for the primary factor pattern matrix $\mathbf{P}$ and factor intercorrelations $\mathbf{C}$. We can use (115) to convert it into the associated matrix of projections of variables onto a set of mutually orthogonal factors located as close as possible to the primary factor outcome. To the extent that the direct geoplane solution has located primary factors at the intersections of hyperplanes that bound the presumed cone of test vectors, the latter configuration will be fully contained within the positive and negative orthants of an orthogonal coordinate system. That system, the orthogonal bounds, optimally approximates the location of the primary factors. However, the hyperplanes defined by orthogonal bounds to the primary factor system are less likely to cut inside the outer boundaries of the test vector configuration than are hyperplanes intersecting at factors making acute angles to one another. Likewise, the orthogonality constraint makes the orthogonal bounds a safe and stable base from which to undertake further refinement of hyperplane location. This holds especially given our a priori notion that primary factors should tend to remain highly distinguishable (i.e., approximately mutually orthogonal) throughout transformation, unless there is substantial empirical support for their intercorrelation in the data.

Iterative Recomputation of Distinguishability and Complexity Weights. The orthogonal bounds (115) provide a maximally distinguishable version of the outcome of any application of direct geoplane transformation. As such, they provide a stable basis from which we can re-estimate the locus of the central axis of the presumed polyhedral convex cone of test vectors. That estimate is then used to recompute the distinguishability (78) and complexity (79) indices for each individual variable before reiterating the soft-squeeze direct geoplane transformation method. As we mentioned earlier, the equiangular centroid of the factors at their orthogonal bounds can provide a new set of direction cosines, $c_i$, to be used in (78) and (79) to determine iterated distinguishability and complexity weightings for individual variables. The resulting weights will, in turn, be optimal for maintenance of the factor axes in their position at the orthogonal bounds to the foregoing set of primary factors. This holds provided that the principle of simple structure is satisfied in that orientation for all distinguishable variables.
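The chain from (107) to (115) can be checked numerically. The sketch below is a minimal illustration that assumes NumPy and uses a small hypothetical pattern matrix `P` and factor intercorrelation matrix `C`; neither comes from the text, and the function names are made up for illustration. It builds $\mathbf{C}^{1/2} = \mathbf{QLQ}'$ and $\mathbf{C}^{-1/2} = \mathbf{QL}^{-1}\mathbf{Q}'$ from the eigendecomposition $\mathbf{C} = \mathbf{QL}^{2}\mathbf{Q}'$. It then obtains the orthogonal-bound loadings $\mathbf{B}$ both from the structure matrix (114) and from the pattern matrix (115). Finally, it computes the reference vector rescaling $\mathbf{D}^{1/2}$.

```python
import numpy as np


def sym_powers(C):
    """Return (C^{1/2}, C^{-1/2}) for a symmetric positive definite C.

    numpy.linalg.eigh gives C = Q diag(evals) Q', so evals is the diagonal of L^2.
    """
    evals, Q = np.linalg.eigh(C)
    L = np.sqrt(evals)                        # diagonal of L
    C_half = Q @ np.diag(L) @ Q.T             # Q L Q', the symmetric square root (113)
    C_inv_half = Q @ np.diag(1.0 / L) @ Q.T   # Q L^{-1} Q', as in (111)
    return C_half, C_inv_half


def orthogonal_bounds(P, C):
    """Loadings on the orthogonal bounds: B = P C^{1/2}, as in (115)."""
    C_half, _ = sym_powers(C)
    return P @ C_half


def reference_scaling(C):
    """Return C^{-1} and D^{1/2}, where D^{-1} = Diag{C^{-1}}."""
    C_inv = np.linalg.inv(C)
    return C_inv, np.diag(1.0 / np.sqrt(np.diag(C_inv)))


# Hypothetical example: 5 variables, 3 correlated primary factors
C = np.array([[1.0, 0.4, 0.2],
              [0.4, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
P = np.array([[0.7, 0.0, 0.0],
              [0.0, 0.6, 0.0],
              [0.0, 0.0, 0.8],
              [0.4, 0.4, 0.0],
              [0.3, 0.3, 0.3]])

C_half, C_inv_half = sym_powers(C)
B = orthogonal_bounds(P, C)
S = P @ C                                     # structure matrix from the pattern matrix
C_inv, D_half = reference_scaling(C)

print(np.allclose(C_inv_half @ C @ C_inv_half, np.eye(3)))  # (112): identity
print(np.allclose(C @ C_inv_half, C_half))                  # (113): C^{1/2}
print(np.allclose(S @ C_inv_half, B))                       # (114) agrees with (115)
print(np.allclose(B @ B.T, P @ C @ P.T))                    # common parts unchanged
print(np.allclose(np.diag(D_half @ C_inv @ D_half), 1.0))   # unit-variance reference scores
```

Each check should print True. The fourth confirms that moving to the orthogonal bounds leaves the reproduced common parts $\mathbf{PCP}'$ unchanged; only the coordinate frame differs.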
Redesignating as fully complex those variables that approximate collinearity with the equiangular centroid of the orthogonal bounds will likewise tend to preserve the orthogonal, highly distinguishable location of primary factor axes in subsequent geoplane II iterations. We have now introduced the notion of iteratively reweighted geoplane transformation. It recomputes variable distinguishabilities (78) and complexities (79) from the direction cosines to individual variables from the equiangular centroid of the orthogonal bounds to the previous solution for primary factors. That same orthogonal-bound solution also serves as the initial configuration for further application of geoplane transformation. It is now in order to fill in some details about the factor size scaling option in geoplane transformation, which we chose to neglect in our earlier treatment of (82) through (85).
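For geoplane II, the direction cosine $c_i$ from the equiangular centroid of the orthogonal bounds to a variable is the cosine between that variable's row of $\mathbf{B}$ and the unit vector whose elements are all $1/\sqrt{m}$. The following minimal sketch assumes NumPy and a hypothetical loading matrix `B`. It does not reproduce how $c_i$ then enters the weights (78) and (79).

```python
import numpy as np


def equiangular_cosines(B):
    """Direction cosines c_i from the equiangular centroid of the orthogonal bounds.

    B : (n, m) loadings on orthogonal factors; every row is assumed to have
    nonzero length (positive communality).
    """
    n, m = B.shape
    h = np.linalg.norm(B, axis=1)               # length of each variable's common part
    centroid = np.full(m, 1.0 / np.sqrt(m))     # unit vector at equal angles to all axes
    return (B @ centroid) / h


B = np.array([[0.8, 0.0],     # lies on the first axis
              [0.0, 0.6],     # lies on the second axis
              [0.5, 0.5]])    # lies along the centroid
print(equiangular_cosines(B))  # approximately [0.7071, 0.7071, 1.0]
```

A variable with $c_i$ near 1, like the third row, is one approximating collinearity with the centroid. Such variables are the candidates for redesignation as fully complex.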
Iterative Reweighting and Factor Size Scaling. We discussed the implications of using distinguishability weights, ${}_{d}w_i$ (78), and complexity weights, ${}_{c}w_i$ (79), in the numerators of (81) and (82). We neglected, however, to mention what role, if any, such weighting coefficients should play in the corresponding denominators, which provide for equalization of factor size indices. A second look at (81) shows that it does not make much sense to include only highly distinguishable variables in the numerator of the first term while including all variables in the corresponding denominator. Likewise, it does not make much sense to include only the highly complex variables in the numerator of the second term while including all variables in the corresponding denominator. The appropriate variable weighting procedures for the denominators of (81) do not strictly parallel those for the corresponding numerators, however. They must instead be arrived at by considering the original index (80) of equality of factor sizes, which we want at a maximum commensurate with effective bounding of the test vector configuration with hyperplanes. What we actually aim to accomplish by incorporating factor size scaling into the direct geoplane II criterion (81) seems at first sight to be asking far too much. If we are dealing with the correct number of factors, m, we would like all factors to be of roughly the same size or importance. This applies at least through the iteratively reweighted and factor-size-scaled soft-squeeze geoplane II phase of transformation. After that phase, further refinement of hyperplanes is turned over more fully to the principle of simple structure via distinguishability-weighted soft-squeeze geoplane I transformation, and factor size scaling can be dropped.
If we are dealing with too many factors, however, we would like all but the extraneous factor(s) to be of roughly the same size or importance, while permitting one or more near-null factors to emerge. These goals should be achievable by giving each variable a composite weight equal to the positive difference between its distinguishability and complexity weights. That composite weight is used in the summations going into both denominator terms in (81), and this appears to be effective in practice; i.e., we minimize the slightly revised geoplane criterion (116), where

${}_{e}w_i = \max\{{}_{d}w_i - {}_{c}w_i,\ 0\}$ (117)

or, algebraically,

${}_{e}w_i = \tfrac{1}{2}\left[({}_{d}w_i - {}_{c}w_i) + \left|{}_{d}w_i - {}_{c}w_i\right|\right]$
In other words, we are interested in achieving near equality of factors in terms of "size" only as indexed by their contribution to the variance of clearly distinguishable variables. Consider this in relation to the positive manifold assumption about the mode of joint action of latent determinants. This variety of factor size scaling is then equivalent to asserting that the segments of the presumed polyhedral convex cone of test vectors near each lateral edge of the configuration should be roughly equally densely populated with test vectors. Each such edge is a primary factor isolated at the intersection of m − 1 hyperplanes. The assertion holds regardless of which factor is under consideration and regardless of how the vectors are distributed near the central axis of their presumed cone of occupation. Suppose we index factor sizes only by their contributions to highly distinguishable variables. That distinguishability is measured in relation to the orthogonal bounds approximating the primary factors from previous applications of iteratively reweighted and factor-size-scaled soft-squeeze direct geoplane II transformation. Then, if the test vectors span a space of dimensionality less than m, one or more near-null or clearly redundant factors will automatically appear in the course of iteratively reweighted direct geoplane II transformation. In such a case, there will not be m highly distinguishable regions in the factor space all occupied by test vectors. Take any orthogonal bound approximating a previously located primary factor that lacks highly distinguishable test vectors in its vicinity. No variable near it will have a positive weight differential in (117) to serve in the denominator of the first term in (116). Nothing will therefore prevent further reduction in the size or distinctness of that particular factor.
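The composite weight (117) keeps only the positive part of the difference between a variable's distinguishability and complexity weights. As a result, a factor loaded only by complex variables receives no support from a size index weighted this way. The sketch below assumes NumPy and uses hypothetical weight values. The size index it computes (a weighted sum of squared loadings per factor) is an illustrative assumption, not the denominator of (116), which is not reproduced in this text.

```python
import numpy as np


def composite_weights(dw, cw):
    """e_w_i = max{d_w_i - c_w_i, 0}, as in (117)."""
    return np.maximum(dw - cw, 0.0)


def weighted_factor_sizes(A, ew):
    """Hypothetical size index for each factor j: sum_i ew_i * a_ij^2."""
    return ew @ (A ** 2)


dw = np.array([0.9, 0.2, 0.6])   # hypothetical distinguishability weights
cw = np.array([0.1, 0.7, 0.6])   # hypothetical complexity weights
ew = composite_weights(dw, cw)   # approximately [0.8, 0.0, 0.0]

# The same values via the algebraic form (x + |x|) / 2
x = dw - cw
print(np.allclose(ew, (x + np.abs(x)) / 2))      # True

A = np.array([[0.8, 0.0],        # distinguishable variable, on factor 1 only
              [0.5, 0.5],        # complex variables, loading on both factors
              [0.4, 0.4]])
print(weighted_factor_sizes(A, ew))  # approximately [0.512, 0.0]
```

The second factor's weighted size is zero because only complex variables load on it. That is the situation in which, as described above, nothing in the weighted denominators prevents it from shrinking further.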
Only if there are indeed m highly distinguishable regions in the common factor space, all occupied by test vectors, will the weighted factor size scaling in the denominators of (116) encourage all m factors to contribute equally to the variance of distinguishable variables. Short of this, one or more factors will tend to become null. That places even more pressure on criterion (116) to maintain equality in the weighted size indices for the remaining factors. This is, of course, exactly the outcome we desire. One of the more serious defects of most analytic factor transformation methods, whether orthogonal or oblique, is the tendency to spread common variance throughout the full m-dimensional working space provided. This happens even though the principle of simple structure could be effectively satisfied in a lower-dimensional subspace. On the grounds of scientific parsimony, of course, we should not "multiply entities without necessity." More to the point, however, is the importance of achieving factorial invariance not only with respect to the sampling of variables from a domain, but also with respect to the number of factors entertained within a given analysis. The general weakness of analytic transformation methods just mentioned (especially independent-cluster-oriented methods) has a simple cause. There are always more distinct clusters possible than the dimensionality of the subspace actually required to represent every cluster by its own unique pattern of loadings within a true simple structure solution. Hence, there is a tendency to split factors apart in order to populate any available space with its own cluster of test vectors. The result is an illusory sort of "ease of interpretation" in which the entire solution changes drastically with changes in the number of factors extracted. The weighting method for factor size scaling introduced in (116) and (117) for iteratively reweighted geoplane II transformation, on the other hand, can actually reveal to the user that too many factors have been extracted. We will see this shortly for Thurstone's 26-variable box problem, a notoriously difficult challenge for conventional analytic transformation methods even with the correct specification of dimensionality. Criterion (116) thus offers a possible means of determining the effective dimensionality of a factor solution. In connection with that possibility, we must clarify the global strategy already alluded to. It employs iteratively reweighted and factor-size-scaled soft-squeeze direct geoplane II transformation to get an intermediate solution. That solution is then turned over to iteratively reweighted (but not factor-size-scaled) soft-squeeze direct geoplane I transformation for final refinement of hyperplane location.
The first thing to clarify is that we have no proof that iteratively reweighted direct geoplane transformation will always converge to a stationary point, let alone any sort of global minimum. Our only support is the general observation that it does, together with its apparent likelihood from an intuitive geometrical appreciation of the procedure. In practice, however, convergence is usually seen within a few repetitions of the overall sequence. It can be gauged most easily by the stabilization of factor size indices and factor intercorrelations (summarized, e.g., by the determinant of the matrix of intercorrelations among primary factors). Our global strategy, then, is to follow convergence of iteratively reweighted and factor-size-scaled soft-squeeze direct geoplane II transformation with an iteratively reweighted version of soft-squeeze geoplane I transformation without factor size scaling. We can notice from (81), however, that geoplane I without factor size scaling is simply a distinguishability-weighted version of the original direct geomin criterion (40). The latter, in turn, is closely related to Thurstone's original mathematical criterion for a best-fitting simple structure (32). The ultimate outcome of our global transformation strategy is therefore satisfaction of the simple structure criterion for distinguishable variables. One or more factors may account for little or no common variance at the conclusion of the iteratively reweighted and factor-size-scaled (116) soft-squeeze direct geoplane II sequence, as suggested above for initial overestimation of the number of factors. We have found it useful to take this possibility into account when computing new variable distinguishabilities via (78) for use in geoplane I transformation. For geoplane II, the value $c_i$ used in (78) and (79) is the direction cosine to each variable from the equiangular centroid of the orthogonal bounds that approximate the available primary factors (115). Iteratively reweighted soft-squeeze direct geoplane I transformation also needs direction cosines for its distinguishability weights (78). Here we do not base them on the equiangular centroid of the orthogonal bounds approximating the intermediate outcome of iteratively reweighted geoplane II. Instead, we have found it expedient to entertain a weighted centroid that takes into account the relative sizes of factors in the configuration at the orthogonal bounds (115). In this way the role of any near-null factors is diminished in determining the distinguishability of individual variables for further application of the principle of simple structure via geoplane I transformation. Take as an index of the size of each orthogonal factor resulting from (115) the sum of squares of all loadings upon it. If we norm these indices to sum to unity across all factors and then take their square roots, we have available a set of direction cosines,

$\sqrt{w_j}, \qquad w_j = \dfrac{\sum_{i} b_{ij}^{2}}{\sum_{i}\sum_{k=1}^{m} b_{ik}^{2}}, \qquad j = 1, \ldots, m$ (118)
which can then be applied to obtain weighted centroid projections for individual variables. The weights given in (118) place greater emphasis upon larger factors in computing direction cosines from a weighted central axis to individual variables as follows:

$c_i = \dfrac{1}{h_i}\sum_{j=1}^{m} b_{ij}\sqrt{w_j}$ (119)
The scaling by $1/h_i$ in (119) makes $c_i$ the direction cosine to the ith variable from the weighted centroid. It does so by implicitly norming the common part of the ith test vector, whose length $h_i$ is the square root of its communality, to unit length.
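Equations (118) and (119) can be sketched as follows, assuming NumPy and a hypothetical loading matrix `B` at the orthogonal bounds. The example deliberately contains one null factor. Equation (118) then gives one zero $w_j$ and two weights of $1/(m-1) = .5$, and the weighted centroid differs visibly from the equiangular one ($w_j = 1/m$) used for geoplane II.

```python
import numpy as np


def factor_size_weights(B):
    """w_j of (118): column sums of squared loadings, normed to sum to one."""
    sizes = (B ** 2).sum(axis=0)
    return sizes / sizes.sum()


def weighted_centroid_cosines(B, w):
    """c_i of (119): (1 / h_i) * sum_j b_ij * sqrt(w_j), h_i = length of row i."""
    h = np.linalg.norm(B, axis=1)    # assumes every variable has positive communality
    return (B @ np.sqrt(w)) / h


B = np.array([[0.8, 0.0, 0.0],
              [0.0, 0.8, 0.0],
              [0.6, 0.6, 0.0]])     # third factor is null
m = B.shape[1]
w = factor_size_weights(B)                                    # [0.5, 0.5, 0.0]
c_weighted = weighted_centroid_cosines(B, w)                  # about [0.707, 0.707, 1.0]
c_equal = weighted_centroid_cosines(B, np.full(m, 1.0 / m))   # about [0.577, 0.577, 0.816]
print(w, c_weighted, c_equal)
```

With the null factor discounted, the third variable is recognized as lying exactly on the weighted central axis ($c_i = 1$). The equiangular centroid understates its centrality ($c_i \approx .816$).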
Compare the values of $c_i$ given by (119), which are substituted into (78) to get revised distinguishability weights, with the equiangular centroid values required for geoplane II. The only difference is the use of differential weighting via (118) instead of setting all $w_j = 1/m$. Of course, one null factor with the rest remaining equal would yield one zero $w_j$ from (118), while the remaining m − 1 would equal 1/(m − 1). A potentially substantial impact upon the value of $c_i$ could thus result. This brings up an interesting point, however. A situation in which m − 1 factors are essentially equal in size while the mth factor is null implies that distinguishability weights via (78) should be computed as though the number of factors is m − 1 instead of m. We want a general indication of the number of "effective" factors that should enter into computing distinguishability weights for geoplane I via (78). That number need not be an integer, because common variance may be distributed unevenly among the factors in $\mathbf{B}$. To get it, let us consider the value (120),
where the $w_j$'s are those derived from (118). The weights from (118) are used in (119) to get values for $c_i$, and in (120) to get the estimated effective "number of factors." These values are then used in (78) to get variable distinguishabilities. This will make any null extra factors completely irrelevant when all nonnull factors present are of equal size, and will likewise give reasonable results for less extreme or more variable cases. We have found that this approach to obtaining revised distinguishability weights in iteratively reweighted soft-squeeze direct geoplane I transformation typically results in smooth convergence. Any progress made initially by iteratively reweighted and factor-size-scaled direct geoplane II transformation in eliminating an extra factor from consideration is generally carried toward completion by geoplane I. In this way, our global iteration strategy is made largely invariant with respect to the inclusion of extra, null factors. In practice, however, any extra factors included at the extraction stage will not be completely null, but will simply be small and highly error-contaminated. In this case it is even more important to have available a transformation technique that will not tend to spread common variance onto extra factors just because the space is available.

An Orthogonal Initial Configuration for Direct Geoplane Transformation

Before moving on to numerical examples, one final consideration must be taken up. Our aim is virtually to guarantee that iteratively reweighted and factor-size-scaled (116) soft-squeeze geoplane II transformation, followed by iteratively reweighted soft-squeeze geoplane I transformation, locates only those hyperplanes that fully and effectively bound a test vector configuration. This is the issue of choosing a good initial orthogonal solution from which to launch iteratively reweighted geoplane II transformation in the first place. The importance of an appropriate orthogonal initial configuration from which to launch hyperplane search via oblique analytic transformation of factors cannot be overemphasized. We discussed earlier the relationship between Thurstone's principle of simple structure and the positive manifold assumption regarding noncancellation of latent variable effects. There we pointed out that simple structure can only be expected to aid in refining the location of bounding hyperplanes once the entire configuration of test vectors has been isolated, through some other means, roughly within a pair of opposing orthants of the coordinate system. Moreover, we want to make effective use of distinguishability and complexity reweighting, which refers to direction cosines to individual variables from the equiangular centroid of primary factors at their orthogonal bounds, as just discussed for iteratively reweighted direct geoplane transformation. For that, we must make certain that the opposing pair of orthants in which we have isolated the test vectors are the positive and negative orthants of the coordinate system. The reason for this last restriction concerns the equiangular centroid of primary factors at their orthogonal bounds. It can represent the central axis of the presumed polyhedral convex cone of test vectors only if the factors are properly reflected, so that the test vector configuration is isolated within the positive and negative orthants of the coordinate frame.
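Suppose the test vectors already lie within some pair of opposing orthants. One convenient way to find the reflections that move them into the positive and negative orthants is to read the signs of the major principal axis of the configuration. In that case $\mathbf{A}'\mathbf{A}$ is a sign-flipped nonnegative matrix, so its leading eigenvector carries the orthant's sign pattern (by the Perron-Frobenius theorem). This minimal sketch assumes NumPy and a hypothetical loading matrix `A`. The method is not a procedure taken from the text. It presumes the configuration is already confined to one orthant pair, and that the variables do not split into groups loading on disjoint sets of factors.

```python
import numpy as np


def in_positive_negative_orthants(A, tol=1e-12):
    """True if every row of A is (weakly) all-positive or all-negative."""
    pos = np.all(A >= -tol, axis=1)
    neg = np.all(A <= tol, axis=1)
    return bool(np.all(pos | neg))


def reflect_factors(A):
    """Reflect columns of A by the signs of the major principal axis of its rows.

    Assumes the rows of A already lie in one pair of opposing orthants.
    Returns the reflected loadings and the sign vector used.
    """
    _, vecs = np.linalg.eigh(A.T @ A)   # eigenvalues come back in ascending order
    u = vecs[:, -1]                     # major principal axis of the configuration
    s = np.where(u >= 0, 1.0, -1.0)
    return A * s, s                     # multiply column j by s_j


A = np.array([[0.6, -0.3],
              [0.5, -0.5],
              [-0.2, 0.7]])   # rows lie in the (+,-) / (-,+) orthant pair
A_ref, s = reflect_factors(A)
print(in_positive_negative_orthants(A), in_positive_negative_orthants(A_ref))  # False True
```

Because the overall sign of an eigenvector is arbitrary, `s` may come out as either of two opposite patterns. Both place the configuration in the positive and negative orthants.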
What we are looking for in an orthogonal initial configuration, then, is the isolation of the bulk of the test vectors fully within the positive and negative orthants of the coordinate system of factor axes. It will suffice, however, to find some means of ensuring that the entire test vector configuration is confined to any given pair of opposing orthants. Once it is isolated within any such pair, we can always reflect factors to move the configuration into the positive and negative orthants. After much experimentation with available orthogonal factor rotation methods, and many attempts to develop alternative criteria, we reached a negative conclusion about one family of criteria. Any formulation that works directly with inner products of the powered elements in pairs of columns of the orthogonal factor pattern matrix is not a likely candidate for producing the kind of initial configuration we desire. Examples include variations on the quartimin theme (33), such as quartimax or varimax. Such criteria tend to yield independent cluster solutions. They therefore do not tolerate the accumulation of many test vectors all within the same orthant of hyperspace (a concomitant of factorial complexity). To hasten the accumulation of test vectors into any given pair of opposing orthants in hyperspace, we should attend to something other than independent clustering notions. The relevant fact is that only highly distinguishable variables will tend to be unifactorial in any orthogonal solution wherein the test vector configuration is properly confined to a pair of opposing orthants. We have already seen, in what led up to (78), that the potential distinguishability of variables can be indexed by their degree of angular inclination to the major principal axis of the test vector configuration. We should certainly make use of this information. There is also another, more specific way to gauge just how distinguishable any given pair of variables are from one another as represented in common factor space. Normalize all variables to unit length, simply by standardizing the matrix of m-factor reproduced correlations (3) among the original variables. Then the weights
(where the entry $r'_{ij}$ refers to the reproduced correlation between variables i and j) range from zero for a pair of perfectly collinear variables to unity for a pair of orthogonal variables. An orthogonal pair is, hence, maximally distinguishable within the common factor space. Suppose we use any weighted criterion involving inner products of squared loadings in pairs of columns of the orthogonal factor pattern matrix, and ensure that highly distinguishable pairs of variables appear on different factors. We should then be able to confine all test vectors to a pair of opposing orthants of the coordinate system. Any less than fully distinguishable variables are presumably factorially complex and will thus be located in the faces or near the central axis of the cone of test vectors. Consider the mutually highly distinguishable variables at or near the lateral edges of a polyhedral convex cone-shaped configuration of test vectors. If they are made to fall near distinct axes in a coordinate system, then all the more factorially complex variables will automatically be brought within a pair of opposing orthants of that system. Exactly the form of orthogonal factor transformation criterion to which this line of argument leads has been available in the literature for years [Comrey, 1967]. Unfortunately, it has not yet seen wide usage because of the popularity of cluster-oriented transformation methods. Comrey's "criterion II" is designed to ensure that pairs of individual variables that are not correlated in the common factor space (i.e., are mutually distinguishable) appear on separate factors. We can write a version of criterion II that applies to any individual pair of orthogonal factors as follows,
where the coefficients $a^{*}_{ik}$ and $a^{*}_{jk}$ refer to rotated loadings of variables i and j, respectively, on the kth orthogonal factor. The corresponding rotated loadings of these variables on the lth orthogonal factor are designated $a^{*}_{il}$ and $a^{*}_{jl}$. This expression differs from that given by Comrey [1967, p. 145] in only one respect. We have explicitly normed the weighting coefficients to range from zero for full collinearity to unity for complete distinguishability between the common parts of each pair of variables.
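The equation for criterion II is not reproduced in this text, so the following is only a plausible sketch of a factor-pair criterion of the kind described. Each pair of variables contributes the squared-loading products $a^{*2}_{ik}a^{*2}_{jk} + a^{*2}_{il}a^{*2}_{jl}$. That contribution is weighted by a coefficient that is 0 for collinear and 1 for orthogonal common parts. Several details are assumptions rather than the book's (121) and (122): the weight form $1 - r'^{2}_{ij}$, the summation over pairs $i < j$, and the function names. The sketch also assumes NumPy.

```python
import numpy as np


def distinguishability_weights(A):
    """Assumed weights 1 - r'_ij^2 from standardized reproduced correlations.

    A : (n, m) orthogonal loadings; rows are normed to unit length so that
    An @ An.T holds the standardized reproduced correlations r'_ij.
    """
    An = A / np.linalg.norm(A, axis=1, keepdims=True)
    R = An @ An.T
    return 1.0 - R ** 2


def criterion_ii_pair(A, W, k, l):
    """Assumed factor-pair criterion: sum over i < j of
    W_ij * (a_ik^2 a_jk^2 + a_il^2 a_jl^2)."""
    sk, sl = A[:, k] ** 2, A[:, l] ** 2
    terms = W * (np.outer(sk, sk) + np.outer(sl, sl))
    return np.triu(terms, 1).sum()      # strict upper triangle: each pair once


# Two orthogonal (fully distinguishable) variables
A_sep = np.array([[1.0, 0.0], [0.0, 1.0]])     # one variable per factor
r = np.sqrt(0.5)
A_rot = np.array([[r, r], [r, -r]])            # same variables, axes rotated 45 degrees
W = distinguishability_weights(A_sep)          # weights do not depend on the rotation
print(criterion_ii_pair(A_sep, W, 0, 1), criterion_ii_pair(A_rot, W, 0, 1))  # 0.0 and about 0.5
```

Placing the two distinguishable variables on separate factors gives the minimum (zero). Rotating them so that both load on both factors is penalized.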
One potential problem in minimizing criterion II (122), duly noted by Comrey, is its tendency to spread common variance rather evenly across factors. The original objective of simply aligning highly distinguishable variables with distinct factor axes could be satisfied just as well while leaving one or more essentially null factors. In other words, criterion II tends to overdo its task of gathering all of the test vectors into a pair of opposing orthants, as defined by the transformed coordinate axes. When the true dimensionality has been overestimated, it may wind up with a concentration of test vectors around the central axis spanning the opposing pair of occupied orthants. Criterion II thus tends to split large general factors whenever dimensionality is overestimated, rather than tolerating wider variation in factor sizes. A slightly modified version of criterion II is somewhat less prone to disorientation when dimensionality is overestimated. It is obtained simply by computing a weighting for each pair of variables and factors, analogous to (121), as though only two factors were involved in the solution. Thus let us define the revised factor pair criterion as:
Brief reflection will reveal that (123) is identical to (122) when only two factors are involved. When m > 2, however, we judge how distinguishable vs. collinear any particular pair of variables are only by their projections onto the plane of the pair of factors under consideration. In this way, any pair of variables that projects onto the plane of a pair of factors as though collinear receives no impetus to project onto more than one factor. Such a pair may not be mutually collinear in the full m-dimensional common factor space, however. In that case, the original criterion II weights (121), as used in (122), would provide impetus to account for their apparent collinearity within the reduced space of two dimensions through joint projections onto both factors, i.e., by spreading their variance across factors. This phenomenon occurs because the weighted criterion being minimized involves squared inner products of factor loadings. A pair of variables that appear collinear within the reduced space of two dimensions could, for example, have either of the following two sets of (normalized) coordinates:
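$\begin{array}{c|cc} & k & l \\ \hline i & .7 & 0 \\ j & .7 & 0 \end{array} \qquad \text{or} \qquad \begin{array}{c|cc} & k & l \\ \hline i & .5 & .5 \\ j & .5 & .5 \end{array}$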
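The arithmetic behind comparing these two coordinate sets can be checked directly, along with the effect of judging collinearity only within the plane of factors k and l, as in (123). This minimal sketch assumes NumPy. The weight form $1 - \cos^{2}\theta$ is an assumption standing in for the book's (121) and (123), and the third-factor loadings used for the full-space comparison are hypothetical.

```python
import numpy as np


def squared_products(ai, aj):
    """a_ik^2 a_jk^2 + a_il^2 a_jl^2 for loadings on factors k and l."""
    return ai[0] ** 2 * aj[0] ** 2 + ai[1] ** 2 * aj[1] ** 2


def assumed_weight(u, v):
    """Assumed distinguishability weight 1 - cos^2 between two loading vectors."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return 1.0 - cos ** 2


concentrated = np.array([0.7, 0.0])   # both variables at (.7, 0) on factors k, l
spread = np.array([0.5, 0.5])         # both variables at (.5, .5) on factors k, l
print(squared_products(concentrated, concentrated))  # about .2401 (= .7^4)
print(squared_products(spread, spread))              # .125 (= .5^4 + .5^4)

# Within the (k, l) plane the pair is collinear, so a plane-based weight vanishes
print(assumed_weight(spread, spread))                 # about 0

# Hypothetical third-factor loadings make the same pair distinct in full space
ai_full = np.array([0.5, 0.5, 0.5])
aj_full = np.array([0.5, 0.5, -0.5])
print(assumed_weight(ai_full, aj_full))               # 8/9, about .889
```

The spread coordinates give the smaller sum of squared products. Any positive weight on this pair therefore favors spreading its variance across both factors. A weight computed within the plane is zero for both coordinate sets, so neither is favored.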
If any weight whatever is applied to minimize the sum of squared products term $(a_{ik}^{2}a_{jk}^{2} + a_{il}^{2}a_{jl}^{2})$, it is clear that $(.5^{4} + .5^{4})$

33 Conduct (G)     9     3     15     3
Primary Factor Correlations*
         I     II    III
I      100     16     20
II      16    100     34
III     20     34    100
* decimal points omitted
Figure 11. Extended Vectors Plot of Three Direct Geoplane Transformed Collinearity-Resistant Factors of Lord's Data
likelihood factor analysis, relying upon an even more limited subset of fifteen tests for this purpose. From the extended vectors plot in Figure 11, as well as from an examination of Table 9, three factors are not difficult to identify. There is a "verbal" factor (near variable #2, the verbal admissions test), a "spatial" factor (near variable #9, the spatial relations admissions test), and a third "perceptual speed" factor (near variable #25, the cancellation reference test). However, the hyperplanes intersecting along each primary factor are not as well defined as might be desired, especially the hyperplane common to the verbal and spatial factors. The latter "hyperplane" reveals no complex tests fanning out into intermediate locations between the verbal and spatial factors. We therefore cannot be confident that its location will remain invariant with respect to changes in test battery composition. Nevertheless, the information Figure 11 provides about Lord's test vector configuration is worth discussing. Two of the boundaries of that configuration seem to be fairly well defined, and it is not too difficult to imagine why the third is not, as we shall see shortly. In any case, it would not be prudent to attempt further reduction of dimensionality. There are clearly quite a few variables relating to all three of the Table 9 factors, located well within the central region of the test vector cone as seen from above in Figure 11. Earlier, from the factor intercorrelations in the overfactored battery, we conjectured that the cluster of arithmetical reasoning tests might well fall near the central axis of the test vector cone. Figure 11 confirms this conjecture. The arithmetical reasoning cluster consists of tests #16 through #22. We have circled these tests in Figure 11 to indicate that they share excess linkage even beyond the high degree of collinearity evident there, having left sizeable local dependency outlier residuals in Table 8. Performance on the mathematical and arithmetical reasoning tests in Lord's battery must therefore be regarded as the complex outcome or effect of all three hypothetical determinants. These determinants presumably gave rise to, or "caused," the trihedral convex cone-shaped configuration seen from above in Figure 11. The tests also reflect a minor residual factor which they alone share, due to superficial overlap or redundancy in format, content, method of administration, or whatever. Notice, however, that course grades in mathematics (#32) reflect a simpler pattern of determination, which will be discussed later. Tests are absent from intermediate locations within the hyperplane common to the verbal and spatial factors. A possible reason is simply that Lord's battery lacks any test requiring both verbal and spatial skills without also demanding speed or fluency of performance.
Such a test presumably would require manipulation of symbolic (i.e., meaningful figural) material in order to satisfy criteria of correctness that do not involve time constraints. The latter inference is based in part on the results of analyzing various other sets of cognitive data. Nevertheless, the configuration in Figure 11, the loading pattern in Table 9, and the residuals in Table 8 make one thing clear. Performance on Lord's arithmetic reasoning tests is highly complex in terms of hypothetical determinants. Had performance on any of these tests been a function of accuracy alone, rather than speed (or a combination of speed and accuracy), such tests would likely be found closer to the hyperplane in question. The results in Table 8, Table 9, and Figure 11 also concern the strong clustering among Lord's arithmetic reasoning test scores at the manifest level. They reveal that this clustering indicates neither factorial purity nor the existence of a "unitary trait," as a cluster-oriented approach to factor analysis might lead one to believe. On the contrary, such manifest clustering has apparently been brought about through the joint action of three broad, general, and quite possibly invariant common factors. Some form of superficial overlap or redundancy among the tests, in their particular format, content, method of administration, or whatever, has also contributed. The third, seemingly speed-related, factor seen in Table 9 and Figure 11 contributes strongly to performance on the arithmetical reasoning tests. This is especially so for the highly speeded forms (#20, #21, and #22) and the mathematics admissions test (#16), as already indicated. It is important to gain some understanding of this third factor. It contributes quite strongly to the overall test vector configuration and is located at the intersection of the two best-defined hyperplanes bounding that configuration. However, only one test (#25, the cancellation reference test for perceptual speed) appears to be a very pure measure of this factor, and its communality is quite low (.19, as can be seen in Table 9). Moreover, the hyperplane containing tests that are unrelated to this third factor is not well defined. It is therefore risky to interpret the factor by contrasting the tests that lie in vs. out of that "hyperplane." One thing we can notice concerns the test scores and grades in the hyperplane common to the third factor and the "verbal" factor (i.e., those unrelated to the "spatial" factor). They fan out into a wide arc. It starts at foreign language grades (#29, albeit with low communality) and passes through the reference test for word fluency (#1, also with low communality), English grades (#28), and the highly speeded vocabulary tests (#6, #7, and #8). It ends at the essentially nonspeeded vocabulary tests (#3, #4, and #5) and the verbal admissions test (#2).
This sequence can be regarded as ranging from verbal measures requiring a high degree of fluency and flexibility in performance to those requiring a more carefully evaluated and accurate response regardless of how long it takes; in other words, a continuum of speed plus accuracy vs. accuracy alone. (Note that Lord's use of formula scoring for his speeded tests made them demand high accuracy as well as speed or fluency of performance.) This interpretation is also borne out to some extent by an examination of the vectors that populate the hyperplane common to the third factor and the "spatial" factor, i.e., those test scores and grades that are essentially unrelated to the "verbal" factor. As for the number speed (#23 and #24) and number checking (#27, albeit of low communality) reference tests, however, it is noteworthy that spatial ability seems to interfere somewhat with the conduct of these highly speeded tasks, in contradiction to the positive manifold assumption about the typical mode of joint action of latent determinants within the domain of natural behavioral variation. At this point we can only conjecture that the implied interference may result from errors or delays produced through reliance upon simple spatial means of accomplishing numerical operations (e.g., rough estimates) on the part of individuals high in spatial ability, in contrast to the use of more efficient and accurate information processing methods. It is clear, in any event, that the number speed tests require rapid and "fluent" performance, as do the number checking, cancellation (#25), and picture discrimination (#26) tests that were used by Lord as indicators of perceptual speed. It is not surprising that the picture discrimination task is positively influenced by spatial ability, however, as are grades in mathematics courses (#32). Grades in chemistry courses (#31) also seem to require spatial ability in conjunction with motivation to perform tasks rapidly and fluently as well as accurately. Not surprisingly, courses in engineering drawing and descriptive geometry (#30) seem to place less emphasis upon rapidity or fluency of performance, as opposed to accuracy alone, than do other courses besides English (#28). The former courses demand a good deal of spatial ability, however, just as English courses demand much verbal ability, as can be gathered from Figure 11 and Table 9. One thing to be noted about the range of tests located within the hyperplane under discussion, common to the third "speed" or "performance motivation" factor and the "spatial" factor, is that the highly speeded intersections tests (#13, #14, and #15) do not seem to require the particular form of ability or motivation characteristic of the third factor any more than do the less speeded intersections tests (#10, #11, and #12) or the spatial relations admissions test (#9). The highly speeded intersections tests do share a minor group factor attributable to overlap or redundancy, however, as can be seen from their strong residual linkages in Table 8. Hence, they are circled in Figure 11 to indicate superficial overlap going even beyond the high degree of collinearity at the latent level that is evident in the extended vectors plot.
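Hyperplane membership and factorial complexity, as used in this discussion, come down to which loadings in a row of the pattern matrix are negligible. The sketch below counts, for each variable, the loadings whose magnitude reaches a cutoff, and lists the variables lying in the hyperplane of a given factor. The cutoff of .10 and the loadings are hypothetical (they are not the Table 9 values), and only common factors are counted; the complexities cited in the text also include the minor group-overlap factor.

```python
import numpy as np

def factorial_complexity(pattern, cutoff=0.10):
    """Number of loadings per variable whose magnitude reaches the cutoff."""
    P = np.asarray(pattern, dtype=float)
    return (np.abs(P) >= cutoff).sum(axis=1)

def hyperplane_members(pattern, factor, cutoff=0.10):
    """Indices of variables whose loading on `factor` is negligible (below cutoff)."""
    P = np.asarray(pattern, dtype=float)
    return np.flatnonzero(np.abs(P[:, factor]) < cutoff)

# Hypothetical three-factor pattern; columns: verbal, speed, spatial.
pattern = np.array([
    [0.70, 0.05, 0.02],   # an unspeeded vocabulary test
    [0.55, 0.40, 0.03],   # a speeded vocabulary test
    [0.35, 0.30, 0.40],   # an arithmetic reasoning test
    [0.04, 0.02, 0.75],   # an intersections test
])
print(factorial_complexity(pattern))    # [1 2 3 1]
print(hyperplane_members(pattern, 2))   # [0 1]
```

A fixed cutoff is only a convenience; in practice the width of a hyperplane is a judgment informed by sampling error and by the configuration as a whole.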
It can also be noted that the less highly speeded intersections tests (#10 through #12) seem to share in the overlap or redundancy just noted among the highly speeded forms, albeit to a lesser degree (see Table 8). As far as the speeded tests are concerned, it is clear from Figure 11 and the accompanying tables that something different is going on within each of the three content areas studied by Lord: verbal, spatial, and mathematical. The speeded spatial tests are of complexity two, relating simultaneously to the general spatial ability factor and a minor factor of group overlap. The speeded vocabulary tests are of complexity three, relating simultaneously to the general verbal ability factor, the general speed of performance or fluency factor, and a minor group overlap factor. The arithmetic reasoning and mathematics tests are of complexity four, however, whether speeded or not, in that they relate simultaneously to the general verbal ability factor, the general spatial ability factor, and the general fluency or speed of performance factor, as well as sharing a minor factor of group overlap. It is noteworthy that Jöreskog's independent-cluster-oriented analysis of Lord's data in terms of the "congeneric tests" hypothesis completely missed the content-dependent difference in complexity of factorial composition of highly speeded tests which, as just discussed, can be seen in Figure 11 and Table 9. Our oft-repeated warning that collinearity is no evidence of factorial purity is thus well illustrated by the highly speeded tests in Lord's battery. Course grades in mathematics and chemistry, it will be recalled, show a large residual linkage of .34 in Table 8. Although grades in these courses both lie near the hyperplane common to the "spatial ability" and "performance motivation" factors in Figure 11, they are not as close together as might be expected on the basis of their extremely high degree of manifest association (r_ij = .81). Chemistry course grades (#31) seem to require a slightly more complex mixture of basic abilities and motivation than do grades in mathematics courses (#32), so they have been pulled apart a bit more at the latent level, as seen in Figure 11, than they appear to be at the manifest level, where superficial overlap has presumably obscured underlying differences. The picture discrimination test score (#26) falls within the general vicinity of the chemistry and mathematics course grades just discussed in the Figure 11 extended vectors plot, but an examination of residuals in Table 8 reveals a fair number of negative entries which deserve special consideration. Given that the original correlations show a positive manifold, these negative residuals indicate that the picture discrimination task differs more at the manifest level from nearby tests (arithmetic reasoning tests in particular, as well as course grades in foreign languages, chemistry, and mathematics, judging from the residuals) than its placement in the common factor space (i.e., the extended vectors plot in Figure 11) would suggest.
All of this seems to stem from the fact that the picture discrimination task requires a mixture of spatial ability and motivation to perform rapidly and fluently which is comparable to the mixture required for attainment of high grades in mathematics courses, say. However, the picture discrimination task is superficially quite distinct from performance in courses of instruction or on tests of arithmetical reasoning, as shown by relatively low correlations at the manifest level between scores obtained by such diverse methods. In fact, the picture discrimination task is superficially similar to other putative measures of perceptual speed such as number checking (#27) and cancellation (#25), as seen by positive residuals in Table 8 between these scores. In a sense, then, the relationships between factors and variables depicted in Table 9 and Figure 11 seem to have been freed from the contaminating effects of method variance, which shows up in the form of residual linkage between manifest variables in Table 8. Such unconfounding of so-called trait and method variance is possible because sources of the latter tend to bring about superficial collinearities, whereas invariant hypothetical determinants of scientific interest bring about latent coplanarities. In other words, collinearity-resistant fitting automatically distinguishes trait from method variance by taking account of the fact that "the higher the intertrait correlation the more the relationship is augmented when both measures utilize the same method" [Campbell and O'Connell, 1967, p. 415]. Hence, all of the criticisms leveled at the classical factor analysis model by Campbell and O'Connell turn out to be arguments in favor of collinearity-resistant fitting, including their observation that the stronger a trait relationship is at the latent level, the more susceptible it is apt to be to attenuation when different methods of measurement are employed. The only other negative residuals of any note in Table 8 are likewise related to superficial differences between tests in terms of format or mode of administration that are not substantiated at the level of latent determination. Hence, the verbal admissions test (#2) appears to be highly similar to the unspeeded vocabulary tests (#3 and #4) as a rather pure indicator of verbal ability, although this is not adequately reflected in the zero-order correlations. The latter have most likely been attenuated somewhat by superficial differences in format or mode of test administration, not to mention the time interval that may have separated admissions testing from the administration of Lord's own vocabulary tests. A few other superficial linkages between pairs of variables that are not accounted for (i.e., substantiated) at the latent level can be found by inspecting the three-factor residuals in Table 8.
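The residuals discussed above are what is left of the observed correlations once the common-factor part implied by an oblique solution has been subtracted. A minimal sketch, assuming a primary pattern P and a factor correlation matrix Phi are already in hand (how collinearity-resistant fitting arrives at them is not shown), computes R - P Phi P' off the diagonal and flags pairs with large residuals. All numbers are hypothetical; a positive flagged residual marks a superficial linkage, while a negative one would mark a pair that the common factors over-reproduce.

```python
import numpy as np

def common_factor_residuals(R, pattern, phi):
    """Return R - P Phi P' with the diagonal zeroed (uniquenesses absorb it)."""
    R = np.asarray(R, dtype=float)
    P = np.asarray(pattern, dtype=float)
    Phi = np.asarray(phi, dtype=float)
    E = R - P @ Phi @ P.T
    np.fill_diagonal(E, 0.0)
    return E

def flag_linkages(E, threshold=0.20):
    """List (i, j, residual) for pairs i < j whose residual magnitude reaches the threshold."""
    i, j = np.triu_indices_from(E, k=1)
    keep = np.abs(E[i, j]) >= threshold
    return [(int(a), int(b), float(E[a, b])) for a, b in zip(i[keep], j[keep])]

# Hypothetical oblique two-factor pattern and factor correlations (not Lord's data).
P = np.array([[0.7, 0.0], [0.6, 0.1], [0.0, 0.8], [0.1, 0.6]])
Phi = np.array([[1.0, 0.3], [0.3, 1.0]])
R = P @ Phi @ P.T
R[0, 1] += 0.25          # superficial overlap shared only by variables 0 and 1
R[1, 0] += 0.25
np.fill_diagonal(R, 1.0)

E = common_factor_residuals(R, P, Phi)
print(flag_linkages(E))  # a single linkage between variables 0 and 1 (about .25)
```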
That course grades in English (#28) and foreign language (#29) are more strongly linked than is indicated by their relationships to the "verbal" and "performance motivation" factors in Table 9 and Figure 11 is not surprising, since they undoubtedly overlap a good deal in many other respects. Why foreign language course grades might be linked to grades in chemistry (#31) and mathematics (#32) beyond what is indicated by their mutual patterns of relationship to the three broad general factors underlying Lord's battery as a whole leaves more room for speculation, however, as do a few other linkages involving chemistry grades which we will not discuss here. As the reader can see by now, inspecting residual outliers is a favorite pastime of exploratory data analysts, often taking up as much time and effort as examining the latent model which fits the main part of the data. Inspection of these residuals can be facilitated in many different ways, including multidimensional scaling and supplementary cluster-oriented factor analysis. In this way the outcome of a typical cluster-oriented or "first-order" factor analysis can be made available, if desired, without jeopardizing the search for broad, general, and invariant common factors.
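One simple way to organize residual linkages for inspection is to group variables that are joined, directly or through a chain, by residuals of at least some threshold magnitude. The sketch below does this with a union-find pass, which amounts to single-linkage clustering cut at the threshold. It is an illustrative choice, not a procedure taken from this book; the threshold of .10 and the residual values are hypothetical.

```python
import numpy as np

def residual_linkage_groups(E, threshold=0.10):
    """Group variables connected by |residual| >= threshold (connected components)."""
    E = np.asarray(E, dtype=float)
    n = E.shape[0]
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a in range(n):
        for b in range(a + 1, n):
            if abs(E[a, b]) >= threshold:
                parent[find(a)] = find(b)   # merge the two components

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    # Singletons carry no residual linkage; report only groups of two or more.
    return sorted((g for g in groups.values() if len(g) > 1), key=lambda g: g[0])

# Hypothetical residual matrix: a chain 0-1-2 and a doublet 3-4.
E = np.zeros((6, 6))
for a, b, r in [(0, 1, 0.15), (1, 2, 0.12), (3, 4, 0.30)]:
    E[a, b] = E[b, a] = r
print(residual_linkage_groups(E))   # [[0, 1, 2], [3, 4]]
```

Doublets show up as two-element groups. Because single linkage chains, overlapping patterns of pairwise linkage get merged into one group, which is exactly the kind of intricate residual structure that independent, nonoverlapping clusters cannot depict.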
One approach that can be entertained in examining the pattern of residual linkages left by collinearity-resistant fitting of the factor analysis model in the presence of local dependency outliers is cluster analysis. Applying cluster analysis to the residual associations that remain after the broad general factors accounting for coplanarity relationships underlying the battery as a whole have been partialled out would seem to be quite appropriate, since the superficial collinearity relationships that remain as residuals have a good chance of being accurately represented in terms of independent clusters. Nevertheless, rather intricate patterns of pairwise linkage that cannot be adequately depicted in terms of independent, nonoverlapping clusters sometimes remain in a residual matrix after collinearity-resistant fitting. Another approach, which might appeal to those who find analysis of covariance structures useful in model building, is to set up a priori constraints in a restricted factor analysis model using the pattern of near-zero loadings located through bounding hyperplane simple structure transformation (as in Table 9 for Lord's data) in conjunction with a detailed examination of residual linkages left among groups or pairs of variables through collinearity-resistant fitting (as in Table 8). The suggested approach is fundamentally exploratory rather than confirmatory, of course, but it may prove reassuring to some in that a statistical test of fit (albeit not strictly appropriate when based upon prior analysis of the data) is made available, as are standard errors for estimates of any free parameters in the model. In taking such an approach with Lord's data, for instance, one would entertain the "hypothesis" of three intercorrelated general factors whose zero-restricted loadings reflect the pattern of hyperplane membership seen in Table 9 and Figure 11.
In addition, around five minor group factors would have to be entertained on the basis of residual linkages seen in Table 8. Since neither the reference tests for perceptual speed nor the course grades form a compact cluster, moreover, certain pairwise linkages or doublet factors would also have to be "hypothesized." All of the foregoing implies that it takes a rather complicated model involving coplanarity relationships as well as collinearity relationships (the latter reflecting both clustering and doublet linkages) to represent Lord's data, not to mention the usual unique factors and the fact that intercorrelations among the major common factors cannot be presumed to be zero. Given so many factors, the fit of the model to observed data might be expected to be so close that any remaining lack of fit could safely be attributed to mere statistical sampling fluctuations rather than to possible variable sampling idiosyncrasies. If so, then the maximum likelihood method of estimating parameters would prove to be appropriate and might actually yield reasonable results when given good starting values. Unfortunately, not only does the dimensionality of such a restricted solution have to be at least eight in the case of Lord's 33 variables, not counting possible pairwise linkages (implying around 130 free parameters to be estimated), but the solution is nowhere near adequate when judged in terms of the likelihood ratio chi-square test of fit. Moreover, each attempted reanalysis involving fewer model restrictions proves quite costly under such circumstances, even if storage and time limitations do not make the postulation of more free parameters impractical.† Despite all of this impressive statistical effort, nothing about the solution can really be expected to remain invariant with respect to changes in test battery composition except for the three broad general common factors originally isolated through collinearity-resistant fitting and bounding hyperplane simple structure transformation! As can be seen in Figure 11, however, there is some reason to be suspicious even of the uniqueness or location invariance of these three factors, since they are not all located at the intersections of distinct and well-defined hyperplanes bounding the test vector configuration. What is needed for confidence in the solution, then, is not a larger sample of observations or more efficient (technically, under certain conditions) statistical estimation of parameters in a highly detailed mathematical model for the population covariance structure of this particular set of variables, but exploratory analysis of a larger and more representative sample of variables from the domain of variation of interest.
Practical Comparison of Collinearity-Resistant Fitting and Direct Geoplane Transformation With Available Approaches to Unrestricted Factor Analysis
We have endeavored to present factor analysis as a modern technique of exploratory data analysis with the objective of discovering hypothetical determinants of manifest covariation that can be expected to remain invariant with respect to changes in test battery composition.
In so doing we have pointed out some of the apparent disadvantages of viewing exploratory factor analysis solely from the conventional statistical perspective of hypothesis testing and disconfirmation. It might well be asked, however, whether the fully exploratory approach to factor analysis we have developed actually yields results that differ in any important practical respects from the outcome of more traditional forms of model estimation and parameter identification. In particular, we have not yet demonstrated that collinearity-resistant fitting of the common factor model followed by direct geoplane transformation has any practical advantages over statistically motivated maximum likelihood estimation followed by cluster-oriented transformation via direct quartimin, for example. It will be recalled from our earlier comparative analysis of Lord's data in Table 2 that direct geomin and direct quartimin transformation (of the same five minres factors, in that case) led to highly similar results, except for a few minor discrepancies attributable to direct geomin's greater tolerance of factorial complexity. After the presentation of that comparison, however, we discovered through collinearity-resistant fitting and bounding hyperplane simple structure transformation that only three common factors (general verbal ability, general speed or rapidity of performance, and general spatial ability) show much evidence of factorial invariance in Lord's battery (Table 7 and Figure 11). Although several group factors (verbal speed, perceptual speed, arithmetic reasoning, and spatial speed) as well as a few doublets (language grades, number speed tests, mathematics and chemistry grades) also appear in Table 8 in the form of residual linkages reflecting superficial collinearities among manifest variables, none of these can be expected to remain invariant with respect to changes in test battery composition. Application of exploratory maximum likelihood factor analysis to Lord's data in any conventional fashion could not be expected to meet with much success. All ten or so of the factors just mentioned, including doublets, would have to be dealt with in the same analysis in order to fit Lord's data adequately in a statistical sense.
† The use of LISREL IV [Jöreskog and Sörbom, 1978] to conduct a "confirmatory" factor analysis of this nature for Lord's data required 250K of storage for the smallest number of free parameters postulated, cost over twice as much as had the full exploratory analysis using collinearity-resistant fitting and direct geoplane transformation on which the postulated model was based, and yielded evidence that more than 130 free parameters are required in the model (χ² = 1170 with 432 df) if the data are to be fit adequately from a purely statistical perspective.
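The footnote's figures can be related to the "around 130 free parameters" mentioned in the text. Assuming, as is conventional for LISREL-type covariance structure analysis, that all p(p+1)/2 distinct variances and covariances of the p = 33 variables count as data points, the degrees of freedom equal that count minus the number of free parameters. The sketch below recovers the implied parameter count and uses the Wilson-Hilferty cube-root approximation (our choice, not something used in the text) to express the chi-square statistic as an approximate standard normal deviate.

```python
import math

def distinct_moments(p):
    """Number of distinct variances and covariances among p observed variables."""
    return p * (p + 1) // 2

def free_parameters_from_df(p, df):
    """Free parameters implied by a model's degrees of freedom: q = p(p+1)/2 - df."""
    return distinct_moments(p) - df

def wilson_hilferty_z(chi2, df):
    """Approximate standard normal deviate for a chi-square statistic (Wilson-Hilferty)."""
    k = float(df)
    mean = 1.0 - 2.0 / (9.0 * k)
    sd = math.sqrt(2.0 / (9.0 * k))
    return ((chi2 / k) ** (1.0 / 3.0) - mean) / sd

p, df, chi2 = 33, 432, 1170.0           # figures from the footnote
print(distinct_moments(p))              # 561
print(free_parameters_from_df(p, df))   # 129
print(round(wilson_hilferty_z(chi2, df), 1))
```

The implied 129 free parameters agree with "around 130," and the approximate deviate lies far beyond any conventional critical value, which is consistent with the footnote's verdict that the restricted model does not fit in a purely statistical sense.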
We know, however, that entertaining doublet "factors" in exploratory analyses is troublesome as far as identification of parameter estimates is concerned. Given a doublet factor, it is possible to multiply either of the loadings by any arbitrary scalar quantity, as long as the other loading is divided by that same quantity; no change in fit of the theoretical model to observed data results from such rescaling. One frequent symptom of this type of indeterminacy is an improper solution; i.e., the outcome of overfitting in the presence of strong pairwise local dependency outliers is often indefiniteness in parameter estimates [cf. van Driel, 1978]. Collinearity-resistant fitting is a way of excluding local dependency outlier effects from an exploratory factor analysis for the sake of "fitting the fittables" in a factorially invariant manner. Might it not be possible to accomplish a similar result with a more conventional approach to factor extraction, however, simply by reducing the number of common factors entertained below that suggested by global statistical tests of goodness of fit? It is commonly held, after all, that the series of likelihood ratio chi-square tests for completeness of factoring conducted in connection with exploratory maximum likelihood factor analysis leads to overfitting in analyses based upon large samples of observations. Lord's data can be regarded as coming from a large sample, so we have followed conventional wisdom and ignored strong rejection of the hypothesis that three maximum likelihood factors fit his data in order to arrive at a solution which can be compared to the outcome of collinearity-resistant fitting. To make matters more interesting, however, we have chosen to contrast the latter two methods of analysis with the psychometric "alpha" approach to extracting common factors developed by Kaiser and Caffrey [1965]. The three-factor residuals produced by collinearity-resistant fitting of Lord's data have already been presented below the diagonal in Table 8. Comparable entries produced by maximum likelihood factoring have been entered below the diagonal in Table 10, while the residuals that result from alpha factoring of the same data with three common factors appear above the diagonal in that table. Comparison of corresponding entries below the diagonal of Table 8 and Table 10 reveals some striking discrepancies between collinearity-resistant fitting (CRF) and maximum likelihood estimation (MLE) in terms of three-factor residuals. While MLE does leave some rather sizeable residuals (a dozen over .20 in magnitude, whereas CRF leaves more than twice as many), none are found among the highly speeded verbal, spatial, or arithmetic reasoning tests used by Lord. Moreover, MLE leaves only two residuals above .10 in magnitude among all of Lord's arithmetic reasoning tests, in contrast to CRF, for which only four of these same residuals fall below .20 in magnitude. It is clear that MLE has led to close fitting of exactly those original correlations that CRF found to reflect excessive, superficial collinearity among manifest variables. The reason for this discrepancy is not difficult to find, however.
We can see from the implicit reciprocal uniqueness weighting of MLE (when viewed as a conditional weighted least squares fitting method in (147), for example) that great emphasis is placed upon fitting observed correlations involving any variables that ultimately come to have high communalities. Since variables that are relatively collinear at the manifest level (because they are highly redundant or overlapping measures) appear to share a great deal of variance in common, MLE tends to fit their interrelationships quite accurately. In other words, MLE is prone to be strongly influenced by superficial collinearities brought about by minor group factor effects, rather than being sensitive to the less obvious effects of broad, general, and quite possibly invariant common factors that bring about relationships of coplanarity. Moreover, it is exactly these same biases in MLE that invalidate its use as a way of confirming hyperplanar structure unless a very good overall fit to the data is achieved.
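The contrast between reciprocal uniqueness weighting and reciprocal communality scaling can be made concrete with a small calculation. The sketch assumes, since (147) and (167) are not reproduced in this section, that the conditional weighted least squares view of MLE weights each squared residual by 1/(psi_i psi_j) with psi = 1 - h², and that the alpha criterion weights it by 1/(h_i² h_j²). The communalities are hypothetical: one redundant high-communality pair and one low-communality pair.

```python
import numpy as np

def mle_style_weights(h2):
    """Reciprocal-uniqueness weights 1/(psi_i * psi_j), psi = 1 - h^2 (assumed form)."""
    psi = 1.0 - np.asarray(h2, dtype=float)
    return 1.0 / np.outer(psi, psi)

def alpha_style_weights(h2):
    """Reciprocal-communality weights 1/(h_i^2 * h_j^2) (assumed form)."""
    h2 = np.asarray(h2, dtype=float)
    return 1.0 / np.outer(h2, h2)

def weighted_ls_criterion(R, reproduced, W):
    """Sum over i < j of W_ij * (r_ij - reproduced_ij)^2."""
    E = np.asarray(R, dtype=float) - np.asarray(reproduced, dtype=float)
    i, j = np.triu_indices_from(E, k=1)
    return float(np.sum(W[i, j] * E[i, j] ** 2))

# Hypothetical communalities: redundant pair (0, 1), low-communality pair (2, 3).
h2 = np.array([0.9, 0.9, 0.2, 0.2])
W_ml, W_alpha = mle_style_weights(h2), alpha_style_weights(h2)
print(W_ml[0, 1], W_ml[2, 3])        # roughly 100 vs. about 1.6
print(W_alpha[0, 1], W_alpha[2, 3])  # roughly 1.2 vs. 25
```

Under the first weighting a residual between the redundant pair costs about sixty times as much as the same residual between the low-communality pair, so the fit is pulled toward reproducing superficial collinearities; the second weighting reverses the emphasis.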
From Table 10 we can see that all of the large MLE residuals are confined to relationships among Lord's reference tests for number speed and perceptual speed (#23 through #27) or among his indicators of grades in academic courses (#28 through #33). What these MLE residuals indicate, then, is that performance on reference tests for perceptual speed as well as achievement in academic courses requires factors beyond those that are common to the verbal, spatial, and mathematical test score domains tapped by Lord's battery. The latter domains were very strongly represented in the battery, however, due to the inclusion of six highly parallel (but for scale and origin) measures of each, so MLE attempted to fit their manifest interrelationships accurately given the hypothesis of only three common factors active in the population sampled. Comparing the three-factor alpha residuals above the diagonal in Table 10 with corresponding entries below the diagonal in that same table and in Table 8 reveals that alpha factoring comes much closer to collinearity-resistant fitting than does maximum likelihood estimation. Although only two of the three-factor alpha residuals exceed .20 in magnitude, the placement of those which exceed .10 in magnitude closely parallels that found below the diagonal in Table 8. One thing we can infer from this correspondence is that alpha factoring should serve as a good starting point (initial configuration) from which to initiate collinearity-resistant fitting. In this connection, it can be noted that fitting Lord's data with three alpha factors required only six major iteration cycles (with principal factors as a starting point) before communalities had stabilized to beyond the third decimal place. Maximum likelihood factoring required nine major cycles to satisfy a comparable criterion of convergence, while the adaptive reweighting scheme employed in collinearity-resistant fitting required nineteen such cycles.
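The iteration counts above refer to major cycles in which communalities are re-estimated until they stop changing. As a point of reference, the sketch below shows that kind of loop for plain iterated principal-axis factoring; it is neither alpha factoring, maximum likelihood, nor collinearity-resistant fitting, and the stopping tolerance of .0005 is our reading of "stabilized to beyond the third decimal place." The correlation matrix is a hypothetical exact one-factor structure, so the true communalities are known.

```python
import numpy as np

def iterated_principal_factors(R, k, tol=5e-4, max_iter=500):
    """Iterated principal-axis factoring; returns loadings, communalities, cycles used.

    Communalities start at squared multiple correlations and are replaced each
    cycle by row sums of squared loadings from the k largest eigenpairs of the
    reduced correlation matrix, until no communality changes by tol or more.
    """
    R = np.asarray(R, dtype=float)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))   # squared multiple correlations
    for cycle in range(1, max_iter + 1):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)             # communalities on the diagonal
        vals, vecs = np.linalg.eigh(reduced)      # eigenvalues in ascending order
        top = np.argsort(vals)[::-1][:k]          # indices of the k largest
        L = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        new_h2 = np.sum(L ** 2, axis=1)
        if np.max(np.abs(new_h2 - h2)) < tol:
            return L, new_h2, cycle
        h2 = new_h2
    return L, h2, max_iter

# Hypothetical one-factor population correlation matrix.
lam = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4])
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)
L, h2, cycles = iterated_principal_factors(R, k=1)
print(cycles, np.round(h2, 3))
```

Starting values matter to the cycle count in exactly the way the text suggests: a start that already lies near the final communalities needs fewer major cycles.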
Reliance upon alpha factoring as an initial configuration for CRF might thus be expected to yield considerable savings in computation time; alpha factoring is quite efficient and goes a good way toward identifying those very local dependency outlier residuals eventually isolated by CRF. We will have more to say about the initial configuration issue in collinearity resistant fitting of the invariant common factor model to observed data at the end of this section. The ability, albeit limited, of alpha factoring to resist local dependency outliers stems, of course, from the fact that emphasis is placed on accurately fitting correlations involving variables with low communalities in any conditionally weighted least squares criterion using reciprocal communality scaling (167). Alpha factoring therefore tends to be responsive to the moderate levels of association brought about by coplanarity relationships among variables rather than to the excessively high levels of manifest association characteristic of superficial collinearity relationships. In other words, alpha factors are more apt to remain invariant with respect to changes in test battery composition than are maximum likelihood factors. This is just another way
Page 256 Table 10. Three Factor Residuals for Lord's Data: Maximum Likelihood Below Diagonal, Alpha Above Diagonal
#0l Word Fluency (R)**
#01
#02
#03
#04
#05
#06
#07
#08
#09
#10
#11
#12
#13
#14
#15
—
4
7
1
5
2
0
1
1
5
7
3
8
5
4
#02 Verbal (A)
2
—
0
#03 Vocabulary (L)
5
4
#04 Vocabulary (L)
1
3
#05 Vocabulary (M)
2
#06 Vocabulary (S) #07 Vocabulary (S) #08 Vocabulary (S) #09 Spatial Relations (A)
2
5
1
2
2
—
1
0
1
1
2
—
2
3
2
6
2
0
—
0
0
3
3
1
3
4
4
2
2
2
3
5
4
3
3
4
1
1
4
2
1
2
1
1
3
1
3
2
2
0
1
1
1
2
3
0
0
2
0
2
2
1
2
3
2
1
2
2
0
0
—
8
10
2
1
2
3
3
5
3
4
—
12
2
1
1
3
3
5
4
6
8
—
2
3
0
4
3
4
4
0
1
1
—
0
4
1
4
1
2
2
3
#10 Intersections (L)
2
1
3
1
0
2
0
3
2
—
10
5
8
7
6
#11 Intersections (L)
3
1
2
1
1
0
1
1
2
4
—
7
6
7
8
#12 Intersections (M)
3
0
1
2
1
0
1
0
1
1
2
—
5
5
5
#13 Intersections (S)
4
1
1
0
2
1
1
2
1
0
2
1
—
13
11
#14 Intersections (S)
0
1
0
2
0
2
2
1
1
1
2
2
2
—
15
#15 Intersections (S)
0
1
0
2
0
1
1
2
0
2
1
1
1
4
—
*
decimal points omitted
**
R = Reference, G = Grade, A = Admission, L = Lead, M = Moderately Speeded, S = Speeded
(Table continued on next page)
Page 257 Table 10. (Continued)
#01
#02
#03
2
#07
#08
#09
#10
#11
#14
#15
4
4
5
7
4
6
5
0
5
4
6
1
#18 Arithmetic Reasoning (L)
4
4
2
6
1
3
5
4
#19 Arithmetic Reasoning (M)
7
3
2
5
1
3
5
2
#20 Arithmetic Reasoning (S)
6
1
2
3
1
0
1
#21 Arithmetic Reasoning (S)
6
1
2
0
1
2
1
#22 Arithmetic Reasoning (S)
2
2
0
0
1
1
2
0
#23 Number Speed (R)
13
4
1
5
#24 Number Speed (R)
6
4
1
4
1
#25 Cancellation (R)
12
7
5
7
2
#26 Picture Discrimination (R)
10
5
5
7
4
7
8
8
4
2
1
2
4
4
2
5
7
3
1
5
7
7
6
2
1
1
3
0
4
4
1
0
6
0
1 4 7
6 3 7
1
2
1
1
1
1
0
2
1
2
3
4
1
4
3
3
4
3
2
1
3
1
2
2
1
1
3
1
2
1
1
2
2
1
0
3
0
0
1
0
1
1
1
1
0
1
2
1
0
3
5 4 8
1 1 3
1 2
2 1
1
4 4 3
5 2 0
4 4 3
2
0
3
1
1
3
4
3
5
0
0
2
0
3
1
1
4
2
3
1
1
0
2
#30 Eng'g Draw & Des. Geom (G)
3
2
6
0
1
3
1
3
3
0
2
2
1
2
2
#31 Chemistry (G)
6
1
1
3
1
0
0
3
5
0
1
1
0
2
3
#32 Mathematics (G)
3
2
0
0
1
0
1
1
7
2
0
1
0
1
2
9
1
5
3
1
0
0
1
2
2
1
2
2
1
1
3
1
#29 Foreign Language (G)
#33 Conduct (G) (Table continued on next page)
4
0
#13
5
1
3
#12
6
6
1
#06
#16 Mathematics (A)
#28 English (G)
1
#05
#17 Arithmetic Reasoning (L)
#27 Number Checking (R)
8
#04
Page 258 Table 10. (Continued)
#01 Word Fluency(R)** #02 Verbal (A)
#16
#17
#18
#19
#20
#21
#22
5
4
4
6
5
5
2
#23
#24
#25
#26
#27
#28
#29
#30
#31
#32
#33
7
1
8
8
0
3
3
5
5
1
9
3
1
3
2
2
1
0
2
1
5
0
1
6
6
5
1
1
0
#03 Vocabulary (L)
2
2
3
3
0
1
2
1
2
3
1
5
5
6
3
3
5
4
#04 Vocabulary (L)
4
1
1
2
7
3
3
5
4
6
3
4
11
7
6
4
1
3
#05 Vocabulary (M)
2
4
5
4
1
0
2
0
2
1
0
1
5
5
7
1
2
0
#06 Vocabulary (S)
7
8
7
7
2
3
1
4
1
4
8
5
2
1
8
1
2
0
#07 Vocabulary (S)
7
6
9
9
4
2
5
2
3
3
9
4
4
1
7
2
2
0
#08 Vocabulary (S)
7
8
7
5
1
3
2
3
2
3
7
2
1
1
7
4
4
1
4
3
#09 Spatial Relations (A)
4
5
5
0
2
2
1
2
5
7
8
#10 Intersections (L)
4
5
1
8
7
5
6
2
0
8
6
6
0
0
2
2
1
4
#11 Intersections (L)
8
8
6
6
7
9
7
3
2
7
9
6
1
1
3
2
2
4
#12 Intersections (M)
9
7
2
4
4
6
7
2
1
7
8
5
1
3
3
3
3
1
#13 Intersections (S)
10
8
8
8
7
6
6
5
3
9
13
4
6
3
1
3
3
4
#14 Intersections (S)
9
8
8
8
7
5
5
7
2
7
13
9
7
2
1
5
4
5
#15 Intersections (S)
8
7
8
7
3
4
3
6
5
1
6
4
4
6
4
2
9
l0
l0
2
8
2
0
Page 259 Table 10. (Continued)
#16 Mathematics (A)
#16
#17
#18
#19
#20
#21
#22
#23
#24
#25
#26
#27
#28
#29
#30
#31
#32
#33
—
13
17
19
16
13
14
1
7
6
7
6
8
10
13
7
2
#17 Arithmetic Reasoning (L)
8
—
14
19
12
10
10
2
1
7
11
8
2
4
8
2
1
1 2
#18 Arithmetic Reasoning (L)
10
9
—
14
11
11
12
2
5
9
10
8
5
1
10
2
7
2
#19 Arithmetic Reasoning (M)
12
14
8
—
18
12
13
2
7
5
9
4
4
8
12
5
5
0
#20 Arithmetic Reasoning (S)
7
5
3
9
—
15
17
4
10
6
8
8
10
10
12
9
7
2
#21 Arithmetic Reasoning (S)
6
5
5
5
7
—
14
3
2
5
3
4
9
7
13
5
6
3
#22 Arithmetic Reasoning (S)
5
4
4
5
8
7
—
3
8
7
7
7
9
8
12
9
7
3
4
8
3
#23 Number Speed (R)
8
11
9
12
3
8
2
—
16
11
5
9
9
6
5
#24 Number Speed (R)
4
12
7
5
0
5
1
27
—
1
1
3
10
9
14
13
9
1
4
#25 Cancellation (R)
8
11
13
8
7
6
7
17
12
—
20
17
6
2
6
4
8
#26 Picture Discrimination (R)
6
13
12
10
8
3
6
20
12
26
—
20
9
11
7
9
11
3
#27 Number Checking (R)
10
14
15
10
10
5
8
22
21
31
32
—
7
10
2
10
6
2
#28 English (G)
6
0
3
3
8
7
7
2
3
0
5
0
—
24
9
13
8
8
#29 Foreign Language (G)
11
6
3
10
11
6
8
3
2
4
5
1
25
—
5
19
16
10
#30 Eng'g Draw & Des. Geom (G)
7
3
6
8
7
7
7
2
5
7
8
7
12
12
—
12
6
8
#31 Chemistry (G)
9
0
1
8
11
5
10
4
9
1
3
1
18
26
22
—
28
4
#32 Mathematics (G)
6
3
1
10
10
7
10
2
2
3
3
7
15
25
18
36
—
3
#33 Conduct (G)
2
0
5
3
0
0
0
7
2
7
2
1
9
11
12
5
4
—
of saying that alpha factors tend to characterize the entire domain of variables from which any particular battery has been sampled. In Tables 11 and 12 we have presented three-factor direct-geoplane-transformed maximum likelihood and alpha solutions for Lord's data. The corresponding extended vectors plots are given in Figures 12 and 13. The similarity of Figure 13 and Figure 11 is striking. It confirms our suspicion that alpha factoring goes a good distance toward achieving the outcome of collinearity-resistant fitting and would provide a good starting point from which to initiate the latter. (In comparing these results, remember that the factors emerged from the computer in a different order for each method of factoring.) Notice from Table 11 and Figure 12 that the three-factor maximum likelihood solution for Lord's data contradicts all the evidence discussed in previous chapters that arithmetic reasoning is a factorially complex performance. In other words, MLE has produced a predominant arithmetic reasoning factor simply because that cluster is strongly represented within Lord's particular test battery. We could not, however, expect this result to display much invariance with respect to changes in test battery composition. For example, simply including fewer arithmetic reasoning tests and more measures of perceptual speed or academic grades could be expected to shift the meaning of the factor under consideration substantially. We have now seen that the psychometrically motivated alpha method of factor extraction shows promise of fostering factorial invariance because it is somewhat resistant to local-dependency outlier effects. Statistically motivated maximum likelihood estimation, on the other hand, is highly sensitive to exactly those superficial collinearity effects that are detrimental to factorial invariance.
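Kaiser and Caffrey's alpha factoring rescales the reduced correlation matrix by the reciprocal square roots of the communalities before extracting factors. Each factor's generalizability to the domain is then indexed by a coefficient alpha computed from its eigenvalue. The following is a minimal, single-step sketch in Python/NumPy that assumes the communalities are already known. Actual alpha factoring iterates on the communality estimates. The function name `alpha_coefficients` and the toy loadings are illustrative assumptions, not the book's.

```python
import numpy as np

def alpha_coefficients(R, communalities, tol=1e-12):
    """One non-iterated step of alpha factoring (sketch).

    R: p x p correlation matrix; communalities: length-p vector h^2.
    Returns the positive eigenvalues of H^-1 (R - U^2) H^-1 in descending
    order together with the alpha coefficient for each of them.
    """
    R = np.asarray(R, dtype=float)
    h2 = np.asarray(communalities, dtype=float)
    p = R.shape[0]
    reduced = R.copy()
    np.fill_diagonal(reduced, h2)                  # R - U^2: communalities on the diagonal
    h_inv = 1.0 / np.sqrt(h2)
    rescaled = reduced * np.outer(h_inv, h_inv)    # H^-1 (R - U^2) H^-1 has a unit diagonal
    eigvals = np.linalg.eigvalsh(rescaled)[::-1]   # eigvalsh returns ascending order
    eigvals = eigvals[eigvals > tol]               # keep only clearly positive roots
    alphas = (p / (p - 1.0)) * (1.0 - 1.0 / eigvals)
    return eigvals, alphas

# Toy one-factor correlation matrix (hypothetical loadings, not Lord's data).
loadings = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(loadings, loadings)
np.fill_diagonal(R, 1.0)
eigvals, alphas = alpha_coefficients(R, loadings ** 2)
print(eigvals, alphas)
```

In this perfectly one-factor toy case the rescaled matrix is all ones. Its single positive eigenvalue is therefore p = 4, and the associated alpha is (4/3)(1 - 1/4) = 1. Factors with eigenvalues at or below 1 would receive nonpositive alphas.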
An extreme form of the responsiveness of MLE to local dependency outliers is its tendency to yield improper solutions under two conditions: when pairwise linkages (doublet factors) are present, and when overfitting is indulged in to satisfy the likelihood ratio chi-square test of goodness of fit between theoretical model and observed data. We have now seen, moreover, that MLE is also sensitive to minor factor effects in the form of local dependency outliers. This holds even when the number of broad, general, and invariant common factors is not overestimated, and when the minor factors in question pertain to sizeable groups of variables rather than to doublet pairs. Beyond this demonstration of the practical implications of using psychometric vs. statistical criteria for fitting the common factor model, we still need to show what practical impact bounding hyperplane simple structure transformation might have on an analysis, compared with cluster-oriented transformation, once minor factor collinearity effects have been effectively excluded from consideration. In particular, does direct geoplane display any advantage over direct quartimin when transformation is applied only within the space spanned by a small number of broad, general, and quite possibly invariant common factors?
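To make the contrast between cluster-oriented and hyperplane-oriented criteria concrete, the sketch below evaluates two well-known rotation criteria on a pattern matrix:

- the direct quartimin criterion of Jennrich and Sampson, which sums products of squared loadings over pairs of factors;
- the geomin criterion, which sums, over variables, the geometric mean of the squared loadings, each offset by a small constant `eps`.

Geomin serves here only as a stand-in for a hyperplane-oriented criterion. The book's direct geoplane criterion is not reproduced, and the function names and `eps` values are illustrative assumptions.

```python
import numpy as np

def direct_quartimin(pattern):
    """Sum over factor pairs k < l of sum_j p_jk^2 * p_jl^2."""
    sq = np.asarray(pattern, dtype=float) ** 2
    m = sq.shape[1]
    return float(sum(np.sum(sq[:, k] * sq[:, l])
                     for k in range(m) for l in range(k + 1, m)))

def geomin(pattern, eps=0.01):
    """Sum over variables of the geometric mean of (p_jk^2 + eps) across factors."""
    sq = np.asarray(pattern, dtype=float) ** 2 + eps
    return float(np.sum(np.exp(np.mean(np.log(sq), axis=1))))

# A factorially complex test vector that nevertheless lies in the third hyperplane.
complex_in_plane = np.array([[0.6, 0.6, 0.0]])
print(direct_quartimin(complex_in_plane))   # 0.36 * 0.36, about 0.1296
print(geomin(complex_in_plane, eps=1e-6))   # small, because one loading is near zero
```

Quartimin penalizes such a variable for loading on two factors at once. Geomin is nearly indifferent to that complexity as long as one loading is near zero. This is one way to see why a hyperplane-oriented criterion is more content to let tests display factorial complexity.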
Table 11. Three Direct Geoplane Transformed Maximum Likelihood Factors of Lord's Data

Variable                           I   II  III  Communality
01 Word Fluency (R)               1*   27   19           14
02 Verbal (A)                      5   85    1           74
03 Vocabulary (L)                  2   81    6           64
04 Vocabulary (L)                  8   81    6           66
05 Vocabulary (M)                  0   87    4           73
06 Vocabulary (S)                  5   78    7           67
07 Vocabulary (S)                  4   89    7           81
08 Vocabulary (S)                  4   83   11           74
09 Spatial Relations (A)          58    1   11           40
10 Intersections (L)              80    2    3           65
11 Intersections (L)              87    7    2           72
12 Intersections (M)              89    3    1           77
13 Intersections (S)              87    3    0           75
14 Intersections (S)              91    6    3           79
15 Intersections (S)              89    9    6           80
16 Mathematics (A)                 7   18   64           55
17 Arithmetic Reasoning (L)       18   15   51           43
18 Arithmetic Reasoning (L)       15   12   56           47
19 Arithmetic Reasoning (M)       13   17   58           51
20 Arithmetic Reasoning (S)       10   15   66           58
21 Arithmetic Reasoning (S)       15   15   61           54
22 Arithmetic Reasoning (S)        4   19   64           55
23 Number Speed (R)               29    2
24 Number Speed (R)               28   62
25 Cancellation (R)                4    5   33    6   71   42   33   11
26 Picture Discrimination (R)     19    7   23           14
27 Number Checking (R)            20    8   44           16
28 English (G)                     6   59   22           45
29 Foreign Language (G)            1   14   34           16
30 Eng'g Draw & Des. Geom (G)     57    0   26           51
31 Chemistry (G)                  17    7   54           41
32 Mathematics (G)                12    3   65           49
33 Conduct (G)                     0   11   17            3

FACTOR CORRELATIONS
Primary Factor      I     II    III
I                 100     20     37
II                 20    100     31
III                37     31    100

* decimal points omitted
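Because the factors in these tables are correlated, a communality is not simply the sum of a row's squared pattern loadings. It also involves the factor correlations: h_j^2 = p_j' Phi p_j. A minimal sketch of that computation follows. The pattern matrix `P` and correlation matrix `Phi` below are made-up illustrative values, not entries from the tables.

```python
import numpy as np

def oblique_communalities(pattern, phi):
    """h_j^2 = p_j' Phi p_j for each row p_j of the pattern matrix."""
    P = np.asarray(pattern, dtype=float)
    Phi = np.asarray(phi, dtype=float)
    return np.einsum("jk,kl,jl->j", P, Phi, P)   # diagonal of P Phi P'

P = np.array([[0.5, 0.5, 0.0],
              [0.9, 0.0, 0.0]])
Phi = np.array([[1.0, 0.2, 0.4],
                [0.2, 1.0, 0.3],
                [0.4, 0.3, 1.0]])
print(oblique_communalities(P, Phi))   # approximately 0.60 and 0.81
```

With `Phi` set to the identity matrix, the function reduces to the familiar orthogonal case: the row sum of squared loadings.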
Table 12. Three Direct Geoplane Transformed Alpha Factors of Lord's Data

Variable                           I   II  III  Communality
01 Word Fluency (R)               2*   30   24           17
02 Verbal (A)                      9   86    2           77
03 Vocabulary (L)                  1   84    8           69
04 Vocabulary (L)                  9   78    2           64
05 Vocabulary (M)                  0   85    1           73
06 Vocabulary (S)                  5   74   15           63
07 Vocabulary (S)                  5   84   19           78
08 Vocabulary (S)                  3   77   22           70
09 Spatial Relations (A)          61    4    5           40
10 Intersections (L)              76    5    1           59
11 Intersections (L)              82    2    4           65
12 Intersections (M)              87    1    6           74
13 Intersections (S)              81    2    1           65
14 Intersections (S)              83    1    4           68
15 Intersections (S)              84    4    3           70
16 Mathematics (A)                34   28   38           46
17 Arithmetic Reasoning (L)       41   25   27           40
18 Arithmetic Reasoning (L)       38   25   27           40
19 Arithmetic Reasoning (M)       38   29   32           44
20 Arithmetic Reasoning (S)       36   24   43           48
21 Arithmetic Reasoning (S)       39   21   42           49
22 Arithmetic Reasoning (S)       29   26   44           46
23 Number Speed (R)               20    0
24 Number Speed (R)               13   69
25 Cancellation (R)                5    4   47    5   73   50   45   21
26 Picture Discrimination (R)     14    3   37           18
27 Number Checking (R)            19   16   64           39
28 English (G)                     3   50   32           43
29 Foreign Language (G)           10    7   41           21
30 Eng'g Draw & Des. Geom (G)     65    7   35           60
31 Chemistry (G)                  34    7   53           48
32 Mathematics (G)                32    4   66           60
33 Conduct (G)                     8   11   12            3

FACTOR CORRELATIONS
Primary Factor      I     II    III
I                 100     14     17
II                 14    100     21
III                17     21    100

* decimal points omitted
Figure 12. Extended Vectors Plot of Three Direct Geoplane Transformed Maximum Likelihood Factors of Lord's Data
Figure 13. Extended Vectors Plot of Three Direct Geoplane Transformed Alpha Factors of Lord's Data

In Table 13 and Figure 14 we have presented results that are directly comparable to those given in Table 9 and Figure 11. The only difference is in how the three collinearity-resistant factors of Lord's data have been transformed. Here they have been transformed via the cluster-oriented direct quartimin procedure [cf. Jennrich and Sampson, 1966; Carroll, 1953] rather than through the global direct geoplane strategy for bounding hyperplane simple structure transformation. Figures 11 and 14 depict exactly the same three-space, as can be judged from the comparable positioning of test vector endpoints. Even so, the first factor in the direct quartimin solution falls nearer Lord's speeded vocabulary tests than it did in the direct geoplane solution. In other words, direct quartimin transformation is less content than direct geoplane to allow these speeded tests to display factorial complexity. The first factor shifts only slightly between these two analyses, but it is no longer at the extreme lateral edge of the test vector configuration (at the intersection of bounding hyperplanes). This has two consequences for Table 13. First, the formerly essentially factorially pure verbal admissions test (#2) and the unspeeded vocabulary tests (#3 and #4) now have ambiguous and potentially confusing negative loadings on the third, general speed or performance-motivation factor. Second, the highly speeded vocabulary tests (#6 through #8) are less clearly influenced by this same factor than was the case in Table 9. Similar effects can be discerned for the spatial tests that load highly on the second factor in Table 13, although their negative loadings on the third factor are rather insubstantial. The distortions seen in Table 13 and Figure 14 correspond exactly to those noted by Carroll when he proposed the quartimin criterion; i.e., "The presence of factorially complex tests makes the primary axes more highly correlated than they would be if placed by graphical methods, and may give rise to larger negative projections than would otherwise be the case" [1953,
Table 13. Three Direct Quartimin Transformed Collinearity-Resistant Factors of Lord's Data

Variable                           I   II  III  Communality
01 Word Fluency (R)              25*    1   25           16
02 Verbal (A)                    102    2   17           97
03 Vocabulary (L)                 91    3   16           76
04 Vocabulary (L)                 92    5   18           81
05 Vocabulary (M)                 93    5   12           80
06 Vocabulary (S)                 68    5    9           53
07 Vocabulary (S)                 76    5   12           63
08 Vocabulary (S)                 72    4   15           59
09 Spatial Relations (A)           2   62    0           40
10 Intersections (L)               0   79    7           59
11 Intersections (L)               6   80    9           59
12 Intersections (M)               2   91   15           76
13 Intersections (S)               2   77    4           58
14 Intersections (S)               4   78    7           57
15 Intersections (S)               7   80    0           62
16 Mathematics (A)                27   24   33           37
17 Arithmetic Reasoning (L)       24   34   18           31
18 Arithmetic Reasoning (L)       21   32   24           32
19 Arithmetic Reasoning (M)       26   29   26           34
20 Arithmetic Reasoning (S)       22   27   41           44
21 Arithmetic Reasoning (S)       22   31   34           40
22 Arithmetic Reasoning (S)       26   20   43           43
23 Number Speed (R)                2   19
24 Number Speed (R)                0
25 Cancellation (R)                3   66   40   16    1   72   48   43   19
26 Picture Discrimination (R)      1   22   44           30
27 Number Checking (R)             8   14   50           22
28 English (G)                    59    3   21           46
29 Foreign Language (G)           16    4   38           22
30 Eng'g Draw & Des. Geom (G)      0   70   20           61
31 Chemistry (G)                  12   31   42           41
32 Mathematics (G)                 1   28   62           56
33 Conduct (G)                     9    5   15            3

FACTOR CORRELATIONS
Primary Factor      I     II    III
I                 100     26     28
II                 26    100     29
III                28     29    100

* decimal points omitted
Figure 14. Extended Vectors Plot of Three Direct Quartimin Transformed Collinearity-Resistant Factors of Lord's Data
pp. 33, 37]. This suggests, of course, that the direct geoplane solution in Table 9 and Figure 11 comes close to the type of graphical solution for which Carroll frankly expressed a preference over his own quartimin solution. He refrained only because, at the time, such a solution could be attained only by laborious and subjective graphical means.

Achieving Solution Invariance Through Dimensionality Reduction in Fully Exploratory Factor Analysis.

All of the foregoing approaches to fitting the factor analysis model and transforming the results yield somewhat comparable results, in that both a verbal and a spatial factor clearly appear. Nevertheless, there is a good deal of variation from solution to solution in the details, especially in the loadings on the third, seemingly speed-related factor. It is precisely these differences in the detailed implications of various analyses that make it important to use the best available technology in conducting exploratory factor analysis. The decision to disregard any potential common factor as a minor symptom of group overlap or redundancy hinges on the lack of evidence that it will remain invariant with respect to changes in test battery composition. Such evidence appears not in the form of high-loading unifactorial variables but in the form of factorially complex variables. These are merely coplanar with the factor in question and therefore have only low or intermediate loadings on it. The required decisions can now be made by combining three tools:

- collinearity-resistant fitting of the invariant common factor model;
- global direct geoplane bounding hyperplane simple structure transformation;
- visual inspection of extended vectors plots of the outcome.

This approach has a further advantage. Once we have detected a minor factor of group overlap and rerun the analysis in a space of reduced dimensionality, the excluded minor factor will likely make itself felt only as local dependency outlier residuals. It should not disrupt the invariance of the remaining broad, general common factors. Indeed, excluding such minor factor effects should contribute to the stability of the overall solution.

To demonstrate the latter phenomenon, we can now return to the four-factor collinearity-resistant direct-geoplane-transformed solution for Lord's data. We found it expedient to overlook that solution earlier in order to proceed with a discussion of the three-factor solution on which we finally settled. It will be recalled that even the final three-factor solution for Lord's data (Table 9) is rather suspect. One of the supposedly test-vector-configuration-bounding hyperplanes is not populated with test vectors anywhere except at its very edges (Figure 11). It is not surprising, then, that the four-factor solution shows even less convincing evidence of factorial invariance.
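The expectation that an excluded minor factor will show up only as local dependency outlier residuals can be illustrated directly. Fit only the broad factors and scan the off-diagonal residual correlations. The sketch below assumes an orthogonal loading matrix for the retained factors and uses an arbitrary cutoff of 0.10. Both the function name and the threshold are illustrative choices, not the book's procedure.

```python
import numpy as np

def residual_linkages(R, loadings, threshold=0.10):
    """List variable pairs whose residual correlation (R - L L') is at least `threshold` in size."""
    R = np.asarray(R, dtype=float)
    L = np.asarray(loadings, dtype=float)
    resid = R - L @ L.T
    p = R.shape[0]
    pairs = [(i, j, float(resid[i, j]))
             for i in range(p) for j in range(i + 1, p)
             if abs(resid[i, j]) >= threshold]
    return sorted(pairs, key=lambda t: -abs(t[2]))   # largest linkages first

# One broad factor (hypothetical loadings of 0.7) plus a doublet linking variables 3 and 4.
g = np.full((5, 1), 0.7)
R = g @ g.T
np.fill_diagonal(R, 1.0)
R[3, 4] = R[4, 3] = R[3, 4] + 0.15
print(residual_linkages(R, g))   # only the (3, 4) pair, with a residual near 0.15
```

The broad factor is recovered untouched. The minor doublet survives only as a single residual linkage, which a collinearity-resistant criterion could choose to tolerate rather than fit.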
In Table 14 we have presented the fourfactor solution in question. In Figure 15, the extended vectors plot based upon orthogonal bounds to the first, second, and third Table 14 factors, one can see a structure somewhat similar to that in Figure 11 for the final threefactor solution which we have argued might well remain invariant with respect to changes in test battery composition. Not surprisingly, since the extra factor in Table 14 is defined largely by the cluster of reference tests for perceptual speed (#25 through #27), the latter do not fall into comparable positions in Figures 11 and 15. indeed, the amount of variance accounted for by the first three factors is so low for variable #27 (and its loadings on these three factors display such a mixed pattern in terms of directionality) that its extended vectors projection in Figure 15 must be ignored. Just as the third factor in Table 14 and Figure 15 no longer appears to have a direct impact on the perceptual speed tests, we also see that its influ
Table 14. Four Direct Geoplane Transformed Collinearity-Resistant Factors of Lord's Data

Variable                           I   II  III   IV  Communality
01 Word Fluency (R)              28*    1    8   24           18
02 Verbal (A)                     91    0    3   10           84
03 Vocabulary (L)                 84    4    3    6           68
04 Vocabulary (L)                 82    2    2   11           69
05 Vocabulary (M)                 88    6    1    3           74
06 Vocabulary (S)                 77    5    1   15           64
07 Vocabulary (S)                 88    5    2   19           79
08 Vocabulary (S)                 81    5    3   19           71
09 Spatial Relations (A)           3   62    7    0           43
10 Intersections (L)               0   76    8    6           62
11 Intersections (L)               5   81    2    5           65
12 Intersections (M)               2   88    2   10           79
13 Intersections (S)               0   85    2    2           72
14 Intersections (S)               2   87    4    1           72
15 Intersections (S)               6   84    4    3           73
16 Mathematics (A)                19   11   51    3           43
17 Arithmetic Reasoning (L)       15   18   48   11           40
18 Arithmetic Reasoning (L)       11   15   56    9           44
19 Arithmetic Reasoning (M)       19   17   45    1           39
20 Arithmetic Reasoning (S)       15   13   55    6           48
21 Arithmetic Reasoning (S)       17   20   44    9           43
22 Arithmetic Reasoning (S)       18    6   55    7           47
23 Number Speed (R)                1
24 Number Speed (R)               21    3
25 Cancellation (R)               10   34   21    5   53   48    2   47   44   48   59   38
26 Picture Discrimination (R)     14   30   13   61           47
27 Number Checking (R)             4   11   12   64           45
28 English (G)                    52   13   37    2           49
29 Foreign Language (G)            8    7   50    2           26
30 Eng'g Draw & Des. Geom (G)      1   60   28    7           58
31 Chemistry (G)                   2   14   65    2           52
32 Mathematics (G)                 9   10   73   13           52
33 Conduct (G)                    14    1   24    6            5

FACTOR CORRELATIONS
Primary Factor      I     II    III     IV
I                 100     22     36      3
II                 22    100     37      8
III                36     37    100     26
IV                  3      8     26    100

* decimal points omitted
Figure 15. Extended Vectors Plot of the 1st, 2nd, and 3rd Table 14 Factors of Lord's Data
ence on the highly speeded vocabulary tests (#6 through #8) and the word fluency test (#1) is diminished in comparison to what was seen earlier in Table 9 and Figure 11. This leaves the third factor in the fourdimensional analysis of Lord's data a seemingly rather pure determiner of course grades in Foreign Language (#29), Chemistry (#31), and Mathematics (#32), as well as a strong determiner of performance on all tests of mathematical (#16) and arithmetical reasoning abilities (#17 through #22). But for the loadings of course grades in English (#28), Foreign Language (#29), and, perhaps, Conduct (#33) here we might take the first three factors in Table 14 simply to reflect the three content areas studied by Lord: Verbal, Mathematical, and Spatial. Since loading patterns of the spatial tests are little changed in going from four to three factors and we do not want to go into a detailed analysis of all four extended vectors plots associated with Table 14, let us now examine
that extended vectors plot based on the orthogonal bounds to the nonspatial (first, third, and fourth) factors in that table. This plot, in Figure 16, shows little evidence that the factors involved fall at the intersections of distinct and well-defined hyperplanes bounding a densely populated, polyhedral, convex-cone-shaped configuration of test vectors. On the contrary, if one masks out only four points in Figure 16 (#23, #24, #28, and #1), all of the remaining tests in the battery appear to fall into rather clear, independent clusters.

Figure 16. Extended Vectors Plot of the 1st, 3rd, and 4th Table 14 Factors for Lord's Data
Figure 17. Extended Vectors Plot of the 1st, 2nd, and 4th Table 14 Factors for Lord's Data
Figure 18. Extended Vectors Plot of the 2nd, 3rd, and 4th Table 14 Factors of Lord's Data

Figure 17 is the extended vectors plot for the verbal, spatial, and perceptual speed factors in Table 14. Note that the tests that do not load strongly on the perceptual speed factor (F4) fall to either side of a very poorly defined hyperplane normal to the reference axis in question. The "hyperplane" thus revealed is so ill-defined that test vectors spread widely to either side of it. This indicates that entertaining the perceptual speed factor adds a good deal of noise to the overall configuration and does not help define its true outer boundaries. (Compare this analysis with our earlier discussion of the impact of entertaining a near-null extra factor in Figures 9b through 9d for the second box problem solution in Table 6.) None of the boundaries of the test vector configuration in Figure 18, the remaining extended vectors plot in this analysis, are well defined either. Little about the configuration suggests that it might remain invariant with respect to changes in test battery composition. Neither the central region nor the bounding faces of the four-dimensional solution for Lord's data appear to be well populated with test vectors; instead, more-or-less independent clusters appear. The extended vectors plots for the Table 14 solution show much weaker definition of bounding hyperplanes than is evident in Figure 11. Were it not for the evident complexity of the number speed tests (#23 and #24), we could readily dismiss the notion that perceptual speed shows potential invariance as a hypothetical determinant of variation within Lord's battery. Even so, there is not enough evidence that this factor would persist under changes in test battery composition to justify treating it as anything but a minor group factor. It appears to reflect superficial overlap or redundancy in the number speed, cancellation, picture discrimination, and number checking tasks. The evident factorial complexity of the number speed tests in Table 14 and the associated extended vectors plots could be taken seriously if the third factor could safely be viewed as purely arithmetical or mathematical in scope. However, the number checking task (#27) does not seem to relate to this factor, while grades in courses with nonmathematical content do, as mentioned previously. It therefore seems prudent to reduce the dimensionality of the solution by one, for two reasons: to rid the analysis of the possible distorting influence of the minor perceptual speed factor, and to get a better conception of the nature of the third factor. Incidentally, we can note from the factor intercorrelation matrix in Table
14 that the minor factor of perceptual speed (IV) correlates only with the factor (III) whose interpretation is still somewhat ambiguous. Moreover, while the first three factors are all mutually intercorrelated to some degree, the third factor correlates relatively highly with all the rest. This suggests that it might still be placed too near the center of the overall test vector configuration, just as was the case in the five-factor solution of Table 7. In passing, note also that the fourth factor is a good deal smaller than the others in overall size or influence in the test battery. Were we to assume representative sampling of tests, this would imply that it may not reflect the action of a broad, general, and quite likely invariant source of covariation within the domain sampled. Of course, we already know that reducing the dimensionality of the solution for Lord's battery from four to three left sizeable residual linkages among the perceptual speed tests in Table 8. That in itself is evidence that the manifest correlations among these variables might well have been inflated through superficial overlap or redundancy in the format or content of the tests involved. In this connection, note that reducing the dimensionality of the solution from five to four produced new residual linkages among the arithmetic reasoning tests (#17 through #22) and the mathematics admissions test (#16), as well as between many of the course grade variables. With each reduction in dimensionality, then, the major common factor solution is freed from the contaminating influence of minor group factors due to superficial overlap or redundancy in test content.
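The extended vectors plots discussed above can be approximated with a simple computation. The sketch below rests on one assumption: each test vector is described by its three projections on the normals to the bounding hyperplanes (the "orthogonal bounds" of the three factors plotted). Each vector is stretched until its coordinates sum to one, and the result is drawn in a triangle whose corners stand for the three factors. A test lying in a bounding hyperplane then falls on the edge opposite that factor's corner. A test whose projections nearly cancel cannot be located reliably, much as variable #27 had to be ignored in Figure 15. The triangle layout, the `min_sum` cutoff, and the function name are illustrative assumptions.

```python
import numpy as np

# Corners of an equilateral triangle, one per plotted factor (an arbitrary layout choice).
CORNERS = np.array([[0.0, 0.0],
                    [1.0, 0.0],
                    [0.5, np.sqrt(3.0) / 2.0]])

def extended_vector_points(coords, min_sum=0.05):
    """Map 3-D reference-axis coordinates to 2-D extended-vector plot positions.

    Rows whose coordinates sum to less than `min_sum` in absolute value are
    returned as NaN and flagged False, since stretching them is unstable.
    """
    coords = np.asarray(coords, dtype=float)
    sums = coords.sum(axis=1)
    ok = np.abs(sums) >= min_sum
    scaled = np.full_like(coords, np.nan)
    scaled[ok] = coords[ok] / sums[ok][:, None]   # stretch so coordinates sum to 1
    return scaled @ CORNERS, ok

points, ok = extended_vector_points([[0.8, 0.0, 0.0],      # pure on the first factor
                                     [0.0, 0.4, 0.4],      # lies in the first hyperplane
                                     [0.02, -0.01, 0.0]])  # too small to locate
```

Vectors with a negative coordinate land outside the triangle. This is how a poorly defined "hyperplane" shows up as test vectors spreading to either side of an edge.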
The cumulative effect of this paring away of excessive collinearity at the manifest level ultimately reveals that the third major factor in Table 9 and Figure 11 is not content-related at all. Instead, it accounts for the general speed or fluency-of-performance component in perceptual speed tests, number speed tests, mathematics or arithmetic reasoning tests, and achievement in academic courses as well. As such, this factor seems to have identified a very important and widely influential hypothetical determinant: motivation to perform rapidly and fluently when faced with a wide variety of specific tasks.

An Alternative to Second Order Factoring.

Our approach fits the invariant common factor model by excluding minor factor effects in collinearity-resistant fitting, and it may be thought of as a way of getting at what are usually regarded as "second order" common factors. Even so, it is extremely doubtful that the usual Thurstonian approach to second order factoring could arrive at the evidently rather invariant solution for Lord's data given in Table 9 and Figure 11. The general fluency or performance motivation factor (III) in Table 9 has a broad range of influence, and it presumably accounts for many of the interfactor correlations seen earlier in the five-factor solution of Table 7. Yet there is little hope of isolating and identifying such a second order factor by analyzing this 5 × 5 matrix. Given so few "first order" factors, analyzing their intercorrelations would most likely yield only a single higher order general factor. On that factor, the arithmetic reasoning first order "factor" would have the strongest saturation simply because it falls right in the middle of the configuration.† Such an outcome would be much less informative than that from collinearity-resistant fitting of the invariant common factor model followed by global direct geoplane transformation (i.e., Tables 8 and 9 as well as Figure 11). Evidence for a higher order general factor still persists in the form of interfactor correlations in Table 9. It will not generally be necessary, however, to analyze such matrices of intercorrelations among direct-geoplane-transformed collinearity-resistant factors, all of which show a high likelihood of factorial invariance. These factors should be located at the extreme lateral edges of the polyhedral convex cone of test vectors, so they will tend to show rather uniform and low levels of mutual intercorrelation. In other words, only a single higher order general factor is expected to appear. It can be regarded merely as evidence that the invariant common factors span a more-or-less homogeneous domain of variation. Hence, neither the "third order" general factor nor the "first order" minor residual factors that appear in our proposed approach to fully exploratory factor analysis can be regarded as being of much scientific interest. They merely reflect patterns of collinearity due to group overlap or homogeneity of content, and they depend heavily on which particular tests happen to have been included in the analysis.
Only at what might be thought of as the intermediate or "second order" level can primary factors be isolated at the intersections of test vector configuration bounding hyperplanes in such a way that their location is invariant with respect to changes in test battery composition. Such invariance could conceivably carry over, in a sense, to the "third order" general factor, since we might always come up with the same invariant common factors at the "second order" level within any given domain. If this phenomenon recurred in analyses of different test batteries from the domain of intellectual abilities, for instance, it would tend to validate Spearman's theory of general intelligence. Only three evidently invariant common factors were arrived at for Lord's data in Table 9 and Figure 11. Yet viewing that outcome as only the second order portion of what is implicitly a hierarchical solution suggests that we are actually dealing with a model of rather high dimensionality when it comes to accounting for the observed data. The collinearity-resistant fitting procedure has left clear evidence (in the Table 8 residuals) of four "first order" minor common factors of group overlap (i.e., verbal speed, spatial speed, arithmetic reasoning, and perceptual speed). It has also left evidence of several other minor "first order" doublet factors (number speed, languages, math/science, etc.). These "first order" factors have already been made to reflect only the residual, specific variance shared within their own clusters of tests. They therefore appear to be mutually orthogonal as well as orthogonal to the "second order" factors in Table 9. What we have routinely accomplished, then, is the very objective of the Schmid and Leiman [1957] procedure for casting an oblique first order solution into hierarchical form. Rather than relying on the decomposition of unstable estimates of correlation among a small number of lower order factors for this purpose, however, we have proceeded directly to a model for the preponderance of observed relationships among manifest variables that can be fitted in terms of "second order" coplanarity relationships. In the fitting process, "first order" minor factor collinearity effects are overlooked as residual local dependency outlier clusters. As a result, the directly obtained "second order" solution in Table 9 is resistant to the effects of changing test battery composition. It is therefore apt to show the extremely desirable property of factorial invariance.

† Analysis of the intercorrelation matrix among the nine first order factors arrived at by Lord [1956, p. 47] yields only two second order factors. On one of them are loaded the three factors corresponding to unspeeded tests (verbal, spatial, and mathematical). On the other are loaded the factors corresponding to clusters of speeded tests (verbal and spatial), along with the factors corresponding to the number speed and perceptual speed clusters.
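For comparison with the direct route described above, here is a minimal sketch of the Schmid and Leiman [1957] transformation in its simplest case. It assumes an oblique first order pattern whose factor correlations are accounted for exactly by a single second order factor. The general factor loadings are the first order pattern multiplied by the second order loadings. Each first order column is shrunk by the square root of that factor's uniqueness at the second order. All numbers are invented for illustration.

```python
import numpy as np

def schmid_leiman(first_order_pattern, second_order_loadings):
    """Schmid-Leiman orthogonalization with one second-order factor.

    Returns (general, group): loadings on the general factor (length p) and
    residualized group-factor loadings (p x m), mutually orthogonal.
    """
    P1 = np.asarray(first_order_pattern, dtype=float)
    b = np.asarray(second_order_loadings, dtype=float)
    general = P1 @ b                     # indirect loadings via the second-order factor
    group = P1 * np.sqrt(1.0 - b ** 2)   # scale column k by its second-order uniqueness
    return general, group

P1 = np.array([[0.7, 0.0],
               [0.6, 0.0],
               [0.0, 0.8],
               [0.0, 0.5]])
b = np.array([0.6, 0.5])
Phi1 = np.outer(b, b) + np.diag(1.0 - b ** 2)   # first-order correlations implied by b
general, group = schmid_leiman(P1, b)
```

The orthogonal hierarchical loadings reproduce the same common part as the oblique solution: `np.outer(general, general) + group @ group.T` equals `P1 @ Phi1 @ P1.T`.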
Such an outcome could not be obtained through the ordinary Thurstonian approach to higher order factoring. The second order factors obtained that way depend on which particular factors have emerged at the first order level, and those in turn depend on the exact composition of the original test battery. The proposed solution therefore directly accomplishes the objectives of second order factor analysis while remaining aloof from the idiosyncrasies of test battery composition as they are reflected in manifest clustering at the first order level. Thurstone's [1944] higher order factoring approach, on the other hand, focuses on accounting for relationships among "first order" factors per se, rather than on the broader relationships of coplanarity among original variables, for which the former are weakly diagnostic at best. Thurstone's approach ignores two kinds of information. It makes no use of the information about "second order" hyperplane location provided by test vectors that do not fall into any of the "first order" clusters (e.g., variables #1 and #30 in Figure 11). Nor does it use any information provided by within-cluster variation (e.g., among perceptual speed tests #25 through #27, and between language grade variables #28 and #29, in Figure 11). Representing each cluster of highly collinear tests as a single first order factor could conceivably reduce considerably the number of distinct parameter estimates that must be entertained in a higher order model for the data. However, we will see in Part V that, in contrast to the situation in Figure 11, the polyhedral convex cone of test vectors is often rather uniformly populated with test vectors. It is not occupied by a distinct set of compact clusters of highly collinear variables, as would be required for effective application of Thurstone's approach to higher order factoring.

Accommodating Observed Data of High Dimensionality.

In our proposed approach to exploratory factor analysis, deciding to extract and rotate fewer broad, general, and invariant common factors does not change the dimensionality of the entire solution as viewed from the Schmid-Leiman hierarchical perspective. Instead, reducing the number of potentially invariant common factors entertained simply downgrades one or more of the contenders for "second order" status to the role of a minor factor of group overlap or redundancy. We have settled on only three potentially invariant common factors for Lord's data in Table 9 and Figure 11. Even so, this solution carries with it all of the minor group factors implied by the residuals in Table 8. It also carries the higher order general factor implied by the fact that the factors in Table 9 are correlated with one another. This rather high-dimensional hierarchical decomposition of Lord's original correlations (above the diagonal in Table 8) should account for everything systematic beyond mere statistical sampling fluctuations. The reason is that the total number of factors being entertained (exclusive of unique factors, of course) is approximately the number (i.e., eleven) suggested by the likelihood ratio chi-square test for these data [cf. Jöreskog, 1967, p. 474].
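The likelihood ratio chi-square test mentioned above has [(p - m)^2 - (p + m)]/2 degrees of freedom for m common factors among p variables. The statistic is usually formed by applying Bartlett's multiplier to the minimized maximum likelihood discrepancy F_ML = ln|Sigma| - ln|S| + tr(S Sigma^-1) - p. The sketch below evaluates the degrees of freedom for a battery of Lord's size (p = 33) together with the largest number of factors for which they remain nonnegative. The function names are ours, and an actual test would require the fitted Sigma from a real MLE run.

```python
import math
import numpy as np

def lr_degrees_of_freedom(p, m):
    """Degrees of freedom of the LR test of m common factors among p variables."""
    return ((p - m) ** 2 - (p + m)) // 2      # the numerator is always even

def ledermann_bound(p):
    """Largest m for which the degrees of freedom are still nonnegative."""
    return math.floor((2 * p + 1 - math.sqrt(8 * p + 1)) / 2)

def ml_discrepancy(S, Sigma):
    """F_ML = ln|Sigma| - ln|S| + tr(S Sigma^-1) - p."""
    S = np.asarray(S, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma - logdet_s + np.trace(S @ np.linalg.inv(Sigma)) - p

def bartlett_chi_square(n, p, m, f_min):
    """Bartlett-corrected likelihood ratio statistic for sample size n."""
    return (n - 1 - (2 * p + 5) / 6.0 - 2 * m / 3.0) * f_min

for m in (3, 11):
    print(m, lr_degrees_of_freedom(33, m))   # 432 for m = 3, 220 for m = 11
print(ledermann_bound(33))                    # 25
```

Eleven factors among thirty-three variables still leave 220 degrees of freedom. That is one way to see why such a model is statistically fittable yet hard to interpret by simultaneous transformation.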
Attempting to identify and distinguish these various orders of factors through simultaneous transformation of an eleven-dimensional exploratory maximum likelihood solution would be fruitless, however, given only thirty-three manifest variables. Nor could we expect the conventional Thurstonian approach to higher order factoring to bring order into the chaos generated by entertaining eleven (ill-defined) first order factors, for reasons already given. The foregoing discussion shows what our fully exploratory approach to factor analysis does not assume: that a common factor solution of low dimensionality will account for all observed covariation among manifest variables aside from mere statistical sampling fluctuations. On the contrary, collinearity-resistant fitting is designed to fit only the broad, general relationships of latent coplanarity seen among manifest variables, using an invariant common factor model of low dimensionality.
In this sense, the proposed fitting procedure is in the spirit of Guttman's [cf. Lingoes and Guttman, 1967] quest for a reduced-rank approximation to observed data that are themselves known on a priori grounds to be of high dimensionality. Imperfect fit between the low-dimensional invariant common factor model and observed data need not be attributed solely to sampling error. Collinearity-resistant fitting also tolerates "errors of approximation" due to local dependency outlier effects.

Resolution of the Psychometric Vs. Statistical Conflict in Fitting the Factor Analysis Model

We have proposed a geometrical approach to distinguishing broad, general, and evidently invariant common factors from unstable minor factors of group overlap. It takes due account of both psychometric and statistical considerations but is not tied too strongly to statistical principles. Through a simple modification of collinearity-resistant fitting criterion (189), we can also arrive at a hybrid approach to extracting broad, general, and invariant common factors that fully exploits both psychometric and statistical principles. Recall from our earlier discussion that collinearity-resistant fitting and alpha factoring tend to tolerate the same pattern of local dependency outlier residuals, although the former tolerates these linkages to a higher degree. The same discussion showed a drawback of the implicit reciprocal uniqueness weighting used in maximum likelihood estimation (MLE). When the analysis is restricted to a space of low dimensionality, it leads to overfitting of exactly those manifest correlations that seem to reflect superficial local dependencies. In principle, however, the implicit MLE weighting strategy should be beneficial, because it emphasizes the variables that share the most variance in common with the remainder of the battery.
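The exact forms of criteria (189) and (191) are not given in this part of the text. What follows is therefore only a hypothetical sketch of the general kind of computation described: a weighted sum of squared off-diagonal residuals, minimized one loading at a time.

- A per-pair weight matrix `W` stands in for the distinguishability weights.
- An optional division by u_i^2 u_j^2 stands in for reciprocal uniqueness weighting.

Each off-diagonal residual is linear in any single loading a_ik, so the criterion is quadratic in that element and has a closed-form conditional minimizer. The weight forms, the function names, and the toy data are assumptions, not the book's definitions.

```python
import numpy as np

def effective_weights(W, u2=None):
    """Pairwise weights, optionally divided by u_i^2 * u_j^2 (reciprocal uniqueness weighting)."""
    W = np.asarray(W, dtype=float)
    if u2 is None:
        return W
    u2 = np.asarray(u2, dtype=float)
    return W / np.outer(u2, u2)

def weighted_residual_criterion(R, A, W):
    """Sum over pairs i < j of W_ij * (r_ij - sum_k a_ik * a_jk)^2."""
    A = np.asarray(A, dtype=float)
    resid = np.asarray(R, dtype=float) - A @ A.T
    iu = np.triu_indices_from(resid, k=1)          # off-diagonal pairs only
    return float(np.sum(W[iu] * resid[iu] ** 2))

def update_element(R, A, W, i, k):
    """Return a copy of A in which a_ik is replaced by its conditional minimizer."""
    R = np.asarray(R, dtype=float)
    A = np.array(A, dtype=float)                   # work on a copy
    others = np.array([j for j in range(A.shape[0]) if j != i])
    # Part of each residual r_ij (j != i) that does not involve a_ik.
    partial = R[i, others] - A[others] @ A[i] + A[i, k] * A[others, k]
    w = W[i, others]
    den = np.sum(w * A[others, k] ** 2)
    if den > 0:                                    # criterion is quadratic in a_ik
        A[i, k] = np.sum(w * A[others, k] * partial) / den
    return A

# Toy one-factor data (hypothetical loadings, not Lord's).
a_true = np.array([[0.8], [0.7], [0.6], [0.5]])
R = a_true @ a_true.T
np.fill_diagonal(R, 1.0)
u2 = 1.0 - (a_true ** 2).ravel()
W = effective_weights(np.ones((4, 4)), u2)

A = a_true.copy()
A[0, 0] = 0.3                                      # perturb one loading
before = weighted_residual_criterion(R, A, W)
A = update_element(R, A, W, 0, 0)
after = weighted_residual_criterion(R, A, W)
```

In this exact one-factor case, a single update restores the perturbed loading to 0.8 and drives the criterion essentially to zero. With real data, one would cycle over all elements until the criterion stops decreasing.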
What seems to cause the problem in MLE, then, is its failure to distinguish the variance that can be attributed to broad general and invariant factors common to the battery as a whole from minor factor variance due to superficial group overlap. Since the latter distinction is accomplished by collinearity-resistant fitting, would it not be possible to arrive at a hybrid approach in which implicit MLE weighting is applied in a collinearity-resistant way, such that emphasis is placed upon accounting accurately only for common variance attributable to broad general and invariant factors, while minor factor "common" variance is ignored along with unique factor variance? Consider the following criterion, which is simply a reciprocal-uniqueness-weighted version of collinearity-resistant fitting criterion (189):
The latter could also be viewed as a (conditionally) uniqueness-weighted version of criterion (175), for maximum likelihood estimation, in which pairwise distinguishability weights ${}_{t}w_{ij}$ (188) have been included so that residual linkages between highly collinear (hence, indistinguishable within common factor space) variables can be tolerated as legitimate errors of approximation rather than having to be fit within the limits of statistical sampling fluctuations. The distinguishability weights used in criterion (191) thus serve to weaken the strong m-dimensional factor analysis model to which the observed data are supposed to conform with high likelihood from a purely statistical perspective, but seldom in fact do. It follows from what led earlier to (181) and (190) that reciprocal-uniqueness-weighted collinearity-resistant fitting criterion (191) can be minimized by obtaining improved estimates of successive individual elements in an arbitrarily identified orthogonal factor matrix ${}_{t}A$; i.e.,
where the prior definitions of ${}_{t}a_{ik}$, ${}_{t}r_{ij}$, ${}_{t}w_{ij}$, and ${}_{t}u^2_{j}$ apply and minimization is conditional on these values.
In Table 15 we have presented the three-factor direct geoplane transformed outcome of collinearity-resistant fitting of Lord's data in conjunction with the conditional reciprocal uniqueness weighting implicit in MLE. It will be noted that the outcome hardly differs from the earlier collinearity-resistant solution presented in Table 9; hence, it differs quite markedly from the Table 11 maximum likelihood solution. Evidently the psychometric advantages of collinearity-resistant fitting are not lost when statistically motivated reciprocal uniqueness weighting is also incorporated into the solution. Notice that the final estimate obtained for the second communality is not as high as it was in Table 9 for the more straightforward approach to collinearity-resistant fitting. The hybrid solution is, if anything, slightly preferable in this respect. From a theoretical perspective the proposed hybrid approach to adaptive weighting should yield results that are highly invariant with respect to changes in test battery composition as well as the sampling of observations from a population. The unique variances that enter into criterion (191) reflect the combined influence of error, specific, and minor group factors, i.e., all but the broad general common factors that would be expected to remain invariant with respect to changes in test battery composition. Hence, being the complement of invariant common factor variance, each unique variance estimate can also be expected to remain invariant with respect to changes in test battery composition. The presence of battery-specific minor factor effects will therefore not disturb unique factor variance estimates resulting from collinearity-resistant fitting. Such minor factors are effectively ignored in resistant fitting, and their variance is thus attributed to the respective unique factors just as if no superficial overlap among the observed variables had occurred within the battery in question. Each unique factor variance arrived at should thus be characteristic of that particular variable within the entire domain sampled rather than within the particular battery analyzed. It will therefore serve as a good basis for reciprocal uniqueness weighting, in which the noncommon or "error" portions of observed variances are equalized for purposes of optimal statistical estimation.

Table 15. Three Direct Geoplane Transformed Reciprocal-Uniqueness-Weighted Collinearity-Resistant Factors of Lord's Data*

Variable                              I    II   III   Communality
01 Word Fluency                      23     1    30            17
02 Verbal (A)                        96    10     3            94
03 Vocabulary (L)                    85     4     5            73
04 Vocabulary (L)                    87    12     6            78
05 Vocabulary (M)                    87     2     0            76
06 Vocabulary (S)                    66     8    18            54
07 Vocabulary (S)                    74     2    21            64
08 Vocabulary (S)                    69     2    24            58
09 Spatial Relations (A)              3    60     6            39
10 Intersections (L)                  0    79     0            62
11 Intersections (L)                  5    81     3            63
12 Intersections (M)                  2    90     7            78
13 Intersections (S)                  2    77     3            60
14 Intersections (S)                  3    78     1            60
15 Intersections (S)                  6    79     7            65
16 Mathematics (A)                   29    19    40            38
17 Arithmetic Reasoning (L)          24    30    26            31
18 Arithmetic Reasoning (L)          21    28    32            32
19 Arithmetic Reasoning (M)          25    26    33            35
20 Arithmetic Reasoning (S)          21    21    50            45
21 Arithmetic Reasoning (S)          25    26    42            40
22 Arithmetic Reasoning (S)          25    14    50            43
23 Number Speed (R)
24 Number Speed (R)
25 Cancellation (R)
26 Picture Discrimination (R)
27 Number Checking (R)
28 English (G)
29 Foreign Language (G)
30 Eng's Draw & Des. Geom (G)         0    64    28            60
31 Chemistry (G)                     10    24    49            41
32 Mathematics (G)                    0    18    66            55
33 Conduct (G)                        9     2    16             3

Variables 23-29, entries in source order (row assignment not recoverable): 0, 26, 2, 0, 5, 39, 72, 46, 47, 21, 1, 15, 48, 30, 10, 20, 50, 22, 55, 66, 24, 3, 31, 45, 15, 0, 43, 22

Primary Factor Correlations
         I    II   III
I      100    15    18
II      15   100    32
III     18    32   100

* Decimal points omitted.

While on the topic of solution invariance or robustness against changes in test battery composition, recall our earlier observation that alpha factor analysis should serve as a good initial configuration from which to launch collinearity-resistant fitting. Given Nosal's [1977] observation that even the relatively straightforward minres method fails if it is not supplied with a good starting point, it would seem very important that reciprocal-uniqueness-weighted collinearity-resistant criterion (191) be supplied with good starting values. After much experimentation we have settled upon a global strategy in which reciprocal uniqueness weighting via criterion (191) is applied only as a final step in refining an initial set of collinearity-resistant factors. The latter are arrived at, in turn, from an initial set of alpha factors.
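The final refinement step can be sketched in Python. Because criterion (191) and its element-wise update are not reproduced in this extraction, the code below assumes a form suggested by the prose: a least-squares fit to the off-diagonal correlations in which each pairwise squared residual is multiplied by a distinguishability weight `W[i, j]` and divided by the product of the two uniquenesses `u2[i] * u2[j]`. The function names `criterion` and `weighted_sweep` are hypothetical; `W` is an arbitrary input standing in for the weights of (188), whose definition is not reproduced here; and `u2` is held fixed within each sweep, mirroring the conditional minimization described above. This is a sketch under those assumptions, not the author's algorithm.

```python
import numpy as np

def criterion(R, A, W, u2):
    """Assumed criterion: sum over pairs i < j of
    W[i, j] * (R[i, j] - A[i] @ A[j])**2 / (u2[i] * u2[j]).
    Diagonal elements of R (communalities) are not fitted."""
    resid = R - A @ A.T
    i, j = np.triu_indices_from(R, k=1)
    return float(np.sum(W[i, j] * resid[i, j] ** 2 / (u2[i] * u2[j])))

def weighted_sweep(R, A, W, u2):
    """One pass of element-by-element updates of the p x m orthogonal
    factor matrix A, holding W and u2 fixed (conditional minimization)."""
    A = A.copy()
    p, m = A.shape
    for i in range(p):
        others = np.arange(p) != i
        c = W[i, others] / (u2[i] * u2[others])   # pair weights for row i
        for k in range(m):
            a_k = A[others, k]
            # Residuals of row i with factor k's own contribution added back
            e = R[i, others] - A[others] @ A[i] + A[i, k] * a_k
            denom = np.sum(c * a_k ** 2)
            if denom > 0:
                # Closed-form minimizer of the quadratic in A[i, k]
                A[i, k] = np.sum(c * e * a_k) / denom
    return A

# Hypothetical toy data: two-factor correlations and uniform weights
rng = np.random.default_rng(0)
L = rng.uniform(0.2, 0.6, size=(6, 2))
R = L @ L.T
np.fill_diagonal(R, 1.0)
W = np.ones_like(R)                  # stand-in for the weights of (188)
A = L + 0.1 * rng.standard_normal(L.shape)
for _ in range(25):
    # Re-derive conditional uniquenesses between sweeps
    u2 = np.clip(1.0 - np.sum(A ** 2, axis=1), 0.05, None)
    A = weighted_sweep(R, A, W, u2)
print(criterion(R, A, W, u2))        # weighted residual criterion after refinement
```

Because every pair involving variable i is divided by `u2[i]`, that factor cancels within row i's own update, so it is the partners' uniquenesses that shape each row. Setting `W[i, j]` near zero for a highly collinear pair lets its residual stand, which is the tolerance for local dependency outliers that the weights are meant to provide.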
Under this staged strategy, an initial psychometrically motivated fit of the common factor model to observed data is first relaxed in the direction of accommodating local dependency outliers and then in the direction of excluding any variables or groups thereof which are found to share little in common with the battery as a whole. Further robustness against the biasing effects of superficial collinearities among observed variables has been found to accrue when the initial alpha factoring itself proceeds not from a principal axes solution but from a centroid solution such as that suggested by Horst [1965, pp. 118-125]. Although regarded as "old-fashioned" and inefficient by most modern factor analysts, the unit weighting used in centroid factor extraction renders the resultant solution less sensitive to the idiosyncrasies of test battery composition than the mathematically more sophisticated least squares outcome tends to be.
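Centroid extraction with unit weights can be sketched as follows. This is a generic Thurstone-style centroid routine written for illustration, not a reproduction of Horst's [1965] procedure; the function name `centroid_factors` and the simple sign-reflection rule are our own assumptions. It expects a reduced correlation matrix, i.e., one with communality estimates already placed on the diagonal.

```python
import numpy as np

def centroid_factors(R_reduced, n_factors):
    """Extract centroid factors using unit (+1/-1) weights.

    For each factor, variables are sign-reflected until every signed
    off-diagonal column sum is non-negative; the loadings are then
    R @ s / sqrt(s' R s), and the factor is removed from the residuals."""
    R = np.array(R_reduced, dtype=float)
    p = R.shape[0]
    A = np.zeros((p, n_factors))
    for k in range(n_factors):
        s = np.ones(p)
        while True:
            # Signed off-diagonal sum for each variable
            off = s * (R @ s - np.diag(R) * s)
            j = int(np.argmin(off))
            if off[j] >= -1e-12:
                break
            s[j] = -s[j]        # each such flip raises the total s' R s
        total = s @ R @ s
        if total <= 1e-12:
            break               # nothing substantial left to extract
        a = R @ s / np.sqrt(total)   # loadings in the original orientation
        A[:, k] = a
        R = R - np.outer(a, a)       # residual matrix for the next factor
    return A

a = np.array([0.8, -0.6, 0.7, 0.5])      # hypothetical loadings
R_reduced = np.outer(a, a)                # exact one-factor reduced matrix
print(centroid_factors(R_reduced, 1)[:, 0])   # recovers a (up to rounding)
```

Every variable enters the factor with a weight of plus or minus one, so no single group of nearly redundant tests can pull the axis toward itself through larger weights, which is the property the passage credits with reduced sensitivity to battery composition.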
PART V FULLY EXPLORATORY FACTOR ANALYSIS
Chapter 10. Historical Perspective, Practical Application, and Future Possibilities
A Synthesis of the Classical Common Factor Theories of Spearman and Thurstone via Multivariate Exploratory Data Analysis
In this day of routine factor analysis via packaged computer programs it is all too easy to lose sight of the alien turn-of-the-century climate into which the idea of common factor analysis was first introduced. Spearman's conjecture that
all branches of intellectual activity have in common one fundamental function (or group of functions), whereas the remaining or specific elements of the activity seem in every case to be wholly different from that in all the others [1904, p. 284]
represented a striking departure from contemporary thought. He argued, moreover, that it was simply methodological deficiencies and measurement error that had led earlier workers to find no correspondence between sensory discrimination abilities, as measured in the laboratory, and "the more complicated Intellectual Activities of practical life" [Spearman, 1904, p. 284]. Proving that a functional correspondence existed between sensory discrimination and, for example, the marks students received in school classes was seen by Spearman as essential for practical justification of experimental psychology, as well as for its further theoretical development. A well-trained experimental psychologist himself, Spearman found it unsettling that a very long list of supposed "mental faculties" had been developed by introspectionists on one hand while, on the other, "There [was] scarcely one positive conclusion concerning the correlation between mental tests and independent practical estimates that [had] not been with equal force flatly contradicted" [1904, p. 219]. To make matters even worse, faculty enthusiasts such as Binet "had long been busily measuring such faculties as 'imagination,' 'memory,' 'attention,' and the like" [Spearman, 1930, p. 324] in the absence of any quantitative theory, such as that of common factors, that would justify the construction of measurement scales. Despite the increasing popularity of intelligence scales at the time, the overwhelming scientific evidence of lack of association among different measures of intellective performance supported Thorndike's view [e.g., Aikens, Thorndike, and Hubbel, 1902] that "the mind possesses an infinite number of abilities all mutually independent" [Spearman, 1930, p. 325].
Spearman of course discovered that low manifest correlations among laboratory tests, as well as between the latter and practical measures of intellectual performance in daily life, could simply be the result of "attenuation", i.e., the spurious decrease in apparent degree of association between manifest variables which results when their measurement is subject to error. By taking two independent measurements of each variable, however, he was able to determine its degree of self-correlation or "reliability". With such indices of accuracy of measurement it became possible to estimate the "true" degree of association between different indicators of intellectual ability. Inspired by Galton's Human Faculty [cf. Galton, 1883], Spearman's initial aim was simply to "find out whether, as Galton had indicated, the abilities commonly taken to be 'intellectual' had any correlation either with each other or with sensory discrimination" [Spearman, 1930, p. 322]. On the basis of several small studies conducted in a village school, a preparatory school, and with adults, he concluded that, once allowances are made for attenuation due to errors of measurement,
there really exists a something that we may provisionally term 'General Sensory Discrimination' and similarly a 'General Intelligence,' and further that the functional correspondence between these two is not appreciably less than absolute [Spearman, 1904, p. 272].
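Spearman's correction for attenuation, the usual formal statement of the adjustment described here, divides the observed correlation by the geometric mean of the two reliabilities (each reliability being the self-correlation of a measure across two independent administrations). A minimal sketch follows; the function name `disattenuate` and the numbers are hypothetical and chosen only for illustration.

```python
import math

def disattenuate(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Estimated correlation between true scores, given the observed
    correlation r_xy and the reliabilities r_xx and r_yy."""
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical: observed r = .30 between two tests with reliabilities .60 and .50
print(round(disattenuate(0.30, 0.60, 0.50), 3))  # 0.548
```

A modest observed correlation can thus correspond to a much stronger association between the error-free quantities, which is the mechanism by which Spearman reinterpreted the low correlations reported by earlier workers.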
Having thus convinced himself that general sensory discrimination in the laboratory and general intelligence in the classroom were indistinguishable, apart from errors of measurement, Spearman set forth the following general theorem: "Whenever branches of intellectual activity are at all dissimilar, then their correlations with one another appear wholly due to their being all variously saturated with some common fundamental Function (or group of Functions)" [1904, p. 273]. He advocated further research designed to corroborate the theory of the universal unity of intellectual functioning and deduced that the truth of this theory would reveal itself in a "hierarchy of specific intelligences"; i.e., the entries within any table of correlations among manifest measures of intellectual ability would show a characteristic pattern of "equiproportionality". We now know that such proportionality obtains when the model of one common general factor fits the manifest correlations exactly and therefore communalities can be found such that the reduced correlation matrix is of unit rank. Spearman thus encountered the problem of fitting the factor analysis model for the first time, a topic that we have approached from the modern perspective of resisting local dependency outliers. Even in his first discussion of the topic, however, Spearman was aware that his postulated common factor of general intelligence could only be expected to account for relationships among intellectual activities that were somewhat distinct or "dissimilar" [1904, p. 273]. He repeatedly insisted that intercorrelations among mental tests could only be expected to admit of tabulation into a perfect "hierarchy" if those tests that were "very obviously like others in the same set" were eliminated from consideration [Spearman, 1920, p. 159]. This restriction was taken as evidence for the existence of multiple common factors by critics of the theory of general intelligence [e.g., Kelley, 1928; Thurstone, 1931], however, rather than as a mere symptom of overlap or local dependency among highly similar tests (e.g., among arithmetic tests such as addition, subtraction, multiplication, and division). Spearman acknowledged that excessive "overlap" between tests could lead to the appearance of so-called "group" factors but regarded the latter as of only minor interest in comparison to the general factor because "any element whatever in the specific factor of an ability will be turned into a group factor if this ability is included in the same set with some other ability which also contains this element" [Spearman, 1927, p. 82].
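Spearman's equiproportionality can be checked through tetrad differences such as r_ij r_kl − r_ik r_jl, which vanish for every quadruple of distinct variables when r_ij = a_i a_j (one common factor). The sketch below, using hypothetical loadings, builds such a correlation matrix, confirms that its tetrads vanish, and confirms that placing the communalities a_i² on the diagonal yields a reduced matrix of unit rank. The function name `tetrad_differences` is ours.

```python
from itertools import combinations
import numpy as np

def tetrad_differences(R):
    """All tetrad differences among off-diagonal correlations of R.

    For four distinct variables i < j < k < l the three tetrads are
    r_ij r_kl - r_ik r_jl, r_ij r_kl - r_il r_jk and r_ik r_jl - r_il r_jk.
    A single common factor implies that all of them are zero."""
    diffs = []
    for i, j, k, l in combinations(range(R.shape[0]), 4):
        diffs.append(R[i, j] * R[k, l] - R[i, k] * R[j, l])
        diffs.append(R[i, j] * R[k, l] - R[i, l] * R[j, k])
        diffs.append(R[i, k] * R[j, l] - R[i, l] * R[j, k])
    return np.array(diffs)

# Hypothetical general-factor loadings (illustrative, not from any study)
a = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
R = np.outer(a, a)            # off-diagonal r_ij = a_i * a_j
np.fill_diagonal(R, 1.0)      # observed correlation matrix has unit diagonal

print(np.max(np.abs(tetrad_differences(R))))   # essentially zero

# Replacing the unit diagonal with communalities a_i**2 gives the reduced matrix
R_reduced = R.copy()
np.fill_diagonal(R_reduced, a ** 2)
print(np.linalg.matrix_rank(R_reduced))        # 1
```

Adding a pair of near-duplicate tests to such a battery would inflate one correlation beyond a_i a_j and make the tetrads involving it nonzero, which is exactly the kind of overlap Spearman asked to be removed before testing for a perfect hierarchy.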
Although Spearman acknowledged that the model of one common general factor may not fit all aspects of manifest data if a battery contains variables that are so similar that they overlap in certain respects, he saw no scientific reason to entertain more than one broad general factor common to intellectual activity as a whole. By tolerating superficial linkages between manifest variables in the form of minor group factors, as in the bifactor method developed by the Spearman school [Holzinger and Swineford, 1937], the theory of general intelligence could be largely sustained despite idiosyncratic sampling of variables from the intellectual domain. It is not surprising that critics of Spearman's two-factor (i.e., general plus specific) theory regarded the departure from "purity" indicated by admission of group factors as evidence of failure of the theory:
By postulating a host of group factors or 'disturbers,' the two-factorist attempts to 'explain' the aberrancy of these data .... [but] the list of group factors or 'disturbers' which the two-factorist has had to put forth has grown to such a length that the theory threatens to become a multiple factor theory in its own right [Tryon, 1932b, p. 403].
As Tryon went on to point out, Spearman's admission of group factors meant that the only issue really separating two-factor theorists from multiple-factor theorists was that of whether or not the general factor should be retained. Tryon himself had little hope for any variety of factor analysis, however, being convinced from his exposure to research on genetics and learning that innumerable small and more or less independent factors must be entertained rather than a few large ones: "These factors on the hereditary side are genes, and on the environmental side, they are the innumerable conditionings or associations formed in the course of learning" [Tryon, 1932a, p. 328]. He therefore advocated the strategy of clustering manifest variables purely for descriptive purposes without attempting to make inferences about hypothetical determinants of the covariation summarized [cf. Tryon, 1939, 1958a, 1958b].
Thurstone's work on multiple factor analysis did not represent as much of a revolution in scientific thought on the matter as might be suspected, given the extent to which it now overshadows the earlier work of Spearman and his coworkers. In contrast, by demonstrating that mental functioning was essentially unitary, as opposed to being constituted of the traditional list of independent faculties, Spearman's work shook the very foundations of psychological theory. Moreover, Spearman's result provided a much-needed justification for experimental psychology as well as a great deal of hope for practical applications of psychological testing. Spearman's vulnerability ultimately came from his failure to entertain more than one broad general common factor underlying manifest variation within the domain of intellectual activity.
He readily admitted the occurrence of so-called overlap or group factors but justifiably de-emphasized the latter as being of little scientific importance because
they depend on what other abilities we choose to put into one and the same set; they therefore come and go at our will. Whereas the primary bisection into universal and non-universal factors remains inviolate; it is not dependent on any chance composition of a particular set of abilities, but instead marks the most fundamental feature of ability as a whole [Spearman, 1933, p. 600].
Spearman thus used the basic concept of "factorial invariance" to justify dismissing group factors as mere artifacts of variable sampling before the concept of invariance seems to have occurred to Thurstone [cf. Thurstone, 1935]. Indeed, Thurstone's early work on multiple factor analysis [1933], which advocated principal axes as a mathematically unique solution to the rotation problem, had to be retracted upon his discovery that a factorially invariant solution could be reached only through rotation away from the principal axes.
A strange turn of events took place once Thurstone finally undertook a major illustrative application of his technique of multiple factor analysis to data he had laboriously collected from the domain of intellect [cf. Thurstone, 1938]. Rather than arriving, as might have been expected in the interest of scientific parsimony, at a small number of broad, general, overdetermined, and very likely invariant common factors, Thurstone's combination of exhaustive centroid extraction followed by subjective graphical rotation to "simple structure" yielded a total of thirteen factors. Most of the latter closely reflected patterns of overlap built into the test battery at the outset through the initial selection of variables, a selection which reflected, in turn, the kinds of group factors found earlier by Burt [1917], Alexander [1935], and others, albeit uncited by Thurstone. Satisfaction of Thurstone's original mathematical criterion of simple structure alone does not imply factorial invariance, since that criterion can generally be satisfied in a number of different ways even with the same set of data, especially in the ambiguous case of predominantly independent clusters [cf. Tucker, 1955]. It is therefore difficult to understand Thurstone's enthusiasm for a solution involving thirteen group factors, regardless of how thoroughly they satisfied the criterion of simple structure, not to speak of his declaration that seven of these factors could be regarded as the "primary mental abilities." Upon immediately reanalyzing Thurstone's data, Spearman found the usual general factor plus a few obvious group factors due to overlapping content among tests [1939]. He then complained that, although the so-called "primary" factors found by Thurstone imply statistical support for the old doctrine of mental faculties, in reality they amount to "nothing more than those constituents of the tests with respect to which two and more of them happen to overlap" [1939, p. 15].
Such battery-specific group factors had long been set aside by Spearman and his coworkers in order to arrive at a more pure and uncontaminated (i.e., factorially invariant) assessment of the pervasive common factor of general intelligence. Now, however, the overdetermined, easily replicated, and highly informative general factor had been made to disappear completely through the use of "oblique reference axes" that referred to clearly unstable group factors! Thurstone's technique of multiple factor analysis lent credibility to group factors and thus constituted an apparent return to the discredited theory of mental faculties. Spearman, who had expended great efforts developing common factor analysis precisely as a means of halting the fruitless enumeration of mental faculties, could hardly be expected to take the outcome of Thurstone's new approach seriously. As events would have it, however, the first major application to psychological data of the computationally unwieldy approach to multiple factor analysis developed by Thurstone stood as the prime example of how factor analytic practice should be undertaken for an entire generation of American psychologists. The results presented by Thurstone [1938] in his study of primary mental abilities (PMA) quickly became a standard against which alternative methods of factor analysis (especially rotation) could be tested [cf. Holzinger and Harman, 1938; Eysenck, 1939; Wrigley, Saunders, and Neuhaus, 1958; Kaiser, 1960]. With the identification of seven to thirteen distinct clusters of highly similar or overlapping tests, such reanalyses were invariably held to be in substantial agreement with the original results of Thurstone, even though a strong general factor appeared in all orthogonal reanalyses except for that of Kaiser. Spearman, however, saw reason to criticize the group factors found by Thurstone from the outset because "group factors, far from constituting a small number of sharply cut 'primary' abilities, are endless in number, indefinitely varying in scope, and even instable in existence. Any constituent of ability can become a group factor. Any can cease being so" [1939, p. 15]. Nevertheless, subsequent researchers have taken solace in the fact that their more modern techniques for accomplishing multiple factor analysis tend to yield results that are comparable to those initially obtained by Thurstone. Moreover, mere discovery of the sort of independent clusters of collinear variables found by Thurstone in his PMA study has now come to be seen as the proper scientific objective of common factor analysis by many researchers. What may have guaranteed perpetuation of the general approach to multiple factor analysis illustrated by Thurstone in his supposed discovery of the primary mental abilities, aside from its status as Psychometric Monograph No. 1, is the claim that the resulting solution is invariant with respect to changes in test battery composition.
Since the supposed invariance of Thurstone's primary mental ability factors has been repeatedly disproved through the generation of all manner of group factors in subsequent investigations of the intelligence domain, however, it is more likely that Thurstone's approach to multiple factor analysis overshadowed earlier work because it was presented in a convenient mathematical form, that of matrix algebra. The latter certainly led to ease of communication, elaboration, and implementation of Thurstone's ideas. Hence, regardless of how destructive the unbridled imitation of Thurstone's exemplary application of his technique of multiple factor analysis to tests of mental ability might have been to psychological theory per se, it provided a great stimulus to the emerging field of psychometrics.
From our perspective, the most tragic aspect of the conflict between Spearman and Thurstone over the existence of one general factor vs. many group factors is that an effective compromise position was never worked out. The bifactor solution, which grew out of the Spearman school, did admit group factors, but only as long as they accompanied just one broad general factor. As such, it is not really a multiple factor solution but only a general factor solution in which group factors happen to receive some recognition because they cannot be completely ignored. In Thurstone's method, on the other hand, one or more general factors could theoretically be tolerated in the form of intercorrelations among group factors. But when an attempt was made to recover such broad general factors through "second-order" analysis [cf. Thurstone, 1944; 1947, pp. 411-439] it had to be based upon a severely limited number of highly suspect relationships among the original, rather unstable, group factors. A more plausible Spearman-Thurstone compromise solution is that in which a small number of broad, general, and overdetermined common factors are directly entertained in the first place, while as many small residual (group or doublet) factors are tolerated as might be required to fit the data. The latter are presumably due largely to superficial overlap or redundancy between pairs or among groups of manifest variables, while the former presumably correspond to hypothetical determinants that are of scientific interest because they are invariant with respect to sampling of variables. Hence, Spearman's idea that group factors due to overlap among variables must be tolerated, but cannot be allowed to detract from the identification of the general factor(s), can profitably be combined, through collinearity-resistant fitting, with Thurstone's early technology for invariant isolation of hypothetical determinants of manifest covariation through bounding hyperplane simple structure transformation. It should thus be possible to identify more than one general common factor, if such are active within any given domain of variation, without getting distracted by the group factors abounding in most analyses. In this way the goals of second-order factor analysis can be accomplished directly without having to rely on prior determination of group factors.
The latter are of only secondary interest because they are highly battery-specific, unstable (if even identifiable), likely to be artifactual, and essentially innumerable.
It remains to be seen what impact this proposed synthesis of classical approaches to common factor analysis might have upon the ancient doctrine of mental faculties. Although faculty psychology was put to rest at the turn of the century through Spearman's supposed discovery of the one common factor of general intelligence, it was resurrected a few decades later through Thurstone's popular approach to multiple factor analysis. Currently, though, neither the concept of general intelligence nor that of primary mental abilities is taken seriously by many psychologists. Nevertheless, the basic steps carried out by Thurstone in arriving at his group factors are still slavishly imitated every time a cluster-oriented factor analysis is undertaken. It seems that the skeptic Tryon has had the last word after all, since factor analysis is no longer seen as a means of identifying theoretically interesting hypothetical determinants of manifest covariation, but merely as a way of describing and summarizing that association in the form of cluster dimensions [cf. Overall, 1963; Nunnally, 1967]. The extremely popular cluster-oriented varimax method for isolating group factors was developed by Kaiser, a student of Tryon [cf. Kaiser, 1958, footnote on p. 187].
In the historical effort to determine where Thurstone's ideas about common factor analysis might have gone wrong, it is instructive to examine his 1933 APA presidential address, in which a multiple factor alternative to Spearman's single common factor model was strongly advocated and thoroughly illustrated before a wide audience of potential users [Thurstone, 1934]. At the time of this address Thurstone seemed quite uncertain about just how multiple factor analysis might best be accomplished. He expressed the opinion that hypothetical causal factors would be more likely to be found near the outer edges or "fringes" of the typical cone-shaped configuration of test vectors than near its center [1934, pp. 30-31]. Since this cone of test vectors could be thought of as fanning out about a central axis, however, he felt that the latter could still be entertained as the conceptual equivalent of Spearman's central intellective function. Thurstone's early insight along these lines eventually gave rise to the "simple structure" strategy of isolating primary factors at the intersections of hyperplanes that closely and effectively bound the entire test vector configuration [1935, 1936]. However, he continued to harbor another and quite incompatible perspective on the task of multiple factor analysis that had also found its way into the APA presidential address.
After reasoning that independent causal determinants of manifest covariation might best be identified by studying the outer "fringes" of any test vector configuration, he went on to suggest that "it is best to assemble the test batteries in such a way that there are several similar tests of each kind that are to be investigated" [1934, p. 32]. He recognized that including highly similar tests in the same battery "is in a sense the opposite of the precautions which have been current in factor studies where investigators have been careful to avoid similar tests," but nevertheless argued that it is "illuminating to investigate the nature of the specific variance of each test by including several similar tests of each kind in the test battery" [1934, p. 32]. From the outset, then, Thurstone felt compelled to account for the specific variance of every test through the expedient of introducing minor factors of group overlap. His subsequent use of this strategy in the construction of the primary mental abilities battery virtually guaranteed the production of such strong clustering that any evidence for the existence of truly invariant common factors at the outer edges of the test vector cone was completely obscured. Moreover, this extremely unsound strategy for test battery construction seems eventually to have led Thurstone to mistake independent cluster structure for evidence of factorial purity of the tests involved [cf. Thurstone, 1947]. Independent clustering implies the absence of factorial complexity, but when complexity is avoided in the quest for clear-cut group factors it becomes impossible to tell where the outer "fringes" (i.e., lateral edges) of the latent test vector cone are actually located. Hence, Thurstone's deliberate efforts to produce group factors ruled out the possibility of his ever detecting truly invariant common factors, and the group factors that he and subsequent users of cluster-oriented multiple factor analysis have since found appear to be just as "endless in number, indefinitely varying in scope, and even instable in existence" as Spearman [1939, p. 15] surmised they would be! Thurstone's efforts to account for all reliable variance in his data were well-intentioned, but they made it impossible to distinguish the effects of invariant major common factors from the effects of unstable minor factors of group overlap.
Thurstone actually proposed multiple factor analysis because, from the moment it was introduced, Spearman's [1904] theory of general intelligence had been criticized for its failure to account adequately for the actual patterns of intercorrelation found among tests of intellectual ability. Thurstone felt that Spearman had taken on an unnecessary burden in attempting to confirm that his single common factor model would account adequately (i.e., within the range of expected sampling fluctuations) for all relationships seen among measures of intellectual ability [cf. Thurstone, 1935]. He consequently proposed a more general mathematical model that held the important promise of identifying invariant common factors at the outer edges of the test vector configuration, and then promptly fell right back into Spearman's error of attempting to make that model account for everything of interest going on within any given domain of variation aside from mere statistical sampling fluctuations.
From the perspective of factorial invariance we can now see that local dependency outliers in the form of excessive collinearities between doublet pairs or among overlapping groups of manifest variables must be accommodated even when a multiple common factor model is being fit to real data. In this way, a Spearman-Thurstone compromise can finally be worked out within the modern context of exploratory data analysis.
A Second Look at the Primary Mental Abilities
The crucial historical role played by Thurstone's [1938] presentation of his approach to multiple factor analysis as a means of discovering "primary mental abilities" has been mentioned at several points in this volume. It would therefore be appropriate to discuss our reanalysis of his PMA data as
an example of fully exploratory factor analysis within the domain of intellectual performance. Unfortunately, the PMA study is not a good candidate for reanalysis because "it was decided to reduce the computational labor by using tetrachoric correlation coefficients instead of product-moment coefficients" [1938, p. 58]. Moreover, "the computation of tetrachoric coefficients was made by means of facilitating tables" [1938, p. 59], so there is apt to be a good deal of inaccuracy in the nearly 1600 coefficients Thurstone obtained in this way. Thurstone's other large factorial study of intelligence [Thurstone and Thurstone, 1941] was methodologically sounder than the initial PMA study, not only because product-moment correlations were used but also because the sample size was increased to 710. These subjects were fourteen-year-old children, whereas the original PMA sample consisted of 240 university student volunteers ranging from 16 to over 25 years of age. Sixty tests were included in the factorial study of intelligence (FSI) battery, along with age, sex, and mental age; the original PMA battery included 57 tests. Thurstone and Thurstone state that their analysis of the sixty-variable FSI battery "revealed essentially the same set of primary factors which had been found in previous factorial studies" [1941, p. 27]. Six of the PMA factors (Verbal Comprehension, Word Fluency, Number, Space, Rote Memory, and Induction) were judged to be highly stable among the age levels studied, while two (Perception and Deduction) were less stable or failed to replicate. Our discussion will therefore concentrate upon the results of reanalyzing the methodologically sounder FSI study of Thurstone and Thurstone. Our reanalysis of Thurstone's original PMA data will be mentioned only briefly, to suggest consistency of results across the studies.
Although not essential, it would be helpful for the reader to have the Thurstone and Thurstone monograph [1941] available for comparison of their results with those to be discussed here. The monograph contains a complete description of each of the sixty tests administered; we can give no more than a brief mention of the contents of some of the more factorially "interesting" tests. In summarizing their ten-factor solution for sixty variables, Thurstone and Thurstone [1941, Table 5, pp. 18-19] grouped tests to emphasize the high degree of independent clustering evident in the analysis. The clear-cut cluster structure obtained in this attempt to isolate primary mental abilities at the fourteen-year age level seemed to indicate successful "purification" of the factors identified earlier using adult subjects. Primary factors in the original PMA solution had not been found to be nearly as "clearly defined" and easy to interpret as they proved to be in the refined factorial study of intelligence battery. Of course, we would argue that it is not collinearity as reflected in independent clustering but coplanarity as reflected by the definition of
bounding hyperplanes that implies factorial invariance, so Thurstone's successful efforts to purify his test battery actually tended to limit its usefulness in the task of identifying hypothetical determinants of real scientific interest. Although initially we took seriously Thurstone's conclusion that there are six or seven well-defined and highly replicable primary mental abilities, reanalysis of his PMA and FSI batteries with this number of factors showed evidence only of independent clusters that were not located at the intersections of highly populated and well-defined hyperplanes bounding a cone-shaped test vector configuration. We found, in fact, that no more than three evidently invariant common factors are required to account for the coplanarity relationships extant in the FSI battery, while there is some doubt that even this many broad, general, and evidently invariant hypothetical determinants can be supported in reanalysis of the PMA battery. We will therefore proceed directly to a discussion of the three-factor FSI solution that was finally accepted and will consider the four-factor outcome only to justify our rejection of all higher-dimensional solutions. Table 16 contains the direct-geoplane-transformed primary factor pattern weights and intercorrelations among three reciprocal-uniqueness-weighted collinearity-resistant factors of the Thurstone and Thurstone FSI battery, along with the associated communalities. Figure 19a presents the extended vectors plot of the Table 16 solution for the FSI data. A glance at Figure 19a reveals that the test vectors seem to fall, as expected, into a trihedral convex cone-shaped configuration with only a few exceptions. Moreover, the primary factors appear to have been placed near the lateral edges of the test vector cone at the intersections of hyperplanes that tend to bound the outer faces of that cone.
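Because the three primary factors in Table 16 are correlated, a communality is not simply the sum of squared pattern weights; under the standard oblique common factor model it equals h^2 = p' Phi p, where p is a test's row of pattern weights and Phi is the matrix of factor correlations. The short sketch below checks this against two rows of Table 16, taking the printed weights at face value; the function and variable names are hypothetical, chosen for illustration only.

```python
# Communality under an oblique solution: h^2 = p' Phi p.
# Values are from Table 16 with decimal points restored.

PHI = [  # Table 16 factor correlations among I, II, III
    [1.00, 0.02, 0.17],
    [0.02, 1.00, 0.17],
    [0.17, 0.17, 1.00],
]


def communality(p, phi):
    """Return p' Phi p for pattern weights p and factor correlations phi."""
    k = len(p)
    return sum(p[i] * phi[i][j] * p[j] for i in range(k) for j in range(k))


ABC = [0.25, 0.29, 0.27]         # test #1, tabled communality .27
COMPLETION = [0.78, 0.16, 0.12]  # test #10, tabled communality .69

print(round(communality(ABC, PHI), 2))         # 0.27
print(round(communality(COMPLETION, PHI), 2))  # 0.69
```

The cross-product terms contribute about .05 to the ABC communality here, which is why ignoring the factor correlations would understate it.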
Notice that the central region of the test vector cone is highly populated and that there is little to suggest the existence of independent clusters of test vectors except, perhaps, near the midsection of the hyperplane common to the first two primary factor axes (FX and FY in Figure 19a). Certainly there is little evidence that the primary factor axes have themselves been passed through the centroids of isolated clusters of test vectors. (The outcome of direct geoplane II transformation is given in Figure 19b to show the placement of hyperplanes prior to relaxation of the options initially used to encourage equality of factor sizes and mutual orthogonality. Some may prefer this solution, in which the correlations of Factor I with the others are .09 and .12, respectively. In truth, visual inspection of extended vectors plots in computer-aided graphical rotation could probably improve upon the outcome of any purely analytic transformation method, but different investigators would seldom arrive at the same result.) Variables #45 through #48 (all reading test scores) prove to be highly
Table 16. Three Direct Geoplane-Transformed Reciprocal Uniqueness-Weighted Collinearity-Resistant Factors of the Thurstone and Thurstone [1941] FSI Battery
No.  Label                    I     II    III   Communality
1    ABC                      25*   29    27    27
2    Absurdities              46    4     22    29
3    Addition                 23    42    0     23
4    Anagrams                 29    38    11    27
5    Arithmetic               50    2     27    37
6    Association              42    42    3     36
7    Backward Writing         20    50    32    46
8    Cards                    7     6     61    38
9    Classification           21    11    39    26
10   Completion               78    16    12    69
11   Digit Span               23    18    8     11
12   Directions               61    26    14    51
13   Disarranged Sentences    53    35    7     44
14   Dot Counting I           1     48    6     25
15   Dot Counting II          11    45    32    35
16   Dot Counting III         6     49    10    27
17   Dot Patterns             3     47    41    45
18   Faces                    8     36    47    44
19   Figure Grouping          4     20    40    23
20   Figure Naming            9     49    6     26
21   Figure Recognition       17    5     19    8
22   Figures                  17    12    51    31
23   First and Last Letters   28    30    9     20
24   First Letters            31    43    0     29
25   First Names              38    36    12    26
26   Flags                    17    4     56    37
27   Four-Letter Words        40    25    5     24
28   Geometrical Forms        9     9     61    39
29   High Number              14    19    54    41
30   Identical Numbers        3     65    16    49
31   Identical Pictures       1     26    46    33
32   Incomplete Words         33    59    4     45
33   Letter Grouping          27    38    19    30
34   Letter Series            39    19    31    35
35   Mazes I                  6     13    46    25
36   Mazes II                 6     1     55    29
37   Multiplication           23    62    8     43
38   Number Patterns          3     34    30    24
39   Paragraph Recall         61    5     10    41
40   Pedigrees                50    9     27    39
41   Picture Naming           10    60    1     37
42   Prefixes                 27    27    1     15
43   Proverbs                 59    11    19    42
44   Pursuit                  6     18    50    31
45   Reading: Voc.            79    4     4     63
46   Reading: Sen.            75    0     7     58
47   Reading: Par. I          73    0     2     53
48   Reading: Par. II         60    2     0     35
49   Reasoning                38    3     22    22
50   Rhyming Words            48    34    23    35
51   Same or Opposite         71    11    3     53
52   Scattered X's            22    24    21    15
53   Secret Writing           21    16    47    35
54   Suffixes                 36    34    5     23
55   Synonyms                 29    21    3     13
56   Three-Higher             32    31    34    39
57   Verbal Enumeration       53    50    4     55
58   Word Checking            53    23    16    41
59   Word-Number Recall       6     20    8     05
60   Word Puzzles             37    51    4     42
61   Age                      42    13    2     19
62   Sex                      6     46    29    26
63   Mental Age               57    22    29    54
FACTOR CORRELATIONS
       I     II    III
I      100*  2     17
II     2     100   17
III    17    17    100
* Decimal points omitted
Figure 19a. Extended Vectors Plot of the Table 16 FSI Solution
pure indicators of Factor I in Table 16 and Figure 19 (i.e., 19a or 19b), but other verbal ability measures obtained by different methods load almost exclusively on the first factor as well; e.g., #51 (detecting synonyms or antonyms), #39 (paragraph completion via recall), #43 (identifying proverbs with similar meanings), etc. From this it can be gathered that the first factor corresponds to the widely recognized verbal comprehension factor even before determining which tests fall into the associated hyperplane. Nevertheless, it is important to ascertain that none of the tests with low loadings on the first factor involve a verbal component. With the possible exception of #20 (figure naming), #41 (picture naming), and #59 (word-number recall), the lack of verbal involvement in those tests populating the Factor I hyperplane is quite evident simply from the test titles. Moreover, even the reference to naming or word use in the three tests just mentioned is found to be deceptive upon examination of their descriptions in the FSI monograph. The figure "naming" test merely requires that the letter t, r, c, or s be written
Figure 19b. Geoplane II Extended Vectors Plot Corresponding to the Geoplane I Plot in Figure 19a
under each figure encountered to indicate whether it is a triangle, rectangle, circle, or star; picture "naming" is just a more general version of the same task. In word-number recall, numbers are assigned to various objects and the task is to recall the correct number when each object is named later; the test has a very low communality, which accounts for its "noisy" projection well outside the boundaries of the test vector configuration in Figure 19. Those tests that are rather pure indicators of Factor II in Table 16 and Figure 19 seem to require a good deal of motivation to perform rapidly and fluently and amount to rather simple, repetitive tasks. Items in tests #14 and #16 (the most straightforward dot counting tests included in the battery) are simply coded versions of elementary addition and could thus be regarded as measures of number speed analogous to those included in Lord's battery. Likewise, the identical numbers task (#30) is comparable to the cancellation task used by Lord as a measure of perceptual speed. Together with picture
naming (#41), which has already been described as a simple task in which pictures are identified by giving the first letter of their name (e.g., h for a house), and figure naming (#20), the foregoing measures suggest that Factor II relates to "general speed" or motivation to perform rapidly and fluently given simple but tedious perceptual tasks. Notice also that the tests that fall into the hyperplane normal to the second factor and that have high communality all involve the manipulation of rather meaningful spatial or verbal material to solve problems and/or answer questions accurately. Aside from the reading comprehension tests mentioned in connection with our discussion of Factor I, the hyperplane in question includes tests #2 (absurdities, which involves deciding whether or not sentences are "foolish" or make good sense), #5 (arithmetic, in which the correct answer to a multiple-choice arithmetic problem must be indicated), #49 (reasoning, which is made up of logical syllogisms), #26 (flags, in which representations of flags must be compared after imaginary rotation), #28 (geometrical forms, in which a region that is "inside the two solid-line figures and outside the two dotted-line figures" must be indicated for each set of four intersecting figures), and several other rather demanding tests of spatial ability that will be discussed shortly. In all of these members of the Factor II hyperplane, emphasis is placed upon the accuracy with which a fairly complex and meaningful task is accomplished rather than upon the speed or fluency with which a simple perceptual discrimination or numerical operation is performed.
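Deciding which tests "fall into the hyperplane" of a factor comes down, in practice, to choosing a band of near-zero weights and ignoring tests too unreliable to locate. The sketch below applies such a rule to a handful of Table 16 rows for the Factor II hyperplane just described. The band of plus or minus .10 and the communality floor of .20 are illustrative assumptions, not criteria stated in the text, and the names are hypothetical.

```python
from typing import NamedTuple


class FsiTest(NamedTuple):
    no: int
    label: str
    weights: tuple  # pattern weights on factors I, II, III (hundredths)
    h2: int         # communality (hundredths)


# A few Table 16 rows, decimal points omitted as in the table.
ROWS = [
    FsiTest(2, "Absurdities", (46, 4, 22), 29),
    FsiTest(5, "Arithmetic", (50, 2, 27), 37),
    FsiTest(8, "Cards", (7, 6, 61), 38),
    FsiTest(14, "Dot Counting I", (1, 48, 6), 25),
    FsiTest(21, "Figure Recognition", (17, 5, 19), 8),
    FsiTest(26, "Flags", (17, 4, 56), 37),
    FsiTest(30, "Identical Numbers", (3, 65, 16), 49),
    FsiTest(45, "Reading: Voc.", (79, 4, 4), 63),
    FsiTest(49, "Reasoning", (38, 3, 22), 22),
    FsiTest(57, "Verbal Enumeration", (53, 50, 4), 55),
]


def hyperplane_members(rows, factor, band=10, min_h2=20):
    """Return test numbers in the hyperplane of `factor` (0 = I, 1 = II, 2 = III).

    A test is a member when its weight on that factor lies within +/- band and
    its communality is at least min_h2 (both cut-offs are assumptions).
    """
    return [r.no for r in rows if abs(r.weights[factor]) <= band and r.h2 >= min_h2]


print(hyperplane_members(ROWS, factor=1))  # [2, 5, 8, 26, 45, 49]
```

With these cut-offs the rule picks out the accuracy-oriented tests named above, while the low-communality figure recognition test (#21) is screened out even though its Factor II weight is small.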
By now the reader is undoubtedly aware that the third factor identified in our analysis of the Thurstone and Thurstone FSI data relates to spatial ability and that we have arrived at a final set of factors that compares quite closely to that obtained in the analysis of Lord's data. The results in this case are more convincing, however, since the Thurstones set out quite deliberately to sample from the full range of variables that could be regarded as representative of intellectual performance. Evidently the results we have seen so far do reflect the action of hypothetical determinants that can be expected to remain invariant with respect to wide changes in test battery composition. Of course, Thurstone and Thurstone, as well as workers both before and after them, successfully identified Verbal, Spatial, and Fluency factors, so it might well be asked what makes these results noteworthy. The important thing about our analysis is that it reveals little evidence that any more than these three broad, general factors can be expected to remain invariant with respect to changes in test battery composition. We will return to the issue of dimensionality shortly. First, however, it is necessary to complete the account of the third factor identified in Table 16 and Figure 19 by contrasting those variables lying clearly in its hyperplane with those well out of it.
Table 17. Linkages Among Word-Fluency Tests in the FSI Battery: Three-Factor Collinearity-Resistant Residuals Below Diagonal, Original Correlations Above
       #23   #24   #27   #42   #50   #54   #55   #60   #4    #6
#23    -     52*   51    46    47    48    39    46    48    42
#24    29    -     47    51    56    53    49    57    49    54
#27    31    23    -     41    47    43    32    45    48    41
#42    29    30    23    -     45    55    45    39    37    42
#50    21    26    19    22    -     49    46    52    42    48
#54    27    28    20    37    21    -     40    51    37    40
#55    24    31    15    31    25    23    -     31    28    45
#60    17    22    15    14    15    20    09    -     53    46
#4     24    22    25    18    13    13    10    20    -     40
#6     16    22    13    19    14    11    24    8     10    -
* Decimal points omitted
The variables that appear to be rather pure indicators of Factor III in Table 16 and Figure 19 are #36 (mazes II, comprising rather difficult maze problems), #8 (cards, in which figures are to be visualized sliding around on the page to determine whether they fit a template), #28 (geometrical forms, described above), and #35 (mazes I, simple maze problems). Clearly these are spatial tests, several of which demand real-time simulated rotation of figures to determine whether or not they match. The hyperplane containing tests that appear to be unrelated to the space factor is interesting in that it contains, in addition to the rather pure indicators of verbal ability and speed or fluency of performance mentioned above, a rather strong concentration of factorially complex tests midway between the verbal and speed factors. As it happens, this concentration midway along the I-II (or FX-FY) arc in Figure 19 actually corresponds to a compact cluster of tests that share excess collinearity even beyond the high degree of parallelism in their relationships to the three broad, general, and quite likely invariant common factors under discussion in this analysis. An inspection of residual linkages left among these tests by reciprocal-uniqueness-weighted collinearity-resistant fitting reveals the Table 17 matrix of outliers that must be attributed to a minor factor of group overlap or redundancy. The substantial residual linkages seen in Table 17 are left among ten FSI variables that do, indeed, seem to reflect the joint action of a verbal ability factor and a speed or fluency factor: #23 (writing as many words as
possible with a specified first and last letter), #24 (writing as many words as possible with a specified first letter), #27 (writing as many four-letter words beginning with a given letter as possible), #42 (writing as many words as possible with a given prefix), #50 (writing several words that rhyme with a given word), #54 (writing as many words as possible with a given suffix), #55 (writing several synonyms of a given word), #60 (rearranging letters to spell words from a specified class; e.g., animals: ebar, odg, atc), #4 (anagrams, in which as many smaller words as possible are made up from the letters of a larger original), and #6 (writing as many words as possible that name things to eat or drink). The strong residual linkages seen in Table 17, the high degree of manifest similarity among the tasks involved, and the fact that they all project midway along the hyperplanar arc connecting the verbal and speed factors in Figure 19 imply that we are dealing here with a minor factor of group overlap, such as redundancy in format or content, rather than a broad, general, and invariant common factor. Thurstone and Thurstone admit that "to determine whether our original interpretation of the word-fluency factor W could be sustained . . . we devised a number of new tests which were thought to be well saturated with this factor" and then go on to conclude that "this factor is well supported by our findings" [1941, p. 2]. Quite the contrary conclusion follows from our results, however, unless one wishes to get excited about the fact that deliberate inclusion of many highly overlapping tests in the same battery produces a minor factor of group overlap. The latter factor clearly fails to fall at the lateral edges of the cone-shaped test vector configuration where bounding hyperplanes intersect, so it cannot be expected to remain invariant with respect to changes in test battery composition, nor can it be taken to indicate factorial purity of the tests involved.
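The idea of a test projecting "midway along the arc" between two factors can be illustrated without Figure 19 by radially projecting each test's Table 16 pattern weights onto the plane on which they sum to one, in the spirit of Thurstone's extended vectors. This is an assumption for illustration only: the text does not state the exact projection used to draw Figure 19, and the function names below are hypothetical.

```python
import math


def extended_vector(weights):
    """Radially project pattern weights onto the plane where they sum to one.

    Returns the shares (s_I, s_II, s_III); with all weights positive the point
    falls inside the triangle whose corners are the three primary factors.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum to be projected")
    return tuple(w / total for w in weights)


def to_plane(shares):
    """Map shares to 2-D: I at (0, 0), II at (1, 0), III at (0.5, sqrt(3)/2)."""
    _, s2, s3 = shares
    return (s2 + 0.5 * s3, s3 * math.sqrt(3) / 2)


# Table 16 weights (hundredths) for a few tests discussed in the text.
for label, w in [
    ("#23 First and Last Letters", (28, 30, 9)),
    ("#50 Rhyming Words", (48, 34, 23)),
    ("#45 Reading: Voc.", (79, 4, 4)),
    ("#36 Mazes II", (6, 1, 55)),
]:
    shares = extended_vector(w)
    x, y = to_plane(shares)
    print(f"{label:28s} shares={tuple(round(s, 2) for s in shares)} xy=({x:.2f}, {y:.2f})")
```

Word fluency test #23 lands almost exactly halfway between the I and II corners, whereas #45 and #36 sit close to the I and III corners. Because the weights are divided by their sum, any error in the weights of a test with a small sum (a low-communality test such as #59) is magnified in its projected position.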
Such a minor group factor could be produced at any point in the test vector cone simply by clustered sampling of many highly redundant test vectors at that point. Note that some FSI tests project into the same part of the Factor III hyperplane as do members of the cluster of overlapping tests just mentioned but do not share local dependency outlier residual linkages with any of the latter. For example, test #57 (finding and marking words in a list that are instances of a specified class of words) and test #13 (making sentences of disarranged words) are clearly measures of verbal fluency, but they address the issue in a manner that is quite distinct from the open-ended "divergent production" format characterizing most members of the cluster of overlapping verbal fluency tests. We suspect that the minor factor of group overlap betrayed in the Table 17 residuals is a consequence of the open-ended format in which most word fluency tests are administered rather than a function of their content; i.e., a method factor. Rather than being a unitary trait, then, word fluency as measured by
open-ended tests seems to be the complex outcome of verbal ability, motivation to perform rapidly and fluently given relatively simple tasks, and ready use of the open-ended response format. The verbal and speed factors involved seem to be quite broad, general, and likely to remain invariant with respect to changes in test battery composition. The minor factor related to testing format seems to be a matter of superficial overlap among certain tests included in this particular battery and could simply reflect the role of writing speed in the open-ended format, for example. The distorting role a minor factor of group overlap can play in an analysis when it is not recognized as such, and the latent dimensionality is therefore overestimated, was demonstrated in the Part III discussion of Thurstone's invariant 26-variable box problem. Not surprisingly, something similar happens when the FSI data are analyzed in terms of four or more common factors. The distortion is even more profound than was seen in the case of the box problem, however, since rather strong patterns of group overlap were deliberately built into the FSI battery in Thurstone's misguided attempt to account for all reliable word fluency variance through reference to common factors. The consequences of overfactoring in the presence of strong minor factor effects will be seen when we consider the four-dimensional FSI analysis. First, however, it is in order to examine the three-factor residuals left by reciprocal-uniqueness-weighted collinearity-resistant fitting for local dependency outlier residuals other than those just discussed. Aside from the Table 17 residuals reflecting excessive collinearity at the manifest level among open-ended word fluency tests, the possible action of several other minor factors is suggested in the patterned three-factor residual covariance matrix.
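The residuals discussed here can be read as observed correlations minus the correlations reproduced by the three-factor solution, p_j' Phi p_k, where p_j and p_k are two tests' pattern weights and Phi the factor correlations. As a check, the sketch below recomputes two of the Table 17 residuals from the Table 16 values. It assumes the tabled residuals are simple differences of this kind (the close agreement suggests so); the reciprocal-uniqueness weighting presumably affects how the weights were estimated rather than this final subtraction. Names are hypothetical.

```python
# Factor correlations from Table 16 (factors I, II, III).
PHI = [
    [1.00, 0.02, 0.17],
    [0.02, 1.00, 0.17],
    [0.17, 0.17, 1.00],
]

# Table 16 pattern weights for three open-ended word fluency tests.
P23 = [0.28, 0.30, 0.09]  # First and Last Letters
P24 = [0.31, 0.43, 0.00]  # First Letters
P27 = [0.40, 0.25, 0.05]  # Four-Letter Words


def reproduced_r(p_j, p_k, phi):
    """Correlation implied by the common factors: p_j' Phi p_k."""
    k = len(phi)
    return sum(p_j[a] * phi[a][b] * p_k[b] for a in range(k) for b in range(k))


def residual(r_observed, p_j, p_k, phi):
    """Observed correlation minus the correlation reproduced by the factors."""
    return r_observed - reproduced_r(p_j, p_k, phi)


# Observed correlations come from the upper triangle of Table 17.
print(round(residual(0.52, P23, P24, PHI), 2))  # 0.29, as in Table 17
print(round(residual(0.47, P24, P27, PHI), 2))  # 0.23, as in Table 17
```

The three factors reproduce only about .23 of the observed .52 between #23 and #24, leaving the kind of local dependency outlier attributed above to a minor factor of group overlap.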
A residual linkage of .43 remains, for instance, between the two mazes tests, #35 and #36, reflecting the fact that their manifest correlation of .68 is inflated due to superficial overlap in format or content. The highest correlation of either member of this doublet pair with any remaining test in the battery is with test #44 (pursuit, in which squiggly lines are traced to their destination), and this relationship leaves a residual of .16. Apparently the minor factor of group overlap involved here is due to the tracing response required in both maze and pursuit tasks. However, such a response format factor is highly superficial in contrast to the latent determinants to which broad, general, and invariant common factors must refer, so the latter have been decontaminated of this method variance through deliberate tolerance for "error of approximation" to the observed data in the form of local dependency outlier residuals. A strong pattern of residual linkages is found among those tests requiring visualization of figural rotation in a plane. Tests #8 (cards), #22 (figures), and #26 (flags) all load strongly on the space factor in Table 16 and Figure 19, but they also share residual linkages ranging from .18 to .29. These
linkages reflect the method variance shared in common by such highly redundant measures of spatial ability. Another strong pattern of superficial linkage is found in the matrix of three-factor residuals among the FSI dot counting tests, #14 through #16, where the residuals range from .24 to .44. The dot patterns test, #17, also seems to be linked to the simpler dot counting tasks, with a residual of .19 between it and #14. These tests can be seen from Figure 19 and Table 16 to range from measures of general speed and fluency of performance to a mixture of the latter and general spatial ability. However, they also share a minor factor that seems to relate to facility in determining how many dots make up a pattern or part thereof. This capacity is known as "subitizing" and apparently differentiates individuals reliably. It is distinct, however, from general spatial ability as well as general speed or fluency of performance, so the latter have been freed of the related method variance in the three-factor collinearity-resistant analysis. The most difficult dot counting test, #15 (in which approximately fifteen dots are scattered at random within each group to be counted), is linked with a residual of .17 to pursuit test #44 mentioned above. Both of these tasks can be presumed to demand skill in visually keeping track of either dots already counted or that portion of a squiggly line that has already been traced. Many other such minor factors that apparently enter into successful performance of two or more of the FSI tests can likewise be identified through inspection of local dependency outlier residuals left by collinearity-resistant fitting of the low-dimensional invariant common factor model. We haven't space to list all of these, but a few of the more interesting can be mentioned.
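The first pass of such an inspection, listing the largest off-diagonal residuals, is easy to mechanize. The sketch below does this for the lower triangle of Table 17; the .30 cut-off is an arbitrary illustrative choice, the names are hypothetical, and, as the discussion above makes clear, interpreting any flagged pair still rests on the content of the tests involved.

```python
LABELS = ["#23", "#24", "#27", "#42", "#50", "#54", "#55", "#60", "#4", "#6"]

# Lower triangle of Table 17 (three-factor residuals, decimal points omitted).
# Row i holds the residuals of LABELS[i] with LABELS[0] .. LABELS[i - 1].
RESIDUALS = [
    [],
    [29],
    [31, 23],
    [29, 30, 23],
    [21, 26, 19, 22],
    [27, 28, 20, 37, 21],
    [24, 31, 15, 31, 25, 23],
    [17, 22, 15, 14, 15, 20, 9],
    [24, 22, 25, 18, 13, 13, 10, 20],
    [16, 22, 13, 19, 14, 11, 24, 8, 10],
]


def outlier_pairs(labels, lower, threshold):
    """List (test, test, residual) for residuals at or above threshold, largest first."""
    pairs = [
        (labels[i], labels[j], r)
        for i, row in enumerate(lower)
        for j, r in enumerate(row)
        if r >= threshold
    ]
    # sorted() is stable, so ties keep their row-by-row order.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


for a, b, r in outlier_pairs(LABELS, RESIDUALS, threshold=30):
    print(f"{a} with {b}: .{r:02d}")
```

At this cut-off the largest linkage is the .37 between the prefixes (#42) and suffixes (#54) tests, two nearly identical open-ended tasks.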
A residual linkage of .25 between test #25 (memory of the connection between first and last names) and test #59 (recall of numbers that have been associated with objects) suggests a doublet factor of rote memory. However, the latter two tests are also linked with residuals of .19 and .11, respectively, to test #21 (recognition of figures previously seen), so it seems that a minor memory factor might relate to both recognition and recall of material encountered previously. On the other hand, the fact that digit span test #11 is not found to be linked to the other memory tests in the matrix of three-factor residuals, given low communalities for all but #25 in the Table 16 solution, means that a general memory factor is not sustained by the FSI test battery. The FSI tests with obvious mathematical content are linked by moderate residuals ranging from .09 to .19. These tests are #3 (addition, determining whether or not three two-digit numbers have been summed correctly), #5 (arithmetic, multiple-choice word problems), #37 (multiplication, determining whether multiplication of a two-digit number by a single digit has been
carried out properly or not), and #56 (marking those numbers in a list of twelve that are exactly three higher than the foregoing number). From an inspection of Table 16 and Figure 19, however, it is clear that only addition and multiplication share the same pattern of linkage to the three broad general factors identified in our analysis. Moreover, the latter fall quite near the locus of the word fluency cluster, being indicators mainly of the general speed or fluency of performance factor. The word problem test of arithmetic skills, on the other hand, is not a speed or fluency indicator but falls in the hyperplane of that factor quite near syllogistic reasoning test #49. The task of finding numbers in a sequence where each number has to be exactly three higher than the previous entry displays yet another loading pattern, being a fully complex indicator of all three factors. The only other test in the battery displaying this same degree of factorial complexity is #1 (in which the three letters A, B, and C occur in varied sequences and are to be combined according to fixed rules so that a simple letter answer results). Evidently these tests are fairly different in factorial composition but still share a minor factor of group overlap. In this connection, note that addition test #3 is linked by residuals of .17 and .13 to simple dot counting tests #14 and #16, respectively; the latter tests can be regarded as coded versions of simple addition. The three-factor residual linkages we have discussed imply several minor factors of group overlap that the Thurstones identified as "primary mental abilities" in their analysis. On the other hand, two of the broad general and evidently invariant common factors resulting from our analysis differ rather markedly from any of the factors identified by Thurstone.
Tests #45, #46, #51, #10, #39, #47, #48, #43, and #2 load, in that order, on the FSI verbal comprehension factor, so the latter can certainly be identified with Factor I in Table 16 and Figure 19. Although the FSI space factor resembles Factor III in our analysis (the order of variables loading thereon in the FSI analysis was #8, #22, #26, #36, #35, #29, #28, and #44), the Thurstones' analysis clearly displaced their space factor toward the cluster of spatial rotation tests (#8, #22, and #26). Meanwhile, the other spatial cluster of maze and line tracking tests (#35, #36, and #44) gave rise to a separate "factor" of "visual pursuit". From the perspective of our analysis, then, it can be said that both of the Thurstones' spatial factors are seriously contaminated by method variance due to superficial overlap in test format. The Thurstones did not identify a general fluency or speed of performance factor comparable to Factor II in the Table 16 analysis. On the contrary, they designated both the word fluency cluster and the numerical operations cluster (which we identified only through inspection of three-factor residuals) as the most clearly defined primary factors in the entire FSI analysis! This reveals the great discrepancy between Thurstone's theoretical conception of the
goals of simple structure transformation and his practical application of that method. It was Thurstone himself who reasoned that invariant common factors can be isolated only at the lateral edges of a compact test vector cone through bounding hyperplane simple structure transformation. However, he does not seem to have realized that independent clustering of highly collinear variables can be brought about anywhere within the test vector cone simply as a consequence of arbitrary steps taken in test battery construction. These clusters cannot be expected to be invariant, nor do they define the lateral edges of the latent test vector cone. Provided that enough hyperplanes are entertained, it is possible to get all but one of them to intersect at each cluster; however, such a tactic does not ensure that all of the hyperplanes involved fall at the actual outer boundaries of the latent test vector cone. Thurstone's word fluency "factor" is a clear example of the way in which complex effects (i.e., clusters) can be mistakenly interpreted as simple causes (i.e., primary factors) when they are located at the intersections of hyperplanes that fail to bound the test vector configuration. In addition to the "number", "word fluency", "space", and "verbal comprehension" factors that we have just discussed, Thurstone identified "rote memory", "induction or reasoning", and "perceptual" factors as primary mental abilities. The last was not regarded as "sufficiently clear for general use" [Thurstone and Thurstone, 1941, p. 27], however. When we entertain the possibility that more than three broad, general, and invariant common factors can be isolated through collinearity-resistant fitting and bounding hyperplane simple structure transformation of the FSI data, the solution given in Table 18 results.
In contrast to the rather low levels of mutual intercorrelation seen among primary factors in Table 16, it is evident that entertaining a fourth factor gives rise to distinct patterns of pairwise linkage. Not surprisingly, the extra factor seen in Table 18 corresponds to the word fluency cluster discussed earlier. It is noteworthy that this "factor" accounts for more common variance than any other in the four-dimensional analysis. There are a good number of word fluency tests in the FSI battery, and their latent factorial complexity means that they fall nearer the major principal axis or first centroid of the test vector configuration than do the purer indicators of latent determinants at its lateral edges. Three of the factors (I, III, and IV) in Table 18 can be recognized as modified versions of what we saw earlier in Table 16. However, placing an extra axis through the superfluous word fluency cluster, near the central region of the hyperplane common to the general verbal ability and general speed factors, has distorted the entire solution. The new axis correlates .25, .49, and .24, respectively, with representations of the three original and evidently invariant common factors. So much distortion in the locus of the general speed factor (III) results when its contribution to word fluency is
Table 18. Four Direct Geoplane-Transformed Reciprocal Uniqueness-Weighted Collinearity-Resistant Factors of the Thurstone and Thurstone [1941] FSI Battery
No.  Label                    I     II    III   IV    Communality
1    ABC                      27*   12    2     44    33
2    Absurdities              35    31    14    17    28
3    Addition                 15    1     68    24    38
4    Anagrams                 4     38    18    7     28
Arithmetic
62
3
24
4
47
6
Association
3
56
14
6
40
7
Backward Writing
0
34
22
30
44
8
Cards
32
9
1
53
37
9
Classification
37
8
32
21
3l
10
Completion
44
60
3
2
68
11
Digit Span
14
15
17
1
11
12
Directions
36
45
15
3
51
13
Disarranged Sentences
13
65
1
5
50
14
Dot Counting I
10
I
61
8
34
15
Dot Counting II
17
2
45
27
39
16
Dot Counting III
19
6
47
5
31
17
Dot Patterns
3
7
60
26
51
18
Faces
4
18
18
47
44
19
Figure Grouping
l
8
3
46
25
20
Figure Naming
12
20
37
3
26
21
Figure Recognition
22
2
11
12
9
22
Figures
40
7
1
39
30
23
First and Last Letters
0
44
4
9
24
24
First Letters
6
51
14
1
33
25
First Names
13
26
35
24
26
26
Flags
35
l
I
47
35
27
FourLetter Words
14
42
8
1
25
28
Geometrical Forms
30
1
12
60
40
29
High Number
28
1
27
40
41
30
Identical Numbers
26
21
45
17
50
31
Identical Pictures
l
15
2
55
38
32
Incomplete Words
10
53
26
5
44
33
Letter Grouping
13
23
32
9
30
34
Letter Series
38
16
24
16
37
35
Mazes I
3
11
14
62
35
36
Mazes 11
9
1
18
66
37
(Table continued on next page)
I
Table 18. (Continued)

No.  Label                       I   II  III   IV  Communality
 37  Multiplication              2   14   73   26           53
 38  Number Patterns             2   12   22   28           23
 39  Paragraph Recall           47   33    6    4           42
 40  Pedigrees                  41   30    4   16           39
 41  Picture Naming             34   53    9   13           42
 42  Prefixes                    4   46    0    0           20
 43  Proverbs                   52   27    8    7           41
 44  Pursuit                     2    3    5   55           34
 45  Reading: Voc.              52   49   12    7           60
 46  Reading: Sen.              53   43    2    7           56
 47  Reading: Par. I            49   42    1   16           50
 48  Reading: Par. II           48   25    8   16           38
 49  Reasoning                  48    1   17    4           28
 50  Rhyming Words               3   69    6    1           46
 51  Same or Opposite           39   56    6    5           52
 52  Scattered X's              22    2    6   30           16
 53  Secret Writing             31    3   22   35           36
 54  Suffixes                    2   46   12    8           26
 55  Synonyms                    9   61   18    3           28
 56  Three-Higher               35    4   50   12           47
 57  Verbal Enumeration          1   81    2    8           67
 58  Word Checking              28   45    6    9           41
 59  Word-Number Recall          4    9   30    0            7
 60  Word Puzzles                1   50   24    0           42
 61  Age                        23   28    7    9           18
 62  Sex                        38   23   22   22           24
 63  Mental Age                 42   38   14   16           55

FACTOR CORRELATIONS
          I    II   III    IV
I       100    25     1     2
II       25   100    49    24
III       1    49   100    45
IV        2    24    45   100

* Decimal points omitted.
eliminated, however, that the speed factor is left primarily as an indicator of fluency of performance in spatial and numerical tasks. Not surprisingly, it then correlates .45 with the general spatial ability factor (IV). An examination of the extended vectors plots associated with the Table 18 four-factor analysis of the FSI battery reveals, as might be expected, that these factors have been isolated at the intersections of hyperplanes that fail to define the true outer boundaries of the test vector configuration. These plots are given in Figures 20a through 20d. Note that the factors happened to emerge from the four-factor analysis in the same order seen earlier, except for the intrusion of word fluency in second place. Hence, Figure 20c is comparable to the original extended vectors plot in Figure 19. Notice that a good deal of distortion has occurred due to removing factorially complex word fluency variance. It is no longer so clear that we are dealing with a polyhedral convex cone-shaped test vector configuration, for instance, since
Figure 20a. Extended Vectors Plot of the Verbal (FX), Word Fluency (FY), and Speed (FZ) Factors in Table 18
Figure 20b. Extended Vectors Plot of the Verbal (FX), Word Fluency (FY), and Spatial (FZ) Factors in Table 18
formerly rather pure indicators of general speed or fluency of performance (i.e., tests #14, #16, #20, #30, and #41) now project outside the boundaries defined by the hyperplanes. It is as though verbal ability had a negative influence on these tests, but that runs counter to the positive manifold assumption through which bounding hyperplane simple structure transformation is justified as a technique for discovering invariant hypothetical determinants. The "speed" factor identified in this analysis is therefore quite suspect; the hyperplane it should share with verbal ability (FX-FY in Figures 19 and 20c) has been distorted by the intrusion of the word fluency factor nearby. Figure 20a depicts the outcome of placing a primary factor axis (FY) through the word fluency cluster (II) that lies beside the hyperplane common to the verbal factor (I) and the speed factor (III). Notice that Factor II (FY) is correlated with both Factors I (FX) and III (FZ), but the latter two are uncorrelated with one another. As can readily be seen in Figure 20a, Factor II
Figure 20c. Extended Vectors Plot of the Verbal (FX), Speed (FY), and Spatial (FZ) Factors in Table 18
(FY) falls midway between the latter pair, beside the hyperplane they share. The tests that project nearest factor vector FX in Figure 20a (#36, #28, #22, #8) are not verbal but spatial tests (see Table 18), while the tests found to be rather pure measures of the verbal factor in Table 16 and Figure 19 (e.g., reading tests #45, #46, #47, and #48) are located closer to the so-called Word Fluency factor in the FX-FY hyperplane. The hyperplanes in this figure are not distinct, implying that the factors involved are but slight variations on one another; i.e., they get at no new hypothetical determinants beyond those that give rise to general verbal ability, general spatial ability, and general speed or fluency of performance. Figures 20b and 20d show that hardly any tests fall along the hyperplane common to the hypothesized word fluency factor (II) and the spatial ability factor (IV). Once again, it is apparent that entertaining four broad general factors common to the FSI battery is unnecessary; all latent coplanarity in
Figure 20d. Extended Vectors Plot of the Word Fluency (FX), Speed (FY), and Spatial (FZ) Factors in Table 18
the battery can evidently be accounted for quite adequately in a space of reduced dimensionality, provided that excessive collinearity among the word fluency measures and among certain subsets of the spatial ability measures is accommodated through resistant fitting. We have already seen that this was accomplished in the three-factor FSI solution and that the resulting factors appeared highly likely to show invariance with respect to changes in test battery composition. In Figure 21 we have reproduced the two-dimensional configuration found by Guttman [1965, p. 32, Fig. 1] when he applied smallest space analysis to a selection of variables from the Thurstone and Thurstone FSI battery under discussion here. Notice that rotating Figure 21 clockwise by a little over 90º will align the selected points rather closely with those projected in the extended vectors plot of the three direct-geoplane-transformed collinearity-resistant factors (Figure 19) discussed earlier. Although the nonmetric multidimensional scaling solution agrees with our results to this strong degree,
Figure 21. Smallest Space Analysis of Selected FSI Tests, based on Guttman, 1965, p. 32, Fig. 1. (Figure reproduced with the permission of Educational Testing Service)
it is based upon such a weak relationship between the (two-dimensional) mathematical model and the observed data that there is no objective way to use the solution to identify hypothetical determinants that might account for the test vector configuration. In fact, Guttman failed to see evidence in his plot that the test vectors are arrayed within a trihedral convex cone. Of course, he did not expect multidimensional scaling to reveal such an underlying structure, as can be gathered from the following comment: "The blind empirical results of the computer cannot by themselves show a substantive law of formation; they give only dimensionality and distance. Without the faceted definitional system it would be very difficult to interpret a [smallest space analysis] plot" [1965, p. 33].
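The alignment of Figure 21 with Figure 19 is judged above by eye. One way to make such a comparison numerical, offered only as a sketch and not as a procedure used in this book, is an orthogonal Procrustes rotation restricted to proper rotations. It assumes NumPy and two matched (n, 2) coordinate arrays, one row per test; the function names and the synthetic points below are hypothetical stand-ins, not Guttman's coordinates or the Figure 19 projections.

```python
import numpy as np


def procrustes_rotation(source, target):
    """Proper rotation R (det = +1) minimising ||source @ R - target|| (Frobenius).

    Hypothetical helper. source, target: (n, 2) arrays of matched points
    (the same test in the same row). Both are centred first.
    """
    x = source - source.mean(axis=0)
    y = target - target.mean(axis=0)
    u, _, vt = np.linalg.svd(x.T @ y)
    # If the best orthogonal fit would be a reflection, flip the last axis.
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, d]) @ vt


def rotation_angle_deg(r):
    """Angle of a 2x2 rotation applied as points @ r; negative means clockwise."""
    return np.degrees(np.arctan2(r[0, 1], r[0, 0]))


# Synthetic check: rotate random points 95 degrees clockwise and recover the angle.
rng = np.random.default_rng(0)
ssa = rng.normal(size=(8, 2))                 # stand-in for 2-D SSA coordinates
theta = np.radians(-95.0)
true_r = np.array([[np.cos(theta), np.sin(theta)],
                   [-np.sin(theta), np.cos(theta)]])
ev = ssa @ true_r                             # stand-in for extended-vector coordinates
r = procrustes_rotation(ssa, ev)
print(round(rotation_angle_deg(r), 3))        # about -95.0
```

A small residual after such a rotation would show that the two configurations agree closely; as the text stresses, however, agreement of this kind says nothing about which hypothetical determinants produced the points.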
In his solution Guttman designated numerical, verbal, and geometrical tests by letters, while analytical (as opposed to achievement) tests were
designated by asterisks. In inspecting the outcome, Guttman was surprised to find that the more complex (i.e., analytical) tests fell near the center of the plot, whereas the simpler tests of achievement (as measured in each respective language of communication) fell near the periphery. His radex theory of intelligence had postulated a "radial expansion of complexity [where] simplicity would be in the center and expand outward into complexity" [1965, p. 34]. In other words, the obtained data structure strongly contradicted the hypothesized law of formation. Nevertheless, Guttman took his results as evidence that smallest space analysis is much preferable to common factor analysis. He argued that factor analysis places too much emphasis on coordinates rather than on the configuration of points per se. This argument is puzzling, since a factor analytic interpretation of his results makes clear why the most complex tests fall in the center of the test vector configuration, whereas a radex interpretation does not! Moreover, in fully exploratory factor analysis one can inspect both the coordinates and the test vector configuration, so choosing between them is not an issue. In Table 19 and Figure 22 we present the three-factor solution arrived at through collinearity-resistant fitting and bounding hyperplane simple structure transformation of Thurstone's original PMA data [1938], in which tetrachoric correlations were used. The results are quite comparable to those obtained for the FSI battery, except that the speed or fluency of performance factor is not convincing. In fact, this factor does not lie at the intersection of well-defined bounding hyperplanes and could simply be a minor group factor reflecting superficial collinearity among the number speed tests. The FSI analysis just discussed seems much more definitive, as well as being based on sounder methodology (larger sample size, product-moment correlations).
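The contrast between tetrachoric correlations (PMA) and product-moment correlations (FSI) can be made concrete with a 2 x 2 table for two pass/fail items. The plain-Python sketch below computes the product-moment (phi) coefficient and the classical cos-pi approximation to the tetrachoric correlation; the cos-pi formula is a stand-in chosen for brevity and is not necessarily how the 1938 correlations were obtained. The cell counts are hypothetical: they are the expected frequencies for 300 examinees when two normally distributed abilities correlating .5 are each split at the median.

```python
import math


def phi_coefficient(a, b, c, d):
    """Product-moment (phi) correlation of two 0/1 items from a 2x2 table.

    Cells: a = both passed, b = first passed only,
           c = second passed only, d = both failed.
    """
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def tetrachoric_cos_pi(a, b, c, d):
    """Classical cos-pi approximation to the tetrachoric correlation."""
    if b * c == 0:
        return 1.0  # no discordant cases: perfect positive association
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))


# Hypothetical expected counts: median splits of two abilities correlating .5.
a, b, c, d = 100, 50, 50, 100
phi = phi_coefficient(a, b, c, d)
r_tet = tetrachoric_cos_pi(a, b, c, d)
print(f"phi = {phi:.3f}, tetrachoric (cos-pi) = {r_tet:.3f}")
# phi = 0.333, tetrachoric (cos-pi) = 0.500
```

The two coefficients summarize the same table yet differ markedly (.33 versus .50), so factor analyses built on the two kinds of correlation need not agree in detail.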
We have presented the PMA results only to show that they do not conflict with the FSI analysis, especially with respect to the conclusion that no more than three broad, general, and quite likely invariant common factors need be entertained.

Toward an Understanding of Higher Integrative Mental Functioning

On the matter of interpreting broad, general, and evidently invariant common factors identified through collinearity-resistant fitting and bounding hyperplane simple structure transformation, a great deal remains to be said and done. In fact, any final effort to interpret evidently invariant common factors must wait until theoretically inclined researchers themselves apply fully exploratory factor analysis as a routine technique of scientific discovery. Although we have used the designations of general verbal ability, general
Table 19. Three Direct Geoplane Transformed Reciprocal-Uniqueness-Weighted Collinearity-Resistant Factors of the Thurstone [1938] Primary Mental Abilities Battery
No.
Label
1
Reading I
I