Behaviormetrics: Quantitative Approaches to Human Behavior 16
Shizuhiko Nishisato
Measurement, Mathematics and New Quantification Theory
Behaviormetrics: Quantitative Approaches to Human Behavior Volume 16
Series Editor Akinori Okada, Professor Emeritus, Rikkyo University, Tokyo, Japan
This series covers in their entirety the elements of behaviormetrics, a term that encompasses all quantitative approaches to research that disclose and help understand human behavior in the broadest sense. The term includes the concept, theory, model, algorithm, method, and application of quantitative approaches, from theoretical or conceptual studies to empirical or practical application studies, to comprehend human behavior. The Behaviormetrics series deals with a wide range of topics in data analysis and in developing new models, algorithms, and methods to analyze these data.

The characteristics featured in the series have four aspects. The first is the variety of methods utilized in data analysis and newly developed methods, including not only standard or general statistical methods or psychometric methods traditionally used in data analysis, but also cluster analysis, multidimensional scaling, machine learning, correspondence analysis, biplots, network analysis and graph theory, conjoint measurement, biclustering, visualization, and data and web mining. The second aspect is the variety of types of data, including ranking, categorical, preference, functional, angle, contextual, nominal, multi-mode multi-way, continuous, discrete, high-dimensional, and sparse data. The third comprises the varied procedures by which the data are collected: by survey, experiment, sensor devices, purchase records, and other means. The fourth aspect of the Behaviormetrics series is the diversity of fields from which the data are derived, including marketing and consumer behavior, sociology, psychology, education, archaeology, medicine, economics, political and policy science, cognitive science, public administration, pharmacy, engineering, urban planning, agriculture and forestry science, and brain science.

In essence, the purpose of this series is to describe the new horizons opening up in behaviormetrics — approaches to understanding and disclosing human behaviors both in the analyses of diverse data by a wide range of methods and in the development of new methods to analyze these data.

Editor in Chief
Akinori Okada (Rikkyo University)

Managing Editors
Daniel Baier (University of Bayreuth)
Giuseppe Bove (Roma Tre University)
Takahiro Hoshino (Keio University)
Shizuhiko Nishisato
Professor Emeritus, University of Toronto
Toronto, ON, Canada
ISSN 2524-4027  ISSN 2524-4035 (electronic)
Behaviormetrics: Quantitative Approaches to Human Behavior
ISBN 978-981-99-2294-9  ISBN 978-981-99-2295-6 (eBook)
https://doi.org/10.1007/978-981-99-2295-6

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
It is more than half a century since someone uttered "Garbage in, garbage out." It was a simple reminder and warning that if the input for data analysis is not well scrutinized, the output of the analysis may not be worth examining. This was a very timely warning, since there was then a giant leap in the popularization of data analysis from statisticians to researchers in diverse disciplines. It was an exciting time, when interdisciplinary research grew into a routine mode of research. Almost everyone then became an expert in data analysis, and the profession of data analysts became over-populated. It is significant to note that the above saying was uttered around that time as a criticism of the new scene of data analysis.

Almost half a century has passed since then, and surprisingly, the saying still sounds valid as an important lesson for data analysis. This makes us wonder whether the popularity of data analysis has also resulted in substantial improvement in the quality of data analysis. We would like to think that the substance of data analysis has made comparable progress. From the practical point of view, we should note that today's data analysis is dominated by survey-type data. Once we focus our attention on survey analysis, we still see numerous problems from sampling to the data analysis itself, and it is not too difficult to find examples of survey analysis in which one can raise questions about the procedure, the analysis and the conclusions.

These days, researchers are better informed about data analysis, and the above criticism may sound too harsh to generalize. However, when we look at the data analysis of surveys, particularly with quasi-quantitative data, there are many problems that need substantial improvement. As an interdisciplinary researcher, the current author has observed many problems associated with the inappropriate use of statistical procedures when data are strictly not quantitative. If non-quantitative data are subjected to quantitative analysis without appropriate transformations, the output may completely lack validity. This undeniable situation has existed for many years, and the current book is intended to shed some light on this aspect, even if only as a small step toward improvement.
The book is elementary, and many topics discussed here are well known to many researchers. Nevertheless, it is the author's belief that we still need this kind of elementary book if we are to promote appropriate data analysis. Upgrading our basic understanding of the data we analyze is the first step toward appropriate data analysis. The book will remind us, through the use of numerical examples, of what the main object of data analysis is. So, please be patient when we use many simple examples to illustrate a number of steps toward optimal data analysis. It is the author's wish to be helpful even to a small number of researchers who are involved in the analysis of non-quantitative data.

Let us start with simple examples which help us understand the data, then discuss the nature of input data (quantitative, quasi-quantitative and non-quantitative), then basic mathematics, and finally a new look at quantification theory, which provides optimal analysis of non-quantitative data. Let us make sure that our research is not characterized by "garbage in, garbage out." Keep in mind that our main object is how to analyze non-quantitative data in a quantitative way, and its desideratum is optimal analysis of our precious data.

This book is geared to everyone involved in the analysis of qualitative (non-quantitative) data. Simple examples will be used to facilitate the readers' understanding of what is important for data analysis. The book will be useful for many students and researchers in diverse areas of specialty, particularly for those who are outside the discipline of statistics. Understanding how non-quantitative data can be analyzed quantitatively and optimally would likely help all those involved in data analysis. The book may be helpful to those who find most textbooks of statistics rather difficult to follow. We hope that the book will attract the attention not only of students but also of those who are currently involved in data analysis.

As the author's final contribution to quantification theory, the last few chapters are devoted to his geometric theory of doubled multidimensional space, which not only clarifies the relation between quantification and geometry but also convincingly explains the perennial problem of the joint graphical display of quantification results. With these clarifications as a new look, the current book will guide researchers toward a right course for further developments.

There are two sincere apologies: one that inferential aspects of quantification theory are not covered in the current book, and the other a personal apology that I present a summary of my own work without referring to other researchers' perhaps more important studies than mine.

Toronto, Canada
March 2023
Shizuhiko Nishisato
Acknowledgments
First of all, the author wishes to acknowledge the generous support of Dr. Akinori Okada, Editor of the Springer Behaviormetrics Series, with particular gratitude for his constant support and many years of friendship; the Managing Editors Daniel Baier, Giuseppe Bove and Takahiro Hoshino; Mr. Yutaka Hirachi of Springer Japan, for his kind, persuasive encouragement; the editorial staff of Springer Nature Singapore for their meticulous work; and the kind reviewers of the manuscript, all of whom have offered their generous support to make this publication possible.

I have been lucky to have many friends over 60 years of my research career. But many of my mentors whom I wish to thank have already passed away. Among them are Masanao Toda, Yoichiro Takada, Tadasu Oyama and Yoshio Sugiyama in Japan, R. Darrell Bock, Lyle V. Jones and Emir H. Shuford in the USA, and Dalbir Bindra in Canada. Among the younger generations, I owe very much to many people for their kind friendship, in particular, Ross E. Traub, Takashi Asano, Wolfgang Gaul, Shuichi Iwatsubo, Yasumasa Baba, Yoshio Takane, Michael J. Greenacre, Setsuko Thurlow, Akira Kobashigawa, Hans-Hermann Bock, Jan de Leeuw, Willem Heiser, Jacqueline Meulman, Peter van der Heijden, Pieter Kroonenberg, Jos ten Berge, Henk Kiers, Patrick Groenen, Ludovic Lebart, Michel Tenenhaus, Gilbert Saporta, Edwin Diday, Norman Cliff, Lawrence Hubert, James Ramsay, Patrick Curran, Brigitte Le Roux, Carlo Lauro, Boris Mirkin, Hamparsum Bozdogan, Serghei Adamov, Hans-Joachim Mucha, Fionn Murtagh, Vartan Choulakian, Helmut Vorkauf, Lucien Preus, Michel van de Velden, Ryozo Yoshino, Takashi Murakami, Naohito Chino, Hiroshi Mizuta, Tadashi Imaizumi, Yutaka Kano, Kohei Adachi, Hisashi Yadohisa, Koji Kanefuji, Heungsun Hwang, Reinhold Decker, Graham Beans, Philip Weingarden, Elizabeth Abbe, Amnon Rapoport, Steve Zyzanski, Larry Gordon, Christopher Ringwalt, Richard Wolfe, Gila Hanna, Ruth Childs, Philip Nagy, Susan Elgie, Wen-Jenn Sheu, Yukihiko Torii, Yukio Inukai, Kuo-Sing Leong, Ian Wiggins, Hyung Ahn, Mary Kolic, Daniel Lawrence, Oscar Milones, Mark Gessaroli, David Hemsworth, Charles Mayenga, Maurice Odondi, Stuart Halpine, Akira Fujii, Reiko Komatsu, Tsuyoshi Hirata, Hideshi Seki, Miyuki Hasuike, Reiko Kawanishi, Hiroshi Oikawa, Hirozumi Shibuya, Osamu Shirahata, Koichi Murakami, Shozo Nagai, Ryoichi Shibuya, Akira Ozaki, Suketoshi Iiyama, Toshitaka Tago, Shizue Hashieda, Minoru Shimosaka, Ryouko Miura, Yasuko Miura, Tsuyako Ikehata and Mitsugu Takabatake.

Fig. 1 Nishisato with wife Lorraine in Egypt

Special thanks go to (1) José G. Clavel, Eric J. Beh and Rosaria Lombardo, who not only co-authored a book with me (Nishisato, Beh, Lombardo & Clavel, 2021) but also edited my Festschrift (Beh, Lombardo & Clavel, 2023), and to (2) Se-Kang Kim, who has kept up correspondence with me for all these years and offered me useful discussions. I cannot thank enough the four friends from Spain, Australia, Italy and the USA.

There are many friends from my two alma maters, the Department of Experimental Psychology, Hokkaido University, Sapporo, Japan, and the Psychometric Laboratory, University of North Carolina, Chapel Hill, NC, USA, and then many colleagues at the Department of Psychology, McGill University, Montreal, and the Ontario Institute for Studies in Education (OISE) of the University of Toronto, Toronto, Canada.

On a personal level, my profound and utmost thanks go to my wife Lorraine for her constant and kind support, my son Ira Nishisato, who co-authored three dual scaling books with me, his wife Samantha Dugas, my grandson Lincoln Dugas-Nishisato, and Samantha's parents André and Gillian Dugas. I must also include my sister Michiko Soma and brother Akihiko Nishisato in Japan, who struggled with me during the war years and constantly supported me throughout my life — I also owed so much to my late brother Tsunehiko Nishisato. Currently, I live a comfortable life with my wife in a beautiful condominium, the Riverhouse, and I would like to extend my thanks to all my friends there, too.

Toronto, Canada
March 2023
Shizuhiko Nishisato
Contents

Part I  Measurement

1  Information for Analysis
   1.1  An Overview
   1.2  Introduction
        1.2.1  Fundamental Arithmetic Operations
        1.2.2  Data Types
   1.3  Stevens' Theory of Measurement
        1.3.1  Nominal Measurement
        1.3.2  Ordinal Measurement
        1.3.3  Interval Measurement
        1.3.4  Ratio Measurement
   1.4  Concluding Remarks on Measurement
   1.5  Task of Quantification Theory
   References

2  Data Analysis and Likert Scale
   2.1  Two Examples of Uninformative Reports
        2.1.1  Number of COVID Patients
        2.1.2  Number of Those Vaccinated
   2.2  Likert Scale, a Popular but Misused Tool
        2.2.1  How Does Likert Scale Work?
        2.2.2  Warnings on Inappropriate Use of Likert Scale
   References

Part II  Mathematics

3  Preliminaries
   3.1  An Overview
   3.2  Series and Limit
        3.2.1  Examples from Quantification Theory
   3.3  Differentiation
   3.4  Derivative of a Function of One Variable
   3.5  Derivative of a Function of a Function
   3.6  Partial Derivative
   3.7  Differentiation Formulas
   3.8  Maximum and Minimum Value of a Function
   3.9  Lagrange Multipliers
        3.9.1  Example 1
        3.9.2  Example 2
   References

4  Matrix Calculus
   4.1  Different Forms of Matrices
        4.1.1  Transpose
        4.1.2  Rectangular Versus Square Matrix
        4.1.3  Symmetric Matrix
        4.1.4  Diagonal Matrix
        4.1.5  Vector
        4.1.6  Scalar Matrix and Identity Matrix
        4.1.7  Idempotent Matrix
   4.2  Simple Operations
        4.2.1  Addition and Subtraction
        4.2.2  Multiplication
        4.2.3  Scalar Multiplication
        4.2.4  Determinant
        4.2.5  Inverse
        4.2.6  Hat Matrix
        4.2.7  Hadamard Product
   4.3  Linear Dependence and Linear Independence
   4.4  Rank of a Matrix
   4.5  System of Linear Equations
   4.6  Homogeneous Equations and Trivial Solution
   4.7  Orthogonal Transformation
   4.8  Rotation of Axes
   4.9  Characteristic Equation of the Quadratic Form
   4.10 Eigenvalues and Eigenvectors
        4.10.1 Example: Canonical Reduction
   4.11 Idempotent Matrices
   4.12 Projection Operator
        4.12.1 Example 1: Maximal Correlation
        4.12.2 Example 2: General Decomposition Formula
   References

5  Statistics in Matrix Notation
   5.1  Mean
   5.2  Variance-Covariance Matrix
   5.3  Correlation Matrix
   5.4  Linear Regression
   5.5  One-Way Analysis of Variance
   5.6  Multiway Analysis of Variance
   5.7  Discriminant Analysis
   5.8  Principal Component Analysis
   References

6  Multidimensional Space
   6.1  Introduction
   6.2  Pierce's Description
        6.2.1  Pythagorean Theorem
        6.2.2  The Cosine Law
        6.2.3  Young–Householder Theorem
        6.2.4  Eckart–Young Theorem
        6.2.5  Chi-Square Distance
   6.3  Distance in Multidimensional Space
   6.4  Correlation in Multidimensional Space
   References

Part III  A New Look at Quantification Theory

7  General Introduction
   7.1  An Overview
   7.2  Historical Background and Reference Books
   7.3  First Step
        7.3.1  Assignment of Unknown Numbers
        7.3.2  Constraints on the Unknowns
   7.4  Formulations of Different Approaches
        7.4.1  Bivariate Correlation Approach
        7.4.2  One-Way Analysis of Variance Approach
        7.4.3  Maximization of Reliability Coefficient Alpha
   7.5  Multidimensional Decomposition
   7.6  Eigenvalues and Singular Values Decompositions
   7.7  Finding the Largest Eigenvalue
   7.8  Method of Reciprocal Averages
   7.9  Problems of Joint Graphical Display
   7.10 How Important Data Formats Are
   7.11 A New Framework: Two-Stage Analysis
   References

8  Geometry of Space: A New Look
   8.1  Background
   8.2  Geometric Space Theory
   8.3  Rorschach Data
        8.3.1  Major Dual Space or Contingency Space
        8.3.2  Residual Space of Response-Pattern Table
        8.3.3  Minor Dual Space of Response-Pattern Table
        8.3.4  Dual Space
   8.4  Dual Subspace, A Bridge Between Data Types
        8.4.1  A Shortcut for Finding Exact Coordinates
   8.5  Conclusions
   References

9  Two-Stage Quantification: A New Look
   9.1  Barley Data
        9.1.1  Stage 1 Analysis
        9.1.2  Stage 2 Analysis
   9.2  Rorschach Data
        9.2.1  Stage 1 Analysis
        9.2.2  Stage 2 Analysis
   9.3  Squared Distance Matrix in Dual Space
   9.4  Summary of Two-Stage Quantification
   9.5  Concluding Remarks
   References

10 Joint Graphical Display
   10.1 Toward a New Horizon
   10.2 Correspondence Plots and Exact Plots
        10.2.1 Rorschach Data
        10.2.2 Barley Data
        10.2.3 Kretschmer's Typology Data
   10.3 Multidimensional Joint Display
        10.3.1 Readers' Tasks: Rorschach Data
        10.3.2 Readers' Tasks: Barley Data
        10.3.3 Readers' Tasks: Kretschmer's Data
   10.4 Discussion on Joint Graphical Display
   10.5 Cluster Analysis as an Alternative
   10.6 Final Notes
   References

11 Beyond the Current Book
   11.1 Selected Problems
        11.1.1  Geometric Space Theory for Many Variables
        11.1.2  Non-symmetric Quantification Analysis
        11.1.3  Forced Classification Analysis
        11.1.4  Projection Operators and Quantification
        11.1.5  Robust Quantification
        11.1.6  Biplots
        11.1.7  Multidimensional Joint Graphs
        11.1.8  Cluster Analysis as an Alternative
        11.1.9  Computer Programs
        11.1.10 Inferential Problems
        11.1.11 Nishisato's Quandary on Multidimensional Analysis
        11.1.12 Gleaning in the Field of Quantification
   11.2 Final Words: Personal Notes
   References
Part I
Measurement
Data analysis is the manipulation of numbers, which we call data. We must know, however, that there are many ways to use numbers, such as those given to genders, countries, movie rankings, temperature and distance. Depending on the mathematical nature of those numbers, some are amenable to addition and multiplication, while other numbers are not even appropriate for basic arithmetic operations. For data analysis, the assignment of numerals to the data is the very important first step, and we must bear in mind that different ways of assigning numbers immediately affect what arithmetic operations are permitted to process such numbers. This is a problem of measurement, which is defined as the assignment of numerals to the data according to certain rules. Thus, measurement is of crucial importance in determining the quality of data analysis outcomes. Unfortunately, most books on data analysis skip this initial step, which is of vital importance if our data analysis is to avoid the garbage-in, garbage-out fame. Part I is devoted to the problems of measurement, and we will identify our domain of quantification theory. Then we will glance at the task of quantification theory from the measurement perspective.
Chapter 1
Information for Analysis
1.1 An Overview

When we carry out data analysis, we encounter many kinds of data. Some are purely quantitative (e.g., amounts of rainfall, total corn crops, number of sunny days), some are quasi-quantitative (e.g., rankings of movies, achievement test scores, Likert scores used in surveys), and some are completely qualitative or non-quantitative (e.g., gender, kinds of illnesses, religions, provinces, types of housing). It is obvious that some of these can be subjected to the basic arithmetic operations such as addition, subtraction, multiplication and division, and that some others are not at all appropriate for the arithmetic operations. No matter which types of data we may obtain, it is essential that we choose appropriate methods to analyze them.

One important problem we should keep in mind is the fact that data analysis deals with all kinds of data: some can be subjected to the standard arithmetic operations (addition, subtraction, multiplication, division), while others cannot. Thus, the task of identifying appropriate ways of handling different types of data is crucial to producing meaningful results of analysis. We therefore first discuss the classification of data in order to identify appropriate methods of analysis. This is a necessary step to assure that we can trust the outcomes of analysis; it is crucially important for the validity of analytical results, and it is therefore the first topic of Chap. 1. The identification of the nature of data is called the problem of measurement. This task gives us the knowledge of what computations or transformations of the data are allowed for appropriate data analysis.
1.2 Introduction

There is a very informative book on measurement written by Hand (2004), a noted British statistician. Measurement is an observation of information collected by investigators and characterized by numbers. Thus, the data we collect for analysis are all examples of measurement. Hand (2004) shows how measurement can differ depending on research fields, and discusses measurement obtained in psychology, medicine, the physical sciences, economics, the social sciences and other fields. One immediately realizes how diverse the kinds of information are that we deal with in our data analysis. Ask: (1) How accurate and reliable is the information collected in those diverse research fields? (2) Is it all amenable to arithmetic operations? We must admit that not all of our data are immediately amenable to statistical analysis. Our typical expectation, however, is that our data are mostly amenable to the ordinary mathematical operations of addition, subtraction, multiplication and division. But this expectation is often wrong.
1.2.1 Fundamental Arithmetic Operations

Before discussing measurement, let us look at the basic arithmetic operations, namely the axioms governing addition, subtraction, multiplication and division. Suppose we have real numbers a, b, c and d. We have the following laws governing these numbers:

Axioms of Equality
• a = a (reflexive)
• If a = b, then b = a (symmetric)
• If a = b and b = c, then a = c (transitive)
• If a = b and a + c = d, then b + c = d (substitution)
• If a = b and ac = d, then bc = d (substitution)
• If a = b and c = d, then a + c = b + d (addition)
• If a = b and c = d, then ac = bd (multiplication).
Axioms of Addition
• a + b is a unique real number
• a + b = b + a (commutative)
• (a + b) + c = a + (b + c) (associative)
• There exists a real number 0 such that a + 0 = a (additive identity)
• For each number a, there exists a unique number −a such that a + (−a) = 0 (additive inverse).
Axioms of Multiplication
• ab is a unique real number (closure)
• ab = ba (commutative)
• a(bc) = (ab)c (associative)
• There exists a real number 1 such that a1 = a (multiplicative identity)
• For each nonzero number a, there exists a unique real number a⁻¹ such that aa⁻¹ = 1 (multiplicative inverse).

Axioms of Addition and Multiplication
• a(b + c) = ab + ac and (b + c)a = ba + ca (distributive).

Ask yourself: do your data satisfy these axioms? Keep in mind that these axioms govern our basic mathematical operations of addition, subtraction, multiplication and division! We wonder how often you can answer the above question with "yes." As an example of our data, consider such a multiple-choice question as: Do you have any fever today? (yes, no, I do not know). Whatever scores you may give to these three response alternatives, do you think they will satisfy those basic axioms? Almost all the answers would be a definite no, with perhaps a few of probably not.
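To make this concrete, here is a minimal sketch in Python (an illustration added here, assuming only numpy and the standard library; the responses and the two codings are invented for the demonstration). It shows that a statistic such as the mean, computed on arbitrary codes for nominal responses, changes under a mere relabeling of the categories, while counting does not:

```python
from collections import Counter
import numpy as np

# Responses to "Do you have any fever today?" under two equally
# arbitrary codings; neither coding satisfies the axioms above.
answers = ["yes", "no", "I do not know", "no", "yes", "no"]
code_a = {"yes": 1, "no": 2, "I do not know": 3}
code_b = {"yes": 2, "no": 3, "I do not know": 1}   # a mere relabeling

mean_a = np.mean([code_a[a] for a in answers])     # about 1.83
mean_b = np.mean([code_b[a] for a in answers])     # about 2.33
print(mean_a, mean_b)   # the "mean response" is an artifact of the coding

# Counting is the only operation that survives relabeling:
print(Counter(answers)) # Counter({'no': 3, 'yes': 2, 'I do not know': 1})
```

The frequencies reported by the counter are unchanged no matter how the categories are labeled, which is exactly why counts, unlike the codes themselves, can be treated quantitatively.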
1.2.2 Data Types

Generally, we classify data into quantitative and non-quantitative (qualitative, categorical) types, where quantitative data satisfy those basic axioms. Such quantitative data can typically be subjected to traditional statistical analysis such as regression analysis, principal component analysis and the analysis of variance. For this type of quantitative data, there are many books on data analysis, and we will therefore leave the discussion of quantitative data to those books. This means that the topic of the current book is what we should do with data which cannot be subjected to the basic arithmetic operations.

Before we concentrate on our main topic, we should be aware that there are ordinal data such as rank orders of movies. Ordinal data are tricky because some quantitative information is involved in the data. Consider other examples of ordinal data, say, the ranking of beauty contestants, the ranking of the wealth of countries and the ranking of the popularity of movies. These data are typically coded by assigning such integers as ranks 1, 2, 3 and so on. These rank numbers may sound quantitative, but can we add them or multiply them to come up with meaningful quantities? Definitely not. Ranking does not have a rational unit, and the difference between rank 1 and rank 2 cannot be equated to the difference between rank 8 and rank 9. This simple demonstration makes it obvious that the basic mathematical operations are not applicable to rank order data. So we must transform such rank order data before subjecting them to data analysis. Can we transform rank order data in such a way that we can add or subtract the transformed ranks?

As will briefly be discussed later, rank order data have a specifically difficult problem, called ipsativity. For example, someone's rank 1 cannot be equated to another person's rank 1. Therefore, we cannot compare ranking data from different subjects. As noted above, ranks 1, 2, 3 and so on are not necessarily equally distant even within a subject. So, we can anticipate great difficulty in handling rank order data in a numerical way. The problem of ranking data will briefly be discussed in another chapter.

The other extreme end of this classification is non-quantitative data, also called qualitative data or categorical data. Some examples of this type of data are gender, countries, drinks, colors and religions, where the only quantitative operation one can use is counting (e.g., the numbers of left-handed, right-handed and ambidextrous students in a class). As for the variable which has the categories of left-handedness, right-handedness and ambidexterity, we cannot apply the basic arithmetic operations to those categories. By the same token, we cannot add, subtract, multiply or divide different genders, countries, drinks, colors or religions.

As we can see, the nature of the data has serious implications for how to handle them mathematically. Before we can carry out sensible analysis of data, we must know the data types and the arithmetic or logical operations permitted for those different types of data. How to handle such questions is the problem of measurement.

When the author studied experimental psychology in the 1950s in Japan, he did not learn statistics, because statistics and mathematics were not included in the curriculum for psychology students. Surprisingly, however, "measurement" was the first topic in the curriculum of experimental psychology. In retrospect, this choice of topic was very wise and commendable, for data collected in psychology were typically not directly amenable to the arithmetic operations. For example, consider preference of colors, personality types, responses to Rorschach inkblots, recollections of dreams, emotional reactions to movies and attitudes toward political systems. These are typical data that psychologists collect. How can we analyze them? We definitely need the knowledge of measurement.

Nowadays, however, it looks as though the topic of measurement is considered less important than it used to be, for today's curriculum of university education appears to place less importance on it than half a century ago. In spite of the technological advancement, particularly in computer science, the scrutiny of the nature of data seems to have been widely forgotten. Some twenty years ago, the author gave a talk about measurement problems at an interdisciplinary conference in Tokyo. One natural scientist was flabbergasted at the author's statement that the intelligence quotient (IQ) could not be subjected to the general arithmetic operations: the talk included an example of how absurd it is to state that person A with an IQ of 100 is twice as bright as person B with an IQ of 50. If a reasonable person pauses a moment and reconsiders the statement, the person would agree with it, but it is easily conceivable that on first impression this scientist doubted its correctness. This kind of incident was not anticipated in the 1950s, but for some reason or other researchers nowadays would rather doubt the accuracy of such a statement. What matters here is the lack of training in measurement issues for data analysis.
It is not clear why the topic of measurement has lost its full-fledged status in data analysis courses, but the topic is nevertheless important, and it warrants at least some discussion before we talk about the data analysis of non-quantitative data. This is why we start the book with measurement issues before discussing a special branch of data analysis.

Measurement is defined as the assignment of numbers according to certain mathematical rules. Let us keep in mind that the data we analyze are typically created through this measurement process unless the data are purely quantitative. In most cases, therefore, how to assign numbers to objects is the first serious task for data analysis, and it is crucial to identify what kinds of mathematical operations are appropriate for our data set.
1.3 Stevens' Theory of Measurement

In the social sciences, the best-known theory of measurement is Stevens' classification of measurement (Stevens, 1951), and his theory of measurement became a familiar topic for data analysis in the social sciences in the 1950s and 1960s. As computers became a routine tool for data analysis after the 1960s, Stevens' theory of measurement appears to have been left only for secondary consideration. However, his theory is still relevant and important, particularly when we want to analyze non-quantitative data as we do in the current book. We cannot afford to ignore it.

Let us start with a skeleton of Stevens' theory of measurement as summarized in Table 1.1, a version of his table slightly augmented by Nishisato (2022). The table contains information about the kinds of measurement, what they are, what operations are allowed, and typical examples of the four kinds of measurement. From this table, one can tell which data can be subjected to the ordinary arithmetic operations and which cannot. The information in this table offers a guide for the appropriate use of statistics. Misapplications of mathematical operations can lead us to the garbage-in, garbage-out situation. Let us agree, then, how relevant this topic of measurement is to our data analysis.

For those who are interested in its relevance to data analysis, there are very few books to direct our attention to the important contributions by Stevens. Although in Japanese, Nishisato (1975) discussed in an entire book what analytical methods are appropriate for each of the four types of measurement, and in another book in Japanese (Nishisato, 2010) how to choose analytical procedures appropriate for different types of measurement. Other researchers have also extended the scope of measurement under the term scaling (e.g., Bock & Jones, 1968; Coombs, 1964; Dunn-Rankin, 1983; Hand, 2004; Torgerson, 1958).

Let us now examine Stevens' four types of measurement, nominal, ordinal, interval and ratio measurement, in the context of data analysis.
Table 1.1 Stevens' four kinds of measurement

Nominal
  Basic operations: determining equality or identification
  Mathematical structure: permutation group x′ = f(x), where f(x) is any one-to-one substitution
  Permissible statistics: number of cases, mode, contingency correlation
  Typical examples: numbering of football players, type or model numbers, gender, body types

Ordinal
  Basic operations: determining greater or less
  Mathematical structure: isotonic group x′ = f(x), where f(x) is any increasing monotonic function
  Permissible statistics: median, percentiles, order correlation
  Typical examples: hardness of materials, quality of wool/leather/lumber, pleasantness of odors, movie rank

Interval
  Basic operations: determining equality of intervals or differences
  Mathematical structure: general linear group x′ = ax + b
  Permissible statistics: mean, standard deviation, Pearson's correlation
  Typical examples: temperature (F and C), test scores, calendar date, IQ scores

Ratio
  Basic operations: determining equality of ratios
  Mathematical structure: similarity group x′ = ax
  Permissible statistics: geometric mean, coefficient of variation, decibel, arithmetic operations
  Typical examples: length, weight, resistance, density, pitch (mels), loudness (sones)
1.3.1 Nominal Measurement

For nominal measurement, the numbers are used as labels, such as baseball players' back numbers and group numbers. The sole role of numbers at this level is identification. Therefore, nominal numbers cannot be added, subtracted, multiplied or divided to yield any meaningful numbers. Nominal data are often collected in survey studies, such as the following:

• Are you right-handed, left-handed or ambidextrous?
• Where is your residence, urban, suburban or rural?
• What is the color of your car, black, silver, red, blue, white or any other color?
• Where is your travel destination, North America, South America, Europe, Asia, Australia or Africa?
• What is your vaccination status, no vaccination, one vaccination, two vaccinations or more?
• Which kind of car do you have, electric, hybrid or gasoline?
• What is your political affiliation, liberal, conservative, new democratic or green?
• Retired players' back numbers such as 3 (Babe Ruth), 4 (Lou Gehrig), 7 (Mickey Mantle), 9 (Ted Williams), 42 (Jackie Robinson) and 44 (Hank Aaron).

Look at the last example of baseball players' back numbers. Notice that if one collects the numbers of rank 1 popularity votes for these players from a large group of people, the counts of rank 1 are no longer nominal measurement but ratio measurement, which can be subjected to the arithmetic operations. It is important to note this distinction: although the back numbers are nominal measurement, the popularity votes for those players can be subjected to the basic arithmetic operations, an example of ratio measurement as we will see shortly. Here we see a possibility of analyzing nominal measurements (e.g., back numbers) through popularity ratings or the number of games played and so on. This example shows how to bridge nominal measurement (back numbers) to ratio measurement (popularity counts). In the current book, we will learn how to assign mathematically manipulable numbers to those baseball players' back numbers with the aid of the popularity counts—this is essentially the background idea of quantification theory, as we will discuss later. Let us restate the above finding: using the counts, we can transform those nominal back numbers into ratio measurement, say popularity scores.

Let us look at a real example: Kretschmer's typology (Kretschmer, 1925). This German psychiatrist classified patients' body types (1 = pyknic type, 2 = leptosomatic type, 3 = athletic type, 4 = dysplastic type, 5 = others) and mental types (1 = manic-depressive type, 2 = schizophrenic type, 3 = epileptic type). See Table 1.2. Again, these numbers are nominal measurement and cannot be subjected to arithmetic operations. But, as with the example of the baseball players' popularity votes, the numbers of patients falling into the combinations of body types and mental types (e.g., the number of patients classified into the combination of the leptosomatic body type and the schizophrenic mental type) can be treated as ratio measurement, and are thus amenable to the arithmetic operations. This is a subtle bridge between nominal measurement and quantitative analysis. Indeed, quantification theory is an optimal method of analysis for this type of data. Let us consider Kretschmer's data of frequencies of nominal variables. When we subject this data set to quantification analysis, it provides two-dimensional coordinates of the three mental types and the five body types (note: the number of dimensions is given by the number of mental types minus 1; more generally, when the table is m × n, the dimensionality is given by the smaller of m and n minus 1, that is, for the 3-by-5 table, 3 − 1 = 2), as given in Table 1.3.
Table 1.2 Kretschmer's typology data

                    Pyknic   Leptosomatic   Athletic   Dysplastic   Others   Total
Manic-depressive       879            261         91           15      114    1360
Schizophrenic          717           2632        884          549      450    5232
Epileptic               83            378        435          444      166    1506
Total                 1679           3271       1410         1008      730    8098
Table 1.3 Two-dimensional coordinates of mental and body types

Variable            Dimension 1   Dimension 2
Manic-depressive          −1.09          0.16
Schizophrenic              0.14         −0.18
Epileptic                  0.50          0.48
Pyknic                    −0.96          0.11
Leptosomatic               0.16         −0.29
Athletic                   0.33          0.18
Dysplastic                 0.55          0.45
Others                     0.06          0.09
Fig. 1.1 Correspondence plot of components 1 and 2
These coordinates are determined so as to maximize the association between mental types and body types. Now that we have the coordinates of our variables, let us look at the so-called two-dimensional correspondence plot (Fig. 1.1). From this graph, we can roughly summarize the following relations:

• Mental type "manic-depressive" is close to the body type "pyknic" (chubby).
• Mental type "epileptic" is close to the body type "dysplastic" (unbalanced).
• Mental type "schizophrenic" is close to the body type "leptosomatic" (lean).
• Body type "others" does not seem to be associated with any mental type.
In this way, we can quantify nominal categories and visualize the configuration of the data. We should clearly state here that nominal measurement is the major domain of quantification theory, where we use the frequencies of nominal categories to assign numbers to them. Recall, however, that the data set must be a collection of responses to nominal categorical variables; quantification theory cannot do anything if the task is to find optimal values of, for example, the right-handed, left-handed and ambidextrous categories without any count (frequency) data on the categories. Thus, for example, we can assign values to those baseball players only when we collect data on, say, how many rank 1 votes they receive. In summary, nominal measurement is the main domain of quantification theory. Later we will see how the best possible quantities can be derived for nominal categories. What is "best possible"? From nominal categories to optimal category values? This may sound like an impossible task, but we will see that quantification theory offers optimal quantitative analysis of nominal data (e.g., Bock, 1960).
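As a preview of the computations developed in Parts II and III, coordinates such as those in Table 1.3 can be obtained from a singular value decomposition of the standardized residuals of Table 1.2. Here is a minimal sketch in Python (added for illustration, assuming numpy; depending on the scaling and sign conventions of a given quantification program, the values may differ from Table 1.3 in sign or by a constant factor):

```python
import numpy as np

# Kretschmer's data (Table 1.2): mental types (rows) by body types (columns)
F = np.array([[879,  261,  91,  15, 114],    # manic-depressive
              [717, 2632, 884, 549, 450],    # schizophrenic
              [ 83,  378, 435, 444, 166]],   # epileptic
             dtype=float)

P = F / F.sum()                        # correspondence matrix
r = P.sum(axis=1)                      # row masses (mental types)
c = P.sum(axis=0)                      # column masses (body types)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals

U, sv, Vt = np.linalg.svd(S, full_matrices=False)

# principal coordinates for rows (mental types) and columns (body types)
rows = U / np.sqrt(r)[:, None] * sv    # 3 x 3
cols = Vt.T / np.sqrt(c)[:, None] * sv # 5 x 3
print(np.round(rows[:, :2], 2))        # compare with Table 1.3 (up to sign)
print(np.round(cols[:, :2], 2))
```

The third singular value is essentially zero, confirming the two-dimensional structure noted above (3 − 1 = 2 dimensions).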
1.3.2 Ordinal Measurement

Question: Rank the following five movies according to the order of your preference:

1. Sound of Music
2. Manchurian Candidate
3. King Solomon's Mine
4. To Kill a Mockingbird
5. Dr. Zhivago
Ranks run from the most preferred to the least preferred, indicated by 1, ..., 5, respectively. Note that rank numbers are not amenable to the basic arithmetic operations, for the distance between rank 1 and rank 2 cannot generally be equated to the distance between rank 2 and rank 3, someone's rank 3 cannot generally be equated to someone else's rank 3, and so on. Thus, rank numbers are not amenable to the operations of addition and subtraction, not to mention division and multiplication.

Consider another example: Miss World contestants' rankings by judges. Suppose there were 20 contestants and five judges. Strictly speaking, rank orders are partly quantitative, but not quantitative enough to subject them to the arithmetic operations of addition, subtraction, division and multiplication. Why not? Because, as mentioned above, ranking numbers have neither equal units nor a rational origin. Furthermore, different judges may use different criteria for their judgments, making interjudge differences impossible to compare in any meaningful way.

Consider another example, cosmetics rankings by consumers. The data are again partially quantitative and pose the same concerns as stated above. Consumers have different criteria for cosmetics, such as making one look younger, being a good moisturizer or being inexpensive, and ranking is not of equal intervals and changes from rank to rank and from judge to judge. So, unlike nominal measurement, which is not amenable to any arithmetic operations, ordinal measurement has a number of problems that are difficult to handle mathematically. In terms of difficulty of mathematical manipulability, ordinal measurement follows nominal measurement.

Rank order data are the main data type of ordinal measurement, and we must admit that ranking conveys some quantitative information about goodness or performance superiority, but rank orders are not equal-distance variables. Quantification theory was developed for rank order data by several investigators (e.g., Carroll, 1972; Greenacre & Torres-Lacomba, 1999; Guttman, 1946; Nishisato, 1976, 1978; Slater, 1960; van de Velden, 2000), but ordinal measurement is not in the main domain of quantification theory, and the investigators mentioned above have found only approximations to the quantification problem of ordinal measurement. Further investigation is needed to arrive at a satisfactory stage of progress. From the viewpoint of measurement theory, the task of quantification theory for ordinal measurement can be stated as follows:

Quantification problem for ordinal measurement
• Find a multidimensional configuration of both objects and subjects such that the rankings of the distances between each subject and all the ranked objects obtained from quantification are equal to the rankings of the distances between each subject and the objects in the input data, and this should hold in multidimensional space.

The studies referenced above do not satisfy this statement; they offer only approximations to the exact matching of the rankings between the input data and the output. Hence, no exact solution is available as of now.
1.3.3 Interval Measurement

The majority of social science data fall into this category, and the most notable characteristic of interval measurement is the absence of a rational origin. This means that interval measurement cannot be subjected to division to yield a meaningful outcome (remember the earlier mention of the IQ comparison, where we noted the erroneous statement that a person with an IQ of 100 is twice as smart as a person with an IQ of 50). Imagine that most social science data cannot validly be divided! We should always be reminded of this restriction: interval measurement does not have a rational origin. So, redundant as it may be, let us repeat that the following statements are all false:

• An IQ of 0 means the person has no intelligence.
• An achievement test score of 0 means the person has no ability.
• A's IQ is 100 and B's IQ is 50; therefore A is twice as intelligent as B.
• Achievement test scores range from 0 to 100; the person with a score of 100 has complete ability, and the person with a score of 25 has one quarter of the ability of the person with a score of 100.

What do these mean? Although we may not believe it, many test scores, including IQ, have been treated as if they were ratio measurement, where a rational origin is defined—this is definitely wrong! So, please keep in mind that almost all test scores we construct and use provide not ratio measurement but, at best, interval measurement. Even this assertion of equal intervals with no origin may not be true for many achievement tests. Therefore, we can say at least that such mathematical operations as multiplication and division of those test scores are not appropriate. This statement applies to all interval measurement data.

From the analytical point of view, however, we have developed a number of statistics which are not weakened by the lack of an absolute origin. For example, such popular statistics as the variance, covariance and correlation are all free from the arbitrary origin of measurement, because those statistics are defined in terms of deviation scores, that is, the original scores minus the mean. Thus, even if the measurement lacks a rational origin, the effect of an arbitrary or undefined origin is totally removed by dealing with deviation scores. Those deviation scores are used to calculate such statistics as the correlation, covariance, variance and correlation ratio. When data are interval measurement, therefore, quantification theory is not needed.
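The invariance of deviation-based statistics is easy to verify numerically. The following is a small illustration in Python (assuming numpy; the temperature and accompanying values are made up for the demonstration, and the Celsius–Fahrenheit comparison anticipates the example in the next section):

```python
import numpy as np

celsius = np.array([10., 20., 25., 30.])   # interval measurement
fahrenheit = celsius * 9 / 5 + 32          # same temperatures, new origin and unit
y = np.array([12., 18., 22., 31.])         # some quantity observed alongside

# Ratios depend on the arbitrary origin, so they are meaningless:
print(celsius[1] / celsius[0])             # 2.0
print(fahrenheit[1] / fahrenheit[0])       # 1.36

# Deviation-based statistics are unaffected by the change of origin and unit:
print(np.corrcoef(celsius, y)[0, 1])       # identical ...
print(np.corrcoef(fahrenheit, y)[0, 1])    # ... correlations
```

Because the correlation is computed from deviation scores, the arbitrary origin cancels out, which is exactly the point made above.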
1.3.4 Ratio Measurement

Ratio measurement is full-fledged quantitative measurement: it has an origin and well-defined equal intervals (units); thus, all the basic mathematical operations can validly be applied to ratio measurement. Examples of ratio measurement are distance, weight and counts, where 0 is well defined. In other words, once we have data which are ratio measurement, we can apply any mathematical operations to yield meaningful outcomes; thus, there is no need for quantification theory for ratio measurement. Many books on statistics deal with data of ratio measurement, and we will therefore not spend more time on such data. It should be noted, however, that most data we deal with are not ratio measurement, and that we must carefully examine the nature of the measurement we are dealing with.
1.4 Concluding Remarks on Measurement

From the data analytic point of view, we see many problems in dealing with interval measurement. As repeatedly mentioned, the lack of a rational origin is often ignored in practice. Let us provide one more example of a mistake: consider data measured in Celsius, 10 °C and 20 °C. Looking at these two numbers, one may be tempted to say that the latter is twice as hot as the former. We now know that this is absurd: just change them to the Fahrenheit scale. 10 °C = 50 °F and 20 °C = 68 °F, and 68 °F is clearly not twice 50 °F. So, please remember that division or multiplication of interval measurement yields uninterpretable outcomes. Use deviation scores from the arbitrary mean (i.e., original scores minus the mean), which are free from the effect of the arbitrary origin; deviation scores are amenable to division and multiplication.

As mentioned earlier, the handling of ordinal measurement from the viewpoint of quantification theory leaves room for further elaboration. We must continue to strive for a better method of quantification analysis of ordinal measurement than we have now.

As discussed above, nominal measurement is the least quantitative measurement. Luckily, however, quantification theory has been developed to deal with nominal measurement by using frequencies of nominal categorical variables. As a concluding remark on Stevens' theory of measurement, let us introduce the quantification theory of nominal measurement.
1.5 Task of Quantification Theory

As clarified above, our quantification theory deals with, in some sense, the least quantitative data, that is, nominal measurement. How can we process such non-quantitative information quantitatively? Furthermore, we will learn that quantification theory provides an optimal analysis of non-quantitative data. To satisfy our curiosity, we will look at a sample data set and a statement of the objective of the task, so that we may understand why we need some mathematical background for it. Without any discussion of the mathematics yet, let us look at the setup of the quantification task, using a small numerical example which has been used a number of times (e.g., Nishisato, 1980, 2022): twenty-nine (29) students were asked to evaluate the performance of three teachers in terms of three evaluation categories: good, average and poor. The object of quantification theory is to quantify the three teachers and the three evaluation categories (note: the three evaluation categories, good, average and poor, are ordinal measurement, but we treat them purely as categorical, that is, as nominal measurement). The data are tabulated in the 3 × 3 (teachers-by-rating-categories) table as in Table 1.4:
Table 1.4 Evaluation of three teachers

Teacher   Good   Average   Poor   Total
White^a      1         3      6      10
Green        3         5      2      10
Brown        6         3      0       9
Total       10        11      8      29

^a In the original use (Nishisato, 1980), these teachers were identified as 1, 2 and 3; but later these were changed to white, green and brown, respectively

Table 1.5 Teacher evaluation data expressed by unknown numbers (×k denotes k identical response pairs)

Teacher   Good           Average        Poor
White     (y1, x1)       (y1, x2) ×3    (y1, x3) ×6
Green     (y2, x1) ×3    (y2, x2) ×5    (y2, x3) ×2
Brown     (y3, x1) ×6    (y3, x2) ×3    0
In quantification theory, we assign unknown values to the three teachers and the three evaluation categories. Therefore, our input for analysis is a collection of those unknowns. Can we determine the values of those unknowns? Yes, we can, by introducing an appropriate set of constraints on the unknowns (this will be discussed later). First, we express our data in terms of two sets of unknowns, one set for the three teachers in the rows of the table (y1, y2, y3) and the other for the three evaluation categories in the columns (x1, x2, x3). These six variables are unknown numbers that we want to determine in the best possible way, which we call an optimal way. Since each element of our cross-table can be represented by the corresponding teacher (row weight) and evaluation category (column weight), the data can be expressed in terms of the two sets of corresponding unknowns as in Table 1.5. Our task is to determine these two sets of unknowns in a mathematically optimal way, that is, in the most efficient and exhaustive way under an appropriate set of constraints on the unknowns. As our constraints, we typically use the conditions that (1) the sum of the quantified data is zero, and (2) the sum of squares of the quantified responses is equal to the total number of responses in the table.
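Although the machinery behind this occupies later chapters, a minimal numerical sketch may help fix ideas. The Python snippet below (our own illustration, not the book's program) obtains one standard solution of exactly this kind: the first nontrivial component of the singular value decomposition of the standardized teacher-by-category table. The resulting weights automatically satisfy zero-sum constraints of the type just described:

```python
import numpy as np

# Table 1.4: teachers (rows) by evaluation categories (columns).
F = np.array([[1.0, 3.0, 6.0],
              [3.0, 5.0, 2.0],
              [6.0, 3.0, 0.0]])

N = F.sum()
P = F / N                       # correspondence matrix
r = P.sum(axis=1)               # row masses (teachers)
c = P.sum(axis=0)               # column masses (categories)

# Standardized residuals; centering by the margins removes the trivial
# constant solution, so the SVD returns only nontrivial components.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

y = U[:, 0] / np.sqrt(r)        # optimal row weights (teachers)
x = Vt[0, :] / np.sqrt(c)       # optimal column weights (categories)

print("teacher weights: ", np.round(y, 3))
print("category weights:", np.round(x, 3))
print("zero-sum checks: ", round(r @ y, 10), round(c @ x, 10))
print("maximized correlation:", round(sv[0], 3))
```

The weighted sums r @ y and c @ x are zero by construction; rescaling the weights so that the weighted sum of squares equals the number of responses is a matter of normalization convention and does not change the solution.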
Note the trick here: by introducing these constraints on the sets of unknowns, we become able to determine the values of the unknowns! The only major difference between the values obtained from quantification and ratio measurement is that ratio measurement has an absolute origin and unit, whereas quantification has an arbitrarily introduced unit and origin. These constraints have been widely used in the quantification literature, and under them all the values of the unknowns can be determined. Note that nominal measurement has neither a rational origin nor a unit, while ratio measurement has both; this arbitrariness is the price we must pay in order to upgrade the original nominal measurement to quantitative data. A crucially important aspect of quantification is that we do not impose any assumption on the form of the relations (linear or nonlinear) between those unknowns; we leave the determination of the relations entirely to the given data. Thus, the outcome may be linear or nonlinear relations in unidimensional or multidimensional space. In this regard, this is a totally data-oriented approach to data analysis. This is the task of quantification theory, and, as we immediately note, we are dealing with data expressed solely in terms of unknowns. We could determine those unknowns in any way we like, but our choice is to determine them in the mathematically optimal way under the given constraints. Remember that Bock (1960) called quantification theory optimal scaling. One remarkable aspect of this quantification problem is that the optimal solution always exists under normal circumstances. The above example shows that quantification theory deals with quite a different problem from ordinary data analysis: we analyze non-quantitative input and produce quantitative output, which may be linear or nonlinear, unidimensional or multidimensional, depending entirely on the data we analyze. This is a beautiful characteristic of quantification theory. It is hoped that the topic of measurement is kept in mind as a matter of the utmost importance; different levels of measurement require different methods of analysis. So, identify appropriate methods of data analysis for different types of measurement. In Chap. 2, we will look at a few more topics on data analysis before we discuss some introductory mathematical background for the eventual discussion of quantification theory.
References

Bock, R. D. (1960). Methods and applications of optimal scaling (Research Memorandum No. 25). The University of North Carolina Psychometric Laboratory.
Bock, R. D., & Jones, L. V. (1968). Measurement and Prediction of Judgment and Choice. San Francisco: Holden-Day.
Carroll, J. D. (1972). Individual differences and multidimensional scaling. In R. N. Shepard, A. K. Romney, & S. B. Nerlove (Eds.), Multidimensional Scaling: Theory and Applications in the Behavioral Sciences (Vol. I). New York: Seminar Press.
Coombs, C. H. (1964). A Theory of Data. New York: Wiley.
Dunn-Rankin, P. (1983). Scaling Methods. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Greenacre, M. J., & Torres-Lacomba, A. (1999). A note on the dual scaling of dominance data and its relationship to correspondence analysis. Working Paper Ref. 430, Departament d'Economia i Empresa, Universitat Pompeu Fabra, Barcelona, Spain.
Guttman, L. (1946). An approach for quantifying paired comparisons and rank order. Annals of Mathematical Statistics, 17, 144–163.
Hand, D. J. (2004). Measurement Theory and Practice: The World through Quantification. London: Oxford University Press.
Kretschmer, E. (1925). Physique and Character: An Investigation of the Nature of Constitution and of the Theory of Temperament; with 31 Plates. London: Kegan Paul, Trench, Trubner.
Nishisato, S. (1975). Applied Psychological Scaling. Tokyo: Seishin Shobo Publisher (in Japanese).
Nishisato, S. (1976). Optimal Scaling as Applied to Different Forms of Categorical Data. Toronto: Department of Measurement and Evaluation, OISE.
Nishisato, S. (1978). Optimal scaling of paired comparison and rank order data: An alternative to Guttman's formulation. Psychometrika, 43, 267–271.
Nishisato, S. (1980). Analysis of Categorical Data: Dual Scaling and Its Applications. Toronto: The University of Toronto Press.
Nishisato, S. (2010). Data Analysis for Behavioral Sciences: Use of Methods Appropriate for Information Retrieval. Tokyo: Baifukan (in Japanese).
Nishisato, S. (2022). Optimal Quantification and Symmetry. Singapore: Springer Nature.
Slater, P. (1960). Analysis of personal preferences. British Journal of Statistical Psychology, 13, 119–135.
Stevens, S. S. (1951). Mathematics, measurement, and psychophysics. In S. S. Stevens (Ed.), Handbook of Experimental Psychology. New York: Wiley.
Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17, 401–419.
Torgerson, W. S. (1958). Theory and Methods of Scaling. New York: Wiley.
van de Velden, M. (2000). Dual scaling and correspondence analysis of rank order data. In R. D. H. Heijmans, D. S. G. Pollock, & A. Satorra (Eds.), Innovations in Multivariate Statistical Analysis. Dordrecht: Kluwer Academic Publishers.
Chapter 2
Data Analysis and Likert Scale
Before we move on to the introductory chapters on the necessary mathematical background, let us take a moment to look at today's data analysis as reported in the mass media, which often makes us wonder whether such reports convey correct or even understandable information. In Chap. 1, we discussed the important topic of measurement. When data are ratio or interval measurement, most researchers are likely to use available statistical procedures, and typically we do not see problems with such analyses. The two cases given below, however, are small examples of inappropriate data analysis even when data are ratio measurement. These cases may be trivial, but they show that problems arise even with quantitative data, where we do not anticipate any. They can easily be rectified, but we mention them anyway as a reminder that we must pay careful attention to what we do in data analysis. After glancing at these two mundane cases, we move to the core of this chapter: a case which involves ordinal measurement and whose misapplications are almost universal in today's survey reports. In Chap. 1, we discussed some serious problems in dealing with ordinal measurement; one method which flourished from the 1930s onward is still used without much scrutiny, yielding countless mishandlings of ordinal data. This problem must be mentioned here as a serious precaution for many survey studies. It is the problem associated with the so-called Likert scale (scores, data). Since we already noted that the analysis of ordinal measurement is a murky area of data analysis, let us spend a moment providing some assistance in handling today's popular analyses involving Likert scores, before we jump into our major topic of quantification theory. Note that these three cases are not related to quantification theory, but we would like to point out that problematic cases are not unique to the analysis of nominal data. If your interest is only in quantification theory, please feel free to skip this chapter.
2.1 Two Examples of Uninformative Reports

Nowadays, we are so inundated with reports on data analysis, such as crime statistics, accident statistics, homicide statistics, health statistics and election statistics, that those reported statistics are generally trusted by most people, and questions about them are rarely raised even by critics. Occasionally, however, there are reports which are not helpful at all for understanding what is going on. The following two cases are minor examples, but they are examples in which simple precautions can rectify the problems.
2.1.1 Number of COVID Patients

During the COVID-19 pandemic, each municipality or province reported the number of newly identified positively tested people, such as 2356, for a day. This number sounds very serious, but the real situation is certainly much worse than the number indicates. Why? Because the reported number is based only on those people who took the test and tested positive. Recall, however, that most residents of the community were typically not tested, for a number of reasons: fear, the lack of testing sites, the absence of a system to test all residents, the lack of testing kits, and so on. This means that the reported number does not honestly describe the situation of the community. Nevertheless, public news media have reported such numbers as indicators of the spread of the virus. As most readers would agree, a statistic reported under this kind of condition completely lacks validity. What should have been done to report a more reliable number of new COVID patients? We should test a random sample of the community and then estimate how many new COVID patients arise daily in the entire community. During the COVID years, the most widely reported numbers of new patients were typically based on volunteers, and such numbers are most likely grossly underestimated, thus useless. For the entire community, one must devote resources to testing random samples and then statistically estimate the real number of new patients. The key is random sampling, a task which may be difficult for media companies to implement, but which is definitely needed to arrive at a valid estimate.
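For illustration only, with entirely hypothetical figures, the estimation step might look like this in Python:

```python
import math

population  = 2_500_000   # hypothetical number of residents
sample_size = 10_000      # randomly sampled and tested
positives   = 180         # positives found in the random sample

p_hat = positives / sample_size              # estimated daily prevalence
estimated_patients = p_hat * population      # scaled to the whole community

# Normal-approximation 95% confidence interval for the proportion.
se = math.sqrt(p_hat * (1.0 - p_hat) / sample_size)
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se

print(f"estimated prevalence: {p_hat:.2%}")
print(f"estimated patients:   {estimated_patients:,.0f}")
print(f"95% interval:         {lo * population:,.0f} to {hi * population:,.0f}")
```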
2.1.2 Number of Those Vaccinated

The second case is much simpler than the first one. The TV news reported that the number of people who have had two vaccinations plus one booster for COVID-19 has exceeded 7 million in the province. Such an impressive number, however, is hardly interpretable unless we know the total number of residents in the province.
To make it interpretable, that number should be converted to the percentage of those who have received two vaccines and one booster out of the population of the province. Whenever we talk about community health, we need a report not of the absolute number for a chosen group without the total population, but of the relative number, the percentage. Since the daily number of new patients has been reported (even if based on volunteers only), the same government agencies must know the total number of residents in the province. So, at least the percentage of those who have received two vaccinations and one booster can easily be calculated, and it would be helpful for seeing how well the province is doing in preventing the spread of COVID-19. These two examples are cases in which the reported data are counts (quantitative), and thus amenable to the arithmetic operations of addition, subtraction, multiplication and division. For researchers, such data are the simplest kind to handle properly. Yet a lack of common sense has created unbelievable situations surrounding the use of numbers. We need valid statistics instead of statistics based on inappropriate sampling (Example 1) or an inappropriate statistic (Example 2). From the data analytic point of view, these are two examples which one can easily rectify to come up with appropriate statistics. In both situations the data are quantitative, and there is no problem in handling them by basic mathematical operations; the problems are due to invalid handling of the data, and there is nothing wrong with the data themselves. Let us now use extra space to discuss the third case, which arises in dealing with ordinal measurement. Because of the popularity of this type of data, we need to devote much more space than a few paragraphs to explain and clarify what is wrong with the current practice in dealing with ordinal measurement. The case in point is what we typically call Likert scaling (scoring).
2.2 Likert Scale, a Popular but Misused Tool

When we look at data analysis today, there is one outstandingly serious and widespread problem, namely, the inappropriate use of Likert scale data. Once you see an example, you will immediately recognize the Likert scales you have encountered at hospitals or government offices or in surveys of various types. Likert (1932) proposed, in his Ph.D. thesis, a way to handle ordered multiple-choice categories: a coding method that assigns integers to ordered response categories such as (never, sometimes, often, always). The idea is very simple and looks valid. The Likert scale has been the most popular, universally adopted method, used everywhere in medical clinics, educational institutions, consulting companies, universities and government offices. It is a very simple, practical method for data collection. Let us look at a few examples of Likert scales. As most readers are familiar with, data are collected by asking people to choose appropriate response categories such as
• never = 1, sometimes = 2, often = 3, always = 4.
• cold = 1, lukewarm = 2, warm = 3, hot = 4.
• weak = 1, strong = 2.
• poor = 1, good = 2, excellent = 3.
There is nothing wrong with collecting data using Likert scales, so please do not worry at all about using them to collect data. The real problem is how such data are handled. It is customary to use these ordered integers as quantitative data for analysis, treated almost always as if they were ratio measurement. This practice is indefensible and almost always wrong: Likert scores are simply "codes," not "scores." Let us reminisce about the old days. When Likert proposed his scale, one of the dominant and popular research topics was to construct a unidimensional scale of, for example, anxiety, attitudes or personality traits. Likert's coding method was so simple and convincing that it appealed to many researchers of the time. The problem is that most people have used it not as a coding method (it is a superb method for coding) but as data (scores) for analysis; that is, his coding method is now used as a scoring method. This is wrong unless one lives in a strictly unidimensional world of data. Over the years, his ordered scale became the most popular scoring method, yielding Likert scores. For modern-day data analysis, this is absolutely a wrong way to use the Likert scale! The reason is obvious: such ordered scores filter out most of the information content in the data. Times have changed since Likert's days. Nowadays, researchers are interested in all kinds of information in data, such as multidimensional personality traits and attitudes, and in both linear and nonlinear relations between variables; Likert scores stand in the way of such extended analysis. Why should we restrict our attention only to linear, unidimensional relations between variables, as is implicitly implied by the use of the Likert scale as scores? The change of times and interests, however, has not made any impact on the popularity of Likert scores, and today Likert scores are everywhere: in survey analyses, medical checklists and evaluations of employees. But note that Likert scores cannot directly be subjected to data analysis without prior processing into valid scores! No one seems to care whether the phenomena we are trying to analyze are unidimensional or multidimensional, linear or nonlinear, even though data analysis is expected to tell us what kinds of information are embedded in our data. The validity of Likert scores depends solely on what kinds of relations between variables we are interested in analyzing. Many researchers who use Likert scores are convinced that whatever phenomena are captured by the data will be retrieved from the analysis of Likert scores. But the major
problem with Likert scores is that they can capture only linearly related phenomena, and not necessarily very effectively. Many studies using Likert scores read as though the researchers were unaware that Likert scores limit their sight to unidimensional, linear relations. The fact is that most data we collect contain a great deal of nonlinear and multidimensional information. In many survey studies with Likert scores, the researchers are totally unaware that Likert scores cannot capture anything other than linear relations between variables. If most data contain nonlinear relations, why still use the Likert scale as scores? One step that some investigators take is to go further and impose order constraints in analyzing Likert scales, namely, to derive scores for Likert codings in such a way that the numerals x1, x2, x3, x4, assigned to the categories (never, sometimes, often, always), respectively, satisfy the order constraint

x4 ≥ x3 ≥ x2 ≥ x1.

This is consistent with Likert's original consideration, but is such an order constraint on the categories of a Likert scale really appropriate or necessary? Our answer is "definitely not." It is typically absurd to impose an order constraint on Likert categories. Of course, we understand the intention of such order constraints, for it is probably what Likert originally had in mind. But, as we have discussed, order constraints on Likert categories limit the scope of data analysis to linear relations only, which amounts to a very small portion of the information contained in the data. We will see why the order constraint is meaningless, using numerical examples. Let us state that:

• Likert scores work well only when all the variables are so highly correlated that the data set, if subjected to factor analysis or principal component analysis, reveals only one dominant component.
• Likert scores do not work when the set of variables contains more than one dominant factor or component, namely, when the data are multidimensional.

If all the variables are very highly correlated, the data capture one highly dominant phenomenon. In this case, if we plot the average scores on the vertical axis against the Likert scores on the horizontal axis, the graph will show a linear relation whenever Likert scores are appropriate for the data set (Nishisato, 1975, 1980). Thus, the test of linearity in data can easily be done in this way. On the other hand, if there are many nonlinear relations in the data, Likert scores are destined to fail in capturing the essential part of the information. Let us use this method to test the linearity of data, that is, the validity of Likert scores, on a few examples; a small computational sketch of the test follows.
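The test itself is mechanical. A minimal sketch (Python; our own illustration, not the book's code) computes both sets of category means from a contingency table and any given category scores, ready to be plotted against those scores:

```python
import numpy as np

def category_means(F, row_scores, col_scores):
    """Category means of a contingency table F: each row (column) category
    is scored by the mean of the other variable's scores over its responses."""
    F = np.asarray(F, dtype=float)
    row_means = F @ col_scores / F.sum(axis=1)
    col_means = F.T @ row_scores / F.sum(axis=0)
    return row_means, col_means

# If plotting row_means against row_scores and col_means against col_scores
# gives two (nearly) straight monotone lines, Likert scores pass the test;
# strong curvature means they should not be used for the data set.
```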
2.2.1 How Does Likert Scale Work?

Let us look at a few numerical examples to see how Likert scores work in practice. In this examination, we plot the average Likert scores of the ordered categories (vertical axis) against the Likert scores (horizontal axis), following Nishisato (1975, 1980). If both plots, for rows and for columns, show linearly ascending (or descending) graphs, we can say that Likert scores are appropriate for the data; if the two plots show large departures from a monotonically increasing (or decreasing) graph, we conclude that Likert scores fail to capture much of the information embedded in the data and should therefore not be used. This is an effective way to assess the appropriateness of Likert scores for a given data set. Suppose that we then replace the Likert scores with the optimal scores obtained by quantification theory. It is important to know that quantification theory always yields perfectly linear graphs of the category means as a function of the category scores; Nishisato (1980) has shown that the slope is equal to the maximal correlation between the two variables. We will see later that the so-called method of reciprocal averages, one of the techniques of quantification theory, shows gradual changes of an originally nonlinear graph into a perfectly linear graph; when the plot reaches a perfect line, we conclude that the process of reciprocal averaging has converged to the optimal weights. This is the demonstration Nishisato used in his numerous workshops on dual scaling to show the limited utility of the Likert scale in empirical research. For the method of reciprocal averages, Likert scores can serve as initial scores for the convergent process, and we can then see how much important information Likert scores miss in describing the data. In the next section, we look at Likert scores applied to three numerical examples: one where Likert scores work well and two where Likert scores fail completely to capture the nonlinear information embedded in the data.

Effects of Sleeping Pills on Sleeping

This example is reported in Nishisato (1980) and is also discussed in several publications (e.g., Nishisato, 2007). Subjects were asked the following two multiple-choice questions, and the data are summarized in Tables 2.1 and 2.2.

Q1: What do you think of taking sleeping pills? (1) strongly for; (2) agree; (3) indifferent; (4) against; (5) strongly against.
Q2: Do you sleep well every night? (1) never; (2) rarely; (3) sometimes; (4) often; (5) always.

Let us use centered Likert scores for the ordered categories of the two questions, namely −2, −1, 0, 1 and 2 for the five ordered categories of each set. How appropriate are these scores, and how can we investigate their appropriateness? To answer these questions, we use what Nishisato (1975, 1980) suggested: first, calculate the mean of each category using these scores, and plot the means as a function of the Likert scores. For example, the mean of the category Never is
Table 2.1 Sleeping pills and sleeping

                    N^a    R    S    O    A   Sum   Likert score
Strongly for         15    8    3    2    0    28   −2
Agree                 5   17    4    0    2    28   −1
Indifferent           6   13    4    3    2    28    0
Against               0    7    7    5    9    28    1
Strongly against      1    2    6    3   16    28    2
Sum                  27   47   24   13   29   140
Likert score         −2   −1    0    1    2

^a Note: N = never; R = rarely; S = sometimes; O = occasionally; A = always

Table 2.2 Likert scores and means

Row score   Mean    Column score   Mean
−2          −1.2    −2             −1.3
−1          −0.5    −1             −0.8
 0           0.4     0             −0.6
 1           0.5     1              0.6
 2           1.3     2              1.1
[15 × (−2) + 5 × (−1) + 6 × 0 + 0 × 1 + 1 × 2]/27 = −1.2.

Likewise, we calculate the means of the row categories and those of the column categories, as summarized in Table 2.2. Now, plot those averages (on the vertical axis) against the original scores −2, −1, 0, 1, 2 (on the horizontal axis), as seen in Figs. 2.1 and 2.2. Each of the two lines is relatively close to a straight line. On the basis of this observation, we can conclude that the original Likert scores work well for this example. The graph indicates that the harder it is to sleep, the more agreeable one is to taking sleeping pills: there is a linear relation between sleep and pills. Although this example has shown that Likert scores are appropriate here, we may ask whether there exists a better scoring method. The answer is yes: use quantification theory, that is, optimal scores from quantification theory in lieu of Likert scores. Let us apply quantification theory to this example. The first set of optimal scores for the categories, and the means calculated from the optimal scores, are given in Table 2.3. If we plot those weighted category means against the optimal scores, we see a perfect linear relation for each of the two questions (Figs. 2.3 and 2.4). As we will learn later, these scores, obtained by quantification theory, are mathematically optimal: they extract the maximal amount of
Fig. 2.1 Likert scores and means (sleeping pill)

Fig. 2.2 Likert scores and means (sleeping)

Table 2.3 Optimal scores and means

Row score   Mean     Column score   Mean
−1.30       −0.84    −1.20          −0.78
−0.59       −0.38    −0.64          −0.42
 0.43        0.28    −0.49          −0.32
 0.58        0.38     0.87           0.56
 1.55        1.00     1.47           0.95
Fig. 2.3 Optimal scores and means (sleeping pill)
Fig. 2.4 Optimal scores and means (sleeping)
information from the data, and it is known that they maximize the correlation between sleeping pills and sleep. In fact, the slope of each of the graphs is equal to the maximized correlation coefficient. So, the first example is a case in which Likert scores are appropriate for the data set; we should mention that such a case is very rare. We should also note that quantification theory further improves the results, producing perfectly linear relations between category scores and means. One way to show the superiority of quantification theory over Likert scores, even for the current example, is to show that the row-column correlation from Likert scores is lower than that obtained from quantification theory. So, even when Likert scores work well, quantification theory can still improve on them. This is a remarkable property of quantification theory.
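A hedged sketch of that comparison (Python, reusing the frequencies of Table 2.1): the function below computes the row-column correlation induced by any pair of score vectors, so Likert scores can be contrasted directly with the optimal scores of Table 2.3:

```python
import numpy as np

# Table 2.1 frequencies: rows = attitude toward pills, columns = sleep.
F = np.array([[15.0,  8.0, 3.0, 2.0,  0.0],
              [ 5.0, 17.0, 4.0, 0.0,  2.0],
              [ 6.0, 13.0, 4.0, 3.0,  2.0],
              [ 0.0,  7.0, 7.0, 5.0,  9.0],
              [ 1.0,  2.0, 6.0, 3.0, 16.0]])

def table_correlation(F, u, v):
    """Pearson correlation of the row variable scored u and the
    column variable scored v over the joint distribution in F."""
    p = F / F.sum()
    mu_u, mu_v = p.sum(axis=1) @ u, p.sum(axis=0) @ v
    cov = u @ p @ v - mu_u * mu_v
    var_u = p.sum(axis=1) @ (u - mu_u) ** 2
    var_v = p.sum(axis=0) @ (v - mu_v) ** 2
    return cov / np.sqrt(var_u * var_v)

likert = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(round(table_correlation(F, likert, likert), 3))
# The optimal scores of Table 2.3 raise this correlation to its maximum,
# the first nontrivial singular value of the standardized table.
```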
Table 2.4 Body strength and age

                     Strength
Age                  Up to 10 kg   11–25   26–40   Over 40
Younger than 15               15      20       3         0
16–40                          0      12      18        26
41–65                          1      16      25         6
Over 65                       16       9       3         0

Table 2.5 Likert scores and corresponding means

Row score   Mean    Column score   Mean
1           1.68    1              2.56
2           2.86    2              2.25
3           2.75    3              2.62
4           1.53    4              2.50
Fig. 2.5 Likert scores and means (age)
Body Strength as a Function of Age

It is very rare that Likert scores are successful in capturing the information in the data reasonably well. The above example is exceptional. The next example is a typical one. Consider the following two questions:

Q1: What is your age category? (1 = younger than 15; 2 = 16–40; 3 = 41–65; 4 = over 65)
Q2: What is the maximum weight you can lift with one hand? (1 = up to 10 kg; 2 = 11–25 kg; 3 = 26–40 kg; 4 = more than 40 kg)

One hundred and seventy subjects answered these two questions, and the data collected are listed in Table 2.4.
Fig. 2.6 Likert scores and means (weight)

Table 2.6 Optimal scores and means (age and weight)

Row score   Mean     Column score   Mean
 0.79        0.56     1.19           0.98
−0.73        0.08     0.25           0.18
−0.35       −0.25    −0.47          −0.33
 0.99        0.71    −0.92          −0.66
As in the first example, let us calculate the means of the two sets of categories using Likert scores (Table 2.5), and plot the means against the Likert scores as shown in Figs. 2.5 and 2.6. The results are totally different from those of the first example. We no longer see any linear relation between the Likert scores and the means; each of the two lines is far from a linearly ascending or descending line. These graphs show a total failure of Likert scores in capturing the information in the data. The main reason for this failure is that body strength and age are related nonlinearly, not linearly. We can easily tell from the data that body strength is weak when the subject is very young, increases with age up to a certain age group, and then becomes weaker again as one gets older. This is an example of a nonlinear relation between two variables (body strength and age), and Likert scores utterly fail to capture nonlinear relations in the data. What happens if we replace Likert scores with optimal weights obtained from quantification theory? Table 2.6 shows the optimal weights for the rows and the columns of the data and the means calculated from them. When the average scores are plotted against the optimal category weights, we obtain a perfect linear graph, as shown in Figs. 2.7 and 2.8. Note that the Likert scores on the horizontal axis of the previous graphs are now replaced with the optimal scores. Look at Fig. 2.7.
Fig. 2.7 Optimal scores and means (age)
Fig. 2.8 Optimal scores and means (weight)
(Note: the signs of the optimal quantities can be reversed, as is the case in this example; either way, they maximize the criterion, such as the row-column correlation. Should this happen, simply reverse the signs of the optimal weights. In our current example the sign is reversed, and we interpret the results accordingly.) After reversing the signs of the quantification output, Fig. 2.7 shows that the weakest group is the one over 65 years of age, the second weakest is the group younger than 15, then the 41–65 group, and the strongest is the 16–40 group. This makes sense. The signs of the corresponding weights are also reversed (Fig. 2.8). In terms of our data, the lightest category is up to 10 kg, then 11–25 kg, then 26–40 kg, and the heaviest is over 40 kg, as our common sense would also tell us. The intervals between adjacent weights can be inferred from their positions on the straight line in the graph. From this example, we can see the nonlinear relation between age and the weight one can lift with one hand. This clear relation can never be observed from
Table 2.7 Preference of tea as a function of water temperature (columns: temperature from 1 = completely frozen to 10 = boiling hot; rows: preference from 1 = best to 7 = worst)

Preference     1    2    3    4    5    6    7    8    9   10
1 (Best)       0    0    1    0    0    1   13    5    0    0
2              0    3    8    1    0    9    2   15    5    0
3              0    6    5    2    0    3    1    2   10    0
4              1    1    0    8    1    1    0    1    2    0
5              2    0    0    1    9    0    0    0    1    1
6              5    0    0    0    1    0    0    0    0   11
7 (Worst)     21    0    0    0    0    0    0    0    0   18
the analysis of Likert scores. In other words, Likert scores lead us to an utterly uninterpretable output. (Note: although we used here only the first set of optimal weights for rows and columns, the current data have a multidimensional structure, and in practice we would extract more components. We do not look at the remaining components at this stage, but we will when we fully discuss quantification theory. For each component, the graph of ages and the graph of body strength show identical slopes. In later chapters, we will show that this angle is given by θ = cos⁻¹ ρ (Nishisato, 1988; see also Nishisato & Clavel, 2003), where ρ is the maximized correlation between the age and body strength categories. This correlation is the maximal nonlinear correlation between age and body strength; one may wonder about the term nonlinear correlation, but it is used here to designate Pearson's linear correlation of nonlinearly transformed variables. Other interesting statistics associated with quantification results will be explained later.)

Preference of Tea as a Function of Water Temperature

Let us look at another example, in which subjects were asked to indicate their preference for tea in relation to the water temperature. We can easily guess that some people prefer iced tea, while others like hot tea better; most people would hate frozen or boiling-hot tea, and lukewarm tea is not much liked either. In short, this is a case where we would anticipate a nonlinear relation between tea preference and water temperature. The preference was evaluated on a seven-point scale from 1 (the most preferred) to 7 (the least preferred), with the water temperature controlled on a ten-point scale from 1 = completely frozen to 10 = boiling hot. The 7 × 10 preference-by-temperature contingency table is given as Table 2.7.
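As a tiny numerical illustration of the angle formula in the note above (with a made-up value of ρ):

```python
import math

rho = 0.65                              # hypothetical maximized correlation
theta = math.degrees(math.acos(rho))    # angle between the two configurations
print(f"theta = {theta:.1f} degrees")   # rho = 1 gives 0 deg, rho = 0 gives 90 deg
```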
Table 2.8 Likert scores and means of preference and water temperature

Row score   Mean     Column score   Mean
 3          −1.05     4             −2.58
 2          −2.10     3              1.20
 1          −0.19     2              1.71
 0           0.50     1              0.25
−1          −0.14     0             −1.00
−2          −1.41    −1              2.75
−3           4.31    −2              2.04
                     −3              1.06
                     −4             −2.77

Table 2.9 Optimal scores and means (preference and temperature)

Row score   Mean     Column score   Mean
 1.02        0.24    −1.16          −1.09
 0.90        0.11     0.81           0.28
 0.81        0.78     0.90           0.25
 0.36        0.35     0.42           0.33
−0.52       −0.50    −0.52          −0.48
−1.21       −1.17     1.02           0.26
−1.23       −1.19     0.91           0.21
                      0.73           0.45
                     −1.24          −1.16

Fig. 2.9 Likert scores and means (preference)
Fig. 2.10 Likert scores and means (temperature)
These two plots (Figs. 2.9 and 2.10) show two nonlinear relations, which means that Likert scores have failed to capture the relation between the preference for tea and the water temperature. In other words, Likert scores give us only meaningless information, and we must conclude that they cannot be used for this data set. This is another example where Likert scores totally fail to capture the information in the data, the key reason being that the relation between tea preference and water temperature is nonlinear. Let us subject the data to quantification analysis to obtain the two sets of optimal weights for the two categorical variables, and then calculate the means of the two sets of categories using the corresponding optimal weights, as given in Table 2.9. Following the same procedure as before, we obtain two graphs of the means as functions of the optimal scores. Even in this case, quantification theory presents two sets of perfectly linear relations between the category means and the optimal category weights (see Figs. 2.11 and 2.12), that is, a perfect linear relation between nonlinearly transformed variables. Note that the positions on the horizontal axis indicate the optimal weights for the corresponding variables, tea preference and water temperature. Even in this disastrous case for Likert scores, optimal quantification successfully captures the relation between water temperature and tea preference: hot tea is best, iced tea is also a hit, lukewarm tea is not very popular, and the worst are frozen tea and boiling-hot tea. We have shown only the first component from quantification theory. The fact is that most data have a multidimensional structure, and quantification theory extracts all those multidimensional components, thus providing much more information than what we have seen here. One should wonder, then, what would happen if we used Likert scores for multidimensional analysis: even for the first (major) component, Likert scores often fail to capture the information in the data, and it is almost certain that they will fail to capture multidimensional information. We should always be concerned with the validity of the results of analysis.
Fig. 2.11 Optimal scores and means (preference)
Fig. 2.12 Optimal scores and means (temperature)
In the current section, we looked only at the first step of analysis, using Likert scores and quantification scores. As mentioned above, however, we typically carry out multidimensional analysis of data, and the failure of Likert scores compounds in multidimensional analysis. Thus, we must be certain that Likert scores pass the initial test of linearity; it is typically the case, however, that our data contain very little purely linear relation between variables. What we have learned here is that Likert scores can be used only when we are dealing with linear relations between variables, and such an ideal situation is extremely rare in practice. Keep in mind also that Likert scores are usable only when the response categories are strictly ordered (e.g., never, sometimes, often, always), which restricts the scope of analysis. Beyond the limitation to linear relations, Likert scores cannot be used at all when the response categories are not ordered (e.g., right-handed, left-handed, ambidextrous). In contrast, quantification theory can handle such non-ordered categories as well.
Considering that the Likert scale is used in so many survey studies, what advice can we give to researchers? We are constantly asked to fill in medical checklists on Likert scales, but we wonder whether the researchers are aware of the limitations of the Likert scale and whether they are knowledgeable enough to resort to a better alternative than analyzing Likert scores.
2.2.2 Warnings on Inappropriate Use of Likert Scale

(1) Use Likert scores as codes, not scores: even if the response categories are ordered and Likert scores therefore look appropriate, use them not as numerals for analysis, but only as codings, or codes, for responses without any numerical information. With Likert scores used as codes rather than scores, we can stretch our study domain to any categorical variables.

(2) Never impose an order constraint on the Likert scale: the Likert scale consists of ordered categories, and because of this there are researchers who wish to impose an order constraint on it in order to arrive at modified, "better" scores than Likert scores. From the data analytic point of view, however, this is totally misguided. An example of this problem: "A researcher wishes to derive a better set of scores for ordered categories than what the Likert scale provides." This sounds reasonable at first glance, but the order constraint severely limits the scope of data analysis. As mentioned above, most researchers nowadays carry out multidimensional analysis, anticipating all types of relations in the data. Although we understand the intention of refining the equal intervals of the categories into variable intervals, for example in order to maximize an inter-variable correlation, the order constraint limits the analysis to unidimensional analysis, since the other multidimensional components do not satisfy the order constraint. Remember that most data we deal with involve multidimensional and nonlinear relations.

Most researchers are well aware that the Likert scale is a remnant of the early days of unidimensional data analysis. Nowadays, we know that our data contain not only linear but also nonlinear relations and many components. Data collection is a very expensive task, and we should therefore try to extract as much information as possible once data are collected. In this context, quantification theory is an ideal tool for analysis, because it provides the most efficient procedure for tapping into whatever information, linear or nonlinear, unidimensional or multidimensional, the data may contain. It also provides a method of exhaustive analysis of the information in your data by extracting more than one component. Quantification theory deals with multidimensional decompositions of data, and the next few chapters are devoted to some mathematical background necessary to pursue it. If that background is not needed, please skip those chapters.
References

Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 1–55.
Nishisato, S. (1975). Oyo Shinri Shakudoho: Shitsuteki Data no Bunseki to Kaishaku (Applied psychological scaling: Analysis and interpretation of qualitative data). Seishin Shobo (in Japanese).
Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. The University of Toronto Press.
Nishisato, S. (1988). Effects of coding on dual scaling. Paper presented at the annual meeting of the Psychometric Society, University of California, Los Angeles.
Nishisato, S. (2007). Multidimensional nonlinear descriptive analysis. Chapman-Hall/CRC.
Nishisato, S., & Clavel, J. G. (2003). A note on between-set distances in dual scaling and correspondence analysis. Behaviormetrika, 30, 87–98.
Part II
Mathematics
To introduce statistical procedures of data analysis, one of the basic tools we must discuss is an elementary set of mathematical procedures as used in data analysis. Since mathematics is taught at high schools and universities, it may not be necessary for many readers to follow this chapter. However, it is not always the case that the readers are all familiar with some useful mathematical tools we use in quantification theory. Recall that we will discuss quantification of nominal variables, and some readers may not be familiar with a few mathematical procedures we will use. Redundant as it may be, therefore, we will briefly overview some basic topics for the benefit of some readers.
Chapter 3
Preliminaries
3.1 An Overview

It is difficult to write an appropriate mathematics text for data analysis. Even so, it is important to provide enough mathematical background for understanding quantification theory, particularly because quantification theory deals with qualitative, non-quantitative data, where the simple mathematical operations of addition, subtraction, multiplication and division cannot directly be applied to such qualitative information as being right-handed, left-handed or ambidextrous. Part II is a collection of mathematical topics which are hopefully useful for the handling of non-quantitative data. Although there are countless good reference books, the following are some of the relevant older books from the author's library (Bock, 1975; Browne, 1958; Graybill, 1961, 1969, 1976; Horst, 1935; Lewis, 1960; Merritt, 1962; Miller, 1972; Paul & Haeussler, 1973; Tan, 1987; Taylor, 1955; Yamane, 1962). It is hoped that the ensuing discussion of mathematical topics will help in a smooth transition from traditional statistical analysis to quantification analysis. If this chapter appears superfluous, please skip it.
3.2 Series and Limit

A succession of terms a_1, a_2, a_3, ..., a_n, ... is called a sequence, and a series is the sum of the terms of a sequence, for example,

s_1 = a_1, s_2 = a_1 + a_2, s_3 = a_1 + a_2 + a_3, ...

When s_n approaches a limit S as n → ∞, the series is said to be convergent, and we write it as
lim_{n→∞} s_n = S.

Suppose that A = lim f(x) and B = lim f(y); then

lim [f(x) + f(y)] = A + B,
lim [f(x) f(y)] = AB,
lim f(x)/f(y) = A/B.

When a series is not convergent, it is called divergent. There are a number of mathematical functions which are expressed as infinite series, for example, the trigonometric functions:

sin x = x − x^3/3! + x^5/5! − x^7/7! + ...
cos x = 1 − x^2/2! + x^4/4! − x^6/6! + ...
tan x = x + x^3/3 + 2x^5/15 + 17x^7/315 + ...

Functions similar to the trigonometric functions are defined with respect to the equilateral hyperbola and are called hyperbolic functions. Some of the familiar hyperbolic functions we encounter in data analysis are

sinh x = (e^x − e^{−x})/2 (hyperbolic sine),
cosh x = (e^x + e^{−x})/2 (hyperbolic cosine),
tanh x = (e^x − e^{−x})/(e^x + e^{−x}) (hyperbolic tangent).

These hyperbolic functions can also be expressed as infinite series:

sinh x = x + x^3/3! + x^5/5! + x^7/7! + ...
cosh x = 1 + x^2/2! + x^4/4! + x^6/6! + ...
tanh x = x − x^3/3 + 2x^5/15 − 17x^7/315 + ...

In addition to these trigonometric and hyperbolic series, which we use in quantification, there are other equally important series such as the binomial series, logarithmic series, exponential series, Maclaurin series, Taylor series and Fourier series. The application of limits is not restricted to infinite series; we can also take the limit of a continuous function toward any number. For example, what is the limit of the function

f(x) = (1 − x)/(1 − x^2)

as x goes to 1? The answer is as follows:

lim_{x→1} f(x) = lim_{x→1} (1 − x)/(1 − x^2) = lim_{x→1} (1 − x)/[(1 − x)(1 + x)] = lim_{x→1} 1/(1 + x) = 1/2.

What about the limit of the following function as x → ∞?

lim_{x→∞} (10x^3 − 5x^2 + 2x)/(4x^3 − 8x^2 + 3) = lim_{x→∞} (10 − 5/x + 2/x^2)/(4 − 8/x + 3/x^3) = 10/4 = 2.5.
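A quick numerical check (Python) of the last limit confirms the algebra:

```python
def g(x):
    return (10 * x**3 - 5 * x**2 + 2 * x) / (4 * x**3 - 8 * x**2 + 3)

for x in (10, 100, 1_000, 10_000):
    print(x, g(x))   # the values approach 10/4 = 2.5 as x grows
```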
3.2.1 Examples from Quantification Theory

Let us look at two applications of the notion of limit in quantification theory.

The Method of Reciprocal Averages

The well-known method of reciprocal averages (Horst, 1963; Mosier, 1946; Richardson & Kuder, 1933) is based on successive calculations of means: first with arbitrary weights on the categories (not all zero), then with the first set of means as weights for calculating new means, then with the new means as weights to calculate yet another set of means, and so on. This process eventually leads to the optimal quantification of the categorical variables. Recall that we discussed Likert scoring and compared average Likert scores and optimal scores. Let us use the same example of the relation between age and the weight one can lift with one hand. Recall that we asked the following questions:

• Q1: What is your age category? (1 = younger than 15; 2 = 16–40; 3 = 41–65; 4 = over 65)
Table 3.1 Body strength and age

                     Strength
Age                  Up to 10 kg   11–25   26–40   Over 40
Younger than 15               15      20       3         0
16–40                          0      12      18        26
41–65                          1      16      25         6
Over 65                       16       9       3         0

Table 3.2 Likert scores and corresponding means

Row score   Mean    Column score   Mean
1           1.68    1              2.56
2           2.86    2              2.25
3           2.75    3              2.62
4           1.53    4              2.50
Fig. 3.1 Likert scores and means (age)
• Q2: What is the maximal weight you can lift with one hand? (1 = up to 10 kg; 2 = 11–25 kg; 3 = 26–40 kg; 4 = more than 40 kg).

One hundred and seventy subjects answered these two questions, and the data collected are listed in Table 3.1. Remember that we calculated the means of the two sets of categories using Likert scores (Table 3.2) and plotted the means against the Likert scores, as shown in Figs. 3.1 and 3.2. Let us now use the means of the categorical variables as weights (this time, not Likert scores) to calculate new means, and then use these new means as weights to calculate yet another set of means. If we continue these reciprocal averaging computations, we will see that after each iteration the graph of the new means plotted against the previous means gradually approaches a straight line. We have already seen
Fig. 3.2 Likert scores and means (weight)
Fig. 3.3 Final scores and means (age)
the final results, where the means calculated from the previous means show a perfectly straight line. The last means are then the optimal weights. This is the result based on the limit of the reciprocal averaging scheme. The important points are:

• The process is independent of the initial selection of arbitrary weights for the categories, so long as they are not all equal to zero.
• The two final plots, for the rows and for the columns, have the same slope, which is equal to the maximal correlation between the rows and the columns of the original table.

When the process converges to the stable point, it is the optimal quantification, with the graphs showing a straight line whose slope equals the maximal correlation between the rows and the columns, as we saw in Chap. 2. The final graph is duplicated here (Figs. 3.3 and 3.4); a compact implementation of the iteration follows the figure captions below.
Fig. 3.4 Final scores and means (weight)
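Here is a sketch of the scheme in Python (our own compact version of the classical algorithm, applied to the data of Table 3.1), which makes the convergence concrete:

```python
import numpy as np

# Table 3.1: age groups (rows) by liftable weight (columns).
F = np.array([[15.0, 20.0,  3.0,  0.0],
              [ 0.0, 12.0, 18.0, 26.0],
              [ 1.0, 16.0, 25.0,  6.0],
              [16.0,  9.0,  3.0,  0.0]])
N, fr, fc = F.sum(), F.sum(axis=1), F.sum(axis=0)

x = np.array([1.0, 2.0, 3.0, 4.0])      # arbitrary start (not all equal)
x -= (fc @ x) / N                        # drop the trivial constant part
for _ in range(200):
    y = F @ x / fr                       # row means given column weights
    x_new = F.T @ y / fc                 # column means given row weights
    x_new -= (fc @ x_new) / N            # keep the weighted mean at zero
    shrink = np.sqrt((fc @ x_new**2) / (fc @ x**2))   # -> eta squared
    x = x_new / np.linalg.norm(x_new)    # renormalize so weights survive

print("optimal column weights (up to scaling):", np.round(x, 3))
print("maximal row-column correlation:", round(float(np.sqrt(shrink)), 3))
```

At convergence, each full averaging cycle shrinks the weights by the squared maximal correlation, which is why the square root of the final shrink factor recovers the correlation itself.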
Forced Classification

A simple application of series and limits can also be found in Nishisato's forced classification method (1984; see also Nishisato, 1980, 1988; Nishisato & Gaul, 1989, 1990; Nishisato & Lawrence, 1989; Nishisato & Baba, 1999). It is based on a simple idea: if we successively multiply the part of a multiple-choice data matrix chosen as the criterion item, the quantification of the multiple-choice data approaches the point where the correlation of each item with the criterion item attains its maximal value. As can be inferred from the papers mentioned above, the original idea has since been generalized to situations far more complex than the original simple one. Consider multiple-choice data for seven items, with the seventh item chosen as the criterion item, so that the data can be represented as [F1, F2, ..., F6, Fc], where each item-data matrix is (number of subjects) × (number of response alternatives of the item). Each subject chooses one option per question, so the elements of the data matrix consist of 1s (choices) and 0s (non-choices) such that Fj1 = 1 for every item j, where 1 is a vector of ones. Consider a modified data matrix in which the criterion item is multiplied by a constant p, so that the modified data matrix can be written as [F1, F2, ..., F6, pFc]. As the value of p for the criterion item c goes to infinity, the correlation between the criterion and each item reaches its maximal value. Figure 3.5 shows how the correlation coefficients of the other items with the criterion item (vertical axis) become larger as p increases, finally reaching their maximal values. Nishisato (1984) showed that the above process is nothing but quantification of the data projected onto the column space of the criterion item, and that the correlation between the criterion item and each of the remaining items attains its maximal value when the data are projected onto the criterion-item space. This is an important finding.
Fig. 3.5 Limit to the maximum correlation
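A rough simulation of this limiting behavior (Python; synthetic data and a simple correspondence-analysis routine of our own, not Nishisato's original program) multiplies the criterion block by growing p and watches the item-criterion correlations:

```python
import numpy as np

def ca_first_axis(F):
    """Column weights on the first nontrivial correspondence-analysis axis."""
    P = F / F.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    return Vt[0] / np.sqrt(c)

rng = np.random.default_rng(1)
n, items, opts = 100, 7, 3
latent = rng.integers(0, opts, n)        # a shared trait behind all answers
F = np.zeros((n, items * opts))
for j in range(items):                   # each item is a noisy copy of it
    ans = np.where(rng.random(n) < 0.7, latent, rng.integers(0, opts, n))
    F[np.arange(n), j * opts + ans] = 1.0

crit = slice(6 * opts, 7 * opts)         # item 7 is the criterion
for p in (1, 5, 25, 125):
    Fp = F.copy()
    Fp[:, crit] *= p                     # the forced-classification step
    x = ca_first_axis(Fp)
    s = [F[:, j*opts:(j+1)*opts] @ x[j*opts:(j+1)*opts] for j in range(items)]
    corrs = [abs(np.corrcoef(s[j], s[6])[0, 1]) for j in range(6)]
    print(f"p = {p:>3}:", np.round(corrs, 2))   # creep toward their maxima
```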
In our example, this process of approaching the maximal correlations is captured in Fig. 3.5. Let us now get back to the main topic: for the current chapter, we will use our knowledge of series and limits in introducing differential calculus.
3.3 Differentiation

Let us look at the way in which we can find the tangent line to a function, called the derivative of the function. Let y be a function of x, that is, y = f(x). Then we can pick two points on the function, say (x_1, f(x_1)) and (x_2, f(x_2)). If y is a quadratic function of x, we can visualize this as in Fig. 3.6, where two points x_1 and x_2 are chosen on the horizontal axis of the variable X, and the corresponding values on the vertical axis are f(x_1) and f(x_2), respectively. Let us indicate the difference between x_1 and x_2 by h, that is, x_2 − x_1 = h. Then the slope between the two points on the function can be expressed as

[f(x_2) − f(x_1)]/(x_2 − x_1) = [f(x_1 + h) − f(x_1)]/h.
Fig. 3.6 Quadratic function
Consider the limit of this expression as h → 0. If this limit exists, it is called the derivative of f with respect to x, typically indicated by f′, and the function is said to be differentiable at that point. The value gives the slope of the function at that point. The derivative of a function is very important for data analysis: when we have a function of order 2 or greater, we can find the points where the derivative is equal to zero, namely, the points which correspond to local minima or maxima of the function. The problem of maximizing or minimizing a quadratic function shows up in many statistical problems, such as least-squares estimation. In quantification theory, too, we will use the differentiation of functions so as to maximize Cronbach's reliability coefficient α (Cronbach, 1951), the correlation ratio, the correlation, the variance, the coefficient of homogeneity and so on.
3.4 Derivative of a Function of One Variable

Consider a function y = f(x) as above; the derivative of y is defined by

dy/dx = lim_{h→0} [f(x + h) − f(x)]/h.

For example, if y = x^2, the derivative of y with respect to x is given by

dy/dx = lim_{h→0} [(x + h)^2 − x^2]/h = lim_{h→0} (2x + h) = 2x.
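The limit definition can be checked numerically (Python): shrinking h drives the difference quotient toward 2x.

```python
def f(x):
    return x * x

x0 = 3.0
for h in (1.0, 0.1, 0.01, 0.001):
    print(h, (f(x0 + h) - f(x0)) / h)   # approaches f'(3) = 2 * 3 = 6
```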
3.5 Derivative of a Function of a Function

Suppose that y is a function of u and u is a function of x; then

$$\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx}.$$

For example, if y = (x² + 3)³, we can express this as y = u³ and u = x² + 3, so that

$$\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx} = 3u^2 (2x) = 3(x^2 + 3)^2\, 2x = 6x(x^2 + 3)^2.$$
3.6 Partial Derivative

Suppose we have a function of many variables. When we take the derivative with respect to one variable while treating all other variables as constants, the result is called a partial derivative. The partial derivative of the function z = f(x, y) with respect to x is denoted by ∂z/∂x and is defined by

$$\frac{\partial z}{\partial x} = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h}.$$

This equals the slope of the curve of intersection of the surface z = f(x, y) with a plane y = constant.
3.7 Differentiation Formulas

It is useful to know the following formulas for differentiation. Let c be any constant; then

$$\frac{d}{dx}(c) = 0.$$

Let n be any real number; then

$$\frac{d}{dx}(x^n) = nx^{n-1}.$$
There are a few basic rules:

$$\frac{d}{dx}[c f(x)] = c\frac{df}{dx}; \qquad \frac{d}{dx}[f(x) + g(x)] = \frac{df}{dx} + \frac{dg}{dx}; \qquad \frac{d}{dx}[f(x) - g(x)] = \frac{df}{dx} - \frac{dg}{dx};$$

$$\frac{d}{dx}[f(x)g(x)] = f(x)\frac{dg}{dx} + g(x)\frac{df}{dx}; \qquad \frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{g(x)\frac{df}{dx} - f(x)\frac{dg}{dx}}{[g(x)]^2}.$$

Suppose y is a function of u and u is a function of x; then dy/dx = (dy/du)(du/dx), and, for a differentiable function u of x,

$$\frac{d}{dx}u^n = nu^{n-1}\frac{du}{dx}; \qquad \frac{d}{dx}\ln u = \frac{1}{u}\frac{du}{dx};$$

$$\frac{d}{dx}\sin u = \cos u\,\frac{du}{dx}; \qquad \frac{d}{dx}\cos u = -\sin u\,\frac{du}{dx}; \qquad \frac{d}{dx}\tan u = \sec^2 u\,\frac{du}{dx};$$

$$\frac{d}{dx}\cot u = -\csc^2 u\,\frac{du}{dx}; \qquad \frac{d}{dx}\sec u = \sec u \tan u\,\frac{du}{dx}; \qquad \frac{d}{dx}\csc u = -\csc u \cot u\,\frac{du}{dx};$$

$$\frac{d}{dx}\arcsin u = \frac{1}{\sqrt{1-u^2}}\frac{du}{dx}; \qquad \frac{d}{dx}\arccos u = -\frac{1}{\sqrt{1-u^2}}\frac{du}{dx}; \qquad \frac{d}{dx}\arctan u = \frac{1}{1+u^2}\frac{du}{dx};$$

$$\frac{d}{dx}\operatorname{arccot} u = -\frac{1}{1+u^2}\frac{du}{dx}; \qquad \frac{d}{dx}\operatorname{arcsec} u = \frac{1}{u\sqrt{u^2-1}}\frac{du}{dx}; \qquad \frac{d}{dx}\operatorname{arccsc} u = -\frac{1}{u\sqrt{u^2-1}}\frac{du}{dx};$$

$$\frac{d}{dx}\sinh u = \cosh u\,\frac{du}{dx}; \qquad \frac{d}{dx}\cosh u = \sinh u\,\frac{du}{dx}; \qquad \frac{d}{dx}\tanh u = \operatorname{sech}^2 u\,\frac{du}{dx};$$

$$\frac{d}{dx}\coth u = -\operatorname{csch}^2 u\,\frac{du}{dx}; \qquad \frac{d}{dx}\operatorname{sech} u = -\operatorname{sech} u \tanh u\,\frac{du}{dx}; \qquad \frac{d}{dx}\operatorname{csch} u = -\operatorname{csch} u \coth u\,\frac{du}{dx}.$$

As mentioned earlier, derivatives and partial derivatives are typically used when we want to find the maxima or minima of a function, such as a linear combination of test scores that maximizes the variance, a composite score of several tests that maximizes Cronbach's reliability coefficient α (Cronbach, 1951), or quantification with optimal properties.
3.8 Maximum and Minimum Value of a Function

When we deal with data, we often look for a linear combination of variables that attains, for example, the maximal value of such statistics as the variance, a reliability coefficient or a validity coefficient. The main mathematical tools are differentiation and derivatives. We have so far looked at the derivative of a function f, referred to as the first derivative and indicated by f′. Let us introduce the second derivative, f″, defined as follows: let

$$f' = \frac{df(x)}{dx}, \quad \text{then} \quad f'' = \frac{df'(x)}{dx}.$$
The first and the second derivatives are needed when we want to see whether a point at which the first derivative vanishes is a maximum or a minimum. For instance, consider the function y = f(x) = 4x³ − 2x². This is a cubic function of x and has one local maximum and one local minimum. What are they? Set the first derivative equal to 0:

$$\frac{dy}{dx} = f' = 12x^2 - 4x = 4x(3x - 1) = 0.$$

The solutions are x = 0 and x = 1/3. Substituting these into the second derivative, which is given by f″ = 24x − 4, we obtain f″ = −4 at x = 0 and f″ = 4 at x = 1/3. When the second derivative is negative, the function has a local maximum; when it is positive, the function has a local minimum. We can visualize a local maximum at the peak of a concave portion of the function and a local minimum at the bottom of a convex portion.
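As a numerical companion to this example, the following sketch uses NumPy's polynomial utilities to locate the critical points of f(x) = 4x³ − 2x² and classify them with the second derivative:

```python
import numpy as np

f = np.poly1d([4, -2, 0, 0])       # coefficients of 4x^3 - 2x^2
f1, f2 = f.deriv(), f.deriv(2)     # f' = 12x^2 - 4x, f'' = 24x - 4

for x in sorted(f1.roots):         # critical points: x = 0 and x = 1/3
    kind = "local maximum" if f2(x) < 0 else "local minimum"
    print(f"x = {x:.4f}: f''(x) = {f2(x):+.1f} -> {kind}")
```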
3.9 Lagrange Multipliers

In quantification theory, we often face the task of maximizing a function under some constraints on the variables. Such problems can be handled by Lagrange's method (note: Joseph Louis Lagrange was a great eighteenth-century mathematician). Consider the maximization or minimization of a function f(x, y, z) under the constraint g(x, y, z) = k.
Lagrange’s method introduces a function u = f (x, y, z) + λg(x, y, z), where λ is a constant to be determined and λ is called Lagrange’s multiplier. Treat x, y, z as independent variables, and calculate ∂u ∂u ∂u = 0, = 0, = 0. ∂x ∂y ∂z Solve these three equations along with the equation of the constraint g(x, y, z) = k. Lagrange’s method of multipliers is frequently used in data analysis. In quantification theory, too, we encounter such a problem as maximizing the statistic, called the correlation ratio, which is defined as the ratio of the between-group sum of squares to the total sum of squares. To maximize the correlation ratio, it is easier to set the problem as that of maximizing the between-group sum of squares, subject to the condition that the total sum of square is, for example, 100. Later we will see this example when we discuss quantification theory. Lagrange’s method of multipliers is very useful in many areas of data analysis. Therefore, let us look at its applications to three examples.
3.9.1 Example 1

The first example is to find the relative minimum of the function f(x, y) = x² + 5y² subject to the condition that x + y = 1. The Lagrangian function for this problem is

$$u(x, y, \lambda) = f(x, y) + \lambda g(x, y) = x^2 + 5y^2 + \lambda(x + y - 1).$$

We then differentiate u with respect to x, y and λ, and set the derivatives equal to zero:

$$\frac{\partial u}{\partial x} = 2x + \lambda = 0$$
$$\frac{\partial u}{\partial y} = 10y + \lambda = 0, \qquad \frac{\partial u}{\partial \lambda} = x + y - 1 = 0.$$

Thus, we obtain

$$x = -\frac{\lambda}{2}, \qquad y = -\frac{\lambda}{10}.$$

Substituting these into the third equation, we obtain

$$x + y - 1 = -\frac{\lambda}{2} - \frac{\lambda}{10} - 1 = 0.$$

Thus,

$$\lambda = -\frac{5}{3},$$

and

$$x = \frac{5}{6} \quad \text{and} \quad y = \frac{1}{6}.$$

At this point, the function attains its minimum value, which is

$$x^2 + 5y^2 = \left(\frac{5}{6}\right)^2 + 5\left(\frac{1}{6}\right)^2 = \frac{5}{6}.$$
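The same answer can be checked numerically. Here is a minimal sketch using SciPy's general-purpose constrained optimizer (not the method used in the text, just a convenient cross-check):

```python
from scipy.optimize import minimize

# Minimize f(x, y) = x^2 + 5y^2 subject to x + y = 1.
f = lambda v: v[0] ** 2 + 5 * v[1] ** 2
constraint = {"type": "eq", "fun": lambda v: v[0] + v[1] - 1}
res = minimize(f, x0=[0.0, 0.0], constraints=[constraint])
print(res.x, res.fun)   # approximately [5/6, 1/6] and 5/6
```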
3.9.2 Example 2

This is from the author's favorite textbook in his student days, Taylor's Advanced Calculus (Taylor, 1955): Find the dimensions of the box of the largest volume which can be fitted inside the ellipsoid

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1,$$

assuming that each edge of the box is parallel to a coordinate axis. Each of the eight corners of the box will lie on the ellipsoid. Let the corner in the first octant have coordinates (x, y, z); then the dimensions of the box are 2x, 2y, 2z, and its volume is 8xyz (p. 199).
The Lagrangian function u for this problem is given by

$$u = 8xyz + \lambda\left(\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} - 1\right).$$

Then,

$$\frac{\partial u}{\partial x} = 8yz + 2\lambda\frac{x}{a^2} = 0, \qquad \frac{\partial u}{\partial y} = 8xz + 2\lambda\frac{y}{b^2} = 0, \qquad \frac{\partial u}{\partial z} = 8xy + 2\lambda\frac{z}{c^2} = 0.$$

Let us multiply these three expressions by x, y, z, respectively, and add all three equations, which yields

$$24xyz + 2\lambda\left(\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2}\right) = 0.$$

But the expression inside the parentheses is 1, so we obtain 24xyz + 2λ = 0, that is, 12xyz + λ = 0, hence λ = −12xyz. Substituting this value of λ into the above three equations, we eventually arrive at the following equations:

$$yz(a^2 - 3x^2) = 0, \qquad zx(b^2 - 3y^2) = 0, \qquad xy(c^2 - 3z^2) = 0.$$

Assuming that x, y, z are positive, we arrive at the solutions

$$x = \frac{a}{\sqrt{3}}, \qquad y = \frac{b}{\sqrt{3}}, \qquad z = \frac{c}{\sqrt{3}}, \qquad \lambda = -12xyz = -\frac{4abc}{\sqrt{3}}.$$
Therefore, the box of maximum volume has dimensions (2x, 2y, 2z) given by

$$\left(\frac{2a}{\sqrt{3}}, \frac{2b}{\sqrt{3}}, \frac{2c}{\sqrt{3}}\right),$$

and the maximal volume is

$$V = 8xyz = \frac{8abc}{3\sqrt{3}}.$$

We will revisit Lagrange's method of multipliers when we introduce matrix calculus and quantification theory, where we will consider maximizing the between-group sum of squares under the condition that the total sum of squares is equal to a constant, one of the quantification approaches.
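Again, the closed-form solution is easy to verify numerically. The sketch below assumes illustrative semi-axes a, b, c = 3, 2, 1 (not from the text) and maximizes the volume 8xyz by minimizing its negative under the ellipsoid constraint:

```python
import numpy as np
from scipy.optimize import minimize

a, b, c = 3.0, 2.0, 1.0            # illustrative semi-axes
neg_volume = lambda v: -8 * v[0] * v[1] * v[2]
on_ellipsoid = {"type": "eq",
                "fun": lambda v: v[0]**2/a**2 + v[1]**2/b**2 + v[2]**2/c**2 - 1}
res = minimize(neg_volume, x0=[1.0, 1.0, 0.5], constraints=[on_ellipsoid])

print(res.x)                                       # approx (a, b, c) / sqrt(3)
print(-res.fun, 8 * a * b * c / (3 * np.sqrt(3)))  # both approx 9.2376
```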
References

Bock, R. D. (1975). Multivariate statistical methods in behavioral research. McGraw-Hill.
Browne, E. T. (1958). Introduction to the theory of determinants and matrices. The University of North Carolina Press.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Graybill, F. A. (1961). An introduction to linear statistical models. McGraw-Hill.
Graybill, F. A. (1969). Introduction to matrices with applications in statistics. Wadsworth Publishing Company.
Graybill, F. A. (1976). Theory and applications of linear models. Wadsworth Publishing Company.
Horst, P. (1935). Measuring complex attitudes. Journal of Social Psychology, 6, 369–374.
Horst, P. (1963). Matrix algebra for social scientists. Holt, Rinehart and Winston.
Lewis, D. (1960). Quantitative methods in psychology. McGraw-Hill.
Merritt, F. S. (1962). Mathematics manuals. McGraw-Hill.
Miller, R. E. (1972). Modern mathematical methods for economics and business. Holt, Rinehart and Winston.
Mosier, C. I. (1946). Machine methods in scaling by reciprocal averages. In Proceedings, Research Forum (pp. 35–39). International Business Corporation.
Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. The University of Toronto Press.
Nishisato, S. (1984). Forced classification: A simple application of a quantification technique. Psychometrika, 49, 25–36.
Nishisato, S. (1986). Generalized forced classification for quantifying categorical data. In E. Diday (Ed.), Data analysis and informatics, IV (pp. 351–362). Elsevier Science Publishers B. V., North Holland.
Nishisato, S. (1988). Market segmentation by dual scaling through generalized forced classification. In W. Gaul & M. Schader (Eds.), Data, expert knowledge and decisions (pp. 268–278). Springer-Verlag.
Nishisato, S. (1988). Forced classification procedure of dual scaling: Its mathematical properties. In H. H. Bock (Ed.), Classification and related methods (pp. 523–532). North Holland.
Nishisato, S., & Baba, Y. (1999). On contingency, projection and forced classification of dual scaling. Behaviormetrika, 26, 207–219. Nishisato, S., & Clavel, J. G. (2003). A note on between-set distances in dual scaling and correspondence analysis. Behaviormetrika, 30, 87–98. Nishisato, S., & Gaul, W. (1989). Marketing data analysis by dual scaling. International Journal of Research in Marketing, 5, 151–170. Nishisato, S., & Gaul, W. (1990). An approach to marketing data analysis: The forced classification procedure of dual scaling. Journal of Marketing Research, 27, 354–360. Nishisato, S., & Lawrence, D. R. (1989). Dual scaling of multiway data matrices: Several variants. In R. Coppi & S. Bolasco (Eds.), Multiway data analysis (pp. 317–326). North Holland. Paul, R. S., & Haeussler, E. F., Jr. (1973). Introductory mathematical analysis: For students of business and economics. Reston Publishing Company. Richardson, M., & Kuder, G. F. (1933). Making a rating scale that measures. Personnel Journal, 12, 36–40. Tan, S. T. (1987). Calculus for the managerial, life and social sciences. PWS Publishers. Taylor, A. E. (1955). Advanced Calculus. Ginn and Company. Yamane, T. (1962). Mathematics for economists: An elementary survey. Prentice-Hall.
Chapter 4
Matrix Calculus
When we deal with many variables, designating each variable by a subscript becomes cumbersome. Once we introduce matrix notation, expressions involving many variables are not only vastly simplified; the notation also leads to a mathematics of many variables in a unified way, which makes it possible to arrive at a unified formulation of multivariate analysis. Thus, matrix algebra offers us both simplification of mathematical formulas and integration of a variety of multivariate analyses into a single framework. We hope that this chapter is easy enough to give an overview of multivariate analysis. As for references, there are too many to list, so only a few from the author's library are cited (Browne, 1958; Curtis, 1963; Horst, 1963; Feeman & Grabois, 1970; Graybill, 1961, 1969, 1976; Bock, 1975). Let us start with matrix notation.
4.1 Different Forms of Matrices

A matrix is a rectangular array of numbers, and these numbers are called elements. We indicate a matrix by a capital letter such as A, B, C and so on, and the elements by small letters with subscripts such as a_ij, b_ij, c_ij and so on, where the first subscript indicates the row and the second the column. A 3 × 4 matrix can be represented as

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix}.$$

The expression 3 × 4 is called the order or dimension of the matrix. Consider summarizing data collected on preferences for coffee, tea and juice from four age groups, showing the numbers of choices in the 12 cells as

$$A = \begin{bmatrix} 1 & 3 & 6 & 4 \\ 3 & 5 & 2 & 6 \\ 6 & 3 & 0 & 5 \end{bmatrix}.$$
4.1.1 Transpose

When we interchange the rows and columns of A, the resultant matrix is called the transpose of A and is indicated by A′. For our 3 × 4 matrix A, the transpose is given by

$$A' = \begin{bmatrix} 1 & 3 & 6 \\ 3 & 5 & 3 \\ 6 & 2 & 0 \\ 4 & 6 & 5 \end{bmatrix}.$$
4.1.2 Rectangular Versus Square Matrix When the number of rows is equal to that of columns, the matrix is called a square matrix, otherwise a rectangular matrix.
4.1.3 Symmetric Matrix

A square matrix A is called symmetric if a_ij = a_ji for all values of i and j. For example, the following matrices are symmetric:

$$\begin{bmatrix} 5 & 2 & 0 \\ 2 & 7 & 1 \\ 0 & 1 & 3 \end{bmatrix}, \qquad \begin{bmatrix} 5 & 2 & 0 & 7 \\ 2 & 7 & 1 & 6 \\ 0 & 1 & 3 & 8 \\ 7 & 6 & 8 & 4 \end{bmatrix}.$$
4.1.4 Diagonal Matrix

When a square matrix contains nonzero elements only in the diagonal positions, it is called a diagonal matrix and is typically designated by D. The following is an example of a diagonal matrix:

$$D = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 7 & 0 \\ 0 & 0 & 3 \end{bmatrix}.$$
4.1.5 Vector

A single column is referred to as a column vector, and it is customary to indicate it by a bold lowercase letter (e.g., a); a single row is called a row vector and is usually indicated as a transposed vector (e.g., a′). For example, if

$$\mathbf{a} = \begin{bmatrix} 3 \\ 2 \\ 5 \end{bmatrix}, \quad \text{then} \quad \mathbf{a}' = \begin{bmatrix} 3 & 2 & 5 \end{bmatrix}.$$
4.1.6 Scalar Matrix and Identity Matrix

If all the diagonal elements of a diagonal matrix are equal, it is called a scalar matrix, and when the diagonal elements of a scalar matrix are 1, it is called the identity matrix, indicated by I. Examples of a scalar matrix and the identity matrix are:

$$S = \begin{bmatrix} 5 & 0 & 0 & 0 \\ 0 & 5 & 0 & 0 \\ 0 & 0 & 5 & 0 \\ 0 & 0 & 0 & 5 \end{bmatrix}, \qquad I = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

The identity matrix often appears in statistical handling of data. If A and I are conformable for multiplication, then AI = IA = A.
4.1.7 Idempotent Matrix

If A is a square matrix such that A² = A, then A is said to be an idempotent matrix. A necessary and sufficient condition for a matrix to be idempotent is that its eigenvalues (to be introduced shortly) are either 1 or 0. Furthermore, if A′ = A, then A is said to be a symmetric idempotent matrix. At this stage, we are not ready to discuss the role of idempotent matrices; we mention here only that they play a very important role in statistics.
4.2 Simple Operations

4.2.1 Addition and Subtraction

When two matrices A and B are of the same order, the two matrices are said to be conformable for addition and subtraction, which are defined as follows:

$$A + B = (a_{ij}) + (b_{ij}) = (a_{ij} + b_{ij}), \qquad A - B = (a_{ij}) - (b_{ij}) = (a_{ij} - b_{ij}).$$

The operations are carried out on the corresponding elements of A and B so that, for example,

$$\begin{bmatrix} 1 & 3 & 6 & 4 \\ 3 & 5 & 2 & 6 \\ 6 & 3 & 0 & 5 \end{bmatrix} + \begin{bmatrix} 2 & 1 & 6 & 1 \\ 1 & 2 & 7 & 0 \\ 0 & 1 & 9 & 3 \end{bmatrix} = \begin{bmatrix} 3 & 4 & 12 & 5 \\ 4 & 7 & 9 & 6 \\ 6 & 4 & 9 & 8 \end{bmatrix},$$

$$\begin{bmatrix} 1 & 3 & 6 & 4 \\ 3 & 5 & 2 & 6 \\ 6 & 3 & 0 & 5 \end{bmatrix} - \begin{bmatrix} 2 & 1 & 6 & 1 \\ 1 & 2 & 7 & 0 \\ 0 & 1 & 9 & 3 \end{bmatrix} = \begin{bmatrix} -1 & 2 & 0 & 3 \\ 2 & 3 & -5 & 6 \\ 6 & 2 & -9 & 2 \end{bmatrix},$$

$$\begin{bmatrix} 1 \\ 3 \\ 6 \end{bmatrix} + \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \\ 6 \end{bmatrix}.$$
4.2.2 Multiplication

The product AB of two matrices is defined when the number of columns of A is equal to the number of rows of B; matrix A is then said to be conformable to matrix B for multiplication. If A is m × r and B is r × n, the product is an m × n matrix. Multiplication is defined by

$$AB = C = (c_{ij}), \qquad c_{ij} = \sum_{p=1}^{r} a_{ip} b_{pj}.$$

Thus, the element of the product in cell (i, j) is the inner product of row i of A and column j of B.
For example,

$$\begin{bmatrix} 1 & 3 \\ 3 & 5 \\ 6 & 3 \end{bmatrix} \begin{bmatrix} 2 & 1 & 6 \\ 1 & 2 & 7 \end{bmatrix} = \begin{bmatrix} (1\times2)+(3\times1) & (1\times1)+(3\times2) & (1\times6)+(3\times7) \\ (3\times2)+(5\times1) & (3\times1)+(5\times2) & (3\times6)+(5\times7) \\ (6\times2)+(3\times1) & (6\times1)+(3\times2) & (6\times6)+(3\times7) \end{bmatrix} = \begin{bmatrix} 5 & 7 & 27 \\ 11 & 13 & 53 \\ 15 & 12 & 57 \end{bmatrix},$$

$$\begin{bmatrix} 1 & 3 & 6 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} = (1\times2)+(3\times1)+(6\times0) = 5,$$

$$\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 3 & 6 \end{bmatrix} = \begin{bmatrix} 2 & 6 & 12 \\ 1 & 3 & 6 \\ 0 & 0 & 0 \end{bmatrix}.$$

Remember that the multiplication AB is carried out by multiplying the rows of A into the columns of B.
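For readers who want to follow the arithmetic on a computer, here is a short NumPy sketch reproducing the operations of this section (transpose, addition, subtraction and the matrix product above):

```python
import numpy as np

A = np.array([[1, 3, 6, 4],
              [3, 5, 2, 6],
              [6, 3, 0, 5]])
B = np.array([[2, 1, 6, 1],
              [1, 2, 7, 0],
              [0, 1, 9, 3]])

print(A.T)      # transpose A'
print(A + B)    # element-wise addition
print(A - B)    # element-wise subtraction

C = np.array([[1, 3], [3, 5], [6, 3]])
D = np.array([[2, 1, 6], [1, 2, 7]])
print(C @ D)    # [[5, 7, 27], [11, 13, 53], [15, 12, 57]]
```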
4.2.3 Scalar Multiplication

The multiplication of a matrix by a constant k is called scalar multiplication and is defined by

$$kA = (ka_{ij}).$$
For example,

$$10 \times \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} = \begin{bmatrix} 10 & 20 & 30 \\ 40 & 50 & 60 \\ 70 & 80 & 90 \end{bmatrix}.$$
4.2.4 Determinant

Associated with any square matrix there exists a quantity called the determinant. It is a function of the elements of the matrix and is closely related to the volume function. The determinant of A is indicated by |A|. The determinant of a 2 × 2 matrix A is defined by

$$|A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}.$$

When the matrix is larger than 2 × 2, the determinant can be expressed in terms of minors and co-factors of elements. When the elements of the i-th row and the j-th column are deleted from an n × n square matrix A, the determinant of the remaining (n − 1) × (n − 1) matrix is called the minor of the element a_ij and is indicated by |M_ij|. For example, if

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix},$$

then

$$|M_{11}| = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} = a_{22}a_{33} - a_{23}a_{32}, \quad |M_{12}| = \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} = a_{21}a_{33} - a_{31}a_{23}, \quad |M_{13}| = \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} = a_{21}a_{32} - a_{22}a_{31}.$$

The minor multiplied by the sign (−1)^{i+j} is called the co-factor of a_ij and is indicated by α_ij. For example,

$$\alpha_{11} = (-1)^{1+1}|M_{11}| = a_{22}a_{33} - a_{23}a_{32},$$
$$\alpha_{12} = (-1)^{1+2}|M_{12}| = -a_{21}a_{33} + a_{23}a_{31}, \qquad \alpha_{13} = (-1)^{1+3}|M_{13}| = a_{21}a_{32} - a_{22}a_{31}.$$

Then the determinant of A can be expressed as the sum of the products of the elements of any row (or column) of A with their co-factors, that is,

$$|A| = \sum_{i=1}^{n} \alpha_{ij} a_{ij} \ \text{for any } j \ \text{(column expansion)} = \sum_{j=1}^{n} \alpha_{ij} a_{ij} \ \text{for any } i \ \text{(row expansion)}.$$
The following is an example of a row expansion:

$$\begin{vmatrix} 2 & 1 & 0 \\ 3 & 5 & 4 \\ 6 & 7 & 8 \end{vmatrix} = 2(-1)^{1+1}\begin{vmatrix} 5 & 4 \\ 7 & 8 \end{vmatrix} + 1(-1)^{1+2}\begin{vmatrix} 3 & 4 \\ 6 & 8 \end{vmatrix} + 0(-1)^{1+3}\begin{vmatrix} 3 & 5 \\ 6 & 7 \end{vmatrix} = 2[(5\times8)-(4\times7)] - [(3\times8)-(4\times6)] = 24.$$

We have the following important relations for determinants:

• |A| = |A′|.
• If two rows (or columns) of a square matrix are interchanged, the determinant changes sign.
• If every element of some row (or column) of A is zero, then |A| = 0.
• If two rows (or columns) of A are identical, then |A| = 0.
• If A and B are n × n matrices, then |AB| = |A||B|.

There is an interesting use of the determinant: it can express the area of a triangle. Consider three points with coordinates (x₁, y₁), (x₂, y₂), (x₃, y₃). Then the area A of the triangle created by connecting these three points can be expressed as

$$A = \frac{1}{2}\begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix}$$

(taking the absolute value if the points are ordered clockwise). For example, let P₁, P₂, P₃ be given by (5, 5), (−6, 7) and (−7, −2). Then the area of the triangle created by connecting these three points is
$$A = \frac{1}{2}\begin{vmatrix} 5 & 5 & 1 \\ -6 & 7 & 1 \\ -7 & -2 & 1 \end{vmatrix} = \frac{101}{2} = 50.5.$$

A matrix A is said to be singular if |A| = 0 and non-singular if |A| ≠ 0.
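Both calculations in this section are easy to confirm numerically; a small sketch:

```python
import numpy as np

M = np.array([[2, 1, 0],
              [3, 5, 4],
              [6, 7, 8]])
print(np.linalg.det(M))             # 24.0 (up to floating-point error)

# Area of the triangle with vertices (5, 5), (-6, 7), (-7, -2):
T = np.array([[ 5,  5, 1],
              [-6,  7, 1],
              [-7, -2, 1]], dtype=float)
print(0.5 * abs(np.linalg.det(T)))  # 50.5
```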
4.2.5 Inverse

The inverse of a square non-singular matrix A, denoted A⁻¹, is defined by the relations

$$AA^{-1} = A^{-1}A = I.$$

If D is a non-singular diagonal matrix, its inverse is given by

$$D^{-1} = \operatorname{diag}\left(\frac{1}{d_{jj}}\right).$$

For example, if

$$A = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 10 \end{bmatrix}, \qquad \text{then} \qquad A^{-1} = \begin{bmatrix} 0.5 & 0 & 0 \\ 0 & 0.25 & 0 \\ 0 & 0 & 0.10 \end{bmatrix}.$$
4.2.6 Hat Matrix

Given an m × n matrix A with r(A) = n, the hat matrix H is given by

$$H = A(A'A)^{-1}A'.$$

The hat matrix is often used in least-squares estimation and in the decomposition of the analysis of variance, and it serves to define projection operators. We will see its applications later.
4.2.7 Hadamard Product

The Hadamard product of two matrices A and B of the same order is the element-wise product, (a_ij b_ij). A related operation, the Kronecker (direct) product A ⊗ B, is the block-structured product defined in such a way that

$$A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{bmatrix}.$$
4.3 Linear Dependence and Linear Independence

A set of n m × 1 vectors (a₁, a₂, …, aₙ) is said to be linearly dependent if and only if there exists a set of numbers (c₁, c₂, …, cₙ), not all of which are zero, such that

$$c_1\mathbf{a}_1 + c_2\mathbf{a}_2 + \cdots + c_n\mathbf{a}_n = \mathbf{0}.$$

For example,

$$\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 4 \\ 2 \end{bmatrix}, \begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix}$$

are linearly dependent, for the set of numbers (2, −1, −1) yields

$$2\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} - \begin{bmatrix} 1 \\ 4 \\ 2 \end{bmatrix} - \begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$

If a set of vectors is not linearly dependent, it is said to be linearly independent. Equivalently, a set of vectors is linearly independent if and only if the equation c₁a₁ + c₂a₂ + ⋯ + cₙaₙ = 0 implies that c₁ = c₂ = ⋯ = cₙ = 0. For example, consider the three vectors

$$\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$

These vectors are linearly independent, for the equation

$$c_1\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + c_3\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

implies that c₁ = c₂ = c₃ = 0.
4.4 Rank of a Matrix

In terms of linear independence of vectors, the rank of A, r(A), can be defined as the maximum number of linearly independent vectors that can be selected from the set (a₁, a₂, …, aₙ).
4.5 System of Linear Equations

Suppose we have a set of m linear equations in n unknowns x₁, x₂, …, xₙ:

$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = c_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = c_2 \\ \qquad\qquad\cdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = c_m \end{cases}$$

where the a_ij and c_i are known constants. Let us express this set of equations in matrix notation as Ax = c, where

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \qquad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad \mathbf{c} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix}.$$

When A is square and non-singular, pre-multiplying both sides of the equation by the inverse of A gives A⁻¹Ax = A⁻¹c, namely x = A⁻¹c. For example, consider the following set of equations:
$$\begin{cases} 2x_1 + 3x_2 = 16 \\ 4x_1 + 8x_2 = 36 \end{cases} \quad \text{or} \quad \begin{bmatrix} 2 & 3 \\ 4 & 8 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 16 \\ 36 \end{bmatrix}.$$

The solution is given by

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 4 & 8 \end{bmatrix}^{-1}\begin{bmatrix} 16 \\ 36 \end{bmatrix} = \frac{1}{4}\begin{bmatrix} 8 & -3 \\ -4 & 2 \end{bmatrix}\begin{bmatrix} 16 \\ 36 \end{bmatrix} = \begin{bmatrix} 5 \\ 2 \end{bmatrix}.$$

Therefore, we obtain x₁ = 5 and x₂ = 2. What if A is not square? In this case, the set of linear equations is solvable only if r(A) = r(A, c), where the matrix (A, c) is called the augmented matrix. In other words, the vector c must be a linear combination of the columns of A; this is the condition under which a solution can be found.
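A minimal sketch of the same computation in NumPy, both with the dedicated solver and with the explicit inverse:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [4.0, 8.0]])
c = np.array([16.0, 36.0])

print(np.linalg.solve(A, c))   # [5. 2.]
print(np.linalg.inv(A) @ c)    # same result via x = A^{-1} c
```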
4.6 Homogeneous Equations and Trivial Solution

The equation Ax = 0 is called a set of homogeneous equations. There always exists the solution x = 0. If A is square and non-singular, this is the only solution, which we call the trivial solution. If A is square and singular, however, there exists a non-trivial solution. In this case, we can change the set of homogeneous equations into a set of non-homogeneous equations by assigning arbitrary values to n − k unknowns, where k = r(A). For example, consider

$$\begin{bmatrix} 5 & 3 & 2 \\ 4 & 3 & 1 \\ 3 & 0 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},$$
that is, Ax = 0. Note that there are three unknowns and r (A) = 2. Thus, we assign an arbitrary value to one of the three unknowns. Let us assign −1 to x3 . Then we
transfer the corresponding term x₃a₃ to the right-hand side of the equation; since x₃ = −1, the right-hand side becomes a₃, resulting in

$$\begin{bmatrix} 5 & 3 \\ 4 & 3 \\ 3 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix},$$

or A*x* = c*. Note that r(A*) = r(A*, c*) = 2. Thus, delete the third row of A*, which is linearly dependent on the first two rows. Then we have

$$\begin{bmatrix} 5 & 3 \\ 4 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix},$$

and we obtain

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$

Recall that x₃ = −1 by assignment.
4.7 Orthogonal Transformation

A square matrix P is called orthogonal if and only if its inverse is equal to its transpose, namely

$$P^{-1} = P'.$$

Therefore, if P is an orthogonal matrix, then

$$P'P = I \quad \text{and} \quad PP' = I.$$

The determinant of an orthogonal matrix P is either 1 or −1.
4.8 Rotation of Axes

Consider the rectangular coordinates (x, y) of a point T, and rotate the axes X and Y by an angle θ to obtain new coordinates (x*, y*) of T. This can be described by the relation

$$\begin{bmatrix} x^* \\ y^* \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix},$$

or u* = Pu, where

$$\mathbf{u}^* = \begin{bmatrix} x^* \\ y^* \end{bmatrix}, \qquad \mathbf{u} = \begin{bmatrix} x \\ y \end{bmatrix},$$

and P is the 2 × 2 orthogonal matrix above. This is called an orthogonal coordinate transformation of u. If the determinant of P is 1, that is, |P| = 1, the orientation of the axes is preserved, but if |P| = −1, the orientation of the axes is altered. Suppose A is an n × n symmetric matrix and x is an n × 1 column vector. Then the scalar x′Ax is called a quadratic form in x.
4.9 Characteristic Equation of the Quadratic Form

Consider the following quadratic equation:

$$ax^2 + 2bxy + cy^2 = \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} a & b \\ b & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}.$$

This is a quadratic equation in x and y. Through orthogonal coordinate transformations, it is always possible to express a quadratic equation in terms of only the squared terms, that is, x*² and y*², namely

$$\lambda_1 x^{*2} + \lambda_2 y^{*2} = \begin{bmatrix} x^* & y^* \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}\begin{bmatrix} x^* \\ y^* \end{bmatrix} = \mathbf{u}^{*\prime}\Lambda\mathbf{u}^*,$$

where λ₁ and λ₂ are constants and Λ = diag(λⱼ). It is known that this transformation can be handled algebraically by solving the equation

$$(A - \lambda_j I)\mathbf{p}_j = \mathbf{0},$$
where pj is the jth column of P. This is a well-known eigenequation which appears in many places in multivariate analysis. This is a set of homogeneous equations and has a trivial solution, pj = 0. The system has non-trivial solutions if and only if the determinant of the coefficient matrix vanishes, that is, if and only if |A − λ j I| = 0.
4.10 Eigenvalues and Eigenvectors

The last equation is called the characteristic equation of the quadratic form x′Ax; the λⱼ are called the eigenvalues (latent roots, characteristic roots, proper values), and the pⱼ are called the eigenvectors (latent vectors, characteristic vectors, proper vectors) associated with the λⱼ. Consider the following symmetric matrix:

$$A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}.$$

Then the characteristic equation is

$$|A - \lambda_j I| = \begin{vmatrix} a - \lambda_j & b \\ b & c - \lambda_j \end{vmatrix} = (a - \lambda_j)(c - \lambda_j) - b^2 = \lambda_j^2 - (a + c)\lambda_j + (ac - b^2) = 0.$$

Solve this equation for λⱼ. The corresponding pⱼ can then be obtained by solving (A − λⱼI)pⱼ = 0, and u* is obtained by the relation u* = Pu. Let Λ = diag(λⱼ).
Then u*′Λu* is called the canonical form of u′Au. The axes after the rotation to the canonical form are called the principal axes, and this transformation is called the canonical reduction of the quadratic form.
4.10.1 Example: Canonical Reduction

This example is from Nishisato (1980) and is also used in his other publications. Suppose that we are given the quadratic function

$$5x^2 + 8xy + 5y^2 = 9.$$

We wish to transform this quadratic function into canonical form, that is, into a quadratic function without the cross-product term. To derive the canonical form, let us express the quadratic function in matrix form as

$$\begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = 9.$$

Let us introduce the characteristic equation, or, more popularly, the eigenequation, which is given by

$$\left|\begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix} - \lambda\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\right| = \begin{vmatrix} 5-\lambda & 4 \\ 4 & 5-\lambda \end{vmatrix} = (5-\lambda)^2 - 16 = (\lambda - 9)(\lambda - 1) = 0.$$

From this equation we obtain two eigenvalues, λ₁ = 9 and λ₂ = 1. Thus, the canonical form for our example is given by

$$9x^{*2} + y^{*2} = 9.$$

This transformation is called canonical reduction, and its relation to the original quadratic function can be visualized in Fig. 4.1. The canonical reduction amounts to the transformation of a tilted function (on axes (X, Y)) into a symmetric form (on axes (X*, Y*)). Note that the original axes show a tilted quadratic function, while the canonical form shows a symmetric quadratic function of (X*, Y*). These axes are called principal axes. Here we can get a hint of what principal component analysis is.
Fig. 4.1 Canonical reduction of a quadratic equation
Through our mathematics, we have just introduced the canonical reduction of a quadratic function. This concept is related to many formulas used in statistics, such as the eigenequation and singular value decomposition, which have served as a backbone of multivariate analysis since as early as the 1870s (e.g., Beltrami, 1873; Jordan, 1874), and to such familiar topics as principal component analysis (Pearson, 1901; Hotelling, 1933) and the Eckart–Young decomposition theorem (Eckart & Young, 1936). In particular, principal component analysis is known as a technique to find principal axes. The projections of data points onto those axes are called principal coordinates, and it is well known that principal component analysis provides the most economical way of describing the data. It can also be described as a method to project the data onto the space where the variance of the data attains its maximum value (i.e., the maximum information). In this way, principal component analysis identifies a set of orthogonal coordinates onto which the distribution of the data is symmetric with the maximum variance. Quantification theory is known to be the principal component analysis of categorical data (Torgerson, 1958).
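A numerical check of the canonical reduction above (a sketch; eigh is NumPy's routine for symmetric matrices and returns eigenvalues in ascending order):

```python
import numpy as np

A = np.array([[5.0, 4.0],
              [4.0, 5.0]])
lam, P = np.linalg.eigh(A)
print(lam)             # [1. 9.]: the eigenvalues of the quadratic form
print(P.T @ A @ P)     # diag(1, 9): P'AP = Lambda, with P orthogonal
```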
4.11 Idempotent Matrices

Remember that an n × n matrix A such that A² = A is called an idempotent matrix; if an idempotent matrix is symmetric, it is called a symmetric idempotent matrix. A necessary and sufficient condition for a matrix to be idempotent is that its eigenvalues are either 1 or 0. Accordingly, if an n × n idempotent matrix A has p nonzero characteristic roots, each of them is equal to 1. If A is a (symmetric) idempotent matrix, then:

1. A′ is a (symmetric) idempotent matrix;
2. P′AP is a (symmetric) idempotent matrix if P is an n × n orthogonal matrix;
3. PAP⁻¹ is an idempotent matrix if P is an n × n non-singular matrix;
4. I − A is a (symmetric) idempotent matrix.

If A is a symmetric matrix, there exists an orthogonal matrix P such that

$$P'AP = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = \Lambda.$$

In other words, since P′P = PP′ = I,

$$A = \lambda_1\mathbf{x}_1\mathbf{x}_1' + \cdots + \lambda_n\mathbf{x}_n\mathbf{x}_n',$$

where xⱼ is the jth column of P.
4.12 Projection Operator

The matrix that we looked at in the previous section is called a projection operator and can be characterized as follows:

1. P(a₁x₁ + a₂x₂) = a₁Px₁ + a₂Px₂.
2. If P is idempotent, that is, P² = P, then P is a projection operator.
3. If P is a projection operator, then Q = I − P is also a projection operator.
4. If Q = I − P, then Q² = Q and PQ = QP = 0.
5. An n × n matrix P is a projector if and only if it can be expressed as P = SDS⁻¹, where S is a non-singular matrix and D is a diagonal matrix with diagonal elements 1 or 0.
6. If P is a projector, then rank(P) = trace(P).
4.12.1 Example 1: Maximal Correlation

In Chap. 3, we mentioned a quantification method called forced classification (Nishisato, 1984; Nishisato & Baba, 1999) as an example of the use of series and limits. The idea can also be handled more directly by using projection operators. Let us quote from Chap. 3:
Consider seven multiple-choice items where the seventh item is chosen as the criterion item, so that the data can be represented as [F₁, F₂, …, F₆, F_c], where each item-data matrix is (the number of subjects)-by-(the number of response alternatives of the item): Each subject chooses one option per question, so the elements of the data matrix consist of 1s (choices) and 0s (non-choices) such that Fⱼ1 = 1 for all subjects. Consider a modified data matrix where the criterion item is multiplied by a constant p, so that the modified data matrix can be written as [F₁, F₂, …, F₆, pF_c]. As the value of p of the criterion item c goes to infinity, the correlation between the criterion and each item reaches the maximal value. Figure 3.5 shows how the correlation coefficients of the other items with the criterion item (the vertical axis) become larger as the value of p increases and finally reach the maximal value.
The above problem is solved more directly by using the projection operator P_c, where

$$P_c = F_c(F_c'F_c)^{-1}F_c',$$

and carrying out the quantification of the new matrix P_c(F₁, F₂, …, Fₙ). We can now state that the quantification of this matrix (i.e., the matrix projected onto the space of item c) maximizes the correlation between the criterion item c and each of the remaining items (Nishisato, 1984).
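A sketch of this projection with made-up response patterns (three two-option items for four subjects, the third item taken as the criterion; the data are assumptions for illustration only):

```python
import numpy as np

F1 = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
F2 = np.array([[0, 1], [0, 1], [1, 0], [1, 0]], dtype=float)
Fc = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)

# Projection operator onto the column space of the criterion item.
Pc = Fc @ np.linalg.inv(Fc.T @ Fc) @ Fc.T
print(np.allclose(Pc @ Pc, Pc))       # True: Pc is idempotent
print(Pc @ np.hstack([F1, F2]))       # the data projected onto item-c space
```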
4.12.2 Example 2: General Decomposition Formula

Let F be an n × m data matrix. We can decompose it by two sets of projection operators, one set for the rows and the other for the columns, such that

$$P_1 + P_2 + \cdots + P_n = I_n \quad \text{and} \quad Q_1 + Q_2 + \cdots + Q_m = I_m.$$

Then,

$$F = (P_1 + P_2 + \cdots + P_n)F(Q_1 + Q_2 + \cdots + Q_m).$$

This general formula (Nishisato & Lawrence, 1989) offers a way to analyze a subspace, or a set of chosen subspaces, of the rows or columns (or both) of the data for quantification, called generalized forced classification analysis (Nishisato, 1984). In other words, the above expression allows optimization over subspaces of the data in the context of quantification theory.
As in other contexts of statistics, the concept of projection offers many applications to non-quantitative data through quantification, and this is one area of data analysis that requires much further exploration.
References

Bock, R. D. (1975). Multivariate statistical methods in behavioral research. McGraw-Hill.
Browne, E. T. (1958). Introduction to the theory of determinants and matrices. The University of North Carolina Press.
Curtis, C. W. (1963). Linear algebra: An introductory approach. Allyn and Bacon.
Feeman, G. F., & Grabois, N. R. (1970). Linear algebra and multivariable calculus. McGraw-Hill.
Graybill, F. A. (1961). An introduction to linear statistical models. McGraw-Hill.
Graybill, F. A. (1969). Introduction to matrices with applications in statistics. Wadsworth Publishing Company.
Graybill, F. A. (1976). Theory and applications of linear models. Wadsworth Publishing Company.
Horst, P. (1963). Matrix algebra for social scientists. Holt, Rinehart and Winston.
Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. The University of Toronto Press.
Nishisato, S. (1984). Forced classification: A simple application of a quantification technique. Psychometrika, 49, 25–36.
Nishisato, S., & Baba, Y. (1999). On contingency, projection and forced classification of dual scaling. Behaviormetrika, 26, 207–219.
Nishisato, S., & Lawrence, D. R. (1989). Dual scaling of multiway data matrices: Several variants. In R. Coppi & S. Bolasco (Eds.), Multiway data analysis (pp. 317–326). Elsevier Science Publishers.
Chapter 5
Statistics in Matrix Notation
5.1 Mean

Let us consider an N × n data matrix X, where N is the number of subjects and n the number of variables. Let xⱼ be the N × 1 vector of variable j and 1_N be the N × 1 vector of 1s. Then the mean of variable j, x̄ⱼ, can be expressed as

$$\bar{x}_j = \frac{\sum_i x_{ji}}{N} = \frac{\mathbf{x}_j'\mathbf{1}_N}{\mathbf{1}_N'\mathbf{1}_N}.$$

The n × 1 vector of the n means, x̄, can be expressed as

$$\bar{\mathbf{x}} = \frac{X'\mathbf{1}_N}{\mathbf{1}_N'\mathbf{1}_N}.$$
5.2 Variance-Covariance Matrix

For the same data set as above, let us consider the n × n variance-covariance matrix V:

$$V = \frac{1}{N-1}\left(X'X - N\bar{\mathbf{x}}\bar{\mathbf{x}}'\right).$$
5.3 Correlation Matrix

Pearson's product-moment correlation matrix R can be obtained from the variance-covariance matrix by

$$R = D^{-\frac{1}{2}} V D^{-\frac{1}{2}},$$

where D^{-1/2} = diag(1/σⱼ) and σⱼ is the standard deviation of variable j.
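A short sketch expressing the last three sections in NumPy (the 5 × 3 data matrix is made up for illustration), checked against the built-in routines:

```python
import numpy as np

X = np.array([[1., 2., 3.],
              [2., 1., 5.],
              [3., 4., 4.],
              [4., 3., 6.],
              [5., 5., 8.]])          # an illustrative 5 x 3 data matrix
N = X.shape[0]
one = np.ones(N)

xbar = X.T @ one / (one @ one)                        # vector of means
V = (X.T @ X - N * np.outer(xbar, xbar)) / (N - 1)    # covariance matrix
D_isqrt = np.diag(1.0 / np.sqrt(np.diag(V)))
R = D_isqrt @ V @ D_isqrt                             # correlation matrix

print(np.allclose(V, np.cov(X, rowvar=False)))        # True
print(np.allclose(R, np.corrcoef(X, rowvar=False)))   # True
```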
5.4 Linear Regression

Suppose we have the final overall achievement examination scores yᵢ, and the first-term mathematics test scores x₁ᵢ, English scores x₂ᵢ and history scores x₃ᵢ from N students. Our question is whether we can predict the final scores from the first-term scores by a linear regression model,

$$y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + e_i,$$

or, in matrix notation,

$$\mathbf{y} = X\boldsymbol{\beta} + \mathbf{e},$$

where y is the N × 1 vector of the achievement test scores of the N students, X is the N × 3 matrix of the three first-term scores of the N students and e is the N × 1 vector of errors in prediction. The least-squares method determines the weight vector β so as to minimize the sum of squares of the residuals, that is, e′e; the resulting values are called the least-squares estimates. Thus, our task is to solve the following equation for β:

$$\frac{\partial \mathbf{e}'\mathbf{e}}{\partial \boldsymbol{\beta}} = \frac{\partial (\mathbf{y} - X\boldsymbol{\beta})'(\mathbf{y} - X\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \mathbf{0}.$$

Solving this, we obtain the least-squares estimate of β, β̃, as

$$\tilde{\boldsymbol{\beta}} = (X'X)^{-1}X'\mathbf{y}.$$
Note that the matrix X(X′X)⁻¹X′, which we call a hat matrix, is a projection operator,

$$X(X'X)^{-1}X' = P,$$

so that the observed vector can be expressed as its projection onto the estimation space (model space) plus its projection onto the residual space (error space):

$$\mathbf{y} = P\mathbf{y} + (I - P)\mathbf{y} = \tilde{\mathbf{y}} + \tilde{\mathbf{e}}.$$

The idea of least-squares estimation thus matches the idea of projection by its very definition. In quantification theory, y is a vector of unknowns; the problem then is to determine y so as to maximize the contribution of Py, or minimize that of (I − P)y.
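A sketch of the least-squares estimate and the hat matrix on simulated scores (the data are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))               # three first-term scores, 20 students
beta_true = np.array([0.5, -1.0, 2.0])
y = X @ beta_true + 0.1 * rng.normal(size=20)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y
P = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix = projection operator

print(beta_hat)                                # close to beta_true
print(np.allclose(P @ P, P))                   # True: P is idempotent
print(np.allclose(y, P @ y + (np.eye(20) - P) @ y))  # model space + error space
```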
5.5 One-Way Analysis of Variance

Let us now steer our attention to some statistical procedures in the context of quantification theory. As Fisher (1940) and Nishisato (1971, 1972, 1980) advanced the idea of quantification in the context of the analysis of variance, we will follow their steps. Consider the same example as in Chap. 1, with the following interpretation of the data: (1) we assign three unknown scores, y₁ for the judgment "good," y₂ for "average" and y₃ for "poor"; (2) we then carry out a one-way analysis of variance to examine whether there are any significant differences among the three teachers (Table 5.1). In terms of the scores given to the evaluation categories, yᵢ, we can express the data for the one-way analysis of variance as in Table 5.2. This is a typical data set for the one-way
Table 5.1 Evaluation of three teachers

Teacher   Good   Average   Poor   Total
Whiteᵃ       1         3      6      10
Green        3         5      2      10
Brown        6         3      0       9
Total       10        11      8      29

ᵃ In the original (Nishisato, 1980), these teachers were identified as 1, 2 and 3; these were later changed to White, Green and Brown, respectively (Nishisato & Nishisato, 1994)

Table 5.2 Data for one-way analysis of variance

Teacher   Data                                               No. of responses
White     y₁, y₂, y₂, y₂, y₃, y₃, y₃, y₃, y₃, y₃                           10
Green     y₁, y₁, y₁, y₂, y₂, y₂, y₂, y₂, y₃, y₃                           10
Brown     y₁, y₁, y₁, y₁, y₁, y₁, y₂, y₂, y₂                                9
analysis of variance, except that the y_ji in the table are unknown scores. In quantification theory, our objective is to determine these unknown scores so that the analysis of variance model shows the best fit to the data. In one-way analysis of variance, we introduce the following terms: the total sum of squares (SS_t), the between-group sum of squares (SS_b) and the within-group sum of squares (SS_w), defined by

$$SS_t = \sum \left(y_{ji} - \bar{y}\right)^2,$$

where f_t is the total number of observations (29 in this example) and ȳ is the overall mean,

$$SS_b = \sum f_{j.}\left(\bar{y}_j - \bar{y}\right)^2,$$

where f_j. is the total number of responses in row j and ȳⱼ is the mean of row j, and

$$SS_w = \sum \left(y_{ji} - \bar{y}_j\right)^2.$$

The following relation holds:

$$SS_t = SS_b + SS_w.$$

In one-way analysis of variance, we test the null hypothesis that the three group means come from the same population (see a statistics text for this hypothesis testing). Our current concern is how these terms can be expressed in matrix notation. Let us introduce the following matrices and vectors for the current numerical example:

$$F = \begin{bmatrix} 1 & 3 & 6 \\ 3 & 5 & 2 \\ 6 & 3 & 0 \end{bmatrix}; \qquad \mathbf{f} = \begin{bmatrix} 10 \\ 10 \\ 9 \end{bmatrix}; \qquad D = \begin{bmatrix} 10 & 0 & 0 \\ 0 & 10 & 0 \\ 0 & 0 & 9 \end{bmatrix}; \qquad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix},$$

where the rows of F are the teachers and the columns the evaluation categories, f and D contain the row (teacher) totals, and we also write D_c = diag(10, 11, 8) for the column (category) totals. We indicate the total number of responses by f_t, that is, f_t = 29. In quantification analysis, the sum of the weighted responses is set equal to zero, that is,

$$10y_1 + 11y_2 + 8y_3 = 0,$$
so that

$$SS_t = \sum y_{ji}^2 = \mathbf{y}'D_c\mathbf{y}, \qquad SS_b = \mathbf{y}'F'D^{-1}F\mathbf{y}, \qquad SS_w = SS_t - SS_b = \mathbf{y}'(D_c - F'D^{-1}F)\mathbf{y}.$$

In quantification theory, we can use this model: we determine the unknowns y₁, y₂, y₃ in such a way that the ratio of SS_b to SS_t is a maximum. In statistics, this ratio is called the correlation ratio, indicated by η² and defined by

$$\eta^2 = \frac{SS_b}{SS_t}.$$

We can determine y so as to maximize η². Alternatively, we can maximize SS_b with respect to y under the condition that SS_t is constant, say SS_t = f_t. This latter case is typically handled by Lagrange's method of multipliers. For instance, we can define the Lagrangian function as

$$L(\mathbf{y}, \lambda) = SS_b - \lambda(SS_t - f_t).$$

In this case, the quantification solution can be obtained by solving

$$\frac{\partial L}{\partial \mathbf{y}} = \mathbf{0}, \qquad \frac{\partial L}{\partial \lambda} = 0;$$

a small numerical sketch of this maximization follows.
5.6 Multiway Analysis of Variance Let us make the discussion more general than the above case and look at the analysis of variance of categorical data through optimal quantification (Nishisato, 1971, 1972). We will use the example from Nishisato (1980) and consider a 2 × 2 factorial design with two subjects in each cell, responding to four multiple-choice questions. Remember that multiple-choice option weights must be determined for this analysis of variance of categorical variables. The data are categorical, and we will use the
notation for the data as we have used so far, that is, the response-pattern format, as shown in Table 5.3.

Table 5.3 Four groups of subjects, responding to four questions (two options per item; 1 = choice, 0 = non-choice)

Group                      Item 1   Item 2   Item 3   Item 4
Factor a, Treatment 1      1 0      0 1      0 1      1 0
Factor a, Treatment 1      1 0      0 1      1 0      1 0
Factor a, Treatment 2      0 1      1 0      1 0      0 1
Factor a, Treatment 2      1 0      1 0      1 0      0 1
Factor b, Treatment 1      0 1      1 0      1 0      1 0
Factor b, Treatment 1      0 1      1 0      0 1      0 1
Factor b, Treatment 2      1 0      0 1      0 1      0 1
Factor b, Treatment 2      0 1      0 1      1 0      1 0
The design matrix for the analysis of variance, corresponding to these parameters [μ, α1 , α2 , β1 , β2 , αβ11 , αβ12 , αβ21 , αβ22 ], is given by ⎡
1 ⎢1 ⎢ ⎢1 ⎢ ⎢1 A=⎢ ⎢1 ⎢ ⎢1 ⎢ ⎣1 1
1 1 1 1 0 0 0 0
0 0 0 0 1 1 1 1
1 1 0 0 1 1 0 0
0 0 1 1 0 0 1 1
1 1 0 0 0 0 0 0
0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0
⎤ 0 0⎥ ⎥ 0⎥ ⎥ 0⎥ ⎥ = [Aμ , Aα , Aβ , Aαβ ] 0⎥ ⎥ 0⎥ ⎥ 1⎦ 1
Let us define the following 'hat' matrices:

$$H_k = A_k(A_k'A_k)^{-1}A_k',$$
where k = α, β, αβ. For the analysis of variance, we can now define projection operators P_k which partition the variates into independent sets. The disjoint-space projection operators are defined by

$$P_\mu = H_\mu = \mathbf{1}(\mathbf{1}'\mathbf{1})^{-1}\mathbf{1}', \quad P_\alpha = H_\alpha - H_\mu, \quad P_\beta = H_\beta - H_\mu, \quad P_{\alpha\beta} = H_{\alpha\beta} - H_\alpha - H_\beta + H_\mu, \quad P_e = I - H_{\alpha\beta}.$$

Note that

$$P_\mu + P_\alpha + P_\beta + P_{\alpha\beta} + P_e = I.$$

The score vector z can now be expressed in terms of these projection operators:

$$\mathbf{z} = P_\mu\mathbf{z} + P_\alpha\mathbf{z} + P_\beta\mathbf{z} + P_{\alpha\beta}\mathbf{z} + P_e\mathbf{z}.$$

For the p × q crossed design with the same number of subjects in each of the pq cells, we can represent the analysis of variance table in terms of projection operators. Let N be the total number of subjects, m the total number of categories and n the total number of categorical variables. We then define the following terms:

(1) Degrees of freedom (df):

$$df_\alpha = p - 1, \quad df_\beta = q - 1, \quad df_{\alpha\beta} = (p-1)(q-1),$$
$$df_{between} = pq - 1, \quad df_{within} = (N-1)(m-n) - pq + 1, \quad df_{total} = (N-1)(m-n).$$

(2) The mean sums of squares (MS):
$$MS_\alpha = \frac{SS_\alpha}{df_\alpha}, \quad MS_\beta = \frac{SS_\beta}{df_\beta}, \quad MS_{\alpha\beta} = \frac{SS_{\alpha\beta}}{df_{\alpha\beta}}, \quad MS_{between} = \frac{SS_{between}}{df_{between}}, \quad MS_{within} = \frac{SS_{within}}{df_{within}}.$$

(3) The F ratios:

$$F_\alpha = \frac{MS_\alpha}{MS_{within}}, \qquad F_\beta = \frac{MS_\beta}{MS_{within}}, \qquad F_{\alpha\beta} = \frac{MS_{\alpha\beta}}{MS_{within}}.$$
Then, the analysis of variance can be tabulated as in Table 5.4.
Table 5.4 Two-way factorial analysis of variance

Source    df                           SS                         MS       F
A         p − 1                        y′P_α y                    MS_α     F_α
B         q − 1                        y′P_β y                    MS_β     F_β
AB        (p − 1)(q − 1)               y′P_αβ y                   MS_αβ    F_αβ
Between   pq − 1                       y′(P_α + P_β + P_αβ)y
Within    (N − 1)(m − n) − pq + 1      y′(I − H_αβ)y
Total     (N − 1)(m − n)               y′y

df = degrees of freedom; SS = sum of squares; MS = mean sum of squares
5.7 Discriminant Analysis Nishisato (1984) proposed forced classification analysis (see also Nishisato & Baba, (1999)) It is a variant of discriminant analysis of categorical variables. When he proposed forced classification analysis, it was still the time when the method of reciprocal averages (MRA) (Mosier, 1946; Richardson and Kuder, 1933) was popular: It was the procedure based on the limit theorem that an infinite series approaches the optimal quantification. Nishisato’s forced classification procedure was proposed with that historical background, starting with multiple-choice data of n multiple-choice questions: F = (F1 , F2 , · · · , Fn ) where Fj is the N × m j response-pattern (incidence) matrix from N subjects’ responses to item j with m j response options. Thus, we have Fj 1 = 1 The forced classification procedure is based on the mathematical identity, lim [F1 , F2 , · · · , kFj , · · · , Fn ] = Pj F
k→∞
Nishisato (1984) formulation was further generalized (Nishisato, 1986) to allow the discrimination focus on any part of categorical data. This is similar to the above discussion of multiway analysis of variance in which we quantify the data so as to maximize the effects of α, β or αβ or any combinations of these choices.
5.8 Principal Component Analysis
83
In terms of focusing, the most general framework was proposed by Nishisato and Lawrence (1989) with the following general expression: F=
m i
Pi F(
n
Qj = P1 + P2 + · · · + Pm )F(Q1 + Q2 + · · · + Qn )
j
where m
Pi = Im and
n
i
Qj = In
j
We can use this framework and quantify the categorical data in such a way that the chosen structure may maximally be represented in analysis.
5.8 Principal Component Analysis The idea of principal component analysis can be inferred from canonical reduction, discussed earlier, where the n-variable data can be rotated to the canonical form, which can be obtained by rotating the original configuration with arbitrary axes into the configuration with principal axes. As (Hotelling, 1933; Pearson, 1901) demonstrated, this canonical reduction can be attained by determining linear combinations of variables that maximize the variance of the linear composites. Thus, the problem can be stated as follows: (1) Consider a linear combination of n variables yi = β1 x1i + β2 x2i + · · · + xni βn Or y = Xβ The task of principal component analysis is to determine the weight vector β that maximizes the variance of the composite scores y, subject to the condition that β β = 1 This condition is important in order to keep the multidimensional configuration invariant over orthogonal rotations of the axes. The variance-covariance matrix can be given, as we have already seen earlier, by V=
1 (X X − nxj xj ) N −1
84
5 Statistics in Matrix Notation
and the Lagrangian function for the current task is given by L(β, λ) = V − λ(β β − 1) Differentiating the function with respect to β and λ leads to the equations (V − λI)y = 0 Solving this equation, we obtain a set of principal coordinates and a set of the eigenvalues associated with them. Each eigenvalue can be considered as the amount of information associated with the component.
References Fisher, R. A. (1940). The precision of discriminant functions. Annals of Eugenics, 10, 422–429. Hotelling, H. (1933). Analysis of complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417-441, and 498-520. Mosier, C. I. (1946). Machine methods in scaling by reciprocal averages. In Proceedings, Research Forum(pp. 35–39). International Business Corporation. Nishisato, S. (1971). Analysis of variance through optimal scaling. In Proceedings of the First Canadian Conference in Applied Statistics (pp. 306–316). Sir George Williams University Press. Nishisato, S. (1972). Analysis of variance of categorical data through selective scaling. Proceedings of the 20th International Congress of Psychology, Tokyo. Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. The University of Toronto Press. Nishisato, S. (1984). Forced classification: A simple application of a quantification technique. Psychometrika, 49, 25–36. Nishisato, S. (1986). Generalized forced classification for quantifying categorical data. In E. Diday (Ed.), Data analysis and informatics (pp. 351–362). North-Holland. Nishisato, S., & Baba, Y. (1999). On contingency, projection and forced classification of dual scaling. Behaviormetrika, 26, 207–219. Nishisato, S., & Lawrence, D. R. (1989). Dual scaling of multiway data matrices: Several variants. In R. Coppi & S. Bolasco (Eds.), Multiway data analysis (pp. 317–326). Elsevier Science Publishers. Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazines and Journal of Science, Series, 6(2), 559–572. Richardson, M., & Kuder, G. F. (1933). Making a rating scale that measures. Personnel Journal, 12, 36–40.
Chapter 6
Multidimensional Space
6.1 Introduction

Nowadays, many data analysis problems involve multidimensional space, as in factor analysis, principal component analysis and multidimensional scaling. We know that, given a set of n real numbers, a point in n-dimensional Euclidean space is specified by the n coordinates (x₁, x₂, x₃, …, xₙ). The Euclidean distance d_xy between two points with coordinates (x₁, x₂, x₃, …, xₙ) and (y₁, y₂, y₃, …, yₙ) is defined by the formula

$$d_{xy} = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2 + \cdots + (x_n - y_n)^2}.$$

We are very familiar with one-dimensional, two-dimensional and three-dimensional space and graphs through our experiences in daily life. Multidimensional space is the extension to higher-dimensional space, used whenever we deal with many variables, with the key concept being the interpoint distances defined above.
6.2 Pierce’s Description We typically consider a multidimensional configuration of a set of standardized variables. Ask about a set of points in multidimensional space. How are they scattered in multidimensional space? An important and interesting description of multidimensional space was presented by Pierce, (1961). In 1961, Pierce presented an interesting discussion about the distribution of information in multidimensional space. He considered a circle of radius 1 and a concentric circle of radius 21 inside of it. The area of a circle is πr 2 , and thus, the area of the outer circle is π and that of the inner circle is 41 π . He then considered a sphere. The © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Nishisato, Measurement, Mathematics and New Quantification Theory, Behaviormetrics: Quantitative Approaches to Human Behavior 16, https://doi.org/10.1007/978-981-99-2295-6_6
volume is (4/3)πr³, so 1/8 of the volume of a sphere lies within a sphere of one-half the diameter. We can now generalize this: the volume of a hypersphere of n dimensions is proportional to rⁿ. Let us now consider a sphere of radius r and a sphere of radius 0.99r. For a 1000-dimensional hypersphere, only a fraction 0.99¹⁰⁰⁰ ≈ 0.00004 of the volume lies in the sphere of 0.99 times the radius. Therefore, Pierce (1961, p. 170) states that "The conclusion is inescapable that in the case of a hypersphere of a very high dimensionality, essentially all of the volume lies very near the surface!"

This is an important point. In principal component analysis (PCA; Hotelling, 1933; Nishisato & Clavel, 2003), we often standardize the n variables (i.e., each variable has variance 1). If we then draw a two-dimensional graph using the first two components, the graph may show the n data points scattered all over the two-dimensional plane, on and within the circle of radius 1. This indicates that the data require more than two dimensions to describe all the information, for if the data were strictly two-dimensional, all the variables would lie on the circle of radius 1. Similarly, if the data were strictly three-dimensional, all the variables would lie on the three-dimensional sphere at distance 1 from the origin, without exception. The same view extends to higher-dimensional space: if we draw a k-dimensional graph and some data points are not at distance 1 from the origin, the data require more dimensions to be fully accounted for, and the closer points lie to the origin, the higher the dimensionality of the space needed to accommodate the data. We can therefore safely conclude that if all the data points lie on the surface of the hypersphere, the dimensionality of the graph is appropriate for the data set; otherwise, one must increase the dimension of the graph. One problem we still have to solve is how to draw a graph in hyperspace of more than three dimensions; finding a useful method of graphical display in more than three dimensions is an important problem we must tackle and develop.

Euclidean space is the familiar medium in which most of our discussion of distance is conducted. For instance, a map of a city is typically drawn on two axes, north-south and east-west, and a map of a country is described using two axes, longitude and latitude. Euclidean space in these examples is two-dimensional, governed by the distance between Point A and Point B,
6.2 Pierce’s Description
87
$$d_{AB} = \sqrt{(X_{A1} - X_{B1})^2 + (X_{A2} - X_{B2})^2},$$

where, for instance, (X_{A1}, X_{A2}) are the coordinates of Point A on the two axes. More generally, Euclidean space is the space consisting of all points of n numbers (x₁, x₂, …, xₙ), where the distance d(x, y) between x = (x₁, x₂, …, xₙ) and y = (y₁, y₂, …, yₙ) is given by

$$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2}.$$
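Two one-line checks of the ideas above, Pierce's vanishing volume fraction 0.99ⁿ and the Euclidean distance formula:

```python
import numpy as np

# Fraction of an n-dimensional hypersphere's volume within radius 0.99r:
for n in [2, 10, 100, 1000]:
    print(n, 0.99 ** n)    # 0.98, 0.90, 0.37, 0.000043

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])
print(np.sqrt(np.sum((x - y) ** 2)), np.linalg.norm(x - y))  # both 5.0
```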
There are a number of relations useful for data analysis, and some of them will be listed here.
6.2.1 Pythagorean Theorem

The sum of the squares of the lengths of the legs of a right triangle is equal to the square of the length of the hypotenuse.
6.2.2 The Cosine Law

Consider three points $i, j, k$ in two-dimensional Euclidean space. Let us indicate the distance between two points by $d$ with the two points in the subscript, and the angle between $(i,j)$ and $(i,k)$ with $i$ as the vertex by $\theta_{jik}$. Then, there exists the relation
$$d_{jk}^2 = d_{ij}^2 + d_{ik}^2 - 2 d_{ij} d_{ik} \cos \theta_{jik}.$$
6.2.3 Young–Householder Theorem

Young and Householder (1938) published a paper to clarify the space used in psychometrics. The main result is now known as the Young–Householder theorem.

(i) If the matrix with typical element $d_{ij} d_{ik} \cos \theta_{jik}$ is positive definite or positive semi-definite, the distances may be considered distances between points in Euclidean space.
(ii) The rank of this matrix is equal to the dimensionality of the space which accommodates all the points.
(iii) Any positive definite or positive semi-definite matrix can be factored into the product of a matrix of coordinates of points in Euclidean space and its transpose.
6.2.4 Eckart–Young Theorem

The Eckart and Young (1936) theorem is well known in psychometrics, thanks to their publication in Psychometrika. In statistics, the result is known as singular value decomposition (Beltrami, 1873; Jordan, 1874; Schmidt, 1907). Eckart and Young (1936) state that for any real $n \times m$ rectangular matrix $A$, we can find an $n \times k$ matrix $B$ and an $m \times k$ matrix $C$, where $B'B = C'C = I_k$, such that
$$B'AC = \Lambda = \mathrm{diag}(\lambda_k).$$
We will see shortly that quantification theory is nothing but singular value decomposition of categorical data.
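A minimal numerical sketch of the theorem, assuming NumPy is available; the matrix A below is an arbitrary example, not data from this book:

```python
import numpy as np

# Sketch of the Eckart-Young theorem using NumPy's SVD.
A = np.array([[4.0, 2.0, 0.0],
              [1.0, 5.0, 3.0],
              [2.0, 1.0, 6.0],
              [0.0, 3.0, 2.0]])          # n x m rectangular matrix

B, s, Ct = np.linalg.svd(A, full_matrices=False)   # A = B diag(s) C'
print(np.allclose(B.T @ A @ Ct.T, np.diag(s)))     # B'AC = diag(lambda_k): True

# Keeping only the largest singular value gives the best rank-1
# least-squares approximation of A (the Eckart-Young result).
A1 = s[0] * np.outer(B[:, 0], Ct[0, :])
print(np.linalg.norm(A - A1))   # residual of the rank-1 approximation
```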
6.2.5 Chi-Square Distance

Categorical data are grouped into incidence data and dominance data (Nishisato, 1993). Two different metrics are used to quantify them: the chi-square distance for incidence data and the Euclidean distance for dominance data. Here we look at the chi-square distance. Suppose that row $j$ and row $k$ have $f_{j.}$ and $f_{k.}$ responses, respectively, and indicate by $p_{j.}$ and $p_{k.}$ the corresponding proportions of the total number of responses $f_t$. Let $y_{js}$ and $y_{ks}$ be the scale values of the two rows in component $s$. Then, the squared chi-square distance between rows $j$ and $k$ is given by
$$d_{chi(jk)}^2 = \sum_{s=1}^{K} \rho_s^2 \left( \frac{y_{js}}{\sqrt{p_{j.}}} - \frac{y_{ks}}{\sqrt{p_{k.}}} \right)^2.$$
Similarly, the squared chi-square distance between column c and column d is given by
$$d_{chi(cd)}^2 = \sum_{s=1}^{K} \rho_s^2 \left( \frac{x_{cs}}{\sqrt{p_{.c}}} - \frac{x_{ds}}{\sqrt{p_{.d}}} \right)^2.$$
The above two expressions are for what we call “within-set” distances. Recently, Nishisato and Clavel (2003) presented the formula for the chi-square distance between row j and column c, that is, the between-set distances.
$$d_{chi(jc)}^2 = \sum_{s=1}^{K} \rho_s^2 \left( \frac{y_{js}^2}{p_{j.}} + \frac{x_{cs}^2}{p_{.c}} - 2 \rho_s \frac{y_{js} x_{cs}}{\sqrt{p_{j.}\, p_{.c}}} \right).$$
The last formula is based on the cosine law mentioned earlier: consider two points A and B in a two-dimensional space, with distance $a$ between the origin O and A, distance $b$ between O and B, and angle $\theta$ between OA and OB. Then the cosine law states that the squared distance between A and B is given by $d_{AB}^2 = a^2 + b^2 - 2ab\cos\theta$. For the quantification of dominance data such as rank-order data and paired-comparison data (Guttman, 1946; Slater, 1960; Nishisato, 1978, 1980), all marginals are equal, and in this case the chi-square distance reduces to the Euclidean distance.

Let us consider principal component analysis of standardized variables (i.e., each variable has variance 1). If the set of variables can be accommodated in two-dimensional space, each variable is located at distance 1 from the origin, on the circle of radius 1. Similarly, if the data are perfectly three-dimensional, all the variables lie at distance 1 from the origin, on the surface of the sphere of radius 1. The important point is that all standardized variables are located at distance 1 from the origin of the graph, and none lies inside that radius. That many data points lie inside the radius of 1 indicates that the space used for the graph does not have enough dimensions: if the configuration of standardized data points in a given graph does not show all points at distance 1 from the origin, the dimensionality of the graph is not large enough for the given data set.
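A small sketch of this diagnostic, assuming NumPy and synthetic data (none of the numbers below come from the book), computes how far each standardized variable falls from the unit circle in a two-component plot:

```python
import numpy as np

# For standardized variables, the component loadings place each variable
# at distance <= 1 from the origin; distance 1 on the first two components
# would mean the variable is fully described in two dimensions.
rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 5))
Z[:, 3] = Z[:, 0] + 0.3 * rng.standard_normal(200)   # induce some structure
Z = (Z - Z.mean(0)) / Z.std(0)                       # standardize

R = np.corrcoef(Z, rowvar=False)                     # correlation matrix
evals, evecs = np.linalg.eigh(R)
evals, evecs = evals[::-1], evecs[:, ::-1]           # descending order

loadings = evecs * np.sqrt(evals)                    # variable coordinates
r2 = np.sqrt((loadings[:, :2] ** 2).sum(1))          # distance in a 2-D plot
print(r2)   # values well below 1 flag variables needing more dimensions
```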
6.3 Distance in Multidimensional Space

From the equation for the Euclidean distance, it is obvious that the distance between two points can only stay the same or increase as we view them in a space of higher dimensionality. Thus, if two points are very
far apart when viewed in two-dimensional space, they are guaranteed to be at least as far apart in three- or higher-dimensional space. The opposite, however, is not true: if two points are very close to each other in two-dimensional space, there is no guarantee that they are also close in three- or higher-dimensional space. This point, that the distance between two points in multidimensional space is a non-decreasing function of the dimensionality of the space, is extremely important from the graphical point of view. Considering that we can draw only two- or three-dimensional graphs of data, we tend to overestimate the closeness of two points: two points located close together in two-dimensional space may be very far from each other in the total space of, say, five dimensions. Does this not warn us to be extremely careful in interpreting analytical results by looking at a two-dimensional graph? Yes, this is the warning: be very careful when you look at a two-dimensional graph of multidimensional data!

This appears contrary to the general practice in multidimensional data analysis, where people typically look for clusters of points which are close to one another in two- or three-dimensional space. Instead, we should look for points in two- or three-dimensional space which are widely separated from one another; or, if we are interested in a cluster of close points, we should look for such a cluster in the full multidimensional space. Later we will revisit this problem of interpreting quantification results by looking only at two-dimensional graphs. This problem of the graphical display of quantification results will unfortunately remain one of the most important unsolved problems.
6.4 Correlation in Multidimensional Space

Similar to the above aspect of multidimensional space is the interpretation of the correlation between two variables. When two variables are expressed as two vectors from the origin of a two-dimensional graph, it is well known that Pearson's correlation is equal to the cosine of the angle between the two vectors. When we express data in an orthogonal coordinate system, each variable can be expressed as an axis along which all the variates of that variable lie. Suppose that two variables lie in three-dimensional space, and suppose we look at them in two-dimensional space. It is not difficult to see that the angle between the two axes viewed in two-dimensional space cannot be larger than the angle between them viewed in three-dimensional space. In other words, when two variables lie in multidimensional space with angle $\theta$ between them, the angle becomes smaller when we project the two axes onto a space of lower dimensionality. This means that if we look at the data in two dimensions when they actually require a space of dimensionality higher than two, we are overestimating the correlation, for the smaller the angle, the larger the correlation (Nishisato, 1988; Nishisato & Clavel, 2003).
References
The above point may appear paradoxical when we consider the model of linear factor analysis, in which the correlation between two variables is decomposed into the contributions of many dimensions, giving the impression that the more dimensions are involved, the higher the correlation. This, however, is not the case; together with the distance problem mentioned in the previous section, it is another aspect of our future problems.
Beltrami, E. (1873). Sulle funzioni bilineari (On the bilinear functions). In G. Battaglini & E. Fergola (Eds.), Giornale di Matematiche, 11, 98–106.
Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1, 211–218.
Guttman, L. (1946). An approach for quantifying paired comparisons and rank order. Annals of Mathematical Statistics, 17, 144–163.
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417–441 and 498–520.
Jordan, C. (1874). Mémoire sur les formes bilinéaires (Note on bilinear forms). Journal de Mathématiques Pures et Appliquées, Deuxième Série, 19, 35–54.
Nishisato, S. (1978). Optimal scaling of paired comparison and rank order data: An alternative to Guttman's formulation. Psychometrika, 43, 267–271.
Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. The University of Toronto Press.
Nishisato, S. (1988). Market segmentation by dual scaling through generalized forced classification. In W. Gaul & M. Schader (Eds.), Data, expert knowledge and decisions (pp. 268–278). Springer-Verlag.
Nishisato, S. (1993). On quantifying different types of categorical data. Psychometrika, 58, 617–629.
Nishisato, S., & Clavel, J. G. (2003). A note on between-set distances in dual scaling and correspondence analysis. Behaviormetrika, 30, 87–98.
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, Series 6, 2, 559–572.
Pierce, J. R. (1961). Symbols, signals and noise: The nature and process of communication. Harper and Row.
Schmidt, E. (1907). Zur Theorie der linearen und nichtlinearen Integralgleichungen. Erster Teil: Entwicklung willkürlicher Funktionen nach Systemen vorgeschriebener (On the theory of linear and nonlinear integral equations. Part one: Development of arbitrary functions according to prescribed systems). Mathematische Annalen, 63, 433–476.
Slater, P. (1960). The analysis of personal preferences. British Journal of Statistical Psychology, 13, 119–135.
Young, G., & Householder, A. S. (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika, 3, 19–22.
Part III
A New Look at Quantification Theory
Quantification theory has some 100 years of history. It started with ecologists dealing with non-quantitative data, and then caught the attention of, among others, statisticians, psychologists, educational researchers, sociologists, biologists and archaeologists. Thus, the foundation of quantification theory is now firmly built. The main aim of Part III is to offer a new look at the theory through the examination of quantification space, consisting of major dual space (contingency space), minor dual space (adjustment space) and residual space. To carry out quantification of categorical data over major and minor dual space, we will present the entire quantification task through Stage 1 analysis (maximization of the row-column correlation in major dual space) and Stage 2 analysis (finding principal coordinates of rows and columns in common space, that is, in major and minor dual space, which is one way to summarize a complete outcome of quantification, and calculating the (rows, columns)-by-(rows, columns) distance matrix, which is another way of summarizing the total information in the data). This two-stage analysis offers a new look at the total information in the data. Although we now know what we must do to carry out logically correct analysis and logically correct joint graphical display of quantification outcomes, this does not solve the practical problem of joint graphical display, so long as we are unable to graph a multidimensional configuration. We hope that we will eventually develop multidimensional graphs and effective ways of analyzing the total distance matrix of the rows and columns of the data matrix. Chapter 7 will discuss a basic formulation, including the new proposal of two-stage analysis, followed by Chap. 9, where the strategy of the two-stage analysis is fully explained with two numerical examples. Chapter 10 will be devoted to the controversial joint graphical display problems and alternative ways of summarizing the total information extracted by quantification analysis. Chapter 11 is devoted to other problems of quantification, ending with the author's epilogue.
Chapter 7
General Introduction
7.1 An Overview

Quantification theory is popular in those areas where data are mostly non-quantitative, qualitative or categorical. In this chapter, we will look at the general background (i.e., reference books and basic ideas of quantification) and an outline of quantification theory. Then, in Chaps. 8 and 9, we will present numerical examples for our new strategy of quantification, namely the analysis via Stage 1 and Stage 2. For Stage 1 analysis, we will analyze the contingency-table format to investigate the correlational structure of the data. For Stage 2 analysis, we will quantify the response-pattern format of the data generated from the contingency table. This is primarily to explore orthogonal coordinates of both the rows and the columns of the contingency table in doubled multidimensional space (see the space theory in Chap. 8). This will enhance the understanding of the background logic for the joint graphical display of rows and columns in common doubled Euclidean space.
7.2 Historical Background and Reference Books

As stated in Nishisato (2007a), there are over 50 aliases of quantification theory. The main reason for this large number of aliases is that there are different approaches to quantification which arrive at essentially the same results. Currently, we have not only many research papers but also a large number of books on quantification theory. There is a summary of books published in English, French and Japanese in Beh and Lombardo (2014, pp. 10–11). So, let us borrow their listings here and get familiar with a wealth of relevant information:

(1) Books in French: Many of the books in French remained unknown to English-speaking researchers until around 1970. Historically, the French contributions were monumental and significant, with many outstanding researchers
contributing to the theoretical developments of Analyse des Correspondances. Major books are: Beltrami (1873), Caillez and Pagés (1976), Lefevre (1976), Bouroche (1977), Lebart et al. (1977), Jambu and Lebeaux (1978), Lebart et al. (1979), Saporta (1979), Bouroche and Saporta (1980), Nakache (1982), Cibois (1983), de Lagarde (1983), Escofier and Pagés (1988), Jambu (1989), Tenenhaus (1994) and Moreau et al. (1999).

(2) Books in Japanese: Unfortunately, many books in Japanese have also remained unknown to English-speaking researchers. This may be a rare occasion for us to glance at their contributions, in particular their systematic classification of quantification theory. The main books are Hayashi et al. (1970), Hayashi (1974, 1993), Komazawa (1978, 1982), Nishisato (1982, 2007b, 2010), Hayashi and Suzuki (1986), Iwatsubo (1987), Akiyama (1993), Komazawa et al. (1998) and Adachi and Murakami (2011). Quantification theory is also discussed in chapters of such Japanese books as Takane and Yanai (1972), Nishisato (1975), Yanai and Takane (1977), Saito (1980), Yanai and Takagi (1986), Ohsumi et al. (1994), Takane (1995), Hayashi (2001) and Adachi (2006).

(3) Books in English: Our readers may be familiar with the books in English, but let us start from the early days of quantification theory: Likert (1978), Whittaker (1978), Likert et al. (1979), Nishisato (1980, 1994, 2007a), Gauch (1982), Meulman (1982), de Leeuw (1984), Greenacre (1984), Lebart et al. (1984), Nishisato and Nishisato (1984, 1994), van der Heijden (1987), van Rijckevorsel (1987), van der Burg (1988), Kiers (1989), Koster (1989), Gifi (1990), van Buuren (1990), Weller and Romney (1990), Benzécri (1992), van de Geer (1993), Greenacre and Blasius (1994), Gower and Hand (1996), Clausen (1998), Blasius and Greenacre (1998), Le Roux and Rouanet (2004), Verboon (1994), Murtagh (2005) and Beh and Lombardo (2014). After the last entry of English books, another book was published (Nishisato, 2022).

The readers can pick up any of those books to find out what quantification theory is. Different books have different orientations, some simple and others advanced; no matter what, one can get a good idea of quantification theory from any one of them. In the current book, we look at quantification theory from one specific point of view, namely quantification space. The major difference of the current book from the others is a new look at quantification theory through Stage 1 and Stage 2 analyses. This new framework makes clear what is involved in identifying coordinates in doubled multidimensional space in relation to the half of the space which involves quantification of the contingency table. This new strategy can be considered the main contribution of the current book to the general development of quantification theory, in particular to the multidimensional geometric problems of graphical display.
7.3 First Step

Quantification theory deals with nominal measurement such as

• (x1 = brie, x2 = cottage cheese, x3 = goat cheese)
• (x1 = Liberal Party, x2 = Conservative Party, x3 = New Democratic Party, x4 = Green Party)

and its main task is to assign numerals to those nominal measurements. Its desideratum is to extract from nominal data an exhaustive amount of information, using the choice frequencies of the nominal variables as the main source for the quantification of nominal variables or categories. Many researchers have chosen different criteria for this quantification task, and the names of quantification methods have often been chosen to reflect their criterion statistics. Some of the popular ones are correlation (simple, multiple, canonical), the correlation ratio, the homogeneity coefficient and Cronbach's coefficient of internal-consistency reliability. More concretely, we wish to determine the unknown nominal variates so as to maximize one of these statistics, for example:

• Determine the unknown category values so as to maximize the row-column correlation of the contingency table.
• Determine the column (or row) weights of a two-way contingency table so that the ratio of the between-row (or between-column) sum of squares to the total sum of squares (i.e., the correlation ratio) is a maximum.
• Determine the response options of multiple-choice questions so that Cronbach's reliability coefficient (Cronbach, 1951) is a maximum.

When we deal with unknown numerals, one wonders how one can find specific values for nominal categories. This question is reasonable because there is no logical basis for fixing the values. As we will see shortly, we introduce a set of constraints on those unknown numbers so that we may eventually determine their values. This shows how different our problem is from handling ratio measurement, which has a rational origin and a unit of measurement. Let us use the same example as in Chap. 1, the teacher evaluation data, reproduced here for convenience. Recall that twenty-nine (29) students were asked to evaluate the performance of three teachers in terms of three evaluation categories: good, average, poor (Table 7.1). Our task is to assign appropriate values to the three teachers and the three evaluation categories.
Table 7.1 Evaluation of three teachers

Teacher   Good   Average   Poor   Total
White        1         3      6      10
Green        3         5      2      10
Brown        6         3      0       9
Total       10        11      8      29
7.3.1 Assignment of Unknown Numbers

Let us use the following notation:

(1) $n$ = the number of rows of the contingency table.
(2) $m$ = the number of columns of the contingency table.
(3) $F$ = the $n \times m$ contingency table.
(4) $f_{ij}$ = the element of cell $(i, j)$ of $F$.
(5) $f_{i.}$ = the sum of the elements in row $i$ of $F$.
(6) $f_{.j}$ = the sum of the elements in column $j$ of $F$.
(7) $f_t$ = the total of all $f_{ij}$, that is, the total number of responses.
(8) $y_i$ = the unknown weight for row $i$ of $F$.
(9) $x_j$ = the unknown weight for column $j$ of $F$.
Given the above data $F$, we express our data in terms of two sets of unknowns: one set for the three teachers in the rows of the table, $(y_1, y_2, y_3)$, and one for the three evaluation categories, $(x_1, x_2, x_3)$. These six variables are unknown numbers, and our quantification task is to determine their values in the best possible way. To see the scope of the task, let us represent the data in terms of these unknowns (Table 7.2).
Table 7.2 Teacher evaluation data expressed by unknown numbers

Teacher   Good           Average        Poor
White     (y1, x1) ×1    (y1, x2) ×3    (y1, x3) ×6
Green     (y2, x1) ×3    (y2, x2) ×5    (y2, x3) ×2
Brown     (y3, x1) ×6    (y3, x2) ×3    0
There are six unknowns, and our task is to assign to them the best possible, that is, optimal, values. Since all of them are unknown, we need constraints on them so that under the constraints we may determine their values. Recall our discussion on measurement, where we saw that ratio measurement has a rational origin and a unit, so that data at the ratio level may be subjected to any of the basic mathematical operations. We are now at the other end of the spectrum, where there is neither a rational origin nor a unit. To compensate for this lack of the basic requirements for computation, we introduce constraints on the unknowns.
7.3.2 Constraints on the Unknowns

The following are the standard constraints used in quantification theory. We set the origin and the unit of the unknown measurements as follows:

• The sum of responses weighted by $y_i$ is zero: $\sum_i \sum_j f_{ij} y_i = 0$.
• The sum of responses weighted by $x_j$ is zero: $\sum_i \sum_j f_{ij} x_j = 0$.
• The sum of squares of the weighted responses is equal to $f_t$: $\sum_i f_{i.} y_i^2 = \sum_j f_{.j} x_j^2 = f_t$.

In other words, with these constraints we have introduced the origin and the unit of the unknown nominal measurements $y_i$ and $x_j$. Remember that the highest (most quantitative) level of measurement is ratio measurement, which has a rational origin and an equal unit. In comparison, quantification theory deals with the lowest (least quantitative) level of measurement, and in the process of quantification we have just introduced an arbitrary origin and an arbitrary unit. We have to live with these arbitrary choices so as to overcome the qualitative aspect of our data: this is how quantification theory quantifies the least quantitative data, namely under arbitrary constraints.

Let us introduce matrix notation for our data. The data matrix is indicated by $F$,
$$F = \begin{bmatrix} 1 & 3 & 6 \\ 3 & 5 & 2 \\ 6 & 3 & 0 \end{bmatrix}.$$
$f_c$ is the vector of the column totals of $F$ and $f_r$ is the vector of the row totals of $F$,
$$f_c = \begin{bmatrix} 10 \\ 11 \\ 8 \end{bmatrix}, \quad f_r = \begin{bmatrix} 10 \\ 10 \\ 9 \end{bmatrix}.$$
$D_c$ is the diagonal matrix of the column totals of $F$ and $D_r$ is the diagonal matrix of the row totals of $F$,
$$D_c = \begin{bmatrix} 10 & 0 & 0 \\ 0 & 11 & 0 \\ 0 & 0 & 8 \end{bmatrix}, \quad D_r = \begin{bmatrix} 10 & 0 & 0 \\ 0 & 10 & 0 \\ 0 & 0 & 9 \end{bmatrix}.$$
$y$ is the column vector of scores for the teachers and $x$ is the column vector of weights for the categories,
$$y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$
In matrix notation, our constraints on the unknowns are:

(1) $f_r'y = 0$
(2) $f_c'x = 0$
(3) $y'D_r y = x'D_c x = f_t$.
7.4 Formulations of Different Approaches

Our task is to find the values of these unknowns $(y_i, x_j)$ under the constraints mentioned above.
Let us now look at three popular formulations: the bivariate correlation approach, the one-way analysis-of-variance approach and the reliability coefficient approach.

Notational note: In quantification, we will talk about the product-moment correlation, the singular value, the projection operator, the eigenvalue and the correlation ratio. In typical settings, these statistics have different notational symbols. In quantification theory, most of them are used for optimization, and it turns out that their optimal values are strictly related. Therefore, we will use $\rho$ and $\rho^2$, rather than other familiar notations such as $\eta$ and $\eta^2$, to avoid possible confusion. In other words, we have the following identities of maximized statistics:

(1) Singular value $\rho$ = maximized correlation = projection operator.
(2) Eigenvalue $\rho^2$ = maximized correlation ratio = maximized squared correlation.
7.4.1 Bivariate Correlation Approach

Consider the data in Table 7.1, a table involving two sets of unknowns (those in $y$ and those in $x$). The bivariate correlation approach determines the weights for the rows and those for the columns in such a way that the row-column correlation is a maximum under the aforementioned constraints. For the $f_t$ responses, there are $f_t$ pairs of weights $(y_i, x_j)$. Since the two weights in each pair describe the same response, it is reasonable to expect that they should be as close as possible. Thus, the correlation defined over those $f_t$ pairs is an ideal candidate for the optimization criterion. The correlation $\rho$ between the responses weighted by $y_i$ and those weighted by $x_j$ can be expressed as
$$\rho = \frac{y'Fx}{\sqrt{(y'D_r y)(x'D_c x)}} = \frac{1}{f_t}\, y'Fx$$
under the constraints that $f_r'y = f_c'x = 0$ and $y'D_r y = x'D_c x = f_t$. For this maximization problem, we define the task as that of maximizing the cross product, subject to the conditions on the denominator. This is a problem for the Lagrangian method of undetermined multipliers, as discussed earlier. For the current problem, the Lagrangian function can be defined as
$$Q(y, x, \lambda_1, \lambda_2) = y'Fx - \frac{1}{2}\lambda_1 (y'D_r y - f_t) - \frac{1}{2}\lambda_2 (x'D_c x - f_t).$$
Then, we solve the following set of equations:
$$\frac{\partial Q}{\partial y} = 0, \quad \frac{\partial Q}{\partial x} = 0, \quad \frac{\partial Q}{\partial \lambda_1} = 0, \quad \frac{\partial Q}{\partial \lambda_2} = 0.$$
Thus, we obtain the following equations to solve:
$$F'y - \lambda_1 D_c x = 0, \quad Fx - \lambda_2 D_r y = 0,$$
$$x'D_c x - f_t = 0, \quad y'D_r y - f_t = 0.$$
The first two equations lead us to
$$\lambda_1 = \frac{x'F'y}{x'D_c x}, \quad \lambda_2 = \frac{y'Fx}{y'D_r y}.$$
Notice that the denominators of these two expressions are equal (both are $f_t$ under our constraints), as are the numerators, and that the product of the two expressions is the square of the correlation coefficient. Therefore, we conclude that the two Lagrange multipliers are equal to each other and to our correlation coefficient, that is, $\lambda_1 = \lambda_2 = \rho$. Note further that we obtain from the first two partial derivatives the formulas for the dual relations (Nishisato, 1980):
$$y = \frac{1}{\rho} D_r^{-1} F x, \quad x = \frac{1}{\rho} D_c^{-1} F' y.$$
These weight vectors, $y$ and $x$, are called normed weights (Nishisato, 1980) or standard coordinates (Greenacre, 1984). But, as we will later discuss in connection with joint graphical display, the weight vectors $y$ and $x$ obtained from the contingency table are not Euclidean coordinates for the unknown variates; therefore, we simply adopt Nishisato's designation. Let us rewrite the dual relations as follows:
$$\rho y = D_r^{-1} F x, \quad \rho x = D_c^{-1} F' y.$$
These are called projected weights (Nishisato, 1980) or principal coordinates (Greenacre, 1984). As noted above, however, these are neither principal coordinates nor Euclidean coordinates; therefore, we will call them projected weights to avoid any confusion. If we substitute $x$ for $y$ and $y$ for $x$ in the dual relations, we obtain
$$y = \frac{1}{\rho^2} D_r^{-1} F D_c^{-1} F' y, \quad x = \frac{1}{\rho^2} D_c^{-1} F' D_r^{-1} F x.$$
From these relations, we obtain
$$(F' D_r^{-1} F - \rho^2 D_c)\, x = 0, \quad (F D_c^{-1} F' - \rho^2 D_r)\, y = 0.$$
If we set
$$B = D_r^{-1/2} F D_c^{-1/2}, \quad w = D_c^{1/2} x, \quad v = D_r^{1/2} y,$$
then we can derive the standard form:
$$(B'B - \rho^2 I)\, w = 0, \quad (BB' - \rho^2 I)\, v = 0.$$
It is well known that the above equations have the so-called trivial solution, satisfied irrespective of what data we may use: $\rho^2 = \rho_0^2 = 1$, $w = w_0 = D_c^{1/2}\mathbf{1}$, $v = v_0 = D_r^{1/2}\mathbf{1}$. Therefore, we must remove the trivial solution from the eigenequation and analyze the residual matrix $C$ or $C^*$, given by
$$C = B'B - \rho_0^2\, \frac{w_0 w_0'}{w_0' w_0} = B'B - \frac{1}{f_t} D_c^{1/2}\, \mathbf{1}\mathbf{1}'\, D_c^{1/2},$$
$$C^* = BB' - \rho_0^2\, \frac{v_0 v_0'}{v_0' v_0} = BB' - \frac{1}{f_t} D_r^{1/2}\, \mathbf{1}\mathbf{1}'\, D_r^{1/2}.$$
Let us consider only $C$. The first component we extract is given by solving the following equation, associated with the maximal eigenvalue, say $\rho_1^2$:
$$(C - \rho^2 I)\, w = 0.$$
Once the eigenvector associated with the largest eigenvalue $\rho_1^2$, that is, $w_1$, is obtained, we calculate the first optimal weight vector $x_1$ from $w_1$ by the formula
$$x_1 = D_c^{-1/2} w_1.$$
Recall the constraints we imposed on our quantification: the weights for the categories are scaled in such a way that $w_1'w_1 = x_1'D_c x_1 = f_t$. We could carry out the parallel analysis with $C^*$, but this is not necessary since we already know the dual relations between $x$ and $y$. The optimal scores are given by
$$x_1 = \frac{1}{\rho_1} D_c^{-1} F' y_1 \quad \text{and} \quad y_1 = \frac{1}{\rho_1} D_r^{-1} F x_1.$$
Or, perhaps more meaningfully, we can write the optimal dual relations as
$$\rho_1 x_1 = D_c^{-1} F' y_1, \quad \rho_1 y_1 = D_r^{-1} F x_1.$$
Later we will learn that $\rho$ is a projection operator. Then the last expressions become meaningful: the projected quantities of $x$ and $y$ are given by the averages of the responses weighted by $y$ and $x$, respectively.
7.4.2 One-Way Analysis of Variance Approach

Unlike the previous approach, the analysis-of-variance approach determines the row weights only, or the column weights only, and then derives the other set afterward. We will use the same data and call the task of determining one set of weights (rows or columns) Task a and that of determining the other set Task b.

Task a: Given the data matrix of Table 7.1, let us consider the one-way analysis-of-variance model to investigate the teacher differences, that is, to examine whether there exist significant differences among the three teachers in their performance ratings. Using the same notation, we assign arbitrary weights to the three evaluation categories, that is, $x_1$ for "Good," $x_2$ for "Average" and $x_3$ for "Poor." Then, the data in Table 7.1 can be converted to the form shown in Table 7.3.
Table 7.3 Data expressed by unknown numbers given to ratings

Teacher   Good      Average   Poor
White     x1 ×1     x2 ×3     x3 ×6
Green     x1 ×3     x2 ×5     x3 ×2
Brown     x1 ×6     x2 ×3     0
In one-way analysis of variance, we define the total sum of squares ($SS_t$), the between-group (between-teachers) sum of squares ($SS_b$) and the within-group (within-teachers) sum of squares ($SS_w$), which are given as follows:
$$SS_t = x'\left(D_c - \frac{f_c f_c'}{f_t}\right)x,$$
$$SS_b = x'\left(F' D_r^{-1} F - \frac{f_c f_c'}{f_t}\right)x,$$
$$SS_w = SS_t - SS_b = x'\left(D_c - F' D_r^{-1} F\right)x.$$
There is a statistic called the correlation ratio, indicated by $\rho^2$ and defined by
$$\rho^2 = \frac{SS_b}{SS_t}.$$
Our quantification task is carried out as the problem of determining $x$ so as to maximize $\rho^2$. This problem can also be handled as that of maximizing $SS_b$, subject to the two conditions that (1) $SS_t$ is constant, say $SS_t = f_t$, and (2) the sum of the weighted responses is zero. By Lagrange's method, the problem becomes: "Maximize $SS_b$, subject to the conditions that $SS_t = f_t$ and $f_c'x = 0$."
Then, our Lagrangian function is
$$Q(x, \lambda_1, \lambda_2) = SS_b - 2\lambda_1 (SS_t - f_t) - 2\lambda_2\, f_c'x.$$
Partially differentiating the Lagrangian function with respect to $x$ and the two Lagrange multipliers, and setting the derivatives to 0, we obtain
$$\frac{\partial Q}{\partial x} = 2\left(F' D_r^{-1} F - \frac{f_c f_c'}{f_t}\right)x - 2\lambda_1 \left(D_c - \frac{f_c f_c'}{f_t}\right)x - 2\lambda_2 f_c = 0,$$
$$\frac{\partial Q}{\partial \lambda_1} = x'\left(D_c - \frac{f_c f_c'}{f_t}\right)x - f_t = 0,$$
$$\frac{\partial Q}{\partial \lambda_2} = f_c'x = 0.$$
The last two are nothing but our constraints on the unknowns. If we simplify the first expression, noting that $f_c'x = 0$ eliminates the terms involving $f_c f_c' x$ (and that $\lambda_2$ turns out to be zero), we arrive at the following:
$$(F' D_r^{-1} F - \lambda D_c)\, x = 0 \;\longrightarrow\; F' D_r^{-1} F x = \lambda D_c x.$$
This is called a generalized eigenequation. Why generalized? Recall that the standard form of an eigenequation has the identity matrix in place of $D_c$. The value $\lambda$ is called an eigenvalue, and the vector $x$ is the corresponding eigenvector. Notice here that our eigenvalue is nothing but the correlation ratio we want to maximize. This can be seen by pre-multiplying both sides by $x'$ and rearranging the resultant expression as follows:
$$\lambda = \frac{x' F' D_r^{-1} F x}{x' D_c x} = \rho^2,$$
where $\rho^2$ is the correlation ratio. Notice that the quantified data are centered, hence no mean appears in the formula. Let us introduce a new vector $w$, where
$$w = D_c^{1/2} x.$$
Then the formula for the correlation ratio can be rewritten in terms of this new vector:
$$\rho^2 = \frac{w' D_c^{-1/2} F' D_r^{-1} F D_c^{-1/2} w}{w'w} = \frac{w' B'B\, w}{w'w},$$
where $B = D_r^{-1/2} F D_c^{-1/2}$. Therefore, the standard eigenequation is given by
$$(B'B - \lambda I)\, w = 0.$$
Once $w$ is obtained, $x$ can be obtained by
$$x = D_c^{-1/2} w.$$
It is well known that the above eigenequation involves one trivial solution, namely $\lambda_0 = 1$ and $w_0 = D_c^{1/2}\mathbf{1}$, irrespective of the matrix $B$, that is, of the data matrix $F$. Therefore, it is more efficient to eliminate the trivial solution before solving the eigenequation than to solve for the entire set of solutions. To eliminate it, we follow the standard procedure and calculate the residual matrix, say $C$, where
$$C = B'B - \lambda_0\, \frac{w_0 w_0'}{w_0' w_0} = B'B - \frac{1}{f_t} D_c^{1/2}\, \mathbf{1}\mathbf{1}'\, D_c^{1/2}.$$
In our numerical example, the matrix $C$ is given as follows:
$$C = \begin{bmatrix} 0.1552 & 0.0006 & -0.1742 \\ 0.0006 & 0.0207 & -0.0250 \\ -0.1742 & -0.0250 & 0.2241 \end{bmatrix}.$$
Our first component is the solution of the following equation, associated with the maximal eigenvalue, say $\rho_1^2$:
$$(C - \rho^2 I)\, w = 0.$$
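The generalized eigenequation can also be handed directly to a symmetric-definite solver; a minimal sketch using SciPy (an implementation choice, not part of the text) recovers the trivial eigenvalue 1 together with $\rho_1^2$ and $\rho_2^2$:

```python
import numpy as np
from scipy.linalg import eigh

# Solve F'Dr^(-1)F x = lambda Dc x directly; the largest eigenvalue is
# the trivial solution lambda = 1, and the next is rho1^2.
F = np.array([[1.0, 3.0, 6.0],
              [3.0, 5.0, 2.0],
              [6.0, 3.0, 0.0]])
Dr = np.diag(F.sum(1))
Dc = np.diag(F.sum(0))

A = F.T @ np.linalg.inv(Dr) @ F      # left-hand side matrix (symmetric)
lam, X = eigh(A, Dc)                 # generalized problem A x = lambda Dc x
print(np.round(lam[::-1], 4))        # [1.0, 0.3684, 0.0316]
```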
Table 7.4 Data expressed by unknown numbers given to teachers

Teacher   Good      Average   Poor
White     y1 ×1     y1 ×3     y1 ×6
Green     y2 ×3     y2 ×5     y2 ×2
Brown     y3 ×6     y3 ×3     0
Once the eigenvector associated with the largest eigenvalue $\rho_1^2$, that is, $w_1$, is calculated, we obtain the first optimal weight vector $x_1$ from $w_1$, that is,
$$x_1 = D_c^{-1/2} w_1.$$
Recall that the weights for the categories are scaled in such a way that $w_1'w_1 = x_1'D_c x_1 = f_t$.

Let us now consider Task b, which deals with the data with unknowns as presented in Table 7.4.

Task b: Our quantification task can also be carried out as the problem of determining the scores $y$ for the teachers so as to maximize $\rho^2$. In this case, the input data are expressed in terms of the scores given to the rows (Table 7.4). The decomposition of the total sum of squares can now be expressed in terms of $y$:
$$SS_t = y'\left(D_r - \frac{f_r f_r'}{f_t}\right)y, \quad SS_b = y'\left(F D_c^{-1} F' - \frac{f_r f_r'}{f_t}\right)y, \quad SS_w = y'\left(D_r - F D_c^{-1} F'\right)y.$$

Symmetric quantification: As Nishisato (1980, 2022) has stressed, the quantification process is completely symmetric with respect to the rows and the columns. In other words, instead of starting with the optimal vector for the columns of the data matrix, one can start with the optimal vector for the rows. We will arrive at exactly the same result.
Table 7.5 Multiple-choice data

Subject   Q1-1   Q1-2   Q2-1   Q2-2   ···   Qn-1   Qn-2   Qn-3   Qn-4
1            1      0      0      1   ···      0      1      0      0
2            0      1      0      1   ···      0      0      0      1
3            1      0      0      1   ···      0      0      1      0
.            .      .      .      .   ···      .      .      .      .
N            0      1      1      0   ···      1      0      0      0
Once we obtain $x_1$, for example, the corresponding optimal scores for the rows on component 1 can be obtained by
$$y_1 = \frac{1}{\rho_1} D_r^{-1} F x_1.$$
As we have demonstrated, we can express $SS_t$, $SS_b$ and $SS_w$ in terms of the weights for the rows, that is, $y$, and maximize the correlation ratio $SS_b/SS_t$, subject to the conditions that the sum of the weighted responses is zero and that the sum of squares of the weighted responses is equal to $f_t$. This results in the eigenequation
$$(BB' - \lambda I)\, v = 0, \quad \text{where } y = D_r^{-1/2} v.$$
So far we have considered the contingency-table representation of data, but the same idea can be extended to $n$-item multiple-choice data. Consider the following example:

1. Do you live in Toronto? [yes, no]
2. Do you work in Toronto? [yes, no]
3. Which party do you support? [Liberal Party, Progressive Conservative Party, New Democratic Party, Green Party]

Let us generalize the case and consider data collected from $N$ people answering $n$ multiple-choice questions. The data are then presented as in Table 7.5, in which $N$ subjects responded to $n$ questions, with items 1 and 2 having two response options each and item $n$ four response options. To quantify this type of data, we consider the quantification approach described in the next section.
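A small sketch, assuming NumPy and hypothetical responses (the data below are invented for illustration), shows how raw multiple-choice answers are coded into the (1, 0) format of Table 7.5:

```python
import numpy as np

# Hypothetical responses: three items with 2, 2 and 4 options; the chosen
# option of each item is coded 0, 1, 2, ...
responses = np.array([[0, 1, 1],
                      [1, 1, 3],
                      [0, 1, 2],
                      [1, 0, 0]])
n_options = [2, 2, 4]

blocks = []
for item, k in enumerate(n_options):
    block = np.zeros((responses.shape[0], k), dtype=int)
    block[np.arange(responses.shape[0]), responses[:, item]] = 1
    blocks.append(block)

Z = np.hstack(blocks)     # N x (total options) indicator matrix
print(Z)                  # exactly one 1 per item block in every row
```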
7.4.3 Maximization of Reliability Coefficient Alpha

Let us call this approach the maximization of Cronbach's reliability $\alpha$ (Cronbach, 1951). Lord (1958) considered partitioning the $N \times n$ (subjects-by-items) data matrix; more specifically, he partitioned the total sum of squares $SS_t$ into the sum of squares between items, $SS_n$, the sum of squares between subjects, $SS_b$, and the residual sum of squares, $SS_e$:
$$SS_t = SS_n + SS_b + SS_e.$$
In terms of these quantities, Lord expressed Cronbach's $\alpha$ as
$$\alpha = \frac{n}{n-1}\left(1 - \frac{SS_t - SS_n}{SS_b}\right).$$
Nishisato (1980) used Lord's formula for the response-pattern matrix and expressed its relation to one of the objective functions used in quantification, the correlation ratio $\rho^2$. Note that earlier we observed the relation
$$\rho^2 = \frac{SS_b}{SS_t}.$$
In quantification theory, we typically set $f'x = 0$, which means $SS_n = 0$. Under this condition, Nishisato (1980) showed that
$$\rho^2 = \frac{1}{1 + (n-1)(1-\alpha)} \quad \text{or} \quad \alpha = 1 - \frac{1-\rho^2}{(n-1)\rho^2}.$$
Thus, if we restrict the range of $\alpha$ to $1 \geq \alpha \geq 0$, it follows that
$$1 \geq \rho^2 \geq \frac{1}{n}.$$
This result on the range of $\rho^2$ should be kept in mind when we discuss the quantification of the response-pattern format of the contingency table in Stage 2 analysis. From the above discussion, it is clear that maximizing $\alpha$ means maximizing $\rho^2$: the maximization of the correlation (approach 1) and of the correlation ratio (approach 2) is equivalent to the maximization of $\alpha$.
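A minimal sketch of the two conversion formulas; the function names are ours, introduced only for illustration:

```python
# Conversions between Cronbach's alpha and the correlation ratio rho^2,
# following the relations derived above.
def rho2_from_alpha(alpha, n):
    return 1.0 / (1.0 + (n - 1) * (1.0 - alpha))

def alpha_from_rho2(rho2, n):
    return 1.0 - (1.0 - rho2) / ((n - 1) * rho2)

print(rho2_from_alpha(0.8, 4))      # 0.625
print(alpha_from_rho2(0.625, 4))    # 0.8 (the relation is invertible)
print(rho2_from_alpha(0.0, 4))      # 0.25 = 1/n, the lower bound of rho^2
```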
7.5 Multidimensional Decomposition

There are many other approaches, as the relevant literature suggests, but it is good to know that all of these different criteria lead to the same singular value decomposition of the data matrix under the same set of constraints on the unknowns. Let us define the following matrices:
$$\Lambda = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & \rho_1^2 & 0 & \cdots & 0 \\ 0 & 0 & \rho_2^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \rho_k^2 \end{bmatrix},$$
$$W = [w_0, w_1, w_2, \ldots, w_k], \quad X = [x_0, x_1, x_2, \ldots, x_k],$$
where $w_0 = D_c^{1/2}\mathbf{1}$ and $x_0 = \mathbf{1}$. For each component, our eigenequation satisfies the following dual relations:
$$y_{ik} = \frac{1}{\rho_k}\, \frac{\sum_{j=1}^{J} f_{ij} x_{jk}}{f_{i.}} \quad \text{and} \quad x_{jk} = \frac{1}{\rho_k}\, \frac{\sum_{i=1}^{I} f_{ij} y_{ik}}{f_{.j}}.$$
We can also express the set of components in terms of the weighted averages as
$$\rho_k y_{ik} = \frac{\sum_{j=1}^{J} f_{ij} x_{jk}}{f_{i.}} \quad \text{and} \quad \rho_k x_{jk} = \frac{\sum_{i=1}^{I} f_{ij} y_{ik}}{f_{.j}}.$$
The entire set of components can be expressed as
$$B'BW = W\Lambda, \quad \text{or} \quad W'B'BW = \Lambda.$$
Note that $W'W = X'D_c X = I$, where $I$ is the identity matrix. The decomposition of the contingency table can be summarized by the well-known bilinear expansion of an element $f_{ij}$ of the contingency table. The typical element in cell $(i, j)$ can be expressed by the formula
$$f_{ij} = \frac{f_{i.} f_{.j}}{f_t}\left[1 + \rho_1 y_{i1} x_{j1} + \rho_2 y_{i2} x_{j2} + \cdots + \rho_k y_{ik} x_{jk} + \cdots + \rho_K y_{iK} x_{jK}\right],$$
where we typically assume that $\rho_1 \geq \rho_2 \geq \rho_3 \geq \cdots \geq \rho_k \geq \cdots \geq \rho_K$. The first term inside the square bracket corresponds to the trivial component, which accounts for the portion of the data expected when the rows and the columns are statistically independent. When we approximate the data matrix by using the terms up to $\rho_k$ in the above formula, the result is called the rank $k$ approximation to the data matrix (Nishisato & Nishisato, 1994). The above formula can be expressed in matrix notation as
$$F = \frac{1}{f_t} D_r\, Y\, \Lambda^{1/2}\, X'\, D_c,$$
where the terms in our numerical example of the teacher evaluation contingency table are:
$$F = \begin{bmatrix} 1 & 3 & 6 \\ 3 & 5 & 2 \\ 6 & 3 & 0 \end{bmatrix}, \quad f_t = 29, \quad D_r = \begin{bmatrix} 10 & 0 & 0 \\ 0 & 10 & 0 \\ 0 & 0 & 9 \end{bmatrix}, \quad D_c = \begin{bmatrix} 10 & 0 & 0 \\ 0 & 11 & 0 \\ 0 & 0 & 8 \end{bmatrix},$$
$$Y = \begin{bmatrix} 1.0000 & -1.2322 & -0.6183 \\ 1.0000 & 0.1227 & 1.3728 \\ 1.0000 & 1.2326 & -0.8388 \end{bmatrix}, \quad X = \begin{bmatrix} 1.0000 & 1.0761 & -0.8608 \\ 1.0000 & 0.0920 & 1.2755 \\ 1.0000 & -1.4717 & -0.6795 \end{bmatrix},$$
$$\Lambda^{1/2} = \begin{bmatrix} 1.0000 & 0.0000 & 0.0000 \\ 0.0000 & 0.6069 & 0.0000 \\ 0.0000 & 0.0000 & 0.1780 \end{bmatrix}.$$
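A minimal sketch, assuming NumPy, checks the bilinear expansion with the rounded matrices above:

```python
import numpy as np

# Reconstruct F from the decomposition F = (1/ft) Dr Y L^(1/2) X' Dc,
# using the four-decimal values of the teacher evaluation example.
ft = 29.0
Dr = np.diag([10.0, 10.0, 9.0])
Dc = np.diag([10.0, 11.0, 8.0])
Y = np.array([[1.0, -1.2322, -0.6183],
              [1.0,  0.1227,  1.3728],
              [1.0,  1.2326, -0.8388]])
X = np.array([[1.0,  1.0761, -0.8608],
              [1.0,  0.0920,  1.2755],
              [1.0, -1.4717, -0.6795]])
L_half = np.diag([1.0, 0.6069, 0.1780])   # diag(1, rho1, rho2)

F_hat = Dr @ Y @ L_half @ X.T @ Dc / ft
print(np.round(F_hat))     # recovers the original contingency table
```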
We have so far discussed the eigenequations associated with our quantification task. We have seen a number of equations, and all of them are variations of the singular value decomposition (SVD) of a data matrix (e.g., Beltrami, 1873; Jordan, 1874; Schmidt, 1907; Eckart & Young, 1936). While SVD handles a rectangular matrix, eigenvalue decomposition deals only with a square matrix, but the two are closely related. In this context, we can understand why Torgerson (1958) called the quantification method "principal component analysis of categorical data." Let us now briefly look at SVD and eigenvalue decomposition (EVD).
7.6 Eigenvalue and Singular Value Decompositions

Eigenvalues are also called latent roots and characteristic roots, and they appear in many mathematical problems. For instance, consider a quadratic function of $x$ and $y$ from which we want to eliminate the product term of the two variables. The task can be carried out through an orthogonal transformation of the coordinate system to principal axes, called the canonical reduction of a quadratic form. The standard form of an eigenequation has already been introduced. For a square symmetric matrix $A$, the eigenequation is given by
$$(A - \lambda I)\, w = 0, \quad \text{or} \quad Aw = \lambda w,$$
where $\lambda$ is an eigenvalue and $w$ is the corresponding eigenvector. When $A$ is an $n \times n$ full-rank matrix, there exist $n$ eigenvalues and $n$ corresponding eigenvectors. For the entire set, we have $AW = W\Lambda$, where $W = [w_1, w_2, \ldots, w_n]$ and $\Lambda = \mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_n]$. Since $W$ is an orthogonal matrix, its inverse is equal to its transpose. Therefore, pre-multiplying the complete eigenequation by $W'$, we obtain $W'AW = \Lambda$. Thus, the eigenequation can be viewed as an orthogonalization of the original matrix $A$. Because $W$ is an orthogonal matrix, it also follows that
$$A^k W = W \Lambda^k.$$
In other words, raising $A$ to the power $k$ does not change the matrix of eigenvectors but raises each eigenvalue to the power $k$. This property is fully exploited in the power method and in the method of reciprocal averages, MRA (Richardson & Kuder, 1933; Horst, 1935; Mosier, 1946). Singular value decomposition (SVD) is more general than eigenvalue decomposition in the sense that SVD deals with any rectangular matrix, say $F$. SVD is also based on the orthogonalization of the rectangular matrix, that is,
$$Y'FX = \Delta, \quad \text{where} \quad Y'Y = I, \; X'X = I, \; \Delta = \begin{bmatrix} \rho_1 & 0 & \cdots & 0 \\ 0 & \rho_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \rho_r \end{bmatrix}.$$
In other words, any rectangular matrix can be expressed as the product of an orthogonal matrix depicting the row structure, a diagonal matrix of singular values, and an orthogonal matrix describing the column structure of the data matrix: $F = Y\Delta X'$. The relation between SVD and EVD is simple. Consider the two forms of the product of a rectangular matrix:
$$FF' = (Y\Delta X')(X\Delta Y') = Y\Delta^2 Y',$$
$$F'F = (X\Delta Y')(Y\Delta X') = X\Delta^2 X'.$$
So we obtain the well-known results that the square of a singular value is an eigenvalue and that the nonzero eigenvalues of $FF'$ are the same as those of $F'F$. Note that principal component analysis typically deals with square matrices, such as correlation and variance-covariance matrices, whereas quantification theory deals with data matrices, such as contingency tables and response-pattern tables, which are typically rectangular. This aspect of quantification theory leads to the so-called perennial problem of joint graphical display, as we will discuss later. We then realize that principal component analysis and quantification analysis are in essence quite different from each other.
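A minimal sketch, assuming NumPy and an arbitrary rectangular matrix, illustrates the SVD-EVD relation:

```python
import numpy as np

# The squared singular values of F equal the nonzero eigenvalues of
# both FF' and F'F.
F = np.array([[1.0, 3.0, 6.0],
              [3.0, 5.0, 2.0],
              [6.0, 3.0, 0.0],
              [2.0, 4.0, 1.0]])              # 4 x 3 rectangular matrix

s = np.linalg.svd(F, compute_uv=False)       # singular values of F
e1 = np.linalg.eigvalsh(F @ F.T)             # eigenvalues of FF' (one is 0)
e2 = np.linalg.eigvalsh(F.T @ F)             # eigenvalues of F'F

print(np.round(s**2, 4))
print(np.round(np.sort(e1)[::-1][:3], 4))    # same nonzero values
print(np.round(np.sort(e2)[::-1], 4))
```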
7.7 Finding the Largest Eigenvalue

In dealing with SVD or EVD, we often use an iterative method to extract singular values or eigenvalues in the order of their magnitudes. This is because in most multidimensional analyses we are not interested in all the components but only in the few associated with large singular values or eigenvalues. To this end, let us discuss how to extract the largest eigenvalue. Let $V$ be an $n \times n$ symmetric matrix (i.e., $V = V'$), let $\Lambda$ be the $n \times n$ diagonal matrix of its eigenvalues, and let $U$ be the $n \times n$ matrix with the eigenvectors in its columns. Then the eigenequation is defined by $Vu_j = \lambda_j u_j$, or $VU = U\Lambda$. Since $U$ is an orthogonal matrix, its inverse is given by its transpose, and it follows that $V = U\Lambda U'$ and $\Lambda = U'VU$. Suppose $k$ is a positive integer, and consider the power of $V$; for instance, for $k = 2$,
$$VV = (U\Lambda U')(U\Lambda U') = U\Lambda(U'U)\Lambda U' = U\Lambda^2 U'.$$
Thus, more generally, for any $k$ we obtain
$$V^k = V \times V \times \cdots \times V = (U\Lambda U') \times (U\Lambda U') \times \cdots \times (U\Lambda U') = U\Lambda^k U'.$$
Consider an arbitrary vector of $n$ elements, $b_0$. It can be expressed as a linear combination of the $n$ eigenvectors $u_i$, that is,
$$b_0 = \sum_{i=1}^{n} c_i u_i.$$
But $Vu = \lambda u$. Therefore, we obtain
$$Vb_0 = \sum_{i=1}^{n} \lambda_i c_i u_i.$$
Let us form a sequence of vectors as follows:
$$b_0, \quad Vb_0 = b_1, \quad Vb_1 = V^2 b_0 = b_2, \quad \ldots, \quad Vb_{p-1} = V^p b_0 = b_p,$$
$$b_p = V^p b_0 = \sum_{i=1}^{n} \lambda_i^p c_i u_i = \lambda_1^p \left\{ c_1 u_1 + \sum_{j=2}^{n} \left(\frac{\lambda_j}{\lambda_1}\right)^{p} c_j u_j \right\}.$$
Therefore, as $p \to \infty$,
$$b_p \to \lambda_1^p\, c_1 u_1, \quad \text{and} \quad \lambda_1 = \lim_{p \to \infty} \frac{b_p' b_p}{b_p' b_{p-1}}.$$
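A minimal sketch of this power method, applied to the residual matrix $C$ of the teacher evaluation example (in practice one renormalizes $b$ at each step to avoid underflow; the plain form suffices here):

```python
import numpy as np

# Power iteration on the residual matrix C from the teacher data.
C = np.array([[ 0.1552,  0.0006, -0.1742],
              [ 0.0006,  0.0207, -0.0250],
              [-0.1742, -0.0250,  0.2241]])

b = np.array([1.0, 1.0, 1.0])          # arbitrary starting vector b0
for _ in range(50):
    b_prev = b
    b = C @ b

lam1 = (b @ b) / (b @ b_prev)          # lambda1 = b_p'b_p / b_p'b_(p-1)
print(round(lam1, 4))                  # 0.3684, so rho1 = sqrt(lam1) = 0.6069
print(b / np.linalg.norm(b))           # converges to the first eigenvector
```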
7.8 Method of Reciprocal Averages

Let us look at the method of reciprocal averages (MRA) (Richardson & Kuder, 1933; Horst, 1935; Mosier, 1946) in terms of a convergent iterative process. Let $V$ be an $n \times n$ symmetric matrix, let $b_0$ be an $n \times 1$ arbitrary non-null vector, and let $k_j$ be the largest absolute value of the elements of a resultant vector $b_j$. Form the following sequence, which is known to be mathematically convergent (Nishisato, 1980):
$$Vb_0 = b_1, \quad \frac{b_1}{k_1} = a_1, \quad Va_1 = b_2, \quad \frac{b_2}{k_2} = a_2, \quad \ldots, \quad Va_{j-1} = b_j, \quad \frac{b_j}{k_j} = a_j.$$
This sequence eventually reaches the state in which $a_{j-1} = a_j$. Then $b_j / k_j = a_j$ means $b_j = k_j a_j$, and it follows that $Va_{j-1} = Va_j = b_j = k_j a_j$, that is, $Va_j = k_j a_j$. The above formulas and the convergence of the sequence, put together, indicate that $k_j$ is the largest eigenvalue and $a_j$ the corresponding eigenvector.
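The same idea can be written directly as reciprocal averaging on the contingency table; a minimal sketch, assuming NumPy, with re-centering at each cycle to remove the trivial solution:

```python
import numpy as np

# Reciprocal averaging on Table 7.1: column scores are the weighted
# averages of row scores and vice versa, rescaled by the largest
# absolute value, as in the sequence above.
F = np.array([[1.0, 3.0, 6.0],
              [3.0, 5.0, 2.0],
              [6.0, 3.0, 0.0]])
fr, fc, ft = F.sum(1), F.sum(0), F.sum()

x = np.array([1.0, 0.0, -1.0])        # arbitrary initial column scores
for _ in range(100):
    y = (F @ x) / fr                  # row scores = averages of x
    y -= (fr @ y) / ft                # re-center (removes trivial part)
    x = (F.T @ y) / fc                # column scores = averages of y
    x -= (fc @ x) / ft
    k = np.abs(x).max()               # the constant k_j of the text
    x /= k

# One full cycle applies Dc^(-1)F'Dr^(-1)F, so k converges to rho1^2.
print(round(k, 4))                    # 0.3684, i.e., rho1 = 0.6069
print(np.round(x, 4))                 # proportional to the optimal x1
```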
When $V$ is not square or symmetric, Nishisato (1987) has shown that the process converges to two generally distinct constants, say $k_j$ and $k_j^*$: the eigenvalue is the product of the two constants, and the singular value is their geometric mean. To extract the second component, we calculate the residual matrix $C_1$ by eliminating the contribution of the first component from $C$,
$$C_1 = C - \rho_1^2\, \frac{w_1 w_1'}{w_1' w_1}.$$
The maximal eigenvalue, say $\rho_2^2$, and the associated eigenvector $w_2$ are then calculated from $C_1$. Once $w_2$ is obtained, the second optimal weight vector $x_2$ is obtained from $w_2$, that is, $x_2 = D_c^{-1/2} w_2$. The same process continues until the data are exhaustively analyzed.

Nishisato (1987; see also Nishisato & Clavel, 2003) showed that the row-column correlation $r_{rc}$ also converges to a stable value and that it can be converted to the angle $\theta_{rc}$ between the row axis and the column axis by
$$\theta_{rc} = \cos^{-1} r_{rc}.$$
Thus, when the two variables are perfectly correlated, the two axes become one and the variates span the same space; but when the correlation is not perfect, the axes of the two variables are separated by $\theta_{rc}$. This point is crucially important when we later discuss the joint graphical display of quantification results.

In terms of graphical display, there is one key difference between PCA and quantification theory. It comes from the fact that PCA is typically a uni-modal analysis, while quantification of the contingency table is bimodal. Thus, the methods employ the following decompositions:

• PCA: the variance-covariance matrix is decomposed as $Y'Y = Z\Delta^2 Z'$, so that the matrix of principal coordinates is given by $Z\Delta$.
• Quantification theory: the standardized frequency table is decomposed as $Y\Delta X'$, so that we obtain either $Y\Delta = Z$ and $X$, or $Y$ and $X\Delta = Z$.

Notice the difference: the squared singular value appears in PCA, while the singular value itself is used in quantification theory. This distinction will later prove to be the source of the perennial problem of joint graphical display in quantification theory, as discussed in the next section.
7.9 Problems of Joint Graphical Display

At this stage, let us introduce the perennial problem of joint graphical display, a problem unique to quantification theory. When we carry out quantification analysis, principal component analysis or factor analysis, the graphical display of the outcome plays an important role in interpreting the multidimensional output. In the case of quantification analysis, however, we face a fundamental problem.

In quantifying the contingency table, one of our objective functions is the row-column correlation: we determine the weights (scores) for the rows and the columns so as to maximize it. As we have seen, we extract components from the contingency table, and it is customary to plot the row weights and the column weights for a pair of components in a two-dimensional graph (e.g., component 1 against component 2). Recall, however, that we maximized the row-column correlation, and that the maximized correlation is typically different from 1. Mathematically speaking, if the correlation were 1, we could plot rows and columns in the same space, but this is almost never the case. If the correlation is less than 1, the row weights require one axis and the column weights another. Furthermore, the angle between the row axis and the column axis for a given component can be calculated by $\theta = \cos^{-1}\rho$ (Nishisato & Clavel, 2003). As clarified by Nishisato (1980), we need a two-dimensional graph to represent a single component of our quantification task; see Nishisato (2016, 2019b, 2022) and Beh and Lombardo (2014) for further explanation. Empirically, the angle between the row axis and the column axis associated with a single component is typically 30° or greater; see the numerical examples in Chap. 8.

Yet, in the most popular current graphical display, called the correspondence plot or French plot, researchers typically ignore the space discrepancy between the row axis and the column axis and place both row weights and column weights on the same axes (common space). This is referred to as the perennial problem of joint graphical display. The problem is that the correlation between the rows and the columns of contingency-table data is almost always different from 1; in consequence, we need one axis for the rows and another for the columns. How can we justify representing the row weights and the column weights of a single component on a single axis? Lebart et al. (1977) warned that the distance between a row and a column in a correspondence plot is not accurate, but their warning has almost always been ignored. The truth we should keep in mind is this: each of the individual components from the contingency table requires a two-dimensional graph. Then how can we justify the
graphical method that places the row weights and the column weights of a single component from the contingency table on the same axis? The only solution to this problem is to carry out quantification analysis of the response-pattern form of the contingency table and use doubled multidimensional space (Nishisato, 2016, 2019b, 2022; Beh & Lombardo, 2014). We will compare the graphs from the contingency table with those from the response-pattern table in Chaps. 9 and 10. We will also note that at the current moment there is no satisfactory way to draw multidimensional graphs when the number of dimensions is greater than 3. To compensate for this problem, our Stage 2 analysis, to be introduced shortly, also includes in its output the (rows, columns)-by-(rows, columns) distance matrix, which contains the total information in the response-pattern data.
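A small sketch of the correlation-to-angle conversion $\theta = \cos^{-1}\rho$; the value 0.8660 is included because $\cos 30° \approx 0.8660$, and 0.6069 and 0.1780 are the singular values of the teacher evaluation data:

```python
import numpy as np

# Convert the row-column correlation of a component into the angle
# between the row axis and the column axis of that component.
for rho in (1.0, 0.8660, 0.6069, 0.1780):
    theta = np.degrees(np.arccos(rho))
    print(f"rho = {rho:.4f}  ->  theta = {theta:5.1f} degrees")

# For the teacher data, component 1 has rho1 = 0.6069, so the row and
# column axes of that single component are about 52.6 degrees apart.
```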
7.10 How Important Data Formats Are

What we have not discussed so far is the quantification of a modified contingency table, in particular its transformation into the form of response-pattern frequencies. Nishisato (1980) presented many interesting comparisons between the contingency table and its transform, the response-pattern table. Nishisato's condensed response-pattern table of our contingency table (Table 7.1) is shown in Table 7.6. As we can see clearly, both the rows and the columns of the contingency table are now placed in the columns of the response-pattern table. Therefore, the Young–Householder theorem (Young & Householder, 1938) guarantees that the columns of the response-pattern table (i.e., the rows and the columns of the contingency table) span the same space (Table 7.6).

Table 7.6 Response-pattern table of teacher evaluation data

P*   White   Green   Brown   Good   Average   Poor
1      1       0       0       1       0        0
2      3       0       0       0       3        0
3      6       0       0       0       0        6
4      0       3       0       3       0        0
5      0       5       0       0       5        0
6      0       2       0       0       0        2
7      0       0       6       6       0        0
8      0       0       3       0       3        0

Note: P* = response patterns

We will further see that the response-pattern table yields the required space of doubled dimensionality to accommodate both the rows and the columns of the contingency table in common multidimensional space, that is, with common coordinates for both rows and columns. Thus, the greatest advantage of quantifying the response-
pattern format is that we will obtain Euclidean coordinates of both rows and columns of the original contingency table. In other words, the rows and the columns can now be mapped correctly in the same space, the only additional difference being that the response-pattern table requires doubled dimensions (see the theoretical discussion of quantification space in Chap. 8).
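A minimal sketch, assuming NumPy, of the conversion from Table 7.1 to the condensed response-pattern format of Table 7.6:

```python
import numpy as np

# Each nonzero cell (i, j) of the contingency table becomes one response
# pattern, with the cell frequency entered under both row i and column j.
F = np.array([[1, 3, 6],
              [3, 5, 2],
              [6, 3, 0]])
rows, cols = F.shape

patterns = []
for i in range(rows):
    for j in range(cols):
        if F[i, j] > 0:
            p = np.zeros(rows + cols, dtype=int)
            p[i] = F[i, j]             # row block: White, Green, Brown
            p[rows + j] = F[i, j]      # column block: Good, Average, Poor
            patterns.append(p)

Rp = np.array(patterns)                # condensed response-pattern table
print(Rp)                              # 8 patterns, matching Table 7.6
```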
7.11 A New Framework: Two-Stage Analysis

The quantification of the contingency table yields the weights (scores) for the rows and for the columns such that the row-column correlation is a maximum. The row weights and the column weights, however, are not coordinates of a space which accommodates both the rows and the columns. To arrive at Euclidean coordinates for the rows and the columns of the contingency table, we must transform the contingency table to the response-pattern format and quantify the latter table. When we find the optimal weights for the rows and the columns from the response-pattern format, we discover that the dimensionality of the data space is twice that of the contingency table. Thus, in concluding this chapter, we note:

• Quantification analysis of the contingency table yields the weights (scores) for the rows and the columns such that the row-column correlation attains its maximal value. These weights are called projected weights, and they are not the coordinates of the rows and the columns in common space.
• Quantification analysis of the response-pattern format of the data yields principal coordinates of the rows and the columns in Euclidean space of doubled dimensionality. These coordinates are important not only from the graphical point of view but also from the information-retrieval point of view; more specifically, they provide the (rows, columns)-by-(rows, columns) distance matrix, which is just as important as the coordinates themselves.

For this reason, in order to obtain the maximal row-column correlation, we quantify the contingency table, which we call Stage 1 analysis; and in order to obtain principal coordinates for the rows and columns in common space, we quantify the response-pattern table of the same data, which we call Stage 2 analysis. The dimensionality of Stage 2 analysis is twice that of Stage 1 analysis. Using this doubled-space analysis, we will also provide the (rows, columns)-by-(rows, columns) distance matrix as another summary of the full information one can extract from the response-pattern table. Although the analysis of the full distance matrix is not a part of our main discussion, the distance table offers hope for dealing with hyperdimensional data in case our graphical display fails to give us satisfaction. For us to understand the geometry of the analytical results of the two formats (Nishisato, 1980, 2016, 2019a, b), we will discuss the geometry of quantification space in Chap. 8 and then carry out Stage 1 and Stage 2 analyses with two numerical examples in Chap. 9. These are based on recent studies, as mentioned above.
References
121
References Adachi, K. (2006). Tahenryo data Kaiseki Ho (Multivariate data analysis). Nakanishiya Publisher. Adachi, K., & Murakami, T. (2011). Hikeiryou Tahenryou Kaisekihou: Shuseibun Bunseki kara Tajyuu Taiou Bunseki e (Nonmetric multivariate analysis: From principal component analysis to multiple correspondence analysis). Asakura Shoten. Akiyama, S. (1993). Suryouka no graphics: Taido no Tahenryoukaiseki (Graphics for quantification: Multivariate analysis of attitudes). Asakura Shoten. Beh, E. J., & Lombardo, R. (2014). Correspondence analysis: Theory, practice and new strategies. Wiley. Beltrami, E. (1873). Sulle funzioni bilineari (On the bilinear functions). In G. Battagline & E. Fergola (Eds.), Giornale di Mathematiche (Vol. 11, pp. 98–106). Benzécri, J. P. (1992). Correspondence analysis handbook. Marcel Dekker. Benzécri, J. P., et al. (1973). L’analyse des données: II. L’analyse des correspondances. Dunod. Blasius, J., & Greenacre, M. J. (1998). Visualization of categorical data. Academic Press. Bouroche, J. M. (1977). Analyse des Données en Marketing. Masson. Bouroche, J. M., & Saporta, G. (1980). L’Analyse des Données. Presses Univesitaires de France. Caillez, F., & Pagés, J. P. (1976). Introduction a L’Analyse des Données. SMASH. Cibois, P. (1983). L’Analyse Factorielle. Presses Universitaires de France. Clausen, S. E. (1998). Applied correspondence analysis: An introduction. Sage Publications. Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334. de Lagarde, J. (1983). Initiation á l’Analyse des Données. Dunod. de Leeuw, J. (1984). Canonical analysis of categorical data. DSWO Press, Leiden University. Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1, 211–218. Escofier, B., & Pagés, J. (1988). Analyses Factorielles Simples et Multiples. Dunod. Gauch, H. G. (1982). Multivariate analysis in community ecology. Cambridge University Press. Gifi, A. (1990). Nonlinear multivariate analysis. Wiley. Gower, J. C., & Hand, D. J. (1996). Biplots. Chapman & Hall. Greenacre, M. J. (1984). Theory and applications of correspondence analysis. Academic Press. Greenacre, M. J. (1993). Correspondence analysis in practice. Academic Press. Greenacre, M. J., & Blasius, J. (Eds.). (1994). Correspondence analysis in the social sciences. Academic Press. Greenacre, M. J., & Blasius, J. (Eds.). (2006). Multiple correspondence analysis and related methods. Chapman and Hall/CRC. Hayashi, C. (1974). Suryouka no Houhou (Methods of quantification). Tokyo Keizai Sha. Hayashi, C. (1993). Suryouka: Riron to Hoho (Quantification: Theory and methods). Asakura Shoten. Hayashi, C. (2001). Data no Kagaku (Data science). Asakura Shoten. Hayashi, C., Higuchi, I., & Komazawa, T. (1970). Johoshori to Tokeisuiri (Information processing and statistical mathematics). Sangyou Tosho. Hayashi, C., & Suzuki, T. (1986). Shakai Chosa to Suryouka (Social surveys and quantification). Iwanami Shoten. Hill, M. O. (1973). Reciprocal averaging: An eigenvector method of ordination. Journal of Ecology, 61, 237–249. Hill, M. O. (1974). Correspondence analysis: a neglected multivariate method. Journal of the Royal Statistical Society C (Applied Statistics), 23, 340–354. Hirschfeld, H. O. (1935). A connection between correlation and contingency. Cambridge Philosophical Society Proceedings, 31, 520–524. Horst, P. (1935). Measuring complex attitudes. Journal of Social Psychology, 6, 369–374. Horst, P. (1936). 
Obtaining a composite measure from a number of different measures of the same attribute. Psychometrika, 1, 53–60.
122
7 General Introduction
Hotelling, H. (1933). Analysis of complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417-441, and 498-520. Iwatsubo, S. (1987). Suryouka no Kiso (Foundations of quantification). Asakura Shoten. Jambu, M. (1989). Exploration Informatique et Statistique de Données. Dunod. Jambu, M., & Lebeaux, M. O. (1978). Classification Automatique pour l’Analyse des Données. Métodes et Algorithmes: Dunod. Johnson, P. O. (1950). The quantification of qualitative data in discriminant analysis. Journal of the American Statistical Association, 45, 65–76. Johnson, R. M. (1963). On a theorem stated by Eckart and Young. Psychometrika, 28, 259–263. Jordan, C. (1874). Mémoire sur les formes bilinieres (Note on bilinear forms). Journal de Mathématiques Pures et Appliquées, deuxiéme Série, 19, 35–54. Kalantari, B., Lari, I., Rizzi, A., & Simeone, B. (1993). Sharp bounds for the maximum of the chi-square index in a class of contingency tables with given marginals. Computational Statistics and Data Analysis, 16, 19–34. Kendall, M. G. (1957). A course in multivariate analysis. Charles Griffin and Company. Kiers, H. (1989). Three-way methods for the analysis of qualitative and quantitative two-way data. DSWO Press, Leiden University. Komazawa, T. (1978). Tagenteki Data Bunseki no Kiso (Foundations of multidimensional data analysis). Asakura Shoten (in Japanese). Komazawa, T. (1982). Suryouka Riron to Data Shori (Quantification theory and data analysis). Asakura Shoten (in Japanese). Komazawa, T., Higuchi, I., & Ishizaki, R. (1998). Pasokon Suryoukabunseki (Quantification analysis with personal computers). Asakura Shoten. Koster, J. T. A. (1989). Mathematical aspects of multiple correspondence analysis for ordinal variables. DSWO Press, Leiden University. Lancaster, H. O. (1953). A reconciliation of χ 2 , considered from metrical and enumerative aspects. Sakhya, 13, 1–10. Lebart, L., Morineau, A., & Fénelon, J. P. (1979). Traitement des Données Statistiques. Dunod. Lebart, L., Morineau, A., & Tabard, N. (1977). Techniques de la Description Statistique: Méthodes et Logiciels pour l’Analyse des Grands Tableaux. Dunod. Lebart, L., Morineau, A., & Warwick, K. M. (1984). Multivariate descriptive statistical analysis. Wiley. Lefevre, J. (1976). Introduction aux analyses statistiques multidimensionnelles. Masson. Le Roux, B., & Rouanet, H. (2004). Geometric data analysis: From correspondence analysis to structured data. Kluwer. Likert, A. A. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 1–55. Lingoes, J. C. (1964). Simultaneous linear regression: An IBM 7090 program for analyzing metric/nonmetric or linear/nonlinear data. Behavioral Science, 9, 87–88. Lingoes, J. C. (1978). Geometric representation of relational data. Mathesis Press. Lingoes, J. C., Roskam, E. E., & Borg, I. (1978). Geometric representation of relational data. Mathesis Press. Lord, F. M. (1958). Some relations between Guttman’s principal components of scale analysis and other psychometric theory. Psychometrika, 23, 291–296. Maung, K. (1941). Measurement of association in contingency tables with special reference to the pigmentation of hair and eye colours of Scottish children. Annals of Eugenics, 11, 189–223. McDonald, R. P. (1968). A unified treatment of the weighting problem. Psychometrika, 33, 351–381. McKeon, J. J. (1966). Canonical analysis: Some relations between canonical correlation, factor analysis, discriminant function analysis and scaling theory. 
Psychometric Monograph No. 13. Meulman, J. (1982). Homogeneity analysis of incomplete data. DSWO Press, Leiden University. Michailidis, G., & de Leeuw, J. (1998). The gifi system of descriptive multivariate analysis. Statistical Science, 13, 307–336.
References
123
Moreau, J., Doudin, P. A., & Cazes, P. (1999). L’Analyse des Correspondances et les Techniques Connexes: Approches Nouvelles pour l’Analyse Statistique des Données. Springer. Mosier, C. I. (1946). Machine methods in scaling by reciprocal averages. In Proceedings, Research Forum (pp. 35–39). International Business Corporation. Murtagh, F. (2005). Correspondence analysis and data coding with R and Java. Chapman and Hall. Nakache, J. P. (1982). Exercices Comment’es de Mathématiques pour l’Analyse des Données. Dunod. Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. The University of Toronto Press. Nishisato, S. (1982). Shitsuteki Data no Suryouka: Soutsuishakudo-ho to Sono Ohyou (Quantification of qualitative data: Dual scaling and its applications). Asakura Shoten. Nishisato, S. (1987). Robust techniques for quantifying categorical data. In MacNeil, I. B. & Umphrey, G. J. (eds.), Foundations of Statistical Inference, 209-217. Dordrecht, The Netherlands: D. Reidel Publishing Company. Nishisato, S. (1994). Elements of dual scaling: An introduction to practical data analysis. Lawrence Erlbaum Associates. Nishisato, S. (2007). Multidimensional nonlinear descriptive analysis. Chapman-Hall/CRC. Nishisato, S. (2007). Insight into data analysis: Fundamentals of quantification. Kwansei Gakuin University Press (in Japanese). Nishisato, S. (2010). Data analysis for behavioral sciences: Use of methods appropriate for information retrieval. Baifukan. (in Japanese). Nishisato, S. (2016). Quantification theory: Dual space and total space. In Paper presented at the annual meeting of the Behaviormetric Society (p. 27). Sapporo, Japan (In Japanese). Nishisato, S. (2019). Reminiscence: Quantification theory and graphs. Theory and Applications of Data Analysis, 8, 47–57. (in Japanese). Nishisato, S. (2019b). Expansion of contingency space: Theory of doubled multidimensional space and graphs. In An invited talk at the Annual Meeting of the Japanese Classification Society, Tokyo (in Japanese). Nishisato, S. (2022). Optimal quantification and symmetry. Springer. Nishisato, S., Beh, E. J., Lombardo, R., & Clavel, J. G. (2022). Modern quantification theory: Joint graphical display, biplots and alternatives. Springer. Nishisato, S., & Clavel, J. G. (2003). A note on between-set distances in dual scaling and correspondence analysis. Behaviormetrika, 30, 87–98. Nishisato, S., & Nishisato, I. (1984). An introduction to dual scaling. MicroStats. Nishisato, S., & Nishisato, I. (1994). Dual scaling in a Nutshell. MicroStats. Ohsumi, N., Lebart, L., Morineau, A., Warwick, K. M., & Baba, Y. (1994). Kijyutsuteki Tahenryou Kaiseki (Descriptive multivariate analysis). Nikkagiren (in Japanese). Richardson, M., & Kuder, G. F. (1933). Making a rating scale that measures. Personnel Journal, 12, 36–40. Saito, T. (1980). Tajigen Shakudo Kouseihou (Multidimensional scale construction). Asakura Shoten. Saporta, G. (1979). Theories et Méthodes de la Statistique. Technip. Schmidt, E. (1907). Zür Theorie der linearen und nichtlinearen Integralgleichungen. Esster Teil. Entwickelung willkürlicher Functionen nach Systemaen vorgeschriebener (On theory of linear and nonlinear integral equations. Part one. Development of arbitrary functions according to prescribed systems). Mathematische Annalen, 63, 433–476. Takane, Y. (1980). Tajigen Shakudo Ho (Multidimensional scaling). University of Tokyo Press (in Japanese). Takane, Y. (1995). 
Seiyakutsuki Shuseibun Bunsekiho: Atarashii Tahenryou Kaisekiho (Constrained principal component analysis: A new approach to multivariate data analysis). Asakura Shoten. Takeuchi, K., & Yanai, H. (1972). Tahenryou Kaiseki no Kiso (Foundations of multivariate analysis). Toyo Keizai-sha (in Japanese). Tenenhaus, M. (1994). Méthodes Statistiques en Gestion. Dunod.
124
7 General Introduction
Torgerson, W. S. (1958). Theory and methods of scaling. Wiley. van Buuren, S. (1990). Optimal scaling of time series. DSWO Press, Leiden University. van de Geer, J. P. (1993). Multivariate analysis of categorical data: Applications. Sage Publications. van der Burg, E.(1988). Nonlinear canonical correlation and some related techniques. Leiden University, DSWO Press. van der Heijden, P. G. M. (1987). Correspondence analysis of longitudinal data. DSWO Press, Leiden University. van Rijckevorsel, J. (1987). The applications of fuzzy coding and horseshoes in multiple correspondence analysis. DSWO Press, Leiden University. Verboon, P. (1994). A robust approach to nonlinear multivariate analysis. DSWO Press, Leiden University. Weller, J. C., & Romney, A. K. (1990). Metric scaling: Correspondence analysis. Sage Publications. Whittaker, R. H. (1978). Ordination of plant communities. Junk. Yanai, H., & Takane, Y. (1977). Tahenryou Kaiseki Ho (Multivariate analysis). Asakura Shoten (in Japanese). Young, G., & Householder, A. A. (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika, 3, 19-22
Chapter 8
Geometry of Space: A New Look
8.1 Background We have outlined quantification approaches to analysis of nominal data, and we also noted problems associated with joint graphical display of rows and columns of the contingency table. These graphical display problems must be resolved now because the graphical display is vitally important, for most researchers depend on joint graphical displays for interpreting the analytical results. If we may boldly say, the only outstanding problem of quantification theory is the absence of an appropriate method for multidimensional joint graphical display. Therefore, in the current chapter, we would like to present a theory of doubled multidimensional space on which future methods of joint multidimensional graphical display may be built. So, this chapter offers a new look at quantification theory. When the so-called CGS scaling was proposed by Carroll et al. (1986), its main purpose was to map rows and columns of the contingency table in common space by analyzing the response-pattern table (Note: this idea was already described in Nishisato (1980), several years before the CGS proposal), derived from the contingency table. To our surprise, however, the CGS scaling proposal was strongly criticized by Greenacre (1989a), and heated arguments ensued (Carroll et al., 1986, 1987, 1989; Greenacre, 1989b). After Greenacre’s repeated criticism (Greenacre, 1989b) against the CGS scaling, Nishisato wrote a paper to the journal where the relevant papers had been published, explaining what the problems over the two-party arguments were, more specifically, that their arguments were out of focus. So long as they argued over the components from the contingency table and the corresponding components from the response-pattern table, there is no substantive difference, for the corresponding weights are proportional (i.e., only the singular values are different). Nishisato’s paper was to suggest that one must examine the extra components from the response-pattern table. However, his paper was rejected by the editor without a formal review, for the reason that all the arguments had already been exhaustively debated by the proponents and the opponent. Nishisato’s sole intention was to ful© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Nishisato, Measurement, Mathematics and New Quantification Theory, Behaviormetrics: Quantitative Approaches to Human Behavior 16, https://doi.org/10.1007/978-981-99-2295-6_8
125
126
8 Geometry of Space: A New Look
fill his promise to a distraught Douglas J. Carroll after Greenacre’s talk (Greenacre, 1989b) that he would clarify the problem in favor of the CGS scaling. Years later in 2015, Nishisato decided to send the aforementioned paper to another journal since he thought it was important enough. After the editor requested the author to shorten the paper three times, the editor unconditionally rejected the paper for the reason that it was “fundamentally flawed.” The editor did not explain to the author why it was so and abruptly declared to the author not to contact the editor again. With the kind help of Yasumasa Baba, Nishisato (2016) presented the same paper at a conference, and then with the encouragement of Ryozo Yoshino, he published a revised article in the journal of the Japanese Classification Society (JCS) (2019a) and gave an invited talk at its annual meeting (2019b), which is now called his theory of multidimensional space. This chapter is an outcome of these unfortunate and then fortunate events. In addition to Baba and Yoshino, there were strong supporters of the theory, among others, Eric J. Beh, Rosaria Lombardo and Jose G. Clavel who agreed and decided to write a book together (Nishisato et al., 2021), where the reason why the CGS proposal was destined to fail is demonstrated: The key for the CGS scaling rested in the extra components one can obtain from the response-pattern format of the contingency table. Unfortunately, these crucial extra dimensions were not taken into consideration by the CGS proponents or the opponent. All the issues discussed above can be solved by the following geometric space theory, based on Nishisato’s studies (Nishisato, 1980, 1994, 2007, 2016, 2019).
8.2 Geometric Space Theory To illustrate the theory, let us use a numerical example.
8.3 Rorschach Data The data were reported by Garmize and Rychlak (1964), in which the investigators devised a way to study the relations between subjects’ moods and their perceptions of the Rorschach inkblots. The original data set included 16 Rorschach symbols and 6 moods, but Nishisato (1994) noted that there were very few responses of Rorschach symbols of Bear, Boot(s), Bridge, Hair and Island and decided to delete those five symbols from his analysis. The deletions were based on his concerns that the variables with very few responses might contribute to possible outlier effects in quantification (Nishisato, 1984a, 1985, 1987, 1996; Verboon, 1994). We will follow his lead in the current chapter and will use the reduced data set of 11 × 6 table as shown in Table 8.1. Nishisato (1980) compared the contingency table, say F, its response-pattern table, Frp , and the condensed response-pattern table, Fp . He noted that one can construct
8.3 Rorschach Data
127
Table 8.1 Rorschach data and induced moods (Garmize & Rychlak, 1964) Rorschach Induced Moods Fear Anger Depression Love Ambition Bat Blood Butterfly Cave Clouds Fire Fur Mask Mountains Rocks Smoke
33 10 0 7 2 5 0 3 2 0 1
10 5 2 0 9 9 3 2 1 4 6
18 2 1 13 30 1 4 6 4 2 1
1 1 26 1 4 2 5 2 1 1 0
2 0 5 4 1 1 5 2 18 2 1
Security 6 0 18 2 6 1 21 3 2 2 0
Notes Rorschach symbols, Bear, Boot(s), Bridge, Hair and Island were dropped from the original data set, due to small frequencies
one form of the three formats from any one of the other two, and at the same time that they are structurally quite different, except for the response-pattern table and the condensed response-pattern table, which are structurally equivalent. Let us consider the response-pattern table and the condensed response-pattern table for the above Rorschach contingency table. The response-pattern table of the Rorschach data is very large since the dimension of the table is (the total number of responses)-by-(the sum of the number of rows and that of the columns). To make it feasible, let us consider only the condensed response-pattern table, which is (the total number of different response-patterns)-by-(the sum of the number of rows and that of columns). Table 8.2 is the condensed response-pattern table of our Rorschach data. In the table of the response-patterns, the following abbreviations are used to save the space: Ba = bat, Bl = blood, Bu = butterfly, Cl = clouds, Fi = fire, Fu = fur, Ma = masks, Mt = mountain, Ro = rock, Sm = smoke, Fe = fear, An = anxiety, Dp = depression, Am = ambition, Se = security and Lo = love. Let us define the following notation: (1) F is the m × n contingency table, where we assume that n ≥ m. In our Rorschach example, F is 11 × 6. Recall that the quantification task of the contingency table is to determine the weights for the rows and those for the columns so as to maximize the row-column correlation. This means that so long as the correlation is less than 1 the rows and the columns require different axes in their joint graphical display. (2) Fp is the N × (m + n) response-pattern matrix, where N is the total number of distinct response-patterns and we assume that (N − 1) ≥ (m + n − 2). Recall that in the response-pattern table, the rows and the columns of the contingency table now occupy the columns of Fp . Therefore, the quantification of the data
128
8 Geometry of Space: A New Look
Table 8.2 Condensed response-pattern table of Rorschach data Fp Ba
Bl
Bu
Ca
Cl
Fi
Fu
Ma Mt
Rc
Sm Fe
An
Dp
Am Se
Lo
33 10 18 1 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 10 5 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 2 1 26 5 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 13 1 4 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 9 30 4 1 6 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 9 1 2 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 4 5 5 21 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 10 0 0 0 0 0 5 0 0 2 0 0 0 0 0 0 0 0 0 0 9 0 0 0 0 0 9 0 0 0 0 3 0 0 0 0 0
0 0 18 0 0 0 0 0 2 0 0 1 0 0 0 0 13 0 0 0 0 0 30 0 0 0 0 0 1 0 0 0 0 4 0 0 0 0
0 0 0 1 0 0 0 0 0 1 0 0 26 0 0 0 0 1 0 0 0 0 0 4 0 0 0 0 0 2 0 0 0 0 5 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 18 0 0 0 0 2 0 0 0 0 0 6 0 0 0 0 0 1 0 0 0 0 21 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
33 0 0 0 0 0 10 0 0 0 0 0 0 0 0 7 0 0 0 0 2 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 3
0 0 0 0 2 0 0 0 0 0 0 0 0 5 0 0 0 0 4 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 5 0 0
(continued)
8.3 Rorschach Data
129
Table 8.2 (continued) Ba Bl Bu Ca Cl 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Fi
Fu
Ma Mt
Rc
Sm Fe
An
Dp
Am Se
Lo
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 6 2 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 4 2 1 2 2 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 6 1 1
2 0 0 0 0 0 1 0 0 0 0 4 0 0 0 0 0 6 0 0
0 6 0 0 0 0 0 4 0 0 0 0 2 0 0 0 0 0 1 0
0 0 2 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 3 0 0 0 0 0 2 0 0 0 0 2 0 0 0 0
0 0 0 0 0 2 1 4 1 18 2 0 0 0 0 0 0 0 0 0
0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 2 0 0 0 0 0 18 0 0 0 0 2 0 0 0 0 1
guarantees that the rows and the columns of the original contingency table occupy the same space, namely a single axis for both in each component. See the Young– Householder theorem (Young and Householder, 1938). Nishisato (1980) has shown that quantifications of F and Fp yield the following eigenvalues: 2 (1) F: we obtain ρ12 , ρ22 , . . . , ρm−1 for (m − 1) components. 2 2 (2) Fp : we obtain ρ p1 , ρ p2 , . . . , ρ 2p,m+n−2 for (m + n − 2) components.
8.3.1 Major Dual Space or Contingency Space This space is spanned by (m − 1) dimensions associated with the following eigenvalues: • Contingency table: 1 ≥ ρk2 > 0. • Response-pattern table: 1 ≥ ρ 2pk > 0.5.
130
8 Geometry of Space: A New Look
Table 8.3 Summary statistics for contingency table Component 1 2 3 ρ2
Eigenvalue Singular value ρ Delta(δ) CumuD ( δk ) Row-column angle (θ)
5
0.4633 0.6807
0.2505 0.5005
0.1704 0.4128
0.1281 0.3579
0.0721 0.2686
42.72 42.72
23.10 65.82
15.71 81.53
11.81 93.35
6.65 100.00
47◦
60◦
66◦
69◦
79◦
Table 8.4 Summary statistics for response-pattern format Component Space ρk2 ρk 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
4
Contingency = main dual
Residual
Minor dual
0.86 0.75 0.71 0.68 0.63 0.5 0.5 0.5 0.5 0.5 0.37 0.32 0.29 0.25 0.14
0.93 0.87 0.84 0.82 0.80 0.71 0.71 0.71 0.71 0.71 0.61 0.57 0.54 0.50 0.38
δ
Cumδ
11.42 10.00 9.43 9.06 8.45 6.67 6.67 6.67 6.67 6.67 4.89 4.23 3.90 3.34 1.91
11.42 21.42 30.85 39.91 48.36 55.02 61.69 68.36 75.02 81.69 86.58 90.85 94.75 98.09 100.00
• The two sets of coordinates, one from the contingency table and other from the response-pattern table, are proportional. Let us now look at basic statistics from the two data formats of Rorschach data to see their respective sets of eigenvalues and related statistics. In the table of the response-patterns, we already entered the names of the different kinds of quantification space—please wait till later when we define different kinds of space. At this stage, please note that ranges of the eigenvalues for the two data types (Tables 8.3 and 8.4). Note that major dual space (contingency space) is where we typically deal with the analysis of contingency tables. In other words, major dual space is the total space for the quantification of the contingency table, but not for the response-pattern table.
8.3 Rorschach Data
131
Table 8.5 Coordinates of five contingency components Component 1 2 3 Bat Blood Butterfly Cave Clouds Fire Fur Mask Mountain Rocks Smoke Fear Anger Depression Ambition Security Love
− 0.70 − 0.87 1.17 − 0.37 − 0.23 − 0.42 0.78 − 0.05 0.34 0.11 − 0.57 − 0.87 − 0.44 − 0.39 1.03 0.42 0.76
− 0.16 − 0.35 − 0.44 0.30 − 0.08 − 0.30 − 0.08 0.04 1.54 0.15 − 0.05 − 0.17 − 0.23 0.09 − 0.51 1.27 − 0.23
0.15 0.60 0.17 − 0.44 − 0.75 0.63 − 0.08 − 0.21 0.31 0.13 0.57 0.39 0.35 − 0.68 0.15 0.29 − 0.09
4
5
− 0.34 − 0.18 − 0.15 − 0.34 0.30 0.59 0.01 − 0.04 − 0.04 0.70 1.25 − 0.48 0.75 0.02 − 0.12 0.00 − 0.08
− 0.08 0.06 0.32 0.11 0.13 0.07 − 0.67 0.02 0.11 − 0.08 0.00 − 0.03 − 0.02 0.10 0.48 0.05 − 0.47
This is also the space where the largest m − 1 components of the response-pattern table span. It is particularly important that as we will see later the coordinates of the (m − 1) components from the contingency table and those of the (m − 1) components from the response-pattern table are proportional (note: However, this is the space where the CGS controversy was focused). Let us now look at the corresponding coordinates of Rorschach data in the two data formats in major dual space (Tables 8.5 and 8.6). The fact that the principal coordinates from the two data formats are proportional 2 , and the eigenvalues also means that the eigenvalues from the contingency table, ρcj 2 from the response-pattern table, ρ pj , are proportional, namely 2 ∝ ρ 2pj . ρcj
Hence, all the corresponding components from the two response formats are proportional, and thus as far as major dual space is concerned, there is no substantial difference between the contingency-table analysis and the response-pattern table analysis. This is what the CGS scaling proponents and the opponent were argued about, but we can now see that there is nothing to argue about, because “proportional” means there is no difference in results between the two data formats.
132
8 Geometry of Space: A New Look
Table 8.6 Principal coordinates of major dual space Co 1 2 3 Ba Bl Bu Ca Cl Fi Fu Ma Mo Ro Sm Fe An De Am Se Lo
− 1.09 − 1.12 1.51 − 0.47 − 0.26 − 0.53 1.12 − 0.04 0.40 0.19 − 0.71 − 1.19 − 0.56 − 0.50 1.29 0.51 1.19
0.26 0.63 0.74 − 0.50 0.17 0.54 0.12 − 0.06 − 2.66 − 0.23 0.13 0.29 0.43 − 0.12 0.87 − 2.18 0.39
− 0.34 − 1.17 − 0.45 0.89 1.53 − 1.20 0.202 0.43 − 0.64 − 0.21 − 1.02 − 0.79 − 0.63 1.37 − 0.42 − 0.59 0.22
4
5
− 0.74 − 0.32 − 0.51 − 0.83 0.57 1.39 0.19 − 0.12 − 0.07 1.62 2.90 − 1.02 1.73 − 0.03 − 0.50 0.01 − 0.01
− 0.24 − 0.17 0.92 0.13 0.49 0.23 − 1.97 0.00 0.34 0.00 0.33 − 0.34 0.11 0.30 1.40 0.18 − 1.40
Notes Co = component, Ba = bat, Bl = blood, Bu = butterfly, Ca = cave Cl = clouds, Fi = fire, Fu = fur, Ma = mask, Mo = mountains, Ro = rock, Sm = smoke Fe = fear, An = anger, Dep = depression, Am = ambition, Se = security, Lo = love
8.3.2 Residual Space of Response-Pattern Table This space exists only for the response-pattern table. When n > m the residual space has n − m dimensions with all the eigenvalues equal to: ρ 2pk = 0.5 Note that the eigenvalues of 0.5 for the response-pattern format correspond to the eigenvalues of 0 for the contingency-table format (Nishisato, 1980). Thus, we may say that the residual space does not contribute to any discovery of relations between rows and columns of the data. This is so because all the row weights or all the column weights in residual space are zero, indicating that there is no information about the relations between rows and columns of the contingency table. Thus, all the components in residual space are typically discarded. The above discussion can be verified when we look at the principal coordinates of the variables of the response-pattern table in residual space as in Table 8.7. Note that the residual space is defined only for the response-pattern table when the number of rows and that of columns of the contingency table are different.
8.3 Rorschach Data
133
Table 8.7 Principal coordinates of residual space Co 6 7 8 Ba Bl Bu Ca Cl Fi Fu Ma Mo Ro Sm Fe An De Am Se Lo
− 0.77 0.25 − 0.07 2.60 − 0.95 0.61 − 0.12 1.38 − 0.71 0.83 0.76 0 0 0 0 0 0
0 0.74 0.14 − 0.45 − 0.12 − 2.51 0.00 1.61 − 0.01 − 1.53 3.77 0 0 0 0 0 0
0 − 0.28 0.20 1.31 0.10 0.11 0.33 − 3.15 0.03 − 2.30 2.26 0 0 0 0 0 0
9
10
0 1.39 0.11 0.29 − 0.02 − 2.06 − 0.12 − 1.73 − 0.18 3.85 0.03 0 0 0 0 0 0
1.03 − 3.31 0.17 0.14 − 0.55 − 0.16 − 0.16 0.10 − 0.36 1.57 1.40 0 0 0 0 0 0
Notes Co = component, Ba = bat, Bl = blood, Bu = butterfly, Ca = cave Cl = clouds, Fi = fire, Fu = fur, Ma = mask, Mo = mountains, Ro = rocks, Sm = smoke Fe = fear, An = anger, Dep = depression, Am = ambition, Se = security, Lo = love
8.3.3 Minor Dual Space of Response-Pattern Table This space is spanned by the (m − 1) components associated with the eigenvalues smaller than 0.5. This space is ignored when one quantifies the contingency table, but it is crucially important because this space contains the amount of information which is completely ignored by the traditional quantification of the contingency table. Minor dual space is therefore where we can identify what is being missed by correspondence plot (Table 8.8). The importance of minor dual space is not immediately obvious, but it is extremely important when we discuss the joint graphical display of rows and columns of the contingency table. Let us wait to discuss the role of minor dual space until we define dual subspace shortly.
8.3.4 Dual Space This space consists of major dual space and minor dual space which span 2(m − 1)dimensional space, the origin of the name doubled multidimensional space. This
134
8 Geometry of Space: A New Look
Table 8.8 Principal coordinates of minor dual space Co 11 12 13 Ba Bl Bu Ca Cl Fi Fu Ma Mo Ro Sm Fe An De Am Se Lo
− 0.18 − 0.13 0.70 0.10 0.37 − 0.18 − 1.50 0.00 0.26 0.00 0.25 0.26 − 0.08 − 0.23 − 1.07 − 0.14 1.07
− 0.51 − 0.22 − 0.35 − 0.57 0.39 0.96 0.13 − 0.08 − 0.05 1.11 1.99 0.70 − 1.19 0.02 0.34 − 0.01 0.00
− 0.22 − 0.75 − 0.29 0.57 0.98 − 0.77 0.13 0.28 − 0.41 − 0.13 − 0.66 0.51 0.41 − 0.88 0.27 0.38 − 0.14
14
15
0.15 0.36 0.43 − 0.29 0.10 0.31 0.07 − 0.03 − 1.53 − 0.13 0.07 − 0.17 − 0.25 0.07 − 0.50 1.26 − 0.22
0.44 0.46 − 0.62 0.19 0.11 0.22 − 0.46 0.02 − 0.16 − 0.08 0.29 − 0.49 − 0.23 − 0.20 0.53 0.21 0.49
Notes Co = component, Ba = bat, Bl = blood, Bu = butterfly, Ca = cave Cl = clouds, Fi = fire, Fu = fur, Ma = mask, Mo = mountains, Ro = rocks, Sm = smoke Fe = fear, An = anger, Dep = depression, Am = ambition, Se = security, Lo = love
space contains the entire information associated with the relations between the rows and the columns of the contingency table, and this is the space where rows and columns of the contingency space span fully, that is, the space where we can calculate the distance between a row variable and a column variable from the joint graph. Recall the warning by Lebart et al. (1977) that one cannot calculate the distance between a row and a column from a correspondence plot. We now know the reason for this sensible warning, because in correspondence plot one does not use coordinates in dual space, but employs half the space, that is, major dual space or contingency space, where the coordinates of rows and those of columns are imperfectly correlated, and thus, it is necessary to represent row variables on one axis and column variables on another axis. These two axes, however, are not orthogonal to each other, but are obliquely positioned in two-dimensional space. Now the most important topic of Nishisato’s space theory is the concept of dual subspace. We will learn that for the exact graph for the decomposition of the contingency information, we must use the coordinates in dual subspace, which has doubled dimensions of the contingency space.
8.4 Dual Subspace, A Bridge Between Data Types
135
8.4 Dual Subspace, A Bridge Between Data Types Dual subspace is a key concept in Nishisato’s theory of doubled multidimensional space because this space is what we need to know when we try to understand the relation between the contingency-table analysis and the response-pattern table analysis. Each dual subspace consists of one component from major dual space and one from minor dual space such that the sum of their eigenvalues is equal to 1.00. This is crucially important for the theory of doubled space. The pair of components in dual space provides the exact two-dimensional coordinates of rows and columns of each contingency-table component if we are to represent them in common space. In other words, the two components in each dual subspace provide the exact Euclidean coordinates of rows and columns of each contingency-table component, this being so in common Euclidean space. Thus, the coordinates of the two components in each dual subspace are the mathematically correct two-dimensional coordinates for rows and columns of each contingency-table component. Recall that we obtained each contingency-table component by maximizing the correlation between rows and columns, but that the correlation almost never reaches 1. When we converted the correlation associated with each contingency component into the angle between the row axis and the column axis of the exact Euclidean space (see those angles listed in the key statistics of the contingency table, that is, Table 8.3). If the correlation between rows and columns is one, this angle is zero, but our Rorschach example shows that the discrepancy angles range from 47 ◦ C (the smallest) to 79 ◦ C (the largest)!! How can we ignore these angle discrepancies and adopt 0 degree discrepancy for the joint graphical display! It sounds like an outrageous handling of the joint graph. We know that two imperfectly correlated variables require a two-dimensional graph, and we have just found out the exact two-dimensional coordinates! Indeed, this dual subspace provides the exact two-dimensional coordinates for each contingencytable component!! Thus, we have finally solved the perennial problem of joint graphical display. The CGS scaling did not go far enough to reach this conclusion. For our Rorschach data, principal coordinates of rows and columns of the contingency table in dual subspace are given in Table 8.9. The two-dimensional graphs for the five components of the contingency table are shown in five graphs (Figs. 8.1, 8.2, 8.3, 8.4 and 8.5). See even when the correlation between rows and columns is high, the correct axis for the rows and the correct axis for the columns show a fairly wide angle at the origin, telling us how inaccurate the unidimensional corresponding plot may be. These graphs, one the traditional one-dimensional correspondence plot and the other, the mathematically correct two-dimensional plot, are very instructive to tell us that even if we can interpret the graph by correspondence plot, it is totally insufficient for the correct interpretation of the data configuration. Traditionally many researchers adopt correspondence graphs, based on contingency-table components, if they are
136
8 Geometry of Space: A New Look
Table 8.9 Dual subspace: reconciliation of two data formats Dual Sub 1 Dual Sub2 Dual Sub3 Dual Cmp 1 15 2 14 3 13 4 θ ρ2 Ba Bl Bu Ca Cl Fi Fu Ma Mo Ro Sm Fe An De Am Se Lo
47◦ 0.86 − 1.09 − 1.12 1.51 − 0.47 − 0.26 − 0.53 1.12 − 0.04 0.40 0.19 − 0.71 − 1.19 − 0.56 − 0.50 1.29 0.51 1.19
0.14 0.44 0.46 − 0.62 0.19 0.11 0.22 − 0.46 0.02 − 0.16 − 0.08 0.29 − 0.49 − 0.23 − 0.20 0.53 0.21 0.49
60◦ 0.75 0.26 0.63 0.74 − 0.50 0.17 0.54 0.12 − 0.06 − 2.66 − 0.23 0.13 0.29 0.43 − 0.12 0.87 − 2.18 0.39
0.25 0.15 0.36 0.43 − 0.29 0.10 0.31 0.07 − 0.03 − 1.53 − 0.13 0.07 − 0.17 − 0.25 0.07 − 0.50 1.26 − 0.22
66◦ 0.71 − 0.34 − 1.17 − 0.45 0.89 1.53 − 1.20 0.20 0.43 − 0.64 − 0.21 − 1.02 − 0.79 − 0.63 1.37 − 0.42 − 0.59 0.22
0.29 − 0.22 − 0.75 − 0.29 0.57 0.98 − 0.77 0.13 0.28 − 0.41 − 0.13 − 0.66 0.51 0.41 − 0.88 0.27 0.38 − 0.14
69◦ 0.68 − 0.74 − 0.32 − 0.51 − 0.83 0.57 1.39 0.19 − 0.12 − 0.07 1.62 2.90 − 1.02 1.73 − 0.03 − 0.50 0.01 − 0.01
Sub4 12
Dual 5
Sub5 11
0.32 − 0.51 − 0.22 − 0.35 − 0.57 0.39 0.96 0.13 − 0.08 − 0.05 1.11 1.99 0.70 − 1.19 0.02 0.34 − 0.01 0.01
79◦ 0.63 − 0.24 − 0.17 0.92 0.13 − 0.49 0.23 − 1.97 0.00 0.34 0.00 0.33 − 0.34 0.11 0.30 1.40 0.18 − 1.40
0.37 − 0.18 − 0.13 0.70 0.10 0.37 − 0.18 − 1.50 0.00 0.26 0.00 0.25 0.26 − 0.08 − 0.23 − 1.07 − 0.14 1.07
Note 1 Cmp = component, Ba = bat, Bl = blood, Bu = butterfly, Ca = cave Cl = clouds, Fi = fire, Fu = fur, Ma = mask, Mo = mountains, Ro = rocks, Sm = smoke Fe = fear, An = anger, Dep = depression, Am = ambition, Se = security, Lo = love Note 2 These angles θ in this table are from Stage 1 analysis
interpretable. These five examples are sufficient enough to tell us how wrong the practice can be. Indeed, the “interpretable graph” can be quite misleading and may be a wrong strategy.
8.4.1 A Shortcut for Finding Exact Coordinates It is often the case that even the condensed response-pattern table is still too large to analyze. For this type of situation, Nishisato proposed the following formulas to derive the (m − 1) additional components from the contingency-table analysis for dual subspace. For row i and column j of component k of the contingency-table analysis, the two-dimensional coordinates in dual subspace can be calculated by
8.5 Conclusions
137
Fig. 8.1 Component 1: contingency versus dual subspace
Row i: [ρk yi j , ρk yik sin
θk ] 2
Columns j: [ρk x jk, −ρk x jk sin
θk ] 2
See the applications of the above formulas in Nishisato et al. (2021).
8.5 Conclusions We have finally settled a long debate on the perennial problem of joint graphical display via a theory of doubled multidimensional space. The exact Euclidean coordinates for each contingency-table component are given by the coordinates of rows and columns in dual subspace.
138
Fig. 8.2 Component 2: contingency versus dual subspace
Fig. 8.3 Component 3: contingency versus dual subspace
8 Geometry of Space: A New Look
8.5 Conclusions
Fig. 8.4 Component 4: contingency versus dual subspace
Fig. 8.5 Component 5: contingency versus dual subspace
139
140
8 Geometry of Space: A New Look
The solution to the problem, however, does not mean that joint graphical display can now play a role of interpreting multidimensional graphs. The problem of joint graphical display is another matter that we must work out in order to make it useful. This is the main topic of Chap. 10. Before discussing the joint graph, let us look at a new approach to quantification theory, that is, the two-stage quantification in the next chapter.
References Carroll, J. D., Green, P. E., & Schaffer, C. M. (1986). Interpoint distance comparisons in correspondence analysis. Journal of Marketing Research, 23, 271–280. Carroll, J. D., Green, P. E., & Schaffer, C. M. (1987). Comparing interpoint distances in correspondence analysis: A clarification. Journal of Marketing Research, 24, 445–450. Carroll, J. D., Green, P. E., & Schaffer, C. M. (1989). Reply to Greenacre’s commentary on the Carroll-Green-Schaffer scaling of two-way correspondence analysis solutions. Journal of Marketing Research, 26, 366–368. Garmize, L. M., & Rychlak, J. F. (1964). Role-play validation of a socio-cultural theory of symbolism. Journal of Consulting Psychology, 28, 107–115. Greenacre, M. J. (1989). The Carroll-Green-Schaffer scaling in correspondence analysis: A theoretical and empirical appraisal. Journal of Marketing Research, 26, 358–365. Greenacre, M. J. (1989). An invited talk on the CGS scaling at the meeting of the International Federation of Classification Societies. Charlottesville. Lebart, L., & Mirkin, B. D. (1993). Correspondence analysis and classification. Multivariate Analysis: Future Directions, 2, 341–357. Lebart, L., Morineau, A., & Tabard, N. (1977). Techniques de la Description Statistique: Méthodes et Logiciels pour l’Analyse des Grands Tableaux. Dunod. Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. The University of Toronto Press. Nishisato, S. (1984a). Dual scaling by reciprocal medians. Proceedings of the 32nd Scientific Conference of the Italian Statistical Society (pp. 141–147), Sorrento, Italy. Nishisato, S. (1984). Forced classification: A simple application of a quantification technique. Psychometrika, 49, 25–36. Nishisato, S. (1985). Methods of handling outlier responses in dual scaling. Proceedings of the 49th Annual Meeting of the Japanese Psychological Association (p. 501). Tokyo, Japan. Nishisato, S. (1987). Robust techniques for quantifying categorical data. In I. B. MacNeil & G. J. Umphrey (Eds.), Foundations of statistical inference (pp. 209–217). D. Reidel Publishing Company. Nishisato, S. (1991). Standardizing multidimensional space for dual scaling. Proceedings of the 20th Annual Meeting of the German Operations Research Society (pp. 584–591), Hohenheim University. Nishisato, S. (1994). Elements of dual scaling: An introduction to practical data analysis. Lawrence Erlbaum Associates. Nishisato, S. (1996). Gleaning in the field of dual scaling. Psychometrika, 61, 559–599. Nishisato, S. (2007). Multidimensional nonlinear descriptive analysis. Chapman-Hall/CRC. Nishisato, S. (2016). Quantification theory: Dual space and total space. Paper presented at the Annual Meeting of the Behaviormetric Society, Sapporo, Japan, p. 27 (In Japanese). Nishisato, S. (2019a). Reminiscence: Quantification theory and graphs. Theory and Applications of Data Analysis, 8, 47–57 (in Japanese).
References
141
Nishisato, S. (2019b). Expansion of contingency space: Theory of doubled multidimensional space and graphs. In An invited talk at the Annual Meeting of the Japanese Classification Society, Tokyo (in Japanese). Nishisato, S., & Baba, Y. (1999). On contingency, projection and forced classification of dual scaling. Behaviormetrika, 26, 207–219. Nishisato, S., Beh, E. J., Lombardo, R., & Clavel, J. G. (2021). Modern quantification theory: Joint graphical display, biplots, and alternatives. Springer Nature. Verboon, P. (1994). A Robust approach to nonlinear multivariate analysis. DSWO Press: Leiden University. Young, G., & Householder, A. A. (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika, 3, 19–22.
Chapter 9
Two-Stage Quantification: A New Look
In Chap. 8, we discussed a theory of doubled multidimensional space, where we introduced such concepts as major dual space (contingency space), residual space, minor dual space, and then we discussed “dual subspace” as the defining concept of joint graphical display (see Nishisato, 1980, 2016, 2019a, 2019b, Nishisato et al., 2021). We are now in a position to apply the space theory to real data. At the same time, we need one more proposition. Following Nishisato (2023), we will introduce the so-called two-stage analysis of a two-way table: . • The first stage is to analyze the data in a contingency-table format for the search of the row-column associations. • The second stage is to identify the joint principal coordinates of row and column variables through the analysis of the response-pattern format of the same data. The current author and his long-time collaborator José G. Clavel have often used the following two data sets to demonstrate their findings: 1. Stebbins’ Barley Data: Stebbins provided a numerical example of his minitheory of evolution (Stebbins, 1950). This is a case where the number of rows and that of columns of the contingency table are equal. This means that this example does not involve residual space. 2. Garmize and Rychlak’s Rorschach Data: Garmize and Rychlak (1964) have investigated the relations between the perceptions of Rorschach inkblots and moods of subjects. This data set has already been used in the previous chapter. Unlike the barley data, we will see residual space in this example. Using these two familiar data, we would like to demonstrate our two-stage quantification task, hoping these processes will become a standard mode of quantification theory. The two-stage analysis reveals a number of interesting aspects of quantification, and in particular, it will show the true picture of the contingency-table components, each of which is not unidimensional but two-dimensional, as demonstrated © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Nishisato, Measurement, Mathematics and New Quantification Theory, Behaviormetrics: Quantitative Approaches to Human Behavior 16, https://doi.org/10.1007/978-981-99-2295-6_9
143
144
9 Two-Stage Quantification: A New Look
in the previous chapter. We noted then that this had been the cause of the so-called perennial problem of joint graphical display for many years. Let us now go through our quantification procedures with these real data sets, using our new strategy of two-stage quantification analysis.
9.1 Barley Data In 1950, Stebbins (1950) published a book in which he provided a data set that demonstrated how different varieties of barley survived in different locations over years. This was an empirical demonstration of the theory of evolution. He collected data on six varieties of barley at six locations in the USA over some ten years time and studied how those varieties of barley would survive at different agricultural stations under different climatic conditions. The procedure was to collect equal numbers of seeds of the six varieties, planted them, and at the harvest 500 seeds were randomly selected and they were classified into the six varieties, and those 500 seeds were planted at the respective locations, and at the harvest, again 500 seeds were randomly chosen, and classified into six varieties of barley, and so on with the same procedure over some ten years. The results were tabulated in Table 9.1. Are any of the different agricultural stations better suited for any varieties of barley? This is an interesting problem to investigate. Notice that the number of rows is equal to the number of columns. This means that in our Stage 2 analysis, this data set does not have residual space, meaning that the entire space consists of dual major space (contingency space) and dual minor subspace, or simply dual space. Thus, equating the number of rows to the number of columns has an advantage of eliminating additional computation involved in residual space.
Table 9.1 Stebbins’ barley data Arlington Ithaca Coast Trebi 446 Hanchen 4 White 4 Smyrna Manchuria 1 Gatemi 13 Meloy 4
St. Paul
Moccasin
Moro
Davis
57 34 0
83 305 4
87 19 241
6 4 489
362 34 65
343 9 0
2 15 0
21 58 4
0 0 0
0 1 27
Note Arlington, Virginia; Ithaca, New York; St. Paul, Minnesota Moccasin, Montana; Moro, Oregon; Davis, California
9.1 Barley Data
145
Table 9.2 Order 0 approximation (row-column independence) Arlington Ithaca St. Paul Moccasin Coast Trebi 179.2 Hanchen 68.9 White 138.2 Smyrna Manchuria 63.2 Gatemi 16.5 Meloy 6.0
155.3 59.7 119.8
163.2 62.7 125.9
189.4 72.8 146.1
185.6 71.3 143.2
59.3 15.5 5.7
54.7 14.3 5.2
57.6 15.1 5.5
66.8 17.5 6.4
65.4 17.1 6.2
Moro
Davis
200.5 120.6 −119.1
164.1 75.0 51.5
150.7 41.0 222.5
159.2 20.4 378.9
184.7 69.6 150.9
225.2 11.2 4.6
100.3 13.1 4.9
−6.9 16.7 5.9
−88.4 21.5 7.3
60.3 17.3 6.3
Moro
Davis
Table 9.4 Order 2 approximation = order 1 + component 2 Arlington Ithaca St. Paul Moccasin Coast Trebi 306.5 Hanchen 125.1 White 26.3 Smyrna Manchuria −11.0 Gatemi 14.9 Meloy 10.2
Davis
168.2 64.6 129.7
Table 9.3 Order 1 approximation = order 0 + component 1 Arlington Ithaca St. Paul Moccasin Coast Trebi 181.8 Hanchen 73.3 White 118.3 Smyrna Manchuria 76.5 Gatemi 16.2 Meloy 5.9
Moro
36.0 52.3 2.2
276.6 121.7 −31.5
104.1 21.7 256.9
31.7 −32.5 473.0
286.1 111.7 76.1
340.6 12.8 −0.9
21.4 12.0 8.8
25.8 17.2 4.3
1.0 22.8 3.0
−10.8 16.2 9.7
9.1.1 Stage 1 Analysis Let us first look at how the successive extractions of components reach the perfect reproduction of the original data, using the k- order approximations Nishisato and Nishisato (1994). See Tables 9.2, 9.3, 9.4, 9.5, 9.6, 9.7. We can see that Order 3 approximation to the input data are very close to the entire set of data, although the data still require five dimensions for the complete reproduction of the input data. The summary statistics of quantification are given in Table 9.8. Observe the angles of separation between the axis for rows and the axis for columns of each component. Particularly, look at the last component. 91◦ ! This means that the fifth component
146
9 Two-Stage Quantification: A New Look
Table 9.5 Order 3 approximation = order 2 + component 3 Arlington Ithaca St. Paul Moccasin Coast Trebi 434.6 Hanchen 5.1 White 3.7 Smyrna Manchuria 1.8 Gatemi 11.6 Meloy 15.3
81.4 304.6 3.0
98.2 27.2 257.9
−2.3 −0.7 479.0
372.9 30.4 60.7
342.6 12.3 −0.1
1.8 17.1 1.0
25.2 17.4 4.1
−2.4 23.7 1.6
−2.2 14.0 13.2
Moro
Davis
56.8 34.0 0.0
81.7 305.0 3.8
90.7 18.9 241.6
2.0 4.1 488.4
375.8 33.7 67.1
343.0 9.0 0.1
2.0 15.2 1.2
20.9 57.3 0.6
0.1 0.8 3.6
−0.5 −1.6 14.5
Table 9.7 Order 5 approximation = order 4 + component 5 = data Arlington Ithaca St. Paul Moccasin Coast Trebi 446 Hanchen 4 White 4 Smyrna Manchuria 1 Gatemi 13 Meloy 4
Davis
56.2 33.3 −1.4
Table 9.6 Order 4 approximation = order 3 + component 4 Arlington Ithaca St. Paul Moccasin Coast Trebi 433.9 Hanchen 4.3 White 2.1 Smyrna Manchuria 1.4 Gatemi 15.3 Meloy 14.9
Moro
Moro
Davis
57 34 0
83 305 4
87 19 241
6 4 489
362 34 65
343 9 0
2 15 0
21 58 4
0 0 0
0 1 27
of the contingency table consists of two almost orthogonal components, that is, the axis for row variables is almost orthogonal to the axis for column variables. Our observation of Order 3 and Order 4 approximations can be verified by a huge gap in the amount of information, expressed by the eigenvalues, between Component 3 and Component 4. In terms of the row-column discrepancy angles, Component 3 shows 51◦ , while Component 4 shows 84◦ which is so close to 90◦ . The same angle for Component 5 exceeded 90◦ ! We wonder what this means! At this juncture, we should simply state that there is sufficient information that this data set cannot fully be accommodated in five-dimensional space.
9.1 Barley Data
147
Table 9.8 Summary statistics of barley data in contingency space Component 1 2 3 ρ2
Eigenvalue Singular value ρ Delta(δ) CumuD( δk ) Row-column angle (θ)
0.7352 0.8575 38.18 38.18 34◦
0.6325 0.7953 32.85 71.03 41◦
0.4802 0.6930 24.94 95.97 51◦
Table 9.9 Projected weights of five contingency components Component 1 2 3 Coast Trebi Hanchen White Smyrna Manchuria Gatemi Meloy Arlington Ithaca St. Paul Moccasin Moro Davis
0.11 0.49 −1.10 1.60 −0.16 −0.10 0.11 1.50 0.45 −0.60 −1.25 −0.04
0.68 0.74 −0.65 −1.36 −0.07 0.69 0.81 −1.14 0.85 −0.33 −0.79 0.64
−0.61 1.49 0.14 −0.17 0.17 −0.72 −0.81 −0.14 1.43 0.04 0.20 −0.53
4
5
0.0586 0.2421 3.04 99.01 84◦
0.0190 0.1379 0.99 100.00 91◦
4
5
−0.02 −0.06 −0.06 −0.04 1.26 −0.30 0.04 −0.04 −0.03 0.51 −0.25 −0.18
0.04 0.00 0.01 0.00 −0.09 −1.19 0.21 0.00 0.03 −0.07 0.07 −0.23
Let us list five-dimensional coordinates of six varieties of barley and six locations of the firms. These coordinates are typically used for graphs, so-called correspondence plots (Table 9.9). At this stage, we move on with our quantification to Stage 2.
9.1.2 Stage 2 Analysis We first recall that the number of rows is equal to that of columns in this data set, and according to our theory of space, this data set does not have residual space involved. The contingency table is first converted to the condensed response-pattern format, as shown in Table 9.10. When the response-pattern table is subjected to quantification analysis, we obtain the following basic statistics as in Table 9.11. Let us look at the principal coordinates of the ten components. As we see in the summary table, the first five components belong to major dual space (i.e., the eigenvalues are greater than 0.5) and the remaining five components lie in minor dual
148
9 Two-Stage Quantification: A New Look
Table 9.10 Response-pattern table of the entire data set C&T Han WS Man Gat Mel Arl Ith 446 57 83 87 6 362 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 4 34 305 19 4 34 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 4 4 241 489 65 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 343 2 21 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 9 15 58 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 27
446 0 0 0 0 0 4 0 0 0 0 0 4 0 0 0 0 1 0 0 0 13 0 0 0 0 4 0 0
0 57 0 0 0 0 0 34 0 0 0 0 0 0 0 0 0 0 343 0 0 0 9 0 0 0 4 0 0
StP
Moc
Mor
Dav
0 0 83 0 0 0 0 0 305 0 0 0 0 4 0 0 0 0 0 2 0 0 0 15 0 0 0 0 0
0 0 0 87 0 0 0 0 0 19 0 0 0 0 241 0 0 0 0 0 21 0 0 0 58 0 0 4 0
0 0 0 0 6 0 0 0 0 0 4 0 0 0 0 489 0 0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 362 0 0 0 0 0 34 0 0 0 0 65 0 0 0 0 0 0 0 0 0 0 0 27
space (i.e., the eigenvalues less than 0.5). Those components in minor dual space are the ones that contingency-table analysis ignores completely, but they are necessary to build sufficient space for both rows and columns to be located in common space (Table 9.12). For this data set, there are five pairs of dual subspace components. Again the defining condition of each pair is that the sum of the two eigenvalues is equal to 1, where one eigenvalue is greater than 0.5 and the other smaller than 0.5. Therefore, the five sets of dual subspace components are:
9.1 Barley Data
149
Table 9.11 Basic statistics Component Space 1 2 3 4 5 6 7 8 9 10
Major dual space
Minor dual space
ρk2
ρk
δ
Cumδ
0.93 0.90 0.85 0.62 0.57 0.43 0.38 0.15 0.10 0.07
0.96 0.95 0.92 0.79 0.75 0.66 0.62 0.39 0.32 0.26
18.57 17.95 16.93 12.41 11.37 8.63 7.59 3.07 2.05 1.43
18.57 36.53 53.46 65.87 77.25 85.87 93.46 96.53 98.57 100.00
Table 9.12 Principal coordinates of 10 components Comp 1 2 3 4 5 6 ρ2 C&T Han WS Man Gat Mel Arl Ith StP Moc Mor Dav
0.93 0.12 0.55 −1.23 1.80 −0.20 −0.12 0.13 1.69 0.50 −0.68 −1.40 −0.05
0.90 −0.81 −0.88 0.78 1.61 0.11 −0.82 −0.97 1.36 −1.01 0.40 0.94 −0.76
0.85 −0.81 1.97 0.18 −0.23 0.25 −0.96 −1.08 −0.18 1.90 0.06 0.27 −0.71
0.62 −0.06 −0.21 −0.21 −0.12 4.09 −1.06 0.18 −0.13 −0.08 1.65 −0.81 −0.60
0.57 −0.24 0.01 −0.05 0.02 0.56 6.47 −1.15 −0.02 −0.14 0.43 −0.38 1.26
0.43 −0.21 0.01 −0.05 0.02 0.49 5.64 1.00 0.02 0.12 −0.38 0.33 −1.10
7
8
9
10
0.38 −0.05 −0.16 −0.16 −0.09 3.19 −0.83 −0.14 0.10 0.06 −1.29 0.64 0.47
0.15 −0.34 0.84 0.08 −0.10 0.10 −0.41 0.46 0.08 −0.81 −0.02 −0.11 0.30
0.10 −0.27 −0.30 0.26 0.54 0.04 −0.28 0.33 −0.46 0.34 −0.14 −0.32 0.26
0.07 0.03 0.15 −0.34 0.50 −0.05 −0.03 −0.03 −0.47 −0.14 0.19 0.39 0.01
Note C&T = Coast&Trebi; Han = Hanchen; WS = White Smyrna Man = Manchuria; Gat = Gatemi; Mel = Meloy; Arl = Arlington; Ith = Ithaca StP = St. Paul; Moc = Moccasin; Mor = Moro; Dav = Davis
⎧ ⎫ 2 (Dual subspace 1): ρ12 + ρ10 = 0.93 + 0.07 = 1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ 2 2 ⎪ ⎪ + ρ = 0.90 + 0.10 = 1 (Dual subspace 2): ρ ⎪ ⎪ 2 9 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ 2 2 (Dual subspace 3): ρ3 + ρ8 = 0.85 + 0.15 = 1 ⎬ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ (Dual subspace 4): ρ42 + ρ72 = 0.62 + 0.38 = 1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ 2 2 ⎪ ⎪ (Dual subspace 5): ρ + ρ = 0.57 + 0.43 = 1 ⎪ ⎪ 6 5 ⎪ ⎪ ⎪ ⎪ ⎩ ⎭
150
9 Two-Stage Quantification: A New Look
Table 9.13 Dual subspace: Reconciliation of the two data formats Cmp Dual Sub 1 Dual Sub2 Dual Sub3 Dual 1 10 2 9 3 8 4 ρ2 Coast Trebi Han WS Man Gat Mel Arl Ith StP Moc Mor Dav
Sub4 7
Dual 5
Sub5 6
0.93 0.12
0.07 0.03
0.90 −0.81
0.10 −0.27
0.85 −0.81
0.15 −0.34
0.62 −006
0.38 −0.05
0.57 −0.24
0.43 −0.21
0.55 −1.23 1.80 −0.20 −0.12 0.13 1.69 0.50 −0.68 −1.40 −0.05
0.15 −0.34 0.50 −0.05 −0.03 −0.03 −0.47 −0.14 0.19 0.39 0.0 1
−0.88 0.78 1.61 0.11 −0.82 −0.97 1.36 −1.01 0.40 0.94 −0.76
−0.30 0.26 0.54 0.04 −0.28 0.33 −0.46 0.34 −0.14 −0.32 0.26
1.97 0.18 −0.23 0.25 −0.96 −1.08 −0.18 1.90 0.06 0.27 −0.71
0.84 0.08 −0.10 0.10 −0.41 0.46 0.08 −0.81 −002 −0.11 0.30
−0.21 −0.21 −0.12 4.09 −1.06 0.18 −0.13 −0.08 1.65 −0.81 −0.60
−0.16 −0.16 −0.09 3.19 −0.83 −0.14 0.10 0.06 −1.29 0.64 0.47
0.01 −0.05 0.02 0.56 6.47 −1.15 −0.02 −0.14 0.43 −0.38 1.26
0.01 −0.05 0.02 0.49 5.64 1.00 0.02 0.12 −0.38 0.33 −1.10
Note C&T = Coast & Trebi; Han = Hanchen; WS = White Smyrna Man = Manchuria’ Gat = Gatemi; Mel = Meloy; Arl = Arlington; Ith = Ithaca StP = St. Paul; Moc = Moccasin; Mor = Moro, Dav = Davis
Namely, our five sets of dual subspace are Components (1, 10), (2, 9), (3, 8), (4, 7) and (5, 6).
The table of principal coordinates of the ten components is grouped into five pairs of dual subspaces in Table 9.13. Notice that the sum of the eigenvalues of the two components in each dual subspace is 1. Let us look at graphical displays, component by component, that is, one contingency component and the corresponding two components in dual subspace. As noted earlier, the two-dimensional graphs of dual subspace are mathematically correct representations of individual contingency components. Observe how discrepant the axis for the rows and the axis for the columns are in each dual subspace. One wonders, then, why we have been treating each two-dimensional configuration as a unidimensional graph by means of correspondence plots; yet correspondence plots are almost the only graphs we see in applications of quantification theory today! To the best of the author's knowledge, these mathematically correct two-dimensional graphs in dual subspace have never been used. Why not? The two-dimensional graph of each dual subspace is absolutely the correct one, but it is usually ignored and replaced with a unidimensional graph under the name of correspondence plot.
Fig. 9.1 Component 1: Contingency versus dual subspace
We need to reconsider the practice of correspondence plots, and we should adopt the mathematically correct plots, in this case the two-dimensional plots of dual subspace. Let us present two graphs for each dual subspace, one by correspondence plot (one dimension) and the other by the exact mathematical plot (two dimensions) (Figs. 9.1, 9.2, 9.3, 9.4 and 9.5).
Dual subspace 1: For component 1 of the contingency-table format, barley Manchuria is located very close to Ithaca, New York (note: you can construct the dimension-1 graph of the contingency-table analysis by projecting all points in Fig. 9.1 onto the horizontal axis), but in the exact graph from dual subspace 1 they are widely separated. Similarly, barley White Smyrna is close to locations Moro and Moccasin in contingency space, but they are widely separated in the exact two-dimensional geometric representation. How can we reconcile these discrepancies? To be frank, we should not distort the graph of dual subspace 1. This means that we should interpret the two-dimensional graph for each contingency component, rather than the unidimensional correspondence plot.
Dual subspace 2: What about dual subspace 2? In contingency space, Manchuria and Ithaca are closely located, and St. Paul, Arlington and Davis are located close to barley varieties Meloy and Coast and Trebi. But if we look at the geometrically correct two-dimensional dual subspace plots, Manchuria is quite far away from Ithaca (dual subspace 1), and Arlington and Davis are far away from Meloy and Coast and Trebi (dual subspace 2).
Dual subspaces 3, 4, 5: In the same way, we can examine the distributions of locations and barley varieties in contingency space and in dual subspaces 3, 4 and 5.
Fig. 9.2 Component 2: Contingency versus dual subspace
Fig. 9.3 Component 3: Contingency versus dual subspace
All of them show startling discrepancies between what correspondence plots show (i.e., unidimensional graphs in contingency space) and what our two-dimensional dual subspace graphs reveal. As noted earlier, component 4 requires a full two-dimensional space. Since the scales of the horizontal and vertical axes of Fig. 9.5 are not equal, the graph looks like a system of two oblique axes.
Fig. 9.4 Component 4: Contingency versus dual subspace
Fig. 9.5 Component 5: Contingency versus dual subspace
But, as shown earlier, the two axes are separated by 91°, that is, they are orthogonal axes in two-dimensional space. We should keep this in mind when we discuss joint multidimensional graphs in Chap. 10. The above comparisons of plots in contingency space and dual space are quite remarkable: the two sets of graphs are starkly different for every component, and they strongly indicate that we cannot substitute the exact two-dimensional plots of dual subspace with unidimensional correspondence plots.
Table 9.14 The squared distance matrix in dual space: Barley data

                   1     2     3     4     5     6     7     8     9     10    11    12
1. Coast Trebi     0     3.1   2.5   3.2   5.5   9.0   1.9   2.9   2.9   3.3   2.8   2.1
2. Hanchen         3.1   0     3.2   3.8   6.0   9.2   3.5   3.5   1.8   4.5   3.5   3.4
3. White Smyrna    2.5   3.2   0     3.3   5.7   9.0   3.0   3.1   3.2   3.2   1.5   2.8
4. Manchuria       3.2   3.8   3.3   0     6.0   9.3   3.6   1.4   3.8   4.1   3.6   3.6
5. Gatemi          5.5   6.0   5.7   6.0   0    10.3   5.7   5.8   5.8   5.6   5.8   5.8
6. Meloy           9.0   9.2   9.0   9.3  10.3   0     9.1   9.2   9.2   9.3   9.2   8.7
7. Arlington       1.9   3.5   3.0   3.6   5.7   9.1   0     3.5   3.5   4.2   3.4   3.4
8. Ithaca          2.9   3.5   3.1   1.4   5.8   9.2   3.5   0     3.6   4.1   3.4   3.4
9. St. Paul        2.9   1.8   3.2   3.8   5.8   9.2   3.5   3.6   0     3.7   3.5   3.5
10. Moccasin       3.3   4.5   3.2   4.1   5.6   9.3   4.2   4.1   3.7   0     3.9   4.1
11. Moro           2.8   3.5   1.5   3.6   5.8   9.2   3.4   3.4   3.5   3.9   0     3.3
12. Davis          2.1   3.4   2.8   3.6   5.8   8.7   3.4   3.4   3.5   4.1   3.3   0
This matter will be discussed further in Chap. 10, where we take up the problems of joint graphical display.
Two Sources of Full Information
In terms of information retrieval, there are two outputs of importance:
• The principal coordinates of dual space (major and minor), as we have discussed.
• The total distance matrix, calculated from the above-mentioned coordinates.
As for the first source, the principal coordinates, we typically use them for multidimensional graphical display, and we will use them when we discuss joint graphical display in Chap. 10. As for the second source, the distances between variables (rows, columns), we typically use it in multidimensional scaling and cluster analysis. Although this is not one of the main topics of the current book, we will discuss it further as an aid for information retrieval in Chap. 10. So, as our final step of Stage 2 analysis, let us look at the (rows, columns)-by-(rows, columns) distance matrix, calculated from the principal coordinates of dual space. Given the principal coordinates of rows and columns in dual space, we can calculate the entire distance matrix. In Chap. 6 of Nishisato et al. (2021), Clavel and Nishisato stated that we should use the squared distance to maintain the additive aspect of distance information (i.e., the squared total distance = (the squared major dual distance) + (the squared minor dual distance)). This additivity does not hold for the distance itself. Therefore, we will use here the squared distance matrix (Tables 9.12 and 9.14). How to use the distance information will be discussed in Chap. 10, (1) as a source of information to evaluate joint graphical display and (2) as an alternative to joint graphical display.
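The additivity of squared distances is easy to verify numerically. A minimal sketch of ours (NumPy; the coordinates below are random placeholders standing in for principal coordinates, with the leading dimensions playing the role of major dual space and the remaining ones minor dual space):

```python
import numpy as np

def sq_dist(coords):
    """Matrix of squared Euclidean distances between the rows of coords."""
    diff = coords[:, None, :] - coords[None, :, :]
    return (diff ** 2).sum(axis=-1)

rng = np.random.default_rng(0)
coords = rng.standard_normal((5, 4))         # 5 points in 4-dimensional dual space
major, minor = coords[:, :2], coords[:, 2:]  # major and minor dual subspaces

d_total = sq_dist(coords)
# Squared distances are additive over the two subspaces ...
assert np.allclose(d_total, sq_dist(major) + sq_dist(minor))
# ... but the distances themselves are not.
assert not np.allclose(np.sqrt(d_total),
                       np.sqrt(sq_dist(major)) + np.sqrt(sq_dist(minor)))
```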
Table 9.15 Rorschach data and induced moods, Garmize and Rychlak (1964)

Rorschach    Induced moods: Fear   Anger   Depression   Ambition   Security   Love
Bat                         33     10      18            1          2          6
Blood                       10      5       2            1          0          0
Butterfly                    0      2       1           26          5         18
Cave                         7      0      13            1          4          2
Clouds                       2      9      30            4          1          6
Fire                         5      9       1            2          1          1
Fur                          0      3       4            5          5         21
Mask                         3      2       6            2          2          3
Mountains                    2      1       4            1         18          2
Rocks                        0      4       2            1          2          2
Smoke                        1      6       1            0          1          0

Notes Rorschach symbols Bear, Boot(s), Bridge, Hair and Island were dropped from the original data set, due to small frequencies
9.2 Rorschach Data
The data were already presented and described in Chap. 8, where we discussed our space theory. Thus it may look redundant to discuss the same data again, but it is our view that another example here may be useful. The relevant tables and graphs are duplicated for those readers who skipped Chap. 8. The Rorschach inkblot data of Garmize and Rychlak (1964) are first given as a contingency table (Table 9.15).
9.2.1 Stage 1 Analysis
The bilinear expansions of the joint frequencies (Nishisato & Nishisato, 1994) are presented in Tables 9.16, 9.17, 9.18, 9.19, 9.20 and 9.21. The most important aspect of this expansion is the form of each term inside the brackets, namely ρ_k y_ik x_jk. This means that the traditional eigenvalue decomposition applied to the contingency table cannot yield components ρ_k y_ik and ρ_k x_jk in common space, and this is the main source of the perennial problem of joint graphical display: one cannot plot ρ_k y_ik and ρ_k x_jk in common space. The last statement is obvious, considering that each component is obtained by maximizing the correlation between y and x. In other words, if y and x are not perfectly correlated, they occupy different axes, where the angle between the two axes is related to the correlation. Note that if they were perfectly correlated, they could be plotted in the same space (and there would then be no need for quantification). In spite of our attempt to maximize the correlation, the maximized correlation is typically far from 1.
Table 9.16 Order 0 approximation: Row-column independence

             Fear   Anger  Depression  Ambition  Security  Love
Bat          12.9   10.4   16.8         9.0       8.4      12.5
Blood         3.3    2.7    4.3         2.3       2.2       3.2
Butterfly     9.6    7.8   12.5         6.7       6.2       9.3
Cave          5.0    4.0    6.5         3.5       3.2       4.8
Clouds        9.6    7.8   12.5         6.7       6.2       9.3
Fire          3.5    2.8    4.6         2.4       2.3       3.4
Fur           7.0    5.7    9.1         4.9       4.6       6.8
Mask          3.3    2.7    4.3         2.3       2.2       3.2
Mountains     5.2    4.2    6.7         3.6       3.4       5.0
Rocks         2.0    1.6    2.6         1.4       1.3       2.0
Smoke         1.7    1.3    2.2         1.2       1.1       1.6
Table 9.17 Order 1 approximation: Independence + component 1

             Fear   Anger  Depression  Ambition  Security  Love
Bat          24.4   15.2   23.5        −0.6       4.7       2.7
Blood         7.0    4.2    6.4        −0.7       1.0       0.1
Butterfly    −4.6    1.9    4.2        18.5      10.8      21.3
Cave          7.3    5.0    7.9         1.5       2.5       2.8
Clouds       12.4    8.9   14.1         4.3       5.3       6.9
Fire          5.4    3.6    5.6         0.9       1.7       1.8
Fur           0.0    2.8    5.0        10.7       6.8      12.7
Mask          3.5    2.8    4.4         2.1       2.1       3.0
Mountains     2.9    3.3    5.4         5.4       4.1       6.9
Rocks         1.7    1.5    2.5         1.7       1.4       2.2
Smoke         2.9    1.8    2.9         0.2       0.7       0.6

Table 9.18 Order 2 approximation: component 2 added

             Fear   Anger  Depression  Ambition  Security  Love
Bat          25.1   16.0   23.1         0.9       1.2       3.7
Blood         7.4    4.6    6.2         0.1      −0.9       0.7
Butterfly    −3.2    3.5    3.2        21.5       3.8      23.2
Cave          6.8    4.4    8.2         0.5       4.9       2.1
Clouds       12.7    9.2   14.0         4.8       4.1       7.2
Fire          5.7    4.0    5.4         1.6       0.0       2.3
Fur           0.2    3.0    4.9        11.1       5.9      12.9
Mask          3.5    2.7    4.5         2.0       2.3       3.0
Mountains     0.3    0.3    7.2        −0.2      17.2       3.3
Rocks         1.6    1.4    2.5         1.5       1.9       2.1
Smoke         2.9    1.9    2.8         0.2       0.6       0.6
Table 9.19 Order 3 approximation: component 3 added

             Fear   Anger  Depression  Ambition  Security  Love
Bat          27.0   17.3   18.9         1.4       2.1       3.3
Blood         9.2    6.0    1.9         0.6       0.0       0.2
Butterfly    −1.7    4.6   −0.3        21.9       4.6      22.8
Cave          4.8    2.9   12.9        −0.1       3.9       2.6
Clouds        5.9    4.3   29.2         3.1       0.8       8.7
Fire          7.8    5.5    0.7         2.2       1.0       1.8
Fur          −0.4    2.6    6.2        10.9       5.6      13.0
Mask          2.8    2.2    6.0         1.9       2.0       3.1
Mountains     1.8    1.4    3.8         0.2      17.9       2.9
Rocks         1.9    1.6    2.0         1.5       2.0       2.0
Smoke         3.8    2.5    0.8         0.5       1.0       0.4

Table 9.20 Order 4 approximation: component 4 added

             Fear   Anger  Depression  Ambition  Security  Love
Bat          32.9    9.9   18.5         2.4       2.1       4.2
Blood        10.0    5.0    1.9         0.7       0.0       0.4
Butterfly     0.3    2.2   −0.4        22.2       4.6      23.1
Cave          7.1    0.0   12.7         0.3       3.9       2.9
Clouds        2.1    9.1   29.4         2.4       0.8       8.1
Fire          5.0    9.0    0.9         1.7       1.0       1.4
Fur          −0.5    2.7    6.2        10.9       5.6      13.0
Mask          3.0    2.0    6.0         1.9       2.0       3.1
Mountains     2.1    1.0    3.7         0.3      17.9       3.0
Rocks         0.0    4.0    2.1         1.2       2.0       1.7
Smoke         1.0    6.0    1.0         0.0       1.0       0.0

Table 9.21 Order 5 approximation: component 5 added

             Fear   Anger  Depression  Ambition  Security  Love
Bat          33     10     18           1         2         6
Blood        10      5      2           1         0         0
Butterfly     0      2      1          26         5        18
Cave          7      0     13           1         4         2
Clouds        2      9     30           4         1         6
Fire          5      9      1           2         1         1
Fur           0      3      4           5         5        21
Mask          3      2      6           2         2         3
Mountains     2      1      4           1        18         2
Rocks         0      4      2           1         2         2
Smoke         1      6      1           0         1         0
Table 9.22 Summary statistics of Rorschach data from contingency table

Component                1        2        3        4        5
Eigenvalue ρ²            0.4633   0.2505   0.1704   0.1281   0.0721
Singular value ρ         0.6807   0.5005   0.4128   0.3579   0.2686
Delta δ (%)              42.72    23.10    15.71    11.81     6.65
Cumulative δ (%)         42.72    65.82    81.53    93.35   100.00
Row-column angle θ       47°      60°      66°      69°      79°
The correlation being less than 1 creates the need for a two-dimensional space for each component. As noted before, when the singular value is not 1, the angle between the two axes of y and x is given by θ_yx = cos⁻¹ ρ_k. Unless the correlation is 1, the y and x of a component do not span the same space. Keep in mind that the source of our problem lies in the above decomposition formula, expressed in terms of ρ_k y_ik x_jk. Let us again summarize the major roles of the singular values:
• ρ is the maximal correlation between the rows and the columns of the contingency table.
• ρ is the least-squares regression coefficient for x in predicting y, or vice versa.
• ρ is a function of the angle between the row axis and the column axis; specifically, θ = cos⁻¹ ρ.
• When ρ is maximized by quantification, Cronbach's generalized reliability coefficient α is also maximized (Lord, 1958; Nishisato, 1980).
• ρ is the projection operator of the row variable onto the space of the column variable, or of the column variable onto the row space.
Stage 1 analysis yields five components, and their key statistics are summarized in Table 9.22. Notice the angles of discrepancy between the row space and the column space of the individual components: they clearly suggest the need for Stage 2 analysis. Even when the row-column correlation is relatively high, for example, 0.6807 for Component 1, the angle between the y axis and the x axis is 47°! The projected weights of the rows and columns of the data are listed in Table 9.23. Let us keep in mind that those projected weights are typically used for graphical display. Note, however, that the magnitudes of the angles between the row axes and the column axes are surprisingly large, definitely not 0! How can we justify correspondence plot?
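The singular values and angles in Table 9.22 can be recomputed from the data. A short sketch of ours (NumPy; small discrepancies with the printed table, such as the rounded angle for component 5, would point to rounding or typesetting slips rather than to the method):

```python
import numpy as np

# Table 9.15 (rows: bat, ..., smoke; columns: fear, ..., love).
F = np.array([[33, 10, 18, 1, 2, 6], [10, 5, 2, 1, 0, 0],
              [0, 2, 1, 26, 5, 18], [7, 0, 13, 1, 4, 2],
              [2, 9, 30, 4, 1, 6], [5, 9, 1, 2, 1, 1],
              [0, 3, 4, 5, 5, 21], [3, 2, 6, 2, 2, 3],
              [2, 1, 4, 1, 18, 2], [0, 4, 2, 1, 2, 2],
              [1, 6, 1, 0, 1, 0]], dtype=float)

P = F / F.sum()
r, c = P.sum(axis=1), P.sum(axis=0)                  # row and column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
rho = np.linalg.svd(S, compute_uv=False)[:5]         # five nontrivial singular values

for k, s in enumerate(rho, start=1):
    theta = np.degrees(np.arccos(s))                 # angle between row and column axes
    print(f"component {k}: rho = {s:.4f}, rho^2 = {s*s:.4f}, theta = {theta:.0f} deg")
```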
Table 9.23 Projected weights of five contingency components

Component     1      2      3      4      5
Bat          −0.70  −0.16   0.15  −0.34  −0.08
Blood        −0.87  −0.35   0.60  −0.18   0.06
Butterfly     1.17  −0.44   0.17  −0.15   0.32
Cave         −0.37   0.30  −0.44  −0.34   0.11
Clouds       −0.23  −0.08  −0.75   0.30   0.13
Fire         −0.42  −0.30   0.63   0.59   0.07
Fur           0.78  −0.08  −0.08   0.01  −0.67
Mask         −0.05   0.04  −0.21  −0.04   0.02
Mountains     0.34   1.54   0.31  −0.04   0.11
Rocks         0.11   0.15   0.13   0.70  −0.08
Smoke        −0.57  −0.05   0.57   1.25   0.00
Fear         −0.87  −0.17   0.39  −0.48  −0.03
Anger        −0.44  −0.23   0.35   0.75  −0.02
Depression   −0.39   0.09  −0.68   0.02   0.10
Ambition      1.03  −0.51   0.15  −0.12   0.48
Security      0.42   1.27   0.29   0.00   0.05
Love          0.76  −0.23  −0.09  −0.08  −0.47
9.2.2 Stage 2 Analysis
For Stage 2 analysis, we convert the contingency table into the condensed response-pattern format as in Table 9.24. The summary statistics are given in Table 9.25. Unlike the barley data, our current example contains residual space, because the number of rows and the number of columns of the contingency table are different. As mentioned earlier, the information in major dual space (contingency space) is all that Stage 1 analysis provides, and the information in residual space and minor dual space is the additional information we can draw from Stage 2 analysis. In Stage 2 analysis, we obtain the coordinates of rows and columns in major dual space, residual space and minor dual space (Tables 9.25 and 9.26). As mentioned in Chap. 8, the principal coordinates of residual space are not analyzed, since they do not convey any information about the row-column association; they are listed just for information (Table 9.27). As is the case with the barley data, minor dual space is particularly important because it is what Stage 1 analysis ignores completely, yet it is vitally important for the determination of the exact coordinates of rows and columns in doubled multidimensional space (Table 9.28). Dual subspace consists of two components, one from major dual space (eigenvalue greater than 0.5) and one from minor dual space (eigenvalue smaller than 0.5), such that the sum of the two eigenvalues is equal to 1.
Table 9.24 Response-pattern table (condensed)

The condensed response-pattern table has 17 columns (the 11 inkblots Ba, Bl, Bu, Ca, Cl, Fi, Fu, Ma, Mt, Ro, Sm, followed by the 6 moods Fe, An, Dp, Am, Se, Lo) and 58 rows, one row per nonzero cell of Table 9.15. Each row carries a single frequency, entered once under its inkblot and once under its mood, with all other entries 0. Written as inkblot-mood pairs with their frequencies, the 58 rows are:

Bat: Fe 33, An 10, Dp 18, Am 1, Se 2, Lo 6
Blood: Fe 10, An 5, Dp 2, Am 1
Butterfly: An 2, Dp 1, Am 26, Se 5, Lo 18
Cave: Fe 7, Dp 13, Am 1, Se 4, Lo 2
Clouds: Fe 2, An 9, Dp 30, Am 4, Se 1, Lo 6
Fire: Fe 5, An 9, Dp 1, Am 2, Se 1, Lo 1
Fur: An 3, Dp 4, Am 5, Se 5, Lo 21
Mask: Fe 3, An 2, Dp 6, Am 2, Se 2, Lo 3
Mountains: Fe 2, An 1, Dp 4, Am 1, Se 18, Lo 2
Rocks: An 4, Dp 2, Am 1, Se 2, Lo 2
Smoke: Fe 1, An 6, Dp 1, Se 1

Note Ba = bat, Bl = blood, Bu = butterfly, Ca = cave, Cl = clouds, Fi = fire, Fu = fur, Ma = mask, Mt = mountains, Ro = rocks, Sm = smoke, Fe = fear, An = anger, Dp = depression, Am = ambition, Se = security, Lo = love
As we have already seen in Chap. 8, the Rorschach example yields the following five sets of components in dual subspace:

(Dual subspace 1): ρ₁² + ρ₁₅² = 0.86 + 0.14 = 1
(Dual subspace 2): ρ₂² + ρ₁₄² = 0.75 + 0.25 = 1
(Dual subspace 3): ρ₃² + ρ₁₃² = 0.71 + 0.29 = 1
(Dual subspace 4): ρ₄² + ρ₁₂² = 0.68 + 0.32 = 1
(Dual subspace 5): ρ₅² + ρ₁₁² = 0.63 + 0.37 = 1
Table 9.25 Basic statistics

Component   Space                      ρ²     ρ      δ       Cum δ
1           Contingency = major dual   0.86   0.93   11.42    11.42
2                                      0.75   0.87   10.00    21.42
3                                      0.71   0.84    9.43    30.85
4                                      0.68   0.82    9.06    39.91
5                                      0.63   0.80    8.45    48.36
6           Residual                   0.50   0.71    6.67    55.02
7                                      0.50   0.71    6.67    61.69
8                                      0.50   0.71    6.67    68.36
9                                      0.50   0.71    6.67    75.02
10                                     0.50   0.71    6.67    81.69
11          Minor dual                 0.37   0.61    4.89    86.58
12                                     0.32   0.57    4.23    90.85
13                                     0.29   0.54    3.90    94.75
14                                     0.25   0.50    3.34    98.09
15                                     0.14   0.38    1.91   100.00
Note Space designations will be fully discussed in Chap. 11

Table 9.26 Principal coordinates of response-pattern table: Major dual space (contingency space)

Co            1      2      3      4      5
Bat          −1.09   0.26  −0.34  −0.74  −0.24
Blood        −1.12   0.63  −1.17  −0.32  −0.17
Butterfly     1.51   0.74  −0.45  −0.51   0.92
Cave         −0.47  −0.50   0.89  −0.83   0.13
Clouds       −0.26   0.17   1.53   0.57   0.49
Fire         −0.53   0.54  −1.20   1.39   0.23
Fur           1.12   0.12   0.20   0.19  −1.97
Mask         −0.04  −0.06   0.43  −0.12   0.00
Mountains     0.40  −2.66  −0.64  −0.07   0.34
Rocks         0.19  −0.23  −0.21   1.62   0.00
Smoke        −0.71   0.13  −1.02   2.90   0.33
Fear         −1.19   0.29  −0.79  −1.02  −0.34
Anger        −0.56   0.43  −0.63   1.73   0.11
Depression   −0.50  −0.12   1.37  −0.03   0.30
Ambition      1.29   0.87  −0.42  −0.50   1.40
Security      0.51  −2.18  −0.59   0.01   0.18
Love          1.19   0.39   0.22  −0.01  −1.40

Note Co = component
Table 9.27 Principal coordinates of response-pattern table: Residual space

Co            6      7      8      9      10
Bat          −0.77   0      0      0      1.03
Blood         0.25   0.74  −0.28   1.39  −3.31
Butterfly    −0.07   0.14   0.20   0.11   0.17
Cave          2.60  −0.45   1.31   0.29   0.14
Clouds       −0.95  −0.12   0.10  −0.02  −0.55
Fire          0.61  −2.51   0.11  −2.06  −0.16
Fur          −0.12   0.00   0.33  −0.12  −0.16
Mask          1.38   1.61  −3.15  −1.73   0.10
Mountains    −0.71  −0.01   0.03  −0.18  −0.36
Rocks         0.83  −1.53  −2.30   3.85   1.57
Smoke         0.76   3.77   2.26   0.03   1.40
Fear          0      0      0      0      0
Anger         0      0      0      0      0
Depression    0      0      0      0      0
Ambition      0      0      0      0      0
Security      0      0      0      0      0
Love          0      0      0      0      0

Note Co = component
Thus, we have the following five cases of dual subspace, (Components 1 and 15), (2, 14), (3, 13), (4, 12) and (5, 11), which provide the true two-dimensional structures of the five contingency components we obtained in Stage 1 analysis. These ten-dimensional coordinates for both rows and columns of the contingency table are what we have been looking for, because they provide the exact coordinates for the description of the ten-dimensional configuration of the Rorschach data. Although it may be redundant, let us reproduce comparison graphs of the contingency components and the corresponding response-pattern components. We have already seen in Chap. 8 the great discrepancies between row variables and column variables with large absolute-value coordinates, and we see the same findings in the current case. The illustrations with these two sample data sets pose a serious problem for the joint graphical display of correspondence plots (Figs. 9.6, 9.7, 9.8, 9.9 and 9.10). Let us remember a very interesting aspect of the above comparisons: in each dual subspace, the row variables are located on a single line and the column variables on another line, crossed at the origin at the angle θ given by cos⁻¹ ρ, where ρ is the singular value from the contingency-table analysis. With the results from the two numerical examples, we can summarize our observations as follows:
Table 9.28 Principal coordinates of response-pattern table: Minor dual space

Co            11     12     13     14     15
Bat          −0.18  −0.51  −0.22   0.15   0.44
Blood        −0.13  −0.22  −0.75   0.36   0.46
Butterfly     0.70  −0.35  −0.29   0.43  −0.62
Cave          0.10  −0.57   0.57  −0.29   0.19
Clouds        0.37   0.39   0.98   0.10   0.11
Fire         −0.18   0.96  −0.77   0.31   0.22
Fur          −1.50   0.13   0.13   0.07  −0.46
Mask          0.00  −0.08   0.28  −0.03   0.02
Mountains     0.26  −0.05  −0.41  −1.53  −0.16
Rocks         0.00   1.11  −0.13  −0.13  −0.08
Smoke         0.25   1.99  −0.66   0.07   0.29
Fear          0.26   0.70   0.51  −0.17  −0.49
Anger        −0.08  −1.19   0.41  −0.25  −0.23
Depression   −0.23   0.02  −0.88   0.07  −0.20
Ambition     −1.07   0.34   0.27  −0.50   0.53
Security     −0.14  −0.01   0.38   1.26   0.21
Love          1.07   0.00  −0.14  −0.22   0.49

Note Co = component
Fig. 9.6 Component 1: Contingency versus dual subspace
Fig. 9.7 Component 2: Contingency versus dual subspace
Fig. 9.8 Component 3: Contingency versus dual subspace
Fig. 9.9 Component 4: Contingency versus dual subspace
Fig. 9.10 Component 5: Contingency versus dual subspace
• Each contingency-table component has a two-dimensional structure.
• The two oblique axes lie in two-dimensional dual subspace.
• One axis is for the rows and the other for the columns of the contingency table.
These findings are the major contribution of Nishisato's two-stage quantification and theory of doubled multidimensional space.
9.3 Squared Distance Matrix in Dual Space
The final output of Stage 2 analysis is the squared distance matrix of rows and columns in dual space (major and minor dual space). This matrix is as informative as the complete set of principal coordinates of rows and columns in dual space. The squared distance table is shown in Table 9.29. The content of this table will be examined later, when we discuss joint graphical display in Chap. 10.
Table 9.29 The squared distance matrix in dual space: Rorschach data

     1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17
1    0    1.2  3.2  1.9  2.9  2.9  3.5  1.7  3.9  3.2  4.6  1.9  2.8  2.3  3.3  3.3  3.1
2    1.2  0    3.3  3.0  3.6  2.2  3.8  2.4  4.2  3.1  4.0  2.1  2.8  2.9  3.5  3.6  3.3
3    3.2  3.3  0    3.2  3.7  3.5  3.9  2.5  4.2  3.4  4.9  3.4  3.5  3.2  2.5  3.6  2.9
4    1.9  3.0  3.2  0    2.1  3.9  3.6  1.3  3.4  3.3  5.1  2.5  3.2  2.0  3.3  3.1  3.0
5    2.9  3.6  3.7  2.1  0    3.6  3.3  1.7  4.4  2.6  4.3  3.1  3.1  2.3  3.8  3.7  2.7
6    2.9  2.2  3.5  3.9  3.6  0    3.9  2.8  4.3  1.7  2.0  3.1  2.6  3.2  3.5  3.7  3.6
7    3.5  3.8  3.9  3.6  3.3  3.9  0    2.8  4.5  3.3  5.0  3.8  3.8  3.5  3.8  3.8  2.8
8    1.7  2.4  2.5  1.3  1.7  2.8  2.8  0    3.4  2.3  4.1  2.2  2.5  1.6  2.7  2.7  2.3
9    3.9  4.2  4.2  3.4  4.4  4.3  4.5  3.4  0    3.5  5.0  4.0  4.2  3.8  4.3  3.0  4.1
10   3.2  3.1  3.4  3.3  2.6  1.7  3.3  2.3  3.5  0    2.1  3.2  2.6  2.8  3.4  3.2  3.0
11   4.6  4.0  4.9  5.1  4.3  2.0  5.0  4.1  5.0  2.1  0    4.5  3.7  4.3  4.8  4.7  4.7
12   1.9  2.1  3.4  2.5  3.1  3.1  3.8  2.2  4.0  3.2  4.5  0    3.5  3.1  3.6  3.7  3.4
13   2.8  2.8  3.5  3.2  3.1  2.6  3.8  2.5  4.2  2.6  3.7  3.5  0    3.3  3.8  3.8  3.6
14   2.3  2.9  3.2  2.0  2.3  3.2  3.5  1.6  3.8  2.8  4.3  3.1  3.3  0    3.4  3.5  3.2
15   3.3  3.5  2.5  3.3  3.8  3.5  3.8  2.7  4.3  3.4  4.8  3.6  3.8  3.4  0    4.0  3.7
16   3.3  3.6  3.6  3.1  3.7  3.7  3.8  2.7  3.0  3.2  4.7  3.7  3.8  3.5  4.0  0    3.8
17   3.1  3.3  2.9  3.0  2.7  3.6  2.9  2.3  4.1  3.0  4.7  3.4  3.6  3.2  3.7  3.8  0

Note 1 = bat; 2 = blood; 3 = butterfly; 4 = cave; 5 = cloud; 6 = fire; 7 = fur; 8 = mask; 9 = mountains; 10 = rock; 11 = smoke; 12 = fear; 13 = anger; 14 = depression; 15 = ambition; 16 = security; 17 = love
9.4 Summary of Two-Stage Quantification
The results of this section suggest a number of important points, summarized as follows:
1. So long as we are interested in the multidimensional decomposition of the association between the rows and the columns of the contingency table, we should ignore the information in residual space, which corresponds to components 6, 7, 8, 9 and 10 of the Rorschach data. Note that all components in residual space have eigenvalues of 0.5. This tells us how important it is to equate the number of rows to the number of columns as much as possible in data collection (the dimensionality of residual space is 0 when the number of rows is equal to the number of columns, as was the case with the barley data).
2. The remaining components in each case are arranged in descending order of the eigenvalues. We should note that the first five components are what correspondence plot deals with, while the remaining five components are ignored.
3. If one looks at the graphs obtained for each dual subspace, one realizes how much information we tend to ignore.
4. The above observation clearly divides researchers into two categories: those who consider the practical aspect more important than accuracy, and those who must always see the correct representation of the data structure.
5. Facing difficulties with joint graphical display, we will look at the information in the matrix of squared distances between rows and columns in dual space in Chap. 10.
9.5 Concluding Remarks
In this chapter, we have examined many aspects of quantification theory, in particular:
(1) Joint graphical display of the rows and columns of the contingency table in doubled multidimensional space;
(2) The merit of transforming the contingency table into the response-pattern format;
(3) The subsequent discussion of different types of quantification space, where we finally have an answer, at least in theory, to the perennial problem of joint graphical display;
(4) To combat the practical problem with joint graphical display, future research may have to look into the distance relations embedded in the matrix of squared distances of rows and columns in dual space.
In terms of practice, a number of problems remain before the current two-stage analysis can work as a routine method. We will discuss this further in Chap. 10, on joint graphical display and its alternatives.
References
Garmize, L. M., & Rychlak, J. F. (1964). Role-play validation of a socio-cultural theory of symbolism. Journal of Consulting Psychology, 28, 107–115.
Lord, F. M. (1958). Some relations between Guttman's principal components of scale analysis and other psychometric theory. Psychometrika, 23, 291–296.
Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. The University of Toronto Press.
Nishisato, S. (2016). Quantification theory: Dual space and total space. Paper presented at the annual meeting of the Behaviormetric Society, Sapporo, Japan, p. 27. (In Japanese).
Nishisato, S. (2019a). Kaiko: Suryouka riron to graph (Reminiscence: Quantification theory and graphs). Theory and Applications of Data Analysis, 8, 47–57. (In Japanese).
Nishisato, S. (2019b). Bunkatsu-hyou no tenkai: nibai tajigen kuukan no riron to graph (Expansion of contingency space: Theory of doubled multidimensional space and graphs). An invited talk at the annual meeting of the Japanese Classification Society, Tokyo. (In Japanese).
Nishisato, S. (2023, in press). Propositions for quantification theory. In A. Okada, K. Shigemasu, R. Yoshino, & S. Yokoyama (Eds.), Facets of behaviormetrics: The 50th anniversary of the Behaviormetric Society. Springer Nature.
Nishisato, S., Beh, E. J., Lombardo, R., & Clavel, J. G. (2021). Modern quantification theory: Joint graphical display, biplots and alternatives. Springer Nature.
Nishisato, S., & Nishisato, I. (1994). Dual scaling in a nutshell. MicroStats.
Stebbins, G. L. (1950). Variation and evolution in plants. Columbia University Press.
Chapter 10
Joint Graphical Display
10.1 Toward a New Horizon
Graphical display has been a popular method for interpreting the results of multivariate analysis such as principal component analysis (PCA), factor analysis (FA) and multidimensional scaling (MDS). Because of some fundamental differences between PCA and FA, researchers developed algorithms for axis rotation in factor analysis (e.g., Harman, 1960; Kaiser, 1958; Kaiser & Caffrey, 1965; Schönemann, 1966). Other than this extra step for FA, the graphical display problems of PCA, FA and MDS were basically solved a long time ago, except for hyperdimensional graphs, say graphs of more than three dimensions. The problem of joint graphical display for quantification results is quite different from those of PCA, FA and MDS, which are based on solving eigenequations. In contrast, quantification theory deals with the singular value decomposition of a contingency table, where one of the objectives is to maximize the row-column correlation. In the current book, we separated our quantification problem into two stages, Stage 1 analysis of the contingency-table format and Stage 2 analysis of the response-pattern format. When we talk about joint graphical display of quantification results, it is typically the case that we consider plotting the row weights and column weights obtained from Stage 1 analysis in two-dimensional graphs. The resultant graphs are called correspondence plots or French plots. But, as we have already revealed, each component from Stage 1 analysis has a two-dimensional structure, with one axis for the rows and the other for the columns of the contingency table, and we have further shown that the angle between the row axis and the column axis of component k is given by θ_k = cos⁻¹ ρ_k. This is where the famous CGS controversy between the proponents and the opponent resulted in heated arguments. In Chaps. 8 and 9, we placed our task on Stage 2 analysis as the rational approach to the problem of joint graphical display. For the first time, we now have the principal coordinates of the rows and columns of contingency tables in the expanded dual space (i.e., major dual space and minor dual space).
Yes, this is the matrix of coordinates that we can use to draw mathematically correct joint graphs of rows and columns in common space. Stage 2 analysis also yielded a second statistic containing the exact relations between rows and columns in terms of interpoint distances, namely the squared distance matrix of those variables (rows and columns) over the entire dual space.
10.2 Correspondence Plots and Exact Plots
In this section, we will discuss the popular two-dimensional correspondence plot (practical but not exact) and, for each of the two components in a joint graph, the two logically correct two-dimensional plots, which together occupy four-dimensional space. This four-dimensional graph, if we could combine the two two-dimensional graphs, would be mathematically the exact joint graph. Graphical display based on the output of the principal coordinates in dual space is a new topic, because traditional joint graphical display has dealt only with the coordinates associated with major dual space (i.e., the space involved in the contingency table), that is, half the space needed for exact graphical display. Traditionally, correspondence plot (or French plot) has been the standard method of joint graphical display, or indeed the only one, in which, for each component, the axis for rows and the axis for columns are forced to be identical, a condition which holds only when the rows and the columns are perfectly correlated. Correspondence plot, based on coordinates obtained from Stage 1 analysis, is therefore an oversimplified procedure for the graphical representation of the results. In Chaps. 8 and 9, we have already seen that each component of the contingency table requires a two-dimensional graph, where all rows lie on one axis and all columns lie on another axis, with the two axes crossing at the origin at the angle θ = cos⁻¹ ρ (Nishisato; Nishisato & Clavel). It is in this major dual space (contingency space) that the CGS controversy arguments (Carroll et al., 1986, 1987, 1989; Greenacre, 1989a, 1989b) were exchanged between the proponents and the opponent; correspondence plot and CGS scaling plot are essentially the same, as mentioned earlier, thus there was no issue to argue, hence no winner or loser (Nishisato, 2016, 2019a, 2019b, 2022). The true difference can be found in the joint space of major and minor dual space, and we now know that each contingency component, the target of the CGS controversies, has a two-dimensional structure whose coordinates are given by those in dual subspace. We know very well, as Lebart et al. (1977) warned, that correspondence plots, based on contingency space, are not accurate. The current book also discussed the matter fully in Chap. 8. Nevertheless, correspondence plot is almost the only method of joint graphical display currently in use. Therefore, we urgently need to discuss correspondence plot once again, so that we may show what is at stake with the practice: a serious misrepresentation of a correct joint graph.
In relation to those warnings by French scholars, let us also add the following facts, which govern multidimensional space:
• The distance between two points cannot become smaller as the dimensionality of the space increases.
• The correlation between two variables cannot increase as the dimensionality of the space increases.
• For a given data set, a graph with fewer dimensions is easier to interpret than a graph with a larger number of dimensions.
We should keep these fundamental facts in mind when we discuss multidimensional graphs (the first of them is illustrated in the sketch below). Now, let us talk about joint graphical display. In correspondence plot, it is typical to plot component 1 against component 2, component 3 against component 4, and so on, with any combination of two components from Stage 1 analysis. Unfortunately, we do not have easily usable graphical methods for more than three-dimensional configurations. Therefore, we will limit our discussion to two-dimensional joint graphs. We have so far used three numerical examples: Kretschmer's typology data in Chap. 1, and Rorschach data and barley data in a few chapters. Since we are now familiar with these data sets, let us continue to use them to demonstrate two-dimensional correspondence plots in comparison with the correct four-dimensional plots. Our comparisons of correspondence plots with the exact plots of doubled dimensions will hopefully convince the readers that it is time to reconsider the utility of the currently most popular correspondence plot. Through our numerical demonstrations, we hope to show amply that we are discussing a problem of convenience versus accuracy, and that we must choose accuracy over practical convenience. We have briefly looked at Kretschmer's typology data (Kretschmer, 1925) and extensively investigated two other examples, namely Rorschach data (Garmize & Rychlak, 1964) and barley data (Stebbins, 1950). Kretschmer's data require four dimensions, and the other two data sets require ten dimensions, for their complete descriptions. In this chapter, we will again use these three examples. Since Kretschmer's data set requires a much smaller space (two dimensions for contingency space and four dimensions for dual space), it may be instructive to look at this case. The other two cases are more representative of the data we typically encounter in practice, requiring more than four dimensions. Since we have already thoroughly investigated the graphical problems with the Rorschach data and barley data, let us start with these examples.
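The first of these facts can be illustrated with a few lines of code (NumPy; the two points are made up for the illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = rng.standard_normal(10), rng.standard_normal(10)  # two points in 10-D

# Distance computed in the leading k dimensions only, k = 1, ..., 10.
for k in range(1, 11):
    d = np.linalg.norm(a[:k] - b[:k])
    print(f"{k:2d} dimensions: distance = {d:.3f}")  # non-decreasing in k
```

Dropping dimensions can only shrink, never grow, the distance between two points; this is the geometry behind every comparison that follows.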
Table 10.1 Projected weights of five contingency components

Component     1      2      3      4      5
Bat          −0.70  −0.16   0.15  −0.34  −0.08
Blood        −0.87  −0.35   0.60  −0.18   0.06
Butterfly     1.17  −0.44   0.17  −0.15   0.32
Cave         −0.37   0.30  −0.44  −0.34   0.11
Clouds       −0.23  −0.08  −0.75   0.30   0.13
Fire         −0.42  −0.30   0.63   0.59   0.07
Fur           0.78  −0.08  −0.08   0.01  −0.67
Mask         −0.05   0.04  −0.21  −0.04   0.02
Mountains     0.34   1.54   0.31  −0.04   0.11
Rocks         0.11   0.15   0.13   0.70  −0.08
Smoke        −0.57  −0.05   0.57   1.25   0.00
Fear         −0.87  −0.17   0.39  −0.48  −0.03
Anger        −0.44  −0.23   0.35   0.75  −0.02
Depression   −0.39   0.09  −0.68   0.02   0.10
Ambition      1.03  −0.51   0.15  −0.12   0.48
Security      0.42   1.27   0.29   0.00   0.05
Love          0.76  −0.23  −0.09  −0.08  −0.47
10.2.1 Rorschach Data
As we have already seen, we obtain two sets of principal coordinates for the Rorschach data (Garmize & Rychlak, 1964): one from the contingency-table Stage 1 analysis (Table 10.1) and the other, in terms of five sets of dual subspace, from Stage 2 analysis (Table 10.2).
Two-Dimensional Joint Graphical Display
In joint graphical display, we choose two components for graphical display. Let us show a correspondence plot with contingency-table component 1 on the horizontal axis and component 2 on the vertical axis; this is the typical way a correspondence plot is created. Remember that correspondence plot typically deals with Stage 1 results. Figure 10.1 is the two-dimensional correspondence plot of components 1 and 2. As researchers typically do, let us examine the graph. In this two-dimensional plot, we can roughly identify three clusters:
• The first cluster consists of Rorschach inkblots butterfly and fur, associated with moods ambition and love.
• The second comprises inkblots blood and bat, associated with mood fear.
• The third cluster consists of inkblot mountain, associated with mood security.
Table 10.2 Rorschach data: Dual subspace in order of dominance

        Dual Sub 1      Dual Sub 2      Dual Sub 3      Dual Sub 4      Dual Sub 5
Comp    1      15       2      14       3      13       4      12       5      11
θ       47°             60°             66°             69°             79°
ρ²      0.86   0.14     0.75   0.25     0.71   0.29     0.68   0.32     0.63   0.37
Bat    −1.09   0.44     0.26   0.15    −0.34  −0.22    −0.74  −0.51    −0.24  −0.18
Blod   −1.12   0.46     0.63   0.36    −1.17  −0.75    −0.32  −0.22    −0.17  −0.13
Btfy    1.51  −0.62     0.74   0.43    −0.45  −0.29    −0.51  −0.35     0.92   0.70
Cave   −0.47   0.19    −0.50  −0.29     0.89   0.57    −0.83  −0.57     0.13   0.10
Clod   −0.26   0.11     0.17   0.10     1.53   0.98     0.57   0.39    −0.49   0.37
Fire   −0.53   0.22     0.54   0.31    −1.20  −0.77     1.39   0.96     0.23  −0.18
Fur     1.12  −0.46     0.12   0.07     0.20   0.13     0.19   0.13    −1.97  −1.50
Mask   −0.04   0.02    −0.06  −0.03     0.43   0.28    −0.12  −0.08     0.00   0.00
Mtn     0.40  −0.16    −2.66  −1.53    −0.64  −0.41    −0.07  −0.05     0.34   0.26
Rock    0.19  −0.08    −0.23  −0.13    −0.21  −0.13     1.62   1.11     0.00   0.00
Smok   −0.71   0.29     0.13   0.07    −1.02  −0.66     2.90   1.99     0.33   0.25
Fear   −1.19  −0.49     0.29  −0.17    −0.79   0.51    −1.02   0.70    −0.34   0.26
Angr   −0.56  −0.23     0.43  −0.25    −0.63   0.41     1.73  −1.19     0.11  −0.08
Dep    −0.50  −0.20    −0.12   0.07     1.37  −0.88    −0.03   0.02     0.30  −0.23
Ambi    1.29   0.53     0.87  −0.50    −0.42   0.27    −0.50   0.34     1.40  −1.07
Secu    0.51   0.21    −2.18   1.26    −0.59   0.38     0.01  −0.01     0.18  −0.14
Love    1.19   0.49     0.39  −0.22     0.22  −0.14    −0.01   0.01    −1.40   1.07

Note 1 Comp = component, Blod = blood, Btfy = butterfly, Clod = clouds, Mtn = mountains, Smok = smoke, Angr = anger, Dep = depression, Ambi = ambition, Secu = security
Note 2 The angles θ in this table are from Stage 1 analysis
At a glance, these clusters appear to make common sense, namely the associations (butterfly, fur: ambition, love), (blood, bat: fear) and (mountain: security). The problem, however, is that this conclusion is drawn from a joint graphical display based on the assumption that, for each of the two components, the correlation between rows and columns is 1, that is, perfect; if the correlation were perfect, we would probably not need to analyze the data. As we have already seen, however, once we accept that the correlation between rows and columns is less than 1, we need one axis for the rows and another axis for the columns of each contingency component. The assumption of perfect row-column correlation is much more costly than one might think, as we will see in our numerical examples. The correct four-dimensional plots, supplied by the two relevant cases of dual subspace, lead us not only to different graphs but also to different conclusions. Let us look at the correspondence plot and the two two-dimensional graphs of the mathematically correct configurations (Figs. 10.1, 10.2 and 10.3). The two-dimensional joint graphical display of Fig. 10.1 corresponds to two two-dimensional dual subspace plots, that is, a four-dimensional plot. Look at Fig. 10.2, where we clearly see that inkblots butterfly and fur are quite far from moods ambition and love, and that inkblots bat and blood are far from mood fear.
Fig. 10.1 Rorschach data: Correspondence plot of component 1 against 2
Fig. 10.2 Dual subspace 1 for contingency component 1
Fig. 10.3 Dual subspace 2 for contingency component 2
Fig. 10.4 Rorschach data: Correspondence plot of component 3 against 4
Fig. 10.5 Dual subspace 3 for contingency component 3
If we look at Fig. 10.3, which is the mathematically correct two-dimensional plot of contingency component 2, we see clearly that inkblot mountain is very far from mood security. Therefore, with the right mind, we would probably not accept the results of the correspondence plot and would report instead the mathematically correct four-dimensional results. Interpreting the two correct two-dimensional plots, however, is not as easy as interpreting a correspondence plot. Let us look at one more case from our Rorschach example, contingency components 3 and 4, and compare the correspondence plot with the exact plots in doubled space. From the correspondence plot of components 3 and 4 (Fig. 10.4), we see that inkblot cloud is very close to mood depression, but these two are quite far apart in dual subspace 3 (Fig. 10.5). Inkblots fire, smoke and blood are close to mood fear in Fig. 10.4, but quite far away from it in Fig. 10.5. In Fig. 10.4, inkblot smoke is very close to fear and anger, but in Fig. 10.6 they are widely apart from each other.
Fig. 10.6 Dual subspace 4 for contingency component 4
How can we reconcile these differences? Recall a common-sense fact about the geometry of configurations: the distance between two points cannot decrease as the dimensionality of the space increases. Therefore, when variables span ten-dimensional space, the distance between two points attains its maximum in ten-dimensional space, and the distance between the same two points typically decreases steadily as we shrink the space to nine, eight, ..., and finally two dimensions. This is a mathematical fact: when two objects lie in four-dimensional space, their distance becomes smaller as we reduce the dimensionality of the space to two. This is exactly what is happening in our case. The fact is that two contingency components lie in four-dimensional space, which is, in our example, the exact geometric representation of our data. Naturally, two objects in four-dimensional space appear closer in two-dimensional space, but the mathematically correct space is four-dimensional. We intentionally reduced the dimensionality of the space by assuming that the rows and the columns are perfectly correlated. This assumption, however, is false. This discussion, therefore, leads to the conclusion that we must adopt the configurations of rows and columns in four-dimensional space for the current examples. The above point of using four-dimensional space is perhaps extraordinary for most researchers who are used to using principal coordinates, for whom the component with the maximal eigenvalue is the most representative of the data structure. That view may suggest that we can justify the two-dimensional plot without any reference to the other, minor components. However, quantification of contingency tables is a special case, different from principal component analysis (PCA); please note the fundamental difference between the graphs of PCA and our correspondence plots. This tells us that we should continue examining component-by-component joint plots. Incidentally, some researchers have considered non-symmetric joint graphs, using principal coordinates of rows and standard coordinates of columns, or vice versa. This idea sounds reasonable as a means of reducing the dimensionality of the space to half the correct space, through the concept of projection.
Table 10.3 Projected weights of five contingency components

Component       1      2      3      4      5
Coast Trebi     0.11   0.68  −0.61  −0.02   0.04
Hanchen         0.49   0.74   1.49  −0.06   0.00
White Smyrna   −1.10  −0.65   0.14  −0.06   0.01
Manchuria       1.60  −1.36  −0.17  −0.04   0.00
Gatemi         −0.16  −0.07   0.17   1.26  −0.09
Meloy          −0.10   0.69  −0.72  −0.30  −1.19
Arlington       0.11   0.81  −0.81   0.04   0.21
Ithaca          1.50  −1.14  −0.14  −0.04   0.00
St. Paul        0.45   0.85   1.43  −0.03   0.03
Moccasin       −0.60  −0.33   0.04   0.51  −0.07
Moro           −1.25  −0.79   0.20  −0.25   0.07
Davis          −0.04   0.64  −0.53  −0.18  −0.23
But this idea is definitely wrong, because standard coordinates are quantities which have nothing to do with the structure of the data (they do not contain the information about the singular values). Thus, the idea of non-symmetric scaling should be discarded outright; it is a mathematically wrong alternative for data analysis. So, the answer to our quandary seems obvious: adopt dual subspace plots, not correspondence plots. Let us now look at the example of barley data.
10.2.2 Barley Data
Let us now move on and look at the joint graphical display of our barley data (Stebbins, 1950). The coordinates of the five components from Stage 1 analysis are reproduced in Table 10.3. The coordinates from Stage 2 analysis are listed in Table 10.4; for convenience, the components are arranged in terms of dual subspace pairs. We will again start by comparing the correspondence plots of two components with the dual subspace plots of the corresponding contingency components; that is, we will compare a two-dimensional correspondence plot with the mathematically correct pairs of two-dimensional plots. From the correspondence plot of Fig. 10.7, we can immediately identify three clusters: barley Manchuria at Ithaca, barley White Smyrna at Moro, and barley Hanchen at St. Paul. These close pairs appear in our correspondence plot. However, if we represent them in the mathematically correct plots, Manchuria and Ithaca are vastly separated in dual subspaces 1 and 2, White Smyrna and Moro are quite far apart in dual subspace 1 (Fig. 10.8), and Hanchen and St. Paul are largely separated in dual subspace 2 (Fig. 10.9).
Table 10.4 Barley data: Dual subspace in order of dominance

        Dual Sub 1      Dual Sub 2      Dual Sub 3      Dual Sub 4      Dual Sub 5
Comp    1      10       2      9        3      8        4      7        5      6
θ       34°             41°             51°             84°             91°
C&T     0.12   0.03    −0.81  −0.27    −0.81  −0.34    −0.06  −0.05    −0.24  −0.21
Hanc    0.55   0.15    −0.88  −0.30     1.97   0.84    −0.21  −0.16     0.01   0.01
WS     −1.23  −0.34     0.78   0.26     0.18   0.08    −0.21  −0.16    −0.05  −0.05
Manc    1.80   0.50     1.61   0.54    −0.23  −0.10    −0.12  −0.09     0.02   0.02
Gate   −0.20  −0.05     0.11   0.04     0.25   0.10     4.09   3.19     0.56   0.49
Mloy   −0.12  −0.03    −0.82  −0.28    −0.96  −0.41    −1.06  −0.83     6.47   5.64
Arln    0.13  −0.03    −0.97   0.33    −1.08   0.46     0.18  −0.14    −1.15   1.00
Ithc    1.69  −0.47     1.36  −0.46    −0.18   0.08    −0.13   0.10    −0.02   0.02
StPa    0.50  −0.14    −1.01   0.34     1.90  −0.81    −0.08   0.06    −0.14   0.12
Moca   −0.68   0.19     0.40  −0.14     0.06  −0.02     1.65  −1.29     0.43  −0.38
Moro   −1.40   0.39     0.94  −0.32     0.27  −0.11    −0.81   0.64    −0.38   0.33
Davs   −0.05   0.01    −0.76   0.26    −0.71   0.30    −0.60   0.47     1.26  −1.10

Note C&T = Coast & Trebi; Hanc = Hanchen; WS = White Smyrna; Manc = Manchuria; Gate = Gatemi; Mloy = Meloy; Arln = Arlington; Ithc = Ithaca; StPa = St. Paul; Moca = Moccasin; Davs = Davis
Fig. 10.7 Barley data: Correspondence plot of component 1 against component 2
Our conclusion is again the same: correspondence plot is not very accurate. How about the correspondence plot of components 3 and 4 for the barley data? We see clearly that Hanchen is very close to St. Paul, Gatemi and Moccasin, and that Coast and Trebi and Meloy are close to Arlington and Davis. But again, the above associations break down in the correct graphs: Hanchen and St. Paul are widely apart in dual subspace 3 (Fig. 10.11), Gatemi and Moro are each at the extreme ends in dual subspace 4 (Fig. 10.12), and Coast and Trebi and Meloy are not that close to each other in dual subspace 3 (Fig. 10.10).
Fig. 10.8 Dual subspace 1 for contingency component 1
Fig. 10.9 Dual subspace 2 for contingency component 2
Fig. 10.10 Barley data: Correspondence plot of component 3 against component 4
Fig. 10.11 Dual subspace 3 for contingency component 3
Fig. 10.12 Dual subspace 4 for contingency component 4
The above comparisons of correspondence plots and dual subspace plots leave us in total dismay: correspondence plots that appear nicely interpretable are not as accurate as most researchers would believe. This is a serious problem for joint graphical display of quantification results. Let us now look at the third example, Kretschmer's typology data, which is quite different from the previous two examples, as we will see below.
10.2.3 Kretschmer's Typology Data
This data set was briefly looked at in Chap. 1. German psychiatrist Kretschmer (1925) collected from his patients the data on their mental types and body types (Table 10.5).
Table 10.5 Kretschmer's typology data

                    Pyknic   Leptosomatic   Athletic   Dysplastic   Others   Total
Manic-depressive     879      261             91         15          114      1360
Schizophrenic        717     2632            884        549          450      5232
Epileptic             83      378            435        444          166      1506
Total               1679     3271           1410       1008          730      8098
Table 10.6 Kretschmer's data in condensed response patterns

Pat   M-D    Sch    Epi    Pyk    Lept   Ath    Dys    Oth
1     879    0      0      879    0      0      0      0
2     261    0      0      0      261    0      0      0
3      91    0      0      0      0      91     0      0
4      15    0      0      0      0      0      15     0
5     114    0      0      0      0      0      0      114
6     0      717    0      717    0      0      0      0
7     0      2632   0      0      2632   0      0      0
8     0      884    0      0      0      884    0      0
9     0      549    0      0      0      0      549    0
10    0      450    0      0      0      0      0      450
11    0      0      83     83     0      0      0      0
12    0      0      378    0      378    0      0      0
13    0      0      435    0      0      435    0      0
14    0      0      444    0      0      0      444    0
15    0      0      166    0      0      0      0      166

Note Pat = pattern; M-D = manic-depressive; Sch = schizophrenic; Epi = epileptic; Pyk = pyknic; Lept = leptosomatic; Ath = athletic; Dys = dysplastic; Oth = others
The above contingency table can be transformed into the condensed response-pattern table as in Table 10.6. The total number of components we can extract from Table 10.6 is equal to (the number of rows plus the number of columns of the contingency table) minus 2, that is, 8 − 2 = 6. As we have already discussed, this is a case where the number of rows and the number of columns of the original contingency table are different, leading to a residual space of two dimensions, in which the contributions of either rows or columns are totally absent (recall that the components in residual space all have eigenvalues equal to 0.5). Deleting these two components, dual space consists of four dimensions. Recall also that each dual subspace consists of two components, one associated with an eigenvalue greater than 0.5 and the other with an eigenvalue smaller than 0.5, such that the sum of the two eigenvalues is 1. This is the definition of dual subspace. In the current example, the six eigenvalues ρ² are: (1) 0.75, (2) 0.63, (3) 0.50, (4) 0.50, (5) 0.37, (6) 0.25.
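To make the conversion concrete, here is a sketch of ours (plain Python/NumPy; the function name is an assumption for illustration). Every nonzero cell (i, j) of the contingency table becomes one row of the condensed response-pattern table, with the cell frequency entered under row category i and under column category j:

```python
import numpy as np

def condensed_response_pattern(F):
    """Convert a contingency table F (r x c) into the condensed
    response-pattern format: one row per nonzero cell, with the cell
    frequency entered under its row category and its column category."""
    F = np.asarray(F)
    r, c = F.shape
    rows = []
    for i in range(r):
        for j in range(c):
            f = F[i, j]
            if f == 0:
                continue
            pattern = np.zeros(r + c, dtype=F.dtype)
            pattern[i] = f          # row-category block
            pattern[r + j] = f      # column-category block
            rows.append(pattern)
    return np.array(rows)

# Kretschmer's data (Table 10.5): 3 mental types by 5 body types.
kretschmer = [[879, 261, 91, 15, 114],
              [717, 2632, 884, 549, 450],
              [83, 378, 435, 444, 166]]
R = condensed_response_pattern(kretschmer)
print(R.shape)  # (15, 8), matching the 15 patterns of Table 10.6
```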
Table 10.7 Kretschmer's data: Dual subspace in order of dominance

                    Dual Sub 1        Dual Sub 2
Component           1        6        2        5
Manic-depressive    1.86     1.06     0.48     0.37
Schizophrenic      −0.24    −0.14    −0.55    −0.42
Epileptic          −0.86    −0.49     1.47     1.12
Pyknic              1.63    −0.93     0.34    −0.26
Leptosomatic       −0.28     0.16    −0.89     0.68
Athletic           −0.58     0.33     0.54    −0.41
Dysplastic         −0.94     0.54     1.36    −1.04
Others             −0.10     0.06     0.27    −0.21
Fig. 10.13 Kretschmer’s data: Correspondence plot of component 1 against component 2
Therefore, components (3) and (4), the two components with eigenvalues of 0.50, belong to residual space, where there is no shared information between rows and columns; they are thus ignored in the analysis. We can then identify two sets of dual subspace, namely
Dual subspace 1 = (Component 1, Component 6)
Dual subspace 2 = (Component 2, Component 5)
The first consists of components 1 and 6 with eigenvalues 0.75 and 0.25, respectively, and the second consists of components 2 and 5 with eigenvalues 0.63 and 0.37, respectively. As mentioned before, each paired two-dimensional subspace corresponds to a single component of the contingency table. The coordinates of the four components in dual space are shown in Table 10.7. Each dual subspace offers a two-dimensional graph consisting of the row axis and the column axis (Figs. 10.14 and 10.15). In our correspondence plot (Fig. 10.13), mental type epileptic is located close to body type dysplastic, but in the exact plots of Figs. 10.14 and 10.15 they are not that close.
Fig. 10.14 Dual subspace 1 for contingency component 1
Fig. 10.15 Dual subspace 2 for contingency component 2
Observe how the distance between two distant points can shrink when we reduce the dimensionality of the space. Similarly, mental type manic-depressive is close to body type pyknic; again, those two variables, closely located in two-dimensional space, become far apart in the four-dimensional space of Figs. 10.14 and 10.15. Here, the importance of Kretschmer's data should be recognized: the data are mathematically four-dimensional. Therefore, the graphs of dual subspace are mathematically correct, and correspondence plot is a simplified two-dimensional plot. This means that we can no longer talk about how accurate correspondence plot is, even for this data set of small dimensionality. Indeed, the current example indicates that even data of four dimensions cannot be handled by correspondence plot with sufficient accuracy. This is our conclusion on joint graphical display.
10.3 Multidimensional Joint Display
We have just discussed the case of two-dimensional graphical display and discovered that we need four-dimensional space for the exact display of the data. We have also discussed how distorted the graphical display can be when a four-dimensional data configuration is projected onto two-dimensional space, as correspondence plot always does: two distant points in four-dimensional space can become two close points in two-dimensional space. This observation definitely provides a reason why we should hesitate to use correspondence plots. As mentioned before, we should be reminded that correspondence plots are based on the assumption that the row-column correlation is perfect for all components that we can extract from Stage 1 analysis. Our numerical examples have provided ample evidence that this assumption is unreasonable and too risky to use. Our discussion so far has been carried out in the situation where the data points require four-dimensional space, which correspondence plots handle as two-dimensional problems. Now we move from that situation to the total space: ten dimensions for the Rorschach data and barley data, and four dimensions for Kretschmer's data. We can immediately conjecture that the projection of a ten-dimensional configuration onto two-dimensional space will make many distant points look exceedingly close to one another. The previous cases of two-dimensional correspondence plots should therefore be re-examined not in four-dimensional space, but in ten-dimensional space for the Rorschach and barley data and in four-dimensional space for Kretschmer's data. What will happen then to our correspondence plots? This generalization of space dimensionality seems to pose more problems for the use of correspondence plots than we are currently aware of. The statistic useful for this total-space exploration is the distance matrix, calculated over ten dimensions for the Rorschach data and barley data and over four dimensions for Kretschmer's data. Luckily, our Stage 2 analysis yields the (rows, columns)-by-(rows, columns) squared distance matrix, obtained from the entire space. Let us reproduce the matrix for each of the three examples here (Tables 10.8, 10.9 and 10.10). These tables are too large to see at a glance what is happening in our data sets. What can we do with these goldmines? Nishisato (2012, 2016) discussed the idea of p-percentile clustering, in which we simply discard all the distances larger than the p-percentile point, since in clustering we are interested only in pairs of points that are closer to each other than that (in cluster analysis, only points close to one another are used to define clusters). We will use his rationale here and show an example of discarding relatively large distances from the tables above, to see which pairs of (rows, columns) are relatively close in each example. Since we do not want to discard any row or column variable in this elimination process, we also retain the smallest distance of each variable from another variable.
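The filtering rule itself is simple enough to sketch (NumPy; the function name and the choice p = 50 are ours, for illustration): discard all off-diagonal squared distances above the p-th percentile, but keep each variable's nearest neighbour so that no variable disappears entirely.

```python
import numpy as np

def percentile_filter(D, p=50):
    """Blank out (as NaN) all off-diagonal entries of the squared distance
    matrix D above the p-th percentile, retaining each variable's
    nearest-neighbour distance so no row is emptied completely."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    off = D[~np.eye(n, dtype=bool)]        # off-diagonal distances only
    cutoff = np.percentile(off, p)
    keep = D <= cutoff
    for i in range(n):                     # retain each variable's minimum
        others = [j for j in range(n) if j != i]
        j = min(others, key=lambda m: D[i, m])
        keep[i, j] = keep[j, i] = True
    return np.where(keep, D, np.nan)

# Example with the first four variables of Table 10.10 (Kretschmer data).
D = np.array([[0, 2.7, 3.4, 2.1],
              [2.7, 0, 2.6, 2.2],
              [3.4, 2.6, 0, 3.1],
              [2.1, 2.2, 3.1, 0]])
print(percentile_filter(D, p=50))
```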
Table 10.8 The squared distance matrix in dual space: Rorschach data

        1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17
 1      0  1.2  3.2  1.9  2.9  2.9  3.5  1.7  3.9  3.2  4.6  1.9  2.8  2.3  3.3  3.3  3.1
 2    1.2    0  3.3  3.0  3.6  2.2  3.8  2.4  4.2  3.1  4.0  2.1  2.8  2.9  3.5  3.6  3.3
 3    3.2  3.3    0  3.2  3.7  3.5  3.9  2.5  4.2  3.4  4.9  3.4  3.5  3.2  2.5  3.6  2.9
 4    1.9  3.0  3.2    0  2.1  3.9  3.6  1.3  3.4  3.3  5.1  2.5  3.2  2.0  3.3  3.1  3.0
 5    2.9  3.6  3.7  2.1    0  3.6  3.3  1.7  4.4  2.6  4.3  3.1  3.1  2.3  3.8  3.7  2.7
 6    2.9  2.2  3.5  3.9  3.6    0  3.9  2.8  4.3  1.7  2.0  3.1  2.6  3.2  3.5  3.7  3.6
 7    3.5  3.8  3.9  3.6  3.3  3.9    0  2.8  4.5  3.3  5.0  3.8  3.8  3.5  3.8  3.8  2.8
 8    1.7  2.4  2.5  1.3  1.7  2.8  2.8    0  3.4  2.3  4.1  2.2  2.5  1.6  2.7  2.7  2.3
 9    3.9  4.2  4.2  3.4  4.4  4.3  4.5  3.4    0  3.5  5.0  4.0  4.2  3.8  4.3  3.0  4.1
10    3.2  3.1  3.4  3.3  2.6  1.7  3.3  2.3  3.5    0  2.1  3.2  2.6  2.8  3.4  3.2  3.0
11    4.6  4.0  4.9  5.1  4.3  2.0  5.0  4.1  5.0  2.1    0  4.5  3.7  4.3  4.8  4.7  4.7
12    1.9  2.1  3.4  2.5  3.1  3.1  3.8  2.2  4.0  3.2  4.5    0  3.5  3.1  3.6  3.7  3.4
13    2.8  2.8  3.5  3.2  3.1  2.6  3.8  2.5  4.2  2.6  3.7  3.5    0  3.3  3.8  3.8  3.6
14    2.3  2.9  3.2  2.0  2.3  3.2  3.5  1.6  3.8  2.8  4.3  3.1  3.3    0  3.4  3.5  3.2
15    3.3  3.5  2.5  3.3  3.8  3.5  3.8  2.7  4.3  3.4  4.8  3.6  3.8  3.4    0  4.0  3.7
16    3.3  3.6  3.6  3.1  3.7  3.7  3.8  2.7  3.0  3.2  4.7  3.7  3.8  3.5  4.0    0  3.8
17    3.1  3.3  2.9  3.0  2.7  3.6  2.9  2.3  4.1  3.0  4.7  3.4  3.6  3.2  3.7  3.8    0
Note 1: 1 = Bat; 2 = Blood; 3 = Butterfly; 4 = Cave; 5 = Cloud; 6 = Fire; 7 = Fur; 8 = Mask; 9 = Mountain; 10 = Rock; 11 = Smoke; 12 = Fear; 13 = Anger; 14 = Depression; 15 = Ambition; 16 = Security; 17 = Love
Note 2: J. G. Clavel kindly supplied this table to the author

Table 10.9 The squared distance matrix in dual space: Barley data

                     1    2    3    4    5     6    7    8    9   10   11   12
 1. Coast & Trebi    0  3.1  2.5  3.2  5.5   9.0  1.9  2.9  2.9  3.3  2.8  2.1
 2. Hanchen        3.1    0  3.2  3.8  6.0   9.2  3.5  3.5  1.8  4.5  3.5  3.4
 3. White Smyrna   2.5  3.2    0  3.3  5.7   9.0  3.0  3.1  3.2  3.2  1.5  2.8
 4. Manchuria      3.2  3.8  3.3    0  6.0   9.3  3.6  1.4  3.8  4.1  3.6  3.6
 5. Gatemi         5.5  6.0  5.7  6.0    0  10.3  5.7  5.8  5.8  5.6  5.8  5.8
 6. Meloy          9.0  9.2  9.0  9.3 10.3     0  9.1  9.2  9.2  9.3  9.2  8.7
 7. Arlington      1.9  3.5  3.0  3.6  5.7   9.1    0  3.5  3.5  4.2  3.4  3.4
 8. Ithaca         2.9  3.5  3.1  1.4  5.8   9.2  3.5    0  3.6  4.1  3.4  3.4
 9. St. Paul       2.9  1.8  3.2  3.8  5.8   9.2  3.5  3.6    0  3.7  3.5  3.5
10. Moccasin       3.3  4.5  3.2  4.1  5.6   9.3  4.2  4.1  3.7    0  3.9  4.1
11. Moro           2.8  3.5  1.5  3.6  5.8   9.2  3.4  3.4  3.5  3.9    0  3.3
12. Davis          2.1  3.4  2.8  3.6  5.8   8.7  3.4  3.4  3.5  4.1  3.3    0

Note: J. G. Clavel kindly supplied this table to the author
Table 10.10 The squared distance matrix in dual space: Kretschmer's data

                       1    2    3    4    5    6    7    8
1. Manic-depressive    0  2.7  3.4  2.1  2.7  2.7  3.3  2.3
2. Schizophrenic     2.7    0  2.6  2.2  1.2  1.2  2.2  0.9
3. Epileptic         3.4  2.6    0  3.1  2.5  2.0  2.4  2.0
4. Pyknic            2.1  2.2  3.1    0  2.7  2.6  3.2  2.0
5. Leptosomatic      2.7  1.2  2.5  2.7    0  1.8  2.9  1.5
6. Athletic          2.7  1.2  2.0  2.6  1.8    0  1.1  0.6
7. Dysplastic        3.3  2.2  2.4  3.2  2.9  1.1    0  1.7
8. Others            2.3  0.9  2.0  2.0  1.5  0.6  1.7    0
Note: J. G. Clavel kindly supplied this table to the author

Table 10.11 The squared distance matrix after filtering: Rorschach data

        1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17
 1      0  1.2    –  1.9  2.9  2.9    –  1.7    –    –    –  1.9  2.8  2.3    –    –    –
 2    1.2    0    –  3.0    –  2.2    –  2.4    –    –    –  2.1  2.8  2.9    –    –    –
 3      –    –    0    –    –    –    –  2.5    –    –    –    –    –    –  2.5    –  2.9
 4    1.9  3.0    –    0  2.1    –    –  1.3    –    –    –  2.5    –  2.0    –    –  3.0
 5    2.9    –    –  2.1    0    –    –  1.7    –  2.6    –    –    –  2.3    –    –  2.7
 6    2.9  2.2    –    –    –    0    –  2.8    –  1.7  2.0    –  2.6    –    –    –    –
 7      –    –    –    –    –    –    0  2.8    –    –    –    –    –    –    –    –  2.8
 8    1.7  2.4  2.5  1.3  1.7  2.8  2.8    0    –  2.3    –  2.2  2.5  1.6  2.7  2.7  2.3
 9      –    –    –    –    –    –    –    –    0    –    –    –    –    –    –  3.0    –
10      –    –    –    –  2.6  1.7    –  2.3    –    0  2.1    –  2.6  2.8    –    –  3.0
11      –    –    –    –    –  2.0    –    –    –  2.1    0    –    –    –    –    –    –
12    1.9  2.1    –  2.5    –    –    –  2.2    –    –    –    0    –    –    –    –    –
13    2.8  2.8    –    –    –  2.6    –  2.5    –  2.6    –    –    0    –    –    –    –
14    2.3  2.9    –  2.0  2.3    –    –  1.6    –  2.8    –    –    –    0    –    –    –
15      –    –  2.5    –    –    –    –  2.7    –    –    –    –    –    –    0    –    –
16      –    –    –    –    –    –    –  2.7  3.0    –    –    –    –    –    –    0    –
17      –    –  2.9  3.0  2.7    –  2.9  2.3    –  3.0    –    –    –    –    –    –    0

Note: 1 = Bat; 2 = Blood; 3 = Butterfly; 4 = Cave; 5 = Cloud; 6 = Fire; 7 = Fur; 8 = Mask; 9 = Mountain; 10 = Rock; 11 = Smoke; 12 = Fear; 13 = Anger; 14 = Depression; 15 = Ambition; 16 = Security; 17 = Love
Table 10.11 is a table where all the distances larger than 3.0 are replaced with blanks, since they are unlikely to contribute to the formation of clusters (note: this value of 3.0 is arbitrary and does not represent any specific percentile point in the matrix). How about our barley data (Table 10.12)? We note that there are cases where a variable is so far away from all the other locations and varieties. Rather than eliminating such variables from our analysis, let us keep them in the analysis by retaining the smallest
Table 10.12 The squared distance matrix after filtering: Barley data

                      1     2     3     4      5      6     7     8     9    10    11    12
 1. Coast & Trebi     0   3.1   2.5   3.2  (5.5)     –   1.9   2.9   2.9   3.3   2.8   2.1
 2. Hanchen         3.1     0   3.2     –     –      –     –     –   1.8     –     –   3.4
 3. White Smyrna    2.5   3.2     0   3.3     –      –   3.0   3.1   3.2   3.2   1.5   2.8
 4. Manchuria       3.2     –   3.3     0     –      –     –   1.4     –     –     –     –
 5. Gatemi        (5.5)     –     –     –     0      –     –     –     –  (5.6)    –     –
 6. Meloy             –     –     –     –     –      0     –     –     –     –     –  (8.7)
 7. Arlington       1.9     –   3.0     –     –      –     0     –     –     –   3.4   3.4
 8. Ithaca          2.9     –   3.1   1.4     –      –     –     0     –     –   3.4   3.4
 9. St. Paul        2.9   1.8   3.2     –     –      –     –     –     0     –     –     –
10. Moccasin        3.3     –   3.2     –  (5.6)     –     –     –     –     0     –     –
11. Moro            2.8     –   1.5     –     –      –   3.4   3.4     –     –     0   3.3
12. Davis           2.1   3.4   2.8     –     –   (8.7)  3.4   3.4     –     –   3.3     0
Table 10.13 The squared distance matrix after filtering: Kretschmer's data

                       1    2    3    4    5    6    7    8
1. Manic-depressive    0    –    –  2.1    –    –    –    –
2. Schizophrenic       –    0    –  2.2  1.2  1.2  2.2  0.9
3. Epileptic           –    –    0    –    –  2.0    –  2.0
4. Pyknic            2.1  2.2    –    0    –    –    –  2.0
5. Leptosomatic        –  1.2    –    –    0  1.8    –  1.5
6. Athletic            –  1.2  2.0    –  1.8    0  1.1  0.6
7. Dysplastic          –  2.2    –    –    –  1.1    0  1.7
8. Others              –  0.9  2.0  2.0  1.5  0.6  1.7    0
distance of each such variable. In the current example there are three such retained distances, namely the distance 8.7 between barley Meloy and location Davis, the distance 5.6 between barley Gatemi and location Moccasin, and the distance 5.5 between barley Coast & Trebi and barley Gatemi; these are shown in parentheses in Table 10.12. For the rest of the table, the threshold of 3.4 was used (i.e., those distances greater than 3.4 are removed). Let us also look at Kretschmer's data. We will arbitrarily choose 2.2 as the cutting point, that is, we eliminate all the squared distances which are greater than 2.2 (Table 10.13).
10.3.1 Readers' Tasks: Rorschach Data

Tables 10.8 and 10.11 contain a large amount of interesting information. A detailed analysis would take many more pages; thus, the task of analyzing Table 10.11
is left to the readers. The key question is whether the analysis of the table provides a large amount of helpful information beyond our limited analysis through joint graphs. Without any systematic analysis, we can immediately see such information as:
• Inkblot Mask is close to the other inkblots and to all moods.
• Inkblots Bat and Blood are similar in their associations with other inkblots and moods.
• Inkblots Cave and Cloud are similar in their associations with other inkblots and moods.
• Inkblot Mountain is associated with mood Security.
• Moods Fear, Anger and Depression are similar in their associations with inkblots.
and so on. The major question for the task is whether the information contained in the distance matrix bears any resemblance to the information one can get from joint graphical display. This is a question which still requires much more time to answer. How about our barley data?
10.3.2 Readers' Tasks: Barley Data

Some of the immediately observable results are:
• Barley varieties Coast & Trebi and White Smyrna show similar associations with the other locations and barley varieties.
• Coast & Trebi and White Smyrna thrive in all agricultural locations.
• Moro and Davis have similar associations with other locations and varieties of barley.
• Barley varieties Manchuria, Coast & Trebi, Hanchen and White Smyrna thrive in Ithaca.
• Coast & Trebi, Hanchen and White Smyrna also thrive in St. Paul and Davis.
• Meloy does not do well anywhere, Davis being its closest location.
The question is how to further analyze the information in Tables 10.9 and 10.12. How about Kretschmer's data?
10.3.3 Readers' Tasks: Kretschmer's Data

• Mental type manic-depressive is close to body type pyknic, as shown by the correspondence plot; but examine also the two dual-subspace plots, where the two are slightly separated.
• Mental type schizophrenic is closer to body type others than to leptosomatic and athletic.
• Mental type epileptic is close to body types athletic and others.
Some of the above results are quite different from what the correspondence plot and the dual-subspace plots show. Why? We cannot yet find an answer to this question.
10.4 Discussion on Joint Graphical Display

As mentioned earlier, graphical display of principal component analysis is straightforward: given a table of principal coordinates of the variables, we plot the variables in two- or three-dimensional graphs. There is no problem of row weights and column weights linked only by a moderate correlation, and all the components are orthogonal to one another. In contrast, Stage 1 analysis of quantification theory yields projected weights for all variables so as to maximize the correlation between rows and columns, and this correlation almost never reaches 1; hence, we cannot map the rows and the columns in the same space. By assuming that the row-column correlation is 1, correspondence plots deal with half the Euclidean space, thus achieving an economy of work, but at the expense of accuracy. A correlation of 1 is almost never attained in practice, leading to the situation where the angle between the axis for the rows and that for the columns is generally much larger than zero, often more than 30°; indeed, the angle is typically larger than 40°! How can we be comfortable with correspondence plots, which assume the angle to be 0?

For nearly 100 years, it has been customary to use correspondence plots to summarize quantification outcomes. But now that we know the mathematical structure of our input and output, which requires doubled multidimensional space, is it not time to give up correspondence plots completely? Is it worth sacrificing accuracy for simplicity? We can obtain a perfect configuration from PCA of the distance matrices (e.g., Tables 10.8, 10.9 and 10.10). Our major problem is how to arrive at similar conclusions from multidimensional graphs and from multidimensional analysis of the distance matrix. The above examples have left us up in the air, and we do not know how to reconcile our different observations of the outcomes of the two approaches, that is, through multidimensional coordinates and through the total distance matrix.

This decision problem is also very much dependent on our ability to deal with multidimensional space. We would not be able to reject the idea of dealing with half the space rather than the full space, considering that we can handle only up to three-dimensional space in terms of graphical display. We urgently need to break the current barrier of space and move to four-dimensional and hyperdimensional graphical display. Such a breakthrough would bring us back to joint graphical display as a routine mode of quantification analysis. Nishisato (2023) states:

Our biggest task ahead of us is to develop a better and more efficient way of representing multidimensional quantification results. We also need such a program to be capable of displaying dynamic, holographic multidimensional graphs.
Whether dynamic, holographic or interactive, we must develop a method of visually presenting multidimensional configurations in useful ways. Unfortunately, we have to leave this task to the next generation of researchers.

Apologetic Note: It is quite expected that users of the correspondence plot will be much offended by the critical views presented above. It is true that the problems of joint graphical display, particularly those of the traditional correspondence plot, cannot be settled through small specific examples alone, in particular through two-dimensional plots only. When data are multidimensional, it is quite possible that the critical observations on our examples might not apply in a more general multidimensional case. Therefore, the above criticisms of correspondence plots should be discounted accordingly, but only very slightly, because the small separation between two points in two-dimensional space will grow larger as the dimensionality of the space increases. Indeed, an interpoint distance cannot become smaller as the dimensionality of space increases, and this applies to all pairs of data points. We have already seen this phenomenon in moving from one-dimensional to two-dimensional space, that is, from the unidimensional graph of Stage 1 analysis to the two-dimensional graph in dual subspace. We can state again that the distance between two points can only increase as the dimensionality of space grows from 1 to 2, 2 to 3, and all the way to ten-dimensional space in our two numerical examples. Thus, the small distance between two close points in two-dimensional space will steadily increase as we introduce more dimensions into the interpoint distance calculation. Of course, there is no guarantee that a distance in unidimensional space will grow rapidly as the dimensionality increases; this all depends on the particular example. The undeniable certainty, however, is that the distance between two points in k-dimensional space cannot be smaller than the distance in a reduced space such as (k − 1)-dimensional space. Thus, the current conclusions on the correspondence plot also depend on the data sets. No matter what, we should be aware that the distance between two close points cannot become smaller as the dimensionality of space increases.

Apologetic Note, But...: Let us consider the role of joint graphical display. Whether we can handle two-dimensional joint graphs or hyperdimensional graphs, the graphs should provide configurations of data points which are representative of the exact data configurations. From this point of view, the role of the squared distance matrix of all the data points becomes crucial. Is our two-dimensional correspondence plot representative of the total configuration? If not, what is the role of our data analysis? To see only a specific part of the data structure? A general answer to this question would be yes. But if the components we are looking at are major components, should not the joint graph be representative of the general data configuration, too? To explore this question, we calculated the squared distance matrix, to see whether our joint graph is representative of the total picture. We discovered that even for Kretschmer's data, which require only four-dimensional space, the two-dimensional correspondence plot failed to offer a reasonable approximation to the four-dimensional configuration.
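The monotonicity claim above is easy to verify numerically. The following sketch, using made-up coordinates, shows that the Euclidean distance between two points can only stay the same or grow as further dimensions enter the calculation, since each added dimension contributes a non-negative squared term.

import numpy as np

# Two hypothetical points in four-dimensional space (coordinates invented
# purely for illustration).
x = np.array([0.8, -0.3, 0.5, 0.1])
y = np.array([0.2, 0.4, 0.6, -0.7])

# The distance computed over the first k dimensions is non-decreasing in k.
for k in range(1, 5):
    d = np.sqrt(np.sum((x[:k] - y[:k]) ** 2))
    print(f"distance over the first {k} dimension(s): {d:.3f}")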
10.5 Cluster Analysis as an Alternative

Stage 2 analysis provides principal coordinates of the data in dual space, that is, major dual space plus minor dual space, and this entire information on the data set was used in joint graphical display. The same information, that is, the coordinates in dual space, can also be converted to the (rows, columns)-by-(rows, columns) distance matrix. This distance matrix contains the entire amount of information in the data and is thus as good a source of information to investigate as the matrix of coordinates. The use of the distance matrix, however, has been rather limited in quantification theory. Now that joint graphical display appears to have a decisive limitation to its practical use, we should look into the distance matrix as a source of information for structural inquiry into the data.

Out of all the possibilities, a glimpse of hope is to look for a non-spatial graphical approach, more specifically cluster analysis. The author's Ph.D. thesis (supervisor R. D. Bock, University of North Carolina, Chapel Hill, 1965) was entitled Minimum entropy clustering of test items (Nishisato, 1966), and was based on the combination of information theory and logistic regression. Times have changed in the almost 58 years since then, and we now have a large variety of cluster-analytic techniques. However, the basic property of cluster analysis has not changed: it is a dimension-free technique for exploring data structure. Even without a literature review, one can see a definite possibility that cluster analysis will give us an effective and comprehensive summary of all the relations embedded in a set of multidimensional orthogonal coordinates or in the total distance matrix of rows and columns. Once we have the entire set of coordinates of those variables, the entire distance matrix can easily be calculated, as we did for our three numerical examples here. Let us therefore consider cluster analysis of the (rows, columns)-by-(rows, columns) distance matrix as an alternative to joint graphical display.

Early main references for cluster analysis were, among others, Tryon and Bailey (1970), Anderberg (1973), Duran and Odell (1974), Everitt (1974), Hartigan (1975), Jain and Dubes (1988), Jardine and Sibson (1971), and Sneath and Sokal (1973). Many readers must be familiar with these books. Irrespective of the dimensionality of the data, cluster analysis identifies clusters of variables in terms of proximity among variables. In this sense, cluster analysis is a dimension-free method of multidimensional data analysis. However, how many clusters we should identify is mostly an empirical question, and at this point one can identify as many clusters as one wishes, since there does not seem to be any universally accepted criterion for determining the number of clusters for a given data set.

Applications of cluster analysis to quantification theory have been considered by at least several investigators but, lacking an extensive literature search, the author cannot survey what is available. Although it is an extremely limited amount of information, interested readers may refer to Nishisato (2021) and, for numerical examples, to Clavel and Nishisato (2021, Chap. 6, Clustering as an alternative, pp. 107–130) in Nishisato, Beh, Lombardo and Clavel (2021), where specific applications of
cluster analysis to our (rows, columns)-by-(rows, columns) distance matrix in dual space are discussed and presented. This Chapter 6 has the following headings:

Chap. 6 Clustering as an Alternative
6.1 Decomposition of Input Data
6.1.1 Rorschach Data
6.1.2 Barley Data
6.2 Partitions of Super-Distance Matrix
6.3 Outlines of Cluster Analysis
6.3.1 Universal Transform for Clustering (UTC)
6.4 Clustering of Super-Distance Matrix
6.4.1 Hierarchical Cluster Analysis: Rorschach Data
6.4.2 Hierarchical Cluster Analysis: Barley Data
6.4.3 Partitioning Cluster Analysis: Rorschach Data
6.4.4 Partitioning Cluster Analysis: Barley Data
6.5 Cluster Analysis of Between-Set Relations
6.5.1 Hierarchical Cluster Analysis of Rorschach Data (UTC)
6.5.2 Hierarchical Cluster Analysis of Barley Data (UTC)
6.5.3 Partitioning Cluster Analysis: Rorschach Data and Barley Data (UTC)
6.5.4 Effects of Constant Q for UTC on Cluster Formation
6.6 Overlapping Versus Non-overlapping Clusters
6.7 Discussion and Conclusion
6.8 Final Comments on Part 1

The above reference will be useful as an introduction to the use of cluster analysis for summarizing our multidimensional configurations of the Rorschach data and the barley data. It contains many numerical examples of different types of cluster analysis, using these two data sets, and provides a great deal of information one can explore in pursuing the cluster-analytic approach to multidimensional data structure.

One note is in order here: when multidimensional quantification space is partitioned, we would like the distances to be partitioned in the same way. To this end, we need to use the squared distance, so that the total squared distance can be partitioned as the sum of the squared distance in major dual space and the squared distance in minor dual space. For our two numerical examples, the matrices of between-variable squared distances in dual space were given earlier in Tables 10.8 and 10.9. For exploratory applications of cluster analysis to our two examples, please refer to Chap. 6 of Nishisato, Beh, Lombardo and Clavel (2021). There are many future problems that can be pursued further, and there are many hints in that chapter.
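As one concrete way of pursuing this alternative, the following sketch applies hierarchical cluster analysis to a (rows, columns)-by-(rows, columns) squared distance matrix such as Table 10.8, using SciPy. The choice of average linkage and the number of clusters are illustrative assumptions, not prescriptions from the references cited above.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hierarchical clustering of a symmetric squared distance matrix D2 in dual
# space; labels is the list of row and column names.
def cluster_dual_space(D2, labels, n_clusters=4):
    # squareform converts the square matrix into the condensed vector
    # expected by linkage (checks=False skips strict symmetry checks).
    condensed = squareform(np.asarray(D2, dtype=float), checks=False)
    Z = linkage(condensed, method="average")
    assignment = fcluster(Z, t=n_clusters, criterion="maxclust")
    return dict(zip(labels, assignment))

Because the squared distances are additive over major and minor dual space, the same function can be applied to each part of the partition as well as to the total matrix.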
10.6 Final Notes

We started with joint graphical display as almost the only way to summarize quantification results in a tangible form. In this chapter, however, we noted a severe limitation of the technology for visualizing a multidimensional configuration in a satisfactory manner. This means there is an urgent need to develop techniques for visualizing a multidimensional configuration of data. As an alternative to visual display, we discussed non-dimensional alternatives, in particular the use of cluster analysis, which has a history of over 80 years. We hope that applications of cluster analysis to understanding our multidimensional configurations will eventually become an everyday routine of multidimensional quantification analysis. Currently, we are still exploring its potential as a routine method for summarizing quantification results.
References

Anderberg, M. R. (1973). Cluster analysis for applications. Academic Press.
Carroll, J. D., Green, P. E., & Schaffer, C. M. (1986). Interpoint distance comparisons in correspondence analysis. Journal of Marketing Research, 23, 271–280.
Carroll, J. D., Green, P. E., & Schaffer, C. M. (1987). Comparing interpoint distances in correspondence analysis: A clarification. Journal of Marketing Research, 24, 445–450.
Carroll, J. D., Green, P. E., & Schaffer, C. M. (1989). Reply to Greenacre's commentary on the Carroll-Green-Schaffer scaling of two-way correspondence analysis solutions. Journal of Marketing Research, 26, 366–368.
Clavel, J. G., & Nishisato, S. (2021). Chapter 6: Clustering as an alternative. In S. Nishisato, E. J. Beh, R. Lombardo, & J. G. Clavel, Modern quantification theory: Joint graphical display, biplots and alternatives (pp. 107–130). Springer.
Duran, B. S., & Odell, P. L. (1974). Cluster analysis: A survey. Springer-Verlag.
Everitt, B. S. (1974). Cluster analysis. Wiley & Sons.
Garmize, L. M., & Rychlak, J. F. (1964). Role-play validation of a socio-cultural theory of symbolism. Journal of Consulting Psychology, 28, 107–115.
Greenacre, M. J. (1989b). An invited talk on CGS scaling. The International Federation of Classification Societies Meeting, Charlottesville, Virginia.
Greenacre, M. J. (1989). The Carroll-Green-Schaffer scaling in correspondence analysis: A theoretical and empirical appraisal. Journal of Marketing Research, 26, 358–365.
Harman, H. H. (1960). Modern factor analysis. University of Chicago Press.
Hartigan, J. A. (1975). Clustering algorithms. Wiley & Sons.
Jain, A. K., & Dubes, R. C. (1988). Algorithms for clustering data. Prentice-Hall.
Jardine, N., & Sibson, R. (1971). Mathematical taxonomy. Wiley & Sons.
Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187–200.
Kaiser, H. F., & Caffrey, J. (1965). Alpha factor analysis. Psychometrika, 30, 1–14.
Kretschmer, E. (1925). Physique and character: An investigation of the nature of constitution and of the theory of temperament; with 31 plates. London: Kegan Paul, Trench, Trubner.
Lebart, L., Morineau, A., & Tabard, N. (1977). Techniques de la description statistique: Méthodes et logiciels pour l'analyse des grands tableaux. Dunod.
Nishisato, S. (1966). Minimum entropy clustering of test items. University Microfilms Inc.
Nishisato, S. (1988). Forced classification procedure of dual scaling: Its mathematical properties. In H. H. Bock (Ed.), Classification and related methods (pp. 523–532). Amsterdam: North Holland.
Nishisato, S. (2012). Quantification theory: Reminiscence and a step forward. In W. Gaul, A. Geyer-Schultz, L. Schmidt-Thieme, & J. Kunze (Eds.), Challenges at the interface of data analysis, computer science and optimization (pp. 109–119). Springer.
Nishisato, S. (2014). Structural representation of categorical data and cluster analysis through filters. In W. Gaul, A. Geyer-Schultz, Y. Baba, & A. Okada (Eds.), German-Japanese interchange of data analysis results (pp. 81–90). Springer.
Nishisato, S. (2016). Quantification theory: Dual space and total space. Paper presented at the Annual Meeting of the Behaviormetric Society, Sapporo, Japan, p. 27 (in Japanese).
Nishisato, S. (2019). Reminiscence: Quantification theory and graphs. Theory and Applications of Data Analysis, 8, 47–57 (in Japanese).
Nishisato, S. (2019b). Expansion of contingency space: Theory of doubled multidimensional space and graphs. An invited talk at the Annual Meeting of the Japanese Classification Society, Tokyo (in Japanese).
Nishisato, S. (2022). Optimal quantification and symmetry. Springer Nature.
Nishisato, S., & Clavel, J. G. (2003). A note on between-set distances in dual scaling and correspondence analysis. Behaviormetrika, 30, 87–98.
Nishisato, S., Beh, E. J., Lombardo, R., & Clavel, J. G. (2021). Modern quantification theory: Joint graphical display, biplots and alternatives. Springer Nature.
Nishisato, S., et al. (2020). From joint graphical display to bi-modal clustering: [1] A giant leap in quantification theory. In T. Imaizumi (Ed.), Advanced research in classification and data science (pp. 157–168). Springer.
Schönemann, P. H. (1966). A general solution of the orthogonal Procrustes problem. Psychometrika, 31, 1–10.
Stebbins, G. L. (1950). Variation and evolution in plants. New York: Columbia University Press.
Tryon, R. C., & Bailey, D. E. (1970). Cluster analysis. New York: McGraw-Hill.
Chapter 11
Beyond the Current Book
As mentioned in Chap. 1, there is an enormous number of books on quantification theory, which has been called by over 50 aliases (Nishisato, 2007). As such, the current book is not new, except for its emphasis on the two-stage strategy of symmetric analysis and the geometric theory of quantification space. In addition, the current book is more like the author's own research summary, for which some apologies are owed to the readers: many other researchers have done many wonderful studies on quantification theory.
11.1 Selected Problems

In the current book, we have covered only a limited number of topics, namely the most fundamental topics of quantifying two-way data matrices (note: these data types are handled by simple correspondence analysis and multiple correspondence analysis, or, more specifically, by dual scaling of contingency tables and of multiple-choice data). As most readers will know, there are many other equally important topics related to quantification problems. Let us briefly look at some of them.
11.1.1 Geometric Space Theory for Many Variables

The current book introduced a theory of doubled multidimensional space for the analysis of contingency tables via the analysis of their response-pattern format. As was made clear, the theory is invaluable for knowing the scope of the analysis we are dealing with. As noted throughout the book, however, the space theory covers only two-way data.
There is therefore an urgent need to expand the space theory to n-variable quantification problems. The data under consideration are what we call multiple-choice data, which are currently analyzed by multiple correspondence analysis and by dual scaling of multiple-choice data. Such data consist of items with varying numbers of response choices (alternatives). Suppose that one item has only two alternatives (e.g., [pass, fail], [urban, rural], [success, failure]). How would this item affect the entire quantification space for a data set whose items have various numbers of response alternatives? For such data sets, what would be the definitions of residual space, dual space and total space? To be frank, we do not know much about our quantification space for general multiple-choice data. Another theory of quantification space is urgently needed.
11.1.2 Non-symmetric Quantification Analysis

A friend of the author, Rosaria Lombardo (personal communication, 2022), has pointed out the clear, important and abundant literature on non-symmetric quantification analysis. It is true that the current book omitted this topic for no obvious reason, except for the namesake of dual scaling (Nishisato, 1980). For non-symmetric quantification problems there are many references, among others Beh et al. (2007), Beh and Lombardo (2014), Beh and Simonetti (2010), D'Ambra and Lauro (1989, 1992), D'Ambra et al. (2010), D'Ambra and Lombardo (1993), Kroonenberg (2002), Kroonenberg and Lombardo (1999a, 1999b), Lauro and D'Ambra (1984), Lombardo and Kroonenberg (1993), Lombardo et al. (1996, 2000, 2007), Simonetti and Gallo (2002), Siciliano et al. (1993), Simonetti (2008), Simonetti and Lucadamo (2013), Takane and Jung (2009a, 2009b), Verde (1992), Verde and Palumbo (1996), and Williams and Galindo Villardon (2008). The current author limited the scope of his coverage to symmetric scaling, as inferred from his dual scaling. However, there are many studies on non-symmetric quantification, and it is hoped that one day the symmetric and the non-symmetric cases will be covered by a single framework of quantification, together with further geometric investigations and a geometric theory of quantification space.
11.1.3 Forced Classification Analysis

See, for example, Nishisato (1984, 1986, 1988a, 1988b, 1994, 2007), Nishisato and Gaul (1989, 1990), and Nishisato and Baba (1999). This is essentially a discriminant analysis of quantification, approached through a limit theorem, like the method of reciprocal averages. Let us consider a response-pattern table of n variables,

F = [F1, F2, ..., Fk, ..., Fn].
Consider the quantification of a modified data matrix F(k, s), given by

F(k, s) = [F1, F2, ..., sFk, ..., Fn],

where s is a real number such as 1, 2 and so on. Forced classification of the data matrix with the criterion variable k is defined as the quantification of F(k, s) as s goes to infinity, that is, the singular value decomposition of

lim_{s→∞} F(k, s),

which is dominated by the structure of Fk. This is equivalent to quantifying Pk F, where Pk is the projection operator associated with variable k, namely

Pk = Fk (Fk' Fk)^{-1} Fk'.

This procedure was generalized to any subspace or combination of subspaces (Nishisato, 1986). As a version of discriminant analysis for categorical data, the forced classification procedure offers a wide range of applications of quantification theory to many research inquiries. A more direct way than forced classification, however, is through projection operators, as mentioned next.
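A minimal numerical sketch of this limiting procedure follows, assuming F is a response-pattern matrix and Fk the block of its columns belonging to the criterion item k; the usual centering and scaling steps of dual scaling are omitted for brevity, so only the projection-and-decomposition skeleton is shown.

import numpy as np

# Forced classification via the projection operator Pk = Fk (Fk'Fk)^{-1} Fk',
# the limiting case of quantifying F(k, s) as s goes to infinity.
def forced_classification(F, Fk, n_components=2):
    Pk = Fk @ np.linalg.pinv(Fk.T @ Fk) @ Fk.T   # projector onto item k's space
    U, s, Vt = np.linalg.svd(Pk @ F, full_matrices=False)
    return U[:, :n_components], s[:n_components], Vt[:n_components, :]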
11.1.4 Projection Operators and Quantification

See, for example, Nishisato (1971, 1980) and Nishisato and Lawrence (1989). Quantification does not have to be applied to the entire data space. One may be interested in the behavior of a subgroup of special interest, or a group with specific illnesses, or some consumer subgroups whose purchasing behavior is of great interest for the sale of goods. In other words, quantification can be carried out with respect to some subsection of the data, using the following general expression of the data matrix F (Nishisato & Lawrence, 1989):

F = (P1 + P2 + ... + Pn) F (Q1 + Q2 + ... + Qm),

where

P1 + P2 + ... + Pn = In and Q1 + Q2 + ... + Qm = Im.
With this general decomposition of the data matrix, we can carry out the optimization of quantification with respect to any space we may wish. For example, try the following matrices for quantification (a numerical sketch follows the list):
1. (P1 + P2)F
2. P1 F Q1
3. F Q1
4. (I − P1)F
5. (P2 + P5)F(I − Q2)
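The following sketch illustrates how such selective quantifications can be set up; the projector construction is standard, but the data matrix and the indicator matrices defining the subspaces are invented purely for illustration.

import numpy as np

def projector(X):
    # Orthogonal projector onto the column space of X.
    return X @ np.linalg.pinv(X.T @ X) @ X.T

rng = np.random.default_rng(0)
F = rng.random((8, 6))                          # a hypothetical data matrix
G1, G2 = np.eye(8)[:, :3], np.eye(8)[:, 3:5]    # row-subspace indicators
H1 = np.eye(6)[:, :2]                           # a column-subspace indicator

P1, P2, Q1 = projector(G1), projector(G2), projector(H1)

A = (P1 + P2) @ F                               # item 1: (P1 + P2)F
B = P1 @ F @ Q1                                 # item 2: P1 F Q1
C = (np.eye(8) - P1) @ F                        # item 4: (I - P1)F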
We can also consider the analysis of variance of qualitative data and generate appropriate projection operators in the process, so that we can enhance the effects of certain selected analysis-of-variance parameters (see Nishisato, 1971). In general, as we see in the above five examples of selecting a combined space for maximization, it would be important to produce a computer program which automatically selects interesting space combinations, or which carries out all combinations and selects as the final output only the "interesting" ones, "interesting" being an unknown parameter at this stage.
11.1.5 Robust Quantification

See, for example, Nishisato (1984a, 1987) and Verboon (1994). Since quantification analysis involves the maximization or minimization of some statistic, so-called outlier responses can have a decisive influence on the outcome; this is the reason why Nishisato (1984a) proposed the method of reciprocal medians in lieu of the method of reciprocal averages. In fact, we have observed that even a single response out of thousands can determine the first (major) component; such a response is properly called an outlier response. However, further investigations are needed to deal with robust quantification in general. For example, what about the numbers of response options of n multiple-choice questions? What about a rare, thus unique, response determining the first principal axis? What about the response distributions over the response alternatives of multiple-choice questions? What about the enormous influence of missing responses on the quantification outcomes? No wonder Nishisato and Ahn (1995) even considered when to stop data analysis.
11.1.6 Biplots

See, for example, Gabriel (1971), Gower and Hand (1996), Greenacre (2010), Beh and Lombardo (2014) and Nishisato et al. (2021). Gower and Hand (1996) state that "biplots are the multivariate analogue of scatter plots. They approximate the multivariate distribution of a sample in a few dimensions." Biplots have been extended to such statistical procedures as principal component analysis, multidimensional scaling and quantification theory, but mostly in two-dimensional space. We need much further development toward handling multidimensional configurations, such as we face
in joint graphical display of quantification outcomes. For the latter, we considered cluster analysis as a promising alternative. What alternative do we have for biplots for handling multidimensional configurations?
11.1.7 Multidimensional Joint Graphs

This is a topic which must be attended to immediately. As mentioned earlier, the first problem for the future is to extend the current geometric space theory, discussed in Chap. 8, to many variables, and to re-evaluate the problem of extending two-dimensional joint graphs to multidimensional joint graphs. Now that we have managed to open the tight lid on the geometry of our two-variable joint graphical problem, we should expect that its extension to multidimensional joint graphs is likely to require drastic means, logical, mathematical or technological. Priority for research initiatives toward multidimensional joint graphs is necessary to move beyond the current stage of progress in quantification theory. What we need is a practical way of exploring multidimensional space and summarizing it to our satisfaction. Out of all the future problems, this is the number one topic which requires urgency.
11.1.8 Cluster Analysis as an Alternative

As Nishisato (2012, 2014, 2016, 2020) suggested, cluster analysis is a feasible way to handle multidimensional configurations. Cluster analysis is often called a non-dimensional approach: the dimensionality of the data we deal with does not stand in the way of its applications. See, for example, Lebart and Mirkin (1993), Clavel and Nishisato (2020), Nishisato (2012, 2014) and Nishisato et al. (2021). Cluster analysis has produced a countless number of publications since its early days in the 1950s and 1960s. It is highly anticipated that cluster analysis will offer much help in our coping with multidimensional configurations. We know that cluster analysis does not employ the concept of multidimensional space, and this is why we expect that it will offer us a useful tool for organizing information scattered in multidimensional space.
11.1.9 Computer Programs

Since quantification analysis typically involves a large amount of computation, we need further development of efficient computer programs which handle multidimensional space effectively, perhaps through interactive modes for examining multidimensional configurations of data. Standardization of multidimensional space is one topic which requires immediate investigation. It would be particularly helpful
if such a computer program offered an interactive interpretation search for a meaningful understanding of the multidimensional configurations of our data; some focused analysis, as offered by forced classification, may play an important role in such selective multidimensional graphical display, looking for informative subspaces, such as a special combination of patients' medical symptoms. In the current book, we also discussed the analysis of the squared distance table of rows and columns in dual space. Instead of discussing cluster analysis there, we offered a rather intuitive way of looking at the distance table through the idea of filtering. Along this line of research, one can develop a very useful computer program which arranges distances in order, or filters the distances with a percentile filter, perhaps using color schemes to identify the percentiles used for filtering. Such a computer program should then offer a number of plausible conclusions based on the filtered distances. It would also be interesting to develop a computer program to identify specific sets of response patterns, such as the most popular response pattern for a subset of questions, or a response pattern that identifies which patients fall into a critically serious stage of a certain illness. We can employ the power of computers to explore specific sets of response patterns useful for election surveys, medical surveys or market surveys. There are a number of other problems which need further investigation. Toward that end, there are two papers on "gleaning" in the dual scaling field (Nishisato, 1996, 2016), which may be useful sources for further exploration of various problems.
11.1.10 Inferential Problems

In the current book, we did not discuss any matters related to inferential problems, such as significance testing, confidence regions and sampling investigations of various statistics through Monte Carlo computations. These topics are too large to be included in the chapters of a single book, but inferential problems need to be looked into in order to make quantification theory a full-fledged inferential tool of data analysis.
11.1.11 Nishisato's Quandary on Multidimensional Analysis

Most of the problems mentioned above are likely to be solved or thoroughly investigated in the next 20 years or so. There is one problem, however, for which the current author cannot make such a prediction. It is the problem of how the domain of multidimensional analysis can be defined universally, both theoretically and in practice. In other words, the quandary is how to define multidimensional analysis for quantitative and categorical data. Some readers would say that the current book has answered this quandary. However, we have left out a key aspect of the problem, namely whether to analyze the original (raw) data or some kind of optimal transform of the original data. Let us elaborate on this quandary further.
Start with principal component analysis, PCA (Hotelling, 1933). In the physical sciences, we see many applications of PCA to raw (non-transformed) data, where the analysis deals with the eigenequation of the variance-covariance matrix. In contrast, the most frequently encountered data in the social sciences are interval or nominal measurements, where the unit or the origin, or both, may not be defined, resulting in the typical application of PCA to standardized variables, a way of equating the unit of measurement. This latter process leads to solving the eigenequation of the correlation matrix rather than the variance-covariance matrix. As Nishisato (1991) and Nishisato and Yamauchi (1974) numerically demonstrated, the principal components associated with the correlation matrix are often totally different from the principal components associated with the variance-covariance matrix of the same data. In other words, those variables dominant in the PCA of the correlation matrix are often not dominant in the PCA of the variance-covariance matrix. This means that in general there is no correspondence between the PCA results from the correlation matrix and those from the covariance matrix. One can easily understand these differences: standardization completely changes the resultant principal axes, because variables with different variances are transformed into variables of the same variance. Even with quantitative data, some investigators prefer standardizing the variables because some variables are known to have much larger variances than the others, thus dominating the major components of the data structure. But the standardization of variables changes the principal component structure of the data extensively, and hence the interpretation of the data structure! When the data come from questionnaires, many researchers would standardize the scores of all the questions, leading to principal component analysis of the correlation matrix. Again, the resultant principal components are typically quite different from the principal components of the raw data.

For quantification theory, Nishisato (1991) demonstrated how the quantification results are influenced if the contributions of variables with different numbers of categories are fixed to be constant. The results after this type of "standardization" are unpredictable and beyond our imagination. Considering that the number of response alternatives in multiple-choice data also affects the quantification results, as Nishisato demonstrated, would you not like to try fixing the number of response alternatives in survey research to be constant? This kind of standardization cannot always be useful, however, because some variables have pre-fixed numbers of categories (e.g., gender; right-handed versus left-handed versus ambidextrous; political parties). Another problem one often encounters in data analysis is categorizing continuous variables so that quantification theory may appropriately be applied to the transformed data. This problem of discretizing continuous variables has been handled by researchers in many areas of investigation; see, for example, an extensive literature survey by Eouanzoui (2004), and relevant recent studies by Kim and Frisby (2019) and Kim et al. (2022). Again, in this situation the same quandary will arise, that is,
fixing the number of categories for all variables, or letting the researchers' judgment introduce various numbers of categories. We know how to carry out multidimensional analysis, whether the data are quantitative or categorical. The quandary is what form or forms of the data we should analyze to arrive at a satisfactory multidimensional structure: to standardize or not (PCA); to equate the contributions of multiple-choice items or not (quantification of multiple-choice data); to equate the sums of squares of different variables or not; and so on. Our ultimate question is what the realistic data structure is and how to arrive at it. This question needs to be answered to our satisfaction. Ask what a sensible form of the data is: original or transformed?
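As a small numerical illustration of the PCA part of this quandary, the following sketch shows, on synthetic data with deliberately unequal variances, how the first principal axis of the covariance matrix can differ drastically from that of the correlation matrix; the data are fabricated for demonstration only.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3)) * np.array([10.0, 1.0, 0.1])  # unequal variances

def first_principal_axis(S):
    # Eigenvector belonging to the largest eigenvalue of a symmetric matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(S)
    return eigenvectors[:, -1]

print(first_principal_axis(np.cov(X, rowvar=False)))       # dominated by variable 1
print(first_principal_axis(np.corrcoef(X, rowvar=False)))  # no such dominance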
11.1.12 Gleaning in the Field of Quantification

See Nishisato (1996, 2016, 2023) as sources of relevant discussions.
11.2 Final Words: Personal Notes

My research life started with experimental psychology, with a B.A. and an M.A. from Hokkaido University in Sapporo, Japan. Then, with the precious help of the Fulbright Commission in 1961, I was fortunate enough to start my Ph.D. work at the Psychometric Laboratory, the University of North Carolina at Chapel Hill, at the age of 26. I specialized in psychometrics as my major and mathematics as my minor discipline. Under the supervision of R. Darrell Bock, who was then known as the leader of optimal scaling, I completed my Ph.D. thesis on "Minimum entropy clustering of test items." After the successful final Ph.D. oral examination in 1965, I returned to Japan at the age of 30. I was very disappointed that there were no employment prospects anywhere in Japan. Fortunately, however, in 1966 I was offered a position by George A. Ferguson, Dalbir Bindra and Wally Lambert in the Department of Psychology, McGill University in Montreal, Canada. The following year, I was recruited to the Ontario Institute for Studies in Education of the University of Toronto by my lifelong friend Ross E. Traub. I was 31 years old then. I worked there until July 31, 2000, the first end of an academic year after my 65th birthday on June 9. As I wrote in my memoir in Beh et al. (2023), my research life was enriched by many colleagues and students at the University of Toronto and by many international researchers. Thanks to W. Gaul and H. H. Bock of Germany; L. Lebart, A. Morineau, M. Tenenhaus, G. Saporta and E. Diday of France; C. Lauro of Italy; J. C. Gower of England; B. Mirkin and S. Adamov of the USSR (now Russia); V. Zudrafkov of Bulgaria; W. Heiser and J. Meulman of the Netherlands; J. G. Clavel and H. Hererra of Spain; J. Sachs of Hong Kong; K. S. Leong, T. Nishikiori and S. H. Poh of Singapore;
and many friends in Japan, I could expand my horizon through their kind invitations to conferences and universities. It was a fortunate encounter to meet José G. Clavel some 25 years ago at a conference in Barcelona, Spain (since then, he has been my technical advisor on programming and text editing; in fact, he kindly produced the three distance matrices for the three data sets used in this book). Then I met the superb scholars Eric J. Beh and Rosaria Lombardo through my struggle with publishing a paper on the theory of doubled multidimensional space and through my review of their book (Beh & Lombardo, 2014) in Psychometrika. The four of us decided to publish a book together in which the geometric theory of doubled space was a major part (Nishisato et al., 2021). Their friendship motivated me once again to write another book (Nishisato, 2022) and then the current book as well. Our friendship did not end with our joint book: they kindly accepted the editorship of a Festschrift in honor of my work, which is expected to be published by Springer on my 88th birthday in June 2023. My special thanks go to these three friends from Australia, Italy and Spain. Se-Kang Kim has also been a helpful correspondent and friend for many years of my research career. I would like to dedicate this book to my family members in Canada and Japan, and to the colleagues and friends who supported me for so many years. Thank you all!

Finally, for those working in quantification theory, I should mention my first encounter with the subject. It was Hayashi's 1950 paper. The first time I met him was in 1960 at a conference in Tokyo, Japan. However, it was not until late 1961, when I took a course with my mentor R. Darrell Bock, that I realized Hayashi's quantification theory and Bock's optimal scaling shared the same theoretical basis. It is interesting to note, however, that my mentor Bock actually supervised my Ph.D. thesis on cluster analysis, not on his optimal scaling. In those days cluster analysis was not well known, and my Ph.D. thesis (Nishisato, 1966) was based on Bock's idea of combining logistic regression and information theory into the clustering of binary variables. In retrospect, those were all new topics then, and I still admire Bock's ingenuity and insight into future research directions. My pursuit of optimal scaling changed into dual scaling (Nishisato, 1980), and now, after so many years, I am discovering again the monumental promise of the cluster analysis which I studied in the 1960s. It is not a sentimental journey, but a belief that cluster analysis may hold a key to the successful handling of hyperdimensional data structure, be it through dendrograms, tree structures, holographic images or something of future invention. I now look forward to seeing many young, talented researchers take a giant step toward the progress of quantification theory, in particular by developing efficient, dynamic and interactive techniques for illustrating multidimensional information in data for everyone's use, be it visual display or cluster analysis or something completely new to us. Quantification theory now needs vast advancement from the current foundation to the next higher stage for everyday use. I hope that the message in the current book, on a new strategy of two-stage analysis and doubled multidimensional geometry, may serve as a small first step toward monumental future progress of quantification theory. Geometric space theory is urgently needed for other data
types which were excluded from the current book. Also, inferential problems need to accompany the progress in multidimensional analysis of non-quantitative data. Most of all, I look forward to seeing satisfactory answers to my quandary on multidimensional analysis, as mentioned above.
References

Beh, E. J., & Simonetti, B. (2010). A few moments for non-symmetric correspondence analysis. In Proceedings of the European Symposium on Statistical Methods for the Food Industry (pp. 277–228), Benevento, Italy.
Beh, E. J., & Lombardo, R. (2014). Correspondence analysis: Theory, practice and new strategies. Wiley.
Beh, E. J., Lombardo, R., & Clavel, J. G. (2023). Analysis of categorical data from historical perspectives: A festschrift in honor of Shizuhiko Nishisato. Springer Nature.
Beh, E. J., Simonetti, B., & D'Ambra, L. (2007). Partitioning a non-symmetric measure of association for three-way contingency tables. Journal of Multivariate Analysis, 98, 1391–1411.
Clavel, J. G., Nishisato, S., et al. (2020). From joint graphical display to bi-modal clustering: [2] Dual space versus total space. In T. Imaizumi (Ed.), Advanced research in classification and data science. Springer.
D'Ambra, L., D'Ambra, A., & Sarnacchiaro, P. (2010). Visualizing main effects and interaction in multiple non-symmetric correspondence analysis. Journal of Applied Statistics, 30, 2165–2175.
D'Ambra, L., & Lauro, N. C. (1989). Non-symmetric correspondence analysis for three-way contingency table. In R. Coppi & S. Bolasco (Eds.), Multiway data analysis (pp. 301–315).
D'Ambra, L., & Lauro, N. C. (1992). Non-symmetric exploratory analysis. Statistica Applicata, 4, 511–529.
D'Ambra, L., & Lombardo, R. (1993). L'analisi non-simmetrica normalizzati degli alionente. Proceedings of the Conference Statchem, Venice, Italy.
Eouanzoui, K. B. (2004). On desensitizing data from interval to nominal measurement with minimum information loss. Ph.D. thesis, University of Toronto.
Gabriel, K. R. (1971). The biplot graphical display of matrices with applications to principal component analysis. Biometrika, 58, 453–467.
Gower, J. C., & Hand, D. J. (1996). Biplots. London: Chapman & Hall.
Greenacre, M. J. (2010). Biplots in practice. Fundación BBVA.
Hayashi, C. (1950). On the quantification of qualitative data from the mathematico-statistical point of view. Annals of the Institute of Statistical Mathematics, 2, 35–47.
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417–441 and 498–520.
Kim, S. K., & Frisby, C. L. (2019). Gains from discretization of continuous data: The correspondence analysis biplot approach. Behavior Research Methods, 51(2), 589–601.
Kim, S. K., McKay, D., & Tolin, D. (2022). Examining the generality and specificity of gender moderation in obsessive compulsive beliefs: Stacked prediction by correspondence analysis. British Journal of Clinical Psychology (pages unknown yet).
Kroonenberg, P. M. (2002). Analyzing dependence in large contingency tables: Non-symmetric correspondence analysis and regression with optimal scaling. In S. Nishisato, Y. Baba, H. Bozdogan, & K. Kanefuji (Eds.), Measurement and multivariate analysis (pp. 87–96). Springer.
Kroonenberg, P. M., & Lombardo, R. (1999a). Non-symmetric correspondence analysis: A tutorial. Kwantitatieve Methoden, 58, 57–83.
Kroonenberg, P. M., & Lombardo, R. (1999b). Non-symmetric correspondence analysis: A tool for analyzing contingency tables with a dependent structure. Multivariate Behavioral Research Journal, 34, 367–397.
Lauro, N. C., & D'Ambra, L. (1984). L'analyse non-symétrique des correspondances. In E. Diday (Ed.), Data analysis and informatics (pp. 433–446). Elsevier.
Lebart, L., & Mirkin, B. D. (1993). Correspondence analysis and classification. Multivariate Analysis: Future Directions, 2, 341–357.
Lombardo, R., & Kroonenberg, P. M. (1993). Non-symmetric correspondence analysis: Some examples. The International Statistical Institute Proceedings, 49th Session, Book 2 (pp. 127–128), Florence, Italy.
Lombardo, R., Beh, E. J., & D'Ambra, L. (2007). Non-symmetric correspondence analysis with ordinal variables. Computational Statistics and Data Analysis, 52, 566–577.
Lombardo, R., Carlier, A., & D'Ambra, L. (1996). Non-symmetric correspondence analysis for three-way contingency tables. Metodologica, 4, 59–80.
Lombardo, R., Kroonenberg, P. M., & D'Ambra, L. (2000). Non-symmetric correspondence analysis: A simple tool in market share distribution. Journal of the Italian Statistical Society, 3, 107–126.
Nishisato, S. (1966). Minimum entropy clustering of test items. Ph.D. thesis, University of North Carolina, Chapel Hill, NC (University Microfilms, Inc., Ann Arbor, Michigan).
Nishisato, S. (1971). Analysis of variance through optimal scaling. In Proceedings of the First Canadian Conference in Applied Statistics (pp. 306–316). Sir George Williams University Press.
Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. Mathematical Expositions No. 24. University of Toronto Press.
Nishisato, S. (1984a). Dual scaling by reciprocal medians. In Proceedings of the 32nd Scientific Conference of the Italian Statistical Society (pp. 141–147), Sorrento, Italy.
Nishisato, S. (1987). Robust techniques for quantifying categorical data. In I. B. MacNeill & G. J. Umphrey (Eds.), Foundations of statistical inference (pp. 209–217). D. Reidel Publishing Company.
Nishisato, S. (1988). Market segmentation by dual scaling through generalized forced classification. In W. Gaul & M. Schader (Eds.), Data, expert knowledge and decisions (pp. 268–278). Springer-Verlag.
Nishisato, S. (1991). Standardizing multidimensional space for dual scaling. In Proceedings of the 20th Annual Meeting of the German Operations Research Society (pp. 584–591), Hohenheim University.
Nishisato, S. (1994). Elements of dual scaling: An introduction to practical data analysis. Lawrence Erlbaum Associates.
Nishisato, S. (2007). Multidimensional nonlinear descriptive analysis. Chapman-Hall/CRC.
Nishisato, S. (2012). Quantification theory: Reminiscence and a step forward. In W. Gaul, A. Geyer-Schultz, L. Schmidt-Thieme, & J. Kunze (Eds.), Challenges at the interface of data analysis, computer science and optimization (pp. 109–119). Springer.
Nishisato, S. (2014). Structural representation of categorical data and cluster analysis through filters. In W. Gaul, A. Geyer-Schultz, Y. Baba, & A. Okada (Eds.), German-Japanese interchange of data analysis results (pp. 81–90). Springer.
Nishisato, S. (2019b). Expansion of contingency space: Theory of doubled multidimensional space and graphs. An invited talk at the Annual Meeting of the Japanese Classification Society, Tokyo (in Japanese).
Nishisato, S. (2022). Optimal quantification and symmetry. Springer.
Nishisato, S. (2023, in press). Propositions for quantification theory. In A. Okada et al. (Eds.), Facets of behaviormetrics. Behaviormetrics: Quantitative Approaches to Human Behavior 4 (series). Springer.
Nishisato, S., & Lawrence, D. R. (1989).
Dual scaling of multiway data matrices: Several variants. In R. Coppi & S. Bolasco (Eds.), Multiway data analysis (pp. 317–326). North Holland.
Nishisato, S., Beh, E. J., Lombardo, R., & Clavel, J. G. (2021). Modern quantification theory: Joint graphical display, biplots, and alternatives. Springer Nature.
Nishisato, S., et al. (2020). From joint graphical display to bi-modal clustering: [1] A giant leap in quantification theory. In T. Imaizumi (Ed.), Advanced research in classification and data science. Springer.
Nishisato, S. (1986). Generalized forced classification for quantifying categorical data. In E. Diday (Ed.), Data analysis and informatics (Vol. IV, pp. 351–362). Elsevier Science Publishers B.V., North Holland.
Nishisato, S. (1984). Forced classification: A simple application of a quantification technique. Psychometrika, 49, 25–36.
Nishisato, S. (1988). Forced classification procedure of dual scaling: Its mathematical properties. In H. H. Bock (Ed.), Classification and related methods (pp. 523–532). North Holland.
Nishisato, S. (1993). On quantifying different types of categorical data. Psychometrika, 58, 617–629.
Nishisato, S. (1996). Gleaning in the field of dual scaling. Psychometrika, 61, 559–599.
Nishisato, S. (2016). Dual scaling: Revisit to gleaning of the field. Theory and Applications of Data Analysis, 5, 1–9 (in Japanese).
Nishisato, S. (2019). Reminiscence: Quantification theory and graphs. Theory and Applications of Data Analysis, 8, 47–57 (in Japanese).
Nishisato, S., & Ahn, H. (1995). When not to analyze data: Decision making on missing responses in dual scaling. Annals of Operations Research, 55, 361–378.
Nishisato, S., & Baba, Y. (1999). On contingency, projection and forced classification of dual scaling. Behaviormetrika, 26, 207–219.
Nishisato, S., & Clavel, J. G. (2003). A note on between-set distances in dual scaling and correspondence analysis. Behaviormetrika, 30, 87–98.
Nishisato, S., & Gaul, W. (1989). Marketing data analysis by dual scaling. International Journal of Research in Marketing, 5, 151–170.
Nishisato, S., & Gaul, W. (1990). An approach to marketing data analysis: The forced classification procedure of dual scaling. Journal of Marketing Research, 27, 354–360.
Nishisato, S., & Yamauchi, H. (1974). Principal components of deviation scores and standardized scores. Japanese Psychological Research, 16, 162–170.
Siciliano, R., Mooijaart, A., & van der Heijden, P. G. M. (1993). A probabilistic model for non-symmetric correspondence analysis and prediction in contingency tables. Journal of the Italian Statistical Society, 2, 85–106.
Simonetti, B. (2008). Taxi-cab non-symmetric correspondence analysis. In L. D. Ciavolino, L. D'Ambra, M. Squillante, & G. Ghiani (Eds.), Methods and models of information technologies for decision support systems (pp. 257–260). University of Sorrento.
Simonetti, B., & Gallo, M. (2002). Alternative interpretations to the non-symmetrical correspondence analysis. Caribbean Journal of Mathematical and Computing Sciences, 12, 18–22.
Simonetti, B., & Lucadamo, A. (2013). Taxi-cab non-symmetrical correspondence analysis for the evaluation of passenger satisfaction. Advanced Dynamic Modeling of Economic and Social Systems, Studies in Computational Intelligence, 448, 175–184.
Takane, Y., & Jung, S. (2009a). Regularized nonsymmetric correspondence analysis. Computational Statistics and Data Analysis, 53, 3159–3170.
Takane, Y., & Jung, S. (2009b). Tests of ignoring and eliminating in nonsymmetric correspondence analysis. Advances in Data Analysis and Classification, 3, 315–340.
Verboon, P. (1994). A robust approach to nonlinear multivariate analysis. Leiden: DSWO Press.
Verde, R., & Palumbo, F. (1996). Analisi fattoriale discriminante non-simmetrica su predittori qualitativi. Proceedings of the XXXVIII Conference of the Italian Statistical Society.
Verde, R. (1992). Nonsymmetrical correspondence analysis: A nonlinear approach. Statistica Applicata, 4, 453–463.
Williams, P. M., & Galindo Villardon, M. P. (2008). Canonical non-symmetrical correspondence analysis: An alternative to constrained ordination. SORT, 32, 93–112.