Optimal Quantification and Symmetry 981169169X, 9789811691690

This book offers a unique new look at the familiar quantification theory from the point of view of mathematical symmetry.


English Pages 198 [199] Year 2022


Table of contents:
Preface
Acknowledgments
Contents
Part I Theory with Examples
1 Optimality and Symmetry
1.1 Symmetry and Graph
1.2 Optimal Quantification
References
2 Examples of Quantification
2.1 Kretschmer's Typology
2.1.1 Ordinary Analysis and Results
2.1.2 Quantification Analysis and Results
2.2 Joint Graphical Display
2.3 Singapore Data
2.3.1 Some Consideration for Ordinary Analysis
2.3.2 Quantification Analysis and Results
2.4 Graphical Display
2.5 Examples of Other Types of Graphs
2.5.1 Sorting Countries: Sorting Data
2.5.2 Seriousness of Criminal Acts: Successive Categories Data
2.5.3 Mothers' and Children's Wishes for Professions
2.5.4 Christmas Party Plans
2.5.5 Attractions of Hot Springs: Rank-Order Data
References
3 Constraints on Quantification
3.1 What Data Should We Quantify?
3.2 Some More Observations
3.3 Data in Terms of Unknown Numbers
3.4 Quantification Under Constraints
References
4 Quantification Procedures
4.1 Historical Background
4.2 Strategies
4.2.1 Quantification Through Correlation
4.2.2 Quantification Through Correlation Ratio
4.2.3 Quantification Through Cronbach's Alpha
4.2.4 Method of Reciprocal Averages: MRA
4.3 Optimal Symmetric Properties
4.4 Bilinear Expansion and Graphical Display
References
5 Mathematical Symmetry
5.1 Bi-modal Symmetry
5.1.1 Correlation
5.1.2 Correlation Ratio
5.2 Multi-modal Symmetry
5.2.1 Piecewise Method of Reciprocal Averages
5.2.2 Generalization to n Variables
References
6 Data Format and Information
6.1 Two Formats of Same Data
6.2 Further Comparisons of Data Formats
6.3 Numerical Illustration
6.3.1 Kretschmer's Typology Revisited
References
7 Space Theory and Symmetry
7.1 Spatial Symmetry
7.2 Theory of Quantification Space
7.2.1 Contingency Space
7.2.2 Dual Space: Symmetric Space
7.2.3 Pairwise Dual Subspaces
7.2.4 Total Space
7.2.5 Residual Space
7.3 Example of Space Decomposition
7.4 Recommendations
References
8 Graphical Display
8.1 Graphical Display of Rows or Columns
8.1.1 Blood Pressures and Migraines
8.2 Joint Graph: Correspondence Plot
8.3 Logically Correct Graph and Discrepancy Diagram
8.3.1 Graphs of Response-Pattern Format
8.4 Re-evaluating Correspondence Plot
8.4.1 Alternatives to Joint Graphical Display
References
Part II Gleaning in the Field
9 Forced Classification
9.1 Procedure of Forced Classification
9.1.1 Criterion-Total Correlation
9.1.2 Criterion Items Correlation
9.1.3 Partitioning of Total Space
9.1.4 Contributions of Individual Components
9.1.5 Legitimacy of Set by Set Analysis
9.1.6 An Example of Application
9.1.7 Graph in Criterion-Item Space
9.2 Generalized Forced Classification
References
10 Data with Designed Structure
10.1 Analysis of Variance of Nominal Data
10.1.1 Maximizing the Effects of α, β and γ
10.2 Quantification of Multi-way Analysis of Data
References
11 Quantifying Dominance Data
11.1 Dominance Data
11.1.1 Quantification Approaches
11.1.2 Quantification
11.1.3 Total Information
11.2 Example: Ranking of Municipal Services
11.3 Paired Comparison Data
11.3.1 Example: Wiggins' Christmas Party Plans
11.3.2 Example: Seriousness of Criminal Acts
11.3.3 Goodness of Fit
11.4 Forced Classification of Ordinal Data
11.4.1 Rank-Order and Paired Comparison Data
References
Part III Cautions for Quantification
12 Over-Quantification
12.1 Adverse Conditions of Data
12.1.1 Future of English in Hong Kong: Tung's Data
12.2 Standardized Quantification
12.2.1 Option Standardization
12.2.2 Results of Standardization
12.3 Handling Outlier Responses
12.3.1 The Method of Reciprocal Medians: MRM
12.3.2 Alternating Reciprocal Averaging and Reciprocal Medians
12.3.3 Method of Trimmed Reciprocal Averages
References
13 When Not to Analyze Data
13.1 Missing Responses and Quantification
13.2 Some Procedures
13.2.1 List-Wise Deletions
13.2.2 Extra Categories
13.2.3 Imputation
13.3 Imputation Principles
13.3.1 Principle of Maximal Internal Consistency
13.3.2 Hot-Deck Principle
13.3.3 Principle of Complete Ignorance
13.4 Decision Rules: When Not to Analyze
13.5 Towards a State-of-the-Art Framework
References
14 Epilogue
14.1 Reminiscence
14.1.1 John C. Gower
14.1.2 Jean-Paul Benzécri
14.2 Going Forward
References
Part IV Appendices
15 Stevens' Measurement Theory
15.1 Four Kinds of Measurement
15.1.1 Nominal Measurement
15.1.2 Ordinal Measurement
15.1.3 Interval Measurement
15.1.4 Ratio Measurement
15.2 Domains of Quantification
15.2.1 Full-Fledged Domain
15.2.2 Quasi-Domain
15.2.3 Outside Domain
References
16 A Numerical Example of MRA
16.1 Computing Optimal Component
16.2 Extracting More Components
References

Behaviormetrics: Quantitative Approaches to Human Behavior 12

Shizuhiko Nishisato

Optimal Quantification and Symmetry

Behaviormetrics: Quantitative Approaches to Human Behavior Volume 12

Series Editor Akinori Okada, Professor Emeritus, Rikkyo University, Tokyo, Japan

This series covers in their entirety the elements of behaviormetrics, a term that encompasses all quantitative approaches of research to disclose and understand human behavior in the broadest sense. The term includes the concept, theory, model, algorithm, method, and application of quantitative approaches, from theoretical or conceptual studies to empirical or practical application studies to comprehend human behavior. The Behaviormetrics series deals with a wide range of topics of data analysis and of developing new models, algorithms, and methods to analyze these data.

The characteristics featured in the series have four aspects. The first is the variety of the methods utilized in data analysis and newly developed methods, including not only standard or general statistical methods and psychometric methods traditionally used in data analysis, but also cluster analysis, multidimensional scaling, machine learning, correspondence analysis, biplot, network analysis and graph theory, conjoint measurement, biclustering, visualization, and data and web mining. The second aspect is the variety of types of data, including ranking, categorical, preference, functional, angle, contextual, nominal, multi-mode multi-way, continuous, discrete, high-dimensional, and sparse data. The third comprises the varied procedures by which the data are collected: by survey, experiment, sensor devices, purchase records, and other means. The fourth aspect of the Behaviormetrics series is the diversity of fields from which the data are derived, including marketing and consumer behavior, sociology, psychology, education, archaeology, medicine, economics, political and policy science, cognitive science, public administration, pharmacy, engineering, urban planning, agriculture and forestry science, and brain science. In essence, the purpose of this series is to describe the new horizons opening up in behaviormetrics — approaches to understanding and disclosing human behaviors both in the analyses of diverse data by a wide range of methods and in the development of new methods to analyze these data.

Editor in Chief: Akinori Okada (Rikkyo University)
Managing Editors: Daniel Baier (University of Bayreuth), Giuseppe Bove (Roma Tre University), Takahiro Hoshino (Keio University)

More information about this series at https://link.springer.com/bookseries/16001

Shizuhiko Nishisato

Optimal Quantification and Symmetry

Shizuhiko Nishisato University of Toronto Toronto, ON, Canada

ISSN 2524-4027 ISSN 2524-4035 (electronic) Behaviormetrics: Quantitative Approaches to Human Behavior ISBN 978-981-16-9169-0 ISBN 978-981-16-9170-6 (eBook) https://doi.org/10.1007/978-981-16-9170-6 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Dedicated to:
My wife Lorraine, son Ira, Samantha Dugas, and grandson Lincoln Dugas-Nishisato in Canada, and my sister Michiko Soma and brother Akihiko Nishisato in Japan, for their many years of constant and total support in my personal and professional life!

Preface

The major role of quantification theory is to transform qualitative data into quantitative data so that our familiar arithmetic operations can be used to analyze the data. The desideratum of this transformation is to introduce some optimal characteristics into the transforms so that we may grasp the complex information contained in the original data in the most effective way. Quantification theory has existed for roughly 100 years, and many researchers have been involved in its development. During these years, a large number of books and research papers have been published on the topic. Therefore, some readers may wonder: why this book now? The main reason is to present a new perspective on quantification theory in terms of “symmetry,” which appears throughout its mathematics and geometry, and most importantly in the context of symmetry combined with optimization. “Symmetry” appears in mathematics, biology, the arts and our everyday life, everywhere carrying a sense of the desirable, the fine and the supreme. Indeed, symmetry is also a paramount underlying principle of quantification theory. This new look will help dissolve some of the ice surrounding the theory and hopefully bring its attractiveness closer to the readers. Rest assured that symmetry and optimization will indeed play the role of an ironclad framework for multidimensional quantification.

In the current book, we will start with the reason why optimal quantification should be pursued, using examples. Then we discuss input data for quantification and the objectives of quantification. This is to make sure what kinds of data are appropriate for quantification and what the goal of quantification is. We will then proceed from simple steps of quantification to handling complex problems, step by step. This book is geared to typical university courses on data analysis. With enough mathematics and numerical examples, the current book should attract the attention not only of eager students but also of many seasoned researchers.

Recently I read a book written by my university classmate, Mr. Hideshi Seki, a noted historian in Japan, who published a book on the pioneer days of Hokkaido, Japan, based on his series of lectures. Through a clever arrangement of the questions from his students, the entire book became an excellent introduction to the history. I was so impressed with the style of Mr. Seki’s book, which even an outsider could easily grasp and appreciate.


His book motivated me to write this book, immediately after finishing a 2021 Springer book with three other friends, mentioned in the next paragraph. During the past few years, I had felt uneasy about one topic of quantification, joint graphical display. Thus, this topic became the central issue of my contribution to Nishisato, Beh, Lombardo and Clavel (2021). When that book was completed, however, Mr. Seki’s book made me wonder if I had explained quantification theory clearly enough to the readers. This prompted me to write the current book. To worry about a little thing like this may be a sign of senior moments, but I still hope that some readers will find the current book interesting, useful and appealing enough to read.

The book is not a sequel to Nishisato, Beh, Lombardo and Clavel’s Modern Quantification Theory: Joint Graphical Display and Alternatives (Singapore: Springer Nature, 2021). Compared with that book, the current book is introductory, but the object is to further elucidate a very basic symmetric framework that flourished in the early days of quantification theory. As such, I hope the book will serve well as a companion to the book by Nishisato, Beh, Lombardo and Clavel. The topics were arranged so that the entire spectrum of quantification theory is covered in an understandable way. May the book serve as a valuable addition to the quantification theory literature.

Finally, I have to apologize to the readers about two personal matters. First, since my 1960 paper on factor analysis, time has passed rather quickly. In writing a book based on my chosen career, it is natural to cite my own work perhaps too often. Should I appear terribly conceited, or should you feel slighted by my not referring to your work equally extensively, please accept my sincere apologies. Second, I took the liberty of acknowledging the friendship of many friends, including those from my childhood. This may be unusual, but please excuse me, since it was due to my selfish and sentimental urge to do so.

Toronto, Canada
November 2021

Shizuhiko Nishisato Professor Emeritus

Acknowledgments

The author wishes to acknowledge the generous support of Dr. Akinori Okada, the Editor of Springer Behaviormetrics Series, Mr. Yutaka Hirachi of Springer Japan and the Editorial Staff of Springer Nature Singapore, all of whom have offered their kind and utmost support to make this publication possible. My heartfelt thanks go to my true mentor the late Prof. R. Darrell Bock at the Psychometric Laboratory of the University of North Carolina in Chapel Hill, and to the Fulbright Commission for making it possible for me to study in the USA. I also owe a great deal of gratitude to the following friends: McGill University, Montreal Albert Bregman, Don Donderi, Michael Corballis, James O. Ramsay, Richard Tucker, Charles Crawford and Roger Blackman. Ontario Institute for Studies in Education, University of Toronto (Colleagues): Ross E. Traub, Richard G. Wolfe, Gila Hanna, Ruth Childs, Susan Elgie, David Abbey, Phillip Nagy and Tony Lam; (Visiting scholars): Yasumasa Baba, José G. Clavel, Takayuki Saito, Fumiyasu Yamada, Hirotsugu Yamauchi, Hideo Tsujimoto, Cheong-Ho Baek and Honesto Herrera; (Former students): Wenn-Jen Sheu, Kuo-Sing Leong, Wan-Pui Poon (Sheu), Yukio Inukai, Yukihiko Torii, Daniel R. Lawrence, Maria Svoboda, Ten-Poh Lim, John Sachs, Mary Kolic, Hyung Ahn (Kim), David Hemsworth, Ian Wiggins, Peter Tung, Deborah Day, Diana Chan, Stuart Halpine, Gwenyth Boodoo, Mark Gesarroli, Charles Mayenga, Maurice Odondi, Liqun Xu, Oscar Millones, Peter Lewicky, Luis Moreno and Keanre B. Eouanzoui. Research Yoshio Takane, Shigeo Kashiwagi, Hiroshi Ikeda, Jan de Leeuw, Koji Kanefuji, Tadashi Imaizumi, Shigeo Tatsuki, Yutaka Tanaka, Noboru Ohsumi, Ryozo Yoshino, Eric J. Beh, Rosaria Lombardo, Se-Kang Kim, Lawrence Hubert, Norman Cliff, Wolfgang Gaul, Hans-Hermann Bock, Michael J. Greenacre, Ludovic Lebart, Michel Tenenhaus, Gilbert Saporta, Willem Heiser, Jaqueline Meulman, Peter van der
Heijden, Natale Lauro, Hamparsum Bozdogan, Ian Spence, William Day, Edwin Diday, Philip Weingarden, Chun-Ho Chen, Boris Mirkin, Serghei Adamov, Jean Moreau, Pierre A. Doudin, Pierre Cazes, David Kaplan, Brigette Le Roux, Vihar Zudravkov, Hans-Joachim Mucha, Ingo Böckenholt, Ulf Böckenholt, Glen Milligan, Alain Morineu, Eecke van der Burg, Ineke Stoop, Pieter Kroonenberg, Henk A. L. Kiers, Patrick Groenen, Albert Satorra, Rosario Martínez Arias, Michel van de Velden, Ana Torres-Lacomba, Daniel Baier, Rheinhold Decker, Jörge Blasius, Vartan Choulakian, Helmut Vorkauf, Giuseppe Bove, Werner Wothke, Graham Bean, Elliot Noma, Takashi Murakami, Shuichi Iwatsubo, Setsuko Takakura, Yasuharu Okamoto, Kumiko Maruyama, Fumi Hayashi, Masahiro Mizuta, Yutaka Kano, Hiroshi Yadohisa, Kohei Adachi, Takefumi Ueno, Philip Yu, Peter Bentler, Heungsun Hwang, Mark de Rooij, Terry Ackerman, Brian Junker, Mark Wilson, Wayne de Sarbo, David Thissen, Robert Cudeck, Patric Curran, Maurizio Vichi, L. Andries van der Ark, Kazuo Shigemasu, Naohito Chino, Maomi Ueno, Hisashi Kamisasa, Hiroshi Akuto, Tomokazu Haebara, Shin-ichi Maekawa, Kentaro Hayashi, Tatsuo Otsu, Koji Kurihara, Takemi Yanagimoto, Tomoyuki Tarumi, Takashi Nakamura, Takahiro Hoshino, Tadahiko Maeda, Atsuho Nakayama, Nobuo Shimizu, Hisao Miyano, Yuichi Mori, Satoru Yokoyama, Tamaki Yano, Koji Kosugi and Natsumi Wakamoto. Childhood (War years in Urahoro, Hokkaido, Japan) Suketoshi Iiyama, Minoru Shimosaka, Toshitaka Tago, Ryoko Miura, Tsuyako Ikehata, Shizue Hashieda, Yasuko Miura, Etsuko Kawabata, Toshiko Usui and Mitsugu Takabatake. High Schools and Universities (In Sapporo, Japan) Takashi Asano (California, USA, a recipient of the Stockholm Water Prize), Satoshi Kon, Tadashi Yamada, Hiroshi Nakamura, Tetsuro Kokubu, Yutaka Komuro, Keiko Niizuma, Akiyoshi Tanaka, Chieko Azuma, Mamoru Homma, Tamako Ueda, Hirotsugu Kamakura, Ko Kuwata, Satoshi Murayama, Hideshi Seki, Miyuki Hasuike, Reiko Kawanishi, Hiroshi Oikawa, Osamu Shirahata, Michio Takada, Masahiro Kibitsu, Kouichi Murakami, Hirozumi Shibuya, Ryuichi Shibuya, Shuzo Nagai, Miyuki Nakamura, Shozo Yokoyama, Yuzo Matsuda, Akira Fujii, Akifumi Fujimoto, Reiko Komatsu, Noriko Okifuji, Kikuko Naruse, Takashi Kitano, Shigeo Ueda, Shinsuke Muto, Sachiko Teraoka, Reiko Akatsuka and Tsuyoshi Hirata; (In Chapel Hill, N.C., USA) Elizabeth Abbe, Amnon Rapoport, David M. Messick, Larry Gordon, Steve Zyzanski, Nancy Cole, James Kahn, Jun-ichi and Mutsumi Nakahara, Shigemichi and Takako Suzuki, Hidesaburo Kusama, Christopher Ringwalt family and Nobuharu and Kazuko Okuno. Friends Setsuko Thurlow (Toronto, Canada, a recipient of the Nobel Peace Prize), André and Gillian Dugas, Clayton and Claire Ford, Akira Ozaki, Ken Takeda, Hiroshi Imada, Akira and Keiko Kobashigawa, Masao Nakanishi, Akihiro Yagi, Ken-ichi Narita, Tsneo Shimazaki, Hiroshige Okaichi, Masao Yogo, Ayumi Tanaka, Naoto Suzuki, Suguru Sato, Ichiro Uchiyama, Takuma Takehara, Kenjiro Aoyama, Frances
Heppolette family, Isao and Michio Tomita family, Hiroko Inoue family, Hiroshi Iwaki, Jun-ichi Abe, Yukinobu Sawada, Kayo Nakata, Masayuki Jimichi, Nobukazu Tanii, Kiyoharu Doi, Kanako Otsui, Katsunosuke Namita family, Keiko Nakao, Mika Yamai and all friends in the Riverhouse at the Old Mill, the Toronto Japanese Family Services, Toronto Hokkaido-jin Kai and the Japanese Canadian Cultural Centre. Thank you all for your friendship throughout my life.


Part I

Theory with Examples

Chapter 1

Optimality and Symmetry

1.1 Symmetry and Graph

Generally speaking, it is believed that the concept of symmetry can be traced back at least to Pythagorean times, and a variety of theories of symmetry flourished in recent centuries. The concept of symmetry appears in all aspects of our daily life, such as in art, geometry, mathematics, biology, chemistry, physics, architecture and cosmology. Depending on the area of application, widely different definitions of symmetry exist, some of which are far from our common-sense notion of symmetry. Even so, our common sense tells us that symmetry is associated with beauty, balance, beautiful projection, equilibrium, harmony, stability, simplicity, superiority and good taste. We hope that our quantification theory will prove to be a research strategy of good taste.

Studying symmetry across diverse areas is extremely difficult because of the immense theoretical range of its definitions and the huge variety of symmetries; some definitions are far from our ordinary common-sense ones. Although they may be too technical, theoretical or mathematical, interested readers are referred to the monumental Symmetry in Science and Art by Shubnikov and Kopsik (1974), translated from the Russian, and the beautiful mathematical treatise Geometry and Symmetry by Yale (1968).

In the current book, we will see the concept of symmetry as a strong backbone of quantification theory. More specifically, in our work symmetry appears in the centroid method (in factor analysis), the wide applications of the concept of projection (in mathematical estimation), the bi-modal expansion (in mathematics), the eventual limit of a mathematical series (in the quantification theory of reciprocal averaging), principal-coordinate space, dual space (in geometry) and the eigenequations (in multivariate analysis). As such, symmetry is everywhere surrounding data analysis, and it is associated with optimality. As a simple and familiar example, consider canonical reduction, which we can explain using the following example. Suppose that we are given the quadratic function

$$5x^2 + 8xy + 5y^2 = 9$$


Then, canonical reduction transforms this function into one without the cross-product term (i.e., $8xy$). For this example, the canonical form of the above function is given by

$$9x^2 + y^2 = 9 \tag{1.1}$$

To derive this form, let us express the first quadratic function in matrix form as follows:

$$\begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = 9$$

At this stage, we introduce a function called the characteristic function, or more popularly the eigenequation, which is given by

$$\left| \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right| = \begin{vmatrix} 5-\lambda & 4 \\ 4 & 5-\lambda \end{vmatrix} = (5-\lambda)^2 - 16 = (\lambda - 9)(\lambda - 1) = 0$$

Therefore, we obtain the two unknowns, called eigenvalues, as $\lambda_1 = 9$ and $\lambda_2 = 1$, which leads to the earlier expression (1.1). The above transformation is called canonical reduction, but it may not tell us much until we graph the two equations, as shown in Fig. 1.1. Canonical reduction amounts to transforming a function that is tilted in the original axes $(X, Y)$ into a symmetric form in new axes $(X^*, Y^*)$. In terms of the original axes the function is tilted, while in the canonical form the quadratic function is symmetric with respect to the new axes $(X^*, Y^*)$, which are called principal axes.

Through our mathematics, we have just introduced the canonical reduction of a quadratic function. This concept is related to many formulas used in statistics, such as the eigenequation and the singular-value decomposition, which have served as a backbone of multivariate analysis since as early as the 1870s (e.g., Beltrami 1873; Jordan 1874) and are related to such familiar topics as principal component analysis (Pearson 1901; Hotelling 1933) and the Eckart-Young decomposition theorem (Eckart and Young 1936). In particular, principal component analysis is known as a technique to find principal axes. The projections of data points onto those axes are called principal coordinates, and it is well known that principal component analysis provides the most economical way of describing the data. It can also be stated as a method to project the data onto the space where the variance of the data attains its maximum (i.e., the maximum information). In this way, principal component analysis identifies a set of orthogonal coordinates onto which the distribution of data is symmetric with the maximum variance. Quantification theory is known to be the principal component analysis of categorical data (Torgerson 1958).
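To make the eigenequation above concrete, here is a minimal numerical check. This is only an illustrative sketch, not code from the book (which prescribes no software); it assumes NumPy is available.

```python
# Verify the canonical reduction of 5x^2 + 8xy + 5y^2 = 9 numerically.
import numpy as np

A = np.array([[5.0, 4.0],
              [4.0, 5.0]])   # matrix of the quadratic form

# eigh is appropriate for symmetric matrices; eigenvalues come back ascending.
eigenvalues, eigenvectors = np.linalg.eigh(A)
print(eigenvalues)    # [1. 9.]  ->  lambda_1 = 9, lambda_2 = 1
print(eigenvectors)   # columns are the principal axes X*, Y* (a 45-degree rotation)
```

In the rotated coordinates the cross-product term vanishes, which is exactly the canonical form $9x^{*2} + y^{*2} = 9$ of (1.1).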


Fig. 1.1 Canonical reduction of a quadratic equation

We are also familiar with least-squares estimation in statistics: the least-squares estimator is obtained by the projection of the data onto the model space. The concept of projection operators appears in many branches of data analysis, and they are a tool to arrive at the balanced configuration of data clouds, such as principal coordinates in Euclidean space. The above simple demonstration of canonical reduction of a quadratic function can therefore be regarded as the origin of optimal data analysis, as we will see in quantification theory.

Let us consider another example, a six-item questionnaire:

Q1: Your blood pressure? (1 = low; 2 = normal; 3 = high).
Q2: Migraine? (1 = none; 2 = occasionally; 3 = often).
Q3: Age? (1 = 20–34; 2 = 35–49; 3 = 50–65).
Q4: Anxiety? (1 = low; 2 = medium; 3 = high).
Q5: Weight? (1 = light; 2 = average; 3 = heavy).
Q6: Height? (1 = short; 2 = average; 3 = tall).

The data in Table 1.1 are artificial, created to illustrate quantification theory. Without any explanation, we will simply show an example in which we project the data clouds onto the space of one variable, Age (Fig. 1.2). This is a different kind of symmetry, showing that the data can be optimally interpreted in relation to a chosen variable. See the two-dimensional triangle of age (young, middle age, old) in Fig. 1.2, in terms of which we can see, for example, that old age is almost equal to high blood pressure, middle age is almost equal to low anxiety, and so on, showing the proximity of the other variables to the three age groups. This is an example in which the concept of projection is used to define the base of analysis (the age space), and it is combined with symmetric analysis, which we will explore later. Although it may not be obvious, this configuration satisfies a number of criteria, including symmetry and optimality. Later we will see the discussion of dual (symmetric) space, where quantification analysis is centered.


Table 1.1 Data on six multiple-choice questions

Subject   Item 1  Item 2  Item 3  Item 4  Item 5  Item 6
1              1       3       3       3       1       1
2              1       3       1       3       2       2
3              3       3       3       3       1       3
4              3       3       3       3       1       1
5              2       1       2       2       3       2
6              2       1       2       3       3       1
7              2       2       2       1       1       3
8              1       3       1       3       1       3
9              2       2       2       1       1       2
10             1       3       2       2       1       3
11             2       1       1       3       2       2
12             2       2       3       3       2       2
13             3       3       3       3       3       1
14             1       3       1       2       1       1
15             3       3       3       3       1       2

Fig. 1.2 Projection of data onto “age” space



1.2 Optimal Quantification

The symmetric framework for optimal quantification may have escaped the attention of many researchers, but it is vitally important for quantification theory. This exposition is introductory, but its main aim is to illustrate quantification theory with numerical examples, which we hope will enhance the readability of the book. As mentioned in the preface, quantification theory has been called by many names, and some of them (e.g., optimal scaling, homogeneity analysis) clearly suggest that quantification involves optimal linear and nonlinear transformations of data to arrive at the best description of qualitative data. There is also a name (e.g., dual scaling) which suggests that symmetry is the property behind optimality. The concepts of optimization and symmetric computation appear in tandem in our discussion of quantification theory. This approach may not have attracted much attention in the past, but it is the author’s view that the combination of the two concepts is indispensable in order to advance one more step in quantification theory. In this step, we will also discuss spatial symmetry, which tells us the boundary of quantification theory in geometric terms.

In the current book, therefore, we take this unique look at these concepts as the backbone of quantification theory, and we will see a reasonable boundary of quantification space, a topic which has never been clearly delineated. Under the current scheme of optimal quantification with symmetry, we will arrive at a solution to the perennial problem of joint graphical display. Such a solution is obtained in conjunction with the use of symmetric space, called dual space (Nishisato 2019; Nishisato et al. 2021). We will explore how these concepts, particularly optimality and symmetry, are used in quantification theory. When we discuss the mathematics of the theory, optimality surfaces from time to time; when we consider graphical display of quantified results, we are led to spatial symmetry, a key concept for the joint graphical display we use for summarizing quantification outcomes. From Chap. 2 onward, we will introduce quantification theory with symmetry and optimality as our goal. This combination will be shown to be instrumental in defining the multidimensional space that is unique to quantification theory.

References

Beltrami, E. (1873). Sulle funzioni bilineari (On the bilinear functions). In G. Battagline & E. Fergola (Eds.), Giornale di Matematiche, 11, 98–106.
Benzécri, J. P., et al. (1973). L’analyse des données: II. L’analyse des correspondances. Paris: Dunod.
Bock, R. D. (1960). Methods and applications of optimal scaling. The University of North Carolina Psychometric Laboratory Research Memorandum, No. 25.
de Leeuw, J. (1984). Canonical analysis of categorical data. Leiden University: DSWO Press.
Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1, 211–218.


Gifi, A. (1990). Nonlinear multivariate analysis. New York: Wiley.
Greenacre, M. J. (1984). Theory and applications of correspondence analysis. London: Academic Press.
Guttman, L. (1941). The quantification of a class of attributes: A theory and method of scale construction. In The Committee on Social Adjustment (Ed.), The Prediction of Personal Adjustment (pp. 319–348). New York: Social Science Research Council.
Hayashi, C. (1950). On the quantification of qualitative data from the mathematico-statistical point of view. Annals of the Institute of Statistical Mathematics, 2, 35–47.
Hill, M. O. (1974). Correspondence analysis: A neglected multivariate method. Journal of the Royal Statistical Society C (Applied Statistics), 23, 340–354.
Hirschfeld, H. O. (1935). A connection between correlation and contingency. Cambridge Philosophical Society Proceedings, 31, 520–524.
Horst, P. (1935). Measuring complex attitudes. Journal of Social Psychology, 6, 369–374.
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417–441 and 498–520.
Jordan, C. (1874). Mémoire sur les formes bilinéaires (Note on bilinear forms). Journal de Mathématiques Pures et Appliquées, deuxième série, 19, 35–54.
Le Roux, B., & Rouanet, H. (2004). Geometric data analysis: From correspondence analysis to structural data analysis. Dordrecht, the Netherlands: Kluwer Academic Publishers.
Lebart, L., Morineau, A., & Warwick, K. M. (1984). Multivariate descriptive statistical analysis. New York: Wiley.
Meulman, J. (1982). Homogeneity Analysis of Incomplete Data. Leiden University: DSWO Press.
Nishisato, S. (1976). Optimal scaling. A talk at the Symposium on Optimal Scaling, organized by F. Young at the 1976 Annual Meeting of the Psychometric Society at Murray Hill, N.J., where “dual scaling” was proposed for optimal scaling by Nishisato.
Nishisato, S. (1980). Analysis of Categorical Data: Dual Scaling and Its Applications. Toronto: The University of Toronto Press.
Nishisato, S. (1996). Gleaning in the field of dual scaling. Psychometrika, 61, 559–599.
Nishisato, S. (2007). Multidimensional nonlinear descriptive analysis. London: Chapman & Hall/CRC.
Nishisato, S. (2019). Reminiscence: Quantification theory and graphs. Theory and Applications of Data Analysis, 8, 47–57. (in Japanese).
Nishisato, S., Beh, E. J., Lombardo, R., & Clavel, J. G. (2021). Modern Quantification Theory: Joint Graphical Display, Biplots, and Alternatives. Singapore: Springer Nature.
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine and Journal of Science, Series 6, 2, 559–572.
Ramensky, L. G. (1930). Zur Methodik der vergleichenden Bearbeitung und Ordnung von Pflanzenlisten und anderen Objekten, die durch mehrere, verschiedenartig wirkende Faktoren bestimmt werden. Beiträge zur Biologie der Pflanzen, 18, 269–304.
Richardson, M., & Kuder, G. F. (1933). Making a rating scale that measures. Personnel Journal, 12, 36–40.
Shubnikov, A. V., & Kopsik, V. A. (1974). Symmetry in Science and Art (translated from the Russian by Archard, G. D.; edited by Harker, D.). New York: Plenum Press.
Torgerson, W. S. (1958). Theory and methods of scaling. New York: Wiley.
Whittaker, R. H. (1978). Ordination of plant communities. The Hague: Junk.
Yale, P. B. (1968). Geometry and symmetry. San Francisco: Holden-Day.

Chapter 2

Examples of Quantification

2.1 Kretschmer’s Typology

Kretschmer was a German psychiatrist who considered that mental types and body types are correlated, and proposed his typology. This data set is from Kretschmer (1925); it is a 3 × 5 contingency table of counts of patients. The three rows of the table represent mental types (manic-depressive, schizophrenic and epileptic) and the five columns are body types (pyknic, leptosomatic, athletic, dysplastic and others). The body types are:

Pyknic body type: Short, thickset and stocky. Leptosomatic body type: Frail, long-limbed and narrow-chested. Athletic body type: Muscular, well-proportioned and broad-shouldered. Dysplastic body type: disproportionate physique.

Kretschmer’s typology is based on his theory that these body types are related to the three mental types: manic-depressive type, schizophrenic type and epileptic type. Table 2.1 shows joint counts of the combinations of the mental types and the body types. Let us use the following notation: m = the total number of rows of the table (in the current example, m = 3). n = the total number of columns of table (in the current example, n = 5). f i j is the element of the ith row and the jth column of the table (e.g., f 11 = 879, f 13 = 91, f 21 = 717). f i. is the marginal (sum) of row i (i.e., f 1.0=1360 ). f . j is the marginal of column j (i.e., f 0.2 = 3271). f t is the total of all responses, that is, 8098 patients in our example!


Table 2.1 Kretschmer’s typology data

                   Pyknic  Leptosomatic  Athletic  Dysplastic  Others  Total
Manic-depressive      879           261        91          15     114   1360
Schizophrenic         717          2632       884         549     450   5232
Epileptic              83           378       435         444     166   1506
Total                1679          3271      1410        1008     730   8098

Table 2.2 Kretschmer’s data under statistical independence

                   Pyknic  Leptosomatic  Athletic  Dysplastic  Others
Manic-depressive    282.0         549.3     236.8       169.3   122.6
Schizophrenic      1084.8        2113.3     911.0       651.3   471.6
Epileptic           312.2         608.3     262.2       187.5   135.8


χ2 =

 ( f i j − ei j )2 ft

To evaluate this statistic, we need a parameter called the degrees of freedom (df), which is given by $(m - 1)(n - 1)$. There may be other ways of evaluating the data, but we will follow this most widely used method for evaluating the association between the mental types and the body types, using the chi-square statistic. For our data, we obtain

$$\chi^2 = 2643.40, \qquad df = (3 - 1)(5 - 1) = 8$$

To evaluate this outcome, we can refer to a statistical table of $\chi^2$, which can be found in any textbook of statistics. According to the table of the $\chi^2$ distribution, our chi-square value is significant at the 0.05 level. This means that we cannot consider the obtained chi-square value to be a random sample from a population in which the rows and the columns are statistically independent. In other words, the association between the rows (mental types) and the columns (body types) is statistically significant; hence our conclusion is: mental types are significantly related to body types.

This is the conclusion from the ordinary analysis. If we further exercise our common-sense judgment, we can say that the “manic-depressive” mental type is likely to be of “pyknic” body type. Similarly, the schizophrenic mental type has a strong association with the “leptosomatic” body type, and the “epileptic” mental type is related to the “athletic” and “dysplastic” body types. These observations are based on the joint frequencies in Table 2.1.

The above observation is a typical report on this chi-square analysis. It is simple and easy to interpret, which is wonderful. But we should ask whether that is all we can say, irrespective of the size of the differences between the observed data and those expected under statistical independence. In other words, are we just interested in whether the differences are large enough for the chi-square statistic to be significant? Does it not matter whether the difference just clears the significance level or is huge? We wonder why we do not consider the magnitude of the discrepancy; we are definitely interested in how large the discrepancy is between the observed frequencies and those expected under independence. How can we channel our curiosity into more detailed analysis? We will have an answer from quantification analysis.
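The entire chi-square computation above can be reproduced in a few lines. The following is only an illustrative sketch, not the book’s own code; it assumes NumPy and SciPy are available, and uses SciPy’s standard routine for the test of independence.

```python
# Reproduce Table 2.2 and the chi-square test for Kretschmer's data (Table 2.1).
import numpy as np
from scipy.stats import chi2_contingency

# Rows: manic-depressive, schizophrenic, epileptic
# Columns: pyknic, leptosomatic, athletic, dysplastic, others
F = np.array([[879,  261,  91,  15, 114],
              [717, 2632, 884, 549, 450],
              [ 83,  378, 435, 444, 166]])

chi2, p_value, df, expected = chi2_contingency(F)
print(round(chi2, 2), df)       # about 2643.4 with df = 8
print(np.round(expected, 1))    # reproduces the expected frequencies of Table 2.2
```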


2.1.2 Quantification Analysis and Results

In quantification analysis, we can analyze this valuable source of association much more extensively than in the above analysis. In doing so, we typically introduce multidimensional Euclidean space and determine the coordinates of the rows (mental types) and the columns (body types). Once we identify the multidimensional coordinates of the mental types and the body types, we can see the most dominant pattern of association between them, often through multidimensional graphical display. A typical path of quantification analysis is: if the first dominant association pattern does not exhaustively account for the elements of the contingency table, we seek the second most dominant association pattern under the condition that it is independent of the first, and the analysis goes on in this way until we extract all the patterns that constitute the data. Thus, our analysis can be called an exhaustive analysis of the association embedded in the contingency table. This aspect alone makes quantification analysis much more inviting and worth considering than the simple chi-square analysis. Imagine that Kretschmer’s data were obtained from 8098 subjects! With this expectation, it is indeed worth spending a lot of time on collecting the data as well.

The most important difference between the two types of analysis under consideration is that quantification analysis makes it possible for us to look into patterns of relations between mental types and body types; furthermore, it typically allows us to capture the entire structure of the data in multidimensional space, with some linear and some nonlinear relations between the mental types and the body types. What do we mean by that? Let us explore it in the next section.

The main results on Kretschmer’s data are (a computational sketch follows below):

• There are two orthogonal components that depict distinct relations between mental types and body types.
• The correlation between the mental types and the body types on the first component, $\rho_1$, is 0.5082.
• For component 2, $\rho_2 = 0.2611$.
• The relative variance of each component is called the eigenvalue, $\eta_k^2$, which is the square of $\rho_k$. Therefore, $\eta_1^2 = 0.2582$ and $\eta_2^2 = 0.0682$. The variance is a statistic of information.
• Therefore, the first component accounts for 79% of the information and the second component for 21%.
• The coordinates of the mental types and the body types on these two components are as in Table 2.3.

Remember that these two components are independent, and recall that the earlier $\chi^2$-analysis did not identify these orthogonal components but only yielded one $\chi^2$ value. Notice that we now have the coordinates of the mental types and the body types, and we can use them for graphical display to figure out their relative geometric relations.
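These numbers can be checked with the standard singular-value-decomposition algebra that underlies the quantification of a contingency table. The following is a sketch under the assumption that NumPy is available; it is not the book’s own code, and the signs of the coordinates are arbitrary (they may come out reflected relative to Table 2.3).

```python
# Components of Kretschmer's table via SVD of the standardized residuals.
import numpy as np

F = np.array([[879,  261,  91,  15, 114],
              [717, 2632, 884, 549, 450],
              [ 83,  378, 435, 444, 166]], dtype=float)

P = F / F.sum()                      # correspondence matrix
r = P.sum(axis=1)                    # row masses
c = P.sum(axis=0)                    # column masses

S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

print(np.round(sv[:2], 4))           # about [0.5082 0.2611]  (rho_1, rho_2)
print(np.round(sv[:2] ** 2, 4))      # about [0.2582 0.0682]  (eigenvalues)

# Principal coordinates of rows (mental types) and columns (body types)
rows = U[:, :2] / np.sqrt(r)[:, None] * sv[:2]
cols = Vt.T[:, :2] / np.sqrt(c)[:, None] * sv[:2]
print(np.round(rows, 2))             # compare with Table 2.3 (up to sign)
print(np.round(cols, 2))
```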


Table 2.3 Two-dimensional coordinates

Mental type        Component 1  Component 2
Manic-depressive         −1.09         0.61
Schizophrenic             0.14        −0.18
Epileptic                 0.50         0.48

Body type          Component 1  Component 2
Pyknic                   −0.96         0.11
Leptosomatic              0.16        −0.29
Athletic                  0.33         0.18
Dysplastic                0.55         0.45
Others                    0.06         0.09

2.2 Joint Graphical Display

Keep in mind, however, that this graphical display has a subtle problem, since the mental types and the body types do not lie in exactly the same two-dimensional space. In other words, only if the mental types and the body types were perfectly correlated would they occupy the same space. Remember that in a typical course on multivariate analysis we learn that if the product-moment correlation between variable A and variable B, $r_{AB}$, is not 1, we require two-dimensional space to show their relation. Since our two sets are not perfectly correlated, we must introduce one axis for the mental types and one axis for the body types, so that the space accommodating the two axes must be two-dimensional. How can we introduce two such axes for our two variables? Nishisato and Clavel (2003) showed that the axis for the mental types and the axis for the body types cross at the origin of two-dimensional space at the angle $\theta$, where, if we indicate the correlation between mental types and body types by $r_{mb}$,

\[ \theta = \cos^{-1} r_{mb} \]

The angles for components 1 and 2 of Kretschmer's data are, respectively,

\[ \theta_{mb:1} = 59^{\circ}, \qquad \theta_{mb:2} = 75^{\circ} \]

Therefore, for the two extracted components we need four-dimensional space. This is a difficult topic, and we will discuss it later, after we have looked at the mathematics of quantification theory. Right now, let us simply assume that the correlation between mental types and body types in each component is 1, so that we may ignore the space discrepancy, allowing us a roughly approximated two-dimensional graph for our example. Sorry about this shortcut, but this simplified method is currently the most popular graphical method (please do not worry about this, since we will discuss the exact graphical method later). Let us look at this compromised graph to see the rough relation between the mental types and body types of our example (Fig. 2.1).
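The two discrepancy angles are easy to verify from the singular values reported earlier; a small sketch:

```python
import numpy as np

# theta = arccos(rho), following Nishisato and Clavel (2003)
rho = np.array([0.5082, 0.2611])
theta = np.degrees(np.arccos(rho))
print(theta.round(0))  # approx. [59. 75.]
```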


Fig. 2.1 Plot of Components 1 and 2

From this graph, it is easy to see that the mental type manic-depressive is close to the body type pyknic; the mental type epileptic is near the body type dysplastic; and finally, the mental type schizophrenic is located close to the body type leptosomatic. The results seem to make sense, but this graphical display must be carefully examined, since it is only an approximation to the correct graph. At this moment, we will simply mention that the exact graph is four-dimensional and that we will introduce a logically correct joint graphical method in Chap. 8.

Notes

Recall that we are considering plotting rows (mental types) and columns (body types), and that quantification analysis is carried out so as to maximize the row-column correlation. But no matter what we do, we do not typically attain a perfect correlation of 1. Under the circumstances, we must realize that rows and columns cannot be placed in the same graph, but require two-dimensional space for each of our two components. Yet the currently most popular graphical method, popularized by Benzécri et al. (1973) and Greenacre (1984), plots the rows and the columns in the same space as if their correlation were 1. This problem led to an interesting historical debate between Carroll et al. (1986, 1989) and Greenacre (1989), but the debate did not solve the problem of joint graphical display. The problem was solved in 2016 (Nishisato 2016, 2019a, b), which will be discussed later in Chap. 8. See also Nishisato et al. (2021).

Bilinear Data Structure

In quantification analysis, we use a well-known data decomposition formula, called the bilinear expansion. A typical element of the contingency table, $f_{ij}$, can be expressed in terms of orthogonal components as follows:

\[ f_{ij} = \frac{f_{i.}f_{.j}}{f_t}\left(1 + \rho_1 y_{1i}x_{1j} + \rho_2 y_{2i}x_{2j} + \cdots + \rho_K y_{Ki}x_{Kj}\right) \tag{2.1} \]

where $\rho_k$ is called the singular value associated with component k, $y_{ki}$ is the weight for row i of component k, $x_{kj}$ is the weight for column j of component k, and K is the total number of components, which is equal to the smaller of m and n, minus 1.


Table 2.4 Order-0 and order-1 approximations

Order-0   Pyk    Lep    Ath   Dys   Oth
M-D       282    549    237   169   127
Sch       1085   2113   911   651   472
Epi       312    608    262   188   136

Order-1   Pyk    Lep    Ath   Dys   Oth
M-D       860    358    66    −31   107
Sch       801    2208   995   749   479
Epi       18     706    349   289   144

Nishisato and Nishisato (1994) introduced the following terms for the bilinear decomposition:

• Order-0 approximation $= \dfrac{f_{i.}f_{.j}}{f_t}$
• Order-1 approximation $= \dfrac{f_{i.}f_{.j}}{f_t}\,(1 + \rho_1 y_{1i}x_{1j})$
• Order-2 approximation $= \dfrac{f_{i.}f_{.j}}{f_t}\,(1 + \rho_1 y_{1i}x_{1j} + \rho_2 y_{2i}x_{2j})$
• and so on, down to the Order-K approximation, which is nothing but the data set itself.

The order-0 and order-1 approximations to our data are shown in Table 2.4. For our example, the order-2 approximation yields a table identical to the input data. Note that component 1 contributes substantially to the data structure, since the order-1 approximation is already very close to the input data. Remember that the first component accounts for 79% of the total information.
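For readers who want to compute such order-k approximations themselves, here is a hedged sketch (the function is ours, not from the book); it recovers the components of the bilinear expansion (2.1) from the singular value decomposition of the standardized residual matrix.

```python
import numpy as np

def order_k_approximation(F, k):
    """Order-k approximation of a contingency table via the bilinear
    expansion (2.1): independence model plus the first k components
    recovered from the SVD of the standardized residuals."""
    F = np.asarray(F, dtype=float)
    ft = F.sum()
    r, c = F.sum(axis=1), F.sum(axis=0)
    E = np.outer(r, c) / ft                   # order-0 term f_i. f_.j / f_t
    B = (F - E) / np.sqrt(np.outer(r, c))     # standardized residuals
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    approx = E.copy()
    for j in range(k):                        # add the rho_j y_j x_j' terms
        approx += np.sqrt(np.outer(r, c)) * s[j] * np.outer(U[:, j], Vt[j, :])
    return approx
```

Applied with k = 1 to the observed table of Table 2.1, this should reproduce the order-1 panel of Table 2.4 up to rounding.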

2.3 Singapore Data

In 1985, Nishisato conducted a workshop at NEC Computers Singapore Ltd., organized by K. S. Leong and T. Nishikiori, and the following data were collected from 23 participants, who answered four multiple-choice questions:

Q.1. How old are you? (20–29; 30–39; 40 or over)
Q.2. Children today are not as disciplined as when I was a child. (agree; disagree; I cannot tell)
Q.3. Children today are not as fortunate as when I was a child. (agree; disagree; I cannot tell)
Q.4. Religions should not be taught at school. (agree; disagree; indifferent)

The data are tabulated in two formats, one in terms of chosen options and the other in the response-pattern format, as seen in Table 2.5.


Table 2.5 Singapore data: chosen options (left) and response patterns (right)

Subject   Q1 Q2 Q3 Q4    Q1  Q2  Q3  Q4
 1        3  1  2  1     001 100 010 100
 2        2  1  3  2     010 100 001 010
 3        2  1  2  2     010 100 010 010
 4        1  2  2  3     100 010 010 001
 5        3  1  2  2     001 100 010 010
 6        1  3  1  2     100 001 100 010
 7        2  1  2  2     010 100 010 010
 8        2  1  2  2     010 100 010 010
 9        1  2  3  1     100 010 001 100
10        3  1  2  1     001 100 010 100
11        1  2  2  3     100 010 010 001
12        2  1  1  1     010 100 100 100
13        2  1  3  3     010 100 001 001
14        3  1  2  1     001 100 010 100
15        1  1  2  3     100 100 010 001
16        3  1  2  1     001 100 010 100
17        3  1  1  1     001 100 100 100
18        2  3  2  2     010 001 010 010
19        3  1  2  1     001 100 010 100
20        2  1  2  2     010 100 010 010
21        1  3  3  3     100 001 001 001
22        1  3  3  3     100 001 001 001
23        2  1  2  2     010 100 010 010
24        1  3  3  3     100 001 001 001

We have chosen these two response formats since the first one is typically used in ordinary analysis and the other in quantification analysis. Let us now compare these two types of analysis.

2.3.1 Some Consideration for Ordinary Analysis

Ordinary analysis starts with assigning numerals to the responses to the different options. Although in the above table we arbitrarily assigned 1, 2 and 3 to the three response alternatives of each of the four questions, we must examine whether these initial values are appropriate. Namely, we must pose the question: are these assignments of scores to the different options appropriate?


By the way, in quantification analysis, too, we ask the same question, but the important difference is that quantification analysis determines good numerals for the response options mathematically, not subjectively. Unless this question is reasonably answered, ordinary analysis cannot go forward. Suppose we decide to give the following scores to the response options of Questions 2 and 3: disagree = 1, I cannot tell = 2, agree = 3. Suppose we also set indifferent = 2 for Question 4. These category scores look comparatively reasonable, but what can we do about the options of Question 1? Should we use the following scores for the three age groups: 20–29 = 1, 30–39 = 2 and 40 or over = 3? Even if the investigator considers these integer scores for ordered categories appropriate, others may wonder whether the investigator's judgment is reasonable. There is no guarantee that it is correct.

Likert Scores

This problem of the initial assignment of scores to the response alternatives of each question is crucial for the validity of the analysis. Therefore, it is common practice to use ordered response alternatives with Likert scores (Likert 1932). Likert invented the so-called Likert scale for ordered response categories, such as

1 = never, 2 = sometimes, 3 = often, 4 = always
1 = poor, 2 = average, 3 = good, 4 = excellent
1 = strongly disagree, 2 = moderately disagree, 3 = neutral, 4 = moderately agree, 5 = strongly agree

When data are collected with questions using such alternatives (response options), the ordered numerals are used as Likert scores for data analysis. His proposal was widely accepted and has become almost a routine tool in survey studies. Likert scores are popular, but are they valid? It may be shocking, but we must say that Likert scores are a ghost of a past research strategy, from the days when investigators were mostly interested in unidimensional measurement (e.g., a unidimensional anxiety scale or attitude scale), where the addition of Likert scores made sense. Times have changed, and nowadays we deal with multidimensional and nonlinear data: there are, for example, several dimensions of anxiety (multidimensionality), and body strength is a nonlinear function of age, since a child is not strong, a teenager is stronger, but a senior is weaker than a teenager (a nonlinear relation). Likert scores totally eliminate the possibility of finding such multidimensional and nonlinear relations between attributes measured by multiple-choice questions. Thus, the Likert scale can be used as a method of coding responses (and those codes must then be quantified so as to capture the structure of the data), but not as a source of scores for analysis.

In conclusion, given this Singapore data set, the results of analysis depend solely on what scores the investigator assigns to the response options of each question. Therefore, we cannot present a general enough report on the traditional analysis of the Singapore data. Our recommendation is to use the Likert scale as a method for coding responses, and to leave the assignment of numerals to quantification analysis.


2.3.2 Quantification Analysis and Results

Our task here is to assign scores to those response options. The data are represented in terms of response patterns, consisting of 1's and 0's, as seen in the right half of Table 2.5. In total, we must determine 12 numerals for the 12 response options of the four questions, and 24 scores for the 24 subjects. The task of quantification analysis is: determine optimal scores for the 24 subjects and optimal weights for the 12 response options.

Other than the size of the table of unknown numerals, the quantification task is the same as for the contingency table. But, as we will see later, there are enormous differences in optimization outcomes between the contingency table and the response-pattern table. A crucial difference between the traditional analysis of Likert scores and quantification analysis lies in the fact that under quantification analysis the same response categories are likely to be assigned different scores on different components, while in the traditional analysis the scores for the response options are fixed for the entire analysis. For this data set, the maximal number of components is 8; that is, there are eight independent ways of assigning numerals to the response options, resulting in eight different sets of response-option scores (while in the traditional analysis there is only one set of scores for the response options).

For multiple-choice data, which are often used for test construction, the concept of the "reliability of a test" is quite important. One of the reliability coefficients is Cronbach's coefficient alpha (Cronbach 1951), typically called Cronbach's alpha. This coefficient should not become negative, since it is defined as a ratio of two theoretically positive numbers. However, when we decompose data into components, it is not uncommon to obtain negative values of Cronbach's alpha. So, Nishisato (1980a) suggested adopting only those components for which Cronbach's alpha is non-negative (see Chap. 3 of his 1980 book for further discussion of Cronbach's alpha). When we apply his suggestion to the Singapore data, we find that three components satisfy this condition. Therefore, we will look at only those three components here.

Results from quantification analysis can be summarized in three tables: one for the data structure (Table 2.6), one for the subjects (Table 2.7) and one for the items (Table 2.8). The eigenvalue is a statistic that indicates the relative amount of information attributable to a component; the singular value is the product-moment correlation between the rows and the columns of the data table, which is equal to the square root of the eigenvalue; the reliability alpha is Cronbach's alpha of internal-consistency reliability; the statistic delta is the percentage of the total information accounted for by the component; and the cumulative delta is the sum of delta up to the component.
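In case the reader wants to experiment, here is a hedged sketch of this component screening (the function and its name are ours, not the book's program); it dual-scales a subjects-by-options indicator matrix and keeps only components with non-negative alpha, using the alpha-eta-squared relation derived in Chap. 4.

```python
import numpy as np

def screen_components(Z, n_items):
    """Dual scaling of a subjects-by-options indicator matrix Z;
    returns the eigenvalues and Cronbach's alpha per component,
    keeping only components with non-negative alpha."""
    Z = np.asarray(Z, dtype=float)
    ft = Z.sum()
    r, c = Z.sum(axis=1), Z.sum(axis=0)
    B = (Z - np.outer(r, c) / ft) / np.sqrt(np.outer(r, c))
    s = np.linalg.svd(B, compute_uv=False)
    eta2 = s[s > 1e-10] ** 2                        # non-trivial eigenvalues
    alpha = 1 - (1 - eta2) / ((n_items - 1) * eta2)  # Nishisato (1980a)
    keep = alpha >= 0
    return eta2[keep], alpha[keep]
```

For the Singapore data (4 items, 12 options), this should yield eight candidate components, of which three survive the screening, in line with the text.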


Table 2.6 Summary statistics of the first three components

                                Component 1   Component 2   Component 3
Eigenvalue                        0.65          0.44          0.32
Singular value                    0.80          0.66          0.56
Row-column discrepancy            37°           49°           56°
Reliability alpha                 0.82          0.58          0.28
Delta (%)                         32            22            16
Cumulative delta (%)              32            54            70
Adjusted delta (%)                46            31            23
Adjusted cumulative delta (%)     46            77            100

The adjusted delta is introduced because we have decided not to consider all possible components, but only those with positive Cronbach's alpha; the adjusted delta and the adjusted cumulative delta are calculated in this example as if the total consisted of three components, instead of the absolute maximum of eight components.

The principal scores for the subjects correspond to the principal coordinates of the rows of the contingency table; we use "scores" because of the context of test scores. Mathematically, however, these are the principal coordinates of the rows of the data table associated with the first three components. The three sets of weights in Table 2.8 are the principal coordinates of the 12 options of the 4 questions.

Although we have already looked at many aspects of quantification analysis, we should also mention that our analysis provides a lot of useful information for test construction. We typically start the construction of a questionnaire with many questions and then choose a subset of questions which contribute to the construction of sub-scales; that is, we gather those questions which tend to measure similar traits. This is the task of collecting a set of homogeneous items. For this task, we often use the inter-item correlation, the item-total correlation or its square, and the sums of squares of the items. These statistics are also included in the output of a typical quantification analysis. The following tables show these statistics obtained from our quantification analysis.

Inter-item Correlation: Each component is derived also so as to maximize the inter-item correlation coefficients. Our results for the three components are shown in the table of correlations (Table 2.9). Notice that substantial correlation coefficients appear mostly in Component 1.

In test construction, three widely used statistics are the sum of squares of item j, SS(j), the squared item-total correlation of item j, r²jt, and the item-total correlation of item j, rjt. These are also typical outputs of quantification analysis programs (Table 2.10).


Table 2.7 Subjects' principal scores on three components

Subject   Component 1   Component 2   Component 3
 1          −0.75          0.75         −0.08
 2          −0.03         −0.87         −0.02
 3          −0.48         −0.79         −0.35
 4           1.18          0.58         −0.85
 5          −0.66         −0.04         −0.24
 6           0.58         −0.38          1.49
 7          −0.48         −0.79         −0.35
 8          −0.48         −0.79         −0.35
 9           1.01          0.82         −0.23
10          −0.75          0.75         −0.08
11           1.18          0.58         −0.85
12          −0.57          0.10          0.96
13           0.50         −0.40         −0.15
14          −0.75          0.75         −0.08
15           0.57          0.19         −0.39
16          −0.75          0.75         −0.08
17          −0.75          0.85          1.07
18           0.05         −1.00          0.25
19          −0.75          0.75         −0.08
20          −0.48         −0.79         −0.35
21           1.56         −0.09          0.55
22          −0.48         −0.79         −0.35
23           1.56         −0.09          0.55

Table 2.8 Three principal weights for the response options

Item  Option   Component 1   Component 2   Component 3
1     1           1.36          0.34          0.06
1     2          −0.34         −1.03         −0.14
1     3          −0.92          0.98          0.11
2     1          −0.55         −0.04         −0.10
2     2           1.39          0.99         −1.15
2     3           1.17         −0.59          1.26
3     1          −0.30          0.28          2.08
3     2          −0.32          0.01         −0.50
3     3           1.14         −0.19          0.25
4     1          −0.63          1.04          0.31
4     2          −0.34         −1.05         −0.05
4     3           1.35          0.19         −0.34


Table 2.9 Inter-item correlation matrices of three components

Component 1
Item     1      2      3      4
1       1.00
2       0.80   1.00
3       0.39   0.33   1.00
4       0.70   0.48   0.40   1.00

Component 2
Item     1      2      3      4
1       1.00
2       0.12   1.00
3       0.11  −0.10   1.00
4       0.65   0.27   0.06   1.00

Component 3
Item     1      2      3      4
1       1.00
2      −0.07   1.00
3       0.02   0.27   1.00
4       0.25  −0.17   0.18   1.00

Table 2.10 Item statistics

         Component 1             Component 2             Component 3
Item   SS(j)  r²jt   rjt      SS(j)  r²jt   rjt      SS(j)  r²jt   rjt
1      30.6   0.86   0.93     38.7   0.74   0.86      0.90  0.01   0.11
2      24.9   0.70   0.84      9.9   0.19   0.44     32.9   0.45   0.67
3      12.9   0.36   0.60      1.0   0.02   0.14     53.5   0.74   0.86
4      23.7   0.66   0.81     42.4   0.81   0.90      4.7   0.07   0.25

Important Reminder 1: In test construction, the concept of Cronbach's alpha reliability is very important. As Lord (1958) and Nishisato (1980a) demonstrated, our quantification method guarantees that the derived scores have the maximal value of the coefficient alpha; such scores are exactly what our quantification method provides.

Important Reminder 2: It may not have been made explicit, but the derived scores also have the maximal variance, that is, maximal discrimination among the examinees: subjects who choose different options from yours receive scores maximally different from yours.

2.4 Graphical Display

As briefly mentioned in our discussion of the Kretschmer data, the joint graphical display of the rows and columns of the data matrix has theoretical problems (see Table 2.6: the discrepancy angles between the row space and the column space are 37°, 49° and 56° for components 1, 2 and 3, respectively; these values are definitely different from 0°, the value assumed in the correspondence plot). Therefore, we will not look at the traditional graph, but would like to show another joint graphical method, proposed by Nishisato (1994).

His method is to graph the subjects using the first two components. This is a two-dimensional graph of subjects only, and the graph is logically correct (Fig. 2.2). The response options, however, do not span the same space as the subjects. Thus, rather than plotting both subjects and options, his method uses each subject's response pattern as the label for that subject. In this way, there is no logical problem, because we


Fig. 2.2 Components 1 and 2 of subjects

graph only the rows of the data and use the response patterns (columns) of the individuals (rows) as labels. Thus, there is no logical problem anywhere, and the method is correct. The only practical problem is that the response patterns can be too long to serve as practical labels! From this alternative graph of subjects only, shown in Fig. 2.2, we can grasp the general relation between the subjects and their choices of the response options:

• The large circle in the first quadrant shows the subjects with the response pattern (1**3), indicating that they are the youngest group, who are indifferent about religious education at school.
• The circle in the second quadrant shares the response pattern (31**), indicating that it represents the oldest group of subjects, who do not think that today's children are disciplined.
• The large circle in the third quadrant is dominated by the response pattern (21**): the middle age group with the perception that today's children are not disciplined.
• The circle straddling quadrants 3 and 4 shows the patterns (21*2) and (23*2): subjects in the middle age group who support the teaching of religion at school.

There are, of course, some outliers, but isn't this quite informative and interesting? This graphical method can be used for market-segmentation studies, although forced classification analysis (Nishisato 1984; Nishisato and Gaul 1990; Nishisato and Baba 1999) would be a more suitable strategy for market-segmentation studies than the above graphical method.


Fig. 2.3 Sorting of countries: Components 1 and 2


2.5 Examples of Other Types of Graphs

Without providing the data, let us look at some other graphs of different types.

2.5.1 Sorting Countries: Sorting Data

The data set was obtained in Nishisato's class: a list of countries was given, and the students were asked to sort them into piles of similar countries. The actual task is as follows: assign 1 to the first country, go down the list, and give 1 to every country judged similar to the first; once the end of the list is reached, assign 2 to the first country not yet numbered, go down the remaining list assigning 2 to every country similar to it; and so on, until the list is exhausted. Neither the number of piles nor the size of each pile is fixed. Based on the quantification of this sorting data set, we obtain Fig. 2.3.


Fig. 2.4 Seriousness of criminal acts

As we can see, these are the students' perceptions of similar countries, and the groupings seem quite reasonable, or at least similar to what we would expect. Isn't the result interesting?

2.5.2 Seriousness of Criminal Acts: Successive Categories Data

This is an example of quantification analysis of successive categories data (Nishisato 1980b; Nishisato and Sheu 1984), a topic which will not be discussed further in the current book, for the reason that the method provides only one-dimensional output. However, as the graph here shows, this method of dual scaling is very interesting, and interested readers are referred to the above two publications. We look at it because it is very different from the mainstream of quantification theory. The quantification task is to determine the values of the criminal acts and also the boundaries of the judgmental categories. Because of the order imposed on the judgmental category boundaries, this application yields only one component, with correctly ordered response categories (Fig. 2.4).

The results seem straightforward: this is the reflection of the students' perception of the seriousness of those criminal acts. But what if we collect the data not with a rating scale, as in this example, but by asking the subjects to rank the acts in order of seriousness? The most crucial difference between rating data and rank-order data is that ranking uses no category boundaries. Therefore, without these order constraints we can discover more than one component! The same subjects were involved in this ranking task, and


Fig. 2.5 Seriousness of criminal acts: rank-order data

the results of the quantification analysis are that the first component accounts for 81% of the information (close to the unidimensional situation) and the second component for 6%. Thus, the results are almost unidimensional. But the output of this analysis also includes the subjects' positions. Although this will be discussed later, the location of a subject means that the subject ranks the closest criminal act as the most serious, the second closest as the second most serious, and so on. The subjects are indicated by squares and the criminal acts by triangles. Isn't it interesting to see that the seriousness of the criminal acts is arranged similarly to the previous example of successive categories data? The locations of the subjects are also interesting, for they show individual differences in ranking the acts. The fact that the subjects are all gathered on the left-hand side of the first axis means that they are quite similar in their main judgment of the seriousness of these criminal acts. In the second dimension, however, some individual differences show up: try to reproduce each subject's ranking of the criminal acts from the distances between the subject and the acts, with the rule that a subject ranks the closest criminal act as the most serious, the second closest as the second most serious, and so on. In this way, the contribution of the second dimension (the vertical axis) to the judgments makes sense. Our conclusion is that although the subjects' judgments are similar, we still see some individual differences (Fig. 2.5).


Fig. 2.6 Wishes for children’s professions: rank-order data

2.5.3 Mothers' and Children's Wishes for Professions

The data were reported by a student in Nishisato's class, who had obtained them from an old document in which mothers and their children were asked what professions they wished the children to have. The data were obtained by ranking those professions in order of preference. Mothers and children are indicated by squares and professions by inverted triangles. The first two components account for 62 and 13% of the total information, respectively. Therefore, a two-dimensional graph can be regarded as a good representation of the data. It is interesting to note that both Korean mothers and Korean children are closest to lawyer/doctor, teacher, professor/scientist and high governmental official. Japanese children, Thai mothers and Thai children are closest to lawyer/doctor, teacher and "I don't know." The professions monk/priest/nun/clergyman are at the other end of their rankings (Fig. 2.6).


Fig. 2.7 Preference for Christmas party plans: paired comparison data

2.5.4 Christmas Party Plans

We now move to paired comparison data. Ian Wiggins (now a successful consultant) collected these data for an assignment in Nishisato's class. The first two components account for 34 and 26% of the total information. The plot of the two components tells us something very interesting: the two-dimensional space accommodates inexpensive party plans (P in the graph indicates a potluck party) versus expensive plans, and daytime parties versus evening parties. The subjects are represented by squares, and it is interesting to see that they are scattered over the two-dimensional space. Again, the meaning of a subject's location is as follows: the closer a party plan lies to the subject, the more the subject prefers it to party plans farther away. Some subjects prefer inexpensive parties, some expensive ones, some daytime and others evening parties, and so on (Fig. 2.7).

2.5.5 Attractions of Hot Springs: Rank-Order Data

Data were collected in Nishisato's class at Kwansei Gakuin University in Nishinomiya, Japan, where 30 students were asked to rank the following ten attractions of hot springs (note: there are many hot springs throughout Japan):


1. Day visit.
2. Less than 10,000 yen a night.
3. Natural rock hot spring.
4. Family bath.
5. Beautiful scenery.
6. Superb food.
7. Gym.
8. Resort hotel.
9. Game room.
10. Pet friendly.

The first two components account for 70% of the information. Again, the subjects are widely scattered, indicating individual differences: those who prefer a day visit or inexpensive hotels, those who prefer good food, those who prefer a family bath (reserved for one family) and pet-friendly facilities, and those who prefer beautiful surroundings. The results are understandable and, of course, if the subjects were not students we would expect a different configuration of relations between hot-spring attractions and consumers (Fig. 2.8).

Fig. 2.8 Hot-spring attractions


In concluding this chapter, we should note that quantification analysis is a method which is totally dependent on the nature of the data. If the data contain mostly linear relations, the analysis will identify the relative amount of information accounted for by the linear relations; if the data contain multidimensional information, it will show how many independent components the data contain; if the data involve some nonlinear relations, it will identify the nature of the nonlinearity and reveal how much of the nonlinear relations is embedded in the data; and so on. We can be assured that quantification analysis extracts whatever information our data may contain: it is a data-dependent, linear/nonlinear, unidimensional/multidimensional method of data analysis. We hope that this chapter has given the readers enough incentive to pursue quantification analysis. "Data oriented" is the key phrase of quantification theory.

References

Benzécri, J. P., et al. (1973). L'analyse des données: II. L'analyse des correspondances. Paris: Dunod.
Carroll, J. D., Green, P. E., & Schaffer, C. M. (1986). Interpoint distance comparisons in correspondence analysis. Journal of Marketing Research, 23, 271–280.
Carroll, J. D., Green, P. E., & Schaffer, C. M. (1989). Reply to Greenacre's commentary on the Carroll-Green-Schaffer scaling of two-way correspondence analysis solutions. Journal of Marketing Research, 26, 366–368.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Greenacre, M. J. (1984). Theory and applications of correspondence analysis. London: Academic Press.
Greenacre, M. J. (1989). The Carroll-Green-Schaffer scaling in correspondence analysis: A theoretical and empirical appraisal. Journal of Marketing Research, 26, 358–365.
Kretschmer, E. (1925). Physique and character: An investigation of the nature of constitution and of the theory of temperament; with 31 plates. London: Kegan Paul, Trench, Trubner.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 44–53.
Lord, F. M. (1958). Some relations between Guttman's principal components of scale analysis and other psychometric theory. Psychometrika, 23, 291–296.
Nishisato, S. (1980a). Analysis of categorical data: Dual scaling and its applications. Toronto: The University of Toronto Press.
Nishisato, S. (1980b). Dual scaling of successive categories data. Japanese Psychological Research, 22, 134–143.
Nishisato, S. (1984). Forced classification: A simple application of a quantification technique. Psychometrika, 49, 25–36.
Nishisato, S. (1994). Elements of dual scaling: An introduction to practical data analysis. Hillsdale, NJ: Lawrence Erlbaum Associates.
Nishisato, S. (2016). Quantification theory: Dual space and total space. Paper presented at the annual meeting of the Behaviormetric Society, Sapporo, Japan (p. 27) (in Japanese).
Nishisato, S. (2019a). Reminiscence: Quantification theory and graphs. Theory and Applications of Data Analysis, 8, 47–57 (in Japanese).


Nishisato, S. (2019b). Expansion of contingency space: Theory of doubled multidimensional space and graphs. Invited talk at the annual meeting of the Japanese Classification Society, Tokyo (in Japanese).
Nishisato, S., & Baba, Y. (1999). On contingency, projection and forced classification of dual scaling. Behaviormetrika, 26, 207–219.
Nishisato, S., Beh, E. J., Lombardo, R., & Clavel, J. G. (2021). Modern quantification theory: Joint graphical display, biplots and alternatives. Singapore: Springer Nature.
Nishisato, S., & Clavel, J. G. (2003). A note on between-set distances in dual scaling and correspondence analysis. Behaviormetrika, 30, 87–98.
Nishisato, S., & Gaul, W. (1990). An approach to marketing data analysis: The forced classification procedure of dual scaling. Journal of Marketing Research, 27, 354–360.
Nishisato, S., & Nishisato, I. (1994). Dual scaling in a nutshell. Toronto: MicroStats.
Nishisato, S., & Sheu, W. J. (1984). A note on dual scaling of successive categorical data. Psychometrika, 49, 493–500.

Chapter 3

Constraints on Quantification

3.1 What Data Should We Quantify?

Common sense says that if data are quantitative, we can subject them directly to any arithmetic operations. Examples of such quantitative data are height, weight and distance. These quantitative data do not need any quantification. Sometimes we find quasi-quantitative data such as rankings and temperature. These are tricky, because arithmetic operations on them do not necessarily produce meaningful numbers:

• A Rank-1 movie and a Rank-2 movie together do not yield a Rank-3 movie.
• The difference between Rank 1 and Rank 2 cannot generally be equated with the difference between Rank 2 and Rank 3.
• 30 degrees Celsius (30C) is not twice as hot as 15C, for if we convert them to Fahrenheit (F), 30C = 86F and 15C = 59F, and 86F is obviously no longer twice 59F. This means that temperature expressed in Celsius or Fahrenheit cannot be subjected to division or multiplication.

What can we do to make such quasi-quantitative data as ranks and temperature amenable to more arithmetic operations? This will be discussed later, after we take care of purely non-quantitative data. There are purely qualitative (non-quantitative) data, for which ordinary arithmetic operations do not generate meaningful numbers, such as

• genders 1 and 2;
• baseball players' uniform numbers such as 3, 4, 5, 6, 9 and 24;
• districts 1, 2, 3 and so on.

It is obvious that arithmetic operations applied to these numbers would not yield anything meaningful, because the numbers are used only as identification labels. These qualitative data are the main objects of quantification, where the basic motivation is to quantify them in such a way that the transforms are amenable to the basic arithmetic operations. We also wish to make sure that the quantified data are maximally informative in some sense.


Stevens (1951) defined measurement as the assignment of numbers according to certain rules, and provided measurement theory with appropriate mathematical operations. He classified measurement into four types:

(1) Nominal measurement (e.g., gender, players' uniform numbers).
(2) Ordinal measurement (e.g., ranking of movies).
(3) Interval measurement (e.g., temperature, test scores).
(4) Ratio measurement (e.g., distance, weight, height).

His theory of measurement used to be a requisite topic in psychometrics and statistics courses for the social sciences at universities, because social science data are typically not directly amenable to arithmetic operations. For one reason or another, Stevens' theory of measurement has disappeared from many introductory courses in applied statistics. The real situation, however, is dominated by data which are not quantitative enough to justify arithmetic operations on them. Many data we collect are not amenable even to such operations as division or multiplication. Consider that most of the test scores we collect at schools are not amenable to the simple operations of division and multiplication, yet those test scores are subjected to the most demanding statistical analyses. Just ask simple questions such as:

• Do these test scores have a rational origin of zero?
• Do you think that a student with 80 points is twice as bright as a student with 40 points?
• Do you think that a student with a mathematics test score of 0 has no mathematical ability?

We can continue questioning ourselves in this way and realize how sloppy we are in our use of numbers. Those interested in Stevens' theory of measurement, and those who have never heard of it, please see Appendix A for a detailed introduction. Please see also Guilford (1954), Torgerson (1958) and Hand (2004).

3.2 Some More Observations

Let us look at the practice of ordinary data analysis. We often see such scoring as: never = 0, sometimes = 1, often = 3, always = 4. But can we trust such a scoring scheme? It is rather arbitrary, and later we will learn that such an arbitrary scoring scheme is detrimental to accessing the complex information which typically underlies our data. The above example exemplifies the current practice of data analysis: the input data are fixed and must be subjected to analysis as they are. This course of action, however, prevents us from looking into the complex structure of the data, and blocks the nonlinear multidimensional decomposition of data, an important backbone of quantification theory. As we will see in the current book, we look for different functional relations embedded in data as "components" or factors. This means that


Table 3.1 Evaluation of three teachers

Teacher   Good   Average   Poor   Total
White     1      3         6      10
Green     3      5         2      10
Brown     6      3         0      9
Total     10     11        8      29

our data typically contain a number of components, some capturing linear relations, but mostly nonlinear relations among the variables we want to quantify; that is, our data are rich sources of complex patterns of relations among variables. We will see this aspect of quantification analysis later. In this book, we start with what Stevens calls nominal data. In terms of everyday data types, nominal data include the contingency table (two-item questionnaire data) and multiple-choice data (many-item questionnaire data). This two-item versus many-item distinction is important from the quantification point of view, as we will see later. Let us start our discussion of quantification problems with contingency tables. In Chap. 1, we saw Kretschmer's typology data as an example of the contingency table. As noted earlier, contingency tables are popular, and it is interesting that the history of quantification theory started with the question of how to analyze them.

3.3 Data in Terms of Unknown Numbers

Since the Kretschmer data contain a large number of observations, we will use a smaller example from Nishisato (1980): 29 students were asked to evaluate the performance of three teachers (one set of variates for quantification) in terms of three evaluation categories, Good, Average and Poor (the other set of variates for quantification). The data are tabulated in the 3 × 3 (teachers-by-rating-categories) contingency table shown as Table 3.1.

What is quantification? It is the task of assigning unknown numbers to the three teachers in the rows of this table, (y1, y2, y3), and to the three evaluation categories, (x1, x2, x3), in an optimal way. In other words, we want to determine those six unknowns in such a way that the original contingency table can be "best explained" in some sense. Without defining the term optimal yet, let us move on with our discussion. In this book, we will consider a few different ways of quantifying those totally unknown numbers so as to attain optimality; what this optimality means leads to a few different approaches. We will also see that when optimality is attained, we see symmetric relations between the weights for the rows (teachers) and those for the columns (evaluation categories).


Table 3.2 Teacher evaluation data by unknown numbers

Teacher  Good                            Average                                           Poor
1        (y1, x1)                        (y1, x2), (y1, x2), (y1, x2)                      (y1, x3), (y1, x3), (y1, x3), (y1, x3), (y1, x3), (y1, x3)
2        (y2, x1), (y2, x1), (y2, x1)    (y2, x2), (y2, x2), (y2, x2), (y2, x2), (y2, x2)  (y2, x3), (y2, x3)
3        (y3, x1), (y3, x1), (y3, x1), (y3, x1), (y3, x1), (y3, x1)    (y3, x2), (y3, x2), (y3, x2)    0

Table 3.3 Teacher evaluation data by unknown scores for teachers

Teacher  Good                      Average                     Poor
1        y1                        y1, y1, y1                  y1, y1, y1, y1, y1, y1
2        y2, y2, y2                y2, y2, y2, y2, y2          y2, y2
3        y3, y3, y3, y3, y3, y3    y3, y3, y3                  0

To start with, we will see how we can express the input contingency table in terms of the unknown variates, whose optimal determination is the core of our quantification task.

1. First, let us express the data in terms of the two sets of unknowns, (y1, y2, y3) for the three teachers in the rows and (x1, x2, x3) for the three evaluation categories. Since each element of the contingency table can be represented by the corresponding row weight and column weight, the data can be represented as in Table 3.2.
2. Let us now consider assigning unknown numbers only to the rows of the table, (y1, y2, y3). Then our data can be expressed as in Table 3.3.
3. Finally, let us express the data in terms of unknowns for the columns, (x1, x2, x3). The data are then represented as in Table 3.4.

Notice that we used a very small example, yet the number of unknowns is already large. If we wanted to express Kretschmer's data in terms of unknowns, whether for the rows, for the columns or for both, we would not be able to represent the data in a table of manageable size; what matters here, however, is the idea of representing data in terms of unknowns. Remember that our quantification task is to


Table 3.4 Teacher evaluation data by unknown category scores

Teacher  Good                      Average                     Poor
1        x1                        x2, x2, x2                  x3, x3, x3, x3, x3, x3
2        x1, x1, x1                x2, x2, x2, x2, x2          x3, x3
3        x1, x1, x1, x1, x1, x1    x2, x2, x2                  0

assign "appropriate numbers" to those unknowns. This task cannot be done unless we impose a number of constraints on the unknowns. Let us discuss those constraints.

3.4 Quantification Under Constraints

Our task for quantification is to determine the six unknowns, (y1, y2, y3) for the three teachers and (x1, x2, x3) for the three rating categories. The assigned numbers must be the "very best" in some sense, which we call optimal. Through such optimal numbers (numerals), we should be able to extract the maximal amount of information from the original data. This task sounds rather overwhelming, but rest assured that it is feasible. Please keep in mind, however, that we are not totally free to choose whatever numbers we may want: the numbers we assign to the teachers and the rating categories must satisfy certain constraints. Only under appropriate conditions on the unknowns can we determine their optimal values. We will impose common-sense conditions. Given the n × m contingency table F with typical element $f_{ij}$, we wish to assign a vector of n numerals to the rows, $\mathbf{y}$, and a vector of m numerals to the columns, $\mathbf{x}$, under the following conditions (a small numerical sketch follows the list):

• The sum of the responses weighted by y is zero: $\sum_{i,j} f_{ij}\,y_i = \mathbf{y}'F\mathbf{1}_c = 0$, where $\mathbf{1}_c$ is the m × 1 vector of 1's.
• The sum of the responses weighted by x is zero: $\sum_{i,j} f_{ij}\,x_j = \mathbf{1}_n'F\mathbf{x} = 0$, where $\mathbf{1}_n$ is the n × 1 vector of 1's. Notice that we impose the same condition on the rows and the columns, that is, symmetric constraints.
• The sum of squares of the responses weighted by y is equal to the total number of responses: $\sum_i f_{i.}\,y_i^2 = \mathbf{y}'D_r\mathbf{y} = f_t$, where $D_r$ is the diagonal matrix with the row totals on the diagonal, and $f_t$ is the total number of responses in the data matrix F, namely $f_t = \mathbf{1}_n'F\mathbf{1}_c$.
• Similarly, $\sum_j f_{.j}\,x_j^2 = \mathbf{x}'D_c\mathbf{x} = f_t$, where $D_c$ is the diagonal matrix with the column totals on the main diagonal. Notice again that we impose the same condition on the rows and the columns.
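The following is a hedged numerical sketch of these constraints for the data of Table 3.1; the weight vector used is arbitrary, chosen only to illustrate the centering and scaling, and is not the optimal solution.

```python
import numpy as np

# Teacher evaluation data (Table 3.1), without the marginal totals
F = np.array([[1, 3, 6],
              [3, 5, 2],
              [6, 3, 0]], dtype=float)
ft = F.sum()                              # f_t = 29
Dr = np.diag(F.sum(axis=1))               # row totals 10, 10, 9
Dc = np.diag(F.sum(axis=0))               # column totals 10, 11, 8

# An arbitrary (non-optimal) y, centred and rescaled to satisfy
# y'F1 = 0 and y'Dr y = f_t:
y = np.array([1.0, 0.0, -1.0])
y -= (F.sum(axis=1) @ y) / ft
y *= np.sqrt(ft / (y @ Dr @ y))

print(F.sum(axis=1) @ y)                  # ~0 (the first constraint)
print(y @ Dr @ y)                         # 29.0 (the third constraint)
```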


Let us determine the unknowns y and x, under the above conditions, in such a way that the quantified data retain the maximal amount of information in the original data. What is meant by the maximal amount of information will be explained later. In this regard, quantification theory maps a set of non-quantitative data optimally onto a multidimensional Euclidean space of whatever dimensionality the data require. In other words, we create a new quantitative framework for the structure of the data, using whatever essential information is embedded in the contingency table. We have expressed the data in terms of unknown quantities; our quantification task is to determine those unknowns, under the stated constraints, in such a way that the quantified data attain maximally optimal properties. In terms of data analysis, this process can be described as generating continuous data that carry the maximal amount of information embedded in the qualitative data. Please remember that we assign optimal weights dimension by dimension, which makes it possible to identify independent weighting schemes; these allow us to decompose our data into independent linear or nonlinear components. Whether we quantify the contingency table or multiple-choice data, the basic idea is the same, but the differences in the structure of these data will lead us to different quantification problems. In Chap. 4, we will see different aspects of quantification for the contingency table and the response-pattern table.

References

Guilford, J. P. (1954). Psychometric methods. New York: McGraw-Hill.
Hand, D. J. (2004). Measurement theory and practice: The world through quantification. London: Arnold.
Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. Toronto: The University of Toronto Press.
Stevens, S. S. (1951). Mathematics, measurement, and psychophysics. In S. S. Stevens (Ed.), Handbook of experimental psychology. New York: Wiley.
Torgerson, W. S. (1958). Theory and methods of scaling. New York: Wiley.

Chapter 4

Quantification Procedures

4.1 Historical Background

We can trace early work back almost 100 years, to international scholars of ecology (e.g., Gleason (1926) in the USA, Lenoble (1927) in France and Ramensky (1930) in Russia). In the social and statistical sciences, early pioneers include Richardson and Kuder (1933), Hirschfeld (1935; better known as H. O. Hartley), Horst (1935), Fisher (1940), Guttman (1941, 1946), Maung (1941), Mosier (1946), Hayashi (1950) and Bock (1956, 1960), followed by a strong and sophisticated group from France (Benzécri et al. 1973). When the basic work matured, books on quantification theory became widely available, most notably the contributions by Lingoes (1978), Nishisato (1980, 1994), Gauch (1982), Meulman (1982, 1986), de Leeuw (1984), Greenacre (1984), Lebart et al. (1984), Nishisato and Nishisato (1984), Nishisato (2007), van der Heijden (1987), van Rijckevorsel (1987), van der Burg (1989), van Rijckevorsel and de Leeuw (1988) and Gifi (1990). The current scene can be inferred from Beh and Lombardo (2014), Le Roux and Rouanet (2004), Murtagh (2005), Nishisato (2007) and Nishisato et al. (2021). Please also look at the many important books published in the early days in French and Japanese; a large number of them are listed in Chap. 1 of Nishisato et al. (2021).

4.2 Strategies

It is important to note that quantification analysis is symmetric analysis, as the name dual scaling implies. The symmetry here is of two kinds: (1) mathematical symmetry with respect to the quantification of the rows and the columns of two-way data, and (2) spatial symmetry in the mapping of the data in multidimensional space. The latter symmetry is rarely discussed in the literature, but it is just as important as the mathematical symmetry which we will use for deriving the optimal quantification. These two types of symmetry are the backbone of quantification theory, and we will learn that the symmetry resulting from quantification guarantees the optimality of the results.



4.2.1 Quantification Through Correlation

We represented the teacher evaluation data in terms of pairs of unknowns, one for teacher i and one for rating j. We will consider this setup for our quantification task. What comes to mind first is that the two unknowns in each pair represent the same entry in the data table. Therefore, the two unknowns in each pair must be as close to each other as possible, and as different as possible from those of responses occupying different locations in the data matrix. What would be an appropriate statistic to represent the relation between pairs of variables? It is Pearson's correlation coefficient (the product-moment correlation). Once this decision is made, our task is to determine the unknowns given to the rows and columns of the data matrix in such a way that the correlation between the rows and the columns is a maximum. Let us indicate the product-moment correlation by $r_{xy}$ and express it in terms of the two sets of unknowns, y for the teachers and x for the rating categories. If we indicate the contingency table by F, then under the conditions imposed on the unknowns (e.g., the sum of scores or weights is zero, thus the mean is zero; see the earlier discussion) the product-moment correlation can be expressed as

\[ r_{xy} = \frac{\mathbf{y}'F\mathbf{x}}{\sqrt{(\mathbf{x}'D_c\mathbf{x})(\mathbf{y}'D_r\mathbf{y})}} \tag{4.1} \]

where $D_c$ and $D_r$ are, respectively, the diagonal matrices of column marginals and row marginals. In our example, the vectors and matrices are as follows:

\[ \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad D_c = \begin{bmatrix} 10 & 0 & 0 \\ 0 & 11 & 0 \\ 0 & 0 & 8 \end{bmatrix}, \quad D_r = \begin{bmatrix} 10 & 0 & 0 \\ 0 & 10 & 0 \\ 0 & 0 & 9 \end{bmatrix} \]

Our quantification task is to determine the row vector y and the column vector x so as to maximize $r_{xy}$. Although some discussion is needed, this problem turns out to be the same as the following task:

• Maximize $\mathbf{y}'F\mathbf{x}$, subject to $\mathbf{x}'D_c\mathbf{x} = \mathbf{y}'D_r\mathbf{y} = f_t$.

This constraint is one of the conditions for quantification discussed earlier. In case you are not familiar with the problem of maximizing a quadratic function under a constraint, please refer to a book on calculus. To maximize a quadratic function under constraints, we follow the ordinary procedure: we first define the Lagrangian function


\[ Q = \mathbf{y}'F\mathbf{x} - \tfrac{1}{2}\alpha(\mathbf{x}'D_c\mathbf{x} - f_t) - \tfrac{1}{2}\beta(\mathbf{y}'D_r\mathbf{y} - f_t) \]

where $\tfrac{1}{2}\alpha$ and $\tfrac{1}{2}\beta$ are Lagrange multipliers. Following the standard procedure, we differentiate Q with respect to y, x, α and β, and set the derivatives equal to zero. In this process, we find that α = β = the maximized correlation, which we indicate by ρ instead of $r_{xy}$. Finally, we arrive at the optimal vectors:

\[ \mathbf{x} = \frac{1}{\rho}D_c^{-1}F'\mathbf{y}, \qquad \mathbf{y} = \frac{1}{\rho}D_r^{-1}F\mathbf{x} \tag{4.2} \]

This can be rewritten as

\[ D_c^{-1}F'\mathbf{y} = \rho\mathbf{x}, \qquad D_r^{-1}F\mathbf{x} = \rho\mathbf{y} \tag{4.3} \]

Our task is to find the set (x, y, ρ) that satisfies Eq. (4.2) or (4.3). Notice that (4.2) shows perfect symmetry between y and x: y is proportional to the vector of row means of the data weighted by x, and vice versa. Please do not worry too much about the mathematics involved in this task now, but note that after further algebraic manipulation we arrive at the eigenequations. Let us first define

\[ B = D_r^{-1/2}FD_c^{-1/2} \]

Then the eigenequations are given by

\[ (B'B - \rho^2 I)\,D_c^{1/2}\mathbf{x} = \mathbf{0} \tag{4.4} \]

\[ (BB' - \rho^2 I)\,D_r^{1/2}\mathbf{y} = \mathbf{0} \tag{4.5} \]

Right now, let us not worry too much about this final stage, but accept that the optimal numerals for the rows and the columns of the contingency table are given by the solution of the above equations; the optimal vectors satisfy (4.2) and (4.3). Before discussing the actual computations further, let us look at another way of formulating the quantification task.
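For the curious reader, here is a hedged sketch of the whole solution for the teacher data of Table 3.1 via the singular value decomposition of B (our own code, not the book's program):

```python
import numpy as np

# Teacher evaluation data (Table 3.1)
F = np.array([[1, 3, 6],
              [3, 5, 2],
              [6, 3, 0]], dtype=float)
r, c, ft = F.sum(axis=1), F.sum(axis=0), F.sum()

B = F / np.sqrt(np.outer(r, c))      # B = Dr^(-1/2) F Dc^(-1/2)
U, s, Vt = np.linalg.svd(B)

rho = s[1]                           # s[0] = 1 is the trivial solution
y = U[:, 1] / np.sqrt(r)             # back-transform the singular vectors
x = Vt[1, :] / np.sqrt(c)
y *= np.sqrt(ft / (r @ y**2))        # rescale so y'Dr y = x'Dc x = f_t
x *= np.sqrt(ft / (c @ x**2))

print("rho =", rho.round(4))
print("dual relations (4.3) hold:",
      np.allclose((F.T @ y) / c, rho * x), np.allclose((F @ x) / r, rho * y))
```

The largest singular value of B is always the trivial value 1; the second one is the maximal correlation ρ.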

4.2.2 Quantification Through Correlation Ratio

In one-way analysis of variance, it is known that the total sum of squares $SS_t$ (the sum of squared differences between the individual scores and the general mean) can be expressed as the sum of the between-group sum of squares $SS_b$ (the sum of squared differences between the group means and the general mean, weighted by the group frequencies) and the within-group sum of squares $SS_w$ (the sum of squared differences between the individual scores and their group means). Namely,


\[ SS_t = SS_b + SS_w \]

and the ratio of $SS_b$ to $SS_t$ is the statistic called the correlation ratio, $\eta^2$. That is,

\[ \eta^2 = \frac{SS_b}{SS_t} \]

The quantification is carried out to determine the weights $x_j$ for the rating categories in such a way that the correlation ratio is a maximum. Once we determine $x_j$, it is known that $y_i$ can be calculated by formula (4.2) or (4.3). Note that the same procedure can be formulated in terms of $y_i$; once those optimal weights are obtained, $x_j$ can be obtained by (4.2) or (4.3). The symmetric nature of our optimal solution also guarantees the following relation:

\[ \eta^2 = \rho^2 \]

Namely, the eigenvalue is equal to the maximized correlation ratio, which is also equal to the square of the maximal correlation, $\rho^2$. Going back to our example, $SS_t$ and $SS_b$ are expressed as follows:

\[ SS_t = \sum_j f_{.j}x_j^2 = [x_1\ x_2\ x_3]\begin{bmatrix} 10 & 0 & 0 \\ 0 & 11 & 0 \\ 0 & 0 & 8 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \mathbf{x}'D_c\mathbf{x} \]

\[ SS_b = \frac{(x_1 + 3x_2 + 6x_3)^2}{10} + \frac{(3x_1 + 5x_2 + 2x_3)^2}{10} + \frac{(6x_1 + 3x_2)^2}{9} = \mathbf{x}'F'D_r^{-1}F\mathbf{x} \]

Therefore, the correlation ratio $\eta^2$ can be expressed as

\[ \eta^2 = \frac{SS_b}{SS_t} = \frac{\mathbf{x}'F'D_r^{-1}F\mathbf{x}}{\mathbf{x}'D_c\mathbf{x}} \tag{4.6} \]
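As a hedged numerical illustration of (4.6) for the teacher data (the helper function is ours):

```python
import numpy as np

F = np.array([[1, 3, 6], [3, 5, 2], [6, 3, 0]], dtype=float)
r, c, ft = F.sum(axis=1), F.sum(axis=0), F.sum()

def eta2(x):
    """Correlation ratio (4.6) for a centred column-weight vector x."""
    ssb = ((F @ x) ** 2 / r).sum()     # x'F'Dr^{-1}Fx
    sst = (c * x ** 2).sum()           # x'Dc x
    return ssb / sst

x = np.array([1.0, 0.0, -1.0])
x -= (c @ x) / ft                      # impose the zero-sum constraint
print(round(eta2(x), 4))               # below rho^2; equality holds at the optimum
```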

The quantification task is to determine the weight vector x so as to maximize $\eta^2$. The details of this maximization will not be discussed here, but following the widely used procedure we arrive at the stage of solving the so-called eigenequation, which is in the current case

\[ (B'B - \rho^2 I)\,D_c^{1/2}\mathbf{x} = \mathbf{0} \tag{4.7} \]

Once we obtain x, we can calculate y by Eq. (4.2) or (4.3).


The process of optimization is the same for y: we can express the correlation ratio as a function of y,

\[ \eta^2 = \frac{SS_b}{SS_t} = \frac{\mathbf{y}'FD_c^{-1}F'\mathbf{y}}{\mathbf{y}'D_r\mathbf{y}} \tag{4.8} \]

The eigenequation is

\[ (BB' - \rho^2 I)\,D_r^{1/2}\mathbf{y} = \mathbf{0} \tag{4.9} \]

Once we obtain y, we can calculate x from y by formula (4.2) or (4.3).

4.2.2.1 Notes

These computations, whether for the maximization of ρ or of η², require an extensive amount of calculation, and the task should be carried out by a computer program.

4.2.3 Quantification Through Cronbach's Alpha

In 1958, Lord demonstrated that the maximization of the correlation ratio leads to the maximization of the generalized Kuder-Richardson reliability, typically referred to as Cronbach's alpha, or simply α. In terms of the decomposition of the one-way analysis of variance seen in the previous section, we can express Cronbach's alpha as

\[ \alpha = \frac{n}{n-1}\left(1 - \frac{SS_t - SS_w}{n\,SS_b}\right) \]

where n is the number of questions (items). Since we set $\mathbf{f}'\mathbf{x} = 0$, we obtain $SS_w = 0$. Under this condition, we obtain

\[ \eta^2 = \frac{1}{1 + (n-1)(1-\alpha)} \]

Therefore, we obtain

\[ \alpha = 1 - \frac{1-\eta^2}{(n-1)\eta^2} \]

This formula means that maximizing η² means maximizing α! Nishisato (1980) went further and derived the following interesting relation between α and η². As we can agree, the reliability must be non-negative, with 0 as its minimum value and 1 as its maximum:

\[ 1 \geq \alpha \geq 0 \]


If we adopt his reasoning, we arrive at the following important relation:

\[ 1 \geq \eta^2 \geq \frac{1}{n} \Longleftrightarrow 1 \geq \alpha \geq 0 \tag{4.10} \]

Therefore, his suggestion is that we should adopt only those components for which η² is greater than 1/n. We will use this condition only when we talk about the reliability of a test. In passing, we should note that the value 1/n is equal to the average eigenvalue of n-item multiple-choice data (Nishisato 1980).
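As a quick check of these relations against the Singapore example of Chap. 2 (n = 4 items), using the rounded eigenvalues of Table 2.6:

\[ \alpha_1 = 1 - \frac{1 - 0.65}{3 \times 0.65} \approx 0.82, \qquad \alpha_2 = 1 - \frac{1 - 0.44}{3 \times 0.44} \approx 0.58, \qquad \alpha_3 = 1 - \frac{1 - 0.32}{3 \times 0.32} \approx 0.29 \]

The first two values match the reliability alphas reported in Table 2.6 exactly; the third (0.28 in the table) differs slightly, presumably only because the eigenvalues are rounded.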

4.2.4 Method of Reciprocal Averages: MRA

When computers were not readily available, researchers often resorted to a simple iterative process called the method of reciprocal averages, abbreviated MRA. Let us look at this old procedure next. The method was described by Richardson and Kuder (1933) and named MRA by Horst (1935). In the early days of quantification theory, computers were not available, and how to solve the quantification problem was a difficult question. To respond to the need of the time, MRA was promoted by Richardson and Kuder (1933), Horst (1935), Mosier (1946), Baker (1960) and Hill (1973), among others. In those days, researchers were mainly interested in unidimensional analysis, and the object of research was an optimal quantification method for deriving only the first component. Let us use our teacher evaluation data again. MRA involves the following iterative scheme:

• Assign subjective scores to the response categories (good, average, poor).
• Calculate the score of each teacher as the weighted average of these category scores; using the teachers' scores, calculate the averages of the response categories.
• Using these response-category averages as weights, re-calculate the scores of the teachers.
• Repeat the same process.
• This reciprocal averaging process always converges to a stable point, at which we find the optimal scores for the teachers and the optimal weights for the evaluation categories.

This method was very popular until the middle of the 1960s, when computers became widely available. Let us ask a key question: how can we be confident that the final results are optimal? The answer to this question touches the core of our quantification, and we must know possible answers before going further. To see a glimpse of MRA, let us follow its successive averaging process. Let us indicate the initial row vector and column vector, respectively, by $\mathbf{y}_0$ and $\mathbf{x}_0$. Using these, we calculate the averages of the rows and columns, which are then used to calculate another set of averages, and so on. We will


express the successive iterations by subscripts on the mean vectors:

    y_1 = D_r^{-1} F x_0,    x_1 = D_c^{-1} F' y_1
    y_2 = D_r^{-1} F x_1,    x_2 = D_c^{-1} F' y_2
    ..........               ..........
    y_p = D_r^{-1} F x_{p-1},  x_p = D_c^{-1} F' y_p

See Nishisato (1980), where he derived the expressions of these vectors in terms of the initial vectors and then provided the solution for MRA as a problem of solving the following equations, demonstrating that the solution is independent of the initial vectors:

    lim_{p→∞} y_p = ?,    lim_{p→∞} x_p = ?

He showed that the question marks in the above formulas are exactly the optimal vectors for the rows and the columns. The MRA task indeed leads us to solving the same eigenequations as those obtained in the maximization of the correlation, the correlation ratio and Cronbach's alpha. For the practical use of MRA, however, one must center and standardize the iterative process. The detailed computational procedure is borrowed from Nishisato (1994) and is included in Appendix B with a simple numerical example.
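A minimal sketch of this scheme in Python/NumPy follows. The 3 × 3 table is a hypothetical stand-in for the teacher evaluation data, and the centering and standardization steps implement the normalization described above, not any particular published listing:

```python
import numpy as np

# hypothetical 3 x 3 teachers-by-rating-categories table (not the book's data)
F = np.array([[10, 5, 2],
              [4, 8, 3],
              [1, 6, 12]], dtype=float)

dr, dc, ft = F.sum(axis=1), F.sum(axis=0), F.sum()
x = np.random.default_rng(0).standard_normal(3)   # arbitrary initial weights x0

for _ in range(200):
    y = (F @ x) / dr                    # teacher scores: row means of weighted data
    x = (F.T @ y) / dc                  # category weights: column means of scores
    x -= (dc @ x) / ft                  # centering removes the trivial solution
    x /= np.sqrt((dc @ x**2) / ft)      # standardize: x'Dc x / ft = 1

y = (F @ x) / dr
eta2 = (dr @ y**2) / ft                 # correlation ratio of the converged scores
print("eta^2 =", round(eta2, 4), "  rho =", round(np.sqrt(eta2), 4))
```

At convergence, y and x are proportional to the optimal row and column weights, and η² equals the squared singular value ρ² discussed in the next section.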

4.3 Optimal Symmetric Properties

The correlation approach, the correlation-ratio approach, the reliability approach, MRA and the many other approaches mentioned earlier all yield identical results. These methods all attain simultaneous linear regressions (i.e., the method of simultaneous linear regressions, Hirschfeld 1935), optimal quantification (i.e., optimal scaling, Bock 1960), maximal homogeneity (i.e., homogeneity analysis, de Leeuw, Heiser, Meulman, van der Heijden and others of the Dutch group, 1973 and on) and symmetric transformation (i.e., dual scaling, Nishisato 1976, 1980). These aspects clearly suggest that the method is almost almighty as a means for data analysis, providing optimal row weights and optimal column weights in complete symmetry. Let us summarize some of the mathematical properties of our quantification results.


• Our quantification procedure transforms the data in such a way that it captures whatever information is contained in the data, be it a linear relation or any of a variety of nonlinear relations, unidimensional or multidimensional. Consider the widely used method of Likert scores, which can capture only linear relations: if the data contain nonlinear relations, Likert scores will yield uninterpretable outputs.
• Our quantification procedure maximizes the product-moment correlation, the correlation ratio, Cronbach's reliability α, the item-total correlation, the item-total sum of squares and the inter-item correlation coefficients, that is, those statistics researchers are most interested in.
• The means of the optimally weighted row responses are the projections of the data onto the column space, and vice versa:

    ρ_k y_k = D_r^{-1} F x_k;    ρ_k x_k = D_c^{-1} F' y_k

where D_r and D_c are, respectively, the diagonal matrices of row totals and column totals of the data matrix. Notice that ρ_k is a projection operator!
• ρ_k is the maximal product-moment correlation for component k.
• y_k is the vector of weights for the rows that maximizes the correlation ratio η_k², which is equal to ρ_k².
• x_k is the vector of weights for the columns that maximizes the correlation ratio η_k², which is equal to ρ_k².
• ρ_k is the k-th singular value of the singular-value decomposition, obtained from the maximization of the product-moment correlation in terms of y_k and x_k.
• ρ_k² is the k-th eigenvalue of the eigenvalue decomposition, obtained from the maximization of the correlation ratio.
• cos⁻¹ ρ_k is the angle between the k-th singular vector of the rows, y_k, and that of the columns, x_k. This is a very important point to remember, for it means that the optimal weight vector for the rows and the one for the columns of the contingency table do not generally span the same space. For a single component, we need two-dimensional space, one dimension for the rows and the other for the columns. These vectors cross at the origin with the angle θ, given by θ = cos⁻¹ ρ.

The last point clearly shows that it is wrong to plot y and x in the same space, which, however, has become the routine procedure that the correspondence plot (French plot) employs. In other words, correspondence plots are theoretically incorrect and only an approximation to the correct graph. Some of these points may be too technical, but all of them contribute to the characterization of what our quantification procedure offers in the name of optimal quantification. As mentioned earlier, the relation between rows and columns can be linear, nonlinear or of any other form, and whatever our data are, we will find the structure of the data through quantification analysis.
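As a numerical illustration of the last point, the following sketch prints each singular value ρ_k and the corresponding angle cos⁻¹ ρ_k between the row and column axes. The table is the same hypothetical one used above, and the standardized-residual SVD is one standard way of computing the ρ_k, an implementation assumption rather than the book's own program:

```python
import numpy as np

F = np.array([[10, 5, 2],
              [4, 8, 3],
              [1, 6, 12]], dtype=float)   # hypothetical contingency table
dr, dc, ft = F.sum(axis=1), F.sum(axis=0), F.sum()

# singular values of the standardized residuals are the rho_k
S = (F - np.outer(dr, dc) / ft) / np.sqrt(np.outer(dr, dc))
rho = np.linalg.svd(S, compute_uv=False)

for k, r in enumerate(rho[:min(F.shape) - 1], start=1):
    print(f"component {k}: rho = {r:.3f}, "
          f"row-column angle = {np.degrees(np.arccos(r)):.1f} degrees")
```

Since ρ_k < 1 in any case of practical interest, the angle is strictly positive, which is exactly why one axis cannot serve both rows and columns.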


4.4 Bilinear Expansion and Graphical Display

All the discussions so far can be represented by the bilinear expansion of the elements of the contingency table. In terms of matrix notation, the bilinear expansion can be written as follows:

    F = (1/f_t) D_r Y Λ X' D_c

where F is the m × n contingency table, f_t is the sum of all the elements of the contingency table, D_r is the m × m diagonal matrix of row totals of the contingency table, Y is the m × K matrix of singular vectors (standard coordinates) such that Y'D_rY = f_t·I, D_c is the n × n diagonal matrix of column totals of F, X is the n × K matrix of singular vectors such that X'D_cX = f_t·I, and Λ is the K × K diagonal matrix of singular values. Let us note the following outcomes of the expansion:

• The matrix of principal coordinates of the rows, say Y*, is given by Y* = YΛ.
• The matrix of principal coordinates of the columns, say X*, is given by X* = XΛ.
• The matrix of standard coordinates of the rows is given by Y.
• The matrix of standard coordinates of the columns is given by X.

At this stage, note that the key component of the contingency table, YΛX', can be expressed as

    YΛX' = (YΛ)X' = Y*X' = Y(ΛX') = Y X*'     (4.11)

In other words, the contingency table can be decomposed into the product of [the principal coordinates of the rows and the standard coordinates of the columns] or of [the standard coordinates of the rows and the principal coordinates of the columns]. Please note that the data structure of the contingency table cannot be expressed as the product of the principal coordinates of both rows and columns, which is the condition under which the rows and the columns could be accommodated in common space. In contrast, principal component analysis (Hotelling 1933) deals with coordinates of either rows or columns and therefore does not have this problem that we face with quantification theory. This will become an important aspect of our quantification when we discuss joint graphical display (the graphical display of both rows and columns) later. The fact that the contingency table cannot be expressed as the product of the principal coordinates of rows and columns indicates that the rows and the columns cannot be plotted in the same graph. This difficulty is referred to as the perennial problem of joint graphical display, a unique problem for quantification theory.


In contrast, principal component analysis deals with only one set of variables (either rows or columns) and does not encounter this difficult problem of joint graphical display. In order to solve this space-discrepancy problem, Nishisato (1980) transforms the contingency table into the response-pattern table, so that the rows and the columns of the contingency table acquire symmetric status in the response-pattern format. This means that it is proper to analyze the response-pattern format of the contingency table, so that we do not have to face the space-discrepancy problem. This will be discussed when we look at the joint graphical display of quantification results.

This chapter has been devoted to the basic mathematics of quantification theory. What we discovered are: (1) there are many ways to formulate quantification theory, and all of them converge to the same decomposition of the data matrix; (2) the optimal quantification is attained through symmetric treatment of rows and columns; and (3) the perennial problem of joint graphical display can be solved by doubling the space, which can be achieved by subjecting the response-pattern data, instead of the contingency table, to quantification. This is the topic to be discussed later. Please do not worry about the joint graphical display problem, because we will devote a chapter to it later.
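As a numerical illustration of the bilinear expansion, the following sketch reconstructs F from its parts and verifies relation (4.11). The table is the hypothetical one used earlier, and the SVD-based construction of Y, Λ and X is an implementation assumption, not the book's own program:

```python
import numpy as np

F = np.array([[10, 5, 2],
              [4, 8, 3],
              [1, 6, 12]], dtype=float)   # hypothetical contingency table
dr, dc, ft = F.sum(axis=1), F.sum(axis=0), F.sum()

# SVD of the scaled table; the first (trivial) component has singular value 1
S = F / np.sqrt(np.outer(dr, dc))
P, lam, Qt = np.linalg.svd(S)

Y = np.sqrt(ft) * P / np.sqrt(dr)[:, None]      # standard coordinates: Y'DrY = ft*I
X = np.sqrt(ft) * Qt.T / np.sqrt(dc)[:, None]   # standard coordinates: X'DcX = ft*I
Lam = np.diag(lam)

# bilinear expansion: F = (1/ft) Dr Y Lam X' Dc
F_rebuilt = np.diag(dr) @ Y @ Lam @ X.T @ np.diag(dc) / ft
print(np.allclose(F, F_rebuilt))                # True

Ystar, Xstar = Y @ Lam, X @ Lam                 # principal coordinates
print(np.allclose(Ystar @ X.T, Y @ Xstar.T))    # relation (4.11): True
```

The last line confirms that the two mixed products, principal-by-standard and standard-by-principal, are identical, while no product of two sets of principal coordinates reproduces the table.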

References

Baker, F. B. (1960). UNIVAC scientific computer program for scaling psychological inventories by the method of reciprocal averages CPA 22. Behavioral Science, 5, 268–269.
Beh, E. J., & Lombardo, R. (2014). Correspondence analysis: Theory, practice and new strategies. Chichester, UK: Wiley.
Benzécri, J. P., et al. (1973). L'analyse des données: II. L'analyse des correspondances. Paris: Dunod.
Bock, R. D. (1956). The selection of judges for preference testing. Psychometrika, 21, 349–366.
Bock, R. D. (1960). Methods and applications of optimal scaling, No. 25. The University of North Carolina Psychometric Laboratory Research Memorandum.
de Leeuw, J. (1984). Canonical analysis of categorical data. Leiden: DSWO Press.
Fisher, R. A. (1940). The precision of discriminant functions. Annals of Eugenics, 10, 422–429.
Gauch, H. G. (1982). Multivariate analysis in community ecology. Cambridge: Cambridge University Press.
Gifi, A. (1990). Nonlinear multivariate analysis. New York: Wiley.
Gleason, H. A. (1926). The individual concept of the plant association. Bulletin of the Torrey Botanical Club, 53, 7–26.
Greenacre, M. J. (1984). Theory and applications of correspondence analysis. London: Academic Press.
Guttman, L. (1941). The quantification of a class of attributes: A theory and method of scale construction. In the Committee on Social Adjustment (Ed.), The prediction of personal adjustment (pp. 319–348). New York: Social Science Research Council.
Guttman, L. (1946). An approach for quantifying paired comparisons and rank order. Annals of Mathematical Statistics, 17, 144–163.
Hayashi, C. (1950). On the quantification of qualitative data from the mathematico-statistical point of view. Annals of the Institute of Statistical Mathematics, 2, 35–47.
Hill, M. O. (1973). Reciprocal averaging: An eigenvector method of ordination. Journal of Ecology, 61, 237–249.
Hirschfeld, H. O. (1935). A connection between correlation and contingency. Cambridge Philosophical Society Proceedings, 31, 520–524.
Horst, P. (1935). Measuring complex attitudes. Journal of Social Psychology, 6, 369–374.
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417–441 and 498–520.
Lebart, L., Morineau, A., & Warwick, K. M. (1984). Multivariate descriptive statistical analysis. New York: Wiley.
Le Roux, B., & Rouanet, H. (2004). Geometric data analysis: From correspondence analysis to structured data. Dordrecht: Kluwer.
Lenoble, F. (1927). À propos des associations végétales. Bulletin de la Société Botanique de France, 73, 873–893.
Lingoes, J. C. (1978). Geometric representation of relational data. Ann Arbor: Mathesis Press.
Maung, K. (1941). Measurement of association in contingency tables with special reference to the pigmentation of hair and eye colours of Scottish children. Annals of Eugenics, 11, 189–223.
Meulman, J. (1982). Homogeneity analysis of incomplete data. Leiden: DSWO Press.
Meulman, J. (1986). A distance approach to nonlinear multivariate analysis. Leiden: DSWO Press.
Mosier, C. I. (1946). Machine methods in scaling by reciprocal averages. In Proceedings of the Research Forum (pp. 35–39). Endicott, NY: International Business Machines Corporation.
Murtagh, F. (2005). Correspondence analysis and data coding with R and Java. Boca Raton: Chapman and Hall.
Nishisato, S. (1976). Optimal scaling. A talk at the Symposium on Optimal Scaling, organized by F. Young at the 1976 Annual Meeting of the Psychometric Society, Murray Hill, NJ. (Note: the name "dual scaling" was proposed during this meeting by Nishisato.)
Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. Toronto: The University of Toronto Press.
Nishisato, S. (1994). Elements of dual scaling: An introduction to practical data analysis. Hillsdale, NJ: Lawrence Erlbaum Associates.
Nishisato, S. (2007). Multidimensional nonlinear descriptive analysis. London: Chapman & Hall/CRC.
Nishisato, S., Beh, E. J., Lombardo, R., & Clavel, J. G. (2021). Modern quantification theory: Joint graphical display, biplots and alternatives. Singapore: Springer Nature.
Nishisato, S., & Nishisato, I. (1984). An introduction to dual scaling. Toronto: MicroStats.
Nishisato, S., & Nishisato, I. (1994). Dual scaling in a nutshell. Toronto: MicroStats.
Ramensky, L. G. (1930). Zur Methodik der vergleichenden Bearbeitung und Ordnung von Pflanzenlisten und anderen Objekten, die durch mehrere, verschiedenartig wirkende Faktoren bestimmt werden. Beiträge zur Biologie der Pflanzen, 18, 269–304.
Richardson, M., & Kuder, G. F. (1933). Making a rating scale that measures. Personnel Journal, 12, 36–40.
van der Burg, E. (1988). Nonlinear canonical correlation and some related techniques. Leiden: DSWO Press.
van der Heijden, P. G. M. (1987). Correspondence analysis of longitudinal categorical data. Leiden: DSWO Press.
van Rijckevorsel, J. L. A. (1987). The application of fuzzy coding and horseshoes in multiple correspondence analysis. Leiden: DSWO Press.
van Rijckevorsel, J. L. A., & de Leeuw, J. (1988). Component and correspondence analysis: Dimension reduction by functional approximation. New York: Wiley.

Chapter 5

Mathematical Symmetry

5.1 Bi-modal Symmetry

This type of symmetry is directly related to the optimal quantification of the contingency table, and because of the symmetric treatment of the rows and columns of a two-way table, we call it bi-modal symmetry.

5.1.1 Correlation

Under the condition that the sum of responses weighted by y is zero and the sum of responses weighted by x is zero, our optimization task is to determine y and x so as to maximize the correlation between the two sets of weighted responses, namely, to maximize r_xy in terms of the unknowns x and y. The correlation can be expressed as follows:

    r_xy = y'F_c x / √[(y'D_r y)(x'D_c x)]

where F_c is the m × n contingency table, D_c is the diagonal matrix with the column marginals in the main diagonal and D_r is the diagonal matrix with the row marginals in the main diagonal. This maximization problem results in the following set of equations:

    x = (1/ρ) D_c^{-1} F_c' y,    y = (1/ρ) D_r^{-1} F_c x

or, perhaps more meaningfully, we can express that the weighted average vector of the rows is the projection of the column vector onto the row space, and vice versa, namely,

    ρy = D_r^{-1} F_c x,    ρx = D_c^{-1} F_c' y


We have already discussed that ρ (the singular value) is the projection operator that takes the data from the row space onto the column space, and vice versa. Recall that the above relation was the conclusion of Hirschfeld's method of simultaneous linear regressions (Hirschfeld 1935). It is interesting to note that the problem which the iterative method of reciprocal averages (MRA) (Richardson and Kuder 1933; Horst 1935) solves by iteration was solved directly by Hirschfeld, without any iterations.

5.1.1.1 Notes on Herman Otto Hirschfeld

His 1935 paper on simultaneous linear regressions (Hirschfeld 1935) was his first academic paper. He left Germany after his Ph.D. in mathematics from the University of Berlin, moved to Cambridge, where he obtained another Ph.D. in statistics, and then moved to the USA, where he later served as President of the American Statistical Association and also founded the Institute of Statistics at Texas A&M University. He first attained worldwide fame through the publication, with E. S. Pearson, of the first volume of Biometrika Tables for Statisticians in 1954; the second volume was published 18 years later. In 1938, he changed his name from Hirschfeld to Hartley. His 1935 paper is sometimes cited as the mathematical origin of correspondence analysis. The author of the current book knew Hartley through the famous Biometrika Tables for Statisticians and also met him at conferences, but did not realize then that he was the author of the famous 1935 paper. According to one record, only his first paper of 1935 was published under the name of Hirschfeld; his subsequent numerous publications are all under the name of Hartley.

5.1.2 Correlation Ratio

We have already looked at the correlation-ratio approach, but let us review it. The one-way analysis of variance model was used to decompose the total sum of squares (SS_t) into the between-group sum of squares (SS_b) and the within-group sum of squares (SS_w):

    SS_t = SS_b + SS_w

where, with the data in one-way analysis of variance indicated by y_ij, these terms are given by

    SS_t = Σ (y_ij − m_..)²,    SS_w = Σ (y_ij − m_j)²,    SS_b = SS_t − SS_w


where m_.. is the overall mean and m_j is the group mean. The correlation ratio η² is the ratio of SS_b to SS_t. The quantification task is to determine the row weights (i.e., scores for the teachers) so as to maximize the correlation ratio. Then we consider the same problem as above, except that the unknowns are now attached to the columns (rating categories): we define the correlation ratio as a function of the weights for the columns, and the maximization is carried out as the problem of determining the column weights that maximize the correlation ratio. After a few manipulations of the formulas, recall that we came to the conclusion that both quantification problems yield identical results. Here we find completely symmetric relations between the two unknown vectors y and x. Namely, the vector of principal coordinates of the rows, ρy, is the mean vector of the responses weighted by x, and the vector of principal coordinates of the columns, ρx, is the mean vector of the responses weighted by y:

    ρy = D_r^{-1} F x,    ρx = D_c^{-1} F' y
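As a quick numerical check of these dual relations, here is a sketch in Python/NumPy. The 3 × 3 table is hypothetical, and the SVD route to the standard coordinates is an implementation choice, not the book's own program:

```python
import numpy as np

F = np.array([[10, 5, 2],
              [4, 8, 3],
              [1, 6, 12]], dtype=float)   # hypothetical contingency table
dr, dc, ft = F.sum(axis=1), F.sum(axis=0), F.sum()

# standard coordinates of the first nontrivial component via SVD
S = (F - np.outer(dr, dc) / ft) / np.sqrt(np.outer(dr, dc))
P, sv, Qt = np.linalg.svd(S)
rho = sv[0]
y = np.sqrt(ft) * P[:, 0] / np.sqrt(dr)
x = np.sqrt(ft) * Qt[0] / np.sqrt(dc)

# rho*y is the mean vector of responses weighted by x, and vice versa
print(np.allclose(rho * y, (F @ x) / dr))    # True
print(np.allclose(rho * x, (F.T @ y) / dc))  # True
```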

Thus, in summary of the correlation and the correlation-ratio approaches, we can state that:
• The principal coordinate of row i is the mean of the optimally weighted responses in the ith row.
• The principal coordinate of column j is the mean of the optimally weighted responses in the jth column.
• The row-column correlation is maximal.
• The maximized correlation ratio of the responses weighted by the row weights is equal to the maximized correlation ratio of the responses weighted by the column weights.
• The regression coefficient for predicting rows from columns is equal to the regression coefficient for predicting columns from rows, and both are equal to ρ, as shown by Hirschfeld (1935).
• Simultaneous projections of the data onto the principal coordinates of the data space yield symmetric transforms for both rows and columns.
These are some of the observations showing that quantification is carried out symmetrically for the rows and the columns of the data matrix. Let us now move on to a slightly more intriguing symmetry than what we have seen so far.

5.2 Multi-modal Symmetry

We have just seen the symmetry in the quantification of the rows and columns of contingency tables. Let us now extend the same discussion to the case where we have more than two sets (rows and columns) of variables. This applies to the quantification of multiple-choice data, where we will see symmetric relations over many more sets of sub-matrices of the data. This is called multi-modal symmetry, and it appears when multiple-choice data are quantified.


5.2.1 Piecewise Method of Reciprocal Averages

Suppose we consider multiple-choice data with the number of items greater than two. Then we can see what is called multi-modal symmetry. This was demonstrated in the context of the piecewise method of reciprocal averages (PMRA) for multiple-choice data, developed by Nishisato and Sheu (1984). We have discussed the method of reciprocal averages (Richardson and Kuder 1933; Horst 1935) for the quantification of the contingency table and, more generally, a two-way table. If we consider multiple-choice data, expressed in the form of the response-pattern table, we can still apply MRA to the subjects-by-response-patterns table of many items. One practical problem is the size of the response-pattern table, which can be too large to handle. Therefore, to reduce the computational labor, Nishisato and Sheu (1984) developed a method which deals with a set of item-options-by-item-options cross-product matrices of (1, 0) response patterns, rather than one huge matrix of subjects-by-(all options of many items) or a huge matrix of (all response options)-by-(all options of many items). In developing their method, they discovered this multi-modal symmetry in the optimal solution. It is amazing that symmetry exists among the options-by-options matrices of many items. To simplify its application to multiple-choice data, let us use data obtained from nine subjects, who answered three multiple-choice questions with three response options per question. The data (Nishisato and Sheu 1984) are as follows:

                Question 1   Question 2   Question 3
      Option:   1  2  3      1  2  3      1  2  3
    Subject 1   1  0  0      1  0  0      1  0  0
            2   1  0  0      1  0  0      0  1  0
            3   1  0  0      0  1  0      1  0  0
    F =     4   0  1  0      0  1  0      0  1  0
            5   0  1  0      0  1  0      0  0  1
            6   0  1  0      0  0  1      1  0  0
            7   0  0  1      0  1  0      0  1  0
            8   0  0  1      0  0  1      1  0  0
            9   0  0  1      0  0  1      0  0  1

One way of applying MRA to multiple-choice data is to apply it to this response-pattern table. In real research, however, the response-pattern table can be too large to handle. If this should happen, the next alternative is to use the following cross-product matrix:


           3 0 0   2 1 0   2 1 0
           0 3 0   0 2 1   1 1 1
           0 0 3   0 1 2   1 1 1
           2 0 0   2 0 0   1 1 0
    F'F =  1 2 1   0 4 0   1 2 1
           0 1 2   0 0 3   2 0 1
           2 1 1   1 1 2   4 0 0
           1 1 1   1 2 0   0 3 0
           0 1 1   0 1 1   0 0 2

The current example does not show any advantage of this second option, but when the number of respondents is large, say 100, this second alternative would be computationally more advantageous than the first. In general, however, these two input formats are not very practical. Thus, the piecewise method of reciprocal averages (PMRA) was developed and proposed for the case of many multiple-choice items and/or large numbers of response options. PMRA deals with the options-by-options sub-matrices of the individual items. In our example, we use only nine 3-by-3 sub-matrices, namely the elements of the following matrix:

    D_1   C_12  C_13
    C_21  D_2   C_23
    C_31  C_32  D_3

More specifically,

    D_1 = diag(3, 3, 3),   D_2 = diag(2, 4, 3),   D_3 = diag(4, 3, 2)

    C_12 = | 2 1 0 |          C_13 = | 2 1 0 |          C_23 = | 1 1 0 |
           | 0 2 1 | = C'_21         | 1 1 1 | = C'_31         | 1 2 1 | = C'_32
           | 0 1 2 |                 | 1 1 1 |                 | 2 0 1 |

The correlation ratio, or the eigenvalue, for the current example can be expressed in terms of the weight vector x as

    ρ² = (x'F'Fx) / (3x'Dx)


To determine x so as to maximize ρ², the partial derivative of ρ² with respect to x is set equal to zero, which leads to

    F'Fx = 3ρ² Dx

Namely,

    D_1 x_1 + C_12 x_2 + C_13 x_3 = 3ρ² D_1 x_1
    C_21 x_1 + D_2 x_2 + C_23 x_3 = 3ρ² D_2 x_2
    C_31 x_1 + C_32 x_2 + D_3 x_3 = 3ρ² D_3 x_3

Let us set 3ρ² − 1 = λ. Then the above set of three equations can be written as

    C_12 x_2 + C_13 x_3 = λ D_1 x_1
    C_21 x_1 + C_23 x_3 = λ D_2 x_2
    C_31 x_1 + C_32 x_2 = λ D_3 x_3

To derive the reciprocal averaging formulas, let us set λ = 1 and rewrite the above formulas as

    x_1 = D_1^{-1} (C_12 x_2 + C_13 x_3)
    x_2 = D_2^{-1} (C_21 x_1 + C_23 x_3)     (5.1)
    x_3 = D_3^{-1} (C_31 x_1 + C_32 x_2)

These are the core formulas for piecewise reciprocal averaging. However, to avoid the trivial component, as discussed earlier, we should use the following sub-matrices instead of those in the above formulas:

    C_jk = F_j'F_k − F_j'11'F_k / N

Then, the iterative scheme can be summarized as follows (a compact implementation is sketched at the end of this subsection):
• Step 1: Assign arbitrary vectors to x_1, x_2, x_3 on the right-hand sides of the above formulas.
• Step 2: Calculate the new values of x_1, x_2, x_3 by the above formulas.
• Step 3: Calculate p_1, p_2, p_3 by the formula

    p_j = x_j'D_j x_j / (3N)

where we use the new x_j, j = 1, 2, 3.
• Step 4: Using the new vectors, calculate the weighted vectors β_j x_j, where


Table 5.1 Constancy of the correlation ratio

    Sum of squares     1        2        3        Total
    SS(T_j)            11.14    10.96    4.90     27.00
    SS(B_j)            7.51     7.39     3.30     16.26
    SS(W_j)            3.63     3.57     1.59     8.80
    SS(B_j)/SS(T_j)    0.6742   0.6742   0.6742   0.6742

    β_j = nN p_j / (x_j'D_j x_j)

and indicate these weighted vectors by x_j.
• Step 5: If the discrepancies between the x_j of the previous iteration and the new ones are all less than, say, 0.0001, go to Step 6; otherwise go to Step 2 and use the new x_j on the right-hand sides of the reciprocal averaging formulas.
• Step 6: Calculate λ and ρ² by the formulas above.

The score for a subject can be calculated as the average of the final weights of those options chosen by the subject, divided by ρ. The quantification of the original response-pattern table F provides the following optimal weights:

    x' = [(1.56, −0.63, −0.93), (1.93, −0.16, −1.07), (0.29, 0.52, −1.37)],  ρ² = 0.6742

From the piecewise method, we obtain exactly the same scale values for these categories and the same eigenvalue, namely,

    x_1 = (1.56, −0.63, −0.93)',  x_2 = (1.93, −0.16, −1.07)',  x_3 = (0.29, 0.52, −1.37)',  ρ² = 0.6742     (5.2)

Let us also look at a small example of multi-modal symmetry, using the same data. To demonstrate this piecewise symmetry, let us calculate the total sum of squares of each of the three items, SS(T_j); the between-group sum of squares, SS(B_j); and the within-group sum of squares, SS(W_j), j = 1, 2, 3, under PMRA (Table 5.1). Please note the last line of the table, which reveals a quite unexpected but clear symmetry over the individual items: the piecewise ratios are constant. This is nothing but multi-modal symmetry. When Nishisato and Sheu (1984) demonstrated this piecewise phenomenon, it was one of the most remarkable discoveries: these consistent ratios are embedded in the optimization of multiple-choice data. This was a noteworthy finding in the history of quantification theory. We must further investigate the implications of this


finding with respect to the data formats and optimal quantification. We need much further investigation along this line of inquiry.
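The six PMRA steps can be compressed into a few lines of Python/NumPy. This is a sketch under my own implementation choices (Steps 3 and 4 are folded into a single rescaling that keeps Σ_j x_j'D_j x_j = nN, which matches the book's scaling up to an overall sign), not the authors' published program:

```python
import numpy as np

# response-pattern table of the nine-subject example (subjects by 9 options)
F = np.array([[1,0,0, 1,0,0, 1,0,0], [1,0,0, 1,0,0, 0,1,0],
              [1,0,0, 0,1,0, 1,0,0], [0,1,0, 0,1,0, 0,1,0],
              [0,1,0, 0,1,0, 0,0,1], [0,1,0, 0,0,1, 1,0,0],
              [0,0,1, 0,1,0, 0,1,0], [0,0,1, 0,0,1, 1,0,0],
              [0,0,1, 0,0,1, 0,0,1]], dtype=float)

n, N = 3, F.shape[0]
Fj = [F[:, 3*j:3*j + 3] for j in range(n)]       # item sub-matrices F_j
D = [B.T @ B for B in Fj]                        # diagonal blocks D_j
# centered cross products C_jk = F_j'F_k - F_j'11'F_k/N (trivial part removed)
C = {(j, k): Fj[j].T @ Fj[k] - np.outer(Fj[j].sum(0), Fj[k].sum(0)) / N
     for j in range(n) for k in range(n) if j != k}

x = [np.array([1.0, 1.1, 1.2]) for _ in range(n)]    # arbitrary non-constant start
for _ in range(500):
    x = [np.linalg.solve(D[j], sum(C[j, k] @ x[k] for k in range(n) if k != j))
         for j in range(n)]
    s = np.sqrt(sum(x[j] @ D[j] @ x[j] for j in range(n)) / (n * N))
    x = [xj / s for xj in x]                     # keep sum_j x_j'D_j x_j = nN

lam = (sum(x[j] @ C[j, k] @ x[k] for j in range(n) for k in range(n) if j != k)
       / sum(x[j] @ D[j] @ x[j] for j in range(n)))
print("rho^2 =", round((1 + lam) / n, 4))        # 0.6742 for this example
for j in range(n):
    print(f"x{j+1} =", np.round(x[j], 2))        # book's values up to a common sign
```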

5.2.2 Generalization to n Variables

This procedure can easily be extended to data from n multiple-choice questions. Let F be the N × m matrix of 1's and 0's such that

           D_1   C_12  ...  C_1n
    F'F =  C_21  D_2   ...  C_2n
           ...   ...   ...  ...
           C_n1  C_n2  ...  D_n

The reciprocal averaging formula can be expressed as

    x_j = D_j^{-1} Σ_{k≠j} C_jk x_k,   j = 1, 2, ..., n

The relative size p_j is given by A/B, where

    A = Σ_{k≠j} x_j'C_jk x_k,    B = Σ_{i=1}^{n} Σ_{k≠i} x_i'C_ik x_k

λ is given by

    λ = (Σ_{k≠j} x_j'C_jk x_k) / (x_j'D_j x_j),   j = 1, 2, ..., n

and η² (or ρ²) is given, for any j, by

    η² = ρ² = (1 + λ)/n

Look at the formula for λ: the right-hand sides for different j are all equal when the optimal quantification is attained. In other words, during the iterative process these ratios are not equal, but as soon as the optimal solution is attained, all the ratios on the right-hand side of the equation become identical! This is an example of the multi-modal symmetry attained when the optimal quantification is carried out. Namely, this symmetry is attained only when the process converges to the solution. An impressive symmetry of this optimization can also be demonstrated in a slightly different way. Nishisato and Sheu (1984) also showed the following: at the time of convergence of the piecewise reciprocal averaging,


the optimal results display symmetry throughout the entire data set, namely,

    ρ² = SS_b1/SS_t1 = SS_b2/SS_t2 = ··· = SS_bn/SS_tn

As also noted in our numerical example, the above relations are a remarkable aspect of symmetric optimization. We should note that these consistency relations can be extended to any combination of numbers of options and items. This is multi-modal symmetry!
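This constancy can be checked directly from the published optimal weights in (5.2). In the sketch below, the identity SS(B_j) = (SS(T_j) + Σ_{k≠j} x_j'C_jk x_k)/n is my own reading of the λ formula above, and the two-decimal weights reproduce the constant ratio only up to rounding:

```python
import numpy as np

# optimal weights from (5.2) and the item blocks of the nine-subject example
x = [np.array([1.56, -0.63, -0.93]),
     np.array([1.93, -0.16, -1.07]),
     np.array([0.29, 0.52, -1.37])]
D = [np.diag([3., 3., 3.]), np.diag([2., 4., 3.]), np.diag([4., 3., 2.])]
C = {(0, 1): np.array([[2., 1, 0], [0, 2, 1], [0, 1, 2]]),
     (0, 2): np.array([[2., 1, 0], [1, 1, 1], [1, 1, 1]]),
     (1, 2): np.array([[1., 1, 0], [1, 2, 1], [2, 0, 1]])}
get = lambda j, k: C[(j, k)] if (j, k) in C else C[(k, j)].T
n = 3

# no centering terms are needed here because the optimal weights are centered
for j in range(n):
    sst = x[j] @ D[j] @ x[j]
    ssb = (sst + sum(x[j] @ get(j, k) @ x[k] for k in range(n) if k != j)) / n
    print(f"item {j+1}: SS(T)={sst:5.2f}  SS(B)={ssb:4.2f}  ratio={ssb/sst:.3f}")
```

All three ratios come out near 0.674, reproducing the last line of Table 5.1 within rounding error.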

References

Bock, R. D. (1960). Methods and applications of optimal scaling, No. 25. The University of North Carolina Psychometric Laboratory Research Memorandum.
de Leeuw, J. (1973). Canonical analysis of categorical data. Doctoral thesis, Leiden University.
Hirschfeld, H. O. (1935). A connection between correlation and contingency. Cambridge Philosophical Society Proceedings, 31, 520–524.
Horst, P. (1935). Measuring complex attitudes. Journal of Social Psychology, 6, 369–374.
Meulman, J. (1982). Homogeneity analysis of incomplete data. Leiden: DSWO Press.
Nishisato, S. (1976). Optimal scaling. A talk at the Symposium on Optimal Scaling, organized by F. Young at the 1976 Annual Meeting of the Psychometric Society, Murray Hill, NJ, where "dual scaling" was proposed for optimal scaling by Nishisato.
Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. Toronto: The University of Toronto Press.
Nishisato, S., & Sheu, W. (1984). Piecewise method of reciprocal averages for multiple-choice data. Psychometrika, 45, 467–478.
Richardson, M., & Kuder, G. F. (1933). Making a rating scale that measures. Personnel Journal, 12, 36–40.

Chapter 6

Data Format and Information

6.1 Two Formats of Same Data

From the quantification point of view, Nishisato (1980) paid attention to the merits of using an alternative format of the contingency table, namely the response-pattern table, and devoted one chapter of his book (Chap. 4) to the problem. It is unfortunate that his view was mostly ignored for many years after 1980, and we must revive it after so many years. In his view, the response-pattern format of the contingency data is much more informative than the contingency-table format: from the former, one can capture more information than from the contingency table. The importance of this choice becomes apparent only when we consider graphical displays of the quantification outcomes: to arrive at a logically correct graphical display of quantification outcomes, we must extract the information from the response-pattern format of the contingency table (Nishisato 2019a, b). More concretely, once we transform the contingency table into the response-pattern table of subjects-by-(row categories and column categories of the contingency table), we can see the multidimensional structure of rows and columns in common space, thanks to the pioneering work of Young and Householder (1938). In other words, we can apply the Young-Householder theorem to the quantification outcomes from the response-pattern table, but not to the outcomes from the contingency table! If, however, we decide to quantify the contingency-table format, the rows and the columns of the quantified outcomes do not generally span the same multidimensional space, because the rows and the columns are not perfectly correlated. Even though the quantification determines the weights for the rows and those for the columns of the contingency table so as to maximize the row-column correlation, the correlation does not reach the value of 1 (if the correlation were 1, the data would not be interesting to analyze, but one could then plot rows and columns in the common space of the graph). Thus, if we analyze the contingency-table format, the rows and the columns associated with a single component require two dimensions, one for the row weights and the other for the column weights.


Table 6.1 Contingency table format F

    Smoking?       Coffee   Not always   Tea
    Yes-smoking    3        2            1
    Non-smoking    1        2            4

Table 6.2 Response-pattern table Fp

    Yes   No   Coffee   C or T   Tea
    1     0    1        0        0
    1     0    1        0        0
    1     0    1        0        0
    1     0    0        1        0
    1     0    0        1        0
    1     0    0        0        1
    0     1    1        0        0
    0     1    0        1        0
    0     1    0        1        0
    0     1    0        0        1
    0     1    0        0        1
    0     1    0        0        1
    0     1    0        0        1

Thus, in this case we must devise a way to determine those two axes, one for the rows and the other for the columns. The formulas for the row axis and the column axis of each component were derived by Nishisato (2019b), as we will see later. It is true that we can obtain the same amount of information from the two formats of the same data, but to make the amount of information the same, we must do additional work on the results from the contingency table, while the quantification of the response-pattern format directly yields the entire amount of information. Furthermore, the response-pattern format provides an output for the joint graphical display, a vitally important topic for quantification analysis (Tables 6.1, 6.2 and 6.3). To see the relation between the data formats and information, let us use the same numerical example as the one used in Nishisato (1980, Chap. 4), where 13 subjects were asked the following two multiple-choice questions:
1. Do you smoke? (yes, no)
2. Do you prefer coffee to tea? (yes, not always, no)


Table 6.3 Condensed response-pattern table F*

    Yes   No   Coffee   C or T   Tea
    3     0    3        0        0
    2     0    0        2        0
    1     0    0        0        1
    0     1    1        0        0
    0     2    0        2        0
    0     4    0        0        4

The data were presented in three formats: (1) the contingency table F; (2) the response-pattern table Fp; and (3) the condensed response-pattern table F*. When we look at these three tables, their sizes are different, but we can confidently say that if we are given one of the three, we can generate the other two from it. In this sense, one is tempted to conclude that the three formats of the same data are equivalent in terms of the information contained in them. But this conjecture is utterly wrong. The information contents of the three formats of the same data are indeed different! Let us understand this point first. Now our contingency table F is 2 × 3, the response-pattern table Fp is 13 × 5, and the condensed response-pattern table F* is 6 × 5. F* consists of the distinct response patterns of Fp, with frequencies as elements. As we can infer from Benzécri's principle of distributional equivalence (Benzécri et al. 1973) and Nishisato's principle of equivalent partitioning (Nishisato 1984), the formats Fp and F* lead to identical quantification results with respect to the number of dimensions and the information captured by the eigenvalues. Therefore, in this chapter we will compare only the contingency table F and the condensed response-pattern table F*. We have the following comparisons between the two formats.

Contingency Table Versus Condensed Response-Pattern Table
• F: The number of components is equal to min(m, n) − 1, that is, the smaller of m and n, minus 1. Let us indicate this number as N(F). In our example, N(F) = 2 − 1 = 1.
• F*: The number of components, N(F*), is equal to m + n − 2. In our example, N(F*) = 3 + 2 − 2 = 3. Note that the rows and the columns of F are now both arranged in the columns of F*. This is very important, because the Young-Householder theorem (Young and Householder 1938) guarantees that the quantification of F* yields both rows and columns of the contingency table mapped in common Euclidean space. This does not apply to the rows and the columns of the contingency table.


Table 6.4 Comparisons of two formats of data

    Statistics           F      F* (three components)
    Eigenvalue ρ²        0.21   0.73   0.50   0.27
    Singular value ρ     0.46   0.85   0.71   0.52
    δ (%)                100    49     33     18
    Cumulative δ (%)     100    49     82     100
    Angle θ (degrees)    63

Remember that we maximize the correlation between the rows and the columns of the contingency table, but in all cases of practical interest the correlation will not be 1. This means that the rows and the columns from the contingency-table analysis cannot be mapped on the same axis: we need one axis for the rows and another for the columns, thus requiring two-dimensional space for one component.
• Note the following important relation:

    N(F*) ≥ 2 × N(F)     (6.1)

In other words, the response-pattern table requires double, or more than double, the space of the contingency table.
• In the above formula, the equality holds when m = n. In our example, N(F*) = 3 and N(F) = 1. Therefore, the contingency table requires one-dimensional space, while the response-pattern table requires three-dimensional space.

From the above comparisons, we should note that the quantification of the contingency table maximizes the row-column correlation, but that the correlation does not reach 1. This necessitates an expansion of the space if we are to plot the rows and the columns in common space. This problem is automatically solved if we quantify the response-pattern matrix. From the graphical point of view, therefore, we should analyze the response-pattern table instead of the contingency table. The above comparison also demonstrates that if we need to make the rows and the columns span the same space, we must double the space. In our example, we need two-dimensional space to accommodate the rows and the columns in common space. But the response-pattern table yields three components for a situation in which we need only two-dimensional space. What, then, is the extra component? Let us look at the results of quantification and compare those from the contingency table and the response-pattern table (Tables 6.4 and 6.5). Note that:
• The contingency table yields one component (C), while the response-pattern table yields three components (R1, R2, R3).
• The singular value of F, ρ, and the first eigenvalue of F*, ρ²_r1, are related by

    ρ = 2ρ²_r1 − 1     (6.2)

63

Table 6.5 Standard and principal coordinates of two formats Question C-S C-P R1-S R2-S R3-S Smoking 1.08 Non−0.93 smoking Coffee 1.26 Coffee 0.17 or Tea Tea −1.14

R1-P

R2-P

R3-P

0.50 −0.43

1.08 −0.93

0.00 0.00

−1.08 0.93

0.92 −0.79

0.00 0.00

−0.56 0.48

0.58 0.08

1.26 0.17

−1.15 2.11

1.26 0.17

1.08 0.14

−0.81 1.49

0.66 0.09

−0.52

−1.14

−0.77

−1.14

−0.98

−0.54

0.59

Note: C = Contingency table, R = Condensed response-pattern table S = Standard coordinate, P = Principal coordinate, Number 1, 2, 3 = Component 1, 2, 3

• The information distributions over three components (δ) of F∗ are: Component 1 (R1) accounts for 49% of total information, R2 33%, and R3 18%. • The cumulative percentages are from R1 to R3 49%, 82% and 100%, respectively. • Since the discrepancy between row space and column space of F is 63 degrees, we need two-dimensional space to accommodate the “true” distribution of row variables and column variables. • The eigenvalue of the second component of F∗ is 0.5. Nishisato (1980) showed that the eigenvalue of 0.5 from F∗ corresponds to the eigenvalue of 0 from F, that is, the row-column correlation is 0. Regarding standard coordinates and principal coordinates associated with F and F∗ , we see • Standard coordinates of the contingency table (C-S) in boldface and the standard coordinates of the first component of the response-pattern table (R1-S) in boldface are identical! This means in general that as far as the common components from the two formats are concerned, they are identical. • The standard coordinates and the principal coordinates associated with the eigenvalue of 0.5 (the second component of F∗ in our example) show 0.00 for the two categories of smoking. Thus, this component has nothing to do with smoking. Although the response-pattern table yielded three components, one of them does not contribute to the relation between the rows and the columns of the contingency table. • We have shown that only two components from F∗ capture the row-column relation of the contingency table, that is, twice the number of component of F. This is the basis for Nishisato’s theory of doubled multidimensional space. In the current example, the principal coordinates of rows and columns in common space are as shown in Table 6.6.

64

6 Data Format and Information

Table 6.6 Principal coordinates in common space Question Component 1 1 1 2 2 2

Smoking Non-smoking Coffee Coffee or Tea Tea

Component 3 −0.56 0.48 0.66 0.09 0.59

0.92 −0.79 1.08 0.14 −0.98

6.2 Further Comparisons of Data Formats Let us summarize the main comparisons of data formats, as described in Nishisato (1980). • Consider all the components from the contingency table and the corresponding components from the response-pattern table (i.e., the first (min(m, n) − 1) components). Then, there exist the following relations between the eigenvalues from F, ρc2 , and those from F∗ , ρ 2f : ρc2 = (2ρ 2f − 1)2 , that is, ρ 2f =

ρc4 + 1 2

Therefore, ρ 2f = 0.5 when ρc2 = 0. This means that when the eigenvalue(s) from the response-pattern table is (are) 0.5, it is the case in which rows and columns of the contingency table are (are) uncorrelated. Please note that those components are, therefore, of no interest to the search for the row-column relation. • For those corresponding components, there exist the following inequalities, ρ 2f ≥ ρc2 , equality holding when ρ 2f = 1 • Notice also that

1 ≥ ρ 2f ≥ 0.5 when 1 ≥ ρc2 ≥ 0

• For those corresponding components, the standard coordinates associated with F are identical to the standard coordinates of F∗ . This is important, for as long as we look at all those corresponding components from the two formats, F and F∗ , the results are in essence identical, the only difference being in their singular values which make the two corresponding sets of principal coordinates different but proportional. The above two relations are limited only to the first (min(m, n) − 1) components.

6.2 Further Comparisons of Data Formats

65

Furthermore, Nishisato (1993, 1994) lists the following points on analysis of F∗ : • The sum of the eigenvalues is equal to the average number of categories (response options) minus 1. m j −1 ρ 2f k = n For our two-item case, this statistic is given by ρ 2f k =

m+n −1 2

• When data are from a single multiple-choice item with n response options, obtained from N respondents, where N is larger than n − 1, then (n − 1) eigenvalues are all equal to 1. • When the numbers of response options of two items are different (say m smaller than n), each of (n − m) eigenvalues is equal to 0.5. • The average of all the eigenvalues is also 0.5. In this chapter, we have discussed one of the most important, but often ignored, problems of data formats. For many researchers, this may be an eye-opening problem, but it is important to see that the data formats can be crucial for retrieving as much information as possible. As far as quantification analysis is concerned, it is advantageous to transform the contingency table to the condensed responsepattern table and analyze the latter. This is particularly important when we consider the joint graphical display of rows and columns, that is, when we wish to graph rows and columns of the contingency table in common space.

6.3 Numerical Illustration This may be redundant, but let us look at one more example of real data, Kretschmer’s data that we have already seen in Chap. 2. We will reproduce the contingency table and then show the same data in the response-pattern format.

6.3.1 Kretschmer’s Typology Revisited The contingency table of Kretschmer’s data (1925) is small enough (Table 6.7), but its response-pattern table is enormous. Therefore we will instead use the condensed response-pattern table (Table 6.8), which is still manageable as you see. Quantification results from these two data formats of the same data are given in Tables 6.9 and 6.10. We should note that we are using the same information but represented in different formats. It is our job to extract as much information as possible from the data. The current demonstration of two formats of the same data provides

66

6 Data Format and Information

Table 6.7 Kretschmer’s typology data: contingency table Manicdepressive

Pyknic

Leptosomatic Athletic

Dysplastic

Others

Total

879

261

15

114

1360

91

Schizophrenic 717

2632

884

549

450

5232

Epileptic

83

378

435

444

166

1506

Total

1679

3271

1410

1008

730

8098

Table 6.8 Kretschmer’s data in condensed response-pattern format Ptn MaD Sch Epi Pyk Lep Ath 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

879 261 91 15 114 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 717 2632 884 549 450 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 83 378 435 444 166

879 0 0 0 0 717 0 0 0 0 83 0 0 0 0

0 261 0 0 0 0 2632 0 0 0 0 378 0 0 0

0 0 91 0 0 0 0 884 0 0 0 0 435 0 0

Dys

Oth

0 0 0 15 0 0 0 0 549 0 0 0 0 444 0

0 0 0 0 114 0 0 0 0 450 0 0 0 0 166

Note: Ptn = pattern; MaD = manic-depressive, Sch = schizophrenic, Epi = epileptic; Pyk = pyknic, Lep = leptosomatic, Ath = athletic, Dys = dysplastic, Oth = others

very different amounts of information we can extract from them. The key point here is the fact that the contingency table provides a rather complex form of structural information of the data. It provides two components, but these components are difficult to interpret, because one component indeed has two-dimensional information: the row variables (mental types) and the column variables (body types) for the same component is correlated to each other, hence requiring two axes, one for the rows and the other for the columns with the angle of the two axes θ (Nishisato and Clavel 2003), where θ = cos −1 ρk In spite of this discrepancy between the axis for rows and the axis for columns of a single contingency table component, most researchers ignore this discrepancy. Why?

6.3 Numerical Illustration

67

Table 6.9 Summary statistics and principal coordinates Component 1 Eigenvalue Delta Manic-depressive Schizophrenic Epileptic Pyknic Leptosomatic Athletic Dysplastic Others

2

0.26 79% −1.09 0.14 0.50 −0.96 0.16 0.33 0.55 0.06

0.07 21 0.61 −0.18 0.48 0.11 −0.29 0.18 0.45 0.09

Table 6.10 Summary statistics and principal coordinates Component

1

2

3

4

5

6

Eigenvalue

0.75

0.63

0.50

0.50

0.37

0.25

Delta

25%

21

17

17

12

8

ManicDepressive

1.86

0.48

0.00

0.00

0.37

1.06

Schizophrenic −0.24

−0.55

0.00

0.00

−0.42

−0.14

−0.86

1.47

0.00

0.00

1.12

−0.42

1.63

0.34

−0.28

−0.04

−0.26

−0.93

Leptosomatic −0.28

−0.89

−0.34

0.18

0.68

0.16

Atheletic

−0.58

0.54

0.89

−1.92

−0.41

0.33

Dysplastic

−0.94

1.36

−1.44

1.28

−1.04

0.54

Others

−0.10

0.27

2.45

1.24

−0.21

0.06

Epileptic Pyknic

We will later learn that the Kretschmer data require four-dimensional space and the coordinates of the mental types and the body types are given by components 1, 2, 5 and 6 of the response-pattern table. In other words, the contingency table analysis does not provide any direct information about the additional two dimensions but provides information on only half the number of required dimensions. In this example, please note that quantification of the contingency table provides components 1 and 2, while its response-pattern format yields components 1, 2, 5 and 6. The differences between these two formats are not just in the numbers of components but are related to the true Euclidean configuration of the data. We will discuss this problem later when we talk about graphical display.

68

6 Data Format and Information

References Benzécri, J. P. et al. (1973). L’Analyse des Données. II. L’Analyse des Correspondances. Paris: Dunod. Kretschmer, E. (1925). Physique and Character: An Investigation of the Nature of Constitution and of the Theory of Temperament; with 31 Plates. London: Kegan Paul, Trench, Trubner. Nishisato, S. (1980). Analysis of Categorical Data: Dual Scaling and Its Applications. Toronto: The University of Toronto Press. Nishisato, S. (1984). Forced classification: A simple application of a quantification technique. Psychometrika, 49, 25–36. Nishisato, S. (1993). On quantifying different types of categorical data. Psychometrika, 58, 617– 629. Nishisato, S. (1994). Elements of Dual Scaling: An Introduction to Practical Data Analysis. Hillsdale, N.J.: Lawrence Erlbaum Associates. Nishisato, S. (2016). Quantification theory: Dual space and total space. Paper presented at the annual meeting of the Behaviormetric Society, Sapporo, Japan, p. 27 (In Japanese) Nishisato, S. (2019a). Reminiscence: Quantification theory and graphs. Theory and Applications of Data Analysis, 8, 47–57. (in Japanese). Nishisato, S. (2019b). Expansion of contingency space: Theory of doubled multidimensional space and graphs. An invited talk at the Annual Meeting of the Japanese Classification Society, Tokyo (in Japanese). Nishisato, S., & Clavel, J. G. (2003). A note on between-set distances in dual scaling and correspondence analysis. Behaviormetrika, 30, 87–98. Young, G., & Householder, A. A. (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika, 3, 19–22.

Chapter 7

Space Theory and Symmetry

7.1 Spatial Symmetry We have so far looked at symmetric quantification in which the rows and the columns of the contingency table were treated equally with completely symmetric quantification outcomes. In other words, the optimal row vector maximizes such a statistic as correlation, correlation ratio and Cronbach’s alpha (Cronbach 1951), and the optimal column vector also maximizes those statistics with the identical values. These vectors are reciprocally related, that is, one is a linear function of the other. In our demonstrations, we used mostly the cases where the number of rows is equal to the number of columns. Our question then is what will happen when the number of rows is different from that of columns. In this case, the rows and the columns span over different numbers of dimensions. Therefore it is reasonable to ask if we can still maintain symmetry of analysis. Spatially speaking, our answer is “NO” because the rows and the columns would employ different spaces, at least in terms of the number of dimensions. This is the proposal presented by Nishisato (2019a, b, 2020) that quantification theory should be defined only in dual space where spatial symmetry is maintained. In dual space the dimensionality of row space and that of column space are equal, and we consider the common space for rows and columns of the data matrix. The backbone for this spatial symmetry is the theory of quantification space, to be discussed next.

7.2 Theory of Quantification Space On the basis of comparisons of quantification between the formats of the contingency table and the corresponding response-pattern table, Nishisato (2019a, b, 2020) formalized his theory of quantification space (see also Nishisato et al. 2021): he postulates the following classification of quantification space. Please note that the following theory will be illustrated with a numerical example. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Nishisato, Optimal Quantification and Symmetry, Behaviormetrics: Quantitative Approaches to Human Behavior 12, https://doi.org/10.1007/978-981-16-9170-6_7

69

70

7 Space Theory and Symmetry

7.2.1 Contingency Space This is the space associated with the m×n contingency table, spanning (min(m, n)-1) dimensions. If the contingency table is 8 × 12, then the contingency space is seven dimensional.

7.2.2 Dual Space: Symmetric Space This is the space which accommodates both row variables and column variables of the contingency table in common space. In other words, this space is defined for the response-pattern format of the contingency table. • The dimension of dual space is twice that of contingency space. If the contingency table is 8 × 12, contingency table is seven dimensional, while dual space is 14 dimensional (twice of that of contingency space). • The average eigenvalue of all the components in dual space is 0.5. • The principal coordinates in dual space are the Euclidean coordinates for rows and columns in common space, hence the graphs in dual space are the solution to our perennial problem of joint graphical display. How to identify which components span dual space will be explained later using a numerical example.

7.2.3 Pairwise Dual Subspaces • As noted before, the dimensionality of dual space is twice that of contingency space and is always an even number. • When the dimensionality of dual space is more than 2 (1) there are pairwise subspaces in dual space such that the average eigenvalue of the two components in each subspace is 0.5, (2) the two-dimensional graph based on each set of pairwise components in dual subspace is the correct two-dimensional graph, associated with a single component of the contingency table (Remember our discussion that the rows and the columns, associated with a single component of the contingency table, do not lie in unidimensional space, but in a two-dimensional space. That two-dimensional space is one of the dual subspaces). We have therefore just found the correct coordinates for both rows and columns of each component from the contingency table in this dual subspace. (3) Thus, the pairwise dual subspace is the correct space in which rows and columns can share. The only requirement is that the average eigenvalue of the two components is 0.5. When the contingency table yields 7 components, dual space is 14-dimensional space, which contains 7 pairwise subspaces, each depicting the true two-dimensional graph of a single component associated with the contingency table,

7.2 Theory of Quantification Space

71

(4) Given a single component from the contingency table, the coordinates of pairwise dual subspace can be computed by the following formula: Since the separation angle is θk and the distribution of variables is assumed centered, thus symmetric, Nishisato (2019b) introduced the row axis and the column axis, separated from the horizontal axis by the angle θ2k in one direction and the other axis by the same angle in the opposite direction, arriving at the formulas for two-dimensional coordinates for component k: For row i and column j of contingency table component k, the correct two-dimensional coordinates are given respectively by   θk ; Row i: ρk yik , ρk yik sin 2

 Column j:

θk ρk x jk , −ρk x jk sin 2



Because of symmetry and two dimensions for a single component from the contingency table, Yano (2021) calls this graph Symmetric Biplot. This name sounds quite reasonable. Thus we can introduce in this way two dimensions for each component associated with the contingency table. The coordinates for two dimensions thus generated are exactly proportional to those in pairwise dual subspace (Note: proportional because the two eigenvalues from the two formats are different). Therefore, the response-pattern format yields such a pair of components directly from quantification. This is a huge advantage of dealing with the response-pattern table, rather than the contingency table.

7.2.4 Total Space This is the space which accommodates the total variations of the response-pattern table. When the number of rows is equal to the number of columns, total space is identical to dual space; when the number of rows is not equal to that of columns, total space has larger dimensionality than dual space. The average eigenvalue of total space is also 0.5.

7.2.5 Residual Space When the number of rows is different from that of columns of the contingency table, the dimensionality of total space is larger than that of dual space, and the difference is called residual space. In residual space either row variables or column variables have zero coordinates. The eigenvalue of each component in residual space is 0.5. It is interesting to note that the eigenvalue of 0.5 from the response-pattern table corresponds to the eigenvalue of 0 from the contingency table (Nishisato 1980). As is clear, the components in residual space are of no interest to the investigators because they capture information of only rows or columns and the row-column correlations for all components in residual space are zero.

72

7 Space Theory and Symmetry

One of the conclusions from the above discussion is that quantification analysis should be carried out only in dual space. This is so because the residual space captures the variations in only rows or columns of the contingency table, thus the information in residual space is of no interest in search for the association between rows and columns. This presents us an important lesson to investigators: We should try to use the same number of rows and that of columns. Otherwise, imagine, for example, that we collect data of a 2 × 50 contingency table. Then dual space is two dimensional and residual space is 48 dimensional! Suppose that the investigator collects the data from two groups (urban and suburban residents) on their most preferred drinks out of 10, and the results of quantification of the 2 × 10 contingency table (or, the N × 12 response-pattern table) yields only two components in dual space, thus wasting coordinates of eight dimensions. Is this something that researchers should be concerned with? The lesson here is: “Try to use the same number of rows and columns” so long as it is possible.

7.3 Example of Space Decomposition

Let us use an example from the paper by Garmize and Rychlak (1964), in which the authors investigated whether subjects' perceptions of Rorschach inkblots could be influenced by the subjects' moods. They devised a way to induce different moods in the subjects and then studied how those moods influenced the perceptions of the Rorschach inkblots. The original data set included 16 Rorschach symbols and 6 moods. In analyzing the 16 × 6 table of response frequencies, however, Nishisato (1994) noted that there were very few responses to the Rorschach symbols Bear, Boot(s), Bridge, Hair and Island, and he deleted those five symbols from his analysis to avoid possible outlier effects in quantification (Nishisato 1984, 1986, 1987, 1988a, b, 1991). We follow his lead and use the reduced 11 × 6 table shown in Table 7.1. The analysis of this contingency table yields the results summarized in Tables 7.2 and 7.3.

Let us now look at the analysis of the condensed response-pattern table. Its size is (the number of nonzero elements in the contingency table) × (the number of rows + the number of columns); thus, our response-pattern table is 58 × 17, as shown in Table 7.4. From the researcher's point of view, the size of the response-pattern table may soon reach a practical limit as the data size increases. What can we do then? As mentioned earlier, Nishisato (2019a, b; see also Nishisato et al. 2021) showed how to calculate the two components (one dimension for rows and the other for columns) associated with each component from the contingency table. Let us repeat the formulas here. For row i and column j of component k, the two-dimensional coordinates are given, respectively, by


Table 7.1 Rorschach data and induced moods (Garmize and Rychlak 1964)

Inkblot      Fear  Anger  Depression  Love  Ambition  Security
Bat            33     10          18     1         2         6
Blood          10      5           2     1         0         0
Butterfly       0      2           1    26         5        18
Cave            7      0          13     1         4         2
Clouds          2      9          30     4         1         6
Fire            5      9           1     2         1         1
Fur             0      3           4     5         5        21
Mask            3      2           6     2         2         3
Mountains       2      1           4     1        18         2
Rocks           0      4           2     1         2         2
Smoke           1      6           1     0         1         0

Notes: Rorschach symbols Bear, Boot(s), Bridge, Hair and Island were dropped from the original data set due to small frequencies

Table 7.2 Analysis of the complete contingency table

Component              1      2      3      4      5
Eigenvalue ρ²          0.46   0.25   0.17   0.13   0.07
Singular value ρ       0.68   0.50   0.41   0.36   0.27
Accounted for δ (%)    43     23     16     12      7
Cumulative δ (%)       43     66     82     93    100
Angle θ (degrees)      47     60     66     69     74
Symmetric ω_sym,k      48     33     27     23     18

For row i:

$$\left(\rho_k\,y_{ik},\;\; \rho_k\,y_{ik}\sin\frac{\theta_k}{2}\right) \tag{7.1}$$

For column j:

$$\left(\rho_k\,x_{jk},\;\; -\rho_k\,x_{jk}\sin\frac{\theta_k}{2}\right) \tag{7.2}$$

Each component from the contingency table is thus augmented by a second dimension whose coordinates are calculated from formulas (7.1) and (7.2). Only then can the entire set of pairwise components be used as coordinates of the rows and the columns of the contingency table in common space (dual space). Recall that dual space for this example is 10-dimensional and residual space is 5-dimensional.
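As a quick numerical check (our own arithmetic, using component 1 of Tables 7.2 and 7.3: ρ₁ = 0.68, θ₁ = 47°, and the Bat coordinate y = −0.70), formula (7.1) places Bat at approximately

$$\left(0.68 \times (-0.70),\;\; 0.68 \times (-0.70) \times \sin 23.5^{\circ}\right) \approx (-0.48,\; -0.19).$$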


Table 7.3 Principal coordinates of five components

Component        1       2       3       4       5
Bat          −0.70   −0.16    0.15   −0.34   −0.08
Blood        −0.87   −0.35    0.60   −0.18    0.06
Butterfly     1.17   −0.44    0.17   −0.15    0.32
Cave         −0.37    0.30   −0.44   −0.34    0.11
Clouds       −0.23   −0.08   −0.75    0.30    0.13
Fire         −0.42   −0.30    0.63    0.59    0.07
Fur           0.78   −0.08   −0.08    0.01   −0.67
Mask         −0.05    0.04   −0.21   −0.04    0.02
Mountains     0.34    1.54    0.31   −0.04    0.11
Rocks         0.11    0.15    0.13    0.70   −0.08
Smoke        −0.57   −0.05    0.57    1.25    0.00
Fear         −0.87   −0.17    0.39   −0.48   −0.03
Anger        −0.44   −0.23    0.35    0.75   −0.02
Depression   −0.39    0.09   −0.68    0.02    0.10
Love          1.03   −0.51    0.15   −0.12    0.48
Ambition      0.42    1.27    0.29    0.00    0.05
Security      0.76   −0.23   −0.09   −0.08   −0.47

Thus, the above procedure is one way to overcome the size problem of the response-pattern table. We have now expanded our contingency space to dual space.

Let us now look at the analysis of the response-pattern table. The total number of components is 15, of which dual space occupies 10 dimensions, twice the dimensionality of contingency space. As we noted earlier, dual space is the space in which we should explore the information contained in the current data set. The results are summarized in Tables 7.5 and 7.6. Let us look at the principal coordinates in Table 7.6. Some of the important results can be summarized as follows:

• Since we are interested in the multidimensional decomposition of the association between the rows and the columns of the contingency table, we should ignore residual space, which corresponds to components 6, 7, 8, 9 and 10. Note that each of these components has the eigenvalue of 0.5 and that the contribution of every column (mood) is zero. Once we remove these components, we obtain Table 7.7. This shows how important it is to equate the number of rows to that of columns whenever it is reasonable: recall that the dimensionality of residual space is nil only when the number of rows is equal to the number of columns.

• The remaining components in Table 7.7 are arranged in descending order of the eigenvalues. We should note that the first five components (major components, in dual space (1), indicated by the asterisked header) are those associated with the contingency-table format, while the remaining components (minor components


Table 7.4 Response-pattern table of the entire data set (58 × 17)

Each row of the condensed response-pattern table corresponds to one nonzero cell of Table 7.1: the cell frequency appears once in the column of the corresponding inkblot and once in the column of the corresponding mood, with zeros elsewhere. For example, the first six rows (the Bat cells of Table 7.1) are:

Ba  Bl  Bu  Ca  Cl  Fi  Fu  Ma  Mt  Rc  Sm    Fe  An  Dp  Lo  Am  Se
33   0   0   0   0   0   0   0   0   0   0    33   0   0   0   0   0
10   0   0   0   0   0   0   0   0   0   0     0  10   0   0   0   0
18   0   0   0   0   0   0   0   0   0   0     0   0  18   0   0   0
 1   0   0   0   0   0   0   0   0   0   0     0   0   0   1   0   0
 2   0   0   0   0   0   0   0   0   0   0     0   0   0   0   2   0
 6   0   0   0   0   0   0   0   0   0   0     0   0   0   0   0   6

The remaining 52 rows are built in the same way from the nonzero cells of Blood through Smoke.

*Note: Ba = bat, Bl = blood, Bu = butterfly, Ca = cave, Cl = clouds, Fi = fire, Fu = fur, Ma = mask, Mt = mountains, Rc = rocks, Sm = smoke, Fe = fear, An = anger, Dp = depression, Lo = love, Am = ambition, Se = security

in dual space (2)) are additional contributions, recovered thanks to the response-pattern format. Note that the eigenvalues of the first five major components are all greater than 0.5, while those of the remaining minor components are all smaller than 0.5.

• The spatial symmetry is observed only in dual space, in which both the rows and the columns of the contingency table lie. As we noted earlier, residual space is contributed by only rows or only columns, not by both. In other words, residual space does not contain any relations between rows and columns; hence it is not relevant to the investigation of row-column relationships.

7.4 Recommendations

On the basis of our discussion, we realize that it is crucially important to be aware of the different kinds of space involved in quantification. The most important point is that quantification analysis should be carried out within dual space.

Table 7.5 Basic statistics

Component   ρ²_k   ρ_k     δ       Cum δ    θ_k (degrees)
 1          0.84   0.92   11.20    11.20    23
 2          0.75   0.87   10.00    21.21    30
 3          0.71   0.84    9.42    30.63    33
 4          0.68   0.82    9.05    39.68    35
 5          0.63   0.80    8.46    48.14    37
 6          0.50   0.71    6.67    54.80    45
 7          0.50   0.71    6.67    61.47    45
 8          0.50   0.71    6.67    68.14    45
 9          0.50   0.71    6.67    74.80    45
10          0.50   0.71    6.67    81.47    45
11          0.37   0.60    4.88    86.35    53
12          0.32   0.57    4.28    90.63    55
13          0.29   0.54    3.92    94.54    57
14          0.25   0.50    3.33    97.67    60
15          0.16   0.40    2.13   100.00    66

With this recommendation, we are now aware that quantification of contingency tables (often called simple correspondence analysis) should be shifted to quantification of response-pattern tables (called multiple correspondence analysis). In Nishisato's terms, these two types of analysis are called dual scaling of contingency tables and dual scaling of multiple-choice data, respectively. Once we represent the two types of data as a condensed response-pattern table, we are ready to deal with dual space.

Knowing that the total number of components associated with an m × n contingency table is min(m, n) − 1, while the corresponding response-pattern table has (m + n) − 2 components, it is very important not to use, for example, only two categories for one variable, because there will then be only one component in contingency space and only two components in dual space; the rest of the components from the quantification belong to residual space. For example, for a 2 × 50 contingency table, dual space is two-dimensional and residual space is 48-dimensional: only 2 components out of 50 contain information on the relations between the rows and the columns, and the remaining 48 components are meaningless to the investigators.

We have not discussed the case with more than two multiple-choice items. It is therefore quite urgent to expand the theory of quantification space beyond what we have discussed in this chapter. The theory of space for this general case will be much more complex than in the two-variable case, but it is urgently needed to advance quantification theory beyond what we now know.
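The dimension bookkeeping used throughout this chapter can be summarized in a small sketch (ours, in Python; the function name is hypothetical):

```python
def space_dimensions(m, n):
    """Dimensionalities of the quantification spaces for an m x n contingency table."""
    contingency = min(m, n) - 1   # components of the contingency table
    total = m + n - 2             # components of the response-pattern table
    dual = 2 * contingency        # dual space doubles contingency space
    residual = total - dual       # equals abs(m - n)
    return {"contingency": contingency, "dual": dual,
            "residual": residual, "total": total}

print(space_dimensions(2, 50))  # {'contingency': 1, 'dual': 2, 'residual': 48, 'total': 50}
print(space_dimensions(11, 6))  # {'contingency': 5, 'dual': 10, 'residual': 5, 'total': 15}
```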

Table 7.6 Principal coordinates of total space

[Table 7.6 lists the principal coordinates of all 15 components of the response-pattern table for the 11 inkblots and the 6 moods, grouped as contingency space and dual space (1) (components 1–5), residual space (components 6–10, in which every mood coordinate is zero), and dual space (2) (components 11–15). The dual-space coordinates, components 1–5 and 11–15, are reproduced in Table 7.7.]


Table 7.7 Principal coordinates of dual space

Co*     1**      2       3       4       5      11      12      13      14      15
Ba   −1.09    0.26   −0.34   −0.74   −0.24   −0.18   −0.51   −0.22    0.15    0.44
Bl   −1.12    0.63   −1.17   −0.32   −0.17   −0.13   −0.22   −0.75    0.36    0.46
Bu    1.51    0.74   −0.45   −0.51    0.92    0.70   −0.35   −0.29    0.43   −0.62
Ca   −0.47   −0.50    0.89   −0.83    0.13    0.10   −0.57    0.57   −0.29    0.19
Cl   −0.26    0.17    1.53    0.57    0.49    0.37    0.39    0.98    0.10    0.11
Fi   −0.53    0.54   −1.20    1.39    0.23    0.18    0.96   −0.77    0.31    0.22
Fu    1.12    0.12    0.20    0.19   −1.97   −1.50    0.13    0.13    0.07   −0.46
Ma   −0.04   −0.06    0.43   −0.12    0.00    0.00   −0.08    0.28   −0.03    0.02
Mo    0.40   −2.66   −0.64   −0.07    0.34    0.26   −0.05   −0.41   −1.53   −0.16
Ro    0.19   −0.23   −0.21    1.62    0.00    0.00    1.11   −0.13   −0.13   −0.08
Sm   −0.71    0.13   −1.02    2.90    0.33    0.25    1.99   −0.66    0.07    0.29
Fe   −1.19    0.29   −0.79   −1.02   −0.34    0.26    0.70    0.51   −0.17   −0.49
An   −0.56    0.43   −0.63    1.73    0.11   −0.08   −1.19    0.41   −0.25   −0.23
De   −0.50   −0.12    1.37   −0.03    0.30   −0.23    0.02   −0.88    0.07   −0.20
Lo    1.29    0.87   −0.42   −0.50    1.40   −1.07    0.34    0.27   −0.50    0.53
Am    0.51   −2.18   −0.59    0.01    0.18   −0.14   −0.01    0.38    1.26    0.21
Se    1.19    0.39    0.22   −0.01   −1.40    1.07    0.00   −0.14   −0.22    0.49

*Notes: Co = component, Ba = bat, Bl = blood, Bu = butterfly, Ca = cave, Cl = clouds, Fi = fire, Fu = fur, Ma = mask, Mo = mountains, Ro = rocks, Sm = smoke, Fe = fear, An = anger, De = depression, Lo = love, Am = ambition, Se = security
**Components 1–5 are the major components; components 11–15 are the minor components

This will be a necessary step toward the next stage of quantification analysis. We hope that the discussion in this chapter will pave the way toward a general theory of quantification space.

References

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Garmize, L. M., & Rychlak, J. F. (1964). Role-play validation of a socio-cultural theory of symbolism. Journal of Consulting Psychology, 28, 107–115.
Nishisato, S. (1980). Analysis of Categorical Data: Dual Scaling and Its Applications. Toronto: University of Toronto Press.
Nishisato, S. (1984). Forced classification: A simple application of a quantification technique. Psychometrika, 49, 25–36.
Nishisato, S. (1986). Generalized forced classification for quantifying categorical data. In E. Diday, et al. (Eds.), Data Analysis and Informatics (pp. 351–362). Amsterdam: North-Holland.
Nishisato, S. (1987). Robust techniques for quantifying categorical data. In I. B. MacNeil & G. J. Umphrey (Eds.), Foundations of Statistical Inference (pp. 209–217). Dordrecht, The Netherlands: D. Reidel Publishing Company.
Nishisato, S. (1988a). Forced classification procedure of dual scaling: Its mathematical properties. In H. H. Bock (Ed.), Classification and Related Methods (pp. 523–532). Amsterdam: North-Holland.
Nishisato, S. (1988b). Market segmentation by dual scaling through generalized forced classification. In W. Gaul & M. Schader (Eds.), Data, Expert Knowledge and Decisions (pp. 268–278). Berlin: Springer.
Nishisato, S. (1991). Standardizing multidimensional space for dual scaling. In Proceedings of the 20th Annual Meeting of the German Operations Research Society (pp. 584–591). Hohenheim University.
Nishisato, S. (1994). Elements of Dual Scaling: An Introduction to Practical Data Analysis. Hillsdale, NJ: Lawrence Erlbaum Associates.
Nishisato, S. (2019a). Reminiscence: Quantification theory and graphs. Theory and Applications of Data Analysis, 8, 47–57 (in Japanese).
Nishisato, S. (2019b). Expansion of contingency space: Theory of doubled multidimensional space and graphs. Invited talk at the Annual Meeting of the Japanese Classification Society, Tokyo (in Japanese).
Nishisato, S. (2020). Quantification theory: Categories, variables and mode of analysis. In T. Imaizumi, et al. (Eds.), Advanced Studies in Behaviormetrics and Data Science (pp. 253–264). Singapore: Springer.
Nishisato, S., Beh, E. J., Lombardo, R., & Clavel, J. G. (2021). Modern Quantification Theory: Joint Graphical Display, Biplots, and Alternatives. Singapore: Springer Nature.
Yano, T. (2021). Personal communication.

Chapter 8

Graphical Display

8.1 Graphical Display of Rows or Columns

This is a straightforward case in terms of graphical display. Quantification analysis yields principal coordinates of the rows (or of the columns), which span a Euclidean space with orthogonal axes. Therefore, there is no problem in plotting, for example, component 1 against component 2. The only limitation is that we are typically restricted to two-dimensional graphs (e.g., component 1 against component 2, or component 1 against component 3). The important point is that we are dealing with graphs of only rows or only columns, not of rows and columns together.

8.1.1 Blood Pressures and Migraines

The data were collected by Nishisato in Toronto and used in a few publications (Nishisato 1999, 2003, 2007). Since the data were collected from a small group of people, the results are not very representative of the general population; the data are used here solely to illustrate graphical display. Fifteen people were asked to answer the following six multiple-choice questions:
1. How would you rate your blood pressure? (Low, Medium, High)
2. Do you get migraines? (Rarely, Sometimes, Often)
3. What is your age group? (20–34; 35–49; 50–65)
4. How would you rate your daily level of anxiety? (Low, Medium, High)
5. How would you rate your weight? (Light, Medium, Heavy)
6. What about your height? (Short, Medium, Tall)


Table 8.1 Multiple-choice data in the response-pattern format

Subject   Bpr*   Mig   Age   Anx   Wgt   Hgt
 1        100    001   001   001   100   100
 2        100    001   100   001   010   001
 3        001    001   001   001   100   001
 4        001    001   001   001   100   100
 5        010    100   010   010   001   010
 6        010    100   010   001   001   100
 7        010    010   010   100   100   001
 8        100    001   100   001   100   001
 9        010    010   010   100   100   010
10        100    001   010   010   100   001
11        010    100   100   001   010   010
12        010    010   001   001   010   010
13        001    001   001   001   001   100
14        100    001   100   010   100   100
15        001    001   001   001   100   010

Note: Bpr* = blood pressure; Mig = migraine; Anx = anxiety; Wgt = weight; Hgt = height

The data are shown in Table 8.1. Since we do not have any background information about the subjects, we can justify plotting only the columns (the options of the questions) of the data table. The quantification results reveal that the fifth component has a negative value of Cronbach's α; therefore we adopt only the first four components. The principal coordinates of the options on these four components are shown in Table 8.2, and the relative contributions of components 1 to 4 are, respectively, 35, 24, 22 and 19 percent.

Let us examine component 1. Note that there is no specific meaning to whether a weight is positive or negative, because the weights are centered at 0 and the multidimensional configuration is projected onto orthogonal components. If we gather the options on one side of the first axis (negative weights) and on the other side (positive weights), we obtain the following division on component 1:

One side: low and high blood pressure, frequent migraines, youngest and oldest age groups, high anxiety, short.
The other side: medium blood pressure, occasional migraines, middle age, low anxiety, medium height.


Table 8.2 Principal coordinates of options of items

Item      Option      comp 1   comp 2   comp 3   comp 4
BP        Low          −0.72     0.82     0.72     0.12
          Medium        1.17    −0.19     0.02    −0.17
          High         −0.86    −0.74    −0.93     0.11
Mig       Rarely        1.04    −1.08     1.11     0.01
          Sometimes     1.31     0.70    −1.07    −0.35
          Often        −0.78     0.12    −0.01     0.11
Age       20–34        −0.37     0.56     1.06    −0.75
          35–49         1.03     0.22     0.08     0.84
          50–68        −0.61    −0.56    −0.78    −0.20
Anxiety   Low           1.55     1.21    −1.25     0.42
          Medium        0.12     0.31     1.11     1.07
          High         −0.35    −0.33    −0.08    −0.40
Weight    Light        −0.27     0.46    −0.35     0.30
          Medium        0.32     0.01     0.54    −1.77
          Heavy         0.50    −1.40     0.50     0.85
Height    Short        −0.56    −0.63     0.04     0.55
          Medium        0.83    −0.35    −0.11    −0.57
          Tall         −0.27     0.98     0.07     0.02

Note: comp = component; BP = blood pressure

Table 8.3 Characteristics of two components

Component 1                               Component 2
One side             Opposite side        One side            Opposite side
Low BP               Medium BP            High BP             Low BP
High BP              Rare migraine        Rare migraine       Occasional migraine
Frequent migraine    Middle age           Old                 Young
Old age group        Low anxiety          Heavy               Tall
High anxiety         Medium height        Short
Short

Similarly, we can summarize the relations revealed by each of the two components by separating the response options into the two ends of each axis (Table 8.3). From this comparison, we can roughly tell what kind of information these components carry. A better way, however, is to plot components 1 and 2 in the same graph, using the two components as two orthogonal axes; see the resulting plot of the item options (Fig. 8.1).


Fig. 8.1 Blood pressure data: Components 1 and 2

This is only one example of a two-dimensional graph; others can be obtained for the pairs of components (1, 3), (1, 4), (2, 3), (2, 4) and (3, 4). Which graphs to adopt is entirely up to the investigator's judgment, since it depends on the data. From Fig. 8.1, we can see clusters in the third quadrant (high blood pressure, high anxiety, old, short), in the second quadrant (low blood pressure, young,

Fig. 8.2 Blood pressure data: Components 1 and 2


light, tall), in the first quadrant (low anxiety, mild headache, middle age) and, lastly, in the fourth quadrant (no headache, heavy).

We would also like to show another way of describing the two-dimensional configuration: a graph obtained by connecting the three options of each question, thus forming a triangle for each item. An important aspect of this exercise is that the area of each triangle is proportional to the amount of information associated with the item; mathematically, the area of each triangle is proportional to the squared item-total correlation r²_jt in the first two dimensions. See Fig. 8.2, the graph of the item triangles, in which we can see the dominance of blood pressure and migraines in the first two-dimensional space.
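To illustrate the area interpretation, a small sketch (ours) computes the triangle areas from the component 1 and 2 coordinates of Table 8.2 for two of the items; comparing such areas across items indicates their relative information in the plane:

```python
def triangle_area(p1, p2, p3):
    """Area of the triangle spanned by an item's three option points (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return 0.5 * abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))

# Component 1 and 2 coordinates of two items' options (from Table 8.2)
blood_pressure = [(-0.72, 0.82), (1.17, -0.19), (-0.86, -0.74)]  # Low, Medium, High
height         = [(-0.56, -0.63), (0.83, -0.35), (-0.27, 0.98)]  # Short, Medium, Tall

print(round(triangle_area(*blood_pressure), 2))  # 1.54 -> a dominant item
print(round(triangle_area(*height), 2))          # 1.08
```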

8.2 Joint Graph: Correspondence Plot

The correspondence plot is also called the French plot or the symmetric scaling plot. This is the case in which we want to graph both rows and columns together, called joint graphical display, which has been a perennial problem in the history of quantification theory. Those interested in the historical background should see Nishisato (2020) and Nishisato et al. (2021), in particular the CGS controversies between the proponents Carroll, Green and Schaffer and the opponent Greenacre (Carroll et al. 1986, 1987, 1989; Greenacre 1989).

In spite of our attempt to maximize the row-column correlation of a two-way table, we can never attain a perfect correlation of 1; indeed, if the correlation were 1, the data would not offer any interesting information. Historically, however, the joint graphical display has been drawn with common coordinates for rows and columns, as if the row-column correlation were 1. This assumption is almost always wrong. Remember that unless the row-column correlation is perfect, rows and columns do not span the same space, a matter typically taught in introductory statistics courses: for each contingency-table component we have one axis for the rows and another for the columns. More specifically, the vector of row coordinates and that of column coordinates of the same component cross at the origin with angle θ (Nishisato and Clavel 2003):

$$\theta_k = \cos^{-1}\rho_k$$

where ρ_k is the singular value of component k.

Let us look at a numerical example to see how wrong the currently popular graphical method is. We use the Kretschmer data again, since we are already familiar with this example, and reproduce the data and quantification results here. From our point of view, we need to examine how different row space and column space are. The first singular value (correlation) is 0.508, which appears quite high. Even with this high row-column correlation, however, the discrepancy angle between the row weight vector and the column weight vector is 59 degrees!
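The two angles quoted in this section follow directly from this formula; a one-line check (ours, assuming NumPy):

```python
import numpy as np

for rho in (0.508, 0.261):   # singular values of the two Kretschmer components
    theta = np.degrees(np.arccos(rho))
    print(f"rho = {rho:.3f} -> separation angle = {theta:.0f} degrees")
# rho = 0.508 -> separation angle = 59 degrees
# rho = 0.261 -> separation angle = 75 degrees
```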


Fig. 8.3 Correspondence plot of components 1 and 2

Fig. 8.4 Row-column angle as a function of correlation

This is far from the assumption on which the correspondence plot is based, namely 0 degrees! One can never treat 59 degrees as 0 degrees. Even so, the correspondence plot uses the same axis for the rows and the columns, ignoring that they are separated by a 59-degree angle for the first component and by 75 degrees for the second component. Would you not say that the angle of 75 degrees, in particular, is closer to 90 degrees than to the 0 degrees adopted by the correspondence plot? Are we now convinced how wrong the assumption used by the correspondence plot is?

Our correspondence plot is shown in Fig. 8.3 and looks good. In fact, Nishisato (2016) reported that this two-dimensional correspondence plot looks better than the correct four-dimensional plot, because row variables and column variables are plotted closer in two dimensions than they actually are in four-dimensional space (Figs. 8.3 and 8.4). As for the correspondence plot itself, the graph looks reasonable, but let us repeat the warning that by ignoring the space discrepancy we make the mental types and the body types look closer than they actually are (Nishisato 2016). Evaluating a graph by its appearance is indeed problematic and should not be adopted in practice. The real culprit of the perennial problem of joint graphical display lies in this discrepancy angle. The currently most popular joint graphical display (correspondence plot, French plot, symmetric scaling plot) is thus nothing but a compromise resting on the assumption that cos⁻¹ρ = 0, which is almost never the case.


8.3 Logically Correct Graph and Discrepancy Diagram

Apart from the current practice of joint graphical display, let us go back to a step-by-step derivation of a correct graph for joint (rows, columns) display, referred to here as a Euclidean graph. Our task is to draw the rows and the columns of a contingency table, where the rows and the columns are somewhat correlated, that is, the correlation is between 0 and 1; this is the situation we face in reality. Considering the joint graph for this realistic situation, we will see what a correct Euclidean graph should look like. Such a graph is clearly different from our familiar correspondence plot. Please keep in mind that the correspondence plot is based on the condition that ρ = 1, that is, θ = 0.

Let us now discuss how to draw a Euclidean graph of the rows and columns of the contingency table. We again use our familiar numerical example of Kretschmer's typology data, discussed in Chap. 1 and used in the previous section of this chapter (Table 8.4). Earlier we extracted four components. From the table, let us list the first two components, as in Table 8.5. Since we maximized the row-column correlation, the weights for rows and columns are correlated; in our example, the two components revealed the correlation coefficients ρ₁ = 0.5082 and ρ₂ = 0.2611. Therefore, for component 1 the rows and the columns are in two-dimensional space, and the same is true for component 2. This means that we need a four-dimensional graph to capture the structure of the data (Table 8.6). Note, however, that it is absolutely correct to plot only rows in a two-dimensional graph, or only columns in a two-dimensional graph, as discussed in the previous section; such a two-dimensional graph is not a joint graph. The problem arises only when we want to graph correlated rows and columns together in a two-dimensional graph.

Table 8.4 Kretschmer's typology data

                    Pyknic  Leptosomatic  Athletic  Dysplastic  Others  Total
Manic-Depressive       879           261        91          15     114   1360
Schizophrenic          717          2632       884         549     450   5232
Epileptic               83           378       435         444     166   1506
Total                 1679          3271      1410        1008     730   8098

Table 8.5 Statistics for graphs

Component   Correlation ρ   Discrepancy angle θ
1           0.508           59 degrees
2           0.261           75 degrees


Table 8.6 Two-dimensional coordinates

Component              1       2
Manic-Depressive   −1.09    0.61
Schizophrenic       0.14   −0.18
Epileptic           0.50    0.48

Component              1       2
Pyknic             −0.96    0.11
Leptosomatic        0.16   −0.29
Athletic            0.33    0.18
Dysplastic          0.55    0.45
Others              0.06    0.09

Earlier we noted that the angle θ between the row axis and the column axis of each component space can be calculated by the formula of Nishisato and Clavel (2003), giving the discrepancy angles cos⁻¹ρ₁ = 59 degrees and cos⁻¹ρ₂ = 75 degrees for the two components. Therefore, we can use this information to introduce a two-dimensional graph for each of the two components; recall the formulas for calculating the coordinates of the rows and of the columns for each component in Chap. 7 (see also the examples in Nishisato et al. 2021). This point is crucially important. To repeat: we need one axis to accommodate the rows and another axis for the columns for each component from the contingency table.

The discrepancy diagram (Fig. 8.4) shows the angle of discrepancy between row space and column space as a function of the row-column correlation, which is the same as the singular value. Looking at the diagram with a few sample correlation coefficients, we realize that even a relatively large correlation coefficient separates the row axis from the column axis to a considerable extent. Our question now is how to introduce the discrepancy angle into the two-dimensional graph. As discussed earlier, Nishisato (2019a) proposed how to plot the two-dimensional graph associated with a single component; for numerical illustrations of this adjustment, refer to Nishisato (2019a, b) and Nishisato et al. (2021).

The fact that we must double the quantification space to accommodate rows and columns in the same space is behind the warning by Lebart, Morineau and Tabard (1977; see also Lebart, Morineau and Warwick 1984) that one cannot measure the distance between a row point and a column point in the joint graph, namely the correspondence plot. The reason is obvious: the correct graph is obtained by doubling the space used by the correspondence plot. Let us now find a more direct way to arrive at the necessary doubled space than applying the special adjustment to the output from the contingency table: quantify the response-pattern table.


8.3.1 Graphs of Response-Pattern Format

Let us consider quantifying the response-pattern format of the data. We again use Kretschmer's example and convert the contingency table to the (condensed) response-pattern format. As discussed earlier, both the rows and the columns of the contingency table are now placed in the columns of the response-pattern table. Thus, thanks to the Young-Householder theorem, it is guaranteed that the rows and the columns of the contingency table will occupy the same space when the response-pattern table is subjected to quantification. So let us transform the contingency table into the condensed response-pattern table (Table 8.7; a small programmatic sketch of this conversion follows the table).

From the previous chapter, we can expect two components in contingency space and dual space (1), two more components in dual space (2), and two components in residual space. The relevant results are tabulated in Table 8.8. Note that the two components in contingency space and dual space (1) have eigenvalues greater than 0.5, the two components in dual space (2) have eigenvalues smaller than 0.5, and each of the two components in residual space has an eigenvalue equal to 0.5. There are two pairwise dual subspaces: the average of the eigenvalues of D1 and D1* is 0.5, and the average of the eigenvalues of D2 and D2* is also 0.5; these pairs are called dual subspaces.

Dual Subspace

Each dual subspace consists of two quantification components. The important roles of dual subspaces can be stated as follows:

Table 8.7 Kretschmer's data in response-pattern format

MaD    Sch    Epi     Pyk    Lep    Ath    Dys    Oth
879      0      0     879      0      0      0      0
261      0      0       0    261      0      0      0
 91      0      0       0      0     91      0      0
 15      0      0       0      0      0     15      0
114      0      0       0      0      0      0    114
  0    717      0     717      0      0      0      0
  0   2632      0       0   2632      0      0      0
  0    884      0       0      0    884      0      0
  0    549      0       0      0      0    549      0
  0    450      0       0      0      0      0    450
  0      0     83      83      0      0      0      0
  0      0    378       0    378      0      0      0
  0      0    435       0      0    435      0      0
  0      0    444       0      0      0    444      0
  0      0    166       0      0      0      0    166
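A minimal sketch (ours, assuming NumPy; the function name is hypothetical) of the contingency-table-to-condensed-response-pattern conversion just performed; for Kretschmer's data it reproduces the 15 × 8 shape of Table 8.7:

```python
import numpy as np

def condensed_response_pattern(F):
    """Condensed response-pattern table of an m x n contingency table F:
    one row per nonzero cell, carrying the cell frequency in the
    corresponding row-category and column-category columns."""
    F = np.asarray(F)
    m, n = F.shape
    rows = []
    for i in range(m):
        for j in range(n):
            if F[i, j] > 0:
                row = np.zeros(m + n, dtype=int)
                row[i] = F[i, j]        # row-category (mental type) block
                row[m + j] = F[i, j]    # column-category (body type) block
                rows.append(row)
    return np.vstack(rows)

kretschmer = [[879, 261, 91, 15, 114],
              [717, 2632, 884, 549, 450],
              [83, 378, 435, 444, 166]]
print(condensed_response_pattern(kretschmer).shape)   # (15, 8)
```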


Table 8.8 Summary statistics and principal coordinates

Component             1        2        3          4          5       6
Eigenvalue           0.75     0.63     0.50       0.50       0.37    0.25
Space*               C1, D1   C2, D2   Residual   Residual   D2*     D1*
Delta (%)            25       21       17         17         12       8
Manic-Depressive     1.86     0.48     0.00       0.00       0.37    1.06
Schizophrenic       −0.24    −0.55     0.00       0.00      −0.42   −0.14
Epileptic           −0.86     1.47     0.00       0.00       1.12   −0.42
Pyknic               1.63     0.34    −0.28      −0.04      −0.26   −0.93
Leptosomatic        −0.28    −0.89    −0.34       0.18       0.68    0.16
Athletic            −0.58     0.54     0.89      −1.92      −0.41    0.33
Dysplastic          −0.94     1.36    −1.44       1.28      −1.04    0.54
Others              −0.10     0.27     2.45       1.24      −0.21    0.06

*Space: C1, C2 = contingency space 1 and 2; (D1, D2) = dual space (1); (D1*, D2*) = dual space (2); Residual = residual space; (D1, D1*) and (D2, D2*) = pairwise dual subspaces

• In our example we see two dual subspaces: dual subspace 1 consists of components 1 and 6, and dual subspace 2 consists of components 2 and 5; in each pair the two eigenvalues sum to 1 (from Table 8.8: 0.75 + 0.25 = 1 and 0.63 + 0.37 = 1).
• There exists a further kind of symmetry within each dual subspace: for the pair (D1, D1*), the eigenvalue ρ₁² equals (1 − ρ₆²), and similarly ρ₂² = (1 − ρ₅²) for the pair (D2, D2*).
• Each dual subspace, consisting of two components, corresponds to the two-dimensional coordinates derived from a single component of the contingency table, which we can calculate by the formulas mentioned earlier.

Recall that in the previous section we discussed formulas that generate two-dimensional coordinates from a single component of the contingency table, and we have seen those two-dimensional graphs. The two-dimensional coordinates we used then correspond to those in dual subspace. Thus, if we quantify the response-pattern table, we obtain those pairwise components directly. As a consequence of the mathematics involved, we arrive at the conclusion that we must double the dimensionality of the space occupied by the contingency table. This is Nishisato's theory of doubled multidimensional space.

Graphs: Contingency Table Versus Response-Pattern Table

Our conclusion on Kretschmer's data is that we can express the exact four-dimensional structure of the data in two ways: (a) using the analysis of the contingency table with supplemented components (see Nishisato et al. 2021); and (b) using the four components of the response-pattern table in dual space. Either way, the exact structure of the data is four-dimensional, and its Euclidean coordinates can be found directly in dual space from the analysis of the response-pattern table.

Table 8.9 Principal coordinates with eigenvalues

Component             1        2        5        6
Eigenvalue           0.75     0.63     0.37     0.25
Space*               D1       D2       D2*      D1*
Manic-Depressive     1.86     0.48     0.37     1.06
Schizophrenic       −0.24    −0.55    −0.42    −0.14
Epileptic           −0.86     1.47     1.12    −0.42
Pyknic               1.63     0.34    −0.26    −0.93
Leptosomatic        −0.28    −0.89     0.68     0.16
Athletic            −0.58     0.54    −0.41     0.33
Dysplastic          −0.94     1.36    −1.04     0.54
Others              −0.10     0.27    −0.21     0.06

*Space: (D1, D2) = dual space (1); (D1*, D2*) = dual space (2); (D1, D1*) and (D2, D2*) = pairwise dual subspaces

The information we are interested in analyzing lies in dual space, consisting, in the current example, of components 1, 2, 5 and 6, which carry the total association information between the rows and the columns. The final table of information we are interested in can be summarized in Table 8.9, associated with dual space.

8.4 Re-evaluating Correspondence Plot

Looking at the problems of joint graphical display, we can see that the traditional correspondence plot (French plot, symmetric scaling plot) is limited to contingency space, which is two-dimensional for Kretschmer's data. The mathematically correct joint graph is four-dimensional; the French plot is based only on the first two dimensions, ignoring the remaining two. In terms of our space theory, the exact joint graphical display requires dual space (1) and dual space (2), while the correspondence plot is a joint display using only the dominant component from each of the two dual subspaces. Unfortunately, the correspondence plot is correct if and only if the rows and the columns are perfectly correlated, a case in which joint graphical display is of no interest to researchers.

True Face of Correspondence Plot

Let us plot the two dual subspace results: components 1 and 6, which correspond to the first component of the correspondence plot (Fig. 8.5), and components 2 and 5, which correspond to the second component (Fig. 8.6). These two graphs are each two-dimensional and are mathematically correct plots of Kretschmer's data. The discrepancy between the mental-type space and the body-type space is obvious: all the mental types are on one axis and


Fig. 8.5 Components 1 and 6

Fig. 8.6 Components 2 and 5

the body types are on a different axis, meaning that they occupy different spaces; yet in the correspondence plot these two axes in each plot are merged into a single axis. This demonstration reveals the true nature of the correspondence plot.


However, the fact that the correspondence plot has acquired the status of an "everyday method" tells us something very positive and attractive to researchers. Some of the reasons are:

• We have known for many years (e.g., Nishisato 1980) that an eigenvalue of the response-pattern matrix greater than 0.5 is equivalent to an eigenvalue of the contingency table greater than 0.
• The fact that it uses the response-pattern components with eigenvalues greater than 0.5 does indeed make the positions of rows and columns look closer than they actually are (Nishisato 2016), thus providing a reasonable compromise which makes it easier to identify clusters of rows and columns.
• The use of half the components in dual space, particularly the two dominant components in our example, can be very practical and appealing to users.

These points are behind the reason why the practice is referred to as French wisdom (Nishisato et al. 2021). On the other hand, one can also be critical of this widely used correspondence plot (French plot), because:

• Depicting rows and columns in the same space means treating them as if they were perfectly correlated. If they were perfectly correlated, the row-column axis discrepancy would be zero, whereas in fact the angle between the row axis and the column axis for component 1 of Kretschmer's data (Kretschmer 1925) is 59 degrees. How can we justify treating 59 degrees as 0 degrees? And how can we keep quiet about this gross mishandling of exact mathematics?
• We started our quantification analysis by maximizing the correlation between rows and columns, and once we obtained the optimal quantification, we used the correspondence plot as if the rows and columns were perfectly correlated. But no! We never reached the absolute maximum of 1, but only 0.5082! Is this not another gross misreporting of the results?
• In multidimensional data analysis, it is important that the rows and the columns be accommodated in common space, as the Young and Householder (1938) theorem dictates. In other words, the response-pattern format should always be used for quantification of the contingency table.
• The response to the above point is: use the response-pattern format, and report a four-dimensional joint graph in dual space for Kretschmer's data.
• The correspondence plot, which has been the most widely used method of joint graphical display, is not a scientific way to summarize the data structure, and investigators should always make this point clear to their readers.


8.4.1 Alternatives to Joint Graphical Display

What can we recommend to users of quantification analysis for summarizing multidimensional outputs? Even for the small example of Kretschmer's data, the exact Euclidean joint graphical display requires four-dimensional space. How can we deal with this problem of multidimensional space? Pairwise plots? Ignoring all components beyond the first two? Or can we come up with a better alternative to joint graphs? Let us look at a few possible alternatives to joint graphical display of many dimensions.

Popular Pairwise Plots

The most common practice in joint graphical display is a pairwise display of rows and columns. But the traditional correspondence plot of, say, components 1 and 2 over-simplifies a four-dimensional structure into a two-dimensional graph, and this practice should be discarded from routine use. The only consolation is that such a joint graph is based on the two dominant components; still, we should feel uneasy about assuming that the rows and the columns are perfectly correlated within components. What is needed is a new development: a four-dimensional display for a pairwise plot. What would such a plot look like? We need a four-dimensional graphical display for Kretschmer's data.

Response-Pattern Labeling

With no solution yet to the above four-dimensional pairwise plot, it is a real step forward to plot subjects in two- or three-dimensional space and characterize each subject by his or her response pattern over many items (Nishisato 1994). Recall the example of the Singapore data in Chap. 2, where we plotted the subjects and labeled them with their response patterns, which is logically correct. The graph is reproduced here for your inspection (Fig. 8.7). Although this method is limited to a relatively small number of questions, the idea is brilliant. Nishisato once suggested this method to a researcher for a market segmentation study of cosmetics, in which subjects were randomly sampled from different groups of consumers and the study identified subgroups of subjects who preferred different cosmetics. So there seems to be some possibility of further investigation along this line of approach to the graphical display of quantification results.

Cluster Analysis in Dual Space

Lebart and Mirkin (1993) presented comparisons between the traditional graphical approach and cluster analysis. Note that joint graphical display is dimension-oriented, and this orientation is a tough obstacle to the general use of graphical display. In contrast, cluster analysis is dimensionless. As such, cluster analysis has been suggested by a number of researchers; in particular, it was recommended in lieu of multidimensional graphs by Clavel and Nishisato (2012, 2020) and


Fig. 8.7 Components 1 and 2 of Subjects’ Response Patterns

Nishisato (2012, 2014, 2020). The most comprehensive discussion of this alternative can be found in Nishisato et al. (2021). There remain many problems for cluster analysis to solve before it can replace graphical display, such as dealing with rectangular distance tables and stability analysis, but this is definitely one possible approach to the multidimensional problems that graphical methods do not handle well.

In concluding this chapter, it is important to note that the correspondence plot (French plot, symmetric scaling plot) is not an exact plot, but an ingenious practical invention. When the number of components (dimensions) is large, however, the graphical method is not helpful for summarizing the information in multidimensional space. Our recommendation is to develop methods of cluster analysis suited to the needs of data quantification. Interested readers are referred to Chap. 6 of Nishisato et al. (2021).

References

Carroll, J. D., Green, P. E., & Schaffer, C. M. (1986). Interpoint distance comparisons in correspondence analysis. Journal of Marketing Research, 23, 271–280.
Carroll, J. D., Green, P. E., & Schaffer, C. M. (1987). Comparing interpoint distances in correspondence analysis: A clarification. Journal of Marketing Research, 24, 445–450.
Carroll, J. D., Green, P. E., & Schaffer, C. M. (1989). Reply to Greenacre's commentary on the Carroll-Green-Schaffer scaling of two-way correspondence analysis solutions. Journal of Marketing Research, 26, 366–368.
Clavel, J. G., & Nishisato, S. (2012). Reduced versus complete space configurations in total information analysis. In W. Gaul, A. Geyer-Schultz, L. Schmidt-Thiéme, & J. Kunze (Eds.), Challenges at the Interface of Data Analysis, Computer Science and Optimization (pp. 91–99). Springer.
Clavel, J. G., & Nishisato, S. (2020). From joint graphical display to bi-modal clustering: [2] Dual space versus total space. In T. Imaizumi, et al. (Eds.), Advanced Research in Classification and Data Science (pp. 131–143). Singapore: Springer.
Greenacre, M. J. (1989). The Carroll-Green-Schaffer scaling in correspondence analysis: A theoretical and empirical appraisal. Journal of Marketing Research, 26, 358–365.
Kretschmer, E. (1925). Physique and Character: An Investigation of the Nature of Constitution and of the Theory of Temperament. London: Kegan Paul, Trench, Trubner.
Lebart, L., & Mirkin, B. G. (1993). Correspondence analysis and classification. In C. Cuadras & C. R. Rao (Eds.), Multivariate Analysis, Future Directions (pp. 341–357). North-Holland.
Lebart, L., Morineau, A., & Tabard, N. (1977). Techniques de la Description Statistique: Méthodes et Logiciels pour l'Analyse des Grands Tableaux. Paris: Dunod.
Lebart, L., Morineau, A., & Warwick, K. M. (1984). Multivariate Descriptive Statistical Analysis. New York: Wiley.
Nishisato, S. (1980). Analysis of Categorical Data: Dual Scaling and Its Applications. Toronto: University of Toronto Press.
Nishisato, S. (1994). Elements of Dual Scaling: An Introduction to Practical Data Analysis. Hillsdale, NJ: Lawrence Erlbaum Associates.
Nishisato, S. (1999). Data types and information: Beyond the current practice of data analysis. In R. Decker & W. Gaul (Eds.), Classification and Information Processing at the Turn of the Millennium (pp. 40–51). Heidelberg: Springer.
Nishisato, S. (2007). Multidimensional Nonlinear Descriptive Analysis of Categorical Data. London: Chapman & Hall/CRC.
Nishisato, S. (2012). Quantification theory: Reminiscence and a step forward. In W. Gaul, A. Geyer-Schultz, L. Schmidt-Thiéme, & J. Kunze (Eds.), Challenges at the Interface of Data Analysis, Computer Science and Optimization (pp. 109–119). Springer.
Nishisato, S. (2014). Structural representation of categorical data and cluster analysis through filters. In W. Gaul, A. Geyer-Schultz, Y. Baba, & A. Okada (Eds.), German-Japanese Interchange of Data Analysis Results (pp. 81–90). Springer.
Nishisato, S. (2016). Quantification theory: Dual space and total space. Paper presented at the Annual Meeting of the Behaviormetric Society, Sapporo, Japan (in Japanese).
Nishisato, S. (2019a). Reminiscence: Quantification theory and graphs. Theory and Applications of Data Analysis, 8, 47–57 (in Japanese).
Nishisato, S. (2019b). Expansion of contingency space: Theory of doubled multidimensional space and graphs. Invited talk at the Annual Meeting of the Japanese Classification Society, Tokyo (in Japanese).
Nishisato, S. (2020). From joint graphical display to bi-modal clustering: [1] A giant leap in quantification theory. In T. Imaizumi (Ed.), Advanced Research in Classification and Data Science (pp. 157–168). Singapore: Springer.
Nishisato, S., Beh, E. J., Lombardo, R., & Clavel, J. G. (2021). Modern Quantification Theory: Joint Graphical Display, Biplots, and Alternatives. Singapore: Springer Nature.
Nishisato, S., & Clavel, J. G. (2003). A note on between-set distances in dual scaling and correspondence analysis. Behaviormetrika, 30, 87–98.
Young, G., & Householder, A. A. (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika, 3, 19–22.

Part II

Gleaning in the Field

Chapter 9

Forced Classification

9.1 Procedure of Forced Classification

The concept of projection is fundamentally important in data analysis and has been fully used in familiar least-squares estimation; it also appears in regression analysis, analysis of variance and covariance, discriminant analysis and so on. The idea of projection is thus everywhere in data analysis, and its specific applications to quantification theory are very useful for expanding the scope of analysis.

In the early days, Nishisato (1984) approached the problem of projection under the name of forced classification; his mind was then still confined to the environment surrounding the method of reciprocal averages (MRA), and forced classification was based on the idea of successive approximations to the solution. In this regard, forced classification is very much like MRA: once we incorporate the tool of projection it may look obsolete, but, like MRA, its gradual approach to a solution is quite interesting and offers a learning experience. The comparison between forced classification and projection is very much like that between MRA and Hirschfeld's simultaneous linear regressions. Starting with Nishisato and Nishisato (1984), the idea of forced classification was further pursued in such papers as Nishisato (1986, 1988a, b, 1994) and Nishisato and Gaul (1990), and was finally concluded in Nishisato and Baba (1999).

We will first look at the original formulation of forced classification and then relate it to the use of projection operators. Forced classification is a method of successive approximation to the results obtained by the use of projection operators; in the end, both yield identical results. Recall the principle of equivalent partitioning, mentioned earlier, where identical rows were replaced with a single row of the response pattern with frequencies as elements. In forced classification, instead of rows, identical columns are replaced with a single column multiplied by the frequency. In this setup, forced classification means taking this frequency to a much larger value, or, mathematically speaking, to plus infinity.


In the context of forced classification, the following pair is equivalent:

$$F_1=\begin{bmatrix}1&1&1&1&1&0\\1&1&1&1&0&1\\1&1&1&1&1&0\\0&0&0&0&0&1\\0&0&0&0&1&0\\0&0&0&0&0&1\end{bmatrix};\qquad F_2=\begin{bmatrix}4&1&0\\4&0&1\\4&1&0\\0&0&1\\0&1&0\\0&0&1\end{bmatrix}$$

In other words, F₁ ≡ F₂.

Let us now generalize this to a data set obtained from N respondents answering n multiple-choice questions. For question j, the response-pattern matrix is an N × m_j matrix F_j, where m_j is the number of response options of item j. The equivalence in the above example can then be generalized to the equivalence of the structure of the following two partitioned matrices:

$$[\,F_1, F_2, \ldots, \underbrace{F_k, \ldots, F_k}_{p\ \text{times}}, \ldots, F_n\,] \;\equiv\; [\,F_1, F_2, \ldots, pF_k, \ldots, F_n\,]$$

Note that the left side involves p copies of F_k. Considering this equivalence, the procedure of forced classification uses the right-hand expression: as p goes to infinity, we obtain the results of forced classification. This is equivalent to projecting the data onto the column space of item k. In other words, as p goes to infinity, the data structure approaches the structure of the matrix

$$P_k\,[F_1, F_2, \ldots, F_k, \ldots, F_n], \qquad \text{where}\quad P_k = F_k(F_k'F_k)^{-1}F_k'$$

and P_k is the projection operator for the column space of item k. It is then easy to show that this asymptote has the structure

$$[\,P_kF_1,\; P_kF_2,\; \ldots,\; F_k,\; \ldots,\; P_kF_n\,]$$

This is the matrix projected onto the space spanned by the columns of item k; item k is called the criterion variable.
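A compact sketch of this projection (ours, assuming NumPy; the function name is hypothetical, and pinv replaces the inverse to guard against response options that nobody chose):

```python
import numpy as np

def forced_classification_projection(F_list, k):
    """Project every item's (1, 0) response-pattern matrix onto the
    column space of criterion item k: [P_k F_1, ..., F_k, ..., P_k F_n]."""
    Fk = F_list[k]
    Pk = Fk @ np.linalg.pinv(Fk.T @ Fk) @ Fk.T   # P_k = F_k (F_k' F_k)^{-1} F_k'
    return [Fj if j == k else Pk @ Fj for j, Fj in enumerate(F_list)]

# Tiny illustration: two items, four respondents
F1 = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
F2 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
projected = forced_classification_projection([F1, F2], k=0)
```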


Table 9.1 Chosen options of 5 items and factorial design

Subject    1  2  3  4  5    α  β  γ
 1         1  3  2  1  1    1  1  1
 2         1  3  2  1  2    1  1  1
 3         1  3  2  2  2    1  1  1
 4         1  1  1  1  1    1  1  2
 5         2  1  1  1  2    1  1  2
 6         3  2  3  1  2    1  1  2
 7         3  4  3  5  4    1  2  1
 8         4  5  3  5  4    1  2  1
 9         5  5  4  5  5    1  2  1
10         2  2  3  3  2    1  2  2
11         2  3  3  4  2    1  2  2
12         2  4  4  4  2    1  2  2
13         2  3  2  2  2    2  1  1
14         2  4  2  3  2    2  1  1
15         2  4  2  4  2    2  1  1
16         5  5  4  2  2    2  1  2
17         5  5  4  2  3    2  1  2
18         4  5  4  3  3    2  1  2
19         5  4  4  3  4    2  2  1
20         5  5  5  4  4    2  2  1
21         5  5  5  5  5    2  2  1
22         5  5  5  4  4    2  2  2
23         5  5  5  5  5    2  2  2
24         4  5  4  5  5    2  2  2

Let us consider an example (Table 9.1), reported in Nishisato and Nishisato (1984) to explain the processes behind forced classification with a large enough multiplier p. We borrow it because the data have a special feature that we can use again later in relation to the analysis of variance of categorical variables. Please note that this table of chosen options and factorial design must first be converted to (1, 0) response patterns. For example, the response pattern of Subject 10, whose entries in the data table are (2, 2, 3, 3, 2, 1, 2, 2), where the first five entries come from five-option items and the last three from two-level design factors, is

$$(2, 2, 3, 3, 2, 1, 2, 2) \;\Rightarrow\; (01000,\ 01000,\ 00100,\ 00100,\ 01000,\ 10,\ 01,\ 01)$$


Please note that the five questions each have five options and each of the three factorial design parameters has two levels. Therefore, once the above table is transformed into the response-pattern format, it is a table of 24 × (25 + 6), that is, a 24 × 31 table.
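A minimal sketch (ours, assuming NumPy; the function name is hypothetical) of this option-to-indicator conversion, checked against the Subject 10 pattern shown above:

```python
import numpy as np

def to_indicator(choices, n_options):
    """(1, 0) response pattern of one subject's chosen options (1-based)."""
    blocks = []
    for choice, m in zip(choices, n_options):
        block = np.zeros(m, dtype=int)
        block[choice - 1] = 1
        blocks.append(block)
    return np.concatenate(blocks)

row = to_indicator((2, 2, 3, 3, 2, 1, 2, 2), (5, 5, 5, 5, 5, 2, 2, 2))
print(row.size)   # 31, so the full table of 24 subjects is 24 x 31
```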

9.1.1 Criterion-Total Correlation

In his pioneering work on forced classification, Nishisato (1984) demonstrated how fast the process converges, as a function of the multiplier p, to the optimal quantification, that is, to the structure of the criterion variable. One statistic that signals a successful run is the correlation between the component and the criterion variable: when this correlation is 1, we have the solution to the modified data. This convergence indicates that, after sufficiently many repetitions of the criterion item in the data set, the first component has converged onto the criterion-item space; namely, the solution is the projection of the original data onto the criterion-item space.

Let us look at a numerical example of this convergence process, as reflected in the correlations of the five items with the criterion variable and of the criterion variable with the total component score. Remember that we are considering forced classification of the data [F₁, F₂, F₃, F₄, F₅, pF_γ] with criterion item γ. In this demonstration, the repetition parameter p was set to 1, 2, 3, 4, 5, 6, 7, 8, 12, 20 and 50, to see how increasing p affects the correlations with the total score and with the other items. It is remarkable how quickly the process converges to the solution as a function of p. Note that we have not yet discussed what the process converges to; for the current example, our attention is focused on the convergence of the criterion-total correlation to 1. At that stage, we can say that the total space has become the same as the criterion space (Table 9.2).

To start with (p = 1), the criterion variable γ is correlated with the total score at only 0.061, which means that this criterion variable is not a dominant or influential variable in this data set. But as p is increased to 2, 3 and 4, the correlation between the criterion and the total score increases to 0.126, 0.765 and 0.973, respectively, indicating that the criterion item is gradually taking over the total space as a function of the weight p. In other words, the process is gradually moving toward the space of the criterion variable. When p is 50, the criterion-total correlation is perfect, indicating that we have identified the space spanned by the criterion variable, or rather that the criterion item has taken over the quantification space.


Table 9.2 Criterion-total and criterion-item correlations

p    rγt    rγ1    rγ2    rγ3    rγ4     rγ5
1    0.061  0.146  0.044  0.217  −0.166  −0.050
2    0.126  0.150  0.073  0.248  −0.166  −0.038
3    0.765  0.184  0.415  0.540  −0.160  0.151
4    0.973  0.232  0.569  0.628  0.179   0.379
5    0.985  0.242  0.574  0.629  0.198   0.394
6    0.990  0.245  0.575  0.629  0.203   0.398
7    0.993  0.246  0.576  0.629  0.206   0.400
8    0.995  0.246  0.576  0.629  0.207   0.401
12   0.998  0.247  0.577  0.629  0.209   0.402
20   0.999  0.248  0.577  0.629  0.210   0.402
50   1.000  0.248  0.577  0.629  0.211   0.403

9.1.2 Criterion Items Correlation

We may wonder what happens to the relation of the criterion item to the other variables. We can assess the effects of forced classification on the remaining items in terms of the criterion-item correlations as a function of the number of repetitions of the criterion item. Nishisato (1986) presented a graph of the criterion-item correlation as a function of the number of repetitions of the criterion item (Fig. 9.1). As we can see, we do not need to increase the multiplier for the criterion item (γ in the current case) very much: the correlation of each item with the criterion increases smoothly as a function of the repetitions and quickly reaches its maximal value. The graph shows two lines, one with five items plus γ (six items) and the other with five items plus γ, α and β (eight items). What does all of this mean? To simplify the explanation, suppose that all items have three response options (categories), and consider forced classification with the first item as the criterion. The solution then has the following geometric interpretation. Since each item has three options, each item has the two-dimensional structure of a triangle whose three vertexes correspond to the three options. We can imagine that the data have a structure in which these item triangles are floating in multidimensional space. When we project all the item triangles onto the criterion-item triangle, all the other triangles fall within the triangle of the criterion item. This is the projection of all the other items onto the space of the criterion item, and this is what forced classification does. We will see a numerical example of this case later. Going back to our example, consider the projection operator P for the columns of the criterion item γ,

P = Fγ(Fγ′Fγ)−1Fγ′

As we have seen, forced classification is defined as quantification of the data projected onto the criterion space.
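The projection operator itself is one line of linear algebra. Here is a small sketch, again with numpy; pinv is used defensively in case Fγ′Fγ is singular, and the trailing comment shows one way the projected data could be formed (the names F1, ..., F5 refer to the item blocks of the running example).

    import numpy as np

    def projector(Fg):
        # P = Fg (Fg' Fg)^{-1} Fg' : projector onto the criterion-item space
        return Fg @ np.linalg.pinv(Fg.T @ Fg) @ Fg.T

    # P is idempotent and symmetric: np.allclose(P @ P, P) holds, and the
    # projected data are projector(F_gamma) @ np.hstack([F1, F2, F3, F4, F5])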


Fig. 9.1 Changes of correlations with the criterion

The underlying process involved in forced classification is a gradual approach toward the solution, as in MRA; the difference is that we now repeat the response pattern of a chosen item, called the criterion, once, twice, three times and so on until we obtain a stable output. In other words, in the new data table, the response pattern of the repeated item gradually takes over the quantification process until the correlation of the criterion item with the component of the modified data table becomes 1. At this final stage of the iterations, the data structure becomes identical to the original data projected onto the subspace of the response patterns associated with the criterion item.


9.1.3 Partitioning of Total Space

Let us consider another set of artificial data (Nishisato and Baba 1999), which had been used for many years in Nishisato's lectures to explain forced classification analysis: five multiple-choice items with two, three, four, five and six options, respectively, answered by 18 subjects, giving the (1, 0) response-pattern matrices F1 (18 × 2), F2 (18 × 3), F3 (18 × 4), F4 (18 × 5) and F5 (18 × 6).

[Display of the response-pattern matrices F1, F2, F3, F4 and F5.]


Table 9.3 Standard analysis (A)

C      Item 1   Item 2   Item 3   Item 4   Item 5
1      0732     7258     8683     4966     4796
2      0401     4449     5822     7460     6809
3      1866     0070     6313     6847     6313
4      0810     2522     2308     7942     6896
5      3028     0328     0504     1417     8091
6      0286     0085     0433     3062     5751
7      1577     0594     0360     1429     4560
8      0201     2062     0651     2448     1361
9      0023     0125     3247     1835     0904
10     0696     1308     0162     0884     2262
11     0318     0806     0251     0502     0955
12     0014     0326     0227     0687     0745
13     0032     0033     0766     0298     0510
14     0021     0023     0270     0209     0033
15     0002     0012     0005     0015     0016
Sum    1.0      2.0      3.0      4.0      5.0

For these five multiple-choice items, quantification yields 15 components (the total number of options of the five items minus the number of questions: 2 + 3 + 4 + 5 + 6 − 5 = 20 − 5 = 15). Nishisato (1996) showed that the sum of the squared item-total correlations of any item over the entire space (all components) is equal to the number of response options of that item minus 1. Therefore, it is interesting to see what the standard quantification results and the forced classification results look like with respect to the distributions of the squared correlations over individual components. This comparison reveals the true nature of forced classification. Let us first obtain the 15 × 5 table of squared correlation coefficients under the standard quantification, and then the 15 × 5 table of squared correlation coefficients when we use Item 4 as the criterion for forced classification analysis. Tables 9.3 and 9.4 show the results: Table 9.3 gives the 15 components under ordinary quantification, and Table 9.4 the results of forced classification with Item 4 as the criterion; "C" indicates "Component." Notice that the entire information of Item 4 under forced classification is accounted for by the first four components (i.e., the components of P4(F1, F2, F3, F4, F5)), and that the next 11 components are unrelated to Item 4 (i.e., the components of (I − P4)(F1, F2, F3, F4, F5)). The elements of the tables are the squared correlations between each item and the total scores. The decimal points are omitted (e.g., 0732 is 0.0732). Tables 9.3 and 9.4 clearly show what forced classification does to change the distribution of information in data.


Table 9.4 Forced classification with Item 4 as criterion (B)

C      Item 1   Item 2   Item 3   Item 4    Item 5
1      0046     1255     7006     1.0000    3207
2      0649     1976     2492     1.0000    5197
3      0662     0977     1286     1.0000    5197
4      0125     0494     1965     1.0000    1765
5      0173     5499     5770     0000      7468
6      5084     3697     0855     0000      6162
7      0272     1469     2908     0000      5425
8      0006     0800     0107     0000      7981
9      1149     1245     4721     0000      1266
10     0999     0510     0188     0000      4543
11     0726     0789     0301     0000      1580
12     0062     1161     0661     0000      1306
13     0012     0034     1080     0000      0766
14     0030     0047     0647     0000      0146
15     0005     0025     0010     0000      0033
Sum    1.0      2.0      3.0      4.0       5.0

Note the following:

• The information, as reflected in the squared item-total correlations, is distributed over all 15 components under the standard quantification analysis.
• The same information under forced classification changes quite drastically: within the first four components (i.e., the number of options of the criterion item minus 1), the entire information associated with the criterion item (Item 4) is totally accounted for, and no more information of the criterion item is detected in the remaining 11 components.
• The first four components are associated with the structure of P4(F1, F2, F3, F4, F5), where P4 = F4(F4′F4)−1F4′.
• The remaining 11 components of forced classification are associated with (I − P4)(F1, F2, F3, F4, F5).

Thus, it is clear that forced classification is a procedure that carries out quantification of the data projected onto the space spanned by the columns of the criterion variable, plus quantification of the data free from the effects of the criterion variable.


Table 9.5 Squared item-total correlations

Item   Comp 1   Comp 2   Comp 3   Comp 4
1      0.0043   0.0660   0.0651   0.0129
2      0.1271   0.1952   0.1011   0.0486
3      0.7049   0.2465   0.1276   0.1957
4      1.0000   1.0000   1.0000   1.0000
5      0.3152   0.5236   0.3177   0.1776
ρ²     1.0000   1.0000   1.0000   1.0000

9.1.4 Contributions of Individual Components

Using the same data as in the previous section, let us calculate the squared item-total correlations of the items for the four forced-classification components, as in Table 9.5. From this table, Nishisato and Baba (1999) found a way to assess the contribution of each component in forced classification, namely

ρ²k = (Σⱼ r²jt(k) − 1) / (n − 1)        (9.1)

where r²jt(k) is the squared correlation between item j and the total score for component k, and n is the number of items. Since the criterion item (Item 4) has five options, we obtain four components from forced classification, and the four values of ρ²k for this example are, respectively, 0.2879, 0.2578, 0.1529 and 0.1087.

9.1.5 Legitimacy of Set by Set Analysis

As we have seen so far, forced classification can be regarded as analysis of the non-criterion variables in the criterion-variable space. This means that we can analyze the data item by item with the same criterion variable. In other words, in forced classification with criterion item h, the results for non-criterion item 3, for instance, remain invariant whether item 3 is analyzed alone with the criterion variable or together with the other items. To convince you, the following numerical example will suffice. Table 9.6 lists the optimal weights from the following analyses:

(a): [pF1, F2]
(b): [pF1, F2, F3]
(c): [pF1, F2, F3, F4]
(d): [pF1, F2, F3, F4, F5]


Table 9.6 Optimal category weights under forced classification with criterion item 1

Item  Option   (a)       (b)       (c)       (d)
1     1        1.0000    1.0000    1.0000    1.0000
      2        −1.0000   −1.0000   −1.0000   −1.0000
2     1        −0.3334   −0.3334   −0.3334   −0.3334
      2        0.3334    0.3334    0.3334    0.3334
      3        0.0000    0.0000    0.0000    0.0000
3     1                  0.2000    0.2000    0.2000
      2                  −0.5001   −0.5001   −0.5001
      3                  0.5001    0.5001    0.5001
      4                  −0.2000   −0.2000   −0.2000
4     1                            0.5000    0.5000
      2                            −0.5000   −0.5000
      3                            −0.3334   −0.3334
      4                            0.3335    0.3335
      5                            0.0000    0.0000
5     1                                      −0.0001
      2                                      −0.5001
      3                                      −0.3334
      4                                      0.3335
      5                                      0.3334
      6                                      0.3334

The optimal weights of the items with Item 1 as the criterion are listed in Table 9.6. This result has an enormous implication for data analysis. Suppose we plan to investigate the background information behind problems associated with sleeplessness rated on a five-point rating scale, say A, and suppose further that 500 pieces of background information on three-point scales were collected. In this case, we would want to carry out forced classification with the sleeplessness variable A as the criterion, but the data set is too large to subject to forced classification analysis at once. The above demonstration has shown that one can carry out forced classification with the same criterion variable in a piecemeal fashion, say 10 variables at a time, or 20 variables at a time: the results will be identical to those of the analysis with the entire data set at once. Another possibility is to carry out the same investigation in several regions with different numbers of subjects. In this case, the criterion variable (sleeplessness) is the same, but, depending on the region, the space of "sleeplessness" will be somewhat different. This poses a question about sampling differences, and we would ideally need some knowledge about the sampling distribution of sleeplessness; this must be left for future investigation. Nevertheless, being aware of the sampling discrepancies in the sleeplessness space, we can still project the remaining data onto that space, to


Table 9.7 Singapore data in the response-pattern format (left block: chosen options; right block: response patterns)

Subject  Q1  Q2  Q3  Q4    Q1    Q2    Q3    Q4
1        3   1   2   1     001   100   010   100
2        2   1   3   2     010   100   001   010
3        2   1   2   2     010   100   010   010
4        1   2   2   3     100   010   010   001
5        3   1   2   2     001   100   010   010
6        1   3   1   2     100   001   100   010
7        2   1   2   2     010   100   010   010
8        2   1   2   2     010   100   010   010
9        1   2   3   1     100   010   001   100
10       3   1   2   1     001   100   010   100
11       1   2   2   3     100   010   010   001
12       2   1   1   1     010   100   100   100
13       2   1   3   3     010   100   001   001
14       3   1   2   1     001   100   010   100
15       1   1   2   3     100   100   010   001
16       3   1   2   1     001   100   010   100
17       3   1   1   1     001   100   100   100
18       2   3   2   2     010   001   010   010
19       3   1   2   1     001   100   010   100
20       2   1   2   2     010   100   010   010
21       1   3   3   3     100   001   001   001
22       1   3   3   3     100   001   001   001
23       2   1   2   2     010   100   010   010
24       1   3   3   3     100   001   001   001

arrive at a tentative result. This realistic situation suggests that we must develop techniques for assessing the sampling distributions of key statistics in quantification theory, but it is indeed encouraging that forced classification can be carried out piece by piece, as Table 9.6 demonstrates.

9.1.6 An Example of Application

In Chap. 2, we looked at the Singapore data. The four questions and the data from 23 subjects are reproduced here (see Table 9.7): 23 workshop participants answered the following four multiple-choice questions:

Q.1. How old are you? (20–29; 30–39; 40 or over)


Q.2. Children today are not as disciplined as when I was a child. (agree; disagree; I cannot tell)
Q.3. Children today are not as fortunate as when I was a child. (agree; disagree; I cannot tell)
Q.4. Religions should not be taught at school. (agree; disagree; indifferent)

The data are tabulated in two formats, one in terms of chosen options and the other in the response-pattern format, as shown in Table 9.7. Let us choose Item 1 as the criterion for forced classification: we are now interested in how the participants' ages have affected their responses to the remaining three questions. For this run, we used the DUAL3 program by Nishisato and Nishisato (1986; note that this program is no longer available), and obtained the following results. As was the case with the standard method of quantification, our forced classification also revealed a negative value of Cronbach's reliability coefficient α for component 3 and the remaining components. Therefore, we will look at only the first two components. In fact, since the criterion item has three response options, there are only two forced-classification components, and they account for the variation of the criterion variable completely. In terms of our discussion above, this means that the criterion item is perfectly correlated with the first two components (i.e., the squared criterion-total correlation is 1 for each of the two components) and no more information of the criterion variable is left to analyze. Therefore, we will examine only the first two components, which we may call forced-classification components. One of the most interesting observations is the fact that

η₁² = η₂² = 1.00; η₁ = η₂ = 1.00; α₁ = α₂ = 1.00

Does this mean that there is no space discrepancy between the row space (subjects) and the column space (item options)? What do you think? This question is extremely important, but must be left for further investigation. The scores of the subjects on the two forced-classification (FC) components and the corresponding weights for the response options of the four items are listed in Tables 9.8 and 9.9.

9.1.7 Graph in Criterion-Item Space

Our Singapore data yielded two components under forced classification with Age as the criterion. Graphically, this means that we projected the options of the other three items onto the space of Age. Figure 9.2 is the two-dimensional graph of our results. We can now see that the options of the non-criterion items are all projected onto the Age space: notice that the triangle of the criterion variable (Age) is the largest and that the other, non-criterion triangles all fall within the criterion triangle. This is what our forced classification procedure does (Fig. 9.2).


Table 9.8 Scores for subjects: two FC components

Subject  Component 1  Component 2
1        −0.60        0.47
2        0.07         −0.92
3        −0.26        −0.82
4        1.48         0.17
5        −0.47        −0.03
6        0.96         −0.25
7        −0.26        −0.82
8        −0.26        −0.82
9        1.22         0.28
10       −0.60        0.47
11       1.48         0.17
12       −0.29        −0.29
13       0.53         −0.62
14       −0.60        0.47
15       0.82         0.01
16       −0.60        0.47
17       −0.50        0.49
18       0.25         −0.80
19       −0.60        0.47
20       −0.26        −0.82
21       1.66         −0.07
22       −0.26        −0.82
23       1.66         −0.07

Table 9.9 Item options: two FC components

Item             Option       Component 1  Component 2
Age (criterion)  20–29        1.46         0.39
                 30–39        −0.37        −1.20
                 40 or over   −0.99        1.15
No discipline    Agree        −0.53        −0.07
                 Disagree     1.46         0.39
                 Cannot tell  1.01         −0.01
Not fortunate    Agree        0.03         0.11
                 Disagree     −0.25        0.06
                 Cannot tell  0.73         −0.24
No religion      Agree        −0.61        0.76
                 Disagree     −0.23        −0.76
                 Indifferent  1.16         0.13


Fig. 9.2 Forced classification on age-space

9.2 Generalized Forced Classification

Let us consider data with multi-way structure for both rows and columns, and consider carrying out forced classification (Nishisato 1986); this is referred to as generalized forced classification. Suppose we indicate the multi-way data with (1, 0) elements, as we have seen so far, by F, and consider the following arrays of projection operators for the rows and the columns:

• Rows: P1 + P2 + P3 + · · · + Pr = I
• Columns: Q1 + Q2 + Q3 + · · · + Qc = I

Therefore, we obtain the following expression:

F = (P1 + P2 + P3 + · · · + Pr)F(Q1 + Q2 + Q3 + · · · + Qc)

This general expression was proposed by Nishisato and Lawrence (1989). It encompasses a large number of applications and can be used, in various forms, to project a wide variety of quantification problems onto a chosen subspace. The simple forced classification we discussed at the beginning corresponds to specifying only one projection operator, generated by the matrix of the response patterns associated with a single item, namely

Pk = Fk(Fk′Fk)−1Fk′        (9.2)

In the case of the analysis of variance, one could choose a set of basis matrices (i.e., main effects and interactions, or combinations of a few main effects and interactions). In marketing research, one could choose a particular consumer group and a set of consumer goods as criteria. In this way, the general expression given above offers a countless variety of applications. To be more specific, we can consider such problems as:


(1) Identifying the consumers who prefer a particular set of cosmetics, their backgrounds, and the attributes that the most popular cosmetics share.
(2) Given that a set of medical symptoms is known to be related to an illness, can we use those symptoms as criteria for identifying potential patients and the related personality characteristics and social backgrounds?
(3) Given that ten award-winning volunteers of the year have just been chosen, what are their common personality traits, social backgrounds, family compositions and other background characteristics?

We could continue to list possible research questions in which more than one selection criterion is involved. This type of multiple discriminant-analysis problem can be handled by generalized forced classification. The analysis amounts to representing the data matrix pre- and/or post-multiplied by the respective projection operators, and then carrying out the quantification analysis. See the interesting application reported by Day (1989).
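The requirement that the chosen projection operators sum to the identity can be checked numerically. Here is a minimal sketch, assuming numpy, using the simplest complete decomposition of the row space: the general-mean space plus its residual. Any finer orthogonal decomposition (e.g., ANOVA effect spaces) works the same way.

    import numpy as np

    def projector(A):
        return A @ np.linalg.pinv(A.T @ A) @ A.T

    F = np.random.rand(6, 4)              # any data table
    P1 = projector(np.ones((6, 1)))       # projector onto the mean space
    P2 = np.eye(6) - P1                   # projector onto the residual space
    assert np.allclose(P1 + P2, np.eye(6))   # P1 + P2 = I
    assert np.allclose((P1 + P2) @ F, F)     # (P1 + P2) F = F
    # Each term P_i F (or P_i F Q_j with column projectors) is the part of F
    # lying in the chosen subspace, which is then quantified on its own.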

References

Day, D. A. (1989). Investigating the validity of the English and French versions of the Myers-Briggs Type Indicator. University of Toronto Ph.D. thesis.
Lawrence, D. R. (1985). Dual scaling of multidimensional data structures: An extended comparison of three methods. University of Toronto Ph.D. thesis.
Nishisato, S. (1971). Analysis of variance through optimal scaling. In Proceedings of the First Canadian Conference on Applied Statistics (pp. 306–316). Montreal: Sir George Williams University Press.
Nishisato, S. (1972). Analysis of variance of categorical data through selective scaling. Proceedings of the 20th International Congress of Psychology, Tokyo, 279.
Nishisato, S. (1986). Generalized forced classification for quantifying categorical data. In E. Diday (Ed.), Data analysis and informatics (pp. 351–362). Amsterdam: North-Holland.
Nishisato, S. (1988a). Forced classification procedure of dual scaling: Its mathematical properties. In H. H. Bock (Ed.), Classification and related methods (pp. 523–532). Amsterdam: North-Holland.
Nishisato, S. (1988b). Market segmentation by dual scaling through generalized forced classification. In W. Gaul & M. Schader (Eds.), Data, expert knowledge and decisions (pp. 268–278). Berlin: Springer.
Nishisato, S. (1994). Elements of dual scaling: An introduction to practical data analysis. Hillsdale, NJ: Lawrence Erlbaum Associates.
Nishisato, S. (2007). Multidimensional nonlinear descriptive analysis. London: Chapman & Hall/CRC.
Nishisato, S. (2010). Data analysis for behavioral sciences: Use of methods appropriate for information retrieval. Tokyo: Baifukan. (in Japanese).
Nishisato, S., & Baba, Y. (1999). On contingency, projection and forced classification of dual scaling. Behaviormetrika, 26, 207–219.
Nishisato, S., & Gaul, W. (1990). An approach to marketing data analysis: The forced classification procedure of dual scaling. Journal of Marketing Research, 27, 354–360.
Nishisato, S., & Lawrence, D. R. (1989). Dual scaling of multiway data matrices: Several variants. In R. Coppi & S. Bolasco (Eds.), Multiway data analysis. Amsterdam: Elsevier Science Publishers B.V. (North-Holland).
Nishisato, S., & Nishisato, I. (1984). An introduction to dual scaling. Toronto: MicroStats.
Nishisato, S., & Nishisato, I. (1986). DUAL3 users' guide. Toronto: MicroStats.
Nishisato, S., & Nishisato, I. (1994). Dual scaling in a nutshell. Toronto: MicroStats.


Chapter 10

Data with Designed Structure

10.1 Analysis of Variance of Nominal Data

Viewing quantification theory in relation to the analysis of variance sheds new light on the scope of its applications. The following is a quotation from Nishisato (1980, p. 15) about the relation of quantification theory to the analysis of variance:

In 1948 Fisher analyzed reactions to 12 samples of human blood tested with 12 different sera. The data (reactions) were recorded by five symbols: −, ?, w (weak?), (+) and +. His question was: Given a two-way table of non-numerical observations, what values, or scores, shall be assigned to them in order that observations shall be as additive as possible? After assigning 0 to − and 1 to +, he defined the problem as that of determining three unknowns, x, y and z for reactions ?, w and (+), respectively, so as to maximize the between-row and the between-column sums of squares, relative to the total sum of squares. Fisher thus presented an example of the analysis of variance of non-numerical observations through dual scaling.

Fisher’s work was in retrospect a quantification study on nominal data with a one-way analysis of variance design. This was very much in line with his earlier work (Fisher 1940), one of the earliest studies in quantification theory. Somehow, the analysis of variance of nominal data for a general design was not pursued until much later. Nishisato noted this and expanded Fisher’s work to multi-way analysis (1971a, 1972) before his forced classification paper was published in 1984. This chapter will look at his work as discussed in Nishisato (1980). We will use the example in his book, in which three subjects were randomly sampled for each of the following four groups, using a 2-by-2 factorial design: (α: smoker, non-smoker)×(β: alcoholic, non-alcoholic). Suppose that the subjects answered three multiple-choice questions and the responses were as summarized as in Table 10.1. The data matrix is 12×10 matrix of 1’s and 0’s, where the two factors α and β have two levels each (indicated by 1 and 2). As we can see, the subjects are sampled according to the 2×2 factorial design. The two-way analysis of variance design


Table 10.1 Data with factorial design on subjects

α  β  Subject   Q1         Q2         Q3
1  1  1         1 0 0      1 0 0      1 0 0 0
1  1  2         1 0 0      0 0 1      0 1 0 0
1  1  3         0 1 0      0 1 0      0 0 0 1
1  2  4         0 0 1      0 0 1      0 0 0 1
1  2  5         0 0 1      0 0 1      0 0 1 0
1  2  6         0 1 0      0 1 0      1 0 0 0
2  1  7         0 1 0      0 1 0      0 1 0 0
2  1  8         0 1 0      0 0 1      0 0 1 0
2  1  9         0 1 0      0 1 0      0 0 1 0
2  2  10        0 0 1      0 1 0      0 0 0 1
2  2  11        0 0 1      0 1 0      0 1 0 0
2  2  12        0 0 1      0 0 1      0 1 0 0

Table 10.2 Differential maximization: two factors and interaction

α  β  Subject   α max    β max    γ max
1  1  1         6.39*    3.67*    3.89*
1  1  2         1.49*    1.47*    2.53*
1  1  3         −0.43*   0.49*    −0.52*
1  2  4         0.71*    −2.82    0.65
1  2  5         −0.71*   −1.59    −1.94
1  2  6         0.99*    1.10     −1.17
2  1  7         −2.20    1.10*    −0.19
2  1  8         −0.71    1.35*    −3.50
2  1  9         −1.85    1.71*    −3.11
2  2  10        −0.42    −2.45    1.04*
2  2  11        −2.20    −1.84    1.36*
2  2  12        −1.07    −2.20    0.97*

postulates the following model for subject i at level p of factor α and level q of factor β:

yiαpβq = μ + αp + βq + γαβ + eiαpβq

where μ is the general mean, αp is the effect of level p of factor α, βq is the effect of level q of factor β, γαβ is the interaction, and eiαpβq is the residual for the corresponding cell. The design matrix A for the data F is therefore given below.



A =
⎡ 1   1 0   1 0   1 0 0 0 ⎤
⎢ 1   1 0   1 0   1 0 0 0 ⎥
⎢ 1   1 0   1 0   1 0 0 0 ⎥
⎢ 1   1 0   0 1   0 1 0 0 ⎥
⎢ 1   1 0   0 1   0 1 0 0 ⎥
⎢ 1   1 0   0 1   0 1 0 0 ⎥
⎢ 1   0 1   1 0   0 0 1 0 ⎥
⎢ 1   0 1   1 0   0 0 1 0 ⎥
⎢ 1   0 1   1 0   0 0 1 0 ⎥
⎢ 1   0 1   0 1   0 0 0 1 ⎥
⎢ 1   0 1   0 1   0 0 0 1 ⎥
⎣ 1   0 1   0 1   0 0 0 1 ⎦
= [1μ  Aα  Aβ  Aγ]

Then, our quantification task is to derive 12 scores for the subjects, y, from the 12 × 10 response-pattern table F through assigning weights to the 10 options, which we indicate by x. Let us define the vector of scores for the subjects by

y = D−1Fx

where D−1 is the 12 × 12 diagonal matrix with 1/3 as its diagonal elements (each subject answered three questions). But remember that we now postulate the 2 × 2 factorial design for y, that is, y = Aθ for some vector θ of design effects, where A is, in our example, 12 × 9. We would like to incorporate the information about this design into our quantification. We introduced the concept of projection operators in forced classification; for the analysis of variance model, we can likewise introduce projection operators for the various spaces associated with the design. Noting that we have set the sum of the weighted responses to zero, the general mean of the analysis of variance model can be dropped from quantification. The projection operators P for the remaining terms of the analysis of variance model can then be expressed as follows:

• For the structure α: Pα = Aα(Aα′Aα)−1Aα′
• For the structure β: Pβ = Aβ(Aβ′Aβ)−1Aβ′
• For the interaction γ: Pγ = Aγ(Aγ′Aγ)−1Aγ′


10.1.1 Maximizing the Effects of α, β and γ

Once we derive the projection operators, we project the data onto the chosen space and carry out the standard quantification analysis. Suppose we want to quantify the data so as to maximize the effects of factor α. This task can be carried out by subjecting PαF to quantification:

PαF = (1/6) × [the 12 × 10 matrix whose first six rows all equal (2, 2, 2, 1, 2, 3, 2, 1, 1, 2) and whose last six rows all equal (0, 3, 3, 0, 4, 2, 0, 3, 2, 1)]

Similarly, to maximize the effects of β and of the interaction γ, we first calculate PβF and PγF, and subject these matrices to quantification. Remember that we are projecting the data onto the particular subspaces specified by these projection operators and then carrying out quantification. Table 10.2 shows only the optimal scores of the subjects on the first component under the quantification of PαF, PβF and PγF. In this demonstration, the numbers marked with * are expected to be close to one another (in our example, positive numbers), and those without asterisks are expected to be close among themselves (negative numbers). In other words:

• The use of Pα means the maximization of the difference between α1 and α2, that is, the difference between the first six subjects and the remaining six subjects.
• The use of Pβ means the maximization of the difference between β1 and β2, that is, the difference between subjects (1, 2, 3, 7, 8, 9) and subjects (4, 5, 6, 10, 11, 12).
• The use of Pγ means the maximization of the interaction effects, that is, the difference between subjects (1, 2, 3, 10, 11, 12) and subjects (4, 5, 6, 7, 8, 9).

We should mention that the task carried out here is to evaluate the distribution of scores when we project the data onto a specific subspace, such as the α space, the β space or the interaction (γ) space; our quantification method can certainly carry out the task we intend to accomplish. The idea of a data matrix with structures imposed on its rows or columns has also been extensively investigated in the field of principal component analysis (see Takane and Shibayama 1991; Takane 2014). Interested readers are particularly referred to the excellent book by Takane (2014).
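As a check on the displayed matrix PαF, the computation can be replayed in a few lines of numpy; the indicator matrix and the table F below transcribe Table 10.1, everything else follows from the projector formula, and the helper names are ours.

    import numpy as np

    def projector(A):
        return A @ np.linalg.pinv(A.T @ A) @ A.T

    # Aα: indicators of the two levels of factor α (subjects 1-6 vs 7-12)
    A_alpha = np.repeat([[1, 0], [0, 1]], 6, axis=0)

    # F: the 12 x 10 response-pattern table of Table 10.1 (Q1, Q2, Q3 blocks)
    F = np.array([[1,0,0, 1,0,0, 1,0,0,0],
                  [1,0,0, 0,0,1, 0,1,0,0],
                  [0,1,0, 0,1,0, 0,0,0,1],
                  [0,0,1, 0,0,1, 0,0,0,1],
                  [0,0,1, 0,0,1, 0,0,1,0],
                  [0,1,0, 0,1,0, 1,0,0,0],
                  [0,1,0, 0,1,0, 0,1,0,0],
                  [0,1,0, 0,0,1, 0,0,1,0],
                  [0,1,0, 0,1,0, 0,0,1,0],
                  [0,0,1, 0,1,0, 0,0,0,1],
                  [0,0,1, 0,1,0, 0,1,0,0],
                  [0,0,1, 0,0,1, 0,1,0,0]])

    print(6 * projector(A_alpha) @ F)   # reproduces the matrix 6·PαF shown above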


10.2 Quantification of Multi-way Analysis of Data

We have looked at contingency tables and multiple-choice data. Contingency tables are two-way tables, while multiple-choice data can be presented as multi-way contingency tables or as response-pattern tables. Whichever data type is considered, we have already discussed forced classification of data with design structures on the rows and the columns of a two-way table, which we called multi-way data. We have also introduced a prototype of this type of quantification, namely the one-way analysis of variance of nominal data via projection operators (Nishisato 1971). The same idea can be extended to the situation in which both rows and columns have specific structures; data with both row structure and column structure are called multi-way data. As mentioned briefly in relation to forced classification analysis, Nishisato and Lawrence (1989) provided the following representation of the p × q data matrix F with structures on both rows and columns:

F = (P1 + P2 + · · · + Ps)F(Q1 + Q2 + · · · + Qt)

where

Σ_{i=1}^{s} Pi = Ip,    Σ_{j=1}^{t} Qj = Iq

This is perhaps the most general expression for quantifying multi-way data matrices. As was the case in the previous section, we do not necessarily use all the projection operators; investigators can choose the particular projection operators of interest, as we did in the previous section (e.g., maximizing only the effect of α). There are a few references with examples (Poon 1977; Lawrence 1985). The above general scheme can be applied not only to the quantification of contingency tables and multiple-choice data but also to dominance data from rank orders and paired comparisons.

References

Fisher, R. A. (1940). The precision of discriminant functions. Annals of Eugenics, 10, 422–429.
Lawrence, D. R. (1985). Dual scaling of multidimensional data structures: An extended comparison of three methods. University of Toronto Ph.D. thesis.
Nishisato, S. (1971). Analysis of variance through optimal scaling. In Proceedings of the First Canadian Conference on Applied Statistics (pp. 306–316). Montreal: Sir George Williams University Press.
Nishisato, S. (1972). Analysis of variance of categorical data through selective scaling. Proceedings of the 20th International Congress of Psychology, Tokyo, 279.
Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. Toronto: The University of Toronto Press.
Nishisato, S., & Lawrence, D. R. (1989). Dual scaling of multiway data matrices: Several variants. In R. Coppi & S. Bolasco (Eds.), Multiway data analysis. Amsterdam: North-Holland.
Poon, W. (1977). Transformations of data matrices in optimal scaling. University of Toronto M.A. thesis.
Takane, Y. (2014). Constrained principal component analysis and related techniques. Boca Raton: CRC Press.
Takane, Y., & Shibayama, T. (1991). Principal component analysis with external information on both subjects and variables. Psychometrika, 56, 97–120.

Chapter 11

Quantifying Dominance Data

11.1 Dominance Data

According to Stevens' description (Stevens 1951), dominance data have neither an equal unit nor an origin (e.g., the difference between Rank 1 and Rank 2 cannot be equated to the difference between Rank 4 and Rank 5, and Rank 0 has no interpretable meaning); as such, we cannot multiply or divide dominance numbers to obtain a meaningful ratio, and even addition or subtraction does not yield a meaningful quantity. Thus, we can anticipate that the quantification of dominance data will be different from that of nominal data. We wonder if all readers are familiar with row-conditional data, so let us explain this aspect of dominance data using an example. Consider a small example of rank-order data, where three subjects ranked four movies, with Rank 1 indicating the most preferred and Rank 4 the least (Table 11.1). We can safely assume that rank orders of the movies are meaningful only within subjects, for Rank 1 by subject 1 cannot be equated with Rank 1 by subject 2 or 3. In other words, comparisons over the columns (different subjects) of the data matrix are not meaningful. This is why such data are called row-conditional. The other side of this coin is that we can compare the numbers within each row (subject); data with this property have a fancy name, ipsative data. If you are not convinced by this movie example, consider the ranking of two cars, A and B, by 100 consumers, and suppose we decide to treat Rank 1 and Rank 2 as cardinal (numerical) numbers and calculate the product-moment correlation between A and B. Guess what! No matter how many people rank the two cars, the correlation is always −1. We can understand this because if one subject ranks A first and B second, another subject may rank A second and B first: the judgments are always in opposite directions, resulting in a correlation of −1. Suppose we ask the same people to rank five cars and treat the rank numbers as cardinal; then the correlation between any two cars is likely to be negative, because the ipsative nature of the data embeds the tendency that higher ranks for some cars are obtained at the expense of the other cars.


Table 11.1 Rank order data

Subject  St1  St2  St3  St4
1        2    3    4    1
2        2    1    4    3
3        1    3    2    4

This ipsative nature is embedded in dominance data (both rank-order and paired comparison data) and creates a strange situation from the quantification point of view. Both rank-order data and paired comparison data are, in a sense, independent of the number of subjects (judges): both can be analyzed even if there is only one subject, without any numerical difficulty. In that case, of course, one component accounts for the entire variation, but even then the eigenvalue varies between one third and one, depending on the number of objects (Nishisato 1980). This seems to tell us why the data are called row-conditional: quantification of data from many subjects is only a matter of calculating differentially weighted configurations of objects coming from different subjects. There are three obvious cases in which the data can be explained by one dimension: first, when there is only one judge; second, when there are only two objects, irrespective of the number of judges; and third, when all the judges provide the same ranking of the objects, irrespective of the numbers of objects and judges. These are special cases, and they serve to show some important differences between Coombs' unfolding approach and the quantification approach.

11.1.1 Quantification Approaches

In dealing with rank-order data, Coombs (1964) proposed the unfolding method, which finds the scale values of the ranked objects by a purely nonmetric procedure. It was a brilliant nonmetric technique, but it was not straightforward to extend Coombs' nonmetric method of unfolding to the multidimensional case. An approach that is quite different from Coombs' unfolding model, but much more feasible, was proposed by Guttman (1946). It is interesting to note that Guttman (1967) complained about Coombs' lack of attention to his earlier work (Guttman 1941, 1946). It is understandable, however, that Coombs ignored Guttman's work, because Guttman handled ordinal measurement as cardinal numbers, while Coombs' approach was purely nonmetric, that is, an ordinal treatment of ordinal measurement. But, by abandoning a purely nonmetric approach, Guttman accomplished something monumental: quantification of dominance data. His decision was right: when Schönemann (1970) published his famous paper on metric multidimensional


unfolding, it was clear that Coombs' original intention of solving an ordinal problem with ordinal measurement had been completely abandoned. Only by treating ordinal measurement as cardinal numbers did it become possible to adopt the least-squares approach to the scaling problem. After Schönemann's monumental work, a large number of papers on multidimensional unfolding were published, to name a few, Schönemann and Wang (1972), Davidson (1973), Gold (1973), Sixtl (1973), Heiser (1981), Greenacre and Browne (1986) and Adachi (2000). There were also other approaches to the scaling of rank orders (e.g., Baba 1986). As for the traditional approach of quantification theory, Guttman's 1946 approach was followed by Tucker (1960), Slater (1960), Carroll (1972), de Leeuw (1973, 1984), Nishisato (1978, 1980, 1994, 2007), Hojo (1994), Han & Huh (1940), Okamoto (1995), van de Velden (2000) and Torres-Lacomba and Greenacre (2002). Nishisato (1978) demonstrated the mathematical equivalence of the methods of Slater (1960), Tucker (1960) and Carroll (1972) to Guttman's, and presented an alternative formulation to Guttman's, extending it to handle tied judgments in rank order and paired comparisons.

11.1.2 Quantification

Nishisato (1978; see also 1994) introduced the following response variable:

           ⎧  1  if Subject i judges Xj > Xk
  ᵢfjk  =  ⎨  0  if Subject i judges Xj = Xk        (11.1)
           ⎩ −1  if Subject i judges Xj < Xk

for i = 1, 2, ..., N and j, k = 1, 2, ..., n (j ≠ k). Then, define the dominance number for Subject i and Object j by

  eij = Σ_{k=1}^{n} ᵢfjk        (11.2)

The judge-by-object table of the eij is called the dominance matrix and is indicated by E. In 1973, de Leeuw proposed the following formula for rank-order data (the so-called de Leeuw formula):

  eij = n + 1 − 2Rij        (11.3)

where Rij is the rank that Subject i gave to Object j out of n objects. Nishisato (1978) showed that his general formula above, developed for both rank-order and paired comparison data, is identical to the de Leeuw formula when applied to rank-order data. In Nishisato's formulation, the eigenequation to be solved is given by


  (Hn − λI)x = 0        (11.4)

where

  Hn = (1 / (Nn(n − 1)²)) E′E        (11.5)

Nishisato (1978) demonstrated that his formulation is equivalent to Guttman's, Slater's and Tucker–Carroll's formulations; for the details of the comparisons, see Nishisato (1978, 2007). Our task is to subject the matrix Hn to eigenequation analysis.
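A compact sketch of the whole pipeline, assuming numpy (the function names are ours): build E from a rank table with the de Leeuw formula (11.3), form Hn of Eq. (11.5), and solve the eigenequation. The trace check at the end anticipates the total-information result derived in the next subsection.

    import numpy as np

    def dominance_from_ranks(R):
        n = R.shape[1]
        return n + 1 - 2 * R          # de Leeuw formula, Eq. (11.3)

    def quantify_dominance(E):
        N, n = E.shape
        H = E.T @ E / (N * n * (n - 1) ** 2)      # Eq. (11.5)
        evals, evecs = np.linalg.eigh(H)
        order = np.argsort(evals)[::-1]           # largest eigenvalue first
        return evals[order], evecs[:, order]

    # Table 11.1: three subjects ranking four movies
    R = np.array([[2, 3, 4, 1],
                  [2, 1, 4, 3],
                  [1, 3, 2, 4]])
    evals, x = quantify_dominance(dominance_from_ranks(R))
    n = R.shape[1]
    print(evals.sum(), (n + 1) / (3 * (n - 1)))   # both 5/9 = 0.555...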

11.1.3 Total Information

In the case of nominal data (e.g., contingency tables and multiple-choice data), the total amount of information was given by the sum of the eigenvalues, that is, T(inf) = Ση²k = Σρ²k. In the case of dominance data, T(inf) is given by the trace of Hn, namely the sum of the eigenvalues of the n × n matrix Hn:

  T(inf) = Σ ρ²k = trace(Hn) = (1 / (Nn(n − 1)²)) trace(E′E)        (11.6)

Note that the elements of E for rank-order data can be generated by the de Leeuw formula (1973), eij = n + 1 − 2Rij, where Rij is the rank that subject i gave to object j out of n objects. Therefore, we can obtain the trace of E′E as

  tr(E′E) = tr(EE′) = N Σ_{j=1}^{n} (n + 1 − 2Rj)²
          = N Σ ((n + 1)² − 4(n + 1)Rj + 4R²j)
          = Nn(n + 1)² − 4N(n + 1) ΣRj + 4N ΣR²j
          = Nn(n + 1)² − 4N(n + 1) · n(n + 1)/2 + 4N · n(n + 1)(2n + 1)/6
          = (Nn/3)(n − 1)(n + 1)        (11.7)

Therefore, the total information T(inf) is the trace of Hn, given by

  T(inf) = tr(Hn) = (1 / (Nn(n − 1)²)) tr(E′E) = (n + 1) / (3(n − 1))        (11.8)


The total information is therefore bounded:

  1 ≥ T(inf) ≥ 1/3        (11.9)

T(inf) approaches its minimum of 1/3 as n goes to infinity and attains its maximum of 1 when n = 2. The total number of components can be inferred from the fact that the sum of each row of the dominance table is zero. Thus, if the number of subjects N is larger than the number of objects n minus one, then

  T(sol) = n − 1        (11.10)

Otherwise, T(sol) = N. Note that there is no trivial solution in dominance data.

11.2 Example: Ranking of Municipal Services

In collecting rank-order data, we ask each respondent to rank a set of n objects in the order of preference, namely Rank 1 for the most preferred and Rank n for the least preferred object. In 1982, data were collected in Nishisato's scaling class, in which 31 students ranked 10 municipal services in Toronto. (Note: it so happened that a long strike by postal workers preceded this data collection; it is therefore quite possible that the strike affected the ranking of postal services.) The ten municipal services are:

A = Public transit system
B = Postal services
C = Medical care, including hospitals and clinics
D = Sports, recreational facilities
E = Police protection
F = Public libraries
G = Cleaning streets
H = Restaurants
I = Theaters
J = Overall planning and development

The ranking data and the corresponding dominance table are given in Tables 11.2 and 11.3, respectively. We should keep in mind that individual differences are the cause of multidimensional data structure. From this data set, we obtain nine components; the corresponding statistics are given in Table 11.4.


Table 11.2 Ranking of ten government services in Toronto

Subject  A   B   C   D   E   F   G   H   I   J
1        1   7   9   10  2   6   3   8   5   4
2        6   10  9   5   3   1   7   2   4   8
3        9   8   4   3   5   6   10  2   1   7
4        2   10  5   6   3   1   4   8   7   9
5        2   10  6   7   4   1   5   3   9   8
6        1   3   5   6   7   8   2   4   10  9
7        7   10  1   6   5   3   8   4   2   9
8        2   10  6   7   4   1   5   3   9   8
9        2   10  5   8   4   1   6   3   7   9
10       2   10  5   9   8   7   4   1   3   6
11       9   10  7   6   5   1   4   2   3   8
12       6   10  7   4   2   1   3   9   8   5
13       1   10  3   9   6   4   5   2   7   8
14       8   6   5   3   10  7   9   2   1   4
15       8   10  9   6   4   1   3   2   5   7
16       3   5   10  4   6   9   8   2   1   7
17       1   10  8   9   3   5   2   6   7   4
18       5   4   9   3   10  8   7   2   1   6
19       2   10  6   7   8   1   5   4   3   9
20       1   4   2   10  9   7   6   3   5   8
21       2   10  5   7   3   1   4   6   8   9
22       6   3   9   4   10  8   7   2   1   5
23       6   9   10  4   8   7   5   2   1   3
24       5   2   1   9   10  4   8   6   3   7
25       2   10  6   7   9   1   3   4   5   8
26       7   10  9   5   2   6   3   1   4   8
27       8   7   10  3   5   9   4   2   1   6
28       3   8   6   7   5   10  9   2   4   1
29       2   10  7   9   4   1   5   3   6   8
30       2   10  9   1   4   7   5   3   6   8
31       4   10  9   7   5   1   3   2   6   8

There are four major components, after which the eigenvalues drop substantially. The principal coordinates of the government services for the first four components are shown in Table 11.5. As you may recall, there were 31 students; unfortunately, no information is available about them, so it may not be of much interest to look at the principal coordinates of the students.


Table 11.3 Subjects-by-services dominance table

Subject  A   B   C   D   E   F   G   H   I   J
1        9   −3  −7  −9  7   −1  5   −5  1   3
2        −1  −9  −7  1   5   9   −3  7   3   −5
3        −7  −5  3   5   1   −1  −9  7   9   −3
4        7   −9  1   −1  5   9   3   −5  −3  −7
5        7   −9  −1  −3  3   9   1   5   −7  −5
6        9   5   1   −1  −3  −5  7   3   −9  −7
7        −3  −9  9   −1  1   5   −5  3   7   −7
8        7   −9  −1  −3  3   9   1   5   −7  −5
9        7   −9  1   −5  3   9   −1  5   −3  −7
10       7   −9  1   −7  −5  −3  3   9   5   −1
11       −7  −9  −3  −1  1   9   3   7   5   −5
12       −1  −9  −3  3   7   9   5   −7  −5  1
13       9   −9  5   −7  −1  3   1   7   −3  −5
14       −5  −1  1   5   −9  −3  −7  7   9   3
15       −5  −9  −7  −1  3   9   5   7   1   −3
16       5   1   −9  3   −1  −7  −5  7   9   −3
17       9   −9  −5  −7  5   1   7   −1  −3  3
18       1   3   −7  5   −9  −5  −3  7   9   −1
19       7   −9  −1  −3  −5  9   1   3   5   −7
20       9   3   7   −9  −7  −3  −1  5   1   −5
21       7   −9  1   −3  5   9   3   −1  −5  −7
22       −1  5   −7  3   −9  −5  −3  7   9   1
23       −1  −7  −9  3   −5  −3  1   7   9   5
24       1   7   9   −7  −9  3   −5  −1  5   −3
25       7   −9  −1  −3  −7  9   5   3   1   −5
26       −3  −9  −7  1   7   −1  5   9   3   −5
27       −5  −3  −9  5   1   −7  3   7   9   −1
28       5   −5  −1  −3  1   −9  −7  7   3   9
29       7   −9  −3  −7  3   9   1   5   −1  −5
30       7   −9  −7  9   3   −3  1   5   −1  −5
31       3   −9  −7  −3  1   9   5   7   −1  −5

Table 11.4 Distribution of information

Component    1      2      3      4      5      6      7      8      9
Eigenvalue   0.154  0.091  0.054  0.043  0.020  0.017  0.011  0.009  0.008
Delta        37.9   22.4   13.4   10.6   4.9    4.2    2.7    2.2    1.9
Cum. delta   37.9   60.2   73.6   84.2   89.0   93.2   95.9   98.1   100.0


Table 11.5 Four components: principal coordinates of government services

Service     Comp 1   Comp 2   Comp 3   Comp 4
Transit     0.40     −0.14    0.36     0.35
Postal      −0.76    −0.26    0.11     0.03
Medical     −0.14    −0.33    0.31     −0.33
Sports      −0.26    0.19     −0.30    −0.06
Police      0.25     −0.19    −0.38    0.06
Library     0.60     −0.15    −0.13    0.21
Street      0.25     −0.15    −0.13    0.21
Restaurant  0.23     0.55     0.17     0.03
Theater     −0.19    0.54     0.06     −0.11
Planning    −0.38    −0.50    −0.14    0.16

If we disregard the students, all the government services can be plotted in graphs without any worry about space discrepancy. Here we will look at the graph of only the first two components, as shown in Fig. 11.1, where government services are indicated by inverted triangles and subjects by squares. The graph should be interpreted in the following way: each subject ranks the closest service first, the second closest service second, and so on, until the farthest service receives the last rank. Therefore, we can say the following:

[A] Those subjects in [A] consider that Theaters in Toronto are the best and Restaurants the second best.
[B] Those in [B] reverse the ranking of [A], stating that Restaurants are the best and Theaters the second best. We may guess that those in [A] and [B] are used to theaters and good restaurants.
[C] These subjects look more like ordinary citizens, ranking Public libraries first, followed by the Transit system and Police protection.
[D] These subjects are similar to those in [C], but they rank the Transit system, the cleanliness of Toronto streets and Police protection high.
• Unfortunately, Postal services are ranked lowest by most subjects, followed by City planning.

As is obvious, this graph is quite different from those we have seen in the previous chapters. The distribution of the subjects is not restricted to the same norm as the objects (in the current case, government services). So it is not unusual for all the subjects to cluster in one area: for example, when they provide very similar rankings, all the subjects will occupy one small area of the graph, signifying the solidarity of their preference rankings. In this way, we can see a distinct difference between the quantification of nominal data and that of ordinal data. The above interpretation is based on Nishisato (1996), who surveyed the work on rank-order data and advanced an interesting idea, which can be summarized as follows:


Fig. 11.1 Government services: Component 1 and 2

• Plot the standard coordinates of subjects, yi, and the principal coordinates of objects, ρxj. Then we obtain a configuration such that each subject ranks the closest object first, the second closest object second, and so on, for all subjects and objects.
• This was his interpretation of graphs of rank-order data; a numerical demonstration of the statement can be found in Nishisato & Nishisato (1994), using their program DUAL3 (note: unfortunately this program is no longer available for circulation).

Let us look at an application of the idea to the two-component solution of the current example. To simplify the discussion, we use the data of the first five subjects. DUAL3 provides the distance matrix between yi and ρxj based on the first two components for the first five subjects (Table 11.6). Since the quantification of rank-order data is meant to reproduce the subjects' rankings of the city services, we convert these distances into ranks, resulting in Table 11.7; the corresponding Rank-8 results are given in Tables 11.8 and 11.9.
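Although DUAL3 is unavailable, the distance-to-rank computation itself is elementary. Here is a sketch under the stated convention, assuming numpy, with Y holding the subjects' standard coordinates and X the objects' principal coordinates (rows are subjects or objects, columns are components); the function name is ours.

    import numpy as np

    def rank_approximation(Y, X, k):
        # Squared distances between each subject and each object in k dimensions
        D2 = ((Y[:, None, :k] - X[None, :, :k]) ** 2).sum(axis=2)
        # Rank each row: the closest object gets rank 1 (ties not handled)
        return D2.argsort(axis=1).argsort(axis=1) + 1

Applied with k = 2 and k = 8 to the first five subjects, this kind of computation turns the distances of Tables 11.6 and 11.8 into the ranked distances of Tables 11.7 and 11.9.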


Table 11.6 Rank 2 subjects-stimuli squared distances

Service  1     2     3     4     5     6     7     8     9     10
Subj.1   0.20  2.05  0.66  1.32  0.26  0.13  0.29  1.29  1.81  1.29
Subj.2   2.05  5.42  3.65  2.79  2.43  1.81  2.33  1.14  2.09  3.64
Subj.3   3.03  3.48  3.40  1.76  3.08  3.35  2.94  1.08  0.94  2.49
Subj.4   1.33  4.82  2.48  3.56  1.57  0.95  1.63  2.97  4.10  3.60
Subj.5   1.31  5.26  2.81  3.40  1.65  0.87  1.66  2.29  3.55  3.72

Table 11.7 Rank 2 approximation (ranked distances)

Service  1   2   3   4   5   6   7   8   9   10
Subj.1   2   10  5   8   3   1   4   7   9   6
Subj.2   3   10  9   7   6   2   5   1   4   8
Subj.3   6   10  9   3   7   8   5   2   1   4
Subj.4   2   10  5   7   3   1   4   6   9   8
Subj.5   2   10  6   7   3   1   4   5   8   9

Table 11.8 Rank 8 subjects-stimuli squared distances

Service  1      2      3      4      5      6      7      8      9      10
Subj.1   13.90  16.75  17.42  17.76  14.20  16.15  14.63  16.91  15.67  15.02
Subj.2   5.79   8.16   6.69   4.99   4.40   4.03   5.49   4.36   4.49   6.78
Subj.3   11.29  11.02  9.04   8.48   9.36   9.98   11.58  8.07   7.73   10.18
Subj.4   4.99   8.49   6.52   6.75   5.24   4.32   6.05   7.30   7.44   7.71
Subj.5   2.70   6.79   4.09   4.59   3.52   2.66   3.36   3.43   5.45   5.42

Table 11.9 Rank 8 approximation (ranked distances)

Service  1   2   3   4   5   6   7   8   9   10
Subj.1   1   7   9   10  2   6   3   8   5   4
Subj.2   7   10  8   5   3   1   6   2   4   9
Subj.3   9   8   4   3   5   6   10  2   1   7
Subj.4   2   10  5   6   3   1   4   7   8   9
Subj.5   2   10  6   7   5   1   3   4   9   8


Table 11.10 Averaged squared discrepancies of approximated ranks

Rank-k   1     2    3    4    5    6    7    8    9
Subj.1   8.8   7.8  9.0  4.6  4.2  1.4  1.6  0.0  0.0
Subj.2   6.2   2.8  1.4  0.2  0.4  0.4  0.2  0.4  0.0
Subj.3   19.6  8.0  8.0  1.2  1.2  0.0  0.0  0.0  0.0
Subj.4   1.4   1.0  1.2  1.6  1.6  1.6  0.6  0.2  0.0
Subj.5   1.2   0.8  1.4  1.4  1.4  1.0  0.8  0.6  0.0

It is useful to look at the average squared rank discrepancies between these approximated ranks and the original ranks. Table 11.10 gives the corresponding statistics of the first five subjects for the Rank-1 to Rank-9 approximations. Notice that the Rank-9 approximation reproduces the input ranks, thus showing no discrepancies. In terms of this complete recovery of the distance relations between subjects and objects (e.g., city services) by the nine-component quantification, our analysis offers a solution to Coombs' multidimensional unfolding problem (Nishisato 1994, 1996) when the standard coordinates of subjects and the principal coordinates of objects are jointly plotted in the same space. This is the case of objects projected onto the subject space. In other words, our joint configuration contains the information, for each subject, of which object is chosen first, which second, and so on, for all objects and all subjects. Should someone solve Coombs' problem in a purely nonmetric way for a very large data set, it is possible that Coombs' joint configuration could be accommodated in a space of fewer dimensions than the configuration obtained by the quantification method. But how many fewer dimensions? The discrepancy in the number of dimensions between the two approaches may be trivial, since we hardly ever look at more than several dimensions.

11.3 Paired Comparison Data

As Guttman (1946) and Nishisato (1978) have shown, rank-order data and paired comparison data can be quantified in the same way; the only differences are the specification of the input data and the computation of the corresponding dominance numbers. Paired comparison data consist of an N × [n(n − 1)/2] (subjects-by-pairs-of-stimuli) table, of which the element in the ith row and the (j, k)th column is 1 if subject i prefers stimulus Xj to stimulus Xk, 0 if subject i makes an equality judgment, and 2 if subject i prefers stimulus Xk to stimulus Xj. The following example shows three subjects who made paired comparison judgments of four fruits: apples (A), pears (P), grapes (G) and mangoes (M). The columns correspond to the pairs (A, P), (A, G), (A, M), (P, G), (P, M) and (G, M).


Table 11.11 Paired comparison data

Subject  AP  AG  AM  PG  PM  GM
1        1   2   2   2   2   2
2        2   1   1   2   1   2
3        1   2   2   1   2   2

We should note that in rank-order data the subject must arrange all objects in a single order, while in paired comparisons it is possible for a subject to produce intransitive relations, such as preferring A to B, B to C, and yet C to A. In this regard, paired comparison judgments may be made in terms of a different judgmental criterion from pair to pair; for instance, in a taste experiment, one may compare the first pair in terms of spiciness and the next pair in terms of sweetness. Weingarden & Nishisato (1986) demonstrated that paired comparison data capture more individual differences than rank-order data when the same stimuli are judged by the two methods, ranking and paired comparisons. Mathematically, the paired comparison data from N subjects on all possible pairs of n objects have the same structure as N-by-n rank-order data; the total number of components and the total information are identical to those of rank-order data. Nishisato (1978) introduced a response variable and a formula to calculate the dominance numbers for paired comparison data. For subject i and pair (Xj, Xk), define the response variable

           ⎧  1  if Xj > Xk
  ᵢfjk  =  ⎨  0  if Xj = Xk        (11.11)
           ⎩ −1  if Xj < Xk

The subjects-by-objects dominance table can then be obtained by transforming ᵢfjk to eij by the formula

  eij = Σ_{k=1, k≠j}^{n} ᵢfjk        (11.12)

Recall that the dominance numbers were easily obtained for rank-order data. The meaning here is the same: eij is the number of times subject i preferred Xj to the other objects minus the number of times subject i preferred the other objects to Xj.
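Here is a minimal sketch of the computation, assuming numpy and that the pair columns are ordered as all (j, k) combinations with j < k; the 1/2/0 coding of the data table is mapped to the +1/−1/0 response variable of Eq. (11.11) and summed as in Eq. (11.12). The function name is ours.

    import numpy as np
    from itertools import combinations

    def dominance_from_pairs(data, n):
        pairs = list(combinations(range(n), 2))   # (j, k) in column order
        E = np.zeros((data.shape[0], n), dtype=int)
        for col, (j, k) in enumerate(pairs):
            f = np.where(data[:, col] == 1, 1,
                np.where(data[:, col] == 2, -1, 0))
            E[:, j] += f      # a win for X_j is a loss for X_k ...
            E[:, k] -= f      # ... and vice versa
        return E

    # Table 11.11: pairs (A,P), (A,G), (A,M), (P,G), (P,M), (G,M)
    data = np.array([[1, 2, 2, 2, 2, 2],
                     [2, 1, 1, 2, 1, 2],
                     [1, 2, 2, 1, 2, 2]])
    print(dominance_from_pairs(data, 4))   # each row sums to zero, as it must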


Table 11.12 Wiggins' Christmas party plans data and dominance table

j: 1111111222222333334444555667
k: 2345678345678456785678678788

Subject  Responses (28 pairs)            Plan:  1    2    3    4    5    6    7    8
1        1121121222222211211121121212          3    −7   1    5    −1   −3   5    −3
2        2221212121212211121112222122          −3   1    1    5    −7   1    −5   7
3        1111121111121111211121222211          5    3    1    −1   −7   −3   7    −5
4        2121112111112212221112222222          1    5    −5   3    −7   −3   −1   7
5        2221212221222212121111222122          −3   −3   1    7    −7   3    −3   5
6        1111111221222212221111222221          7    −5   −3   5    −7   −1   3    1
7        1111121121121211211121222221          5    1    −1   3    −7   −5   7    −3
8        1111121121221212211221221211          5    −1   −3   1    −5   3    7    −7
9        1221121221122111211121222221          1    −3   5    3    −7   −5   7    −1
10       1211222221222111111222222112          −1   −5   7    −3   −7   5    1    3
11       1211111222222111111111222222          5    −7   7    3    −5   −3   −1   1
12       2222122121111211111111111221          −5   5    3    7    1    −7   −1   −3
13       1211212222222111111212222112          1    −7   7    −1   −5   5    −3   3
14       2222121211111111112121121211          −3   5    7    −1   1    −5   3    −7

11.3.1 Example: Wiggins' Christmas Party Plans

As a course assignment for Nishisato's class, Ian Wiggins, now a successful consultant in Toronto, collected paired comparison data from 14 researchers at a research institute on his eight Christmas party plans:

1: A pot-luck at someone's home in the evening
2: A pot-luck in the group room
3: A pub/restaurant crawl after work
4: A reasonably priced lunch in an area restaurant
5: Keep to one's self
6: An evening banquet at a restaurant
7: A pot-luck at someone's home after work
8: A ritzy lunch at a good restaurant (tablecloths)

The data are tabulated in the 14 (subjects) by 28 (pairs of plans) table, with element 1 for the choice of the first plan of a pair and 2 for the choice of the second plan. Although there are no ties in this example, a tie would be indicated by 0. Table 11.12 lists both the paired comparisons and the dominance numbers. For computation, "1" in the table remains "1" but "2" is converted to "−1." As is the case with rank-order data, each element of the 14 × 8 dominance table is based on 7 comparisons. More generally, for an N × n dominance table, each element is based on (n − 1) comparisons. Therefore, the marginal frequency of responses for each row is n(n − 1) and that of each column is N(n − 1), if we are to use the method

138

11 Quantifying Dominance Data

Table 11.13 Summary statistics Eigenvalue Component 1 2 Delta CumDelta

34 34

26 60

3

4

5

6

7

16 76

13 89

7 96

3 99

1 100

of reciprocal averages, which requires these marginal quantities in averaging the weighted sums. From the dominance table, we should be able to see that Plan 5 is not very popular because the corresponding elements from 14 subjects are mostly negative. If we calculate the means of the eight columns, they would provide estimates of preference values of the eight party plans, expected when we ignore individual differences. Therefore, the dominance table is already quite informative, and the advantage of quantification is to introduce individual differences into optimization, namely to determine individual scores so as the make the variance of the eight weighted averages be a maximum. The summary statistics from the quantification are given in Table 11.13. Although the first three component solutions show a variety of preference patterns, component 4 is dominated only by one variable, “pub/restaurant crawl”. Therefore, let us look at only the first three components.

11.3.1.1

Graphical Display

In analyzing the outcomes of quantification of dominance data, the individual differences may not be of the most important information of our interest, although we can say that due to the individual differences we obtain a multidimensional configuration of objects (stimuli). The two graphs show that Dimension 1 divides party plans into the convivial side and the “Keep to one’s self” side, that Dimension 2 separates plans into expensive and non-expensive, and that Dimension 3 divides party plans into day-time parties and evening parties. It is important to note that subjects are scattered in the threedimensional space, and that each subject prefers the closer plan out the of two in a pair, this being the case for all subjects and party plans. Thus, this is another case in which our quantification analysis can be used to reveal in full all the individual differences in judgment. These graphs indicate that our quantification analysis can accommodate any patterns or combinations of different aspects of the party, such as expensive-evening, expensive-daytime and so on. One interesting aspect of the data is that the weights for subjects are mostly positive (12 subjects out of 14), indicating the majority choosing convivial party plans. As for component 2 (expensive-inexpensive) and component 3 (daytime-evening) the weights for subjects are almost evenly mixed with positive and negative weights. This is a reflection of the property of dominance

11.3 Paired Comparison Data

139

Fig. 11.2 Xmas parties: Component 1 and 2

data that the weights for subjects are not centered, due to the row-ipsative nature of dominance data. This is definitely a peculiar aspect of dominance data. Let us look at another example (Fig. 11.2).

11.3.2 Example: Seriousness of Criminal Acts The seriousness of the following criminal acts was investigated by paired comparisons: (A) Arson, (B) Burglary, (C) Counterfeiting, (F) Forgery, (H) Homicide, (K) Kidnapping, M Mugging, (R) Receiving stolen goods. There are in total 28 pairs, that is, 8(8-1)/2. Data were collected from students in Nishisato’s scaling class in 1979 (Tables 11.14, 11.15 and 11.16). In this data set, we see a number of tied responses. In particular, 11 subjects, out of 23, could not distinguish between counterfeiting and forgery in terms of the seriousness. So, this subject’s responses were kept at the bottom of the table, that is, subject 23. From the paired comparison table, however, this subject’s unique responses are not obvious. Therefore, let us go ahead and convert the responses in the above table to the dominance numbers and examine the responses of subject 23. Thus, according to the dominance numbers, the rank order of seriousness of the criminal acts by this subject, from the most serious to the least serious, is

140

11 Quantifying Dominance Data

Table 11.14 Paired comparisons of eight criminal acts Subject a b c d e f g h i j k l m n o p q r s t u v w x y z +* 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

1112221112221122222221111111 1112011112201022222221111111 1112211012211022212211111111 1112211012221022212220111211 1112211112111122112211111111 1112211112221222212221111011 1111121112221222212221121211 1112211112221222222222111111 1112211112211122112221111111 0212221112211022212221111211 1212221212221122112222121111 1110111112201022202220111011 1110111212221122212221111111 1112011112221222212221111111 1110001002201022212221111011 1112011112201022222221111111 1112121112221222212221111211 1011101102202022002200111110 1100001110001022222220001011 0112221102220022202220111111 1111111222220222012211111111 1112111112221222212221111111 1101122202212012112111222111

Note a = (A, B),b = (A, C), c = (A, F), d = (Q, H), e = (A, K), f = (A, M), g = (A, R) h = (B, C), i = (B, F), j = (B, H), k = (B, K), l = (B, M), m = (B, R), n = (C, F) o = (C, H), p = (C, K), q = (C, M), r = (C, R), s = (F, H), t = (F, K), u = (F, M) v = (F, R), w = (H, K), x = (H, M), y = (H, R), z = (K, M), + = (K, R),* = (M, R) Table 11.15 Dominance numbers of subject 23 Crime A B C Dominance numbers 2

−4

2

F

H

K

M

R

2

−3

3

−1

−1

Kidnapping > (Arson, Counterfeiting, Forgery) > (Mugging, Receiving Stolen Goods) > Homicide > Burglary

Why homicide should be less serious than receiving stolen goods, forgery, counterfeiting, or mugging? It is difficult to comprehend it, if not impossible. As a whole, the judgment of seriousness of criminal acts is very much in line with what we expect, as reflected in the distribution of δ values over seven components.

11.3 Paired Comparison Data

141

Table 11.16 Distribution of information over components Statistic C1 C2 C3 C4 Eigenvalue Singular value Delta Cum. Delta

0.308 0.555 81.39 81.39

0.023 0.152 6.13 87.52

0.018 0.133 4.70 92.22

0.014 0.120 3.78 96.00

C5 0.008 0.089 2.10 98.10

C6 0.005 0.067 1.20 99.29

C7 0.003 0.052 0.71 100.00

As is clear, the major portion of information is explained by the first component. However, because of the strange judgment by subject 23, we should examine the first two components. As expected, most subjects occupy one end of the space, from which it is possible to find their ranking of the criminal acts in terms of seriousness. Subject 23, whose ranking is not shared with anyone else, is located away from the majority of subjects. Importantly, we should note that the weights for the subjects are not under any constraints because of the ipsative nature of the dominance table (each row sum is zero for all subjects). Thus, what we typically see is that when the first dimension is very dominant in the entire set of components, like the current case, the subjects are clustered vertically on one side of the horizontal axis. From the graph, we can see that homicide is the most serious, followed by kidnapping and arson, which are very close and definite at the second place on the scale of seriousness, then followed by mugging and burglary, then with a big gap, forgery and counterfeiting together, and finally receiving stolen goods. This seems to be a reasonable order, and in the current case, we may as well declare that the data show a unidimensional scale of seriousness of the eight criminal acts (Fig. 11.3).

11.3.3 Goodness of Fit The traditional statistic of δ, “the percentage of the total information explained,” is useful in many cases. λj (11.13) δ j = 100 T (in f where λ j is the j-th eigenvalue of Hn . However, since we are dealing with rank orders and our objective is to reproduce input rank orders in the space of the smaller dimensions than the data set requires, a better statistic than the above is desirable. Following Nishisato’s paper (1994), plot normed weights of subjects and projected weights of objects in k-dimensional space (k = 1, 2, ..., K), compute the Euclidean distance between subject i and object j (j = 1, 2, ..., n), rank these distances from the smallest (to be designated as Rank 1) to the largest (rank n), and call these ranks

142

11 Quantifying Dominance Data

Fig. 11.3 Serious criminal acts: Component 1 and 2

of the distances as the k-approximations of subject i’s input (original) ranks. Let us indicate by R∗i j subject i’s recovered rank of object j. Nishisato (1996) proposed two statistics: the sum of squares of rank discrepancies between observed ranks and recovered ranks for each solution, or multiple solutions, for each judge or all the judges, and; the percentage of the above statistic as compared to the worst ranking, that is, the reversed ranking of the observed for each judge (δi j (rank)) or all the judges (δ j (rank)). As for the first set of statistics, it seems more useful to calculate the averages over n objects, thus eliminating the effects of the number of objects, n. With this modification, the first discrepancy measure of subject i for dimension k is: n D (i : k) = 2

j=1 (Ri j

− R∗i j )2

n

(11.14)

The percentage of this statistic as compared to the worst ranking is  100 nj=1 (Ri j − R∗i j )2 δi (rank) = 100 − n−1 2 h=0 (n − 2h − 1)

(11.15)

We can propose yet another statistic, that is, the percentage of the squared discrepancies between the observed ranks and the tied ranks (i.e., all objects being given the same rank), δ ∗i j (rank), that is, the case of no discrimination. n (Ri j − R∗i j )2 δ ∗i j (rank) = 100 − i=1 n n+1 2 j=1 (Ri j − 2 )

(11.16)

11.3 Paired Comparison Data

143

N δ ∗ j (rank) =

i=1

δ ∗i j (rank) . N

(11.17)

11.4 Forced Classification of Ordinal Data We will follow the procedure described in Nishisato (1984). As we recall, rank-order and paired comparison data are first transformed into the matrix E. Let us first discuss the case of rank-order data.

11.4.1 Rank-Order and Paired Comparison Data For rank-order data, the elements of matrix E can be calculated by the de Leeuw formula, namely, ei j = n + 1 − 2Ri j where Ri j is the rank that Subject i gave to Object j out of n objects. Since forced classification is to determine the orientation of an axis, its logical approach to rank order data would be to specify two stimuli, one to occupy one end of the axis and the other for the opposite end. If this strategy is adopted, we can proceed in the following way. Suppose that stimuli (columns of K) p and q were chosen as the criterion variables. Then, multiplying both columns by a constant k means introducing two sets of tied ranks since the multiplication is equivalent to repeating columns p and q “k times each.” Then, let Ri j ,Ri p and Riq be the elements of R, before multiplication by k, j = p = q, and p and q be the criterion variables. Based on the comparisons of Ri j , Ri p and Riq , we define the following set of transformations: ⎧ (1) : ei j = n + 2k − 1 − 2Ri j ⎪ ⎪ ⎪ ⎪ (2) : ei j = n + 1 − 2Ri j ⎨ (3) : ei j = n + 3 − 2k − 2Ri j (11.18) ⎪ ⎪ (4) : e = (n + k − 2R )k ⎪ it tj ⎪ ⎩ (5) : eit = (n + 2 − k − 2Rit )k under the following situations: (1) if Ri j is smaller than Ri p and Riq (2) if Ri j is between Ri p and Riq (3) if Ri j is larger than Ri p and Riq

144

11 Quantifying Dominance Data

(4) where t = min(Ri p , Riq ) (5) where t = max(Ri p , Riq ). The number of responses of a criterion variable is N k(n + 2k − 3) and that of a non-criterion variable is N (n + 2k − 3). Once these variables are chosen, forced classification can be carried out by increasing the value of k. In the case of paired comparison data, we must first obtain the matrix of E. Then, we can carry out forced classification analysis in line with that for rank-order data. Since we do not know how the readers are interested in this problem of creating two extreme values, associated with two objects, we will not take up the space for further discussion on this topic. Those interested in this case, however, please refer to Nishisato (1984, 1994) for detailed descriptions of the procedure.

References Adachi, K. (2000). A random effect model in metric multidimensional unfolding. Japanese Journal of Behaviormetrics, 27, 12–23. (in Japanese). Baba, Y. (1986). Graphical analysis of rank data. Behaviormetrika, 19, 1–15. Carroll, J. D. (1972). Individual differences and multidimensional scaling. In Shepard, R. N., Romney, A. K., Nerlove, S. B. (Eds.) Multidimensional Scaling: Theory and Applications in the Behavioral Sciences, Volume I. New York: Seminar Press. Coombs, C. H. (1964). A theory of data. New York: Wiley. Davidson, J. (1973). A geometrical analysis of the unfolding model: General solutions. Psychometrika, 38, 305–336. de Leeuw, J. (1973). Canonical analysis of categorical data. Doctoral thesis, Leiden University. de Leeuw, J. (1984). Canonical analysis of categorical data. Leiden University: DSWO Press. Gold, E. M. (1973). Metric unfolding: Data requirements for unique solution and clarification of Schönemann’s algorithm. Psychometrika, 38, 555–569. Greenacre, M. J., & Browne, M. W. (1986). An efficient alternating least-squares algorithm to perform multidimensional unfolding. Psychometrika, 51, 241–250. Guttman, L. (1941). The quantification of a class of attributes: A theory and method of scale construction. In the Committee on Social Adjustment (ed.), The prediction of personal; adjustment. (pp. 319–348). New York: Social Science Research Council. Guttman, L. (1946). An approach for quantifying paired comparisons and rank order. Annals of Mathematical Statistics, 17, 144–163. Guttman, L. (1967). The development of nonmetric space analysis. A letter to Professor John Ross. Multivariate Behavioral Research, 2, 71–82. Han, S. T., & Huh, M. H. (1995). Biplot of ranked data. Journal of the Korean Statistical Society, 24, 439–451. Heiser, W. J. (1981). Unfolding analysis of proximity data. Leiden University: DSWO Press. Hojo, H. (1994). A new method for multidimensional unfolding. Behaviormetrika, 21, 131–147. Nishisato, S. (1978). Optimal scaling of paired comparison and rank order data: An alternative to Guttman’s formulation. Psychometrika, 43, 267–271. Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. Toronto: The University of Toronto Press. Nishisato, S. (1984). Forced classification: A simple application of a quantification technique. Psychometrika, 49, 25–36.

References

145

Nishisato, S. (1994). Elements of dual scaling: An introduction to practical data analysis. Hillsdale, N.J.: Lawrence Erlbaum Associates. Nishisato, S. (2007). Multidimensional nonlinear descriptive analysis. London: ChapmanHall/CRC. Okamoto, Y. (1995). Unfolding by the criterion of the fourth quantification method. Journal of Behaviormetrics, 22, 126–134. (In Japanese with English abstract). Schönemann, P. H. (1970). On metric multidimensional unfolding. Psychometrika, 35, 167–176. Schönemann, P. H., & Wang, M. M. (1972). An individual difference model for the multidimensional analysis of preference data. Psychometrika, 38, 275–309. Sixtl, F. (1973). Probabilistic unfolding. Psychometrika, 38, 235–248. Slater, P. (1960). Analysis of personal preferences. British Journal of Statistical Psychology, 3, 119–135. Stevens, S. S. (Ed.). (1951). Mathematics, measurement, and psychophysics. Handbook of Experimental Psychology. New York: Wiley. Torres-Lacomba, A., & Greenacre, M. J. (2002). Dual scaling and correspondence analysis of preference, paired comparisons and ratings. International Journal of Research in Marketing, 19, 401–405. Tucker, L. R. (1960). Intra-individual and inter-individual multidimensionality. In H. Gulliksen & S. Messick (Eds.), Psychological scaling. New York: Wiley. van de Velden, M. (2000). Dual scaling and correspondence analysis of rank order data. Innovations in Multivariate Statistical Analysis, 12, 87–99. Weingarden, P., & Nishisato, S. (1986). Can a method of rank orders reproduce paired comparisons? An analysis by dual scaling (correspondence analysis). Canadian Journal of Marketing Research, 5, 11–18.

Part III

Cautions for Quantification

Chapter 12

Over-Quantification

12.1 Adverse Conditions of Data Let us consider the contingency table and the multiple-choice data. The total information in the data can be expressed by the sum of the eigenvalues associated with the data. We have the statistics (Table 12.1) to govern the quantification for the two data types which may or may not be responsible for unexpected outcomes, depending on your data. Table 12.1 does not seem to give us much warning, except that for the contingency table the number of components one can extract from the data depends on the smaller number of rows and that of columns. This means if there are only two categories in either the rows or the columns, we cannot get more than one component, irrespective of the number of categories of the other set. In other words, whether the data are 2×2 or 2×300 we obtain only one component. On the other side, the number of components from multiple-choice data depends on the average number of categories (options) of all the items. This may give us some comfort in terms of the number of components we can extract from the data. But, multiple-choice data have more serious problems than the contingency table, and we will extensively examine those problems associated with multiple-choice data. The multiple-choice data offer interesting expressions for the optimal quantification over K dimensions, as shown in Table 12.2. Here, we assume that the dimensionality is determined by the columns of the response-pattern matrix (i.e., we have many more respondents than the total number of response options). We also assume that the rank of F = m − n + 1 where m is the total number of options of n items, item j has m j options, and there are f j p responses for option p of item j. Then, our optimal quantification provides us with statistics as given in Table 12.2. We may not see any problems from those characteristics of data as shown in Tables 12.1 and 12.2, but we will be surprised to find out that these characteristics of data or structural components of data can affect the quantification outcomes beyond our expectations, yes beyond even our imagination. To explore those problems, we will use numerical examples. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Nishisato, Optimal Quantification and Symmetry, Behaviormetrics: Quantitative Approaches to Human Behavior 12, https://doi.org/10.1007/978-981-16-9170-6_12

149

150

12 Over-Quantification

Table 12.1 Total information and the data types Data types Sum of ηk2 n×m contingency table  N × mj multiple-choice Data

χ2 ft

m-1

Number of components min(n,m)-1 

m j − n, or N − 1

Notes m is the average number of options of n items Table 12.2 Sum of squares of weighted responses Sum of squares of weighted responses  Option p of item j f j p x 2j pk =n(N − f j p )  f j p x 2j pk =n N (m j − 1) Item j   f j p x 2j pk =n N ( m j − n) Data set

12.1.1 Future of English in Hong Kong: Tung’s Data Let us concentrate on multiple-choice data. Serious implications of the statistics in Table 12.2 are not easily visible, but we can immediately see the problems once we have a good numerical example. This data set was collected by Peter Tung for Nishisato’s scaling class at the University of Toronto (Note: Peter Tung, a brilliant student, later became a professor of education in Hong Kong). This is Nishisato’s favorite data set to discuss interesting and disturbing structural oddities embedded in the data. For the term paper, Tung prepared the following questionnaire of 11 items: 1. 2. 3. 4. 5. 6. 7. 8. 9.

Your gender: (1 = male; 2 = female) Your school:(1 = MSS; 2 = YLP; 3 = SS; 4 = SKW) Your class: (1 = Grade 10; 2 = Grade 12) After graduation, will you use English to talk to others? (1 = most definitely; 2 = very likely; 3 = likely; 4 = unlikely; 5 = don’t know) After graduation, will you use English to talk to other at home? (1 = yes, 2 = no; 3 = don’t know) After graduation, will you use English to talk with others at school gatherings? After graduation, will you use English to talk with others at your job? (1 = yes, 2 = no; 3 = don’t know) After graduation, will you use English to talk with others n pursuing your future studies? (1 = yes; 2 = no; 3 = don’t know) Please give a rough estimate of the total family income? (1 = 10,000 dollars or more’ 2 = between 5,000 and 9.999; 3 = between 3,000 and 4,999; 4 = between 1,500 and 2,999; 5 = below 1,500 dollars).

12.1 Adverse Conditions of Data

151

10. How many people have you counted in Question 9? (1 = one person; 2 = two persons; 3 = three persons; 4 = four or more persons) 11. Would you say the main wage earner f your family is (1 = white color; 2 = blue color workers)? The data were obtained from 50 students, using this questionnaire. To save the space, the data are shown in terms of chosen option numbers, as shown in Table 12.3. Remember that the data must be transformed into the table of (1, 0) response patterns to be quantified. There is one note on this data set. Although item 8 has three choices, none of the 50 subjects chose option 3 of the item. Therefore, we treat item 8 to have only two options. Let us first look at the information distribution over response options and items. We will pay particular attention to the following aspects: • The effects of the number of options. • The effects of response frequencies. The relevant statistics are summarized in Table 12.4, where SS j p is the sum of squares of option p of item j and SS j is the sum of squares of item j. Since the conditions of the data affect the optimal option weights, we also list the frequencies and the numbers of options in this table. Let us first look at the amount of information carried by individual options. We can clearly see that it is inversely related to the popularity (choice frequency) of the option. See the option 2 of item 8, the option 2 of item 7, the option 2 of item 6 and option 4 of item 4 (these are in the boldfaced), which are contributions, showing unusual amounts. Why do unpopular options contribute so outstandingly? This goes totally against our common sense expectation, but it is because of our optimization mathematics; a counter-intuitive outcome, isn’t it? But, it is real. Look at the contrasting results on Item 2, where the distribution of response frequency over options is constant at 10. All the options of Item 2 are equally popular, and contribute equally to the amount of information we extract. This is an expected outcome. It is clear from Table 12.4 that the item information is proportional to the number of options of the items. This information is very valuable for the questionnaire construction. The lesson from this result is that the test constructors should try to use the same number of response option for all the questions so long as it is feasible. But, remember that the larger the number of response options the larger the contribution, independently of how important the item is! How about scores and option weights? See Table 12.5. At this moment, those scores and option weights are not very interesting but look at those numbers in boldface. There seem to be something very unusual. What is it? The information about optimal scores and option weights is not easy to understand by themselves. So, let us also include the information of relative sums of squares

152

12 Over-Quantification

Table 12.3 PeterTung’s data: subjects-by-items Student 1 2 3 4 5 6 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39

2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 2 2 2 2 2 1

1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4

1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 2 2 2 2

1 1 1 2 1 2 1 1 2 2 1 2 2 3 2 1 3 1 1 4 3 3 5 2 2 1 2 1 1 5 1 2 1 1 1 1 1 1 1

2 1 2 2 2 2 2 3 2 2 2 1 3 2 2 2 1 2 2 2 2 2 2 2 1 3 2 1 3 2 1 3 3 3 1 2 1 3 3

1 1 1 2 1 3 1 3 1 1 3 1 1 3 3 3 3 1 1 2 1 1 1 1 1 1 1 1 1 3 1 3 1 1 1 1 1 1 1

7

8

9

10

11

1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 3 1 1 2 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

2 1 1 2 1 2 2 2 2 2 4 2 5 4 1 5 5 2 2 2 4 3 3 2 2 3 3 4 2 2 2 3 5 3 3 3 3 2 3

2 1 1 2 3 2 1 2 1 1 2 1 1 2 3 1 2 1 1 4 3 2 2 1 3 4 3 1 4 4 3 2 1 2 2 2 1 2 1

1 1 1 1 1 1 1 1 1 1 2 1 2 2 2 2 2 1 1 1 2 2 2 1 2 1 2 2 1 1 1 1 1 1 1 1 1 1 2 (continued)

12.1 Adverse Conditions of Data Table 12.3 (continued) Student 1 2 40 41 42 43 44 45 46 47 48 49 50

1 2 1 2 1 2 2 2 2 2 2

4 5 5 5 5 5 5 5 5 5 5

153

3

4

5

6

7

8

9

10

11

2 2 2 2 2 2 1 1 1 1 1

2 3 2 3 2 1 1 1 1 3 2

2 2 2 2 2 1 1 1 3 2 2

1 1 1 1 1 1 1 1 1 1 1

1 1 3 1 1 3 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1 1

1 3 4 4 5 3 2 2 4 4 5

2 4 2 3 1 3 3 3 3 3 1

1 2 2 1 2 1 1 1 2 2 2

of items, squared item-total correlations and item-total correlations, which are all statistics to contain the amount of information of the items (Table 12.6). Since there are many numbers, it is difficult to see any abnormal aspects in the results. But, let us start with scores for the subjects. Subject 20 has the score of 2.96, which is outstandingly high. How about the weights of the options that only subject 20 chose? The answers are 5.44 for option 4 of item 4, and 5.44 for option 2 of item 7, both of which are again outstandingly high. There is another option that the same subject chose and the only one more person chose it, that is, option 2 of item 7, which has the weight of 3.60, the highest score after 5.44. Then, we can look at those items which involve those highly weighted options, that is, items 4, 7 and 6. The sums of squares of those items are 116.23, 110.48 and 96.51, respectively, and they are outstandingly large. Since the squared item correlation is proportional to the sum of squares of the scores, the same results apply to the squared item-total correlation. What can we conjecture from the above observations? An immediate idea is that component 1 must have been very heavily influenced by the responses from subject 20, more specifically, by the three responses from subject 20. In the current example, there are 50 subjects and 11 questions, thus altogether 550 responses in this data set. What will happen if we discard those three responses from subject 20, that is, three responses out of 550? Would the result remain more or less the same? Definitely not! This should come as a surprise because 3 out of 550 is only about half a percent, and those three responses appear to have controlled the extraction of the most major component!

154

12 Over-Quantification

Table 12.4 Option, frequency, option, weight, item Item Option Frequency SSjp* 8 1 11 3

7

6

5

10

4

9

2

2 1 1 2 2 1 1 2 2 3 1 2 3 1 1 3 2 4 3 1 2 4 5 3 2 1 1 5 4 3 2 1 2 3 4 5

2 48 15 35 19 31 25 25 1 5 44 2 9 39 10 10 30 5 12 15 18 1 2 7 15 25 5 6 8 12 19 10 10 10 10 10

528 22 385 165 341 209 275 275 539 495 66 528 451 121 440 440 220 495 418 385 352 539 528 473 385 275 495 484 462 418 341 440 440 440 440 440

Weight

SSj

−2.08 0.09 −1.21 0.52 −1.34 0.92 0.31 −0.3 10.05 −1.38 −0.07 6.65 −0.93 −0.13 −0.39 −0.09 0.16 2.64 −0.52 −0.29 −0.14 10.05 −0.07 −1.41 −0.25 0.15 0.03 −1.65 −1.42 −0.28 1.29 1.11 −0.06 −0.06 −0.13 −0.86

550 550 550 550

1100

1100

1100

1650

2200

2200

2200

12.1 Adverse Conditions of Data

155

Table 12.5 Optimal scores and option weights of component 1 Ss yi Ss yi j jp x jp 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

0.37 0.20 0.25 0.95 0.22 0.21 0.30 0.22 0.27 0.28 −0.46 −0.05 −0.48 −0.61 −0.52 −0.44 −0.94 0.10 0.25 2.95 −0.35 −0.21 −0.29 0.23 −0.05

26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50

0.30 −0.20 −0.30 0.44 0.25 0.02 −0.17 −0.20 0.10 0.07 0.06 0.00 0.18 −0.33 −0.10 −0.09 −0.62 −0.29 −0.53 −0.21 0.16 0.11 −0.31 −0.43 −0.32

1 1 2 2 2 2 2 3 3 4 4 4 4 4 5 5 5 6 6 6 7 7 7 8 8

1 2 1 2 3 4 5 1 2 1 2 3 4 5 1 2 3 1 2 3 1 2 3 1 2

−0.66 0.28 0.60 −0.04 −0.04 −0.07 −0.46 0.17 −0.17 0.08 −0.14 −0.77 5.44 −0.04 −0.21 0.09 −0.04 −0.07 3.60 −0.50 −0.04 5.44 −0.75 0.05 −1.13

j

jp

x jp

8 9 9 9 9 9 10 10 10 10 11 11

3 1 2 3 4 5 1 2 3 4 1 2

0.00 0.02 0.70 −0.15 −0.77 −0.90 −0.16 −0.07 −0.28 1.42 0.45 −0.73

Notes s = subject; yi = score of subject i; j = item j j p = option p of item j; x j p = optimal weight for option p of item j

Since the captured structure of the data by a particular component is well reflected by the inter-item correlation matrix for the component, let us compare the two interitem correlation matrices, one from the above analysis with complete data (550 responses), and one from the analysis without those three responses of subject 20 (547 responses). The following are the results of this comparison (Tables 12.7 and 12.8). The two correlation matrices are quite different, especially with respect to items 4, 7 and 6. If we extract these items from the above matrices, we obtain the following (Table 12.9). It is unbelievable that only half a percent of responses in the entire data set could exert this much influence on the summary statistic, that is, a statistic obtained by averaging (in this case, averaging cross products of standardized scores).

156

12 Over-Quantification

Table 12.6 SS(j), r 2jt and r jt Item(j)

SS(j)

r 2jt

r jt

1 2 3 4 5 6 7 8 9 10 11

31.39 20.03 4.77 116.23 2.33 96.51 110.48 9.08 65.20 38.99 54.99

0.18 0.12 0.03 0.68 0.01 0.57 0.65 0.05 0.38 0.23 0.32

0.43 0.34 0.17 0.83 0.12 0.75 0.81 0.23 0.62 0.48 0.57

Table 12.7 Full Data (550 Responses) 1 100 2 20 100 3 13 00 100 4 11 07 14 100 5 05 12 −06 02 6 21 13 23 65 7 15 04 19 92 8 09 02 −20 10 9 35 45 02 28 10 19 −06 −22 38 11 30 40 04 26 1 2 3 4

100 12 12 11 −02 12 −12 5

100 69 08 25 26 19 6

100 10 19 38 15 7

100 17 03 26 8

100 20 66 9

100 14 10

100 11

The discrepancies between the results based on 550 responses and those based on 547 responses are mind-boggling. The fact that scores of subjects from component 2, based on 550 responses, are similar to those of component 1, based on 547 responses (to be shown shortly), suggests that only a tiny fraction of the data set, that is, those three responses of subject 20, must have overwhelmingly contributed to the creation of component 1 of the complete data of 550 responses. This is unbelievable!

12.2 Standardized Quantification

157

Table 12.8 Reduced Data (547 Responses) 1 100 2 36 100 3 13 00 100 4 03 32 03 100 5 −11 15 −06 38 100 6 38 22 20 17 −02 7 18 34 15 20 12 8 09 16 −20 35 03 9 38 44 01 31 −03 10 −09 −03 −22 15 12 11 30 51 04 45 12 1 2 3 4 5

100 31 17 19 03 20 6

100 25 06 04 11 7

Table 12.9 Correlation from full (F) and reduced (R) Data Item F Item 4 7 6

1.00 0.92 0.65 4

1.00 0.69 7

1.00 6

4 7 6

100 15 06 26 8

100 03 66 9

100 12 10

100 11

R 1.00 0.20 0.17 4

1.00 0.31 7

1.00 6

12.2 Standardized Quantification Considering that the option frequencies and the number of options have great impacts on the amount of information in data, Nishisato (1991, 1993) proposed “standardization” of categorical data. This is a novel idea, but it is really worth investigating, considering that these two statistics can control the outcome of quantification. Torgerson (1958) called quantification theory (QT) as principal component analysis of categorical data. As we know, the idea of principal component analysis (PCA) was first presented by Pearson (1901), but it is usually considered that Hotelling (1933) proposed it (Note: Hotelling was also the inventor of canonical correlation and also a very well-known economist. The author of the current book was lucky enough to take Hotelling’s mathematical economy course at the University of North Carolina at Chapel Hill). As we know, PCA is a well-established statistical procedure, and from Torgerson’s statement, PCA and QT are very similar, or more specifically speaking, PCA is for continuous data, while QT is PCA for categorical data as Torgerson correctly stated. Once we know this similarity, we should also pay attention to the fact that PCA is typically used for two types of continuous data, one for raw data and the other for standardized data. As we know well, some data are purely quantitative (ratio

158

12 Over-Quantification

measurement in Stevens’ terminology Stevens 1951), but most data particularly in the social sciences are quasi-quantitative (interval measurement). PCA has been applied to ratio measurement without any transformation of the input data, while it is usually the practice that we standardize social science data prior to submitting the data to PCA. This standardization is desirable because social science data are often not purely quantitative ( e.g., Intelligence quotients do not have the rational origin: IQ of 0 does not mean “zero intelligence; most test scores do not have the origin of zero so that we cannot say” student A’s score of 80 means A is twice as intelligent as student B who obtained the score of 40). So, with the absence of the rational origin, we typically standardize the data prior to submitting them to PCA. This standardization has thus been used to increase the comparability between data on different variables. However, one should be warned about standardization in multivariate analysis. The case in point is the fact that the results of PCA of the original data may be totally different from those of PCA of the standardized data. This is so because the orientation of principal axes depends on the clouds of data. Unfortunately or not, the problem of orientation of principal axes has not always been one of the problematic conditions in data analysis. But, it is true as demonstrated by Nishisato and Yamauchi (1974) that the principal components from the correlation matrix (standardized variables) are often totally different from those from the variancecovariance matrix (original variables). One of the implications of their findings is that standardization of input data can change the principal component structure of the data in an unexpected fashion. This poses a great doubt on the search of the data structure. Should we standardize the data? Quantification theory does not use standardization, but what will happen if we introduce standardization into quantification theory? Nishisato (1991) explored this problem of standardization for quantification theory. Unlike the situation for PCA, we have one complication as we observed earlier: we must consider standardization of two variables, namely, the number of options and the frequencies of options. Following Nishisato, let us discuss these two tasks of standardization:

Item Standardization Recall that the sum of squares of item j, SS( j), is proportional to the number of options of the item, SS( j) =

mj m−n  k=1 p=1

f j p x 2 j pk = n N (m j − 1)

12.2 Standardized Quantification

159

Thus, it is not advisable to introduce an excessively large number of options (e.g., 10) to the data set when most items have a much smaller number of options (e.g., 3). It is obvious what may happen if one item has 12 options and 9 other items have three options each: the 12-option item is likely to dominate in many components. This is another example why it is important to equate contributions of items to the total information by using the same number of options. The number of options also has an implication for what degrees of nonlinear relations it can capture. If the number of options is two, we cannot capture a nonlinear relation; if the number is three, we can capture linear and quadratic relations; with four options, we can capture linear, quadratic and quartic relations, and so on. Thus, the number of options minus one is the degree of a polynomial relation, up to which we can capture nonlinear relations. We see a conflict here regarding the number of options. We need an appropriate number of options to capture a variety of nonlinear relations, and at the same time, an increase in the number of options may create too much information to analyze in practice. Since the latter is a more crucial problem than the former, the general consensus seems to use the same number of options for all questions and this number should not be too large, say between 3 and 7. In this standardization, the contributions of items are equated, no matter how many options the items have. This standardization can be achieved by dividing the corresponding columns of B by the square root of SS j , where 

SS j =



n(N (m j − 1)

(12.1)

12.2.1 Option Standardization In this standardization, we set the contributions of all options equal in total space. Recall the matrix B, which is defined as −1

− 21

B = Dr 2 FDc

(12.2)

We divide the columns of matrix B by the square root of SS j p , which is 

SS j p =

 n(N − f j p )

and we then subject the resultant matrix to quantification.

(12.3)

160

12 Over-Quantification

12.2.2 Results of Standardization Nishisato (1991) analyzed Tung’s data (see also Nishisato (1996)) and obtained the following results in Table 12.10. For the purpose of comparisons, the results consist of extracting the first two components from the complete data, and the results from the two modes of standardization. The columns of the table are 1: subjects, 2: their scores on component 1 (C1), 3: scores on component 2 (C2), 4: scores from option standardization (OpS) of component 1, 5: scores from item standardization (ItS). What we see in this table are a few remarkable revelations. • Once we standardize the analysis, be it for options or items, the distribution of subjects’ scores changes dramatically, and the standardized components no longer resemble those of component 1 from ordinary quantification. • Standardized components, however, look very much like component 2 from ordinary quantification. • Nishisato (1991) tentatively concluded that standardization has the effect of suppressing the effects of outlier responses. Recall our earlier discussion in which we concluded as follows: What will happen if we discard those three responses from subject 20, that is, three responses out of 550? Would the result remain more or less the same? Definitely not! This should come as a surprise because 3 out of 550 is only about half a percent, and those three responses appear to control the extraction of the most major component!

And the deletion of three responses from one subject yielded results which is similar to Component 2 of the entire data set, and the result from the deletion of three responses is now similar to those obtained by item and option standardizations! • Unfortunately, however, standardization destroys optimal properties (i.e., as compared from ordinary quantification over such key statistics as η2 , α, θ ). This leaves a number of questions unanswered but seems to suggest that standardization is worth further investigation. We have shown that standardization of option frequencies and/or the number of options of items can suppress the outlier effects on the outcomes, but considering that standardization may lose the initial optimal properties, we really do not know what other advantages may exist in standardization (Nishisato 2007).

12.3 Handling Outlier Responses Now that we have encountered outlier responses in a data set, we should also mention that there have been some studies on how to eliminate outliers in data processing, or more generally, robust quantification. We will briefly mention some of the techniques for further explorations.

12.3 Handling Outlier Responses

161

Table 12.10 Scores of original and standardized data Subject C1 C2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 05 36 37

0.69 0.36 0.45 1.76 0.41 0.38 0.56 0.41 0.49 0.52 −0.85 −0.09 −0.89 −1.12 −0.96 −0.82 −1.73 0.18 0.47 5.44 −0.65 −0.40 −0.53 0.43 −0.09 0.55 −0.37 −0.55 0.81 0.46 0.04 −0.31 −0.37 0.18 0.13 0.12 0.00

−0.81 −1.20 −1.00 0.12 −0.91 −0.16 −0.93 −0.68 −0.69 −0.59 1.17 0.63 0.63 1.71 1.53 1.10 2.36 −0.04 −0.29 3.37 0.69 0.35 1.09 −0.29 −0.12 −0.60 0.02 −0.15 −0.54 1.29 −0.86 −0.32 −0.70 −1.22 −1.14 −0.95 −1.25

OpS

ItS

−1.32 −1.40 −1.32 −0.98 −1.28 −0.47 −1.36 −0.94 −1.06 −0.98 1.15 0.81 0.81 1.62 1.66 1.06 2.85 −0.43 −0.89 −0.47 0.21 0.04 0.77 −0.81 −0.47 −1.15 −0.13 −0.30 −1.28 0.38 −0.98 −0.09 −0.55 −1.32 −1.23 −1.06 −1.28

−1.03 −0.92 −0.92 −1.32 −0.94 −0.36 −0.89 −1.37 −0.75 −0.75 1.53 1.40 1.40 1.79 1.87 1.51 3.35 0.03 −0.67 −2.29 0.36 0.25 1.34 −0.70 −0.06 −0.81 0.22 0.28 −0.92 0.14 −0.25 0.39 0.17 −0.78 −0.75 −0.61 −0.61 (continued)

162

12 Over-Quantification

Table 12.10 (continued) Subject C1 38 39 40 41 42 43 44 45 46 47 48 49 50

0.34 −0.60 −0.19 −0.17 −1.14 −0.52 −0.98 −0.38 0.30 0.20 −0.56 −0.79 −0.59

C2

OpS

ItS

−1.17 −0.71 −0.48 0.63 1.16 0.31 0.57 −0.28 −0.53 −0.73 −0.13 0.70 0.33

−1.32 −0.30 −0.30 0.09 1.26 −0.17 −0.72 −0.47 −1.11 −1.23 −0.34 0.30 0.17

−0.78 0.73 0.17 0.22 1.56 −0.77 1.26 −0.14 −0.86 −0.86 0.08 0.39 0.45

12.3.1 The Method of Reciprocal Medians: MRM This method was proposed by Nishisato (1984a, b), along the line of the method of reciprocal averages. As is clear from the name, MRM substitutes the averages with the medians. It is well known in statistics that the median is much less affected than the mean by extreme values in the data. Therefore, it is a natural consequence to propose MRM for data with extreme values, in particular very small option frequencies. By replacing the mean operation with the median operation, after each operation, we set the origins and the units are adjusted as y F1 = 0; 1 Fj xj = 0, where Fj is the N × m j incidence matrix for item j, and xj = (x j 1, x j 2, · · · , x j,m j ) and y"Dr y = x Dc x = n N .

With these adjustments, the MRM determines the vectors y and x in such a way that the following relations of reciprocal medians hold: ρyi = Mdn( j)[ f i, jk x jk ] ρx jk = Mdn(i)[ f i, jk yi ]

12.3 Handling Outlier Responses

163

where “Mdn” is the median operator working only on f 1, jk = 1. The idea of using the median to avoid the effects of extreme values is very innovative, but Nishisato (1984b) provides the following warnings: • As compared with MRA, MRM provides a smaller value of ρ 2 . • The iterative process of MRM does not often converge to the solution but oscillates in an endless way. • The choice of the initial arbitrary vector often changes the results. Therefore, the idea is quite appealing, but the method must be further improved with a number of intermediate decisions to steer the process to convergence.

12.3.2 Alernating Reciprocal Averaging and Reciprocal Medians Nishisato (1987) tried this idea but found that almost always the method reaches two stable points, hence the method cannot be recommended for use.

12.3.3 Method of Trimmed Reciprocal Averages This method (Nishisato 1984b) is a better alternative to the above two methods, in which so many (a) extremely small values and so many (b) extremely large values are omitted from averaging up to half the number of observations, that is, the case of MRM. In his demonstration, a = b = 1 has always worked, and the process converged. So, this may be an answer to trimming the influence of outlier responses. This remains, however, to be proven.

References Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417–441 and 498–520. Hotelling, H. (1936). Relations between two sets of variables. Biometrika, 28, 321–377. Nishisato, S. (1984a). Dual scaling by reciprocal medians. Proceedings of Estratto Dagli Atti Della XXXII Riunione Scietifica, Sorrento, Italy, 141–147. Nishisato, S. (1984b). Supplementary handout for the above paper. Nishisato, S. (1987). Robust techniques for quantifying categorical data. In MacNeil, I. B. & Umphrey, G. J (eds.) Foundations of Statistical Inference (pp. 209–217). Nishisato, S. (1991). Standardizing multidimensional space for dual scaling. Proceedings of the 20th Annual Meeting of the German Operations Research Society (pp. 584–591). Hohenheim University. Nishisato, S. (1993). On quantifying different types of categorical data. Psychometrika, 58, 617–629. Nishisato, S. (1996). Gleaning in the field of dual scaling. Psychometrika, 61, 559–599.

164

12 Over-Quantification

Nishisato, S. (2007). Multidimensional nonlinear descriptive analysis. Boca Raton: Chapman & Hall/CRC. Nishisato, S., & Yamauchi, H. (1974). Principal components of deviation scores and standardized scores. Japanese Psychological Research, 16, 162–170. Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazines and Journal of Science, Series, 6(2), 559–572. Stevens, S. S. (1951). Mathematics, measurement, and psychophysics. In S. S. Stevens (Ed.), Handbook of experimental psychology. New York: Wiley. Torgerson, W. S. (1958). Theory and methods of scaling. New York: Wiley.

Chapter 13

When Not to Analyze Data

13.1 Missing Responses and Quantification For general data analysis, there is an excellent exposition of different approaches to the handling of missing responses (Rubin 1987), and the interested readers are referred to Rubin’s book. In quantification analysis, the problem is slightly different from the situation with ordinary data analysis. For instance, when a response is missing for a five-option question, our focal point is to identify which option the respondent would have chosen. So, in this chapter, we will consider multiple-choice data, often used in medical checks, census data and all kinds of surveys, namely a familiar type of data. Most of us have experience in answering such survey questions. Have you not seen such outrageous questions that you would rather not answer them or too personal questions that you would rather not respond? We often see such data that “missing responses” can tell us something meaningful for analysis, and we also see totally random missing responses. In this chapter, we are talking about the outcome of such experiences, the results of which are missing responses in data sets. Most authors of textbooks would rather avoid discussing such missing responses that may not be worth analyzing, but rather concentrate on useful methods of data analysis. Nevertheless, there are a ton of data with many missing responses, to the extent that you may wish to abandon data analysis. Thus, missing responses offer a reasonable topic to discuss. Our problem is how to handle missing responses. There are other concerns about poor designs for data-collection strategies (e.g., unbalanced designs, non-random designs, lack of control groups, sample size, sampling failures), but we will leave these problems to textbooks on data analysis. Instead, in this chapter, we are concerned with the situation when subjects did not respond to many questions for whatever the reasons may be. Imagine that the questionnaires are returned mostly with blanks. What can we do then? How many missing response should we accept to carry out data analysis? So, this is a problem of decision-making on when to or not to analyze the data.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Nishisato, Optimal Quantification and Symmetry, Behaviormetrics: Quantitative Approaches to Human Behavior 12, https://doi.org/10.1007/978-981-16-9170-6_13

165

166

13 When Not to Analyze Data

13.2 Some Procedures In ordinary setting of research, we typically have a statistical model for data analysis, and a model typically offers a hint on how to handle missing data. But, in quantification theory, we do not have a statistical model and our analysis is totally data-dependent. One would probably come up with the idea of creating an extra category to accommodate missing responses (e.g., if a question has four response options, we create option 5 for missing responses), but as we saw in Tung’s data in Chap. 12, this extra option can become a cause of outrageously powerful source for the outlier effect. If there are too many missing responses, on the other hand, the extra option will again become a cause of concern because then the missing responses will totally be ignored in optimization. The absence of a statistical model adds an extra problem for dealing with missing responses. Even so, we must do something if we confront many missing responses. One of the most legitimate questions is when we should abandon data analysis. When can we decide that it is not worth analyzing data with missing responses? Rather than dealing with this negative attitude of deciding not to analyze data, let us look at a simple example in which ten subjects responded to three multiple-choice questions and one subject missed answering one question (Table 13.1). In this table, Subject 1 missed the third item with four options. Suppose we replace this subject’s missing response to item 3 (i.e., (* * * *)) with the following responses and calculate the correlation ratios (η2 ), we obtain Table 13.2. Here, we can see how the final result, reflected on η2 , changes to the different imputations. Imagine what will happen if we extend a similar table to many more missing responses. We will soon realize the following dilemma. In our quantification strategy, we wish to identify the optimal option out of many options (alternatives of the question)

Table 13.1 A sample data with one missing response Subject * Q1 Q2 1 2 3 4 5 6 7 8 9 10

1 0 0 0 1 0 1 0 0 0

0 1 1 0 0 0 0 1 1 1

0 0 0 1 0 1 0 0 0 0

0 0 1 0 0 1 0 0 0 1

0 1 0 1 1 0 0 1 1 0

Q3 1 0 0 0 0 0 1 0 0 0

* 0 0 0 1 1 0 0 0 0

* 0 0 0 0 0 0 1 0 1

* 1 0 1 0 0 0 0 1 0

* 0 1 0 0 0 1 0 0 0

13.2 Some Procedures

167

Table 13.2 Imputed responses for Subject 1, Item 3 Imputation Pattern 1 2 3 4 5 6 7 8 9 10

1 0 0 0 0 0.25 0.25 0.25 0.02

0 1 0 0 0 0.25 0.25 0 0

0 0 1 0 0 0.25 0 0 0

η2 0 0 0 1 0 0.25 0.50 0.75 0.98

0.7650 0.7379 0.6847 (minimum) 0.8101 (maximum) 0.8048 0.7060 0.7553 0.7884 0.8079 0.7415

Imputation 10 = Subject 1 is deleted from data set

for response imputation. This sounds like a good strategy. But, under this strategy, the more responses missing, the better data set we can create! For, by replacing missing responses with optimal responses, we can expect for sure that the best outcome will arise when all the responses are missing! We do not need any data to create perfect data under this imputation scheme! This is a temptation to expand our application of “optimal analysis” to missing data, but this strategy is definitely wrong. When there are only a few missing responses, the above strategy of replacing missing cells with the best responses is so tempting, that is, to fill in the vacant places with optimal responses. There must have been a number of publications along this line of imputations, but as mentioned above this strategy leads to a disaster at the very end: “no responses, the perfect case”!

13.2.1 List-Wise Deletions When we deal with missing responses in quantification, we can immediately think about three approaches. Assuming that we are dealing with multiple-choice data, where each question has several response options, the first idea for dealing with missing responses is the list-wise deletion, where we delete rows and columns of the data matrix which contain missing responses. This can be a rather expensive way of handling missing responses, for, in this way, we may discard most of the data. It is a conservative, but not general enough strategy.

168

13 When Not to Analyze Data

13.2.2 Extra Categories This strategy was briefly mentioned above and at the first glance it is quite appealing. When many subjects miss answering a five-option question, we create the option 6 to accommodate missing responses. Depending on the structure of data, this strategy may reveal that the reason for missing the question might be related to their responses to some other questions. But again, it can happen as we mentioned above that if only one or two subjects miss that question it may lead to the outlier-response situation.

13.2.3 Imputation In this strategy, we fill in a missing cell with an “appropriate” response whatever it may be. Depending on this appropriateness, this strategy can be promising. But, how should we define “appropriateness.” Out of these strategies, the first strategy is widely used, but it wastes so much information that it does not seem to be advisable. The second strategy sounds really good, because when the number of options is 5 we create option 6 to accommodate a missing response, and this option will be included in quantification. A real problem is that missing responses might contribute to outliers in the data set. Recall the example of Tung’s data in Chap. 12: when there are very few missing responses, those extra options can become major contributors to the information in data. The third procedure of imputation is one that has been most extensively investigated and has produced such well-known procedures as the EM-algorithm and hot-deck procedures. So, let us look at this third strategy more closely.

13.3 Imputation Principles There are a number of ways to impute responses for missing cells. Let us classify them as principles.

13.3.1 Principle of Maximal Internal Consistency The task is to identify which option of a missed item (question) should get a response in order to maximize the internal consistency of quantified results and to impute a response for that option. Along this line, a number of publications emerged: In Leiden, van Buuren and van Rijckevorsel (1991); In Toronto, Nishisato and Levine (1975),

13.3 Imputation Principles

169

Chan (1978), Ahn (1991) and Nishisato and Ahn (1995). But recall the phrase: no responses, the best results! under this scheme.

13.3.2 Hot-Deck Principle Each missing response is allocated to the option chosen by those whose response patterns are similar to that of the person who missed the response. This strategy sounds good, but at the same time, it is not difficult to imagine that the hot-deck principle can create a case of unreasonably inflated internal consistency as the number of missing responses increases. Within this framework, van Buuren and van Rijckevorsel (1991) presented a fast algorithm, based on the least squares principle, to identify “reasonable” replacements for missing responses which maximize the consistency of the completed data. But, is the strategy of maximizing the consistency what we want?

13.3.3 Principle of Complete Ignorance There are a few ways to implement ignorance. The first one is to use (0, 0, 0, 0) for the four-option missing case. The second way is to replace (0, 0, 0, 0) with the response pattern (0.25, .25, 0.25, 0.25). This, of course, will reduce the internal consistency, thus, the eigenvalue. The third one is to replace the missing response with a pattern, randomly chosen from (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0) and (0, 0, 0, 1). This approach may make some investigators uneasy, for optimal quantification often capitalizes on chance variations, especially when the size of data is small in comparison with the number of missing responses. The fourth approach is list-wise deletion, which has already been mentioned. No matter which principle we may choose, handling missing responses does not seem to be ever satisfactory. Therefore, a group of researchers in Toronto (Nishisato; supervisor: Levin; researcher: Ahn, Poon; graduate students) changed the research direction completely, and investigated the decision rule for when not to analyze the data. This became the topic of Ahn’s Ph.D. thesis (1995), and she did all the simulations to arrive at a tentative decision rule for when not to analyze the data. But, considering a large number of computations some 30 years ago, the results are only satisfactory as the first step for practical applications. To make the procedure readily usable, we must await for modern technology and computational wisdom.

170

13 When Not to Analyze Data

13.4 Decision Rules: When Not to Analyze Given a data set with missing responses, one can consider the imputations of missing responses in two extreme ways: • To impute responses in such a way that the eigenvalue be a maximum. • To impute responses in such a way that the eigenvalue be a minimum. Our simple decision-making is “When there is a significant difference between the maximum eigenvalue and the minimum eigenvalue, we should abandon the data analysis.” This decision rule is very reasonable because it means that depending on how to impute responses for missing cells, the outcome may be significantly different. Therefore, if the difference is significant, we should abandon data analysis. For this significance testing, however, we must generate sample distributions of the minimum and the maximum eigenvalues. A huge simulation study was carried out to obtain the two sample distributions under different percentages of missing responses. The purpose of the simulation was to find the minimum percentage of missing responses when the maximum eigenvalue becomes significantly different from the minimum eigenvalue. That percentage is the critical value for decision making for not to analyze the data. The critical percentage of missing responses depends on data structure and the artificial data were constructed so as to be able to control the population. The question remains how to specify the population from which our data came from as a random sample. Out of a large number of computational results, Nishisato and Ahn (1991) offer tentative observations. • When to give up data analysis: The critical percentage, at which two 95% empir2 2 and ηmin cease to overlap each other, changes as a ical confidence intervals for ηmax function of latent (date-generating) inter-item correlation, data sizes and the number of options, from 11.5% to 25.4%. This critical value tends to become larger as the number of options becomes smaller and the inter-item correlation gets larger. As a general conservative rule, one should be concerned with the validity of results when more than 11% responses are missing. • When to delete a single item: The critical percentage, at which maximal and minimal item-total correlations induced by different imputations become substantially different, is also affected by the data size, the number of options and the latent inter-item correlation, but not as systematically as the above case of the first recommendation. A general rule one can draw from simulations is that when the number of missing responses for a single item exceeds 5% one should seriously consider discarding this item from analysis.


• When to discard a subject: The critical percentage of missing responses for a subject, at which the maximal and the minimal scores of the subject due to imputations reveal a substantial difference, is again affected by the parameters discussed above, but not as systematically as in the first problem. The general guideline is that one should consider discarding a subject who misses more than 11% of the responses.
Of course, the above results are only examples from a single study, and more theoretical work is needed. The above work was introduced here as a starting point towards a satisfactory procedure for dealing with our question: when not to analyze the data.
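The following sketch illustrates the idea of the two extreme imputations on a small indicator matrix. It is a brute-force toy version, not Ahn's simulation design: the eigenvalue is computed from the standard correspondence-analysis decomposition, and all function names and data sizes are assumptions for illustration.

```python
import itertools
import numpy as np

def indicator(responses, n_options):
    """Expand a subjects-by-items matrix of chosen option indices
    into a (1, 0) indicator matrix."""
    n, m = responses.shape
    blocks = []
    for j in range(m):
        block = np.zeros((n, n_options[j]))
        block[np.arange(n), responses[:, j]] = 1.0
        blocks.append(block)
    return np.hstack(blocks)

def eta_squared(F):
    """Largest nontrivial eigenvalue (correlation ratio) from the
    correspondence-analysis decomposition of an indicator matrix."""
    F = F[:, F.sum(axis=0) > 0]          # drop options nobody chose
    P = F / F.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return np.linalg.svd(S, compute_uv=False)[0] ** 2

def eta_extremes(responses, n_options):
    """Enumerate every single-option imputation of the missing cells
    (coded -1) and return the minimum and maximum eigenvalues."""
    cells = np.argwhere(responses < 0)
    etas = []
    for combo in itertools.product(*(range(n_options[j]) for _, j in cells)):
        filled = responses.copy()
        for (i, j), k in zip(cells, combo):
            filled[i, j] = k
        etas.append(eta_squared(indicator(filled, n_options)))
    return min(etas), max(etas)

# Toy data: 8 subjects, 3 items with 3 options each, two missing cells.
rng = np.random.default_rng(1)
data = rng.integers(0, 3, size=(8, 3))
data[0, 1] = data[5, 2] = -1
lo, hi = eta_extremes(data, [3, 3, 3])
print(f"eta^2 ranges from {lo:.3f} to {hi:.3f} over all imputations")
```

The exhaustive enumeration grows exponentially in the number of missing cells, which is precisely why large-scale simulation was so costly at the time.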

13.5 Towards a State-of-the-Art Framework

We still have a long way to go to establish a workable method for a sensible decision on when to give up quantification analysis. In terms of what we have discussed here, we would still like to develop a dream-come-true, state-of-the-art method for handling our problem. It would be wonderful if we could develop a sampling scheme in which our given data set is considered to be the population, from which we randomly generate sample data sets with a given percentage of missing responses. Then we would be able to generate sampling distributions to construct the confidence intervals, one for η²max and one for η²min. By changing the percentage of missing responses, we would be able to arrive at the right decision. A computer program that handles this specific problem is possible, and it would give us the right answer to our question of when not to analyze the data. Such a program is urgently needed; a toy version of the idea is sketched below. It is the wish of the current author that a younger generation of researchers will develop such a technique. It would open up a way to handle quantification tasks whether or not missing responses are heavily involved in our data sets. It is the view of the current author that any software for data analysis should be equipped with the capability of judging whether or not the data in hand are worth analyzing. This would definitely be one way to address today's problems with large numbers of missing responses in survey research.
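As a rough illustration of the scheme just described, the toy Monte Carlo below treats a complete data set as the population, injects missing responses at several rates, and checks whether crude 95% intervals for the largest and smallest attainable eigenvalues separate. The data sizes, the replication counts and the randomized imputation search (a cheap stand-in for an exhaustive search) are all my assumptions, not the author's design.

```python
import numpy as np

rng = np.random.default_rng(2)

def eta_squared(F):
    """First nontrivial eigenvalue via correspondence analysis."""
    F = F[:, F.sum(axis=0) > 0]           # drop options nobody chose
    P = F / F.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return np.linalg.svd(S, compute_uv=False)[0] ** 2

def indicator(responses, k):
    """Indicator matrix for m items with k options each."""
    n, m = responses.shape
    F = np.zeros((n, m * k))
    F[np.arange(n)[:, None], responses + np.arange(m) * k] = 1.0
    return F

def extremes(responses, k, tries=100):
    """Randomized stand-in for the exhaustive imputation search."""
    cells = np.argwhere(responses < 0)
    etas = []
    for _ in range(tries):
        filled = responses.copy()
        filled[cells[:, 0], cells[:, 1]] = rng.integers(0, k, len(cells))
        etas.append(eta_squared(indicator(filled, k)))
    return min(etas), max(etas)

# The complete data set plays the role of the population.
n, m, k = 30, 5, 3
population = rng.integers(0, k, size=(n, m))

for pct in (0.05, 0.10, 0.15, 0.20):      # missing-response rates
    lows, highs = [], []
    for _ in range(50):                    # replications per rate
        sample = population.copy()
        sample[rng.random(sample.shape) < pct] = -1
        lo, hi = extremes(sample, k)
        lows.append(lo)
        highs.append(hi)
    # The decision rule: give up once the 95% intervals separate.
    separated = np.percentile(highs, 2.5) > np.percentile(lows, 97.5)
    print(f"{pct:.0%} missing: "
          f"{'separated -> do not analyze' if separated else 'overlapping'}")
```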


This is the last chapter of the book. It is hoped that the current coverage of the topics of quantification theory has offered a bridge to other relevant problems in data analysis. Besides the handling of missing responses, we urgently need computer programs or appropriate devices capable of grasping multidimensional configurations of data clouds, or of summarizing information scattered over multidimensional space. If this is not possible, a substitute can be a sequential decision-making procedure on the skeleton of the multidimensional configuration of the data, whereby we can summarize the retrieved information stepwise. The data we collect are generally so rich in information that we may call them a gold mine, which is one reason why data analysis is also called data mining. It is not garbage in, garbage out; it should be gold mining.

References

Ahn, H. (1991). Effects of missing responses in multiple-choice data on dual scaling results. Ph.D. thesis, University of Toronto.
Chan, D. (1978). Treatment of missing responses by optimal scaling. Master's thesis, University of Toronto.
Nishisato, S., & Ahn, H. (1995). When not to analyze data: Decision making on missing responses in dual scaling. Annals of Operations Research, 55, 361–378.
Nishisato, S., & Levine, R. (1975). Optimal scaling of omitted responses. Paper presented at the Spring Meeting of the Psychometric Society, Iowa City, Iowa.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
van Buuren, S., & van Rijckevorsel, J. L. A. (1991). Fast least squares imputation of missing data (pp. 01–91). Leiden Psychological Reports: Psychometrics and Research Methodology PRN.

Chapter 14

Epilogue

14.1 Reminiscence

Time has elapsed too quickly to appreciate many happy years with my family and friends, but there have been a number of events to remember. The first was on April 3, 1952, a sunny spring day, when a magnitude 8.3 earthquake struck my home in Urahoro, Hokkaido, Japan, destroying everything around me and changing my life forever. Moving back to my hometown of Sapporo served as the springboard for the next stage of my life. From Hokkaido University to the University of North Carolina at Chapel Hill on a Fulbright scholarship, I was blessed with excellent mentors: Professors Masanao Toda, Yoichiro Takada, Yoshio Sugiyama and Tadasu Oyama in Sapporo, Japan, and Professors R. Darrell Bock, Lyle V. Jones, Dorothy Adkins-Woods, Emir Shuford and Henry F. Kaiser in Chapel Hill, North Carolina, USA. After my Ph.D., the unexpected failure to find a job in Japan was a fortune in disguise, since I was immediately offered a position at McGill University, Montreal, Canada by Professors George Ferguson, Dalbir Bindra and Wally Lambert in the Department of Psychology, where I met Prof. Ross E. Traub and my future wife Lorraine Ford, who worked with Prof. Bindra, one of my mentors. In 1967, I moved to the new research center, the Ontario Institute for Studies in Education (OISE) of the University of Toronto, where Ross E. Traub, Roderick P. McDonald, Raghu Bhargava and I established a well-known psychometrics program. The world was still politically divided between the East and the West. From the 1970s onward, I participated in many international meetings in France, Germany, the Netherlands, Italy and Spain, where I met many wonderful researchers. On the eastern side, I was invited to a conference hosted by the Academy of Sciences, Moscow, USSR, and to the Bulgarian Government's Research Conference, Sofia, under the watchful eyes of communist security. In Moscow, Boris Mirkin and Selghei Adamov, who together translated my 1980 book (University of Toronto Press) into Russian, were hosts to a group of researchers: Wolfgang Gaul, Hans-Hermann Bock, Hamparsum Bozdogan, William Day and myself.


I saw the last days of the Soviet Union, for it was a few weeks before the historic collapse of the USSR. To Sofia, Bulgaria, Dr. Vihar Zudravkov invited me, together with John C. Gower, Peter van der Heijden, Eeke van der Burg and Takayuki Saito. All of us gave lectures on quantification theory to the participants. I was surprised to meet several Bulgarian researchers who asked me to autograph copies of my 1980 book. Bulgaria was still a communist country. In those days, international trips were not always easy, with visa restrictions and limited funds, but surprisingly we knew many researchers abroad through exchanging postcards requesting reprints of published papers, a custom we no longer see. Times have changed vastly since then. International travel is simpler than before, but we can no longer use Fortran or WordPerfect. The Psychometric Society, my dear organization, where I served as the President and the Editor of Psychometrika, has also changed from an American society to an international one. Because of young sophisticated researchers, I hardly find papers in Psychometrika which I can understand or which interest me. This must be inevitable progress. Most regrettably and sadly, however, many of my dear old friends are already gone! Furthermore, nowadays time flies not by days but by weeks. Time is much more precious now than ever before, and I am so happy to have finished this book now, and another book this spring with my dear friends Eric J. Beh (Australia), Rosaria Lombardo (Italy) and José G. Clavel (Spain). I have been so lucky to have lived my early life in Japan, my student days in both Japan and the USA, and my research life in Canada, and thus to have enjoyed the best of various stages of life. Finally, I would like to honor two giants of quantification theory, John C. Gower and J. P. Benzécri, by presenting English translations of my memories of them, originally written in Japanese for the Newsletter of the Japanese Behaviormetric Society.

14.1.1 John C. Gower

Life Exists After Retirement is how John C. Gower started his invited talk at the International Conference on Measurement and Multivariate Analysis in May 2000 in Banff, Canada. This conference was organized by Yasumasa Baba and myself, and my wife Lorraine served on the local arrangements committee. The above statement was John's first word of encouragement to me, celebrating my impending retirement on my 65th birthday on June 9 of that year. John was one of the few scholars who continued to work beyond retirement, and his words greatly encouraged me to follow him; in fact, my hope was to accomplish even one tenth of what John had done after his retirement. For this conference we had two more invited speakers, Dr. Chikio Hayashi from Japan and Dr. David Hand from Britain. There was an exceptionally large number of participants from Japan, together with many other international participants. The conference was very successful, blessed with an unexpectedly large snowfall in the middle of May.


This pleased many participants, especially those who had never seen snow. The Banff Conference Centre, covered in snow, provided a beautiful Canadian setting. John Gower was accompanied by his wife Janet, and my wife Lorraine drove them to the Athabasca Glacier and Lake Louise. The conference program allowed enough time for the participants to do some sightseeing, and this was much appreciated. In December 2018, I received a Christmas card from John and Janet, in which he mentioned his plan to present a paper at CARME 2019 in Stellenbosch, South Africa, in January 2019, together with his usual proud mention of his granddaughter's artwork. There was a note, however, that his presentation in Stellenbosch would be via video. In retirement one can still prepare a paper, but attending a conference in a distant land may not always be easy. Since I knew that John was constantly working, his unexpected passing was totally devastating to me: my hero had left me, and I had lost one of my good friends. I do not remember exactly when I met John for the first time, but it was definitely when he was still working at the Rothamsted Experimental Station, made famous by R. A. Fisher. Around that time I corresponded with his colleague Pete Digby. After John moved to the Open University in Milton Keynes, I was invited to give a talk, and I spent the whole day with him and his colleagues. One of the most memorable events with John goes back to the days when Bulgaria was still under the communist regime. An important member of the Bulgarian Government, Dr. Vihar Zudravkov, invited John Gower from Britain, Eeke van der Burg and Peter van der Heijden from the Netherlands, Takayuki Saito from Japan and myself from Canada to Bulgaria for a week. It was an invitation for us to give lectures on quantification theory to Bulgarian researchers. I was amazed that my book, entitled Analysis of Categorical Data: Dual Scaling and Its Applications (University of Toronto Press, 1980), was well known in communist Bulgaria. One day I asked Zudravkov if he could change some US dollars into Bulgarian currency for me, and he gladly obliged. Then John pulled me aside and told me that exchanging foreign currency for Bulgarian currency was illegal in Bulgaria. I was struck with the fear of being imprisoned in a communist country. We soon learned, however, that Zudravkov's favorite English phrase was no problems. With Bulgarian currency in my wallet, I went out with John to do some shopping. When we came out of a gift shop, John was so happy, saying I made a wonderful purchase, and showed me a beautiful Bulgarian carpet. It was indeed beautiful. We then headed to our hotel, and suddenly John stopped and shouted: I made a terrible mistake! In converting the Bulgarian price to British pounds, he had made a mistake of one decimal point: the price he paid was ten times higher than what he had thought. But it was too late. I had witnessed an example of the Japanese proverb Monkeys sometimes fall from trees, and I had never thought that John would ever make such a mistake. I recall another episode with John. One day during a conference in Rome, John and I decided to visit the Vatican. We got on a city bus, but could not figure out where or from whom to purchase tickets. In the meantime, the bus reached our destination and we got off. Then we realized that we had had a free ride to the Vatican. John was overjoyed, and I still remember his big smile.


The last paper I received from John was entitled Skew symmetry in retrospect, a vintage John paper. I sent him my comments on it. Generally speaking, his papers were beyond my comprehension, yet John occasionally sent me his papers for comments, which made me feel extremely privileged and grateful. John C. Gower, who accomplished a monumental amount of work and served in many important positions in academic organizations, was a scholar second to none (see Cox 2015). Now that John has passed away, remembering his words Life exists after retirement, I must strive to do further work. I hope that his words may spread to all senior members of the Society. Finally, I would like to express my thanks and sympathy to his wife Janet. Thank you, John, for your friendship and the happy memories. (Note: Sadly, Janet Gower passed away in November 2019.)

14.1.2 Jean-Paul Benzécri

Heyday of Jean-Paul Benzécri: A giant star of science, Jean-Paul Benzécri, passed away on November 24, 2019. He was seven years older than I, and an internationally acclaimed scholar. In spite of his enormous fame, however, relatively few researchers outside France came to know him personally. I heard that Benzécri had studied differential geometry at Princeton University, USA, at the time when transatlantic ship voyages were the norm. In the 1980s Michel Tenenhaus told me Benzécri does not like to fly, and he rarely attends conferences even in Europe. In the 1990s, when I was the chair of the recruitment committee of the Classification Society of North America (CSNA), someone proposed inviting Jean-Paul Benzécri as a speaker, and I recall that Phipps Arabie remarked: I would not support it, for Benzécri does not like to fly! Nevertheless, he was like an emperor, not only in France but in the world community of scientists. The first time I met him was when I visited him at the Pierre-et-Marie Curie University in Paris. It was shortly after my book from the University of Toronto Press (1980) was published; it was also shortly after Michael Greenacre had finished his Ph.D., and Fionn Murtagh was still writing his Ph.D. thesis under the supervision of Benzécri. He looked like a monk with a beard, just like Rasputin, and did not look like someone who rigorously advanced the frontier of research while supervising a countless number of students. He was a gentle and quiet person whose handshake was equally soft, not firm. When I presented my book to him, he only murmured Merci beaucoup. Since then I met him twice at conferences. I could not help but feel how strong his influence was on the French research community. Each time we outsiders presented a talk, it was followed by a French scholar commenting that the same study had already been discussed and explained by Benzécri in his lectures. The famous Louis Guttman must have had the same experience in France as I did. Once, at a conference in Versailles, France, I sat next to Guttman, and after Brigitte Escofier's presentation I stood up and said A similar study was done ten years ago in Toronto.


Then, to my big surprise, Guttman shook my hand firmly, saying well done! My statement had obviously elated Guttman. While Guttman's 1941 work was very important as the foundation of quantification theory, thus preceding Benzécri's work, Benzécri was nevertheless regarded as the father of correspondence analysis in France, while at that Versailles meeting Guttman was only one of the invited speakers. When Philip Weingarden and I published a paper in the Canadian Journal of Marketing Research on comparisons of the quantification results of rank-order and paired comparison data, guess who wrote to us first. It was Benzécri, telling us that a similar study had been done in France! At that point, I realized that we researchers are all members of the same group with strong egos. However, Benzécri was obviously above all of us. Les Cahiers de l'Analyse des Données was a scientific journal mostly devoted to research on quantification theory, and it flourished from its first publication in 1976 until it was discontinued in the mid-1990s. This journal was one of the most precious footprints that Benzécri and his students left us in the history of quantification theory. This year we lost two giants of data analysis, first John C. Gower and then Jean-Paul Benzécri. With their passing, the generation of active researchers appears to have shifted completely to a younger group. When we look around, we notice that many researchers of the younger generation no longer go back several decades to identify the origins of their work; their research framework looks modern and esoteric, making us senior researchers feel left out. As far as quantification theory is concerned, however, I am certain that the foundations laid by Jean-Paul Benzécri and Chikio Hayashi will never be forgotten.

14.2 Going Forward

We have covered a wide variety of data types for quantification. One problem that was not discussed in the current book is the discretization of continuous variables so that the categorized data may be subjected to quantification. In categorization we lose some information, which, however, may be regained by quantification. The problem is (1) how to discretize continuous data and (2) how many categories would be practically recommended from the information-retrieval point of view. The problem of discretization has been pursued in different areas of science. In terms of quantification theory, I have investigated it a number of times, but have not published the outcome of the research in any systematic way. As my last supervision of a Ph.D. thesis at the University of Toronto, I asked my student Keanre B. Eouanzoui to investigate the problem of categorizing continuous variables (Eouanzoui 2004). The results have not been published, but I recall that his work contained a number of very interesting findings. Recently, some important papers were published by Kim and Frisby (2019) and Kim et al. (2020).


Fig. 14.1 Nishisato at University of Murcia, Spain

These studies, too, show highly interesting and important aspects of the discretization problem, indicating clearly that this is one area of quantification theory to pursue in the future. Another work of importance is technological investigation into quantification theory, such as Nishisato and Nishisato (1986) and Clavel, Nishisato and Pia (2017): we urgently need a versatile computer program which handles quantification of contingency tables, multiple-choice data (including sorting data), rank-order data, paired comparison data, successive categories data, multi-way classification data and mixtures of discrete and continuous data, together with the capability of handling, in addition to singular value decomposition, such algorithms as reciprocal averaging, piecewise reciprocal averaging, forced classification and generalized forced classification, offering a variety of projection operators and experimental design matrices. Such a program needs to be capable of generating sampling distributions of sample data and quantification statistics through Monte Carlo simulations. We also need it to be capable of displaying dynamic holographic multidimensional graphs and carrying out cluster analysis of multidimensional quantification outcomes. Our biggest task ahead is to develop a better and more efficient way of representing multidimensional quantification results. Whether it is multidimensional graphical display or cluster analysis, it is obvious that we must break through the wall of spatial perception, from the familiar three-dimensional mode to a dynamic vision of hyperspace. The technology exists to cultivate an epoch-making perceptual breakthrough, be it a multidimensional graph or cluster analysis. This is the most urgent task for the further development of quantification theory. I count on the endeavors of young talented researchers through the A.I. framework (Fig. 14.1).


References

Clavel, J. G., Nishisato, S., & Pia, A. (2017). dualScale: A computer program for multiple-choice data. (No longer in the CRAN repository).
Cox, D. A. (2015). A conversation with John C. Gower. International Statistical Review, 1–18.
Eouanzoui, K. B. (2004). On desensitizing data from interval to nominal measurement with minimum information loss. Ph.D. thesis, University of Toronto.
Kim, S. K., & Frisby, C. L. (2019). Gains from discretization of continuous data: The correspondence analysis biplot approach. Behavior Research Methods, 51(2), 589–601.
Kim, S. K., McKay, D., & Tolin, D. (2020). Examining the generality and specificity of gender moderation in obsessive compulsive beliefs: Stacked prediction by correspondence analysis. British Journal of Clinical Psychology. (Pages unknown at the time of writing.)
Nishisato, S., Baba, Y., Bozdogan, H., & Kanefuji, K. (2002). Measurement and multivariate analysis. Tokyo: Springer.
Nishisato, S., & Nishisato, I. (1986). Dual3 users' guide. Toronto: MicroStats.

Part IV

Appendices

Chapter 15

Stevens’ Measurement Theory

Many years ago, a typical lecture in psychology often started with Stevens' pioneering work on measurement (Stevens 1951). Over the years, however, this topic seems to have disappeared from basic courses on data analysis. This appendix is for those who are not familiar with the topic. According to Stevens, measurement is defined as the assignment of numbers to objects according to certain rules. These rules specify the allowable mathematical operations, which takes us directly to the question of what we can do with our data. In spite of the fact that we face all kinds of data, from non-numerical to numerical, there are very few books that direct our attention to the important contributions of Stevens. Although it is in Japanese, Nishisato (1975) organized most of his book around Stevens' theory of measurement, namely around what statistical methods are appropriate for Stevens' four types of measurement: nominal, ordinal, interval and ratio. Other researchers have extended the scope of measurement further and handled the methods of analysis under the term scaling (e.g., Torgerson 1958; Coombs 1964; Bock and Jones 1968; Dunn-Rankin 1983; Hand 2004). In the current appendix, we will focus our attention on Stevens' classification of measurement.

15.1 Four Kinds of Measurement

Stevens classified measurement into the following four types: nominal, ordinal, interval and ratio.


15.1.1 Nominal Measurement

In this category, numbers are used only for identification purposes, such as group 1, group 2 and group 3, or 1 for male and 2 for female. Thus, numbers at the "nominal" level are not appropriate for such mathematical operations as addition, subtraction, multiplication and division (e.g., group 1 plus group 2 does not generate group 3; two males are not equal to one female). But note this: nominal measurement is one of the domains appropriate for quantification, and most data types which are subjected to quantification are nominal data. Our quantification task, then, is to identify numerals for nominal measurement, namely to quantify nominal measurement, in such a way that the quantified nominal measurement is meaningfully amenable to the basic mathematical operations! In fact, the history of quantification theory started with the question of how to quantify nominal measurement.

15.1.2 Ordinal Measurement

Here the numbers are used to show order relations, such as 1 = poor, 2 = good and 3 = excellent, or 1 = never, 2 = sometimes, 3 = often and 4 = always. At this level, these numbers are not amenable to addition, subtraction, multiplication or division. For example, the sum of poor taste and good taste does not yield excellent taste; likewise, the sum of never and often does not mean always. We must consider some transformation of ordinal measurement so that the quantified ordinal measurement can be additive. This task of transformation is nothing but quantification. Since ordinal measurement has some metric information (that is, order), however, we may say that ordinal measurement is not a full-fledged domain for quantification theory. As we will see, Guttman (1946) pioneered a way to include ordinal measurement in the domain of quantification theory. It is certain that ordinal measurement requires an additional step of transformation to create an appropriate input for quantification; this will be discussed later. In conclusion, we can add ordinal measurement to the domain of quantification.

15.1.3 Interval Measurement

At this level, the numbers have the same unit, namely 2 is 1 + 1 and 3 is 1 + 1 + 1 or 1 + 2. Temperature is an example of interval measurement. The crucial point here is the fact that interval measurement does not have an absolute origin. Without the absolute origin, the ratio of two interval measurements is not meaningful. For example, it is absurd to say that 20 degrees Celsius is twice as hot as 10 degrees. What happens if we transform the same temperatures to Fahrenheit?


It is important to note that many data collected in the social and behavioral sciences are interval measurements; as such, we cannot state that a person with an Intelligence Quotient (IQ) of 120 is twice as wise as a person with an IQ of 60. IQ is an interval measurement, and any psychologist can tell immediately that such a comparison of IQs is absurd. However, addition and subtraction are appropriate for interval measurement, and such a statistic as correlation is appropriate for interval measurement. From the quantification point of view, interval measurement is not included in the domain for quantification, for all we need to upgrade interval measurement is to seek a rational origin of measurement, and this is not a task of quantification theory.

15.1.4 Ratio Measurement

This measurement is characterized by the meaningful existence of an origin and the equality of units. In other words, '0' means 'nothingness' of an attribute. Typical examples are distance and height, where '0' means 'nothingness' of the attribute. The existence of the absolute origin makes it possible to make a meaningful statement about the ratio of two measurements, such as: 50 m is twice the distance of 25 m; this mountain is three times higher than that mountain. A skeleton of Stevens' theory of measurement, augmented with more examples, is summarized in Table 15.1.

Table 15.1 Stevens’ scales of measurement Scale Basic empirical Mathematical operations structure N*

Determination of equality

O*

Determination of greater or less

I*

Determination of equality of intervals or differences

R*

Determination of equality of ratio

Permutation group x  = f (x) [ f (x) means any one-to-one substitution] Isotonic group x  = f (x) [ f (x) means any increasing monotonic function] General linear group x  = ax + b

Similarity group x  = ax

Permissible statistics Typical examples No. of cases Mode Contingency correlation

Numbering of football players, Type or model numbers Genders Body types Median Percentiles Hardness of Order correlation materials Quality of wool/leather/lumber Pleasantness of odors Movie rank Mean Standard Temperature: (F and deviation. Pearson’s C) Academic rank correlation Calendar date Test scores Geometric mean Length Weight Coefficient of Resistence Density variation Decibel Pitch (mels) Loudness (sones)

Notes N* = nominal, O* = ordinal, I* = Interval, R* = ratio


At this level, there is no need to quantify the data, and as such ratio measurement does not belong to the domain of quantification.

15.2 Domains of Quantification

15.2.1 Full-Fledged Domain

As is clear from the above discussion, nominal measurement is definitely in the domain for quantification, and we can say that it is in the full-fledged domain. Each datum has no quantitative information, and as such all nominal observations are proper objects for quantification. In the early history of quantification, researchers considered only nominal measurement for quantification. To distinguish it from the other types of measurement, we may say that nominal measurement constitutes the 'proper domain for quantification.'

15.2.2 Quasi-Domain

Ordinal measurement can be said to be semi-quantitative, and, as we will see in data analysis, ordinal data are ipsative, or conditional. Let us see what these words mean. When one person ranks five movies A, B, C, D and E according to the order of his or her liking, the ordinal data are given such numbers as 1, 2, 3, 4 and 5 in order of preference; another person may rank the same set of movies as 3, 5, 2, 1, 4. This is an example of ipsative data, in which the first person's rank 1 cannot be equated with the second person's rank 1, because the two persons may have used different criteria (e.g., the first person may like romantic movies, while the second prefers action movies). We hope you can see the difference between such ipsative data and non-ipsative data such as examination scores and personality scores. In the history of quantification, ordinal measurement has also been included in the domain of quantification (e.g., Guttman 1946; Nishisato 1976, 1978). As we will see later, the quantification of ordinal measurement is quite different from that of nominal measurement, because each ordinal measurement contains order information, namely quantitative information. The question then is how to incorporate ordinal information in quantification in such a way that the quantified ordinal measurement can be subjected to the mathematical operations of addition, subtraction, multiplication and division. This is definitely a complex task for quantification. In the current book, we discuss quantification problems associated with these two levels of measurement. We should note that the other two levels of measurement (interval and ratio) are already "quantitative," and there is no need for us to


further quantify them. The main task for these quantitative measurements is to explore different statistical models for analysis.

15.2.3 Outside Domain

From the viewpoint of quantification, interval measurement and ratio measurement are outside the quantification domain, since they are "quantitative enough" for ordinary statistical analysis. However, we should note that interval measurement lacks a rational origin, the key piece of information that makes the mathematical operation of division meaningful. For example, temperature is an interval measurement because we cannot say that "it is twice as hot today as yesterday because it is 20 degrees Celsius (C) today and it was only 10 degrees C yesterday" (note: if we convert Celsius to Fahrenheit (F), 20 C = 68 F and 10 C = 50 F, and 68 F is clearly not twice 50 F). Thus, the absence of an origin in interval measurement can create a serious problem from the computational point of view. Historically, this matter is treated as an additive constant problem, for example in metric multidimensional scaling (e.g., Torgerson 1952). Thus, if we want to be exact, only ratio measurement is full-fledged quantitative measurement, since it does not require any additional transformation for mathematical operations. From the data-analytic point of view, however, we may still need such an operation as standardization of ratio measurement in order to adjust the unit of measurement.
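A few lines of Python make the arithmetic concrete: a general linear transformation (the interval-scale group in Table 15.1) destroys ratios, while a similarity transformation (the ratio-scale group) preserves them. The numbers below simply restate the temperature example from the text.

```python
def c_to_f(c):
    return 9.0 / 5.0 * c + 32.0    # general linear (interval) transform

def m_to_ft(m):
    return 3.281 * m               # similarity (ratio) transform

print(20 / 10)                     # 2.0   "twice as hot"? ...
print(c_to_f(20) / c_to_f(10))     # 1.36  ... not after re-expression
print(50 / 25)                     # 2.0   twice the distance ...
print(m_to_ft(50) / m_to_ft(25))   # 2.0   ... in any ratio unit
```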

References

Bock, R. D., & Jones, L. V. (1968). Measurement and prediction of judgment and choice. San Francisco: Holden-Day.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Dunn-Rankin, P. (1983). Scaling methods. Hillsdale: Lawrence Erlbaum Associates.
Guttman, L. (1946). An approach for quantifying paired comparisons and rank order. Annals of Mathematical Statistics, 17, 144–163.
Hand, D. J. (2004). Measurement theory and practice: The world through quantification. London: Oxford University Press.
Nishisato, S. (1975). Applied psychological scaling. Tokyo: Seishin Shobo. (In Japanese).
Nishisato, S. (1976). Optimal scaling as applied to different forms of categorical data. Toronto: Department of Measurement and Evaluation, OISE.
Nishisato, S. (1978). Optimal scaling of paired comparison and rank order data: An alternative to Guttman's formulation. Psychometrika, 43, 267–271.
Stevens, S. S. (1951). Mathematics, measurement, and psychophysics. In S. S. Stevens (Ed.), Handbook of experimental psychology. New York: Wiley.
Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17, 401–419.
Torgerson, W. S. (1958). Theory and methods of scaling. New York: Wiley.

Chapter 16

A Numerical Example of MRA

Nowadays, the computations of quantification can be carried out with a high-speed computer. But, as mentioned in Chap. 2, when computers were not available, it was not easy to solve eigenequations. To respond to the need of the time, the method of reciprocal averages (MRA) was promoted by Richardson and Kuder (1933), Horst (1935), Mosier (1946), Baker (1960) and Hill (1973), among others. Here we would like to look at its actual computations, using the same example of teachers' evaluations (Nishisato 1980) discussed earlier. For convenience, the data are reproduced here. Given this data set, we will follow the computational process step by step, first to obtain the optimal component and then to extract more components.

16.1 Computing Optimal Component

(1) Assign arbitrary weights to the columns (or rows); the weights are arbitrary, but avoid identical weights and all 0's. As an example, consider (Table 16.1):

x1 (good) = 1, x2 (average) = 0, x3 (poor) = −1.

(2) Calculate the weighted averages of the rows.


Table 16.1 Evaluation of three teachers

Teacher   Good   Average   Poor   Total
White*      1       3        6      10
Green       3       5        2      10
Brown       6       3        0       9
Total      10      11        8      29

*In the original use (Nishisato 1980), these teachers were identified as 1, 2 and 3; later they were renamed White, Green and Brown, respectively.

$$y_1(\text{White}) = \frac{1 \times x_1 + 3 \times x_2 + 6 \times x_3}{10} = \frac{1 \times 1 + 3 \times 0 + 6 \times (-1)}{10} = -0.5$$

$$y_2(\text{Green}) = \frac{3 \times 1 + 5 \times 0 + 2 \times (-1)}{10} = 0.1000$$

$$y_3(\text{Brown}) = \frac{6 \times 1 + 3 \times 0 + 0 \times (-1)}{9} = 0.6667$$

(3) Calculate the mean of the responses weighted by y1, y2, y3:

$$M = \frac{10y_1 + 10y_2 + 9y_3}{29} = \frac{10 \times (-0.5) + 10 \times 0.1 + 9 \times 0.6667}{29} = 0.0690$$

(4) Subtract M from each of y1, y2, y3; the new values are denoted y′1, y′2, y′3:

y′1 = −0.5000 − 0.0690 = −0.5690
y′2 = 0.1000 − 0.0690 = 0.0310
y′3 = 0.6667 − 0.0690 = 0.5977

(5) Divide y′1, y′2, y′3 by the largest absolute value of the three, denoted g_y; in the current example, g_y = 0.5977. The adjusted values are denoted y″1, y″2, y″3:

y″1 = −0.5690/0.5977 = −0.9519
y″2 = 0.0310/0.5977 = 0.0519


y″3 = 0.5977/0.5977 = 1.0000

(6) Using these new values as weights, calculate the weighted averages of the columns:

$$x_1 = \frac{1 \times (-0.9519) + 3 \times 0.0519 + 6 \times 1.0000}{10} = 0.5204$$

$$x_2 = \frac{3 \times (-0.9519) + 5 \times 0.0519 + 3 \times 1.0000}{11} = 0.0367$$

$$x_3 = \frac{6 \times (-0.9519) + 2 \times 0.0519 + 0 \times 1.0000}{8} = -0.7010$$

(7) Calculate the mean of the responses weighted by x1, x2, x3:

$$N = \frac{10 \times 0.5204 + 11 \times 0.0367 + 8 \times (-0.7010)}{29} = 0$$

(8) Subtract N from each of x1, x2, x3. Since N = 0, the values x′1, x′2, x′3 remain the same.
(9) Divide each of x′1, x′2, x′3 by the largest absolute value of the three numbers, g_x. Since −0.7010 has the largest absolute value, g_x = 0.7010. The adjusted values are denoted x″1, x″2, x″3:

x″1 = 0.5204/0.7010 = 0.7424
x″2 = 0.0367/0.7010 = 0.0524
x″3 = −0.7010/0.7010 = −1.0000

Reciprocate the above averaging processes (steps 2 through 9) until all six values have stabilized (Table 16.2). Iteration 5 provides the identical set of numbers as iteration 4; therefore, the process has converged to the optimal solution in four iterations. Notice that the largest absolute values at each iteration, g_y and g_x, also converge to two constants, 0.5083 and 0.7248. It is known that

$$\rho^2 = g_y g_x, \qquad \rho = \sqrt{g_y g_x}$$

where ρ² is called the correlation ratio (or the eigenvalue), and ρ is the maximal correlation between the rows and the columns of the table (or the singular value). In the current example,

$$\rho^2 = 0.5083 \times 0.7248 = 0.3684$$


Table 16.2 Iterative results

          Iter 1    Iter 2    Iter 3    Iter 4    Iter 5
Good      1.0000    0.7321    0.7321    0.7311    0.7311
Average   0.0000    0.0617    0.0625    0.0625    0.0625
Poor     −1.0000   −1.0000   −1.0000   −1.0000   −1.0000
g_x           —     0.7227    0.7246    0.7248    0.7248
White    −0.9954   −0.9993   −0.9996   −0.9996   −0.9996
Green     0.0954    0.0993    0.0996    0.0996    0.0996
Brown     1.0000    1.0000    1.0000    1.0000    1.0000
g_y           —     0.5124    0.5086    0.5083    0.5083

$$\rho = \sqrt{0.5083 \times 0.7248} = 0.6070$$
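For readers who wish to check the iterations, here is a minimal Python sketch of steps (1) through (9). It is an illustrative reimplementation, not the historical program, and it reproduces the converged values of Table 16.2.

```python
import numpy as np

F = np.array([[1, 3, 6],
              [3, 5, 2],
              [6, 3, 0]], dtype=float)       # Table 16.1
row, col, total = F.sum(1), F.sum(0), F.sum()

x = np.array([1.0, 0.0, -1.0])               # step (1): arbitrary start

for it in range(1, 20):
    y = F @ x / row                           # step (2): row averages
    y -= row @ y / total                      # steps (3)-(4): center
    gy = np.abs(y).max()
    y /= gy                                   # step (5): rescale
    x_new = F.T @ y / col                     # step (6): column averages
    x_new -= col @ x_new / total              # steps (7)-(8): center
    gx = np.abs(x_new).max()
    x_new /= gx                               # step (9): rescale
    if np.allclose(x_new, x, atol=1e-7):      # reciprocate until stable
        break
    x = x_new

print(f"converged in {it} iterations")
print(f"g_y = {gy:.4f}, g_x = {gx:.4f}")      # 0.5083, 0.7248
print(f"eta^2 = {gy * gx:.4f}, rho = {np.sqrt(gy * gx):.4f}")  # 0.3684, 0.6070
print("y =", np.round(y, 4))                  # -0.9996  0.0996  1.0000
print("x =", np.round(x_new, 4))              #  0.7311  0.0625 -1.0000
```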

(10) The unit for y″1, y″2, y″3 now needs to be equated to the unit for x″1, x″2, x″3. Following Nishisato (1980), the unit is chosen in such a way that the sum of squares of the weighted responses is equal to the number of responses. In this case, the constant multiplier for adjusting the unit of y″1, y″2, y″3 is

$$c_r = \sqrt{\frac{29}{10y_1^2 + 10y_2^2 + 9y_3^2}} = 1.2325$$

and the constant multiplier for adjusting the unit of x″1, x″2, x″3 is

$$c_c = \sqrt{\frac{29}{10x_1^2 + 11x_2^2 + 8x_3^2}} = 1.4718$$

The final weights are obtained by multiplying y″1, y″2, y″3 by c_r, and x″1, x″2, x″3 by c_c. These weights are called standard coordinates. The standard coordinates multiplied by the singular value, that is, ρy_i and ρx_j, are called principal coordinates. The distinction between these two types of weights is extremely important, particularly from the graphical point of view, and this matter will be discussed later. The final results of our analysis are shown in Table 16.3. Of these, the principal coordinates are the ones to be used to plot the data, while the standard coordinates do not carry data information. We should note that we have created new quantitative variates from categorical variables under the conditions mentioned at the beginning of this chapter.


Table 16.3 Two types of coordinates

Teacher   Standard y coordinate   Principal y coordinate
White          −1.2320                 −0.7478
Green           0.1228                  0.0745
Brown           1.2325                  0.7481

Rating    Standard x coordinate   Principal x coordinate
Good            1.0760                  0.6531
Average         0.0920                  0.0559
Poor           −1.4718                 −0.8933
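The unit-adjustment step (10) can be checked numerically from the converged weights in Table 16.2; the short sketch below reproduces the standard and principal coordinates of Table 16.3.

```python
import numpy as np

# Converged weights from Table 16.2 (largest absolute value = 1).
y = np.array([-0.9996, 0.0996, 1.0000])       # White, Green, Brown
x = np.array([0.7311, 0.0625, -1.0000])       # Good, Average, Poor
fr = np.array([10.0, 10.0, 9.0])              # row totals
fc = np.array([10.0, 11.0, 8.0])              # column totals
rho = 0.6070

cr = np.sqrt(fr.sum() / (fr @ y**2))          # 1.2325
cc = np.sqrt(fc.sum() / (fc @ x**2))          # 1.4718

print("standard  y:", np.round(cr * y, 4))         # -1.2320  0.1228  1.2325
print("principal y:", np.round(rho * cr * y, 4))   # -0.7478  0.0745  0.7481
print("standard  x:", np.round(cc * x, 4))         #  1.0760  0.0920 -1.4718
print("principal x:", np.round(rho * cc * x, 4))   #  0.6531  0.0559 -0.8933
```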

As mentioned earlier, the main interest of most researchers was originally to find a single scoring procedure for a data set. Historically, however, researchers' interest gradually extended to multidimensional data analysis, and our MRA can easily be extended to the task of constructing multidimensional scores.

16.2 Extracting More Components

Once we obtain the first component, say (y1, x1, ρ1), we calculate the residual frequencies and subject the matrix of residual frequencies to MRA to obtain the second component (y2, x2, ρ2). If the first and the second components do not reproduce the input contingency table, we subject the next residual matrix to MRA to extract the next component, and so on. This is the multidimensional extension of MRA. For the calculation of successive residual matrices, however, we need to know the structure of the data. The elements f_ij of the contingency table F can be expressed as

$$f_{ij} = \frac{f_{i.} f_{.j}}{f_t}\left(1 + \rho_1 y_{i1} x_{1j} + \rho_2 y_{i2} x_{2j} + \cdots + \rho_K y_{iK} x_{Kj}\right) \qquad (16.1)$$

where K is the total number of possible components from the contingency table, namely K = min(m, n) − 1, and f_{i.} f_{.j} / f_t is the frequency expected when row i and column j are statistically independent; we will ignore this so-called trivial component. In the current example, the following is the matrix of the trivial component:

$$\left(\frac{f_{i.} f_{.j}}{f_t}\right) = \begin{bmatrix} 3.45 & 3.79 & 2.76 \\ 3.45 & 3.79 & 2.76 \\ 3.10 & 3.41 & 2.48 \end{bmatrix}$$

Thus, when we eliminate this trivial contribution from the original data, we obtain the following residual matrix:


$$\begin{bmatrix} -2.45 & -0.79 & 3.24 \\ -0.45 & 1.21 & -0.76 \\ 2.90 & -0.41 & -2.48 \end{bmatrix} = \begin{bmatrix} 1 & 3 & 6 \\ 3 & 5 & 2 \\ 6 & 3 & 0 \end{bmatrix} - \begin{bmatrix} 3.45 & 3.79 & 2.76 \\ 3.45 & 3.79 & 2.76 \\ 3.10 & 3.41 & 2.48 \end{bmatrix}$$

This matrix is subjected to MRA to obtain the first component.

• (Technical note): In our example, the original contingency table was subjected to MRA to extract the first component, yet we have just said that the above table, obtained by subtracting the contribution of the trivial component from the original data, should be the one subjected to MRA; the original data and these residual data are totally different. In principle, MRA should be applied to this residual matrix, but we used the trick of choosing initial weights (scores) which are centered (i.e., they sum to zero). By using centered initial weights, we prevent the reciprocal averaging series from converging to the trivial component.

The current example yields two components, and the above residual matrix (the data minus the trivial component) can be decomposed as the sum of the contributions of the first two components as follows:

$$\begin{bmatrix} -2.45 & -0.79 & 3.24 \\ -0.45 & 1.21 & -0.76 \\ 2.90 & -0.41 & -2.48 \end{bmatrix} = \begin{bmatrix} -2.78 & -0.26 & 3.03 \\ 0.27 & 0.03 & -0.30 \\ 2.50 & 0.24 & -2.73 \end{bmatrix} + \begin{bmatrix} 0.33 & -0.53 & 0.21 \\ -0.72 & 1.18 & -0.46 \\ 0.40 & -0.65 & 0.25 \end{bmatrix}$$

This may be too technical, but what we have shown here is the following decomposition of the data:

$$f_{ij} = \frac{f_{i.} f_{.j}}{f_t} + \frac{f_{i.} f_{.j}\,\rho_1 y_{i1} x_{1j}}{f_t} + \frac{f_{i.} f_{.j}\,\rho_2 y_{i2} x_{2j}}{f_t} \qquad (16.2)$$

Numerically this decomposition is as follows:

$$\begin{bmatrix} 1 & 3 & 6 \\ 3 & 5 & 2 \\ 6 & 3 & 0 \end{bmatrix} = \begin{bmatrix} 3.45 & 3.79 & 2.76 \\ 3.45 & 3.79 & 2.76 \\ 3.10 & 3.41 & 2.48 \end{bmatrix} + \begin{bmatrix} -2.78 & -0.26 & 3.03 \\ 0.27 & 0.03 & -0.30 \\ 2.50 & 0.24 & -2.73 \end{bmatrix} + \begin{bmatrix} 0.33 & -0.53 & 0.21 \\ -0.72 & 1.18 & -0.46 \\ 0.40 & -0.65 & 0.25 \end{bmatrix}$$

We will skip the steps needed to convert these numbers so as to satisfy the conditions on the numerals. As for the removal of the contributions of the trivial component, we can rewrite the original expansion formula as follows:

$$f_{ij}\,\frac{f_t}{f_{i.} f_{.j}} - 1 = \rho_1 y_{i1} x_{1j} + \rho_2 y_{i2} x_{2j} + \cdots + \rho_K y_{iK} x_{Kj} \qquad (16.3)$$

The current example contains two components. Rather than following the computations, we list the final quantification results in Tables 16.4 and 16.5.

The current example contains two components. Rather than following the computations, we will list the final quantification results as in Tables 16.4 and 16.5.

Table 16.4 Two-dimensional coordinates

Teacher   Principal 1 coordinate   Principal 2 coordinate
White          −0.7478                 −0.1099
Green           0.0745                  0.2440
Brown           0.7481                 −0.1490

Rating    Principal 1 coordinate   Principal 2 coordinate
Good            0.6531                 −0.1531
Average         0.0559                  0.2268
Poor           −0.8933                 −0.1204

Table 16.5 Item statistics

Component              1         2
Eigenvalue (ρ²)        0.368     0.0316
Singular value (ρ)     0.6070    0.1777
Delta (%)              92.10     7.90
Cumulative delta (%)   92.10     100.00
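The entire decomposition can be verified with a single singular value decomposition. The sketch below reproduces the eigenvalues and delta values of Table 16.5 and the principal coordinates of Table 16.4 (up to a possible joint sign reversal of each component), and rebuilds the data exactly through the bilinear expansion (16.1).

```python
import numpy as np

F = np.array([[1, 3, 6],
              [3, 5, 2],
              [6, 3, 0]], dtype=float)
ft = F.sum()
r, c = F.sum(1) / ft, F.sum(0) / ft

# Correspondence-analysis SVD of the standardized residuals.
S = (F / ft - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S)

print("eigenvalues:", np.round(sv[:2] ** 2, 4))              # 0.3684 0.0316
print("delta (%):  ", np.round(100 * sv[:2]**2 / np.sum(sv[:2]**2), 2))

# Principal coordinates (Table 16.4); each column's sign may flip.
Y = U[:, :2] * sv[:2] / np.sqrt(r)[:, None]                  # teachers
X = Vt.T[:, :2] * sv[:2] / np.sqrt(c)[:, None]               # ratings
print("teachers:\n", np.round(Y, 4))
print("ratings:\n", np.round(X, 4))

# Bilinear expansion (16.1): the table is rebuilt exactly from the
# trivial part plus the two components (standard coordinates).
Ys, Xs = Y / sv[:2], X / sv[:2]
recon = ft * np.outer(r, c) * (
    1 + sum(sv[k] * np.outer(Ys[:, k], Xs[:, k]) for k in range(2)))
print("reconstructed F:\n", np.round(recon, 2))              # Table 16.1
```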

What we can say about these results is that the data set has one dominant component, accounting for 92% of the total information. Therefore, the data can be interpreted using only the first component.

References

Baker, F. B. (1960). UNIVAC scientific computer program for scaling psychological inventories by the method of reciprocal averages CPA 22. Behavioral Science, 5, 268–269.
Hill, M. O. (1973). Reciprocal averaging: An eigenvector method of ordination. Journal of Ecology, 61, 237–249.
Horst, P. (1935). Measuring complex attitudes. Journal of Social Psychology, 6, 369–374.
Mosier, C. I. (1946). Machine methods in scaling by reciprocal averages. In Proceedings, Research Forum (pp. 35–39). Endicott, NY: International Business Machines Corporation.
Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. Toronto: University of Toronto Press.
Richardson, M., & Kuder, G. F. (1933). Making a rating scale that measures. Personnel Journal, 12, 36–40.