Jan-Karl Knigge
Virtual Reality in Manual Order Picking: Using Head-Mounted Devices for Planning and Training
Jan-Karl Knigge, Darmstadt, Germany. Doctoral thesis, Technische Universität Darmstadt, Darmstadt.
ISBN 978-3-658-34703-1
ISBN 978-3-658-34704-8 (eBook)
https://doi.org/10.1007/978-3-658-34704-8

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2021

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Responsible Editor: Marija Kojic

This Springer Gabler imprint is published by the registered company Springer Fachmedien Wiesbaden GmbH, part of Springer Nature. The registered company address is: Abraham-Lincoln-Str. 46, 65189 Wiesbaden, Germany.
What is real? How do you define real? If you’re talking about what you can feel, what you can smell, what you can taste and see, then real is simply electrical signals interpreted by your brain.
Morpheus—The Matrix (1999)
Abstract
Order picking is a process which, to a great extent, is still performed manually. However, semi-automated parts-to-picker systems are becoming more popular in order picking, making efficient rack designs for fast item searching and picking by human pickers more important. Virtual reality (VR) using a contemporary head-mounted device (HMD) is a promising technology for the planning of order picking racks. In addition, HMDs can be used to train human employees in manual order picking tasks. The introduction of consumer-level head-mounted devices has led to a major drop in the application costs of VR, making the technology available to a wide range of users. However, modern HMDs still come with several limitations, mainly in terms of missing haptic feedback and inadequate possibilities for physical interaction with virtual contents. It therefore remains unclear whether these limitations lead to different results in terms of human performance in VR compared to a real environment when simulating manual order picking. As such differences can significantly influence the outcomes of a planning and training process, a research gap has been identified with regard to the usability of VR HMDs in this context.

To contribute towards closing this research gap, the thesis first provides a systematic literature review. The review assesses previous studies that compare general manual activities in VR and in a real environment in order to evaluate which activities of manual order picking can be simulated using a VR HMD. The results show that large-scale studies are still relatively scarce. Moreover, it appears relevant to further investigate especially item picking and searching at single racks using VR HMDs.

To gain a deeper insight into the usability of VR HMDs for manual order picking, an experimental setup consisting of a real picking rack and an equally sized virtual model was developed. A large-scale randomized controlled study was conducted with 112 participants, who performed a picking task either in the real or the virtual setup. During these experiments, order completion times, picking times, and searching times, as well as the individually perceived workload, the number of erroneous orders, and the number of orders with dropped items were measured as dependent variables in the two environments (independent variable). The results reveal that order completion times and picking times are significantly longer in VR. However, no significant difference was found in either the perceived workload or the searching times. Subsequently, the picking and searching times from the experimental study were used to estimate learning curves for each participant. The analysis shows that training in VR is effective and at least as efficient as similar training in the real environment.

The results thus imply that VR HMDs can indeed be used by manufacturers and warehouse operators in a rack planning process if the reduction of searching times or the perceived workload is in focus. Additionally, the findings enable the use of VR HMDs for scientific research on human-centred rack design. Finally, this thesis highlights the usability of VR HMDs for training manual order picking tasks.
Zusammenfassung
Order picking is a process that is still largely performed manually. However, semi-automated parts-to-picker systems are increasingly finding their way into order picking. For this reason, rack layouts that enable efficient searching for and picking of items by human order pickers are becoming more and more important. So-called head-mounted devices (HMDs) for displaying a virtual reality (VR) represent a promising technology that can be used in the future for the planning of order picking racks. HMDs also offer the possibility of training employees in manual order picking tasks. In recent years, the introduction of modern HMDs has led to a considerable reduction in the costs and requirements of using VR, making the technology available to a wide range of users. However, modern HMDs have several limitations, above all with regard to missing haptic feedback and inadequate possibilities for physical interaction with virtual objects. It remains unclear whether these limitations lead to different results in VR compared to a real environment when manual activities are investigated. Such differences, however, can considerably influence the results of a planning and training process. A research gap has therefore been identified concerning the usability of VR HMDs for planning and training in the context of manual order picking.

To contribute to closing this research gap and to investigate which activities in manual order picking can be simulated with an HMD, a systematic literature review was first conducted. Research papers in which manual activities in general were compared between a VR and a real environment were collected and analysed. The results show that relatively few large-scale studies exist in this field so far. They also show that especially the searching for and picking of items at a single rack is well suited for an investigation in VR using HMDs. To gain a deeper insight into the applicability of VR HMDs in the context of manual order picking, an experimental setup consisting of a real order picking rack and a virtual model was developed next. Subsequently, a randomized controlled study was conducted with 112 participants, who were asked to perform an order picking task either in the real or in the virtual setup. During the experiments, the total times as well as the picking and searching times were measured. In addition, the individually perceived workload, the number of erroneous orders, and the number of orders with dropped items were recorded as dependent variables in the two environments (independent variable). The results show that the total times and the picking times are significantly longer in VR than in the real environment. However, no significant difference was found in either the perceived workload or the searching times.

The individual picking and searching times from the experimental study were furthermore used to estimate learning curves for each participant. The subsequent analysis of the learning curves shows that training in VR is effective and at least as efficient as comparable training in the real environment. The results consequently imply that HMDs can indeed be used in practice by intralogistics manufacturers and warehouse operators in a rack planning process if short searching times and a low workload are the focus of the planning. Moreover, the results enable the use of VR HMDs for scientific research on employee-centred rack planning. Finally, this thesis substantiates the applicability of VR HMDs in the context of training manual order picking processes.
Contents
1 Introduction
  1.1 Motivation and Research Gap
  1.2 Aim of the Research, Research Questions and Procedure
  1.3 Thesis Outline
  1.4 The Thesis in the Context of the Philosophy of Science

2 Theoretical Background: The use of Virtual Reality Head-Mounted Devices for Planning and Training in the Context of Manual Order Picking
  2.1 Virtual Reality Head-Mounted Devices as a Tool for Planning and Training
    2.1.1 Fundamentals of Virtual Reality Technology
    2.1.2 Limitations of Contemporary Virtual Reality Head-Mounted Devices
    2.1.3 State of Research on using Virtual Reality in a Planning Context
    2.1.4 State of Research on Learning and Training in Virtual Realities
  2.2 Human-Centred Planning and Training in the Context of Manual Order Picking
    2.2.1 Fundamentals of Order Picking and Order Picking Technology
    2.2.2 Fundamentals of the Planning and Design of Order Picking Systems
    2.2.3 State of Research on the Planning of Order Picking Systems
    2.2.4 State of Research on Learning and Training in Manual Order Picking
  2.3 State of Research on the use of Virtual Reality in Manual Order Picking and Specification of the Research Gap

3 Systematic Literature Review of Previous Studies that use Virtual Reality Head-Mounted Devices for Simulating Manual Activities
  3.1 Tertiary Analysis of Previously Published Literature Reviews
  3.2 Framework for the Content Analysis of the Literature Sample
    3.2.1 Defining Manual Activities in Order Picking
    3.2.2 Development of a Framework for the Content Analysis
  3.3 Methodological Approach: Searching and Sampling the Literature
    3.3.1 Keywords and Database Search
    3.3.2 Inclusion and Exclusion Criteria and Sample Generation
    3.3.3 Discussion of the Sample Generation Process
  3.4 Analysis of the Literature Samples
    3.4.1 Quantitative Analyses
    3.4.2 Application of the Content Analysis Framework
  3.5 Conclusion of the Literature Review

4 Experimental Design for Evaluating the Usability of Virtual Reality for Planning and Training in the Context of Manual Order Picking and Execution of the Study
  4.1 Specification of the Research Design
    4.1.1 Criteria for Quality in Research Designs
    4.1.2 Laboratory Experiments as the Research Design to Investigate Manual Order Picking in Virtual Realities
  4.2 Implementation of the Research Design
    4.2.1 Overview of the Design Process Together with Logistics Managers and Engineers
    4.2.2 Experimental Groups and Randomization
    4.2.3 Experimental Procedure and Treatments
    4.2.4 Laboratory Setup and Apparatus
  4.3 Operationalisation of the Research Questions
    4.3.1 Selection of the Dependent Variables
    4.3.2 Measurement of the Dependent Variables
    4.3.3 Questionnaire Design
  4.4 Execution of the Research Study
    4.4.1 Sampling Process, Time of the Experiments and Sample Description
    4.4.2 Validation and Verification of the Experimental Setup Using the Questionnaire Results
    4.4.3 Data Preparation

5 Results of the Comparison Between Virtual and Real Order Picking
  5.1 Research Hypotheses
  5.2 Procedure and Methods for the Inferential Statistics Analyses
  5.3 Results of the Hypotheses Testing using Inferential Statistics
    5.3.1 Perceived Workload
    5.3.2 Set Completion Times
    5.3.3 Picking Times
    5.3.4 Searching Times
    5.3.5 Number of Erroneous Orders
    5.3.6 Number of Orders with Dropped Items
    5.3.7 Summary of Results
  5.4 Discussion of the Results
    5.4.1 Validation and Discussion of the Experimental Setup
    5.4.2 Comparison of Human Performance in Virtual and Real Order Picking

6 Analysis of Learning Curves in Virtual and Real Order Picking
  6.1 Occurrence of Learning Effects in General and Selection of Dependent Variables for the Analysis of Learning Curves
  6.2 Curve Fitting
    6.2.1 Learning Curve Models
    6.2.2 Results of the Fitted Learning Curve Models
    6.2.3 Discussion of the Fitted Learning Curves for Picking Times per Item
    6.2.4 Discussion of the Fitted Learning Curves for Searching Times
  6.3 Evaluating the Quality of Fit of the Learning Curve Models
    6.3.1 Quality Measures for the Comparison of the Learning Curve Models
    6.3.2 Results for Picking Times per Item
    6.3.3 Results for Searching Times
    6.3.4 Discussion
  6.4 Comparing Learning Curves between Virtual and Real Order Picking
    6.4.1 Research Hypotheses
    6.4.2 Results for Picking Times per Item
    6.4.3 Results for Searching Times
    6.4.4 Discussion
  6.5 Using Learning Curves for Predicting Human Performance in the Real Environment
  6.6 Using Learning Curves to Estimate the Number of Orders Necessary for Familiarization in Virtual Reality
7 Conclusion
  7.1 Summary of Results and Answer to the Research Questions
  7.2 Implications for Research and Practice
  7.3 Limitations and Outlook on Future Research

Bibliography
Abbreviations
3PH       Three-parameter hyperbolic learning curve
ANOVA     Analysis of variance
AR        Augmented reality
CAVE      Cave automatic virtual environment
df        Degrees of freedom
DJLC      De Jong learning curve
DPLC      Dual phase learning curve
HMD       Head-mounted device or head-mounted display
JGLC      Jaber-Glock learning curve
LR        Learning rate
NA        Not applicable or not available
NASA-TLX  NASA Task Load Index
RQ        Research question
RR        Group RR (real rack)
RSE       Residual standard error
RSS       Residual sum of squares
SCM       S-curve model
SD        Standard deviation
SER       Standard error of the regression
STB       Stanford-B learning curve
VR        Virtual reality or Group VR (virtual rack)
WLC       Wright learning curve
List of Figures
Figure 1.1  Research gap in the context of using VR HMDs for the planning and training of manual order picking
Figure 1.2  Research questions defining the aim of this thesis
Figure 1.3  Integration of the research questions into the general outline of the thesis
Figure 2.1  Extended illustration of the research gap
Figure 3.1  Human activities in manual order picking relevant in the context of this thesis
Figure 3.2  Framework developed for the content analysis of the literature
Figure 3.3  Process of generating the literature samples including the number of papers excluded in each step
Figure 3.4  Number of articles published in each year in each sample
Figure 3.5  Primary research field of the articles in each sample
Figure 3.6  HMD systems used by the articles in samples B and C
Figure 3.7  Total number of participants in experimental studies in samples B and C
Figure 3.8  Number of articles from sample C assigned to each manifestation of each item of the content analysis framework, separated into articles with no reference to training (n=39) and articles with focus on training (n=22)
Figure 4.1  Classification framework used to develop the research design
Figure 4.2  Experimental procedure consisting of nine steps
Figure 4.3  Levels and positions of the order picking rack with corresponding dimensions
Figure 4.4  Levels and bin positions of the order picking cart containing the four order bins with corresponding dimensions
Figure 4.5  The laboratory setup with the order picking rack and picking cart in the real (a) and the virtual environment (b)
Figure 4.6  A participant using the HTC Vive for picking in VR during the experiment
Figure 4.7  Handheld controller of the HTC Vive system with the button used for picking items in VR
Figure 4.8  Top-down view of the experimental setup in the real environment. In VR, only one rack and cart has been simulated
Figure 4.9  The altered setup in the real (a) and the virtual environment (b)
Figure 4.10 Alignment of the infrared sensors and reflectors in the real setup (a) and trigger fields in the virtual setup (b)
Figure 4.11 Setup for writing time stamps of the infrared sensors and orders to a common database
Figure 4.12 Age distribution of participants in different time periods of the experimental study
Figure 4.13 Participants’ responses to item 1.3 in questionnaire 1: Prior experience with VR
Figure 4.14 Participants’ responses to item 3.1 in questionnaire 3: General understanding of the task
Figure 4.15 Participants’ responses to item 3.2 in questionnaire 3: Clarity of the pick-by-voice commands
Figure 4.16 Participants’ responses to item 3.3 in questionnaire 3: Visibility of items in the real rack
Figure 4.17 Participants’ responses to item 3.4 in questionnaire 3: Visibility of items in the virtual rack
Figure 4.18 Set completion times in sets 1, 2, 3 and 4 of all participants in group VR (a) and group RR (b) in all time periods of the experiments
Figure 5.1  Methods and steps used to test the hypotheses using inferential statistics
Figure 5.2  Box plots of the weighted NASA-TLX scores for group VR and group RR (a), for participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b), and group VR and the professional order pickers (c)
Figure 5.3  Box plots of the set completion times (s) for group VR and group RR (a), for participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b), and group VR and the professional order pickers (c)
Figure 5.4  Box plots of the picking times (s) for group VR and group RR (a) and for participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b)
Figure 5.5  Box plots of the searching times (s) for group VR and group RR (a) and for participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b)
Figure 5.6  Box plots of the number of erroneous orders for group VR and group RR (a), for participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b), and group VR and the professional order pickers (c)
Figure 5.7  Box plots of the number of orders with dropped items for group VR and group RR (a), for participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b), and group VR and the professional order pickers (c)
Figure 6.1  Mean order completion times per number of items in the order
Figure 6.2  Mean picking times per number of items in the order
Figure 6.3  Mean searching times per number of items in the order
Figure 6.4  Estimated learning curves for picking times per item (s) in sets 1–4 of participants in group VR
Figure 6.5  Estimated learning curves for picking times per item (s) in sets 5–8 of participants in group VR
Figure 6.6  Estimated learning curves for picking times per item (s) in sets 1–4 of participants in group RR
Figure 6.7  Estimated learning curves for picking times per item (s) in sets 5–8 of participants in group RR
Figure 6.8  Estimated learning curves for searching times (s) in sets 1–4 of participants in group VR
Figure 6.9  Estimated learning curves for searching times (s) in sets 5–8 of participants in group VR
Figure 6.10 Estimated learning curves for searching times (s) in sets 1–4 of participants in group RR
Figure 6.11 Estimated learning curves for searching times (s) in sets 5–8 of participants in group RR
Figure 7.1  Summary of the key findings of this thesis
List of Tables
Table 3.1  Inclusion and exclusion criteria used to filter the set of articles and produce the final sample
Table 3.2  Overview of the quantitative and content analyses performed on the distinct literature samples
Table 3.3  Pairwise analysis of the manifestations assigned to the articles with no reference to training
Table 3.4  Pairwise analysis of the manifestations assigned to the articles focusing on training
Table 3.5  Coefficient estimates and p-values of the logit regression models
Table 4.1  Overview of both workshops to develop the experimental design, including participants and their company and position
Table 4.2  Dependent variables and calculation methods
Table 4.3  Overview of the items in the questionnaires and the step at which the questionnaire had to be filled in
Table 4.4  Courses in which the experimental study was advertised at TU Darmstadt for the recruitment of participants
Table 4.5  Number of usable datasets with sensor data after removing datasets due to permanently triggering sensors and operating errors
Table 5.1  Statistical null and alternative hypotheses as well as R functions for each test method
Table 5.2  Test results for comparing the weighted NASA-TLX scores between group VR and group RR (a), between participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b), and between group VR and the professional order pickers (c)
Table 5.3  Test results for comparing the set completion times (s) between group VR and group RR (a), between participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b), and between group VR and the professional order pickers (c)
Table 5.4  Test results for comparing the picking times (s) between group VR and group RR (a) and between participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b)
Table 5.5  Test results for comparing the searching times (s) between group VR and group RR (a) and between participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b)
Table 5.6  Test results for comparing the number of erroneous orders between group VR and group RR (a), between participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b), and between group VR and the professional order pickers (c)
Table 5.7  Test results for comparing the number of orders with dropped items between group VR and group RR (a), between participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b), and between group VR and the professional order pickers (c)
Table 5.8  Summary of the test results for each hypothesis, i.e. if the hypothesis has been rejected in individual blocks or sets
Table 6.1  Results of a one-sided Wilcoxon signed-rank test to test the hypothesis that time measures t are greater or equal in each set compared to the previous set
Table 6.2  Notation for variables and parameters used throughout this chapter
Table 6.3  Learning curve models used throughout this chapter
Table 6.4  Estimated model parameters of the WLC, the STB, the DJLC, and the SCM for picking times per item (s)
Table 6.5  Estimated model parameters of the DPLC and the JGLC for picking times per item (s)
Table 6.6  Estimated model parameters of the 3PH for picking times per item (s)
Table 6.7  Number of participants with b_i = 0 for estimating picking times per item
Table 6.8  Estimated model parameters of the WLC, the STB, the DJLC, and the SCM for searching times (s)
Table 6.9  Estimated model parameters of the DPLC and the JGLC for searching times (s)
Table 6.10 Estimated model parameters of the 3PH for searching times (s)
Table 6.11 Number of participants with b_i = 0 for estimating searching times
Table 6.12 Descriptive statistics for SER_i^L for the different learning curve models estimating picking times per item
Table 6.13 Number of ranking positions for each learning curve estimating picking times per item
Table 6.14 Mean rank of each learning curve (η̄_L) for estimating picking times per item
Table 6.15 Descriptive statistics for balance_i^L for the different learning curve models estimating picking times per item
Table 6.16 Descriptive statistics for SER_i^L for the different learning curve models estimating searching times
Table 6.17 Number of ranking positions for each learning curve estimating searching times
Table 6.18 Mean rank of each learning curve (η̄_L) for estimating searching times
Table 6.19 Descriptive statistics for balance_i^L for the different learning curve models estimating searching times
Table 6.20 Results of a Shapiro-Wilk test for normal distribution of the learning curve parameters for estimating picking times per item
Table 6.21 Results of a Kolmogorov-Smirnov and a Mann-Whitney U test comparing learning curve parameters for estimating picking times per item between group VR and group RR
Table 6.22 Results of a Shapiro-Wilk test for normal distribution of the learning curve parameters for estimating searching times
Table 6.23 Results of a Kolmogorov-Smirnov and a Mann-Whitney U test comparing learning curve parameters for estimating searching times between group VR and group RR
Table 6.24 Summary of the test results for each hypothesis, i.e. if the hypothesis has been rejected
Table 6.25 Results of the linear regression
Table 6.26 Different p-quantiles of the estimated number of orders x_i^fam per participant in group VR to reach the mean picking time per item for the first order of group RR in sets 5–8 (t̄_1,RR)
1 Introduction
1.1 Motivation and Research Gap
In the 1999 movie “The Matrix” (Wachowski and Wachowski 1999), it is alleged that reality as we know it is just a vast computer simulation—a virtual reality (VR). During the movie, the protagonists learn to use the VR to their advantage, making almost anything possible for them: Suddenly, they are able to defy the laws of physics, they can fly, move at tremendous speeds or create entire buildings and structures out of nowhere within the VR.

In the non-fictional world, having a highly immersive simulation that gives full control of the environment would be a dream come true for researchers and practitioners from many different fields. Especially for the purpose of planning real-world structures, processes or human activities, such a VR simulation would offer manifold possibilities: For example, it would allow risk-free testing and evaluation of the planning results under realistic conditions before actually implementing them in the real world.

Due to their ability to systematically alter and control certain variables, computer simulations are already widely used for planning tasks (Longo 2011, p. 652). In this context, simulations enable the testing of different conditions without the need for constructing or altering an existing setup (Coburn et al. 2017, p. 8; Akpan and Shanker 2017; Reif and Walch 2008, p. 988). This makes it possible to find and eliminate planning errors at an early stage that would be costly or impossible to correct later (Winkes and Aurich 2015, p. 152; Jayaram et al. 2001, pp. 76, 78; Maria 1997, p. 7).

VR technology supplements the range of available simulation methods with the option of integrating one or more real human users into the simulated environments. It allows users not only to see the simulated environment visually, but to perceive it via multiple senses and actively interact with it (Aurich et al. 2009, p. 5299). Thus, users are “immersed” in VR. The integration of real human users into the simulation improves model validation and verification (Akpan and Shanker 2017, p. 206) and enables new ways of experimentation and analysis that include human behaviour (Mol 2019), human information processing (Berg and Vance 2017, p. 2; Akpan and Shanker 2017, p. 209), emotions (Berni and Borgianni 2020), collaboration (Jayaram et al. 2001, p. 82), and ergonomics (Peron et al. 2020, p. 14; Berg and Vance 2017, p. 7; Wilson 1997). It is, therefore, especially promising as a means to support the future planning of processes that involve human interaction (Peron et al. 2020, p. 14; Jayaram et al. 2001, pp. 76, 78; Westkämper and Briel 2001, p. 351).

Aside from the planning of processes, VR technology also appears auspicious for training manual activities. According to Ganier et al. (2014, pp. 831–832), training in VR is especially advantageous if the conditions of the real task are uncommon, complex or dangerous and therefore hard to replicate in a real environment. Furthermore, VR training permits real-time adjustment of the setting and can provide artificial scenarios that do not (yet) exist in a real environment. It also offers trainees the possibility to make mistakes without real-world consequences, it works over large distances, and it avoids system downtimes during training sessions. Training in VR can therefore significantly reduce training costs (Ganier et al. 2014, pp. 831–832; Sowndararajan et al. 2008). Note that the field of human training is also closely connected to the planning of manual activities, as the occurrence of learning effects should be considered during planning (Neumann and Village 2012, p. 1147).

Even though the VR technology available in the non-fictional world is still far away from the possibilities of the VR depicted in the movie The Matrix, it has already come a long way since the term “virtual reality” was first mentioned in the 1960s by Sutherland (1965). While early VR systems were expensive and difficult to operate, the recent introduction of high-performance head-mounted devices (HMD)¹ for the entertainment and gaming industry has led to a significant drop in application costs and requirements (Sun et al. 2020, p. 55; El Beheiry et al. 2019, p. 1315; Coburn et al. 2017, p. 1; Avila and Bailey 2014). This has opened up manifold new use cases for the technology, including many industrial applications (Berg and Vance 2017). Hence, VR hardware sales are increasing rapidly (Manis and Choi 2019, p. 503; Grudzewski et al. 2018, p. 73). Berg and Vance (2017, p. 1) even claim that “Virtual reality […] works! It is mature, stable, and, most importantly, usable.”

¹ In the literature, the term head-mounted display is more commonly used than the term head-mounted device (see e.g., Coburn et al. 2017; Berg and Vance 2017). However, the term head-mounted device, which is for example used by DiZio and Lackner (1992), Klippel et al. (2019), and Sakhare et al. (2019), covers more than just the visual display. Hence, it is considered more suitable for describing the complex technology and is therefore used in this thesis.
One field of application in which VR can potentially develop into a useful tool for planning and training is manual order picking (Reif and Walch 2008, p. 987). Even though automation technology is increasingly finding its way into warehousing, order picking is still to a large extent performed manually (Gils et al. 2018, p. 1; Marchet et al. 2015; Napolitano 2012, p. 56; Koster et al. 2007, p. 481). The human-centred design of warehouses and the integration of human factors in the planning of manual order picking systems are therefore considered important fields of research in order to increase warehouse efficiency (Grosse et al. 2015a; Grosse and Glock 2015, pp. 882–883; Grosse et al. 2015c; Chen et al. 2013, p. 77).

A large number of the scientific studies that are currently available in the context of warehousing focus on the reduction of picker travel time (Gils et al. 2018; Koster et al. 2007). However, semi-automated parts-to-picker systems that reduce picker travel time to a minimum by automatically delivering item bins or single racks to stationary pickers are becoming more popular in warehouses (Boysen et al. 2017, p. 550; D’Andrea 2012, p. 638). It is therefore likely that the reduction of searching and picking times of human pickers at single racks will increasingly gain attention in the future planning and training of manual order picking (Calzavara et al. 2019, p. 203; Gils et al. 2018, p. 11). The feature of VR allowing the integration of real human users into simulations thus appears ideal for use in research on manual order picking.

However, it must not be forgotten that contemporary HMDs still come with several limitations. For example, the integration of haptics and force feedback into VR is anything but trivial (Berg and Vance 2017, p. 12; Coburn et al. 2017, pp. 6–7; Vaughan et al. 2016, pp. 71–73; Lawson et al. 2016, p. 326). Also, users tend to underestimate distances in VR (Lin and Woldegiorgis 2015), even when using a contemporary HMD (Kelly et al. 2017). Due to these limitations, current VR technology is only able to mimic physical reality up to a certain degree of equivalence. A manual task performed in VR could therefore significantly differ from an equivalent task in the real world (Pontonnier et al. 2014, p. 200). Moreover, if systematic differences between order picking in VR and in a real environment exist and are not taken into account, VR planning and training could produce results that are invalid for real-world applications. This is especially relevant in the context of planning, as a planning process that relies on wrong assumptions about human performance leads to inaccurate or flawed results. This in turn will make the entire planning outcome invalid and can cause extensive additional costs (Kern and Refflinghaus 2013, pp. 847–848).
Surprisingly, potential differences in human performance between VR and an equivalent real environment are often ignored (Eastgate et al. 2015, p. 357). The existing research that compares human tasks in a virtual and a real environment is still relatively scarce. Among 150 research articles that use VR in the built environment, Kim et al. (2017) have found only 19 studies that provide a comparative evaluation of the usability of VR, none of them using a contemporary HMD. In the context of manual order picking, only two studies, by Reif and Walch (2008) and Wulz (2008), can be found that evaluate the use of the technology in this field. Again, neither study uses a contemporary HMD for displaying the VR. Considering the usability of contemporary VR HMDs for planning purposes in general and the planning of manual parts-to-picker order picking systems in particular, a major research gap can thus be identified.

In contrast, the amount of available research that compares learning effects in a virtual and a real environment is considerably larger. For example, for the field of surgery training alone, Kim et al. (2017) find a total of 11 publications using different VR technologies. For the training of social skills, Howard and Gutworth (2020) identify 23 relevant research articles using VR. Thus, it can be assumed that learning effects generally exist and can be observed in VR. For the specific application in the training of manual order picking, however, no research is available so far that evaluates the use of contemporary VR HMDs. As the aforementioned limitations of HMDs are also likely to affect the transfer of learning effects between a VR and a real application, additional research is advisable for the specific case of manual order picking.

In the available literature, mathematical learning curve models are a well-established method for quantifying learning effects in production research in general (Grosse et al. 2015b, p. 401) and manual order picking in particular (see e.g., Glock et al. 2019b; Grosse and Glock 2013; Grosse et al. 2013). However, to the author’s knowledge, no research can be found that applies learning curve models to similar activities performed in VR using an HMD; for illustration, the standard form of such a model is sketched at the end of this section.

In summary, the research gap underlying this thesis is depicted in Figure 1.1. As can be seen in the figure, the research gap results from the interaction of the two disciplines virtual reality and manual order picking. Although contemporary VR HMDs appear well suited as a tool for planning and training in the context of manual order picking, it remains unclear how the transfer of planning and training results from a VR to a real environment is affected by the limitations of the technology. The general usability of VR HMDs for planning and training in the context of manual order picking therefore remains questionable.
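As background for the learning curve models referred to above, the following block sketches the standard Wright learning curve (WLC, cf. the list of abbreviations). It is the classic model from the general learning curve literature and is reproduced here for illustration only, not taken from a specific passage of this thesis:

    t_n = t_1 · n^(−b),        LR = 2^(−b)

Here, t_n denotes the time needed for the n-th repetition of a task, t_1 the time needed for the first repetition, and b ≥ 0 the learning exponent. LR is the learning rate, i.e. the fraction of the time per repetition that remains whenever the cumulative number of repetitions doubles: an LR of 0.8, for example, means that the time per repetition drops by 20% with every doubling.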
Figure 1.1 Research gap in the context of using VR HMDs for the planning and training of manual order picking
1.2 Aim of the Research, Research Questions and Procedure
The aim of this thesis is to contribute to closing the research gap described above by systematically investigating the usability of VR HMDs for planning and training in the context of manual order picking. In order to do so, three consecutive research questions (RQ) have been formulated. The research questions are depicted in Figure 1.2 and will be explained in detail below.

To lay the foundation for the research at hand, first the process of manual order picking must be further specified in terms of the manual activities which need to be performed during picking. The goal is then to evaluate which of these activities can, in general, be simulated in VR using HMDs. This way, activities that are not suitable for a simulation in VR can already be discarded in this step. The first research question can be formulated as follows:

RQ 1  Which activities of manual order picking can be simulated in VR using HMDs?

RQ 1 is answered based on the available literature. First, the manual activities associated with manual order picking are identified. Then, a systematic literature review is employed to analyse a wide range of previous studies that have used VR HMDs in different applications. This way, all studies that have already simulated activities similar to the previously defined activities in manual order picking can be identified, and their findings can subsequently be analysed with respect to RQ 1. By deriving assumptions on the usability of VR HMDs for the simulation of specific manual activities in an order picking process, RQ 1 can be regarded as theory-building. Subsequently, the second and third research questions are used to test this theory by empirically analysing these activities with regard to the usability of VR HMDs in the context of planning and training. The second research question focuses on the use of VR for planning:

RQ 2  Can VR HMDs be used for planning manual order picking activities?
Figure 1.2 Research questions defining the aim of this thesis
As has been described above, it is especially relevant to consider potential differences in human performance between VR and a real environment when using VR HMDs for planning manual order picking. RQ 2 is therefore answered by systematically investigating if significant differences can be observed in human performance between an activity performed in VR using an HMD and the same activity being performed in a real environment. The research question can thus be further specified as follows:

RQ 2.1  Does order picking in VR differ from real order picking in terms of human performance?

Moreover, in the context of planning, VR users might be interested in whether individual human performance in VR can be used to predict human performance in the real environment. If such a relationship between human performance in VR and human performance in a real environment can be found, VR HMDs can be used in a planning process to evaluate different configurations and predict the outcome in the real environment. In this case, VR HMDs can also be used for planning on the level of individual human pickers: If real-world performance can be predicted, VR HMDs can be used for performance recording and assessment of human pickers, for example during a job application process. Hence, another research question has been formulated as follows:

RQ 2.2  Can human performance in VR be used to predict the performance in the real environment?

However, Nickel et al. (2019) argue that the performance of human users inexperienced with VR technology is generally lower during the period in which these users become familiar with VR. This effect is thus an interference that must be considered when using the technology for planning. Meaningful results on the comparison of order picking in VR and in a real environment can only be derived if this effect can be excluded, i.e. if users are already familiar with the technology. Similarly, when VR is used in practice for the planning of manual order picking, users inexperienced with the technology could also potentially distort the results. While RQ 2.1 asks if differences in human performance exist between a VR and a real environment, RQ 2.3 asks when (i.e. after how much time of familiarization with VR) the effects caused by users being unfamiliar with the technology can no longer be observed:

RQ 2.3  What time is needed by users to become familiar with order picking in VR?

The third research question is formulated similarly to RQ 2 but focuses on the use of VR HMDs for training:

RQ 3  Can VR HMDs be used for the training of manual order picking?

As the assumption has been made prior to this thesis that learning effects can generally be achieved in VR, this research question focuses primarily on the comparison of the learning effects and their transfer between VR and the real environment using mathematical learning curve models. As numerous different learning curve models exist, the first step is to determine which learning curve models are suitable for describing learning effects in manual order picking in VR. To answer RQ 3, the following subquestion must thus be answered first:

RQ 3.1  Which learning curve models are best suited for describing learning effects in VR?

Subsequently, the selected learning curve models can be used to evaluate if learning effects can be transferred between a VR and a real environment. This evaluation can be done on two levels: The first is to ask if learning effects are transferred at all, i.e. if learning in VR is effective. If this is the case, the learning effects can be compared to learning effects obtained from training in a real environment, thus evaluating if VR training is also efficient. The second subquestion for RQ 3 can thus be formulated as follows:

RQ 3.2  Can learning effects be transferred effectively and efficiently from VR to the real environment?

In order to evaluate the usability of VR for practical application, Wilson et al. (1996, pp. 119–126) suggest comparing activities and tasks performed in VR and the obtained results to the results and effects in a real environment. To be able to answer RQ 2 and RQ 3, human performance data for order picking in a VR and in a real environment is thus needed. To collect this data, an experimental setup consisting of both a virtual and a real order picking rack was developed. By constructing both environments to be as similar as possible, performance data for manual order picking can be directly compared between them. A total of 112 volunteers were recruited and a large-scale study was then conducted using the experimental setup. Participants were divided into two groups, one of which performed a picking task in both environments. The other group served as a control group and performed the same picking task but only in the real environment. To quantify and compare human performance, times, picking errors and the perceived workload were measured in each environment. By comparing the results, RQ 2.1 can be answered. Then, learning curve models are fitted to the data in order to answer the remaining research questions.
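To illustrate how these analyses could be operationalised, the following minimal sketch uses R, the environment the statistical tests in later chapters refer to (cf. Table 5.1). All names in the sketch (the data frame picks and its columns participant, group, order_no and pick_time) are illustrative placeholders rather than the study's actual variable names, the choice of test and model is only exemplary, and the starting values of the fit are arbitrary:

    # Minimal sketch, assuming a data frame `picks` with one row per order:
    #   participant -- participant identifier
    #   group       -- "VR" (virtual rack) or "RR" (real rack)
    #   order_no    -- running order number per participant
    #   pick_time   -- measured picking time for the order in seconds

    # RQ 2.1: non-parametric comparison of picking times between the two
    # environments (independent variable) using a Mann-Whitney U test
    wilcox.test(pick_time ~ group, data = picks)

    # RQ 3.1: fit a Wright learning curve t_n = t1 * n^(-b) to a single
    # participant's picking times via non-linear least squares
    p1  <- subset(picks, participant == 1)
    fit <- nls(pick_time ~ t1 * order_no^(-b),
               data  = p1,
               start = list(t1 = 10, b = 0.3))  # arbitrary starting values
    coef(fit)                                   # estimated t1 and b
    2^(-coef(fit)["b"])                         # implied learning rate (LR)

Which tests and which candidate learning curve models are actually applied to the experimental data is specified in chapters 5 and 6.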
1.3 Thesis Outline
This thesis is divided into seven chapters, as depicted in Figure 1.3. Following this chapter, which provides the introduction, the theoretical background is described in chapter two. Here, the fundamentals of the two main disciplines relevant to this thesis (virtual reality and manual order picking) are presented. Furthermore, the current state of research is outlined briefly. The second chapter closes with an overview of the state of research on using VR in manual order picking, thus bringing the two disciplines together. This way, the previously described research gap can be further specified, providing the basis for the following analyses.

As Figure 1.3 shows, the third chapter aims at answering RQ 1 by providing a systematic literature review analysing studies that use VR HMDs for simulating manual activities. It starts with a tertiary analysis of previous literature reviews in the field of interest, followed by the elaboration of a framework for the content analysis of the literature. Then, the searching for and sampling of the literature is described. The fourth section of the third chapter contains the actual analysis of the literature, followed by a brief conclusion and the answer to RQ 1 in the final section.

In the fourth chapter, the experimental setup used for answering RQ 2 and RQ 3 is described in detail. First, the basic research design is specified before an overview is given of the actual implementation, including the design process, the randomization process, the experimental procedure, and the laboratory setup. Next, the research questions are operationalised by selecting the relevant dependent variables and describing the methods for measuring human performance in manual order picking and the questionnaire design. The final section of the chapter outlines the experimental study that was executed using the experimental setup, giving information on the sampling process, the participants in the sample, the validation and verification of the experimental setup, and the necessary data preparation.

In chapter five, the results of the comparison between order picking in VR and in the real environment are provided, thus answering RQ 2.1. The chapter starts by formulating research hypotheses based on RQ 2.1 and the dependent variables available for the analysis. Moreover, the inferential statistics methods that are used for the hypotheses testing are introduced. In the third section of the chapter, the results of the statistical analyses are presented. The chapter closes with a discussion of the results.

The sixth chapter is dedicated to the analysis of learning curves based on the data from the experimental study. First, the occurrence of learning effects in general is investigated and the dependent variables are selected for fitting the learning curve models. In the second section of the sixth chapter, the different learning curve models are introduced and fitted to the data. The third section then aims at answering RQ 3.1, i.e. comparing the learning curve models based on their quality of fit and evaluating which one is best suited for describing learning effects of order picking in VR. For these learning curve models, the estimated parameters are then compared between VR and the real environment in section four in order to answer RQ 3.2. Note that only then, i.e. in the fifth and sixth section of chapter six, RQ 2.2 and RQ 2.3 are answered, even though these research questions are associated with the context of planning. This is because the analysis for answering these research questions also relies on the learning curve models selected in section 6.3.

The final chapter concludes the thesis. It summarizes the answers to the research questions and highlights the implications for research and practice. The chapter closes with a critical discussion of the limitations of the thesis and an outline of opportunities for future research.
Figure 1.3 Integration of the research questions into the general outline of the thesis
1.4 The Thesis in the Context of the Philosophy of Science
The purpose of this thesis is to evaluate the usability of VR HMDs for planning and training in the context of manual order picking. It thus originates from the field of business administration and business research. However, due to the technical aspect underlying the evaluation of VR HMDs, this thesis is also strongly connected to the field of information science. As human behaviour is analysed, it further relates to social sciences, psychology, human factors research, and ergonomics. This thesis thus reflects the interdisciplinary character of research in the field of business administration (Helfrich 2016, p. 17; Fülbier 2004, p. 267).

From the meta-perspective of the philosophy of science, all scientific disciplines have in common that they seek knowledge. This epistemological principle also justifies the classification of business administration as a discipline among scientific research (Fülbier 2004, p. 267). Like every other scientific discipline, business administration as a science covers a particular part of reality and its pursuit of knowledge is characterized by a specific object (Jung 2016, pp. 21–22). The objects of research in business administration in general are businesses (i.e. complex real-world systems), the economic links between businesses and their environments, the making of decisions on the use of limited resources, and the impact of these decisions (Helfrich 2016, pp. 22–23; Jung 2016, pp. 23–24). Because this thesis covers the usability of VR HMDs for planning and training in the context of manual order picking, it is clearly related to the general object of decision making in business administration.

Based on its scientific object, business administration is considered to belong to the field of social sciences and humanities in distinction to the field of natural sciences. Both fields can be described as applied or “real sciences” in contrast to formal sciences, which seek fundamental knowledge (Jung 2016, pp. 22–23). The strong focus of business administration on the practical implementation of results is undeniable, whereas fundamental sciences are more interested in providing a general model of reality (Helfrich 2016, p. 6). Therefore, business administration is often conceived to be more of a practical science than a theoretical science (Fülbier 2004, p. 267). Yet, this subdivision cannot always be applied without further consideration: In reality, business administration faces research questions for which a theoretical basis must first be established prior to practical research (Helfrich 2016, p. 6). Theory building and the identification of fundamental cause-effect relationships can thus also be a valid objective of research in business administration. The practical character of business administration is mainly defined by the more technological objective of transforming explanatory cause-effect relationships into formative target-means systems (Fülbier 2004, p. 267).
The thesis at hand is prototypical for the described interdependence of theoretical and practical aspects in business research. On the one hand, a theory-building approach (induction) is pursued with the goal of identifying the cause-effect relationship between the use of VR HMDs and human performance in manual order picking. On the other hand, the cause-effect relationship is empirically tested (deduction) to evaluate the practical usability of the technology for planning and training tasks. In methodological terms, this thesis thus combines a rationalist with an empiricist procedure. While RQ 1 refers to searching for general knowledge on VR usability in the available literature and applying it to manual order picking, the answer to the remaining research questions will be based on using empirical experimentation to test previously defined hypotheses.2

In this context, the epistemological philosophy of critical rationalism, developed by Popper (1989), must also be mentioned as a fundamental concept in business research. According to this concept, the scientific analysis of causal relations first requires the formulation of hypotheses. These hypotheses can then be tested empirically. However, according to the principles of critical rationalism, hypotheses can only be falsified but never proven (Fülbier 2004, p. 268). The concept of critical rationalism is also reflected in this thesis, as the empirical analysis requires the formulation of hypotheses on the usability of VR HMDs, which then have to withstand falsification in the subsequent statistical testing. Accordingly, a final proof of the hypotheses that are not rejected during this process is beyond the boundaries of this thesis.
2 For a comprehensive overview of rationalism and empiricism, please refer to Fülbier (2004, p. 268).
2 Theoretical Background: The use of Virtual Reality Head-Mounted Devices for Planning and Training in the Context of Manual Order Picking
This chapter presents the theoretical background of this thesis in order to elaborate on the previously defined research gap and to highlight the relevant points of reference to other research in this field. The focus will be on two main aspects: VR technology and manual order picking. Hence, the chapter starts by briefly describing the fundamentals of VR technology as well as the limitations of contemporary HMDs. Also, the state of research concerning the application of VR in the context of planning and training is summarized. The second section then provides an explanation of the fundamentals of manual order picking and the relevance of planning in this context. In analogy to the section on VR technology, this is followed by an overview of the state of research concerning planning and training in the context of manual order picking. Both aspects are brought together in the final section of this chapter, which summarises the state of research on the use of VR in manual order picking. Additionally, the research gap is specified further based on the identified research.
2.1 Virtual Reality Head-Mounted Devices as a Tool for Planning and Training
2.1.1 Fundamentals of Virtual Reality Technology
One of the first appearances of the term “virtual reality” can be found in Sutherland (1965), thus dating back to the 1960s. A universal definition of VR is provided by Steuer (1992, pp. 76–77): The author describes VR as an environment in which a user is subject to telepresence. The term telepresence refers to the feeling of being in (i.e. present in) a mediated environment. Coburn et al. (2017, p. 1) use this definition to develop an advanced definition of VR: According to them, VR is characterized
by the replacement of one or more physical senses of the user with virtual senses. Moreover, the authors suggest classifying virtually created environments in terms of immersivity and fidelity. In general, a higher immersivity is given when a larger proportion of the physical environment is replaced by VR. A high fidelity1 is present if the virtually created inputs are realistic, i.e. if they match the inputs that would be experienced in a physical environment (Coburn et al. 2017, p. 1; Bowman and McMahan 2007, p. 38).

In the literature, the term immersion is also used to describe how believable a VR is perceived to be by its users. According to Bystrom et al. (1999), immersion is high if the VR system is inclusive (i.e. a large number of real-world stimuli are excluded), extensive (i.e. the number of physical senses replaced by the system is high), surrounding and vivid. Note that the necessity to use these terms for the classification of VR systems shows that, to date, achieving full immersion is difficult and VR systems are only able to replace physical reality up to a certain degree of equivalence (Elbert et al. 2017, p. 150; Pontonnier et al. 2014, p. 200).

The terms immersivity and immersion are also suitable for distinguishing Augmented Reality (AR) from VR: In AR, physical senses are not replaced but extended by the additional provision of virtual information. Immersivity in AR is thus lower compared to VR (Coburn et al. 2017, p. 1; Starner et al. 1997). Milgram et al. (1995) suggest a continuous scale for the classification of AR and VR systems, called the Reality-Virtuality continuum, reaching from entirely real to fully virtual environments. The area between the two extremes in which AR systems can be located is referred to as mixed reality. Recently, Flavián et al. (2019) have developed an extended taxonomy for the classification of VR and AR technology. Their “EPI Cube” consists of the three dimensions perceptual presence, behavioural interactivity and technological embodiment. Highly immersive VR systems rank high on all three of these dimensions.

In general, achieving a high immersion is considered beneficial for providing spatial perception (Paes et al. 2017, p. 302) and for allowing effective and transferable learning and training in VR (Dede 2009; Sowndararajan et al. 2008), even though perfect fidelity is not always necessary (Stone 2011). Yet, especially for non-emotional environments like order picking scenarios, Baños et al. (2004) argue that high immersion is important. For this reason, this thesis concentrates on investigating highly immersive, interactive and inclusive virtual environments. AR is therefore considered out of scope and will not be covered further.

To achieve immersion, VR systems need hardware for the output of the virtually created senses, as well as some sort of 3D input devices for interaction with the virtual environment (Choi et al. 2015, p. 41; Steuer 1992, pp. 74–75).
1 Some authors refer to fidelity as “sensor fidelity” (e.g., Elbert et al. 2017; Bowman and McMahan 2007).
In the following subsections, different types of output and input hardware are briefly described.

VR output technology

According to Coburn et al. (2017, p. 2), the most important output in order for a VR system to provide an immersive experience is visual 3D imaging. Bowman and McMahan (2007, p. 38) list a total of nine factors relevant for creating visual immersion, namely field of view (i.e. the visual field that can be viewed instantaneously), field of regard (i.e. the total size of the visual field), display size, display resolution, stereoscopy (i.e. having separate images for each eye), head-based rendering (i.e. the real-time tracking of the position and direction of the user’s head in space), realism of lighting, frame rate and refresh rate. These factors are primarily defined by the system’s hardware and software. While different software tools that render VR will not be covered further here, the hardware used for generating visual output can be classified as either cave automatic virtual environment (CAVE) systems or HMDs (Coburn et al. 2017, p. 2).

CAVE systems were first developed by the University of Illinois (Creagh 2003). They can be described as a room (i.e. the cave) in which images are projected on all four walls and sometimes also on the ceiling and the floor. Specialized goggles are used to transform the projector’s images into stereoscopic 3D environments (Coburn et al. 2017, p. 2; Creagh 2003). The primary advantages of CAVE systems lie in the large field of view and the possibility of collaborative use (Cruz-Neira et al. 1992). However, CAVE systems have large space requirements and are therefore expensive and not easy to set up. As a result, CAVE systems are mainly used in large research facilities (Coburn et al. 2017, p. 3; Miller et al. 2005, p. 1).

In contrast to CAVEs, HMD systems only have two high-resolution displays that are placed directly in front of the user’s eyes inside a wearable headset. In recent years, highly immersive HMDs that were originally developed for the gaming and entertainment industry have entered the market (Avila and Bailey 2014). Well-known examples of these HMDs are the HTC Vive (High Tech Computer Corporation, Taoyuan, Taiwan), the Oculus Rift (Facebook Inc., Menlo Park, California, USA) or the Sony Playstation VR system (Sony Corporation, Tokyo, Japan).2 Because the primary target group of these systems are private end consumers, they are relatively cheap and easy to set up and operate (El Beheiry et al. 2019, p. 1315; Coburn et al. 2017, pp. 3–4). Not only the hardware, but also the software used with HMDs is easily available and has lower requirements than the software for CAVE systems in terms of the computational skills needed by the operator (Paes et al. 2017, p. 294).
2 For an extensive overview of VR systems currently available on the market, please refer to Coburn et al. (2017).
HMDs therefore offer many opportunities to be used in small and medium-sized companies or research facilities and are becoming increasingly popular (Vaughan et al. 2016, p. 67; Choi et al. 2015, p. 56).

Besides visual imaging, haptics are another output factor important for VR users to experience immersivity and fidelity. Research on possible solutions for the integration of haptic feedback into VR is manifold. Overviews of the available hardware and software for haptic feedback can be found in Xia (2016) and Vaughan et al. (2016). However, most of these existing solutions require additional hardware which is complicated and costly (Xia 2016, p. 2; Choi et al. 2015; Burdea 2000). This counteracts the previously mentioned advantage of modern HMDs being relatively cheap. The development of affordable haptic feedback devices for consumer-level VR systems is therefore still considered an unsolved issue (Berg and Vance 2017, p. 16).

Audio output is also important for immersive VR systems. Current VR systems usually use headphones or loudspeakers that are able to provide 3D sound (Jayaram et al. 2001, p. 75). Moreover, for achieving even higher degrees of immersion, olfactory and gustatory displays, providing smell and taste to VR users, are under development (Coburn et al. 2017, p. 2). For the simulation of manual order picking, audio, olfactory and gustatory output is not relevant and will therefore not be covered further within this thesis.3

VR input technology

Input technology refers to the ability of VR users to interact with the simulation and to the simulation being able to react to the users’ actions. To achieve fidelity, VR systems need to track the users’ movement and display it in VR. Both CAVE systems and HMDs use motion capturing technology to transfer the users’ movement in the real environment into VR. Motion capturing technology can either record human postures and gestures directly or, instead, capture the relative position of the VR hardware or certain tracking elements that are worn on the user’s body (Coburn et al. 2017, pp. 5–6). Additionally, HMDs often include gyroscopes and accelerometers to record movement (Vaughan et al. 2016, p. 70). This allows users to move freely in VR as long as they are within a defined sensor area. Additional technology, such as treadmills or optical illusions, can be used to increase the area of free movement (Wulz 2008, pp. 79–80).
3 For further information on the current state of research on olfactory displays, please refer to Micaroni et al. (2019).
Motion capturing can also be used to enable user interaction with VR. A popular tool for tracking the user’s hands and gestures is the so-called Leap Motion controller (Leap Motion Inc., San Francisco, California, USA). Even though this technology allows hands-free and therefore relatively realistic interaction in VR, Figueiredo et al. (2018) argue that the system’s usability is restricted due to its inaccuracy and unreliability. According to the authors, using handheld devices for user interaction with VR is preferable. Consumer HMDs such as the HTC Vive or the Oculus Rift come with controllers that are equipped with buttons for user input. Additionally, the controllers’ position is tracked and haptic feedback can be provided to a limited degree via vibration in the controllers (Berg and Vance 2017, p. 2; Coburn et al. 2017, p. 6). More advanced interaction hardware, such as specialized data gloves, is also under development (Lawson et al. 2016, p. 325; Vélaz et al. 2014, p. 2). Nevertheless, Lawson et al. (2016, p. 327) still recommend that manufacturers of VR hardware and researchers put additional effort into the development of advanced methods for user interaction in VR.

In summary, current VR technology can be considered mature and well usable for practical applications (Berg and Vance 2017, p. 1; Choi et al. 2015, p. 56). With the advances in the development of HMDs, low-cost and easy-to-implement products are available for a wide range of potential users. The technology is thus especially suitable for integrating human users into simulations during the planning and training of processes. Moreover, additional hardware exists, or is currently under development, that aims to further extend the range of use of contemporary HMDs. However, to maintain the technology’s advantage of being relatively cheap, this thesis focuses on evaluating the use of consumer-level HMDs without any costly extensions. It is, therefore, necessary to look more closely at the limitations of consumer-level, out-of-the-box HMDs.
2.1.2 Limitations of Contemporary Virtual Reality Head-Mounted Devices
Despite the above-mentioned advantages in terms of cost and usability, contemporary HMDs also come with some notable limitations: First, although HMDs enable free and natural movement, this movement is limited to a relatively small area that is covered by the available sensors. Moreover, most HMDs have cables attached to them, which also restrict free and natural movement (Coburn et al. 2017, p. 4). Hence, the available HMDs are only suitable for applications within a small area (Niehorster et al. 2017, p. 20).
Second and third, contemporary HMDs are not capable of providing detailed haptics and force feedback. In fact, providing haptics and force feedback in VR in general (e.g., in order to allow users to perceive weight) is still a major concern among researchers (Berg and Vance 2017, p. 12; Varalakshmi et al. 2012; Laycock and Day 2007; Jayaram et al. 2001, p. 81). Similarly, it is obvious that using buttons and the available handheld controllers for physical interaction with objects in VR is not an adequate representation of real-world human hand interactions (Jayaram et al. 2001, p. 74). Also, Nanjappan et al. (2018) have found that the controllers are not well suited for precise interaction with small objects. Even though some solutions for consumer-level HMDs exist that enable more advanced haptic feedback to VR users or allow more naturalistic human hand usage, these techniques were found to be either inaccurate and imprecise (Figueiredo et al. 2018) or to not necessarily improve the way users interact with elements in VR (Magdalon et al. 2011).

Fourth, researchers have found that human users have a different perception of space in VR compared to the real world, which causes humans to underestimate distances and object sizes in VR (Lin and Woldegiorgis 2015; Stefanucci et al. 2015). Even though this effect is less pronounced with contemporary HMDs, it can still be observed in VR experiments (Kelly et al. 2017). Despite the large body of literature on this issue (see e.g., Bhargava et al. 2020; Lin et al. 2020; Creem-Regehr et al. 2015; Lin and Woldegiorgis 2017; Lin and Woldegiorgis 2015; Lin et al. 2014; Grechkin et al. 2010; Willemsen et al. 2009), the cause of these differences has not yet been fully identified. Researchers assume that the restricted field of view, the limited screen resolution or even the weight of the HMD itself influence spatial perception (Kelly et al. 2017, pp. 13–14).

Finally, researchers are discussing whether HMDs cause cyber- or motion sickness and fatigue in some users, especially if the display resolution and refresh rate are too low (Aldaba and Moussavi 2020; Brough et al. 2007; Jayaram et al. 2001; Nalivaiko et al. 2015). Moreover, HMDs in general might be incompatible with corrective eye-wear. However, according to Coburn et al. (2017, pp. 4–5), Vaughan et al. (2016, p. 71), and Jayaram et al. (2001, p. 80), modern HMDs are specifically designed to prevent cybersickness and incompatibility with corrective eye-wear. These limitations are therefore considered less relevant in the context of this thesis.

In summary, the following four major limitations of consumer-level HMDs have been identified from the literature:

1. Limited freedom of movement.
2. Missing haptic and force feedback.
3. Inadequate simulation of realistic hand gripping movements.
4. Limited spatial perception leading to users underestimating sizes and distances in VR.

According to Pontonnier et al. (2014, p. 200), these limitations of VR can lead to a task performed by a human user in VR significantly differing from a similar task performed in the real world. Without doubt, the abovementioned limitations could have a great impact on human performance when simulating manual order picking in VR. Thus, the results of planning and training obtained from a VR simulation could be invalid for the real-world application. It is therefore deemed necessary to carefully compare human performance in a virtual and a real environment to evaluate whether a specific activity can be replicated in VR in such a way that human performance resembles the performance in the real-world equivalent of the activity.
2.1.3 State of Research on using Virtual Reality in a Planning Context
Due to the technological advances in recent years, it is possible to identify various fields in which VR planning could be applied. According to Kim et al. (2017), one of the most important areas for this technology is the field of medicine. The authors summarize the current state of research on the use of AR and VR in plastic surgery, finding a total of 13 studies dealing with the planning of surgical interventions.

In the context of production, Korves and Loftus (2000) evaluate the usability of VR for the planning of manufacturing layouts. According to them, the major advantage of VR lies in its capability to provide realistic images of the planned layout. However, they also mention that users generally need a good understanding of the virtual environment, meaning that some sort of introduction of the users to VR is advisable prior to its use. The usability of VR in the context of factory planning is also highlighted by Menck et al. (2012). The authors especially value the potential of VR for collaborative planning. Peron et al. (2020) suggest integrating different emerging technologies into the process of facility layout planning. They conclude that combining immersive VR systems with motion capturing technology provides a promising tool for a planning process, as it makes it possible to consider individual workers’ needs and ergonomics while at the same time reducing the time and cost of the planning process.

Ottosson (2002) suggests using VR to replace physical prototypes during the process of product development. Prototypes in VR are not only cheaper but also easier to adapt and change compared to physical prototypes. Consequently, Cecil
and Kanchanapiboon (2007) identify a considerable amount of literature on the use of VR in the product design and prototyping process. However, they argue that the inability of VR to simulate force feedback is the biggest obstacle for using VR in this context.

Budziszewski et al. (2011) highlight the benefits of the ability of VR systems to integrate human users into complex simulations. They use the technology to modify workplaces to fit the needs of workers with motion disabilities. According to them, VR not only enables easy modification of the simulated environment and the testing of different workplace configurations, it also facilitates the simulation of different human features such as physical disabilities. A study focusing on the evaluation of different assembly planning techniques, including VR, is provided by Ye et al. (1999). The authors compare traditional methods to a desktop application and an immersive CAVE system. Both the desktop application and the VR system prove to have an advantage over traditional methods by yielding shorter assembly times.

In summary, the wide range of planning applications identified in the available research highlights the potential of using VR in a planning context. Even though the results from the literature are promising, a universally applicable assumption that VR can be used for planning tasks in general, and for planning manual activities in particular, cannot yet be made. Analysing the potential of VR HMDs for planning tasks in the context of manual order picking is, therefore, still necessary.
2.1.4 State of Research on Learning and Training in Virtual Realities
The term “learning” refers to the improvement of a process over time (i.e. increased output or decreased number of errors), which is caused by the repeated execution of the process by human workers (Grosse and Glock 2015, pp. 883–884).4 In one of the first studies on learning, Wright (1936) describes the decreasing costs per unit in the manual production of war planes over time. Since then, learning effects in industrial applications have increasingly received attention among researchers (Grosse and Glock 2015, p. 884; Grosse and Glock 2013, p. 852). Learning effects can be mathematically described using learning curve models. A comprehensive overview of different learning curve models and their application in research can be found in Anzanello and Fogliatto (2011).5
4 Note that learning can not only be observed on an individual level, but also on an organizational level (Epple et al. 1991). However, in this thesis, individual human learning is in focus and organizational learning will not be taken into consideration.
A recent overview of the existing research on the application of learning curves, especially in the fields of production and operations management, is provided by Glock et al. (2019b). In recent years, the existing learning curve models have been extended to also include the depiction of negative effects, such as human fatigue (Jaber et al. 2013) and forgetting (Nembhard and Uzumeri 2000b). To help identify an adequate learning curve model, Grosse et al. (2015b) have analysed the existing literature on learning effects in order to evaluate which learning curve model is best suited for different applications.

Similar to planning, research on learning and training in VR, as well as the transfer of training effects between VR and real environments, can already be found for different fields of application. According to Vaughan et al. (2016, p. 76), VR training is mostly used in five areas of application: in a medical context, in an industrial and commercial context, in serious games, in collaborative remote training, and in rehabilitation. Note that this section focuses on the state of research on VR training that appears relevant for the application in manual order picking. Unrelated research fields, i.e. gamification or using VR for collaborative remote training or rehabilitation, are thus not covered further here. If interested, the reader might refer to the available literature on gamification (e.g., Warmelink et al. 2020), collaborative remote training (e.g., Bertram et al. 2015; Merién et al. 2010) and rehabilitation (e.g., Pedroli et al. 2019; Loucks et al. 2019; Laver et al. 2015; Rizzo and Kim 2005; Schultheis and Rizzo 2001).

Vaughan et al. (2016, p. 74) claim that one of the most important requirements of a VR training system is to provide transfer validity. Transfer validity is given if the trained skills and knowledge can be successfully transferred from VR to real-world application. Early publications on the transfer of training and learning from VR to real environments date back to the 1990s. Witmer et al. (1996) evaluate the training of navigation routes inside a building using VR. They find that route knowledge is easily transferred from VR to the real environment, making VR an effective technology for training. The transfer to a real environment of spatial and route knowledge gained through learning in VR is analysed by Waller and Knapp (1998). They find that especially longer periods of VR training are more effective than real-world training. McComas et al. (1998) investigate spatial learning with children, finding VR training to yield comparable results to a similar training in the real world. Applied to the field of manual order picking, these results indicate that VR is very suitable for the training of picker travel routes within a warehouse.
5 The learning curve models relevant for this thesis will be thoroughly introduced in section 6.2.1.
However, it must be kept in mind that these studies were published before the introduction of contemporary consumer-level HMDs.

Literature on the training of manual skills and activities can mainly be found in the context of maintenance and assembly tasks, and in medicine. Ganier et al. (2014) conduct a study on the training of military tank maintenance. Even though they do not use an immersive VR system but a simple desktop application, they find that VR and traditional training perform equally well and trained knowledge is transferred successfully to the real world. The study also reveals that not only motor skills but also complex cognitive procedures with a high level of abstraction can be successfully trained in VR. Vélaz et al. (2014) investigate the training of an assembly task using different technologies for the user interaction with VR. They find that more advanced systems for user interaction in VR, which provide force feedback to users, do not provide a better transfer of knowledge but require longer training times. This indicates that advanced systems for physical interaction and haptics in VR are not always necessary to achieve good training results, making modern consumer-level HMDs a valid technology for training. Gavish et al. (2015) also investigate the training of an assembly task using a non-immersive VR system. Their study yields similar results with no significant difference in post-training performance between participants that were trained in VR and participants trained by watching an instructional film. However, VR training generally took longer than watching the instructional film. A new VR training method for assembly tasks using an immersive HMD is provided by Ho et al. (2018). Using this method, the authors find significant advantages in quantitative and qualitative measures compared to traditional training methods. They conclude that VR can indeed be used as an effective and cheap training tool.

Medical surgery is another field in which complex manual skills can be trained using VR. In fact, research in this area is already manifold and comprehensive overviews can be found in Kim et al. (2017) and Yiannakopoulou et al. (2015). Both reviews conclude that VR systems are beneficial for the training of the manual skills necessary for conducting surgery. In accordance with the previously cited works on the training of manual activities in other fields of application, Yiannakopoulou et al. (2015) state that basic manual skills can be transferred from VR training to real-world surgery. Medicine is also the only field of research in which analyses of learning curve models for VR training can be found (e.g., Selvander and Åsman 2012; Kruglikova et al. 2010; Hogle et al. 2007). However, so far no research has been found that uses mathematical models to estimate learning curves in VR and compare the results to learning curves obtained from a real environment. This is therefore regarded as a promising field for extending the available research on learning and training in VR.
In summary, it is promising that researchers agree that learning in VR is possible and learning effects can be observed, even though much of the existing research does not use contemporary HMDs for the analysis of training effects. It has also been found that training in VR does not necessarily perform better than traditional training methods. Therefore, it appears valuable to systematically analyse the results of previous research that uses an HMD for training manual activities in order to identify the prerequisites of an effective and efficient training in VR. Also, the use of mathematical models for estimating learning curves has not gained much attention in VR research. Investigating whether established learning curve models can be applied to VR training of manual order picking using an HMD, and comparing the results between a VR and a real environment, is thus considered a valuable contribution to the existing research.
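For reference, the oldest and most widely used of these models, going back to Wright (1936), can be written in its time-per-repetition form as follows; the notation is one common convention and not necessarily the parameterization adopted in section 6.2.1:

```latex
t_n = t_1 \, n^{-b}, \qquad \mathrm{LR} = 2^{-b}
```

Here, $t_n$ denotes the time needed for the $n$-th repetition of a task, $t_1$ the time for the first repetition, $b \geq 0$ the learning exponent, and $\mathrm{LR}$ the learning rate, i.e. the fraction of the execution time that remains whenever the cumulative number of repetitions doubles.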
2.2 Human-Centred Planning and Training in the Context of Manual Order Picking
2.2.1 Fundamentals of Order Picking and Order Picking Technology
In general, manual order picking can be defined as the process of locating, moving towards and retrieving items from racks inside a warehouse, which are subsequently delivered to a depot for further processing and shipping in order to fulfil customer orders (Koster et al. 2007, p. 481; Grosse et al. 2017, p. 1260). Two major types of order picking systems can be distinguished: Picker-to-parts and parts-to-picker systems. In picker-to-parts systems, the order picker moves to stationary picking racks or bins in order to retrieve the requested items (Gils et al. 2018, p. 1). In contrast, in parts-to-picker systems, the racks or bins themselves automatically move towards the picker for item retrieval (Boysen et al. 2017; Marchet et al. 2015, p. 84; Koster et al. 2007, pp. 483–484). However, parts-to-picker systems still include human pickers for the final step of retrieving items and can therefore be described as a system located in-between manual and fully automated order picking systems (Melacini et al. 2011, p. 841).

One such example of a parts-to-picker technology is the so-called KIVA system, which is offered by Amazon Robotics: Low-profile robots that are able to navigate underneath racks in a warehouse are used to lift and transport these racks towards stationary picking locations (Weidinger et al. 2018; D’Andrea 2012; Guizzo 2008). Other intralogistics manufacturers such as Grenzebach and Swisslog offer similar products, and the technology is assumed to be easy to implement and to require relatively little investment (Weidinger et al. 2018, p. 1479).
Further examples of parts-to-picker systems are vertical lift systems (Calzavara et al. 2019) or end-of-aisle picking systems, in which items are brought to picker workstations on closed-loop conveyors (Claeys et al. 2016). Moreover, carousel racks (Van den Berg 1996) and automated storage and retrieval systems (Roodbergen and Vis 2009) can also be used to bring racks or bins to stationary pickers, thereby falling under the same category of parts-to-picker systems.

However, as manual picking is still considered more flexible than automated picking solutions, most warehouses in Western Europe use picker-to-parts systems, and the degree of automation in warehouses is still generally low (Gils et al. 2018, p. 1; Marchet et al. 2015; Napolitano 2012, p. 56; Koster et al. 2007, p. 481). This leads to high labour costs, making order picking accountable for approximately 50% or more of overall warehousing costs (Habazin et al. 2017, p. 60; Petersen and Aase 2004, p. 11; Tompkins et al. 1996, p. 435). Nevertheless, D’Andrea (2012, p. 638) points out that semi-automated parts-to-picker systems are currently becoming more popular in warehousing, as these systems enable personnel cost savings while still being more flexible than fully automated systems.

In both picker-to-parts and parts-to-picker systems, additional technology can be employed to support pickers and provide the necessary information on the orders. The simplest way to provide this information is to use paper-based pick lists. More advanced technologies offering additional support to pickers in order to decrease searching and picking times, as well as picking errors, are pick-by-scanner, pick-by-light, pick-by-voice, and pick-by-vision systems (Guo et al. 2015). While in pick-by-scanner systems information is provided on the monitor of a handheld scanner, pick-by-light and pick-by-voice systems enable hands-free work (Reif and Walch 2008, p. 988). In pick-by-light systems, picking locations are indicated by small lamps at the corresponding rack position. After item retrieval, pickers need to press a button to inform the system that a pick is completed (Guo et al. 2015, p. 17). Pick-by-voice systems deliver information using spoken commands, typically via headsets. Pickers themselves can also give spoken commands, confirming an order or asking for the last command to be repeated (Vries et al. 2016b, p. 2). Pick-by-vision systems are the most advanced technology and strongly related to AR. Here, pickers wear special glasses that project the necessary information directly into the picker’s field of view (Reif and Walch 2008, p. 989). The information can either be provided via simple head-up displays (Guo et al. 2015, p. 17) or – in the most advanced cases – via see-through displays that let the information adapt dynamically to the visible surroundings of the picker (Schwerdtfeger et al. 2011). In practice, pick-by-light and pick-by-voice are the most common (Reif and Walch 2008, p. 988), although
each technology offers unique advantages and disadvantages for different use cases (Vries et al. 2016b).
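As a toy illustration of the pick-by-light control flow just described (a lamp indicates the pick location and quantity, and the picker confirms each pick with a button press), consider the following sketch. The location coding, the function names and the callback-based I/O are invented for illustration and do not describe any real pick-by-light product.

```python
from dataclasses import dataclass

@dataclass
class PickTask:
    rack_position: str  # e.g. "A-03-2": aisle A, rack 3, level 2 (invented coding)
    quantity: int

def run_pick_by_light(order, light_on, light_off, wait_for_button):
    """Step through an order: light one location at a time and wait for
    the picker's confirmation button before moving to the next task."""
    for task in order:
        light_on(task.rack_position, task.quantity)  # show location and count
        wait_for_button(task.rack_position)          # picker confirms the pick
        light_off(task.rack_position)

# stand-ins for the real lamp and button hardware
order = [PickTask("A-03-2", 2), PickTask("B-11-4", 1)]
run_pick_by_light(
    order,
    light_on=lambda pos, qty: print(f"lamp ON at {pos}: pick {qty}"),
    light_off=lambda pos: print(f"lamp OFF at {pos}"),
    wait_for_button=lambda pos: input(f"confirm pick at {pos} (press Enter) "),
)
```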
2.2.2 Fundamentals of the Planning and Design of Order Picking Systems
Warehouses mainly compete in terms of response time to customer orders, flexibility and cost. Therefore, Staudt et al. (2015, p. 5531) identify four direct performance indicators that must be considered as goals during the planning of a warehouse: time, quality, cost and productivity. However, the authors also claim that in warehousing research, the cost indicator is most often expressed in terms of the inventory cost of the warehouse. On the operational level, performance evaluation is often based on non-monetary indicators (Gunasekaran and Kobu 2007). In the case of manual order picking, the operational costs are strongly related to the pickers’ travel and picking time (Habazin et al. 2017, p. 60; Koster et al. 2007, p. 486). Furthermore, Staudt et al. (2015, p. 5533) point out that the productivity indicator refers to the overall capacity utilization of the warehouse. When analysing the picking process on the level of an individual human picker, the overall warehouse productivity is out of scope, whereas the individual picker’s productivity can again be reduced to time and quality measures of the picking activity (Battini et al. 2015, p. 483). As a result, for the planning of manual order picking systems, achieving low order lead time and high quality are the most important goals.

In general, Tompkins et al. (1996, p. 437) distinguish five different time-consuming activities influencing total order lead time in order picking, namely picker travelling, searching, picking, setup, and other activities (e.g., documentation, counting, sorting or idle time).6 According to the authors, picker travel time is responsible for the largest share of an order picker’s working time, as pickers spend approximately 50% of the time travelling. Since travel time is defined as the time required for moving from, to and between picking locations, it is directly influenced by the distances within a warehouse (Koster et al. 2007, p. 486).

Searching time, which is responsible for the second largest share of a picker’s working time (approximately 20%) (Tompkins et al. 1996, p. 437), may include both the time needed for finding the rack with the item to pick, and the time needed for finding the item itself within the corresponding rack. This means that part of the searching activity takes place simultaneously while the picker is travelling through the warehouse (Lee et al. 2015, p. 734).
6 A similar subdivision of picker working times can be found in Gils et al. (2018).
In order to clearly distinguish searching time from travelling time, this thesis defines searching time as the time needed for finding an item within a single rack, i.e. after the picker has finished travelling and is located in front of the rack that contains the requested item. According to this definition, searching time can mainly be described as the time for information processing activities, such as reading, searching, and identifying items (Grosse and Glock 2013, p. 857; Tompkins et al. 1996, pp. 435–437). Note that according to Grosse and Glock (2013), searching is a highly cognitive task and searching times are therefore especially impacted by individual characteristics of the picker as well as learning effects.

The third largest share of a picker’s working time, approximately 15%, is taken by the actual picking (Tompkins et al. 1996, p. 437). Picking time in general refers to the time needed by the order picker for reaching and bending towards a specific item location within a rack, grasping and extracting one or multiple items, and subsequently dropping these items into an order bin (Tompkins et al. 1996, pp. 435–436). Note that in order to be more specific, this thesis defines picking time as the time for actually grasping and extracting items from a rack, thus separating picking time from the time needed for reaching towards racks and dropping items into bins. Finally, setup time and time for other activities conjointly consume 15% of a picker’s time and refer to manifold actions such as receiving orders, preparing the picking cart, counting or packaging items or documenting the completion of orders (Tompkins et al. 1996, pp. 435–438). These times will not be given any further consideration as they are not the focus of this thesis.

According to Staudt et al. (2015, p. 5531), quality in order picking refers to punctuality, completeness, correctness of orders, minimum breakage of items and overall customer satisfaction. While punctuality is closely related to the aforementioned picking time, and overall customer satisfaction is out of scope when analysing an individual picker’s performance, completeness and correctness can be directly linked to different types of picking errors that may occur during manual order picking. Brynzér and Johansson (1995) use the term “picking accuracy” when quantifying the occurrence of picking errors in a manual order picking system. In fact, three different errors can be distinguished in multi-order picking systems: A picker can either pick the wrong item, pick a wrong number of items or assign the picked items to the wrong order. The term breakage refers to the number of items that get damaged, for example by being dropped during the picking process.

To reduce searching and picking times as defined above, and to minimize picking errors, the design of individual order picking racks has been identified as an important factor, especially in parts-to-picker systems (Calzavara et al. 2017b, p. 6888). The design of order picking racks yields a surprisingly large number of potential
design parameters, which are in part derived from research in the retail sector: First of all, the horizontal and vertical position of items must be defined (Battini et al. 2016, p. 149; Geismar et al. 2015; Petersen et al. 2005). Then, the available display area (i.e. the visible area of each item) must be considered, because it can directly affect searching times (Abbott and Palekar 2008). Factors like the angle of exposure and bin sizes should also be taken into consideration, as these factors can influence both searching and picking times (Calzavara et al. 2017b, p. 6888; Finnsgård and Wänström 2013). Finally, the colours used for the rack and the items are also important factors influencing searching times, as certain colours can have a distracting effect (Monnier 2011). In fact, Bishu et al. (1991) argue that the colour of items has a significant influence on picking times.
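To make this parameter space tangible, the design factors listed above could be collected in a simple structure such as the following hypothetical sketch; the field names and units are illustrative, not an established model from the literature.

```python
from dataclasses import dataclass

@dataclass
class RackSlotDesign:
    """One item location on a picking rack, described by the design
    factors discussed above that influence searching and picking times."""
    horizontal_pos: int        # column index within the rack
    vertical_pos: int          # shelf level, 0 = floor level
    display_area_cm2: float    # visible area of the item facing the picker
    exposure_angle_deg: float  # tilt of the bin towards the picker
    bin_width_cm: float
    item_colour: str           # e.g. "blue"; colour affects search times

slot = RackSlotDesign(horizontal_pos=2, vertical_pos=1, display_area_cm2=120.0,
                      exposure_angle_deg=20.0, bin_width_cm=30.0,
                      item_colour="blue")
print(slot)
```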
2.2.3 State of Research on the Planning of Order Picking Systems
Past research on warehousing in general has analysed different aspects of warehouse planning, such as warehouse layout designs (Roodbergen et al. 2015), performance evaluations (Gu et al. 2010), decision support models (Gu et al. 2007), operational control of warehouses (Rouwenhorst et al. 2000), workforce planning (Gils et al. 2017), and stochastic models for warehouse operations (Gong and Koster 2011). For order picking in particular, picker routing policies (Masae et al. 2020; Petersen and Aase 2004; Roodbergen and De Koster 2001), storage allocation (Guo et al. 2016; Yu et al. 2015; Chuang et al. 2012), order assignment strategies (including zoning, sorting and batching) (Gils et al. 2018; Koster et al. 2007), as well as the effect of picker blocking in narrow aisles (Franzke et al. 2017; Chen et al. 2013; Pan and Wu 2012), have been investigated. Also, the efficient use of warehousing equipment and picker supporting technology has already been thoroughly covered by researchers (Vries et al. 2016b; Battini et al. 2015; Reif and Walch 2008; Berger and Ludwig 2007). Due to the dominance of fully manual order picking systems in practice and the large amount of pickers’ time spent travelling, previous research on the planning of manual order picking has mainly been done with the aim of reducing picker travel time (Gils et al. 2018; Koster et al. 2007). However, in recent years, human factors have been identified as another important aspect in the context of the planning of manual order picking systems (Grosse et al. 2015a; Grosse and Glock 2015, pp. 882–883; Grosse et al. 2015c). It has been proven that the human-centred design of manual order picking systems can increase both customer satisfaction and warehouse performance in terms of order completion times (Grosse et al. 2017, p. 1260;
Grosse and Glock 2013; Gils et al. 2018, p. 2; Wruck et al. 2017, p. 6453; Battini et al. 2015, pp. 483–483; Berger and Ludwig 2007). Additionally, the rising awareness of the need to reduce the risk of injuries and musculoskeletal disorders among human workers in manual order picking has further increased the importance of this field (Grosse et al. 2017, p. 1260; Calzavara et al. 2017a). Nevertheless, by using parts-to-picker systems, human travel times can be reduced or even eliminated (Boysen et al. 2017, p. 550). Consequently, with the ongoing integration of these systems in practice, other time components such as the reduction of searching and picking times at a single rack are becoming more important in the human-centred planning process (Calzavara et al. 2019, p. 203; Gils et al. 2018, p. 11).

While rack design has gained much attention in the retail sector, where research has focused on reducing the searching times of customers to optimize store revenue (Geismar et al. 2015; Abbott and Palekar 2008), research on different rack layouts for manual order picking is still scarce (Calzavara et al. 2017a, p. 529). An analysis of different rack layouts and item locations can be found, for example, in Battini et al. (2016). The authors focus on the trade-off between picking time and human energy expenditure, finding that aiming for low energy expenditure can also be beneficial in terms of picking times. Similarly, Calzavara et al. (2017a) deal with different rack designs for palletized items and their effect on ergonomic measures and costs, which are in part influenced by picking times. They recommend using half-pallets on higher levels and pull-out systems on ground levels to achieve a favourable trade-off between warehousing costs and the ergonomic well-being of human pickers. Research dealing with the effect of item exposure on picking times is provided by Finnsgård and Wänström (2013). The authors investigate the case of an automotive assembly line, finding that packaging type, height, angle and part size have a considerable influence on searching and picking times. The work by Calzavara et al. (2017b) covers the effect of bin sizes and angles on picking times, stating that smaller bins at tilted angles yield the best results. Not a rack but a stationary workplace for a parts-to-picker system is developed by Lee et al. (2016). The authors find that a picking station angled at 20° to 30° makes workers feel most comfortable. Furthermore, the picking station’s height should be between a worker’s eye and waist level.

In summary, this thesis is based on the assumption that, due to semi-automated parts-to-picker systems becoming more popular, searching and picking performed by human workers at single racks will increasingly gain interest in the practical planning of warehouses, as well as in order picking research. For this, VR presents itself as a promising technology, as has been described in the previous sections.
2.2.4 State of Research on Learning and Training in Manual Order Picking
In the field of logistics, the integration of learning effects has been investigated for vehicle routing models (Zhong et al. 2007), supplier selection (Glock 2012), and inventory models (Kazemi et al. 2015). The negative effect of worker fatigue in a logistics process is covered by Glock et al. (2019a). By analysing a packaging process, the authors are able to give recommendations on cost-optimal box sizes and work schedules to ensure a maximum fatigue level is not exceeded. Neumann and Village (2012, p. 1147) recommend considering learning effects during the human-centred design of manual working systems.

For the planning of manual order picking in particular, the effect of worker learning has only gained interest in recent years (Grosse and Glock 2015, p. 889), even though researchers agree that pickers being familiar or unfamiliar with storage locations can influence picking times, especially in small picking zones (Chuang et al. 2012, p. 1175; Zelst et al. 2009, p. 629; Koster et al. 2007, p. 489; Jane and Laih 2005, p. 491). One of the first studies considering learning effects in manual order picking has been provided by Bishu et al. (1991) and Bishu et al. (1992). They find that learning has a significant influence on searching times in order picking, even though it mainly takes place during the first 50–100 picks of an order picker. By performing a case study in a company that introduced a new material-handling system, Chakravorty (2009) is able to observe experiential learning in a parts-to-picker system. Grosse and Glock (2013) investigate learning effects of three newly employed order pickers in a manual order picking environment. They find that learning effects can be observed; however, the size of the effect differs noticeably for each picker. Moreover, the authors calculate different learning curve models based on the data obtained in order to evaluate which model is best suited for the application in manual order picking. This research is extended by Grosse and Glock (2015). Here, the authors highlight the importance of learning effects during a planning process, as learning can have a major influence on order picking times. Especially the design of picking zones can be improved if learning effects are considered. The potential impact of learning and forgetting on storage assignment decisions is investigated by Grosse et al. (2013). The authors are able to show the strong effect learning and forgetting can have on the potential benefits and pay-off times of storage reassignments in manual order picking. Again, this highlights the importance of human learning for the planning of manual order picking systems.

In general, the existing literature clearly shows that learning effects play an important role in order picking and should therefore be considered in the human-centred planning of an order picking system. According to Glock et al. (2019b), the majority
of research on learning curves focuses on mathematical modelling and theoretical aspects, while empirical and experimental studies are much scarcer. Hence, applying the existing learning curve models to empirical data in order to compare learning effects in virtual and real order picking can be considered a valuable contribution to research.
2.3 State of Research on the use of Virtual Reality in Manual Order Picking and Specification of the Research Gap
In the context of logistics and manual order picking in particular, the number of studies employing VR is relatively small. Battini et al. (2018) combine an immersive VR HMD with a motion capturing system and a heart rate monitor in order to develop a system for the human-centred design of workplaces for order pickers, highlighting the potential of the technology in this context. However, their work does not provide an evaluation of the system. To the author’s knowledge, only the publications by Wulz (2008), Reif and Walch (2008), and Kozak et al. (1993) can be found that evaluate the use of VR for simulating manual order picking.

Wulz (2008) develops a complex CAVE system to display a virtual picking environment, consisting of several computers, a treadmill and specialized gloves for grasping items. Using this setup, the author conducts a study with 18 participants. The results show that order picking in VR takes longer compared to picking in a real environment, especially because the times for grasping items and downtimes are longer in VR. However, there is a much smaller difference between the virtual and the real environment with respect to walking times, and participants in VR show higher motivation. Nevertheless, the focus of the work by Wulz (2008) is not primarily on the evaluation of VR usability, but on the construction and design of the CAVE system for simulating manual order picking. This shows the high complexity and demands that arise when trying to simulate manual order picking using a CAVE system.

Reif and Walch (2008) present an experimental study with 17 participants comparing virtual to real-world order picking, also using a CAVE system for the VR display. Their results are similar to the findings of Wulz (2008), claiming that even though order picking in VR is more motivating, significantly longer times are needed in comparison to picking in a real environment.
The only application of VR training to a task similar to manual order picking can be found in Kozak et al. (1993). They analyse the transfer of learning effects in a picking and placement task, comparing a virtual and a real environment. In a study with 21 participants in total, they are not able to find a significant difference in real-world task performance between participants that were trained in VR and non-trained participants. Moreover, participants trained in a real environment perform significantly better than participants that received training in VR. It is worth noting here that the authors already use an HMD to display the VR. The results thus reject the usability of VR for training manual order picking and therefore differ from the previously cited studies, which argue that VR training of manual tasks performs at least as well as traditional training methods.

At first sight, the results of these studies seem to reject the usability of VR for planning and training in the context of manual order picking. However, all three studies were published before the introduction of contemporary HMDs. Wulz (2008) and Reif and Walch (2008) therefore use CAVE systems to display VR. While being the only study using an HMD, the work of Kozak et al. (1993) dates back to the year 1993. For modern HMDs, their results might not be valid any more due to the improved performance of the technology in terms of providing immersion. Repeating the evaluation of learning effects in VR in manual order picking with a modern HMD is thus advisable. Moreover, the number of participants is relatively small in all studies, thus limiting the explanatory power of the results. Hence, to fully evaluate the usability of VR for planning and training in manual order picking, it appears advisable to repeat these experiments with an advanced HMD and a larger number of participants.

Based on the review of the available literature on VR technology and manual order picking presented in this chapter, the research gap can be further elaborated. Figure 2.1 extends the previously presented Figure 1.1 on page 5 accordingly. In summary, the literature has shown that human performance in manual order picking relevant for planning and training purposes is mainly expressed in terms of time (i.e. travel time, searching time, and picking time) and quality (i.e. picking errors and dropped items). However, when simulating manual order picking in VR using contemporary, consumer-level HMDs, the four major limitations shown in the figure might restrict the transfer of results from the VR to the real environment. It therefore remains unclear whether manual order picking can adequately be simulated in VR.
Figure 2.1 Extended illustration of the research gap (based on Figure 1.1)
To further evaluate the usability of VR HMDs in the context of manual order picking, the remainder of this thesis will analyse whether individual human activities in the context of manual order picking can be simulated in VR in such a way that they yield similar results in terms of human performance compared to the equivalent real-world task.
3 Systematic Literature Review of Previous Studies that use Virtual Reality Head-Mounted Devices for Simulating Manual Activities
The aim of this chapter is to answer RQ 1 and provide the theoretical basis for the experimental study presented in the following chapters. Therefore, this chapter analyses the available literature on the comparison of manual activities in VR using an HMD and in a real environment. Based on the findings from previous studies, this chapter then evaluates which activities of manual order picking can be simulated using a VR HMD. For the sampling and the analysis of the literature, a systematic approach has been chosen because it ensures rigour and replicability. In contrast to traditional literature reviews, systematic literature reviews also help to avoid bias (Tranfield et al. 2003). They are, therefore, widely used in business and management research (Hochrein et al. 2015). Note that the methodology used in this thesis is based on the guidelines provided by Kitchenham and Charters (2007) and Seuring and Gold (2012), who elaborate a systematic review approach for qualitative science based on predecessors from medical research.
The chapter is structured as follows: First, a tertiary analysis of previously published literature reviews in the context of VR is given in order to establish the frame of reference for this thesis and highlight the need for the additional review provided here. The second section of this chapter describes the development of a framework which is used for the subsequent analysis of the identified research articles. The sampling process, including the searching for and filtering of research articles, is described in the third section. The fourth section of this chapter provides the content analysis of the identified sample of research articles. The fifth and final section concludes this chapter.
Electronic supplementary material The online version of this chapter (https://doi.org/10.1007/978-3-658-34704-8_3) contains supplementary material, which is available to authorized users.
3.1 Tertiary Analysis of Previously Published Literature Reviews
In order to highlight the need for a systematic literature review, Kitchenham and Charters (2007, p. 7) recommend first conducting a tertiary analysis of existing literature reviews in the field of interest. In fact, several literature reviews on the use of VR in industrial applications can be found:
• Kim et al. (2013) focus on the use of VR in the built environment, identifying a total of 150 articles. Among these 150 articles, the authors find only 19 papers that provide a comparative evaluation of the effectiveness or the usability of VR. Moreover, as their literature review dates back to the year 2013, none of these 19 studies uses an immersive HMD to display VR.
• Berni and Borgianni (2020) provide a literature review on the general use of VR in a design process. Their sample of 86 articles contains studies using different technologies for the VR display. However, with 26 articles, the largest number of studies use an HMD. The authors thus conclude that HMDs have increasingly gained interest in recent years and are commonly used in research on virtual prototyping and product evaluation.
• Coburn et al. (2017, p. 11) also analyse the available literature on the use of VR in design processes but focus specifically on consumer-level HMDs. They do not provide a systematic review of the literature. However, they conclude that only a small part of the available studies on VR HMDs provide a validation or comparison of VR compared to traditional tools and methods.
• Choi et al. (2015) review the available literature on the use of VR in manufacturing. By finding a total of 154 relevant studies, they are able to highlight the potential of the technology in this specific area of application. However, only 7 articles in their sample use an immersive HMD system.
• Cecil and Kanchanapiboon (2007) provide an overview of the use of VR in the product design and prototyping process. By doing so, they are also able to prove the impact of VR technology on future product development processes. As their review was published before the development of modern, consumer-level HMDs, they have not yet covered the effect of this technological advancement on research in their field.
Medicine is another field of application in which VR technology already plays an important role, especially in the context of training. In this field alone, a large number of systematic literature reviews can be found:
• Kim et al. (2017) provide a review on the use of VR and AR in plastic surgery. They classify the available literature into three groups: preoperative planning, navigation and training. This proves that also in a medical context, planning and training are among the most important fields for the application of VR technology.
• Laver et al. (2015) analyse previously published studies on the use of VR for the rehabilitation of stroke patients. The authors conclude that even though most studies provide a comparison between VR and traditional training methods, most results are of low quality because sample sizes in the studies are often small and control groups are either not considered at all or formed in an inadequate way.
• Buckley et al. (2014) present a literature review focusing specifically on the transfer of training effects obtained from surgical simulators to real-world applications. They identify sixteen relevant studies with a total of 309 participants. Their results justify the use of simulations for surgical training, as operating times decreased and technical skills increased after training. However, it must be noted that not all studies in the sample employ immersive VR. Instead, the sample also includes studies using other simulation methods.
Finally, some literature reviews can be found with a main focus on the comparison of VR and real environments:
• Cardoso and Perrotta (2019) analyse and categorize the available technology for simulating natural and real-world equivalent locomotion in VR. They conclude that the simulation of realistic ways for users to interact with the VR is vital for the success of VR applications.
• Lin and Woldegiorgis (2015) review the available research on stereoscopic displays for depicting VR in general, and especially the effects of perception and interaction. They reveal that contemporary VR displays have a number of limitations, especially concerning the underestimation of sizes and distances in VR compared to the real world.
• A systematic literature review provided by David (2019) aims at identifying all articles in peer-reviewed journals that provide a comparison of human performance in VR and in a real environment. The author finds a total of 20 studies that conduct a direct comparison between a virtual and a real environment. However, the literature review excludes a relatively large number of papers, e.g., all research dealing with the use of VR for training purposes.
In summary, the assessment of the aforementioned literature shows that reviews focusing on VR HMDs and on the comparison of human performance in a VR environment and a real environment are surprisingly scarce in relation to the large and increasing body of literature on VR in general. In fact, neither literature reviews that focus on the usability of VR HMDs for planning and training nor literature reviews that analyse manual activities applicable to manual order picking can be found. Therefore, conducting an additional literature review with these two aspects in focus is considered advisable.
3.2 Framework for the Content Analysis of the Literature Sample
3.2.1 Defining Manual Activities in Order Picking
In order to analyse the content of the literature, a framework has first been developed using the state of research on VR and manual order picking. Since this thesis focuses on the human activities in manual order picking, the activities in a typical order picking process which are relevant to this thesis must first be defined. As has been described in section 2.2.2, three main time components have been identified as the performance measures in manual order picking relevant to this thesis: travel time, searching time, and picking time (see also Figure 2.1). These time measures refer directly to the three corresponding human activities in manual order picking: travelling, searching and picking (Tompkins et al. 1996, p. 436). However, analysing the available research with a focus exclusively on these three activities would limit the body of literature to studies with an unambiguous reference to manual order picking, yielding only a very small usable sample (Görge 2020). To provide a broader analysis of the literature, the three activities have therefore been further divided into eight related factors (labelled A to H), as depicted in Figure 3.1.
Figure 3.1 Human activities in manual order picking relevant in the context of this thesis
This subdivision, which will be explained in detail below, is mainly based on the analysis of critical human factors in manual order picking provided by Grosse et al. (2015a, p. 701). However, it also includes additional aspects from the relevant literature, thus forming the final list of factors relevant to this thesis. As has been pointed out in section 2.2.2, order picking is most often performed manually. This means that human pickers travel to and from picking locations mainly by walking (Grosse and Glock 2015, p. 883). According to O’Connor and Kuo (2009, p. 1411), balance control plays an important role during human walking. As balance can be influenced by visual perception (Kelly et al. 2019, p. 2; O’Connor and Kuo 2009, p. 1411), it is considered an important factor when analysing walking in VR. Orientation is another activity performed by pickers while travelling within a warehouse. It refers to pickers navigating, i.e. finding their way through the warehouse, remembering routes, and finding aisles and picking locations within the warehouse (Grosse and Glock 2015, p. 885; Grosse et al. 2015a, p. 701). Navigation can, therefore, also be associated with searching for items within a warehouse. In summary, travelling inside a warehouse has been divided into the activities of walking, keeping balance, as well as navigation and orientation.
In general, searching is to a large part a cognitive activity (Lee et al. 2015, p. 734; Grosse and Glock 2015, p. 882) that requires remembering locations and items (Grosse et al. 2015a, p. 701). Ultimately, the actual identification of items is also part of the searching process. Thus, the searching activity has been divided into cognitive and memory tasks and the actual searching for and identifying of items.
Finally, according to Grosse et al. (2015a, p. 701) and Tompkins et al. (1996, p. 436), the actual picking of items can be divided into stretching, bending and reaching for items, as well as the actual extracting, grabbing and picking of items, i.e. the manual interaction with the items. However, effective human interaction with objects requires a good perception of space (Lin and Woldegiorgis 2017, p. 67). As outlined in section 2.1.2, spatial perception can be an issue in VR. Spatial perception is especially relevant during manual order picking, and it also plays an important role during picker travelling, because the picker must perceive and understand the warehouse layout (Grosse et al. 2015a, p. 701). Spatial perception could thus also be assigned to picker travelling. In summary, the activity of picking items has been divided into stretching, bending, and reaching, manual interaction with objects, and spatial perception.
The examples of navigation and orientation and spatial perception show that the allocation of activities depicted in Figure 3.1 is not always straightforward and free of overlaps. However, this is not considered problematic, because it is not the allocation but the identification of all relevant human activities in manual order picking that is relevant to this thesis. Note that Grosse et al. (2015a, p. 701) list further human activities in manual order picking, for example psychosocial influences or activities during the set-up of the order picking process. As these are assumed irrelevant in the context of this thesis, they are not given any further consideration.
3.2.2 Development of a Framework for the Content Analysis
For the content analysis of the literature, a framework has been developed based on theoretical considerations. Following the suggestions made by Seuring and Gold (2012, p. 552), different items are defined for the content analysis. The final framework, which will be explained in detail below, is depicted in Figure 3.2. The figure shows that the literature is analysed on three successive levels, dealing with the context, the content and the comparison between VR and the real environment as provided by the literature. Each level contains a number of different items for the analysis with various manifestations for each item. On the first level, the literature is analysed in the context of this thesis. This means that the first item on this level (I1) distinguishes if a publication covers the field of training or not. The second item (I2) describes the activities in the context of manual order picking that are covered by each sample article. Note that the manifestations available for I2 are defined by the activities identified in section 3.2.1 (see also Figure 3.1). On the second level, the actual content of the literature is reviewed. Neuman (2014, pp. 63–65) points out that empirical research almost always relies on abstract concepts to describe real-world elements. For this reason, the first item on the second level (II1) analyses whether a low, medium or high level of abstraction from a corresponding real-world application is employed in the studies at hand.
Moreover, virtual environments can range from very accurate and multi-sensory representations of the real world to very simplified visualisations (Portman et al. 2015, p. 378). This means that for research in the field of VR, it is also important to consider the level of equivalence when comparing the virtual model to its real-world reference model. VR models can either try to replicate a real-world model exactly, or provide a similar or even totally different environment for the analysis. This aspect is covered by the second item of the second level (II2). The following two items on the second level focus on the data that is used for the comparison between the virtual and the real environment. According to Neuman (2014, p. 46), data gathered in experimental studies can either be quantitative or qualitative. Therefore, the third item on the second level (II3) distinguishes between studies using quantitative data and studies relying on qualitative data for the comparison. Furthermore, Neuman (2014, pp. 47–49) points out that different methods used for the data acquisition can be distinguished. Because the focus of this literature review lies on empirical studies, it is limited to research articles that use experiments for collecting VR data. The real-world data used for the comparison, however, can also originate from previously published research and statistics or from conducting surveys that collect subjective information of participants (Neuman 2014, pp. 47–49). Therefore, the fourth item on the second level (II4) is used to analyse whether real-world data is gathered by performing an individual experimental study or by analysing secondary research or subjective feedback.
Figure 3.2 Framework developed for the content analysis of the literature
The third level focuses on the results found in literature, i.e. if activities performed in VR yield consistent or different results compared to equivalent activities performed in the real world. Thus, only one item (III1) with these two manifestations can be found on the third level. In summary, the framework used for the content analysis as shown in Figure 3.2 makes it possible to distinguish a total of 23 different manifestations of seven different items.
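To make the structure of the framework concrete, it can also be written down as a simple data structure. The following minimal sketch in R (the language used for the statistical analyses later in this chapter) is for illustration only; the manifestation labels are paraphrased from the descriptions above rather than taken verbatim from Figure 3.2. Counting the entries confirms the total of 23 manifestations across seven items:

# Sketch of the content analysis framework; manifestation labels are
# paraphrased from the text, the original labels are shown in Figure 3.2.
framework <- list(
  I1   = c("no training focus", "training focus"),
  I2   = c("A walking", "B keeping balance", "C navigation and orientation",
           "D cognitive and memory tasks", "E searching and identifying items",
           "F stretching, bending and reaching", "G manual interaction",
           "H spatial perception"),
  II1  = c("low abstraction", "medium abstraction", "high abstraction"),
  II2  = c("exact replication", "similar environment", "different environment"),
  II3  = c("quantitative data", "qualitative data"),
  II4  = c("own experiment", "secondary research", "subjective feedback"),
  III1 = c("consistent results", "different results")
)
length(framework)        # 7 items
sum(lengths(framework))  # 23 manifestations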
3.3 Methodological Approach: Searching and Sampling the Literature
The process leading to the final literature sample consists of several steps: First, a database search has been performed using a list of keywords. Second, the literature found in the databases has been reviewed systematically and repeatedly using predefined inclusion and exclusion criteria to identify and remove irrelevant research from the sample. These steps will be explained in detail in the following sections.
3.3.1 Keywords and Database Search
In order to access a wide and heterogeneous set of literature, two different databases have been used, namely Web of Science and Business Source Premier by Ebsco Host. According to Chapman and Brothers (2006, p. 61), the combination of these two databases yields the best results in terms of journals covered in the fields of management and information systems research. As this thesis focuses on the use of HMDs, the keywords “Virtual Reality” and “Head Mounted Device” (or “Head Mounted Display”) and their abbreviations “VR” and “HMD” were combined for the database search. These keywords produced a total of 805 hits categorized as journal articles in Web of Science. In order to also include articles that do not use the term HMD but the brand name of the device used in the research, the two terms “HTC Vive” and “Oculus Rift” were also added to the list of keywords. These were chosen because the HTC Vive, the Oculus Rift, and the Sony Playstation VR were identified as the HMD brands with the highest market share in 2018 (Statista 2020). However, the keyword list already contained the word “VR”, so the brand name “Sony Playstation VR” did not need to be included. In summary, an additional 129 articles were found in Web of Science by including the terms “HTC Vive” and “Oculus Rift”. Hence, the final search term used for the database search was as follows:
(“Virtual Reality” OR VR) AND (“Head Mounted Device*” OR “Head Mounted Display*” OR HMD* OR “HTC Vive” OR “Oculus Rift”)
The final database search was conducted on June 22nd, 2020, yielding a total of 934 results in Web of Science and an additional 129 results in Business Source Premier. The database search was configured in such a way that only peer-reviewed journal articles were selected. Conference proceedings were thus not considered in the database search.
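As an aside, the assembly of the search term from the two keyword groups can also be reproduced programmatically. The following R snippet is a minimal sketch for illustration only; the grouping of the terms mirrors the description above:

# Minimal sketch assembling the boolean search term from the keyword groups.
vr_terms  <- c('"Virtual Reality"', "VR")
hmd_terms <- c('"Head Mounted Device*"', '"Head Mounted Display*"',
               "HMD*", '"HTC Vive"', '"Oculus Rift"')
or_group <- function(terms) paste0("(", paste(terms, collapse = " OR "), ")")
search_term <- paste(or_group(vr_terms), "AND", or_group(hmd_terms))
cat(search_term)
# ("Virtual Reality" OR VR) AND ("Head Mounted Device*" OR ... OR "Oculus Rift")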
3.3.2 Inclusion and Exclusion Criteria and Sample Generation
The database search described above produced a total of 1,063 journal articles from both databases. In order to answer RQ 1, the aim is to analyse all articles that use an immersive HMD when comparing order picking related manual activities in VR and a real environment. Therefore, the total set of articles has carefully been reviewed and filtered using certain inclusion and exclusion criteria in order to remove all articles that are considered irrelevant. The complete set of inclusion and exclusion criteria is given in Table 3.1, which will be described in detail below. First of all, only journal publications in the English language published before the search date are considered in this literature review. Note that the publication date as well as the type of publications are restricted by the time and the settings of the database search, as described above. Also, duplicate articles have been removed. Moreover, only primary research is considered, i.e. literature reviews have been omitted. On the level of the articles’ content, as a first step, all research that does not use an immersive VR HMD has been removed. This was necessary even though the keywords were clearly directed at research using VR HMDs because the database search still produced some articles focusing solely on mixed reality, desktop-VR or AR systems. Second, all research that is not focused on the VR itself but uses VR only as a method for other, unrelated research has been removed. Third, research that focuses on hard- or software development or the development of algorithms and methods in the context of VR has been excluded. Fourth, all research unrelated to activities associated with manual order picking has been removed. To do so, the set of activities defined in Figure 3.1 has been used: if not even one of these eight activities and tasks was covered by the research article, the article has been excluded. Finally, all research that does not provide a comparison between a VR environment and a real environment has been excluded. This also includes articles that provide a comparison of different VR systems or between an immersive VR system and a desktop-VR system but provide no reference to a real environment.
Table 3.1 Inclusion and exclusion criteria used to filter the set of articles and produce the final sample
Note that the exclusion criteria were checked for each article in the order given in Table 3.1. This means, for instance, that some of the articles that were removed because no reference to manual order picking was found might also lack a comparison between a VR and a real environment but were not further checked for this aspect. The filter process for generating the final sample of articles has been conducted in three steps as depicted in Figure 3.3. In each step, selected articles were excluded from the previous sample, leading to a total of three different samples, named A, B, and C. Thus, A ⊃ B ⊃ (C − snowball search). The figure also lists the number of articles excluded in each step due to each of the exclusion criteria specified in Table 3.1.
Figure 3.3 Process of generating the literature samples including the number of papers excluded in each step (note that the percentage values are calculated in relation to the number of articles in the previous sample in each step)
In the first step, the title and abstract of each article were analysed in order to remove all articles based on non-content related exclusion criteria. This includes duplicate articles, inaccessible articles, reviews (i.e. non-primary research articles), and articles that are not in English. Also, all articles that do not focus on immersive VR HMDs were excluded in this step. Overall, 23% of the articles were removed from the 1,063 articles found by the database search. The resulting sample A, with 815 articles in total, is the first sample used for the upcoming content analysis, as it only contains articles using a VR HMD according to the definition relevant to this thesis (see section 2.1.1). In the second step, which is also based on the assessment of title and abstract, 80% of the articles in sample A were excluded based on content-related exclusion criteria, leading to sample B. The remaining 159 articles in this sample were then used for a full-text review. Again, unrelated articles (65% of sample B) were removed in the third step. However, six papers were added in this step based on a snowball search. This leads to the final sample C, consisting of 61 articles that use an immersive HMD for comparing order picking related manual activities in a VR environment and a real environment. A complete list of all articles in samples B and C can be found in Appendices A and B in the electronic supplementary material. The full dataset, including information on all articles in the process, is publicly available in Knigge (2020d).
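The ordered checking of the exclusion criteria can be understood as a first-match-wins screening pass over the article set. The following R sketch illustrates this logic under simplifying assumptions; the data frame, its columns and the criterion functions are hypothetical stand-ins, as the actual screening was performed manually:

# Illustrative sketch of the ordered screening: the first matching exclusion
# criterion is recorded and later criteria are not checked for that article.
articles <- data.frame(
  id              = 1:4,
  duplicate       = c(FALSE, TRUE, FALSE, FALSE),
  uses_hmd        = c(TRUE, TRUE, FALSE, TRUE),
  picking_related = c(TRUE, FALSE, TRUE, TRUE)
)
criteria <- list(  # checked in the order given in Table 3.1
  "duplicate"                     = function(a) a$duplicate,
  "no immersive VR HMD"           = function(a) !a$uses_hmd,
  "no reference to order picking" = function(a) !a$picking_related
)
screen_article <- function(a) {
  for (name in names(criteria)) {
    if (criteria[[name]](a)) return(name)  # first match wins
  }
  "included"
}
articles$decision <- sapply(seq_len(nrow(articles)),
                            function(i) screen_article(articles[i, ]))
print(articles[, c("id", "decision")])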
3.3.3 Discussion of the Sample Generation Process
The relatively large number of articles found by the database search indicates that VR using HMDs is indeed a relevant topic among researchers. Also, the selection of keywords appears suitable to access a wide body of literature. Moreover, the fact that only 83 papers in total (8% of the articles found in the database search) were removed because no immersive VR HMD system had been employed also supports the selection of keywords. Using two databases for the article search has also proven to be a valid approach: after the removal of 77 duplicate articles, Business Source Premier added an additional 52 articles to the sample (40% of the 129 articles). Bearing in mind the content of the research, it was found that a total of 279 articles focus on the development of hard- and software and were thus excluded. The large number of articles in this category is not surprising because HMD technology has advanced significantly in recent years, opening up numerous new possibilities for further technical developments (Coburn et al. 2017, p. 11). Additionally, 333 articles in total were removed because no reference to activities related to manual order picking was found. This reflects the large variety of potential applications covered by current VR research. Of the remaining articles that do cover activities associated with manual order picking, a total of 148 had to be removed because a comparison between a VR environment and a real environment was lacking. This is another interesting finding, as it confirms the previous statement by Eastgate et al. (2015, p. 357) that research on the comparison between VR and real environments is still under-represented in literature.
3.4 Analysis of the Literature Samples
This section provides an analysis of the three distinct literature samples generated in the previous section. First, a quantitative analysis is performed, followed by a content analysis using the framework developed in section 3.2.2. The initial step is to analyse the year of publication and the research field of each article. This has been done for all three samples as this information is generally available in the title and abstract. Second, an overview of the HMD system used for the research and the number of participants in each study is given. To gain this information, full text reviews were necessary. Therefore, this analysis is only provided for samples B and C. The actual content analysis using the framework is only performed for sample C, as this sample has been exclusively generated to include all papers that are relevant
for the content analysis. Table 3.2 gives an overview of the analyses performed for each literature sample.
Table 3.2 Overview of the quantitative and content analyses performed on the distinct literature samples
3.4.1 Quantitative Analyses
Results
Figure 3.4 gives an overview of the year of publication of each article. As can be seen, the earliest article in sample A dates back to the year 1993 (sample B: 1996; sample C: 2001). As already mentioned, for the year 2020, only articles published before June 22nd (i.e. in the first half of the year) were considered. However, with 111 articles in sample A (sample B: 26 articles; sample C: 10 articles), the number is already approximately half the number of articles published in 2019 (sample A: 216 articles; sample B: 48 articles; sample C: 15 articles).
Figure 3.4 Number of articles published in each year in each sample (* for 2020, only articles published before June 22nd were considered)
During the review process, each article was classified according to its primary field of research. Note that in order to assess the research field, not only title and abstract were taken into account, but also the name of the journal in which the article was published. Each article was either assigned to one of the previously defined research fields, or, if no previously defined research field appeared to fit, a new research field was defined. In this way, a total of 16 different research fields could be identified in the literature sample. The total number of articles in each research field is given in Figure 3.5.
Figure 3.5 Primary research field of the articles in each sample
For all the articles that were analysed based on the full text (i.e. samples B and C), the HMD system used for the research was identified. Figure 3.6 gives the total number of articles for each HMD system. Note that for reasons of clarity, Figure 3.6 only lists the brand name of the HMD system but not the actual model used.
Moreover, articles that used multiple VR hardware systems were summarized in one category labelled “Multiple VR systems”. HMD systems that were only found in one article within sample B (19 HMD systems in total) were summarized in the category “Miscellaneous”. This includes the Sony Playstation VR system, which was only used by one article in the sample. Note that 13 articles in sample B do not use a specific HMD system or do not provide any information on the HMD system.
Figure 3.6 HMD systems used by the articles in samples B and C
For each article in samples B and C that presents an experimental study, the number of participants has also been analysed. Figure 3.7 shows violin plots illustrating the results. Note that of the 159 articles in sample B, only 148 conduct an experimental study, whereas in sample C, all articles contain an experimental study. Also note that the number of participants depicted in the figure is the sum of participants in all groups of all experiments described in the corresponding article. This means that if multiple experiments were described in one article or if participants were divided into different experimental groups, the number of participants per experiment or per group is actually lower. This has been done to ensure comparability. Furthermore, a slightly overestimated number of participants is considered more informative than an underestimated number. However, if an article states that the same participants participated in multiple experiments, they were only counted once. Furthermore, if an article provided any information on the number of drop-outs, only those participants whose data was actually used in the analysis were counted.
Figure 3.7 reveals that the number of participants ranges from 3 to 260, with a median of 28 participants in both samples. The third quartile lies at 42 (sample B) and 40 (sample C) participants.
Figure 3.7 Total number of participants in experimental studies in samples B and C (Sample B: n=148; Sample C: n=61)
Discussion
With regard to the year of publication, it is noteworthy that the number of articles increased significantly after the year 2014, rising to a peak in 2019. This clearly supports previous findings from the literature (e.g., Berg and Vance 2017) stating that the introduction of new consumer-level HMDs has sparked the usage of HMDs in VR research, even though HMDs were available before that time. With the number of articles for the year 2020 following a similar trend, the interest of researchers in using VR HMDs is still growing. Furthermore, the previous statement that HMDs are used in many different fields is also confirmed by the quantitative analysis. The results show that using VR HMDs
is most popular in psychology and medicine. Given that the technology is relatively new, it is not surprising that a large number of articles also originate from the disciplines of hard- and software development. Yet, it is also understandable that these disciplines are less common in the final sample C, as hard- and software development are mostly unrelated to manual order picking. Instead, a large portion of articles from the field of ergonomics in sample A can also be found in sample C. This is because manual order picking is a process in which ergonomics play an important role (Grosse et al. 2017, p. 1272). Moreover, it is striking that the number of articles from the fields of manufacturing, as well as retail and marketing, is fairly low, even though the technology is already used in practice by marketers (Cowan and Ketron 2019, p. 1586). Apparently, scientific research on the use of VR HMDs in these fields lags behind practical applications. The analysis of the HMD hardware used throughout the articles revealed that the Oculus Rift and the HTC Vive are by far the most popular systems among researchers. It can thus be concluded that their introduction to the market has caused the aforementioned increase in the number of research articles after the year 2014. Although currently the highest selling VR system on the consumer market (Statista 2020), the
Sony Playstation VR system seems to be surprisingly unpopular among researchers, as only one article using this system was found in samples B and C. Finally, the analysis has revealed that most experimental studies employ only relatively small sample sizes. This is surprising, as some studies have found a significant influence of human characteristics such as gender or age on the performance in VR (e.g., Juliano and Liew 2020; Chang et al. 2019). Due to the small sample sizes, results of most studies could thus be severely flawed and their explanatory power could be limited if the sample was not carefully selected for the purpose of the respective research. Hence, it can be concluded that it is advisable to conduct further large-scale studies in the future in order to adequately evaluate the usability of VR HMDs.
3.4.2 Application of the Content Analysis Framework
Results
The framework developed in section 3.2.2 has been applied to the articles in sample C. To do so, each article has been read carefully and the adequate manifestations of each item of the framework have been assigned. Based on item I1, the articles in
sample C were divided into two groups: articles with no reference to training (39 articles – 64%) and articles focusing on using VR for training purposes (22 articles – 36%). In the following, both groups are analysed individually to account for the different context. For the remaining items, Figure 3.8 gives the number of articles for each manifestation. Note that the manifestations per item are not mutually exclusive, meaning that multiple manifestations could be assigned to one article if they were assumed applicable. Also note that for articles focusing on training, item III1B (different results for the comparison between VR and the real environment) has been further distinguished: Training in VR can either perform better (16 articles – 73%) or worse (4 articles – 18%) than training in a real environment. A detailed overview of the assignment of manifestations to each article can be found in Appendix C in the electronic supplementary material.
Figure 3.8 Number of articles from sample C assigned to each manifestation of each item of the content analysis framework, separated into articles with no reference to training (n=39) and articles with focus on training (n=22)
In order to analyse any interdependence of items and manifestations, a pairwise analysis of manifestations has also been performed. Results are given in Table 3.3 (for articles with no reference to training) and Table 3.4 (for articles focusing on training). For each combination of two manifestations, the tables provide the number of articles that have been assigned both manifestations. Higher numbers in the tables are highlighted by a darker background colour.
Table 3.3 Pairwise analysis of the manifestations assigned to the articles with no reference to training
Table 3.4 Pairwise analysis of the manifestations assigned to the articles focusing on training
For articles without reference to training, the large number of articles that investigate walking activities (I2A) and find different results (III1B) stands out. Therefore, a χ²-test has additionally been performed using R to test for independence between III1A and III1B given I2A. (For all statistical analyses in this chapter, R version 3.6.1 and the software RStudio version 1.2.1335 have been used. The code of the statistical analyses is available in Knigge (2020c).) Independence is rejected at a 10% but not at a 5% level (χ² = 3.27, p = .071). For a low level of abstraction (II1A) and exact replications of reality (II2A), the number of articles finding different results is also considerably higher than the number of articles finding consistent results, although the difference is not statistically significant at a 5% level (II1A: χ² = 2.13, p = .144; II2A: χ² = 2.81, p = .093). Similarly, a large but not significant difference between consistent and different results can also be found for quantitative studies (II3A: χ² = 1.65, p = .199) and experimental studies (II4A: χ² = 2.69, p = .101). Moreover, Table 3.3 also shows that 15 of the 39 articles in this set find both consistent (III1A) and different results (III1B).
For articles focusing on training, the pairwise comparison in Table 3.4 shows that for all activities that were investigated (I2), a larger number of articles finds better results for VR training than consistent or worse results (except for I2H). However, independence can only be rejected at a 5% significance level for manifestations I2D and I2G, i.e. searching and manual interaction with objects (I2A: χ² = 1, p = .607; I2C: χ² = 0, p = 1.000; I2D: χ² = 8.32, p = .016; I2E: χ² = 3.20, p = .202; I2F: χ² = 6.00, p = .050; I2G: χ² = 6.2, p = .045; I2H: χ² = .67, p = .717).
Finally, logit regression models have been fitted, using each manifestation of item III1 as the dependent variable and the manifestations of every other item as independent variables. The models thus try to predict the outcome in terms of item III1 by using the manifestations of the other items. For each manifestation of item III1, an independent model has been estimated because articles could be assigned to both manifestations at the same time. Moreover, an independent model has been estimated for each of the remaining items in order to avoid overfitting. To estimate the regression models, the function glm() in R has been used. Coefficient estimates and p-values of the fitted logit models are given in Table 3.5. The null and residual deviances of each model, providing additional information on the goodness of fit, can be found in Appendix D in the electronic supplementary material.
Table 3.5 Coefficient estimates and p-values of the logit regression models. Note that an independent model has been estimated for each item (I2, II1, II2, II3, II4) and each dependent variable
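To illustrate how these analyses can be set up, the following R sketch shows a χ²-test of independence and one of the logit models. It is a minimal sketch with hypothetical toy data and column names (one logical column per manifestation); the actual analysis code is available in Knigge (2020c), and the χ²-test is assumed here to use R's standard chisq.test() function:

# Toy data: one row per article, logical columns for assigned manifestations
# (column names are hypothetical; see Knigge (2020c) for the actual code).
articles <- data.frame(
  walking    = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE),  # I2A
  abstr_low  = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE), # II1A
  consistent = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE), # III1A
  different  = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)  # III1B
)

# Chi-squared test of independence between consistent and different results,
# restricted to the articles investigating walking (I2A):
walking_articles <- subset(articles, walking)
chisq.test(table(walking_articles$consistent, walking_articles$different))

# Logit regression predicting different results (III1B) from the
# manifestations of one item (here II1); one model is fitted per item:
model <- glm(different ~ abstr_low, data = articles, family = binomial)
summary(model)  # coefficients, p-values, null and residual deviance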
Discussion
The results of the content analysis using the framework in Figure 3.8 show that the majority of articles with no reference to training in sample C focus either on walking activities (I2A) or on spatial perception in VR (I2H). Seven articles also consider stretching, bending and reaching (I2F). However, searching and identifying items
(I2E) and the interaction with objects (I2G) are only covered by three and four articles, respectively. This lack of research is surprising because it stands to reason that searching in VR might be substantially influenced by the capabilities of the VR HMD, e.g., the available field of view (Pastel et al. 2020). Similarly, it has already been pointed out in section 2.1.2 that available VR controllers limit the possibilities of natural interaction with objects and items in VR. For evaluating the use of VR in manual order picking, additional research in these areas is thus recommendable. With regard to training, the comparison of cognitive and memory tasks (I2D) can be found most often in the sample, followed by the searching for items (I2E) and manual interaction (I2G). This is in part due to the large number of articles focusing on the use of VR for the training and education of medical students, in which not only manual skills but also remembering anatomical facts and procedures is important (Pulijala et al. 2018). Even though manual interaction in order picking might not require the surgical precision of interaction in medical applications, this field is better covered by literature than the field of object manipulation in non-training applications. Surprisingly, only a small number of articles can be found on the training of walking (I2A) and navigation (I2C) in VR, probably due to the limitations of VR HMDs to simulate free movement in wide areas (Niehorster et al. 2017, p. 20; Coburn et al. 2017, p. 4).
Figure 3.8 also reveals that most articles use a low level of abstraction (II1A) and accurate representations of the real world (II2A) in VR. This indicates that simulating highly detailed and realistic environments is not a challenge anymore due to the recent advances in display technology and computational power. Also, most studies collect and analyse quantitative data (II3A) and use experiments to generate real-world data (II4). This is therefore considered a valid approach for VR research. Looking at the results of the comparison between VR and real environments (III1), no definite statement can be made for articles with no reference to training or for articles with a focus on training. Although 22 articles in the set of articles with no reference to training (9 in the set of articles with focus on training) find consistent results, 32 articles (20 with focus on training) also find significant differences between the two environments. However, for the field of training in VR, 16 of the 19 articles state that VR yields better results compared to real-world training and only 4 articles find worse results. The analysis of the correlation between context (I2) and content items (II1, II2, II3, II4) and the results (III1) in Table 3.3 shows that the majority of articles in the sample that investigate walking (I2A) in VR encounter difficulties when trying to replicate real-world performance. The inability of HMDs to simulate unrestricted movement in a wide area is thus considered an unsolved issue in VR. The data also suggests that it is more likely to find different results in VR when using a low level of abstraction (II1A). This could be a hint that for some applications, it might be advisable to use more abstract replications of reality to achieve similar results compared to the real world. Table 3.3 also reveals that high levels of abstraction (II1C) are mainly used in articles investigating stretching, bending and reaching (I2F) and spatial perception (I2H). For I2F and I2G, however, the number of articles finding consistent and different results is equal (I2G) or almost equal (I2F). This is an important finding in the context of this thesis because it indicates that for simulating manual order picking it might not be necessary to model a highly realistic environment in VR. In the context of VR training, Table 3.4 shows that especially simulating the manual interaction with objects (I2G) and searching for items (I2D) provide better results compared to real-world training. This is a promising finding, indicating that VR can indeed be used to train searching activities and manual interaction with items in manual order picking. Nevertheless, Table 3.4 also shows that the majority of journal articles on manual interaction relies on qualitative data (II3B) gathered through subjective feedback of participants (II4C). Additional quantitative analyses might thus be advisable.
Finally, the logit regression model for predicting consistent results (III1A) for articles not focusing on training shows that none of the estimated coefficients is significant at a 5% level. The lowest p-values can be found for different environments (II2C), spatial perception (I2H) and walking (I2A). Estimates of these coefficients are positive, except for I2A, meaning that for walking activities, consistent results are less likely. This finding is similar to the result of the pairwise comparison, indicating that realistically simulating walking using contemporary VR HMDs is difficult to implement and thus not recommendable.
For predicting III1B, estimated coefficients for low (II1A) and high levels of abstraction (II1C), exact replications (II2A), quantitative data (II3A), and experimental studies (II4A) are significant. All of the coefficients of these items are positive, making different results more likely to occur. The coefficient estimate of II1A being almost twice as large as the coefficient estimate of II1C adds to the previous findings by showing that not only low levels of abstraction but also highly abstract simulations can lead to different results. Hence, the level of abstraction must be chosen carefully for each specific application. Moreover, the data shows that most articles gathering real-world data from experimental studies and using realistic replications in VR fail to achieve consistent results between VR and the real world. This supports the need for a critical evaluation of VR HMDs specifically for every use case. It thus appears reasonable to conduct another experimental study with the focus on the use of VR HMDs in the context of manual order picking.
For articles focusing on training, the logit regression model for predicting consistent results (III1A) also yields only insignificant coefficient estimates. The lowest p-values can be found for spatial perception (I2H) and manual interaction with objects (I2G), with the latter showing a negative estimate. Similarly, the models predicting better results of VR training (III1B*) also yield mainly insignificant estimates, except for qualitative data (II3B). The positive estimate suggests that evaluating qualitative data makes it more likely to find better results for VR training. It can further be noted that the p-values are high for all manifestations of item I2, implying that the specific activity does not have an influence on achieving better results. This also highlights the need for additional experimental research specifically on the training of manual order picking.
The regression models predicting worse results (III1B**) find significant estimates for a low level of abstraction (II1A) and exact replication (II2A) at a 5% significance level. At a 10% level, the coefficient estimate for experimental studies (II4A) is also significant. All these manifestations have negative coefficient estimates, thus making worse results less likely. In the context of this thesis, these results support the use of an exact and realistic simulation model in an experimental study for the research on VR training.
3.5 Conclusion of the Literature Review
In summary, the systematic literature review provided in this chapter has revealed multiple interesting findings: First of all, the results confirm the previously stated assumption that research articles providing a comparison between VR and a real environment for manual activities that can be associated with manual order picking are relatively rare. Also, most studies that do provide a comparison between VR and a real environment employ only small numbers of participants, potentially limiting their explanatory power. Second, it has been found that only a few articles compare searching activities or the interaction with objects between VR and a real environment. Yet, a relatively large number of articles in the sample compare walking in VR with real-world walking. However, results on walking are often not consistent in the two environments due to the limitations of VR HMDs for simulating unrestricted walking.
As an answer to RQ 1, it is thus theorized that searching and picking activities are most suitable for a human-integrated simulation using VR HMDs. This is in line with the previous assumption that the reduction of searching and picking times will become more important objectives of planning and training in the context of manual order picking. Therefore, these activities will be the focus of the experimental study presented in the subsequent chapters. Moreover, the results from the literature review give reason for using a low level of abstraction and an exact replication of a real order picking setup for the simulation of searching for and picking of items in VR.
Nevertheless, it must be noted that the literature review presented here has some limitations. For example, it has only focused on VR displayed by using HMDs. Analysing studies that use other technology for generating VR might provide additional insights relevant to the research question. Moreover, the literature review has excluded conference proceedings from the literature sample. Since modern consumer-level HMDs have only been available for a few years, the share of research published via conference proceedings is expected to be significantly larger than in peer-reviewed journals. A review including conference proceedings might provide additional findings and reveal current streams of research. Furthermore, the high p-values in the logit regression models suggest that the available data is not sufficient or not suitable to adequately predict the outcome of the comparison between VR and real environments. This means that the results in general are ambiguous and difficult to generalise. Further studies evaluating the usability of VR HMDs for specific applications are therefore considered necessary.
4 Experimental Design for Evaluating the Usability of Virtual Reality for Planning and Training in the Context of Manual Order Picking and Execution of the Study
In literature, numerous procedures for developing a research design can be found (see e.g., Döring and Bortz 2016, p. 27; Brosius et al. 2016, p. 28; Bryman 2012, pp. 8–15, 161; Neuman 2014, pp. 17–18). In this thesis, the procedure suggested by Döring and Bortz (2016, p. 27) has been used as the basis to develop a research design that answers RQ 2 and RQ 3. It also determines the basic structure of this chapter. First, the specification of the general research design leading to a randomized controlled experimental laboratory study is described in detail. Second, the implementation of the research design is presented. The third section of the chapter deals with the operationalisation of the research questions, i.e. the dependent variables are defined. Finally, the execution of the experimental study is explained and an overview of the sampling process is given.
4.1 Specification of the Research Design
4.1.1 Criteria for Quality in Research Designs
According to Bryman (2012, p. 46), three general criteria for the evaluation of quality are most important and should be considered in research design: validity, reliability and replication. These criteria, which will be described in detail below, define the basic prerequisites upon which the study in this thesis has been designed.
Electronic supplementary material The online version of this chapter (https://doi.org/10.1007/978-3-658-34704-8_4) contains supplementary material, which is available to authorized users.
Validity
In general, one can distinguish between internal validity, external validity, construct validity and statistical validity (Neuman 2014, p. 221; Bryman 2012, pp. 47–48; Shadish et al. 2002, p. 38). (Bryman (2012, p. 48) further lists ecological validity as another form of validity. As this is especially relevant for social research, it is not given any further consideration.) Internal validity refers to ensuring nonspuriousness, i.e. a research design must ensure that for an observed effect no other cause (e.g., provoked by systematic errors or bias) than the cause under investigation is applicable (Bryman 2012, p. 47; Shadish et al. 2002, p. 53; Campbell 1957). Overviews of potential threats to internal validity can be found in Campbell (1957), Cook and Campbell (1979, pp. 51–54) and Neuman (2014, pp. 300–303). According to Bryman (2012, p. 52), these threats can be eliminated by using a research design that uses a control group and random assignment of subjects to the groups. Internal validity is therefore especially relevant for the general research design in Section 4.1.2.
External validity ensures that results are generalizable and can be applied to other contexts and settings outside the study design (Jackson and Cox 2013, p. 35; Bryman 2012, pp. 47–48; Shadish et al. 2002, pp. 26–27; Campbell 1957). As with internal validity, Campbell (1957) and Cook and Campbell (1979, pp. 51–54) present a number of potential threats to external validity. According to Croson (2002) and Jackson and Cox (2013), external validity can be ensured by carefully replicating a real situation with the research design and selecting an appropriate sample. Ensuring external validity is thus especially relevant during the design of the laboratory setup (described in Section 4.2.4) and the sampling process (described in Section 4.4.1).
Construct validity, sometimes also referred to as measurement validity, refers to the selection of independent variables and the selection and measurement of dependent variables. Construct validity raises the question whether these variables represent the construct which is under investigation and whether the selection is thus suitable to answer the research questions (Bryman 2012, p. 47; Döring and Bortz 2016, p. 97). Construct validity is therefore especially relevant during operationalisation (Section 4.3), in particular when selecting and measuring dependent variables (Sections 4.3.1 and 4.3.2).
Statistical validity, sometimes also referred to as statistical inference or statistical conclusion validity, refers to the correct choice of descriptive and inferential statistical analyses to investigate the cause-effect relationships between treatment and outcome (Neuman 2014, p. 221; Döring and Bortz 2016, p. 97; Shadish et al. 2002, p. 38). Statistical validity is thus relevant during the analysis of the results, when selecting the statistical methods and tests for performing the inferential analysis (Section 5.2).
Reliability
Reliability refers to the degree to which results of a study are consistent if repeated. This means that measures should yield consistent values when repeated under similar conditions (Bryman 2012, p. 46). It can further be divided into three aspects: stability, internal reliability and inter-observer consistency. Stability is achieved if results are consistent over time. Internal reliability refers to the consistency of the scales or indices which are used. Inter-observer consistency can be affirmed if results and measures are not influenced by subjective judgement, i.e. different observers find consistent results (Bryman 2012, p. 169).
Replication
Replication or replicability refers to the possibility of replicating a study. This factor is closely connected to the transparent documentation of methods and procedures. Allowing other researchers to reproduce a study increases the objectivity of the results (Bryman 2012, pp. 47, 177). Guidelines for the documentation of experimental studies can, for example, be found in Palfrey and Porter (1991) and Davis and Holt (1993, p. 22). Apart from information on the experimental procedures, location and timing, the authors state that descriptions of experimental studies should also include information on the subjects (such as their number, age and previous experience in the field of research) and the recruitment process. Furthermore, information on payment, intentional deception of subjects (if applicable) and the use of practice trials should be provided. Finally, the technology used for the experiment should also be documented. To ensure full replicability, the thesis at hand follows the guidelines presented in Palfrey and Porter (1991) for the documentation of the experimental design and its implementation. Information on the laboratory setup and the technology used in the experiment is therefore given in Section 4.2.4. Information on the recruitment process, the payment of participants, and the time of the experiments is provided in Section 4.4.1. The section also contains a description of the final sample of participants.
4.1.2 Laboratory Experiments as the Research Design to Investigate Manual Order Picking in Virtual Realities
For the development of a research design, manifold variants exist, but there is no universally applicable research design process (Döring and Bortz 2016, p. 182).
Instead, this thesis uses a custom framework based on the classification criteria presented in Montero and León (2007) and Döring and Bortz (2016, p. 183) to develop the research design. The framework consists of five successive decision levels and is depicted in Figure 4.1.4 The specific variants of the final research design used in this thesis are highlighted by a darker background in the figure. The decision on each level will be described in detail below. Note that the framework is not exhaustive; instead, the variants depicted on each level are defined by the decision on the previous level. The decision on each level is based on the criteria for quality in research designs described in Section 4.1.1. Especially internal and external validity are considered key requirements for the research design (Campbell 1957). Besides quality criteria, further factors defining the final research design are the aim and the particular research questions of the study (Bryman 2012, p. 39). Moreover, economic and ethical aspects also play a role in the process of developing a research design (Döring and Bortz 2016, p. 182).
Scientific approach
The first level of the framework in Figure 4.1 defines the basic scientific approach. Quantitative, qualitative and mixed-methods designs can be distinguished as variants on this level. In general, qualitative research takes an inductive approach by collecting data mainly in the form of words, whereas quantitative research follows a deductive approach by using quantitative data, i.e. numbers (Neuman 2014, p. 46; Bryman 2012, pp. 35–36). This means that quantitative data makes it possible to draw inferences with regard to changes in a variable in response to unit changes in another variable, whereas qualitative data does not (Ryan 2007, p. 5). Qualitative research is often used for finding new theories and hypotheses, whereas quantitative research is most suitable for testing existing theories (Bryman 2012, p. 36). Furthermore, qualitative research often employs unstructured or semi-structured data collection methods for a small number of cases, while in quantitative research, data collection is mostly done by structured measurements of selected variables in a large number of cases (Döring and Bortz 2016, p. 184). Therefore, this level is sometimes also referred to as "types of data" or "method of data collection" in the literature (Neuman 2014, p. 26). Finally, the term mixed-methods design refers to a combination of both quantitative and qualitative research (Bryman 2012, p. 628).
4 An alternative classification framework for research design can, for example, be found in Neuman (2014, p. 26). However, that framework is similar to the one used in this thesis, as most of its variants and levels resemble those used here.
Figure 4.1 Classification framework used to develop the research design (based on: Montero and León 2007; Döring and Bortz 2016, p. 183). The variant chosen in this thesis is highlighted on each level
In the study at hand, the theory that VR can be used for planning and training in the context of manual order picking is under investigation. Furthermore, the aim is to conduct a study with many participants (i.e. the investigation of numerous cases) in order to ensure the external validity of the results. This means that a quantitative approach is most suitable for answering RQ 2 and RQ 3.
Subject of the study
The second level in Figure 4.1 defines the subject of the study. On this level, studies can be classified as theoretical, methodological, or empirical. According to Döring and Bortz (2016, p. 186), a theoretical study uses available literature in order to analyse the state of research. A slightly different definition is provided in Montero and León (2007), who claim that theoretical studies only provide analyses of theories and models without using previously collected data. However, both definitions have in common that no data is generated for such studies. The same holds true for methodological studies, which focus on comparing or developing research methods (Döring and Bortz 2016, p. 187). In contrast, it is characteristic of empirical studies to collect data. This can either be original data or
data gathered by replicating a previously published study (Döring and Bortz 2016, p. 186; Montero and León 2007). To answer RQ 2 and RQ 3, neither a theoretical nor a methodological approach is pursued. Instead, a collection of original data is needed. Thus, the subject of the study at hand can be defined as empirical.
Origin of empirical data
The third level in Figure 4.1 deals with the origin of the empirical data. A study can either be experimental, quasi-experimental or descriptive. In experimental studies, the experimenter has control over at least one factor, i.e. at least one independent variable is altered intentionally and systematically in order to investigate a cause-effect relationship with other variables (Ryan 2007, p. 5; Shadish et al. 2002, pp. 1–7, 12). This means that at least two independent measurements must be taken (e.g., in two different groups of subjects) to represent two different expressions of the independent variable. Each expression of the independent variable applied to the subjects is also referred to as an experimental treatment. In some cases, it is advisable to have a group of subjects which does not receive any experimental treatment, i.e. for which the independent variable is left at its default or null value (Jackson and Cox 2013). This group is referred to as the control group; the other groups are called experimental or treatment groups. In an experimental study, subjects are assigned to groups randomly. Such experimental studies are therefore often referred to as true experiments or, if a control group is used, randomized controlled trials (Bryman 2012, p. 51). Quasi-experimental studies are similar to experimental studies, except that group assignment is not performed randomly (Jackson and Cox 2013; Montero and León 2007; Döring and Bortz 2016, p. 193; Ryan 2007, pp. 5, 23; Shadish et al. 2002, p. 12; Campbell 1957). In a descriptive study, an existing population is described, e.g., by performing systematic observation or by using surveys.5 This means that, in contrast to experimental studies, a systematic manipulation of variables is not performed (Montero and León 2007; Jackson and Cox 2013; Döring and Bortz 2016, p. 193).6 For the study at hand, no population exists that can be described. Instead, the effect of performing manual order picking in VR and in a real environment is under investigation. The environment in which order picking is performed (VR or reality) can be regarded as the independent variable of the study, which is manipulated intentionally.
5 For this reason, descriptive studies can also be referred to as observations.
6 Note that Montero and León (2007) further distinguish between descriptive and ex post facto studies. However, ex post facto studies also lack intentional manipulation of variables and therefore fall in the same category as descriptive studies on this level of the framework (Ryan 2007, p. 5).
Participants picking in the real environment can be regarded as the control group. Furthermore, random assignment of subjects to groups can increase internal validity and should – if possible – be pursued in experimental studies (Ryan 2007, pp. 6–7; Bryman 2012, p. 50; Cook and Campbell 1986). As random assignment is indeed possible here, a quasi-experimental design can be ruled out. Thus, a randomized controlled trial has been chosen for this thesis.
Location of the experiment
The fourth level of Figure 4.1 deals with the location of the experimental study. The study can either be a laboratory or a field experiment. Laboratory experiments take place in an artificial environment, enabling full control of the environmental conditions and facilitating the assignment of experimental conditions to subjects (Jackson and Cox 2013; Bryman 2012, p. 55). Laboratory experiments are therefore characterized by a high internal validity (Jackson and Cox 2013). Field experiments take place in the natural environment of the activities under investigation and therefore normally lead to a higher external validity (Bryman 2012, p. 55; Jackson and Cox 2013). However, a natural environment for virtual order picking does not yet exist. As the study at hand aims to provide fundamental research on the general usability of VR in manual order picking and the control of environmental variables is essential, a laboratory experiment has been chosen. Note that laboratory experiments are generally considered a valuable approach for research in the logistics sector (Deck and Smith 2013).
Number of treatments per subject
The fifth level in Figure 4.1 deals with the number of treatments per subject, distinguishing between-subject and within-subject designs. In between-subject designs, each subject of a study receives only one treatment. In contrast, experiments using a within-subject design expose each subject to multiple treatments. Within-subject designs are especially valuable when investigating changes in individual behaviour caused by changes in the independent variable. Moreover, random assignment of subjects to treatment groups is not necessary. Instead, the independence of the treatments must be ensured in order to establish internal validity; otherwise, the results of within-subject designs can be biased by unwanted effects such as learning or fatigue. Furthermore, the order of treatments is an important factor in within-subject designs (Charness et al. 2012; Croson 2002). Between-subject designs have the advantage that the aforementioned effects can be eliminated by balancing causal factors between groups (Montero and León 2007, p. 851). However, between-subject designs can cause a larger variance in the data simply because different subjects are used for the study (Hejtmanek et al. 2020, p. 500). Therefore, the statistical
power of between-subject designs can be smaller, and larger sample sizes are generally needed compared to within-subject designs, as each subject can only receive a single treatment (Charness et al. 2012). For the evaluation of the usability of VR in manual order picking, a between-subject design is preferred in order to exclude within-subject effects. For evaluating learning in VR, however, these effects are of special interest, and Pollard et al. (2020) recommend using within-subject designs when learning effects are under consideration. To resolve the conflict between these two approaches, Charness et al. (2012) suggest using a design that allows both between-subject and within-subject analyses; generally, more information can be extracted from a single sample in such designs. In the study at hand, an experimental design that enables both approaches has therefore been chosen. In fact, a within-subject analysis is used to answer RQ 2.2, whereas between-subject analyses are used to answer RQ 2.1, RQ 2.3 and RQ 3.2.
4.2 Implementation of the Research Design
As has been described in the previous section, a randomized controlled experiment has been chosen for answering RQ 2 and RQ 3. In this section, the implementation of the selected research design, i.e. the laboratory experiment, is described in detail. The research design builds upon the results for RQ 1 presented in chapter three by focusing on picking and searching activities and excluding picker travelling. According to Coleman and Montgomery (1993, p. 2) and Neuman (2014, p. 290), experiments can consist of different elements, such as the different experimental groups (including the control group), the treatments and expressions of the independent variable with its factors and levels, and the selection of dependent variables.7 These elements of the experimental design are thus described successively in this section. First, the general design process is summarized in Section 4.2.1. Then, the experimental groups and the randomization process are presented in Section 4.2.2, followed by a description of the experimental procedures and treatments per group in Section 4.2.3. The laboratory setup and the modelling of the two expressions of the independent variable are described in Section 4.2.4.
7 Neuman (2014, p. 290) lists a total of seven elements, stating, however, that not all experimental designs necessarily consist of all these elements and that some designs might have additional elements. Similar steps in designing experimental studies are suggested by Jackson and Cox (2013, p. 34).
Note that for scientific research in general, and especially for experiments with human participants, it is essential to be aware of ethical principles (Bryman 2012, p. 133; Döring and Bortz 2016, pp. 123–128), such as the principles defined in the Code of Ethics of the American Sociological Association (American Sociological Association 2018). These principles were therefore carefully considered when developing the experimental setup. Additionally, the setup as well as the experimental procedure were presented to and subsequently approved by the Ethics Commission of TU Darmstadt.8
4.2.1 Overview of the Design Process Together with Logistics Managers and Engineers
As stated by Neuman (2014, p. 28) and Döring and Bortz (2016, p. 91), practical relevance is an important aspect of scientific research. In order to ensure the practical relevance of the experimental setup in this thesis, it was developed in two half-day workshops at TU Darmstadt together with professionals from the logistics industry. The workshops were open to all interested participants, and invitations were sent to warehouse operators as well as intralogistics manufacturers via the Verband Deutscher Maschinen- und Anlagenbau e.V. (VDMA), a German manufacturers' association. Manufacturers as well as users of intralogistics technology were identified as the primary users of the results of the study, and their input was therefore considered most important. An overview of both workshops with their contents and participants can be found in Table 4.1. In the first workshop, which took place in May 2017, the general requirements and parameters of the setup were discussed in order to ensure a realistic scenario for the experimental setup. Based on the results of the workshop, a preliminary setup was developed and subsequently tested in a first study with a small number of participants. The results of the first study were published in Elbert et al. (2018).9 Furthermore, the results were presented to the logistics managers and engineers in the second workshop in October 2017, in order to identify and subsequently deal with potential limitations of the setup. Note that the number of participants was smaller in the second workshop, as intralogistics manufacturer 3, which sent four participants
8 The final vote of the Ethics Commission can be found in Appendix E in the electronic supplementary material.
9 Additional preliminary studies have been performed to lay the ground for the development of the final experimental design. The results of these studies can be found in Grau (2020), Albert (2019), and Jolmes (2019).
Table 4.1 Overview of both workshops to develop the experimental design, including participants and their company and position
to the first workshop, only sent one participant to the second workshop. Others, who took part in the first but not in the second workshop, gave time constraints as the reason for not participating; they were informed of the results via a report sent out after the workshop. Both workshops were recorded, and the audio recordings were afterwards analysed closely. The experimental setup that was finally implemented is based on these discussions and will be described in the following sections.
4.2.2 Experimental Groups and Randomization
The environment in which the order picking takes place represents the independent variable with two possible expressions (virtual reality or real environment). As a between-subject design has been chosen for answering RQ 2.1, RQ 2.3 and RQ 3.2, participants were divided randomly into two groups: the first group ("group VR" – virtual reality) had to perform a picking task in VR, while the second group ("group RR" – real rack), which can be considered the control group, had to perform the same picking task in a real environment. As has already been pointed out in Section 4.1.1, the random assignment of participants to experimental groups is a critical factor in between-subject designs. Random assignment eliminates unexpected effects or bias in the obtained data and establishes internal validity (Bryman 2012, p. 47; Croson 2002, pp. 939–940; Campbell 1957; Berger et al. 2018, p. 7; Katok 2012, p. 18; Charness et al. 2012, p. 2). Randomization can thus reduce systematic error and ensure that any differences between the groups are really caused by the different treatments (Jackson and Cox 2013, p. 33; Shadish et al. 2002, p. 13). Furthermore, it is advised to describe the process of randomization in detail in order to ensure transparency (Montero and León 2007, p. 851). In this thesis, randomization was realized by allowing participants to choose one of various time slots for conducting the experiment. Beforehand, each time slot had been randomly assigned either to an experiment including VR or to an experiment solely in the real environment using the sample() function in R. Participants did not have any information on the treatment when choosing their time slot. This procedure ensured that each participant was equally likely to be assigned to either group VR or group RR. In addition, randomly assigning treatments to the available time slots prior to the experiments ensured the generation of equally sized groups. If a time slot remained unused, randomization was repeated for the remaining time slots in order to maintain equal group sizes.
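The described slot randomization can be illustrated with a minimal R sketch. This is an illustration of the procedure, not the original script; the seed and the number of slots are hypothetical:

# Minimal sketch of the slot randomization, assuming 2 * n_slots available
# time slots; treatments are assigned before participants sign up.
set.seed(1)                                   # hypothetical seed
n_slots <- 20                                 # assumed number of slots per group
slots <- sprintf("slot_%02d", 1:(2 * n_slots))

# Draw half of the slots at random for group VR using sample();
# the remaining slots form group RR, yielding equally sized groups.
vr_slots <- sample(slots, size = n_slots)
assignment <- data.frame(
  slot      = slots,
  treatment = ifelse(slots %in% vr_slots, "VR", "RR")
)

# If a time slot remains unused, the randomization is repeated for the
# remaining slots so that equal group sizes are maintained.
head(assignment)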
4.2.3 Experimental Procedure and Treatments
The experiment was scheduled to take a maximum of 60 min per participant and consisted of nine steps. The individual steps are depicted in Figure 4.2 and will be described in detail below. In the first step, each participant was asked to sign a statement of consent covering the collection and usage of their personal data as well as the potential risks and dangers of the study. In the second step, the participants received two pages of instructional
Figure 4.2 Experimental procedure consisting of nine steps
text, which explained the task and procedure.10 Next, the participants were asked to complete a questionnaire recording personal data.11 In the fourth step, participants in group VR were able to test the handling of the handheld controllers used for physical interaction in VR. To this end, they were allowed to perform one or two picks before the start of the actual order picking. In order to minimize the influence on the results, this testing of the controllers was kept as short as possible; apart from it, participants of group VR did not receive any additional training on the use of the VR controllers. The actual picking task started with step five, in which each participant had to pick a total of 64 orders. Here, the two groups received different treatments: participants of group VR performed these picks in VR, whereas participants of group RR performed them in the real environment. Participants in workshop 2 considered the total number of orders an adequate compromise between a picking task long enough to observe effects of fatigue and loss of concentration on the one hand and feasibility within a large-scale study on the other. Moreover, the number of orders ensured that the duration of the experimental study stayed below the maximum duration of 55 to 70 min suggested by Kourtesis et al. (2019) for VR sessions in experimental research. As can be seen in Figure 4.2, the 64 orders had to be divided into four sets of sixteen orders each, because the order bins had to be emptied frequently and the picking rack needed refilling.12 During this refilling, however, participants were able to continue picking at a second rack opposite the first one, so that all 64 orders could be picked without interruption. In VR, the rack was also refilled after 16 orders by resetting the simulation. After completing the 64 orders, each participant was asked to fill in a second questionnaire in step six, capturing the NASA task load index (NASA-TLX).13 For the purpose of verifying the results, each participant in both groups was subsequently asked to perform another 64 picking orders in step seven. These orders had to be picked in the real environment by both groups. In step eight, participants were asked to complete the second questionnaire capturing the NASA-TLX one more time, now evaluating the perceived workload in the second block of 64 orders. Finally, each participant had to fill in a third questionnaire used to evaluate the experimental setup.
10 The instructional text can be found in Appendix F in the electronic supplementary material.
11 Detailed information on each questionnaire will be given in Section 4.3.3.
12 For more details on the refilling of the rack, please refer to Section 4.2.4.
13 For details on the NASA-TLX, please refer to Section 4.3.2.
4.2.4 Laboratory Setup and Apparatus
This section describes how the two different expressions of the independent variable, i.e. the real and the virtual order picking environment, have been modelled. To this end, the real and the virtual order picking setup and the apparatus for displaying the VR are first explained in detail. Subsequently, the picking items used in the study are described, along with the way the rack was refilled to make the continuous picking of a large number of orders possible. Then, the order picking method – in this case a pick-by-voice method – and its implementation are described. Finally, the adjustments made to enable the gathering of additional sensor data are explained.
Modelling of the real and the virtual order picking setup
Because the study focuses on picking and searching in a parts-to-picker system, only a single rack with five levels and six storage positions on each level was erected. The basic rack has a total height of 180 cm and is a commercially available standard warehousing rack purchased from a retailer. In order to obtain six storage positions on each level, two racks were purchased and then firmly connected side by side using screws to yield a single rack with a total width of 180 cm. The top and bottom levels of the rack were not used during the experiment. This means that a total of 18 different storage positions on three levels (at heights of 60 cm, 100 cm and 140 cm) were available for the experiment. Each level was assigned a number from zero (for the lowest level) to four (for the highest level). Similarly, every picking position on each level was numbered from left to right with numbers from one to six. Underneath each picking position, both the number of the picking position and the corresponding level were displayed on a small sign. Each picking position contained an open box with a base of 30 x 40 cm and a height of 12 cm to hold the items for picking. The dimensions of the picking rack and the different levels and positions are depicted in Figure 4.3. In addition to the picking rack, a two-level picking cart with four order bins (two on each level, each with a base of 30 x 40 cm and a height of 12 cm) was placed at a distance of 170 cm from the picking rack. The two levels of the cart were at heights of 14 and 66 cm. The picking cart and its dimensions are depicted in Figure 4.4. By having four order bins, the setup represents a multi-order picking scenario. As practitioners confirmed during the first workshop, the chosen rack and cart dimensions are representative of a real order picking setup, especially for the case of pick-to-belt or parts-to-picker systems. In VR, the rack and the cart were modelled to be as similar as possible to the real rack. In particular, rack dimensions, distances and colours were modelled to closely match the measures in the real environment. Note that the simulation displays the picking rack and the picking
Figure 4.3 Levels and positions of the order picking rack with corresponding dimensions
cart filled with the order bins and items. Additionally, users were able to see the controllers in VR as a reference for their hands. However, neither the user's body (avatar) nor additional environmental details were depicted in the simulation. In this way, the VR setup uses a low level of abstraction, i.e. an exact replication of the picking rack (as suggested by the findings of chapter three), but a high level of abstraction for the environment in order to avoid any unwanted distraction. The entire setup in the real and in the virtual environment can be seen in Figure 4.5. With the participants' consent, each experiment in both the real and the virtual environment was recorded on video so that a precise analysis could be performed after the experiments had taken place.
Apparatus of the virtual order picking setup
The virtual order picking setup was modelled using the software Unity 2018.2 (Unity Technologies, San Francisco, California, USA). The simulation was run on an MSI
Figure 4.4 Levels and bin positions of the order picking cart containing the four order bins with corresponding dimensions
Figure 4.5 The laboratory setup with the order picking rack and picking cart in the real (a) and the virtual environment (b)
GT73VR notebook (Micro Star International, Zhonghe, Taiwan) with an Intel i7-6820HK processor (Intel Corporation, Santa Clara, California, USA) with four cores, 8 MB cache and a frequency of 2.7 GHz. The notebook was also equipped with 32 GB RAM and an Nvidia GeForce GTX 1070 graphics card (Nvidia Corporation, Santa Clara, California, USA).
As the HMD for displaying the VR, the HTC Vive (High Tech Computer Corporation, Taoyuan, Taiwan) was chosen. The HTC Vive is a popular HMD in VR research (see Section 3.4.1) and is considered one of the most advanced HMDs in terms of display resolution, field of view, and movement tracking among all HMDs that were commercially available at the time of the experiments (Borrego et al. 2018; Niehorster et al. 2017). Furthermore, using the HTC Vive together with the software Unity provides high accuracy and precision (Wiesing et al. 2020, p. 19). The HTC Vive consists of the HMD itself, two handheld controllers and two base stations ("lighthouse" base stations, Valve Corporation, Bellevue, Washington State, USA) that use infrared lasers to track the movement of the HMD and the controllers in space (Coburn et al. 2017). Each controller weighs approximately 0.2 kg; the HMD itself weighs 0.6 kg. The HTC Vive is connected to a computer by cable. During each experiment, the cable was held carefully by an assistant, ensuring that it did not interfere with the participants' movement. An image of a participant using the HTC Vive during the experiments can be seen in Figure 4.6. All components of the VR system (HMD, base stations and controllers) are highlighted in the figure, which also shows how the base stations were erected in the corners of the VR area. The controllers of the HTC Vive have several buttons that enable direct user interaction with the virtual environment. For the experimental study, though, only one button on the bottom side of the controller (shown in Figure 4.7) was used. By pressing this button upon controller contact with an item and subsequently holding it pressed, participants were able to grasp the item; the item was dropped when the button was released. Using only one button on the controller ensured the simplicity of the task. Additionally, items in VR were modelled to turn red during contact with a controller in order to compensate for the missing haptic feedback. The instructional text provided to participants in step 2 of the experimental procedure (see Figure 4.2) explains how the controllers are used. Note that the usability of the HTC Vive controllers has already been investigated and compared to other devices by De Paolis and De Luca (2020), Kwon (2019) and Figueiredo et al. (2018). All three studies agree that the controllers are highly accurate and provide users with a strong sense of presence, therefore allowing users to interact realistically with VR.14
14 Moreover, an analysis of different movements performed in VR using the HTC Vive's controllers can be found in Nanjappan et al. (2018).
Figure 4.6 A participant using the HTC Vive for picking in VR during the experiment
Figure 4.7 Handheld controller of the HTC Vive system with the button used for picking items in VR
Modelling of the picking items and refilling of the rack
A drawback of the HTC Vive's controllers (and a limitation of VR HMDs in general – see Section 2.1.2) is their inability to simulate item weight (Coburn et al. 2017, p. 6). In order to ensure the internal validity of the setup, the influence of item weight was minimized in the real environment by using empty white cardboard boxes as items. The boxes have a size of 6 x 6 x 12 cm, and their weight is considered negligible. In VR, exact replicas of these boxes were modelled to be used as items. Workshop participant no. 14 confirmed that the size of the cardboard boxes is comparable to the majority of items which are manually picked in his company. Another limitation of the handheld controllers of the HTC Vive lies in the fact that they only allow users to grasp one item at a time. In reality, though, pickers are able to take more than one item of the described size in each hand. In order to keep the virtual and the real setup as similar as possible, participants in the real environment were asked not to pick more than two items at any one time.15 Each picking location was filled with 20 items. Pictures of the real and the virtual order picking rack filled with picking items can also be seen in Figure 4.5. As there was only room for 20 items per picking position, the rack had to be refilled after each set of 16 orders. Furthermore, the order bins also only had a capacity of 20 items and needed to be emptied. To enable the continuous picking of a large number of orders, the setup was mirrored in the real environment, i.e. a second rack and a second cart were placed opposite the first ones. By means of a short verbal signal at the end of each set, participants were asked to switch to the other rack. In this way, participants could continue picking with minimal delay while the other rack was refilled manually by an assistant. A top-down view of the experimental setup in the real environment is given in Figure 4.8. In VR, modelling a second rack was not necessary, as resetting the rack to its originally filled state was easily possible with a single click.
Order picking method
For the experiments, a pick-by-voice system was used, as this is considered a common method in practice (Battini et al. 2015, p. 485). Workshop participants 1 and 14 also agreed that pick-by-voice is often used in the industry. For each order, an audio command spoken by a computer voice was played to the participants. The command gives the rack level (L), the associated position (P), and the number of items to be picked for a specific order bin (O), as in the following example: L3 P5 4 items for O2. In this example, a total of four items need to be picked from position five on the third
15 Note that a similar restriction in a real experimental setup, allowing the picking of only one item with one hand, can be found in Finnsgård and Wänström (2013).
level of the rack and dropped into order bin number two.16 For each order, between one and nine items had to be picked from one position and then dropped into one of the four order bins. During workshop 2, the workshop participants confirmed that this is a realistic scenario for real multi-order picking. The orders (i.e. item location, number of items and the order bin for which the items need to be picked) were generated randomly once, using uniform distributions for each parameter. It was ensured, however, that for each set of 16 orders, exactly 80 items were picked in total (20 items for each order bin and a total of 320 items in each block of four sets) in order to make the total number of items comparable for each set; a short sketch of this generation procedure is given below. Note that all participants in both groups received the same picking orders in the same sequence. During the experiment, participants were able to respond verbally to the commands: they could either confirm the current order, thereby triggering the next command, by saying "confirmed", or ask for a replay of the last command by saying "again".17
Figure 4.8 Top-down view of the experimental setup in the real environment. In VR, only one rack and cart has been simulated
16 As the experimental study was carried out in Germany, picking commands were given in German. The original German version of the command in the example would be: E3 P5 4 Stück für A2.
17 Again, those commands were spoken in German, i.e. "Bestätigt" (confirmed) or "Nochmal" (again).
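The random order generation described above can be illustrated with the following R sketch. This is a hedged illustration only: the thesis does not document how the per-set constraint was enforced, so the sketch uses simple rejection sampling, and all variable names are hypothetical.

# Generate one set of 16 orders with uniformly distributed parameters,
# accepting the set only if each of the four order bins receives exactly
# 20 items (i.e. 80 items per set in total).
generate_set <- function(n_orders = 16) {
  repeat {
    orders <- data.frame(
      level    = sample(1:3, n_orders, replace = TRUE),  # three used rack levels (assumed numbering)
      position = sample(1:6, n_orders, replace = TRUE),  # six positions per level
      bin      = sample(1:4, n_orders, replace = TRUE),  # four order bins
      items    = sample(1:9, n_orders, replace = TRUE)   # one to nine items per order
    )
    per_bin <- tapply(orders$items, factor(orders$bin, levels = 1:4), sum)
    # Rejection sampling is inefficient but sufficient for a one-off generation.
    if (!any(is.na(per_bin)) && all(per_bin == 20)) return(orders)
  }
}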
Triggering the commands was realized manually using a wizard-of-oz technique. Here, a human experimenter ("wizard") starts the commands manually while pretending to be a computer. The wizard-of-oz technique is easy to implement and provides more flexibility compared to fully automated systems (Green and Wei-Haas 1985).18 To play the picking commands, self-developed Java software and loudspeakers were used instead of a regular pick-by-voice headset, because this enabled the experimenter to listen to the commands as well. Close attention was paid to the location of the loudspeakers, making sure that their height and distance to the order picking rack were equal in both the virtual and the real environment. The same person served as the experimenter throughout the entire study in order to reduce bias caused by manually playing the picking commands and to ensure the comparability of the experiments.
Collection of additional sensor data
After the first 35 participants had taken part in the experiment and the results had been briefly analysed, the decision was made to alter the experimental setup in order to gather additional data during the experiments. With the aim of capturing the time needed by the participants for searching and picking the items from the rack, two infrared sensors of the type Panasonic CY-192B-Y with retroreflectors (Panasonic Corporation, Kadoma, Japan)19 were attached to each level of the real rack. The sensors were aligned carefully to cover the entire height of each level. It could thus be ensured that the sensors reliably registered each time the picker's hand entered or left the corresponding level of the picking rack. Moreover, the order picking cart was replaced by an additional rack, which made it possible to attach infrared sensors to each of its levels as well. It was ensured that the height of each level of this rack matched the height of the corresponding level of the cart, and all distances were kept exactly the same as in the previous setup, so that picker movement and picking times were not affected by the change in the setup. The altered setup is displayed in Figure 4.9. As can be seen in the figure, the virtual setup was adapted accordingly to match the real setup: here, trigger fields were implemented that register each contact with one of the VR controllers and thus match the function of the infrared sensors in reality. The general alignment of the infrared sensors and trigger fields, along with an image of the sensors used, is depicted in Figure 4.10.
18 A similar system is, for example, used by Vélaz et al. (2014).
19 The datasheet of the sensors and the retroreflectors can be found in Panasonic (2020).
To measure picking times, the infrared sensors were connected to a single-board computer of the type Raspberry Pi 3 B+ (Raspberry Pi Foundation, Cambridge, United Kingdom). A script written in Python was used to write a timestamp along with the respective rack level number to a database each time one of the sensors was triggered; another timestamp was written to the database once the sensor was released again. By also connecting the computer running the VR simulation in Unity to the Raspberry Pi via a local network, sensor triggering in VR could be handled in the same way as sensor triggering at the real rack, i.e. by writing similar entries to the database. In order to match the operation of the infrared sensors in the real environment, the trigger fields in VR were programmed not to detect simultaneous interruptions. This means that if one of the handheld controllers is already located within a trigger field, the entry of the second controller into the same trigger field does not cause a second database entry. Similarly, if both controllers are located within a trigger field simultaneously, only the last controller leaving it causes the program to write a timestamp for the end of the interruption. Furthermore, the experimenter's computer running the software for playing the picking orders was also connected to the Raspberry Pi via the same local network. With this setup, timestamps of the start and end of each order could be written to the same common database, which made it possible to assign each sensor interruption to a specific order. The entire setup for writing the timestamps to the common database is illustrated in Figure 4.11. The code of the software tool used for manually playing the picking orders, the code for controlling the infrared sensors, and the VR model are publicly available in Knigge (2020c).
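The assignment of sensor interruptions to orders via the common timestamps can be sketched in R, the language used for the analyses in this thesis. The data frame and column names are hypothetical and do not reflect the actual database schema:

# Assumed inputs read from the common database:
# orders        - one row per order: order_id, t_start, t_end
# interruptions - one row per sensor event: level, t_enter, t_leave
assign_interruptions <- function(interruptions, orders) {
  # An interruption belongs to the order during which it started.
  idx <- sapply(interruptions$t_enter, function(t) {
    which(orders$t_start <= t & t < orders$t_end)[1]
  })
  interruptions$order_id <- orders$order_id[idx]
  interruptions
}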
Figure 4.9 The altered setup in the real (a) and the virtual environment (b)
Figure 4.10 Alignment of the infrared sensors and reflectors in the real setup (a) and trigger fields in the virtual setup (b)
Figure 4.11 Setup for writing time stamps of the infrared sensors and orders to a common database
4.3 Operationalisation of the Research Questions
4.3.1 Selection of the Dependent Variables
The independent variable of the experiment is given by the different environments (VR and real environment), as described above. In order to answer RQ 2.1 and RQ 3.2, human performance must be quantified and translated into dependent variables that can be measured in the experiment. According to Bliss et al. (2015, pp. 750–752), VR environments in general provide easy access to a large variety of measures connected to human performance. For quantifying human performance, the American National Standards Institute (ANSI) lists ten categories of performance indicators, namely time, accuracy, amount achieved or accomplished, frequency of occurrence, behaviour categorization by observers, consumption or quantity used, workload, probability, space/distance and errors (ANSI 1993; cited in Bliss et al. 2015, p. 753). This list has been used to select the measures relevant to the research questions. First of all, the individually perceived workload can serve as a dependent variable in the study at hand. It can be measured using the NASA-TLX and is considered a subjective performance measure, as the NASA-TLX includes a rating of the performance as perceived individually by the participants. It is therefore assumed to covary with certain overall performance measures (Hart 2006, p. 906). Moreover, as described in Section 2.2.2, human performance in manual order picking can generally be specified in terms of time and quality. Thus, the time needed by human pickers to complete the picking of an order (i.e. the order completion time) can be used as a dependent variable in the setup. As described in Section 4.2.4, the number of items varies between orders; however, the total number of items is equal to 80 in each set. Therefore, to ensure comparability, set completion times are used for the analysis instead of order completion times. In chapter three, manual order picking was further divided into specific activities, of which especially searching and picking were found suitable for being simulated in VR. Searching and picking times are considered the most significant shares of the total order completion times in the given setup. Note that total order completion times mainly consist of four time-consuming activities: searching for items, picking items, picker movement between the rack and the order bins, and dropping items into the corresponding order bin. Due to the equal distances in VR and the real environment, the time required for picker movement is assumed to be equal. Furthermore, dropping items into the order bins is assumed to account for only a very small share of the total order completion time. For this reason, mainly searching
and picking times are considered relevant shares of the order completion times, and their sums per set serve as additional dependent variables. Section 2.2.2 also revealed that quality in manual order picking refers to the number of picking errors and to item breakage. A picking error can either be an item picked from a wrong location, an item dropped into the wrong order bin, or a wrong number of items picked for one order. Because the cardboard boxes used as items in the experimental setup cannot break, the number of items dropped out of the hand of the human picker has been selected as the dependent variable referring to the breakage of items. Evaluating the number of dropped items in VR and in the real environment is also interesting because it can provide insight into whether the limitations of the controllers of the VR HMD affect the grasping of and holding onto the items in VR. In summary, the following dependent variables have been selected for the analysis:
1. The individually perceived workload
2. The set completion time
3. The picking time
4. The searching time
5. The number of orders with picking errors
6. The number of orders with dropped items
To effectively conduct quantitative research, Farrugia et al. (2010, p. 280) recommend translating previously formulated research questions into more specific research hypotheses before analysing the data. These hypotheses specify the relationship between the investigated effect and its potential cause in terms of existence, size and direction of the effect. Research hypotheses are therefore directly connected to the selection of dependent and independent variables (Döring and Bortz 2016, p. 146). In this thesis, especially RQ 2.1, which focuses on the comparison of human performance for the evaluation of the usability of VR HMDs for planning manual order picking, as well as RQ 3.2 will be answered by statistically analysing the results of the dependent variables from the experimental study. Therefore, these research questions are translated into more specific research hypotheses in Section 5.1 (RQ 2.1) and Section 6.4.1 (RQ 3.2).
4.3.2 Measurement of the Dependent Variables
The measurement of the perceived workload is done via specific questionnaires assessing the NASA-TLX. The NASA-TLX has been chosen for operationalising the perceived workload because it is considered a well-established and easy-to-use method (Hart 2006). The NASA-TLX makes it possible to measure the individually perceived workload of a task by evaluating the answers of participants on six items, namely mental demand, physical demand, temporal demand, performance, effort and frustration. Furthermore, the relative weight of each item is calculated by letting participants complete a pairwise comparison of the items (yielding 15 pairwise comparisons in total). The NASA-TLX is then calculated by summing up the results for each item multiplied by their relative weights (Hart and Staveland 1988). Some researchers, however, recommend using the so-called "raw" (i.e. unweighted) NASA-TLX, mainly for reasons of simplicity (Hart 2006; Bustamante and Spain 2008, p. 1523). Therefore, in this thesis, both the weighted and the raw NASA-TLX have been calculated. The set completion time is calculated directly by the software used to play the picking commands. The software records the time between each two succeeding orders and saves it to a .csv file. For the last order of each block, the time between the start of the order and the end of the experiment is used as the order completion time. By summing up the order completion times of each set of orders (i.e. 16 orders each), the set completion time can be calculated. The picking time has been defined as the time during which at least one of the picker's hands interrupts the sensors at the picking rack. The picking time can thus be calculated as the time difference between the start and the end of a sensor interruption at the picking rack and subsequently summed up for each set of orders. Even though this is only an estimate of the time actually needed to grasp items, it is assumed to be adequate for the comparison of picking times, as the remaining movement of the picker's hand within the rack is small and of equal magnitude in the real and the virtual environment and can therefore be neglected. To calculate searching times, the time difference between the start of each order and the first triggering of a sensor at the correct level of the picking rack is calculated. It is assumed that this time interval is spent by the picker searching for the correct picking position within the rack. Again, these times have been summed up for each set of orders. Picking errors are measured manually using the software for playing the picking commands. The software allows the experimenter to mark picking errors during an experiment by pressing an assigned key on the keyboard. For this purpose, all relevant information on the current order is displayed to the experimenter, who has
to watch the picking process carefully. Picking errors are not summed up for each set but for each entire block of 64 orders (i.e. sets 1–4 and sets 5–8). The number of orders with dropped items is measured in a similar way: the experimenter marks each order in which an item has unintentionally been dropped out of the participant's hand (resp. the controller in VR) using a keyboard key and the software. In the real setup, the experimenter was able to identify these errors by simply watching the participants while they were picking. During experiments in VR, the image visible to participants wearing the HMD was simultaneously displayed on the laptop used for running the simulation, allowing the experimenter to register picking errors and dropped items.20 The different methods for calculating the dependent variables for the analysis are summarized in Table 4.2. The calculations as well as the statistical analyses of the performance measures, which will be presented in the subsequent chapter, have been conducted using R version 3.6.1 (with standard packages) and RStudio version 1.2.1335. The complete R code (including the code for the upcoming data preparation and statistical analyses) can be found in Knigge (2020c).
Table 4.2 Dependent variables and calculation methods
20 Note that the database also contains data from the infrared sensors located at the order bins in which participants had to drop items. However, this data has not been analysed in this thesis.
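To illustrate the calculations summarized in Table 4.2, the following R sketch shows how the time-based dependent variables and the NASA-TLX scores could be derived. It is a minimal sketch under assumed data structures, not the original analysis code from Knigge (2020c):

# Assumed inputs (hypothetical structures):
# order_times   - data frame with one row per order: set, time
# interruptions - rack-sensor interruptions assigned to orders (see the
#                 sketch in Section 4.2.4), with columns: set, t_enter, t_leave

# Set completion time: sum of the 16 order completion times per set.
set_completion <- aggregate(time ~ set, data = order_times, FUN = sum)

# Picking time: duration of each sensor interruption at the rack, summed
# per set; searching times are derived analogously from the time between
# order start and the first trigger at the correct rack level.
interruptions$duration <- interruptions$t_leave - interruptions$t_enter
picking_time <- aggregate(duration ~ set, data = interruptions, FUN = sum)

# Weighted and raw NASA-TLX for one participant (example values):
ratings <- c(mental = 7, physical = 4, temporal = 6,
             performance = 3, effort = 5, frustration = 2)
weights <- c(4, 1, 3, 2, 3, 2)   # counts from the 15 pairwise comparisons
weighted_tlx <- sum(ratings * weights) / 15
raw_tlx <- mean(ratings)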
4.3.3 Questionnaire Design
As can be seen in Figure 4.2, participants of both groups were asked to fill in three different questionnaires. Note that questionnaire 2 had to be filled in twice (once after each block of 64 orders), meaning that each participant had to complete a total of four questionnaires. Participants were told that it was acceptable for them to skip questions they did not want to answer or were unable to answer. Each participant's questionnaires were coded with the name of the participant's group and a consecutive number in order to ensure anonymity while still making it possible to relate the questionnaire results to the results from the treatments. Furthermore, each participant was left alone to fill in the questionnaires, and the questionnaires were collected blindly to avoid any unwanted influence by the experimenter. While questionnaire 2 strictly follows an established questionnaire design for deriving the NASA-TLX, questionnaires 1 and 3 were individually designed for this study using the recommendations provided by Anastas (2000, pp. 372–392) and Bryman (2012, pp. 237–239). An overview of the three different questionnaires and the steps at which they had to be filled in during the experimental procedure is given in Table 4.3. Note that all questions apart from the question about the participant's age were designed as structured questions with closed answers21 in order to facilitate the efficient evaluation of the questionnaires (Anastas 2000, p. 373). All questionnaires were provided in German and can be found in Appendix G in the electronic supplementary material. As recommended by Döring and Bortz (2016, p. 410), pretests of the questionnaires were performed during the preliminary study published in Elbert et al. (2018). Questionnaire 1, which had to be filled in at the beginning of the experimental procedure, was used to gather sociodemographic information and the previous knowledge of the participants in order to generate a descriptive analysis of the sample. It consists of three items that were used in this study.22 The first and second items ask for the participants' age and gender. The third item refers to the participants' experience with VR prior to the experiments. The response to the last item has been designed as a five-level, fully verbalized frequency scale, as recommended by Weijters et al.
21 In closed questions, participants are only able to choose from the different response options provided in the questionnaire (Bryman 2012, p. 238).
22 Note that questionnaires 1 and 3 contain further items, which are not listed here. These items were added to the questionnaires during the questionnaire design but were later excluded from the analysis for this thesis. For reasons of clarity, only those items that were analysed in this thesis are described in this section. However, for an analysis of further items from the questionnaires, please refer to Elbert et al. (2019) and Makhlouf (2018).
(2010, p. 246).23 For the evaluation of the questionnaire, the response scale was numerically coded using the number 1 for the lowest possible frequency ("I have never been in contact with VR before.") and 5 for the highest possible frequency ("I use VR regularly.").
Table 4.3 Overview of the items in the questionnaires and the step at which the questionnaire had to be filled in (referring to the steps of the experimental procedure given in Figure 4.2)
Questionnaire 2 was used to capture the NASA-TLX. As has been described in detail in Section 4.3.2, the NASA-TLX is calculated based on six different items and 15 pairwise comparisons of these items. Based on the recommendations by Hart and Staveland (1988), the items are coded as ten-level, endpoint verbalized rating scales. Questionnaire 3, which had to be filled in at the end of the experimental procedure, was used to evaluate the experimental setup. This questionnaire consists of four items. The first item asks how well participants understood the overall task. It is recommended to evaluate the general understanding of the provided instructions, because a missing or faulty understanding of a task can have a recognizable influence on the initial performance in experimental studies (Eiriksdottir and Catrambone 2011). The second item addresses the clarity of the pick-by-voice commands by asking how well participants understood the audio. The third and fourth items ask participants how clearly the items were visible in the real and in the virtual rack, respectively.
23 In fully verbalized scales, each level of the answer is individually labelled. In endpoint verbalized scales, only the two most extreme answer levels are labelled (Krebs 2012, p. 105).
Note that only participants of group VR were asked about the visibility of items in the virtual rack. All these items were formulated as closed questions providing five-level, fully verbalized rating scales for the structured response. For the evaluation, the response scales were again numerically coded from 1 for the lowest possible rating to 5 for the highest possible rating.
4.4 Execution of the Research Study
4.4.1 Sampling Process, Time of the Experiments and Sample Description
Recruitment of participants
For conducting the experimental study, volunteers needed to be recruited. Note that Eckel and Grossman (2000) argue that using only volunteers as participants can lead to bias, as the characteristics of persons who voluntarily participate in experimental studies might not be generalizable. However, neither Cleave et al. (2013) nor Falk and Zehnder (2011) found such a selection bias when comparing self-selected volunteer samples to entire populations. Therefore, recruiting volunteers as participants was considered appropriate for the study at hand. To set the goal of the recruitment process, an a priori estimation of the necessary sample size was performed using a power analysis as described by Cohen (1988, pp. 1–74).24 To do so, the effect size was estimated at a value of d = .52 based on the results of the preliminary study provided in Elbert et al. (2018).25 Given a significance level of 5% in a two-tailed, two-sample t-test and a target value of 80% power as recommended by Cohen (1992), the power analysis suggests a sample size of 59 participants per group. To recruit participants, the experiments were advertised repeatedly in six different lectures, practice courses, and seminars at TU Darmstadt. A list of the courses in which the study was advertised is given in Table 4.4. Information on the experiments was provided on slides during these courses and via e-mails to all registered students in the courses, along with information on how to voluntarily sign up for the experiments. Most of these courses were attended by students from multiple departments at TU Darmstadt, ensuring some variety in the academic background of the participants.
24 Note that a power analysis was only performed a priori but not retrospectively, because performing power analyses for the purpose of evaluating experimental results is not recommended (Goodman and Berlin 1994).
25 According to Cohen (1988, p. 26), this is considered a medium effect size.
Interested persons were able to choose a time slot from the available time slots via a website. This means that recruiting took place mainly among students; however, Thomas (2011) argues that students are well suited as participants in experimental studies in logistics if basic aspects of human behaviour or decision making are in focus.26 Nevertheless, student samples can limit the internal validity if personal variables, such as age, potentially interact with the independent variables (Stevens 2011). In order to generate a more heterogeneous sample, additional effort was therefore made to increase the total sample size by recruiting further participants from the university's administrative and scientific personnel as well as family and friends of the author. For the same reason, the study was also advertised at a public event at TU Darmstadt in June 2018. Additionally, participants were recruited among professional order pickers who were employed with a local logistics service provider. They were informed about the experiments by their employer and were offered the possibility to sign up for the experiments voluntarily. The order pickers took part in the experiments during their regular working shifts, and the experiments were conducted in the warehouse of the logistics service provider. As the environmental conditions for the order pickers thus differed considerably from the other experiments, they were excluded from the general sample but were analysed separately and compared to the main sample with the primary purpose of validation and verification.
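The a priori power analysis described above can be reproduced in base R. This is a sketch; the exact call used in the thesis is not documented here, but power.t.test with the stated parameters yields the reported group size:

# A priori sample size for a two-tailed, two-sample t-test with an
# effect size of d = .52, a significance level of 5% and a power of 80%.
power.t.test(delta = 0.52, sd = 1, sig.level = 0.05, power = 0.80,
             type = "two.sample", alternative = "two.sided")
# Solving for n yields approximately 59 participants per group.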
Table 4.4 Courses in which the experimental study was advertised at TU Darmstadt for the recruitment of participants
26 For the case of a trust experiment, Falk and Zehnder (2011) are also able to show that student participants do not differ significantly from non-student participants.
Compensation and bonus payments In order to facilitate recruitment (Krawczyk 2011) and generate credibility (Davis and Holt 1993, p. 26), each participant received a compensation of € 10 after completing the task. Smith and Walker (1993) as well as Croson (2005, p. 134) recommend that the minimum payment for participation in experimental studies should compensate for the time needed by participants. The amount of € 10 for completing the experiment with a scheduled length of not more than 60 min was chosen because this value guaranteed a payment higher than the minimum wage of between € 8.84 and € 9.35 per hour that was applicable in Germany at the time of the experiments (“Verordnung zur Anpassung der Höhe des Mindestlohns” 2016; “Zweite Verordnung zur Anpassung der Höhe des Mindestlohns” 2018). Following Deck and Smith (2013, p. 8) and Hertwig and Ortmann (2001, p. 390), an additional motivation for completing the task as fast as possible was provided by giving each participant a performance-dependent bonus between € 0 and € 5. Paying such a performance-dependent incentive is favourable as it can reduce unwanted variability in performance (Davis and Holt 1993, pp. 24–25). Furthermore, paying order pickers performance-dependent incentives is also recommended in warehousing practice (Vries et al. 2016a). The bonus was calculated as a linear function of the individual time needed for the completion of the entire study (i.e. both blocks). Additional penalty times of 30 seconds were added for each erroneous order, and participants were informed that picking errors would lead to those penalties. The bonus was calculated independently for each group, with the fastest participant of each group receiving the full € 5 bonus. The bonus $G_i$ of each other participant $i$ was calculated using the equation

$G_i = 5 \cdot \left[ 1 - \left( \frac{t_i}{t_{best}} - 1 \right) \right]$, (4.1)
with $t_i$ being the achieved total time of participant $i$ and $t_{best}$ being the time of the fastest participant in the respective group. This equation for calculating the bonus was chosen as it led to relatively high bonuses for each participant while still yielding noticeable differences between participants. The compensation and bonus were paid in cash.27 At the beginning of the experiments, the compensation was given to the participants some weeks after they had conducted the experiments, once multiple participants had completed the study. Only in this way was it possible to identify the best participant and determine the value of $t_{best}$. In later experiments, the compensation and bonus were paid directly after each participant had completed the experiment. Here, the best of all previous participants was used to define $t_{best}$. If a participant achieved a better result than the best of the previous participants (i.e. $t_i < t_{best}$), this participant received the full € 5 bonus and her/his time was thereafter used as $t_{best}$. This procedure was chosen so that the compensation could be paid as promptly as possible. Note that the professional order pickers who participated in the study were paid by their employer. For this reason, they did not receive the compensation or the bonus which the other participants received. This is another factor limiting the comparability of the results between the regular sample and the sample of professional order pickers.
27 Reasons for giving compensation in cash instead of using other values can be found in Croson (2005, p. 134) and Davis and Holt (1993, p. 25).
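The following R sketch illustrates the bonus rule of equation (4.1). The function name is illustrative, and the truncation of negative values to € 0 is an assumption, as the text only states the bonus range of € 0 to € 5.

    # Bonus according to equation (4.1); times in seconds
    bonus <- function(t_i, t_best) {
      g <- 5 * (1 - (t_i / t_best - 1))
      min(5, max(0, g))  # assumed truncation to the stated range of EUR 0-5
    }

    bonus(1000, 1000)  # fastest participant: EUR 5.00
    bonus(1100, 1000)  # 10% slower than the best: EUR 4.50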
Time of the experiments The experiments were conducted during four different time periods in November 2017, August 2018, May–July 2019 and November 2019–February 2020. To ensure consistent environmental conditions for all participants and eliminate unwanted interference, the experimental setup, location, apparatus and the person acting as experimenter were not changed. In this way, the effects of large timespans between experiments can be controlled in laboratory experiments (Shadish et al. 2002, p. 56). The only change made to the experimental setup during the experiments was the addition of the infrared sensors in 2019. As these sensors were hardly visible to the participants and did not interfere with the way participants executed the experiment and picked the items from the rack, environmental variables were not affected by this change in the setup. However, sensor data is therefore only available for participants in 2019 and 2020. The study was conducted with the professional order pickers in August 2018. This means that the experiments with professional order pickers were performed before the implementation of infrared sensors in the experimental setup.

Description of the final sample In total, 112 volunteers were recruited and participated in the experiment. The full set of raw data obtained during the experimental study for each participant can be found in Knigge (2020a). For reasons of anonymity, each participant's data was saved under a consecutive number. 56 of the 112 participants were assigned to group VR and 56 to group RR using the randomization procedure described in Section 4.2.2. With the previously estimated effect size of d = .52, the sample size thus achieves a statistical power of 78% in a two-tailed, two-sample t-test with a significance level of 5%. As has been found in Section 3.4.1, the sample size of the study lies above the sample sizes of most experimental studies in the field of VR. Only the studies presented by Peukert et al. (2019) and Martínez-Navarro et al. (2019)
were found to have employed larger numbers of participants.28 Furthermore, Croson (2002, p. 939) recommends a total of 20 to 30 participants per treatment. For this reason, the sample size was deemed sufficient for answering the research questions and the recruitment of participants was stopped at a total number of 112 participants. An overview of the number of participants in the different time periods of the study is given in Figure 4.12. As can be seen in the figure, sensor data (i.e. picking times and searching times) is available for a total of 77 participants.
Figure 4.12 Age distribution of participants in different time periods of the experimental study (NA: Not available due to some participants providing no information on their age)
28 Note that the studies by Willemsen et al. (2009) and Sharples et al. (2008) also use a larger total number of participants. However, the total number of participants in Sharples et al. (2008) divides into a total of three experiments, with 71, 37, and 31 participants. Willemsen et al. (2009) only conduct one experiment, but test a total of eight different conditions in a between-subject design so that the number of participants in each condition is lower than in the study at hand.
All but one of the participants finished the experiment completely. One participant in group VR aborted the experiment after completing two sets in the first block. The data of this participant has been deleted and the participant is not included in the final sample of 112 participants. The completion rate is thus calculated at 99%. As the reason for aborting the experiment, the participant reported feeling uneasy and uncomfortable. However, this particular participant explicitly stated having no feeling of nausea or illness, so this was not the reason for aborting the experiment. In fact, none of the participants in group VR reported feeling nausea or motion sickness during the experiments, even though it took participants between 13 and 38 min (mean: 20.67 min; median: 19.34 min) to complete the four sets in the first block and each participant completed the four sets in VR without any interruption. Previous research claiming nausea or motion sickness to be an issue limiting the usability of VR (see e.g., Curry et al. 2020; Aldaba and Moussavi 2020; Vaughan et al. 2016; Brough et al. 2007; Jayaram et al. 2001) was therefore not confirmed. A total of ten professional order pickers from the local logistics service provider participated in the study. As they served mainly the purpose of validating and verifying the results, they were not assigned to either of the two groups but were analysed separately. However, they performed the experiment in the same sequence as participants in group VR (i.e. the first block in the virtual and the second block in the real environment). In questionnaire 1, participants were asked to give their age and gender. In group VR, 20 participants were female (36% of the 56 participants in the group) and 35 were male (63%). One participant gave no information on her/his gender. In group RR, 20 participants were female (36%) and 36 were male (64%). Thus, in total, 36% of the 112 participants were female and 63% were male. Information on the participants' ages is also provided in Figure 4.12. As can be seen, recruitment among students at TU Darmstadt led to 71% of the participants having an age below 30 years. Of the professional order pickers, six were female and four were male. Four had an age of between 20 and 29 years, two between 30 and 39 years and the remaining four between 40 and 49 years. Apart from giving personal information on gender and age, participants were asked in questionnaire 1 to rate their experience of using VR prior to the experiments. Results are given in Figure 4.13. Note that 70 of the 112 participants in both groups (62.5%) stated that they had never used VR before (answers 1 and 2 in Figure 4.13). Only three participants in the sample (2.7%) can be identified as regular VR users (answer 5 in Figure 4.13).29 Of the professional order pickers, only two claimed that
29 Two participants in group VR did not fill in this question, leading to n = 54 in this group.
they had used VR prior to the experiments and none can be identified as a regular VR user.
4.4.2 Validation and Verification of the Experimental Setup Using the Questionnaire Results
This section provides a validation and verification of the experimental setup. As described in Section 4.3.3, questionnaire 3 has been used to evaluate the experimental setup from the participants' point of view. Therefore, the results of questionnaire 3 are presented and analysed in order to evaluate the experimental setup. The first item of questionnaire 3 asks participants to rate their understanding of the task, i.e. it makes it possible to validate the instructional text provided prior to performing the picking task. Participants' responses are summarized in Figure 4.14. Results show that participants generally have a very good understanding of the task, with a median of 5 in both groups and only one participant in each group rating his/her understanding as intermediate. A two-sample t-test does not find a significant difference in mean values between the two groups (t(110) = −0.58, p = .564).30 Professional order pickers also rated their understanding of the task as good with a mean of 4.1, a median of 4 and a standard deviation of .88. In item 3.2, participants were asked to rate how well they understood the pick-by-voice commands, thus rating the clarity of the audio. The participants' responses are given in Figure 4.15. As the figure shows, only seven of all 112 participants (6.3%) responded that they understood the commands badly. With a median of 4.5 in group VR and a median of 4 in group RR, participants in general rated that the pick-by-voice commands could be well understood. Again, a two-sample t-test does not reject equal means between the two groups at a 5% significance level (t(110) = 1.96, p = .053). Professional order pickers rated the pick-by-voice commands slightly worse with a mean of 3.6, a median of 3 and a standard deviation of .84. Figures 4.16 and 4.17 give the participants' responses when asked how well they were able to see the items in the real and in the virtual rack. Remember that the latter question was only given to participants of group VR. Visibility of items in the real rack is generally considered very good, with a median of 5 in both groups. Only two participants in total (1.8%) rated the visibility as bad.
30 The parametric Student's t-test has been used here and in the subsequent tests provided in this section without prior testing for normality of the data because sample sizes are > 40 in both groups (Herzog et al. 2019, p. 56; Elliott and Woodward 2007, p. 26).
Figure 4.13 Participants’ responses to item 1.3 in questionnaire 1: Prior experience with VR
The two-sample t-test does not find a significant difference in mean values between the two groups at a 5% significance level (t(110) = 1.8, p = .074). Results of the professional order pickers are similar, with a mean of 4.3, a median of 4.5 and a standard deviation of .82. The visibility of items in the virtual rack is rated worse by participants in group VR. However, it is still generally perceived as good with a median of 4. Figure 4.17 shows that the standard deviation is larger compared to the results on the visibility of items in the real rack. A paired-samples t-test has been used to test for equal means between the item visibility in the real and the virtual rack as rated by the participants in group VR. Test results reveal that the difference in the means is significant (t(54) = 5.79, p < .001). Professional order pickers rated the visibility of items in VR with a mean of 4.2, a median of 4 and a standard deviation of .79. Finally, it must be pointed out that the long timespan during which the experiments were conducted could interfere with the results. However, as has been explained before, it is assumed that the small changes to the laboratory setup that were made in 2019 in order to add the infrared sensors do not influence the results.
Figure 4.14 Participants’ responses to item 3.1 in questionnaire 3: General understanding of the task
Therefore, the previous experience of participants with using VR is considered the only variable that is potentially affected by the time at which each individual experiment took place. Due to the ongoing increase in popularity of consumer-level VR systems, it can be argued that participants in later time periods are more likely to have previous experience with VR. However, a Kruskal-Wallis test has found no significant difference (at a 5% significance level) in the previous VR experience assessed in questionnaire 1 among all participants in the four different time periods (χ²(3) = 7.64, p = .054).31 Additionally, a graphical analysis of the set completion times in sets 1, 2, 3 and 4 of all participants in all time periods, given in Figure 4.18, shows no sign of any structural breaks in the data. This also indicates that adding the infrared sensors indeed caused no noticeable change in set completion times.
31 The non-parametric Kruskal-Wallis test has been used instead of a one-way analysis of variance (ANOVA) because previously performed Shapiro-Wilk tests rejected normality of the data with p < .001 for each time period.
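A minimal R sketch of this check, assuming a data frame participants with the questionnaire rating vr_experience and a factor period encoding the four time periods (both names are illustrative):

    # Non-parametric comparison of prior VR experience across the four periods
    kruskal.test(vr_experience ~ period, data = participants)
    # reported result: chi-squared(3) = 7.64, p = .054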
Figure 4.15 Participants’ responses to item 3.2 in questionnaire 3: Clarity of the pick-byvoice commands
In summary, these results are promising: Responses to questionnaire 3 show that participants generally had a very good understanding of the task. This means that the instructions given to participants prior to the experiments can be considered sufficient and all participants were able to complete the task with their prior knowledge. Furthermore, as no difference between the two groups has been found, an effect of a different or potentially insufficient understanding of the task on the results can be ruled out. The same can be said for participants' perception of the clarity of the pick-by-voice commands. Results show that they had no difficulties understanding the commands, thus showing that the implemented pick-by-voice system is an adequate choice for the study at hand. Also, participants' positive ratings of the visibility of the items, especially in VR, indicate that both hard- and software are able to provide sufficient graphics performance to complete the task. Yet, the different rating of item visibility in VR and the real environment highlights the need for carefully evaluating searching times in both environments.
Figure 4.16 Participants’ responses to item 3.3 in questionnaire 3: Visibility of items in the real rack
Figure 4.17 Participants’ responses to item 3.4 in questionnaire 3: Visibility of items in the virtual rack
Figure 4.18 Set completion times in sets 1, 2, 3 and 4 of all participants in group VR (a) and group RR (b) in all time periods of the experiments
Finally, it can be assumed that the long timespan during which the experiments took place does not have a significant influence on the participants' previous experience with VR and therefore does not interfere with the results.
4.4.3 Data Preparation
Prior to the analysis, the collected data for the dependent variables, especially the sensor data, has been verified by carefully checking it for inconsistencies and errors. The videos taken of each participant while conducting the experiment have also been used for this purpose. The following four major issues have been identified within the sensor data. Note that the list is sorted by severity of the issues.
1. Permanently triggering sensors in the real environment.
2. Operating errors made by the experimenter.
3. Duplicate database entries in the virtual environment.
4. Participants confirming orders before the final pick.
The following section describes these issues in detail, how they have been identified and how they have been dealt with in order to prepare for the analysis of the results. Note that no further issues have been found within the data that would limit the usability of the obtained results. To avoid any unwanted influence on the results, additional data preparation (e.g., an analysis and removal of outliers) has not been performed.

Permanently triggering sensors in the real environment Permanently triggering sensors refers to infrared sensors registering an interruption of the infrared beam which is caused by something other than the picker's hand entering the corresponding rack. This could occur either if the sensor beam is blocked by foreign objects (e.g., items sticking out of the rack) or if a sensor is temporarily aligned incorrectly, causing the beam to miss the reflector. In all cases, this issue was identified by the experimenter directly during the experiment. All experiments having this issue were therefore carefully documented directly after the experimental session. Furthermore, the issue can be identified by looking into the data: if the time difference between the timestamps for the start and end of an interruption exceeds the total time of the corresponding order, an incorrect triggering can be assumed. By additionally analysing the data for such long-time interruptions and comparing it to the documented cases, it has been ensured that all data sets showing this issue were found. The issue has been dealt with by completely removing the data for the corresponding block from the sample. Within group RR, the issue has been found with three participants in sets 1–4 and with two participants in sets 5–8. Within group VR, the issue has only been found in sets 5–8, as only those sets were performed in the real environment. Here, the data of six participants has been removed.

Operating errors made by the experimenter Operating errors made by the experimenter were either caused by the experimenter forgetting to switch on the sensors or by failing to set up the required network correctly. These errors were found by the experimenter and were logged directly after the experimental session. As before, all cases containing such an error have been removed from the sample. In group RR, the data of two participants in sets 1–4 has been affected by this. In group VR, this issue has been found with three participants in sets 1–4 and one participant in sets 5–8.
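The plausibility check for permanently triggering sensors described above could be sketched in R as follows, assuming one row per sensor interruption with timestamps start_ts and end_ts (in seconds) and the total time order_time of the corresponding order; all column names are illustrative:

    library(dplyr)

    # An interruption lasting longer than its entire order cannot have been
    # caused by the picker's hand, so the beam must have been blocked otherwise.
    suspicious <- interruptions %>%
      mutate(duration = end_ts - start_ts) %>%
      filter(duration > order_time)

The blocks of participants flagged in this way (or documented by the experimenter) were then removed from the sensor-data sample entirely.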
Together with the data removed due to permanently triggering sensors, this leaves a total of 33 participants (87%) in sets 1–4 and 36 participants (95%) in sets 5–8 in group RR. In group VR, a total of 36 participants (92%) remain in sets 1–4 and 32 participants (82%) in sets 5–8. The number of datasets removed is summarized in Table 4.5. As these errors only affected sensor data, the entire set of 56 participants in each group (including participants of 2017 and 2018) has been used for the analysis of the NASA-TLX, the number of orders with picking errors, the number of orders with dropped items, as well as the set completion times.

Duplicate database entries in the virtual environment Even though trigger fields in VR were programmed to trigger only once if both controllers entered simultaneously, it was found after the experiments that two database entries were written if the controllers entered the trigger field close to each other in time, i.e. with a time difference of only a few milliseconds. This either led to one of the two database entries not having a timestamp for the end of the interruption (because the end of the interruption was registered correctly, i.e. only once) or to the total time of the interruption being counted twice, which would distort the results. To identify this issue, the data has been scanned for trigger field interruptions with a missing end timestamp or trigger field interruptions starting before the end of the previous interruption of the same trigger field (i.e. in the same level of the rack). Subsequently, double entries have been removed, leaving only one entry for the interruption. For the remaining entry, the earlier start timestamp of the two entries has been kept, and the end timestamp of the later-ending entry has been used. During this process, 2,359 database entries have been removed from a total of 86,665 database entries of all participants in both groups, i.e. only 2.7% of the database entries were affected.
Table 4.5 Number of usable datasets with sensor data after removing datasets due to permanently triggering sensors and operating errors
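The merging rule for duplicate entries could be sketched as follows, assuming rows with participant, rack_level, start_ts and end_ts (NA where no end was registered); the 10 ms window for "near-simultaneous" starts and all names are illustrative assumptions:

    library(dplyr)

    merged <- entries %>%
      group_by(participant, rack_level) %>%
      arrange(start_ts, .by_group = TRUE) %>%
      # start a new interruption group unless an entry begins within 10 ms
      # of the previous one (duplicate trigger by the second controller)
      mutate(grp = cumsum(is.na(lag(start_ts)) |
                          start_ts - lag(start_ts) > 0.01)) %>%
      group_by(participant, rack_level, grp) %>%
      summarise(start_ts = min(start_ts),            # keep the earlier start
                end_ts = max(end_ts, na.rm = TRUE),  # keep the later end
                .groups = "drop")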
Participants confirming orders before the final pick Analyses of the videos taken during the experiments revealed that some participants tried to increase their picking speed by verbally confirming the completion of an order before picking its last items from the picking rack. This caused the database entry of the pick to already be associated with the next order, which means searching times32 would be measured as faster than they actually were. In order to correct this issue, all sensor interruptions at the picking rack happening less than two seconds after the start of the corresponding order have been assigned to the previous order. The threshold of two seconds has been chosen because the pick-by-voice system needs between 2 and 3 seconds to acoustically provide the information on the rack and position of an item to pick. This means that participants can only know the position to pick from at least two seconds after the start of a new picking order. In summary, 256 of a total of 8,768 orders in all sets of all participants in both groups (i.e. 2.9%) have been corrected in this way.
32 As a reminder: searching times were calculated as the time difference between the start of a picking order and the first entry of the picker's hands into the picking rack.
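The two-second correction could be sketched as follows, assuming each logged pick carries its entry timestamp entry_ts, the start timestamp order_start of the order it was initially logged under, and a sequential integer order_id; the names and the sequential-ID assumption are illustrative:

    library(dplyr)

    # A pick registered less than 2 s after the order start cannot belong to
    # that order, as the pick-by-voice instruction takes 2-3 s to be spoken.
    picks_corrected <- picks %>%
      mutate(order_id = if_else(entry_ts - order_start < 2,
                                order_id - 1L,   # reassign to the previous order
                                order_id))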
5 Results of the Comparison Between Virtual and Real Order Picking

Electronic supplementary material The online version of this chapter (https://doi.org/10.1007/978-3-658-34704-8_5) contains supplementary material, which is available to authorized users.
This chapter presents the results of the experimental study and provides a direct comparison of the dependent variables in order to answer RQ 2.1. To do so, research hypotheses for the comparison of the dependent variables are defined (section 5.1) and the general procedure for selecting adequate methods for the inferential statistics analysis is introduced (section 5.2). Then, the results of the hypotheses testing are given (section 5.3), followed by a detailed discussion of the results (section 5.4).
5.1 Research Hypotheses
To compare human performance in order picking in VR and the real environment, RQ 2.1 must be translated into hypotheses using the available dependent variables measured during the study. As RQ 2.1 asks for differences between human performance in virtual and real order picking, the hypotheses can be formulated as so-called hypotheses of difference. However, no clear assumptions on the size or the direction of the effect can be made prior to the study, as there is no former research available that, for example, indicates whether order picking in VR takes more or less time compared to order picking in a real environment. The hypotheses in this thesis therefore focus on the mere existence of an effect and are referred to as non-specific research hypotheses (Döring and Bortz 2016, p. 149). Note that the research hypotheses formulated below are not yet statistical hypotheses. Statistical hypotheses are even more specific since they outline the statistical test procedures
and consist of a null and an alternative hypothesis (Farrugia et al. 2010, p. 280). However, to formulate statistical hypotheses, the test statistic which is calculated from the data must be known (Döring and Bortz 2016, p. 661). As the test statistic is specified by the test method, which in turn depends on assumptions about the available data1, statistical hypotheses can only be formulated after the selection of the specific test method.2 Nevertheless, the research hypotheses are formulated in such a way that they can be translated directly into statistical null hypotheses by including the respective test statistic. The available dependent variables measured and calculated in the experimental setup are listed in Table 4.2 in section 4.3.2. The first research hypothesis formulated for specifying RQ 2.1 deals with the perceived workload. An equally perceived workload between group VR and group RR in sets 1–4 is an important prerequisite for the subsequent analyses as it also serves as validation for the experimental setup: if the task is perceived to be equally challenging in the virtual and the real environment in terms of the workload, observed differences in other measures can be assumed to be caused by the different environment. Otherwise, if differences in the perceived workload are found, the cause for these differences must be analysed carefully in order to ensure that no unwanted interference (for example caused by unforeseen differences in the experimental procedure between the two groups) influences the comparison between virtual and real order picking. The first research hypothesis is therefore:

H 1 The perceived workload does not differ between group VR and group RR.

In order to check if the age of participants has an impact, the results of younger and older participants within group VR have been compared. Because the majority of participants in the sample were younger than 30 years, group VR has been divided into participants with an age of under 30 years and participants with an age of 30 years and above. Participants who did not give any information on their age have been excluded. Hypothesis H 1 is therefore extended by the formulation of the following hypothesis:

H 1.1. The perceived workload does not differ between participants with an age of under 30 years and participants with an age of 30 years and above within group VR.
1 e.g., the assumption of normality of the data.
2 The selection of methods for inferential statistics testing and the corresponding test statistics which enable the formulation of statistical hypotheses are described in section 5.2.
Next, in order to validate the generalizability of the sample, the perceived workload measured in the group of professional order pickers has been compared to that of group VR in the main sample. Again, a difference could indicate unwanted interference, as the group of professional order pickers performed the same task in the same environment as group VR. The corresponding hypothesis has been formulated as follows:

H 1.2. The perceived workload does not differ between the professional order pickers and the participants in group VR.

The next dependent variable under consideration is the set completion time. It has been hypothesised that set completion times do not differ between order picking in a real environment and in a virtual environment:

H 2 Set completion times do not differ between group VR and group RR.

Again, this hypothesis has been extended for a comparison between younger and older participants as well as between the professional order pickers and group VR:

H 2.1. Set completion times do not differ between participants with an age of under 30 years and participants with an age of 30 years and above within group VR.

H 2.2. Set completion times do not differ between the professional order pickers and the participants in group VR.

The data gathered by the infrared sensors in the experimental setup enables a more precise analysis of the overall set completion time by isolating two disjoint time components: picking and searching time. From H 2, the following hypotheses concerning picking and searching times can be derived directly:

H 3 Picking times do not differ between group VR and group RR.

H 3.1. Picking times do not differ between participants with an age of under 30 years and participants with an age of 30 years and above within group VR.

H 4 Searching times do not differ between group VR and group RR.
H 4.1. Searching times do not differ between participants with an age of under 30 years and participants with an age of 30 years and above within group VR.

If H 2 is rejected, testing H 3 and H 4 can indicate the origin of the observed differences in set completion times. Moreover, if H 2 holds true, testing H 3 and H 4 is still of value, as time differences might exist in these time components that are not visible in the set completion time.3 The next hypothesis deals with picking errors as a measure for picking quality. It states that the number of orders with picking errors should be equal in both environments if there is no difference between virtual and real order picking:

H 5 The number of orders with picking errors does not differ between group VR and group RR.

As before, additional hypotheses have been formulated considering the age of participants and potential differences between the professional pickers and group VR:

H 5.1. The number of orders with picking errors does not differ between participants with an age of under 30 years and participants with an age of 30 years and above within group VR.

H 5.2. The number of orders with picking errors does not differ between the professional order pickers and the participants in group VR.

Finally, the number of orders with dropped items has been analysed for all participants and the following research hypotheses have been formulated:

H 6 The number of orders with dropped items does not differ between group VR and group RR.

H 6.1. The number of orders with dropped items does not differ between participants with an age of under 30 years and participants with an age of 30 years and above within group VR.
3 Note that hypotheses on differences in picking and searching times between the professional order pickers and group VR were not formulated, as they could not be answered in this thesis. This is due to the fact that the professional order pickers took part in the experiment before the implementation of the infrared sensors, and picking and searching times are therefore not available (see section 4.4.1).
H 6.2. The number of orders with dropped items does not differ between the professional order pickers and the participants in group VR.
5.2 Procedure and Methods for the Inferential Statistics Analyses
To test the research hypotheses formulated in section 5.1 and compare the dependent quantitative variables between both groups, multiple steps using different statistical methods for inferential testing have been taken. As the selected test method specifies the test statistic, it defines the statistical hypotheses derived from the research hypotheses. The specific steps for selecting the adequate test method applied in this thesis are depicted in Figure 5.1 and will be described below.
Figure 5.1 Methods and steps used to test the hypotheses using inferential statistics
Before using inferential statistics to compare sample data, one must first decide whether parametric or non-parametric tests are appropriate, i.e. whether a normal distribution can be assumed for the sample data (Ghasemi and Zahediasl 2012, p. 486; Herzog et al. 2019, p. 56). For large sample sizes of more than 30 to 40 data points, a normal-like distribution is often assumed according to the central limit theorem, even if the real distribution of the data from which the sample is drawn is unknown (Elliott and Woodward 2007, p. 26).4
However, careful testing for normality of the sample is always advisable in order to select the subsequent methods of statistical testing and provide a correct analysis and interpretation of the data (Ghasemi and Zahediasl 2012; Royston 1991).5 As described in section 4.4.1 and section 4.4.3, the sample in this thesis contains 56 participants in each group. Of these 56 participants, sensor data is available for 33 participants in sets 1–4 of group RR (36 in sets 5–8) and 36 participants in sets 1–4 of group VR (32 in sets 5–8). As this number is below the proposed threshold of 40 data points, the hypothesis of a normal distribution of the sample data is not assumed a priori but has been tested for each sample using the Shapiro-Wilk test described in Royston (1982a) and Royston (1982b).6 In the case that a normal distribution cannot be rejected for the data, the main and interaction effects of the group, the set, the block and the participants' age can first be analysed using a repeated measures ANOVA (Herzog et al. 2019, pp. 76–81; Wilcox 2017, p. 343).7 However, the ANOVA only tests if significant effects between groups exist. Thus, if the null hypothesis is rejected, multiple alternative hypotheses are possible and additional pairwise comparisons are recommended as post-hoc tests (Herzog et al. 2019, pp. 71–72). To do so, a Levene's test is performed in order to test for equal variances between the two groups, as recommended by Gastwirth et al. (2009). If equal variances can be assumed, the mean values of group VR and group RR can be compared and tested for equality using a two-tailed Student's t-test for two independent samples. In the case of unequal variances, a Welch's test8 can be applied (Herzog et al. 2019, p. 57) to test the hypotheses. The Welch's test is recommended for testing the equality of means in the case of unequal variances, as it is robust against errors of Type I (Derrick and White 2016; Ruxton 2006). In the case that the hypothesis of normality has been rejected by the Shapiro-Wilk test for at least one of the two groups, the main and interaction effects can also be tested using a repeated measures ANOVA if the data is first converted using an aligned rank transformation as described by Wobbrock et al. (2011).
4 Herzog et al. (2019, p. 56) even argue that the t-test is robust to non-normality as long as the distribution is unimodal.
5 For example, a normal distribution enables a direct comparison of means between two groups, which is not necessarily possible for non-normal data.
6 Note that the standard Shapiro-Wilk test described in Shapiro and Wilk (1965) can be used for sample sizes of up to 50 data points. However, the extension by Royston (1982a) and Royston (1982b) makes it possible to use the test with sample sizes of up to 2000 data points.
7 Note that the professional order pickers were not included in the ANOVA and the origin of participants was not added as an additional independent variable, as all of the professional order pickers were assigned to group VR.
8 The Welch's test is based on the Student's t-test and is therefore similar.
For the subsequent pairwise comparison, the non-parametric Mann-Whitney U test for independent samples is used (Wilcoxon 1947).9 The Mann-Whitney U test is a popular non-parametric test among researchers, especially among economists (Croson 2005, p. 143; Al-Benna et al. 2010). The Mann-Whitney U test generally tests whether the distributions of two independent samples are equal.10 However, the test is able to test the equality of medians as long as the distributions of the two samples only differ by a shift δ along the x-axis, i.e. if $F_{VR}(x) = F_{RR}(x + \delta)$ (Divine et al. 2018; Hart 2001). To test for this prerequisite, a Kolmogorov-Smirnov test, which tests the equality of two sample distributions (Marsaglia et al. 2003), is performed prior to the Mann-Whitney U test.11 Note that all time measures for which normality has been rejected by the Shapiro-Wilk test were also logarithmised and subsequently tested for a log-normal distribution, again using a Shapiro-Wilk test. The results of these tests can be found in Appendix H in the electronic supplementary material. Even though a log-normal distribution has not been rejected for some samples, it has been rejected for others. Therefore, the non-parametric Kolmogorov-Smirnov and Mann-Whitney U tests have been used for testing all non-normal data, as this is assumed to be a more reliable and conservative approach than assuming a log-normal distribution for some samples. The corresponding statistical null and alternative hypotheses of the applied methods are summarized in Table 5.1. The table also contains the functions used in R to calculate the tests, along with their packages. Note that for the reporting of the statistical analysis in section 5.3, the guidelines provided by Curran-Everett and Benos (2004) have been followed. Unless stated otherwise in the subsequent sections, a significance level of 5% has generally been used in this thesis when interpreting test results.
9 The Mann-Whitney U test is also referred to as the Wilcoxon rank-sum test or Mann-Whitney-Wilcoxon test but must not be confused with the Wilcoxon signed-rank test, which is used for dependent samples.
10 To do so, the Mann-Whitney U test can either calculate exact p-values or, in cases of very large data sets, use an approximation for the p-values by applying an asymptotic χ² test statistic (Divine et al. 2018, p. 279). As the samples contain fewer than 60 data points, exact p-values are calculated in this thesis.
11 Alternatively, permutation tests as described by Ernst (2004) can be computed to compare the datasets and test the hypotheses. Results of such an approach for a part of the data can be found in Seidel (2019). However, this approach can only be used if identical distributions can be assumed among all groups (Huang et al. 2006, p. 2247) and has therefore not been performed in this thesis.
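For a single pairwise comparison, this selection procedure could be condensed into the following R sketch, assuming two numeric vectors x_vr and x_rr and a long-format data frame d with a numeric column value and a factor group; the function calls are standard R and car functions, but the exact arguments used in the thesis may differ:

    library(car)  # leveneTest()

    normal <- shapiro.test(x_vr)$p.value > .05 &&
              shapiro.test(x_rr)$p.value > .05

    if (normal) {
      # equal variances -> Student's t-test, unequal variances -> Welch's test
      p_lev <- leveneTest(value ~ group, data = d)[["Pr(>F)"]][1]
      print(t.test(x_vr, x_rr, var.equal = p_lev > .05))
    } else {
      print(ks.test(x_vr, x_rr))                    # prerequisite: equal shapes
      print(wilcox.test(x_vr, x_rr, exact = TRUE))  # Mann-Whitney U, exact p-values
    }

    # Main and interaction effects of non-normal data are analysed with a
    # repeated measures ANOVA on aligned-rank-transformed data, e.g. ARTool::art().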
Table 5.1 Statistical null and alternative hypotheses as well as R functions for each test method
5.3 Results of the Hypotheses Testing using Inferential Statistics

5.3.1 Perceived Workload
Difference between group VR and group RR (H 1) Box plots of the weighted NASA-TLX scores for each block (sets 1–4 and 5–8) are depicted in Figure 5.2a. The mean and median values of the weighted NASA-TLX along with the results of the statistical tests can be found in Table 5.2a. Note that the sample size of group RR is only 53 participants, as some participants did not complete the pairwise comparison of items necessary for calculating the weighted NASA-TLX. For those participants, only the raw NASA-TLX can be computed. As can be seen in Table 5.2a, a normal distribution of weighted NASA-TLX scores can be assumed for both groups with equal variances. The same holds true in sets 5–8. A 2 (group: VR and RR) × 2 (blocks: sets 1–4 and sets 5–8) repeated measures ANOVA has found neither the main effect of the group (F(1, 107) = 1.698, p = .195) nor the
main effect of the block (F(1, 107) = .09, p = .76) to be significant. A significant interaction effect has also not been found (F(1, 107) = .01, p = .93).
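As a hedged reminder of the standard NASA-TLX scoring assumed here, the weighted score averages the six subscale ratings using weights obtained from the 15 pairwise comparisons (each weight counts how often a subscale was preferred, so the weights sum to 15); without the pairwise comparisons, only the unweighted raw TLX can be computed:

    # ratings: six subscale ratings (0-100); weights: pairwise-comparison counts
    weighted_tlx <- function(ratings, weights) {
      stopifnot(length(ratings) == 6, length(weights) == 6, sum(weights) == 15)
      sum(ratings * weights) / 15
    }

    raw_tlx <- function(ratings) mean(ratings)  # raw TLX: unweighted mean

    weighted_tlx(c(55, 40, 60, 35, 50, 45), c(4, 3, 3, 2, 2, 1))  # yields 49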
Figure 5.2 Box plots of the weighted NASA-TLX scores for group VR and group RR (a), for participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b), and group VR and the professional order pickers (c)
Next, NASA-TLX scores obtained after completing sets 1–4 have been directly compared between group VR (who performed sets 1–4 in the virtual environment) and group RR (who performed sets 1–4 in the real environment). Even though group RR shows a larger mean value than group VR, a two-sample t-test does not find a significant difference in mean NASA-TLX scores between both groups in sets 1–4. To verify the setup, results in sets 5–8 have been analysed in the same way. Remember that these sets were performed in the real environment by both groups. It is therefore expected that no difference between the two groups can be found. Indeed, the two-sample t-test does not reject the hypothesis of equal means for the weighted NASA-TLX. Test results for the raw NASA-TLX resemble those of the weighted NASA-TLX and can be found in Appendix I in the electronic supplementary material.
Table 5.2 Test results for comparing the weighted NASA-TLX scores between group VR and group RR (a), between participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b), and between group VR and the professional order pickers (c)
In summary, hypothesis H 1 (The perceived workload does not differ between group VR and group RR.) cannot be rejected based on the analysis of the data.

Difference between participants with an age of under 30 years and participants with an age of 30 years and above (H 1.1) Next, participants with an age < 30 years and participants with an age ≥ 30 years within group VR have been compared. Box plots can be found in Figure 5.2b. Mean and median values as well as the results of the statistical tests can be found in Table 5.2b. As before, a normal distribution is assumed for both the group of younger participants and the group of older participants, and variances are also assumed to be equal. In sets 5–8, a normal distribution can also not be rejected, neither for participants with an age < 30 years nor for participants with an age ≥ 30 years. Furthermore, equality of the variances is assumed based on the Levene's test. A 2 (groups: VR and RR) × 2 (blocks: sets 1–4 and sets 5–8) × 2 (age: < 30 and ≥ 30 years) repeated measures ANOVA has been performed to include the participants' age. For the weighted NASA-TLX, the ANOVA has found no significant main effect (group: F(1, 101) = 1.43, p = .235; block: F(1, 101) = .01, p = .931; age: F(1, 101) = .65, p = .42). Interaction effects of group and block (F(1, 101) = .07, p = .795), group and age (F(1, 101) = .01, p = .917), block and age (F(1, 101) = 1.48, p = .226) and all three independent variables (F(1, 101) = 1.20, p = .276) are also not significant.
Next, for testing hypothesis H 1.1, a pairwise comparison between participants with an age < 30 years and participants with an age ≥ 30 years within group VR has been performed. In sets 1–4, a two-sample t-test does not find a significant difference in mean NASA-TLX scores based on a 5% significance level. Similarly, the two-sample t-test also does not find significant differences between participants with an age < 30 years and participants with an age ≥ 30 years in sets 5–8. Again, results for the raw NASA-TLX (see Appendix I in the electronic supplementary material) are similar. Hypothesis H 1.1 (The perceived workload does not differ between participants with an age of under 30 years and participants with an age of 30 years and above within group VR.) can thus also not be rejected.

Difference between the professional order pickers and the participants in group VR (H 1.2) Finally, NASA-TLX scores of the professional order pickers have been compared to the scores of the participants in group VR. The results can be found in Figure 5.2c and Table 5.2c. As with the participants in group VR, the Shapiro-Wilk test does not reject normality in NASA-TLX scores of the professional order pickers. In sets 1–4, which were performed in VR by both group VR and the professional order pickers, the Levene's test rejects equal variances between the two groups at a 5% significance level for the weighted NASA-TLX. The weighted NASA-TLX is therefore compared between group VR and the professional order pickers using the Welch's test, which rejects the hypothesis of equal means of the weighted NASA-TLX in sets 1–4. Instead, the mean NASA-TLX score of the professional order pickers is 13.44 points lower than the score of group VR. In sets 5–8, which were performed in the real environment by both groups, the Levene's test does not reject equal variances for the weighted NASA-TLX. The Student's t-test rejects the hypothesis of equal means between the two groups, and a difference in mean NASA-TLX scores of 21.37 points has been found. Similar results have again been found for the raw NASA-TLX (see Appendix I in the electronic supplementary material). Hypothesis H 1.2 (The perceived workload does not differ between the professional order pickers and the participants in group VR.) must therefore be rejected based on the data in both sets 1–4 and sets 5–8. Instead, the data shows that the professional order pickers yield significantly lower mean NASA-TLX scores in sets 1–4 and sets 5–8 compared to group VR.
5.3.2 Set Completion Times
Difference between group VR and group RR (H 2) In contrast to the NASA-TLX, set completion times have been calculated and tested for each individual set. Box plots of the measured set completion times can be seen in Figure 5.3a. The results of the Shapiro-Wilk tests, the Kolmogorov-Smirnov tests, and the Mann-Whitney U tests, along with descriptive statistics, are given in Table 5.3a. The Shapiro-Wilk test shows that normality has to be rejected in all sets but set 5 of group VR. Therefore, a 2 (groups: VR and RR) × 4 (4 sets in each block) × 2 (blocks) aligned rank transformed repeated measures ANOVA has been used to analyse main and interaction effects. The main effects of the group (F(1, 110) = 5.19, p = .025), the set (F(3, 770) = 140.94, p < .001), and the block (F(1, 770) = 942.93, p < .001) are all significant. The interaction effects of group and set (F(3, 770) = 20.80, p < .001), group and block (F(1, 770) = 220.05, p < .001), set and block (F(3, 770) = 49.66, p < .001), and all three independent variables combined (F(3, 770) = 6.97, p < .001) are also significant. For the pairwise comparison and for testing the hypothesis, the non-parametric Kolmogorov-Smirnov and Mann-Whitney U tests have been used. The Kolmogorov-Smirnov test shows that the distribution of set completion times differs significantly between both groups; therefore, the Mann-Whitney U test cannot be interpreted as a test for the equality of medians. As can be seen in Table 5.3a, the test shows a significant difference between set completion times of group VR and group RR with p < .005 in all sets 1–4. In fact, median set completion times of group VR are 64 seconds longer in set 1 and 28 seconds, 31 seconds and 31 seconds longer in sets 2, 3, and 4 compared to group RR. To ensure that differences in set completion times were in fact caused by the difference in the independent variable, i.e. the different environments of both groups, set completion times in sets 5–8 have also been tested. Results are also given in Figure 5.3a and Table 5.3a. In contrast to the results in sets 1–4, the hypothesis of equal distributions in group VR and RR cannot be rejected in sets 5–8 based on the Kolmogorov-Smirnov test. The results of the Mann-Whitney U test reveal that the hypothesis of equal medians cannot be rejected based on the data. As a result, hypothesis H 2 (Set completion times do not differ between group VR and group RR.) must be rejected in sets 1, 2, 3, and 4, which were performed in different environments by the two groups. In sets 5, 6, 7, and 8, which were performed in the real environment by both groups, the hypothesis cannot be rejected.
Table 5.3 Test results for comparing the set completion times (s) between group VR and group RR (a), between participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b), and between group VR and the professional order pickers (c)
Difference between participants with an age of under 30 years and participants with an age of 30 years and above (H 2.1) Box plots for the analysis of participants with an age < 30 years and participants with an age ≥ 30 years within group VR can be found in Figure 5.3b. Results of the statistical tests are given in Table 5.3b. The table shows that normality can be rejected at a 5% significance level in sets 1, 3, 6 and 8 of the group of participants under 30 years of age. A 2 (groups: VR and RR) × 4 (4 sets in each block) × 2 (blocks) × 2 (age: < 30 and ≥ 30 years) aligned rank transformed repeated measures ANOVA has found all effects to be significant, except for the interaction effects of group and age (F(1, 104) = .62, p = .433), set and age (F(3, 728) = 2.28, p = .078), and block and age (F(1, 728) = 1.57, p = .211).12
12 The complete results of the ANOVA can be found in Appendix J in the electronic supplementary material.
Figure 5.3 Box plots of the set completion times (s) for group VR and group RR (a), for participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b), and group VR and the professional order pickers (c)
Due to the non-normality of the data in some sets, the non-parametric Mann-Whitney U test has again been used for the pairwise comparison of set completion times between participants with an age < 30 years and participants with an age ≥ 30 years within group VR. However, the hypothesis of equal distributions can only be rejected at a 10% significance level in sets 1, 4, and 8 based on the Kolmogorov-Smirnov tests. The results of the Mann-Whitney U tests shown in Table 5.3b reveal that equal median set completion times between participants with an age < 30 years and ≥ 30 years cannot be rejected at a 5% significance level in any of the sets, neither in sets 1–4 (performed in VR) nor in sets 5–8 (performed in the real environment). This means that, even though p-values are relatively small in some sets, hypothesis H 2.1 (Set completion times do not differ between participants with an age of under 30 years and participants with an age of 30 years and above within group VR.) cannot be rejected in any of the sets.

Difference between the professional order pickers and the participants in group VR (H 2.2) Finally, results of group VR have been compared with results of the group of professional order pickers. Box plots of the set completion times of the professional order pickers are displayed in Figure 5.3c. In sets 1–4, normality has already been rejected for group VR. For this reason, the Mann-Whitney U test has again been used to test the hypothesis. Results can be found in Table 5.3c. Both the Kolmogorov-Smirnov test and the Mann-Whitney U test yield significant differences in the underlying distributions. Table 5.3c also shows that median set completion times of the professional order pickers are 135 seconds, 79 seconds, 55 seconds, and 57 seconds longer in sets 1, 2, 3, and 4 compared to group VR. Results in sets 5–8, which were performed in the real environment by both groups, have also been compared between group VR and the group of professional order pickers. The results are similar to those in sets 1–4 (see Table 5.3c). Median set completion times of the professional order pickers are generally longer compared to group VR (set 5: 73 seconds; set 6: 82 seconds; set 7: 71 seconds; set 8: 85 seconds) and the Kolmogorov-Smirnov test, as well as the Mann-Whitney U test, reject the hypothesis of equal distributions in all sets. Thus, hypothesis H 2.2 (Set completion times do not differ between the professional order pickers and the participants in group VR.) can be rejected in all sets.
5.3.3 Picking Times
Difference between group VR and group RR (H 3) Picking times, i.e. the sums of times between sensor entry and exit at the picking rack, in each set of both groups, are displayed in Figure 5.4a. The figure shows that picking times of group VR have larger medians compared to group RR in all sets 1–4. Mean and median values as well as the results of a Shapiro-Wilk, a Kolmogorov-Smirnov, and a Mann-Whitney U test are given in Table 5.4a. The Shapiro-Wilk test rejects the hypothesis of a normal distribution in group VR in sets 1–4. In sets 5–8, the Shapiro-Wilk test rejects normality in all sets in group RR. A 2 (groups: VR and RR) × 4 (4 sets in each block) × 2 (blocks) aligned rank transformed repeated measures ANOVA shows significant main effects (group: F(1, 62) = 11.14, p = .001; set: F(3, 434) = 61.38, p < .001; block: F(1, 434) = 274.77, p < .001) as well as significant interaction effects (group and set: F(3, 434) = 27.39, p < .001; group
Figure 5.4 Box plots of the picking times (s) for group VR and group RR (a) and for participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b)
Table 5.4 Test results for comparing the picking times (s) between group VR and group RR (a) and between participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b)
and block: F(1, 434) = 193.62, p < .001; set and block: F(3, 434) = 24.14, p < .001; group, set, and block: F(3, 434) = 21.52, p < .001).13 In the pairwise comparison between the two groups, both the Kolmogorov-Smirnov test and the Mann-Whitney U test show that the distributions of picking times differ significantly between the two groups in sets 1–4. Sets 5–8 have also been compared in order to show that the observed differences in sets 1–4 are caused by the difference in the independent variable. Here, the Kolmogorov-Smirnov test rejects equal distributions only in set 5; therefore, the Mann-Whitney U test can be interpreted as testing for equal medians in sets 6, 7 and 8. As can be seen in Table 5.4a, the hypothesis of equal medians cannot be rejected in these sets. In conclusion, hypothesis H 3 (Picking times do not differ between group VR and group RR.) can be rejected in sets 1, 2, 3, and 4 but not in sets 5, 6, 7 and 8. In fact, median picking times of group VR are 40 seconds longer in set 1, 15 seconds longer in set 2, 8 seconds longer in set 3, and 6 seconds longer in set 4 compared to group RR.
Note that for calculating the repeated measures ANOVA on picking times, participants for which the data of one of the two blocks have been removed during data preparation (see section 4.4.3) have been excluded.
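Aligned rank transformed ANOVAs of the kind reported here can be computed with the ARTool package in R; whether this particular package was used in the thesis is not stated, so the following is only an illustrative sketch with hypothetical column names:

    # Illustrative sketch of an aligned rank transformed repeated measures
    # ANOVA using the ARTool package (assumed tooling; column names are
    # hypothetical, not from the thesis).
    library(ARTool)

    # d: one row per participant x set, with a numeric response (picking_time)
    # and factors for group, set, block, and participant (random effect).
    d$group       <- factor(d$group)
    d$set         <- factor(d$set)
    d$block       <- factor(d$block)
    d$participant <- factor(d$participant)

    m <- art(picking_time ~ group * set * block + (1 | participant), data = d)
    anova(m)  # F statistics for main and interaction effects on aligned ranks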
Difference between participants with an age of under 30 years and participants with an age of 30 years and above (H 3.1)
Results for picking times of participants < 30 years of age and participants ≥ 30 years of age within group VR are given in Figure 5.4b and Table 5.4b. For participants < 30 years of age, the Shapiro-Wilk test rejects normality in sets 2, 3, 4, and 5 at a 5% significance level. For this reason, an aligned rank transformed repeated measures ANOVA has again been used. The 2 (groups: VR and RR) x 4 (4 sets in each block) x 2 (blocks) x 2 (age: < 30 and ≥ 30 years) ANOVA reveals that all effects are significant, except for the interaction effects of group and age (F(1, 58) = 2.77, p = .101), set and age (F(3, 406) = 1.34, p = .260), and block and age (F(1, 406) = 1.02, p = .314).¹⁴

Picking times have been compared using the Mann-Whitney U test again. As the Kolmogorov-Smirnov test rejects equal distribution of picking times only in set 1, the Mann-Whitney U test can be used to test the equality of medians in the remaining sets 2–8. Results in Table 5.4b show that the hypothesis of equal medians cannot be rejected in these sets, except for set 7. As a result, hypothesis H 3.1 (Picking times do not differ between participants with an age of under 30 years and participants with an age of 30 years and above within group VR.) can only be rejected in sets 1 and 7 but not in sets 2, 3, 4, 5, 6, and 8.
5.3.4 Searching Times
Difference between group VR and group RR (H 4)
The results of searching times, i.e. the sum of times between the start of a picking order and the subsequent entry of a picker's hand into the corresponding level of the picking rack, are displayed in Figure 5.5a. Table 5.5a gives the results of the statistical tests along with median and mean values. As can be seen in the table, the Shapiro-Wilk test rejects normal distributions in sets 1 and 7 of group RR and in sets 2, 3, 4, and 8 of group VR. Therefore, an aligned rank transformed ANOVA and a Mann-Whitney U test have been used to analyse searching times between the groups. The 2 (groups: VR and RR) x 4 (4 sets in each block) x 2 (blocks) aligned rank transformed repeated measures ANOVA shows a significant main effect for the set (F(3, 434) = 32.66, p < .001) and the block (F(1, 434) = 240.69, p < .001) but not for the group (F(1, 62) = .05, p = .816). Interaction effects are only significant for set and block (F(3, 434) = 21.61, p < .001) but neither for group and set (F(3, 434) = .15, p = .929), nor for group and block (F(1, 434) = 3.18, p = .075), nor for group, set, and block combined (F(3, 434) = .62, p = .602).¹⁵

¹⁴ The complete results of the ANOVA can be found in Appendix J in the electronic supplementary material.
With regard to the pairwise comparison between the two groups, the Kolmogorov-Smirnov test shows that the distribution of searching times does not differ significantly, and the Mann-Whitney U test does not reveal a significant difference in median searching times, neither in sets 1, 2, 3, and 4 nor in sets 5, 6, 7, and 8 (see Table 5.5a). Thus, hypothesis H 4 (Searching times do not differ between group VR and group RR.) cannot be rejected in any of the sets.

Difference between participants with an age of under 30 years and participants with an age of 30 years and above (H 4.1)
Results for searching times of participants with an age < 30 years and participants with an age ≥ 30 years within group VR are given in Figure 5.5b and in Table 5.5b. Based on the Shapiro-Wilk test, normality can only be rejected in set 4 of the group of participants with an age < 30 years. Levene's test does not reject equal variances in any set apart from set 6. A 2 (groups: VR and RR) x 4 (4 sets in each block) x 2 (blocks) x 2 (age: < 30 and ≥ 30 years) repeated measures ANOVA finds that only the main effects of the set (F(3, 406) = 35.02, p < .001) and the block (F(1, 406) = 204.07, p < .001), as well as the interaction effects of the set and the block (F(3, 406) = 20.34, p < .001), and the group, the block, and the age (F(3, 406) = 6.55, p = .011) are significant.¹⁶

For the pairwise comparison of mean searching times between the two groups, a two-sample t-test has been used. As can be seen in Table 5.5b, mean searching times are longer for participants with an age ≥ 30 years in all sets. However, the t-test states that this difference is only significant in set 1. In summary, hypothesis H 4.1 (Searching times do not differ between participants with an age of under 30 years and participants with an age of 30 years and above within group VR.) cannot be rejected in any set apart from set 1.

¹⁵ For calculating the repeated measures ANOVA on searching times, participants for which the data of one of the two blocks have been removed during data preparation (see section 4.4.3) have again been excluded.
¹⁶ The complete results of the ANOVA can be found in Appendix J in the electronic supplementary material.
Figure 5.5 Box plots of the searching times (s) for group VR and group RR (a) and for participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b)
5.3.5 Number of Erroneous Orders
Difference between group VR and group RR (H 5)
Box plots for the number of orders with picking errors are displayed in Figure 5.6a. Mean and median values as well as test results are provided in Table 5.6a. The table shows that the Shapiro-Wilk test reveals non-normality for both groups in sets 1–4. In sets 5–8, normality can also be rejected for both groups. A 2 (group: VR and RR) x 2 (blocks: sets 1–4 and sets 5–8) aligned rank transformed repeated measures ANOVA shows that the main effect of the block (F(1, 110) = 41.84, p < .001) and the interaction effect of group and block (F(1, 110) = 5.29, p = .023) are significant but the main effect of the group is not (F(1, 110) = 3.34, p = .070). Table 5.6a also shows that the pairwise comparison between the two groups finds no significant difference in the distributions and median values between the two groups in sets 1–4. In sets 5–8, results are similar to sets 1–4 and the hypothesis of equal medians between the two groups cannot be rejected.
Table 5.5 Test results for comparing the searching times (s) between group VR and group RR (a) and between participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b)
Hypothesis H 5 (The number of orders with picking errors does not differ between group VR and group RR.) can therefore not be rejected, neither in sets 1–4 nor in sets 5–8.

Difference between participants with an age of under 30 years and participants with an age of 30 years and above (H 5.1)
Box plots of the number of erroneous orders for participants with an age < 30 years and participants with an age ≥ 30 years within group VR are given in Figure 5.6b. A normal distribution of the number of erroneous orders in sets 1–4 can be assumed for participants with an age ≥ 30 years, but not for participants with an age < 30 years (see Table 5.6b). In sets 5–8, the Shapiro-Wilk test rejects normality for the number of errors for both groups. A 2 (groups: VR and RR) x 2 (blocks: sets 1–4 and sets 5–8) x 2 (age: < 30 and ≥ 30 years) aligned rank transformed repeated measures ANOVA finds a significant effect of the group (F(1, 104) = 5.80, p = .018), the block (F(1, 104) = 30.54, p < .001) and of the interaction of group and block (F(1, 104) = 6.66, p = .011).
Figure 5.6 Box plots of the number of erroneous orders for group VR and group RR (a), for participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b), and group VR and the professional order pickers (c)
However, the effects of the age (F(1, 104) = 2.71, p = .103), the interaction of group and age (F(1, 104) = .23, p = .634), the interaction of block and age (F(1, 104) = 1.81, p = .181), and the interaction of group, block, and age (F(1, 104) = .80, p = .374) are not significant. Test results for the comparison between the two groups given in Table 5.6b show that the hypothesis of equal medians cannot be rejected, neither in sets 1–4 nor in sets 5–8. Thus, hypothesis H 5.1 (The number of orders with picking errors does not differ between participants with an age of under 30 years and participants with an age of 30 years and above within group VR.) can neither be rejected in sets 1–4 nor in sets 5–8.

Difference between the professional order pickers and the participants in group VR (H 5.2)
Results for the professional order pickers are given in Figure 5.6c and Table 5.6c. A Mann-Whitney U test is used to compare the number of erroneous orders between the professional order pickers and the participants in group VR as normality has already been rejected for the number of erroneous orders in group VR. Based on
Table 5.6 Test results for comparing the number of erroneous orders between group VR and group RR (a), between participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b), and between group VR and the professional order pickers (c)
the results of the Kolmogorov-Smirnov and Mann-Whitney U tests, the hypothesis of equal medians of the number of erroneous orders cannot be rejected, neither in sets 1–4 nor in sets 5–8. In summary, hypothesis H 5.2 (The number of orders with picking errors does not differ between the professional order pickers and the participants in group VR.) cannot be rejected in sets 1–4 and sets 5–8.
5.3.6 Number of Orders with Dropped Items
Difference between group VR and group RR (H 6)
Box plots of the number of orders with dropped items are given in Figure 5.7a. Mean and median values along with the results of the statistical tests are given in Table 5.7a. The table shows that normality is rejected by the Shapiro-Wilk test for both groups in sets 1–4 and sets 5–8. A 2 (group: VR and RR) x 2 (blocks: sets 1–4 and sets 5–8) aligned rank transformed repeated measures ANOVA shows that the main effects of the group (F(1, 110) = 24.84, p < .001) and the block (F(1, 110) = 31.46, p < .001), as well as the interaction effect of group and block (F(1, 110) = 31.63, p < .001), are significant. Furthermore, the Kolmogorov-Smirnov and the Mann-Whitney U test show that the distributions of the two groups differ significantly in sets 1–4.
Figure 5.7 Box plots of the number of orders with dropped items for group VR and group RR (a), for participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b), and group VR and the professional order pickers (c)
In sets 5–8, the hypothesis of equal distributions of the number of orders with dropped items cannot be rejected, and the hypothesis of equal medians between group VR and group RR can also not be rejected. On the one hand, this means that hypothesis H 6 (The number of orders with dropped items does not differ between group VR and group RR.) has to be rejected in sets 1–4. On the other hand, in sets 5–8, the hypothesis cannot be rejected.

Difference between participants with an age of under 30 years and participants with an age of 30 years and above (H 6.1)
Test results of the comparison of the number of orders with dropped items between participants with an age < 30 years and participants with an age ≥ 30 years within group VR are given in Figure 5.7b and Table 5.7b. The Shapiro-Wilk test rejects normality for participants with an age < 30 years. For participants with an age ≥ 30 years, normality must be rejected in sets 5–8.
Table 5.7 Test results for comparing the number of orders with dropped items between group VR and group RR (a), between participants with an age < 30 years and participants with an age ≥ 30 years within group VR (b), and between group VR and the professional order pickers (c)
A 2 (groups: VR and RR) x 2 (blocks: sets 1–4 and sets 5–8) x 2 (age: < 30 and ≥ 30 years) aligned rank transformed repeated measures ANOVA finds that all main effects (group: F(1, 104) = 18.15, p < .001; block: F(1, 104) = 32.48, p < .001; age: F(1, 104) = 5.60, p = .020) and all interaction effects (group and block: F(1, 104) = 32.47, p < .001; group and age: F(1, 104) = 8.17, p = .005; block and age: F(1, 104) = 8.93, p = .004; group, block, and age: F(1, 104) = 10.90, p = .001) are significant. For the pairwise comparison between the two groups, the Kolmogorov-Smirnov test rejects the hypothesis of equal distributions in sets 1–4 but not in sets 5–8. Similarly, the Mann-Whitney U test rejects equal distributions in sets 1–4, but it does not reject equal medians in sets 5–8. This means that hypothesis H 6.1 (The number of orders with dropped items does not differ between participants with an age of under 30 years and participants with an age of 30 years and above within group VR.) has to be rejected in sets 1–4 but not in sets 5–8.

Difference between the professional order pickers and the participants in group VR (H 6.2)
Test results for the comparison of the number of orders with dropped items between the group of professional order pickers and group VR can be found in Figure 5.7c and Table 5.7c. For the comparison between the professional order pickers and the participants in group VR, the non-parametric Mann-Whitney U test has been used as
normality has already been rejected for group VR. Based on a Kolmogorov-Smirnov test, equal distributions can be rejected in sets 1–4 but not in sets 5–8. Similarly, the Mann-Whitney U test rejects equal distributions in sets 1–4. However, the hypothesis of equal medians in sets 5–8 cannot be rejected. In summary, hypothesis H 6.2 (The number of orders with dropped items does not differ between the professional order pickers and the participants in group VR.) must be rejected in sets 1–4 but not in sets 5–8.
5.3.7 Summary of Results
A summary of all hypotheses and the results of the statistical testing (i.e. if the hypothesis is rejected based on the data or not) is given in Table 5.8.
Table 5.8 Summary of the test results for each hypothesis, i.e. if the hypothesis has been rejected in individual blocks or sets
Remember that the hypotheses have either been tested individually in each set or – in the case of H 1, H 5, and H 6 – for an entire block consisting of four sets. The table also gives the corresponding dependent variable of each hypothesis along with the two samples that are compared.
5.4 Discussion of the Results
5.4.1 Validation and Discussion of the Experimental Setup
With respect to NASA-TLX scores, it has to be noted that hypothesis H 1 has not been rejected, meaning that a significant difference in NASA-TLX scores in sets 1–4 has not been found between group VR and group RR. This finding supports the experimental setup, as users perceived the task to be equally demanding in terms of workload in both environments. This means that differences in the measured times do not result from unequal workload between the groups. In fact, the results of the NASA-TLX back the experimental setup by showing that the presented task can be considered challenging: With the mean and median values for the weighted NASA-TLX lying above 60 for both groups, the perceived workload of the presented task can generally be described as high (Grier 2015).

Moreover, H 1.1 was also not rejected, i.e. it can be assumed that participants with an age ≥ 30 years were able to use the HMD just as well as younger participants and did not perceive an additional workload due to their age. This finding is consistent with the results of Syed-Abdul et al. (2019), who have found that elderly people accept VR well and generally perceive it as easy to use. Similarly, neither Manis and Choi (2019) nor Sakhare et al. (2019) have found any significant relationship between users' age and experiences in VR or the perceived usefulness of the technology.

Nevertheless, the professional order pickers yield significantly lower NASA-TLX scores compared to the participants in group VR, i.e. H 1.2 has been rejected. A possible explanation could be that the professional order pickers did not receive the performance-dependent bonus that the participants in group VR received, which meant the professional order pickers experienced less pressure to finish the task as fast as possible. This shows that the difference in the experimental conditions does have an effect on the results, limiting the comparability of the results of the professional order pickers to the results of the participants in group VR. This finding thus supports the decision to not include the professional order pickers in the main sample but to analyse their data separately.
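As a reminder of how weighted NASA-TLX scores such as those referenced above are commonly formed (following the standard NASA-TLX procedure; this is not code from the thesis): each of the six subscales is rated on a 0–100 scale and weighted by how often it was chosen in the 15 pairwise comparisons. A minimal sketch:

    # Weighted NASA-TLX score (standard procedure; not code from the thesis).
    # ratings: six subscale ratings on a 0-100 scale.
    # weights: how often each subscale was picked in the 15 pairwise
    #          comparisons (0-5 each, summing to 15).
    weighted_tlx <- function(ratings, weights) {
      stopifnot(length(ratings) == 6, sum(weights) == 15)
      sum(ratings * weights) / 15
    }

    # Hypothetical example; a result above 60 would be read as high workload.
    weighted_tlx(c(80, 60, 70, 50, 65, 75), c(4, 2, 3, 1, 2, 3))  # ~70.3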
Another result which supports the experimental setup is the low number of erroneous orders in both groups, which yields a median of just one for both groups in sets 1–4 and zero erroneous orders in sets 5–8. Notably, the number of erroneous orders is equally low for the professional order pickers. This again confirms that the task was well understood by all participants and not too difficult to complete, despite the high NASA-TLX scores. This finding is also independent of the environment in which the experiment took place, as hypothesis H 5 has not been rejected by the results. Also, neither H 5.1 nor H 5.2 have been rejected, meaning that a difference between participants of different age or between the professional order pickers and group VR has also not been found. The latter in particular leads to the conclusion that the performance-dependent bonus which participants in group VR received, but the professional order pickers did not, did not lead to a higher number of errors caused by the increased temporal pressure.

However, it must be highlighted that picking errors were registered manually by the experimenter and are therefore subject to human error. This limitation must be kept in mind as the experimental setup could not ensure that every single picking error was identified as such by the experimenter. Furthermore, it must be considered that the low number of erroneous orders limits the explanatory power of the comparison. It can thus be argued that the task was too easy to enable a profound understanding of differences in the number or type of errors between order picking in VR and in a real environment. Yet, the low number of picking errors is regarded as a realistic representation of real order picking applications, in which picking errors can be a costly loss in quality and must therefore be avoided (Gils et al. 2018, p. 11; Battini et al. 2015, pp. 483–484; Brynzér and Johansson 1995).
5.4.2 Comparison of Human Performance in Virtual and Real Order Picking
The results of the ANOVA for set completion times show a significant effect of the group, and the pairwise comparison also yields a significant difference between group VR and group RR in sets 1–4. That is, hypothesis H 2 has been rejected in these sets. A similar difference, however, cannot be found in sets 5–8, and the hypothesis has not been rejected in these sets. Because the latter sets were performed in the real environment by both groups, it can be assumed that the observed difference in sets 1–4 is caused by the difference in the independent variable, indicating that order picking takes significantly longer in VR compared to order picking in a real environment. This finding is important as it implies that the usability of VR for planning manual order picking systems in practice is limited. At least the observed
time differences (which can reach a median of up to 64 seconds in the first set) must be taken into account when using VR for planning manual order picking. However, it must be noted that the ANOVA also found a significant interaction effect of the group and the block. This means that the results in sets 5–8 are affected by the fact that participants have performed sets 1–4 previously. However, it is assumed that this interaction effect is also mainly caused by the change in the environment for group VR, as participants of both groups have performed the exact same set of orders (in different environments) in sets 1–4.

Furthermore, the ANOVA has also revealed significant effects of the set and the block and their interaction with the group variable, indicating that set completion times improve and the difference between the two groups changes between the sets. In fact, when looking at the observed differences in median set completion times, it is noticeable that the difference is more than twice as large in set 1 compared to sets 2, 3, and 4. Apparently, the additional time requirements in VR are larger for the initial picks and become smaller after some picks have been performed in VR. As responses from questionnaire 1 have shown, 55% of the participants in group VR had never used VR before participating in the experimental study and only 7% of participants use VR sometimes or regularly. A reason for the time difference might thus be that the majority of participants needed some time to become familiar with the virtual environment and with using the handheld controllers. Hejtmanek et al. (2020, p. 486) have likewise found that participants need more time to complete a task in VR when they are unfamiliar with using the technology. Giving users some time to get familiar with VR thus seems advisable for any application of the technology in manual order picking. The data suggests that the effect of familiarization is mainly present in set 1 (i.e. the first 16 orders). However, an exact answer on how many picks are needed for users to become familiar with VR cannot yet be provided.

When comparing participants of different ages, the results are promising. Even though the ANOVA has found a significant main effect of the age, interaction effects including the age are not significant. Also, the pairwise comparison within group VR has not rejected hypothesis H 2.1 in any of the sets, i.e. no significant difference in set completion times has been found between participants with an age < 30 years and participants with an age ≥ 30 years. This leads to the conclusion that the observed results are independent of the participants' age. However, the group of participants ≥ 30 years of age is much smaller in the sample, which limits the explanatory power of this finding. Nevertheless, having found no difference between the groups provides evidence that the relatively young sample in the study does not limit the generalizability of the results.
The comparison of set completion times of group VR and the professional order pickers has, however, found significant differences, i.e. H 2.2 has been rejected in all sets. On the one hand, this might indicate that the sample used in the study – which was mainly recruited from university students – is not representative of real order picking personnel. On the other hand, experimental conditions for the professional order pickers were different from those for participants in group VR. As has been discussed in section 5.4.1, this is also reflected in the lower perceived workload. As a result, the difference in set completion times of the professional order pickers might also be caused by some other unknown interference due to the unequal experimental conditions. Also, it must be mentioned that the group of professional order pickers is much smaller compared to group VR, which could also contribute to the observed difference. Hence, drawing an unambiguous conclusion based on the results is not possible for the comparison of the professional order pickers and the participants in group VR. Yet, the fact that the professional order pickers have shown even larger set completion times than the participants in group VR generally points in the same direction as the previously stated finding of larger time requirements in VR compared to real order picking.

With regard to picking times, a significant effect of the group has been found. Results of the pairwise comparison show that the hypothesis of equal picking times in both groups (H 3) must be rejected in sets 1–4 but not in sets 5–8. This outcome is consistent with scientific literature, which states that limitations in simulating haptic feedback and realistic physical interaction are a major drawback of present VR systems (Berg and Vance 2017, p. 12; Sun et al. 2020; Vaughan et al. 2016; Vélaz et al. 2014; Grajewski et al. 2013; Jayaram et al. 2001; Burdea 2000). The effect of missing haptic feedback especially influences picking times because the actual picking of items represents the only physical interaction in the experimental setup. For practical applications of VR for planning manual order picking, this means that picking times can be expected to be significantly faster in a real setup compared to picking times measured in VR (up to a median of 40 seconds in the first set). This implication is also supported by the fact that no significant difference in picking times has been found in sets 5–8, which were performed in the real environment by both groups.

When looking at participants with an age < 30 years and participants with an age ≥ 30 years, the ANOVA has found a significant main effect of the age. Based on the pairwise comparison within group VR, the hypothesis of equal picking times (H 3.1) can only be rejected in the first set. This difference in the first set could be caused by older participants needing more time to become familiar with the handheld controllers. However, it cannot be observed during sets 2, 3, and 4, and the ANOVA has found a significant interaction effect of the group, the set, and the age. This means
that even though picking times do not differ between younger and older participants in the latter sets, the time requirements for users to become familiar with VR seem to be influenced by participants' age. In general, this indicates that the time required to become familiar with VR differs individually for each participant, and personal factors (such as the participants' age) must be considered.

With regard to searching times, no significant effect of the group has been found. Similarly, the pairwise comparison between the two groups has found no difference, i.e. H 4 has not been rejected in any of the sets. Furthermore, the main effect of the age is not significant and the hypothesis of equal searching times between participants with an age < 30 years and participants with an age ≥ 30 years (H 4.1) has only been rejected in the first set. Apparently, the slightly worse visibility of items in VR reported by participants does not have a significant influence on searching times. In contrast to the aforementioned results, this finding is promising for the future use of VR for planning manual order picking, as the reduction of human searching times is one of the primary parameters for warehouse planning (Boysen et al. 2017; Gils et al. 2018). The time needed to find an item within the rack in VR can be assumed to be of equal magnitude compared to the time needed to find an item within a similar rack in the real environment. Hence, VR can be used to estimate searching times in real-world picking applications. For example, different rack layouts can be simulated in VR and the layout yielding the shortest searching times can be identified without any necessity to construct real racks. Moreover, VR enables an analysis of how different external parameters (e.g., colours, brightness), which can easily be changed in a simulated environment, affect searching times.

The impact of the limited possibilities for physical interaction in VR, which has already been found for picking times, is further supported by the findings on the number of orders with dropped items. This number has been found to be significantly larger in VR, i.e. the effect of the group is significant and the pairwise comparison rejects hypothesis H 6 in sets 1–4. It is apparently more likely for participants to unintentionally drop an item in VR compared to picking in a real environment. Nevertheless, it must be noted that this issue has already reached manufacturers of VR hardware, and manifold technical solutions for improved haptic feedback and interaction are currently under development (Coburn et al. 2017, p. 6; Xia 2016; Vaughan et al. 2016). Therefore, it is not unlikely that the results on the relevance of limited haptic feedback will be outdated in a short time, increasing the usability of VR for planning manual order picking.

To answer RQ 2.1, human performance mainly differs in terms of set completion times, picking times, and the number of orders with dropped items, but not in perceived workload, the number of erroneous orders, and searching times. Moreover, it is worth mentioning that the main effects of the set and the block, as well as
interaction effects including the set and block have been found to be significant for all dependent variables (except for the NASA-TLX). This indicates that the results significantly change with the number of completed orders and the times needed by participants generally decrease over time. It can thus be assumed that learning takes place during the course of the experiment, and it is advisable to analyse these learning effects in more detail. This will be done in the following chapter.
6 Analysis of Learning Curves in Virtual and Real Order Picking
In this chapter, mathematical learning curve models are used to further analyse the data obtained in the experimental study. To do so, the general occurrence of learning effects is analysed briefly and the dependent variables that are suited for fitting learning curve models are selected first (section 6.1). Then, the actual learning curve models are introduced and fitted to the data (section 6.2). The results are subsequently used to evaluate which learning curve model provides the best fit, thus answering RQ 3.1 (section 6.3). Next, hypotheses are formulated and the parameters of the best fitting models are compared between VR and the real environment in order to answer RQ 3.2 (section 6.4). Finally, the remaining research questions from the field of planning manual order picking are answered by using the learning curves to predict human performance (RQ 2.2, section 6.5) and estimate the number of orders necessary for users to become familiar with the VR environment (RQ 2.3, section 6.6).
6.1 Occurrence of Learning Effects in General and Selection of Dependent Variables for the Analysis of Learning Curves
In general, learning curves are used to model and estimate the output of a process, the time of a task or its cost in relation to the number of repetitions (Grosse and Glock 2013, p. 853). As described in section 4.3.2, three different time values are measured for each order in the experimental setup, namely the order completion time, the picking time, and the searching time.¹

Electronic supplementary material: The online version of this chapter (https://doi.org/10.1007/978-3-658-34704-8_6) contains supplementary material, which is available to authorized users.
The results of the ANOVAs provided in section 5.3 have shown that the sets and blocks have significant main effects on set completion times, picking times, and searching times. This means that these times differ significantly between individual sets within each group, leading to the assumption that learning effects cause an improvement of participants over time. It thus seems that learning effects indeed exist. Hence, additional pairwise comparisons of the times in each set to the times of the previous set within each group and in each block have been conducted in order to test the hypothesis that set completion times, picking times, and searching times are not smaller (i.e. greater or equal) in each set compared to the previous set. A one-sided Wilcoxon signed-rank test has been used,² and results are given in Table 6.1.

The results show that the null hypothesis has to be rejected in most cases. Especially in the first block, set completion times, picking times, and searching times of both groups are significantly lower in set 2 compared to set 1. Additionally, set completion times and picking times of set 3 are significantly lower than in set 2. Only when comparing set 4 to set 3 can a significant difference not be found. On the one hand, this confirms the previous assumption that learning effects exist. On the other hand, having found no difference between the latter sets shows that learning effects are not constant but diminish over time.
Table 6.1 Results of a one-sided Wilcoxon signed-rank test to test the hypothesis that time measures t are greater or equal in each set compared to the previous set
¹ An overview of the dependent variables measured in the experimental setup can be found in Table 4.2 on page 83. Note that for the comparison in chapter five, times per order were summed for each set. For estimating learning curves, however, the times for each individual order are used.
² The Wilcoxon signed-rank test has been used here because it can test two dependent samples of non-normal data.
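The one-sided tests reported in Table 6.1 can be reproduced with base R's paired Wilcoxon signed-rank test; a minimal sketch, with illustrative vector names:

    # One-sided Wilcoxon signed-rank test (base R; names are illustrative).
    # t_prev, t_curr: times of the same participants in two consecutive sets.
    # H0: times in the current set are greater or equal to the previous set;
    # alternative = "less" tests whether they have become smaller.
    wilcox.test(t_curr, t_prev, paired = TRUE, alternative = "less")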
It is therefore considered appropriate to investigate the occurrence of learning effects in detail by applying learning curves to the order completion times, picking times, and searching times.

However, order completion times depend on the number of items picked per order. As a reminder, the number of items per order varies between one and nine items. In Figure 6.1, the mean order completion time of all participants in both groups is depicted for different numbers of items in the orders. As expected, mean order completion times increase with the number of items in the order. This illustrates that order completion times depend strongly on the number of items in the order, making it difficult to isolate learning effects. One solution would be to divide the order completion times by the number of items in the corresponding order to calculate the time per item. A similar approach has been used, for example, by Reif (2020) and Grosse and Glock (2013), who calculate order completion times per item resp. per picking position. For the data at hand, however, calculating the order completion time per item proves difficult as the travel time between the picking rack and the order bins is included in the order completion time. Because participants are allowed to pick two items at a time, the minimum number of movements between the picking rack and the order bins is equal for orders with an odd number of items and orders with the next higher even number of items.
Figure 6.1 Mean order completion times per number of items in the order
Figure 6.1 also gives the mean values of order completion times divided by the number of items as a dashed line. As can be seen from the sawtooth-like shape of this curve, dividing by the number of items causes the order completion times per item to be overestimated for orders with an odd number of items and underestimated for orders with an even number of items. The effect of overestimating times per item for odd numbers of items is largest for orders with just one item to pick: Those orders generally take almost the same time as orders with two items. Yet, when calculating the order completion time per item, the time is divided in half for orders with two items, resulting in a steep decline in the curve from orders with one item to orders with two items.

To overcome this issue, Reif (2020, pp. 11–12) suggests calculating the time per required movement between the picking rack and the order bins (i.e. one movement for every two items; see the sketch below). Although participants were asked to pick no more than two items at a time, they were allowed to pick each item individually. This means that, even though dividing order completion times by the number of required movements would be adequate for the majority of participants, the calculated times would be too long for participants who preferred to pick just one item at a time in some orders. Thus, it would not be possible to identify whether learning effects result from actual improvement due to learning or from participants switching from picking just one item to picking two items at a time. As a result, order completion times have been ruled out for the evaluation of learning curves.

Figure 6.2 shows the mean picking time for different numbers of items in the order as well as picking times divided by the number of items in the order. Again, the figure shows that the mean picking times increase almost linearly with the number of items in the order, as additional picks from the rack are required. This means that using picking times to analyse learning effects suffers from the same limitations as using order completion times. Even though the time for the pickers' movement between the picking rack and the order bins is not included in picking times, the figure shows that overestimation is also an issue when dividing picking times by the number of items. However, this effect is much smaller compared to order completion times, causing the curve for picking times divided by the number of items to be almost horizontal (i.e. independent of the number of items). Therefore, it is considered possible to use the picking times divided by the number of items for the analysis of learning effects. Nevertheless, the effect of the number of items cannot be fully ruled out and must be kept in mind. Furthermore, it has to be taken into account that picking times were found to differ significantly between VR and the real environment in chapter five.
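The two normalizations discussed above are straightforward to compute; a minimal sketch with assumed column names (illustrative, not from the thesis):

    # Normalizing order completion times (illustrative column names).
    # d$order_time: order completion time (s); d$items: items per order (1-9).
    d$time_per_item <- d$order_time / d$items

    # Reif's (2020) alternative: time per required movement between rack and
    # order bins, with up to two items carried per movement.
    d$movements     <- ceiling(d$items / 2)
    d$time_per_move <- d$order_time / d$movements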
Searching times, however, do not depend on the number of items picked. This is illustrated by the almost horizontal line for searching times in Figure 6.3. Searching times are therefore considered well suited for the estimation of learning curves and the analysis of learning effects. In fact, using searching times for the analysis is also beneficial because Grosse and Glock (2015, p. 889) assume that learning in order picking mainly affects the searching for items in the rack. Furthermore, no difference in searching times between VR and the real environment was found in chapter five. In summary, picking times per item and searching times are used in the following analysis to estimate learning curves in VR and in the real environment. As described in Table 4.5 on page 99, these time measures are available for 69 participants in sets 1–4 (36 in group VR and 33 in group RR) and 68 participants in sets 5–8 (32 in group VR and 36 in group RR). As each participant completed two blocks of four sets with 16 orders each, a total of 128 data points have been gathered for each participant. However, learning curves are fitted individually for each block of 64 orders.
Figure 6.2 Mean picking times per number of items in the order
6.2 Curve Fitting

6.2.1 Learning Curve Models
An overview of the most frequently used learning curve models in scientific literature can be found in Grosse et al. (2015b, pp. 402–403). More extensive overviews of learning curve models (including models on forgetting) are available in Anzanello and Fogliatto (2011) and Fogliatto and Anzanello (2011). To find the available learning curve models that are most suitable for use in manual order picking, Grosse and Glock (2013) have already applied different learning curve models to picking times gathered from a real order picking process. They have analysed six different learning curve models, namely the Wright learning curve, the De Jong learning curve, the Stanford B learning curve, the time constant learning curve, the three-parameter hyperbolic learning curve, and the dual phase learning curve. As they especially recommend the Wright, the De Jong, the three-parameter hyperbolic, and the dual phase learning curve for use in manual order picking, these learning curve models, as well as the Stanford B learning curve, are also used in this thesis and will be introduced below. Additionally, Grosse et al. (2015b, pp. 410–411), who suggest
Figure 6.3 Mean searching times per number of items in the order
which learning curve model should be used under which circumstances based on a meta-analysis of scientific literature, found that the S-curve model and the Jaber-Glock learning curve also perform well in laboratory settings. As this description fits the experimental setup in this study, these models have also been analysed. The notation used throughout this chapter is summarized in Table 6.2.

The Wright Learning Curve Model (WLC)
The learning curve developed by Wright (1936)³ was the first mathematical model describing the improvement of a process over time due to learning. Using the model, the author was able to estimate the reduction in cost per unit in airplane manufacturing with each additional airplane that was produced. The model has since served as a basis for the development of further learning curve models and is still popular today due to its simplicity and its ability to provide generally acceptable results (Glock et al. 2019b; Grosse et al. 2015b, pp. 408, 411; Jaber and Glock 2013, p. 867; Gunawan 2009, p. 51). The model can be formulated as follows:

$t_{x,i} = t_{1,i} \, x^{-b_i}$,   (6.1)

with $t_{x,i}$ denoting the time needed for order $x$, $b_i$ giving the learning exponent, and $t_{1,i}$ being the time needed for the first order of participant $i$. Note that the learning exponent defines the slope of the learning curve and $0 \le b_i \le 1$. Having estimated these parameters, they can be used to calculate an individual learning rate $LR_i$ for each participant $i$, which quantifies the remaining percentage of the time needed after doubling the number of completed iterations (i.e. orders in this case) (Grosse and Glock 2013, p. 859; Dar-El et al. 1995b, p. 272):

$LR_i = \frac{t_{1,i} (2x)^{-b_i}}{t_{1,i} \, x^{-b_i}} = 2^{-b_i}$   (6.2)
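As a brief numerical illustration (the value is arbitrary and not taken from the study's data): a learning exponent of $b_i = 0.3$ yields a learning rate of $LR_i = 2^{-0.3} \approx 0.81$, i.e. each doubling of the number of completed orders reduces the expected time per order to roughly 81% of its previous value.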
The Stanford B Learning Curve Model (STB)
The Stanford B learning curve model, which was developed by Carlson (1973) as an extension of the WLC, can be expressed using the following equation:

$t_{x,i} = t_{1,i} (x + B_i)^{-b_i}$   (6.3)
The model introduces a factor $B_i > 0$, which represents the number of previously performed tasks of the same or similar kind. The model is therefore able to incorporate previous knowledge in the process of learning.

³ The Wright learning curve model is sometimes also referred to as the Power Model.
Table 6.2 Notation for variables and parameters used throughout this chapter
Note that Gunawan (2009, p. 47) recommends a value of $1 < B_i < 10$, which has also been adopted in this thesis. The STB has been found by Grosse et al. (2015b) to generally yield good results for various datasets from literature. However, for the specific case of manual order picking, the STB has not been found to perform well by Grosse and Glock (2013). The authors assume the limited performance of the model could be caused by the order pickers in their sample having only little prior experience or by the number of orders per picker being too small for the analysis.

The De Jong Learning Curve Model (DJLC)
The learning curve model by Jong (1957) is an extension of the WLC given in equation 6.1 and introduces the parameter $M_i$ with $0 \le M_i \le 1$. The model can be expressed as follows:

$t_{x,i} = t_{1,i} \left( M_i + \frac{1 - M_i}{x^{b_i}} \right)$   (6.4)

The newly introduced parameter $M_i$ can be regarded as a measure for the degree of automation in a process. As automated tasks cannot normally be improved by human learning, the parameter $M_i$ defines a threshold at $t_{1,i} M_i$ for $x \to \infty$. A higher value for $M_i$ causes the learning curve to plateau at higher values, meaning that no further improvement is possible. It is therefore also referred to as the factor of incompressibility (Grosse et al. 2015b, p. 402; Grosse and Glock 2013, pp. 853, 856; Jong 1957, p. 54). Note that Gunawan (2009, p. 48) suggests $M_i = 0.1$ for tasks which are mainly manual. For the specific application of manual order picking, Grosse and Glock (2013) have also found good results when setting the factor of incompressibility to $M_i = 0.1$. However, in this thesis, no such restriction to $M_i$ has been applied, meaning that the factor has been estimated freely.

The S-curve Model (SCM)
According to Nembhard and Uzumeri (2000a), the S-curve model⁴ can be expressed by the following equation:

$t_{x,i} = t_{1,i} \left( M_i + (1 - M_i)(x + B_i)^{-b_i} \right)$   (6.5)
⁴ The S-curve model is named after the shape of its function in a logarithmic scale, which can be described as S-shaped.
The model extends the DJLC by introducing the parameter $B_i > 0$, which is already known from the STB. The SCM can thus be regarded as a combination of the STB and the DJLC (Grosse et al. 2015b, p. 403).

The Dual Phase Learning Curve Model (DPLC)
The dual phase learning curve has been proposed by Dar-El et al. (1995a) and makes it possible to distinguish between the effect of cognitive learning and the effect of motor learning. Cognitive learning refers to a participant's knowledge of a process, e.g., where to find a specific picking location. Motor learning describes the improvement in the physical performance of a task, e.g., using the HMD's controllers for picking items. In simple tasks or with more experienced workers, motor learning usually has a larger effect than cognitive learning. In all other cases, cognitive learning is considered more dominant (Grosse and Glock 2013, p. 865; Dar-El et al. 1995a, p. 270). The model can be formulated as follows:

$t_{x,i} = t_{1,i}^{c} \, x^{-b_i^c} + t_{1,i}^{m} \, x^{-b_i^m}$   (6.6)
The first term in the equation is used to calculate the effect of cognitive learning, with $t_{1,i}^{c}$ giving the time needed for the first order if only cognitive learning occurs, and $b_i^c$ being the learning exponent for cognitive learning. The second term in equation 6.6 calculates the effect of motor learning using $t_{1,i}^{m}$ and $b_i^m$ respectively. In general, it is assumed that cognitive learning surpasses motor learning, i.e. $b_i^c \ge b_i^m$ (Jaber and Glock 2013, p. 868). Both $b_i^c$ and $b_i^m$ can be combined into a single learning exponent $b_i$ as such (Dar-El et al. 1995a, p. 268):

$b_i = b_i^c - \frac{\log\left(\frac{R + X^{b^*}}{R + 1}\right)}{\log(X)}$   (6.7)

with $b^* = b_i^c - b_i^m$ and $R = t_{1,i}^{c} / t_{1,i}^{m}$. After having calculated $b_i$, the learning rate $LR_i$ can be found using equation 6.2.

The Jaber-Glock Learning Curve Model (JGLC)
An extension of the DPLC can be found in Jaber and Glock (2013). They introduce the factor $\rho_i$ ($0 \le \rho_i \le 1$) to equation 6.6, which gives the percentage of the cognitive part in a task consisting of cognitive and motor learning components. Respectively, $(1 - \rho_i)$ gives the percentage of the motor component in the task. The Jaber-Glock learning curve model thus has the following equation:
$t_{x,i} = \rho_i \, t_{1,i}^{c} \, x^{-b_i^c} + (1 - \rho_i) \, t_{1,i}^{m} \, x^{-b_i^m}$   (6.8)
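Equation 6.7 above is easy to mistranscribe when implemented by hand; the following R helper sketches it (a direct transcription of the formula, not code from the thesis), where X denotes the number of orders at which the equivalent exponent is evaluated:

    # Combined learning exponent for the DPLC/JGLC (equation 6.7);
    # a direct transcription of the formula, not code from the thesis.
    combined_b <- function(bc, bm, t1c, t1m, X) {
      bstar <- bc - bm      # b*: difference of cognitive and motor exponents
      R     <- t1c / t1m    # ratio of the two first-order times
      bc - log((R + X^bstar) / (R + 1)) / log(X)
    }

    # Learning rate from equation 6.2.
    learning_rate <- function(b) 2^(-b)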
Similar to the DPLC, a learning exponent $b_i$ and a learning rate $LR_i$ incorporating both cognitive and motor learning can be calculated using equations 6.7 and 6.2.

The Three-parameter Hyperbolic Learning Curve Model (3PH)
The three-parameter hyperbolic learning curve model is based on Mazur and Hastie (1978). It is considered simple but flexible and is well suited for applications in which learning occurs in both cognitive and motor skills (Uzumeri and Nembhard 1998, p. 518). Good results of the 3PH in terms of the model's fit were also found by Grosse et al. (2015b). In particular, the authors point out that three-parameter hyperbolic models perform considerably better than two-parameter hyperbolic models, which are therefore not covered further in this thesis. The 3PH can be formulated as follows:

$y_{x,i} = k_i \, \frac{x + p_i}{x + p_i + r_i}$,   (6.9)

with $x + p_i + r_i > 0$. This model uses the order picking rate $y_{x,i}$, i.e. the number of orders per minute after order $x$ of participant $i$. Thus, the time $t_{x,i}$ for order $x$ of participant $i$ (which is given in seconds) first needs to be transformed to yield the order picking rate per minute according to the following equation:

$y_{x,i} = \frac{60}{t_{x,i}}$   (6.10)
Similar to the DJLC in equation 6.4, the 3PH establishes a threshold for learning, given by the maximum rate $k_i \ge 0$, which defines the asymptote of the curve of participant $i$. In this model, the individual learning rate of participant $i$ is defined by the parameter $r_i$. Furthermore, the model incorporates prior experience of participant $i$ by including the parameter $p_i$, giving the number of orders of prior experience, with $p_i \ge 0$. In general, hyperbolic learning curve models offer the possibility to be applied to both increasing ($r_i > 0$) and decreasing effects ($r_i < 0$). They can therefore also be applied to scenarios in which – for example – learning results in a reduction of defective items (Grosse et al. 2015b, p. 403).

Further Learning Curve Models
Further learning curve models, which are common in literature but have not been used in this thesis, are exponential models and group learning models (Grosse et al. 2015b).
For the case of field data, 3-parameter exponential models have also been found to yield promising results (Grosse et al. 2015b, p. 410). Even though these models can potentially be applied to the case of manual order picking as well, they are used to predict an increase in output with regard to the elapsed time. This fact limits the possibility of comparing exponential models to the other aforementioned learning curve models (Grosse et al. 2015b, p. 407). Moreover, Grosse and Glock (2013), who have also investigated an exponential time constant model for use in manual order picking, have found limited usability of the model in this case. As exponential models are thus not perfectly suitable for the available data, they have not been considered in this thesis.

Group learning models are not used to describe the improvement of individual human workers but the improvement of entire groups (Glock and Jaber 2014). These models are thus not suitable for this thesis as the picking task was performed individually by each participant and no collaborative group work took place.

An overview of all the learning curve models analysed in this thesis, along with the respective model equations and the abbreviations used in the remainder of this thesis, can be found in Table 6.3.
Table 6.3 Learning curve models used throughout this chapter

Name | Abbreviation | Model equation | References
Wright learning curve | WLC | $t_{x,i} = t_{1,i} x^{-b_i}$ | Wright (1936)
Stanford B learning curve | STB | $t_{x,i} = t_{1,i}(x + B_i)^{-b_i}$ | Carlson (1973)
De Jong learning curve | DJLC | $t_{x,i} = t_{1,i}(M_i + (1 - M_i)x^{-b_i})$ | Jong (1957)
S-curve model | SCM | $t_{x,i} = t_{1,i}(M_i + (1 - M_i)(x + B_i)^{-b_i})$ | Nembhard and Uzumeri (2000a)
Dual phase learning curve model | DPLC | $t_{x,i} = t_{1,i}^{c} x^{-b_i^c} + t_{1,i}^{m} x^{-b_i^m}$ | Dar-El et al. (1995a)
Jaber-Glock learning curve model | JGLC | $t_{x,i} = \rho_i t_{1,i}^{c} x^{-b_i^c} + (1 - \rho_i) t_{1,i}^{m} x^{-b_i^m}$ | Jaber and Glock (2013)
Three-parameter hyperbolic learning curve model | 3PH | $y_{x,i} = k_i \frac{x + p_i}{x + p_i + r_i}$ | Mazur and Hastie (1978)
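For reference, the model equations in Table 6.3 translate directly into R functions; the following is a minimal sketch (transcribed from the table, not code from the thesis):

    # Learning curve models from Table 6.3 as R functions (direct transcriptions).
    wlc  <- function(x, t1, b)       t1 * x^(-b)                        # (6.1)
    stb  <- function(x, t1, b, B)    t1 * (x + B)^(-b)                  # (6.3)
    djlc <- function(x, t1, b, M)    t1 * (M + (1 - M) * x^(-b))        # (6.4)
    scm  <- function(x, t1, b, M, B) t1 * (M + (1 - M) * (x + B)^(-b))  # (6.5)
    dplc <- function(x, t1c, bc, t1m, bm)                               # (6.6)
      t1c * x^(-bc) + t1m * x^(-bm)
    jglc <- function(x, t1c, bc, t1m, bm, rho)                          # (6.8)
      rho * t1c * x^(-bc) + (1 - rho) * t1m * x^(-bm)
    ph3  <- function(x, k, p, r)     k * (x + p) / (x + p + r)          # (6.9)

    # The 3PH works on picking rates; transform times (s) via equation 6.10.
    rate_from_time <- function(t) 60 / t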
6.2.2 Results of the Fitted Learning Curve Models
To fit the learning curves to the data gathered from the experiments and estimate the model parameters, a non-linear least squares regression method has been applied using the nlxb() function from the nlsr-package in R. The function aims at finding the minimum of the residual sum of squares with the help of the Nash variant of the Marquardt algorithm (Nash 2019). Non-linear least squares regressions are widely used in literature for estimating learning curves (see e.g., Grosse et al. 2015b; Glock and Jaber 2014; Grosse and Glock 2013; Hinze and Olbina 2009).

For picking times per item, the fitted learning curves for each participant along with the observed picking times per item are depicted in Figures 6.4 (sets 1–4 of group VR), 6.5 (sets 5–8 of group VR), 6.6 (sets 1–4 of group RR) and 6.7 (sets 5–8 of group RR). The results for the 3PH are not included in these figures as the model estimates picking rates, and so the y-axes in the figures do not fit the 3PH. Instead, the 3PH figures can be found in Appendix K in the electronic supplementary material. Descriptive statistics of the learning curve parameters are given in Table 6.4 (WLC, STB, DJLC, and SCM), Table 6.5 (DPLC and JGLC), and Table 6.6 (3PH). For the WLC, the STB, the DJLC and the SCM, Table 6.7 additionally gives the number of participants for which the regression estimates $b_i = 0$, i.e. for which no learning effect can be observed. Additionally, the table displays in parentheses the number of participants for which a t-test has found the parameter $b_i$ to not significantly differ from zero (at a 5% significance level), even though the estimated $b_i > 0$.

For searching times, the fitted learning curves for each participant along with the observed searching times are displayed in Figures 6.8 (sets 1–4 of group VR), 6.9 (sets 5–8 of group VR), 6.10 (sets 1–4 of group RR) and 6.11 (sets 5–8 of group RR). As for picking times per item, the curves of the 3PH are not included in these figures due to the different format of the y-axis. Instead, they are provided in Appendix K in the electronic supplementary material. Descriptive statistics of the learning curve parameters are given in Table 6.8 (WLC, STB, DJLC, and SCM), Table 6.9 (DPLC and JGLC), and Table 6.10 (3PH). Table 6.11 gives the number of participants for which the WLC, the STB, the DJLC, and the SCM have estimated a learning exponent of $b_i = 0$. The table gives in parentheses the additional number of participants for which a t-test has found no significant difference of $b_i$ from zero, based on a 5% significance level.
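To give an impression of what such a fit looks like in practice, the following sketch fits the WLC to one participant's picking times per item with nlsr::nlxb(); the data frame, column names, and starting values are illustrative assumptions, not the thesis's actual script:

    # Illustrative WLC fit with nlsr::nlxb() (assumed data layout and starting
    # values; the thesis's actual fitting script is not reproduced here).
    library(nlsr)

    # d: data for one participant and one block, with columns
    # x (order index 1..64) and t (picking time per item, s).
    fit <- nlxb(t ~ t1 * x^(-b), data = d,
                start = c(t1 = 10, b = 0.1),
                lower = c(t1 = 0, b = 0),
                upper = c(t1 = Inf, b = 1))
    print(fit)

    # Learning rate implied by the estimated exponent (equation 6.2).
    2^(-fit$coefficients[["b"]])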
Figure 6.4 Estimated learning curves for picking times per item (s) in sets 1–4 of participants in group VR
Figure 6.5 Estimated learning curves for picking times per item (s) in sets 5–8 of participants in group VR
Figure 6.6 Estimated learning curves for picking times per item (s) in sets 1–4 of participants in group RR
Figure 6.7 Estimated learning curves for picking times per item (s) in sets 5–8 of participants in group RR
Table 6.4 Estimated model parameters of the WLC, the STB, the DJLC, and the SCM for picking times per item (s)
Table 6.5 Estimated model parameters of the DPLC and the JGLC for picking times per item (s)
Table 6.6 Estimated model parameters of the 3PH for picking times per item (s)
Table 6.7 Number of participants with $b_i = 0$ for estimating picking times per item (in parentheses: additional number of participants with $p(H_0 : b_i = 0) > .05$ based on a t-test)
Figure 6.8 Estimated learning curves for searching times (s) in sets 1–4 of participants in group VR
Figure 6.9 Estimated learning curves for searching times (s) in sets 5–8 of participants in group VR
Figure 6.10 Estimated learning curves for searching times (s) in sets 1–4 of participants in group RR
Figure 6.11 Estimated learning curves for searching times (s) in sets 5–8 of participants in group RR
Table 6.8 Estimated model parameters of the WLC, the STB, the DJLC, and the SCM for searching times (s)
Table 6.9 Estimated model parameters of the DPLC and the JGLC for searching times (s)
Table 6.10 Estimated model parameters of the 3PH for searching times (s)
Table 6.11 Number of participants with bi = 0 for estimating searching times (in parentheses: additional number of participants with p(H0 : bi = 0) > .05 based on a t-test)
6.2.3 Discussion of the Fitted Learning Curves for Picking Times per Item
Figures 6.4, 6.5, 6.6, and 6.7 show that, for most participants, the curves fitted to picking times per item are relatively close to each other. Only for some participants do some models deviate slightly from the other curves (see e.g., sets 1–4 of participant 30 or sets 5–8 of participants 7 and 12 in group RR, or all sets of participant 28 in group VR). The figures also show that especially in sets 1–4 of group VR (which were performed in VR), a learning effect is visible for most participants, resulting in the curves falling steeply for the first orders. However, in sets 5–8 of group VR (which were performed in the real environment), the slope of the curves is visibly less steep, indicating that the learning effect is no longer as pronounced as in the first four sets. Furthermore, Figure 6.6 also shows that for many participants in group RR, the slope of the learning curves in sets 1–4 is flatter compared to participants in group VR. For participants 9, 12, 20, 22, 25 and 27 in group RR, the learning curves are almost horizontal, meaning that no learning effect can be observed. In sets 5–8, even more participants in group RR have flat learning curves (participants 1, 2, 7, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 22, 23, 24, 25, 26, 30, 33, 34, 35 and 36), indicating that after the first four sets of picking, their estimated picking times per item do not decrease any further. This finding is also confirmed by the results in Table 6.7. The table shows that in sets 1–4 of group VR, all participants have a learning exponent larger than zero. However, in sets 5–8 and in group RR, participants with a learning exponent equal to zero exist, meaning that no learning takes place for those participants. In sets 5–8 of group RR, the WLC finds bi = 0 for 20 participants (55.56%), and only two participants (5.56%) have an estimated learning exponent significantly larger than zero. For the STB and the DJLC, 38.89% and 42.22% of participants in group RR, respectively, have a learning exponent equal to zero in sets 5–8. Only the SCM estimates a significantly positive learning effect for all participants in sets 1–4 of group VR. This finding is consistent with the results of Bishu et al. (1992, pp. 116–117), who claim that learning effects in manual order picking can mainly be observed during the first 50 orders of a picker. For the WLC, the learning exponent bi in Table 6.4 generally shows low median values except in sets 1–4 of group VR. In sets 5–8 of group RR, the median learning exponent yields a value of zero, indicating that for most participants in the sample, no more learning can be observed at this point. This also translates into relatively high median values for the learning rate LRi, indicating only limited improvement of the participants. Note that for group RR, the values are similar to the learning exponents estimated by Grosse and Glock (2013), who have also found bi < .1 for the WLC, the STB, and the DJLC for two of the three order pickers in their study.
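For reference, the functional forms of the models discussed here can be written compactly in R. The following sketch assumes the standard formulations of these learning curves from the literature (the thesis defines them formally in equations 6.1 ff., which precede this section):

# Standard power-law learning curve forms in the notation used above (assumed
# from the learning curve literature): x is the order number, t1 the time for
# the first order, b the learning exponent, B the prior experience (STB), and
# M the factor of incompressibility (DJLC), i.e. the share of t1 that remains
# even after infinite practice.
wlc  <- function(x, t1, b)       t1 * x^(-b)                       # Wright
stb  <- function(x, t1, b, B)    t1 * (x + B)^(-b)                 # Stanford-B
djlc <- function(x, t1, b, M)    t1 * (M + (1 - M) * x^(-b))       # De Jong
scm  <- function(x, t1, b, B, M) t1 * (M + (1 - M) * (x + B)^(-b)) # S-curve

# With M = 0 the DJLC collapses to the WLC, and the SCM collapses to the STB,
# mirroring the equivalences noted in the surrounding text.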
Furthermore, sets 1–4 of group VR have a longer estimated picking time for the first order, t1,i, compared to group RR and to sets 5–8, meaning that the participants need more time for the first orders in VR than participants picking in the real environment. The parameters of the STB yield similar estimates. With regard to the parameter Bi, results show median values of 10.00 (i.e. the upper limit defined for the regression) in sets 5–8 of both groups. Again, this result resembles the findings of Grosse and Glock (2013), who have found a value of Bi = 10.00 for two of three order pickers in their study. This indicates that all participants had some experience with picking items in the real environment, which is plausible because the participants had performed sets 1–4 previously. In contrast, in sets 1–4, the median value of Bi is only equal to one (i.e. the lower limit of the regression) for group VR, and 2.07 for group RR. As these sets were performed in different environments by the two groups, this supports the questionnaire results by showing that participants had only little prior experience with picking in VR. For the DJLC, estimates for t1,i and bi again resemble those of the previously mentioned learning curve models. For the factor of incompressibility (Mi), small values have been found for all sets of both groups. In fact, Mi adopts a median value equal to or very close to .1 in sets 1–4 of group VR and sets 5–8 of group RR. This is consistent with the previously cited literature finding Mi = .1 to be a good estimate for manual tasks (Gunawan 2009, p. 48; Grosse and Glock 2013, p. 861). Yet, the factor of incompressibility has a median of zero in sets 5–8 of group VR. Note that with Mi = 0, the DJLC is equivalent to the WLC. In the SCM, higher median values for bi compared to the other learning curve models stand out, with bi even reaching a median of one in sets 1–4 of both groups. Furthermore, the median Bi equals 10 in sets 1–4 of group RR and in sets 5–8 of both groups. The median factor of incompressibility (Mi) is estimated to be zero in sets 5–8 of both groups. Therefore, the SCM is equivalent to the STB for most participants in these sets. In the DPLC, the estimated values for the parameters $t_{1,i}^c$ and $t_{1,i}^m$ indicate that the task mainly consists of the motor part, because the median value of $t_{1,i}^m$ surpasses the median value of $t_{1,i}^c$ in all sets of both groups (see Table 6.5). In fact, the median of $t_{1,i}^c$ is equal to zero in sets 5–8 of both groups, transforming the DPLC into the WLC with only motor learning for most participants. Estimates for the JGLC yield similar results. However, the parameter ρi assigns a larger share of the task to cognitive learning in all sets of both groups. The parameters of the 3PH given in Table 6.6 show that the maximum picking rate ki has an estimated median of 140 items per minute in sets 1–4 of group VR and 127 items per minute in sets 1–4 of group RR. Also, results show that participants of
both groups have less prior experience for completing the task in sets 1–4, with lower median values for pi than in sets 5–8. Median estimated values for ri furthermore yield lower values for group RR in sets 1–4, indicating steeper learning curves. As the 3PH is the only learning curve model able to estimate negative learning rates, the results for the minimum ri show that some participants indeed yield decreasing picking rates. This means that for those participants, fatigue might actually influence the picking performance, even though only 64 orders had to be picked in each block. However, the maximum values for the parameters are unrealistically high in both groups. The same can be observed in sets 5–8, in which median estimates for the parameters show very high values. This contrasts with the results of Grosse and Glock (2013), who have found that the 3PH yields a good fit for their data. However, they use considerably more data points for each participant to estimate their learning curves. One explanation as to why the 3PH does not find realistic values in this thesis could thus be that the 64 orders per block in this study do not suffice to adequately estimate the 3PH.
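The behaviour described above can be made concrete with the functional form of the model; the following one-line sketch assumes the common three-parameter hyperbolic formulation used, e.g., by Grosse and Glock (2013), where the picking rate approaches the maximum rate as the number of orders grows:

# Three-parameter hyperbolic (3PH) learning curve in the notation above
# (assumed standard form): y_x is the picking rate at order x, k the maximum
# rate, p the prior experience, and r the learning rate. Unlike the power-law
# models, a negative r yields a rate that decreases with x, which is why the
# 3PH can depict fatigue effects.
three_ph <- function(x, k, p, r) k * (x + p) / (x + p + r)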
6.2.4 Discussion of the Fitted Learning Curves for Searching Times
Similar to the learning curve estimations based on picking times per item, Figures 6.8, 6.9, 6.10, and 6.11 show that the learning curves for searching times lie close to each other. Only the DPLC and the JGLC deviate considerably from the other learning curve models in sets 1–4 of participants 1, 11, and 25 in group VR, sets 1–4 of participant 4 in group RR and sets 5–8 of participants 4 and 5 in group RR. Furthermore, in sets 5–8 of group RR (Figure 6.11), a deviation of the STB can be found for participants 11, 25, and 26. Apart from these participants, the learning curve models differ, if at all, only in their estimate for the first order and the initial part of the curve. The figures also show that learning effects are more pronounced in sets 1–4 of both groups, as the number of participants yielding a flat learning curve is noticeably larger in sets 5–8. Table 6.11 shows that in sets 5–8 of group RR, both the WLC and the STB estimate bi = 0 for 17 participants (47.22%). For group VR, the number of participants with bi = 0 in sets 5–8 is 12 (37.50%) for the WLC and 13 (40.63%) for the STB. In sets 1–4 of both groups, the number of participants with bi = 0 is considerably lower, with 5 participants (13.89%) in group RR and 3 participants (9.38%) in group VR for the WLC, and 6 participants (16.67%) in group RR and 3 participants (9.38%) in group VR for the STB. For the DJLC and the SCM, the number of participants with bi = 0 is low in all sets of both groups. In sets 5–8 of group VR, both learning curve models find no participant with bi = 0,
i.e. a learning effect can be observed for all participants. Compared to picking times per item, this indicates that searching times are affected by picker learning over a larger number of orders. As for picking times per item, the learning exponents estimated by the WLC yield small median values in all sets of both groups. Surprisingly, the median learning exponent in sets 1–4 is slightly larger for group RR than for group VR. Again, the median learning exponents are similar to the values found by Grosse and Glock (2013). However, in sets 5–8, the learning exponent yields a median value of zero for group RR, supporting the finding stated above based on the figures that, for most participants, no learning can be observed and the learning curves are flat. For group VR, the median learning exponent is also very close to zero. Regarding the median searching times estimated for the first order (t1,i), both groups yield similar values. Moreover, the median searching time for the first order is 1.25 s longer in sets 1–4 compared to sets 5–8 for group VR (1.92 s for group RR), showing that participants have improved noticeably. The estimated learning exponents of the STB are similar. However, the estimated median times for the first order are slightly longer compared to the estimates from the WLC. The parameter Bi has a median of one in sets 1–4 of both groups. This shows that, in contrast to the task of picking items from the rack, for which Bi yielded a slightly larger median for group RR, both groups have equally limited previous experience with regard to searching for items in the rack. This finding is plausible, as picking items in the real environment is a task similar to the everyday picking of objects. Searching for items in the rack, however, was a relatively new task for all participants, as no participant had seen the picking rack prior to the experiments. In sets 5–8, both groups have median values of Bi close to the maximum value of 10.00, reflecting the experience gained by previously performing sets 1–4. For the DJLC, relatively large median values for the factor of incompressibility (Mi) can be observed. In fact, the median values are considerably larger than the value of Mi = .1 suggested by Gunawan (2009, p. 48) and Grosse and Glock (2013, p. 861) for manual tasks. This causes the DJLC to plateau at relatively high values, allowing only a small maximum decrease in searching times due to learning. In fact, for x → ∞, learning can only decrease searching times to a median of 84% of the initial searching time for group VR and 81% for group RR in sets 5–8. The largest possible decrease can be observed in sets 1–4 of group RR, in which learning allows a decrease in searching times to a median of 34% of the initial searching times (52% in sets 1–4 of group VR). Parameter estimates of the SCM yield similar values for Bi and Mi compared to the previously mentioned learning curve models. However, it is striking that in the SCM, the learning exponent bi has a median of 1.00 in all sets of both groups except
sets 5–8 of group RR, resulting in median learning rates of 50%. This means that at least half of the participants experience the maximum learning effect during these sets. The estimated parameters of the DPLC show that, as is the case for picking times per item, the motor component accounts for the main part of the searching times, with the median $t_{1,i}^m$ being larger than the median $t_{1,i}^c$ in all sets of both groups. However, the median learning exponent for motor learning, $b_i^m$, is estimated as zero or very close to zero in all sets of both groups, indicating that almost no motor learning takes place. This is also plausible, as improving searching times can be considered mainly a cognitive task that cannot be improved by motor learning. Results for the JGLC are similar with regard to the learning exponent. However, the JGLC estimates larger median values for $t_{1,i}^c$, surpassing the median values of $t_{1,i}^m$ in all sets of both groups. This larger cognitive share of the task is, however, countered by the low estimated values for ρi, assigning the largest part of the learning effect to motor learning for all participants. In sets 1–4 of group VR and sets 5–8 of both groups, ρi yields a median of zero, meaning that only motor learning can be observed for most participants, transforming the JGLC model into the basic WLC model. The 3PH plateaus at a median value of 14 searches per minute in all sets of both groups (ki in Table 6.10). Similar to the results of the STB, the 3PH shows larger previous experience in sets 5–8 compared to sets 1–4 in both groups (pi). However, and in contrast to the results of the STB, previous experience in sets 1–4 is larger in group RR than in group VR. All sets of both groups yield positive median learning rates ri, with a larger effect in group RR compared to group VR. However, negative minimum learning rates show once more that fatigue might be an issue for some participants. Nevertheless, unrealistically high (low) maximum (minimum) values for the parameters can again be noted, indicating that the 3PH is not suitable for estimating the searching times of some participants. Similar to picking times per item, this is likely to be caused by the low number of orders analysed for each participant when estimating the learning curves.
6.3 Evaluating the Quality of Fit of the Learning Curve Models
After having estimated the parameters of the different learning curve models, the aim of this section is to compare the fitted learning curve models in order to identify which model performs best for the given data, i.e. answering RQ 3.1. Therefore, different measures to quantify the quality of fit of the learning curve models have
to be defined first. Then, results are provided individually for the learning curves for picking times per item and learning curves for searching times. In a subsequent discussion, the research question is finally answered.
6.3.1 Quality Measures for the Comparison of the Learning Curve Models
A common measure to assess the fit of linear regression models is the coefficient of determination ($R^2$), which is used, for example, by Grosse et al. (2015b) to compare the quality of fit of different models. However, for non-linear models, the coefficient of determination is not recommended because it can produce false and misleading results (Spiess and Neumeyer 2010). Instead, this thesis uses the standard error of the regression ($SER$) to compare the learning curve models. For participant $i$, the $SER$ of learning curve $L$ can be calculated as follows:

$$SER_i^L = \sqrt{\frac{\sum_{x=1}^{X}\left(y_{x,i} - \hat{y}_{x,i}^L\right)^2}{X - \kappa^L}} = \sqrt{\frac{RSS_i^L}{X - \kappa^L}}, \qquad (6.11)$$
with $y_{x,i}$ being the real value of the data point (i.e. the dependent variable) at order $x$ and $\hat{y}_{x,i}^L$ being the estimated value for the same data point. $X$ gives the total number of orders and $\kappa^L$ gives the number of estimated parameters of the learning curve model. The term $X - \kappa^L$ in the denominator of equation 6.11 is thus equivalent to the degrees of freedom of the estimated learning curve model. Note that the numerator of the fraction in equation 6.11 gives the residual sum of squares ($RSS$), a measure which is also used by Grosse and Glock (2013) for the evaluation of learning curve models. The $SER$ can be interpreted as the standardized distance of the estimated data points from the observed data, measured in the same dimension as the data itself. This means that for the 3PH, the $SER$ is given in the dimension of the order picking rate $y_{x,i}$. Hence, to compare the $SER$ of the 3PH with the $SER$ of the other learning curve models, it first has to be transformed to the same dimension using equation 6.10. When calculating the $SER$ value for each participant $i$ and summarizing the results by calculating the mean values over all participants, potential outlier values can have a distorting effect (Grosse et al. 2015b, p. 407). One solution would be to use median values instead of the mean to summarize the results, because the median is robust against outliers. Another solution suggested by Grosse et al. (2015b), which has successfully been applied by Reif (2020), is to assign a discrete rank $\eta_i^L$ based on the $SER$ to each learning curve model $L$ of participant $i$: the
learning curve model with the lowest $SER$ value for participant $i$ would be assigned $\eta_i^L = 1$, the second best would be assigned $\eta_i^L = 2$, and so on. By doing so, the mean rank of each learning curve model $L$ among all participants, $\bar{\eta}^L$, can be calculated as follows:

$$\bar{\eta}^L = \frac{\sum_{i=1}^{N} \eta_i^L}{N}, \qquad (6.12)$$

with $N$ denoting the total number of participants. Another value used by Grosse and Glock (2013) and Glock et al. (2012) for the evaluation and comparison of learning curve models is the so-called balance. It describes the relation between estimated data points lying above their real values (i.e. residuals having negative values) and estimated data points lying below their real values (i.e. residuals having positive values). As it extends the $SER$ by assessing whether the learning curve model tends to overestimate or underestimate the observed data, it has been used in this thesis as well. Here, the balance of learning curve $L$ for participant $i$ is calculated as follows:

$$balance_i^L = \frac{\text{number of residuals} < 0}{\text{total number of residuals}} = \frac{\text{number of residuals} < 0}{X} \qquad (6.13)$$
Note that according to equation 6.13, a perfectly balanced learning curve would yield a balance of .50. A learning curve estimating all values to be below the real data points would have a balance of 0. Similarly, a learning curve estimating all values to be above the real data points would have a balance of 1.00.
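Taken together, these quality measures are straightforward to compute. The following is a minimal sketch in R under the assumption that the residuals of a fitted model are available; the variable and function names are illustrative, not from the thesis:

# Quality measures from equations 6.11-6.13 (sketch with assumed inputs):
# res   - vector of residuals y - y_hat for one participant and one model
# kappa - number of estimated parameters of the learning curve model
ser     <- function(res, kappa) sqrt(sum(res^2) / (length(res) - kappa))
balance <- function(res) sum(res < 0) / length(res)  # share of overestimates

# Mean rank per model (eq. 6.12): ser_mat holds one row per participant and
# one column per learning curve model; a lower SER gives a better (lower) rank.
mean_rank <- function(ser_mat) colMeans(t(apply(ser_mat, 1, rank)))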
6.3.2 Results for Picking Times per Item
The descriptive statistics for the $SER_i^L$ of each learning curve model for estimating picking times per item are given in Table 6.12. In section 6.2.3, it has already been found that the different learning curves generally lie close together. It is thus not surprising that the mean and median values of the $SER_i^L$ are similar for all models except the 3PH. The 3PH generally yields higher values for $SER_i^L$ compared to the other learning curve models in all sets of both groups. Only in sets 5–8 of group RR does the 3PH have a lower maximum $SER_i^L$ than the other models. This indicates that the 3PH outperforms the other learning curve models for a small number of participants. In sets 1–4 of group VR, the DJLC stands out by having a higher maximum $SER_i^L$ compared to the WLC, the STB, the SCM, the DPLC, and the JGLC. This means that the DJLC can perform worse than the other models in certain situations.
To gain further insight into the goodness of fit of the different learning curve models for different participants, Table 6.13 provides the number of times the $\eta_i^L$ of each model yields a specific ranking position. Additionally, Table 6.14 gives the mean rank, $\bar{\eta}^L$, of each learning curve model along with its placement in the order of all learning curve models. As can be seen, the WLC has the lowest $\bar{\eta}^L$ in all sets of both groups except sets 1–4 of group VR, i.e. the WLC provides the best fit for picking times per item obtained in the real environment. In VR, however, the DJLC performs better. In sets 5–8, the second best fit is provided by the STB. Yet, in sets 1–4, the STB is only placed fourth in group VR and third in group RR. In all sets of both groups, the 3PH performs worst. However, when looking at Table 6.13, it can be seen that the 3PH leads to the lowest $SER_i^L$ for one participant in group VR and three participants in group RR, while yielding the highest $SER_i^L$ for all other participants. Finally, Table 6.15 provides descriptive statistics for the $balance_i^L$ of each learning curve model. As can be seen from all mean and median values being > .50, all learning curve models tend to overestimate the observed picking times per item.5 Again, the results for $balance_i^L$ are similar for all learning curves except the 3PH. Surprisingly, and in contrast to the findings on the $SER$, where the 3PH performs worst of all learning curves, it yields the best results for $balance_i^L$, with the mean and median values being closest to the value of .50. The only exception can be found in sets 1–4 of group RR, where the other models yield a slightly better mean and median balance than the 3PH.

5 One way to verify the results on $balance_i^L$ is by analysing the plots of the learning curves. For example, the maximum value of 1.00 reveals that for at least one participant in group VR, the DJLC estimates all data points in sets 1–4 to lie above the observed picking times per item. In fact, Figure 6.4 on page 146 shows a corresponding curve for participant 28.
6.3.3 Results for Searching Times
With regard to searching times, descriptive statistics for the $SER_i^L$ of each learning curve model can be found in Table 6.16. As for picking times per item, the table shows that the $SER_i^L$ values for searching times do not differ much between the different learning curve models. The only exception to this is the 3PH, which again shows much higher median $SER_i^L$ values than the other models. However, in sets 5–8 of group VR and in all sets of group RR, the 3PH yields very low minimum $SER_i^L$ values, indicating that for a few participants, the model provides a very good fit.
In Table 6.17, the number of $\eta_i^L$ yielding a specific rank is given for each learning curve model. Table 6.18 lists the mean rank $\bar{\eta}^L$ of each learning curve and its position among all models. As for the learning curves estimating picking times per item, the WLC outperforms all other models in almost all sets of both groups. However, in VR (i.e. sets 1–4 of group VR), the DJLC again takes first place with regard to the mean rank. This is surprising, as in sets 1–4 of group VR the DJLC is only ranked first eleven times, while the WLC is placed first fourteen times. Yet, the number of second, third and fourth ranks is higher for the DJLC. In sets 5–8 of group VR and in sets 1–4 of group RR, the DJLC takes the second rank after the WLC. In sets 5–8 of group RR, the STB performs second best and the DJLC takes the third rank, with a difference in the mean rank of only .11. Again, the 3PH performs worst of all learning curve models, taking the last rank in all sets of group VR and sets 1–4 of group RR. However, in sets 5–8 of group RR, the JGLC takes the last rank with regard to the mean rank. As can be seen in Table 6.17, the 3PH takes either the first or the last rank of all learning curves for each participant. In sets 5–8 of group RR, the number of participants for which the 3PH takes the first rank is relatively large, with ten participants.

Table 6.12 Descriptive statistics for $SER_i^L$ for the different learning curve models estimating picking times per item
Descriptive statistics for the $balance_i^L$ of each learning curve are given in Table 6.19. Similar to the results for picking times per item, the learning curves generally overestimate the observed searching times, with all mean values being
> .50.6 Note that the results for the WLC, the STB, the DJLC, the SCM, the DPLC, and the JGLC are similar. Only the 3PH performs considerably better, with median values for $balance_i^L$ being closest to the balanced value of .50.

6 Results can again be verified by looking at the plots of the learning curves: note, for example, that the balance of the JGLC in sets 1–4 of group RR has a minimum value of zero. A corresponding curve underestimating all observed values can indeed be found for participant 4 in Figure 6.10 on page 154.
6.3.4 Discussion
On the one hand, results for both picking times per item and searching times reveal that all learning curves except the 3PH perform almost equally well, as shown by the small differences in the values of the $SER_i^L$. This can be explained by the fact that the STB, the DJLC, and the SCM are transformed into the basic WLC if Bi = 0 or Mi = 0, respectively. Similarly, the DPLC and the JGLC also transform into the WLC if only cognitive or motor learning is present. As has been described in section 6.2.3, this is indeed the case for some participants: as the WLC yields the best fit, the least squares method thus estimates the parameters of the other models to fit the WLC. However, the relatively small number of data points, with a total of 64 orders per block, must be kept in mind. With a larger number of data points, the differences between the $SER_i^L$ of the learning curve models might be larger. Moreover, it has to be noted that the task performed by the participants is relatively simple. This could explain why the DPLC and the JGLC fall behind the other models in terms of $SER_i^L$, as the subdivision of the learning effects into cognitive and motor learning does not offer any advantage. For more complex tasks in VR, however, these models might be worth considering. The 3PH shows the worst performance of all learning curve models for both picking times per item and searching times for all but a few participants. This shows that the model is either very well suited or does not yield a good fit at all. A reason for this behaviour of the model could again lie in the small number of data points used to estimate the learning curves. Even though the 3PH might generally be well suited (Grosse and Glock 2013; Grosse et al. 2015b), the number of orders might be too small to estimate the parameters of the 3PH in such a way that the model provides a good fit for all participants. The fact that the 3PH outperforms the other learning curve models for a small number of participants could also hint at negative learning effects caused by fatigue or forgetting. As the 3PH is the only model that is able to depict negative learning effects (i.e. decreasing picking rates), it is supposed to provide a better fit in these cases compared to the other learning curve models. This assumption is supported by the fact that the 3PH generally performs better in the second block (sets 5–8).
Table 6.13 Number of ranking positions for each learning curve estimating picking times per item
Table 6.14 Mean rank of each learning curve ($\bar{\eta}^L$) for estimating picking times per item (in parentheses: placement among all learning curve models)
Although the DJLC performs best in VR and the WLC performs best in the real environment in terms of $\bar{\eta}^L$, the results for $SER_i^L$ and $balance_i^L$ show no large difference between the learning curves fitted to the data from VR and the learning curves fitted to the data from the real environment. This leads to the conclusion that the environment (virtual or real) does not have a large effect on the selection of the best learning curve model.
Table 6.15 Descriptive statistics for $balance_i^L$ for the different learning curve models estimating picking times per item
Table 6.16 Descriptive statistics for $SER_i^L$ for the different learning curve models estimating searching times
As an answer to RQ 3.1, the WLC, the STB and the DJLC are considered the best-performing models. Consequently, only these three learning curve models are used in the remainder of this thesis. While the basic WLC generally performs best, the STB enables further analysis of the previous experience (Bi) and is therefore included in the following analysis. Similarly, the DJLC provides the factor Mi defining the plateau of the curve, which justifies the choice of this model for further evaluation.
Table 6.17 Number of ranking positions for each learning curve estimating searching times
Table 6.18 Mean rank of each learning curve ($\bar{\eta}^L$) for estimating searching times (in parentheses: placement among all learning curve models)
Note that the entire dataset of each participant has been used for fitting the learning curve models. The evaluation is thus based on the goodness of fit and not on the predictive power of each model. The predictive power could be analysed by using only a part of the dataset for model fitting and then estimating the remaining data points using the fitted models (Grosse and Glock 2013, p. 860). This has not been done here due to the small number of orders per participant, so that the maximum number of data points is available for model fitting. As a result, it has to be pointed out that a clear statement on the predictive power cannot yet be derived.
Table 6.19 Descriptive statistics for $balance_i^L$ for the different learning curve models estimating searching times
6.4 Comparing Learning Curves between Virtual and Real Order Picking
In order to answer RQ 3.2, i.e. to compare learning effects in VR and in the real environment, the parameter values of the WLC, the STB and the DJLC are statistically tested for equality between group VR and group RR. To do so, RQ 3.2 is first translated into research hypotheses. Subsequently, an approach similar to the procedure presented in section 5.2 is followed by first testing the parameter distributions within each group for normality using a Shapiro-Wilk test in order to select the appropriate statistical test for the subsequent hypothesis testing.7 Again, all statistical tests have been performed at a 5% significance level if not stated otherwise.

7 See also Figure 5.1 on page 105 for a graphical presentation of the approach. The only difference to the procedure in the previous chapter is that an ANOVA is not used for the comparison of learning curves.
6.4.1 Research Hypotheses
Similar to section 5.1, the hypotheses developed in this section are not yet statistical null hypotheses, because the formulation of statistical null hypotheses requires
selecting the adequate statistical test first (Döring and Bortz 2016, p. 661; Farrugia et al. 2010, p. 280). However, the hypotheses are formulated in such a way that they can be transferred into statistical hypotheses easily once the test method has been selected. RQ 3.2 consists of two parts. The first part asks whether learning effects can be transferred from VR to the real environment effectively, i.e. whether a transfer of learning effects takes place at all. If this is the case, participants of group VR should start their picking in the second block (sets 5–8) with lower picking times per item than group RR starts their picking in the first block (sets 1–4). This is because group VR received four sets of order picking training in VR prior to performing sets 5–8 in the real environment, whereas group RR received no training at all before performing sets 1–4 in the real environment. In contrast, if learning effects cannot be transferred, picking times per item of group VR at the beginning of set 5 are expected to be equal to or above the respective times of group RR at the beginning of set 1. The first hypothesis is thus:

H 1.1 Picking times per item of the first order (t1,i) in sets 1–4 of group RR are longer or equal to picking times per item of the first order in sets 5–8 of group VR.

The second part of RQ 3.2 asks whether learning effects from VR are transferred to the real environment efficiently. The alternative to training in VR would be training in the real environment. The question is thus whether learning effects transferred from VR are at least as large as learning effects transferred from similar training in the real environment. As both groups perform the exact same number of orders in sets 1–4, but in different environments, this can be analysed by comparing the picking times per item of the first order in set 5 (i.e. after training) between the two groups. If learning effects are larger in VR compared to the real environment, participants of group RR are expected to start with longer picking times per item in set 5 than participants of group VR. The second research hypothesis is thus:

H 1.2 Picking times per item of the first order (t1,i) in sets 5–8 of group RR are longer or equal to picking times per item of the first order in sets 5–8 of group VR.
exponents (and thus learning rates) are different or if the learning curves plateau at a similar level. Additional hypotheses are therefore formulated as follows:

H 1.3 The learning exponent bi of the learning curve models for estimating picking times per item does not differ between group VR and group RR.

H 1.4 The factor Bi of the STB for estimating picking times per item does not differ between group VR and group RR.

H 1.5 The factor Mi of the DJLC for estimating picking times per item does not differ between group VR and group RR.

The hypotheses above are formulated with regard to picking times per item. For searching times, similar hypotheses are formulated in analogy:

H 2.1 Searching times of the first order (t1,i) in sets 1–4 of group RR are longer or equal to searching times of the first order in sets 5–8 of group VR.

H 2.2 Searching times of the first order (t1,i) in sets 5–8 of group RR are longer or equal to searching times of the first order in sets 5–8 of group VR.

H 2.3 The learning exponent bi of the learning curve models for estimating searching times does not differ between group VR and group RR.

H 2.4 The factor Bi of the STB for estimating searching times does not differ between group VR and group RR.

H 2.5 The factor Mi of the DJLC for estimating searching times does not differ between group VR and group RR.

Although all hypotheses are non-specific in terms of the size of the hypothesised effect, H 1.1, H 1.2, H 2.1, and H 2.2 have been formulated giving a clear direction. The remaining hypotheses have been formulated as non-directional hypotheses. They are only concerned with the existence of a difference, because no assumptions on the direction of the effect can be made prior to the data analysis (Döring and Bortz 2016, pp. 148–149).
6.4.2 Results for Picking Times per Item
For picking times per item, the results of the Shapiro-Wilk test for normal distribution of the parameters of each learning curve can be found in Table 6.20. The table shows that the test rejects normality at a 5% significance level for each parameter in sets 1–4 for at least one of the two groups. Only for the bi of the STB can normality be rejected merely at a 10% significance level. Consequently, the non-parametric Kolmogorov-Smirnov and Mann-Whitney U tests have been used to test the hypotheses given above. Note that, in accordance with the hypotheses, the Mann-Whitney U tests for the initial picking times have been performed as one-sided tests against the alternative hypothesis that t1,i is lower in group RR than in group VR (i.e. the distribution of group RR is located to the left of the distribution of group VR), whereas the tests for all other hypotheses have been performed as two-sided tests (i.e. asking for a simple difference in the underlying distributions).

Comparing Picking Times per Item of the First Order in Sets 5–8 of Group VR with the First Order in Sets 1–4 of Group RR (H 1.1)

In order to analyse whether learning effects are transferred effectively, it is necessary to test whether picking times per item of the first order in sets 1–4 of group RR are longer or equal to picking times per item of the first order in sets 5–8 of group VR. The Kolmogorov-Smirnov test rejects an equal distribution for the WLC and the DJLC but not for the STB (WLC: D = .39, p = .008; STB: D = .17, p = .613; DJLC: D = .45, p = .001). For the WLC and the DJLC, the one-sided Mann-Whitney U test thus tests the null hypothesis that the distribution of the data of group RR is located to the right of the distribution of the data of group VR. For the STB, the test can be interpreted as a test for the hypothesis that the median of group RR is larger than the median of group VR (Divine et al. 2018; Hart 2001). For all three learning curve models, the results clearly show that the null hypothesis cannot be rejected (WLC: W = 744, p = .998; STB: W = 629, p = .901; DJLC: W = 782, p = .999). In summary, hypothesis H 1.1 (Picking times per item of the first order (t1,i) in sets 1–4 of group RR are longer or equal to picking times per item of the first order in sets 5–8 of group VR.) cannot be rejected based on the results. Participants who received training in VR start with lower or equal picking times per item in the real environment compared to participants who received no prior training.
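As an illustration of the test sequence described above, the following R sketch shows how the comparison could be run for one parameter. The data frame and column names are assumptions for illustration, while shapiro.test(), ks.test() and wilcox.test() are the base R implementations of the tests named above:

# Sketch: test sequence for one learning curve parameter (here t1), assuming
# a data frame `params` with one row per participant and columns group, t1.
t1_rr <- params$t1[params$group == "RR"]
t1_vr <- params$t1[params$group == "VR"]

shapiro.test(t1_rr)   # normality within each group ...
shapiro.test(t1_vr)   # ... to justify the non-parametric tests below

ks.test(t1_rr, t1_vr) # two-sample test for equality of distributions

# One-sided Mann-Whitney U test against the alternative that group RR's times
# are lower (shifted left); failing to reject supports hypothesis H 1.1.
wilcox.test(t1_rr, t1_vr, alternative = "less")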
Table 6.20 Results of a Shapiro-Wilk test for normal distribution of the learning curve parameters for estimating picking times per item
Comparing Picking Times per Item of the First Order in Sets 5–8 of Both Groups (H 1.2)

The next evaluation concerns whether learning effects are transferred efficiently, i.e. whether the initial picking times per item in sets 5–8 are larger in group RR compared to group VR. Results for the Kolmogorov-Smirnov test and the Mann-Whitney U test can be found in Table 6.21. The table shows that the Kolmogorov-Smirnov test rejects an equal distribution of t1,i in sets 5–8 at a 5% significance level only for the WLC and the DJLC. The Mann-Whitney U test rejects the hypothesis only for the STB. As an equal distribution can be assumed for the STB, the Mann-Whitney U test in this case can be interpreted as a test for equal medians. Based on the results of the statistical tests given in Table 6.21, hypothesis H 1.2 (Picking times per item of the first order (t1,i) in sets 5–8 of group RR are longer or equal to picking times per item of the first order in sets 5–8 of group VR.) can only be rejected for the STB. For reasons of verification, the first orders in sets 1–4 are also compared between the two groups. Results for picking times per item are also given in Table 6.21. The Kolmogorov-Smirnov test as well as the Mann-Whitney U test reject the hypothesis of an equal distribution of t1,i in group VR and group RR for all learning curve models.

Comparing the Learning Exponent bi (H 1.3)

For the learning exponent bi, test results can also be found in Table 6.21. Equal distributions are rejected by the Kolmogorov-Smirnov and the Mann-Whitney U test for all learning curve models and in all sets. Only for the STB does the Kolmogorov-Smirnov test not reject the hypothesis of equal distributions in sets 5–8 at a 5% but at a 10% significance level. With regard to the results of the Mann-Whitney U test, hypothesis H 1.3 (The learning exponent bi of the learning curve models for
estimating picking times per item does not differ between group VR and group RR.) must be rejected in all sets and for all learning curve models.

Table 6.21 Results of a Kolmogorov-Smirnov and a Mann-Whitney U test comparing learning curve parameters for estimating picking times per item between group VR and group RR. (Note that all tests were performed as two-sided tests except for the Mann-Whitney U test for the parameter t1,i, which was performed as a one-sided test according to the formulated hypothesis.)
Comparing the Factor Bi in the STB (H 1.4)

With regard to the previous experience of participants estimated by the parameter Bi in the STB, the Kolmogorov-Smirnov test finds no significant differences in the underlying distributions, as is shown in Table 6.21. The Mann-Whitney U test, however, rejects the hypothesis of equal medians in sets 5–8, but not in sets 1–4. This means that hypothesis H 1.4 (The factor Bi of the STB for estimating picking times per item does not differ between group VR and group RR.) has to be rejected in sets 5–8 but not in sets 1–4.

Comparing the Factor Mi in the DJLC (H 1.5)

Table 6.21 further reveals that the factor of incompressibility, Mi, in the DJLC does not show significantly different distributions based on the Kolmogorov-Smirnov test, neither in sets 1–4 nor in sets 5–8. The hypothesis of equal medians can also not be rejected based on the Mann-Whitney U test in sets 1–4. However, in sets 5–8, the hypothesis of equal medians is rejected at a 5% significance level. Hence, hypothesis H 1.5 (The factor Mi of the DJLC for estimating picking times per item does not differ between group VR and group RR.) is only rejected in sets 5–8.
6.4.3 Results for Searching Times
For searching times, the results of the Shapiro-Wilk tests for normal distribution can be found in Table 6.22. Again, normality can be rejected for at least one of the two groups for all parameters in sets 1–4 and sets 5–8 at a 5% significance level.
Table 6.22 Results of a Shapiro-Wilk test for normal distribution of the learning curve parameters for estimating searching times
Similar to picking times per item, a Kolmogorov-Smirnov and a Mann-Whitney U test are therefore used to compare the parameter estimates between the two groups. Additionally, as for picking times per item, the Mann-Whitney U tests for the initial searching times have been performed as one-sided tests against the alternative hypothesis that t1,i is lower in group RR than in group VR. All other hypothesis tests are performed as two-sided tests.

Comparing Searching Times of the First Order in Sets 5–8 of Group VR with the First Order in Sets 1–4 of Group RR (H 2.1)

To evaluate whether learning effects in searching times are transferred effectively, it needs to be tested whether searching times of the first order (t1,i) in sets 1–4 of group RR are longer or equal to searching times of the first order in sets 5–8 of group VR. The Kolmogorov-Smirnov test rejects an equal distribution for all learning curve models (WLC: D = .63, p < .001; STB: D = .53, p < .001; DJLC: D = .57, p < .001). The Mann-Whitney U test therefore tests the null hypothesis that the distribution of the initial searching times of group RR in sets 1–4 is located to the right of the distribution of the initial searching times of group VR in sets 5–8. The results clearly indicate that this hypothesis cannot be rejected (WLC: W = 898, p = 1; STB: W = 871, p = 1; DJLC: W = 905, p = 1). As a result, hypothesis H 2.1 (Searching times of the first order (t1,i) in sets 1–4 of group RR are longer or equal to searching times of the first order in sets 5–8 of group VR.) cannot be rejected based on the data.

Comparing Searching Times of the First Order in Sets 5–8 of Both Groups (H 2.2)

To analyse whether learning effects are transferred efficiently, it is necessary to test whether the initial searching times t1,i in sets 5–8 are longer for group RR than for group VR.
Results for the Kolmogorov-Smirnov test, as well as the Mann-Whitney U test, can be found in Table 6.23. The table indicates that the Kolmogorov-Smirnov test does not reject the assumption of equal distributions in sets 5–8. Similarly, the Mann-Whitney U test does not reject the hypothesis of larger or equal medians of group RR for any of the learning curve models. Therefore, hypothesis H 2.2 (Searching times of the first order (t1,i) in sets 5–8 of group RR are longer or equal to searching times of the first order in sets 5–8 of group VR.) cannot be rejected. For reasons of verification, the initial searching times in sets 1–4 are also compared between the two groups. However, the results given in Table 6.23 are similar to the results in sets 5–8, neither rejecting the hypothesis of equal distributions nor the hypothesis of larger or equal medians of group RR.

Table 6.23 Results of a Kolmogorov-Smirnov and a Mann-Whitney U test comparing learning curve parameters for estimating searching times between group VR and group RR. (Note that all tests were performed as two-sided tests except for the Mann-Whitney U test for the parameter t1,i, which was performed as a one-sided test according to the formulated hypothesis.)
Comparing the Learning Exponent bi (H 2.3)

For the learning exponents bi, test results are also given in Table 6.23. As can be seen, the Kolmogorov-Smirnov test does not find a significant difference in the underlying distributions for any of the learning curve models. The hypothesis of equal medians can also not be rejected based on the Mann-Whitney U test, neither in sets 1–4 nor in sets 5–8. Hence, hypothesis H 2.3 (The learning exponent bi of the learning curve models for estimating searching times does not differ between group VR and group RR.) cannot be rejected.
Comparing the Factor Bi in the STB (H 2.4)

Test results of the comparison of the parameter Bi in the STB can also be found in Table 6.23. The table shows that the Kolmogorov-Smirnov test again does not reject the hypothesis of equal distributions. Moreover, equal medians can be assumed based on the results of the Mann-Whitney U test in both blocks. Thus, hypothesis H 2.4 (The factor Bi of the STB for estimating searching times does not differ between group VR and group RR.) cannot be rejected, neither in sets 1–4 nor in sets 5–8. This means that both groups have the same previous experience in terms of searching times.

Comparing the Factor Mi in the DJLC (H 2.5)

Results for the tests comparing the parameter Mi can also be found in Table 6.23. Again, the Kolmogorov-Smirnov test does not reject the hypothesis of equal distributions in either of the sets. Additionally, the hypothesis of equal medians is not rejected by the Mann-Whitney U test. As a result, hypothesis H 2.5 (The factor Mi of the DJLC for estimating searching times does not differ between group VR and group RR.) cannot be rejected, neither in sets 1–4 nor in sets 5–8.
6.4.4 Discussion
The hypotheses and the results of the statistical tests are summarized in Table 6.24. The fact that neither H 1.1 nor H 2.1 has been rejected for most learning curve models is a strong indicator that learning effects are indeed transferred from VR to the real environment, as participants who received training in VR start picking in the real environment with lower picking times per item and searching times compared to participants with no prior experience. VR training can thus be called effective. Moreover, the results referring to H 1.2 and H 2.2 are also promising, as no evidence has been found that participants of group VR perform worse at the beginning of sets 5–8 than participants in group RR. Instead, both groups start the second block of the experiment performing at least equally well. This means that the training received by group RR performing sets 1–4 in the real environment does not lead to significantly faster picking times per item or searching times compared to the training received by group VR in the virtual setup. Instead, as both groups start set 5 with an equal median time for the first order, the previous training in sets 1–4 has an equal effect for both groups, no matter in which environment sets 1–4 were performed.
Table 6.24 Summary of the test results for each hypothesis, i.e. whether the hypothesis has been rejected
However, rejecting equal distributions for picking times per item of the first order in sets 1–4 between the two groups is consistent with previous findings from chapter five, stating that participants in group VR need longer for picking items, especially at the beginning of the experiment, when they are not used to the VR controller. In fact, Table 6.4 on page 150 reveals that the estimated median of t1,i is 1.52 s (WLC), 2.09 s (STB) and 1.22 s (DJLC) longer for group VR compared to group RR. For searching times in sets 1–4, a similar difference between the two groups has, however, not been found. This is also plausible because the findings in chapter five concluded that, in general, searching times between the two groups do not differ significantly. This means that, in contrast to picking times per item, participants in VR do not need extra time to become familiar with the virtual environment and reach similar searching times compared to participants in group RR. With regard to the learning exponent bi , H 1.3 (for picking times per item) has been rejected, but H 2.3 (for searching times) has not. Table 6.4 shows larger median values for estimated bi for picking times per item of group VR compared to group
RR for all learning curve models. It can therefore be assumed that the improvement in picking times per item is larger for group VR. This leads to the conclusion that the participants' need to get used to handling the VR controller is responsible for this larger learning effect, as the different form of physical interaction with the items is the main difference between the two environments. Nevertheless, finding no significant difference in bi for searching times is also not surprising, as it resembles the results for the initial searching times. Altogether, this shows that the improvement in searching times due to learning is most likely independent of the environment. Results for the factor Bi for picking times per item show that the random assignment of participants to the groups has led to the same initial situation in both groups with regard to previous experience in order picking, because H 1.4 has not been rejected in sets 1–4. Moreover, the short introduction on the use of the VR controllers given to participants in group VR (see step 4 in Figure 4.2 on page 68) does not influence the results. However, in sets 5–8, H 1.4 has been rejected. This result is especially interesting: at the beginning of set 5, participants of both groups already have previous experience from performing the same number of orders in sets 1–4. In contrast to the results of testing hypothesis H 1.2, the different environments in which these previous orders were performed seem to have an influence on the estimates of Bi. Nevertheless, it must be pointed out that of all parameters, Bi was most limited during the estimation of the learning curves, with a lower bound of 1 and an upper bound of 10. This means that the available range of Bi has been limited from the start, which could have an influence on the observed difference between the two groups. Results on the comparison of Bi for searching times also show that participants have been assigned to the groups in such a way that the groups do not differ in terms of experience prior to sets 1–4. Also, having found no difference for the estimator of prior experience in sets 5–8 for searching times indicates that having performed sets 1–4 in different environments has no influence on Bi in sets 5–8. Results for the comparison of Mi for picking times per item (H 1.5) imply that the maximum improvement of picking times per item in relation to t1,i is equal for both groups in sets 1–4, even though the curves of the two groups do not plateau at the same level due to the aforementioned differences in t1,i. Instead, the higher value of t1,i causes the DJLC of group VR to plateau at a higher level than the DJLC of group RR, meaning that even after an infinite number of orders, participants in VR will not reach the same picking times per item as participants in group RR. This is again consistent with results from the previous chapter, showing significant differences in picking times per item between the two environments. In contrast, for searching times, the results of the comparison of Mi (H 2.5) indicate that the DJLC plateaus at the same level for both groups in sets 1–4 and sets 5–8. This is because a significant
difference in the initial searching times has not been found. After an infinite number of orders in the setup, participants yield similar searching times independent of the environment in which the picking takes place. This is yet another indicator that learning effects for searching times in VR are similar to learning effects in the real environment. In summary, the answer to RQ 3.2 is that training in VR is effective and at least as efficient as training in a real environment. However, with regard to picking times per item, participants need extra time to learn how to operate the VR controllers, resulting in larger initial picking times in sets 1–4 but also larger learning exponents. For the practical use of VR for planning and training, this time requirement must be considered. In section 6.6, it is therefore further analysed how many orders are actually needed until participants in group VR can be considered familiar with the controllers. For searching times, the learning effects seem to be independent of the environment in which the training takes place. This means that searching for items at a single rack can indeed be trained in VR.
6.5 Using Learning Curves for Predicting Human Performance in the Real Environment
In order to answer RQ 2.2, the parameters of the fitted learning curves can now be used to predict picking times per item and searching times in the real environment. Therefore, a linear model has been formulated using the parameters from the WLC obtained in VR (sets 1–4) to predict the parameters of the WLC of the same participants in the real environment (sets 5–8). Hence, in contrast to the previous analyses presented in this thesis, a within-subject approach with only the participants in group VR is used. Note that for reasons of simplicity, only the fitted WLC is used here, as the other learning curve models yield similar results. For both the picking times per item and the searching times, three linear models have been estimated as shown in Table 6.25. The first model (1.1 and 2.1 in Table 6.25) uses the times of the first order in sets 1–4 to predict the times of the first order in sets 5–8, thus analysing the dependency between the times of the first order in a new environment. The second model (1.2 and 2.2) again predicts the times of the first order in sets 5–8, but this time using the times of the last order in sets 1–4. Using the times of the last order in sets 1–4 offers the advantage that interfering effects of participants being unfamiliar with the VR controllers or the effects of training are expected to have worn off. The third model (1.3 and 2.3) uses the times of the final order in sets 1–4 to predict the times of the final order in sets 5–8. Because the learning curves are strictly monotonically decreasing, the model thus predicts the lowest estimated times of the learning curve model in sets
This way, the model predicts the long-term performance of the pickers by eliminating the effect of training to the furthest possible extent. The linear models are fitted to the data of group VR using a least-squares method provided by the function lm() in R. The results of the regressions are also given in Table 6.25. The table shows that all coefficient estimates for β̂_1 are significant at a 5% level, except for model 2.1. This means that the times estimated by the WLC in sets 5–8 do indeed depend on the times in the virtual environment. Only for the searching times of the first order in sets 5–8 can no dependency on the searching times of the first order in sets 1–4 be found (model 2.1). Furthermore, the results for the multiple R² indicate that up to 34% of the variation in picking times per item (model 1.3) and 65% of the variation in searching times (model 2.3) can be explained by the models. Note that the highest values for the multiple R² are always reached in the model that uses the times of the last order in sets 1–4 to predict the times of the last order in sets 5–8. This result is plausible, as the interfering effect of participants being unfamiliar with either the virtual or the real environment is least pronounced in the last orders. Thus, using time measures taken after a certain number of previous picks for predicting long-term performance is recommended. Therefore, RQ 2.2 can be answered as follows: the performance of order pickers in VR can indeed be used to predict their long-term performance in a similar real environment. This finding has strong implications for practice. For example, order picking in VR can be used to identify tasks or setups in which specific employees perform best. Also, a VR environment can reliably be used to evaluate the performance of order pickers in a real environment, for example during an application process.
Table 6.25 Results of the linear regression (with standard errors of the estimated coefficients in parentheses)
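To illustrate the approach, the following is a minimal sketch in R of how such models can be fitted with lm(); the data frame and its column names are hypothetical placeholders for the per-participant WLC estimates and not the original analysis code:

# Hypothetical per-participant WLC estimates for group VR (within-subject data):
# t_first_vr / t_last_vr: estimated times of the first/last order in sets 1-4 (VR)
# t_first_re / t_last_re: estimated times of the first/last order in sets 5-8 (real)
d <- data.frame(
  t_first_vr = c(6.1, 5.4, 7.0, 5.8),   # illustrative values only
  t_last_vr  = c(3.2, 2.9, 3.8, 3.1),
  t_first_re = c(4.0, 3.6, 4.9, 3.9),
  t_last_re  = c(2.8, 2.5, 3.3, 2.7)
)

m1 <- lm(t_first_re ~ t_first_vr, data = d)  # models 1.1/2.1: first order predicts first order
m2 <- lm(t_first_re ~ t_last_vr,  data = d)  # models 1.2/2.2: last order predicts first order
m3 <- lm(t_last_re  ~ t_last_vr,  data = d)  # models 1.3/2.3: last order predicts last order

summary(m3)$coefficients  # estimate of beta_1 with standard error and p-value
summary(m3)$r.squared     # multiple R^2 as reported in Table 6.25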
6.6 Using Learning Curves to Estimate the Number of Orders Necessary for Familiarization in Virtual Reality
By using the fitted learning curves, the aim of this section is to provide an estimate for the time needed by participants to become familiar with the VR controllers, thus answering RQ 2.3. As significant differences in picking times have been found between picking in VR and in a real environment in chapter five, calculating this time requirement for familiarization is valuable for the future use of VR in manual order picking. It provides an insight into the number of orders that should be performed by participants in VR prior to the collection of data for planning purposes, so that VR picking times are comparable to a real order picking setup.

To do so, the number of orders x_i^{fam} is calculated for each participant in group VR. It gives the number of orders after which the participant's picking time per item reaches the mean picking time per item in the first order of group RR in sets 1–4, i.e. t_{x_i^{fam},i} ≤ t̄_{1,RR}. Thus, the term t_{x_i^{fam},i} ≤ t̄_{1,RR} is inserted into the model equations of the WLC, the STB and the DJLC, which are then solved for x_i^{fam}. For the WLC, equation 6.1 transforms as follows:

x_i^{fam} \ge \exp\left( \frac{\log(t_{1,i}) - \log(\bar{t}_{1,RR})}{b_i} \right)    (6.14)

For the STB, equation 6.3 becomes

x_i^{fam} \ge \exp\left( \frac{\log(t_{1,i}) - \log(\bar{t}_{1,RR})}{b_i} \right) - B_i    (6.15)

For the DJLC, equation 6.4 transforms as such:

x_i^{fam} \ge \exp\left( \frac{\log(1 - M_i) + \log(t_{1,i}) - \log(\bar{t}_{1,RR} - t_{1,i} M_i)}{b_i} \right)    (6.16)
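These transformations can be traced back to the underlying model equations. As a sketch, and under the assumption that equations 6.1, 6.3 and 6.4 follow the standard formulations of the three models, the forms solved above are:

t_{x,i} = t_{1,i} \, x^{-b_i}    (WLC)

t_{x,i} = t_{1,i} \, (x + B_i)^{-b_i}    (STB)

t_{x,i} = t_{1,i} \left( M_i + (1 - M_i) \, x^{-b_i} \right)    (DJLC)

Setting t_{x,i} ≤ t̄_{1,RR} in each form and solving for x yields equations 6.14 to 6.16. Note that under the DJLC form, t_{x,i} approaches the plateau t_{1,i} M_i for large x, which explains why a curve whose plateau lies above t̄_{1,RR} never intersects the threshold.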
As can be seen in Table 6.4 on page 150, the estimated mean picking time per item for the first order of group RR in sets 1–4 (t̄_{1,RR}) is .84 s in the WLC, .99 s in the STB, and .07 s in the DJLC. Using these values and the individually estimated learning curve parameters in sets 1–4, x_i^{fam} can be calculated for each participant of group VR. Table 6.26 gives the .50-, the .75-, the .90- and the .95-quantile as well as the maximum value (i.e. the 1.00-quantile) of x_i^{fam}. Note that the DJLC of four participants does not intersect with t̄_{1,RR} at all, due to the model's property to plateau. Those participants thus never reach t̄_{1,RR} according to the DJLC, and x_i^{fam} can therefore not be calculated for them. For the calculation of the quantiles, these participants were excluded.
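As an illustration, the following is a minimal R sketch of this calculation for a single participant; the parameter values are hypothetical placeholders (in the thesis, the parameters are estimated per participant and per model, so the actual analysis would use those individual estimates):

# Hypothetical fitted learning curve parameters for one participant in group VR (sets 1-4)
t1 <- 1.9; b <- 0.25; B <- 2.0; M <- 0.3   # illustrative values only
t_bar <- 0.84   # threshold, e.g. the mean first-order picking time of group RR (WLC)

# Equation 6.14 (WLC): x >= exp((log(t1) - log(t_bar)) / b)
x_wlc <- exp((log(t1) - log(t_bar)) / b)

# Equation 6.15 (STB): x >= exp((log(t1) - log(t_bar)) / b) - B
x_stb <- exp((log(t1) - log(t_bar)) / b) - B

# Equation 6.16 (DJLC): only defined if the plateau t1 * M lies below t_bar
x_djlc <- if (t1 * M < t_bar) {
  exp((log(1 - M) + log(t1) - log(t_bar - t1 * M)) / b)
} else {
  NA  # the curve plateaus above the threshold and never reaches t_bar
}

ceiling(c(WLC = x_wlc, STB = x_stb, DJLC = x_djlc))  # round up to whole orders

# Across all participants, the quantiles reported in Table 6.26 would follow as, e.g.:
# quantile(x_fam_all, probs = c(.50, .75, .90, .95, 1.00), na.rm = TRUE)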
Table 6.26 Different p-quantiles of the estimated number of orders per participant in group VR (x_i^{fam}) to reach the mean picking time per item for the first order of group RR in sets 1–4 (t̄_{1,RR})
Results in Table 6.26 show that after having completed 18 orders, 50% of participants in group VR reach t̄_{1,RR} according to the WLC. According to the STB, only 13 orders are needed for 50% of participants to become familiar with the VR controllers. The DJLC suggests ten orders for 50% of participants in group VR to reach t̄_{1,RR}. However, for 95% of participants to reach t̄_{1,RR}, an estimated 340 orders are needed according to the WLC. These results show that a very long familiarization period and a large number of orders are theoretically needed in VR until 95% of participants achieve picking times per item as fast as the mean picking times per item in group RR. However, note that in the experimental study presented in this thesis, only 64 orders were performed in VR. Values for the .90-, the .95- and the 1.00-quantile thus exceed the range of the observed values in the real data by far and must therefore be interpreted with care (Hahn 1977).

With regard to these results, the answer to RQ 2.3 is thus that giving participants as much time as possible to become familiar with the handling of the VR controllers appears recommendable when using VR for planning manual order picking. Yet, in a practical application, an extended time period for the individual familiarization of participants might either not be available or not be economically feasible. Hence, these results again highlight the need for advanced technology providing more intuitive physical interaction in VR.

For practical application, however, a familiarization period of at least 56 orders in VR can be recommended based on the data. This number is equal to the .75-quantile of the WLC, meaning that 75% of participants achieve t̄_{1,RR} after those 56 orders. The value estimated by the WLC has been chosen not only because the WLC was found to yield the best fit in sets 1–4 of group RR and the second best fit of group VR in terms of SER_i^L (see Table 6.14), but also because its estimated value for the .75-quantile is the highest among the three learning curve models and therefore yields the most conservative estimate.
7 Conclusion

7.1 Summary of Results and Answer to the Research Questions
The primary aim of this research has been to evaluate whether modern consumer-level HMDs can be used for planning and training in the context of manual order picking. To this end, three research questions have been raised and answered consecutively.

The first research question asked which manual activities within an order picking process can be simulated in VR using HMDs. By performing a systematic literature review and analysing the content of 61 relevant studies, it has been theorized that especially searching for items and the actual picking (i.e. manual interaction with objects in VR) are well suited for a simulation using VR HMDs. However, so far, simulating these activities in VR using an HMD has only been evaluated by a small number of studies. On the other hand, activities associated with picker travelling (especially walking) have already been covered by a considerable number of studies, which find that the limitations of HMDs lead to inconsistencies between walking in VR and walking in a real environment.

In order to add to the available body of literature, an experimental study was then designed to evaluate picking and searching activities in VR. A total of 112 participants subsequently performed an order picking task either in a real environment only or, additionally, in a similar VR simulation using an HTC Vive. The results of the study have been analysed in order to answer the second research question, asking if VR HMDs can be used for the planning of manual order picking. To do so, inferential statistics methods have been employed to test whether the measured performance differs between VR and the real environment, which would limit the usability of VR HMDs for planning purposes. First of all, the results show that
the individually perceived workload of participants does not differ significantly between the two environments. Also, no differences have been found in the number of erroneous orders and in the searching times. However, set completion times and picking times have been found to be significantly longer in VR than in the real environment. Similarly, the number of orders with dropped items has been found to be significantly larger in VR. The answer to the second research question is therefore that VR HMDs can indeed be used for planning purposes if reducing the perceived workload, the number of erroneous orders or the searching times is the objective of the planning process. In this context, it has further been found that the performance of pickers in VR can be used to predict their performance in the real environment. However, if picking times are the focus of a planning process, additional time requirements in VR need to be considered. It has also been found that this additional time requirement is partly due to participants being unfamiliar with the handheld controllers used for interaction in VR. Therefore, the time needed by users to become familiar with picking in VR has also been researched. The results indicate that after 56 orders performed in VR, 75% of participants achieve picking times equal to or below the estimated picking time of the first order of users in the real environment.

The third research question asks if VR HMDs can be used to train users in the field of manual order picking. To answer this question, it was first necessary to analyse which learning curve models perform best when fitted to the picking times per item and searching times obtained from the experimental study. The best results in terms of the SER and balance have been found for the Wright, the Stanford B and the De Jong learning curve models. For these learning curve models, the estimated coefficients have been compared between VR and the real environment. It has been found that, especially with a focus on improving searching times, training manual order picking in VR is effective (i.e. participants with no training are not faster than participants trained in VR) and at least as efficient as training in a real environment (i.e. participants trained in the real environment are not faster than participants trained in VR). Hence, VR can be used for the training of manual order picking. The key findings of this thesis are briefly summarized in Figure 7.1.
Figure 7.1 Summary of the key findings of this thesis
7.2 Implications for Research and Practice
This is the first study evaluating the usability of a contemporary, consumer-level HMD in manual order picking. Moreover, with its 112 participants, it is one of the largest studies on the use of VR HMDs in general. Due to the fundamental character of the research, the results provide important implications for research and practice alike.

In the context of scientific research, VR simulations are becoming more and more popular due to the advantages of their use in experimental studies that involve human users (Mol 2019). For manual order picking in particular, HMDs could facilitate human factors research, which in recent years has become one of the most important research fields in this context. Previous experimental studies on human factors either had to collect data from observing real order picking processes (e.g., Grosse and Glock 2013) or to erect artificial warehouses for the purpose of experimental investigations (e.g., Vries et al. 2016b). In both scenarios, a systematic control of environmental parameters is either not possible or difficult and costly to achieve. For example, an experimental investigation of the effect of different rack heights either requires an entire set of racks in different heights or automatic solutions for height adjustment (Könemann et al. 2015, p. 197). In VR though, adjusting the height of racks in a model can be done easily and at minimal additional cost. By having shown that the perceived workload and searching times do not differ between order picking in a VR environment and a real environment, the study at hand enables the future use of VR HMDs for all kinds of research in this area, making complicated real-world picking setups unnecessary.
Another potential application lies in the development and the scientific evaluation of different order picking technologies. Previous studies have already compared the advantages and disadvantages of pick-by-voice, pick-by-light or pick-by-vision technology in different order picking scenarios (e.g., Vries et al. 2016b; Guo et al. 2015). Similar studies analysing the workload and searching times associated with each technology can now be performed in VR, enabling a quick and easy adjustment of the order picking environments or of the configuration of the technical system itself.

Furthermore, the study has revealed some limitations of contemporary HMDs that provide implications for their use in future experimental research. First of all, the study has found that VR HMDs are most suitable for investigating stationary picking, as the simulation of naturalistic walking is difficult. Moreover, it has been found that for studies on the ergonomics of the actual picking of items, the HMDs' handheld controllers do not provide a sufficiently realistic form of physical interaction with items in VR. This is also the first study to quantify the resulting differences in picking times between a VR environment and a real environment.

From a general perspective, the results on picking and searching times also add to the wide body of literature on physical interaction with items and on the visual perception of human users in VR. The large sample size of the study in particular is assumed to ensure a high external validity of the results. Future research on VR HMDs can thus benefit from the findings. For example, the results of the study could lay the groundwork for future research on other activities that are similar to manual order picking, such as assembly tasks or the refilling of racks. Finally, it must be noted that the full dataset obtained from the experimental study has been published openly and is thus available for future research on manual order picking and on the usability of VR HMDs.

In business practice, the results enable the future use of VR HMDs for manifold applications. First of all, the results support the use of the technology early during the planning of a new warehouse, making an evaluation of the workload and searching times in different order picking setups possible before they are actually physically erected. This way, the technology can have a great impact on planning decisions in practice, especially if single racks in parts-to-picker systems are under investigation. Similarly, existing order picking setups can be analysed for potential improvements leading to a lower workload for the pickers and shorter searching times, without the need to actually implement them. This includes the selection and configuration of the order picking technology (pick-by-voice, pick-by-light, pick-by-vision etc.): VR HMDs can be used to identify the most suitable technology for an existing warehouse or an existing group of employees.
They can also be used to configure the technology according to the needs of individual employees without interfering with real operations, e.g., to find the ideal degree of information provision in a pick-by-vision system (for the effect of different degrees of information provision in pick-by-vision systems, please refer to Elbert and Sarnow 2019). As a result, consumer-level VR HMDs offer a high potential to significantly reduce planning costs, as different technologies and configurations can be thoroughly tested prior to the purchase of the actual hardware.

Moreover, VR HMDs can be used for the selection and training of new employees. As the technology enables a realistic assessment of real-world searching times, it makes it possible to evaluate the performance of applicants and to support the assignment of existing personnel to specific picking tasks in a way that best suits their individual characteristics. Also, the results provide evidence that VR HMDs can be used to train picker personnel in new picking environments, including picking environments that do not yet exist or are not accessible. This is relevant for newly employed pickers as well as for existing personnel that have to work in changing picking environments. This kind of training might be especially beneficial in picking scenarios in which fast searching times and a minimum number of errors are essential.
7.3 Limitations and Outlook on Future Research
Of course, this research has several limitations that need to be mentioned. First of all, the systematic literature review conducted in order to answer the first research question has only focused on research published in peer-reviewed journals. Even though this specification ensures a certain level of quality within the sample, the number of articles on the comparability of manual activities between VR and a real environment is certainly larger within conference proceedings and white papers due to the novelty of modern consumer-level HMDs. Future studies including these publications might thus also yield valuable insights into current streams of research and the potential of VR HMDs in the context of planning and training.

Second, with respect to the experimental study, it has to be noted that the participants were mainly recruited among students. Therefore, the majority of participants were under 30 years of age. Also, the number of male participants was much higher than the number of female participants. The under-representation of female participants has been identified as a general issue in VR research that could potentially limit the explanatory power of the results (Peck et al. 2020). This means that the sample might not be entirely representative of professional order pickers. However, it is assumed that the large size of the sample can at least to some degree counteract this limitation and ensure external validity. Moreover, statistical tests have found no difference in the measured times of participants below 30 years of age and participants aged 30 years or above. Yet, conducting the experiment with professional order pickers has yielded results that are significantly different from the results of the main sample. However, it must be kept in mind that the experimental conditions were significantly different for the sample of professional order pickers, limiting the comparability of results. For future research, it is thus highly advisable to repeat the study with a more heterogeneous sample or with a larger sample of professional order pickers. Repeating the study with a larger number of participants from different age groups would also enable the testing of additional hypotheses, e.g., analysing the influence of the pickers' age on the dependent variables in more detail. In the study at hand, only the results of participants below 30 years of age and participants aged 30 years and above are compared, due to the small number of participants in the latter group.

Third, the research and its explanatory power are limited by the choice of the apparatus. For the VR display, an HTC Vive was used throughout the entire study. This means that the results might only be applicable to this particular HMD. In fact, especially the results on picking times and dropped items are assumed to depend considerably on the input device that has been used for physical interaction in VR. With regard to the wide variety of input devices that are already available on the market or currently under development, picking performance could differ significantly with another input device. Nevertheless, the results on searching times are considered independent of the input device and depend only on the visual output of the HMD itself and the simulation. This means that the results on searching times are assumed to be consistent for every HMD with a comparable resolution and field of view. However, the study should be repeated in the future using different VR technology for the input and output. Especially with the development of advanced tools for physical interaction, repeating this study would be useful in order to evaluate the possibilities of these developments.

Fourth, another limitation lies in the fact that, for reasons of comparability, participants were only allowed to pick two items at once in the real setup, as it was only possible to pick one item at a time with each VR controller in the virtual setup. In reality though, pickers would pick as many items as they could securely carry with each pick. This restriction given to participants is thus caused by the limitations of the VR controllers, once again highlighting the effect of the interaction technology on the usability of VR simulations to evaluate human performance. In order to make the technology function better in practice, future research on VR should therefore especially focus on advanced and affordable techniques for physical interaction within VR.
Similarly, the missing weight of items in VR further limits the explanatory power of the study. Although it was ensured that the item weight in the real setup was almost negligible in order to create comparability and validity of the experimental study, picking items with a considerable weight might produce different results, especially when dealing with learning effects and the fatigue of participants. In general, the analysis of fatigue has not been covered by the thesis at hand. Future research could therefore also focus on finding ways to simulate haptic feedback and item weight in VR and analyse the effect of picking heavier items.

Fifth, the experimental setup has not been altered during the study. This means that this thesis is unable to provide an answer to the question of whether adjustments in the order picking setup lead to similar changes in the dependent variables in the virtual and in the real environment. However, a statement on the effect of design changes would also be a valuable contribution to the future use of VR during a planning process. Systematically altering the setup and comparing the results between VR and the real environment is therefore considered another promising field for future research.

Sixth, for the estimation of learning curve models, it has to be noted that the number of data points for each participant is relatively small, with only 64 orders in each block. Accordingly, the curve fitting might yield better results for some learning curve models if a larger set of data is used. The number of data points was the result of the limited duration of the experiment, but a longer duration of the experiment with a larger number of orders would have been unreasonably costly in terms of resource demand and workload for volunteer participants.

Finally, it must be pointed out that this thesis has covered neither the psychological effects of using a VR HMD nor the legal requirements for using the technology in practice. Especially when using VR HMDs for the evaluation of applicants or existing employees, legal factors must be considered. Also, potential users might be unwilling or feel anxious or uncomfortable using a VR HMD, which could lead to an increased level of stress. Although the data on the perceived workload does not indicate that participants in VR were exposed to higher levels of stress, the psychological aspects should be investigated in more detail. In this context, future research could use the Simulator Sickness Questionnaire (Kennedy et al. 1993), the Presence Questionnaire (Witmer et al. 2005) or the Immersive Experience Questionnaire (Jennett et al. 2008) in a similar setup. These are well-established questionnaires assessing the cybersickness, sense of presence and immersion perceived by users in VR that could provide additional insights into the capabilities and limitations of simulating order picking using VR HMDs.

In the end, it must not be forgotten that the giant VR simulation depicted in the movie The Matrix is actually used by intelligent machines to suppress humanity
and use people as energy-generating slaves. In reality though, VR technology offers a great potential to become the exact opposite: a technology that supports human activities in various fields of application and helps create better working environments for human employees. Even though VR technology is still far away from the possibilities depicted in the movie, this thesis has shown that VR is no longer just science fiction, but a technology that can indeed be used in practice. So one day, experiencing limitless VR simulations like in The Matrix might actually become reality. By creating the basis for the future use of VR in manual order picking, this thesis hopes to contribute to this goal and to open up many new possibilities for the application as well as the future development of this truly exciting technology.
Bibliography
Abbott, Harish and Udatta S. Palekar (2008). “Retail replenishment models with display-space elastic demand”. In: European Journal of Operational Research 186(2), pp. 586–607. doi: https://doi.org/10.1016/j.ejor.2006.12.067. Akpan, Ikpe Justice and Murali Shanker (2017). “The confirmed realities and myths about the benefits and costs of 3D visualization and virtual reality in discrete event modeling and simulation: A descriptive meta-analysis of evidence from research and practice”. In: Computers & Industrial Engineering 112, pp. 197–211. doi: https://doi.org/10.1016/j.cie.2017.08.020. Al-Benna, Sammy, Yazan Al-Ajam, Benjamin Way, and Lars Steinstraesser (2010). “Descriptive and inferential statistical methods used in burns research”. In: Burns 36(3), pp. 343–346. doi: https://doi.org/10.1016/j.burns.2009.04.030. Albert, Leo (2019). “Zwischen den Realitäten – Ableitung einer Definition des Begriffes Übertragbarkeit zwischen virtueller und realer Welt auf Basis einer systematischen Literaturrecherche”. Bachelor Thesis. Technische Universität Darmstadt, Fachgebiet Unternehmensführung und Logistik. Aldaba, Cassandra N. and Zahra Moussavi (2020). “Effects of virtual reality technology locomotive multi-sensory motion stimuli on a user simulator sickness and controller intuitiveness during a navigation task”. In: Medical and Biological Engineering and Computing 58(1), pp. 143–154. doi: https://doi.org/10.1007/s11517-019-02070-2. American Sociological Association (2018). Code of Ethics. url: https://www.asanet.org/sites/default/files/asa_code_of_ethics-june2018a.pdf (visited on April 8, 2020). Anastas, Jeane W. (2000). Research Design for Social Work and the Human Services. New York: Columbia University Press. doi: https://doi.org/10.7312/anas11890. ANSI, American National Standards Institute (1993). Guide to human performance measurements. Washington, DC: American Institute of Aeronautics. Anzanello, Michel Jose and Flavio Sanson Fogliatto (2011). “Learning curve models and applications: Literature review and research directions”. In: International Journal of Industrial Ergonomics 41(5), pp. 573–583. doi: https://doi.org/10.1016/j.ergon.2011.05.001. Aurich, J. C., D. Ostermayer, and C. H. Wagenknecht (2009). “Improvement of manufacturing processes with virtual reality-based CIP workshops”. In: International Journal of Production Research 47(19), pp. 5297–5309. doi: https://doi.org/10.1080/00207540701816569.
Avila, Lisa and Mike Bailey (2014). “Virtual reality for the masses”. In: IEEE Computer Graphics and Applications 34(5), pp. 103–104. doi: https://doi.org/10.1109/MCG.2014.103. Baños, R. M., C. Botella, M. Alcañiz, V. Liaño, B. Guerrero, and B. Rey (2004). “Immersion and emotion: Their impact on the sense of presence”. In: Cyberpsychology and Behavior 7(6), pp. 734–741. doi: https://doi.org/10.1089/cpb.2004.7.734. Battini, Daria, Martina Calzavara, Alessandro Persona, and Fabio Sgarbossa (2015). “A comparative analysis of different paperless picking systems”. In: Industrial Management & Data Systems 115(3), pp. 483–503. doi: https://doi.org/10.1108/IMDS-10-2014-0314. Battini, Daria, Christoph H. Glock, Eric H. Grosse, Alessandro Persona, and Fabio Sgarbossa (2016). “Human energy expenditure in order picking storage assignment: A bi-objective method”. In: Computers & Industrial Engineering 94, pp. 147–157. doi: https://doi.org/10.1016/j.cie.2016.01.020. Battini, Daria, Calzavara Martina, Persona Alessandro, Sgarbossa Fabio, Visentin Valentina, and Ilenia Zennaro (2018). “Integrating mocap system and immersive reality for efficient human-centred workstation design”. In: IFAC-PapersOnLine 51(11), pp. 188–193. doi: https://doi.org/10.1016/j.ifacol.2018.08.256. Berg, Leif P. and Judy M. Vance (2017). “Industry use of virtual reality in product design and manufacturing: a survey”. In: Virtual Reality 21(1), pp. 1–17. doi: https://doi.org/10.1007/s10055-016-0293-9. Berger, Paul D., Robert E. Maurer, and Giovana B. Celli (2018). Experimental Design. Second edition. Cham: Springer. Berger, Samuel M. and Timothy D. Ludwig (2007). “Reducing Warehouse Employee Errors Using Voice-Assisted Technology That Provided Immediate Feedback”. In: Journal of Organizational Behavior Management 27(1), pp. 1–31. doi: https://doi.org/10.1300/J075v27n01_01. Berni, Aurora and Yuri Borgianni (2020). “Applications of Virtual Reality in Engineering and Product Design: Why, What, How, When and Where”. In: Electronics 9(7), p. 1064. doi: https://doi.org/10.3390/electronics9071064. Bertram, Johanna, Johannes Moskaliuk, and Ulrike Cress (2015). “Virtual training: Making reality work?” In: Computers in Human Behavior 43, pp. 284–292. doi: https://doi.org/10.1016/j.chb.2014.10.032. Bhargava, Ayush, Kathryn M. Lucaites, Leah S. Hartman, Hannah Solini, Jeffrey W. Bertrand, Andrew C. Robb, Christopher C. Pagano, and Sabarish V. Babu (2020). “Revisiting affordance perception in contemporary virtual reality”. In: Virtual Reality 24, pp. 713–724. doi: https://doi.org/10.1007/s10055-020-00432-y. Bishu, R. R., B. Donohue, and P. Murphy (1992). “Cognitive ergonomics of a mail order filling company: Part 2 – influence of shelf coding and address information on acquisition time”. In: Applied Ergonomics 23(2), pp. 115–120. doi: https://doi.org/10.1016/0003-6870(92)90083-8. Bishu, R. R., B. Donohue, and P. Murphy (1991). “Cognitive ergonomics of a mail order filling company: Part 1 – Influence of colour, position and highlighting on recognition time”. In: Applied Ergonomics 22(6), pp. 367–372. doi: https://doi.org/10.1016/0003-6870(91)90077-U. Bliss, James P., Alexandra B. Proaps, and Eric T. Chancey (2015). “Human Performance Measurement in Virtual Environments”. In: Handbook of Human Factors and Ergonomics.
Ed. by Kelly S. Hale and Kay M. Stanney. Second edition. Boca Raton, FL: CRC Press. Chap. 29, pp. 749–780. Borrego, Adrián, Jorge Latorre, Mariano Alcañiz, and Roberto Llorens (2018). “Comparison of Oculus Rift and HTC Vive: Feasibility for Virtual Reality-Based Exploration, Navigation, Exergaming, and Rehabilitation”. In: Games for Health Journal 7(3), pp. 151–156. doi: https://doi.org/10.1089/g4h.2017.0114. Bowman, Doug A. and Ryan P. McMahan (2007). “Virtual Reality: How Much Immersion Is Enough?” In: Computer 40(7), pp. 36–43. doi: https://doi.org/10.1109/MC.2007.257. Boysen, Nils, Dirk Briskorn, and Simon Emde (2017). “Parts-to-picker based order processing in a rack-moving mobile robots environment”. In: European Journal of Operational Research 262(2), pp. 550–562. doi: https://doi.org/10.1016/j.ejor.2017.03.053. Brosius, Hans-Bernd, Alexander Haas, and Friederike Koschel (2016). Methoden der empirischen Kommunikationsforschung. 7., überarbeitete und aktualisierte Auflage. Wiesbaden: VS Verlag für Sozialwissenschaften. doi: https://doi.org/10.1007/978-3-531-19996-2. Brough, John E., Maxim Schwartz, Satyandra K. Gupta, Davinder K. Anand, Robert Kavetsky, and Ralph Pettersen (2007). “Towards the development of a virtual environment-based training system for mechanical assembly operations”. In: Virtual Reality 11(4), pp. 189–206. doi: https://doi.org/10.1007/s10055-007-0076-4. Bryman, Alan (2012). Social Research Methods. Fourth Edition. Oxford: Oxford University Press. Brynzér, H. and M.I. Johansson (1995). “Design and performance of kitting and order picking systems”. In: International Journal of Production Economics 41(1–3), pp. 115–125. doi: https://doi.org/10.1016/0925-5273(95)00083-6. Buckley, Christina E., Dara O. Kavanagh, Oscar Traynor, and Paul C. Neary (2014). “Is the skillset obtained in surgical simulation transferable to the operating theatre?” In: American Journal of Surgery 207(1), pp. 146–157. doi: https://doi.org/10.1016/j.amjsurg.2013.06.017. Budziszewski, Pawel, Andrzej Grabowski, Marcin Milanowicz, Jaroslaw Jankowski, and Marek Dzwiarek (2011). “Designing a workplace for workers with motion disability with computer simulation and virtual reality techniques”. In: International Journal on Disability and Human Development 10(4), pp. 355–358. doi: https://doi.org/10.1515/IJDHD.2011.054. Burdea, G.C. (2000). “Haptics issues in virtual environments”. In: Proceedings Computer Graphics International 2000. Geneva, Switzerland: IEEE Comput. Soc, pp. 295–302. doi: https://doi.org/10.1109/CGI.2000.852345. Bustamante, Ernesto A. and Randall D. Spain (2008). “Measurement Invariance of the Nasa TLX”. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting 52(19), pp. 1522–1526. doi: https://doi.org/10.1177/154193120805201946. Bystrom, Karl-Erik, Woodrow Barfield, and Claudia Hendrix (1999). “A Conceptual Model of the Sense of Presence in Virtual Environments”. In: Presence: Teleoperators and Virtual Environments 8(2), pp. 241–244. doi: https://doi.org/10.1162/105474699566107. Calzavara, Martina, Christoph H. Glock, Eric H. Grosse, Alessandro Persona, and Fabio Sgarbossa (2017a). “Analysis of economic and ergonomic performance measures of different rack layouts in an order picking warehouse”. In: Computers & Industrial Engineering 111, pp. 527–536. doi: https://doi.org/10.1016/j.cie.2016.07.001.
Calzavara, Martina, Robin Hanson, Fabio Sgarbossa, Lars Medbo, and Mats I. Johansson (2017b). “Picking from pallet and picking from boxes: a time and ergonomic study”. In: IFAC-PapersOnLine 50(1), pp. 6888–6893. doi: https://doi.org/10.1016/j.ifacol.2017.08.1212. Calzavara, Martina, Fabio Sgarbossa, and Alessandro Persona (2019). “Vertical Lift Modules for small items order picking: an economic evaluation”. In: International Journal of Production Economics 210, pp. 199–210. doi: https://doi.org/10.1016/j.ijpe.2019.01.012. Campbell, Donald T. (1957). “Factors relevant to the validity of experiments in social settings”. In: Psychological Bulletin 54(4), pp. 297–312. doi: https://doi.org/10.1037/h0040950. Cardoso, Jorge C.S. and André Perrotta (2019). “A survey of real locomotion techniques for immersive virtual reality applications on head-mounted displays”. In: Computers & Graphics 85, pp. 55–73. doi: https://doi.org/10.1016/j.cag.2019.09.005. Carlson, J.G. (1973). “Cubic learning curve: precision tool for labor estimating”. In: Manufacturing Engineering and Management 71(5), pp. 22–25. Cecil, J. and A. Kanchanapiboon (2007). “Virtual engineering approaches in product and process design”. In: International Journal of Advanced Manufacturing Technology 31(9–10), pp. 846–856. doi: https://doi.org/10.1007/s00170-005-0267-7. Chakravorty, Satya S. (2009). “Improving distribution operations: Implementation of material handling systems”. In: International Journal of Production Economics 122(1), pp. 89–106. doi: https://doi.org/10.1016/j.ijpe.2008.12.026. Chang, Chen-Wei, Shih-Ching Yeh, Mengtong Li, and Eason Yao (2019). “The Introduction of a Novel Virtual Reality Training System for Gynecology Learning and Its User Experience Research”. In: IEEE Access 7, pp. 43637–43653. doi: https://doi.org/10.1109/ACCESS.2019.2905143. Chapman, Karen and Paul Brothers (2006). “Database Coverage for Research in Management Information Systems”. In: College & Research Libraries 67(1), pp. 50–62. doi: https://doi.org/10.5860/crl.67.1.50. Charness, Gary, Uri Gneezy, and Michael A. Kuhn (2012). “Experimental methods: Between-subject and within-subject design”. In: Journal of Economic Behavior & Organization 81(1), pp. 1–8. doi: https://doi.org/10.1016/j.jebo.2011.08.009. Chen, Fangyu, Hongwei Wang, Chao Qi, and Yong Xie (2013). “An ant colony optimization routing algorithm for two order pickers with congestion consideration”. In: Computers & Industrial Engineering 66(1), pp. 77–85. doi: https://doi.org/10.1016/j.cie.2013.06.013. Choi, Sangsu, Kiwook Jung, and Sang Do Noh (2015). “Virtual reality applications in manufacturing industries: Past research, present findings, and future directions”. In: Concurrent Engineering Research and Applications 23(1), pp. 40–63. doi: https://doi.org/10.1177/1063293X14568814. Chuang, Yi-Fei, Hsu-Tung Lee, and Yi-Chuan Lai (2012). “Item-associated cluster assignment model on storage allocation problems”. In: Computers & Industrial Engineering 63(4), pp. 1171–1177. doi: https://doi.org/10.1016/j.cie.2012.06.021. Claeys, Dieter, Ivo Adan, and Onno Boxma (2016). “Stochastic bounds for order flow times in parts-to-picker warehouses with remotely located order-picking workstations”. In: European Journal of Operational Research 254(3), pp. 895–906. doi: https://doi.org/10.1016/j.ejor.2016.04.050. Cleave, Blair L., Nikos Nikiforakis, and Robert Slonim (2013). “Is there selection bias in laboratory experiments? The case of social and risk preferences”.
In: Experimental Economics 16(3), pp. 372–382. doi: https://doi.org/10.1007/s10683-012-9342-8.
Coburn, Joshua Q., Ian Freeman, and John L. Salmon (2017). “A Review of the Capabilities of Current Low-Cost Virtual Reality Technology and Its Potential to Enhance the Design Process”. In: Journal of Computing and Information Science in Engineering 17(3), p. 031013. doi: https://doi.org/10.1115/1.4036921. Cohen, Jacob (1988). Statistical Power Analysis for the Behavioral Sciences. Second edition. Hillsdale, NJ: Lawrence Erlbaum. Cohen, Jacob (1992). “Statistical Power Analysis”. In: Current Directions in Psychological Science 1(3), pp. 98–101. doi: https://doi.org/10.1111/1467-8721.ep10768783. Coleman, David E. and Douglas C. Montgomery (1993). “A Systematic Approach to Planning for a Designed Industrial Experiment”. In: Technometrics 35(1), pp. 1–12. doi: https://doi.org/10.2307/1269280. Cook, Thomas D. and Donald T. Campbell (1979). Quasi-experimentation: Design & Analysis Issues for Field Settings. Boston: Houghton Mifflin. Cook, Thomas D. and Donald T. Campbell (1986). “The Causal Assumptions of Quasi-Experimental Practice”. In: Synthese 68, pp. 141–180. Cowan, Kirsten and Seth Ketron (2019). “Prioritizing marketing research in virtual reality: development of an immersion/fantasy typology”. In: European Journal of Marketing 53(8), pp. 1585–1611. doi: https://doi.org/10.1108/EJM-10-2017-0733. Creagh, H. (2003). “Cave Automatic Virtual Environment”. In: Proceedings: Electrical Insulation Conference and Electrical Manufacturing and Coil Winding Technology Conference (Cat. No.03CH37480). Indianapolis, Indiana, USA: IEEE, pp. 499–504. doi: https://doi.org/10.1109/EICEMC.2003.1247937. Creem-Regehr, Sarah H., Jeanine K. Stefanucci, and William B. Thompson (2015). “Perceiving absolute scale in virtual environments: How theory and application have mutually informed the role of body-based perception”. In: Psychology of Learning and Motivation – Advances in Research and Theory. Vol. 62. Elsevier Ltd, pp. 195–224. doi: https://doi.org/10.1016/bs.plm.2014.09.006. Croson, Rachel (2002). “Why and how to experiment: Methodologies from experimental economics”. In: University of Illinois Law Review 921, pp. 921–945. Croson, Rachel (2005). “The Method of Experimental Economics”. In: International Negotiation 10(1), pp. 131–148. doi: https://doi.org/10.1163/1571806054741100. Cruz-Neira, Carolina, Daniel J. Sandin, Thomas A. DeFanti, Robert V. Kenyon, and John C. Hart (1992). “The CAVE: audio visual experience automatic virtual environment”. In: Communications of the ACM 35(6), pp. 64–72. doi: https://doi.org/10.1145/129888.129892. Curran-Everett, Douglas and Dale J. Benos (2004). “Guidelines for reporting statistics in journals published by the American Physiological Society”. In: American Journal of Physiology-Cell Physiology 287(2), pp. C243–C245. doi: https://doi.org/10.1152/ajpcell.00250.2004. Curry, Christopher, Ruixuan Li, Nicolette Peterson, and Thomas A. Stoffregen (2020). “Cybersickness in Virtual Reality Head-Mounted Displays: Examining the Influence of Sex Differences and Vehicle Control”. In: International Journal of Human–Computer Interaction 36(12), pp. 1161–1167. doi: https://doi.org/10.1080/10447318.2020.1726108. D’Andrea, Raffaello (2012). “Guest editorial: A revolution in the warehouse: A retrospective on Kiva Systems and the grand challenges ahead”. In: IEEE Transactions on Automation Science and Engineering 9(4), pp. 638–639. doi: https://doi.org/10.1109/TASE.2012.2214676.
Dar-El, E. M., K. Ayas, and I. Gilad (1995a). “A dual-phase model for the individual learning process in industrial tasks”. In: IIE Transactions 27(3), pp. 265–271. doi: https://doi.org/ 10.1080/07408179508936740. Dar-El, E. M., K. Ayas, and I. Gilad (1995b). “Predicting performance times for long cycle time tasks”. In: IIE Transactions (Institute of Industrial Engineers) 27(3), pp. 272–281. doi: https://doi.org/10.1080/07408179508936741. David, Dominik (2019). “Die Übertragbarkeit von Kennzahlen zwischen Virtueller Realität und realer Anwendung – Eine systematische Literaturanalyse”. Master Thesis. Technische Universität Darmstadt, Fachgebiet Unternehmensführung und Logistik. Davis, Douglas D. and Charles A. Holt (1993). Experimental economics. Princeton, New Jersey: Princeton University Press. De Paolis, Lucio Tommaso and Valerio De Luca (2020). “The impact of the input interface in a virtual environment: the Vive controller and the Myo armband”. In: Virtual Reality 24(3), pp. 483–502. doi: https://doi.org/10.1007/s10055-019-00409-6. Deck, Cary and Vernon Smith (2013). “Using laboratory experiments in logistics and supply chain research”. In: Journal of Business Logistics 34(1), pp. 6–14. doi: https://doi.org/10. 1111/jbl.12006. Dede, Chris (2009). “Immersive interfaces for engagement and learning”. In: Science 323(5910), pp. 66–69. doi: https://doi.org/10.1126/science.1167311. Derrick, B. and P. White (2016). “Why Welch’s test is Type I error robust”. In: The Quantitative Methods for Psychology 12(1), pp. 30–38. doi: https://doi.org/10.20982/tqmp.12.1.p030. Divine, George W., H. James Norton, Anna E. Barón, and Elizabeth Juarez-Colunga (2018). “The Wilcoxon–Mann–Whitney Procedure Fails as a Test of Medians”. In: American Statistician 72(3), pp. 278–286. doi: https://doi.org/10.1080/00031305.2017.1305291. DiZio, Paul and James R. Lackner (1992). “Spatial Orientation, Adaptation, and Motion Sickness in Real and Virtual Environments”. In: Presence: Teleoperators and Virtual Environments 1(3), pp. 319–328. doi: https://doi.org/10.1162/pres.1992.1.3.319. Döring, Nicola and Jürgen Bortz (2016). Forschungsmethoden und Evaluation in den Sozial- und Humanwissenschaften. 5. vollständig überarbeitete, aktualisierte und erweiterte Auflage. Springer-Lehrbuch. Berlin, Heidelberg: Springer Berlin Heidelberg. doi: https:// doi.org/10.1007/978-3-642-41089-5. Eastgate, Richard M., John R. Wilson, and Mirabelle D’Cruz (2015). “Structured Development of Virtual Environments”. In: Handbook of Virtual Environments. Ed. by Kelly S. Hale and Kay M. Stanney. Second edition. Boca Raton, FL: CRC Press, pp. 353–389. Eckel, Catherine C. and Philip J. Grossman (2000). “Volunteers and pseudo-volunteers: The effect of recruitment method in dictator experiments”. In: Experimental Economics 3(2), pp. 107–120. doi: https://doi.org/10.1007/bf01669303. Eiriksdottir, Elsa and Richard Catrambone (2011). “Procedural instructions, principles, and examples: How to structure instructions for procedural tasks to enhance performance, learning, and transfer”. In: Human Factors 53(6), pp. 749–770. doi: https://doi.org/10.1177/ 0018720811419154. El Beheiry, Mohamed, Sébastien Doutreligne, Clément Caporal, Cécilia Ostertag, Maxime Dahan, and Jean-Baptiste Masson (2019). “Virtual Reality: Beyond Visualization”. In: Journal of Molecular Biology 431(7), pp. 1315–1321. doi: https://doi.org/10.1016/j.jmb. 2019.01.033.
Elbert, Ralf, Jan-Karl Knigge, Rami Makhlouf, and Tessa Sarnow (2019). “Experimental study on user rating of virtual reality applications in manual order picking”. In: IFAC-PapersOnLine 52(13), pp. 719–724. doi: https://doi.org/10.1016/j.ifacol.2019.11.200. Elbert, Ralf, Jan-Karl Knigge, and Tessa Sarnow (2018). “Transferability of order picking performance and training effects achieved in a virtual reality using head mounted devices”. In: IFAC-PapersOnLine 51(11), pp. 686–691. doi: https://doi.org/10.1016/j.ifacol.2018.08.398. Elbert, Ralf and Tessa Sarnow (2019). “Augmented Reality in Order Picking – Boon and Bane of Information (Over-) Availability”. In: Advances in Intelligent Systems. Vol. 903, pp. 400–406. doi: https://doi.org/10.1007/978-3-030-11051-2_61. Elbert, Ralf, Tessa Sarnow, and Jan-Karl Knigge (2017). “Virtual Reality in Logistics – Opportunities and Limitations of Planning & Training in Logistics”. In: Proceedings of the 9th International Scientific Symposium on Logistics 2018: Understanding Future Logistics – Models, Applications, Insights. Magdeburg, Germany, pp. 150–173. Elliott, Alan and Wayne Woodward (2007). Statistical Analysis Quick Reference Guidebook. Thousand Oaks, California, USA: SAGE Publications, Inc. doi: https://doi.org/10.4135/9781412985949. Epple, Dennis, Linda Argote, and Rukmini Devadas (1991). “Organizational Learning Curves: A Method for Investigating Intra-Plant Transfer of Knowledge Acquired Through Learning by Doing”. In: Organization Science 2(1), pp. 58–70. doi: https://doi.org/10.1287/orsc.2.1.58. Ernst, Michael D. (2004). “Permutation methods: A basis for exact inference”. In: Statistical Science 19(4), pp. 676–685. doi: https://doi.org/10.1214/088342304000000396. Falk, Armin and Christian Zehnder (2011). “Did We Overestimate the Role of Social Preferences? The Case of Self-Selected Student Samples”. In: IZA Discussion Paper Series No. 5475. url: https://www.iza.org/publications/dp/5475/did-we-overestimate-the-roleofsocial-preferences-the-case-of-self-selected-student-samples (visited on August 30, 2020). Farrugia, Patricia, Bradley A. Petrisor, Forough Farrokhyar, and Mohit Bhandari (2010). “Practical tips for surgical research: Research questions, hypotheses and objectives.” In: Canadian journal of surgery. Journal canadien de chirurgie 53(4), pp. 278–281. Figueiredo, Lucas, Eduardo Rodrigues, João Teixeira, and Veronica Teichrieb (2018). “A comparative evaluation of direct hand and wand interactions on consumer devices”. In: Computers & Graphics 77, pp. 108–121. doi: https://doi.org/10.1016/j.cag.2018.10.006. Finnsgård, Christian and Carl Wänström (2013). “Factors impacting manual picking on assembly lines: An experiment in the automotive industry”. In: International Journal of Production Research 51(6), pp. 1789–1798. doi: https://doi.org/10.1080/00207543.2012.712729. Flavián, Carlos, Sergio Ibáñez-Sánchez, and Carlos Orús (2019). “The impact of virtual, augmented and mixed reality technologies on the customer experience”. In: Journal of Business Research 100, pp. 547–560. doi: https://doi.org/10.1016/j.jbusres.2018.10.050. Fogliatto, Flavio Sanson and Michel Jose Anzanello (2011). “Learning curves: the state of the art and research directions”. In: Learning Curves: Theory, Models, and Applications. Ed. by Mohamad Y. Jaber. Boca Raton, FL: CRC Press, pp. 3–22. Franzke, Torsten, Eric H. Grosse, Christoph H. Glock, and Ralf Elbert (2017). “An investigation of the effects of storage assignment and picker routing on the occurrence of picker
blocking in manual picker-to-parts warehouses”. In: The International Journal of Logistics Management 28(3), pp. 841–863. doi: https://doi.org/10.1108/IJLM-04-2016-0095. Fülbier, Rolf Uwe (2004). “Wissenschaftstheorie und Betriebswirtschaftslehre”. In: WiSt – Wirtschaftswissenschaftliches Studium 33(5), pp. 4–6. Ganier, Franck, Charlotte Hoareau, and Jacques Tisseau (2014). “Evaluation of procedural learning transfer from a virtual environment to a real situation: a case study on tank maintenance training”. In: Ergonomics 57(6), pp. 828–843. doi: https://doi.org/10.1080/ 00140139.2014.899628. Gastwirth, Joseph L., Yulia R. Gel, and Weiwen Miao (2009). “The Impact of Levene’s Test of Equality of Variances on Statistical Theory and Practice”. In: Statistical Science 24(3), pp. 343–360. doi: https://doi.org/10.1214/09-STS301. Gavish, Nirit, Teresa Gutiérrez, Sabine Webel, Jorge Rodríguez, Matteo Peveri, Uli Bockholt, and Franco Tecchia (2015). “Evaluating virtual reality and augmented reality training for industrial maintenance and assembly tasks”. In: Interactive Learning Environments 23(6), pp. 778–798. doi: https://doi.org/10.1080/10494820.2013.815221. Geismar, H. Neil, Milind Dawande, B. P.S. Murthi, and Chelliah Sriskandarajah (2015). “Maximizing Revenue Through Two-Dimensional Shelf-Space Allocation”. In: Production and Operations Management 24(7), pp. 1148–1163. https://doi.org/10.1111/poms.12316. Ghasemi, Asghar and Saleh Zahediasl (2012). “Normality tests for statistical analysis: A guide for non-statisticians”. In: International Journal of Endocrinology and Metabolism 10(2), pp. 486–489. doi: https://doi.org/10.5812/ijem.3505. Gils, Teun van, Katrien Ramaekers, An Caris, and Mario Cools (2017). “The use of time series forecasting in zone order picking systems to predict order pickers’ workload”. In: International Journal of Production Research 55(21), pp. 6380–6393. doi: https://doi.org/ 10.1080/00207543.2016.1216659. Gils, Teun van, Katrien Ramaekers, An Caris, and René B.M. de Koster (2018). “Designing efficient order picking systems by combining planning problems: State-of-the-art classification and review”. In: European Journal of Operational Research 267(1), pp. 1–15. doi: https://doi.org/10.1016/j.ejor.2017.09.002. Glock, C.H., E.H. Grosse, T. Kim, W.P. Neumann, and A. Sobhani (2019a). “An integrated cost and worker fatigue evaluation model of a packaging process”. In: International Journal of Production Economics 207, pp. 107–124. doi: https://doi.org/10.1016/j.ijpe.2018.09.022. Glock, Christoph H. (2012). “Single sourcing versus dual sourcing under conditions of learning”. In: Computers & Industrial Engineering 62(1), pp. 318–328. doi: https://doi.org/10. 1016/j.cie.2011.10.002. Glock, Christoph H., Eric H. Grosse, Mohamad Y. Jaber, and Timothy L. Smunt (2019b). “Applications of learning curves in production and operations management: A systematic literature review”. In: Computers & Industrial Engineering 131, pp. 422–441. doi: https:// doi.org/10.1016/j.cie.2018.10.030. Glock, Christoph H. and Mohamad Y. Jaber (2014). “A group learning curve model with and without worker turnover”. In: Journal of Modelling in Management 9(2), pp. 179–199. doi: https://doi.org/10.1108/JM2-05-2013-0018. Glock, Christoph H., Mohamad Y. Jaber, and Saeed Zolfaghari (2012). “Production planning for a ramp-up process with learning in production and growth in demand”. In: International Journal of Production Research 50(20), pp. 5707–5718. doi: https://doi.org/10. 1080/00207543.2011.616549.
Gong, Yeming and René B. M. de Koster (2011). “A review on stochastic models and analysis of warehouse operations”. In: Logistics Research 3(4), pp. 191–205. doi: https://doi.org/10.1007/s12159-011-0057-6. Goodman, Steven N. and Jesse A. Berlin (1994). “The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results”. In: Annals of Internal Medicine 121(3), pp. 200–206. doi: https://doi.org/10.7326/0003-4819-121-3-199408010-00008. Görge, Dominik (2020). “Systematische Literaturrecherche zu den Einsatzmöglichkeiten moderner VR-Brillen in der manuellen Kommissionierung”. Master Thesis. Technische Universität Darmstadt, Fachgebiet Unternehmensführung und Logistik. Grajewski, Damian, Filip Górski, Przemysław Zawadzki, and Adam Hamrol (2013). “Application of virtual reality techniques in design of ergonomic manufacturing workplaces”. In: Procedia Computer Science 25, pp. 289–301. doi: https://doi.org/10.1016/j.procs.2013.11.035. Grau, Lukas (2020). “Kritische Analyse eines bestehenden Versuchsaufbaus auf Basis wissenschaftlicher Grundlagen zum Design experimenteller Studien”. Studienarbeit. Technische Universität Darmstadt, Fachgebiet Unternehmensführung und Logistik. Grechkin, Timofey Y., Tien Dat Nguyen, Jodie M. Plumert, James F. Cremer, and Joseph K. Kearney (2010). “How does presentation method and measurement protocol affect distance estimation in real and virtual environments?” In: ACM Transactions on Applied Perception 7(4). doi: https://doi.org/10.1145/1823738.1823744. Green, Paul and Lisa Wei-Haas (1985). “The Rapid Development of User Interfaces: Experience with the Wizard of OZ Method”. In: Proceedings of the Human Factors Society Annual Meeting 29(5), pp. 470–474. doi: https://doi.org/10.1177/154193128502900515. Grier, Rebecca A. (2015). “How High Is High? A Meta-Analysis of NASA-TLX Global Workload Scores”. In: 59th Annual Meeting of the Human Factors and Ergonomics Society, pp. 1727–1731. doi: https://doi.org/10.1177/1541931215591373. Grosse, Eric H. and Christoph H. Glock (2013). “An experimental investigation of learning effects in order picking systems”. In: Journal of Manufacturing Technology Management 24(6), pp. 850–872. doi: https://doi.org/10.1108/JMTM-03-2012-0036. Grosse, Eric H. and Christoph H. Glock (2015). “The effect of worker learning on manual order picking processes”. In: International Journal of Production Economics 170, pp. 882–890. doi: https://doi.org/10.1016/j.ijpe.2014.12.018. Grosse, Eric H., Christoph H. Glock, and Mohamad Y. Jaber (2013). “The effect of worker learning and forgetting on storage reassignment decisions in order picking systems”. In: Computers & Industrial Engineering 66(4), pp. 653–662. doi: https://doi.org/10.1016/j.cie.2013.09.013. Grosse, Eric H., Christoph H. Glock, Mohamad Y. Jaber, and Patrick W. Neumann (2015a). “Incorporating human factors in order picking planning models: framework and research opportunities”. In: International Journal of Production Research 53(3), pp. 695–717. doi: https://doi.org/10.1080/00207543.2014.919424. Grosse, Eric H., Christoph H. Glock, and Sebastian Müller (2015b). “Production economics and the learning curve: A meta-analysis”. In: International Journal of Production Economics 170, pp. 401–412. doi: https://doi.org/10.1016/j.ijpe.2015.06.021.
Grosse, Eric H., Christoph H. Glock, and W. Patrick Neumann (2015c). “Human Factors in Order Picking System Design: A Content Analysis”. In: IFAC-PapersOnLine 48(3), pp. 320–325. doi: https://doi.org/10.1016/j.ifacol.2015.06.101. Grosse, Eric H., Christoph H. Glock, and W. Patrick Neumann (2017). “Human factors in order picking: a content analysis of the literature”. In: International Journal of Production Research 55(5), pp. 1260–1276. doi: https://doi.org/10.1080/00207543.2016.1186296. Grudzewski, Filip, Marcin Awdziej, Grzegorz Mazurek, and Katarzyna Piotrowska (2018). “Virtual reality in marketing communication – the impact on the message, technology and offer perception – empirical study”. In: Economics and Business Review 4(18)(3), pp. 36–50. doi: https://doi.org/10.18559/ebr.2018.3.4. Gu, Jinxiang, Marc Goetschalckx, and Leon F. McGinnis (2007). “Research on warehouse operation: A comprehensive review”. In: European Journal of Operational Research 177(1), pp. 1–21. doi: https://doi.org/10.1016/j.ejor.2006.02.025. Gu, Jinxiang, Marc Goetschalckx, and Leon F. McGinnis (2010). “Research on warehouse design and performance evaluation: A comprehensive review”. In: European Journal of Operational Research 203(3), pp. 539–549. doi: https://doi.org/10.1016/j.ejor.2009.07.031. Guizzo, Erico (2008). “Three engineers, hundreds of robots, one warehouse”. In: IEEE Spectrum 45(7), pp. 26–34. doi: https://doi.org/10.1109/MSPEC.2008.4547508. Gunasekaran, Angappa and Bulent Kobu (2007). “Performance measures and metrics in logistics and supply chain management: A review of recent literature (1995–2004) for research and applications”. In: International Journal of Production Research 45(12), pp. 2819–2840. doi: https://doi.org/10.1080/00207540600806513. Gunawan, Indra (2009). “Implementation of lean manufacturing through learning curve modelling for labour forecast”. In: International Journal of Mechanical and Mechatronics Engineering 9(10), pp. 46–52. Guo, Anhong, Xiaolong Wu, Zhengyang Shen, Thad Starner, Hannes Baumann, and Scott Gilliland (2015). “Order Picking with Head-Up Displays”. In: Computer 48(6), pp. 16–24. doi: https://doi.org/10.1109/MC.2015.166. Guo, Xiaolong, Yugang Yu, and René B.M. De Koster (2016). “Impact of required storage space on storage policy performance in a unit-load warehouse”. In: International Journal of Production Research 54(8), pp. 2405–2418. doi: https://doi.org/10.1080/00207543.2015.1083624. Habazin, Josip, Antonia Glasnović, and Ivona Bajor (2017). “Order Picking Process in Warehouse: Case Study of Dairy Industry in Croatia”. In: Promet – Traffic&Transportation 29(1), pp. 57–65. doi: https://doi.org/10.7307/ptt.v29i1.2106. Hahn, Gerald J. (1977). “The Hazards of Extrapolation in Regression Analysis”. In: Journal of Quality Technology 9(4), pp. 159–165. doi: https://doi.org/10.1080/00224065.1977.11980791. Hart, A. (2001). “Mann-Whitney test is not just a test of medians: differences in spread can be important”. In: BMJ 323, pp. 391–393. doi: https://doi.org/10.1136/bmj.323.7309.391. Hart, Sandra G. (2006). “Nasa-Task Load Index (NASA-TLX); 20 Years Later”. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting 50(9), pp. 904–908. doi: https://doi.org/10.1177/154193120605000909. Hart, Sandra G. and Lowell E. Staveland (1988). “Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research”. In: Human Mental Workload. Ed.
Bibliography
205
by Peter A. Hancock and Najmedin Meshkati. Amsterdam: North Holland Press, pp. 139– 183. doi: https://doi.org/10.1016/S0166-4115(08)62386-9. Hejtmanek, Lukas, Michael Starrett, Emilio Ferrer, and Arne D. Ekstrom (2020). “How Much of What We Learn in Virtual Reality Transfers to Real-World Navigation?” In: Multisensory Research 33(4–5), pp. 479–503. doi: https://doi.org/10.1163/22134808-20201445. Helfrich, Hede (2016). Wissenschaftstheorie für Betriebswirtschaftler. Wiesbaden: Springer Fachmedien Wiesbaden. doi: https://doi.org/10.1007/978-3-658-07036-6. Hertwig, Ralph and Andreas Ortmann (2001). “Experimental practices in economics: A methodological challenge for psychologists?” In: Behavioral and Brain Sciences 24(3), pp. 383–403. doi: https://doi.org/10.1017/S0140525X01004149. Herzog, Michael H., Gregory Francis, and Aaron Clarke (2019). Understanding Statistics and Experimental Design. Learning Materials in Biosciences. Cham: Springer International Publishing. doi: https://doi.org/10.1007/978-3-030-03499-3. Hinze, Jimmie and Svetlana Olbina (2009). “Empirical analysis of the learning curve principle in prestressed concrete piles”. In: Journal of Construction Engineering and Management 135(5), pp. 425–431. doi: https://doi.org/10.1061/(ASCE)CO.1943-7862.0000004. Ho, Nicholas, Pooi-MunWong, Matthew Chua, and Chee-Kong Chui (2018). “Virtual reality training for assembly of hybrid medical devices”. In: Multimedia Tools and Applications 77, pp. 30651–30682. doi: https://doi.org/10.1007/s11042-018-6216-x. Hochrein, Simon, Christoph H. Glock, Ronald Bogaschewsky, and Matthias Heider (2015). “Literature reviews in supply chain management: a tertiary study”. In: Management Review Quarterly 65(4), pp. 239–280. doi: https://doi.org/10.1007/s11301-015-0113-4. Hogle, Nancy J., William M. Briggs, and Dennis L. Fowler (2007). “Documenting a Learning Curve and Test-Retest Reliability of Two Tasks on a Virtual Reality Training Simulator in Laparoscopic Surgery”. In: Journal of Surgical Education 64(6), pp. 424–430. doi: https:// doi.org/10.1016/j.jsurg.2007.08.007. Howard, Matt C. and Melissa B. Gutworth (2020). “A meta-analysis of virtual reality training programs for social skill development”. In: Computers & Education 144, p. 103707. doi: https://doi.org/10.1016/j.compedu.2019.103707. Huang, Yifan, Haiyan Xu, Violeta Calian, and Jason C. Hsu (2006). “To permute or not to permute”. In: Bioinformatics 22(18), pp. 2244–2248. doi: https://doi.org/10.1093/ bioinformatics/btl383. Jaber, Mohamad Y. and Christoph H. Glock (2013). “A learning curve for tasks with cognitive and motor elements”. In: Computers & Industrial Engineering 64(3), pp. 866–871. doi: https://doi.org/10.1016/j.cie.2012.12.005. Jaber, M.Y., Z.S. Givi, and W.P. Neumann (2013). “Incorporating human fatigue and recovery into the learning-forgetting process”. In: Applied Mathematical Modelling 37(12–13), pp. 7287–7299. doi: https://doi.org/10.1016/j.apm.2013.02.028. Jackson, Michelle and D.R. Cox (2013). “The Principles of Experimental Design and Their Application in Sociology”. In: Annual Review of Sociology 39(1), pp. 27–49. doi: https:// doi.org/10.1146/annurevsoc-071811-145443. Jane, Chin Chia and Yih Wenn Laih (2005). “A clustering algorithm for item assignment in a synchronized zone order picking system”. In: European Journal of Operational Research 166(2), pp. 489–496. doi: https://doi.org/10.1016/j.ejor.2004.01.042. Jayaram, Sankar, Judy Vance, Rajit Gadh, Uma Jayaram, and Hari Srinivasan (2001). 
“Assessment of VR Technology and its Applications to Engineering Problems”. In: Journal of
206
Bibliography
Computing and Information Science in Engineering 1(1), pp. 72–83. doi: https://doi.org/ 10.1115/1.1353846. Jennett, Charlene, Anna L. Cox, Paul Cairns, Samira Dhoparee, Andrew Epps, Tim Tijs, and Alison Walton (2008). “Measuring and defining the experience of immersion in games”. In: International Journal of Human-Computer Studies 66(9), pp. 641–661. doi: https://doi. org/10.1016/j.ijhcs.2008.04.004. Jolmes, Kilian Benedikt (2019). “Modellierung und Weiterentwicklung eines Systems zur automatisierten Erfassung logistischer Kennzahlen der manuellen Kommissionierung in der Virtual Reality”. Master Thesis. Technische Universität Darmstadt, Fachgebiet Unternehmensführung und Logistik. Jong, J. R. (1957). “The Effects Of Increasing Skill On Cycle Time And Its Consequences For Time Standards”. In: Ergonomics 1(1), pp. 51–60. doi: https://doi.org/10. 1080/00140135708964571. Juliano, Julia M. and Sook Lei Liew (2020). “Transfer of motor skill between virtual reality viewed using a head-mounted display and conventional screen environments”. In: Journal of NeuroEngineering and Rehabilitation 17(1), pp. 1–13. doi: https://doi.org/10.1186/ s12984-020-00678-2. Jung, Hans (2016). Allgemeine Betriebswirtschaftslehre. 13., aktualisierte Auflage. Berlin, Boston: De Gruyter Oldenbourg. Katok, Elena (2012). “Using laboratory experiments to build better operations management models”. In: Foundations and Trends in Technology, Information and Operations Management 5(1), pp. 1–88. doi: https://doi.org/10.1561/0200000022. Kazemi, Nima, Ehsan Shekarian, Leopoldo Eduardo Cárdenas-Barrón, and Ezutah Udoncy Olugu (2015). “Incorporating human learning into a fuzzy EOQ inventory model with backorders”. In: Computers & Industrial Engineering 87, pp. 540–542. doi: https://doi. org/10.1016/j.cie.2015.05.014. Kelly, Jonathan W., Lucia A. Cherep, and Zachary D. Siegel (2017). “Perceived space in the HTC vive”. In: ACM Transactions on Applied Perception 15(1), pp. 1–16. doi: https://doi. org/10.1145/3106155. Kelly, Jonathan W., Brenna C. Klesel, and Lucia A. Cherep (2019). “Visual stabilization of balance in virtual reality using the HTC vive”. In: ACM Transactions on Applied Perception 16(2). doi: https://doi.org/10.1145/3313902. Kennedy, Robert S, Norman E Lane, S Kevin, and Michael G Lilienthal (1993). “The International Journal of Aviation Psychology Simulator Sickness Questionnaire: An Enhanced Method for Quantifying Simulator Sickness”. In: The International Journal of Aviation Psychology 3(3), pp. 203–220. doi: https://doi.org/10.1207/s15327108ijap0303. Kern, Christian and Robert Refflinghaus (2013). “Cross-disciplinary method for predicting and reducing human error probabilities in manual assembly operations”. In: Total Quality Management & Business Excellence 24(7–8), pp. 847–858. doi: https://doi.org/10.1080/ 14783363.2012.669549. Kim, Mi Jeong, Xiangyu Wang, Peter E.D. Love, Heng Li, and Shih Chung Kang (2013). “Virtual reality for the built environment: A critical review of recent advances”. In: Journal of Information Technology in Construction 18, pp. 279–305. url: http://www.itcon.org/ 2013/14.
Bibliography
207
Kim, Youngjun, Hannah Kim, and Yong Oock Kim (2017). “Virtual reality and augmented reality in plastic surgery: A review”. In: Archives of Plastic Surgery 44(3), pp. 179–187. doi: https://doi.org/10.5999/aps.2017.44.3.179. Kitchenham, Barbara and Stuart Charters (2007). Guidelines for performing Systematic Literature reviews in Software Engineering Version 2.3. EBSE Technical Report EBSE2007-01. url: http://www.academia.edu/download/35830450/2_143465389588742151. pdf (visited on August 30, 2020). Klippel, Alexander, Jiayan Zhao, Kathy Lou Jackson, Peter La Femina, Chris Stubbs, Ryan Wetzel, Jordan Blair, Jan Oliver Wallgrün, and Danielle Oprean (2019). “Transforming Earth Science Education Through Immersive Experiences: Delivering on a Long Held Promise”. In: Journal of Educational Computing Research 57(7), pp. 1745–1771. doi: https://doi.org/10.1177/0735633119854025. Knigge, Jan-Karl (2020a). Virtual Reality in Manual Order Picking – Results of the experimental study. TUdatalib dataset. doi: https://doi.org/10.25534/tudatalib-309.4. Knigge, Jan-Karl (2020b). Virtual Reality in Manual Order Picking – Software. TUdatalib dataset. doi: https://doi.org/10.25534/tudatalib-317. Knigge, Jan-Karl (2020c). Virtual Reality in Manual Order Picking – Statistical analyses. TUdatalib dataset. doi: https://doi.org/10.25534/tudatalib-312.3. Knigge, Jan-Karl (2020d). Virtual Reality in Manual Order Picking – Systematic Literature Review Data. TUdatalib dataset. doi: https://doi.org/10.25534/tudatalib-311.3. Könemann, R, T Bosch, I Kingma, J.H. Van Dieën, and M.P. De Looze (2015). “Effect of horizontal pick and place locations on shoulder kinematics”. In: Ergonomics 58(2), pp. 195– 207. doi: https://doi.org/10.1080/00140139.2014.968636. Korves, B. and M. Loftus (2000). “Designing an immersive virtual reality interface for layout planning”. In: Journal of Materials Processing Technology 107(1–3), pp. 425–430. doi: https://doi.org/10.1016/S0924-0136(00)00717-2. Koster, René de, Tho Le-Duc, and Kees Jan Roodbergen (2007). “Design and control of warehouse order picking: A literature review”. In: European Journal of Operational Research 182(2), pp. 481–501. doi: https://doi.org/10.1016/j.ejor.2006.07.009. Kourtesis, Panagiotis, Simona Collina, Leonidas A. A. Doumas, and Sarah E. MacPherson (2019). “Validation of the Virtual Reality Neuroscience Questionnaire: Maximum Duration of Immersive Virtual Reality Sessions Without the Presence of Pertinent Adverse Symptomatology”. In: Frontiers in Human Neuroscience 13. doi: https://doi.org/10.3389/fnhum. 2019.00417. Kozak, J. J., P. A. Hancock, E. J. Arthur, and S. T. Chrysler (1993). “Transfer of training from virtual reality”. In: Ergonomics 36(7), pp. 777–784. doi: https://doi.org/10.1080/ 00140139308967941. Krawczyk, Michal (2011). “What brings your subjects to the lab? A field experiment”. In: Experimental Economics 14(4), pp. 482–489. doi: https://doi.org/10.1007/s10683-0119277-5. Krebs, Dagmar (2012). “The impact of response format on attitude measurement”. In: Methods, Theories, and Empirical Applications in the Social Sciences. Ed. by Samuel Salzborn, Eldad Davidov, and Jost Reinecke.Wiesbaden: VS Verlag für Sozialwissenschaften, pp. 105–113. doi: https://doi.org/10.1007/978-3-531-18898-0_14. Kruglikova, Irina, Teodor P. Grantcharov, Asbjorn M. Drewes, and Peter Funch-Jensen (2010). “Assessment of early learning curves among nurses and physicians using a high-fidelity vir-
208
Bibliography
tualreality colonoscopy simulator”. In: Surgical Endoscopy 24(2), pp. 366–370. doi: https:// doi.org/10.1007/s00464-009-0555-7. Kwon, Chongsan (2019). “Verification of the possibility and effectiveness of experiential learning using HMD-based immersive VR technologies”. In: Virtual Reality 23(1), pp. 101– 118. doi: https://doi.org/10.1007/s10055-018-0364-1. Laver, Kate E., Stacey George, Susie Thomas, Judith E. Deutsch, and Maria Crotty (2015). “Virtual reality for stroke rehabilitation”. In: Cochrane Database of Systematic Reviews (2). doi: https://doi.org/10.1002/14651858.CD008349.pub3. Lawson, Glyn, Davide Salanitri, and BrianWaterfield (2016). “Future directions for the development of virtual reality within an automotive manufacturer”. In: Applied Ergonomics 53, pp. 323–330. doi: https://doi.org/10.1016/j.apergo.2015.06.024. Laycock, S. D. and A. M. Day (2007). “A survey of haptic rendering techniques”. In: Computer Graphics Forum 26(1), pp. 50–65. doi: https://doi.org/10.1111/j.1467-8659. 2007.00945.x. Lee, Eun Young, Mn Kyu Kim, and Yoon Seok Chang (2016). “Development of an Advanced Picking Station Considering Human Factors”. In: Human Factors and Ergonomics in Manufacturing & Service Industries 26(6), pp. 700–712. doi: https://doi.org/10.1002/hfm.20669. Lee, Joo Ae, Yoon Seok Chang, Hyun-Jin Shim, and Sung-Je Cho (2015). “A Study on the Picking Process Time”. In: Procedia Manufacturing 3, pp. 731–738. doi: https://doi.org/ 10.1016/j.promfg.2015.07.316. Lin, Chiuhsiang J. and Bereket H. Woldegiorgis (2017). “Egocentric distance perception and performance of direct pointing in stereoscopic displays”. In: Applied Ergonomics 64, pp. 66–74. doi: https://doi.org/10.1016/j.apergo.2017.05.007. Lin, Chiuhsiang Joe, Betsha Tizazu Abreham, Dino Caesaron, and Bereket Haile Woldegiorgis (2020). “Exocentric distance judgment and accuracy of head-mounted and stereoscopic widescreen displays in frontal planes”. In: Applied Sciences 10(4). doi: https://doi.org/10. 3390/app10041427. Lin, Chiuhsiang Joe, Bereket H. Woldegiorgis, and Dino Caesaron (2014). “Distance estimation of near-field visual objects in stereoscopic displays”. In: Journal of the Society for Information Display 22(7), pp. 370–379. doi: https://doi.org/10.1002/jsid.269. Lin, Chiuhsiang Joe and Bereket Haile Woldegiorgis (2015). “Interaction and visual performance in stereoscopic displays: A review”. In: Journal of the Society for Information Display 23(7), pp. 319–332. doi: https://doi.org/10.1002/jsid.378. Longo, Francesco (2011). “Advances of modeling and simulation in supply chain and industry”. In: Simulation 87(8), pp. 651–656. doi: https://doi.org/10.1177/0037549711418033. Loucks, Laura, Carly Yasinski, Seth D. Norrholm, Jessica Maples-Keller, Loren Post, Liza Zwiebach, Devika Fiorillo, Megan Goodlin, Tanja Jovanovic, Albert A. Rizzo, and Barbara O. Rothbaum (2019). “You can do that?!: Feasibility of virtual reality exposure therapy in the treatment of PTSD due to military sexual trauma”. In: Journal of Anxiety Disorders 61, pp. 55–63. doi: https://doi.org/10.1016/j.janxdis.2018.06.004. Magdalon, Eliane C., Stella M. Michaelsen, Antonio A. Quevedo, and Mindy F. Levin (2011). “Comparison of grasping movements made by healthy subjects in a 3-dimensional immersive virtual versus physical environment”. In: Acta Psychologica 138(1), pp. 126–134. doi: https://doi.org/10.1016/j.actpsy.2011.05.015. Makhlouf, Rami (2018). 
“Der Vergleich einer Pick-by- Voice-Kommissionierung in der virtuellen Realität mit der echten Welt – Eine Analyse im Rahmen einer Versuchsreihe
Bibliography
209
mit der Fragebogen-Methode”. Studienarbeit. Technische Universität Darmstadt, Fachgebiet Unternehmensführung und Logistik. Manis, Kerry T. and Danny Choi (2019). “The virtual reality hardware acceptance model (VRHAM): Extending and individuating the technology acceptance model (TAM) for virtual reality hardware”. In: Journal of Business Research 100, pp. 503–513. doi: https://doi. org/10.1016/j.jbusres.2018.10.021. Marchet, Gino, Marco Melacini, and Sara Perotti (2015). “Investigating order picking system adoption: a case-study-based approach”. In: International Journal of Logistics Research and Applications 18(1), pp. 82–98. doi: https://doi.org/10.1080/13675567.2014.945400. Maria, Anu (1997). “Introduction to modeling and simulation”. In: Proceedings of the 29th conference on Winter simulation – WSC ’97. Ed. by Sigrún Andradóttir, Kevin J. Healy, David H. Withers, and Barry L. Nelson. New York, New York, USA: ACM Press, pp. 7–13. doi: https://doi.org/10.1145/268437.268440. Marsaglia, George, Wai Wan Tsang, and Jingbo Wang (2003). “Evaluating Kolmogorov’s Distribution”. In: Journal of Statistical Software 8(18). doi: https://doi.org/10.18637/jss. v008.i18. Martínez-Navarro, Jesus, Enrique Bigné, Jaime Guixeres, Mariano Alcañiz, and Carmen Torrecilla (2019). “The influence of virtual reality in e-commerce”. In: Journal of Business Research 100, pp. 475–482. doi: https://doi.org/10.1016/j.jbusres.2018.10.054. Masae, Makusee, Christoph H. Glock, and Eric H. Grosse (2020). “Order picker routing in warehouses: A systematic literature review”. In: International Journal of Production Economics 224, p. 107564. doi: https://doi.org/10.1016/j.ijpe.2019.107564.173. Mazur, James E. and Reid Hastie (1978). “Learning as accumulation: A reexamination of the learning curve”. In: Psychological Bulletin 85(6), pp. 1256–1274. doi: https://doi.org/10. 1037/0033-2909.85.6.1256. McComas, Joan, Jayne Pivik, and Marc Laflamme (1998). “Children’s Transfer of Spatial Learning from Virtual Reality to Real Environments”. In: CyberPsychology & Behavior 1(2), pp. 121–128. doi: https://doi.org/10.1089/cpb.1998.1.121. Melacini, Marco, Sara Perotti, and Angela Tumino (2011). “Development of a framework for pickand-pass order picking system design”. In: The International Journal of Advanced Manufacturing Technology 53, pp. 841–854. doi: https://doi.org/10.1007/s00170-010-2881-2. Menck, N., X. Yang, C. Weidig, P. Winkes, C. Lauer, H. Hagen, B. Hamann, and J.C. Aurich (2012). “Collaborative Factory Planning in Virtual Reality”. In: Procedia CIRP 3, pp. 317– 322. doi: https://doi.org/10.1016/j.procir.2012.07.055. Merién, A.E.R., J. van de Ven, B.W. Mol, S. Houterman, and S.G. Oei (2010). “Multidisciplinary Team Training in a Simulation Setting for Acute Obstetric Emergencies”. In: Obstetrics & Gynecology 115(5), pp. 1021–1031. Micaroni, Lorenzo, Marina Carulli, Francesco Ferrise, Alberto Gallace, and Monica Bordegoni (2019). “An Olfactory Display to Study the Integration of Vision and Olfaction in a Virtual Reality Environment”. In: Journal of Computing and Information Science in Engineering 19(3), p. 031015. doi: https://doi.org/10.1115/1.4043068. Milgram, Paul, Haruo Takemura, Akira Utsumi, and Fumio Kishino (1995). “Augmented reality: a class of displays on the reality-virtuality continuum”. In: Telemanipulator and Telepresence Technologies. Ed. by Hari Das. Vol. 2351, pp. 282–292. doi: https://doi.org/ 10.1117/12.197321.
210
Bibliography
Miller, Samuel A, Noah J Misch, and Aaron J Dalton (2005). “Low-Cost, Portable, MultiWall Virtual Reality”. In: Immersive Projection Technology and Eurographics Virtual Environment Workshop 2005. url: https://ntrs.nasa.gov/citations/20050240930 (visited on September 2, 2020). Mol, Jantsje M. (2019). “Goggles in the lab: Economic experiments in immersive virtual environments”. In: Journal of Behavioral and Experimental Economics 79, pp. 155–164. doi: https://doi.org/10.1016/j.socec.2019.02.007. Monnier, Patrick (2011). “Color heterogeneity in visual search”. In: Color Research and Application 36(2), pp. 101–110. doi: https://doi.org/10.1002/col.20593. Montero, Ignacio and Orfelio G. León (2007). “A guide for naming research studies in Psychology”. In: International Journal of Clinical and Health Psychology 7(3), pp. 847–862. Nalivaiko, Eugene, Simon L. Davis, Karen L. Blackmore, Andrew Vakulin, and Keith V. Nesbitt (2015). “Cybersickness provoked by head-mounted display affects cutaneous vascular tone, heart rate and reaction time”. In: Physiology & Behavior 151, pp. 583–590. doi: https://doi.org/10.1016/j.physbeh.2015.08.043. Nanjappan, Vijayakumar, Hai-Ning Liang, Feiyu Lu, Konstantinos Papangelis, Yong Yue, and Ka Lok Man (2018). “User-elicited dual-hand interactions for manipulating 3D objects in virtual reality environments”. In: Human-centric Computing and Information Sciences 8(1), p. 31. doi: https://doi.org/10.1186/s13673-018-0154-5. Napolitano, Maida (2012). “2012 Warehouse/DC Operations Survey: Mixed signals.” In: Modern Materials Handling 48(11), pp. 48–56. Nash, John C (2019). nlsr Background, Development, Examples and Discussion. url: https:// rdrr.io/cran/nlsr/f/inst/doc/nlsr-devdoc.pdf (visited on May 15, 2020). Nembhard, David A. and Mustafa V. Uzumeri (2000a). “An individual-based description of learning within an organization”. In: IEEE Transactions on Engineering Management 47(3), pp. 370–378. doi: https://doi.org/10.1109/17.865905. Nembhard, David A. and Mustafa V. Uzumeri (2000b). “Experiential learning and forgetting for manual and cognitive tasks”. In: International Journal of Industrial Ergonomics 25(4), pp. 315–326. doi: https://doi.org/10.1016/S0169-8141(99)00021-9. Neuman, W. Lawrence (2014). Social Research Methods: Qualitative and Quantitative Approaches. Seventh edition. Harlow: Pearson Education Limited. doi: https://doi.org/ 10.2307/3211488. Neumann, W. P. and J. Village (2012). “Ergonomics action research II: A framework for integrating HF into work system design”. In: Ergonomics 55(10), pp. 1140–1156. doi: https:// doi.org/10.1080/00140139.2012.706714. Nickel, Courtney, Carolyn Knight, Aaron Langille, and Alison Godwin (2019). “How much practice is required to reduce performance variability in a virtual reality mining simulator?” In: Safety 5(2). doi: https://doi.org/10.3390/safety5020018. Niehorster, Diederick C., Li Li, and Markus Lappe (2017). “The accuracy and precision of position and orientation tracking in the HTC vive virtual reality system for scientific research”. In: i-Perception 8(3), pp. 1–23. doi: https://doi.org/10.1177/ 2041669517708205. O’Connor, Shawn M. and Arthur D. Kuo (2009). “Direction-dependent control of balance during walking and standing”. In: Journal of Neurophysiology 102(3), pp. 1411–1419. doi: https://doi.org/10.1152/jn.00131.2009.
Bibliography
211
Ottosson, Stig (2002). “Virtual reality in the product development process”. In: Journal of Engineering Design 13(2), pp. 159–172. doi: https://doi.org/10.1080/ 09544820210129823. Paes, Daniel, Eduardo Arantes, and Javier Irizarry (2017). “Immersive environment for improving the understanding of architectural 3D models: Comparing user spatial perception between immersive and traditional virtual reality systems”. In: Automation in Construction 84, pp. 292–303. doi: https://doi.org/10.1016/j.autcon.2017.09.016. Palfrey, Thomas and Robert Porter (1991). “Guidelines for Submission of Manuscripts on Experimental Economics”. In: Econometrica 59(4), pp. 1197–1198. Pan, Jason Chao Hsien and Ming Hung Wu (2012). “Throughput analysis for order picking system with multiple pickers and aisle congestion considerations”. In: Computers and Operations Research 39(7), pp. 1661–1672. doi: https://doi.org/10.1016/j.cor.2011.09.022. Panasonic (2020). Datasheet: Cylindrical Photoelectric Sensor CY-100 Series. url: https:// www3.panasonic.biz/ac/e_download/fasys/sensor/photoelectric/catalog/cy-100_e_cata. pdf (visited on January 24, 2020). Pastel, Stefan, Chien-Hsi Chen, Luca Martin, Mats Naujoks, Katharina Petri, and Kerstin Witte (2020). “Comparison of gaze accuracy and precision in real-world and virtual reality”. In: Virtual Reality. doi: https://doi.org/10.1007/s10055-020-00449-3. Peck, Tabitha C., Laura E. Sockol, and Sarah M. Hancock (2020). “Mind the Gap: The Underrepresentation of Female Participants and Authors in Virtual Reality Research”. In: IEEE Transactions on Visualization and Computer Graphics 26(5), pp. 1945–1954. doi: https:// doi.org/10.1109/TVCG.2020.2973498. Pedroli, Elisa, Pietro Cipresso, Luca Greci, Sara Arlati, Lorenzo Boilini, Laura Stefanelli, Monica Rossi, Karine Goulene, Marco Sacco, Marco Stramba-Badiale, Andrea Gaggioli, and Giuseppe Riva (2019). “An Immersive Motor Protocol for Frailty Rehabilitation”. In: Frontiers in Neurology 10, p. 1078. doi: https://doi.org/10.3389/fneur.2019.01078. Peron, Mirco, Giuseppe Fragapane, Fabio Sgarbossa, and Michael Kay (2020). “Digital Facility Layout Planning”. In: Sustainability 12(8), p. 3349. doi: https://doi.org/10.3390/ su12083349. Petersen, Charles G. and Gerald Aase (2004). “A comparison of picking, storage, and routing policies in manual order picking”. In: International Journal of Production Economics 92(1), pp. 11–19. doi: https://doi.org/10.1016/j.ijpe.2003.09.006. Petersen, Charles G., Charles Siu, and Daniel R. Heiser (2005). “Improving order picking performance utilizing slotting and golden zone storage”. In: International Journal of Operations & Production Management 25(10), pp. 997–1012. doi: https://doi.org/10.1108/ 01443570510619491. Peukert, Christian, Jella Pfeiffer, Martin Meißner, Thies Pfeiffer, and Christof Weinhardt (2019). “Shopping in Virtual Reality Stores: The Influence of Immersion on System Adoption”. In: Journal of Management Information Systems 36(3), pp. 755–788. doi: https:// doi.org/10.1080/07421222.2019.1628889. Pollard, Kimberly A., Ashley H. Oiknine, Benjamin T. Files, Anne M. Sinatra, Debbie Patton, Mark Ericson, Jerald Thomas, and Peter Khooshabeh (2020). “Level of immersion affects spatial learning in virtual environments: results of a three-condition within-subjects study with long intersession intervals”. In: Virtual Reality 24, pp. 783–796. doi: https://doi.org/ 10.1007/s10055-019-00411-y.
212
Bibliography
Pontonnier, Charles, Georges Dumont, Asfhin Samani, Pascal Madeleine, and Marwan Badawi (2014). “Designing and evaluating a workstation in real and virtual environment: Toward virtual reality based ergonomic design sessions”. In: Journal on Multimodal User Interfaces 8(2), pp. 199–208. doi: https://doi.org/10.1007/s12193-013-0138-8. Popper, Karl (1989). Logik der Forschung. Neunte, verbesserte Auflage. Tübingen: Mohr, J.C.B. J.C.B. Portman, M.E., A. Natapov, and D. Fisher-Gewirtzman (2015). “To go where no man has gone before: Virtual reality in architecture, landscape architecture and environmental planning”. In: Computers, Environment and Urban Systems 54, pp. 376–384. doi: https:// doi.org/10.1016/j.compenvurbsys.2015.05.001. Pulijala, Y., M. Ma, M. Pears, D. Peebles, and A. Ayoub (2018). “An innovative virtual reality training tool for orthognathic surgery”. In: International Journal of Oral and Maxillofacial Surgery 47(9), pp. 1199–1205. doi: https://doi.org/10.1016/j.ijom.2018.01.005. Reif, Moritz (2020). “Lerneffekte bei der virtuellen Kommissionierung”. Master Thesis. Technische Universität Darmstadt, Fachgebiet Unternehmensführung und Logistik. Reif, Rupert and Dennis Walch (2008). “Augmented & Virtual Reality applications in the field of logistics”. In: Visual Computer 24(11), pp. 987–994. doi: https://doi.org/10.1007/ s00371-008-0271-7. Rizzo, Albert and Gerard Jeunghyun Kim (2005). “A SWOT analysis of the field of virtual reality rehabilitation and therapy”. In: Presence: Teleoperators and Virtual Environments 14(2), pp. 119–146. doi: https://doi.org/10.1162/1054746053967094. Roodbergen, Kees Jan and René De Koster (2001). “Routing methods for warehouses with multiple cross aisles”. In: International Journal of Production Research 39(9), pp. 1865– 1883. doi: https://doi.org/10.1080/00207540110028128. Roodbergen, Kees Jan, Iris F A Vis, and G. Don Taylor (2015). “Simultaneous determination of warehouse layout and control policies”. In: International Journal of Production Research 53(11), pp. 3306–3326. doi: https://doi.org/10.1080/00207543.2014.978029. Roodbergen, Kees Jan and Iris F.A. Vis (2009). “A survey of literature on automated storage and retrieval systems”. In: European Journal of Operational Research 194(2), pp. 343–362. doi: https://doi.org/10.1016/j.ejor.2008.01.038. Rouwenhorst, B., B. Reuter, V. Stockrahm, G.J. van Houtum, R.J. Mantel, and W.H.M. Zijm (2000). “Warehouse design and control: Framework and literature review”. In: European Journal of Operational Research 122(3), pp. 515–533. doi: https://doi.org/10.1016/S03772217(99)00020-X. Royston, J. P. (1982a). “Algorithm AS 181: The W Test for Normality”. In: Applied Statistics 31(2), pp. 176–180. doi: https://doi.org/10.2307/2347986. Royston, J. P. (1982b). “An Extension of Shapiro and Wilk’s W Test for Normality to Large Samples”. In: Applied Statistics 31(2), p. 115. doi: https://doi.org/10.2307/2347973. Royston, Patrick (1991). “Estimating departure from normality”. In: Statistics in Medicine 10(8), pp. 1283–1293. doi: https://doi.org/10.1002/sim.4780100811. Ruxton, Graeme D. (2006). “The unequal variance t-test is an underused alternative to Student’s t-test and the Mann-Whitney U test”. In: Behavioral Ecology 17(4), pp. 688–690. doi: https://doi.org/10.1093/beheco/ark016. Ryan, Thomas P. (2007). Modern Experimental Design. Hoboken, New Jersey: John Wiley & Sons Inc.
Bibliography
213
Sakhare, Ashwin R., Vincent Yang, Joy Stradford, Ivan Tsang, Roshan Ravichandran, and Judy Pa (2019). “Cycling and Spatial Navigation in an Enriched, Immersive 3D Virtual Park Environment: A Feasibility Study in Younger and Older Adults”. In: Frontiers in Aging Neuroscience 11, p. 218. doi: https://doi.org/10.3389/fnagi.2019.00218. Schultheis, Maria T and Albert a Rizzo (2001). “The application of virtual reality technology in rehabilitation.” In: Rehabilitation Psychology 46(3), pp. 296–311. doi: https://doi.org/ 10.1037/0090-5550.46.3.296. Schwerdtfeger, Björn, Rupert Reif, Willibald A. Günthner, and Gudrun Klinker (2011). “Pickby-vision: there is something to pick at the end of the augmented tunnel”. In: Virtual Reality 15, pp. 213–223. doi: https://doi.org/10.1007/s10055-011-0187-9. Seidel, Constantin Julian (2019). “Zwischen den Welten – Durchführung und Auswertung von Versuchen zur manuellen Kommissionierung in der virtuellen Realität”. Master Thesis. Technische Universität Darmstadt, Fachgebiet Unternehmensführung und Logistik. Selvander, Madeleine and Peter Åsman (2012). “Virtual reality cataract surgery training: Learning curves and concurrent validity”. In: Acta Ophthalmologica 90(5), pp. 412–417. doi: https://doi.org/10.1111/j.1755-3768.2010.02028.x. Seuring, Stefan and Stefan Gold (2012). “Conducting content-analysis based literature reviews in supply chain management”. In: Supply Chain Management: An International Journal 17(5), pp. 544–555. doi: https://doi.org/10.1108/13598541211258609. Shadish, William R., Thomas D. Cook, and Donald T. Campbell (2002). Experimental and Designs for Generalized Causal Inference. Boston: Houghton Mifflin. Shapiro, S. S. and M. B. Wilk (1965). “An Analysis of Variance Test for Normality (Complete Samples)”. In: Biometrika 52(3/4), p. 591. doi: https://doi.org/10.2307/2333709. Sharples, Sarah, Sue Cobb, Amanda Moody, and John R. Wilson (2008). “Virtual reality induced symptoms and effects (VRISE): Comparison of head mounted display (HMD), desktop and projection display systems”. In: Displays 29(2), pp. 58–69. doi: https://doi. org/10.1016/j.displa.2007.09.005. Smith, Vernon L. and James M. Walker (1993). “Monetary Rewards And Decision Cost in Experimental Economics”. In: Economic Inquiry 31(2), pp. 245–261. doi: https://doi.org/ 10.1111/j.1465-7295.1993.tb00881.x. Sowndararajan, Ajith, Rongrong Wang, and Doug A. Bowman (2008). “Quantifying the benefits of immersion for procedural training”. In: Proceedings of the 2008 workshop on Immersive projection technologies/Emerging display technologiges – IPT/EDT ’08. New York, New York, USA: ACM Press, pp. 1–4. doi: https://doi.org/10.1145/1394669.1394672. Spiess, Andrej-Nikolai and Natalie Neumeyer (2010). “An evaluation of R2 as an inadequate measure for nonlinear models in pharmacological and biochemical research: a Monte Carlo approach”. In: BMC Pharmacology 10, p. 6. doi: https://doi.org/10.1186/1471-2210-106. Starner, T., S. Mann, B. Rhodes, J. Levine, J. Healey, D. Kirsch, R. W. Picard, and A. Pentland (1997). “Augmented reality through wearable computing”. In: Presence: Teleoperators and Virtual Environments 6(4), pp. 386–398. Statista (2020). Unit shipments of virtual reality (VR) devices worldwide from 2017 to 2019 (in millions), by vendor. url: https://www.statista.com/statistics/671403/globalvirtualreality-device-shipments-by-vendor/ (visited on July 16, 2020). Staudt, Francielly Hedler, Gülgün Alpan, Maria Di Mascolo, and Carlos M. Taboada Rodriguez (2015). 
“Warehouse performance measurement: A literature review”. In: Inter-
214
Bibliography
national Journal of Production Research 53(18), pp. 5524–5544. https://doi.org/10.1080/ 00207543.2015.1030466. Stefanucci, Jeanine K., Sarah H. Creem-Regehr, William B. Thompson, David A. Lessard, and Michael N. Geuss (2015). “Evaluating the accuracy of size perception on screen-based displays: Displayed objects appear smaller than real objects”. In: Journal of Experimental Psychology: Applied 21(3), pp. 215–223. doi: https://doi.org/10.1037/xap0000051. Steuer, Jonathan (1992). “Defining virtual reality: dimensions determining telepresence, Communication in the age of virtual reality”. In: Journal of Communication 42(4), pp. 73–93. Stevens, Cynthia Kay (2011). “Questions to consider when selecting student samples”. In: Journal of Supply Chain Management 47(3), pp. 19–21. doi: https://doi.org/10.1111/j. 1745-493X.2011.03233.x. Stone, Robert J. (2011). “The (human) science of medical virtual learning environments”. In: Philosophical Transactions of the Royal Society B: Biological Sciences 366(1562), pp. 276–285. doi: https://doi.org/10.1098/rstb.2010.0209. Sun, Da, Andrey Kiselev, Qianfang Liao, Todor Stoyanov, and Amy Loutfi (2020). “A New Mixed- Reality-Based Teleoperation System for Telepresence and Maneuverability Enhancement”. In: IEEE Transactions on Human-Machine Systems 50(1), pp. 55–67. doi: https://doi.org/10.1109/THMS.2019.2960676. Sutherland, I.E. (1965). “The ultimate display”. In: Proceedings of the IFIPS Congress 65(2), pp. 506–508. Syed-Abdul, Shabbir, Shwetambara Malwade, Aldilas Achmad Nursetyo, Mishika Sood, Madhu Bhatia, Diana Barsasella, Megan F. Liu, Chia Chi Chang, Kathiravan Srinivasan, R. Raja, and Yu Chuan Jack Li (2019). “Virtual reality among the elderly: A usefulness and acceptance study from Taiwan”. In: BMC Geriatrics 19(1), pp. 1–10. doi: https://doi.org/ 10.1186/s12877-019-1218-8. Thomas, Rodney W. (2011). “When student samples make sense in logistics research”. In: Journal of Business Logistics 32(3), pp. 287–290. doi: https://doi.org/10.1111/j.21581592.2011.01023.x. Tompkins, James A., John A. White, Yavuz A. Bozer, and J. A. Tanchoco (1996). Facilities Planning. Second edition. New York: Wiley. Tranfield, David, David Denyer, and Palminder Smart (2003). “Towards a Methodology for Developing Evidence-Informed Management Knowledge by Means of Systematic Review”. In: British Journal of Management 14(3), pp. 207–222. doi: https://doi.org/10. 1111/1467-8551.00375. Uzumeri, Mustafa and David Nembhard (1998). “A population of learners: A new way to measure organizational learning”. In: Journal of Operations Management 16(5), pp. 515– 528. doi: https://doi.org/10.1016/s0272-6963(97)00017-x. Van den Berg, Jeroen P. (1996). “Multiple Order Pick Sequencing in a Carousel System: A Solvable Case of the Rural Postman Problem”. In: Journal of the Operational Research Society 47(12), pp. 1504–1515. doi: https://doi.org/10.1057/jors.1996.194. Varalakshmi, B D, J Thriveni, K R Venugopal, and L M Patnaik (2012). “Haptics: State of the Art Survey”. In: International Journal of Computer Science Issues 9(5), pp. 234–244. Vaughan, Neil, Bodgan Gabrys, and Venketesh N. Dubey (2016). “An overview of selfadaptive technologies within virtual reality training”. In: Computer Science Review 22, pp. 65–87. doi: https://doi.org/10.1016/j.cosrev.2016.09.001.
Bibliography
215
Vélaz, Yaiza, Jorge Rodríguez Arce, Teresa Gutiérrez, Alberto Lozano-Rodero, and Angel Suescun (2014). “The Influence of Interaction Technology on the Learning of Assembly Tasks Using Virtual Reality”. In: Journal of Computing and Information Science in Engineering 14(4), p. 041007. doi: https://doi.org/10.1115/1.4028588. “Verordnung zur Anpassung der Höhe des Mindestlohns” (2016). In: Bundesgesetzblatt Teil I(Nr. 54), p. 2530. Vries, Jelle de, René de Koster, and Daan Stam (2016a). “Aligning Order Picking Methods, Incentive Systems, and Regulatory Focus to Increase Performance”. In: Production and Operations Management 25(8), pp. 1363–1376. doi: https://doi.org/10.1111/poms.12547. Vries, Jelle de, René de Koster, and Daan Stam (2016b). “Exploring the role of picker personality in predicting picking performance with pick by voice, pick to light and RF-terminal picking”. In: International Journal of Production Research 54(8), pp. 2260–2274. doi: https:// doi.org/10.1080/00207543.2015.1064184. Wachowski, Lana and Lilly Wachowski (1999). The Matrix. Movie. Waller, David and David Knapp (1998). “The Transfer of Spatial Knowledge”. In: Technology 7(2), pp. 129–143. doi: https://doi.org/10.1162/105474698565631. Warmelink, Harald, Jonna Koivisto, Igor Mayer, Mikko Vesa, and Juho Hamari (2020). “Gamification of production and logistics operations: Status quo and future directions”. In: Journal of Business Research 106, pp. 331–340. doi: https://doi.org/10.1016/j.jbusres.2018.09. 011. Weidinger, Felix, Nils Boysen, and Dirk Briskorn (2018). “Storage Assignment with RackMoving Mobile Robots in KIVA Warehouses”. In: Transportation Science 52(6), pp. 1479– 1495. doi: https://doi.org/10.1287/trsc.2018.0826. Weijters, Bert, Elke Cabooter, and Niels Schillewaert (2010). “The effect of rating scale format on response styles: The number of response categories and response category labels”. In: International Journal of Research in Marketing 27(3), pp. 236–247. doi: https://doi.org/ 10.1016/j.ijresmar.2010.02.004. Westkämper, E. and R. Von Briel (2001). “Continuous improvement and participative factory planning by computer systems”. In: CIRP Annals – Manufacturing Technology 50(1), pp. 347–352. doi: https://doi.org/10.1016/S0007-8506(07)62137-4. Wiesing, Michael, Gereon R. Fink, and Ralph Weidner (2020). “Accuracy and precision of stimulus timing and reaction times with Unreal Engine and SteamVR”. In: PLOS ONE 15(4), e0231152. doi: https://doi.org/10.1371/journal.pone.0231152. Wilcox, Rand (2017). Modern Statistics for the Social and Behavioral Sciences. Second edition. Boca Raton, London, New York: CRC Press. Wilcoxon, Frank (1947). “Individual Comparisons by Ranking Methods”. In: Biometrics Bulletin 1(6), pp. 80–83. doi: https://doi.org/10.2307/3001946. Willemsen, Peter, Mark B. Colton, Sarah H. Creem-Regehr, and William B. Thompson (2009). “The effects of head-mounted display mechanical properties and field of view on distance judgments in virtual environments”. In: ACM Transactions on Applied Perception 6(2). doi: https://doi.org/10.1145/1498700.1498702. Wilson, John R. (1997). “Virtual environments and ergonomics: Needs and opportunities”. In: Ergonomics 40(10), pp. 1057–1077. doi: https://doi.org/10.1080/001401397187603. Wilson, John R., Mirabelle D’Cruz, Sue Cobb, and Richard M. Eastgate (1996). Virtual Reality for Industrial Application: Opportunities and Limitations. Nottingham: Nottingham University Press.
216
Bibliography
Winkes, Pascal A. and Jan C. Aurich (2015). “Method for an Enhanced Assembly Planning Process with Systematic Virtual Reality Inclusion”. In: Procedia CIRP 37, pp. 152–157. doi: https://doi.org/10.1016/j.procir.2015.08.007. Witmer, Bob G., John H. Bailey, Bruce W. Knerr, and Kimberly C. Parsons (1996). “Virtual spaces and real world places: Transfer of route knowledge”. In: International Journal of Human Computer Studies 45(4), pp. 413–428. doi: https://doi.org/10.1006/ijhc.1996.0060. Witmer, Bob G., Christian J. Jerome, and Michael J. Singer (2005). “The Factor Structure of the Presence Questionnaire”. In: Presence: Teleoperators and Virtual Environments 14(3), pp. 298–312. doi: https://doi.org/10.1162/105474605323384654. Wobbrock, Jacob O., Leah Findlater, Darren Gergle, and James J. Higgins (2011). “The aligned rank transform for nonparametric factorial analyses using only anova procedures”. In: Proceedings of the 2011 annual conference on Human factors in computing systems – CHI ’11. Vancouver, BC, Canada, pp. 143–146. doi: https://doi.org/10.1145/1978942. 1978963. Wright, T. P. (1936). “Factors Affecting the Cost of Airplanes”. In: Journal of the Aeronautical Sciences 3(4), pp. 122–128. doi: https://doi.org/10.2514/8.155. Wruck, Susanne, Iris F.A. Vis, and Jaap Boter (2017). “Risk control for staff planning in e-commerce warehouses”. In: International Journal of Production Research 55(21), pp. 6453–6469. doi: https://doi.org/10.1080/00207543.2016.1207816. Wulz, Johannes R. (2008). “Menschintegrierte Simulation in der Logistik mit Hilfe der Virtuellen Realität”. PhD Thesis. Technische Universität München, Lehrstuhl für Fördertechnik Materialfluss Logistik. Xia, Pingjun (2016). “Haptics for Product Design and Manufacturing Simulation”. In: IEEE Transactions on Haptics 9(3), pp. 358–375. doi: https://doi.org/10.1109/TOH.2016. 2554551. Ye, N., P. Banerjee, A. Banerjee, and F. Dech (1999). “A comparative study of assembly planning in traditional and virtual environments”. In: IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews) 29(4), pp. 546–555. doi: https://doi. org/10.1109/5326.798768. Yiannakopoulou, Eugenia, Nikolaos Nikiteas, Despina Perrea, and Christos Tsigris (2015). “Virtual reality simulators and training in laparoscopic surgery”. In: International Journal of Surgery 13, pp. 60–64. doi: https://doi.org/10.1016/j.ijsu.2014.11.014. Yu, Yugang, René B.M. De Koster, and Xiaolong Guo (2015). “Class-Based Storage with a Finite Number of Items: Using More Classes is not Always Better”. In: Production and Operations Management 24(8), pp. 1235–1247. doi: https://doi.org/10.1111/poms.12334. Zelst, Susan van, Karel van Donselaar, Tom van Woensel, Rob Broekmeulen, and Jan Fransoo (2009). “Logistics drivers for shelf stacking in grocery retail stores: Potential for efficiency improvement”. In: International Journal of Production Economics 121(2), pp. 620–632. doi: https://doi.org/10.1016/j.ijpe.2006.06.010. Zhong, Hongsheng, Randolph W. Hall, and Maged Dessouky (2007). “Territory planning and vehicle dispatching with driver learning”. In: Transportation Science 41(1), pp. 74–89. doi: https://doi.org/10.1287/trsc.1060.0167. “Zweite Verordnung zur Anpassung der Höhe des Mindestlohns” (2018). In: Bundesgesetzblatt Teil I(Nr. 38), p. 1876.