Robotics for Intralogistics in Supermarkets and Retail Stores 3031060776, 9783031060779

This book aims at sharing knowledge about the technological opportunities and the main research challenges regarding rob


English Pages 181 [182] Year 2022

Table of contents :
Series Editor’s Foreword
Preface
Contents
Contributors
1 Robot-Assisted Instore Operations in Grocery Retailing
1.1 Introduction
1.2 Characterization of Grocery Supply Networks
1.3 Instore Logistics Systems and Challenges
1.3.1 Instore Logistics Processes
1.3.2 Instore Logistics Costs
1.3.3 Specifics of Instore Logistics Systems
1.3.4 Past Developments and Status Quo of Instore Logistics
1.3.5 Challenges for Instore Automation Approaches
1.4 Automation Approaches to Support Instore Logistics
1.4.1 Scenario 1—Store Monitoring
1.4.2 Scenario 2—Collaborative Instore Logistics
1.4.3 Scenario 3—Autonomous Shelf Filling
1.5 Managerial Challenges of Using Professional Service Robots in Retail Stores
1.6 Conclusion
References
2 Robots Collecting Data: Modelling Stores
2.1 Introduction
2.2 Conceptual Framework and Computational Problem
2.3 SemDT Data and Knowledge Model
2.4 Virtual Simulation Environment of the Semantic Digital Twin
2.5 Semantic Mapping
2.5.1 Layout Identification
2.5.2 Store Monitoring
2.6 Example Store Monitoring System
2.6.1 Mobile Robotic Platform
2.6.2 Perception
2.7 Conclusion
References
3 Robots Collecting Data: Robust Identification of Products
3.1 Introduction
3.2 Related Work
3.3 Method
3.3.1 Network Architecture
3.3.2 Triplet Loss
3.3.3 Triplet Mining
3.3.4 Training
3.4 Datasets
3.4.1 Synthetic Dataset
3.4.2 Real World Dataset
3.4.3 Test Dataset
3.4.4 Omniglot
3.5 Experiments
3.5.1 Evaluation Metric
3.5.2 Selection of Embedding Size
3.5.3 Results
3.5.4 Performance on Neural Compute Stick 2
3.6 Conclusion
References
4 Robots Working in the Backroom: Depalletization of Mixed-Case Pallets
4.1 Introduction
4.2 Gripper
4.2.1 Mechanical Structure
4.2.2 Sensors and Actuators
4.2.3 Grasping Procedure
4.3 Detection, Recognition and Localization
4.3.1 The Cases Database
4.3.2 The Detection Module
4.3.3 The Geometrical Module
4.4 Integrated Cell
4.4.1 Cell Design
4.4.2 Executive System
4.4.3 Selection and Grasping
4.5 Experiments
4.5.1 Quantitative Analysis
4.5.2 Real Supermarket Scenario
4.6 Conclusion
References
5 Robots Helping Humans: Collaborative Shelf Refilling
5.1 Introduction
5.2 Handling Module
5.3 Methodology
5.3.1 Procedure
5.3.2 Ergonomic Indices
5.4 Simulations and Ergonomic Handling Module (EHM)
5.4.1 Simulation Setup
5.4.2 Data Collection
5.4.3 Data Processing
5.5 Experiments
5.5.1 Experimental Setup and Protocol
5.5.2 Experimental Data Collection
5.5.3 Validation of the Virtual Simulations
5.5.4 Validation of the Ergonomic Indices
5.6 Conclusions
References
6 Robotic Clerks: Autonomous Shelf Refilling
6.1 Introduction
6.2 Software Architecture for Autonomous Shelf Refilling
6.3 The Sensorized Gripper
6.3.1 Tactile Sensor Electronics
6.3.2 Tactile Sensor Mechanics
6.3.3 Sensor Calibration
6.4 Reactive Control System
6.4.1 Grasp Control
6.4.2 Visual Servoing
6.5 Plan Language for In-Store Manipulation Activities
6.5.1 Library of Generalized Fetch and Deliver Plans for In-Store Manipulation
6.5.2 Integration of the Reactive Controller
6.6 Experience-Based Learning and Optimization of Manipulation Activities
6.6.1 Learning Pipeline
6.6.2 Generating Episodic Memories within a Simulator
6.6.3 Learning from Episodic Memories
6.7 Experiments
6.7.1 Replenishment Experiments
6.7.2 Evaluation of Learning Pipeline
6.8 Conclusions
References


Author / Uploaded
Luigi Villani
Ciro Natale
Michael Beetz
Bruno Siciliano


Springer Tracts in Advanced Robotics 148

Luigi Villani Ciro Natale Michael Beetz Bruno Siciliano Editors

Robotics for Intralogistics in Supermarkets and Retail Stores

Springer Tracts in Advanced Robotics Volume 148

Series Editors Bruno Siciliano, Dipartimento di Ingegneria Elettrica e Tecnologie dell’Informazione, Università degli Studi di Napoli Federico II, Napoli, Italy Oussama Khatib, Artificial Intelligence Laboratory, Department of Computer Science, Stanford University, Stanford, CA, USA Advisory Editors Nancy Amato, Computer Science & Engineering, Texas A&M University, College Station, TX, USA Oliver Brock, Fakultät IV, TU Berlin, Berlin, Germany Herman Bruyninckx, KU Leuven, Heverlee, Belgium Wolfram Burgard, Institute of Computer Science, University of Freiburg, Freiburg, Baden-Württemberg, Germany Raja Chatila, ISIR, Paris cedex 05, France Francois Chaumette, IRISA/INRIA, Rennes, Ardennes, France Wan Kyun Chung, Robotics Laboratory, Mechanical Engineering, POSTECH, Pohang, Korea (Republic of) Peter Corke, Queensland University of Technology, Brisbane, QLD, Australia Paolo Dario, LEM, Scuola Superiore Sant’Anna, Pisa, Italy Alessandro De Luca, DIAGAR, Sapienza Università di Roma, Roma, Italy Rüdiger Dillmann, Humanoids and Intelligence Systems Lab, KIT - Karlsruher Institut für Technologie, Karlsruhe, Germany Ken Goldberg, University of California, Berkeley, CA, USA John Hollerbach, School of Computing, University of Utah, Salt Lake, UT, USA Lydia E. Kavraki, Department of Computer Science, Rice University, Houston, TX, USA Vijay Kumar, School of Engineering and Applied Mechanics, University of Pennsylvania, Philadelphia, PA, USA Bradley J. Nelson, Institute of Robotics and Intelligent Systems, ETH Zurich, Zürich, Switzerland

Frank Chongwoo Park, Mechanical Engineering Department, Seoul National University, Seoul, Korea (Republic of) S. E. Salcudean, The University of British Columbia, Vancouver, BC, Canada Roland Siegwart, LEE J205, ETH Zürich, Institute of Robotics & Autonomous Systems Lab, Zürich, Switzerland Gaurav S. Sukhatme, Department of Computer Science, University of Southern California, Los Angeles, CA, USA

The Springer Tracts in Advanced Robotics (STAR) publish new developments and advances in the fields of robotics research, rapidly and informally but with a high quality. The intent is to cover all the technical contents, applications, and multidisciplinary aspects of robotics, embedded in the fields of Mechanical Engineering, Computer Science, Electrical Engineering, Mechatronics, Control, and Life Sciences, as well as the methodologies behind them. Within the scope of the series are monographs, lecture notes, selected contributions from specialized conferences and workshops, as well as selected PhD theses. Special offer: For all clients with a print standing order we offer free access to the electronic volumes of the Series published in the current year. Indexed by SCOPUS, DBLP, EI Compendex, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.


Editors
Luigi Villani, Department of Electrical Engineering and Information Technology, University of Naples Federico II, Naples, Italy
Michael Beetz, University of Bremen, Bremen, Germany
Ciro Natale, University of Campania, Aversa, Italy
Bruno Siciliano, University of Naples, Naples, Italy

ISSN 1610-7438 ISSN 1610-742X (electronic) Springer Tracts in Advanced Robotics ISBN 978-3-031-06077-9 ISBN 978-3-031-06078-6 (eBook) https://doi.org/10.1007/978-3-031-06078-6 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To our families

Series Editor’s Foreword

At the dawn of the century’s third decade, robotics is reaching an elevated level of maturity and continues to benefit from the advances and innovations in its enabling technologies. These are all contributing to an unprecedented effort to bring robots into human environments in hospitals and homes, factories, and schools; in the field for robots fighting fires, making goods and products, picking fruits, and watering the farmland, saving time and lives. Robots today hold the promise of making a considerable impact in a wide range of real-world applications from industrial manufacturing to healthcare, transportation, and exploration of the deep space and sea. Tomorrow, robots will become pervasive and touch upon many aspects of modern life.

The Springer Tracts in Advanced Robotics (STAR) is devoted to bringing to the research community the latest advances in the robotics field on the basis of their significance and quality. Through a wide and timely dissemination of critical research developments in robotics, our objective with this series is to promote more exchanges and collaborations among the researchers in the community and contribute to further advancements in this rapidly growing field.

Luigi Villani, Ciro Natale, Michael Beetz, and Bruno Siciliano present in this book the results of REFILLS (Robotics Enabling Fully Integrated Logistics Lines for Supermarkets), a European research project. This project brought together a strong team of robotics researchers with a wide range of complementary skills and competencies to work on the development of robotic systems for the logistics of retail stores and supermarkets that are capable of operating in collaboration with humans. The volume offers an extensive view of the REFILLS development of innovative robotic technologies for in-store logistics including monitoring, transportation, and shelf refilling, in both collaborative and autonomous modalities.
The results are organized in six chapters covering principal issues ranging from the automatic acquisition of stocks, pre-sorting, and sequencing of goods to collaborative ergonomic shelf filling and fully autonomous refilling of products on shelves. These areas were all experimentally validated, with some tested in stores, a remarkable accomplishment of the REFILLS team.

The outcome is a book that is confirmed to be shining in our STAR series!

Stanford, USA, April 2022
Oussama Khatib, STAR Editor

Preface

REFILLS (Robotics Enabling Fully-Integrated Logistics Lines for Supermarkets) is a European research project that started in January 2017 and ended in December 2020. The ambition of the project was to improve logistics in supermarkets through mobile robotic systems working in close and smart collaboration with humans, addressing the main in-store logistics processes for retail shops. The goal of this edited volume is to present the principal results of the project to the scientific community as well as to operators in the logistics and retail world, sharing knowledge about the technological opportunities and the main challenges regarding robotics for logistics in supermarkets and retail stores.

Although the market for online ordering is growing, experts agree that there will always be demand for physical supermarkets, as they combine the relaxing sensory experience of shopping with the extra service of human advice. Retail and logistics companies are committed to making the shopping experience more comfortable and exciting while, at the same time, using technology to reduce costs and increase efficiency. In particular, there is still a lack of automation in the logistics management of retail stores. Most of the logistics costs arise from item handling, item transportation, shelf replenishment, and backroom management. These tasks are time-consuming, repetitive, inefficient, monotonous, and wearing for supermarket clerks but are rather complex to automate. A mobile robotic system in close and smart collaboration with humans can ideally perform all these operations, provided some challenges are solved.

The solutions proposed within REFILLS cover a number of intermediate steps, each with the potential for early exploitation and store and customer benefits, up to the desired level of automation to fulfill the logistics needs of the retail market domain, as illustrated in Chap. 1.
In particular, three scenarios are explored and discussed, with different robots applied and different processes supported, i.e.:

1. Store monitoring
2. Collaborative shelf refilling
3. Autonomous shelf refilling.

The first scenario deals with the acquisition of full information on location, stock and availability of goods within the store, by using a mobile robotic system that autonomously acquires a semantic digital twin model of the store, which is presented in Chap. 2. This operation is necessary to keep supermarket shelves stocked and minimize empty or disarranged shelves. In particular, the automatic verification that the real shelves match the ideal layout is a very challenging task and requires the development of robust and scalable techniques for the identification of thousands of different products, as discussed in Chap. 3.

The digital twin data generated by the scanning robot of the first scenario is the foundation for the second scenario, where effective product pre-sorting and sequencing are achieved with robots. The idea is to develop an automatic depalletizing cell in the backroom, described in Chap. 4, in charge of putting the cases with the products on trolleys in a suitable order. Then, a fleet of store-ready automated guided vehicles is in charge of transporting the trolleys to the proximity of the shelf to be refilled. At this point, the shelf-filling process starts. This operation is performed by the store employees supported by suitable robotic units, namely a pointing unit that sheds light on the spot of the shelf where a given product should be placed, and a handling unit that serves the case packs to the clerks with the aim of reducing the risk of musculoskeletal disorders related to the refilling operation. Chapter 5 presents the ergonomic analysis that allows selecting the best refilling process modality, for both human and robot, tailored to the anthropometric characteristics of the clerk.

Finally, the third scenario is focused on the time-consuming process of stacking individual products on the shelves, which is performed autonomously by a robotic arm.
This is a very challenging task, requiring highly reliable manipulation, in narrow spaces, of products that vary widely in size, shape, weight, and fragility, as shown in Chap. 6.

All the presented technologies have been validated in realistic environments and some of them have also been tested in real supermarkets. Further information about the project’s achievements can be found at http://www.refills-project.eu, including videos illustrating experiments. These technologies are the result of the coordinated effort of three academic partners (CREATE-University of Naples Federico II, University of Campania Luigi Vanvitelli, University of Bremen), one robotics company (KUKA Roboter GMBH), one high-technology company (INTEL Ireland LTD), and one logistics company (SWISSLOG), with the fundamental contribution of a retail store chain (dm-drogerie markt) as end user.

We hope readers will find the material contained in this volume useful to deepen their knowledge of the main logistics processes in retail stores and of the possible robotized solutions, and we would like to take this opportunity to express our sincere appreciation and warmest thanks to all those who have contributed to the success of the REFILLS project!

Napoli, Italy, March 2022

Luigi Villani Ciro Natale Michael Beetz Bruno Siciliano

Acknowledgements The research leading to the results presented in this book received funding from the European Commission under the H2020 Framework Programme, REFILLS project GA 731590.

Contents

1 Robot-Assisted Instore Operations in Grocery Retailing
Michael Sternbeck

2 Robots Collecting Data: Modelling Stores
Michael Beetz, Simon Stelter, Daniel Beßler, Kaviya Dhanabalachandran, Michael Neumann, Patrick Mania, and Andrei Haidu

3 Robots Collecting Data: Robust Identification of Products
Saksham Sinha and Jonathan Byrne

4 Robots Working in the Backroom: Depalletization of Mixed-Case Pallets
Pierluigi Arpenti, Riccardo Caccavale, Andrea Giuseppe Fontanelli, Vincenzo Lippiello, Gianmarco Paduano, Bruno Siciliano, and Luigi Villani

5 Robots Helping Humans: Collaborative Shelf Refilling
Teodorico Caporaso, Dario Panariello, Stanislao Grazioso, Giuseppe Di Gironimo, and Luigi Villani

6 Robotic Clerks: Autonomous Shelf Refilling
Alberto Cavallo, Marco Costanzo, Giuseppe De Maria, Ciro Natale, Salvatore Pirozzi, Simon Stelter, Gayane Kazhoyan, Sebastian Koralewski, and Michael Beetz

Contributors

Arpenti Pierluigi, PRISMA Lab, CREATE - Department of Electrical Engineering and Information Technology, Università degli Studi di Napoli Federico II, Napoli, Italy
Beetz Michael, Universität Bremen, Bremen, Germany
Beßler Daniel, Universität Bremen, Bremen, Germany
Byrne Jonathan, Intel R&D Ireland Ltd., Leixlip, Kildare, Ireland
Caccavale Riccardo, PRISMA Lab, CREATE - Department of Electrical Engineering and Information Technology, Università degli Studi di Napoli Federico II, Napoli, Italy
Caporaso Teodorico, CREATE and Department of Industrial Engineering, University of Naples Federico II, Napoli, Italy
Cavallo Alberto, Università degli Studi della Campania Luigi Vanvitelli, Aversa, Italy
Costanzo Marco, Università degli Studi della Campania Luigi Vanvitelli, Aversa, Italy
De Maria Giuseppe, Università degli Studi della Campania Luigi Vanvitelli, Aversa, Italy
Dhanabalachandran Kaviya, Universität Bremen, Am Fallturm 1, 28359 Bremen, Germany
Di Gironimo Giuseppe, CREATE and Department of Industrial Engineering, University of Naples Federico II, Napoli, Italy
Fontanelli Andrea Giuseppe, PRISMA Lab, CREATE - Department of Electrical Engineering and Information Technology, Università degli Studi di Napoli Federico II, Napoli, Italy
Grazioso Stanislao, CREATE and Department of Industrial Engineering, University of Naples Federico II, Napoli, Italy
Haidu Andrei, Universität Bremen, Bremen, Germany
Kazhoyan Gayane, Universität Bremen, Bremen, Germany
Koralewski Sebastian, Universität Bremen, Bremen, Germany
Lippiello Vincenzo, PRISMA Lab, CREATE - Department of Electrical Engineering and Information Technology, Università degli Studi di Napoli Federico II, Napoli, Italy
Mania Patrick, Universität Bremen, Bremen, Germany
Natale Ciro, Università degli Studi della Campania Luigi Vanvitelli, Aversa, Italy
Neumann Michael, Universität Bremen, Bremen, Germany
Paduano Gianmarco, PRISMA Lab, CREATE - Department of Electrical Engineering and Information Technology, Università degli Studi di Napoli Federico II, Napoli, Italy
Panariello Dario, CREATE and Department of Industrial Engineering, University of Naples Federico II, Napoli, Italy
Pirozzi Salvatore, Università degli Studi della Campania Luigi Vanvitelli, Aversa, Italy
Siciliano Bruno, PRISMA Lab, CREATE - Department of Electrical Engineering and Information Technology, Università degli Studi di Napoli Federico II, Napoli, Italy
Sinha Saksham, Intel R&D Ireland Ltd., Leixlip, Kildare, Ireland
Stelter Simon, Universität Bremen, Bremen, Germany
Sternbeck Michael, dm-drogerie markt GmbH + Co. KG, Karlsruhe, Germany
Villani Luigi, PRISMA Lab, CREATE - Department of Electrical Engineering and Information Technology, Università degli Studi di Napoli Federico II, Napoli, Italy

Chapter 1

Robot-Assisted Instore Operations in Grocery Retailing

Michael Sternbeck

Abstract The pressure on instore logistics systems of bricks-and-mortar grocery retail stores is bound to increase in the future due to rising shopper expectations and less customer traffic in cities caused by online competition. The future management and operations of stores will be a key factor in keeping stores viable. In this chapter, instore logistics systems as part of retail logistics networks are characterized in detail to establish the framework for instore automation approaches. The application of professional service robots for instore logistics is discussed from an overall process and end-user perspective as one approach of many to offer an answer to current challenges of operating bricks-and-mortar stores. Three scenarios are explored and discussed, with different robots applied and different processes supported, i.e., store monitoring, collaborative instore logistics, and autonomous shelf filling. Significant potential is explored by digitally replicating the store, adapting the instore logistics process organization and physically supporting instore operations. Moreover, managerial aspects are discussed, such as the profound involvement of store employees in shaping changes to instore operations.

Keywords Instore operations · Instore automation · Digital store twin · Autonomous transportation · Collaborative robotics

M. Sternbeck (B) dm-drogerie markt GmbH + Co. KG, Am dm-Platz 1, 76227 Karlsruhe, Germany e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 L. Villani et al. (eds.), Robotics for Intralogistics in Supermarkets and Retail Stores, Springer Tracts in Advanced Robotics 148, https://doi.org/10.1007/978-3-031-06078-6_1

1.1 Introduction

The bricks-and-mortar grocery retail environment is becoming increasingly challenging for retail companies. This is the result of several factors.

First, competition is increasing with growing market consolidation. Better volume-dependent purchasing prices at the suppliers are passed on to customers, resulting in lower sales prices, which intensifies competition especially in the field of very price-sensitive basic grocery assortments. This raises the pressure on store operations as one of the largest cost blocks of retail companies.

Second, additional online sales channels are being established both by existing traditional retail companies and by new market entries of online pure-play startups. Major investments are being made in order to transform formerly exclusive store-based retailers into integrated omnichannel companies and to participate in the expected accelerating growth of online sales in the grocery sector. A shift of demand to online channels and decreasing store traffic in city centers, however, raise the pressure on existing bricks-and-mortar retail spaces to operate more economically.

Third, customer expectations are rising. Besides the basic customer requirements of well-stocked shelves and a high on-shelf availability, customers will ask for more digital support before and during their store shopping trips, e.g., online display of availability of the products requested in the store selected, routes to exact positions of specific products in digital store maps, or the ability to place the shopping order via a smartphone application and collect the order a little later in the store (Click&Collect).

Fourth, exogenous factors complicate retail operations. The faster diffusion of trends and ad-hoc demand peaks through social media, for example, and shorter product life cycles even in the grocery assortment, generate special operational challenges for geographically decentralized store networks. In addition, product diversity is increasing as more new products and categories tend to be included in the store assortments.

In summary, this implies that on the one hand customer expectations and requirements are rising and market trends are generally leading to greater operational efforts for store-based retail companies.
On the other hand, less leeway for additional operational expenditure in the stores is available, generating the need to increase efficiency due to the market conditions and customers switching sales channels.

Against the backdrop of these challenges and the resulting pressure from the performance and cost sides, retail companies are searching for new methods and opportunities to cope with customer requirements and keep their business profitable. Supply chain and especially store operations play a major role in the search for effective and efficient approaches to further develop the store-based retail business and make it future-proof. The store is the central customer-facing point of bricks-and-mortar retail companies and is therefore essential for the provision of customer services and satisfaction. Moreover, compared to manufacturers of physical products, logistics and handling operations in retail companies represent a larger share of total costs. Within retail supply networks, instore logistics is responsible for the largest operative cost pool [23]. Instore operations are therefore a highly relevant area with a strong focus on new innovative solutions to ensure future store-based services and competitive store cost structures.

Today, in contrast to retail warehouses, instore execution, from the receipt of store deliveries, initial shelf filling, backroom management and shelf replenishment with intermediately stored products to shelf layout reconfigurations and product counts, is performed completely manually without any physical technical support. This was the starting point of the EU-Horizon 2020 project “REFILLS” (Robotics Enabling Fully-Integrated Logistics Lines for Supermarkets), in which
the consortium considered three scenarios to explore the use of autonomous robotic and software technologies to support the instore execution processes. The aim of the project was to develop robot technologies and software that support the operational execution processes within grocery retail stores. From an overall retail supply chain perspective, the goal of the project is to better include the retail store in the technological process chain of bricks-and-mortar retail companies and to generate and use store-based operational data to improve the instore logistics processes in a quantitative and qualitative manner. This data-based reflection of the store conditions and processes should also be used at the suppliers, in retail Distribution Centers (DCs) and in transportation to realize a comprehensive supply chain integration strategy in which the physical point of sale plays the role it deserves. These approaches are intended to be one field of action in order to respond to the challenges described above and to keep the bricks-and-mortar store network viable and competitive in the future. This chapter establishes the framework for the REFILLS developments from a practical end user perspective and describes the context for the more technical contributions in the following chapters. The first goal of this chapter is to create an understanding for the research setting by characterizing and illustrating the typical design of grocery retail supply chains. We will especially describe the store processes and challenges in the current manual operational environment to achieve high quality of execution and process efficiency. Based on this, the second goal of this chapter is to derive approaches for meaningful automation support of instore logistics processes. Three scenarios are described and contextualized that were the focus of the REFILLS consortium, i.e., store monitoring, collaborative instore logistics and autonomous shelf filling. 
Moreover, a few general aspects are considered that go along with the change resulting from the application of professional service robotic technologies without customer interaction in stores from a retailer’s perspective. The remainder of this chapter is organized as follows. First, in Sect. 1.2 grocery retail supply networks are roughly characterized before we shift the focus to the detailed description of instore logistics processes in Sect. 1.3. In Sect. 1.4 we then go on to describe three possible robotic solution approaches, which are reflected in the three scenarios considered in the REFILLS project. In Sect. 1.5 we illustrate some of the managerial aspects that need to be taken into account when intending to apply professional service robots to support instore operations. Section 1.6 concludes the chapter.

1.2 Characterization of Grocery Supply Networks The supply networks currently operated by large grocery retail companies have evolved over the last few decades and have increasingly become focused on the requirements of the stores. We will briefly describe constitutional aspects of internal upstream supply networks and their effects on store supply.

Fig. 1.1 Example of a retail supply network with three distribution stages

Today, most European grocery retail companies operate supply networks consisting of central DCs (supplying all stores) and regional DCs (supplying an allocated subset of stores) (see Fig. 1.1) [17, 23, 33]. Sometimes companies operate more than two distribution stages, e.g., three stages with a central DC, regional DCs (stage 1) and regional DCs (stage 2), or even more. In most network configurations each DC is supplied directly by the manufacturers with full, unmixed pallets or at least in larger quantities per Stock Keeping Unit (SKU). Each product is normally shipped exclusively via one specific distribution stage. In the corresponding DCs the products are stored only for a limited time period of a few days on average before they are transferred to the picking area, where they are put together according to the store orders. The pallets prepared for an individual store are then shipped to the point of sale either directly or via internal consolidation points in which the transportation flows of several DC types are bundled for consignment to the stores. The result is a differentiated network of retail warehouses and consolidation points via which the vast majority of products are channeled to the stores. Only a very limited volume is still supplied directly by the manufacturers to grocery stores.

In general, slow-moving items with a high value density are shipped via central DCs to the stores. The larger the sales volume per SKU and the larger the physical product size, the more decentralized the distribution stage via which the product is routed to the stores, especially when product values are low. The goal of product allocation to distribution stages is to balance the costs of transportation, inventory holding and handling in the entire network [17]. Each store receives deliveries from the different DC types with the corresponding assortment allocated.
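The cost balance behind product allocation can be illustrated with a deliberately simplified sketch. It assumes that, for one SKU, the three cost components named above are known per distribution stage and that the stage with the lowest sum is chosen. The class `StageCost`, the function `cheapest_stage`, and all numbers are hypothetical and serve illustration only; this is not the allocation model of [17], which considers the entire network rather than one SKU at a time.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class StageCost:
    """Hypothetical cost per case pack of routing one SKU via a distribution stage."""
    transportation: float
    inventory_holding: float
    handling: float

    @property
    def total(self) -> float:
        # Sum of the three cost components named in the text.
        return self.transportation + self.inventory_holding + self.handling


def cheapest_stage(costs: Mapping[str, StageCost]) -> str:
    """Return the name of the stage with the lowest summed cost."""
    if not costs:
        raise ValueError("at least one distribution stage is required")
    return min(costs, key=lambda stage: costs[stage].total)


# Illustrative numbers only (assumptions, not data from the chapter).
# Slow mover with high value density: holding stock in many regional DCs is expensive.
slow_mover = {
    "central": StageCost(transportation=0.30, inventory_holding=0.05, handling=0.10),
    "regional": StageCost(transportation=0.20, inventory_holding=0.40, handling=0.10),
}
# Fast-moving, bulky item: transportation dominates, so a regional DC is cheaper.
fast_mover = {
    "central": StageCost(transportation=0.30, inventory_holding=0.05, handling=0.10),
    "regional": StageCost(transportation=0.12, inventory_holding=0.08, handling=0.10),
}

if __name__ == "__main__":
    print(cheapest_stage(slow_mover))
    print(cheapest_stage(fast_mover))
```

With these assumed figures the slow mover is allocated to the central DC and the fast mover to the regional DC, which mirrors the general tendency described in the text.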
Another relevant network decision that is interdependent on the allocation of products to DC types is related to the packaging unit per SKU that is used as order and delivery unit for the stores [3, 19, 32, 35, 36]. Packaging is generally considered to have a significant influence on efficiency in the logistics subsystems of a retail supply

1 Robot-Assisted Instore Operations in Grocery Retailing


Fig. 1.2 Functional perspective on retail network operations with mirrored processes in DC and store [33, 36]

chain [15, 30]. More specifically, the number of Consumer Units (CUs) that are bundled in the packaging unit used for store delivery impacts DC and store operations. The majority of the delivery volume to grocery stores is shipped in the case pack sizes offered by the manufacturers. The SKUs for which the case pack size offered by the supplier is not considered appropriate from a sales or handling perspective are unpacked in a DC and broken down to sub-packages or even to the individual CU [3]. The picking systems have to be designed and configured accordingly [2]. Reusable boxes that circulate between stores and DCs are often used for sub-packages and individual CUs. The products are packed and shipped to the stores in these reusable boxes. For the stores, this results in receiving different loading units with SKUs in different packaging units. As the packaging unit can only be ordered by stores in integer multiples, it determines the possible granularity of order sizes and thus how precisely order volumes can match actual store demand or forecasts. This in turn affects the store processes of initial shelf filling and restocking.

Furthermore, store delivery patterns are highly relevant with regard to the store supply characteristics of the upstream retail network [16–19, 33]. A store delivery pattern specifies on which days during the week a store receives deliveries from the different DC types, with the corresponding assortment. The resulting delivery frequency per DC type impacts operations in all logistics subsystems in a retail supply chain. For the store, the delivery frequency influences the number of shipments, the overall volume per shipment and per SKU, and thus impacts initial shelf filling and restocking efforts [33].
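The effect of the packaging unit on order granularity can be illustrated with a small sketch. It assumes, purely for illustration, that a store rounds its demand up to the next whole case pack; the function name and the example demand of 17 CUs are hypothetical and not taken from the chapter.

```python
def order_in_case_packs(demand_cu: int, case_pack_size: int) -> tuple[int, int]:
    """Round a demand (in CUs) up to whole case packs.

    Returns (case packs ordered, surplus CUs beyond the demand).
    Rounding up is an assumed ordering rule, not a rule stated in the text.
    """
    if demand_cu < 0 or case_pack_size <= 0:
        raise ValueError("demand must be >= 0 and case pack size > 0")
    packs = (demand_cu + case_pack_size - 1) // case_pack_size  # integer ceiling
    surplus = packs * case_pack_size - demand_cu
    return packs, surplus


if __name__ == "__main__":
    # Same hypothetical demand, different order units.
    for size in (1, 6, 12, 24):
        packs, surplus = order_in_case_packs(17, size)
        print(f"case pack of {size:>2} CUs: order {packs} pack(s), surplus {surplus} CUs")
```

The larger the order unit, the coarser the match between order volume and demand, and the more likely it is that surplus units end up as overflow stock in the store.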
Overall, the configuration of the upstream logistics network with regard to physical network layout and further planning variables (case pack sizes, delivery patterns) considerably impacts the processes that subsequently take place instore. The complete physical tasks that take place within internal retail supply chains are shown from a functional perspective in Fig. 1.2. Structurally the processes in the DCs are performed in a mirrored manner in the stores. Products that are taken from storage and put on pallets or roll cages in the DCs are unpacked in stores and the products are put on the shelves. Although the


Fig. 1.3 DC picking robot with innovative gripper to grab up to four case packs in parallel [8]

processes are comparable from a functional point of view, the way the processes are performed and the intensity of technology use are very different. Looking into the past, we recognize that retail companies heavily invested in new automation technology in their DCs, while the process design and technology in the stores remained unchanged. For example, in 2020 dm-drogerie markt began operating a new DC in which picking is performed via innovative picking robots for a large proportion of the assortment allocated [8]. This involves taking all the products that arrive in the DC from the supplier pallets automatically and arranging them in an integrated automatic retrieval system individually in the sequence required by the picking robots to physically build the mixed SKU pallets for the respective store. This means that each pallet requires a detailed digital representation: an algorithm builds the pallet digitally before the physical packing process begins. The specialized grippers of the picking robots are the most innovative part of the automated DC. Up to four picking units are lined up and picked by the robot’s fingers. The picking units are then placed on the pallet from two sides. To ensure that it is filled to capacity, the pallet is lowered for the next picking unit so that the robot always works at a certain height. As the packing process progresses, the picking units are automatically secured using stretch wrapping for further transportation on the pallet [8]. Consequently, at least for some assortments no manual handling effort is required from the point of unloading incoming goods to the loading of outgoing pallets onto the trucks that supply the stores. Figure 1.3 shows the gripper of the picking robots in the DC of dm-drogerie markt.
The description of this innovative technology applied in a DC helps to illustrate the discrepancy of DC automation compared to the stores that are currently operated without the help of robots to support instore logistics tasks. This realization was


the starting point of the REFILLS project. Before we elaborate on potential robotic solutions to support store operations we dive more deeply into the instore logistics processes in the following section.

1.3 Instore Logistics Systems and Challenges

Logistics tasks that are carried out in the stores themselves take on a very important role in fulfilling customer requirements in an efficient manner within a retail supply chain. Besides cost intensity, one reason why grocery retail companies are increasingly focusing on their instore logistics processes is the increasing pressure on grocery retail spaces. In comparable time frames (10 years) between 2004 and 2015, grocery revenues in Germany increased by 25% [13] and the average number of products offered in a grocery store rose by 15% [9]. However, retail spaces only increased by 7% in the corresponding period [14]. This rising productivity, measured in sales per square meter, accompanied by growing complexity, measured in the number of products per square meter, illustrates the challenges and necessity of operating a store efficiently.

We define instore logistics as all planning, execution and monitoring tasks that go along with the physical product flows through retail stores and the corresponding informational and data processes. The physical product flows through the logistics subsystem store begin with goods receipt processes at the ramp of the store and end with customers carrying the products out of the store or handing over the products to downstream logistics subsystems, e.g., carriers for delivering online orders to customers’ homes. The design and execution of the processes within these cornerstones are organized slightly differently at different grocery retail companies. The basic tasks and challenges, however, are largely comparable.
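As a rough worked example, the growth figures cited in this section can be combined into implied growth rates per square meter. This is only a back-of-the-envelope sketch: the three percentages come from different sources and only comparable time frames, so treating them as directly divisible is an assumption, and the function name is illustrative.

```python
# Growth figures quoted in the text (as fractions).
REVENUE_GROWTH = 0.25     # grocery revenues in Germany [13]
ASSORTMENT_GROWTH = 0.15  # average number of products per store [9]
SPACE_GROWTH = 0.07       # retail space [14]


def density_growth(quantity_growth: float, space_growth: float) -> float:
    """Implied growth of a per-square-meter ratio, given growth of numerator and space."""
    return (1 + quantity_growth) / (1 + space_growth) - 1


sales_per_sqm_growth = density_growth(REVENUE_GROWTH, SPACE_GROWTH)
products_per_sqm_growth = density_growth(ASSORTMENT_GROWTH, SPACE_GROWTH)

if __name__ == "__main__":
    print(f"sales per square meter:    +{sales_per_sqm_growth:.1%}")
    print(f"products per square meter: +{products_per_sqm_growth:.1%}")
```

Under these assumptions, sales per square meter grew by roughly 17% and products per square meter by roughly 7.5%.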

1.3.1 Instore Logistics Processes

DC deliveries to stores are often planned with time windows prearranged in a balance of degrees of freedom between the transportation and store subsystems so that store employees can base their further instore activities on the delivery times expected. In contrast to direct-to-store deliveries from manufacturers, there are usually no permanent and detailed inspections of the individual SKUs delivered. As DC store deliveries in the case of fully owned outlets do not cross any company boundaries, permanent inspection efforts would incur more costs compared to the expected benefit generated. Generally, only the incoming pallets with the corresponding serial shipping container codes are checked, often supported by scanning devices. In the case of buffer time included between the receipt of the shipment and initial shelf filling of the SKUs delivered, the unpacked pallets are stored provisionally in the backroom. If initial shelf-stacking operations are synchronized with the store


delivery and scheduled right after the planned delivery time, the pallets are mostly brought directly onto the sales floor. In some cases, especially if the pallets contain products of several different categories, manual presorting activities are carried out in the backroom before the products are transported to the corresponding shelves. After arrival of the products on the sales floor, initial shelf stacking operations begin, i.e., the first attempt to stack the products on the shelf after delivery. In detail this means that a store employee takes a SKU from the pallet, opens the package (if applicable), identifies the product, searches for the slot on the shelf, and walks to the front of the corresponding shelf section. Next, the free shelf capacity for the specific SKU is checked. If free shelf space is available, depending on the store format and shelf layout, either complete shelf-ready-designed case packs or individual CUs are stacked on the shelf until the shelf space available is filled completely or the entire delivery of the specific SKU is placed on the shelf. This is repeated until the SKUs of the complete delivery are processed accordingly. In the event that there are overflow items, i.e., CUs of a SKU that do not fit on the shelf in the process of initial shelf filling, they are put back on pallets, shopping carts or roll cages and transported into the backroom for intermediate storage. At a later point in time, after free shelf space becomes available due to customer purchases, the products are brought back to the sales floor from the backroom and the CUs stored intermediately are stacked on the shelf in a second attempt until all remaining products are stacked or the available shelf capacity is full. In addition, there are significant disposal activities. 
During the unpacking process recyclable packaging material is collected and separated, especially cardboard packaging of carton case packs, transparent foil stretched around pallets for securing the cargo load, and coloured (hard) plastic of trays and subpackages. This material is carried into the backroom with the help of special collection bins or bags and often condensed with the help of pressing technology before being loaded onto a truck. Figure 1.4 illustrates the main physical instore logistics tasks. Figure 1.5 roughly illustrates the physical instore logistics processes and selected information flows. From this process overview it becomes apparent that physical instore product flows are first of all linear until the point that shelf capacity is insufficient to accommodate the products that should principally be stacked on the shelf at that point in time. From an efficiency point of view it is therefore the goal to maximize the benefits of the linear flow of products from the goods receiving area directly to the shelves while minimizing the additional efforts of handling products in the loop between sales floor and backroom. There is still significant leeway to better realize this objective, which is also derived from the cost structure of the instore logistics processes.
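The linear flow with its overflow loop between sales floor and backroom can be sketched as a small simulation for a single SKU. This is a simplified illustration under assumed rules (one restocking attempt after each block of sales, no new deliveries in between); the class and function names and all quantities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ShelfSlot:
    capacity: int  # maximum number of CUs that fit on the shelf for this SKU
    on_shelf: int  # CUs currently on the shelf

    def free(self) -> int:
        return self.capacity - self.on_shelf


def fill_shelf(slot: ShelfSlot, units: int) -> int:
    """Stack as many units as fit on the shelf; return the overflow."""
    placed = min(units, slot.free())
    slot.on_shelf += placed
    return units - placed


def simulate(slot: ShelfSlot, delivery: int, sales_between_attempts: list[int]) -> tuple[int, int]:
    """Initial shelf filling followed by restocking attempts from the backroom.

    Returns (CUs left in the backroom, number of restocking trips made).
    """
    backroom = fill_shelf(slot, delivery)  # first attempt: linear flow to the shelf
    restock_trips = 0
    for sold in sales_between_attempts:
        if backroom == 0:
            break  # nothing left in the loop between backroom and sales floor
        slot.on_shelf -= min(sold, slot.on_shelf)  # customer purchases free up space
        backroom = fill_shelf(slot, backroom)      # second and later attempts
        restock_trips += 1
    return backroom, restock_trips


if __name__ == "__main__":
    slot = ShelfSlot(capacity=20, on_shelf=8)
    print(simulate(slot, delivery=24, sales_between_attempts=[5, 4, 6]))
```

In this hypothetical run, a delivery of 24 CUs onto a shelf with 12 free slots creates 12 overflow CUs, which need three additional restocking trips before the backroom is empty.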

1.3.2 Instore Logistics Costs

Instore logistics costs account for the highest share of operational costs within internal retail supply chains. In an empirical study of 28 European grocery retail companies, it turned out that on average 48% of operational logistics costs are incurred on the


Fig. 1.4 Illustration of physical instore logistics tasks

last few yards of the entire internal retail supply chain within the stores [23]. The DC accounts for 28% of the operational supply chain costs and transportation for 24% on average, respectively (see Fig. 1.6). This is the result of the significant manual working time required to perform the fine granular logistics tasks on a CU level described above. Up to more than 40% of the store employees’ working hours are spent on instore logistics tasks [29]. Against the backdrop of increasing competition within the bricks-and-mortar retail sector and increasingly also with online retailers, it is obvious that the pressure on stores is rising and retail companies are intensively searching for new ways to increase instore operational efficiency. Analyzing the occurrence of instore logistics costs in more detail, it becomes clear that the processes of initial shelf filling, restocking overflow items out of the backroom and processes around goods receipt, internal transport and sorting are by far the most costly, followed by further instore logistics tasks such as order placements, product counting, backroom organization and instore picking of online orders. In grocery retailing, costs for capital tied up in inventories also play a minor role compared to handling costs of shelf filling and restocking. Figure 1.7 demonstrates the very rough proportions of time needed and costs incurred in the three main operational processes internal to the store.


Fig. 1.5 Typical design of grocery instore logistics system, based on [22]

The costs for initial shelf filling are driven by a fixed cost component per case pack or order line (depending on SKU sorting on DC pallets) and variable shelf-filling costs per unit stacked on the shelf. This corresponds to CUs stacked when individual products are placed on the shelf. The fixed cost component reflects the labor time for the processes of picking up the case pack from the DC pallet, identifying the SKU, opening the package, walking to the corresponding shelf section and searching for the right slot on the shelf. The variable cost component represents the manual activity of


Fig. 1.6 Occurrence of operational supply chain costs per internal retail logistics subsystem [23]

Fig. 1.7 Rough proportions of time spent and costs incurred in the three main store-internal operational processes

placing one unit after the other on the shelf. The fixed costs of shelf filling including the orientation and walking times of the employees account for roughly half of the entire shelf-filling costs—without considering opening the case packs. With this cost structure, there is a clear incentive to achieve fixed cost degression via larger case packs or larger shipment sizes, which could be realized by reducing delivery frequency when applying a common periodic review, reorder-level-based inventory policy [5, 34]. However, this approach works only as long as there is sufficient shelf space available to realize these economies of scale. If the delivery size per SKU is larger than the shelf space available at the time of initial shelf filling, overflow stock results that has to be taken back into the backroom. Fixed costs occur in this process in the event of overflowing items at the time of initial shelf filling. These fixed costs comprise identifying and organizing the right storage location, for example on a roll cage, and additional walking distances. Later, when new attempts are made to put the products from backroom storage on the shelves, comparable to initial shelf-filling cost structures, fixed costs arise per restocking task per SKU and variable costs per individual unit stacked on the shelves.


Restocking cost rates are often higher than the cost rates of initial shelf filling for several reasons. First, restocking overflow items is often carried out during opening hours, whereas initial shelf filling is often scheduled before the store opens. This implies that operations take longer, as more attention must be paid to the customers. Second, restocking from the backroom is often carried out by store personnel with higher hourly rates. Third, sorting of the products on the roll cages, often no longer bundled by case pack, is not easy to achieve and maintain, so that searching for the products and organizing the storage takes a lot of time. It becomes apparent that the first overflow item of a SKU triggers more additional fixed costs than are incurred for one complete case pack in the initial shelf-filling process. This is the reason why retail companies try to find the best trade-off between realizing economies of scale in instore operations and avoiding overflow inventory. From today’s perspective, there is still considerable untapped efficiency potential in this field. This is an enormously challenging task, as the interplay between the main efficiency drivers, i.e., the case pack size selected as order unit, the delivery pattern applied and the shelf space allocated, is extremely complex in a stochastic and dynamic world. Moreover, the selection of the packaging unit as order and delivery unit and the application of certain delivery patterns also have a large influence on DC and transportation efficiencies, necessitating integrative supply chain planning approaches.
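The cost structure described in this section (fixed costs per case pack, variable costs per unit stacked, and additional fixed and variable costs once overflow occurs) can be sketched as a simple cost function. All cost rates below are hypothetical placeholders chosen only so that the qualitative statements of the text hold; they are not empirical values from the chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRates:
    fixed_per_case_pack: float      # pick up, identify, open, walk, search slot
    variable_per_cu: float          # placing one unit on the shelf
    fixed_per_overflow: float       # organizing backroom storage for an overflow SKU
    fixed_per_restock: float        # one restocking task per SKU
    variable_per_cu_restock: float  # placing one unit during restocking


def handling_cost(case_packs: int, case_pack_size: int, free_shelf_cu: int,
                  rates: CostRates, restock_tasks: int = 1) -> float:
    """Handling cost of one SKU delivery, including overflow handling if needed."""
    delivered = case_packs * case_pack_size
    placed = min(delivered, free_shelf_cu)
    overflow = delivered - placed
    cost = case_packs * rates.fixed_per_case_pack + placed * rates.variable_per_cu
    if overflow > 0:
        cost += rates.fixed_per_overflow
        cost += restock_tasks * rates.fixed_per_restock
        cost += overflow * rates.variable_per_cu_restock
    return cost


# Hypothetical rates in currency units, for illustration only.
RATES = CostRates(0.20, 0.02, 0.15, 0.25, 0.03)

if __name__ == "__main__":
    print(round(handling_cost(2, 12, free_shelf_cu=24, rates=RATES), 2))  # everything fits
    print(round(handling_cost(2, 12, free_shelf_cu=20, rates=RATES), 2))  # 4 CUs overflow
```

With these placeholder rates, four overflow units raise the handling cost of the delivery by half, which mirrors the trade-off between economies of scale and overflow avoidance discussed above.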

1.3.3 Specifics of Instore Logistics Systems

In an abstract sense and from a purely operational perspective, the store can be interpreted as a logistics storage and handling system with very special conditions and requirements. In contrast to other logistics facilities, a store is primarily designed for customers and not organized according to a purely operational handling and productivity perspective. The main designated storage space in the store is the shelf space assigned to specific SKUs. Contrary to warehouses, logistics departments are not the only stakeholders in the use of the shelves and in dimensioning shelf capacities per SKU. A lot of other influencing factors, especially from marketing and category management, have to be taken into consideration in the product listing, locating and facing processes individual to a specific store [21]. The number of facings describes how often the items of a particular SKU are placed next to each other on the shelf. Together with the depth of the shelf and potential multiple product layers, this determines the overall shelf space capacity for a specific SKU. Integrating the perspectives of several stakeholders makes shelf space planning very complex [20]. It is not only the shelf planograms that are mainly designed for customers: the entire shop design and organization is also more focused on the customer than on operational efficiency. For example, aisles must be kept clear, which results in longer walking distances to pallets or roll cages. The shelves should always look well filled and tidy without empty packaging material, and the products should ideally be placed right at the front of the shelf so that the shelf always looks attractive


and the products are easily accessible for the customers. This generates activities on the retail shop floor that would not be necessary in DCs or other industrial logistics sites. In the co-production process the customers are responsible for picking the desired products from the shelves and placing them in their personal shopping baskets. The fact that customers have to integrate themselves in the service production process of retailers creates further complexity and proneness to error. For example, counting of products by store employees is critical when customers are in the store, as products in the shopping baskets cannot be recognized. Due to customer behavior in the store, products may, for example, be misplaced when they are put back on the shelf by customers themselves. In the rare cases of theft, inventory records become inaccurate and, if not corrected, lead to lower service levels. Of course, retail stores are a vital point of interaction between customers and the retail company. Retailers that put the customer at the center of their considerations and activities are rewarded by high customer loyalty and satisfaction and, associated with this, economic success as a logical result. Nevertheless, from a pure instore logistics point of view, customer integration makes store operations more challenging and is an important factor in the high share of instore logistics costs.
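The relationship between facings, shelf depth and product layers described earlier in this section can be written as a one-line capacity calculation. The sketch assumes that units are lined up behind each facing and that only whole units fit into the shelf depth; the function name, parameters and dimensions are illustrative.

```python
def shelf_capacity(facings: int, shelf_depth_cm: float,
                   product_depth_cm: float, layers: int = 1) -> int:
    """Shelf space capacity of one SKU in CUs.

    capacity = facings x units that fit one behind another x stacked layers
    """
    if facings < 0 or layers < 0 or product_depth_cm <= 0:
        raise ValueError("invalid shelf or product dimensions")
    units_deep = int(shelf_depth_cm // product_depth_cm)  # only whole units fit
    return facings * units_deep * layers


if __name__ == "__main__":
    # Hypothetical SKU: 3 facings, 40 cm shelf depth, 6.5 cm product depth, 2 layers.
    print(shelf_capacity(3, 40.0, 6.5, layers=2))
```

Because marketing and category management influence the number of facings, the resulting capacity is not a purely logistical decision variable.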

1.3.4 Past Developments and Status Quo of Instore Logistics

While in the past the understanding used to be that retail logistics ended at the ramp to the store, the focus has increasingly shifted to a comprehensive perspective in which instore operations are no longer considered to be just a sales function, but a highly relevant integral part of the retail supply chain. Numerous improvements in instore logistics planning and execution have already been achieved [27]. At dm-drogerie markt, for example, new integrative supply chain planning systems have been introduced in recent years to calculate appropriate order and delivery units per SKU, to allocate SKUs within the network and to define suitable order and delivery patterns. All planning approaches reflect the stores as well as possible with their specific requirements and anticipated system behavior using current data available. The result has been a significant increase in efficiency of the entire supply chain as well as supply chain performance. The average storage volume in store backrooms has been reduced dramatically over the last few years, for instance, with the accompanying effect of significantly reduced restocking efforts without any negative effects on out-of-shelf rates. This has been possible by better reflecting store conditions and requirements in very data-intensive operational planning models without applying radical changes in process organization or technology. In past and current operational improvement approaches in grocery retailing, the behavior of the instore logistics system is anticipated, and the resulting approximations and assumptions are included in operational planning models. The concrete process steps of shelf filling and restocking activities are mostly assumed to be carried out at deterministic times, and shelf inventory volumes are often integrated into models with the expected values and distribution assumptions. As real-time data


Fig. 1.8 Intra-store operations as a black-box system from a data perspective

about the state of the instore logistics system with regard to inventory volumes, locations and free shelf capacity are not available, there has so far not been much scientific and analytic support to control the individual instore logistics production steps. How to execute instore logistics processes in concrete and specific situations is mostly left to the discretion of the store employees. There are of course best-practice procedures shared between the stores, suggestions and proposals generated by empirical tests and a huge wealth of experience gained by the employees to operate the store in an appropriate manner. However, the instore logistics systems of grocery retail companies are largely black-box operations from a data-based production control perspective (see Fig. 1.8). In the DCs the products picked for a store are captured automatically, recorded and provided for the store in a dispatch advice. Scanning the pallets in the transportation chain and in the store makes it possible to check that the right pallet reaches the right store, and the products on the pallet are booked into the store merchandising IT system. While in automated DCs each movement is reflected in IT systems, in current store process configurations the next regular digital recording point is mostly the scanning process of the individual CUs captured at the cash desks during customer checkouts. All the instore logistics operations that take place in between are not currently controlled and supported by actual data about the system states and are therefore not included in operational data-based decision support models with a short planning horizon and high updating frequency. There are some more capturing events to ensure high inventory record accuracy of the stores, e.g., bookings to reflect customer returns, breakage, own store consumption, give-away products, products picked for online orders, product counting and DC returns.
Accurate checkout scans and inventory bookings are intended to keep instore system stock data as accurate as possible. There are further special product counting procedures in place with the goal of enhancing inventory accuracy, i.e., no or low discrepancy between a store’s actual and recorded inventory levels [6]. These measures are meant to ensure accurate inventory data of the entire black-box system as a whole but without any differentiation


with regard to different locations and processes in the store. Nevertheless, inventory inaccuracy rates range between 30 and 60% of SKUs, with implications for inventory management, instore handling efforts and customer service levels [6, 7]. Continuous improvement, further refinement of supply chain planning systems and many incremental innovations in instore handling and organization have led to the current point at which it is becoming ever more apparent that for significant further improvements and efficiency gains, new methods and ideas are necessary with regard to data generation and technology application in stores.
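An inventory inaccuracy rate of the kind quoted above can be computed as the share of SKUs whose recorded stock differs from the physically counted stock. The following sketch assumes this simple definition (any non-zero discrepancy counts as inaccurate); the cited studies may use other thresholds, and the SKU names and quantities are invented for illustration.

```python
def inaccuracy_rate(recorded: dict[str, int], actual: dict[str, int]) -> float:
    """Share of SKUs whose recorded inventory differs from the actual inventory."""
    skus = recorded.keys() | actual.keys()
    if not skus:
        return 0.0
    mismatched = sum(1 for sku in skus if recorded.get(sku, 0) != actual.get(sku, 0))
    return mismatched / len(skus)


if __name__ == "__main__":
    recorded = {"A": 10, "B": 5, "C": 0, "D": 7}  # system stock (hypothetical)
    actual = {"A": 10, "B": 4, "C": 2, "D": 7}    # counted stock (hypothetical)
    print(f"{inaccuracy_rate(recorded, actual):.0%} of SKUs have inaccurate records")
```

Note that such a store-level figure says nothing about where in the store the discrepancy arises, which is exactly the black-box limitation described in this section.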

1.3.5 Challenges for Instore Automation Approaches

Essential initial findings can be derived from the descriptions above for future instore logistics improvement and automation approaches. The overall goal in future developments of instore operations is to raise the processes to a new level of efficiency and find new ways to offer and support new instore services, especially in the context of omnichannel retailing. The approach of REFILLS is to help reach these long-term goals by establishing the foundation for the application of new instore technology. The perspective is to apply instore robot technology that is properly embedded in the instore process landscape and enables new methods of effective and efficient store operation. However, based on the characterization of retail networks and instore logistics tasks and costs described, specific challenges can be derived that have to be overcome by the robotic solutions developed.

First, a store network is, compared to industrial production sites, highly decentralized. For example, dm-drogerie markt operates more than 2,000 stores in Germany alone, which implies that there is no option to realize economies of scale by bundling the production processes in only a limited number of highly automated sites. A solution has to be designed such that it can be applied and financed at a very large number of locations.

Second, the store is a challenging environment for robots due to customer-centric store design with customer movements, no technicians permanently on site and, compared to industrial locations, very limited space available both in the backroom and on the sales floor. Robots working in a store environment with humans, both clerks and customers, impose high safety requirements.

Third, the product range handled in a grocery store is a challenge for automation technology. A high number of SKUs are listed that demonstrate wide heterogeneity with regard to product sizes, weight, packaging, form, product consistency and pressure sensitivity. Moreover, the assortment listed in a store is highly dynamic, with increasing numbers and speed of listings and de-listings.

Fourth, there is no continuous production process in the stores. For example, dm-stores are supplied two to three times per week on average, although the stores are open on six days. Even if the store is organized such that shelf-filling operations take place every day, the filling volume might be very high in the early morning before the


store opens, while only a few restocking activities may be carried out during opening hours. This implies that there is only selective utilization of any instore automation technology that is intended to support instore logistics. Of course, low utilization influences amortization of technology. Facing these challenges, the REFILLS project aims to cope with the specific instore conditions and to develop initial automated technical prototypes and solutions intended both to support instore handling and to take over manual tasks completely. The following section describes the REFILLS approaches from an overall, strategic perspective, focused on business impact and process integration from an end user perspective, instead of going into technical details.

1.4 Automation Approaches to Support Instore Logistics

The REFILLS project started with in-depth observation and execution of existing instore logistics processes in two dm-stores in Munich. The entire consortium worked in the stores and exchanged information with the employees intensively. This start was highly relevant to develop common ground and comprehensive understanding for further developments, learn about the challenges, and shape the common mindset of the project participants. Building on this experience the consortium developed three project scenarios that formed the core of the entire project (Fig. 1.9). In the following, these three scenarios are described with regard to the respective approach, interconnection with current practices, as well as the expected benefit and business impact for instore operations from an end user’s retail perspective. The three scenarios are denoted as follows:

1. Scenario 1—Store monitoring
2. Scenario 2—Collaborative instore logistics
3. Scenario 3—Autonomous shelf filling

Fig. 1.9 REFILLS scenarios


1.4.1 Scenario 1—Store Monitoring

It became clear right at the beginning of the REFILLS project that light has to be shed on the instore black-box operations before thinking about further automation approaches. More detailed data are required than are currently available to describe the store as a production environment and to reflect current states of the instore logistics system. This information is the necessary prerequisite for creating a data-based foundation for new instore logistics planning and process control approaches, and for applying prospective automation technology and robots that support physical activities. Generating differentiated data about store layout and states is therefore seen as the key to taking instore logistics to a new level of effectiveness and efficiency. This is the aim of the REFILLS Scenario 1.

1.4.1.1 Scenario 1—General Idea

The general idea of Scenario 1 is to generate digital store twins at high frequency as a data basis for new logistics planning and execution approaches and new services. Such a semantic digital store twin represents a store digitally with regard to state-dependent information needed in a new vision of how instore logistics could be managed in the future. From an operational control perspective it seems to be especially beneficial to know where in the store which items are placed in which volumes at a certain point in time. This is the crucial information for applying new operational planning models and shop floor control systems, and also for offering new services. In order to obtain the data required, a digital store map is needed with the infrastructure included, especially the different shelf types with their layers and the respective dimensions. Moreover, all SKUs have to be recognized with their current locations on the shelf, the number of facings and ideally the volume in number of CUs for each SKU. The REFILLS approach is a shelf scanning robot that moves autonomously through the aisles of the store at high frequency and scans the shelves. The scanning data captured feeds semantic databases with structured knowledge that refers to real-world situations and, in connection with a priori knowledge, helps to build digital store twins. A robust perception of the large variety of the store assortments that change dynamically and at ever higher frequency is of great relevance for the data quality. Figure 1.10 shows a non-technical illustration of a shelf scanning robot moving through a supermarket. From an operational instore planning perspective, the availability of the semantic data and the generation of a digital store twin is more relevant than the technical method by which the information is gathered.
In principle it is also conceivable that in future static cameras or smart shelves will contribute to generating the digital store twins with sensor information via the Internet-of-Things. However, from an initial investment perspective shelf scanning robots seem to be a promising approach for the REFILLS consortium for large existing store networks and against the backdrop of the use cases explored.

M. Sternbeck

Fig. 1.10 Non-technical illustration of shelf scanning robot to generate semantic digital store twins; taken from [28]

The principal application idea is to frequently generate semantic digital store twins that digitally reflect the current status of the store and the instore logistics system. This information is the detailed data input for operational planning lacking in current planning systems. Including the data from the digital twins, planning models and algorithms are applied, e.g., optimization models, heuristics or simulations. The results in the form of concrete suggestions for physical actions are transferred to reality for execution, for example via smartphones. The actions carried out modify the system status together with other changes of the regular store business, which is again recorded via the subsequent scan by the robot. The general application approach is illustrated in Fig. 1.11. A system similar to a Warehouse Management System (WMS) in a DC will also be necessary in a store to operationalize the application approach described. Such a Store Management System (SMS) is the IT layer in which the operations planning models and algorithms have to be performed and operational decision making takes place. The SMS contains the semantic store map data or is closely connected to the database. The SMS initiates and coordinates instore logistics activities and integrates subordinate systems that control instore robot technology. It is thus a crucial element for shedding light on the black box system.

1.4.1.2 Scenario 1—Expected Benefits

Numerous improvement opportunities are expected in several areas from a semantic digital store twin. The use cases described here are not complete. Of course, the expected benefits for different stakeholder groups not only depend on the existence of a digital store twin, but also on the design of the applications based on it.

1 Robot-Assisted Instore Operations in Grocery Retailing

Fig. 1.11 General application approach of shelf scanning robots and semantic digital store twins

Regarding logistics store operations, one benefit is automatic out-of-shelf detection. While having the robot count products is a challenging task, out-of-shelf detection works more reliably. Empty shelf slots are captured by the scanning robot. In combination with the product label on the shelf, the SKU of the empty slot can be identified and, in the case of multiple facings, also whether items are still available for the customer. So-called zero-balance walks are often applied in the manual, non-automated store process, too, although they are very time-consuming and partially ineffective, as an out-of-shelf situation does not always imply an inventory inaccuracy. High-frequency shelf scans could automate this process, save employees’ time and increase accuracy in inventory management [12]. If the stock data in the SMS show that there is instore inventory in the backroom although an out-of-shelf situation is recognized, a restocking task is put on the store employees’ digital worklist. Depending on the concrete organization of the restocking process and backroom management, robot-detected out-of-shelf situations could be recorded as zero counts with automatic adjustment in the stockholding system in the event of inventory inaccuracy.

Another highly supportive benefit is the automatic comparison of the product at the front of the shelf with the corresponding shelf label, which may be electronic or paper-based. If paper-based shelf labels are used, an automatic check can be made to see whether the right price is displayed and the label is the current one. In any case, it can be checked automatically whether the product at the front of the shelf fits the shelf label or whether the product is misplaced. In both cases, action may be required by store employees, who are notified via an entry in their digital worklists.
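
The out-of-shelf logic described above can be illustrated with a minimal sketch. It is not the REFILLS implementation: the record type `SlotScan`, the function `build_worklist`, the task names and the assumption that the SMS holds total system stock and backroom stock per SKU separately are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotScan:
    """One scanned shelf slot; all field names are hypothetical."""
    sku: str             # SKU identified via the shelf label
    facings_total: int   # facings assigned to the SKU
    facings_empty: int   # facings the robot detected as empty


def build_worklist(scans, system_stock, backroom_stock):
    """Return (sku, task) pairs for SKUs whose facings are all empty."""
    tasks = []
    for scan in scans:
        if scan.facings_empty < scan.facings_total:
            continue  # at least one facing still offers items to customers
        if backroom_stock.get(scan.sku, 0) > 0:
            # Out-of-shelf, but inventory is available in the backroom.
            tasks.append((scan.sku, "restock_from_backroom"))
        elif system_stock.get(scan.sku, 0) > 0:
            # System stock without physical stock: inventory inaccuracy.
            tasks.append((scan.sku, "record_zero_count"))
        # Otherwise the system already shows zero stock; nothing to correct.
    return tasks


scans = [
    SlotScan("A", facings_total=2, facings_empty=2),
    SlotScan("B", facings_total=3, facings_empty=1),
    SlotScan("C", facings_total=1, facings_empty=1),
    SlotScan("D", facings_total=1, facings_empty=1),
]
worklist = build_worklist(
    scans,
    system_stock={"A": 5, "C": 4, "D": 0},
    backroom_stock={"A": 5},
)
print(worklist)
```

In this sketch a SKU only triggers a task when every facing is empty; a retailer could just as well react to partially empty facings, depending on the service level targeted.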
A highly relevant benefit of digital store twins is that walking routes through the stores can be reduced by determining minimal-distance routes through the store for different purposes, e.g., stacking new shelf labels, counting a number of SKUs manually, picking customer click-and-collect orders or restocking a number of SKUs from the backroom on the shelves.

The digital store twin is also used to produce store-specific pallets in DCs, which influences the very time-consuming initial shelf-filling operations [8]. Products are picked together on DC pallets such that the sorting fits the individual store requirements and walking distances during initial shelf-filling operations in the stores are short. SKUs of categories that are displayed side by side on the shelves are grouped together on pallets. For example, loading carriers holding SKUs that are placed on different sales floors can definitively be avoided. This benefit of less instore sorting and walking effort mainly influences the fixed cost component per SKU or order line in the initial shelf-filling process, which represents a significant share of overall operational instore logistics costs. Store-specific DC pallets are therefore considered to be a very relevant cost lever. However, this implies picking and packing flexibility in the DC, which could be problematic in warehouses that are not organized according to a parts-to-picker principle.

The precise digital localization of all SKUs can help store employees to find the right slot on the shelf. Trained employees will probably need that service only in rare cases, but for new employees it can be very helpful, especially when product variety is very great and shelves contain a lot of small-volume, similar-looking products. Exact product localization information on the shelf can also be helpful in the case of highly dynamic product ranges with a lot of new product listings and de-listings. Of course, the benefit depends on how product locations are displayed to an employee. The faster and more effortlessly the concrete placements are displayed, the greater the efficiency effect.
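
As a rough illustration of how shelf locations from a digital store twin could feed route planning, the following sketch orders a set of SKU locations with a greedy nearest-neighbor heuristic. This is a stand-in technique, not the method used in the project, and it does not guarantee a minimal-distance route. The coordinate representation, the Manhattan distance (which ignores aisle layout and obstacles) and all names are assumptions.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) positions in meters."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nearest_neighbor_route(start, stops):
    """Greedily visit the closest remaining stop; ties broken by SKU name.

    `stops` maps a SKU to its (x, y) shelf position taken from the store map.
    """
    remaining = dict(stops)
    position = start
    route = []
    while remaining:
        sku = min(remaining, key=lambda s: (manhattan(position, remaining[s]), s))
        route.append(sku)
        position = remaining.pop(sku)
    return route


def route_length(start, stops, route):
    """Total walking distance of a route that begins at `start`."""
    total = 0
    position = start
    for sku in route:
        total += manhattan(position, stops[sku])
        position = stops[sku]
    return total


stops = {"milk": (5, 1), "bread": (1, 0), "eggs": (2, 3)}
route = nearest_neighbor_route((0, 0), stops)
print(route, route_length((0, 0), stops, route))
```

A production system would more likely compute distances on a graph of walkable aisles and use an exact or metaheuristic solver; the greedy rule is only meant to show the data flow from SKU locations to a visiting sequence.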
One significant expected benefit would result from information about the size of the free shelf space and the corresponding free shelf capacities for each SKU right before initial shelf-filling operations start. This information could be approximated from the number of facings recorded by the robot together with shelf depth, product and inventory data. The free shelf capacity can be used as input for an intelligent presorting algorithm and an adapted shelf-filling procedure that reduces backroom storage of overflow items. Right before shelf-filling operations start, it can be decided with less risk than at the time of ordering which products included in the new delivery are needed on the sales floor, at what point in time, and in what volume. In combination with the actual free shelf space, this knowledge can be used for a new support system to decide which SKU should initially be stacked on the shelf at what point in time. The goal is to achieve a higher proportion of items that can be stacked directly onto the shelf during initial shelf filling and to smooth the instore workload across the week. This approach is described in more detail as part of REFILLS Scenario 2.

The digital twin could perhaps also be used for regular checks and cataloging of secondary items in a store. For example, with the help of the digital twin an automatic check can be made to ascertain that the fire extinguishers and other safety equipment are in the right places. Regular automatic checks can be made to ensure that escape routes are clear. In addition, different shelf types can be cataloged and checked to
see if they are suitable for the volume they have to carry. Also, all other furnishings in the store and their placements can be monitored.

Digital twin data about the shelves and concrete product placements are of great value and offer significant potential for the marketing and category managers, too. The shelf infrastructure data are necessary input to design store-individual planograms that fit the shelves and the customers of the specific store. However, depending on the freedom of choice of the store managers, the planograms may be a proposal that can be adapted to some extent. The resulting “realograms” in each store are captured by the shelf-scanning robot. In turn, these data, together with sales figures, greatly help to better understand customer behavior as a function of the current shelf configuration and to incorporate it into future planogram designs. The realogram could be especially relevant for temporary product offers on small display pallets and promotions in general. Sometimes, specific promotion periods are agreed with the manufacturers that invest in parallel in media advertisements. In such cases, an automatic check with the help of the scanning robot to see whether the products are placed and priced accordingly supports promotional success.

Last but not least, customers may benefit from digital twin data, too. The most frequent customer questions in grocery stores are about the location of specific products. The digital twin data could, for example, be used for a customer instore navigation system incorporated into a mobile application that allows customers to search for a specific product and routes them to the corresponding shelf location. In combination with a digital customer shopping list, the digital twin data could be used as input for customer route planning, and the customer could be guided efficiently through the individual store.
Customers could put their smartphone in a holder on the shopping cart and be guided through the supermarket aisles in a route-optimized manner to get all the items on their specific shopping list. Digital twin data could perhaps also be shared to a certain extent and sanitized to a certain degree to leverage an ecosystem to which third-party companies and startups could also contribute by creating new digital services for customers. The use cases described benefit from the digital twin data recorded by shelf scanning robots. However, different use cases require different scanning frequencies and some use cases could even be realized without an autonomous scanning vehicle. In implementation scenarios, it will therefore be of great relevance to align the data requirements of the individual use cases that should be established with the intended scanning robot deployment and scanning frequencies.
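
Returning to the free shelf capacity mentioned earlier in this section, the approximation from facings, shelf depth, product and inventory data can be sketched as follows. The formula is a simplifying assumption (one row of units per facing, no vertical stacking), and the function and parameter names are hypothetical.

```python
def free_shelf_capacity(facings, shelf_depth_cm, product_depth_cm, shelf_stock_cu):
    """Approximate the free capacity of a SKU's shelf slot in consumer units.

    Assumes each facing holds a single row of units, one behind the other.
    """
    units_per_facing = int(shelf_depth_cm // product_depth_cm)
    capacity = facings * units_per_facing
    # System stock may exceed the computed capacity; never report negative space.
    return max(capacity - shelf_stock_cu, 0)


def fits_case_pack(free_capacity_cu, case_pack_size_cu):
    """True if a complete case pack can be stacked without overflow items."""
    return free_capacity_cu >= case_pack_size_cu


free_cu = free_shelf_capacity(
    facings=3, shelf_depth_cm=40, product_depth_cm=12, shelf_stock_cu=4
)
print(free_cu, fits_case_pack(free_cu, case_pack_size_cu=6))
```

The quality of such an estimate depends on the accuracy of the shelf stock figure, which is one reason why different use cases may require different scanning frequencies.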

1.4.2 Scenario 2—Collaborative Instore Logistics

The digital twin data generated by the scanning robot of Scenario 1 are the foundation for the more concrete instore operational Scenario 2. In this scenario a specific instore process configuration is imagined that is supported by adequate robotic technology, the application of which is based on the digital twin data. This new instore process, enabled by the robot technology developed in the project, is intended to avoid current inefficiencies by having employees and instore robots work hand in hand, with both robots and humans contributing their respective strengths to a cooperative overall process. This is the aim of the REFILLS Scenario 2.

1.4.2.1 Scenario 2—General Idea

The general idea of Scenario 2 is to conceptualize and design a new instore process with some newly designed process steps that are enabled or supported by new robot technology with the goal of increasing operational efficiency. Premises are that the technology should be designed for a principal application in a decentralized store environment and that store employees and robots should collaborate effectively via an integrative and uncomplicated human-machine interaction. The underlying instore process of Scenario 2 is designed against the backdrop of the high manual efforts and operational costs in the processes of initial shelf filling and restocking from the backroom. The approaches identified pursue the goal of increasing the efficiency of these processes. The overall idea of the process designed is to introduce an additional manipulation step in the store’s backroom, which is overcompensated by following efficiency increases. Instead of directly carrying the entire DC delivery onto the sales floor for initial shelf filling, the pallets are depalletized in the backroom and the products taken off the pallet are presorted on roll cages. Criteria for this presorting process are the need for the products on the shelf, the store-specific placing locations of the specific SKUs, free shelf capacity to accommodate the number of items, and workload balancing for store employees. With this approach, only SKUs needed at a specific point in time should be carried onto the sales floor—in a store-specific allocation of the SKUs to shelves in systematic sequence, in volumes needed to guarantee customer service levels and for which free shelf capacity is available so that restocking from the backroom is reduced significantly. Robot technology could support these tasks. 
The idea is to evaluate an automatic depalletizing cell in the backroom, a fleet of store-ready automated guided vehicles and support technology for store employees in front of the shelf to improve ergonomics and reduce orientation times for finding the right slot on the shelf. Specifically, from a technology perspective, the approach is based on a robot that depalletizes incoming DC pallets in the backroom and sorts the SKUs on trolleys. The trolleys, partially equipped with a handling unit for better ergonomics, can be moved by low-cost, store-ready multi-purpose carrier modules. The carrier modules are also used to transport bins for empty cartons and foils and pointing units used to illuminate the right shelf slots for the products that have to be stacked onto the shelf. Figure 1.12 demonstrates a global overview of the technology applied in Scenario 2 in the store environment with the help of initial visionary non-technical sketches. The instore process configuration imagined in Scenario 2 is composed of three main sub-processes that are illustrated in the following:

Fig. 1.12 Global overview of technology applied in Scenario 2

1. Depalletizing and presorting
2. Autonomous transport
3. Collaborative shelf filling

Depalletizing and Presorting

At the center of the new instore logistics process idea of Scenario 2 is the function of depalletizing incoming DC pallets in the backroom. Although depalletizing in the backroom is already deployed in relatively rare cases, the standard process today is to move the pallets onto the sales floor without prior manipulation. This implies that in the current standard process all SKUs supplied from the DC are brought onto the sales floor, sorted there to the specific shelf sections and stacked on the shelves during the regular initial shelf-filling operations, regardless of any immediate need at that point in time. Products that do not fit onto the shelves at that point in time due to restricted shelf space are carried back into the backroom, which generates high restocking costs. The process idea is now to introduce a kind of decoupling point in the backroom of the store. The products on the pallet are no longer all treated identically, but are presorted into groups by the depalletizing robot. The rationale behind this idea results from the store ordering and inventory system, which is described first, together with the effects that are relevant for the presorting idea.

Orders for the ambient assortment of grocery stores are regularly placed according to an (R, s, nQ) ordering policy. This is a periodic inventory policy in which the store has the possibility of placing an order every R periods, resulting from the store order and delivery pattern applied, so that products ordered can be bundled for picking and transportation. Under this policy, an order at the DC is issued only when the
inventory position $IP$ at a periodic review and potential order moment is strictly below the reorder level $s$. In this case, an order is issued that lifts $IP$ back to at least $s$. However, as SKUs can only be ordered with the granularity of full case packs of size $Q$, $n$ case packs have to be ordered (i.e., $nQ$ consumer units) that bring $IP$ back to $s$ or just above it. Note that the inventory position corresponds to the sum of the inventory physically available in the store and the inventory that was already ordered but is still in transit [4].

For the performance of that inventory policy, the calculation of the reorder level $s$ is of high relevance, as $s$ has to be set at a point of ordering (i.e., $s_t$) such that the product availability rate desired is guaranteed until that point in time at which the delivery of the next order placement arrives in the store (i.e., $t + L + R$). In most cases, the reorder level is calculated and set dynamically for every review moment by using the point forecast values for the risk period composed of the lead time $L$ and the review period $R$. This approach incorporates seasonal demand patterns during year, month and week, and sales trends [4]. Additionally, when setting $s$, safety stocks $SS$ have to be considered, so that the dynamic reorder level $s_t$ can be expressed as follows:

$$s_t = \sum_{i=t}^{t+L+R} E[D_i] + SS_{L+R},$$

with $\sum_{i=t}^{t+L+R} E[D_i]$ being the expected demand values derived from the point forecasts in the risk period and $SS_{L+R}$ being the safety stocks dynamically calculated for the risk period. The safety stocks $SS_{L+R}$ for the risk period $L + R$ are derived from an ex-post forecast evaluation process and the customer service level desired. If there are no structural shifts or disruptions, it is assumed that forecast quality in the future corresponds to how well the estimated values have fit for the past.
For the ex-post forecast evaluation, the forecasts for past periods are used, which were calculated with a forecast origin and horizon that correspond to the results obtained when forecasts are used for the order calculation. Without going into detailed safety stock calculations, the following can be noted. Safety stocks are comparatively high if the uncertainty of demand—reflected by expected forecast errors—is high, if the risk period is long, or if the desired customer service level is set at a high product availability rate. This implies that full-line grocery retail companies in particular have to pay special attention to the expensive safety stocks as customers expect high service rates in the grocery segment and forecast errors increase on average with growing product variety per category. Moreover, the tradeoff has to be taken into account between the length of lead time L and instore safety stocks. The longer the lead time is, the more degrees of freedom are given to DC and transportation to fulfill their tasks, but the longer the risk period in the store ordering system with the effect of requiring higher safety stocks. The second part of the risk period is the order and delivery frequency which defines the review period R and is a result of the store delivery pattern applied. The application of the specific pattern also has to be determined from an integrative perspective due to supply chain tradeoffs as delivery frequencies impact order and delivery sizes and therefore the lot sizes in DC and transportation. From an instore safety stock perspective it can be noted that the lower the store order and delivery frequency, the higher the safety stocks needed, which have to be reflected in the reorder level s.
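
The ordering rule of the (R, s, nQ) policy described above can be made concrete with a small sketch. It only restates the rule from the text (order when the inventory position is strictly below the reorder level, in full case packs, so that the position reaches at least the reorder level). The function names and the numbers in the example are illustrative assumptions, and the safety stock is simply passed in rather than derived from a forecast evaluation.

```python
import math


def reorder_level(expected_demand, safety_stock):
    """Dynamic reorder level: forecast demand over the risk period plus safety stock.

    `expected_demand` holds one point forecast per period of the risk period.
    """
    return sum(expected_demand) + safety_stock


def case_packs_to_order(inventory_position, s, case_pack_size):
    """Number n of case packs of size Q to order at a review moment."""
    if inventory_position >= s:
        return 0  # an order is only issued when IP is strictly below s
    # Smallest n such that IP + n * Q >= s.
    return math.ceil((s - inventory_position) / case_pack_size)


s_t = reorder_level(expected_demand=[4, 5, 6, 5], safety_stock=7)
n = case_packs_to_order(inventory_position=10, s=s_t, case_pack_size=6)
print(s_t, n, 10 + n * 6)
```

With these illustrative numbers the order lifts the inventory position from 10 to 28 consumer units, one unit above the reorder level of 27, which shows how case pack granularity alone can push more stock into the store than the reorder level requires.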

The following effects on instore inventory handling operations can be derived from an analysis of the (R, s, nQ) inventory policy applied. The quantities delivered to a store at a delivery time do not necessarily correspond to the quantities immediately required at that time. The probability that the volumes delivered from DCs are not directly needed on the shelf is comparatively high in situations that are characterized by long lead times and review periods, low forecast accuracy and high service levels targeted. For example, a long delivery lead time implies that the reorder level calculated includes a comparatively large proportion of safety stock for the lead time. When the corresponding order arrives instore at the end of the lead time, the safety stocks $SS_L$ related to the lead time were, on average, not needed in 50% of the cases (assuming a well-balanced forecast system with an expected value of the forecast error of 0). A second example refers to the length of the review period R. The longer the review period, the higher the reorder level s. This implies that orders are released earlier, average instore stock levels increase and the imputed average point in time at which the delivery is required on the shelf increasingly deviates from the time of delivery. These two examples demonstrate that stock arrivals from the DC in the store contain volumes that are not immediately needed at the point of delivery. The lower the expected forecast accuracy and the higher the customer service goals, the greater the effects in these two examples. Trying to put considerable volumes on the shelves at a time when they are not actually needed increases the likelihood of their being returned to the backroom, resulting in significant restocking costs for overflow items. There are now several ways to react to the circumstances described.
One approach would be to adjust the reorder levels from a cost perspective against the backdrop of the expensive restocking costs from the backroom, with effects on customer service [1, 10]. Another much more complicated approach would be to better align SKU listings, case pack sizes, store delivery patterns, inventory policies and shelf space allocation. Although there is significant potential and selective approaches definitively help to improve the systems, it is extremely challenging from a practical point of view due to the high complexity of such an interorganizational and interfunctional approach with massive interdependencies and due to the dynamics in assortment and customer behavior. The approach in the REFILLS Scenario 2 is therefore that products delivered to a store at a certain point in time are not treated identically any more by putting all SKUs on the shelf during one initial shelf-filling date; instead a differentiated shelf-filling approach is applied that requires a presorting process in the backroom supported by automation technology. The idea of depalletizing and presorting is based on two pillars. First, it is assumed that robotic technology in the backroom will ideally depalletize and presort the pallets fully autonomously. Second, a presorting algorithm is developed that supports the decision on how to proceed with the respective SKU at any time. It is conceivable that in future there is a robot working in the backroom that depalletizes the products from incoming DC pallets onto trolleys that are either stored intermediately in the backroom or are transported directly onto the sales floor. There are significant technical challenges considered in the REFILLS consortium, mainly

Fig. 1.13 Illustration of a depalletizing cell in the store backroom (example); taken from [25]

with regard to product recognition and grasping from highly diverse mixed pallets. In future, it could perhaps help that automation and digitization in the DCs lead to a digital replication of the pallets picked in the warehouses, which can be built on when it comes to the robot’s recognition of individual products in the store. Figure 1.13 illustrates schematically the shape such a depalletizing cell in the backroom might take. In the scenario, a new pallet arriving from the DC in the backroom of the store is brought to the working area of the robot. According to the decisions made by applying the presorting algorithm, the robot places the individual case packs on different trolleys placed around it so that they can be reached by the arm of the robot. The trolleys could perhaps have several layers so that placing the products is easier. Since the distribution of cases on trolleys is pre-calculated, trolleys are not filled one after another, but in a predefined pattern.

Of course, the robot is just one fully automated option for performing this task, and it was the research goal of REFILLS. In retail practice, other technological options are also conceivable that require less space and less investment but more manual handling for this additional process step. For example, augmented reality options based on intelligent glasses, or camera-based perception and recognition followed by action instructions for employees via a display, would be alternative approaches. It is not just trolleys or roll cages that can be used for the presorting process. If space is scarce, half or full pallets may possibly also be used in a more manual process.

In addition to a highly efficient technological solution, the presorting algorithm is of major importance for the success of this process idea. The algorithm defines on which trolley, and where on it, a case pack from the incoming pallet is placed. Several goals are pursued with the algorithm.
One is to significantly reduce overflow items, another is to improve sorting of the SKUs on the trolleys compared to sorting of the SKUs on the pallets. Balancing the weekly workload of the store employees more suitably compared to the current process configuration is a further objective.

The new process idea decouples the product flow in the backroom. Instead of a direct flow from the DC onto the sales floor of the store, a two-stage inventory process is applied, with the first stage being the order and delivery from the DC into the store’s backroom according to the (R, s, nQ) inventory policy, and the second stage being the store-internal inventory routing between the backroom and the shelf. The operational advantage of the decoupled process is that after arrival of the pallets in the store and after the lead time has passed, new information is available in terms of demand that has actually occurred during the lead time and updated forecast values for the review period. This new information status can be used as a basis to decide whether a SKU is immediately needed on the sales floor or whether current stock (without the new delivery) is still sufficient to cover estimated demand plus safety stock for one or more of the following days. In the latter case the product could be held unpacked in the backroom in the meantime and brought onto the sales floor when it is actually needed. At that time there is a higher probability that all items will fit on the shelf due to customer purchases in the meantime. The latest digital twin data are used to derive the actual shelf space available, possibly in combination with system stock data. For products that are not actually needed on a certain day but for which enough free shelf space is available on that day to accommodate at least one case pack (if several are delivered from the DC), degrees of freedom result that could be used to align the workload for the store employees to a desired workload profile over the days being considered.
After each case pack supplied from the DC has finally been assigned to a concrete day on which it is stacked onto the shelf, the next step is to plan on which trolley each case pack has to be placed by the robot. Heuristics could be applied for this planning step to achieve short distances both for the trolley trips and for the employees during the shelf-filling process. The cases on one trolley should cover only a small radius around the trolley’s destination to minimize the clerk’s movement and the search area for finding the right shelf position. The results of the presorting algorithm are used in the SMS for robot control. Figure 1.14 shows very roughly, and as an example only, how such a presorting algorithm might be conceptualized, although it still neglects highly relevant aspects such as having only a restricted number of trolleys available.

The concrete design of a presorting algorithm is subject to many different potential influencing factors, against the backdrop of numerous company-specific and organizational aspects that could be reflected accordingly. For example, different inventory policy configurations are conceivable in principle for the second stage (backroom to shelf). It also has to be decided whether only one presorting process is carried out right after the DC delivery arrives at the store for a few following days (e.g., the days belonging to the review period of the inventory policy applied between DC and store), or whether the presorting is repeated each day by always integrating data updates of the previous day. This will definitely depend on the efficiency achieved in the presorting process.
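
The decision logic described in this subsection can be sketched in a few lines. The sketch is not the algorithm of Fig. 1.14 and not the REFILLS implementation; it is a simplified, hypothetical illustration that handles a single day, one case pack per record and an unlimited number of trolleys. The record type `CasePack`, its fields, the one-trolley-per-aisle rule and the parameter `optional_budget` (a crude stand-in for workload balancing) are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CasePack:
    """One delivered case pack with the data used for presorting (hypothetical)."""
    sku: str
    aisle: int               # shelf location taken from the digital twin
    size: int                # consumer units in the case pack
    shelf_stock: int         # current stock on the shelf
    free_capacity: int       # free shelf capacity in consumer units
    demand_plus_safety: int  # expected demand plus safety stock until the next filling date


def classify(cp):
    """Classify a case pack as 'needed', 'optional' or 'hold'."""
    if cp.shelf_stock < cp.demand_plus_safety:
        return "needed"    # current stock does not cover demand plus safety stock
    if cp.free_capacity >= cp.size:
        return "optional"  # not needed yet, but the whole case pack would fit
    return "hold"          # keep in the backroom to avoid overflow items


def presort(case_packs, optional_budget):
    """Assign case packs to one trolley per aisle or hold them in the backroom.

    `optional_budget` limits how many optional case packs are stacked today.
    """
    trolleys = defaultdict(list)
    held = []
    for cp in sorted(case_packs, key=lambda c: (c.aisle, c.sku)):
        status = classify(cp)
        if status == "optional" and optional_budget > 0:
            optional_budget -= 1
            status = "needed"
        if status == "needed":
            trolleys[cp.aisle].append(cp.sku)
        else:
            held.append(cp.sku)
    return dict(trolleys), held


delivery = [
    CasePack("A", aisle=1, size=6, shelf_stock=2, free_capacity=10, demand_plus_safety=5),
    CasePack("B", aisle=1, size=6, shelf_stock=8, free_capacity=6, demand_plus_safety=5),
    CasePack("C", aisle=2, size=12, shelf_stock=9, free_capacity=3, demand_plus_safety=4),
    CasePack("D", aisle=2, size=6, shelf_stock=10, free_capacity=8, demand_plus_safety=3),
]
trolleys, held = presort(delivery, optional_budget=1)
print(trolleys, held)
```

Note that a case pack classified as needed is sent to the sales floor even if it does not fit completely, so overflow items can still arise in this sketch; a fuller design would also have to respect the restricted number of trolleys and plan across several days.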

Fig. 1.14 Illustrative example of a presorting algorithm with one presorting activity after DC arrivals and an uncapacitated number of trolleys

With this multi-echelon inventory system and the presorting process, an instore postponement strategy is applied with regard to serving the shelf with products of the DC pallets out of the backroom. SKUs are brought onto the sales floor in an appropriate sequence when individual items are either needed or when sufficient shelf space is available to accommodate the entire case pack.

Autonomous Transport

The next instore process step after putting case packs on trolleys is moving the trolleys to different locations. For this and also for other instore transport processes the idea is to have a flexible and autonomous carrier module that is able to perform different transportation activities. Such a modular approach seems to be necessary since the operating times of individual, specific trolleys or other units are rather low compared to the overall opening hours of the store. If all units were equipped with individual drives and further automated vehicle technology such as sensors, this would lead to very low utilization rates and very high costs in such decentralized production environments of retail companies. Generally, a low-cost design will be of major importance for retailers operating a huge number of stores. One can imagine that there might be a small fleet of autonomous carrier modules per store that are able to carry the most relevant units in a store. These are, for example, trolleys or roll cages, bins in which cardboard and foils are packed during the unpacking process, picking units for preparing click-and-collect orders, and other conceivable units such as small shelves with material to equip cash desks, or pointing units to support shelf filling. For example, there are thousands of quite standardized instore roll cages in use at many grocery retail companies. The goal is that the autonomous carrier unit is able to transport these units and all compatible units relevant for instore operations.
Figure 1.15 illustrates in a non-technical manner how such a carrier module might be designed. There are numerous requirements for a carrier module. From an application perspective, the carrier module needs a low vertical profile to fit underneath existing units. A lifting and docking device that supports up to around 100 kg is required to safely lift the units carried. The vehicle must be able to maneuver flexibly in the grocery store aisles, which are sometimes narrow. Moreover, the drive has to ensure that existing sales floors are not damaged by intensive use of the carrier along the main instore transportation routes. Because special product and display placements and other movable interiors change very frequently in the store, the carriers have to navigate autonomously through the store without any strictly predefined routes, using simultaneous localization and mapping approaches. The carrier modules receive their tasks from the SMS, in which task prioritization, scheduling and sequencing have to be implemented algorithmically. One idea is that store employees can interact via their smartphones and, for example, order a new bin or complete and acknowledge the processing of a trolley so that it can be collected autonomously by a carrier module. During free periods, the carriers are used, for instance, to sort trolleys in the backroom that will be required next for shelf filling.
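
How task prioritization and sequencing for the carrier modules might be organized in the SMS can be hinted at with a simple priority queue. This is a minimal sketch under assumptions: the class `CarrierTaskQueue`, the numeric priority scheme (lower number means more urgent) and the task descriptions are hypothetical, and real scheduling would also consider carrier positions, battery states and deadlines.

```python
import heapq
import itertools


class CarrierTaskQueue:
    """Priority queue of transport tasks; equal priorities are served first in, first out."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves submission order

    def submit(self, priority, description):
        """Add a task; a lower `priority` value means a more urgent task."""
        heapq.heappush(self._heap, (priority, next(self._counter), description))

    def next_task(self):
        """Return the most urgent task description, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = CarrierTaskQueue()
queue.submit(2, "bring empty bin to aisle 4")
queue.submit(1, "deliver trolley to aisle 7")
queue.submit(2, "collect finished trolley in aisle 4")

sequence = []
while (task := queue.next_task()) is not None:
    sequence.append(task)
print(sequence)
```

Requests that employees place via smartphone would enter such a queue through `submit`, while idle-time jobs such as sorting trolleys in the backroom could be given the lowest urgency.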

M. Sternbeck

Fig. 1.15 Non-technical illustration of a carrier module; partly taken from [24]

Collaborative Shelf Filling

Once the trolleys or roll cages filled with the products to be stacked onto the shelf at that point in time are on the sales floor (after being depalletized, presorted, put on trolleys and transported within the store), the shelf-filling process starts. In the imagined, visionary collaborative shelf-filling process of the future, store employees are supported by different technological aids and work hand in hand with robots. The tasks of physically stacking the products on the shelves, however, are exclusively performed by humans in this Scenario 2. The preparation process for shelf filling includes the provision of products and the requisite equipment by the autonomous carrier system. This includes the trolleys carrying the products, some of which are equipped with an integrated handling unit, a bin for the packing material resulting from product unpacking, and a newly designed pointing unit. The carrier module positions each trolley relative to the shelf according to the products it carries, so that walking routes for the employees are short. Nevertheless, the trolley might be an obstacle for either the clerk or customers, or the clerk may want to move the trolley closer to a shelf than precalculated. This means the trolley needs to be manually movable by any person. The idea of the handling unit on the trolley is that it serves case packs to the employees in an ergonomic manner. In order to achieve this, a roll cage with the products could be docked into a frame to which a manipulating robot arm is attached. This approach is characterized by considerable complexity. The cases on the trolley first need to be perceived in order to know their sizes, weights, and the specific products in the cases. The handling module will probably be limited in its ability to handle heavy products so that the entire assortment cannot be handled, which is a limiting factor. The employee then takes hold of the case pack that has been recognized automatically by the handling unit. After grabbing the product and while opening the case pack manually, the pointing unit, which was previously positioned by a carrier, sheds light on the corresponding slot on the shelf so that the employee can immediately recognize where to put the product on the shelf (put-to-light operation). On the way back from the shelf, the employee puts empty packaging material into the bin before grabbing the next case pack provided ergonomically by the handling unit of the trolley. Trolleys and bins are exchanged regularly by the autonomously operating carriers. Figure 1.16 provides a non-technical illustration of the overall vision of a collaborative shelf-filling process with (1) an exclusive pointing unit in place and (2) a combined handling and pointing unit.

Fig. 1.16 Non-technical illustration of the overall vision of a collaborative shelf-filling process with (1) an exclusive pointing unit in place and (2) a combined handling and pointing unit; taken from [24]

The process of pointing is divided into the identification of the case and thus the product and the pointing operation itself. Different approaches are generally conceivable for the identification process: automatic camera-based recognition, for example, or the store employee scans the case using an instantaneously operating scanner. The latest digital twin data of Scenario 1 in combination with a localization system of the pointing unit provide the information required to shed light fast and precisely on the corresponding slot on the shelf.
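The second step, the pointing operation, can be illustrated with a small geometric sketch: given the slot position from the digital twin and the pose of the pointing unit from a localization system, compute the pan and tilt angles for the light. The product IDs, the coordinates, the `SLOT_OF_PRODUCT` table and the pan/tilt model of the pointing unit are all assumptions for illustration, not details of the REFILLS hardware.

```python
import math

# Hypothetical excerpt of digital twin data: product ID -> slot centre (x, y, z)
# in metres in the store frame. IDs and coordinates are invented for illustration.
SLOT_OF_PRODUCT = {
    "4001234": (3.0, 1.0, 1.5),
    "4005678": (3.0, 2.0, 0.5),
}


def pointing_angles(product_id, unit_xyz, unit_heading_rad):
    """Return (pan, tilt) in radians that aim a light at the product's slot.

    unit_xyz and unit_heading_rad describe the pose of the pointing unit as a
    localization system might report it (assumed model: pan about the vertical
    axis relative to the heading, tilt above the horizontal plane).
    """
    sx, sy, sz = SLOT_OF_PRODUCT[product_id]
    ux, uy, uz = unit_xyz
    dx, dy, dz = sx - ux, sy - uy, sz - uz
    pan = math.atan2(dy, dx) - unit_heading_rad
    # Wrap pan into [-pi, pi] so the light turns the short way round.
    pan = math.atan2(math.sin(pan), math.cos(pan))
    tilt = math.atan2(dz, math.hypot(dx, dy))
    return pan, tilt


pan, tilt = pointing_angles("4001234", unit_xyz=(2.0, 1.0, 0.5), unit_heading_rad=0.0)
print(round(math.degrees(pan), 1), round(math.degrees(tilt), 1))
```

In this example the slot lies one metre ahead of and one metre above the unit, so the light stays straight ahead and tilts upward by 45 degrees.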

1.4.2.2

Scenario 2—Expected Benefits

The expected benefits of Scenario 2 are manifold and of course depend on the technological units that are actually used. Scenario 2 in its most technically advanced visionary form contains several elements that could also be used separately, especially because the different technologies involve different complexities and different expected implementation speeds. Depalletizing and presorting in the backroom already has an impact at upstream supply chain stages. The process of “roll-cage sequencing” currently applied, i.e., putting the products on pallets and roll cages in the DC according to (an average) store layout, leads to significant inefficiencies in the retail warehouses. Without that restriction, products could be placed in the DCs according to DC and picking requirements, which would raise the expected efficiency to a new level. Moreover, the pressure on delivery lead times would ease slightly. From an upstream point of view, longer lead times tend to drive a high level of upstream efficiency and workload balancing, especially in the retail field with its weekly seasonality and generally volatile demand structures. Instore, Scenario 2 places strong emphasis on tasks that have so far been very time consuming. The following tasks are mainly positively affected by the operational model of Scenario 2: instore transportation, orientation and walking times in front of the shelves and sorting during shelf filling, returning overflow items, and restocking from the backroom. Of course, the depalletizing and presorting activities in the backroom represent additional effort, which has to be more than compensated for by these savings. The overall process benefits and individual elements have not yet been finally evaluated and validated and therefore mainly reflect expected general effects.
The carrier system saves store employees the very frequent walks, and the associated time, between the backroom and the sales floor during instore logistics operations, so that they can use this time for more value-adding activities. In front of the shelf, when initially stacking the products, a key benefit is certainly support from the pointing unit, especially for new and untrained employees. The time needed in front of the shelf for orientation, searching for the right slot on the shelf and walking to the shelf is currently almost equal to the time required for stacking the CUs on the shelf. The orientation time needed in front of the shelf also varies depending on the product category. A considerable amount of time is needed to identify products and find the corresponding place on the shelf, especially for categories with small, similar-looking products. This is why the impact of robotic support is considered to be significant for that task.

The depalletizing cell and the presorting algorithm should lead to products on trolleys that perfectly fit on a certain shelf section in the store, based on actual digital twin data. If a DC pallet rather than a presorted trolley is brought to a certain place in the store, sorting processes and longer walking distances occur, because warehouses organized according to a picker-to-part system cannot offer store-specific pallets. Sorting within the store, however, can largely reflect individual requirements, also because stability issues are not as relevant as during road transportation. Perhaps the main benefit of presorting is the postponement of shelf filling for products that are not directly needed and for which free shelf space is not sufficient directly after arrival of the DC delivery. The very time-consuming and costly processes of returning these products to the backroom, putting them unpacked on a roll cage and restocking them later are expected to be partly avoided or drastically reduced. The space in the backroom currently needed for overflow items will be used for trolleys with products that are still packed and have not yet been initially placed on shelves. The costs of returning the product, sorting it on a roll cage, transporting it back to the sales floor, orientating and walking to the shelf can be avoided for every SKU that is not brought to the sales floor in excess of what can be placed on the shelf. The fixed costs of this additional process are higher than for initial shelf filling but are often only incurred for a small number of consumer units. This procedure is also expected to have a positive impact on shelf availability, as the presorted trolleys with case packs in the backroom are registered in the SMS, while overflow items moved back into the backroom are not registered separately. The working times of store employees can also be balanced more suitably than in the current process via the focused utilization of existing degrees of freedom.
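The postponement rule behind presorting, namely to bring a case pack to the sales floor only if the shelf has room for all of it, can be written down in a few lines. The function name `presort`, the data layout and the example quantities are hypothetical; the actual presorting algorithm of the project is not described in this chapter.

```python
def presort(case_packs, free_slots):
    """Split delivered case packs into 'fill now' and 'postpone in backroom'.

    case_packs: list of (sku, units_per_case) as delivered from the DC.
    free_slots: dict sku -> free consumer-unit positions on the shelf
                (in the scenario this would come from digital twin data).
    """
    fill_now, postponed = [], []
    remaining = dict(free_slots)  # copy so the caller's data is not mutated
    for sku, units in case_packs:
        if remaining.get(sku, 0) >= units:
            # The entire case pack fits, so it goes onto a sales-floor trolley.
            fill_now.append((sku, units))
            remaining[sku] -= units
        else:
            # Not enough space: keep the case packed on a backroom trolley.
            postponed.append((sku, units))
    return fill_now, postponed


delivery = [("shampoo", 6), ("shampoo", 6), ("toothpaste", 12), ("soap", 8)]
free = {"shampoo": 7, "toothpaste": 12, "soap": 3}
now, later = presort(delivery, free)
print(now)    # case packs that fit completely on the shelf
print(later)  # case packs that stay packed in the backroom
```

Here the second shampoo case and the soap case are postponed, so no opened case pack has to be returned to the backroom as overflow.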
Figure 1.17 roughly depicts the expected impact of Scenario 2 on tasks of the three most costly instore logistics processes described in Sect. 1.3.2.

1.4.3 Scenario 3—Autonomous Shelf Filling The third scenario of the REFILLS project is, in contrast to the integrated approach of Scenario 2, focused exclusively on the time-consuming subprocess of stacking individual CUs on the shelf. Scenario 3 thus extends the previous scenario to include autonomous shelf filling. A robotic arm is used to autonomously place the products on the shelves. For full autonomous shelf replenishment, fine manipulation would be required, which is a very difficult task, often even for humans, especially when consumer units of the products have a small footprint and are comparatively heavy at the top. The robot needs to be able to handle a variety of objects, including rigid objects, plastic bottles, glass bottles, paper boxes, and deformable packages with fragile items. To perform this process with the existing shelf infrastructure the robot manipulator has to be able to pull, push, rotate, and possibly relocate items in narrow shelf spaces, which is highly challenging.

Fig. 1.17 Expected impact of Scenario 2 on tasks of the three most costly instore logistics processes

The general idea is that the solutions generated in the previous scenarios are also used for Scenario 3, which pursues the overall vision of fully automating shelf-filling operations. The global vision of that scenario is illustrated in Fig. 1.18. The scenario investigates the autonomous placement of individual consumer units, already unpacked, on the shelf. This is why Scenario 3 is detached from the previous scenario from an end-to-end autonomous process perspective, as unpacking has still to be carried out manually. In an application scenario the robot should work as autonomously as possible. This implies that the robotic system decides where to put the item or what other tasks need to be performed first. This requires excellent perception. The correct handling and positioning of the items on the shelf is of high priority. While the process takes place without human interaction and outside the store opening hours, the system can call a clerk for assistance if needed to resolve unpredictable events. This scenario is very complex due to high demands on perception and autonomous handling, and is certainly the scenario that is currently still the furthest from practical applicability.

Fig. 1.18 Non-technical illustration of the vision to fully automate shelf-filling operations; taken from [31]

1.5 Managerial Challenges of Using Professional Service Robots in Retail Stores

The main focus of this chapter is on instore logistics processes and technological support. However, it should not be overlooked that the success of all instore technology applications depends largely on the acceptance and cooperation of the store employees. The desired effects of robot applications as described in this chapter are only achievable by effective and efficient human-machine interaction in a truly collaborative environment. A key managerial challenge will be to encourage employees to have a positive attitude towards the more intensive use of technology, and to make them key actors in the transformation processes. The vision developed during the REFILLS project work is definitely not to transform stores into large walk-in vending machines. The stores of the future will still be joint sociotechnical systems with a more pronounced technical component than today. Despite all the use of technology, store management will still include a very strong social component. Customers in general appreciate personal interaction at the point of sale and for many people bricks-and-mortar retailing is also a place of human encounter. This will also be a comparative advantage of bricks-and-mortar stores in future and keep them viable. In contrast to robots that interact with customers, the REFILLS approach is intended to free up time by leveraging logistics efficiency potential for the more valuable personal customer interactions at the point of sale and for the numerous additional instore tasks in an omnichannel world. Nevertheless, the idea of using professional service robots for instore logistics is accompanied by a lot of managerial challenges, especially with regard to the acceptance and support of the employees. Meyer et al. [26] explored frontline employees’ acceptance of and resistance to service robots in bricks-and-mortar retailing by conducting 24 in-depth interviews in
Germany and Austria. The following remarks are essentially based on their findings. Introducing robots to take over instore logistics tasks that are today carried out by humans is of course a sensitive topic as this significantly affects and changes the core activities of store employees. It is understandable that this is accompanied by fears and worries. Employees interviewed report fears of being substituted or degraded by the robot and therefore less respected by the customers. They are concerned about losing their current status in the competition with the robots and being less essential and less valued by customers and management [26]. It must therefore be possible to create a basic attitude that the responsibility of employees increases because they recognize that they can control the instore logistics robots to their advantage, and this puts them in a more responsible position. For the management of instore robotic projects it therefore seems to be highly relevant to strongly emphasize the supportive character of the instore logistics robotic approaches, and that employees will control the robots and not the other way round. Nevertheless, it is clear that the instore application of robot technology will alter the role of store employees, which must not be underestimated. Numerous novel challenges will shape their daily working environment when imagining scenarios, as described in this chapter. There is a risk that store employees will be caught in an unreasonable amount of tension between the customers, the organization, technological devices and new instore robots. A fast introduction of instore robots that is not well prepared is especially inclined to lead to technostress [26]. A start with really small-scale, clearly defined and well-communicated use cases therefore seems to be promising so that employees can recognize the supportive character of the technology instead of being completely disrupted from their previous routines. 
It also seems to be highly relevant that employees easily understand the general approach of the use case in order to avoid a perceived lack of plausibility, which could lead to a negative attitude. This requires open, honest and clear communication about the expectations and objectives of the use case. Employees will better understand and support new technology if they are included in the technological creation process. Shaping an open and participatory culture of robotic innovation is therefore an essential management task. Store employees like to feel included in technological creation and evaluation processes, to share their experiences and to bring in their ideas and fundamental knowledge of daily instore challenges. This will lead to use case definitions that properly fit instore requirements and will help to avoid implementation obstacles in advance. The perceived value of a robot increases if store employees are able to contribute and see that their ideas are valuable and perhaps implemented [26]. Although not fully comparable, dm-drogerie markt has had experience with the introduction of a personalized company smartphone for every store employee. The success factors were later empirically analyzed and it turned out that, besides all stakeholders being involved and part of the project team, the ample time spent on horizontal communication between the store colleagues of different age groups in team meetings was a particularly important success factor. Internal team discussions led to the realization that the smartphone made many things in the store easier. A deeply rooted culture of respect and “learning from one another” also supported
rapid acceptance of the new tool and confident use of it. Initial skepticism among some employees quickly gave way after the first positive experiences that daily work processes were significantly facilitated. Certainly in this case there is also the fact that the company pursues a fairly far-reaching decentralization of responsibility so that employees feel largely responsible for their own working methods and the success of the store, which promotes the acceptance of new technologies. After introduction of the smartphones, many ideas for new applications were generated by store employees that are now implemented. These lessons learned are certainly also helpful for potential future implementations of robot technology. Further managerial tasks include establishing an implicit and explicit corporate knowledge base with regard to instore robot applications. Knowledge building will be necessary in numerous areas of the company: instore, in IT and logistics, but also—and no less relevant—in the fields of employee management, store construction, category management, packaging design, sales management and finance and controlling. A service concept has to be designed and implemented that addresses and incorporates the use of instore logistics robots. Employees express concerns about whether they will succeed in working together with robots, especially when there are malfunctions in the presence of customers [26]. A central goal has therefore to be not to leave the employees alone struggling with defects and operational imperfections. This will frustrate and annoy employees so that they could feel that the robots generate more work rather than taking work away from employees [26]. Quick service responses and a field force will be required to support the operation, either provided by the company itself or by a service provider, for example via incident management, repair services and also trainings and briefings. 
The focus of this chapter is more on logistics processes and principal technical design options than on the economic profitability of instore logistics robots. However, in-depth financial calculations have to follow when describing business cases for selected use cases that require some kind of robot. Profitability analyses are then necessary for different implementation scenarios. This will certainly have a substantial influence on how the scenarios are configured in detail and which robot technology is used in which process steps and stores. The question of financing the robots must also be clarified, at the latest when the first use cases leave pilot status and are to be rolled out to a larger number of stores.

1.6 Conclusion

This chapter discussed the use of robotics technology for instore logistics as one approach of many to offer an answer to current challenges of operating bricks-and-mortar stores. To provide the necessary background, grocery supply networks were initially characterized as they are operated by many retail companies. The focus was strongly on instore logistics. After a detailed process description, the cost structure was analyzed, specifics of this logistics system explored, past developments and the current status examined and challenges for instore automation processes derived. Following this intensive study of the instore logistics system, automation
approaches via the use of robotics technology were explored. In line with the structure of the REFILLS project, three scenarios were investigated, i.e., store monitoring, collaborative instore logistics and autonomous shelf filling. The general thrust of the scenarios and the conceivable robot technology applied were particularly driven by a non-technical, overall process perspective, and the expected benefits were analyzed against the background of the expected integrated process and business effects. The focus in the final section shifted from process-related and technical questions of instore logistics robots to further managerial challenges that will accompany instore robot implementation, especially the profound involvement of store employees. The contribution of this chapter is to establish a framework for the REFILLS robotics solutions, exploring the overall technological approaches from a practical end user perspective, putting the technology explored into the overall retail process context and deriving its expected effects. It therefore contributes to answering the research questions of which benefits can be expected from instore robots and what may be targeted fields of application in future [11]. The professional service robot solutions identified and explored show great potential to improve instore logistics, entire store management and future customer experience in bricks-and-mortar retail stores. However, there are still many challenges on the path to extensive and comprehensive future use of instore logistics robots. The scenarios explored in this chapter are first examples of conceptual and coherent approaches. It is not in any way assumed that the processes need to be exactly the same in practice. Specific parts and modules of the processes presented could also be combined individually to form an alternative instore logistics configuration.
Scenario building including a deeper integration of investment calculations will probably lead to adapted scenarios. More research is needed in the technical as well as the procedural and also social field to develop comprehensive new instore logistics systems in which humans and machines work together in true collaboration. REFILLS with all its results is a promising starting point for further developing that vision and for subsequent research and continued practical pilot projects.

References

1. Atan, Z., Erkip, N.: Note on “the backroom effect in retail operations”. Prod. & Oper. Manag. 24(11), 1833–1834 (2015). https://doi.org/10.1111/poms.12357
2. Boysen, N., de Koster, R., Füßler, D.: The forgotten sons: warehousing systems for brick-and-mortar retail chains. Eur. J. Oper. Res. 288(2), 361–381 (2021). https://doi.org/10.1016/j.ejor.2020.04.058
3. Broekmeulen, R.A.C.M., Sternbeck, M.G., van Donselaar, K., Kuhn, H.: Decision support for selecting the optimal product unpacking location in a retail supply chain. Eur. J. Oper. Res. 259(1), 84–99 (2017). https://doi.org/10.1016/j.ejor.2016.09.054
4. Broekmeulen, R.A.C.M., van Donselaar, K.: A heuristic to manage perishable inventory with batch ordering, positive lead-times, and time-varying demand. Comput. & Oper. Res. 36(11), 3013–3018 (2009). https://doi.org/10.1016/j.cor.2009.01.017
5. Curşeu, A., van Woensel, T., Fransoo, J., van Donselaar, K., Broekmeulen, R.: Modelling handling operations in grocery retail stores: an empirical analysis. J. Oper. Res. Soc. 60(2), 200–214 (2009). https://doi.org/10.1057/palgrave.jors.2602553
6. DeHoratius, N., Holzapfel, A., Kuhn, H., Mersereau, A., Sternbeck, M.G.: Evaluating count prioritization procedures for improving inventory accuracy in retail stores. SSRN Electron. J. (2020). https://doi.org/10.2139/ssrn.3661481
7. DeHoratius, N., Raman, A.: Inventory record inaccuracy: an empirical analysis. Manage. Sci. 54(4), 627–641 (2008). https://doi.org/10.1287/mnsc.1070.0789
8. dm-drogerie markt GmbH + Co. KG: Integrativ. Intelligent. Automatisiert. Innovative Handelslogistik bei dm-drogerie markt (2020). https://www.bvl.de/dlp/dlp-preistraeger-2020
9. EHI Retail Institute e. V.: Durchschnittliche Anzahl der Artikel in Lebensmittel-Supermärkten in Deutschland in den Jahren 1965 bis 2015 (2021). https://de.statista.com/statistik/daten/studie/479382/umfrage/artikel-in-lebensmittel-supermaerkten-in-deutschland/
10. Eroglu, C., Williams, B.D., Waller, M.A.: The backroom effect in retail operations. Prod. & Oper. Manag. 22(4), 915–923 (2013). https://doi.org/10.1111/j.1937-5956.2012.01393.x
11. Evanschitzky, H., Bartikowski, B., Baines, T., Blut, M., Brock, C., Kleinlercher, K., Naik, P., Petit, O., Rudolph, T., Spence, C., Velasco, C., Wünderlich, N.V.: Digital disruption in retailing and beyond. J. Serv. Manag. Res. 4(4), 187–204 (2020). https://doi.org/10.15358/2511-8676-2020-4-187
12. Forgan, B.: What robots can do for retail. Harvard Business Review Digital Articles pp. 2–5 (2020). https://hbr.org/2020/10/what-robots-can-do-for-retail
13. GfK e.V.: Umsatz im Lebensmitteleinzelhandel in Deutschland in den Jahren 1998 bis 2020 (2021). https://de.statista.com/statistik/daten/studie/161986/umfrage/umsatz-im-lebensmittelhandel-seit-1998/
14. Handelsverband Deutschland e.V. (HDE): Zahlenspiegel 2020 (2020). https://einzelhandel.de/publikationen-hde/zahlenspiegel
15. Hellström, D., Saghir, M.: Packaging and logistics interactions in retail supply chains. Packag. Technol. Sci. 20(3), 197–216 (2007). https://doi.org/10.1002/pts.754
16. Holzapfel, A., Hübner, A., Kuhn, H., Sternbeck, M.G.: Delivery pattern and transportation planning in grocery retailing. Eur. J. Oper. Res. 252(1), 54–68 (2016). https://doi.org/10.1016/j.ejor.2015.12.036
17. Holzapfel, A., Kuhn, H., Sternbeck, M.G.: Product allocation to different types of distribution center in retail logistics networks. Eur. J. Oper. Res. 264(3), 948–966 (2018). https://doi.org/10.1016/j.ejor.2016.09.013
18. Holzapfel, A., Sternbeck, M., Hübner, A.: Selecting delivery patterns for grocery chains. In: Lübbecke, M., Koster, A., Letmathe, P., Reinhard, M., Peis, B., Walther, G. (eds.) Operations Research Proceedings 2014, pp. 227–232. Springer International, Cham (2016)
19. Hübner, A., Kuhn, H., Sternbeck, M.G.: Demand and supply chain planning in grocery retail: an operations planning framework. Int. J. Retail & Distrib. Manag. 41(7), 512–530 (2013). https://doi.org/10.1108/IJRDM-05-2013-0104
20. Hübner, A., Schaal, K.: Effect of replenishment and backroom on retail shelf-space planning. Bus. Res. (2017). https://doi.org/10.1007/s40685-016-0043-6
21. Hübner, A.H., Kuhn, H.: Retail category management: state-of-the-art review of quantitative research and software applications in assortment and shelf space management. Omega 40(2), 199–209 (2012). https://doi.org/10.1016/j.omega.2011.05.008
22. Kotzab, H., Teller, C.: Development and empirical test of a grocery retail instore logistics model. Br. Food J. 107(8), 594–605 (2005). https://doi.org/10.1108/00070700510610995
23. Kuhn, H., Sternbeck, M.: Integrative retail logistics - an exploratory study. Oper. Manag. Res. 6(1–2), 2–18 (2013). https://doi.org/10.1007/s12063-012-0075-9
24. Lippiello, V.: Robotic solutions for instore logistics from the REFILLS project. European Robotics Forum, Malaga, Spain (2020)
25. Lippiello, V., Arpenti, P., Cacace, J., Villani, L., Siciliano, B.: A depalletising system for heterogeneous and unstructured pallets. IROS Workshop “Robotics for logistics in warehouses and environments shared with humans”. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, Madrid, Spain, October 2018
26. Meyer, P., Jonas, J., Roth, A.: Frontline employees’ acceptance of and resistance to service robots in stationary retail - an exploratory interview study. J. Serv. Manag. Res. 4, 21–34 (2020). https://doi.org/10.15358/2511-8676-2020-1-21
27. Mou, S., Robb, D.J., DeHoratius, N.: Retail store operations: Literature review and research directions. Eur. J. Oper. Res. 265(2), 399–422 (2018). https://doi.org/10.1016/j.ejor.2017.07.003
28. Reiling, J.: Robotics enabling fully integrated logistics lines for supermarkets. European Robotics Forum, Tampere, Finland (2018)
29. Reiner, G., Teller, C., Kotzab, H.: Analyzing the efficient execution of in-store logistics processes in grocery retailing - the case of dairy products. Prod. & Oper. Manag. 22(4), 924–939 (2012). https://doi.org/10.1111/poms.12003/abstract
30. Saghir, M., Jönson, G.: Packaging handling evaluation methods in the grocery retail industry. Packag. Technol. Sci. 14(1), 21–29 (2001)
31. Siciliano, B.: REFILLS in a nutshell: IROS Workshop “Robotics for logistics in warehouses and environments shared with humans”. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, Madrid, Spain, October 2018
32. Sternbeck, M.G.: A store-oriented approach to determine order packaging quantities in grocery retailing. J. Bus. Econ. 85(5), 569–596 (2015). https://doi.org/10.1007/s11573-014-0751-3
33. Sternbeck, M.G., Kuhn, H.: An integrative approach to determine store delivery patterns in grocery retailing. Transp. Res. Part E 70, 205–224 (2014). https://doi.org/10.1016/j.tre.2014.06.007
34. van Zelst, S., van Donselaar, K., van Woensel, T., Broekmeulen, R., Fransoo, J.: Logistics drivers for shelf stacking in grocery retail stores: potential for efficiency improvement. Int. J. Prod. Econ. 121(2), 620–632 (2009). https://doi.org/10.1016/j.ijpe.2006.06.010
35. Wen, N., Graves, S.C., Ren, Z.J.: Ship-pack optimization in a two-echelon distribution system. Eur. J. Oper. Res. 220(3), 777–785 (2012). https://doi.org/10.1016/j.ejor.2012.02.003
36. Wensing, T., Sternbeck, M.G., Kuhn, H.: Optimizing case-pack sizes in the bricks-and-mortar retail trade. OR Spectr. 40(4), 913–944 (2018). https://doi.org/10.1007/s00291-018-0515-5

Chapter 2

Robots Collecting Data: Modelling Stores Michael Beetz, Simon Stelter, Daniel Beßler, Kaviya Dhanabalachandran, Michael Neumann, Patrick Mania, and Andrei Haidu

Abstract Retail stores are a promising application domain for autonomous robotics. Unlike other domains, such as households, the environments are more structured, products are designed to be easily recognizable, and items are consciously placed to facilitate their detection and manipulation. In this chapter we exploit these properties and propose a mobile robot system that can be deployed in drugstores and autonomously acquire a semantic digital twin model of the store. This facilitates autonomous robot fetch and place and shopping in a virtual replica of the store. The potential commercial impact is substantial because in the retail business stores are an information black box, and being able to automate inventory on a regular basis could drastically improve the knowledge of retailers about their business.

Keywords Semantic digital twin · Semantic mapping · Knowledge representation · Virtual simulation

M. Beetz · S. Stelter (B) · D. Beßler · K. Dhanabalachandran · M. Neumann · P. Mania · A. Haidu
Universität Bremen, Am Fallturm 1, 28359 Bremen, Germany

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
L. Villani et al. (eds.), Robotics for Intralogistics in Supermarkets and Retail Stores, Springer Tracts in Advanced Robotics 148, https://doi.org/10.1007/978-3-031-06078-6_2

M. Beetz et al.

2.1 Introduction

Stationary retail is in transition and seeks new solutions to better compete with webstores. One of the disadvantages of stationary retail stores compared to webstores is the lack of information about the day-to-day inventory of the stores. A promising approach to overcome this information disadvantage is to deploy autonomous robots that automatically acquire models of retail stores, count the stocked products, and document their arrangements in the shelves. The REFILLS project proposes to acquire and represent these kinds of information as semantic store maps. Semantic store maps facilitate answering questions such as (◦) which products are out-of-stock or misplaced?, (◦) where can I find a particular product?, (◦) what is the quickest route for my shopping list?, (◦) how does a particular shelf deviate from the standard?, and (◦) how much free space is left in the detergent shelves? In robotics such pieces of information are provided by robot maps, which are information resources that enable robots to infer the information about their environment that they need to accomplish their tasks. The knowledge content of such robot maps can be measured in terms of the queries that they can answer. As shown in Fig. 2.1, semantic store maps are also detailed digital replicas of stores. They can be used to digitally visualize stores, to virtually rearrange furniture and products with little effort, and to simulate processes such as shopping and filling shelves. Semantic store maps are realistic and detailed enough to be considered as digital twins of real environments that allow for symbolic question answering, where answers can be visualized through virtual reality (VR), the physical simulation of manipulation actions, and highly realistic visual rendering.
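To make the idea of measuring a map by the queries it can answer more concrete, the following toy sketch answers three of the questions listed above. It uses plain Python filters over invented data rather than the logic-based knowledge representation used in the project; the `Facing` record, its fields and all values are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Facing:
    shelf: str       # e.g. "detergent-1"
    product: str     # product detected in this facing
    expected: str    # product the planned layout expects here
    count: int       # items counted by the robot
    capacity: int    # items that fit into the facing


# Invented example data standing in for a robot-acquired store map.
STORE_MAP = [
    Facing("detergent-1", "powder", "powder", 0, 10),
    Facing("detergent-1", "softener", "softener", 4, 8),
    Facing("detergent-2", "soap", "liquid", 5, 6),
]


def out_of_stock(store_map):
    """Which expected products have no items left in their facing?"""
    return [f.expected for f in store_map if f.count == 0]


def misplaced(store_map):
    """Which detected products sit in a facing planned for another product?"""
    return [(f.shelf, f.product) for f in store_map if f.product != f.expected]


def where_is(store_map, product):
    """On which shelves was a particular product detected?"""
    return sorted({f.shelf for f in store_map if f.product == product})


def free_space(store_map, shelf_prefix):
    """How many more items fit into all shelves whose name starts with the prefix?"""
    return sum(f.capacity - f.count for f in store_map if f.shelf.startswith(shelf_prefix))


print(out_of_stock(STORE_MAP))
print(misplaced(STORE_MAP))
print(where_is(STORE_MAP, "softener"))
print(free_space(STORE_MAP, "detergent"))
```

Each function corresponds to one question type; a richer map model would add geometry for route planning and a reference layout for comparing a shelf against the standard.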
Semantic store maps are specializations of semantic digital twins (semDT) which are object-based first-order logic knowledge bases that can be rendered as realistic looking virtual environments using virtual reality scene graph technology. Digital twins are increasingly finding their presence in industries in the context of manufacturing and product development. In the REFILLS project, we have advanced the digital twins technology by adding semantic capabilities to them. This allows

Fig. 2.1 Example of a retail store laboratory (left) and its representation as a semantic store map (middle, right)


objects and processes in the virtual store to be described in a machine-understandable way, enabling many new AI applications. A semDT realistically represents the physical store as well as its shelves, products and respective locations. Each element in the store is uniquely identified so that it can be linked to merchandise management systems. In addition, this enables background knowledge from other sources to be linked to an item based on categories like ingredient, consumable, non-consumable. Digital twins can already be created, to a great extent, autonomously for standardized shelf systems, as demonstrated in this project. Customer behavior could be recorded by certified anonymous camera systems while product movements can be tracked by IoT (internet of things) connectivity. In comparison, merchandise management systems additionally store information about sales in an abstract manner without detailed spatial information. A semDT combines this varied business data into a coherent picture. The entire company thus has access to the most current and complete data set. By using SemDTs it is possible to provide store and customer specific tailored solutions. In this way, logistics can be distinctively coordinated for each store and the replenishment can be optimized by sorting on site according to the exact placement of products. Assortment composition can be adapted to store-specific features and technical tools that are part of the semDT platform to support the store employees. Furthermore, the semDT enables the construction of true-to-life digital worlds, which are annotated with machine-readable background knowledge and linked to other information systems. This allows digital twins to be not only realistically visualized but also simulated. The visualization enables virtual visits to stores, insertion of additional information using virtual reality, and the automatic image-based recognition of products. 
Simulation, on the other hand, allows extensive marketing studies and supports the control of robots through the semantic information. This makes the semDT a symbolic knowledge base with rich semantic data that can support visual realism, enhance the cognitive capabilities of an artificial agent, and aid in the development of innovative information services.

SemDTs are a kind of semantic object map (SOM), a representation studied in semantic mapping. Semantic mapping is surveyed by Cadena et al. [4], who identify several open problems, including the question of how to acquire environment models that are based on rich representations, which is the problem that the semDT targets. Most approaches to semantic mapping are concerned with capturing the world around the robot as accurately as possible, with the agent being an observer in the environment. While capturing the data is an important part of the process, generating machine-understandable representations of the environment is equally important. Particularly powerful forms of semantic maps are the semantic object maps introduced by Rusu et al. [19] and further developed by Pangercic et al. [17]. A key difference between SOM and semDT mapping is that for SOMs the semantic representation is generated solely through the interpretation of the sensor data, while semDTs generate the symbolic representations using accurate models of objects and object parts as inputs for the mapping process. A similar approach is taken by Gemignani et al. [10], where the authors also argue for a deep understanding of a robot’s


environment, but they propose a semi-autonomous approach for acquiring the SOMs with the help of human interaction. In another recent work, Deeken et al. [6] propose SEMAP, a framework for semantic map representation that enables qualitative spatial reasoning about objects in a robot’s surroundings.

Recently, digital twins have been proposed as digital replicas of environments for the digitization of manufacturing processes, ranging from industrial production lines [23] to hardware-in-the-loop testing [7]. To the best of our knowledge, existing digital twin approaches do not treat the digital replicas as symbolic knowledge bases. In addition, we are not aware of digital twins being created automatically by robots.

SemDTs are also based on the idea of generating geometric structures using formal grammars. One prominent example is the notion of shape grammars [22], in which production rules are recursively applied to 2D or 3D geometric figures to generate more complex ones. They have been used in domains such as urban design [13] to generate different designs of cities.

In the following, we first introduce the conceptual framework and the computational problem in Sect. 2.2. Afterwards, we go into more detail by explaining the data and knowledge model of semDTs in Sect. 2.3 and how a virtual simulation environment enhances the semDT in Sect. 2.4. Then, we formally describe the autonomous mapping process in Sect. 2.5 and describe an example system that is able to apply this mapping process to create semDTs in Sect. 2.6. Finally, we conclude in Sect. 2.7.

2.2 Conceptual Framework and Computational Problem

The store model represented through semantic store maps includes objects and their parts as well as the relations between them, representing the physical structure of the store and its products, see Fig. 2.2. A key composed object category is the shelf system, which is composed of a shelf frame and individual shelf layers that are horizontally attached to the shelf body at

Fig. 2.2 Composition of shelf systems and the arrangement of products in shelves


Fig. 2.3 A store shelf system and its annotated model

different heights. An individual shelf layer hosts several (product) facings, which are the cuboid volumes in which the products of the same product category are arranged. Facings containing different products are separated by separators, and each facing is identified through a label, which is a barcode with a price tag. The label is attached to the front side of the supporting shelf layer between the respective separators. In the complete application domain there are also shelf components for products that cannot be trayed but have to hang, lie, or stand. The different forms of packaging include rigid objects, glass bottles, plastic bottles, paper boxes, and shelf-ready packaging, to name only a few.

Figure 2.3 illustrates the main idea of semantic store maps. Semantic store maps are best imagined as a realistic virtual reality environment in which each relevant object and object part has a symbolic name that is linked to the symbolic knowledge base. The symbolic knowledge base contains machine-understandable background knowledge about all relevant objects and object parts, including the object categories they are instances of. Another way to look at semantic store maps is that they combine geometric, physical, and appearance knowledge about products and their locations with semantic knowledge about the products. The latter is typically contained in the information systems of the retail chain and the internet stores of the respective retailers.

Thus the mapping problem for semantic store maps can be characterized as follows. Given 3D models of the physical parts of shelf systems and of all products, together with the symbolic background knowledge, compute a composed CAD model that can be considered a digital twin of the supermarket. The mapping method is depicted in Fig. 2.4: perceived information is combined with background knowledge to yield a semantic store map that is used to create photorealistic VR environments and robot simulations.


Fig. 2.4 Combining perceived information with the background knowledge provided by store information systems

Fig. 2.5 The figure shows three queries formulated in the semDT query language and the answers to the queries highlighted in the semDT


Figure 2.5 shows three example queries evaluated on robot-generated semantic store maps, formulated in Prolog, a logic-based programming language. The first one queries a particular product, the second one the empty facings in the shelf, and the third one health-threatening products in the reach of children, which results in highlighting of cleaning products.

2.3 SemDT Data and Knowledge Model

SemDTs facilitate the combination of different data sources in order to create an overall digital model of a retail store. These data include sensor scans, merchandise management data, and data from Enterprise Resource Planning (ERP) systems. These data sources can be leveraged by applications that cater to planning and decision making in order to optimize processes such as the restocking of shelves. One of the difficulties in realizing such applications lies in the fact that the data sources do not share a uniform format and thus need to be integrated to create a holistic view of the retail store. This problem is often addressed by establishing an interface layer that makes the data available to applications. In our work, this is accomplished through a common ontological model that defines abstract categories underlying retail store environments. These definitions are specialized and instantiated for a specific retail store through data sources, such as ERP systems, that are commonly available in retail eco-systems. We call an instantiation of our retail store ontology a semantic environment model of a retail store.

All semantic data models [12] have in common that they exhibit a graph structure in the form of a hierarchy of concepts built by the subsumption relation, moving from generality to specificity in top-down order. This organization allows designing modular and re-usable models which are domain- or task-specific, and which can be extended and combined with each other easily. Similar to object-oriented models, semantic data models support inheritance of properties among concepts and their sub-concepts that are connected through an is-a relationship. In addition, semantic models can express necessary and sufficient conditions for an entity to belong to a certain class. These conditions are expressed in the form of axioms that can be verified automatically to detect unsatisfiable conditions, or to reason about the class membership of an entity. The inherent structure of semantic models further enables us to restrict the level of abstraction to what is necessary for an application. Thereby we abstract the detailed underlying structure of the environment into an application-specific view with an appropriate level of abstraction.

The ontology language of our choice is the Web Ontology Language (OWL) [16], which is based on Description Logics. It is most prominent for its role in the semantic web, which has the goal of providing meaning together with data such that machines can directly interpret the data. OWL ontologies are organized into terminological knowledge (TBox) and assertional knowledge (ABox). The TBox consists of axioms such as that a Shelf is a sub-concept of PhysicalObject, which is written as Shelf ⊑ PhysicalObject. Through this axiom various aspects of the concept PhysicalObject


Fig. 2.6 Facing and Product are the central notions in our ontological model

are inherited by the Shelf concept, such as having a location and spatial quality. The OWL language offers several other types of axioms, including equivalence and disjointness of concepts, and semantics of relationships such as reflexivity and transitivity. The ABox, on the other hand, consists of facts that describe some aspect of the world, such as that a particular object in the environment is of type Shelf, that it consists of several shelf layers, and so on.

Our ontology is composed of two different modules: one focusing on objects that are essential to construct a retail store environment, and one that defines product categories. This organization is depicted in Fig. 2.6. The abstract concepts Facing and Product are the most central notions in our model as they link the two modules through relations between them. That is, first, each product type is contained in at most one facing at a time, and, second, each facing should contain only products of a certain type according to a store plan. Maintenance of the product location information is significant as it allows us to represent the changes in the state of the store over time. For instance, a customer moving a product from a shelf into a basket is indicated in our knowledge base by changing the relation between the product and the place where it is contained. Foundationally, our retail model is built upon the SOMA ontology [3], which in turn is based on the DUL ontology [15]. These models define a very general terminology including concepts such as Event and Object that serve as super-concepts of classes that are defined in our model.

The first part of our ontology model defines concepts ranging from parts of a store to its whole, closely following the order in which the objects are connected to each other. A Facing is defined as a cuboid region between two separators that are attached to a ShelfLayer. Each facing is linked to at most one product type based on an association of price labels to facings. The concept Facing has four subconcepts to indicate different states: FullFacing classifies facings that are entirely full, EmptyFacing those that are entirely empty, PartialFullFacing for a state in between


the previous two, and the fourth category, MisplacedFacing, is used to indicate that a facing contains a product of the wrong type. Our model considers two different types of shelf layers based on the positioning of products, i.e. whether the products can be placed on the layers or need to be hung up. A ShelfFrame is defined as the back panel of a shelf with layers attached at different heights. We further specialize this concept based on geometrical attributes like depth, width and height. For instance, every ShelfH180W100T5L is a light-load shelf frame of 1.8 m height and 1 m width, with shelf layers having five tiles.

The second part of our ontology model defines a hierarchy underneath the Product concept consisting of the different product categories that are traded by a specific retailer. For the REFILLS project, the product hierarchy was generated automatically from the information encoded in a product catalog. The taxonomy includes 2814 product classes covering household product categories such as CareProduct, BabyAccessory, and Clothing.

The OWL model of a retail environment consists of axioms that can be tested automatically via an OWL reasoner such as HermiT [20]. Axioms that are not satisfiable indicate an inconsistency in the model, i.e. a contradiction between the axiom and another axiom or fact. This is useful, e.g., to detect anomalies in a store such as products located in the wrong shelf, which can be expressed through an axiom asserting that all products in a shelf must have a common type. An OWL reasoner may also infer class membership of entities based on axioms that assert equivalence between classes. If the reasoner can prove that an entity is a member of one of the classes, it can automatically infer that the entity is also a member of the other class. We use such axioms, e.g., for the classification of facings based on the items they contain. An example is the class MisplacedFacing, for which our model defines the equivalence class Facing ⊓ ∃contains.Misplaced, i.e. each Facing that contains at least one Misplaced item is classified as a MisplacedFacing. Another axiom in our model asserts an equivalence class of Misplaced based on the association of price labels to facings to distinguish between misplaced and well-placed products.

In summary, semantic maps augment the basic metric maps with background knowledge in the form of facts and axioms that can capture certain aspects of the retail environment, and that can be used for automated verification and classification. Such maps are a valuable source of information for retail applications, and they further lower the complexity of applications through the transparent integration of different data sources into a common language. In our work, we demonstrate that such maps can be acquired automatically through a robotic agent. This is discussed in Sect. 2.5.
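The classification of facings described above can be illustrated with a small procedural sketch. It is not the OWL model itself and does not use a reasoner: the `Facing` record, its `capacity` field, and the rule that a misplaced item takes priority over the fill state are assumptions made only for this illustration. The class names are the ones used in the ontology.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Facing:
    # Product type associated with the facing through its price label.
    label_product_type: Optional[str]
    # Hypothetical field: number of items that fit into the facing.
    capacity: int
    # Types of the items currently perceived inside the facing.
    item_types: List[str] = field(default_factory=list)


def is_misplaced(item_type: str, facing: Facing) -> bool:
    """An item counts as Misplaced if its type differs from the labelled type."""
    return (facing.label_product_type is not None
            and item_type != facing.label_product_type)


def classify_facing(facing: Facing) -> str:
    """Mimic 'Facing and (contains some Misplaced)' plus the fill states."""
    # Assumption: a single misplaced item is enough and takes priority.
    if any(is_misplaced(t, facing) for t in facing.item_types):
        return "MisplacedFacing"
    if not facing.item_types:
        return "EmptyFacing"
    if len(facing.item_types) >= facing.capacity:
        return "FullFacing"
    return "PartialFullFacing"


example = Facing("Shampoo", capacity=3, item_types=["Shampoo", "Soap"])
state = classify_facing(example)
```

In the actual model this inference is carried out declaratively by the reasoner from the equivalence axiom; the sketch only shows which inputs determine the outcome.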


2.4 Virtual Simulation Environment of the Semantic Digital Twin

In addition to the semantic data models, the semDT consists of a virtual simulation environment (VSE), as displayed in Fig. 2.7. By utilizing the inherent graph-like structure of the semDT as a virtual reality scene graph in the VSE, we are able to render the symbolic knowledge base of the semDT. In Fig. 2.8, we depict the VR scene graph of a store environment. The scene graph contains the spatial relationships of all objects and their respective parts, physical properties such as mass or aperture of lenses, and constraints for the relative motions of the parts. Furthermore, it includes knowledge about the mesh, texture and collisions of objects, as well as environment properties such as lighting. This complements the information in the symbolic knowledge base, which is possible because the VR scene graph uses the same semantic names as the symbolic knowledge base.

The possibility to render the semDT enables many different use cases. If we want to check our store layout for safety issues, we could, for example, query the semDT for products that are dangerous and reachable for children and highlight them. It is also possible to check whether escape routes or fire extinguishers are blocked in the current store layout. Another possibility is using the VSE to plan future store layouts. The VSE makes it much easier to rearrange the layout than in the real store, while photorealism provides a good impression of what the result would look like in reality. Furthermore, the visualization provides the basis for possible future application developments. An example could be in-store navigation for customers that uses the semDT to visualize the store, the current position, and products of interest such as discounted products, and leads customers to their goal.

Virtual reality engines are not only able to render data, they are also able to simulate physics. This allows us to perform actions that require interaction with the environment in virtual reality. Because of the structure of the VSE the results of those

Fig. 2.7 Robot performs fetch and place action in virtual reality environment


Fig. 2.8 Casting of the simulation scene graph as a virtual symbolic knowledge base

action simulations are again a rendered semDT and this opens up the possibility to use the VSE as a robot simulator and virtual reality environment for humans. Using the VSE as a robot simulator, the robot can perform the requested tasks in the same way as in reality. After receiving a task, the robot can access the semDT to infer information necessary to finish the task. One example might be Scenario 3 of REFILLS, which focuses on the replenishment of store items. The task does not specify which objects to replenish. However, by querying for empty facings in the store, object types associated with the facings and where to find them in the storage, it becomes possible to specify the otherwise under-specified task. The robot can then employ the simulation to find parameters for manipulation actions, that are most likely to lead to a successful execution. The simulator supports all the actions that are possible with a real robot, including those presented in Sect. 2.6. In Fig. 2.7, we demonstrate a fetch and place action, using a simulated robot. In addition, the robot simulator can be used as support for the real robot. When performing actions in the real world, robots face different difficulties such as object classification and pose estimation. Occlusion, shadows and noisy sensor data can become a challenge for object classification algorithms. The semDT can be used to support the real robot by using the simulation to render the expected image of a scene as depicted in Fig. 2.9a. This knowledge can be used to enhance perception algorithms through the comparison between expected and actual images of the scene during pose estimations [14]. Errors from the pose estimation could result in objects floating or being inside of other objects. By exploiting the knowledge about physical effects of the simulation, inconsistencies such as levitating objects can automatically be eliminated. Gravity will pull perceived objects to the supporting surface. 
Objects that are predicted to be


Fig. 2.9 Rendering of actions performed by robot and virtual human

inside of each other would collide in simulation, and the physics simulation uses these collisions to update their positions. Systems could detect these inconsistencies and request further actions such as reperceiving the scene or updating the pose estimation.

The second application uses the VSE as a virtual reality for humans. The possibility to render the semDT allows us to interact with the semDT in an intuitive way. For example, users are able to inspect the current store layout by using a VR headset to walk inside the store and look around. Furthermore, VR technology makes it possible for humans to physically interact with the environment and perform actions such as shopping, as depicted in Fig. 2.9b.

For both applications, the knowledge system automatically analyzes and interprets the resulting force-dynamic events. Force-dynamic events are situations in which a force generates a motion in an object. An example is pushing, where the hand comes into contact with an object and applies a force that changes the position of the object. The identified force-dynamic events make it possible to identify and classify the performed action. This information is then used to create segmented and symbolically annotated episodes. From these episodes it is possible to query for the state transitions during actions such as fetch and place. In Fig. 2.10 we visualize the


Fig. 2.10 State transition during fetch and place action

state transitions during such an action in a kitchen environment. From the knowledge contained in the symbolically annotated episodes we can highlight different subactions, e.g. picking-up, transporting or sliding. Furthermore, we can retrieve the state of objects or actors and highlight objects of interest.

2.5 Semantic Mapping

In this section, we describe the process of creating semantic digital twins of retail stores. The process is composed of two phases, which we call layout identification and store monitoring. This reflects that certain features of a store such as shelf positions only change rarely, whereas other features change more frequently. During the layout identification phase, the agent detects features of the store that rarely change. These features are represented as a 2D map of the store where each shelf in the map is labeled by its ontological type, and, thus, inherits several constraints regarding, e.g., what types of layer fit into the shelf. This map is used as a reference for shelf positions, agent localization, and navigation. In the store monitoring phase, the agent detects features of the store that regularly change such as which and where shelf layers are mounted on a shelf.

As depicted in Fig. 2.11, the store mapping process can be seen as successive application of production rules in a shape grammar where the productions create new coordinate reference frames, and facts in the knowledge base. For the implementation of these production rules, an agent is needed that is able to freely move a camera in space in order to acquire the necessary data for the rules to be triggered. An example implementation is described in Sect. 2.6. In the following subsections, the shape grammar will be explained in detail.


Fig. 2.11 The shape grammar of the mapping process

2.5.1 Layout Identification

The goal of the layout identification phase is the creation of a semDT skeleton with sufficient information to enable an agent to execute the second phase autonomously. This skeleton includes the following information:

• a 2D occupancy map of the store,
• the localization of shelves in that map, and
• the classification of these shelves.

The 2D occupancy map of the store can be created using an algorithm for simultaneous localization and mapping (SLAM) [11]. The map produced by SLAM algorithms is usually stored as a grey scale image that represents an occupancy grid of the environment: the grey scale value of a pixel encodes the probability that the corresponding space in the environment is occupied. Therefore, white indicates empty and black occupied space. The map is created by moving the robot through the whole store. From then onward, the agent can localize itself within the environment with respect to the 2D map using algorithms such as Monte Carlo localization (MCL) [9].
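The grey-value encoding of the occupancy map can be made concrete with a minimal sketch. It assumes an 8-bit image and a linear mapping from grey value to occupancy probability; both, as well as the threshold value, are assumptions for illustration, and real map formats may add thresholds or a separate value for unknown space.

```python
def occupancy_probability(grey: int, max_value: int = 255) -> float:
    """Map a grey value to an occupancy probability.

    White (max_value) means free space, black (0) means occupied space.
    The linear mapping is an assumption made for this sketch.
    """
    if not 0 <= grey <= max_value:
        raise ValueError("grey value out of range")
    return 1.0 - grey / max_value


def is_occupied(grey: int, threshold: float = 0.65) -> bool:
    """Treat a cell as occupied above a (hypothetical) probability threshold."""
    return occupancy_probability(grey) > threshold


# A tiny 2x3 map: a black wall segment next to white free space.
grid = [[0, 0, 255],
        [0, 255, 255]]
occupied_cells = [(r, c) for r, row in enumerate(grid)
                  for c, g in enumerate(row) if is_occupied(g)]
```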


Fig. 2.12 Fiducial markers are used to localize shelves in a 2D map of the retail lab at the University of Bremen

We further define a fixed global reference frame with respect to that map, which is called the map or world frame. The location of this frame is arbitrary, but often a map corner is chosen. In the following, we denote this frame as $\omega$.

In the second step, occupied regions in the 2D map that correspond to shelves are labeled as such. This is accomplished with the help of magnetic fiducial markers that are attached to the bottom of each shelf, as depicted in Fig. 2.12. Each shelf has a unique pair of markers, one on each corner. The location of the shelf that hosts a marker pair can be estimated given the location of the markers relative to the reference frame $\omega$. The left marker encodes the fact that it is a left marker as well as the ID of the shelf in the ERP system of the retail company, and the right marker is used to achieve a more accurate pose estimation for the shelf. Through the connection to the ERP system, the shelf type and thus its dimensions can be inferred. Fiducial markers are used because they allow a decent pose estimate even with a 2D camera.

In robotics, a pose refers to the position and orientation of something relative to a reference frame. Such poses are typically represented with homogeneous transformation matrices [21]. For example, the pose of marker $m_1$ relative to the agent's camera $c$ is written as ${}_{c}T_{m_1} \in \mathbb{R}^{4 \times 4}$, where $T$ stands for “transform”. Equation 2.1 describes how a transformation matrix is composed,

$$ {}_{c}T_{m_1} = \begin{bmatrix} {}_{c}R_{m_1} & {}_{c}P_{m_1} \\ 0 \;\; 0 \;\; 0 & 1 \end{bmatrix} \tag{2.1} $$

where ${}_{c}P_{m_1} \in \mathbb{R}^{3}$ represents the point of origin of the local reference frame $m_1$ relative to $c$, and ${}_{c}R_{m_1} \in \mathbb{R}^{3 \times 3}$ represents the relative orientation as a rotation matrix. The advantage of homogeneous transformation matrices is that they double as functions that transform poses into different reference frames. For example, the agent knows its own pose relative to $\omega$ and its own joint configuration, and can therefore compute ${}_{\omega}T_{c}$. We can change the reference frame of ${}_{c}T_{m_1}$ using matrix multiplication (denoted with $\cdot$): ${}_{\omega}T_{m_1} = {}_{\omega}T_{c} \cdot {}_{c}T_{m_1}$.
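The change of reference frame by matrix multiplication can be tried out with a few lines of plain Python. The sketch below builds matrices in the block form of Eq. 2.1 and chains a camera pose with a marker pose; all numeric values and the helper names (`make_transform`, `compose`, `rot_z`) are made up for illustration and are not part of the described system.

```python
import math


def make_transform(rotation, position):
    """Build a 4x4 homogeneous matrix [R P; 0 0 0 1] as in Eq. 2.1."""
    rows = [list(rotation[i]) + [position[i]] for i in range(3)]
    return rows + [[0.0, 0.0, 0.0, 1.0]]


def compose(a, b):
    """Matrix product a . b of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rot_z(angle):
    """Rotation matrix for a rotation of `angle` radians about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Hypothetical poses: the camera sits at (2, 0, 1) in the world frame and is
# rotated by 90 degrees about z; the marker is 1 m along the camera's x axis.
w_T_c = make_transform(rot_z(math.pi / 2), [2.0, 0.0, 1.0])
c_T_m1 = make_transform(rot_z(0.0), [1.0, 0.0, 0.0])

# Marker pose in the world frame: w_T_m1 = w_T_c . c_T_m1
w_T_m1 = compose(w_T_c, c_T_m1)
marker_position = [w_T_m1[i][3] for i in range(3)]
```

With these example values the marker ends up at approximately (2, 1, 1) in the world frame, since the camera's x axis points along the world's y axis.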


The pose of a shelf can be computed given the dimensions of the shelf and the poses of the fiducial markers attached to it. This is depicted in Fig. 2.11a. On the left side, the figure shows the poses of two detected markers, ${}_{\omega}T_{m_1}$ and ${}_{\omega}T_{m_2}$, visualized by their three axes. The frame at the bottom is $\omega$, and the others are the local reference frames of the markers $m_1$ and $m_2$. On the right side, a local reference frame $s_1$ is added at the bottom left corner. We will construct ${}_{\omega}T_{s_1}$ using Eq. 2.1, and therefore need to compute ${}_{\omega}P_{s_1}$ and ${}_{\omega}R_{s_1}$ first. The point of origin ${}_{\omega}P_{s_1}$ is computed like this:

$$ {}_{\omega}\vec{V}_{m_2 m_1} = \frac{{}_{\omega}P_{m_1} - {}_{\omega}P_{m_2}}{\lVert {}_{\omega}P_{m_1} - {}_{\omega}P_{m_2} \rVert_2} \tag{2.2} $$

$$ {}_{\omega}P_{s_1} = {}_{\omega}P_{m_2} + {}_{\omega}\vec{V}_{m_2 m_1} \left( \frac{\lVert {}_{\omega}P_{m_1} - {}_{\omega}P_{m_2} \rVert_2}{2} + \frac{w_{s_1}}{2} \right) \tag{2.3} $$

where ${}_{\omega}\vec{V}_{m_2 m_1}$ denotes that the vector $\overrightarrow{m_2 m_1}$ is expressed relative to $\omega$, and $w_{s_1}$ is the width of shelf $s_1$.

To compute ${}_{\omega}R_{s_1}$ we first have to define the desired axis orientations. As can be seen in Fig. 2.11a, we decided that the x axis (red) points away from the shelf, the y axis (green) is $-{}_{\omega}\vec{V}_{m_2 m_1}$, and the z axis (blue) points up. One way to construct ${}_{\omega}R_{s_1}$ from that is to utilize the fact that the columns of a rotation matrix define the axes of a coordinate frame. Therefore, ${}_{\omega}R_{s_1}$ can be constructed by appending the three vectors representing the axes of the reference frame: ${}_{\omega}R_{s_1} = \begin{bmatrix} x & y & z \end{bmatrix}$, where $y = -{}_{\omega}\vec{V}_{m_2 m_1}$, $z = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^{T}$, assuming that the z axis of $\omega$ is perpendicular to the ground, and $x = z \times y$ is the cross product of $z$ and $y$. Equation 2.1 can now be used to construct ${}_{\omega}T_{s_1}$.

The frame $s_1$ is now placed in the semantic digital twin, where it will be used as the root reference frame for everything related to this shelf. Accordingly, the parent reference frames of $m_1$ and $m_2$ are changed to the newly created $s_1$. Here we exploit the fact that the inverse of a homogeneous transformation can be computed by inverting the matrix: ${}_{\omega}T_{s_1}^{-1} = {}_{s_1}T_{\omega}$. For example, if ${}_{\omega}T_{m_1}$ and ${}_{\omega}T_{m_2}$ represent the poses of the markers, then we can change the reference frames of the markers with this formula: ${}_{s_1}T_{m_1} = {}_{\omega}T_{s_1}^{-1} \cdot {}_{\omega}T_{m_1}$.

The second dimension of the shape grammar rules is that the left hand side is matched against a knowledge base consisting of facts. As a result of applying the rules, new facts are produced that are asserted into the knowledge base. The evolution of knowledge bases within the mapping process can be written as $kb_{n+1} = \mu(kb_n)$, where $\mu$ is the mapping function and $kb_{n+1}$ is the knowledge base resulting from applying a shape grammar rule to the knowledge base $kb_n$. The fixpoint is reached once $kb_n = kb_{n+1}$, i.e. when no shape grammar rule can be used to produce new facts.
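As a numeric illustration of the shelf origin of Eqs. 2.2 and 2.3 and of the frame change by matrix inversion, consider the following sketch. It reads Eq. 2.3 as the unit vector scaled by the sum of half the marker distance and half the shelf width. The marker positions, the shelf width, and the function names are hypothetical, and the inversion uses the standard closed form for rigid transforms rather than a general matrix inverse.

```python
import math


def shelf_origin(p_m1, p_m2, shelf_width):
    """Origin of the shelf frame s1 in the world frame (Eqs. 2.2 and 2.3)."""
    diff = [a - b for a, b in zip(p_m1, p_m2)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist == 0.0:
        raise ValueError("markers must not coincide")
    v = [d / dist for d in diff]                  # Eq. 2.2: unit vector m2 -> m1
    scale = dist / 2.0 + shelf_width / 2.0        # Eq. 2.3: half distance + half width
    return [p + scale * vi for p, vi in zip(p_m2, v)]


def invert_transform(t):
    """Invert [R P; 0 1] as [R^T  -R^T P; 0 1] (valid for rigid transforms)."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    p_inv = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p_inv[i]] for i in range(3)] + [[0, 0, 0, 1]]


# Hypothetical shelf of 1 m width whose markers sit 10 cm inside each end.
origin = shelf_origin((0.1, 0.0, 0.0), (0.9, 0.0, 0.0), 1.0)

# Example rigid transform: rotation by 90 degrees about z, translation (2, 0, 1).
w_T_s1 = [[0, -1, 0, 2], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
s1_T_w = invert_transform(w_T_s1)
```

In this example the midpoint of the markers is at x = 0.5, and moving half a shelf width towards the first marker places the origin at the shelf corner x = 0.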
The left hand side of a rule $r$ in our shape grammar consists of a first order formula $g_r$. The matching succeeds for $r$ in the $n$-th step if and only if it can be proven that $g_r$ is true under the knowledge base $kb_n$, that is, if there exists a substitution $v_{lhs}$ of variables in $g_r$ that renders the formula true. In this case, a new knowledge base is produced as $kb_{n+1} = kb_n \cup f_r(v_{lhs})$, where $f_r$ is a function that instantiates a set of template facts given the substitutions in $v_{lhs}$. The function $f_r$ has the form

$$ f_r(v_{lhs}) = \{\, p_1(val_{11}, \ldots, val_{1a}), \ldots, p_m(val_{m1}, \ldots, val_{mk}) \mid val_{xy} = v_{lhs}(v_{xy}) \,\} \tag{2.4} $$

where $p_1, \ldots, p_m$ are the m consequent facts of rule r, and $v_{lhs}(v_{xy})$ is used to access the value of the variable $v_{xy}$ as it was instantiated by proving the formula $g_r$. The evolution of knowledge bases during the layout identification phase is guided by the shape grammar rule displayed in Fig. 2.11a. The left hand side of the rule consists of markers that belong to the same shelf, but where no facts exist in the knowledge base that express this attachment. The corresponding first order formula $g_r$ of the left hand side of r can be written as:

$$g_r = \exists\, m_1, m_2, id_s : ShelfMarker(m_1) \wedge ShelfMarker(m_2) \wedge sid(m_1, id_s) \wedge sid(m_2, id_s) \wedge \neg\exists x : attached(m_1, x) \wedge \neg\exists x : attached(m_2, x) \tag{2.5}$$

where the unary predicate ShelfMarker asserts that markers $m_1$ and $m_2$ are instances of the concept ShelfMarker, and the predicate sid maps an identifier $id_s$ to the markers $m_1$ and $m_2$. The fact that both markers are assigned the same identifier indicates that they belong to the same shelf. We follow the closed-world assumption, which means that if a certain fact is not known, then its negation is assumed. Hence, the negative statements in $g_r$ do not have to be part of the knowledge base before applying the rule: the absence of their positive variant is enough to prove that the negative variant holds. The attached predicate represents that an attachment relationship holds between the markers $m_1$, $m_2$ and some object x. When the formula $g_r$ is true under the current knowledge base $kb_1$, then we can compute the next knowledge base $kb_2$ as:

$$kb_2 = kb_1 \cup f_r(v_{lhs})$$

where

$$v_{lhs} = \{ m_1 \to M_1,\; m_2 \to M_2,\; x \to S_1,\; id_s \to Id_1 \}$$

$$f_r(v_{lhs}) = \{ attached(M_1, S_1),\; attached(M_2, S_1),\; Shelf(S_1),\; sid(S_1, Id_1) \mid v_{lhs}(m_1) = M_1,\; v_{lhs}(m_2) = M_2,\; v_{lhs}(x) = S_1,\; v_{lhs}(id_s) = Id_1 \}.$$

The knowledge base $kb_2$, which is obtained as the result of applying the rule in Fig. 2.11a, contains facts about the presence of a shelf and the attachment relation between the markers and the shelf.
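The rule application described in this section can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the KnowRob implementation: facts are modelled as plain Python tuples, the function names (`match_unattached_marker_pairs`, `apply_rule`, `run_to_fixpoint`) are hypothetical, and the naming scheme for the new shelf symbol is invented for the example.

```python
from itertools import combinations


def match_unattached_marker_pairs(kb):
    """Yield substitutions (m1, m2, id_s) that make g_r (Eq. 2.5) true."""
    markers = sorted(f[1] for f in kb if f[0] == "ShelfMarker")
    sid = {f[1]: f[2] for f in kb if f[0] == "sid"}
    # Closed-world assumption: a marker is unattached if no attached fact exists.
    attached = {f[1] for f in kb if f[0] == "attached"}
    for m1, m2 in combinations(markers, 2):
        same_shelf = m1 in sid and sid.get(m1) == sid.get(m2)
        if same_shelf and m1 not in attached and m2 not in attached:
            yield m1, m2, sid[m1]


def apply_rule(kb):
    """One step kb_{n+1} = mu(kb_n): assert the facts for the first match."""
    for m1, m2, id_s in match_unattached_marker_pairs(kb):
        shelf = f"S_{id_s}"  # hypothetical naming scheme for the new shelf
        return kb | {
            ("Shelf", shelf),
            ("sid", shelf, id_s),
            ("attached", m1, shelf),
            ("attached", m2, shelf),
        }
    return kb  # no match: the knowledge base is unchanged


def run_to_fixpoint(kb):
    """Apply the rule until kb_n == kb_{n+1}."""
    while True:
        next_kb = apply_rule(kb)
        if next_kb == kb:
            return kb
        kb = next_kb


kb1 = frozenset({
    ("ShelfMarker", "M1"), ("ShelfMarker", "M2"),
    ("sid", "M1", "Id1"), ("sid", "M2", "Id1"),
})
kb2 = run_to_fixpoint(kb1)
```

After one application the markers are attached, so the negated conditions of the rule no longer hold and the second iteration returns the knowledge base unchanged, which is the fixpoint condition.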

2.5.2 Store Monitoring

During the second phase, called store monitoring, the agent creates an inventory of the whole store. The process starts with the semDT skeleton from the previous step, and requires the following three additional steps for each shelf to complete the semDT:


M. Beetz et al.

Fig. 2.13 A frame o is defined at the left row of mounting points for shelf layers to simplify the computation of shelf layer poses

1. detection of the height and type of shelf layers;
2. detection of the position of separators, and barcodes of price labels for each shelf layer; and
3. counting the number of items, and the detection of the type of the front most item in each facing.

The first step is visualized in Fig. 2.11b, c, and h. The left hand side of the rule shown in Fig. 2.11b consists of a detected horizontal line floating in front of a shelf. On the right hand side, the line is replaced by a shelf layer that is attached to the shelf. Each shelf layer is either a ShelfLayerStanding used to place products on (e.g. for bottles), or a ShelfLayerMounting used to hang products (e.g. for toothbrushes). This classification is performed based on the inputs from a perception algorithm. For simplicity, however, we only consider ShelfLayerStanding in this chapter. In addition, every shelf layer is further classified based on geometrical properties including the depth and width of the layer.

Afterwards the pose of the shelf layer relative to the shelf, ${}^{s_1}T_{l_1}$, can be computed, where $l_1$ is the new shelf layer. We define a fixed frame o for the mounting point offset to simplify the computation of shelf layer poses. The pose of o can be imagined as taking the pose of a correctly attached layer, projecting it to the back wall of the shelf, and then to the floor, as depicted in Fig. 2.13. By defining the shelf layer poses relative to o, we just need to set the detected height as the z coordinate and the layer's depth as the x coordinate. The computation is described in Eqs. 2.6 and 2.7:

$${}^{o}T_{l_1} = \begin{bmatrix} 1 & 0 & 0 & d_{l_1} \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & h \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{2.6}$$

$${}^{s_1}T_{l_1} = {}^{s_1}T_{o} \cdot {}^{o}T_{l_1} \tag{2.7}$$

where $h$ is the height detected by a perception algorithm and $d_{l_1}$ is the depth of the shelf layer, which can be inferred from its type.
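As a worked illustration of Eqs. 2.6 and 2.7, the following sketch composes the two transformations with plain Python lists. The numeric values and the pose chosen for the frame o relative to the shelf are made-up assumptions for the example, and the helper names `matmul4` and `layer_pose_in_o` are hypothetical.

```python
def matmul4(a, b):
    """Multiply two 4x4 homogeneous transformation matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def layer_pose_in_o(depth, height):
    """Eq. 2.6: translation by the layer depth (x) and the detected height (z)."""
    return [
        [1.0, 0.0, 0.0, depth],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, height],
        [0.0, 0.0, 0.0, 1.0],
    ]


# Assumed pose of the mounting-point frame o relative to the shelf frame s1
# (a pure translation here; the values are invented for illustration).
s1_T_o = [
    [1.0, 0.0, 0.0, -0.05],
    [0.0, 1.0, 0.0, 0.02],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

o_T_l1 = layer_pose_in_o(depth=0.4, height=1.1)
s1_T_l1 = matmul4(s1_T_o, o_T_l1)  # Eq. 2.7

# Position of the layer relative to the shelf: last column of the result.
layer_position = [row[3] for row in s1_T_l1[:3]]
```

Because o is fixed per shelf, only the two scalars (depth and height) change from layer to layer, which is the simplification the text describes.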


Fig. 2.14 Result of store monitoring in retail lab visualized as a virtual environment. Grey boxes are placeholders for objects for which no 3D scan was available for visualization

The other two rules of the first step are displayed in Fig. 2.11c and h. They are used to avoid an internal representation of a shelf that is obviously incomplete or impossible. The rule presented in Fig. 2.11c links a bottom layer to the shelf if the current shelf is missing a bottom layer. Figure 2.11h shows a rule that, when applied to a semDT where two layers are very close to each other, produces a semDT where those two layers are merged into a single layer. After all layers of a shelf have been added to the semDT, the facings are placed on the layer. As depicted in Fig. 2.11d, facings are the physical space on the layer dedicated to a specific product, and the height of the facing is determined by the position of the layer above, or the shelf height for the topmost layer.

In the second step, separators and price labels are added to the semDT. Figure 2.11e depicts the rule that is used for separators; it works analogously for price labels. When the separators attached along the layers are recognized and added to the semDT, the rule computes the position of each separator relative to the layer. The computation works analogously to the one for shelf layers, with the help of a reference frame placed at a convenient pose to simplify the procedure. Once a separator is added, any existing facing at its place is split into two by applying the rule depicted in Fig. 2.11d. Shelf layers with multiple facings are created in the semDT by successive application of these rules. Price labels are detected through barcodes on a layer, and their poses are computed similarly to those of shelf layers and separators.

Finally, during the third step, products are first detected and then added to the facing in the semDT that encloses the product location. Using a set of rules similar to the one displayed in Fig. 2.11f, the state of each facing is computed. The example rule expresses how the state of a facing changes from being empty to being partially filled because it has objects inside of it. We differentiate between four states: empty, partially filled, full, and containing at least one misplaced item. Whether or not a product in a facing is misplaced can be inferred by comparing its id with the price label closest to it. Repeating this process for all shelves creates a complete semantic digital twin of the whole store. Figure 2.14 shows an example twin of our retail laboratory, where six shelves were scanned.
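The four facing states can be summarised in a short sketch. This is an illustrative assumption rather than the rule set of Fig. 2.11f: the function name, the representation of products as id strings, and in particular the `capacity` parameter used to decide when a facing counts as full are hypothetical, since the chapter does not state how fullness is determined.

```python
from enum import Enum


class FacingState(Enum):
    EMPTY = "empty"
    PARTIALLY_FILLED = "partially filled"
    FULL = "full"
    MISPLACED = "at least one misplaced item"


def facing_state(product_ids, label_product_id, capacity):
    """Classify a facing from the ids of the products detected inside it.

    product_ids:      ids of the detected products in the facing
    label_product_id: id read from the price label closest to the facing
    capacity:         assumed number of items that fit (hypothetical input)
    """
    if not product_ids:
        return FacingState.EMPTY
    # A product is misplaced if its id differs from the closest price label.
    if any(pid != label_product_id for pid in product_ids):
        return FacingState.MISPLACED
    if len(product_ids) >= capacity:
        return FacingState.FULL
    return FacingState.PARTIALLY_FILLED


state = facing_state(["4011", "4011"], label_product_id="4011", capacity=6)
```

In this sketch a misplaced item takes precedence over the fill level, which is one possible ordering of the checks.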


Fig. 2.15 Software architecture of autonomous robot for creating semantic digital twins of retail stores

2.6 Example Store Monitoring System

This section describes an example system which uses the abstract mapping process described in Sect. 2.5 to generate semantic digital twins. The high level software architecture is depicted in Fig. 2.15. It has four main components: a task executive; the knowledge processing system KnowRob [2], which stores the semantic digital twin and allows reasoning on it; a mobile robot platform; and a perception framework, for which we use RoboSherlock [1]. In this example, the mapping process described in the previous section is primarily implemented as part of KnowRob. The only goal of the task executive is to provide KnowRob with the necessary input from a perception system.

During layout identification, the task executive only acts as an interface between the perception framework and KnowRob. A person drives the robot through the store to create a 2D map; in the meantime, the task executive listens to the perception system for detected fiducial markers and forwards them to KnowRob. In contrast, store monitoring is performed fully autonomously by the task executive. The program sequence follows the process described in Sect. 2.5.2: in a loop, it drives the robot to each shelf, detects its layers, then its facings, and finally estimates the state of the facings. The robot used is described in Sect. 2.6.1; however, any platform that can localize and navigate in a store, and that can place a camera flexibly at different heights in front of shelves, is able to perform this task, e.g. a drone. The perception framework used to perceive the features of the store is described in Sect. 2.6.2.

2.6.1 Mobile Robotic Platform

For the robot, we chose an existing mobile platform constructed in our laboratory called 'Donbot' (displayed in Fig. 2.16). This robot has an omnidirectional mobile platform using four mecanum wheels, two Hokuyo UST10-LX laser scanners, and a Universal Robots UR5 arm. It can move easily between shelves, localize itself in a map using standard laser-scanner-based algorithms (e.g. MCL [9]), and position the sensors needed for the application in any pose within the envelope of reachability. Since the arm cannot reach higher than 1.2 m in this setup, which is below the maximum height of the shelves, some of the shelf layers are not reachable, but this is enough for observing most of the objects in our test scenario.

Fig. 2.16 The robot used in our laboratory experiment, its sensor package, and one of its wheels

The sensor package consists of two sensors. The first one is a high-resolution RGB camera from FLIR (12 MP), using a lens with 10 mm focal length. We have also installed a ring light on the lens to provide additional illumination, which allows us to capture sharp images with only one or two milliseconds of exposure time. Together with time synchronization, this makes it possible to perceive barcodes on price labels, separators, and fiducial markers in the environment during movement of the base or the arm. The second one is an Intel Realsense D435 RGB-D sensor. This sensor is based on a black-and-white camera pair, together with an infrared pattern projector. There is an additional RGB camera that delivers information to assign color to the points obtained from the stereo vision calculation.

To control the robot, open-source libraries like Move Base Flex [18] for base navigation and MoveIt! [5] or Giskard [8] for controlling the arm can be used.


2.6.2 Perception

With RoboSherlock we provide a general perception framework for robots that is a knowledge-based multi-expert system. The principle behind RoboSherlock is Unstructured Information Management (UIM) which, for example, played a central role in the success of the IBM Watson system in the year 2011. The basic idea was to treat the incoming natural language data as a document where individual algorithms can annotate small pieces of information with respect to their expertise. At the end of the process the annotations are interpreted and weighted to generate the final result. RoboSherlock treats incoming sensory data the same way. It converts the sensory information into a Common Analysis Structure (CAS) in which a large variety of specific perception experts can annotate information in a flexible way. For example, a clustering algorithm can annotate regions in the data which belong to an object, whereas a shape estimation algorithm can annotate geometric information on already segmented objects. In a final step, the potentially contradictory annotations of the experts are analyzed to construct the most probable belief of the robot agent, which is then asserted into its knowledge base.

The task context is also essential for general perception because it influences the expected results of perception tasks. If a robot scans shelves and counts objects in a supermarket, the perception process can focus on the segmentation, clustering and quantification of the objects. In contrast, a robot that is supposed to grasp objects, for example to refill supermarket shelves, needs to estimate accurate object poses for successful manipulation. This flexibility is supported in RoboSherlock by providing a query-answering interface. It allows the task executive to formulate perception tasks in an abstract way and automatically reasons about suitable perception algorithms that should be executed for the task at hand. This functionality can be realized because RoboSherlock is tightly integrated with the semantic digital twin, which links the capabilities of perception algorithms with models about environments and objects.

As a concrete example, RoboSherlock can be asked the question "detect layers of shelf $s_1$". The shelf identifier can be used to retrieve the shelf pose and dimensions from the semantic digital twin for segmentation of layer candidates. It can use price label detectors to look for horizontal lines in the segment to create height estimates for the layers, as shown in Fig. 2.17. Another example would be "what product is in facing $f_1$". Again, the semantic digital twin can be asked for the facing dimensions for segmentation. In addition, RoboSherlock can ask for the positions of the enclosing separators and redetect them in the image to get a more accurate segmentation. If it only matters whether the right object is in the facing, RoboSherlock can ask the semantic digital twin for the associated price label and call an object recognizer that is specialized for the specific object.

For the REFILLS project, novel product identification and counting algorithms that are specialized for the retail environment have been implemented. Those algorithms were integrated as annotators into the existing RoboSherlock framework and are explained in detail in the next chapter. More information about RoboSherlock can be found in [1].


Fig. 2.17 Segmented shelf for layer detection on a black background

2.7 Conclusion

In this chapter we have seen the feasibility of having mobile robots autonomously build semantic and almost photo-realistic models of retail stores. The REFILLS researchers envisage that this technology could achieve a disruptive change in retail operation. In the not so distant future, robots doing the store monitoring do not have to be human-sized, wheeled robots but could be hand-sized drones. This can enable every retailer, even those with small stores, to do their inventory automatically. The range of stores that can be modeled will continually increase to stores with less and less structure. This allows the whole retail world to be equipped with new categories of information systems that will replace the traditional relational database systems with information systems that can be rendered as virtual reality stores or form the basis of augmented reality services. It is a big opportunity for the whole retail domain to build digital innovation platforms around this disruptive technology. Furthermore, retailers could create ecosystems for these platforms with stakeholders including shoppers, store operators, logistics planners, application developers and technology providers. The research project Knowledge4Retail is a promising next step in this direction.

References

1. Beetz, M., Bálint-Benczédi, F., Blodow, N., Nyga, D., Wiedemeyer, T., Marton, Z.C.: Robosherlock: Unstructured information processing for robot perception. In: IEEE International Conference on Robotics and Automation, pp. 1549–1556 (2015)
2. Beetz, M., Beßler, D., Haidu, A., Pomarlan, M., Bozcuoğlu, A.K., Bartels, G.: Knowrob 2.0–a 2nd generation knowledge processing framework for cognition-enabled robotic agents. In: IEEE International Conference on Robotics and Automation, pp. 512–519 (2018)
3. Beßler, D., Porzel, R., Pomarlan, M., Vyas, A., Höffner, S., Beetz, M., Malaka, R., Bateman, J.: Foundations of the socio-physical model of activities (soma) for autonomous robotic agents. In: Brodaric, B., Neuhaus, F. (eds.) Formal Ontology in Information Systems - Proceedings of the 12th International Conference, Bozen-Bolzano, Italy, September 13-16, 2021, Frontiers in Artificial Intelligence and Applications (2021)
4. Cadena, C., Carlone, L., Carrillo, H., Latif, Y., Scaramuzza, D., Neira, J., Reid, I., Leonard, J.J.: Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE Trans. Rob. 32(6), 1309–1332 (2016). https://doi.org/10.1109/TRO.2016.2624754
5. Coleman, D., Sucan, I., Chitta, S., Correll, N.: Reducing the barrier to entry of complex robotic software: a MoveIt case study (2014). arXiv:1404.3785
6. Deeken, H., Wiemann, T., Hertzberg, J.: Grounding semantic maps in spatial databases. Robot. Auton. Syst. 105, 146–165 (2018). https://doi.org/10.1016/j.robot.2018.03.011
7. Dufour, C., Soghomonian, Z., Li, W.: Hardware-in-the-loop testing of modern on-board power systems using digital twins, pp. 118–123 (2018). https://doi.org/10.1109/SPEEDAM.2018.8445302
8. Fang, Z., Bartels, G., Beetz, M.: Learning models for constraint-based motion parameterization from interactive physics-based simulation. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4005–4012 (2016)
9. Fox, D.: Kld-sampling: adaptive particle filters and mobile robot localization. Adv. Neural. Inf. Process. Syst. 14(1), 26–32 (2001)
10. Gemignani, G., Capobianco, R., Bastianelli, E., Bloisi, D., Iocchi, L., Nardi, D.: Living with robots: interactive environmental knowledge acquisition. Robot. Auton. Syst. 78, 1–16 (2016). https://doi.org/10.1016/j.robot.2015.11.001
11. Hess, W., Kohler, D., Rapp, H., Andor, D.: Real-time loop closure in 2D lidar slam. In: IEEE International Conference on Robotics and Automation, pp. 1271–1278 (2016)
12. Hull, R., King, R.: Semantic database modeling: survey, applications, and research issues. ACM Comput. Surv. 19(3), 201–260 (1987). https://doi.org/10.1145/45072.45073
13. Mandić, M., Tepavčević, B.: Analysis of shape grammar application as a tool for urban design. Environ. Plann. B: Plann. Design 42(4), 675–687 (2015). https://doi.org/10.1068/b130084p
14. Mania, P., Kenfack, F.K., Neumann, M., Beetz, M.: Imagination-enabled robot perception. In: International Conference on Intelligent Robots and Systems (2021). https://doi.org/10.1109/IROS51168.2021.9636359
15. Masolo, C., Borgo, S., Gangemi, A., Guarino, N., Oltramari, A.: WonderWeb deliverable D18 ontology library (final). Technical Report, IST Project 2001-33052 WonderWeb: Ontology Infrastructure for the Semantic Web (2003). http://www.loa.istc.cnr.it/old/DOLCE.html
16. McGuinness, D.L., Van Harmelen, F., et al.: Owl web ontology language overview. W3C Recommendation 10(10) (2004)
17. Pangercic, D., Tenorth, M., Pitzer, B., Beetz, M.: Semantic object maps for robotic housework - representation, acquisition and use. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. Vilamoura, Portugal (2012)
18. Pütz, S., Simón, J.S., Hertzberg, J.: Move Base Flex: a highly flexible navigation framework for mobile robots. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (2018). https://github.com/magazino/move_base_flex
19. Rusu, R.B., Marton, Z.C., Blodow, N., Holzbach, A., Beetz, M.: Model-based and learned semantic object labeling in 3D point cloud maps of kitchen environments. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. St. Louis, MO, USA (2009)
20. Shearer, R., Motik, B., Horrocks, I.: Hermit: a highly-efficient owl reasoner. Owled 432, 91 (2008)
21. Siciliano, B., Khatib, O.: Springer Handbook of Robotics. Springer, Berlin (2016)
22. Stiny, G., Gips, J.: Shape grammars and the generative specification of painting and sculpture. In: Segmentation of Buildings for 3DGeneralisation. In: Proceedings of the Workshop on generalisation and multiple representation, Leicester (1971)
23. Vachálek, J., Bartalský, L., Rovný, O., Šišmišová, D., Morháč, M., Lokšík, M.: The digital twin of an industrial production line within the industry 4.0 concept. In: 21st International Conference on Process Control, pp. 258–262 (2017)

Chapter 3

Robots Collecting Data: Robust Identification of Products

Saksham Sinha and Jonathan Byrne

Abstract A supermarket can have numerous stock keeping units (SKUs) in a single store. The arrangement of SKUs or products in a supermarket is carefully controlled and planned to maximize sales. However, verifying that the real shelves match the ideal layout, a task called planogram compliance, is a costly process that requires store personnel to take an inventory of thousands of products. In order to automate this task, we have developed a system for retail product identification that doesn't require fine tuning on the supermarket products, generalizes to products it was not trained on, and is scalable. In this chapter, we address the problem of product identification on grocery shelves by using a deep convolutional neural network to generate variable length embeddings corresponding to varying accuracy. For embedding generation, we created an in-house dataset containing more than 6,900 images and tested our model on a dataset created from a real store with products in different rotations and positions. Our experimental results show the effectiveness of our approach. Furthermore, our solution is designed to run on low powered devices such as Intel's Neural Compute Stick 2, on which our perception system was able to achieve 5.8 frames per second (FPS).

Keywords Deep convolutional neural networks · Image classification · Deep learning

S. Sinha (B) · J. Byrne
Intel R&D Ireland Ltd., Leixlip, Kildare, Ireland
e-mail: [email protected]
J. Byrne
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
L. Villani et al. (eds.), Robotics for Intralogistics in Supermarkets and Retail Stores, Springer Tracts in Advanced Robotics 148, https://doi.org/10.1007/978-3-031-06078-6_3

3.1 Introduction

Supermarket stores house a large number of products, and a planogram is created to optimize the sales of those products. A planogram is a model that specifies the product placement on the shelves of a supermarket to ensure maximum sales. Hence, maintaining planogram compliance is necessary and important. However, ensuring compliance in a large store such as a supermarket is very time consuming, costly and labour intensive. We address this problem by automating the process of planogram compliance with the help of computer vision algorithms running on a low powered device such as the Intel® Neural Compute Stick 2 (NCS2), which allows for easy integration with a handheld device or a robotics platform. We take pictures of the products on the shelves with the help of a robot and then identify the products on that shelf for planogram compliance. For example, in a scenario where the planogram indicates a detergent on a shelf but the product is identified as a toothpaste, the store manager would be notified to rectify this placement.

Identification of a product can be thought of as an image classification problem where each class is a separate identity. However, identifying a product on the shelves is a challenging task because of the image quality, different light conditions in the store, varied viewpoints and rotations of the product, and image pattern variations due to various promotions carried out by product vendors. A plethora of research has been done on improving approaches to the image classification problem using deep learning architectures [10, 16, 25, 28]. However, most of this research focuses on classifying images into up to 1000 classes. Classifying more than 1000 classes increases the resources required for model computation and storage. Also, the number of parameters required for classification grows hyper-linearly with the number of classes [40]. Furthermore, such networks require a large amount of supervised data for training. For each new class, the network must be retrained and may also require some architectural modifications [15].
Hence, in order to overcome the challenges stated above, researchers have looked into finding the similarities between the images in the dataset and the query image to find which class the query image might belong to. Examples of such tasks are verification or recognition problems where one has millions of images, whether of signatures [5], faces [21], persons [11], or products [23]. Siamese networks have proven to be successful in finding the similarities between images [5, 15]. However, running a Siamese network over a large database of images is computationally expensive and inefficient. Deep metric embedding learning networks have been shown to be useful in such situations, where an image can be represented as a vector in a Euclidean space. These networks are also able to generalize well to unseen data [11, 21, 23]. Our contributions in this chapter are as follows:

• We aim to identify a vast number of products in the supermarket store using deep metric embeddings. Our approach is able to generalize well to all the products for identification. We stress that our proposed model works with publicly available pre-trained weights of ResNet50 [10] trained on the ImageNet dataset [7] and doesn't require any training on our product dataset at all for identification.
• We also developed another architecture trained with triplet loss in order to compare the results on our in-house dataset and to show the effectiveness of our proposed model against a model trained on the products dataset.


• We show the performance of our model on inductive transfer learning based classification, in which we evaluate on the Omniglot dataset against the state of the art without any training on the Omniglot dataset [17].

3.2 Related Work

Traditional methods for checking planogram compliance extract the layout of products on the grocery shelves and compare it with the planogram layout based on object detection and recognition, which requires many images for training. There has been substantial work using invariant feature descriptors such as the scale invariant feature transform (SIFT) [1] and speeded up robust features (SURF) [37, 39]. However, such methods do not generalize well to a wide variety of products with different transparency, lighting conditions, shapes and rotations. Some methods focus on brand or logo detection and identification, such as the work by Varol et al. [33], where the authors utilized a cascade object detection framework and a support vector machine (SVM) to detect and recognize brands of cigarette packages placed on the shelves. However, identifying the brand doesn't identify the specific product, as there can be different products under a single brand. Graph-based approaches have also been explored, in which products are identified based on the correlation between the observed product image and the image of the product that should ideally be present at that location [19, 30]. The graph-based methods are computationally expensive and hence cannot be used for real-time applications.

Embedding based deep learning approaches have proven to be successful in recognizing objects when little supervised data is available, and they generalize well to new unseen data. The goal of such approaches is to ensure that embeddings of the same identity are as close to one another as possible and embeddings of different identities are as far apart as possible in the feature space [2–4, 11, 12, 18, 20, 21, 27, 32, 36, 38]. Our methods in this chapter are inspired by deep metric embedding learning approaches.

Deep metric embedding learning has also been used to solve the classification problem using inductive transfer learning (ITL). Scott et al. [22] used adapted embeddings that combine the loss functions for deep embeddings with weight adaptation. In this paradigm, the authors' proposed adapted embedding, which combined Histogram loss [32] with weight adaptation in the target classes, outperformed all the previous deep metric based k-ITL solutions, where k is the number of instances of the target class available in the embeddings. We will be using their proposed method Adaptive Hist Loss [22] as a reference for performance comparison.


3.3 Method

We use the ResNet50 model as the backbone of our architecture as it works well with the NCS2. We represent each product image as an embedding $f(x)$ in the feature space $\mathbb{R}^d$ with d dimensions. Then, using the Euclidean distance between these embeddings, we find the nearest neighbors of the query image to identify similar images, and assign to the query image the product identity held by the majority of those neighbors. Our architecture can be used directly with the publicly available ResNet50 weights pre-trained on the ImageNet dataset [7]; no additional training is required.
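The identification step can be sketched as a nearest-neighbour search followed by a majority vote. The sketch below is a simplified illustration, not the system's code: the function name `identify`, the representation of the gallery as `(embedding, product_id)` pairs, the two-dimensional toy embeddings and the choice `k=3` are all assumptions made for the example.

```python
from collections import Counter
from math import dist


def identify(query, gallery, k=3):
    """Return the product id held by most of the k nearest gallery embeddings.

    query:   embedding of the query image (sequence of floats)
    gallery: list of (embedding, product_id) pairs with known identities
    k:       number of neighbours to consult (assumed value)
    """
    # Sort gallery entries by Euclidean distance to the query embedding.
    neighbours = sorted(gallery, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(product_id for _, product_id in neighbours)
    return votes.most_common(1)[0][0]


# Toy 2-D embeddings; the real embeddings are much larger vectors.
gallery = [
    ((0.0, 0.1), "detergent"),
    ((0.1, 0.0), "detergent"),
    ((0.2, 0.2), "detergent"),
    ((5.0, 5.0), "toothpaste"),
    ((5.1, 4.9), "toothpaste"),
]
predicted = identify((0.05, 0.05), gallery, k=3)
```

Adding a new product only requires adding its embeddings to the gallery, which is why no retraining is needed when the assortment changes.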

3.3.1 Network Architecture

As shown in Fig. 3.1a, we used a pre-trained ResNet50 network trained on the ImageNet dataset [7]. The layers conv2_x, conv3_x, conv4_x and conv5_x of ResNet50 are built from the basic bottleneck block stacked together. The architecture of these layers is shown in detail in Table 3.1. We discarded the last fully-connected (FC) layer and extracted the outputs after the conv2_x, conv3_x and conv4_x layers of the ResNet50 architecture using skip connections. The outputs of these skip connections are then passed through an AverageConcat2D block, which consists of an adaptive max-pool-2D layer and an adaptive average-pool-2D layer whose outputs are concatenated. These outputs are then concatenated with the final output of adaptive average-pool-2D after the conv5_x layer to produce the final embedding of size 5632. We have color coded the layers and the parts of the embedding, and have also named the parts of the embedding, so that they are easier to understand and refer to in the later sections of the chapter. This architecture allows for fine grained identification of products, which is required to differentiate between similar looking products, without the need for fine-tuning. This saves us the training time otherwise needed to identify the products on the supermarket shelves.

We have also developed another model whose architecture is shown in Fig. 3.1b. In this architecture, we replaced the last FC layer with a new FC layer with 1024 dimensions. This model is trained with the synthetic and real world datasets presented in Sect. 3.4 ("Datasets"). For every batch, the embeddings produced by the final FC layer are used to form triplets, perform triplet mining and calculate the triplet loss for the batch. Our intention was to compare the performance of the model trained with triplet loss to our main proposed model, which works with the pre-trained ImageNet weights and doesn't require any additional training on the supermarket products dataset.


Fig. 3.1 a Our proposed model architecture. b The triplet network architecture

Table 3.1 Architecture of the ResNet50 layers. Each building block is listed with its number of repetitions [10]. The last column gives the embedding size obtained after pooling each of the layers. The total size of the combined embeddings is 5632

Layer     Building block (kernel, channels)          Repetitions   Embedding size
conv2_x   1 × 1, 64;  3 × 3, 64;  1 × 1, 256         × 3           512
conv3_x   1 × 1, 128; 3 × 3, 128; 1 × 1, 512         × 4           1024
conv4_x   1 × 1, 256; 3 × 3, 256; 1 × 1, 1024        × 6           2048
conv5_x   1 × 1, 512; 3 × 3, 512; 1 × 1, 2048        × 3           2048
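The embedding sizes in Table 3.1 follow from the pooling scheme of Sect. 3.3.1: each of conv2_x to conv4_x contributes one max-pooled and one average-pooled value per channel, and conv5_x contributes one average-pooled value per channel, giving 2 · (256 + 512 + 1024) + 2048 = 5632. The sketch below reproduces this arithmetic on dummy feature maps. It assumes global pooling to one value per channel and a max-then-average concatenation order, neither of which is spelled out in the text, and all function names are hypothetical.

```python
def global_avg_pool(fmap):
    """fmap: list of channels, each a list of rows. One mean per channel."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]


def global_max_pool(fmap):
    """One maximum per channel."""
    return [max(map(max, ch)) for ch in fmap]


def build_embedding(conv2, conv3, conv4, conv5):
    """Concatenate pooled skip-connection outputs into one embedding."""
    parts = []
    for fmap in (conv2, conv3, conv4):
        # AverageConcat2D (assumed order): max-pooled, then average-pooled.
        parts += global_max_pool(fmap) + global_avg_pool(fmap)
    parts += global_avg_pool(conv5)  # final layer: average pooling only
    return parts


def dummy_feature_map(channels, size=2):
    """Stand-in for a real layer output, with predictable values."""
    return [[[float(c + r + col) for col in range(size)] for r in range(size)]
            for c in range(channels)]


embedding = build_embedding(
    dummy_feature_map(256),   # conv2_x output channels
    dummy_feature_map(512),   # conv3_x
    dummy_feature_map(1024),  # conv4_x
    dummy_feature_map(2048),  # conv5_x
)
embedding_size = len(embedding)
```

Because the parts are simply concatenated, a shorter embedding can be obtained by keeping only some of the parts, which is what makes variable-length embeddings possible.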

3.3.2 Triplet Loss

As described by Schroff et al. [21], we wanted to ensure that the embedding $f(x) \in \mathbb{R}^d$ of an image $x_i^a$ (anchor) of a specific product is closer to all the other images $x_i^p$ (positive) of the same product and far from any image $x_i^n$ (negative) of another product in the feature space $\mathbb{R}^d$. Assuming $D_{a,p}$ as the distance between the embeddings of the anchor and the positive, and $D_{a,n}$ as the distance between the embeddings of the anchor and the negative, the triplet loss is defined as

$$L(\theta) = \sum_{a,p,n} \left[ m + D_{a,p} - D_{a,n} \right]_+ \tag{3.1}$$


where the loss $L(\theta)$ is defined over the parameters $\theta$, $[\cdot]_+ = \max(0, \cdot)$, and $m$ is the margin by which the positive should be closer to the anchor than the negative. When trained using this loss over a large dataset for a sufficient number of epochs, eventually all pairs (a, p) will be pulled together and all pairs (a, n) will be pushed apart. This allows the images of the same product to be clustered together in the feature space $\mathbb{R}^d$ and away from images of different products.
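Equation 3.1 can be written as a short function. The following is a minimal sketch that assumes the distance D is the plain (non-squared) Euclidean distance and that the triplets are already given; the function name and the toy values are illustrative only.

```python
from math import dist


def triplet_loss(triplets, margin):
    """Sum of hinge terms [m + D(a, p) - D(a, n)]_+ over all triplets.

    triplets: iterable of (anchor, positive, negative) embeddings
    margin:   the margin m of Eq. 3.1
    """
    return sum(
        max(0.0, margin + dist(a, p) - dist(a, n))
        for a, p, n in triplets
    )


anchor, positive = (0.0, 0.0), (1.0, 0.0)
far_negative = (3.0, 0.0)    # satisfies the margin: contributes 0
near_negative = (1.5, 0.0)   # violates the margin: contributes 0.5

loss_easy = triplet_loss([(anchor, positive, far_negative)], margin=1.0)
loss_hard = triplet_loss([(anchor, positive, near_negative)], margin=1.0)
```

A triplet whose negative is already farther away than the positive by at least the margin contributes nothing to the loss and therefore nothing to the gradient.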

3.3.3 Triplet Mining

Forming triplets greatly increases the required computing resources, because the number of possible triplets grows cubically with the size of the dataset. In addition, most of the triplets formed are invalid, because their members do not fulfil the anchor, positive and negative roles: the anchor and the positive must share a label, and the negative must have a different one. Also, some of the valid triplets may already satisfy the margin condition or miss the margin only by a small value. These can be termed easy triplets, and they do not contribute much to learning. Training on such easy triplets would therefore waste resources and time, so mining the hard triplets is critical.

Many offline mining strategies have been researched for triplet mining [9, 21, 24]. However, they are impractical because offline mining considerably increases training time. Hermans et al. [11] described two ways to perform triplet mining online in order to obtain good triplets: batch hard and batch all. In batch hard mining, for a given anchor we take the closest negative and the farthest positive in the batch to create a triplet. In batch all mining, all valid triplets in the batch are used. Because mining happens only on the current batch, the runtime cost is very low. Based on the results of Hermans et al., we follow the batch hard strategy to mine the triplets.
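The batch hard selection described above can be sketched in a few lines. This is only an illustration under stated assumptions: distances are Euclidean, embeddings are plain tuples, and anchors without any positive or negative in the batch are skipped; the function name and toy batch are hypothetical.

```python
import math


def batch_hard_triplets(embeddings, labels):
    """Return (anchor, positive, negative) index triplets for one batch.

    For each anchor, the farthest same-label sample and the closest
    different-label sample in the batch are selected.
    """
    triplets = []
    for a, (emb_a, label_a) in enumerate(zip(embeddings, labels)):
        positives = [i for i, lab in enumerate(labels) if lab == label_a and i != a]
        negatives = [i for i, lab in enumerate(labels) if lab != label_a]
        if not positives or not negatives:
            continue  # no valid triplet can be formed for this anchor
        hardest_pos = max(positives, key=lambda i: math.dist(emb_a, embeddings[i]))
        hardest_neg = min(negatives, key=lambda i: math.dist(emb_a, embeddings[i]))
        triplets.append((a, hardest_pos, hardest_neg))
    return triplets


# Toy batch: three images of product "x" and two of product "y" (hypothetical).
batch_embeddings = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0), (2.0, 0.0)]
batch_labels = ["x", "x", "y", "y", "x"]
mined = batch_hard_triplets(batch_embeddings, batch_labels)
print(mined[0])  # (0, 4, 2): farthest positive is index 4, closest negative is index 2
```

Each anchor yields at most one triplet, so the number of loss terms grows linearly with the batch size rather than cubically.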

3.3.4 Training

To train the triplet network architecture, the Adam optimizer [13] was used with the default parameters (beta1 = 0.9, beta2 = 0.999, epsilon = 1e-08) and a learning rate of $10^{-3}$. All training was performed on an NVIDIA® GTX 1080 Ti with 12 GB memory. We resized all images to 224 × 224 with padding and augmented them using random resized crop, random grayscale, lighting noise and cutout [8]. The ImageNet mean was subtracted from each input image. During training, a batch size of 196 was used and the triplet loss margin was set to 2000. To ensure that every batch contains enough classes, each with the same number of images, we created a batch sampler that builds each batch from n random classes with m randomly selected images per class.
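A class-balanced batch sampler of the kind described above might look like the following sketch. The chapter does not say how the batch size of 196 is split into n classes and m images; the 14 × 14 factorisation used here is purely an assumption for illustration, as are the function name and the toy image identifiers.

```python
import random


def sample_balanced_batch(images_by_class, n_classes, m_per_class, rng):
    """Pick n random classes, then m random images from each chosen class."""
    eligible = [c for c, images in images_by_class.items() if len(images) >= m_per_class]
    chosen = rng.sample(eligible, n_classes)
    batch = []
    for cls in chosen:
        for image in rng.sample(images_by_class[cls], m_per_class):
            batch.append((image, cls))
    return batch


# Toy dataset: 30 hypothetical classes with 20 image identifiers each.
toy_dataset = {
    f"class_{c}": [f"class_{c}_img_{i}" for i in range(20)] for c in range(30)
}
batch = sample_balanced_batch(toy_dataset, n_classes=14, m_per_class=14,
                              rng=random.Random(0))
print(len(batch))  # 14 classes x 14 images = 196
```

Guaranteeing at least two images per class in every batch matters for online mining, since an anchor without a positive in the batch cannot form a triplet.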

3.4 Datasets

We looked into several existing datasets, such as the Grocery dataset [14] and the WebMarket dataset [41, 42], for training and evaluating our method. However, none of the datasets available online suited our use case, as we do not want any human interaction with the shelves, nor do we need a hierarchical classification of the products. For example, the Grocery dataset uses a hierarchy in which products are classified first as packages, boxes, tubes, etc., then by type of product such as milk or juice, and finally by the name of the product. Many of its images contain human hands, and in many images the products are not on the shelves. In the WebMarket dataset, the query (test) images show a single product placed in front of a plain background, while the dataset images show whole shelves rather than single products. This makes the dataset unusable for us.

To tackle this problem, we decided to curate our own datasets. We used both a synthetically generated dataset and a self-curated real world dataset for training, and for testing we captured real store images with a robot scanning the shelves of the store. The label of each product is its Global Trade Item Number (GTIN), which is used to identify the product. We intend to release the synthetic, real world and test datasets in the future. Our novel datasets are useful to researchers and application developers interested in the concrete problem of recognizing objects in the supermarket. They contain products with high variation in size, colour, transparency and packaging. We paid special attention to covering all kinds of variation in the products, from small to large sizes, reflective packaging, transparency and packaging type (bag, box, bottles, cans, tubes, etc.).

In the test set, after each scan the products were shuffled and placed under varying lighting conditions with different rotations. We also selected products that look very similar in order to challenge fine-grained recognition. Moreover, we use the Omniglot dataset [17] to evaluate our proposed model on the k-shot inductive transfer learning (ITL) problem to show its effectiveness. The following sub-sections describe each of the datasets in detail (Fig. 3.2).

3.4.1 Synthetic Dataset

We used an ARTEC 3D® scanner to scan each product on a turntable and generate a 3D mesh of it. We then created a synthetic dataset of products using the Intel® OSPRay rendering engine [35]. In this dataset, products are placed in different positions in a row, under different lighting conditions and with different rotations. In total, we have around 140 products with about five different rotations and lighting conditions generated using OSPRay.

Fig. 3.2 Example of synthetic and real world dataset. a, b and c are examples from synthetic dataset. d, e, and f are examples from real world dataset

Fig. 3.3 Example of similar looking products from the Real world dataset

3.4.2 Real World Dataset

We had a total of 347 products with a wide variety of characteristics pertaining to shape (tube shaped, boxes and packets), occlusion (transparent and non-transparent), colour (reflective and non-reflective) and similarity in appearance. Each product was placed in front of a black or white background and photographed with a digital single-lens reflex (DSLR) camera to obtain high quality images. The product was then rotated in steps of 18 degrees on a turntable developed in-house, and an image was captured at each step. After all the product images had been gathered, they were cropped with the help of background subtraction (Fig. 3.3).

3.4.3 Test Dataset

We used a robot to scan the shelves of a real supermarket store to curate the test dataset. The robot is equipped with an Intel® RealSense D435 camera to capture the images. The robot captures an image and sends it to the perception system together with its metadata, which includes the bounding box of the product. The bounding box is calculated by detecting the tags over the separators on the shelf. The perception system uses our model to identify the product and label it. Figure 3.4a shows an example of an image captured by the robot. All images are captured with a relative pitch of 10°. We crop the image according to the bounding box provided and test on the cropped images. Figure 3.4b–d shows the images after cropping. The ground truth, which is the GTIN of the product and its placement on the shelves, is recorded at every scan of the shelves for the evaluation. The orientation and position of the products are changed after each scan of the shelves to augment the dataset. The shelf configuration also included empty shelves. In total, the test set has 588 images from 162 products.

Fig. 3.4 Example of an image from the test dataset. a The image received from the robot. b, c and d Examples of images after cropping

3.4.4 Omniglot

The Omniglot dataset [17] contains 1623 handwritten characters from 50 alphabets, with 20 images per character. Following previous work [22, 26, 31, 34], the dataset is augmented with all rotations by multiples of 90°, resulting in a total of 1623 × 4 = 6492 classes. The dataset originally comes with a train and test separation with mutually exclusive alphabets: 30 alphabets in the train set and 20 alphabets in the test set. For evaluation, we merged the two. We then created a support and test split by randomly selecting 10 instances per class for the support set and using the remaining 10 for the test set. Hence, the support set and the test set each contain 10 × 6492 = 64,920 images. The support set is used to generate embeddings with our proposed model, which is then evaluated on the test split.
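The class and image counts above can be reproduced with a short sketch of the rotation augmentation and the per-class split. Characters and instances are represented by plain integers here, and the helper names and the fixed random seed are assumptions made only for illustration.

```python
import random


def rotate_classes(characters, rotations=(0, 90, 180, 270)):
    """Treat every (character, rotation) pair as a separate class."""
    return [(char, angle) for char in characters for angle in rotations]


def split_support_test(instances, n_support, rng):
    """Randomly assign n_support instances to the support set, the rest to the test set."""
    shuffled = list(instances)
    rng.shuffle(shuffled)
    return shuffled[:n_support], shuffled[n_support:]


rng = random.Random(0)
classes = rotate_classes(range(1623))  # 1623 characters, 4 rotations each
support, test = {}, {}
for cls in classes:
    # Each class has 20 images, identified here by the integers 0..19.
    support[cls], test[cls] = split_support_test(range(20), 10, rng)

n_support_images = sum(len(v) for v in support.values())
n_test_images = sum(len(v) for v in test.values())
print(len(classes), n_support_images, n_test_images)  # 6492 64920 64920
```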

3.5 Experiments

For evaluation on the test dataset, all the images from the real world dataset are first resized to 224 × 224 with padding, and the corresponding embeddings are generated by the model. These embeddings are then saved to disk in Hierarchical Data Format (HDF5) [6, 29]. The stored embeddings have shape [6940, 5632], where 6940 is the total number of images (consistent with 347 products photographed at 20 rotation steps of 18° each) and 5632 is the length of each embedding. For testing, the perception system receives the image from the robot and crops it using the bounding box provided in the image metadata. The cropped image is resized to 224 × 224 with padding for input to the model. The resulting embedding is then used to compute the Euclidean distance to all the embeddings in our HDF5 database. Finally, we apply K-nearest neighbour (KNN) classification to identify the product.

For evaluation on the Omniglot dataset, we follow the approach of the state-of-the-art method [22] to compare performance via k-shot n-way classification. From the total of 6492 classes, n classes are chosen randomly. To generate the embeddings, k random instances are selected from the support set for each of the selected classes. For testing, all instances of the selected classes in the test set are used as query images.
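The identification step, Euclidean distance to every stored embedding followed by a KNN vote, can be sketched as follows. The database is a plain list here instead of an HDF5 file, the embeddings are two-dimensional toy vectors rather than 5632-dimensional ones, and the function name and GTIN-like labels are hypothetical.

```python
import math
from collections import Counter


def knn_identify(query, database, k):
    """Return the majority label among the k database embeddings closest to the query.

    database is a list of (embedding, label) pairs; distances are Euclidean.
    """
    nearest = sorted(database, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy database of (embedding, label) pairs with made-up labels.
toy_database = [
    ((0.0, 0.0), "gtin-A"),
    ((0.0, 1.0), "gtin-A"),
    ((5.0, 5.0), "gtin-B"),
    ((0.2, 0.0), "gtin-B"),
]
print(knn_identify((0.0, 0.1), toy_database, k=1))  # gtin-A
print(knn_identify((0.0, 0.1), toy_database, k=3))  # gtin-A (two of three votes)
```

A brute-force scan like this is linear in the number of stored embeddings, which is one reason the scaling of the database is worth testing separately.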

3.5.1 Evaluation Metric

We used Top-1 and Top-5 accuracy to report our model performance. Top-1 accuracy is the accuracy obtained when the label of the closest embedding is selected for identification and compared with the ground truth label. Top-5 accuracy is the accuracy obtained when at least one of the labels of the five closest embeddings matches the ground truth label. For the KNN-based evaluation, we assign the label that is in the majority among the K closest labels.
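Both metrics reduce to one function that checks whether the ground truth appears among the first k ranked labels. The sketch below assumes that, for every query, the labels of the database embeddings are already sorted from closest to farthest; the function name and the toy rankings are hypothetical.

```python
def top_k_accuracy(ranked_labels, ground_truth, k):
    """Fraction of queries whose true label is among the k closest labels."""
    hits = sum(
        1 for ranked, truth in zip(ranked_labels, ground_truth) if truth in ranked[:k]
    )
    return hits / len(ground_truth)


# Labels of the nearest embeddings for three queries, closest first (toy data).
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "c", "b"]]
truths = ["a", "a", "a"]
print(top_k_accuracy(rankings, truths, k=1))  # one hit out of three queries
print(top_k_accuracy(rankings, truths, k=2))  # two hits out of three queries
```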

3.5.2 Selection of Embedding Size

To identify products as accurately as possible, we tested our proposed model with different parts of the embedding on a subset of the test data containing 85 test images of 60 products. All other experimental settings were kept the same. As shown in Fig. 3.1a, we named the parts of the embedding A, B, C and D for reference. As can be observed from Fig. 3.5a, the Top-1 and Top-5 accuracies are the same for the embeddings built from parts A + B + C and A + B + C + D; however, only the A + B + C + D embedding finds most of the neighbours correctly in all cases. Figure 3.5b shows an example of a query image and its five closest images. The embeddings with parts A + B + C and A + B + C + D both identified the product correctly, but the former is weaker than the latter at finding the first five neighbours. Hence, we decided to use all four parts of the embedding. Hereafter, all evaluations are performed on embeddings with parts A + B + C + D.

Fig. 3.5 a Performance of different embedding parts on a subset of the test data. b The five nearest images to the query image for different embedding parts

3.5.3 Results

We tested our proposed model and the triplet network architecture on the complete test data. To find the best number of neighbours K for the KNN, we tested the models with K ranging from 1 to 8. Our proposed model, without any training, outperforms our triplet network architecture by a large margin. As can be observed from Fig. 3.6, our proposed model performs best at K = 4 with 95.91% accuracy, while the triplet network performs best at K = 3 with 58.27% accuracy.

On the Omniglot dataset [17], our proposed model, with only pre-trained ImageNet weights and without fine-tuning, outperforms Adapted Hist Loss, which was trained on the Omniglot dataset, for n ≤ 100, where n is the number of classes and k ∈ {1, 5, 10} is the number of support instances per class in the embedding, as shown in Fig. 3.7. For a fair comparison, we tested our proposed model in exactly the same fashion as Scott et al. [22]. The results for Adapted Hist Loss are taken from their paper. One reason our proposed model did not perform as well for n > 100 is that the ImageNet pre-trained ResNet-50 was trained to classify an object in different rotations as the same object, whereas in this dataset each rotation is considered a separate class.

Fig. 3.6 Performance on complete test data of our proposed network architecture and triplet network architecture with K neighbor for KNN

Fig. 3.7 Performance on Omniglot dataset. Each point is the average test accuracy over 10 replications. The bands indicate the standard deviation

3.5.4 Performance on Neural Compute Stick 2

We tested our model on the Intel® NCS2 to validate that it works on a low-powered device that can easily be attached to any edge device. We exported the model to the Open Neural Network Exchange (ONNX) format and then used the Intel® OpenVINO™ Toolkit to convert it to an intermediate representation (IR) with a batch size of 1, which is used to run on the NCS2 device. To benchmark the inference speed, we used the benchmarking app provided with the toolkit, with the parameters -d MYRIAD to select the NCS2 device, -api async to run in asynchronous mode, -niter 200 to run up to 200 iterations and -nireq 8 to make 8 infer requests. Our model achieved 26.6 FPS on the benchmarking application, which is close to real-time performance, while consuming only 1.8 W of power. The whole perception system runs at 5.8 FPS with the same power consumption for K = 4 in 16-bit floating point (FP16) precision. In addition to the 6940 actual embeddings, we also generated 100,000 fake embeddings to test the scaling of the system. We also ported our system to a Raspberry Pi 4 with an NCS2 to develop a handheld solution.

3.6 Conclusion

A supermarket can stock a very large number of products, and new products are introduced to the store frequently. A deep learning model that requires training or fine-tuning every time a new product is introduced is therefore not feasible. We tackle this problem with our proposed model, which requires no fine-tuning to identify the products on the supermarket shelves. Since the model was only pre-trained on the ImageNet dataset and was never trained on our products dataset, all the products are new and unseen to the model. Our model therefore generalizes well to new products without the need for training and is able to work with little supervised data. We created our in-house synthetic, real world and test datasets, the latter captured in a real store, and our proposed model achieved 95.9% accuracy on the test dataset. We also showed that our model works well on other problems such as k-shot ITL, where we compared our results with the state-of-the-art deep embedding metric based solution on the Omniglot dataset. Furthermore, our perception system achieves 5.8 FPS on the Intel® NCS2, which allows our model to be deployed on any low-powered handheld device or robotics platform for inference.

References

1. Auclair, A., Cohen, L.D., Vincent, N.: How to use sift vectors to analyze an image with database templates. In: Boujemaa, N., Detyniecki, M., Nürnberger, A. (eds.) Adaptive Multimedia Retrieval: Retrieval, User, and Semantics, pp. 224–236. Springer, Berlin, Heidelberg (2008)
2. Batchelor, O., Green, R.: Object recognition by stochastic metric learning. In: Dick, G., Browne, W.N., Whigham, P., Zhang, M., Bui, L.T., Ishibuchi, H., Jin, Y., Li, X., Shi, Y., Singh, P., Tan, K.C., Tang, K. (eds.) Simulated Evolution and Learning, pp. 798–809. Springer International Publishing, Cham (2014)
3. Bell, S., Bala, K.: Learning visual similarity for product design with convolutional neural networks. ACM Trans. Graph. 34(4), 98:1–98:10 (2015). https://doi.org/10.1145/2766959
4. Bouma, S., Pawley, M.D.M., Hupman, K., Gilman, A.: Individual common dolphin identification via metric embedding learning. CoRR abs/1901.03662 (2019). arXiv:1901.03662
5. Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., Shah, R.: Signature verification using a “siamese” time delay neural network. In: Proceedings of the 6th International Conference on Neural Information Processing Systems, NIPS’93, pp. 737–744. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1993). http://dl.acm.org/citation.cfm?id=2987189.2987282
6. Collette, A.: Python and HDF5. O’Reilly (2013)
7. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR09 (2009)
8. DeVries, T., Taylor, G.W.: Improved regularization of convolutional neural networks with cutout (2017). arXiv:1708.04552
9. Geng, M., Wang, Y., Xiang, T., Tian, Y.: Deep transfer learning for person re-identification. CoRR abs/1611.05244 (2016). arXiv:1611.05244
10. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. CoRR abs/1512.03385 (2015). arXiv:1512.03385

11. Hermans, A., Beyer, L., Leibe, B.: In defense of the triplet loss for person re-identification. CoRR abs/1703.07737 (2017). arXiv:1703.07737
12. Hu, J., Lu, J., Tan, Y.: Discriminative deep metric learning for face verification in the wild. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1875–1882 (2014). https://doi.org/10.1109/CVPR.2014.242
13. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. CoRR abs/1412.6980 (2014). arXiv:1412.6980
14. Klasson, M., Zhang, C., Kjellström, H.: A hierarchical grocery store image dataset with visual and semantic labels. In: IEEE Winter Conference on Applications of Computer Vision (WACV) (2019)
15. Koch, G., Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition (2015)
16. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, NIPS’12, pp. 1097–1105. Curran Associates Inc., USA (2012). http://dl.acm.org/citation.cfm?id=2999134.2999257
17. Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015). https://doi.org/10.1126/science.aab3050. https://science.sciencemag.org/content/350/6266/1332
18. Lu, J., Hu, J., Zhou, J.: Deep metric learning for visual understanding: an overview of recent advances. IEEE Signal Process. Mag. 34(6), 76–84 (2017). https://doi.org/10.1109/MSP.2017.2732900
19. Ray, A., Kumar, N., Shaw, A., Mukherjee, D.P.: U-pc: Unsupervised planogram compliance. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision - ECCV 2018, pp. 598–613. Springer International Publishing, Cham (2018)
20. Ridgeway, K., Mozer, M.C.: Learning deep disentangled embeddings with the f-statistic loss. CoRR abs/1802.05312 (2018). arXiv:1802.05312
21. Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: A unified embedding for face recognition and clustering. CoRR abs/1503.03832 (2015). arXiv:1503.03832
22. Scott, T., Ridgeway, K., Mozer, M.C.: Adapted deep embeddings: A synthesis of methods for k-shot inductive transfer learning. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 31, pp. 76–85. Curran Associates, Inc. (2018). http://papers.nips.cc/paper/7293-adapted-deep-embeddings-a-synthesis-of-methods-for-k-shot-inductive-transfer-learning.pdf
23. Sharma, V., Karnick, H.: Automatic tagging and retrieval of e-commerce products based on visual features. In: Proceedings of the Student Research Workshop, SRW@HLT-NAACL 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, June 12–17, 2016, pp. 22–28 (2016). http://aclweb.org/anthology/N/N16/N16-2004.pdf
24. Shi, H., Yang, Y., Zhu, X., Liao, S., Lei, Z., Zheng, W., Li, S.Z.: Embedding deep metric for person re-identification: A study against large variations. CoRR abs/1611.00137 (2016). arXiv:1611.00137
25. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014). arXiv:1409.1556
26. Snell, J., Swersky, K., Zemel, R.S.: Prototypical networks for few-shot learning. CoRR abs/1703.05175 (2017). arXiv:1703.05175
27. Song, H.O., Xiang, Y., Jegelka, S., Savarese, S.: Deep metric learning via lifted structured feature embedding. CoRR abs/1511.06452 (2015). arXiv:1511.06452
28. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Computer Vision and Pattern Recognition (CVPR) (2015). arXiv:1409.4842
29. The HDF Group: Hierarchical data format version 5 (2000–2010). http://www.hdfgroup.org/HDF5

30. Tonioni, A., di Stefano, L.: Product recognition in store shelves as a sub-graph isomorphism problem. CoRR abs/1707.08378 (2017). arXiv:1707.08378
31. Triantafillou, E., Zemel, R.S., Urtasun, R.: Few-shot learning through an information retrieval lens. CoRR abs/1707.02610 (2017). arXiv:1707.02610
32. Ustinova, E., Lempitsky, V.S.: Learning deep embeddings with histogram loss. CoRR abs/1611.00822 (2016). arXiv:1611.00822
33. Varol, G., Salih, R.: Toward retail product recognition on grocery shelves. In: Sixth International Conference on Graphic and Image Processing (ICGIP), p. 944309 (2015). https://doi.org/10.1117/12.2179127
34. Vinyals, O., Blundell, C., Lillicrap, T.P., Kavukcuoglu, K., Wierstra, D.: Matching networks for one shot learning. CoRR abs/1606.04080 (2016). arXiv:1606.04080
35. Wald, I., Johnson, G., Amstutz, J., Brownlee, C., Knoll, A., Jeffers, J., Günther, J., Navratil, P.: Ospray - a CPU ray tracing framework for scientific visualization. IEEE Trans. Visual Comput. Graph. 23(1), 931–940 (2017). https://doi.org/10.1109/TVCG.2016.2599041
36. Wang, J., Zhou, F., Wen, S., Liu, X., Lin, Y.: Deep metric learning with angular loss. CoRR abs/1708.01682 (2017). arXiv:1708.01682
37. Winlock, T., Christiansen, E., Belongie, S.: Toward real-time grocery detection for the visually impaired. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, pp. 49–56 (2010). https://doi.org/10.1109/CVPRW.2010.5543576
38. Yi, D., Lei, Z., Liao, S., Li, S.Z.: Deep metric learning for person re-identification. In: 2014 22nd International Conference on Pattern Recognition, pp. 34–39 (2014). https://doi.org/10.1109/ICPR.2014.16
39. Yörük, E., Öner, K.T., Akgül, C.B.: An efficient Hough transform for multi-instance object recognition and pose estimation. In: 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 1352–1357 (2016). https://doi.org/10.1109/ICPR.2016.7899825
40. Zhang, Q., Lee, K., Bao, H., You, Y., Li, W., Guo, D.: Large scale classification in deep neural network with label mapping. CoRR abs/1806.02507 (2018). arXiv:1806.02507
41. Zhang, Y., Wang, L., Hartley, R., Li, H.: Handling significant scale difference for object retrieval in a supermarket. In: DICTA, pp. 468–475. IEEE Computer Society (2009). http://dblp.uni-trier.de/db/conf/dicta/dicta2009.html
42. Zhang, Y., Wang, L., Hartley, R.I., Li, H.: Where’s the weet-bix? In: Yagi, Y., Kang, S.B., Kweon, I.S., Zha, H. (eds.) ACCV (1), Lecture Notes in Computer Science, vol. 4843, pp. 800–810. Springer (2007)

Chapter 4

Robots Working in the Backroom: Depalletization of Mixed-Case Pallets

Pierluigi Arpenti, Riccardo Caccavale, Andrea Giuseppe Fontanelli, Vincenzo Lippiello, Gianmarco Paduano, Bruno Siciliano, and Luigi Villani

Abstract Depalletizing robotic systems are commonly deployed to automate and speed up parts of logistic processes. Despite this, the need to adapt pre-existing logistic processes to the automatic systems often hinders the application of such robotic solutions in small business settings such as supermarkets. In this chapter we propose an integrated robotic depalletizing system designed to be easily deployed in supermarket logistic processes. Integrating a robotic system into a supermarket backroom demands a high level of autonomy, based on strong perceptive, executive and gripping capabilities. We describe the system along with its main components and show how the proposed framework performs in a real supermarket scenario.

Keywords Mechatronics · Reconfigurable gripper · Box detection · Autonomous depalletizing

P. Arpenti (corresponding author) · R. Caccavale · A. G. Fontanelli · V. Lippiello · G. Paduano · B. Siciliano · L. Villani
PRISMA Lab, CREATE - Department of Electrical Engineering and Information Technology, Università degli Studi di Napoli Federico II, Napoli, Italy
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. L. Villani et al. (eds.), Robotics for Intralogistics in Supermarkets and Retail Stores, Springer Tracts in Advanced Robotics 148, https://doi.org/10.1007/978-3-031-06078-6_4

4.1 Introduction

Robotic logistics is the field of industrial robotics that aims to optimize the flow of goods in both manufacturing and large-scale distribution. Technological challenges remain open for several operations, including the palletizing and depalletizing of goods [6]. In particular, depalletizing is the process of unloading objects, such as corrugated cartons, from a pallet in a defined pattern. This procedure is common in logistics, where goods are typically delivered inside cases (boxes) that can have different dimensions or weights. The task is hard and tiresome for human workers, since they have to manually remove a huge number of heavy cases. Robotic depalletizing solutions increase productivity and are often deployed inside factories.

In many industrial contexts, robotic depalletizing cells are tailored to specific products (having the same shape, dimensions and appearance) and to specific product lines. Such a context dramatically simplifies the executive, perceptive and gripping capabilities required of the robotized system. From the perception viewpoint, the depalletizing process is made easy by the small amount of information required. Examples of perceptual systems for structured depalletizing processes are proposed in [12, 20], where time-of-flight sensors [12] or combined RFID data and depth images [20] are used to recognize homogeneous cases. Analogously, the detection of the 3D vertices of cases is addressed via edge detection and robust line fitting in [11]. From the executive and gripping viewpoint, different systems have been proposed. For example, a robotic manipulator performing autonomous grasping, transporting and palletizing is presented in [13]. This system ensures more flexibility, but only a specific type of object is considered. In [18, 23] an innovative suction system is used by an autonomous robot able to pick standard boxes from the upper side and to place them on a conveyance line. In [17] a flexible robotic palletizer, mainly designed for structured industrial environments, is proposed.

In logistic scenarios such as supermarkets, by contrast, the situation is quite different. The products are heterogeneous (different shapes, dimensions and textures) and are stored on mixed pallets, which are made up of cases with different dimensions and textures (see Fig. 4.1). Some products are not packed in uniform or standardized boxes, but in cases whose specific dimensions and textures make them very difficult to recognize [10, 21, 24] and to depalletize [6, 18].

Fig. 4.1 Mixed pallet, made up of products arranged in cases of different shapes

In this chapter an integrated robotic system is presented, which is purposely designed to operate in supermarket backrooms (see Fig. 4.2). The system is able to recognize and flexibly depalletize cases from mixed pallets, avoiding invasive sensors or structural changes in the logistic process. The proposed robotic system is composed of three main subsystems: an innovative gripping system to grasp cases from different sides, a perceptive system that identifies, recognizes and localizes cases on the pallet, and a strategy to schedule and orchestrate the depalletizing sequence.

Fig. 4.2 Rendering of the gripper prototype attached to an industrial robot

4.2 Gripper

In this section, the working principle of the proposed depalletizing gripper is described [9]. To obtain a design adaptable to different scenarios, the gripper was developed by considering the following requirements:

1. ability to grasp boxes from the top and from the side;
2. ability to grasp boxes placed on top of other boxes;
3. ability to grasp boxes with dimensions in the range [15−50] cm;
4. ability to grasp boxes with a weight of up to 10 kg;
5. automatic detection of the box orientation using the embedded sensors;
6. embedded detection of the box weight;
7. automatic reconfiguration.

The novel gripper is composed of two symmetric modules, each equipped with a suction system and a fork, which can be reconfigured along the horizontal axis to adapt the gripper to boxes of different dimensions. This design was chosen for the following reasons: (a) the use of two independent and reconfigurable modules allows the gripper to grasp boxes of different dimensions, using a single module for small boxes and both modules for larger ones; (b) the forks make it possible to grasp boxes from the bottom side, so that during handling the weight of the box is loaded on the forks themselves and not on the suction system; (c) boxes with complex shapes (e.g. bottle packages) or fragile structures (e.g. boxes with easy openings) can be grasped from the side. The components of the proposed gripper are described in the next subsections.

4.2.1 Mechanical Structure The gripper is designed to be assembled on the terminal flange of an industrial robot. This solution allows to maximize the system flexibility and to increase the system workspace, ensuring also compactness and facilitating the integration into preexisting logistic systems. Alluminium alloys were used for the gripper’s main parts and stainless steel only for some thin components such as the forks, as described in the next subsections. This allows the overall end-effector to be lightweight and capable to handle boxes with a maximum weight of about 15 Kg. The main innovation of the proposed design is the adaptability and flexibility to different scenarios. The three main components of the gripper (see Fig. 4.3) are described below. Main Structure The main structure of the gripper is composed of an aluminium frame with two rails placed in the front. A custom sensorized flange connects the gripper to the robotic arm and includes four load cells to estimate the box weight. Two independent gripping modules can move with respect to the main frame, thanks to four ball-guides (for each module) sliding on the two rails. The motion of the two modules (D1 in Fig. 4.3), symmetrical with respect to the gripper centre, is produced by an SD Bidirectional

Fig. 4.3 Gripper degrees of freedom. (D1) enlargement-narrowing, (D2) right fork motion, (D3) left fork motion, (D4) right suction system lifting, (D5) left suction system lifting, (D6) passive rotation of the right suction system, (D7) passive rotation of the left suction system

4 Robots Working in the Backroom: Depalletization of Mixed-Case Pallets

85

Fig. 4.4 Suction systems: a Matrix configuration of 9 suction cups. b Configuration composed of two large suction cups of vertical elongated shape

Ball Screw. The ball screw is actuated by a stepper motor. Each gripping module is composed of at least one suction system and one fork.

Suction Systems

Each suction system is composed of one suction plate placed at the front of the module, with several suction cups arranged on its front side. Two configurations, shown in Fig. 4.4, have been tested to evaluate the best solution for each application: (a) a matrix configuration of 9 suction cups, which is more suitable for rigid boxes with flat surfaces; (b) a configuration composed of only two large suction cups of vertically elongated shape. The latter is more suitable for boxes with non-uniform surfaces and requires lower vacuum power because of the larger surface of the cups. As explained in Sect. 4.2.3, one of the phases of the grasping procedure consists in lifting the box so that the forks can slide just below it. During this phase, the box is held by the suction system and lifted. Each suction plate can slide vertically with respect to the module (D4 and D5 in Fig. 4.3) along a linear guide. This motion is actuated by a pneumatic cylinder to maximise the power-to-size ratio. Moreover, each suction plate can rotate passively about the axis J (Fig. 4.3). This additional passive DoF was introduced to reduce the load on the suction cups during lifting. In more detail, considering Fig. 4.5, the resulting tangential force on the suction cups ($F_s$) can be computed from the box weight ($F_g$), assuming a homogeneous weight distribution of the box, by using the equations

\[ F_s = \frac{F_g \cos(\alpha)}{2}, \tag{4.1} \]

\[ \alpha = \arcsin\left(\frac{D}{L + L_s}\right), \tag{4.2} \]

where


Fig. 4.5 Force decomposition of the proposed passive mechanism

$D$ is the pneumatic cylinder displacement, $L$ is the box length in the direction orthogonal to the grasped face, and $L_s$ is the distance between the axis of the passive joint J and the suction cups. Notice that the lifting force is approximately halved thanks to the use of the passive joint and no twisting torque is applied on the suction cups, thus substantially increasing the grasping capability. The combination of the actuated vertical motion and the passive motion is obtained using a custom curved guide (see Fig. 4.5). The length of the curved guide has been calculated to achieve the required passive rotation for boxes with a minimum length $L_{min} = 10$ cm. Considering Eq. (4.2) and the maximum displacement of the pneumatic cylinder ($D_{max} = 5$ cm), the required maximum rotation angle of the passive joint is $\alpha_{max} = 18^\circ$. The passive rotary motion of the suction systems allows the box to be lifted while leaving its lower edge resting on the surface below.

Forks

Each gripping module contains one fork that can slide orthogonally to the suction system (D2 and D3 in Fig. 4.3) along two linear guides. The motion of the two forks is independent and is actuated by a pair of double rack-geared transmissions linked to two independent stepper motors. The two stainless steel racks form the bearing structure of the fork. Each fork includes a pattern of aluminium rollers which alternately protrude from the upper and the lower side. This configuration allows the forks to be inserted between two stacked boxes, rolling on the lower surface of the upper box and on the upper surface of the lower one.
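As an illustration of Eqs. (4.1) and (4.2) for the passive joint of the suction systems, the following minimal sketch evaluates the rotation angle and the tangential force for the worst case quoted in the text ($D_{max} = 5$ cm, $L_{min} = 10$ cm). The value of $L_s$ is not reported in the chapter; the 6 cm used here is a purely hypothetical choice, and the function names are illustrative.

```python
import math


def passive_joint_angle(d, box_length, l_s):
    """Eq. (4.2): passive-joint rotation angle alpha, in radians."""
    return math.asin(d / (box_length + l_s))


def tangential_force(f_g, alpha):
    """Eq. (4.1): tangential force on the suction cups, in newtons."""
    return f_g * math.cos(alpha) / 2.0


# Worst case quoted in the text: D_max = 5 cm, L_min = 10 cm.
# L_S = 6 cm is a hypothetical value, NOT taken from the chapter.
L_S = 0.06
alpha_max = passive_joint_angle(d=0.05, box_length=0.10, l_s=L_S)
f_s = tangential_force(f_g=15.0 * 9.81, alpha=alpha_max)  # 15 kg box
print(f"alpha_max = {math.degrees(alpha_max):.1f} deg, F_s = {f_s:.1f} N")
```

With this assumed $L_s$ the angle comes out close to the 18° stated above, and the force on the cups stays below half of the box weight.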


4.2.2 Sensors and Actuators

The gripper is provided with suitable actuation and sensing systems, as well as a low-level control system that allows reconfiguration and robust grasping according to the strategy established by the executive system.

Actuation System

The gripper includes three electrical stepper motors (Oriental Motors AZM series) actuating the main gripper DoFs. In detail, the first two actuators move the two forks independently (the extension length of the forks may vary according to the size of the boxes to grab), while the third motor actuates the bidirectional screw which moves the two gripping modules to reconfigure the gripper. Each motor is controlled independently by a power electronic board that also acts as a feedback controller. As shown in Fig. 4.6, each board communicates with the external PC through an industrial-grade network based on an RS-485 bus, in order to have a robust and reliable communication. Each electronic device responds to a unique ID during read/write operations to control the actuators and read the sensors. A microcontroller board, based on an ST Microelectronics F401 ARM processor, is in charge of driving the four digital electronic valves of the pneumatic system. The pneumatic system on each fork is made of a linear actuator and a vacuum sucker. The pneumatic vacuum suckers (Schmalz SPOB1 100x40 ED-65) are driven by a pneumatic ejector (Schmalz SBP 25 G03 SDA) that generates the required vacuum through the Venturi effect. This combination is well suited to gripping both rigid cardboard and plastic surfaces up to 15 kg, thus covering a wide range of boxes. All the pneumatic devices are connected to an external air compressor (11 bar) that powers the circuit. Each pneumatic device can be actuated by the electrovalves connected to the microcontroller board, as depicted in Fig. 4.7.

Fig. 4.6 Functional diagram of the gripper control system


Fig. 4.7 Functional diagram of the gripper pneumatic circuit

Fig. 4.8 Load cells configuration. $F_y$ is the force acting on the load cells when the box is picked from the upper side; $F_z$ is the force acting on the load cells when the box is picked from the lateral side

Load Cells

Thanks to four load cells embedded in the gripper flange, it is possible to measure the weight of the boxes in the horizontal configuration as well as in the vertical one. The load cells are assembled as shown in Fig. 4.8, so that the measurements of each pair of sensors can be compared. The maximum force applicable to each load cell is 500 N (about 50 kg), with a resolution of about 10 g. The four load cells are connected to the four Analog to Digital Converters (ADCs) of the microcontroller, which is in charge of data acquisition and processing. Due to the position of the load cells, the sensors measure not only the box weight: the gripper weight also contributes a gravity force component which depends on the gripper orientation. Hence, we extract only the gravity components due to the box mass using the force observer presented in [15]. The gripper dynamic model required to apply this method has been identified using the approach presented in [8].

ToF Distance Sensors

In the gripper design, four ToF (Time of Flight) sensors are used to measure the distance between the gripper and the boxes when the gripper is close to a box, in order to improve the accuracy of the planned trajectory. As shown in Fig. 4.9, the four sensors are placed in the frontal part of the system in order to detect the configuration of the front-facing flat surface of a box. In particular


Fig. 4.9 Top: Gripper actuators arrangement. Bottom: ToF distance sensors arrangement

we identify the four sensors as: $lu$ (Left Up), $ru$ (Right Up), $ld$ (Left Down), $rd$ (Right Down). Notice that the relative distance between the sensors is not always fixed. In particular, while the two sensors of a single suction system are placed in a fixed position, the distance between the left-side and right-side sensors changes according to the reconfiguration of the gripper. It can be obtained from the relative position of the two suction systems. The distance $D$ of the gripper with respect to the box is estimated as the average of the four sensor readings:

\[ D = \frac{1}{4}\left(D_{lu} + D_{ru} + D_{ld} + D_{rd}\right), \tag{4.3} \]

where $D_{lu}$, $D_{ru}$, $D_{ld}$, $D_{rd}$ are the distances measured by the four sensors, respectively. Moreover, the pitch ($\theta_p$) and yaw ($\theta_y$) angles of the box with respect to the gripper are given by:

\[ \theta_p = \frac{1}{2}\tan^{-1}\left(\frac{D_{lu} - D_{ld}}{D_v}\right) + \frac{1}{2}\tan^{-1}\left(\frac{D_{ru} - D_{rd}}{D_v}\right), \tag{4.4} \]

\[ \theta_y = \frac{1}{2}\tan^{-1}\left(\frac{D_{ru} - D_{lu}}{D_{hu}}\right) + \frac{1}{2}\tan^{-1}\left(\frac{D_{rd} - D_{ld}}{D_{hd}}\right), \tag{4.5} \]

where $D_v$ is the distance between the ToF sensors along the vertical direction, while $D_{hu}$ and $D_{hd}$ are the distances along the horizontal direction between the two upper and the two lower ToF sensors, respectively. As mentioned before, these two horizontal distances change when the gripper is reconfigured and are computed online from the state of the horizontal slide motor.
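The ToF-based estimates of Eqs. (4.3)-(4.5) can be sketched in a few lines. The function and argument names below are illustrative and not taken from the chapter; all distances are assumed to be expressed in the same unit, and the angles are returned in radians.

```python
import math


def box_distance_and_tilt(d_lu, d_ru, d_ld, d_rd, d_v, d_hu, d_hd):
    """Return (distance, pitch, yaw) of the box face w.r.t. the gripper.

    d_lu, d_ru, d_ld, d_rd: readings of the four ToF sensors.
    d_v: vertical spacing between upper and lower sensors.
    d_hu, d_hd: horizontal spacing of the upper and of the lower sensor pair.
    """
    # Eq. (4.3): average of the four readings.
    distance = (d_lu + d_ru + d_ld + d_rd) / 4.0
    # Eq. (4.4): mean of the pitch seen by the left and right sensor columns.
    pitch = 0.5 * math.atan((d_lu - d_ld) / d_v) \
        + 0.5 * math.atan((d_ru - d_rd) / d_v)
    # Eq. (4.5): mean of the yaw seen by the upper and lower sensor rows.
    yaw = 0.5 * math.atan((d_ru - d_lu) / d_hu) \
        + 0.5 * math.atan((d_rd - d_ld) / d_hd)
    return distance, pitch, yaw
```

Because $D_{hu}$ and $D_{hd}$ depend on the gripper opening, a caller would refresh them from the slide motor state before each evaluation.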

4.2.3 Grasping Procedure

Figure 4.10 shows the grasping procedure of the gripper, which consists of the following steps:

(a) The robot moves the gripper to the neighborhood of the target box; in this situation, we assume that the error of the gripper height with respect to the base of the box is at most 2 cm.
(b) The gripper is reconfigured according to the dimensions of the target box; here, the ToF sensors are used to refine the position and orientation of the gripper.
(c) The gripper sticks to the box using the suction systems.
(d) The suction plates are lifted by the pneumatic cylinders to make room for the forks.
(e) The two forks are deployed.
(f) The suction plates are lowered to their initial position in order to place the box on the forks.

At the end of this process, the box can be safely moved to the desired location.

Fig. 4.10 Actions to grasp a box using the proposed gripper: a Gripper approaches to the box. b Reconfiguration. c Gripper sticks to the box using the suction systems. d The suction plates are lifted to give room to the two forks to slide in. e Insertion of the two forks under the package. f Lowering the suction systems to rest the box on the forks


4.3 Detection, Recognition and Localization

The proposed system has been developed to allow a robotic cell to autonomously depalletize a mixed pallet, such as the ones that are nowadays manually depalletized by human clerks in the backrooms of supermarkets. This task requires the capability to identify cases with different shapes and textures, as well as the capability to localize them in a fixed reference frame with a precision below a given tolerance, so that the estimated poses can be used as references for the motion planner of the robot. The overall pipeline of the perceptual system (Fig. 4.11) is composed of three modules: the cases database (CDB), which contains the information about the cases in the mixed pallet; the detection module (DM), where the cases are detected from RGB-D data; and the geometrical module (GM), which identifies such cases with those in the CDB and localizes them in the scene [1].

4.3.1 The Cases Database

The CDB contains information about the cases in the current mixed pallet. Each case $c$ in the CDB is associated with a tuple $(b_c, n_c, x_c, y_c, z_c, \mathcal{I}_c)$, where $b_c$ is the product barcode, $n_c$ is the number of instances of $c$ in the pallet, $x_c$, $y_c$, and $z_c$ are the dimensions of the case, and $\mathcal{I}_c = (I_c^1, I_c^2, \ldots, I_c^n)$ is a set of images, one for each face of the case, if $c$ is textured; otherwise it is empty. To standardize the dimensions of each case, a convention is adopted in which the $z$ axis is always aligned with the width of the case.
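One possible in-memory layout for such a database is sketched below. The class name, field names, barcodes and image file names are hypothetical placeholders chosen for illustration; the chapter only specifies the content of the tuple, not how it is stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CaseEntry:
    """One CDB record: barcode, instance count, dimensions, face images."""
    barcode: str          # b_c
    count: int            # n_c, instances of this case in the pallet
    x: float              # case dimensions in metres
    y: float
    z: float              # by convention, aligned with the case width
    face_images: List[str] = field(default_factory=list)  # empty if untextured

    @property
    def is_textured(self) -> bool:
        return bool(self.face_images)


# Hypothetical pallet content, keyed by barcode.
cdb: Dict[str, CaseEntry] = {}
for entry in (
    CaseEntry("0000000000017", 2, 0.30, 0.20, 0.40, ["front.png", "side.png"]),
    CaseEntry("0000000000024", 1, 0.25, 0.15, 0.35),
):
    cdb[entry.barcode] = entry
```

Keying the dictionary by barcode mirrors the lookup that the geometrical matching performs when a face has already been associated with a barcode.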

4.3.2 The Detection Module

The DM detects closed planar surfaces in the three-dimensional space, using both the RGB image $I_p$ and the depth image $D_p$ of the mixed pallet. It produces a preliminary association between regions in the space and the faces of the cases. The process can be divided into several distinct phases. As a preliminary step, since the position and the dimensions of the pallet are known in advance, the background is removed from both $D_p$ and $I_p$. Afterwards, a feature detection algorithm is used to detect and describe local features, such as points, corners, and edges, in $I_p$. Such features are then matched with those detected in $I_c$ for each textured case $c$ contained in the CDB. Since multiple instances of the same case could be present in the CDB ($n_c > 1$), two strategies to infer good matches are deployed. Firstly, given a set of matching features (keypoints) $m_1, m_2, \ldots, m_n$ between $I_c$ and $I_p$, ordered by Lowe's distance (refer to [14] for additional details), a subset of good matches $m_1, m_2, \ldots, m_k$ with $k < n$ is selected if and only if


Fig. 4.11 Overall pipeline of the perceptual system: CDB (blue), DM (yellow), and GM (orange). Case reference frame has the axes aligned to the orthogonal edges of the case

\[ \exists\, k < n, \quad d_{lw}(m_k) < th \cdot d_{lw}(m_{k+1}), \tag{4.6} \]

where $d_{lw}$ is Lowe's distance, while $th \in [0, 1]$ is a suitable threshold, empirically set to 0.7, which is chosen so as to exclude matches whose Lowe's distance strongly differs from the distance of the previous ones. In this way, since a single feature can be assigned to multiple occurrences of the same case in $I_p$, the k-best features are collected instead of the single best one. These additional keypoints are virtually


associated with the multiple instances of the cases. To distinguish such instances, a clustering process is applied to both images (the case image and the pallet image), and the comparison is made considering clusters of matching features belonging to the same region. In particular, multiple occurrences of the same case can lead to multiple clusters; only the best cluster, whose features likely belong to a single instance of the case, is then selected. If these features are sufficiently numerous, a homography transformation between the image $I_c$ and the pallet image $I_p$ is computed. In this case, the system creates a binary segment $s_{feat}$, i.e. a binary mask image corresponding to the region occupied by the image $I_c$ of the matched face in the starting image $I_p$. The mask is then applied to $D_p$ to isolate the region of the depth map corresponding to $s_{feat}$. Moreover, since the segment $s_{feat}$ is supposed to cover a single face, which is a region whose depth values are constant or change with continuity, an abrupt variation of the depth values indicates that the generated segment does not correctly fit the face. For instance, if a textured case (in the background) is partially occluded by another case (in the foreground) and the number of matching keypoints is sufficient to cross the threshold, the occluded face can still be detected by the algorithm. In this case, a new segment $s_{feat}$ is generated which does not correspond to a single face and must be discarded from the rest of the procedure. To do this, for each $s_{feat}$, the corresponding depth segment $s_d$ is analyzed. If the discontinuity area is less than a given percentage of the total area of the depth segment $s_d$, then the segment $s_{feat}$ is stored; otherwise, it is rejected. In the following experiments, this threshold has been experimentally set to 10%, taking into account possible noisy measurements of the RGB-D sensor.
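The k-best selection rule of Eq. (4.6) can be sketched as a single scan over the sorted match distances. The function name is illustrative, the input is assumed to be already sorted in ascending order of Lowe's distance, and returning 0 when no gap is found is an assumption of this sketch rather than something stated in the chapter.

```python
def select_k_best(distances, th=0.7):
    """Return the number k of good matches according to Eq. (4.6).

    distances: Lowe's distances d_lw(m_1) <= ... <= d_lw(m_n), ascending.
    th: ratio threshold in [0, 1]; 0.7 is the value reported in the text.
    The first k for which d_lw(m_k) < th * d_lw(m_{k+1}) marks the gap
    after which matches are considered too dissimilar to the previous ones.
    """
    for k in range(1, len(distances)):
        if distances[k - 1] < th * distances[k]:
            return k
    return 0  # assumption: no clear gap means no subset is selected


# Three similar distances followed by a jump: the first three are kept.
k = select_k_best([0.10, 0.11, 0.12, 0.40, 0.45])
```

Keeping all matches before the first large jump, rather than only the single best one, is what lets one feature of a case image be associated with several instances of that case on the pallet.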
The segments which pass this depth test are stored in the set $S_{feat}$ and finally used to create monochromatic patches which are added to the input image $I_p$ to cover the textured cases detected, generating the image $I_{feat}$. The whole process of feature detection is described in Algorithm 1. For each case $c$ in the CDB and for each image $I_c$ in $\mathcal{I}_c$, the matching features detected in $I_c$ and $I_p$ are collected into the set $M$ (line 5). For each pair of features $(m_i, m_{i+1})$ in the set, the condition (4.6), necessary to collect the k-best features, is verified and the features are then stored in $M^*$ (lines 6 to 8). Matches in $M^*$ are associated with keypoints both in $I_c$ and in $I_p$. They are then clustered by Euclidean distance in order to find groups of features belonging to the same region. The sets $K_p$ and $K_c$ are created (line 12), where $K_p$ contains all the clusters of keypoints found in $I_p$, while $K_c$ refers to $I_c$. In particular, $|K_c|$ is the number of clusters into which the keypoints belonging to a single face image $I_c$ are grouped. If the number of clusters in the input image, $|K_p|$, is greater than $|K_c|$, there are probably several instances of the same case inside the pallet. In any case, the cluster with the highest number of keypoints in $K_p$ and the corresponding cluster in $K_c$ are collected in $K_{match}$ (line 13). If the number of matches in $K_{match}$ is sufficient (line 14), then the homography $h$ between the two clusters of features is computed by the function find_homography() (line 15) and the binary segment $s_{feat}$ is created from $I_c$ through the function binary_segmentation() (line 16). If the corresponding depth region is flat, i.e., there are no large discontinuities of depth values in the region of the depth image $D_p$ associated with $s_{feat}$ (line 17), the segment $s_{feat}$ is stored in the set $S_{feat}$ (line 18) and a monochromatic patch is added to the region corresponding to $s_{feat}$ (line 20). At the end of this process


Algorithm 1 The feature detection procedure is invoked to take into account textured faces inside the pallet.
 1: procedure features_detection(I_p, D_p)
 2:   I_feat = I_p
 3:   for each c ∈ CDB do
 4:     for each I_c ∈ ℐ_c do
 5:       M = feature_matching(I_p, I_c)
 6:       for each (m_i, m_{i+1}) ∈ M do
 7:         if d_lw(m_i) < th · d_lw(m_{i+1}) then
 8:           M* = (m_1, ..., m_i)
 9:           break
10:         end if
11:       end for
12:       (K_c, K_p) = clusterize(M*, I_c, I_p)
13:       K_match = best_match(K_c, K_p)
14:       if |K_match| ≥ 4 then
15:         h = find_homography(I_c, I_p, K_match)
16:         s_feat = binary_segmentation(h, I_c)
17:         if is_flat(s_feat, D_p) then
18:           S_feat ← S_feat ∪ {s_feat}
19:           B_feat ← B_feat ∪ {b_c}
20:           I_feat = add_patch(I_c, I_feat, h)
21:         end if
22:       end if
23:     end for
24:   end for
25:   return (I_feat, S_feat, B_feat)
26: end procedure

the procedure returns a patched image $I_{feat}$, the set of segments $S_{feat}$ and the set of barcodes $B_{feat}$ (line 25). In particular, the image $I_{feat}$, which is the image of the pallet with the detected textured cases covered by patches, is then processed by an image segmentation algorithm to segment the remaining untextured faces. The outcome of this process is twofold: the first output is the image $I_{seg}$, which is the completely segmented representation of the pallet; the second output is the set of segments $S_{seg}$, which is added to the set $S_{feat}$ to get the set $S = S_{feat} \cup S_{seg}$ of all the segments $s$ detected in the input image $I_p$. In the faces detection phase, the binary segments $s$ are used as masks for the depth image $D_p$, resulting in a sequence of depth segments $s_d$, as many as the segments stored in $S$, collected in the set $S_d$. Each of these depth segments is then used to build an associated point cloud which represents the depth segment in the three-dimensional space in the camera frame. For each segment $s_{pc}$ obtained from the point cloud, the planar surface model is estimated through an iterative method. Once the parameters describing each planar surface have been correctly retrieved, all the points belonging to each point cloud are projected onto the associated plane. Finally, assuming that all the faces of the cases are rectangular, each planar


Algorithm 2 The face detection procedure detects candidate faces in the three-dimensional space.
 1: procedure faces_detection(D_p, S)
 2:   for each s ∈ S do
 3:     s_d = apply_mask(D_p, s)
 4:     s_pc = get_pointcloud(s_d, D_p)
 5:     p = find_plan(s_pc)
 6:     s_ppc = project(s_pc, p)
 7:     f = bounding_rect(s_ppc)
 8:     if s ∈ S_feat then
 9:       F ← F ∪ {(f, b_c)}
10:     else
11:       F ← F ∪ {(f, “unknown”)}
12:     end if
13:   end for
14:   return F
15: end procedure

point cloud is bounded by the minimum-area enclosing rectangle. Such rectangles represent the candidate faces $f$ which have to be recognized by the GM. The whole process concerning plane estimation and candidate face detection is described in Algorithm 2. For each binary segment $s$ in $S$, the procedure begins with the creation of a depth segment $s_d$ using $s$ to mask $D_p$ (line 3). A point cloud segment $s_{pc}$ is obtained (line 4), which is used to estimate the underlying planar surface model $p$ (line 5). Then $s_{pc}$ is projected onto the plane $p$, obtaining a planar point cloud segment $s_{ppc}$ (line 6), from which the bounding rectangle, i.e. the candidate face $f$, is built (line 7). Finally, if the segment $s$ is in $S_{feat}$, the associated face $f$ is stored in $F$ along with the barcode $b_c$ (line 9); otherwise, the segment $s$ is in $S_{seg}$, so no barcode is available. In this case, the face $f$ is stored in $F$ associated with the “unknown” identifier (line 11), indicating that the face is detected but not recognized. Notice that, if all the cases are untextured, the feature detection procedure is never invoked and, as a consequence, the DM reduces to the image segmentation and faces detection phases only.
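The plane estimation and projection steps (lines 5 and 6 of Algorithm 2) can be illustrated with a small standard-library sketch. Note that it swaps in a plain least-squares fit of $z = ax + by + c$ for the iterative method mentioned in the text, which is not specified further; it therefore assumes the face is not parallel to the camera's optical axis. The function names are illustrative, and the bounding-rectangle step is left out.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c; returns (a, b, c)."""
    n = float(len(points))
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    # Normal equations of the least-squares problem, solved by Cramer's rule.
    a_mat = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    det = _det3(a_mat)
    if abs(det) < 1e-12:
        raise ValueError("degenerate point set: cannot fit a plane")
    coeffs = []
    for col in range(3):
        m = [row[:] for row in a_mat]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(_det3(m) / det)
    return tuple(coeffs)


def project_onto_plane(point, plane):
    """Orthogonal projection of a 3D point onto the plane z = a*x + b*y + c."""
    a, b, c = plane
    x, y, z = point
    # Signed offset along the (unnormalised) normal (a, b, -1).
    t = (a * x + b * y - z + c) / (a * a + b * b + 1.0)
    return (x - a * t, y - b * t, z + t)
```

A robust estimator would normally be preferred on real depth data, where outliers from neighbouring cases can bias a plain least-squares fit.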

4.3.3 The Geometrical Module

The role of the GM is to associate each candidate face $f \in F$ provided by the DM with one of the cases listed in the CDB and, subsequently, to estimate its pose. This task is addressed by two distinct procedures: the geometrical matching and the geometrical localization. For each $f$ stored in $F$ with a barcode $b_c$, the geometrical matching checks whether the width and the height characterizing $f$ are compatible with two out of the three dimensions $x_c$, $y_c$, and $z_c$. The candidate face $f$ is recognized, i.e. definitely associated with a case among those listed in the CDB, when the difference


Algorithm 3 The geometrical matching procedure recognizes faces and associates them to the cases listed in the CDB.
 1: procedure geometrical_matching(F)
 2:   for each (f, b) ∈ F do
 3:     if b ≠ “unknown” then
 4:       c* = case_from_barcode(b)
 5:       if match_dimensions(f, x_c*, y_c*, z_c*) then
 6:         c* ← subs_dimensions(f, x_c*, y_c*, z_c*)
 7:         C* ← C* ∪ {c*}
 8:       end if
 9:     else
10:       c* ← find_best_match(f, CDB)
11:       c* ← subs_dimensions(f, x_c*, y_c*, z_c*)
12:       C* ← C* ∪ {c*}
13:     end if
14:   end for
15:   return C*
16: end procedure

between its dimensions and those of the case is less than a given threshold value. Then, the matched case $c^*$ can be stored in the set of recognized cases $C^*$. Otherwise, if the dimensions are different, the geometrical matching function searches for the best matching case among the ones left in the CDB. If a good match is found, the matching case is stored in the set $C^*$. An analogous process is performed if $f$ is associated with the “unknown” identifier, meaning that it has not already been associated with a barcode by the DM. The process of geometrical matching is described in Algorithm 3. For each candidate face (line 2), if $f$ is associated with a known barcode $b$ (line 3), the system selects from the database a case $c^*$ having the same barcode (line 4). The dimensions of $f$ and $c^*$ are then compared (line 5) and, in case of matching, an updated description of $c^*$ is stored in $C^*$. On the other hand, if the dimensions of $c^*$ are too different from the detected ones, the procedure associates $f$ with a new case $c^*$ that better matches its dimensions (line 10). Also in this case, an updated version of $c^*$ is stored in $C^*$ (lines 11–13).

The cases recognized by the geometrical matching are subsequently localized by the geometrical localization, i.e. the pose of each matched case is estimated in a fixed reference frame (world frame). Such pose is described by the position of the centroid of the case and by the orientation of the reference frame attached to the case, whose three coordinate axes are aligned with the orthogonal edges of the case. Every matched case $c^*$ is characterized by $\bar{c}_w = [c_{wx}\ c_{wy}\ c_{wz}]^T \in \mathbb{R}^3$, $\bar{c}_h = [c_{hx}\ c_{hy}\ c_{hz}]^T \in \mathbb{R}^3$, and $\bar{c}_d = [c_{dx}\ c_{dy}\ c_{dz}]^T \in \mathbb{R}^3$, which are the vectors (linked to the width, the height and the depth of the case, respectively) whose directions and norms describe the sought poses. The first two are, at this stage of the process, completely known (they are the vectors associated with the matched face), whereas the direction of the third one has to be computed (the norm is equal to the dimension of


Algorithm 4 The geometrical localization procedure localizes cases in the three-dimensional space.
 1: procedure geometrical_localization(C*)
 2:   for each c* ∈ C* do
 3:     c_o = get_orientation(ĉ_w, ĉ_h)
 4:     c_p = get_position(f_p, c_d)
 5:     c*_pose = (c_p, c_o)
 6:     C*_poses ← C*_poses ∪ {c*_pose}
 7:   end for
 8:   return C*_poses
 9: end procedure

the case which has not been matched). Such direction is given by the cross product $\hat{c}_d = \hat{c}_w \times \hat{c}_h$, where $\hat{c}_d$ is the unit vector of $\bar{c}_d$, $\hat{c}_w = \bar{c}_w / \|\bar{c}_w\|$ is the unit vector of $\bar{c}_w$, and $\hat{c}_h = \bar{c}_h / \|\bar{c}_h\|$ is the unit vector of $\bar{c}_h$. The orientation of the case is hence given by the matrix $\bar{\bar{c}}_o = [\hat{c}_w\ \hat{c}_h\ \hat{c}_d] \in \mathbb{R}^{3 \times 3}$. On the other hand, the position of the centroid of the case, $\bar{c}_p \in \mathbb{R}^3$, is defined by projecting the center of the recognized face, $\bar{f}_p$, along $\hat{c}_d$, by a length equal to half the norm of $\bar{c}_d$, as:

\[ \bar{c}_p = \bar{f}_p + \hat{c}_d \frac{\|\bar{c}_d\|}{2}. \tag{4.7} \]

The localization process is described in Algorithm 4. For each recognized case $c^*$, the orientation is given by the unit vectors $\hat{c}_w$, $\hat{c}_h$, and $\hat{c}_d$, where the third one is computed through the cross product (line 3). Given the three unit vectors, as well as the three dimensions of the case, it is possible to estimate the position of the centroid as in (4.7) (line 4). Given $\bar{c}_p$ and $\bar{\bar{c}}_o$, the pose $c^*_{pose}$ of each recognized case is hence available (line 5). Finally, the outcome of the whole procedure is the set $C^*_{poses}$ of all the poses of the recognized cases.
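A compact sketch of the localization step, following the cross product and Eq. (4.7), is given below. The function names are illustrative, vectors are plain tuples, and the sketch assumes that the width and height vectors are ordered so that their cross product points from the visible face into the case.

```python
import math


def unit(v):
    """Normalise a 3D vector."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def case_pose(c_w, c_h, depth, face_center):
    """Return (position, orientation) of a matched case.

    c_w, c_h: width and height vectors of the matched face.
    depth: norm of the depth vector, i.e. the unmatched case dimension.
    face_center: centre of the recognised face.
    The orientation is returned as the three columns (w_hat, h_hat, d_hat).
    """
    w_hat, h_hat = unit(c_w), unit(c_h)
    d_hat = cross(w_hat, h_hat)
    # Eq. (4.7): shift the face centre by half the depth along d_hat.
    position = tuple(f + d * depth / 2.0 for f, d in zip(face_center, d_hat))
    return position, (w_hat, h_hat, d_hat)


position, orientation = case_pose(
    c_w=(0.4, 0.0, 0.0), c_h=(0.0, 0.3, 0.0), depth=0.2,
    face_center=(1.0, 1.0, 1.0))
```

If the measured width and height vectors are not exactly orthogonal, the resulting matrix is not a proper rotation; a real implementation would probably re-orthogonalise it before passing the pose to the motion planner.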

4.4 Integrated Cell

In this section, we illustrate the overall architecture of the integrated depalletizing system, describing its main components and functionalities (see Fig. 4.12). The previous modules are exploited to design a sensorized robotic cell where a robot manipulator operates and different cases have to be moved from a mixed pallet to several target locations depending on logistic requests. In addition, the entire depalletizing process is monitored and scheduled by the executive system, which deploys a hierarchical representation of tasks to orchestrate the robotic actions, the perceptual module and the communications between the cell and the logistic management. This architecture is designed to be decoupled from the logistic flow, so that the executive system can suitably adapt the task execution and the depalletizing strategy to different logistic scenarios.


Fig. 4.12 Overview of the proposed architecture. The executive system supervises robotic task execution and connects the cell with the in-house logistic flow

4.4.1 Cell Design

In order to facilitate the integration in the logistic flow of supermarkets, the proposed robotic cell is designed to be flexible and adaptable to different logistic contexts [2]. In the considered application, typical of supermarket backrooms, the products are collected into cases that have to be moved from the original pallet to a set of generic target locations (like trolleys or shelves). The cell includes a robot manipulator endowed with the proposed suction-based gripping end-effector and an RGB-D camera allowing detection/recognition of the cases and the estimation of their pose, as explained in the previous sections [1]. The structure of the robotic cell can be split into two areas: an inner area (the inner part of the robot workspace) where the robot operates, and an outer area (the outer part of the robot workspace) where products are stored, to be eventually taken by external agents and carried to the next steps of the logistic flow. A representation of the proposed cell is depicted in Fig. 4.13. In our setting, the robot manipulator is placed in the center of the inner area and multiple storing locations can be placed all around it, in the outer area. The pallet is positioned in front of the robot, allowing the manipulator to reach the cases and to move them toward the target locations, or to move cases from one location to another, while the RGB-D camera is placed between the robot and the pallet, facing the latter. Notice that the cell configuration, along with the flexible design of the proposed gripper, allows the cases to be grasped in different ways: either vertically (from above) or horizontally (from the sides). This feature is particularly relevant in


Fig. 4.13 CAD representation of depalletizing cell. The configuration includes 6 trolleys (green color) used to store the boxes taken from the pallet (blue color) by an industrial manipulator (red color)

supermarket logistics because several types of cases must be taken from the sides, and storing repositories like shelves or trolleys are usually filled horizontally.

4.4.2 Executive System

To control the robotic cell and to adapt picking/placing strategies on-line depending on the observations and the logistic context, we deploy an executive system similar to the one proposed in [3]. It exploits an HTN-like representation of robotic tasks and operators including symbolic constraints and effects [19]. More specifically, each task is hierarchically defined by a set of predicates schema(m, l, p), where m is the name of the task, l is a list of subtasks associated with preconditions, while p represents a postcondition used to check the accomplishment of the task. An example of predicate is given below:

schema(task(x_1, x_2, ..., x_n),
       ⟨(subtask_1(x_11, x_12, ...), precond_1),
        ...,
        (subtask_k(x_k1, x_k2, ...), precond_k)⟩,
       postcond).

A schema stands for a parametric task and can be either abstract or concrete, where abstract schemata are tasks that need to be further decomposed into sub-tasks, while concrete schemata are atomic primitives to be executed. Notice that, in our task representation, both robotic actions and the communications (like sending/receiving


messages to/from external systems or users) are associated with primitives that can be scheduled and executed by the executive system. Schemata are also endowed with context-specific preconditions and postconditions that are continuously evaluated during execution. An example of a storing task is proposed below:

schema(store(Box),
       ⟨(take(Box), true),
        (check(Box), Box.taken ∧ ¬Box.known),
        (query(Box), Box.known ∧ ¬Box.target),
        (leave(Box), Box.taken ∧ Box.target)⟩,
       Box.stored).

The task store(Box) represents the process of picking a box from the pallet and placing it in a target location. The task is decomposed into 4 sub-tasks: the box is first grasped from the pallet (take(Box), always enabled) and moved to a barcode reader (check(Box)) if not recognized by the vision system (Box.taken ∧ ¬Box.known). When the box is correctly recognized (Box.known), if the storing position is unknown (¬Box.target) the system asks for it (query(Box)), and the box is finally placed in the storing position (leave(Box)). The whole task is considered accomplished when the box is stored in the target place (Box.stored). In order to be monitored and executed, tasks are allocated on-line from the repository to the executive system. This process generates for each task an annotated tree whose nodes are grounded schemata (i.e. the parameter Box of the previous example is now replaced with a box id) and whose edges are parental relations between them. During the execution, we associate with each node of the tree a state that can be: enabled if all preconditions along the branch are satisfied, disabled if at least one precondition is not satisfied, and accomplished if the postcondition is satisfied. In particular, enabled nodes of the tree that are associated with motion or communication patterns (concrete nodes) are directly executed by means of robot movements or by sending/receiving messages. An example of an allocated store task is proposed in Fig. 4.14. The task includes 4 abstract subtasks (dashed ovals), which are further decomposed into 7 concrete actions (continuous ovals). Nodes that are identified by kr60 functors are associated with robotic actions, while the others are communication actions.
Notice that this structured and hierarchical representation of tasks can be easily adapted to different logistic contexts: robotic primitives can be exploited as building blocks [5] to compose more complex tasks or to adjust task execution.
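The schema and node states described above can be illustrated with a minimal Python sketch. It is not the authors' executive system: the `Schema` class, the `node_states` helper and the flat dictionary of `Box.*` flags are hypothetical names introduced here, and the sketch only handles one level of decomposition (the root is assumed to have no precondition, and sub-task postconditions, which the text does not list, are omitted).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]              # hypothetical flat view of the Box.* flags
Condition = Callable[[State], bool]


@dataclass
class Schema:
    """A task schema: sub-tasks with preconditions, plus a postcondition."""
    name: str
    subtasks: List[Tuple[str, Condition]]
    postcondition: Condition


def store_schema() -> Schema:
    """store(Box) from the text, with Box.taken etc. read from a dict."""
    return Schema(
        name="store",
        subtasks=[
            ("take", lambda s: True),
            ("check", lambda s: s["taken"] and not s["known"]),
            ("query", lambda s: s["known"] and not s["target"]),
            ("leave", lambda s: s["taken"] and s["target"]),
        ],
        postcondition=lambda s: s["stored"],
    )


def node_states(schema: Schema, state: State) -> Dict[str, str]:
    """Return the state of the root task and of each sub-task.

    Simplification: only the root can become 'accomplished' here, and the
    root itself is assumed to have no precondition.
    """
    root = "accomplished" if schema.postcondition(state) else "enabled"
    states = {schema.name: root}
    for name, precondition in schema.subtasks:
        states[name] = "enabled" if precondition(state) else "disabled"
    return states


# A box that has been grasped but not yet recognized.
box = {"taken": True, "known": False, "target": False, "stored": False}
print(node_states(store_schema(), box))
```

For the grasped but unrecognized box in the example, `check` is enabled while `query` and `leave` are disabled, which matches the sequence described in the text.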

4.4.3 Selection and Grasping

When a new pallet is transported into the cell, the system is provided with a set of boxes/products $B_{pall}$ from the CDB that are placed on the pallet, while each

Fig. 4.14 Running example of a store task. Preconditions are attached to the edges while dashed and continuous ovals are for abstract and concrete tasks respectively. In this representation green nodes/preconditions are enabled/satisfied, blue nodes are accomplished and red nodes are disabled


Fig. 4.15 Color-based representation of the priority given the 3D position of the boxes. Assuming the robot is at the origin, colors from red to blue represent decreasing priority

element $b \in B_{pall}$ is associated with box-specific information such as barcode, weight and dimensions. Since the configuration and the pose of each element are not known in advance, the sequence in which boxes are taken has to be decided on-line depending on perceptual information. An effective heuristic for depalletizing is to take boxes from the top to the base (top-down) and from the sides to the center of the pallet (sides-to-center). This way, the base of the pallet is always larger than the top and the stability of the structure is not compromised. Moreover, since the pallet is positioned in front of the robot, we slightly prioritize frontal boxes to support manipulability. Following the above criteria, we define a suitable function $h(b)$ that associates a priority to boxes considering their positions on the pallet, defined as:

\[ h(b) = \frac{1}{7}|y_b| + \frac{2}{7}(x_{max} - x_b) + \frac{4}{7} z_b. \tag{4.8} \]
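As a quick illustration of Eq. (4.8), the following sketch evaluates the position-based priority for two invented box positions. The function name `position_priority` and the sample coordinates are assumptions made here for illustration; the text does not state the units or any normalization of the coordinates, so none is applied.

```python
import math


def position_priority(x_b: float, y_b: float, z_b: float, x_max: float) -> float:
    """Eq. (4.8): h(b) = 1/7 |y_b| + 2/7 (x_max - x_b) + 4/7 z_b."""
    return (abs(y_b) + 2.0 * (x_max - x_b) + 4.0 * z_b) / 7.0


# Hypothetical box centres (values invented, same units for all coordinates).
top_front_side = position_priority(x_b=0.2, y_b=0.6, z_b=1.0, x_max=1.0)
low_back_centre = position_priority(x_b=1.0, y_b=0.0, z_b=0.1, x_max=1.0)

# A high, frontal, lateral box outranks a low, rear, central one.
assert top_front_side > low_back_centre
assert math.isclose(top_front_side, 6.2 / 7.0)
```

Because the height term carries the largest weight (4/7), a box on the top layer outranks lower boxes unless the other two terms differ substantially.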

In Eq. (4.8), the components of the center of mass of the box $[x_b, y_b, z_b]$ are weighted to prioritize boxes that are positioned higher ($\frac{4}{7} z_b$), closer to the robot ($\frac{2}{7}(x_{max} - x_b)$) and closer to the edges ($\frac{1}{7}|y_b|$). Here we strongly prioritize higher positioned boxes to reduce the size of the higher layers of the pallet, then we consider boxes from the sides to the center, giving more emphasis to the most accessible ones (frontal boxes). A graphical representation of the function $h(b)$ is illustrated in Fig. 4.15. Here, boxes that are placed in red areas are prioritized by the robotic system, while the others (yellow/blue areas) are taken at a later time. Besides their position, the priority of the boxes is also affected by the requests that can be asynchronously provided to the system in response to specific logistic necessities. In this case, if a requested box is recognized by the perceptual system, the


Fig. 4.16 Example of the graph $G$ (arrows and circles) overlapping a group of boxes (brown rectangles). Each box $b_i$ is associated with the corresponding vertex $v_i$ of the graph (inner circles), while arrows are the relations between them. When the box $b_4$ is requested, the score of the associated sub-graph $T(b_4)$ (green arrows/nodes) is updated

score of all perceived boxes is adjusted to facilitate the selection of the requested one. We can define the output of the perceptual system as a subset of boxes $B_{rec} \cup B_{det} \subseteq B_{pall}$ that is partitioned into a set of recognized boxes ($B_{rec}$) with their associated barcodes and a set of detected boxes ($B_{det}$) that have been perceived by the system but not recognized (e.g. boxes with no textures and occluded barcode). Moreover, we assume to have as input a set of requested boxes $R \subseteq B_{pall}$, where each box $b \in R$ is associated with a priority $p(b) \in [0, 1]$ denoting the importance of such a request. If a requested box is reachable and graspable by the robot (namely, in the frontal-upper part of the pallet) it can be taken directly; otherwise, if the box is in the middle of the pallet or partially occluded by other boxes, the system has to adapt the way boxes are taken in order to free the requested one. We define an additional priority function $l(b)$ that drives the system to unstack and store the requested boxes. To this end, given the pallet configuration, we create a graph structure where each vertex is a box while edges are the dependencies between boxes. More formally, we define $G = (E, V)$ along with a mapping function $v : B_{rec} \cup B_{det} \to V$ associating boxes and vertices, where for each pair of boxes $b_1, b_2$ such that $b_2$ is placed on or occluding $b_1$ there exists an edge $(v(b_1), v(b_2)) \in E$. In $G$ we can define the subset of vertices $T(b) \subseteq V$ for a perceived box $b$, which contains all the vertices that are reachable from $v(b)$. This way, given a requested box $b_r$ and the subset $T(b_r)$, the priority of the request $p(b_r)$ can be propagated to all the boxes $b$ that are blocking $b_r$ (i.e. $v(b) \in T(b_r)$). An example of propagation is shown in Fig. 4.16, while the function $l(b)$ for a generic perceived box is defined in the following equation:

\[ l(b) = \max_{x \,:\, v(b) \in T(x)} p(x). \tag{4.9} \]
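The propagation in Eq. (4.9) can be sketched as a graph traversal. This is only an illustration under stated assumptions: the helper names `reachable` and `request_priority` are invented here, a box is treated as reachable from itself (so a requested box receives its own priority), and boxes that block no request get priority 0, a case the text leaves open.

```python
from typing import Dict, Iterable, List, Set, Tuple


def reachable(blockers: Dict[str, List[str]], start: str) -> Set[str]:
    """T(start): vertices reachable from start (start included, by assumption)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in blockers.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def request_priority(
    boxes: Iterable[str],
    edges: Iterable[Tuple[str, str]],
    requests: Dict[str, float],
) -> Dict[str, float]:
    """Eq. (4.9): l(b) is the highest priority among requests that b blocks.

    An edge (b1, b2) means that b2 is placed on, or occludes, b1.
    """
    blockers: Dict[str, List[str]] = {b: [] for b in boxes}
    for b1, b2 in edges:
        blockers[b1].append(b2)
    l = {b: 0.0 for b in blockers}          # assumed default: no request blocked
    for x, p in requests.items():
        if x not in blockers:               # requested box not perceived yet
            continue
        for b in reachable(blockers, x):
            l[b] = max(l[b], p)
    return l


# Hypothetical stack: b2 lies on b4, b1 lies on b2, b3 stands alone.
example = request_priority(
    boxes=["b1", "b2", "b3", "b4"],
    edges=[("b4", "b2"), ("b2", "b1")],
    requests={"b4": 0.8},
)
print(example)
```

In this example the request on `b4` raises the priority of `b2` and `b1`, which must be removed first, while `b3` is unaffected.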


Algorithm 5 The selection process is invoked when no storing tasks are allocated.
 1: procedure boxSelection
 2:   get requested R = {r1, r2, ..., rm}
 3:   get boxes B = {b1, b2, ..., bn}
 4:   init queue Q ← ∅
 5:   create graph G from boxes B
 6:   for b ∈ B do
 7:     get b pose [x_b, y_b, z_b]
 8:     h(b) = 1/7 |y_b| + 2/7 (x_max − x_b) + 4/7 z_b
 9:     get vertex v(b) from G
10:     l(b) = max_{x : v(b) ∈ T(x)} p(x)
11:     s(b) = w l(b) + (1 − w) h(b)
12:     insert b into Q ordered by s(b)
13:   end for
14:   found ← false
15:   while ¬found do
16:     pop box b with max s(b) from queue Q
17:     compute grasping poses Gp from b
18:     if exists a reachable pose p ∈ Gp then
19:       set target t ← b
20:       set found ← true
21:     end if
22:   end while
23:   allocate store(t) task
24: end procedure

In this way, if a box $b$ is blocking one or more requested boxes, it inherits the priority of the most important one. Finally, the score of a box $s(b)$ can be defined as a weighted sum of both priority functions, namely:

\[ s(b) = w\, l(b) + (1 - w)\, h(b), \tag{4.10} \]

where the weight $w$ can be suitably regulated to emphasize logistic-based or position-based depalletizing strategies. Notice that learning approaches, similar to [4, 5], can be easily deployed to balance the two functions following human demonstrations. The whole process of box selection is described in Algorithm 5. At each invocation, the system collects the lists of requested (line 2) and perceived (line 3) boxes and initializes a queue that will be used to store the boxes sorted by score (line 4). The list of perceived boxes is also exploited to generate the graph of dependencies $G$ (line 5). Each perceived box $b$ is then associated with its position-based priority $h(b)$ (lines 6–8) and with its request-based priority $l(b)$ (lines 9–10). Finally, both priorities are fused into the score $s(b)$ (line 11) and the box is inserted into the queue $Q$ sorted by $s$ (line 12). In the second part of the algorithm, until a graspable box is found (lines 14–15), boxes are taken from the queue (line 16) and, if a valid grasping pose exists for the current box (lines 17–18), the box is selected to be depalletized (lines 19–20) and the associated storing task is allocated to be monitored and executed by the executive system (line 23).
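The selection loop of Algorithm 5 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: `select_box` and the `has_reachable_grasp` callback are hypothetical names, the priorities $h(b)$ and $l(b)$ are assumed to be precomputed, and, unlike the listing, the sketch returns `None` when the queue is exhausted without finding a graspable box.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple


def select_box(
    boxes: List[str],
    h: Dict[str, float],
    l: Dict[str, float],
    w: float,
    has_reachable_grasp: Callable[[str], bool],
) -> Optional[str]:
    """Pick the highest-scoring box that has a reachable grasping pose."""
    queue: List[Tuple[float, int, str]] = []
    for order, b in enumerate(boxes):
        score = w * l[b] + (1.0 - w) * h[b]          # Eq. (4.10)
        # heapq is a min-heap, so the score is negated; 'order' breaks ties.
        heapq.heappush(queue, (-score, order, b))
    while queue:
        _, _, b = heapq.heappop(queue)
        if has_reachable_grasp(b):                   # hypothetical grasp check
            return b
    return None                                      # nothing graspable


# Invented priorities: box "c" is requested, box "a" is best placed.
h = {"a": 0.9, "b": 0.5, "c": 0.1}
l = {"a": 0.0, "b": 0.0, "c": 1.0}
boxes = ["a", "b", "c"]

position_only = select_box(boxes, h, l, w=0.0, has_reachable_grasp=lambda b: True)
balanced = select_box(boxes, h, l, w=0.5, has_reachable_grasp=lambda b: True)
print(position_only, balanced)
```

With `w = 0` the choice follows the position-based priority only, while `w = 0.5` lets the request on `c` override it; the grasp check then filters out boxes that cannot be reached.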


4.5 Experiments

In this section, we present several case studies in a supermarket scenario considering both simulated and real environments. In simulation, we analyze the proposed depalletizing strategy: multiple pallet configurations and logistic requests are randomly generated, and the proposed framework has to suitably adapt the depalletizing sequence to them. In the real scenario, we show the system at work in a realistic supermarket depalletizing task, where different products have to be grasped and moved from the original pallet to specific locations on trolleys, assessing the performance of the gripper along with the effectiveness of the perceptual and the executive systems. The real testing environment is depicted in Fig. 4.17. The depalletizing cell covers a 5 × 5 m area and includes:
• 7 locations for box storing: 6 movable trolleys on the sides and 1 fixed shelf on the rear part of the cell;
• a 6-DOF KUKA KR60-3 robotic arm with 60 kg of payload and a 2 m radius workspace, endowed with the proposed suction-based system;

Fig. 4.17 Depalletizing cell (up) along with its main components (down)


• an Intel RealSense D435 RGB-D camera, which is exploited by the perceptual system for the recognition, identification and localization of cases;
• different instances of mixed pallets including up to nine textured/untextured cases.

The software architecture is integrated in ROS Kinetic, while robot motion planning is performed by means of the OMPL library [22].

4.5.1 Quantitative Analysis

In the quantitative analysis, the proposed depalletizing strategy is evaluated in simulation. The grasping performance, as well as the recognition and localization capabilities of the detection modules, are demonstrated in a set of experiments.

Depalletizing Strategy

The aim of this case study is to evaluate the effectiveness of the selection method proposed in Sect. 4.4.3. In particular, we set up an RViz simulation where both the pallet configuration and the sequence of requests are randomly generated. The system has to depalletize cases following their scores, while avoiding collisions due to occlusions. The overall dimensions of each random configuration are fixed to 1.6 × 0.8 × 1 m, while the number, position and dimensions of cases are randomly generated. For each configuration, the system is deployed in two modalities: the first one is a baseline, where $s(x) = h(x)$, so that the selection process only depends on the position of boxes on the pallet; in the second modality, both components are considered as in Eq. (4.10). Here, we are interested in studying how the average number of actions needed to fulfill a request changes for different settings of $w$. To this end, we selected 5 values for the weight, [0.1, 0.3, 0.5, 0.7, 0.9], and, for each setting, we generated 50 random configurations of cases/requests. Considering both modalities, we performed a total of 500 simulated executions. Table 4.1 reports the average and the standard deviation of the number of cases stored before the requested one is selected. As expected, in the second modality (where both $h(x)$ and $l(x)$ are deployed) the number of actions needed to accomplish a request is always below that of the baseline modality (where only $h(x)$ is considered). Moreover, as depicted in Fig. 4.18, the average action reduction grows with the weight, becoming stable at around 5.5 actions for $w \geq 0.5$.

On the other hand, the request-oriented behavior of the system may lead to unstable configurations of the cases. In Fig. 4.19, we show an example of a random configuration (a) that is partially depalletized following the two modalities. After 20 steps (b, c) the $h(x)$-only modality leads to a compact configuration, while in the $h(x)+l(x)$ modality there are some unstable pillars of boxes that are more likely to collapse.


Table 4.1 Number of actions needed to satisfy the requests for both modalities (250 runs for each modality)

| Weight | h(x)-only Avg | h(x)-only Std | h(x)+l(x) Avg | h(x)+l(x) Std |
|---|---|---|---|---|
| w = 0.1 | 12.74 | 9.91 | 11.14 | 9.89 |
| w = 0.3 | 13.10 | 10.02 | 9.18 | 8.92 |
| w = 0.5 | 12.81 | 9.19 | 7.81 | 7.34 |
| w = 0.7 | 13.03 | 9.73 | 7.69 | 7.45 |
| w = 0.9 | 12.84 | 9.47 | 7.34 | 7.13 |
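The average action reduction discussed in the text can be recomputed from the averages in Table 4.1. The short sketch below does this; it assumes that the first row of values in the table belongs to the $h(x)$-only baseline and the second to the $h(x)+l(x)$ modality, with the five Avg/Std pairs ordered by increasing $w$. The variable names are chosen here for illustration.

```python
# Averages from Table 4.1 (number of actions before the requested case is taken).
weights = [0.1, 0.3, 0.5, 0.7, 0.9]
avg_h_only = [12.74, 13.10, 12.81, 13.03, 12.84]     # baseline, s(x) = h(x)
avg_h_plus_l = [11.14, 9.18, 7.81, 7.69, 7.34]       # full score, Eq. (4.10)

# Average number of actions saved by taking requests into account.
reduction = [round(base - full, 2) for base, full in zip(avg_h_only, avg_h_plus_l)]

for w, r in zip(weights, reduction):
    print(f"w = {w:.1f}: {r:.2f} fewer actions on average")
```

Under this reading, the saving rises from 1.60 actions at $w = 0.1$ to between 5.00 and 5.50 actions for $w \geq 0.5$, which is in line with the plateau reported for Fig. 4.18.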

Fig. 4.18 Average improvement on request accomplishment. The increment of w reduces the average number of actions needed to fulfill a request

Fig. 4.19 Random configuration of cases (a) along with the state of the execution using h(x) only (b) and the full version of s(x) (c) after 20 steps


Fig. 4.20 Snapshots of the gripper including a frontal view (up), and horizontal (down-left) and vertical (down-right) grasping procedures

Grasping Performance

The relevant feature of our design is the adaptability to different boxes and the possibility to take non-standard boxes also from the lateral side. We designed a case study to demonstrate the ability of the gripper to grasp boxes with different dimensions, shapes and weights. The gripper prototype mounted on the terminal flange of a KUKA KR60-3 industrial robotic arm is shown in Fig. 4.20. We performed a set of 57 experiments considering 19 different boxes picked using the proposed gripper. The boxes have been selected from a typical supermarket pallet to have different shapes and dimensions, with weights in the range [0.5, 10] kg. Each test has been repeated three times. During the experiments, the robot was controlled manually to decouple the performance of our gripper from the robot positioning. The experiments have been carried out by considering the following pattern of actions:
1. Approach the gripper close to the target box.
2. Start the grasping procedure.
3. Move the box to a destination position to verify the stability of the grip.

We tested the grasping of boxes from the lateral, frontal or upper side. Notice that, while the lateral-grasping procedures are performed by executing all the actions described in Sect. 4.2.1, the upper-grasping process only consists in executing step


Fig. 4.21 From left to right, the phases required to grasp boxes from the lateral side a–c and from the top d are reported: a Proper product. b Mozart product (box of bottles). c Frosh product taken from the small side using only one fork. d Frosh product taken from the upper side

(c). Some snapshots of four experiments (out of the 19 performed) are reported in Fig. 4.21. From left to right, all the steps required to grasp boxes from the lateral side (a–c) and from the upper side (d) are shown. The experimental results are reported in Table 4.2. The 19 different products are identified by the product name (see Fig. 4.22 for a reference image of each product). In our test, we considered the product as grasped (yes or yes-sf for products grasped with both forks or a single fork, respectively) if it was actually taken in more than 2 consecutive tests, and not grasped (no) otherwise. Finally, NA is associated with boxes that cannot be picked from the specified side. It is possible to see that the proposed gripper was able to grasp products in 17 out of 19 cases. It is worth noticing that, in the considered product set, 83% of the products are not graspable from at least one side, and the proposed gripper was able to complete the grasping process from another side. This is due to the capability of the gripper to reconfigure itself and grasp boxes from different sides. In our test, only two products were not correctly grasped. In the “ricola” case, the packaging presents holes and fast openings on all 6 sides. In the “small-bottles” case, the small size and the curved shapes of the packaging prevent the suction cups from adhering correctly to the surface. However, both these cases could be solved by replacing the suction cups with custom ones chosen for the specific application.

Detection, Recognition and Localization of Cases

This case study aims at evaluating the recognition and localization capabilities of the system. 9 cases (see Fig. 4.22 for an example of textured cases) organized in 10 different configurations of the mixed pallet, obtained by changing the positions of the cases, have been considered. For each configuration, the system was executed 3 times, for a total of 180 recognitions and localizations to be performed.
The actual pose of each box is known because it was computed in a preliminary step by employing markers placed on every single case.


Table 4.2 Grasping results

| Prod. type | Lateral | Frontal | Upper | Note |
|---|---|---|---|---|
| Always | No | Yes | NA | Breaks from above |
| Ariel | Yes-sf | Yes | NA | Breaks from above |
| Bang | Yes | Yes | Yes | |
| Frish-black | No | Yes | NA | Breaks from above |
| Proper | Yes | Yes | NA | Breaks from above |
| Viss | Yes-sf | Yes | NA | Breaks from above |
| Frosh1 | Yes | Yes | Yes | |
| Frosh2 | Yes | Yes | Yes | |
| Pampers | Yes | Yes | Yes | |
| Spee | No | Yes | NA | Breaks from above |
| Shar | Yes | Yes | Yes | |
| Vileda | Yes | No | NA | Breaks from above |
| Ricola | No | No | NA | Breaks from above, shape does not allow lateral grip |
| Das | No | Yes | Yes | |
| Jessa | Yes | Yes | Yes | |
| Babylove | NA | Yes | NA | Breaks from above |
| Mozart | No | Yes | NA | No upper flat surface |
| Small bottles | No | No | NA | No upper flat surface |
| Big bottles | No | Yes | NA | No upper flat surface |

The experiments carried out show that 177 faces out of 180 were correctly identified, with an accuracy of 98%. The system failed 3 times: twice, visible cases were not recognized correctly, and once an occluded face was recognized. Table 4.3, which refers only to the recognized cases, shows that the proposed system guarantees a good localization performance, with precision within the tolerance imposed by the gripper, which requires a maximum estimation error of 0.020 m along the $y_c$ axis.


Fig. 4.22 The 19 different products used in the experimental section

Table 4.3 Statistics of the centroids position estimation error

| Statistical index | x (m) | y (m) | z (m) |
|---|---|---|---|
| avg. err. | −0.00179 | −0.002 | −0.012 |
| norm. avg. err. | 0.010 | 0.008 | 0.012 |
| std. dev. | 0.013 | 0.010 | 0.007 |

4.5.2 Real Supermarket Scenario

In this subsection, experiments performed in a real supermarket scenario are presented.

Autonomous Execution

This case study shows the whole depalletizing process (see Figs. 4.23 and 4.24). For the test, a pallet configuration that contains two instances of the same case was selected. The experiments were carried out in a cluttered environment without structured light. To avoid misreadings of the RGB-D sensor due to reflections, a black curtain was positioned behind the pallet. As explained in Sect. 4.3, the input image is first segmented by a feature detection algorithm. The Scale Invariant Feature Transform (SIFT) [14] was chosen due to its robustness. The resulting patched image is then segmented through the Watershed Transform [16], an image segmentation method based on intensity images that is well suited for monochromatic faces. The segments resulting from these two processes are then converted into binary images, which in turn lead to planar surfaces estimated with Random Sample Consensus (RANSAC) [7] and hence to candidate faces in the three-dimensional space. The information contained in the CDB is then compared to that associated with each candidate face and, finally, the cases are definitively recognized and localized in a fixed reference frame. The


Fig. 4.23 System workflow. At each iteration, the faces in the input image (a) are detected via SIFT (b) and Watershed (c), and the textured ones are preliminarily associated with barcodes (d). The geometrical matcher associates the cases in the CDB with each detected face and then the geometrical localizer estimates their poses (e, f)

estimated poses are sent to the high-level executive system that regulates the task execution. Each depalletizing task is composed of a recognition phase, where the designed system is deployed to detect/recognize the cases and provide their poses, a picking phase, where one of the perceived cases is taken from the mixed pallet (following the policy provided by the executive system), and a release phase, where the case is stored in a specific location on a trolley. When a depalletizing task is accomplished, the stored case, with all the related information, is removed from the CDB and the whole process starts again from the beginning. The whole test was successfully completed, with all 9 cases of the mixed pallet correctly depalletized.

Priority-Based Depalletizing

A simplified configuration of the pallet is considered, with 10 cases placed in a single frontal line (see Fig. 4.17). The task is to store all the cases on a trolley (first on the right in the figure) by using the full version of our scoring function in Eq. (4.10).


Fig. 4.24 Picking and release phases for two consecutive depalletized cases

Fig. 4.25 Pallet configuration during the real-world experiment. The labels on each box are the values of the scores before (gray equations) and after (black equations) box $b_1$ is requested; in the upper left part of the figure, the differences in the storing sequence are highlighted in red

For this experiment we set $w = 0.2$ to induce a conservative strategy (unstable configurations are rare with this setting) while still considering requests. In particular, $b_1$ is designated as the requested box with a low priority ($p(b_1) = 0.2$). This is intended to show how even low-priority requests affect the behavior of the system. Figure 4.25 shows the pallet configuration seen from the frontal RGB-D camera along with the priorities before and after the request. As shown in the legend at the top of the figure, by following the position-based priority only (no requests), the box $b_1$ would be the last one to be stored, while this box is stored two actions earlier with the request-based priority enabled.


In this set-up, for safety reasons, the robot velocity was limited to 4% of the maximum allowed velocity. The system takes 19 min 56 s to store the requested box, and each storing task takes an average of 2 min 28 s per box.
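As a rough plausibility check of these timings, the sketch below multiplies the average storing time by the position of the requested box in the storing sequence. It rests on two assumptions that are not stated explicitly in the text: the two durations are read as minutes and seconds (19 min 56 s and 2 min 28 s), and the requested box is the eighth of the ten cases to be stored, since it is taken two actions before the last position.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a duration given as minutes and seconds into seconds."""
    return 60 * minutes + seconds


avg_per_box_s = to_seconds(2, 28)          # assumed reading: 2 min 28 s
reported_total_s = to_seconds(19, 56)      # assumed reading: 19 min 56 s

n_cases = 10
position_in_sequence = n_cases - 2         # assumed: stored eighth of ten

estimated_total_s = position_in_sequence * avg_per_box_s
est_min, est_sec = divmod(estimated_total_s, 60)
gap_s = reported_total_s - estimated_total_s

print(f"estimate: {est_min} min {est_sec} s, gap to reported time: {gap_s} s")
```

Under these assumptions the estimate is 19 min 44 s, within a few seconds of the reported time, which suggests that the two figures are mutually consistent.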

4.6 Conclusion

In this chapter, an integrated robotic depalletizing system has been presented. The system has been deployed in a real supermarket scenario considering a typical backroom setting. A suitably designed adaptive gripping module and a perceptual system allow the framework to operate with mixed pallets (i.e., pallets made of cases of different shapes and textures). Moreover, a dedicated depalletizing strategy was designed to coordinate manipulation activities while avoiding collisions and falls of cases. The integrated system has been tested on different pallet configurations, showing satisfactory results in terms of grasping capabilities as well as successful recognition/localization of cases and satisfied requests.

References

1. Arpenti, P., Caccavale, R., Paduano, G., Fontanelli, G.A., Lippiello, V., Villani, L., Siciliano, B.: RGB-d recognition and localization of cases for robotic depalletizing in supermarkets. IEEE Robot. Autom. Lett. 5(4), 6233–6238 (2020). https://doi.org/10.1109/LRA.2020.3013936
2. Caccavale, R., Arpenti, P., Paduano, G., Fontanelli, G.A., Lippiello, V., Villani, L., Siciliano, B.: A flexible robotic depalletizing system for supermarket logistics. IEEE Robot. Autom. Lett. 5(3), 4471–4476 (2020)
3. Caccavale, R., Finzi, A.: Flexible task execution and attentional regulations in human-robot interaction. IEEE Trans. Cognit. Dev. Syst. 9(1), 68–79 (2016)
4. Caccavale, R., Finzi, A.: Learning attentional regulations for structured tasks execution in robotic cognitive control. Auton. Robot. 43(8), 2229–2243 (2019)
5. Caccavale, R., Saveriano, M., Finzi, A., Lee, D.: Kinesthetic teaching and attentional supervision of structured tasks in human-robot interaction. Auton. Robot. 43(6), 1291–1307 (2019)
6. Echelmeyer, W., Kirchheim, A., Wellbrock, E.: Robotics-logistics: challenges for automation of logistic processes. In: 2008 IEEE International Conference on Automation and Logistics, pp. 2099–2103 (2008)
7. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)
8. Fontanelli, G.A., Ficuciello, F., Villani, L., Siciliano, B.: Modelling and identification of the da vinci research kit robotic arms. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1464–1469 (2017). https://doi.org/10.1109/IROS.2017.8205948
9. Fontanelli, G.A., Paduano, G., Caccavale, R., Arpenti, P., Lippiello, V., Villani, L., Siciliano, B.: A reconfigurable gripper for robotic autonomous depalletizing in supermarket logistics. IEEE Robot. Autom. Lett. 5(3), 4612–4617 (2020)
10. Holz, D., Topalidou-Kyniazopoulou, A., Stückler, J., Behnke, S.: Real-time object detection, localization and verification for fast robotic depalletizing. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1459–1466 (2015)
11. Katsoulas, D., Bergen, L.: Efficient 3d vertex detection in range images acquired with a laser sensor. In: Radig, B., Florczyk, S. (eds.) Pattern Recognition, pp. 116–123. Springer, Berlin, Heidelberg (2001)
12. Katsoulas, D., Kosmopoulos, D.I.: An efficient depalletizing system based on 2d range imagery. In: Proceedings 2001 IEEE International Conference on Robotics and Automation, pp. 305–312 (2001)
13. Krug, R., Stoyanov, T., Tincani, V., Andreasson, H., Mosberger, R., Fantoni, G., Lilienthal, A.J.: The next step in robot commissioning: autonomous picking and palletizing. IEEE Robot. Autom. Lett. 1(1), 546–553 (2016)
14. Lowe, D.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 91–110 (2004)
15. Magrini, E., Flacco, F., De Luca, A.: Control of generalized contact motion and force in physical human-robot interaction. In: 2015 IEEE International Conference on Robotics and Automation, pp. 2298–2304 (2015). https://doi.org/10.1109/ICRA.2015.7139504
16. Meyer, F.: Color image segmentation. In: 1992 International Conference on Image Processing and its Applications, pp. 303–306 (1992)
17. Moura, F.M., Silva, M.F.: Application for automatic programming of palletizing robots. In: 2018 IEEE International Conference on Autonomous Robot Systems and Competitions, pp. 48–53 (2018)
18. Nakamoto, H., Eto, H., Sonoura, T., Tanaka, J., Ogawa, A.: High-speed and compact depalletizing robot capable of handling packages stacked complicatedly. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 344–349 (2016)
19. Nau, D.S., Au, T.C., Ilghami, O., Kuter, U., Murdock, J.W., Wu, D., Yaman, F.: Shop2: An HTN planning system. J. Artif. Intell. Res. 20, 379–404 (2003)
20. Prasse, C., Skibinski, S., Weichert, F., Stenzel, J., Müller, H., ten Hompel, M.: Concept of automated load detection for de-palletizing using depth images and rfid data. In: 2011 IEEE International Conference on Control System, Computing and Engineering, pp. 249–254 (2011)
21. Schwarz, M., Milan, A., Periyasamy, A.S., Behnke, S.: RGB-d object detection and semantic segmentation for autonomous manipulation in clutter. Int. J. Robot. Res. 37(4–5), 437–451 (2018)
22. Sucan, I.A., Moll, M., Kavraki, L.E.: The open motion planning library. IEEE Robot. Autom. Mag. 19(4), 72–82 (2012)
23. Tanaka, J., Ogawa, A., Nakamoto, H., Sonoura, T., Eto, H.: Suction pad unit using a bellows pneumatic actuator as a support mechanism for an end effector of depalletizing robots. ROBOMECH J. 7(1), 2 (2020)
24. Weichert, F., Skibinski, S., Stenzel, J., Prasse, C., Kamagaew, A., Rudak, B., Hompel, M.: Automated detection of euro pallet loads by interpreting PMD camera depth images. Logist. Res. 6 (2012)

Chapter 5

Robots Helping Humans: Collaborative Shelf Refilling

Teodorico Caporaso, Dario Panariello, Stanislao Grazioso, Giuseppe Di Gironimo, and Luigi Villani

Abstract This chapter presents the ergonomic assessment of a typical shelf filling task performed by the store clerk. The proposed methodology is based on a robust design approach, which considers all the main factors that influence the ergonomic assessment of a typical refilling operation. The ergonomic assessment is based on two ergonomic indices, one specific for establishing the ergonomically optimal working height for lifting, and one specific for selecting the refilling process modality which minimises the clerks’ effort. The research work has been performed using both virtual simulations and real laboratory experiments. The goal is to provide input to a suitably designed robotic handling unit encapsulating a standard supermarket trolley. The handling unit consists of a suitable SCARA-like arm and two actuated trays, which serve the cases with the products contained in the trolley at an ergonomic height for the clerks, with the aim of reducing refilling-related musculoskeletal disorders and thus improving clerks’ health and wellbeing.

Keywords Ergonomic assessment · Ergonomic indices · Biomechanical risk · Collaborative robotics

T. Caporaso (B) · D. Panariello · S. Grazioso · G. Di Gironimo
CREATE and Department of Industrial Engineering, University of Naples Federico II, Napoli, Italy
e-mail: [email protected]

L. Villani
CREATE and Department of Electrical Engineering and Information Technology, University of Naples Federico II, Napoli, Italy
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
L. Villani et al. (eds.), Robotics for Intralogistics in Supermarkets and Retail Stores, Springer Tracts in Advanced Robotics 148, https://doi.org/10.1007/978-3-031-06078-6_5

5.1 Introduction

The REFILLS project was aimed at developing robotic solutions intended to support clerks in the execution of instore logistic operations, by replacing them in some tasks and collaborating with them in other tasks. Some operations, the most


physically or sometimes cognitively demanding for humans, are completely entrusted to robots: scanning the shelves, depalletizing, moving the trolleys with the cases close to the shelves, and recognizing and pointing at the level of the shelf where a given product must be placed. In some other operations, such as emptying the cases and refilling the shelves, the clerks are by far more skillful than the robots. For this specific operation, a collaborative solution has been devised by developing a robotised handling unit capable of serving the clerks with the cases containing the products to be put on the shelves. The main motivation for the collaborative solution is to improve the efficiency of the process by improving the working conditions and preserving the health of the human workers. The developed handling unit consists of a frame supporting a SCARA-like arm and two actuated trays, which encapsulates a standard trolley carrying the cases with the products. The robot pushes each case onto one of the two trays, which is lifted to an optimal height tailored to the anthropometric characteristics of the clerk performing the manual refilling. The system also suggests the optimal refilling procedure from an ergonomic point of view. To develop this system, we have carried out an ergonomic analysis of the refilling task by running extensive simulations using digital avatars, from which we have derived data-driven ergonomic models that can be implemented in the REFILLS Ergonomic Handling Module (EHM). In addition, the obtained models have been supported by experiments.

The typical outputs of an ergonomic assessment are evaluation indices which allow the biomechanical risk to be evaluated through a comparison with threshold values. The literature presents a long list of indices applied to different types of work activities and body parts, such as the Rapid Upper Limb Assessment (RULA), the Ovako Working posture Analysing System (OWAS), the Occupational Repetitive Actions index and checklist (OCRA) and the Rapid Entire Body Assessment method (REBA) [1]. Recently, synthetic indices have been proposed as combinations of some of the previous indices (i.e., RULA, OWAS), with the definition of the Posture Evaluation Index (PEI) and the Workcell Evaluation Index (WEI) [2]. In addition, the task intensity can also be quantified by measuring muscle activation using surface electromyography (sEMG), and it is a function of external parameters such as the mass of the handled object [3]. In this work, we have used modified versions of the PEI and WEI indices to define the optimal lifting height and refilling procedure. The transfer of the results of this work to a real supermarket scenario can allow the implementation of the vision in Fig. 5.1. A total body 3D scanner (e.g., INBODY - Instant Body Scan [4]) can be used to provide an instant 3D scan of the store clerk, which on the one hand can be used to extract the main anthropometric characteristics specific to the store clerk, and on the other hand to build a database of 3D digital avatars of the workers. From the 3D digital avatars, the REFILLS EHM can compute the ergonomically correct lifting height ($h_{work}$) and refilling procedure ($W$) for each target level of the shelf to be filled ($S_x$).

5 Robots Helping Humans: Collaborative Shelf Refilling

Fig. 5.1 Flow chart of the optimal framework that can be implemented in real supermarkets for the definition of the ergonomic lifting height (h_work) and refilling process (W). These parameters are defined for each shelf level (S_x) based on the store clerk's anthropometric characteristics (a), obtained using a total body 3D scanner

5.2 Handling Module

The handling module developed for collaborative shelf refilling is a robotic system equipped with a robotic arm that supports the employee in refilling the shelves. In the autonomous scenario described in the next chapter, the same arm is able to refill the shelves itself. Therefore, the same basic kinematics enables both (partly) automated shelf refilling scenarios. The system consists of a frame on which two actuated trays and a robotic arm are mounted (see Fig. 5.2). To provide the products that must be placed on the shelves, the handling module can encapsulate a standard trolley. The robot arm has a redundant SCARA-like kinematic structure. In the collaborative scenario, five axes are employed, one translational and four rotational. The arm can push up to three boxes in a row, with up to 10 kg, to the side where the trays are mounted. When the required box is on the tray, the tray is automatically lifted to an ergonomic height for the clerk, so that the clerk can place the products from the box on the shelf in an ergonomically correct way.

T. Caporaso et al.

Fig. 5.2 Handling module

5.3 Methodology

In this section we present the methodology used to define the ergonomically correct combination of h_work and W for each shelf level, based on the store clerk's anthropometric characteristics and on the shelf to be filled.

5.3.1 Procedure

The methodology is based on the robust design approach [6, 7] (see, e.g., Fig. 5.3). The main control factors are: (i) the height of the case in the trolley (h_work); (ii) the refilling process used to locate the case on the shelf (W). Additional control factors are: (iii) the store clerk's position at the start of the lifting task (x_work and z_work); (iv) the store clerk's position in front of the shelf (x_ref and z_ref); (v) the positions of the items on the shelf (x_item, y_item, z_item). To optimize the process from an ergonomic point of view, we treat h_work and W as adjustable factors. The other control factors, related to the store clerk's position in front of the shelf and to the item position on the shelf (x_item and z_item), are defined within a suitable normal range. This increases the reliability of the simulations in comparison with experiments while still ensuring a significant influence on the ergonomic indices. Finally, y_item is defined as the height of each single plane of the shelf. The noise factors are anthropometric variability (three percentiles (S_plt): 5th, 50th and 95th) and sex (male and female).

Fig. 5.3 P-Diagram of the robust design approach

The virtual simulation is conducted with a digital human modelling software widely used in industry (Jack, Siemens), using a specific tool to recreate the refilling task, in which the overall task is divided into predetermined subtasks. From the obtained data, we derive the ergonomic indices used as responses. In particular, the ergonomic evaluation is based on two indices: (i) for all the subtasks, a modified version of the PEI; (ii) for the overall task, first a modified version of the WEI and then a second version of the WEI (referred to as WEI_cor) that takes into account the different durations of the subtasks.

5.3.2 Ergonomic Indices

The ergonomic assessment is based on two ergonomic indices: (i) for all the subtasks, we have used a modified version of the PEI (referred to as PEI_n); (ii) for the overall task, we have first used a modified version of the WEI (referred to as WEI_n) and then a second version of the WEI (referred to as WEI_cor) that takes into account the different durations of the subtasks. In the following, we refer to the PEI_n index simply as PEI and to the WEI_n index simply as WEI. The PEI index is used to evaluate the optimal lifting height (h_opt) that guarantees the most ergonomic posture in the lifting task for each percentile. The WEI indices are instead used to establish which refilling process W is optimal from the ergonomic point of view. For the PEI, we use a new formula that integrates: the LBA (Low Back Analysis), using the compression strength on the L4 and L5 vertebrae; the REBA scores; the RULA scores. The PEI is defined as the sum of three dimensionless quantities:

$$\mathrm{PEI} = \frac{\mathrm{LBA}}{3400} + \frac{\mathrm{REBA}}{7} + \frac{\mathrm{RULA}}{5} \tag{5.1}$$

where: the first term is obtained by normalizing the LBA value with the NIOSH limit for the compression strength (equal to 3400 N); the second term is obtained by normalizing the REBA index with its critical value (equal to 7); the third term is obtained by normalizing the RULA index with its critical value (equal to 5). The lower the PEI value, the better the posture from the ergonomic point of view. Our objective is to obtain the optimal height h_opt, associated with the optimal value of the PEI index (PEI_opt), for the execution of the lifting task. The ergonomic assessment of the overall task is instead obtained through the WEI, computed as

$$\mathrm{WEI} = \sum_{i=1}^{n} \mathrm{PEI}_i \cdot T_i \tag{5.2}$$

where i is the single subtask, n is the total number of subtasks and T_i is equal to

$$T_i = \frac{ST_i}{CT} \tag{5.3}$$
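As an illustration of Eqs. 5.1 to 5.3, the following minimal Python sketch computes the modified PEI of each subtask and the time-weighted WEI of a task. The `Subtask` container, the function names and all numerical values are hypothetical and are not taken from the REFILLS data; the sketch also assumes that the cycle time CT equals the sum of the subtask times ST_i.

```python
from dataclasses import dataclass

# Normalization constants stated in the text for Eq. 5.1.
LBA_LIMIT_N = 3400.0   # NIOSH limit for the L4/L5 compression [N]
REBA_CRITICAL = 7.0    # critical REBA value
RULA_CRITICAL = 5.0    # critical RULA value


@dataclass
class Subtask:
    """Hypothetical container for the quantities of one subtask."""
    lba_n: float    # compression on the L4/L5 vertebrae [N]
    reba: float     # REBA score
    rula: float     # RULA score
    time_s: float   # subtask time ST_i [s]


def pei(lba_n, reba, rula):
    """Modified PEI (Eq. 5.1): sum of three normalized, dimensionless terms."""
    return lba_n / LBA_LIMIT_N + reba / REBA_CRITICAL + rula / RULA_CRITICAL


def wei(subtasks):
    """WEI (Eqs. 5.2-5.3): PEI of each subtask weighted by T_i = ST_i / CT."""
    # Assumption: CT is the sum of the subtask times.
    cycle_time = sum(s.time_s for s in subtasks)
    return sum(pei(s.lba_n, s.reba, s.rula) * s.time_s / cycle_time
               for s in subtasks)


# Illustrative numbers only (not data from the chapter).
task = [
    Subtask(lba_n=1700.0, reba=3.5, rula=2.5, time_s=2.0),   # PEI = 1.5
    Subtask(lba_n=850.0, reba=1.75, rula=1.25, time_s=6.0),  # PEI = 0.75
]
print(pei(1700.0, 3.5, 2.5))  # 1.5
print(wei(task))              # 0.9375
```

In this example the longer, less demanding subtask dominates the weighted sum, which is why the WEI (0.9375) lies closer to 0.75 than to 1.5.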

where ST_i is the time of subtask i and CT is the total cycle time of the task. Since the two refilling processes W require different CT, we introduce a corrected version of the WEI index (WEI_cor) defined as

$$\mathrm{WEI}_{\mathrm{cor},j} = \left(1 - \frac{CT_j - CT_{\max}}{CT_{\max}}\right) \cdot \mathrm{WEI}_j + \frac{CT_j - CT_{\max}}{CT_{\max}} \cdot \mathrm{PEI}_{\min} \tag{5.4}$$

where j is the index of the single trial and CT_max is the maximum value of CT achieved in the experimental tests. The use of WEI_cor allows a better comparison of the two working configurations. As for the PEI, the optimal value of the WEI index (WEI_opt) is the minimum one. The ergonomically correct refilling process W (called W_opt) is the one that achieves the minimum value of the WEI indices.

5.4 Simulations and Ergonomic Handling Module (EHM)

In this section we present the simulations performed and the ergonomic models obtained from the simulation data. In particular, we obtain one data-driven ergonomic model which relates the optimal working height to the percentile of the store clerk. An optimal cycle time CT_opt is then associated with this optimal working height h_opt. Finally, we choose the optimal refilling process W_opt.

5.4.1 Simulation Setup

The layout used for the simulation is composed of the shelf, the trolley, the case and the items inside the case. The virtual scenario of the shop is implemented considering one shelf, with dimensions replicating the real shelf, cases and items provided in the REFILLS project. The characteristics of the shelf are: (i) 5 shelf levels; (ii) the bottom shelf is at 30 mm and the top shelf is at 1880 mm; (iii) the depth of the shelves is 400 mm; (iv) the free height between two shelf levels is at least 380 mm. For the case, the most important dimension is the height (which influences h_work for W_TI). According to the requirements and the system specification of the REFILLS project, the height of a case is fixed at 350 mm. The other dimensions (width and length) are fixed at 190 mm and 340 mm, respectively. Another key parameter is the weight, fixed at 6 kg. The case contains 6 items, each weighing 1 kg. According to these dimensions, the items in the case are arranged in three columns and two rows. The final placement of the items on each shelf is also fixed at three columns and two rows. The simulations are performed in the Jack software, where we have recreated the REFILLS scenario according to the layout described above. The virtual store clerk is a virtual mannequin that, although based on the 3D Static Strength Prediction Program, also guarantees a good estimation under dynamic conditions (e.g., walking [9]). Snapshots of 2 of the 266 performed simulations are reported in Fig. 5.4. Using the Task Simulation Builder tool, both refilling processes W are reproduced in the virtual scenario. The number of subtasks required for each action is related to the working points (defined through the control factors).
In order to improve the reliability of the virtual task, all the control factors related to the item (except y_item, which is fixed at the height of each shelf level) are drawn from a normal distribution.

5.4.2 Data Collection

To define the simulations, we need to select the levels of the adjustable factors h_work and W. In particular, we consider six levels for h_work and two levels for W. The six levels of h_work cover the recommended range for lifting tasks between the 5th female percentile and the 95th male percentile; see, e.g., the ACGIH recommendations [8]. The levels are spaced 75 mm apart, from 825 mm (referred to as level 0) to 1200 mm (referred to as level 5).

Fig. 5.4 Top: Highlights of the W_TI refilling process, related to the following treatment: Male 50th; h_work = 1050 mm; target shelf = S_2. a Reach and grasp item 1 in the case; b walking from trolley to shelf; c bend and release item 1 on the shelf; d rise from bend; e walking from shelf to trolley; f reach and grasp item 2 in the case. Bottom: Highlights of the W_TC refilling process, related to the following treatment: Female 95th; h_work = 900 mm; target shelf = S_2. a Reach and grasp the full case from the trolley; b walking from trolley to shelf; c release item 1 on the shelf; d reach and grasp item 2 from the case; e walking from shelf to trolley; f put the empty case into the trolley

We consider two refilling processes W. The first is the take item process (W_TI), where the store clerk takes the items from the trolley one by one and places them on the shelf (see Fig. 5.4 top). In this refilling process, h_work is evaluated at the top of the case (where the store clerk takes the item). The second is the take case process (W_TC). In this configuration the store clerk takes the case from the trolley, and h_work is evaluated at the bottom of the case. The clerk then places the items from the case on the shelf while holding the case with the left hand (see Fig. 5.4 bottom). The mixed-level full factorial plan used for the simulations is therefore composed of 2 × 6 = 12 treatments [11]. The 12 treatments are replicated for the 6 percentiles (three for each sex) and for the five shelf levels, leading to a total of 360 trials. However, an accessibility analysis shows that shelf level 5 cannot be filled by any of the percentiles. In addition, for the female 5th and 50th percentiles, the task cannot be carried out at shelf level 4 either. Therefore, 264 out of 360 trials can be performed. Moreover, to better assess h_opt for the 95th male percentile, we have carried out two additional tests with h_work equal to 1275 mm, covering both refilling processes (W_TC and W_TI).
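The trial count above can be reproduced by enumerating the factorial plan and applying the two accessibility constraints. The following Python sketch does this; the variable names and the `accessible` helper are illustrative choices, while the factor levels and constraints are the ones stated in the text.

```python
from itertools import product

processes = ["W_TI", "W_TC"]                             # two levels of W
heights_mm = [825 + 75 * level for level in range(6)]    # six levels of h_work
avatars = [(sex, pct) for sex in ("female", "male") for pct in (5, 50, 95)]
shelf_levels = [1, 2, 3, 4, 5]


def accessible(sex, percentile, shelf):
    """Accessibility constraints reported in the text."""
    if shelf == 5:
        return False   # shelf level 5 cannot be filled by any avatar
    if shelf == 4 and sex == "female" and percentile in (5, 50):
        return False   # shelf level 4 is out of reach for these avatars
    return True


all_trials = list(product(processes, heights_mm, avatars, shelf_levels))
feasible = [trial for trial in all_trials
            if accessible(trial[2][0], trial[2][1], trial[3])]

print(heights_mm)        # [825, 900, 975, 1050, 1125, 1200]
print(len(all_trials))   # 360
print(len(feasible))     # 264
```

Adding the two extra tests at h_work = 1275 mm gives the 266 simulations mentioned in Sect. 5.4.1.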

5.4.3 Data Processing

In this subsection, we present the main contributions of this work: the calculation of h_opt with the related CT_opt, and the selection of W_opt (for each shelf level and customized for each male and female percentile).

5.4.3.1 Calculation of the Optimal Working Height

The calculation of h_opt is based on the PEI index, evaluated for the lifting subtasks of both types of refilling process. For the refilling process W_TC, we consider the PEI index of the lifting task. For the refilling process W_TI, we evaluate the mean value of the PEI index over the six lifting subtasks. The PEI index is evaluated: (i) for each level of h_work; (ii) for each percentile; (iii) for each sex. Then, we find the best-fit curves that interpolate the data using a second-order model, which yields R² values above 0.70. The optimal working height h_opt is the one that minimizes the PEI index. Finally, by using the data corresponding to h_opt for the three selected percentiles, and by fitting these data again with a second-order model, we obtain the model which assigns h_opt for all the anthropometric characteristics (represented by the whole range of percentiles). For the refilling process W_TI, the optimal working height h_opt ranges from 1014 mm to 1129 mm for the male sex (variation of 115 mm), and from 986 mm to 1084 mm for the female sex (variation of 98 mm), see Fig. 5.5 top. For the W_TC, h_opt ranges from 1063 mm to 1188 mm for the male sex (variation of 125 mm), and from 1032 mm to 1123 mm for the female sex (variation of 91 mm), see Fig. 5.5 bottom.
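The fitting step described above can be sketched as follows: a second-order model is fitted to (h_work, PEI) pairs by least squares, and h_opt is taken at the vertex of the fitted parabola. This is a minimal standard-library sketch, not the procedure actually used in the chapter; the function names are hypothetical and the PEI values are synthetic, generated so that the minimum lies at 1050 mm.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit y = a*x**2 + b*x + c, solved via the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]                     # sums of x^k
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]   # sums of x^k * y
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    for col in range(3):   # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [v - factor * p for v, p in zip(m[row], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def optimal_height(heights_mm, pei_values):
    """Height at the vertex (-b / 2a) of the fitted parabola."""
    center = sum(heights_mm) / len(heights_mm)   # centering improves conditioning
    a, b, _ = fit_quadratic([h - center for h in heights_mm], pei_values)
    if a <= 0:
        raise ValueError("the fitted curve has no minimum")
    return center - b / (2 * a)


heights_mm = [825, 900, 975, 1050, 1125, 1200]
# Synthetic PEI values lying on a parabola with its minimum at 1050 mm.
pei_values = [1.5 + 2e-6 * (h - 1050) ** 2 for h in heights_mm]
h_opt = optimal_height(heights_mm, pei_values)
print(round(h_opt, 1))
```

With noise-free synthetic data the fit recovers the minimum at about 1050 mm; with real PEI values the vertex generally falls between the tested levels, which is why h_opt values such as 1014 mm or 1129 mm can arise.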

5.4.3.2 Calculation of the Optimal Cycle Time

The single subtask time ST, evaluated through a predetermined time analysis based on methods-time measurement [10], allows the cycle time CT to be assessed for each h_work. Starting from the collected data, we find the best-fit curves that interpolate these data, for each refilling process W and each sex and percentile. In this case, we fit a linear model of CT as a function of h_work, which shows very high correlation (R² above 0.90). As expected, higher h_work values correspond to higher values of CT. The slope of the linear model decreases as the percentile rises (from 0.491 for the 5th female percentile to 0.341 for the 95th male percentile). For the W_TC, the slope coefficients are lower, from 0.127 (5th female percentile) to 0.103 (95th male percentile). From the obtained data-driven model we take the CT value at each h_opt and call it the optimal cycle time (CT_opt). The calculated CT_opt values are reported in Table 5.1.

Fig. 5.5 Data-driven model for the optimal working height (h_opt) as a function of the anthropometric characteristics of the subjects (in terms of the percentiles, S_plt), computed for the W_TI (top) and W_TC (bottom)
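The step from the linear model to CT_opt can be sketched in a few lines of Python: a straight line is fitted to (h_work, CT) pairs, its R² is checked, and the line is evaluated at h_opt. The function names, the cycle times and the value of h_opt below are hypothetical and only illustrate the mechanics.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


heights_mm = [825, 900, 975, 1050, 1125, 1200]
# Hypothetical cycle times [s], increasing linearly with h_work.
cycle_times_s = [20.0 + 0.01 * h for h in heights_mm]

slope, intercept = fit_line(heights_mm, cycle_times_s)
r2 = r_squared(heights_mm, cycle_times_s, slope, intercept)

h_opt_mm = 1050.0                        # hypothetical optimal working height
ct_opt = intercept + slope * h_opt_mm    # optimal cycle time at h_opt
print(round(ct_opt, 3), round(r2, 3))
```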

Table 5.1 Optimal cycle time values (CT_opt) obtained at the optimal working height, for each refilling process W and for each shelf level, for the male and female sex. n.a. = not accessible shelf level

Percentile  W      S1 [s]            S2 [s]            S3 [s]            S4 [s]
                   Female  Male      Female  Male      Female  Male      Female  Male
5th         W_TC   41.886  41.270    35.651  33.648    28.595  27.273    23.208  n.a.
5th         W_TI   43.976  43.792    41.544  42.521    28.920  29.296    37.959  n.a.
50th        W_TC   40.378  41.727    35.699  34.382    19.847  26.967    22.103  n.a.
50th        W_TI   44.260  43.697    41.740  42.204    28.848  29.176    31.205  n.a.
95th        W_TC   40.272  40.314    36.994  36.590    22.129  19.127    22.610  21.171
95th        W_TI   44.652  44.319    42.189  42.269    28.782  28.999    30.512  30.764

5.4.3.3 Selection of the Refilling Process

The last step is to determine which W is the best to adopt (W_TI or W_TC). This selection is made by evaluating the WEI indices. In the equations, for the lifting subtasks we use the PEI values achieved at h_opt. For the other subtasks of W, we use the mean values of the PEI indices computed over the six levels of h_work. For the W_TI, at the h_opt of each percentile, we calculate the optimal WEI values using Eq. 5.2. For the W_TC, since it is characterized by smaller values of CT_opt than the W_TI, we evaluate the WEI_cor indices according to Eq. 5.4. Table 5.2 reports, for each percentile, sex and shelf level to be filled, the selected W, i.e., the one with the minimum value of WEI_opt.

Table 5.2 Optimal WEI values (WEI_opt) calculated for both refilling processes, for all the shelf levels to be filled and for each male and female percentile. The best (minimum) values for each percentile, sex and shelf level are highlighted in bold font. n.a. = not accessible shelf level

Percentile  W                S1 [-]            S2 [-]            S3 [-]            S4 [-]
                             Female  Male      Female  Male      Female  Male      Female  Male
5th         W_TC - WEI       1.904   1.790     1.613   1.562     1.629   1.555     1.345   n.a.
5th         W_TC - WEI_cor   1.834   1.712     1.449   1.325     1.616   1.477     1.000   n.a.
5th         W_TI             1.754   1.591     1.445   1.401     1.315   1.317     1.297   n.a.
50th        W_TC - WEI       2.167   1.895     1.660   1.629     1.500   1.569     1.527   n.a.
50th        W_TC - WEI_cor   2.020   1.830     1.490   1.411     1.184   1.484     1.224   n.a.
50th        W_TI             1.872   1.675     1.484   1.421     1.341   1.312     1.291   n.a.
95th        W_TC - WEI       2.389   2.004     1.761   1.590     1.571   1.509     1.668   1.433
95th        W_TC - WEI_cor   2.206   2.004     1.609   1.440     1.330   1.158     1.373   1.135
95th        W_TI             1.989   1.823     1.572   1.505     1.396   1.349     1.337   1.312

5.5 Experiments

In this section we describe the experiments conducted for: (1) the validation of the virtual simulations; (2) the validation of the proposed ergonomic indices.

5.5.1 Experimental Setup and Protocol

The experimental activity was carried out at the ERGOs Lab (CESMA, University of Naples Federico II). The laboratory is equipped with the following instrumentation, used for the experimental phase: (i) a total body 3D scanner (INBODY - Instant Body Scan, BeyondShape s.r.l. [5]); (ii) a MoCap system composed of 10 infrared digital cameras and 8 force platforms (SMART DX 6000, BTS Bioengineering); (iii) sEMG sensors (FREEEMG 1000, BTS Bioengineering). The total body scanner collects the subject's anthropometric characteristics. The MoCap system collects kinematic data (sampling frequency 340 Hz) and kinetic data (sampling frequency 680 Hz) of the experimental tasks, while the sEMG sensors collect muscle activity (sampling frequency 1000 Hz). The experimental protocol uses 20 markers (see Fig. 5.6), which allow the real task to be reproduced with a musculoskeletal digital model, and six sEMG sensors for muscle activity analysis, placed on the subject's dominant side. Within the laboratory, we recreate the REFILLS scenario. The experimental scenario is the same as the virtual one and comprises a trolley (with adjustable height) and a shelf with five levels (from S_1 to S_5).

5.5.2 Experimental Data Collection

One male volunteer belonging to the 50th percentile of the stature distribution is selected from the local population to conduct the experiments. The subject is not specialized in retail work and did not report any musculoskeletal disorders over the last 12 months. Before the experimental tests, a physician collected the informed consent from the volunteer and carried out the following steps: (i) collection of the main anthropometric characteristics; (ii) placement of the markers and sEMG sensors on the subject's body [14]; (iii) collection of the maximum voluntary contractions (MVC) of the selected muscles, asking the participant to perform isometric MVC. Then, the subject performed three trials for each of six treatments, given by the two W (i.e., W_TI and W_TC) and three different h_work (i.e., 900, 1050 and 1125 mm). All the experimental trials are conducted for the shelf level S_3. The items on the shelf are placed in three columns and two rows, and graphical indicators help the subject to place the items correctly. Figure 5.7 shows some highlights of one of the experimental trials. Kinematic joint reconstruction is performed in OpenSim 4.0 using a full-body musculoskeletal model [15] that includes 37 degrees of freedom, scaled according to the anthropometric characteristics of the subject. For the estimation of the compression strengths on the L4 and L5 vertebrae, we have instead used a specific full-body musculoskeletal model of the lumbar spine [16]. The sEMG signals were processed following these steps [13, 17]: (i) rectification; (ii) smoothing with a moving average filter (with a time constant of 150 ms); (iii) filtering with a Butterworth low-pass filter with a cut-off frequency of 2 Hz; (iv) normalization with respect to the MVC; (v) RMS assessment.

Fig. 5.6 Front (a), lateral (b) and back (c) views of the subject involved in the experiments, with the full marker set and sEMG probes. Marker set on the human body (in red): LTR/RTR: Left/Right Temporal Regions; C7: Cervical vertebra; LA/RA: Left/Right Acromion; LLHE/RLHE: Left/Right Lateral Humeral Epicondyle; LUS/RUS: Left/Right Ulnar Styloid; L5MC/R5MC: Left/Right 5th Metacarpal; S: Sacrum; LT/RT: Left/Right Greater Trochanter; LK/RK: Left/Right Lateral Femoral Epicondyle; LM/RM: Left/Right Malleolus; LMe/RMe: 5th Metatarsal of the Left/Right Foot. sEMG sensors on the subject's right side (in white): AD: Anterior Deltoid; BB: Biceps Brachii; ESI: Erector Spinae Iliocostalis; OA: Obliquus Abdominis; RA: Rectus Abdominis; RF: Rectus Femoris

Fig. 5.7 Highlights of the laboratory tests and reconstructed trials, for the refilling process W_TI, h_work = 1050 mm; shelf level = S_3. a Reach and grasp item 1 in the case; b walking from trolley to shelf; c release item 1 on the shelf; d turn body; e walking from shelf to trolley; f reach and grasp item 2 in the case. The subscripts 1, 2 and 3 represent the real scene, the marker reconstruction and the reproduction using OpenSim [12], respectively
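Part of the sEMG processing chain can be sketched with the Python standard library alone. The sketch below covers steps (i), (ii), (iv) and (v); the Butterworth low-pass filter of step (iii) is deliberately left out, since it requires a filter-design routine. It assumes that the 150 ms time constant corresponds to the length of a trailing moving-average window and that the MVC is available as a single amplitude value; both are assumptions made for illustration, and the signal is synthetic.

```python
import math


def rectify(signal):
    """Step (i): full-wave rectification."""
    return [abs(v) for v in signal]


def moving_average(signal, window):
    """Step (ii): trailing moving average; the window is shorter at the start."""
    out, running = [], 0.0
    for i, value in enumerate(signal):
        running += value
        if i >= window:
            running -= signal[i - window]
        out.append(running / min(i + 1, window))
    return out


def rms(signal):
    """Step (v): root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def normalized_rms(raw, mvc, fs_hz=1000, window_s=0.150):
    """RMS of the MVC-normalized envelope (steps i, ii, iv, v; step iii omitted)."""
    window = int(round(window_s * fs_hz))     # 150 samples at 1000 Hz
    envelope = moving_average(rectify(raw), window)
    normalized = [v / mvc for v in envelope]  # step (iv)
    return rms(normalized)


raw = [0.2, -0.2] * 500    # synthetic 1 s recording sampled at 1000 Hz
score = normalized_rms(raw, mvc=0.8)
print(round(score, 3))     # 0.25
```

The synthetic signal has a constant rectified amplitude of 0.2, so the normalized envelope is 0.2 / 0.8 = 0.25 everywhere and the RMS equals that value.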

5.5.3 Validation of the Virtual Simulations

In this subsection we compare the virtual simulations and the real experiments in terms of: (i) assessment of h_opt; (ii) assessment of CT; (iii) selection of W. This comparison mainly serves as an acceptance test for the results obtained from the simulation data. Starting from the mean values of the kinematic and kinetic data for each subtask, we compute the RULA, REBA and LBA values and then the PEI indices for each subtask (see Eq. 5.1). The PEI indices are evaluated for the two W and three h_work (as reported in Table 5.4). These values show that: (i) the simulation presents higher PEI values than the experiments (the difference of the mean values is 0.256 for the W_TI and 0.369 for the W_TC); (ii) both the simulations and the experiments present the minimum PEI value (useful for the assessment of h_opt) at the same h_work. The task performances in terms of CT are: (i) for the refilling process W_TI, 28.708 ± 0.403 s and 37.630 ± 2.759 s for simulation trials and laboratory trials, respectively; (ii) for the refilling process W_TC, 19.747 ± 0.132 s and 25.531 ± 1.650 s for simulation trials and laboratory trials, respectively. We can notice that: (i) although simulation trials last less than experimental trials, the ratio between virtual and real CT is nearly the same for both W (i.e., the mean simulated CT is 76.3% of the real one for W_TI and 77.3% for W_TC); (ii) the CT of the real task presents more variability than the virtual one (standard deviation of 2.759 s compared with 0.403 s for W_TI, and 1.650 s compared with 0.132 s for W_TC). Finally, starting from the assessment of ST for the real tests, WEI and WEI_cor are assessed according to Eqs. 5.2 and 5.4. The WEI and WEI_cor values are reported in Table 5.3.
This table shows that, although the virtual and real tasks present different WEI values (i.e., virtual WEI values are higher than real WEI values), the identification of W_opt (related to the minimum WEI values) is the same.
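The ratios between simulated and measured cycle times quoted above follow directly from the reported mean values, as this short Python check shows (the dictionary layout is only an illustrative choice).

```python
# Mean cycle times [s] reported in the text for the shelf level S_3.
cycle_times_s = {
    "W_TI": {"simulation": 28.708, "laboratory": 37.630},
    "W_TC": {"simulation": 19.747, "laboratory": 25.531},
}

ratios = {
    process: values["simulation"] / values["laboratory"]
    for process, values in cycle_times_s.items()
}

for process, ratio in ratios.items():
    print(f"{process}: {100 * ratio:.1f} %")
# W_TI: 76.3 %
# W_TC: 77.3 %
```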

Table 5.3 WEI values assessed for all the refilling processes. The best values for each type of analysis are highlighted in bold font

Type of analysis  W                h_work = 900 mm  h_work = 1050 mm  h_work = 1125 mm
Experiments       W_TC - WEI       1.377            1.367             1.367
Experiments       W_TC - WEI_cor   0.934            0.927             0.928
Experiments       W_TI             1.344            1.298             1.344
Simulations       W_TC - WEI       1.513            1.502             1.501
Simulations       W_TC - WEI_cor   1.178            1.177             1.179
Simulations       W_TI             1.365            1.329             1.332

Table 5.4 Comparison between the proposed ergonomic assessment strategy (modified PEI index) and the evaluation of the EMG_score for each refilling process, at the three different working heights (h_work equal to 900 mm, 1050 mm and 1125 mm). The best value for each type of analysis and each refilling process is highlighted in bold font

W      h_work [mm]   Experiments                 Simulations
                     EMG_score [-]   PEI [-]     PEI [-]
W_TI   900           0.915           1.462       1.747
W_TI   1050          0.296           1.217       1.558
W_TI   1125          0.309           1.416       1.559
W_TC   900           0.362           1.001       1.420
W_TC   1050          0.214           0.932       1.276
W_TC   1125          0.190           0.934       1.277

5.5.4 Validation of the Ergonomic Indices

To validate the proposed ergonomic assessment based on the modified PEI and WEI indices, we use the analysis of muscle activation, which is a standard for ergonomic analysis. The root mean square (RMS) value of the normalized muscle activations (NMA) is used as the metric to characterize the store clerk's muscle activity during the refilling process. Following the same motivations that led to the computation of WEI_cor (Eq. 5.4), i.e., to account for the different CT required by the two W, the RMS is corrected as:

$$\mathrm{RMS}_{\mathrm{cor},j} = \left(1 - \frac{CT_j - CT_{\max}}{CT_{\max}}\right) \cdot \mathrm{RMS}_j \tag{5.5}$$

where j is the index of the single treatment and CT_max is the maximum value of CT recorded in the experimental tests. Finally, a synthetic value for the muscle activity analysis (EMG_score) is constructed as the sum of the RMS values of the selected muscles:

$$\mathrm{EMG}_{\mathrm{score}} = \sum_{l=1}^{k} \mathrm{RMS}_{\mathrm{cor},l} \tag{5.6}$$

where l is the index of the single muscle and k is the number of monitored muscles. To validate the proposed PEI index, we calculate the RMS and RMS_cor (for W_TI and W_TC, respectively) for the lifting subtasks. The results reported in Fig. 5.8 (top) show that the BB is always the most activated muscle. The h_work equal to 900 mm is the most demanding from the muscle activity point of view, in particular for the W_TI. Table 5.4 shows that the minimum value of the EMG_score for the W_TI occurs at the h_work with the minimum PEI value (h_work = 1050 mm). For the W_TC, the minimum PEI value in simulation (1.276) is again achieved at h_work equal to 1050 mm. However, in this case the minimum EMG_score in the real experiments is achieved at h_work equal to 1125 mm. At this h_work, the PEI value is 1.277, which is in any case very close to the minimum value achieved (1.276). It is important to underline, however, that h_work = 1125 mm is closer to the h_opt obtained from the simulation data, which is equal to 1150 mm. To verify the validity of the WEI indices used for the definition of W_opt, we consider the two real experiments with h_work closest to the h_opt obtained for the two refilling processes. In particular, the muscle analysis is carried out at h_work equal to 1050 mm for the refilling process W_TI (simulated h_opt equal to 1052 mm), and at h_work equal to 1125 mm for the refilling process W_TC (simulated h_opt equal to 1115 mm). Figure 5.8 (bottom) reports the RMS and RMS_cor for the total task: again, the most activated muscle is the BB. Finally, the EMG_score shows that the refilling process W_TC requires a lower total effort for the overall task, considering the contribution from all the muscles (0.482 versus 0.512). Therefore, the results of the experiments are in accordance with the output of the simulation (see Table 5.4).

Fig. 5.8 Top: On the left, RMS; on the right, RMS_cor, for W_TI and W_TC during the lifting task of the subject involved in the experiments. Bottom: On the left, RMS; on the right, RMS_cor, for W_TI and W_TC related to the whole task at h_opt. All values are represented as mean with standard deviation of the trials related to the shelf S_3

5.6 Conclusions

In this chapter we have presented the ergonomic assessment used in the REFILLS project to establish the optimal working height and refilling process for the store clerk, in order to potentially reduce the risk of musculoskeletal disorders in a supermarket scenario. The methodology is based on a robust design approach, with a series of simulations performed by six digital avatars and used to derive a predictive model which relates the anthropometric characteristics of the subjects (in terms of percentiles) and the shelf to be filled to the optimal working height. This model is derived from the evaluation of the lifting subtasks using the modified PEI index. Then, by calculating the WEI indices, we have been able to select the optimal refilling process, again for all the anthropometric characteristics of the subjects (in terms of percentiles) and the shelf to be filled. Both main results have been validated with laboratory experiments. Implementing the results of this work in a real supermarket scenario will make it possible to test the ergonomic handling module which, for each subject and for each shelf level to be filled, automatically brings the trolley to the optimal working height and suggests the optimal refilling process to the store clerk. In addition, the proposed approach could be personalized for each store clerk. All the store clerks of a supermarket can undergo a scanning process, in which a 3D model of the operator is generated from an instant data acquisition, and the anthropometric characteristics of interest are extracted from this model to optimize the working height and the refilling process.

References

1. Ranavolo, A., Ajoudani, A., Cherubini, A., Bianchi, M., Fritzsche, L., Iavicoli, S., ... Draicchio, F.: The sensor-based biomechanical risk assessment at the base of the need for revising of standards for human ergonomics. Sensors 20(20), 5750 (2020)
2. Tarallo, A., Di Gironimo, G., Gerbino, S., Vanacore, A., Lanzotti, A.: Robust interactive design for ergonomics and safety: R-IDEaS procedure and applications. Int. J. Interact. Des. Manuf. 13(4), 1259–1268 (2019)
3. Argubi-Wollesen, A., Wollesen, B., Leitner, M., Mattes, K.: Human body mechanics of pushing and pulling: analyzing the factors of task-related strain on the musculoskeletal system. Saf. Health Work 8(1), 11–18 (2017)
4. Grazioso, S., Selvaggio, M., Di Gironimo, G.: Design and development of a novel body scanning system for healthcare applications. Int. J. Interact. Des. Manuf. 12(2), 611–620 (2018)
5. Grazioso, S., Selvaggio, M., Caporaso, T., Di Gironimo, G.: A digital photogrammetric method to enhance the fabrication of custom-made spinal orthoses. J. Prosthet. Orthot. 31(2) (2019)
6. Barone, S., Lanzotti, A.: Robust ergonomic virtual design. In: Statistics for Innovation, pp. 43–64. Springer, Milano (2009)
7. Caporaso, T., Grazioso, S., Vaccaro, D., Di Gironimo, G., Lanzotti, A.: User-centered design of an innovative foot stretcher for ergometers to enhance the indoor rowing training. Int. J. Interact. Des. Manuf. 12(4), 1211–1221 (2018)
8. American Conference of Governmental Industrial Hygienists: Threshold Limit Values (TLVs) and Biological Exposure Indices (BEIs). ACGIH digital publication (2022)
9. Caporaso, T., Di Gironimo, G., Tarallo, A., De Martino, G., Di Ludovico, M., Lanzotti, A.: Digital human models for gait analysis: experimental validation of static force analysis tools under dynamic conditions. In: Advances on Mechanics, Design Engineering and Manufacturing, pp. 479–488. Springer, Cham (2017)
10. Di Gironimo, G., Di Martino, C., Lanzotti, A., Marzano, A., Russo, G.: Improving MTM-UAS to predetermine automotive maintenance times. Int. J. Interact. Des. Manuf. 6(4), 265–273 (2012)
11. White, D., Hultquist, R.A.: Construction of confounding plans for mixed factorial designs. Ann. Math. Stat. 36(4), 1256–1271 (1965)
12. Delp, S.L., Anderson, F.C., Arnold, A.S., Loan, P., Habib, A., John, C.T., ... Thelen, D.G.: OpenSim: open-source software to create and analyze dynamic simulations of movement. IEEE Trans. Biomed. Eng. 54(11), 1940–1950 (2007)
13. Panariello, D., Grazioso, S., Caporaso, T., Palomba, A., Di Gironimo, G., Lanzotti, A.: Biomechanical analysis of the upper body during overhead industrial tasks using electromyography and motion capture integrated with digital human models. Int. J. Interact. Des. Manuf. (2022)
14. Hermens, H.J., Freriks, B., Disselhorst-Klug, C., Rau, G.: Development of recommendations for SEMG sensors and sensor placement procedures. J. Electromyogr. Kinesiol. 10(5), 361–374 (2000)
15. Rajagopal, A., Dembia, C.L., DeMers, M.S., Delp, D.D., Hicks, J.L., Delp, S.L.: Full-body musculoskeletal model for muscle-driven simulation of human gait. IEEE Trans. Biomed. Eng. 63(10), 2068–2079 (2016)
16. Raabe, M.E., Chaudhari, A.M.: An investigation of jogging biomechanics using the full-body lumbar spine model: model development and validation. J. Biomech. 49(7), 1238–1243 (2016)
17. Grazioso, S., Caporaso, T., Palomba, A., Nardella, S., Ostuni, B., Panariello, D., Lanzotti, A.: Assessment of upper limb muscle synergies for industrial overhead tasks: a preliminary study. In: Second Workshop on Metrology for Industry 4.0 and IoT, pp. 89–92 (2019)

Chapter 6

Robotic Clerks: Autonomous Shelf Refilling

Alberto Cavallo, Marco Costanzo, Giuseppe De Maria, Ciro Natale, Salvatore Pirozzi, Simon Stelter, Gayane Kazhoyan, Sebastian Koralewski, and Michael Beetz

Abstract Nowadays, robots are used in the retail market mostly for warehousing, although they could be of great help in various in-store logistics processes, as discussed in previous chapters. The present chapter deals with the shelf replenishment task; its execution by a robot requires overcoming technological and methodological barriers in the handling of single products rather than the boxes containing them. The challenges a robot has to face to replenish a supermarket shelf are all related to manipulation, in narrow spaces, of products with a large variety of size, shape, weight, and fragility. The solution proposed by REFILLS is based on a robotic system where perception is used at all hierarchical levels of the control architecture, from high-level task planning algorithms and motion planning to reactive control layers based on physics models, where tactile and visual perception are combined to achieve highly reliable manipulation of items. Experiments on an emulated supermarket shelf are carried out to demonstrate the effectiveness of the approach.

Keywords Reactive control · Tactile perception · RGB-D perception · In-hand object manipulation · Slipping control · Mobile pick and place

A. Cavallo · M. Costanzo · G. De Maria · C. Natale · S. Pirozzi
Università degli Studi della Campania Luigi Vanvitelli, Via Roma 29, 81031 Aversa, Italy

S. Stelter (corresponding author) · G. Kazhoyan · S. Koralewski · M. Beetz
Universität Bremen, Am Fallturm 1, 28359 Bremen, Germany

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. L. Villani et al. (eds.), Robotics for Intralogistics in Supermarkets and Retail Stores, Springer Tracts in Advanced Robotics 148, https://doi.org/10.1007/978-3-031-06078-6_6

6.1 Introduction

On the one hand, retail automation solutions like Amazon Go shops, where customers can shop without any checkout, are improving the in-store customer experience. On the other hand, automated solutions for in-store logistics processes are still limited. Commercial mobile robots monitor shelves for automated inventory management by resorting to different technologies, like 2D and 3D cameras in the Bossa Nova 2020, Tally 3.0 by Simbe Robotics and LoweBot by Fellow Robots, or RFID antennas like the Stockbot by PAL Robotics [5]. Another process being automated is depalletizing, based on recent AI solutions like the Photoneo Depal [30]. Recently, a new robotic depalletizer that is able to manage not only boxes but also highly mixed pallets has been developed [22] and is used in the robotic depalletizing cell of Chap. 4.

While a few inventory management and automated unloading solutions exist on the market, retailers also demand automation of other in-store processes, such as product transport from the backroom to the store and shelf restocking. The motivation is to reduce the high costs of in-store logistics, which comprise up to 60% of total operational store costs [28]. The most time-consuming task is shelf replenishment, and 50% of that time is devoted to finding the correct slot on the shelf. Only a few scientific contributions exist on this specific automation issue [31, 33, 36], mainly related to the Future Convenience Store robotic challenge launched by the World Robot Summit [37]. Most of those solutions are based on vacuum grippers, hence they are limited to cases where the grasp poses for picking and placing are the same. The shelf replenishment task in a real environment, however, may require sophisticated manipulation skills.

The REFILLS project [35] proposes an elaborate software architecture to execute the complete shelf replenishment at a speed comparable to that of a human, as described in detail in Sect. 6.2. Perception is at the base of the solution, since sensor data are used for scene understanding and planning as well as for actually controlling robot actions. Scene understanding is based on visual (RGB-D) perception, while robot control exploits not only vision but also tactile sensing. These two steps are also tightly intertwined, especially in the selection of the grasp pose of a given object.
Due to the complexity of the scene and the narrow spaces where the robot has to perform its task, an object often has to be picked with a grasp pose different from the one needed for placing. This means that the planner finds a solution to the fetching task only if the grasp pose can be changed during task execution, i.e., by executing an in-hand manipulation maneuver. This ability is typical of dexterous hands, but it can be achieved even by parallel grippers by exploiting the so-called extrinsic dexterity [15, 19], namely using external constraints to apply suitable forces to the manipulated object so as to change its configuration within the fingers. The REFILLS approach to manipulation planning relies on the availability of this skill, and the planning algorithm exploits such dexterity each time a plan with a fixed grasp fails. However, planning a grasp pose suitable for fetching the object from the pick pose to the shelf facing, with the correct pose, is not enough. The chosen grasp pose should not only be reachable without collisions in a cluttered environment and in narrow spaces like those of a supermarket shelf, but it should also allow the in-hand manipulation actions that could be necessary to perform the task. For instance, placing a box, grasped from the top, between two narrow shelf layers can be accomplished by rotating the gripper about the object (gripper pivoting). Alternatively, a bottle that has been knocked over on a table can be placed vertically by letting it rotate between the fingers (object pivoting). Both maneuvers, however, require suitable grasp poses. Both object and gripper pivoting require a grasp pose above the object center of gravity on a
vertical line passing through it. In particular, object pivoting needs a grasp location far enough from the center of gravity. Eventually, the output of the planning phase is not only a sequence of desired robot motions, but also the most appropriate sequence of control modalities with their parameters.

Grasping with a parallel gripper clearly requires establishing the correct grasp force. It should be high enough to hold the object safely during transportation and, at the same time, low enough to avoid deformation or damage in the case of fragile or deformable objects. Control of the grasp force is essential not only for a safe grasp but also for performing controlled in-hand object manipulation. Within REFILLS, tactile perception was the enabling technology for achieving these control objectives. A tactile sensor for effective manipulation should not only provide spatially distributed information on the contact, but also measure the contact wrench, including the torsional moment, especially for the pivoting maneuvers. The tactile sensor used in this work was purposely developed in our laboratory [7, 17] and integrated into a commercial gripper together with an Intel RealSense camera.

The reactive controller improves picking and placing, but other motions that do not involve grasping also need to be performed. An example is deciding which camera pose enables the perception framework to give a good initial pose estimate for objects. Fortunately, in REFILLS the semantic digital twins of the store are kept up to date. This special circumstance allows such poses to be optimized with learning algorithms that have fast training times but tend to overfit. Such algorithms can be retrained before each autonomous shelf refilling with the current state of the store, making potential overfitting irrelevant.
This chapter summarizes the main achievements of the REFILLS project within the so-called Scenario #3 “Autonomous shelf refilling”, which include, besides the cited sensorized gripper, the complete reactive control layer endowed with the grasp controller based on the results in [6, 10, 15], and a visual servoing controller described in [11], the manipulation planning algorithm in [18] and an approach to use semantic digital twins as described in Chap. 2 to optimize the parametrization of actions in the task-level plan executive [26].

6.2 Software Architecture for Autonomous Shelf Refilling

Figure 6.1 shows the architecture of the integrated system for autonomous in-store manipulation. The three main high-level actions in the in-store manipulation domain are:

1. searching for objects that satisfy a given description in the environment,
2. fetching objects, and
3. delivering objects.

Fig. 6.1 Architecture of the integrated system for in-store manipulation

The CRAM component¹ is a task-level plan executive, where plans are written in a domain-specific robot programming language. CRAM contains a library of generalized plans for the in-store manipulation domain. The perception framework used is RoboSherlock² [3], which was already explained in Sect. 2.6.2. For the searching and scanning actions, CRAM sends commands to the perception framework to detect objects that satisfy a given description. The perception framework gets sensor data streams from the robot and asserts perception results about the world into the belief state of the knowledge base. If the object description provided by the plan executive is not sufficient to resolve the perception query, RoboSherlock asks the knowledge base for object-specific information, such as the exact size of the object, its shape, color, texture, etc. For such tasks we use the knowledge processing framework KnowRob³ [4], which keeps track of a semantic digital twin of the environment and is explained in more detail in Chap. 2. The generalized plans of the searching and scanning actions are similar to each other; the difference is that scanning areas on the shelves is performed by commanding the end effector with the mounted camera to follow a continuous scanning trajectory, whereas the searching action only requires directing the camera to one specific point in space.

¹ CRAM is available open source on GitHub: https://github.com/cram2/cram.
² RoboSherlock is available open source on GitHub: https://github.com/RoboSherlock/robosherlock.
³ KnowRob is available open source on GitHub: https://github.com/knowrob/knowrob.

For fetching objects, there are a number of motion parameters that the CRAM system needs to infer, which is done by querying the knowledge base. First, a grasp pose needs to be chosen for the given object. The knowledge base contains multiple possible grasp poses for each known object. The motion planner Giskard⁴ [21] is asked to choose one that is reachable and does not cause collisions. When determining whether a pose is reachable, Giskard takes advantage of the pivoting capability available when the robot is holding an object. Once the grasp pose is chosen, a full-body motion plan is generated by Giskard and sent to CRAM. Giskard might get additional constraints from the knowledge base, e.g., to model the pivoting maneuver when objects are grasped. CRAM sends the motion plans to the Reactive Control System described in Sect. 6.4, while ensuring that the correct parameters and execution mode are set in the controller. The motion plans are executed on the robot in a closed-loop manner, in the sense that they can be slightly modified by the reactive controllers to comply with the real scene. For instance, picking an object with a grasp pose suitable for a pivoting maneuver requires an accuracy of a couple of millimeters. Hence, grasping the object cannot be done open-loop based on the grasp pose estimated by RoboSherlock. Instead, the object is grasped using an image-based visual servoing approach, based on a target image that represents the desired relative pose between the gripper and the object that is suitable for the rest of the manipulation plan.
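The grasp selection logic described above (try a plan with a fixed grasp first, and resort to pivoting only when that fails) can be illustrated with a minimal Python sketch. It is not the CRAM or Giskard interface: the class `GraspCandidate`, the functions `allows_pivoting` and `choose_grasp`, the planner callbacks and the numerical thresholds are all hypothetical names and values introduced only for illustration. The pivoting test encodes the requirement stated in Sect. 6.1 that the grasp point lie above the center of gravity, on a vertical line through it, and far enough from it.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class GraspCandidate:
    """Hypothetical grasp description relative to the object center of gravity (CoG)."""
    name: str
    horizontal_offset_m: float   # distance from the vertical line through the CoG
    height_above_cog_m: float    # positive when the grasp point is above the CoG


def allows_pivoting(grasp: GraspCandidate,
                    max_offset_m: float = 0.002,
                    min_height_m: float = 0.02) -> bool:
    """Check the geometric condition for pivoting (thresholds are placeholders)."""
    on_vertical_line = abs(grasp.horizontal_offset_m) <= max_offset_m
    far_enough_above = grasp.height_above_cog_m >= min_height_m
    return on_vertical_line and far_enough_above


def choose_grasp(candidates: Sequence[GraspCandidate],
                 plan_fixed: Callable[[GraspCandidate], bool],
                 plan_with_pivoting: Callable[[GraspCandidate], bool],
                 ) -> Optional[Tuple[GraspCandidate, str]]:
    """Prefer a grasp that works without in-hand manipulation, else use pivoting."""
    for grasp in candidates:
        if plan_fixed(grasp):            # reachable and collision free as is
            return grasp, "fixed"
    for grasp in candidates:
        if allows_pivoting(grasp) and plan_with_pivoting(grasp):
            return grasp, "pivoting"
    return None                          # no feasible plan for any candidate


candidates = [
    GraspCandidate("side", horizontal_offset_m=0.030, height_above_cog_m=0.000),
    GraspCandidate("top", horizontal_offset_m=0.001, height_above_cog_m=0.060),
]
# Stand-in planner callbacks: no fixed-grasp plan exists, pivoting plans do.
result = choose_grasp(candidates, lambda g: False, lambda g: True)
```

In this toy call only the top grasp satisfies the pivoting condition, so it is returned together with the label "pivoting". In the real system the two callbacks would correspond to queries to the motion planner.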

6.3 The Sensorized Gripper

The reactive control modules need a proper sensing and actuation system. Figure 6.2 shows the adopted end effector. The actuator is a Schunk WSG50 parallel gripper. The finger velocity is commanded via Ethernet by means of a Lua script that runs on the gripper controller, which limits the control signal to a rate of 50 Hz. An Intel D435i RGB-D camera is mounted in an eye-in-hand configuration and provides the images for the visual servoing controller (Sect. 6.4.2). For the visual algorithm, the camera has been configured to provide images at a resolution of 640 × 480 pixels at a rate of 30 Hz.

Fig. 6.2 The sensorized gripper: Schunk WSG50 gripper, Intel D435i camera and SUNtouch tactile fingers

The sensorized fingers integrate tactile sensors based on the idea first described in [20], i.e., to use a Printed Circuit Board (PCB) that integrates a matrix of Light Emitting Diode (LED)/phototransistor couples, called “taxels”, assembled with a suitably designed mechanical layer. The working principle consists in transducing the contact information into mechanical deformations measured by the taxels. The optical couples, typically integrated in single photo-reflector devices, are positioned below the mechanical layer and provide a spatially distributed measurement of the deformation corresponding to the external contact. Taxels work in reflection mode: the LED light is reflected by the bottom side of the mechanical layer, with an intensity depending on the local deformation, and then measured by the coupled phototransistor. The set of all the taxel measurements constitutes the “tactile map”.

Several improvements were made to all components of the tactile sensor during the REFILLS project. This chapter reports the details of the final solution integrated into the sensorized gripper [7, 18], highlighting the main improvements with respect to previous designs [12, 17]. An overview of the components constituting a sensorized finger is reported in Fig. 6.3. The figure also shows how the parts are assembled and their manufacturing processes.

Fig. 6.3 Overview of the assembled sensor components, with details about the production processes. (This figure uses some material from [7] with permission from MDPI Open Access)

⁴ Giskard is available open source on GitHub: https://github.com/SemRoCo/giskardpy.


6.3.1 Tactile Sensor Electronics

The electronics consist of a PCB with the optoelectronic sensing points. A second PCB, connected to the first one, has been developed for serial communication and power supply management. With respect to previous solutions, the LEDs are driven by current sources instead of a voltage supply. Additionally, the phototransistor analog signals, decoupled through a buffering stage, are digitized directly by a more powerful microcontroller rather than by a separate Analog-to-Digital (A/D) converter.

The sensing section consists of 25 photo-reflectors (manufacturer code NJL5908AR, by New Japan Radio), organized in a 5 × 5 matrix with 3.55 mm spatial resolution. Each photo-reflector integrates an infrared LED and a phototransistor matched at a 925 nm wavelength. The whole sensing section covers an area of about 21 × 21 mm². The current driving of the LEDs has been implemented with two adjustable current sources (manufacturer code LM334, by Texas Instruments), whose output has been fixed to 4 mA. The current sources are used in a zero-temperature-coefficient configuration to improve the stability of the emitted LED light. This choice guarantees higher repeatability and an improved signal-to-noise ratio for the sensor. The voltage supply for the current sources has been fixed to 24 V, since it is a typical power supply available in robotic units. The phototransistors are voltage driven with a 3.3 V supply. The additional buffering stage decouples the phototransistor signals from the A/D inputs. It consists of operational amplifiers in voltage follower configuration (manufacturer code ADA4691, by Analog Devices) with a 5 V power supply. The buffer outputs are then digitized by the on-board A/D channels of the microcontroller. Unlike previous solutions, where separate A/D converters were needed, recent microcontrollers integrate a high number of 12-bit A/D channels, allowing the direct digitization of sensor signals. The selected microcontroller is the PIC16F19175, manufactured by Microchip, with a 3.3 V power supply. The external interface for the transmission of the measured data is a standard serial interface at 1 Mbit/s, resulting in a 500 Hz sampling frequency for all the 25 taxels.

The external connection is made through the second PCB, which allows the sensorized finger to be adapted to the different mechanical and electrical needs of the robotic system hosting the sensor. The solution designed for REFILLS relies on DC/DC converters to generate the different voltage levels from the 24 V power supply and on a connector for microcontroller programming. Figure 6.4 reports some pictures of the realized and connected PCBs. Additional details can be found in [7, 18].
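As a quick plausibility check of the figures quoted above, the following Python sketch computes the voltage resolution of a 12-bit conversion and the share of the 1 Mbit/s serial link needed to stream 25 taxels at 500 Hz. The text does not specify the A/D reference voltage, the packet format or the serial framing, so three assumptions are made here: the reference equals the 3.3 V microcontroller supply, each sample is sent as two bytes, and each byte takes ten bits on the wire (one start bit, eight data bits, one stop bit). Packet headers and checksums are ignored.

```python
# Figures taken from the text
ADC_BITS = 12
N_TAXELS = 25
FRAME_RATE_HZ = 500
BAUD_RATE_BPS = 1_000_000

# Assumptions (not stated in the text)
V_REF = 3.3                  # A/D reference assumed equal to the 3.3 V supply
BYTES_PER_SAMPLE = 2         # one 12-bit sample packed into two bytes
WIRE_BITS_PER_BYTE = 10      # start bit + 8 data bits + stop bit

# Voltage corresponding to one least significant bit, in millivolts
lsb_mv = V_REF / 2 ** ADC_BITS * 1000.0

# Bits on the wire for one complete tactile map
bits_per_frame = N_TAXELS * BYTES_PER_SAMPLE * WIRE_BITS_PER_BYTE

# Bit rate needed to stream every frame, and fraction of the link it occupies
required_bit_rate = bits_per_frame * FRAME_RATE_HZ
link_utilization = required_bit_rate / BAUD_RATE_BPS

print(f"LSB: {lsb_mv:.3f} mV")
print(f"required bit rate: {required_bit_rate} bit/s")
print(f"link utilization: {link_utilization:.0%}")
```

Under these assumptions the raw tactile data occupy about a quarter of the link capacity, which leaves margin for framing overhead and acquisition time.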

6.3.2 Tactile Sensor Mechanics

The deformable pad can be designed with different shapes according to the scenario at hand. When the task execution requires the reconstruction of contact forces and torques, a domed soft pad associated with a suitable calibration procedure is preferred to a flat pad [17]. In the REFILLS project, the pad has a hemispherical contact surface with a curvature radius of 25 mm and a bottom side where suitable cells are realized, with white reflective surfaces perfectly aligned with the photo-reflectors and black walls that optically separate the sensing points. The cells and the walls have also been designed to interlock perfectly with the rigid grid. The pad is made of silicone because of its good properties in terms of elasticity, repeatability and low hysteresis. The need to combine black and white parts required the use of a silicone molding technique. Hence, the mold parts, shown in the top-right part of Fig. 6.3, have been produced from the pad CAD drawing using PolyJet 3D printing technology with 25 µm layers. The silicone has been poured into the molds in layers of different colors by using a pneumatic fluid dispenser. The PRO-LASTIX silicone manufactured by PROCHIMA, with 20 Shore A hardness, has been used. Some pictures of the realized pad are shown in Fig. 6.4.

Fig. 6.4 Details of the mechanical parts of the finger

Unlike in [17], where mechanical pins were used, the rigid grid has been designed to exploit the edges of the sensing PCB for the optical and mechanical alignment between the sensing points and the reflective surfaces of the soft pad. As shown in the CAD design in the central part of Fig. 6.3, the grid has protruding edges on three sides, which allow a perfect interlocking between the rigid grid and the sensing PCB. The thickness of the grid is about 500 µm; it has been chosen to guarantee that the photo-reflectors work in a monotonic range. Indeed, as detailed in [17], the optoelectronic components are characterized by a non-monotonic response if the reflective surfaces come closer than 300 µm. The dimensions of the grid have to guarantee that the reflective surfaces cannot reach this distance even if the pad is subject to a large deformation.
The upper side of the grid has grooves suitably designed to improve the robustness of the interlocking between the grid and the bottom side of the deformable pad. The rigid grid has been manufactured in black ABS plastic by using the same PolyJet 3D printing technology used for mold manufacturing.

The grid has been mechanically connected with both the PCB and the pad by using a cyanoacrylate-based glue. A picture of the grid is shown in the top-left side of Fig. 6.4. The assembly of the sensorized finger has been completed with the design of a case for housing the tactile sensor. The case presents a bottom part to house the PCBs and a top part used to close the case and to block the edges of the soft pad. The two parts are mechanically fixed by using screws positioned all around the case edges. The case presents, at the base, two holes for fixing the assembled finger to the WSG-50 gripper flange. It has been manufactured in nylon by using the Multi Jet Fusion 3D printing technology, with 60 µm layers. The whole assembled sensor is shown in the bottom-right side of Fig. 6.4.

6.3.3 Sensor Calibration

After a calibration procedure, the tactile sensor is able to estimate the contact wrench at the fingertip. This is done by finding the mapping from the tactile voltages to the contact wrench that caused the sensor deformation. The calibration algorithm is described in detail in [17] and is based on a feed-forward neural network (FF-NN). Such an approach needs a good training set, so data collection is of paramount importance. In [17] the data were collected manually by a human operator with the visual aid of a specifically designed GUI. Recently, a robotized calibration setup has been realized, as depicted in Fig. 6.5. The tactile sensor is mounted on a reference six-axis force/torque sensor (ATI Nano43), and the wrench measured by the reference sensor and the raw tactile data are collected simultaneously. The objective is to estimate the wrench in all possible combinations over a large interval of contact plane orientations. The dimensionality of the problem is large, so there is a significant risk of missing wrench/orientation combinations in the training set. The Meca500 robot is programmed to apply all the desired force/torque combinations on the deformable layer with various orientations. The robot is force controlled by directly using the measurements of the Nano43 sensor.

Fig. 6.5 Automatic calibration setup

The robot has to apply every possible wrench at every possible point of the soft pad. Given the soft pad shape and the normal force range, the contact point position and the normal force are sampled according to a grid with 12 samples for the normal force and 22 samples for the contact point position. Then, at fixed normal force, the tangential force and torsional moment values that can be applied lie inside an approximately elliptical surface called the Limit Surface (LS) (see Sect. 6.4.1). Instead of sampling the whole region inside the LS, the proposed approach is to sample the LS itself, defining a set of wrenches $P_{LS_1}, P_{LS_2}, \ldots$ along the LS (see Fig. 6.7). The robot applies to the soft pad the wrenches defined by these points, alternated with the wrench $P_0$ (defined by zero tangential force and zero torsional moment). In other words, the robot applies the sequence $P_0, P_{LS_1}, P_0, P_{LS_2}, \ldots$, alternating pushing actions and retreat motions. In this way, the robot also applies wrenches inside the LS during the transients between set-points. For each sample, all six components of the wrench measured by the Nano43 are recorded. Finally, the data are used to train an FF-NN. The network has six hidden layers, each composed of 90 neurons with a sigmoidal activation function, whereas the output layer has six neurons with a linear activation function.
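To make the size of the calibration network concrete, the sketch below builds a network with the layer structure stated above and evaluates one forward pass. It illustrates the architecture only, not the trained model of [17]. Three points are assumptions: the input consists of the 25 raw taxel voltages, the sigmoidal activation is taken to be the logistic function, and the weights are random placeholders, since training is not shown.

```python
import math
import random

N_TAXELS = 25        # assumption: one network input per taxel voltage
HIDDEN_LAYERS = 6    # from the text
HIDDEN_UNITS = 90    # from the text
WRENCH_DIM = 6       # three force and three moment components


def init_network(rng):
    """Create (weights, biases) pairs with small random placeholder weights."""
    sizes = [N_TAXELS] + [HIDDEN_UNITS] * HIDDEN_LAYERS + [WRENCH_DIM]
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def sigmoid(x):
    """Logistic function, assumed here as the sigmoidal activation."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(layers, voltages):
    """Map a tactile map (list of voltages) to a six-component wrench estimate."""
    activation = list(voltages)
    for index, (weights, biases) in enumerate(layers):
        pre = [sum(w * a for w, a in zip(row, activation)) + b
               for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        # Hidden layers are sigmoidal, the output layer is linear
        activation = pre if is_output else [sigmoid(p) for p in pre]
    return activation


def count_parameters(layers):
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


network = init_network(random.Random(0))
wrench = forward(network, [0.5] * N_TAXELS)
n_parameters = count_parameters(network)
```

With 25 inputs this structure has 43,836 trainable parameters, which gives an idea of why a dense and well-distributed training set, such as the one produced by the robotized setup, is needed.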

6.4 Reactive Control System

The reactive control system includes two modules, suitably activated by the planner. The first one controls the grasp force and has two control modalities: the slipping avoidance mode, usually adopted while the product is being fetched toward the shelf layer to avoid dropping the object, and the pivoting mode, usually activated by the planner to change the grasp configuration so as to accomplish the replenishment task. The second module is a visual servoing controller, used only during the grasping phase to achieve a successful grasp with the positioning accuracy required by the in-hand maneuvers.

6.4.1 Grasp Control

The grasp controller is designed by resorting to a model-based technique. The motion of the object between the fingers of a parallel gripper is described by the dynamics of a planar slider (see Fig. 6.6) subject to external tangential $f_t$ and torsional $\tau_n$ loads, and to friction forces over a distributed contact area, namely the one below the soft pad of the tactile sensor. The instantaneous motion is described as a rotation with angular velocity $\omega$ about the instantaneous Center of Rotation (CoR). Assuming hemispherical soft pads with axisymmetric pressure distributions, the relation between the tangential and torsional loads and the instantaneous motion of the object can be described by a single-degree-of-freedom system through the well-known Limit Surface (LS) concept [23, 24]. The LS, which generalizes the Coulomb friction law to the case of coupled rotations and translations, is the locus of the maximum tangential and torsional friction loads that the contact can withstand before a sliding motion starts. Figure 6.7 shows a typical LS in the wrench plane $(f_t, \tau_n)$. If the wrench representing the external load is inside the blue area, no slippage occurs; otherwise, the object slips. As expected, the area can be controlled by acting on the grasp force $f_n$, i.e., it can be enlarged by increasing $f_n$. In fact, the limit values $f_{t_{\max}}$ and $\tau_{n_{\max}}$ are related to $f_n$ as

$$f_{t_{\max}} = \mu f_n, \qquad \tau_{n_{\max}} = \mu \xi \delta f_n^{\gamma+1}, \tag{6.1}$$

where $\mu$ is the friction coefficient, which depends on the object surface, $\xi$ depends on the pressure distribution and varies between $3\pi/16$ and $2/3$ for Hertzian and uniform pressure distributions, respectively [24], while $\delta$ and $\gamma$ are parameters that relate the radius of the circular contact area $\rho = \delta f_n^{\gamma}$ to the normal force [38] and depend on the soft pad only.

Fig. 6.6 Schematic view of a planar slider (Reprinted from [9] with permission from Elsevier)

Fig. 6.7 Limit surface: the black line represents the maximum external load that can be applied to the object before it slides. The blue area can be enlarged by increasing the grasp force

Given an instantaneous sliding motion, i.e., given the position $c$ of the CoR with respect to the Center of Pressure (CoP), the corresponding friction load can be numerically computed or analytically approximated, by a superposition of basis functions,
as explained in [9], with the two functions $f_{t_{LS}}(c)$ and $\tau_{n_{LS}}(c)$. Such a friction load corresponds to a point on the LS. Varying the value of $c$ allows reconstructing the whole LS. Moreover, in view of the load motion inequality [23], $c$ and $\tau_n$ always have opposite signs.

The friction wrench computed through the LS method takes into account only the direction of the slipping velocity and not its magnitude, whereas the total friction depends on the velocity because it includes not only dry friction but also viscous friction. The LS method can be extended to consider viscous friction too, as explained in detail in [9]. The viscous friction, as a function of the CoR position, can be obtained by integrating over the contact area the viscous friction generated by an infinitesimal area element that rotates about the CoR with velocity $\omega$. Such friction depends on the viscous friction coefficient per unit area $\beta_A$, assumed uniform over the contact area [34]. This gives the following expressions for the viscous friction force $f_{t_v}$ and torque $\tau_{n_v}$ [9]:

$$f_{t_v} = \pi \beta_A \delta^2 f_n^{2\gamma}\, \omega c, \tag{6.2}$$

$$\tau_{n_v} = -\frac{\pi}{2} \beta_A \delta^4 f_n^{4\gamma}\, \omega. \tag{6.3}$$

Typically, the LS is used to compute the friction force and torque given the CoR position. Since the grasp control strategy is based on the dynamic model describing this instantaneous motion, the LS as a function of $c$ is instead inverted, given the friction force $f_t$ and torque $\tau_n$ measured by the tactile sensor. The inversion algorithm is reported in [9]; it needs an estimate of the friction parameters $\mu$, $\delta$, $\xi$ and $\gamma$, which can be obtained experimentally through the procedure detailed in [8]. The dynamic model describing the sliding motion with friction is based on the well-known LuGre model [2], used to capture the dependence of the break-away force on the rate of variation of the external load, and it is

$$\dot{z} = \omega - \frac{\sigma_0}{g(f_n, c)}\, z|\omega|, \tag{6.4}$$

$$J \dot{\omega} = -\sigma_1(f_n, c)\,\omega - \sigma_0 z + \tau_e, \tag{6.5}$$

where $z$ is the LuGre state variable, which represents the displacement of the micro-asperities of the surfaces in contact, $\sigma_0$ is the so-called asperity stiffness, $J$ is the moment of inertia of the slider about the CoR, $\tau_e$ is the external torque acting on the slider, $\sigma_0 z$ represents the dry friction torque, and $\sigma_1(f_n, c)$ and $g(f_n, c)$ are the viscous friction coefficient function and the maximum dry friction torque function, respectively, both depending on the grasp force and the CoR position. The total viscous friction torque about the CoR axis, in view of (6.2) and (6.3), is

$$-\sigma_1(f_n, c)\,\omega = \tau_{n_v} - c f_{t_v} = -\frac{\pi}{2} \beta_A \delta^2 f_n^{2\gamma} \left(\delta^2 f_n^{2\gamma} + 2c^2\right) \omega. \tag{6.6}$$
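The step from (6.2) and (6.3) to (6.6) can be checked numerically. The short Python sketch below evaluates the viscous force and torque, builds the coefficient $\sigma_1(f_n, c)$ from the closed form in (6.6), and compares the two sides of the identity. The function names and all numerical parameter values are placeholders chosen only for illustration; they are not the identified parameters of the REFILLS sensor.

```python
import math


def contact_radius(f_n, delta, gamma):
    """Radius of the circular contact area, rho = delta * f_n**gamma."""
    return delta * f_n ** gamma


def viscous_force(f_n, omega, c, beta_a, delta, gamma):
    """Viscous friction force, Eq. (6.2)."""
    rho = contact_radius(f_n, delta, gamma)
    return math.pi * beta_a * rho ** 2 * omega * c


def viscous_torque(f_n, omega, beta_a, delta, gamma):
    """Viscous friction torque, Eq. (6.3)."""
    rho = contact_radius(f_n, delta, gamma)
    return -0.5 * math.pi * beta_a * rho ** 4 * omega


def sigma1(f_n, c, beta_a, delta, gamma):
    """Viscous friction coefficient about the CoR axis, from Eq. (6.6)."""
    rho_sq = contact_radius(f_n, delta, gamma) ** 2
    return 0.5 * math.pi * beta_a * rho_sq * (rho_sq + 2.0 * c ** 2)


# Placeholder values (assumptions, not identified sensor parameters)
params = dict(beta_a=1.0e3, delta=4.0e-3, gamma=0.3)
f_n, c, omega = 5.0, 2.0e-3, 1.5

left_side = -sigma1(f_n, c, **params) * omega
right_side = (viscous_torque(f_n, omega, **params)
              - c * viscous_force(f_n, omega, c, **params))
identity_holds = math.isclose(left_side, right_side, rel_tol=1e-9)
```

The check also makes explicit that $\sigma_1$ grows with the grasp force through the contact radius and with the squared distance of the CoR from the CoP.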


The maximum dry friction torque $g(f_n, c)$ corresponds to the LS point $(f_{t_{LS}}, \tau_{n_{LS}})$ transformed into a pure torque about the CoR axis, and it is

$$g(f_n, c) = |\tau_{n_{LS}}| + |c\, f_{t_{LS}}|. \tag{6.7}$$

This dynamic system has a number of properties, originally analyzed in [6] for the case of a constant friction coefficient $\sigma_1$ and extended in [9] to the case of a viscous friction coefficient depending on the grasp force. In particular, the system is strictly passive, the state $(z, \omega)$ always belongs to a bounded set as long as $|\tau_e| < g(f_n, c)$, and it is locally observable, given the output function

$$y = h(\omega, z) = \sigma_0 z + \sigma_1(f_n, c)\,\omega, \tag{6.8}$$

that is, the friction torque measured by the sensor with respect to the CoR axis, which can be computed from the measured friction wrench as

$$y = \tau_n - c f_t. \tag{6.9}$$
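The qualitative behavior of the model (6.4), (6.5) can be reproduced with a few lines of Python. The sketch below integrates the two equations with a forward Euler scheme for a constant external torque, once below and once above the maximum dry friction torque. It is a simplified illustration: $g$ and $\sigma_1$ are held constant (i.e., the grasp force and the CoR position are assumed fixed), and all numerical values are placeholders chosen to keep the integration well conditioned, not identified parameters.

```python
def simulate_slider(tau_e, *, sigma0=10.0, sigma1=0.01, g=0.05,
                    inertia=1e-3, dt=1e-4, duration=2.0):
    """Integrate Eqs. (6.4)-(6.5) with forward Euler, starting from rest.

    g and sigma1 are kept constant here (assumption: fixed grasp force
    and CoR position). Returns the final state (z, omega).
    """
    z = 0.0
    omega = 0.0
    for _ in range(int(round(duration / dt))):
        z_dot = omega - sigma0 * z * abs(omega) / g               # Eq. (6.4)
        omega_dot = (-sigma1 * omega - sigma0 * z + tau_e) / inertia  # Eq. (6.5)
        z += dt * z_dot
        omega += dt * omega_dot
    return z, omega


# External torque below the maximum dry friction torque g: the slider sticks
z_stick, omega_stick = simulate_slider(0.03)

# External torque above g: the slider keeps rotating
z_slip, omega_slip = simulate_slider(0.08)
```

With the torque below $g$, the velocity decays to zero and the dry friction torque $\sigma_0 z$ balances the load. With the torque above $g$, the dry friction saturates at $g$ and the slider settles at the velocity $(\tau_e - g)/\sigma_1$, which is what the grasp controller has to prevent in slipping avoidance mode.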

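In view of the observability, the angular sliding velocity $\omega$ can be estimated from the measured contact wrench through the nonlinear observer

$$\dot{\hat{\omega}} = l\left(-\sigma_0 \hat{z} - \sigma_1(f_n, c)\,\hat{\omega} + y\right), \quad l > 0, \tag{6.10}$$

$$\dot{\hat{z}} = \hat{\omega} - \frac{\sigma_0}{g(f_n, c)}\, \hat{z}|\hat{\omega}|, \tag{6.11}$$

$$\hat{y} = \sigma_0 \hat{z} + \sigma_1(f_n, c)\,\hat{\omega}, \tag{6.12}$$

where the hat denotes estimated quantities.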

The structure of the observer is the same as that of the original dynamic system (6.4), (6.5), with the measured output $y$ playing the same role as the external torque $\tau_e$, but with a moment of inertia equal to the reciprocal of the observer gain $l$. The higher the gain, the faster the convergence. To assess the observer performance, a simple experiment has been executed by grasping an object with a decreasing normal force until a rotational sliding starts. The estimated angular velocity is compared with a ground-truth velocity measured by the gyroscope of an IMU. Figure 6.8 shows that the two signals are quite close, such that the error on the estimated rotation angle is below 5 deg. This observer outperforms the observer with constant viscous friction coefficient originally proposed in [14].

The model-based grasp controller can work in two modalities, the slipping avoidance mode and the pivoting mode, which have to be selected by the high-level planner, as shown in Fig. 6.9; the planner is also in charge of setting the control parameters. The control algorithm of each modality computes a grasp force reference, which is then tracked by a low-level force controller acting on the velocity of the gripper fingers.
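As an illustration of how the observer (6.10) to (6.12) recovers the sliding velocity from the friction torque alone, the following Python sketch simulates the slider model together with the observer. It is a toy simulation under simplifying assumptions, not the experiment of Fig. 6.8: $g$ and $\sigma_1$ are constant, the measurement $y$ is noise-free and generated by the simulated model itself through (6.8), and all numerical values, including the observer gain, are placeholders.

```python
def run_observer_demo(*, tau_e=0.08, sigma0=10.0, sigma1=0.01, g=0.05,
                      inertia=1e-3, gain=5000.0, dt=1e-4, duration=1.0):
    """Simulate the slider (6.4)-(6.5) and the observer (6.10)-(6.11).

    Assumptions: constant g and sigma1, noise-free output. Returns the
    true and the estimated angular velocity at the end of the run.
    """
    z = omega = 0.0            # state of the simulated slider
    z_hat = omega_hat = 0.0    # observer state
    for _ in range(int(round(duration / dt))):
        # Friction torque seen by the sensor, Eq. (6.8)
        y = sigma0 * z + sigma1 * omega

        # Slider dynamics, Eqs. (6.4)-(6.5)
        z_dot = omega - sigma0 * z * abs(omega) / g
        omega_dot = (-sigma1 * omega - sigma0 * z + tau_e) / inertia

        # Observer dynamics, Eqs. (6.10)-(6.11): driven only by y
        omega_hat_dot = gain * (-sigma0 * z_hat - sigma1 * omega_hat + y)
        z_hat_dot = omega_hat - sigma0 * z_hat * abs(omega_hat) / g

        # Forward Euler update of both systems
        z += dt * z_dot
        omega += dt * omega_dot
        z_hat += dt * z_hat_dot
        omega_hat += dt * omega_hat_dot
    return omega, omega_hat


omega_true, omega_estimated = run_observer_demo()
```

Note that the observer never uses the external torque or the true velocity: it is driven only by the measured friction torque. Raising `gain` shortens the estimation lag, in agreement with the remark that the reciprocal of the gain acts as the observer inertia, but in a discrete-time implementation the gain has to be kept compatible with the integration step.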


Fig. 6.8 Observer evaluation. Top plot: observer and IMU velocity, $\hat{\omega}$ and $\omega$, respectively. Bottom plot: corresponding angular positions $\hat{\theta}$ and $\theta$. (Reprinted from [9] with permission from Elsevier)

Fig. 6.9 Block scheme of the grasp controller

In the slipping avoidance modality, based on a strategy originally proposed in [13] where a Kalman filter was used to detect the slipping, the grasp force reference is computed as the superposition of two contributions, i.e.,

$$f_n = k_s f_{n_s} + f_{n_d}, \tag{6.13}$$

where $f_{n_s}$ and $f_{n_d}$ are the so-called static and dynamic contributions, respectively. The algorithm to compute the static contribution, reported in [9], starts from the measured tangential load $f_t$ and torsional load $\tau_n$ and uses the estimate of the CoR position to compute the normal force that lets the LS pass through the point $(f_t, \tau_n)$. The safety gain $k_s$ is slightly greater than 1, so as to keep the load slightly inside the LS without an excessive grasp force.


Fig. 6.10 Slipping avoidance test: before (left), during (center), and after the lift (right) (Reprinted from [9] with permission from Elsevier)

The static contribution alone can avoid slippage only in static or quasi-static conditions. In fact, it is well known that, if the load is time-varying, the friction that the contact can withstand decreases as the rate of variation of the load increases [32], a dynamic effect captured by the LuGre model. For this reason, the dynamic contribution $f_{n_d}$ is needed; it is computed by means of the estimated velocity $\hat{\omega}$ as

$$f_{n_d} = |C_d \hat{\omega}|, \tag{6.14}$$

where $C_d$ is a suitable linear differential operator and the absolute value is needed to ensure that $f_{n_d} \geq 0$. More details about the slipping avoidance design and the stability of the closed-loop system are available in [6].

The slipping avoidance mode has been tested in a simple task, where the robot has to lift an object of unknown weight, subject to the torsional load due to gravity, while avoiding any slippage (see Fig. 6.10). The results are shown in Fig. 6.11. As soon as the robot starts lifting the object, the measured torque increases up to about 0.04 Nm. The estimated velocity has two peaks and the torque rises more or less gradually. This is due to a slight initial rotation that keeps the object in partial contact with the table (Fig. 6.10-center). The first velocity peak corresponds to the actual start of the robot motion; the second one happens when the object completely leaves the table. As shown in Fig. 6.10-right, the robot is able to correctly lift the object with a negligible rotational slippage.

The pivoting control modality is exploited to perform the in-hand manipulation maneuvers called object pivoting and gripper pivoting. Object pivoting consists in keeping the gripper (and in turn the fingers) fixed in space while changing the orientation of the object, as represented in Fig. 6.12-left, until the object reaches a vertical pose (i.e., the line connecting the grasp point and the CoG is aligned with gravity). Gripper pivoting is the dual maneuver and can be executed when the object is already in a vertical pose (e.g., after an object pivoting). It consists in keeping the object orientation fixed in space while changing the orientation of the gripper by a rotation about the axis of the actuation direction, so as to change the grasp configuration (Fig. 6.12-right). Both maneuvers can be accomplished by regulating and keeping the grasping force to the value that avoids the translational slippage


Fig. 6.11 Slipping avoidance test. Top plot: normal force (left axis) and measured torque (right axis). Bottom plot: estimated slipping velocity (Reprinted from [9] with permission from Elsevier)

Fig. 6.12 Sketch of object pivoting (left) and gripper pivoting (right) maneuvers. (Left figure reprinted from [9] with permission from Elsevier)

while allowing the rotational one. Thus, both maneuvers can be considered as a unique gripper control modality. The pivoting algorithm was originally presented in [15] and consists in applying only the static contribution of the slipping avoidance algorithm, $k_s f_{n_s}$, by considering only the translational loads and not the torsional ones. This is equivalent to considering a CoR position $c \to \infty$. This can be achieved by reducing the grasp force from its current value to the pivoting grasp force, i.e.,

$$f_{n_P} = k_s \frac{f_t}{\mu}. \tag{6.15}$$

The reduction of the grasp force is done with an exponential decay characterized by a given time constant.
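A minimal sketch of this force reduction is given below. It evaluates (6.15) and a first-order exponential transition from the current grasp force to the pivoting grasp force. The function names, the default value of the safety gain, and the specific form $f(t) = f_{n_P} + (f_0 - f_{n_P})\,e^{-t/T}$ are assumptions for illustration; the text only states that the decay is exponential with a given time constant.

```python
import math


def pivoting_grasp_force(f_t: float, mu: float, k_s: float = 1.1) -> float:
    """Pivoting grasp force of Eq. (6.15): k_s * f_t / mu.

    f_t is treated as the magnitude of the tangential load; the default
    k_s = 1.1 is an illustrative value, not one reported in the chapter.
    """
    if mu <= 0.0:
        raise ValueError("friction coefficient must be positive")
    return k_s * abs(f_t) / mu


def decayed_force(f_start: float, f_target: float, t: float, time_constant: float) -> float:
    """Exponential transition from f_start (at t = 0) towards f_target."""
    if time_constant <= 0.0:
        raise ValueError("time constant must be positive")
    return f_target + (f_start - f_target) * math.exp(-t / time_constant)
```

For example, with a current grasp force of 10 N, `decayed_force(10.0, pivoting_grasp_force(2.0, 0.5), t, 0.2)` starts at 10 N for `t = 0` and approaches the pivoting force as `t` grows.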


6.4.2 Visual Servoing

To execute the pivoting maneuvers, the grasp pose should be above the CoG, on a vertical line passing through it, with an accuracy of a couple of millimeters. For this reason, the object pose estimated by RoboSherlock is adjusted by a closed-loop image-based visual servoing algorithm. The visual controller is based on the ViSP library and was originally presented in [11] and detailed in [16]. The RealSense D435i RGB-D camera has been arranged in an eye-in-hand configuration (see Fig. 6.2). For each possible grasp configuration, we acquire offline the corresponding target image. At run-time, the algorithm compares the target image and the actual one to provide the robot velocity command. A keypoint matching algorithm available in the ViSP library identifies 3D feature points matched between the current and target images. The objective of the visual servoing algorithm is to minimize the error between the target features $s^*$ and the actual ones $s$,

$$e(t) = s(I(t)) - s^*(t), \tag{6.16}$$

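where $I(t)$ is the actual image acquired by the camera. The vectors $s, s^* \in \mathbb{R}^{3N_f}$, where $N_f$ is the number of features, contain the 3D locations of the actual and target features $p_i$ and $p_i^*$, respectively, thus

$$s = \begin{bmatrix} p_1^\top & p_2^\top & \dots & p_{N_f}^\top \end{bmatrix}^\top \tag{6.17}$$

$$s^* = \begin{bmatrix} p_1^{*\top} & p_2^{*\top} & \dots & p_{N_f}^{*\top} \end{bmatrix}^\top. \tag{6.18}$$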

The visual servoing controller minimizes the error (6.16) by controlling the camera 6D twist $v_c$ with the law

$$v_c = -\lambda L(t)^\dagger e(t), \tag{6.19}$$

where $L$ is the so-called interaction matrix, $\cdot^\dagger$ is the Moore-Penrose pseudoinverse, and $\lambda > 0$ is the control gain [29]. Typically, $s^*(t) = \bar{s}$ is constant, which corresponds to applying a step reference to the control algorithm. Usually, to ensure stability with the higher initial error, an adaptive gain $\lambda$ is adopted such that it is lower when the error is high and increases as the error declines. To speed up the motion, [16] proposes to use a higher constant gain while applying a time-varying reference $s^*(t)$ obtained by interpolating each feature between the initial value $s(I(0))$ and the target one $\bar{s}$, while ensuring that the generated reference corresponds to a rigid body transformation. A typical behaviour obtained by adopting this method is reported in Fig. 6.13, where the feature error norm has a bell-shaped evolution due to the interpolation. The desired positioning accuracy is achieved, as shown by the steady-state error lower than 2 mm.
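To make the notion of an adaptive gain concrete, the sketch below stacks the feature error of (6.16) for a list of 3D points and evaluates one possible exponential gain schedule that is low for large errors and grows as the error norm shrinks. The schedule, the function names, and the parameters `gain_zero`, `gain_inf` and `slope_zero` are assumptions chosen for illustration; they are not the settings used in [16], and the pseudoinverse step of (6.19) is deliberately left out.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def feature_error(actual: Sequence[Point3], target: Sequence[Point3]) -> List[float]:
    """Stacked error e = s - s* for matched 3D feature points."""
    if len(actual) != len(target):
        raise ValueError("actual and target must contain the same number of features")
    return [a - t for p, q in zip(actual, target) for a, t in zip(p, q)]


def error_norm(e: Sequence[float]) -> float:
    """Euclidean norm of the stacked feature error."""
    return math.sqrt(sum(v * v for v in e))


def adaptive_gain(err_norm: float, gain_zero: float = 2.0,
                  gain_inf: float = 0.4, slope_zero: float = 30.0) -> float:
    """Gain that equals gain_zero at zero error and tends to gain_inf for large errors.

    Assumes gain_zero > gain_inf, so the gain decreases monotonically with the error.
    """
    span = gain_zero - gain_inf
    return span * math.exp(-slope_zero * err_norm / span) + gain_inf
```

With these illustrative defaults, a large initial error yields a gain close to 0.4, and the gain rises towards 2.0 as the features converge.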


Fig. 6.13 Visual servoing experiment: evolution of the feature error norm (top) and corresponding camera velocity (bottom)

6.5 Plan Language for In-Store Manipulation Activities

This section briefly describes the high-level plan executive CRAM that was used for in-store manipulation activities, as well as how the pivoting capability of the low-level controller is integrated. Plans in CRAM are written in a domain-specific robot programming language, referred to as the plan language in this chapter. The plan language contains the following components:

• a vocabulary of executable action descriptions,
• the perform operator, which is used in the plan for executing an action description,
• operators for stating concurrent-reactive robot behaviors, and
• language constructs for failure handling and recovery strategy development.

Using the perform operator within the plan, one can execute a symbolic action description on the robot. An example of a call to perform a searching action for a specific object in a known, symbolically described location is shown below:

    (perform (an action
                 (type searching)
                 (object (an object
                             (type denkmit-dishwasher-tabs-nature)
                             (location (a location
                                           (on (an object
                                                   (type shelf)
                                                   (name Shelf_XCBZVSEM)
                                                   (tier 3))))))))))

Performing an action is a three-step process in our system. In the first step, all the missing parameters of the action are inferred by querying the knowledge base. Examples of such parameters are:


• a location of where to find a specific object in the store,
• a motion plan for grasping the object,
• a location where to position the robot’s base as well as the corresponding neck configuration for perceiving a certain location without occlusions.

In the second step of performing an action, a plan that is associated with the given action is extracted from the plan library. Finally, the third step is the execution of the retrieved plan with the inferred motion parameters.

For writing concurrent-reactive behaviors on the robot, the plan language contains a number of operators:

• par executes two or more actions in parallel (in separate threads) until all of them finish;
• pursue executes multiple threads in parallel until any of them finishes, at which point the other threads are terminated;
• partial-order enforces that particular threads start only after other specified ones have finished.

The plan language also supports the concept of fluents, which are useful for monitoring the state of the robot or the environment and for reacting to changes thereof. Last but not least, the language offers operators for failure handling:

• with-failure-handling can catch and process failures across multiple threads,
• with-retry-counters can be used to retry the same part of the plan multiple times until a retry counter is exhausted.
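The semantics of par, pursue and retry counters can be mimicked in a few lines of Python with asyncio. This is only an analogy to clarify the behavior described above: the function names, the use of coroutines instead of threads, and the choice of `RuntimeError` as the failure type are assumptions of this sketch and do not reflect how CRAM implements these operators.

```python
import asyncio


async def par(*coros):
    """Run all coroutines concurrently and wait until every one has finished."""
    return await asyncio.gather(*coros)


async def pursue(*coros):
    """Run all coroutines concurrently; once one finishes, cancel the others."""
    tasks = [asyncio.ensure_future(c) for c in coros]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Let the cancelled tasks unwind before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()


async def with_retry_counter(make_action, retries: int):
    """Retry a failing action until it succeeds or the counter is exhausted."""
    attempts_left = retries
    while True:
        try:
            return await make_action()
        except RuntimeError:
            if attempts_left == 0:
                raise
            attempts_left -= 1


async def act(name: str, delay: float) -> str:
    """Hypothetical stand-in for a robot action that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return name


async def demo():
    both = await par(act("moving-arm", 0.01), act("moving-neck", 0.02))
    first = await pursue(act("monitor-slippage", 1.0), act("lift", 0.01))
    return both, first
```

Running `asyncio.run(demo())` waits for both actions in the par call, whereas the pursue call returns as soon as the short action ends and cancels the long-running monitor.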

6.5.1 Library of Generalized Fetch and Deliver Plans for In-Store Manipulation

In-store manipulation actions are the actions that enable the robot to search, fetch and deliver objects in the given shelf layer or at intermediate locations, such as a robot’s tray for transporting objects. The plan library is built up in a hierarchical way. On the lowest level of the hierarchy, plans for executing atomic actions are implemented. Atomic actions can be directly mapped to a specific motion or perception of the robot. The main atomic actions in the plan library are:

• moving-arm, which is performed through a Cartesian or joint action of the arm;
• moving-neck, which in the case of our mobile manipulation platform is executed with the same motion as moving-arm, as our camera is mounted on the wrist of the arm;
• going, which is executed by our navigation controller;
• detecting, which invokes the perception system;
• grasping, which commands the reactive controller to initiate the grasping, which is achieved without any knowledge about the object weight.


Fig. 6.14 Timeline of a plan for transporting an object from a source location to a destination location in an in-store manipulation domain

On top of these atomic actions, higher-level actions are implemented, which are executed by performing a set of lower-level actions in a sequential or concurrent manner, in accordance with the language constructs in the plan. Examples of these are:

• navigating, which navigates the robot to the given target location through a call to perform a going action, while additionally ensuring that the arm is parked before moving the base, to avoid unnecessary collisions;
• perceiving, which moves the neck to a location from where an object can be perceived, and invokes the perception system by calling the detecting action;
• picking-up, which is a combination of calls to open the gripper, move the arm to reach the object, close the gripper around the object, and retract the arm, while concurrently monitoring the gripper for object slippage;
• placing, which is the reverse of picking-up with slight variations.

Finally, the top-level actions of the plan library are:

• searching, which navigates the robot to a pose from which it can perceive the object’s likely location without occlusions, moves the neck to position the camera accordingly, calls the perception system to find the exact coordinates of the object, and contains various failure handling strategies to, e.g., handle the situation when the object is not in the field of view (by repositioning the base or moving the neck);
• fetching, which, given a concrete pose of the object, navigates the robot to a location from which it can reach the object without obstruction, and then picks the object up;
• delivering, which, assuming that the object is already in the hand, navigates towards the destination location, perceives the shelf level where the object is supposed to be placed, and performs the placing action.

Figure 6.14 shows the timeline of a general plan for transporting an object from location A to location B, using the plans for performing actions from the plan library for in-store manipulation.
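The hierarchical structure of the plan library and the three-step perform process can be illustrated with a strongly simplified Python sketch. Everything here is hypothetical: the dictionary-based action descriptions, the `KNOWLEDGE_BASE` defaults, the `plan` decorator and the log strings are stand-ins invented for the example, and the picking-up plan omits gripper opening, closing and slippage monitoring.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the knowledge base: defaults used to fill in
# parameters that an action description leaves out.
KNOWLEDGE_BASE: Dict[str, dict] = {"navigating": {"park_arm": True}}
PLAN_LIBRARY: Dict[str, Callable[[dict, List[str]], None]] = {}


def plan(action_type: str):
    """Register a function as the plan for one action type."""
    def register(fn):
        PLAN_LIBRARY[action_type] = fn
        return fn
    return register


def perform(action: dict, log: List[str]) -> None:
    # Step 1: infer missing parameters (here: merge in knowledge-base defaults).
    resolved = {**KNOWLEDGE_BASE.get(action["type"], {}), **action}
    # Step 2: retrieve the plan associated with the action type.
    plan_fn = PLAN_LIBRARY[resolved["type"]]
    # Step 3: execute the retrieved plan with the resolved parameters.
    plan_fn(resolved, log)


# Atomic actions map directly to one motion or perception call.
@plan("going")
def going(a, log):
    log.append(f"going:{a['target']}")


@plan("moving-arm")
def moving_arm(a, log):
    log.append(f"moving-arm:{a['goal']}")


@plan("grasping")
def grasping(a, log):
    log.append(f"grasping:{a['object']}")


# Higher-level actions are composed of lower-level ones.
@plan("navigating")
def navigating(a, log):
    if a["park_arm"]:  # park the arm before moving the base
        perform({"type": "moving-arm", "goal": "park"}, log)
    perform({"type": "going", "target": a["target"]}, log)


@plan("picking-up")
def picking_up(a, log):
    perform({"type": "moving-arm", "goal": a["object"]}, log)
    perform({"type": "grasping", "object": a["object"]}, log)
    perform({"type": "moving-arm", "goal": "retract"}, log)


# Top-level action built from the higher-level ones.
@plan("fetching")
def fetching(a, log):
    perform({"type": "navigating", "target": a["reach_pose"]}, log)
    perform({"type": "picking-up", "object": a["object"]}, log)
```

Calling `perform({"type": "fetching", "object": "balea", "reach_pose": "shelf-3"}, log)` appends the atomic actions to `log` in execution order, which makes the decomposition of a top-level action easy to inspect.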


6.5.2 Integration of the Reactive Controller

The reactive controller is integrated at two points in the system. First, the motion planner needs to be aware of the pivoting capability and be able to utilize it. Second, the plan executive needs to parameterize and command the controller.

To model the pivoting capability in the motion planner, we added a virtual joint between a grasped object and the robot’s fingers. The motion planner Giskard was used; it follows a constraint- and optimization-based approach to generate motions, similar to [1]. It allows the user to specify geometric constraints on the robot. One such constraint is used to describe that the grasped object should stay vertical. It is implemented by minimizing the angle between the gravity vector and the vector pointing from the grasp point to the center of mass of the object. If that keep-vertical constraint is present, the motion planner will automatically use the virtual joint to rotate the gripper around the object, if needed. During motion planning, Giskard creates a full-body joint trajectory for the robot, including goals for the virtual joint. This approach has a negligible influence on the planning time, because only one additional constraint and one free variable for the virtual joint are needed. For comparison, a typical goal, such as moving the end effector to a specific Cartesian pose while avoiding collisions, requires approximately 100 constraints and 10 free variables for the robot’s degrees of freedom.

To execute such a trajectory, the plan executive CRAM splits it in two: one part with only the virtual joint, for the reactive controller, and one with the rest, for the robot. The joint trajectory for the robot is sent as usual. During the execution of that trajectory, the plan executive monitors the time and enables or disables the pivoting of the gripper if the velocity of the virtual joint at that time is above or below a small threshold, respectively. A small threshold is needed to avoid switching too often due to numerical noise. That threshold creates a small error in the tracking of the virtual joint trajectory. However, since the virtual joint is actuated by gravity, it corrects itself whenever the pivoting is turned on. More details on the proposed solution can be found in [18].
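The two ingredients of this integration can be sketched as follows: the angle minimized by the keep-vertical constraint, and the threshold test that switches the gripper between pivoting and slipping avoidance along a planned virtual-joint trajectory. The function names, the default gravity direction and the threshold value of 0.01 rad/s are assumptions for this example and are not taken from Giskard, CRAM or [18].

```python
import math
from typing import List, Sequence


def keep_vertical_angle(grasp_point: Sequence[float],
                        center_of_mass: Sequence[float],
                        gravity: Sequence[float] = (0.0, 0.0, -1.0)) -> float:
    """Angle (rad) between gravity and the vector from grasp point to center of mass."""
    v = [c - g for c, g in zip(center_of_mass, grasp_point)]
    norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in gravity))
    if norm == 0.0:
        raise ValueError("grasp point and center of mass must differ")
    cosine = sum(a * b for a, b in zip(v, gravity)) / norm
    # Clamp against rounding errors before taking the arc cosine.
    return math.acos(max(-1.0, min(1.0, cosine)))


def pivoting_schedule(virtual_joint_velocities: Sequence[float],
                      threshold: float = 0.01) -> List[bool]:
    """True where pivoting mode would be enabled, False for slipping avoidance."""
    return [abs(v) > threshold for v in virtual_joint_velocities]
```

An object hanging straight below the grasp point gives an angle of zero, and only trajectory samples whose virtual-joint speed exceeds the threshold enable pivoting, so small numerical noise around zero does not toggle the mode.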

6.6 Experience-Based Learning and Optimization of Manipulation Activities

This section describes how experience-based learning and optimization can be used to improve manipulation activities of the robot system in a retail store. In short, the semantic digital twins, as described in Chap. 2, are used to create a simulation training environment for the robot system. During the training, episodic memories are generated and used to optimize the parameters of actions in the mobile manipulation plan. In particular, Gaussian distributions are learned for the poses of the base of the robot that are predicted to result in a successful outcome of object perception,


grasping, and placing actions. The improved action parameters are then used to execute mobile manipulation actions in the real-world store.

6.6.1 Learning Pipeline

The experience-based learning and performance optimization pipeline is depicted in Fig. 6.15. The first step is the creation of a semantic digital twin as described in Chap. 2. The result of this step is an autonomously generated semantic digital twin, which includes the description of the shelves in the store and the items that are currently on the shelves. Additionally, by loading this map into our knowledge representation framework, KnowRob [4], the robot gains the capability to ask reasoning queries about the store, e.g., where are empty facings, where are misplaced objects, where do they belong, etc.

In the next step, the semantic digital twin is loaded into a fast robot plan simulation environment. A robot is spawned together with the shelves and the objects of a semantic digital twin into this environment. In this simulator, the robot performs a mobile pick and place CRAM plan in different variations each time: different objects are picked up by the robot and placed at different locations in different shelves, and the robot chooses a different pose to stand for perceiving the objects and for grasping

Fig. 6.15 Pipeline for experience based learning and performance optimization of manipulation activities


and placing them. The result of the simulation step is a collection of logs, so-called “episodic memories”, which the robot can use as training data. The advantages of using simulation for generating this data, as opposed to executing the actions in the real world, are that simulation does not require constant human supervision, there is no danger of the robot destroying the store items, and the simulator can be up to 50 times faster than real time [25].

In the third step, the generated episodic memories are used as a basis for training generative models for the parameters of mobile pick and place. The most important parameters of mobile pick and place are the following:

1. where to search for the object,
2. where to stand to perceive it,
3. how to position the camera such that it points at the likely location of the object,
4. where to stand to grasp the object,
5. from which side to grasp the object (aka grasping pose),
6. with which force to squeeze the object when grasping,
7. what is the trajectory for the arm to grasp the object from the given base location with the given grasp pose,
8. where to place the object,
9. where to stand to be able to reach the placing pose.

Parameters 1, 5, 6 and 8 are inferred from the semantic digital twin and do not require a learned model. Parameters 3 and 7 have a good analytical solution, which is provided by the motion planner. Therefore, only the parameters 2, 4 and 9 were learned.

Finally, in the fourth step, the generative models from the previous step are used to replace the original parameters in the mobile pick and place plan with learned parameters. It is expected that these parameters will improve the execution success rate. To make informed conclusions about this, the mobile pick and place actions were performed in different variations using the new parameter models in the fast simulation environment, to generate a new collection of episodic memories. Afterwards, the old logs were compared with the new logs to see the difference in execution success rates.

6.6.2 Generating Episodic Memories within a Simulator

The semantic digital twin, which was generated by scanning the real-world store shown at the top of Fig. 6.16, is loaded into the fast simulation environment for generating training data. The bottom of Fig. 6.16 shows a visualization of the simulation environment. The fast simulation environment is based on the Bullet physics engine [25]. In this environment action execution is not continuous but discrete. This means that instead of executing continuous motion trajectories, the robot “teleports” between the important via points of these trajectories. This makes the simulator very fast


Fig. 6.16 Top: picture of the real store. Bottom: the semantic digital twin of the same store loaded into a simulation environment. The grey boxes are place holders for objects for which no 3D models are available

but less realistic. On the other hand, if it is possible to generate a collision-free configuration for the robot in order to reach a via point of the trajectory inside the simulator, it is very likely that a smooth trajectory will also be possible in the real world. If it is not, the robot has to deal with such failures at run time during real-world execution. In addition to inverse kinematics and collision detection, the fast simulator can also perform visibility reasoning based on off-screen rendering and stability calculations based on turning the dynamics of the world on and off on demand. A comprehensive object to object and object to robot attachment mechanism is also implemented to ensure that the object, which is held in the robot’s gripper or placed on the robot’s carry tray, follows the movements of the robot. Having the semantic digital twin from the first step of the pipeline and a kinematically correct representation of the robot, the real-world store can be recreated to a certain degree of realism in simulation. In this simulation the pick and place plan is repeated to create episodic memories. As mentioned in the previous section, most of the parameters of mobile pick and place are either acquired from the semantic digital


twin or solved analytically. The parameters that need to be learned are robot base locations for (1) perceiving, (2) picking up and (3) placing. To generate the training data, the robot chooses a pose for its base for one of the three actions mentioned above, executes the action and logs if it was successful (positive sample) or if it failed (negative sample). The domain of robot base poses is infinite due to being continuous. Therefore, it is reduced as follows:

• the space is discretized with a resolution of 4 cm × 4 cm,
• the robot base pose is limited to only X and Y coordinates, as Z will be 0,
• the orientation of the robot is restricted to only rotate around the Z axis, as the robot is always oriented upright,
• the rotation angle around the Z axis is limited to a small set of values that are convenient for the given robot (for the KUKA KMR-IIWA, {−π/2 rad, π/2 rad} is used, because the robot can only move sideways within a shop aisle due to narrow space),
• and the X and Y values are restricted to the dimensions of the store room, such that no base locations are chosen outside of the store.

These assumptions considerably shrink the search space of robot base poses, but the positive samples in this space are still extremely scarce. Therefore, common-sense heuristics are introduced, which include the additional assumptions listed below.

1. For perceiving an object, the robot base poses are constrained to only those that are within a certain distance from the object (in the case of the KMR-IIWA robot, 2 m was estimated as the maximum distance from which the robot can still perceive small objects from the retail domain).
2. Additionally, for perceiving objects it is assumed that the closer the robot stands to the object, the more successful the perception results will be.
3. For reaching objects, the robot base poses are constrained to only those that are within a certain distance from the object (in the case of the KMR-IIWA, 1.1 m was estimated as the furthest distance from the robot’s base to the object at which the robot can still grasp the object with an outstretched arm).
4. Although one could think that the closer the robot is to the object, the easier it would be to grasp it, this is not true, as too short distances create a narrow space for the motion planner to maneuver in. Therefore, for placing the object, there is no preference between poses that are closer to or further from it.
5. However, for picking up an object, the robot always re-perceives it before grasping from the same base location. Therefore, the area of robot base poses for picking up combines reachability constraints and visibility constraints.
6. For all the parameters (poses for perceiving, picking up and placing) the domain is constrained to locations that do not cause collisions between the base of the robot and the static objects in the store, i.e., the shelves. This is achieved by extruding the objects with the distance between the center of the robot’s mobile base and its closest edge (the shortest “radius” of the KMR-IIWA’s rectangular mobile base is bigger than 30 cm; therefore, a padding of 30 cm is chosen).

These heuristic and common-sense constraints result in probability distributions, from which the robot samples the base poses to perform its mobile pick and place actions. Figure 6.17 shows three example distributions.
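The construction of such a heuristic distribution can be sketched for the perceiving action as follows. The 4 cm grid, the 2 m visibility limit, the 30 cm padding and the yaw set come from the text; the axis-aligned box model of shelves, the linear "closer is better" weighting and all function names are assumptions made for this illustration.

```python
import math
import random
from typing import Dict, Sequence, Tuple

RESOLUTION = 0.04          # grid cell size in metres (4 cm x 4 cm)
MAX_PERCEIVE_DIST = 2.0    # maximum perceiving distance in metres
PADDING = 0.30             # obstacle padding in metres
YAW_CHOICES = (-math.pi / 2, math.pi / 2)

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def inside_padded_box(x: float, y: float, box: Box, padding: float = PADDING) -> bool:
    """True if (x, y) lies inside the box grown by `padding` on every side."""
    xmin, ymin, xmax, ymax = box
    return xmin - padding <= x <= xmax + padding and ymin - padding <= y <= ymax + padding


def perceiving_distribution(obj_xy: Tuple[float, float], room: Box,
                            obstacles: Sequence[Box]) -> Dict[Tuple[float, float], float]:
    """Probability per admissible grid cell; closer cells get more weight."""
    ox, oy = obj_xy
    xmin, ymin, xmax, ymax = room
    nx = int(round((xmax - xmin) / RESOLUTION))
    ny = int(round((ymax - ymin) / RESOLUTION))
    weights = {}
    for i in range(nx + 1):
        for j in range(ny + 1):
            x, y = xmin + i * RESOLUTION, ymin + j * RESOLUTION
            dist = math.hypot(x - ox, y - oy)
            if dist > MAX_PERCEIVE_DIST:
                continue  # too far away to perceive small objects
            if any(inside_padded_box(x, y, b) for b in obstacles):
                continue  # base would collide with a shelf
            weights[(x, y)] = MAX_PERCEIVE_DIST - dist  # assumed linear falloff
    total = sum(weights.values())
    if total <= 0.0:
        raise ValueError("no admissible base pose found")
    return {pose: w / total for pose, w in weights.items()}


def sample_pose(distribution: Dict[Tuple[float, float], float], rng: random.Random):
    """Draw (x, y, yaw) with (x, y) from the distribution and yaw from the allowed set."""
    poses = list(distribution)
    x, y = rng.choices(poses, weights=[distribution[p] for p in poses], k=1)[0]
    return x, y, rng.choice(YAW_CHOICES)
```

The picking-up and placing distributions could be built in the same way by swapping the distance limit for the 1.1 m reachability bound and, for placing, by using a uniform weight instead of the distance-based one.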


Fig. 6.17 Heuristics-based probability distributions of robot base poses for perceiving (left), picking up (middle) and placing (right) an object. The distributions are visualized as heat maps: the hotter the color, the higher its estimated success probability for the given action. The placing poses have the same color as their probabilities are the same according to heuristic #4

When performing its action of bringing the given object from its current location on a shelf to a different location, the robot picks a random sample from each of the distributions for perceiving, picking-up and placing. To generate the training data, a large number of such mobile pick and place actions is executed. Each time, different samples are chosen from the distributions; thus, a rich set of training data is generated. If a robot base pose results in a failed perception, grasping or placing, the robot simply picks a different pose from the distribution and retries its action right away, until a successful pose is found. All of the failed and successful samples are logged in the episodic memories. Additionally, the plan executive automatically enriches the memories with semantic annotations. This allows learning algorithms to automatically extract information such as where the robot stood when it successfully grasped an object.
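The sample, retry and log loop described above can be summarized in a short sketch. The record layout of the episodic memory, the `succeeds` callback standing in for the simulated action, and the `max_tries` safeguard are assumptions for illustration; the actual logger and its semantic annotations are described in [27].

```python
import random
from typing import Callable, List, Sequence, Tuple

Pose = Tuple[float, float]


def run_action_with_retries(action: str, poses: Sequence[Pose], weights: Sequence[float],
                            succeeds: Callable[[Pose], bool], rng: random.Random,
                            max_tries: int = 50) -> List[dict]:
    """Sample base poses until the action succeeds, logging every attempt."""
    memory = []
    for _ in range(max_tries):
        pose = rng.choices(poses, weights=weights, k=1)[0]
        ok = succeeds(pose)
        # Both positive and negative samples are kept.
        memory.append({"action": action, "pose": pose, "success": ok})
        if ok:
            break
    return memory


def successful_poses(memories: Sequence[dict], action: str) -> List[Pose]:
    """Extract the base poses at which the given action succeeded."""
    return [m["pose"] for m in memories if m["action"] == action and m["success"]]
```

The second function mirrors the kind of query that the semantic annotations make possible, here reduced to a filter over plain dictionaries.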

6.6.3 Learning from Episodic Memories

For the learning task a plan specialization architecture is utilized, which is described in detail in [27]. The motivation of the architecture is to create a pipeline of acquiring the robot experiences during execution via an episodic memory logger, storing the data in a memory database, and from this data, creating statistical models to improve the robot performance. As mentioned in the previous section, to improve the performance of the robot the focus was set on improving its grasping, placing and perceiving skills.

In the first step of the architecture pipeline, it was required to record the episodic memories from which the robot should learn. During a time span of 20 h, 99 episodic memories were generated in simulation, where the robot base poses were sampled from heuristics-based probability distributions. It is important to mention that episodes with failed experiment execution were discarded. From those 99 episodes,


Table 6.1 For each action and object combination, one multivariate Gaussian was created to replace the defined heuristic. In the table, each cell contains as the first image the heuristic and as the second image the learned statistical model. These are the models for the two objects shown in Fig. 6.18

[Image table. Columns: Balea (Heuristic, Learned) and Dishwasher-tabs (Heuristic, Learned). Rows: place, pick, perceive.]

all robot poses relative to the global fixed frame were extracted, where the agent perceived, grasped and placed an object successfully. Extracting the specific data from the database of episodic memories is made possible by the rich semantic annotation of episodic memories from the plan executive. The main idea of the plan specialization architecture is to generate a multivariate Gaussian distribution to replace the heuristics, see Table 6.1. The mean of the Gaussian is the average of all X, Y positions where the robot successfully performed the action. An individual multivariate Gaussian was created for each action (perceive, grasp and place) that we want to perform on an individual object. We decided to use a simple model representation in order to allow fast training and reasoning and to obtain an interpretable model. For the retail scenario, the assumption is that after the environment is scanned, the shelves and the objects on them will not change until a replenishment task is performed. Therefore, the products to be picked/placed should remain in the scanned locations and be placed on known shelves. This allows the training of a model for grasping an object, e.g., from a specific shelf instead of learning a distribution for grasping an object located anywhere in the environment. An alternative would be to create general models that can abstract away from the environment and deal with the dynamic changes in the environment [27]. However, creating the general models requires a large training set, which covers


many variations of the learning problem and a lot of context knowledge. In addition, the general statistical models tend to have a lower improvement rate than the specialized Gaussians [26]. Therefore, the decision was made to use the simple model with specialized multivariate Gaussians, as presented above.
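Fitting and sampling one such specialized Gaussian can be sketched with the standard library alone. The mean follows the description above (average of the successful X, Y positions); the use of the unbiased sample covariance and the Cholesky-based sampler are assumptions of this example, since the text does not specify how the covariance is estimated in [27].

```python
import math
import random
from typing import Sequence, Tuple

Mean = Tuple[float, float]
Cov = Tuple[Tuple[float, float], Tuple[float, float]]


def fit_gaussian_2d(points: Sequence[Tuple[float, float]]) -> Tuple[Mean, Cov]:
    """Mean and unbiased sample covariance of successful (x, y) base positions."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two successful poses are needed")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def sample_gaussian_2d(mean: Mean, cov: Cov, rng: random.Random) -> Tuple[float, float]:
    """Draw one (x, y) sample using the Cholesky factor of the 2x2 covariance."""
    (sxx, sxy), (_, syy) = cov
    if sxx <= 0.0:
        raise ValueError("covariance must have positive variance in x")
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(max(syy - l21 * l21, 0.0))
    u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mean[0] + l11 * u, mean[1] + l21 * u + l22 * v
```

One model of this kind would be fitted per action and object combination, for instance from the successful grasping poses logged for a single product on its shelf.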

6.7 Experiments

6.7.1 Replenishment Experiments

To test the effectiveness of the pivoting ability for pick and place tasks in a retail environment, we conducted several experiments. This approach was tested on the 5 objects depicted in Fig. 6.18. The interplay between the motion planner and the reactive controller was tested first within a simple pick-and-place task using fixed and non-fixed start/goal angles. The task consists of placing the object E of Fig. 6.18 on a desk by picking it from the floor with a given angle between the finger approach axis and the vertical direction. The experiment is first executed in a simulated environment using different desk heights and then on the real robot using a 0.72 m high desk. The experiments show that the interaction works reliably and improves the reachability of the robot.

In a second experiment, we tested the whole algorithm in a complex real-case scenario where the gripper pivoting ability may be mandatory due to obstacle positions. The objects from Fig. 6.18 were placed on the shelf depicted in Fig. 6.19. Table 6.2 shows the results for the pick and place actions with different start and goal angles between the objects and the fingers. The results show that the virtual joint greatly improves the robot’s capabilities in operating in the narrow spaces on shelves. A fixed start and goal configuration would only lead to a successful action in 2 cases, whereas an angle computed by Giskard with the help of its collision avoidance always led to a success. Figure 6.20 shows

Fig. 6.18 The two objects used for evaluating the pipeline were C “Dishwasher-tabs” and D “Balea”. (Reprinted from [18] with permission from IEEE)


Fig. 6.19 Shelf filled with objects at the end of the experiment. (Reprinted from [18] with permission from IEEE)

Table 6.2 Planning times (in seconds) of the shelf experiment for different start and goal angle combinations. (Reprinted from [18] with permission from IEEE)

αs \ αg | Shelf at 0.2 m       | Shelf at 0.6 m       | Shelf at 0.93 m      | Shelf at 1.31 m
        | −π/2  −π/4  (−0.95)  | −π/2  −π/4  (−1.38)  | −π/2  −π/4  (−1.58)  | −π/2  −π/4  (−1.82)
−π/4    | 20.5  18.3  22.1     | 17.7  –     18.9     | –     22.5  21.5     | –     –     25.5
0.0     | 23.5  19    21.1     | 18.5  –     17.9     | 20.8  21.7  19.9     | –     –     24.1
(0.16)  | 23.2  19.7  22.3     | 17.1  –     18.7     | 20.2  21.7  21.1     | –     –     25.5

a plot of the forces and the virtual joint speed when object A is placed on the bottom shelf. The start and goal angles chosen by the planner are 0.16 rad and −0.95 rad, respectively. The bottom part of the figure depicts the computed trajectory for the virtual joint, and the gray areas show when the reactive controller switches into gripper pivoting mode. The plot also shows that the system is not constantly in pivoting mode but switches back to slipping avoidance when the joint is not needed.
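The alternation between the two controller modes visible in Fig. 6.20 can be illustrated with a small sketch. It assumes, purely for illustration, that pivoting mode is active whenever the planned virtual joint speed exceeds a small threshold. The threshold `speed_eps`, the sample values and all function names are hypothetical and are not taken from the system described in this chapter.

```python
from enum import Enum


class Mode(Enum):
    SLIPPING_AVOIDANCE = "slipping_avoidance"
    PIVOTING = "pivoting"


def select_modes(virtual_joint_speeds, speed_eps=1e-3):
    """Pick a controller mode for each sample of the planned virtual joint speed.

    Assumption (not from the chapter): the joint is "needed" whenever the
    magnitude of its planned speed exceeds speed_eps.
    """
    return [
        Mode.PIVOTING if abs(speed) > speed_eps else Mode.SLIPPING_AVOIDANCE
        for speed in virtual_joint_speeds
    ]


def pivoting_intervals(modes, dt):
    """Return (start_time, end_time) pairs during which pivoting mode is active.

    These intervals play the role of the gray areas in a plot such as Fig. 6.20.
    """
    intervals = []
    start = None
    for i, mode in enumerate(modes):
        if mode is Mode.PIVOTING and start is None:
            start = i  # pivoting segment begins
        elif mode is not Mode.PIVOTING and start is not None:
            intervals.append((start * dt, i * dt))  # pivoting segment ends
            start = None
    if start is not None:  # trajectory ends while still pivoting
        intervals.append((start * dt, len(modes) * dt))
    return intervals


if __name__ == "__main__":
    # Hypothetical planned virtual joint speeds (rad/s), sampled every 0.5 s.
    speeds = [0.0, 0.0, 0.2, 0.3, 0.0, 0.0, -0.1, 0.0]
    modes = select_modes(speeds)
    print(pivoting_intervals(modes, dt=0.5))  # [(1.0, 2.0), (3.0, 3.5)]
```

Outside the returned intervals the sketch stays in slipping avoidance, mirroring the observation that the system is not constantly in pivoting mode.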

6.7.2 Evaluation of Learning Pipeline

Having the learned models, the robot can sample base poses for its actions from the learned distributions. Practically, when the robot encounters an unknown parameter value for a certain action, e.g., the parameter corresponding to the following query,

A. Cavallo et al.

Fig. 6.20 Shelf experiment. In this case the object A is placed on the bottom shelf. Note the gray areas where the planner activates the gripper pivoting mode. (Reprinted from [18] with permission from IEEE)

“where should I stand to perform this action”, the robot always first checks whether a learned model exists to infer this parameter. If it does, the robot samples from that model; if it does not, the robot samples from the heuristic-based distribution. Different models exist depending on the object acted on, the robot hardware platform and the specifics of the environment, so the query that infers the parameter uses all of these as inputs for the learned-model lookup. This means that, currently, models learned for another robot platform cannot be reused on the KMR-IIWA. Arguably, the models should not be reused in general, as they are tailored to the specific hardware, and another robot’s learned model might result in poor success rates. However, this is a minor disadvantage, since acquiring the required data and generating the required Gaussians is efficient.

The mobile pick and place actions were executed with the same objects (C and D from Fig. 6.18) as were used to generate the training data, but using the learned models instead of heuristics to infer the three robot base pose parameters. During execution, the data was again logged. The experiments were conducted for 20 h each with the heuristics and with the trained models. During this time, 193 episodes were generated using the heuristics and 272 episodes using the learned models. The larger number of episodes with the learned models indicates that the robot made fewer errors during the experiment and therefore was able to

Table 6.3 Numbers of failed/successful actions, divided into heuristics and learned models

              Heuristic                    Learned
              Failed   Success   %         Failed   Success   %
Perceiving    281      967       77.5      66       1088      94.3
Picking       218      566       72.2      2        815       99.8
Placing       470      564       54.5      18       815       97.8
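The percentages in Table 6.3 follow from the failed and successful counts as success / (failed + success). The short check below recomputes them from the counts in the table; the function and variable names are illustrative only.

```python
def success_rate(failed, succeeded):
    """Success rate in percent for a given number of failed and successful actions."""
    return 100.0 * succeeded / (failed + succeeded)


# (failed, succeeded) counts copied from Table 6.3.
counts = {
    "heuristic": {
        "perceiving": (281, 967),
        "picking": (218, 566),
        "placing": (470, 564),
    },
    "learned": {
        "perceiving": (66, 1088),
        "picking": (2, 815),
        "placing": (18, 815),
    },
}

# Round to one decimal place, as in the table.
rates = {
    variant: {
        action: round(success_rate(failed, succeeded), 1)
        for action, (failed, succeeded) in actions.items()
    }
    for variant, actions in counts.items()
}

if __name__ == "__main__":
    for variant, actions in rates.items():
        for action, rate in actions.items():
            print(f"{variant:9s} {action:10s} {rate:5.1f} %")
```

Rounded to one decimal place, the recomputed values match the percentage columns of the table.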

Fig. 6.21 Plan execution with trained models on the real robot

execute more experiments in the same time frame than with the heuristics. Table 6.3 lists the numbers of failed and successful actions for the heuristics and the learned models. To evaluate the success of the learned models, the time it took to generate them is not a good benchmark, because action execution in the simulation takes only milliseconds and also depends on external components such as IK solvers. Measuring a time improvement would therefore be insufficient to show the impact of the learned models. During the evaluation, we observed success ratios of 99.8% for picking, 97.8% for placing and 94.3% for perceiving with the learned models, compared to 72.2% for picking, 54.5% for placing and 77.5% for perceiving with the heuristics. The learned models thus improved the robot’s performance in simulation significantly, using only a small set of 99 episodes. However, as mentioned above, those models can only be used for the selected robot and environment. For the retail scenario, this should not be a disadvantage, since collecting data and retraining the models is efficient and the robot can adapt quickly to a new environment.

Finally, the trained models were used to perform the pick and place plan on the real KMR-IIWA (see Fig. 6.21). During execution, it was observed that action execution is more time-efficient and less failure-prone when using learned motion parameters as opposed to heuristics-based parameters. However, due to the challenge of collecting

a large number of episodic memories with a real robot, a solid statistical evaluation of the improvement was not feasible. This pipeline has also been tested in a kitchen environment with a PR2 robot [26].
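The parameter lookup described in this section, with its fallback to the heuristic-based distribution when no learned model exists, might be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: models are assumed to be stored in a dictionary keyed by (action, object, robot, environment), each distribution is assumed to be a Gaussian with independent components over the three base pose parameters (x, y, θ), and the heuristic is modeled as a Gaussian as well. All class names, function names, the environment label and the numeric values are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BasePoseGaussian:
    """Gaussian over the base pose (x, y, theta) with independent components.

    The independence (diagonal covariance) is an assumption of this sketch.
    """
    mean: tuple
    std: tuple

    def sample(self, rng):
        """Draw one base pose (x, y, theta) from the distribution."""
        return tuple(rng.gauss(m, s) for m, s in zip(self.mean, self.std))


def select_distribution(learned_models, heuristic, action, obj, robot, environment):
    """Return the learned model for this exact context, else the heuristic."""
    key = (action, obj, robot, environment)
    return learned_models.get(key, heuristic)


if __name__ == "__main__":
    rng = random.Random(42)

    # Hypothetical heuristic: a broad distribution in front of the shelf.
    heuristic = BasePoseGaussian(mean=(0.0, -0.8, 1.57), std=(0.3, 0.3, 0.2))

    # Hypothetical learned model for one object on one robot in one environment.
    learned_models = {
        ("place", "Balea", "KMR-IIWA", "store"): BasePoseGaussian(
            mean=(0.1, -0.6, 1.50), std=(0.05, 0.05, 0.05)
        ),
    }

    # A learned model exists for this context, so it is used.
    dist = select_distribution(
        learned_models, heuristic, "place", "Balea", "KMR-IIWA", "store"
    )
    print("learned:", dist.sample(rng))

    # No model exists for another robot, so the heuristic is used instead.
    dist = select_distribution(
        learned_models, heuristic, "place", "Balea", "PR2", "store"
    )
    print("heuristic fallback:", dist.sample(rng))
```

Because the robot platform is part of the lookup key, a model learned on one robot is never returned for another, which corresponds to the restriction discussed above.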

6.8 Conclusions

The robotic system presented in this chapter is one of the first examples of an autonomous shelf replenishment system ever tested in an industrially relevant environment. The success of the presented experiments can be ascribed to the appropriate integration of the planning, learning and reactive control components of the robotic system toward a common objective. The objective is to make a robot able to carry out a shelf re-stocking process with the dexterity required by the cluttered and narrow spaces where the robot has to act, even if equipped with only a parallel gripper. Several enabling technologies lie behind this dexterity: the combination of visual and tactile sensing and the integration of knowledge-enabled planning with reactive control.

References

1. Aertbeliën, E., De Schutter, J.: eTaSL/eTC: a constraint-based task specification language and robot controller using expression graphs. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1540–1546 (2014)
2. Åström, K.J., Canudas de Wit, C.: Revisiting the LuGre friction model. IEEE Control Syst. Mag. 28(6), 101–114 (2008)
3. Beetz, M., Bálint-Benczédi, F., Blodow, N., Nyga, D., Wiedemeyer, T., Marton, Z.C.: Robosherlock: unstructured information processing for robot perception. In: IEEE International Conference on Robotics and Automation, pp. 1549–1556 (2015)
4. Beetz, M., Beßler, D., Haidu, A., Pomarlan, M., Bozcuoğlu, A.K., Bartels, G.: Knowrob 2.0 – a 2nd generation knowledge processing framework for cognition-enabled robotic agents. In: IEEE International Conference on Robotics and Automation, pp. 512–519 (2018)
5. Bogue, R.: Strong prospects for robots in retail. Ind. Robot 46(3), 326–331 (2019)
6. Cavallo, A., Costanzo, M., De Maria, G., Natale, C.: Modeling and slipping control of a planar slider. Automatica 115, 108875 (2020). https://doi.org/10.1016/j.automatica.2020.108875
7. Cirillo, A., Costanzo, M., Laudante, G., Pirozzi, S.: Tactile sensors for parallel grippers: design and characterization. Sensors 21(5) (2021). https://doi.org/10.3390/s21051915
8. Costanzo, M.: Soft-contact modeling for in-hand manipulation control and planning. Ph.D. thesis, Università degli Studi della Campania Luigi Vanvitelli (2020). https://www.ingegneria.unicampania.it/roboticslab/publications#phd-theses
9. Costanzo, M.: Control of robotic object pivoting based on tactile sensing. Mechatronics 76, 102545 (2021). https://doi.org/10.1016/j.mechatronics.2021.102545
10. Costanzo, M., De Maria, G., Lettera, G., Natale, C.: Grasp control for enhancing dexterity of parallel grippers. In: IEEE International Conference on Robotics and Automation, pp. 524–530. Paris, F (2020)
11. Costanzo, M., De Maria, G., Lettera, G., Natale, C.: Can robots refill a supermarket shelf?: motion planning and grasp control. IEEE Robot. Autom. Mag. 28(2), 61–73 (2021). https://doi.org/10.1109/MRA.2021.3064754
12. Costanzo, M., De Maria, G., Lettera, G., Natale, C., Pirozzi, S.: Motion planning and reactive control algorithms for object manipulation in uncertain conditions. Robotics 7(4) (2018). https://doi.org/10.3390/robotics7040076
13. Costanzo, M., De Maria, G., Natale, C.: Slipping control algorithms for object manipulation with sensorized parallel grippers. In: IEEE International Conference on Robotics and Automation, pp. 7455–7461 (2018)
14. Costanzo, M., De Maria, G., Natale, C.: Control of sliding velocity in robotic object pivoting. In: IFAC World Congress, pp. 9950–9955. Berlin, DE (2020). https://doi.org/10.1016/j.ifacol.2020.12.2710
15. Costanzo, M., De Maria, G., Natale, C.: Two-fingered in-hand object handling based on force/tactile feedback. IEEE Trans. Robot. 36(1), 157–173 (2020). https://doi.org/10.1109/TRO.2019.2944130
16. Costanzo, M., De Maria, G., Natale, C.: Handover control for human-robot and robot-robot collaboration. Front. Robot. AI 8, 132 (2021). https://doi.org/10.3389/frobt.2021.672995
17. Costanzo, M., De Maria, G., Natale, C., Pirozzi, S.: Design and calibration of a force/tactile sensor for dexterous manipulation. Sensors 19(4) (2019). https://doi.org/10.3390/s19040966
18. Costanzo, M., Stelter, S., Natale, C., Pirozzi, S., Bartels, G., Maldonado, A., Beetz, M.: Manipulation planning and control for shelf replenishment. IEEE Robot. Autom. Lett. 5(2), 1595–1601 (2020). https://doi.org/10.1109/LRA.2020.2969179
19. Dafle, N., Rodriguez, A., Paolini, R., Tang, B., Srinivasa, S., Erdmann, M., Mason, M., Lundberg, I., Staab, H., Fuhlbrigge, T.: Extrinsic dexterity: In-hand manipulation with external forces. In: IEEE International Conference on Robotics and Automation, pp. 1578–1585. Hong Kong (2014)
20. De Maria, G., Natale, C., Pirozzi, S.: Force/tactile sensor for robotic applications. Sens. Actuators A: Phys. 175, 60–72 (2012). https://doi.org/10.1016/j.sna.2011.12.042
21. Fang, Z., Bartels, G., Beetz, M.: Learning models for constraint-based motion parameterization from interactive physics-based simulation. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4005–4012 (2016)
22. Fontanelli, G.A., Paduano, G., Caccavale, R., Arpenti, P., Lippiello, V., Villani, L., Siciliano, B.: A reconfigurable gripper for robotic autonomous depalletizing in supermarket logistics. IEEE Robot. Autom. Lett. 5(3), 4612–4617 (2020)
23. Goyal, S., Ruina, A., Papadopulous, J.: Planar sliding with dry friction, part i. Wear 143, 307–330 (1991)
24. Howe, R.D., Cutkosky, M.R.: Practical force-motion models for sliding manipulation. Int. J. Robot. Res. 15(6), 557–572 (1996). https://doi.org/10.1177/027836499601500603
25. Kazhoyan, G., Beetz, M.: Executing underspecified actions in real world based on online projection. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5156–5163 (2019)
26. Kazhoyan, G., Stelter, S., Kenfack, F.K., Koralewski, S., Beetz, M.: The robot household marathon experiment (2020). arXiv:2011.09792
27. Koralewski, S., Kazhoyan, G., Beetz, M.: Self-specialization of general robot plans based on experience. IEEE Robot. Autom. Lett. 4(4), 3766–3773 (2019)
28. Kuhn, H., Sternbeck, M.: Integrative retail logistics: an exploratory study. Oper. Manag. Res. 6, 2–18 (2013)
29. Marchand, E., Spindler, F., Chaumette, F.: ViSP for visual servoing: a generic software platform with a wide class of robot control skills. IEEE Robot. Autom. Mag. 12(4), 40–52 (2005). https://doi.org/10.1109/mra.2005.1577023
30. Photoneo s.r.o: Universal depalletizer. https://www.photoneo.com/wp-content/uploads/2020/07/Universal-Depalletizer.pdf. Accessed on June 2021
31. Ricardez, G.A.G., Okada, S., Koganti, N., Yasuda, A., Eljuri, P.M.U., Sano, T., Yang, P.C., Hafi, L.E., Yamamoto, M., Takamatsu, J., Ogasawara, T.: Restock and straightening system for retail automation using compliant and mobile manipulation. Adv. Robot. 34(3–4), 235–249 (2020)
32. Richardson, R.S.H., Nolle, H.: Surface friction under time-dependent loads. Wear 37(1), 87–101 (1976)
33. Sakai, R., Katsumata, S., Miki, T., Yano, T., Wei, W., Okadome, Y., Chihara, N., Kimura, N., Nakai, Y., Matsuo, I., Shimizu, T.: A mobile dual-arm manipulation robot system for stocking and disposing of items in a convenience store by using universal vacuum grippers for grasping items. Adv. Robot. 34(3–4), 219–234 (2020)
34. Shkulipa, S., den Otter, W., Briels, W.: Surface viscosity, diffusion, and intermonolayer friction: simulating sheared amphiphilic bilayers. Biophys. J. 89(2), 823–829 (2005). https://doi.org/10.1529/biophysj.105.062653
35. The REFILLS project: Robotics enabling fully-integrated logistics lines for supermarkets. http://www.refills-project.eu/. Accessed on June 2021
36. Winkler, J., Balint-Benczedi, F., Wiedemeyer, T., Beetz, M., Vaskevicius, N., Mueller, C., Doernbach, T., Birk, A.: Knowledge-enabled robotic agents for shelf replenishment in cluttered retail environments. In: International Conference on Autonomous Agents & Multiagent Systems, pp. 1421–1422 (2016)
37. World Robotic Summit: Future convenience store challenge. https://worldrobotsummit.org/en/wrs2020/challenge/service/fcsc.html. Accessed on December 2020
38. Xydas, N., Kao, I.: Modelling of contact mechanics and friction limit surfaces for soft fingers in robotics, with experimental results. Int. J. Robot. Res. 18, 941–950 (1999)