Modelling Empty Container Repositioning Logistics 3030933830, 9783030933838



Table of contents :
Acknowledgements
Contents
1 Empty Equipment Logistics and Empty Container Repositioning (ECR)
1.1 Empty Equipment Logistics
1.1.1 Empty Vehicle Redistribution: Freight Vehicle
1.1.2 Empty Vehicle Redistribution: Passenger Vehicle
1.1.3 Empty Bike Repositioning
1.1.4 Empty Container Chassis Repositioning
1.1.5 Empty Container Repositioning
1.2 ECR: Reasons and Characteristics
1.3 Modeling ECR Logistics
1.4 Structure of the Book
References
2 Optimal ECR Policy in a Single-Depot System
2.1 Introduction
2.2 A Discrete Stochastic Dynamic Programming Model
2.2.1 The Structural Properties of the Value Function and the Optimal Control at Period N
2.2.2 The Structural Properties of the Value Function and the Optimal Control at Period n
2.3 A Fluid-Flow Model Based on Continuous-Time Dynamic Programming
2.3.1 Structural Properties of the Optimal Policy
2.3.2 Solving the Hamilton–Jacobi-Bellman Equations
2.3.3 Extension to More General Cases
2.3.4 Numerical Examples
2.4 Summary and Notes
References
3 Optimal ECR Policy in Two-Depot System: Periodic Review
3.1 Introduction
3.2 A Discrete Stochastic Dynamic Programming Model
3.3 Optimal ECR Policy and Its Structural Properties
3.3.1 The Properties of the Value Function at Period N
3.3.2 The Structural Properties of the Value Function and the Optimal Control at Period n
3.3.3 Asymptotic Structural Properties of the Optimal Control Policy
3.4 Near-Optimal Threshold-Type Policy
3.5 Numerical Examples
3.6 Summary and Notes
References
4 Optimal ECR Policy in Two-Depot Shuttle Systems: Continuous Review
4.1 Introduction
4.2 Problem Formulation
4.3 Convert into Discrete-Time Markov Decision Process
4.4 Solve the Discounted Cost Case
4.4.1 Optimal ECR Policy and Its Structural Properties
4.4.2 Closed-Form Objective Function and Optimal Threshold Values
4.4.3 Numerical Experiments
4.5 Solve the Long-Run Average Cost Case
4.5.1 Stationary Distribution Under Threshold Control Policy
4.5.2 Optimality of the Threshold Control Policy
4.5.3 Numerical Examples
4.6 Extension to Cases with External Supply and Demand
4.7 Summary and Notes
References
5 Optimal and Near-Optimal ECR Policies in Hub-and-Spoke Systems: Continuous Review
5.1 Introduction
5.2 Problem Formulation and Uniformization
5.3 Optimal ECR Policy
5.4 Suboptimal Policy Using a Dynamic Decomposition Procedure
5.4.1 Optimal Feedback Control for Two-Depot Systems
5.4.2 Dynamic Decomposition Procedure
5.4.3 Structural Properties of Dynamic Decomposition Policy
5.5 Numerical Examples
5.5.1 Structural Properties of ECR Policies in Two-Spoke-One-Hub Systems
5.5.2 Comparing DDP with the Optimal Policies in Three-Spoke-One-Hub Systems
5.5.3 Comparing DDP with a Heuristic Policy in a Many-Spoke-One-Hub Systems
5.6 Extension to Cases with External Supply and Demand
5.7 Summary and Notes
References
6 Optimal ECR in General Inland Transportation Systems with Uncertainty: Periodic Review
6.1 Introduction
6.2 ECR in Inland Transportation Systems
6.3 ECR in Inland Transportation Systems with Transfer Ports
6.4 ECR in Intermodal Transportation Systems
6.5 Approximate Dynamic Programming Method
6.5.1 Generalized Stochastic Dynamic Programming Model
6.5.2 Approximate Dynamic Programming Algorithm
6.6 Simulation Methods and Parameterized Policies
6.7 Metaheuristic Optimization Methods
6.8 Stochastic Approximation Methods
6.9 Perturbation Analysis Methods
6.10 Ordinal Optimization Methods
6.11 Summary and Notes
References
7 Conclusions
7.1 Conclusions and Managerial Insight for ECR
7.2 Limitations and Further Research
References

Dong-Ping Song Jingxin Dong

Modelling Empty Container Repositioning Logistics


Dong-Ping Song School of Management University of Liverpool Liverpool, UK

Jingxin Dong Business School Newcastle University Newcastle upon Tyne, UK

ISBN 978-3-030-93382-1 ISBN 978-3-030-93383-8 (eBook) https://doi.org/10.1007/978-3-030-93383-8 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Acknowledgements

We would like to thank the following colleagues for joint works and insightful discussions on topics that are related to the materials covered in this book during various periods of time: Prof. Qing Zhang, Prof. Daniel Ng, Prof. Christopher Earl, Dr. Jonathan Carter, and Prof. Michael Roe.


Chapter 1

Empty Equipment Logistics and Empty Container Repositioning (ECR)

Abstract This chapter first provides a general discussion of empty equipment logistics, including empty freight vehicle redistribution, empty passenger vehicle redistribution, empty bike repositioning, empty container chassis repositioning, and empty container repositioning problems. Secondly, the scale and importance of ECR in the global context are explained, and the main factors that affect ECR are discussed. Thirdly, the similarities and unique characteristics of ECR compared to other empty equipment repositioning problems are explained. Fourthly, the modeling techniques for ECR and the scope of the book are described. In this book, we mainly take the inventory control perspective to address ECR logistics in regional transportation systems by explicitly considering features such as demand imbalance over space, dynamic operations over time, uncertainty, and leasing activity. Finally, the structure of the book is described.

1.1 Empty Equipment Logistics

Empty equipment logistics may be defined as the efficient and effective positioning of a fleet of equipment to meet customer requirements over time. The word positioning implies management decisions, which essentially add space and time utility for customers by placing empty equipment closer to them and making it more readily available for their use. The objectives of empty equipment logistics are implied in the phrases efficient, effective, and meeting customer requirements, which represent the main logistics performance measures such as return-on-investment, cost, utilization, lead-time, reliability, and flexibility.

Empty equipment logistics exists widely in practice. It arises when an operator allocates a fleet of equipment over space and time in order to anticipate and satisfy uncertain demands. Typical examples include empty vehicle redistribution, empty bike repositioning, empty container chassis repositioning, and empty container repositioning. Empty vehicle redistribution can be classified into two broad types: freight vehicle allocation and passenger vehicle allocation. A freight vehicle is a motor vehicle designed for the carriage of goods. A passenger vehicle is a motor vehicle (other than a motorcycle) designed for the carriage of passengers and their effects. In the following sections, brief descriptions of these examples are provided.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 D.-P. Song and J. Dong, Modelling Empty Container Repositioning Logistics, https://doi.org/10.1007/978-3-030-93383-8_1

1.1.1 Empty Vehicle Redistribution: Freight Vehicle

Empty freight vehicle allocation can be further divided into two groups: freight car (or train car, train wagon, train carriage) allocation and road truck (lorry, trailer) allocation. Freight vehicles usually travel relatively long distances between cities or railyards and can be regarded as inter-city transport modes. Dejax and Crainic (1987) provided a literature review on empty freight vehicle allocation problems. There are several differences between freight car allocation and road truck allocation. First, freight cars are transported on railways between railyards, while road trucks move more flexibly between depots. Second, freight cars are often repositioned in bulk, while road trucks are repositioned as single units. Third, the loaded cycle of a freight car is much longer and may exceed a week, while road trucks are utilized much more frequently. Therefore, their decision periods and planning horizons are quite different.

In the research stream on freight car allocation, Jordan and Turnquist (1983) proposed a stochastic dynamic model for the empty freight car allocation problem, in which empty freight cars are allocated to different classification railyards over time to meet uncertain demands. Noting that the cost of car-handling operations at railyards depends on the number of car groups, Joborn et al. (2004) developed an optimization model for empty freight car allocation in a scheduled railway system that considered economies of scale; a tabu heuristic algorithm was used to solve the model. Hungerlander and Steininger (2019) investigated the joint fleet sizing and empty freight car allocation problem, formulating a time-space network as an integer linear programming model.

In the research stream on road truck allocation, Powell (1986) emphasized the importance of handling uncertain demands. For a truckload motor carrier, a customer demand may represent a request for a truck (trailer) to move a load of freight from one city to another on a given day. The carrier has to anticipate customer needs by ensuring that trucks/trailers are available at the right places at the right time, which is the main purpose of logistics management. However, a carrier usually has little advance notice of future needs. For example, a truckload carrier may know only 40% of the loads at the beginning of a day, with the remaining 60% called in randomly during the day (Powell, 1986). Powell (1987) presented an operational planning model for dynamic road truck allocation problems with uncertain demands, in which the implications of future costs and revenues for loaded/empty trailer dispatching decisions are anticipated. Empty truck/trailer movements may also be caused by demand imbalance and affected by political events. For example, according to Lloyd's Loading List, 40% of trailers traveled empty from the UK to the EU in January 2021 (post-Brexit), compared with only 25% before Brexit (Todd & Waters, 2021).


1.1.2 Empty Vehicle Redistribution: Passenger Vehicle

Empty vehicle redistribution for passenger vehicles predominantly focuses on personal rapid transit (PRT) systems or autonomous station-based taxi services, which are intra-city transport modes for passengers. A PRT system uses a fleet of driverless vehicles running on dedicated guideways to transport individuals or small groups of people between pairs of stations on demand (Lees-Miller, 2011). PRT vehicles normally wait for passengers at stations and depart immediately for the destination that the passengers request. After reaching the destination, an empty vehicle may stay at that station or be dispatched to another station where high passenger demand is expected.

A PRT vehicle resembles a conventional taxi in two ways: (i) PRT vehicles do not run on a schedule, so the system is demand-responsive; (ii) a PRT vehicle takes its passengers directly from the origin station to the destination station without stopping to pick up other passengers, so it is a direct service. On the other hand, a PRT vehicle differs from a taxi in several aspects: (i) PRT vehicles are driverless and fully automatic; (ii) passengers can board and alight only at designated PRT stations in the service system; (iii) PRT vehicles run on dedicated guideways that are physically separated from pedestrians and normal road traffic (Lees-Miller, 2011).

The empty vehicle redistribution problem in a PRT system is to determine when and where to reposition empty PRT vehicles in the system on a real-time basis. One of the key performance measures is the waiting time of passengers. The PRT vehicle redistribution problem may be tackled by two different strategies: a reactive strategy and a proactive strategy. Under the reactive strategy, empty vehicles are dispatched only in response to known requests; under the proactive strategy, empty vehicles are dispatched in anticipation of future requests (Lees-Miller & Wilson, 2012).

Fatnassi et al. (2017) addressed empty vehicle redistribution for PRT vehicles with limited electric battery capacity. The redistribution decisions are made on a real-time basis by minimizing the set of empty vehicle movements, and a simulation model is developed to evaluate a few dispatching strategies. Babicheva et al. (2019) investigated the joint optimization of empty vehicle redistribution and PRT vehicle fleet sizing, considering passenger service and operator cost. They proposed a proactive strategy termed index-based redistribution, which calculates an index for each vehicle station based on information about waiting passengers, predicted near-future demands, and projected vehicle arrivals. Empty vehicles are then redistributed according to the obtained indices.
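The idea of an index-based proactive strategy can be illustrated with a minimal sketch. The index formula, the greedy dispatch rule, and all names below (Station, station_index, plan_moves) are assumptions made for illustration only; they are not the index defined by Babicheva et al. (2019).

```python
from dataclasses import dataclass


@dataclass
class Station:
    name: str
    waiting_passengers: int   # passengers currently waiting
    predicted_demand: float   # forecast requests in the near future
    incoming_vehicles: int    # vehicles already projected to arrive
    idle_vehicles: int        # empty vehicles parked at the station


def station_index(s: Station) -> float:
    """Hypothetical index: expected need minus vehicles available or on the way."""
    return (s.waiting_passengers + s.predicted_demand
            - s.incoming_vehicles - s.idle_vehicles)


def plan_moves(stations):
    """Greedily move one idle vehicle at a time from the station with the
    largest spare capacity (index <= -1) to the station with the highest index."""
    need = {s.name: station_index(s) for s in stations}
    idle = {s.name: s.idle_vehicles for s in stations}
    moves = []
    while True:
        dst = max(need, key=need.get)
        donors = [n for n in idle
                  if n != dst and idle[n] > 0 and need[n] <= -1]
        if need[dst] <= 0 or not donors:
            break
        src = min(donors, key=need.get)
        moves.append((src, dst))
        idle[src] -= 1
        need[src] += 1   # the donor has one fewer spare vehicle
        need[dst] -= 1   # the receiver has one more vehicle on the way
    return moves
```

A reactive strategy would correspond to setting predicted_demand to zero, so that only passengers who are already waiting trigger a move.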

1.1.3 Empty Bike Repositioning

Empty bike repositioning arises in bike sharing systems (BSS), which are an intra-city transport mode for the public. Vogel (2016) provided a comprehensive literature review on BSS. BSS has become a convenient and efficient transport mode that complements public transport in many cities. It was reported that BSS was available in over 1000 cities with a global fleet of 1.5 million shared bicycles in 2017 (Ghosh et al., 2017). The business objective of a BSS operator is to provide a reliable service in which users can easily find an available bike and an available dock to return the bike after use (Legros, 2019).

In a BSS, bikes are usually docked at stations, which are scattered within a city. Due to uncertain user demands and the characteristics of the stations, some stations may accumulate bikes while others become empty. For example, there may be more demand from higher-altitude stations to lower-altitude stations; the demand volume and pattern may change over the time of day and may also depend on station location, e.g., whether a station is close to public transport stations or shopping centers (Legros, 2019). As a result, the bike flows among stations are subject to stochasticity and imbalance. This gives rise to the necessity of repositioning empty bikes from surplus stations to deficit stations.

Empty bike repositioning concerns the redistribution of empty bikes among stations so that an appropriate number of bikes and docks is available at each station to satisfy user needs. Empty bike repositioning problems can be classified into two categories: static repositioning and dynamic repositioning. The static repositioning problem assumes nighttime operations when there is little demand or little variation of demand among stations; the dynamic repositioning problem assumes daytime operations on a real-time basis, where the variation of real-time demands at stations is explicitly considered (Shui & Szeto, 2018). In Paris in 2017, empty bike repositioning was performed by 23 trucks and two buses.
Each truck has a capacity of 20 bikes and is responsible for repositioning empty bikes within a sector. Each bus has a capacity of 62 bikes. The buses travel longer distances between large docking stations following prespecified routes, which can be adjusted if needed (Legros, 2019). In both static and dynamic repositioning problems, there is a need to determine the optimal truck routes, the number of bikes, and the loading/unloading activities of each truck at stations, subject to various constraints associated with the repositioning trucks, stations, user demand, and operations.

The majority of the literature has focused on static bike repositioning problems (e.g., Espegren et al., 2016; Huang et al., 2020; Vallez et al., 2021). Fewer studies have addressed dynamic bike repositioning problems (e.g., Legros, 2019; Zhang et al., 2021). This is probably due to the additional difficulty caused by demand varying during the repositioning operations, which implies that routes and volumes need to be rescheduled dynamically in response to demand variations in real time (Shui & Szeto, 2018).

Studies on empty bike repositioning predominantly concentrate on dock-based bike sharing systems. A recent systematic review of empty bike repositioning in dock-based bike sharing systems can be found in Vallez et al. (2021). For dockless bike sharing systems, because bikes can be picked up and dropped off at any location within the city that allows bike parking, the empty bike repositioning problem is less well defined. Abstract stations may be defined based on data about trip start and end locations (Barabonkov et al., 2020). Hence, a data mining process is essential to tackle empty bike repositioning in dockless BSS.
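To make the notions of surplus and deficit stations concrete, the following minimal sketch classifies stations against target inventory levels and builds a greedy pickup/drop-off list for one truck. It assumes a static setting, ignores truck routing entirely, and uses hypothetical names and data (classify_stations, truck_plan, the station labels); only the default capacity of 20 bikes per truck comes from the Paris example above.

```python
def classify_stations(bikes, targets):
    """Return (surplus, deficit): bikes above or below target at each station."""
    surplus, deficit = {}, {}
    for station, count in bikes.items():
        gap = count - targets[station]
        if gap > 0:
            surplus[station] = gap
        elif gap < 0:
            deficit[station] = -gap
    return surplus, deficit


def truck_plan(bikes, targets, capacity=20):
    """Greedy plan as a list of (pickup, dropoff, n_bikes), one trip per pair.

    Each trip carries at most `capacity` bikes; any imbalance left over after
    one trip per station pair is simply not served in this sketch.
    """
    surplus, deficit = classify_stations(bikes, targets)
    plan = []
    for src in sorted(surplus, key=surplus.get, reverse=True):
        for dst in sorted(deficit, key=deficit.get, reverse=True):
            n = min(surplus[src], deficit[dst], capacity)
            if n > 0:
                plan.append((src, dst, n))
                surplus[src] -= n
                deficit[dst] -= n
    return plan


# Hypothetical example: bikes drift from the hilltop station down to the river.
bikes = {"hill": 2, "river": 30, "mall": 10}
targets = {"hill": 10, "river": 10, "mall": 10}
print(truck_plan(bikes, targets))
```

A realistic static repositioning model would add the routing decision and dock capacities as constraints, which is where most of the difficulty discussed in the literature arises.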

1.1.4 Empty Container Chassis Repositioning

A container chassis is a wheeled metal frame that holds shipping containers for hauling over road by semi-tractor. It is equipped with twist locks, suspension, brakes, and a lighting mechanism. Chassis are key links for container movements between terminals, depots, warehouses, and ports (https://container-xchange.com). Since 2012, most ocean carriers have given up the ownership of chassis. Instead, intermodal equipment providers and terminal operators have taken over the ownership and management of the container chassis fleet. For example, TRAC Intermodal is the largest provider of container chassis in North America, operating 180,000 marine chassis in 11 chassis pools across the US (TRAC Intermodal, 2020).

It has been recognized that poor chassis availability is one of the primary causes of long truck delays at maritime container terminals (American Shipper, 2014). There are more than 100,000 chassis in circulation in the Los Angeles and Long Beach area. However, the mismatch between where they are located and where they are needed is the main reason for poor chassis availability. This gives rise to the problem of chassis repositioning, together with associated decisions such as selecting chassis pool locations, arranging maintenance procedures, balancing the flows, and coordinating operations among stakeholders. One solution that has been implemented in US container ports is to establish a gray chassis pool that makes chassis inter-operable across existing equipment pools supplied by different chassis providers. Truckers can then pick up and return chassis at any pool in the port, and therefore save the time that was previously required to retrieve and return chassis to a particular terminal or provider (American Shipper, 2014).

Dekker et al. (2013) analyzed the concept of a chassis exchange terminal (CET) in the context of the Maasvlakte container terminals in Rotterdam. The CET is an off-dock container terminal. It has two main functions: a chassis exchange service for truckers, and a shuttle service between the CET and seaport terminals. The chassis exchange service takes place during daytime, when truckers deliver a chassis with an export container and collect a chassis with an import container at the CET. The shuttle service takes place during off-peak nighttime to transfer export/import containers to/from seaport terminals. The advantages of using a CET include (i) leveling out the workload and reducing congestion at seaport terminals; (ii) better matching import flows with export flows so that chassis flows are balanced and empty trips can be reduced.

The executive director of the Port of Los Angeles proposed moving some chassis pools from near-dock yards to off-dock yards, which would free up dozens of acres inside the terminals occupied by chassis storage and reduce congestion to the benefit of every stakeholder. In fact, many stakeholders, including intermodal equipment providers, truckers, ocean carriers, and terminal operators, were supportive of the idea of moving chassis off-dock according to the survey (Mongelluzzo, 2019).


Nevertheless, further studies are required to address logistics questions such as where chassis yards should be located, how those locations will affect traffic flows near terminals, how the chassis fleet size and repositioning will be affected, how much new infrastructure will be needed, and what the impacts on different stakeholders will be. A lot of work needs to be done before actual implementation.

Ng and Talley (2017) took the inventory control viewpoint on the chassis repositioning problem at a single maritime container terminal. They used Markov chain theory to model a neutral chassis pool and to address various planning questions such as sizing the chassis yard, assessing the impact of chassis repositioning contracts with truckers, and evaluating chassis inventory control policies. The model is able to capture the complex and stochastic dynamics of chassis operations in practice. Ng (2021) further investigated two strategies of chassis management. The first strategy concerns the chassis repositioning decision faced by chassis pool operators when only partial information about the probability distribution of chassis demand is known; lower and upper bounds of the optimal cost are established. The second strategy concerns chassis yard consolidation, which can reduce the variance in chassis demand and lead to more accurate demand forecasts. It is stated that the empty chassis repositioning decision in practice is largely manual and subjective at present (Ng, 2021).
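As a rough illustration of how Markov chain theory can describe a chassis pool, consider the sketch below. It is not the model of Ng and Talley (2017), whose details are not reproduced here. It assumes a hypothetical pool holding between 0 and capacity chassis in which, in each period, one chassis is returned with probability p_return (if the pool is not full) or picked up with probability p_pickup (if the pool is not empty). For such a birth-death chain, the stationary probabilities satisfy pi[i+1] = pi[i] * p_return / p_pickup.

```python
def stationary_distribution(capacity, p_return, p_pickup):
    """Stationary distribution of a birth-death chain on states 0..capacity.

    State i is the number of chassis in the pool. The detailed balance
    condition pi[i] * p_return = pi[i+1] * p_pickup gives geometric weights.
    Assumes p_return > 0, p_pickup > 0 and p_return + p_pickup <= 1.
    """
    ratio = p_return / p_pickup
    weights = [ratio ** i for i in range(capacity + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameters: pickups are twice as likely as returns.
pi = stationary_distribution(capacity=2, p_return=0.2, p_pickup=0.4)
stockout_probability = pi[0]   # long-run fraction of periods with an empty pool
```

Under these assumed parameters, the long-run probability of an empty pool is 4/7, which shows how a persistent excess of pickups over returns translates into poor chassis availability unless chassis are repositioned into the pool.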

1.1.5 Empty Container Repositioning

Empty container repositioning (ECR) concerns the efficient and effective management of the flow and storage of empty containers and associated information in container transport networks to better meet customer demands (Song, 2021). Container shipping has experienced rapid development in the last two decades. It carries over 50% of seaborne world trade by value (Lee & Song, 2017). According to Drewry Maritime Research, there were over 37 million twenty-foot equivalent units (TEUs) of shipping containers in the world in 2018 (www.iicl.org). They are mainly owned by shipping companies and container lessors, with a roughly 50:50 split. This indicates that container leasing is a huge business operation in the container logistics industry.

The scale and importance of ECR in the global transport context have been well recognized. From the economic aspect, empty container logistics incur a wide range of costs such as cleaning, repairing, and storage at inland depots or port terminals; handling operations at depots, rail terminals, and maritime terminals; inland transportation by road, rail, or barge; and seaborne transportation by vessels. Song et al. (2005) modeled global container seaborne transportation and stated that the cost of repositioning empties in seaborne transport networks was about US$15 billion. The annual Review of Maritime Transport by the United Nations Conference on Trade and Development (UNCTD) in 2011 reported that the cost of ECR was about $20 billion for seaborne transportation in 2009 and $10 billion for landside transportation (Song, 2021). It is clear that the total cost associated with ECR is substantial and has a significant implication for the profitability of shipping companies. It also has a significant impact on congestion at ports and on environmental emissions in the logistics sector. For example, in the second half of 2020, major container ports in the UK and on the west coast of the US experienced severe congestion because container yards were piled up with empty containers waiting to be repositioned to Asian ports. In general, container ports in Europe and North America have a surplus of empty containers, whereas Asian ports face a deficit of empty containers to meet export demands.

Based on the data in the annual reviews published by UNCTD (UNCTD, 2020), Table 1.1 gives the containerized trade volumes in million TEUs in three major shipping routes (Trans-Pacific, Asia-Europe, and Transatlantic) between East Asia (EA), North America (NA), and Northern Europe and the Mediterranean (NEM) in the last few years. From Table 1.1, it can be seen that the annual containerized trade from East Asia to Northern Europe and the Mediterranean, and from East Asia to North America, has been more than double the volume in the opposite direction in each of the last seven years. In the Transatlantic route, there has also been a significant trade imbalance.

The ECR in inland transport networks is even more severe. Braekers et al. (2011) reported that empty containers accounted for more than 40% of all containers transported in inland transport networks. This may be explained by the fact that whenever a laden container is unpacked at a customer's warehouse, the emptied container is often moved back to a port or inland depot, which means that a laden container trip is usually accompanied by an empty container trip. The point is that there is a large amount of empty container movement in global and regional transport networks. The reasons that cause ECR are explained in more detail in the next section.

Table 1.1 Trade demands in million TEUs in three major shipping routes

| Year | NEM-EA | EA-NEM | EA-NA | NA-EA | NA-NEM | NEM-NA |
|------|--------|--------|-------|-------|--------|--------|
| 2014 | 6.3 | 15.5 | 16.2 | 7.0 | 2.8 | 3.9 |
| 2015 | 6.4 | 15.0 | 17.4 | 6.9 | 2.7 | 4.1 |
| 2016 | 6.8 | 15.3 | 18.2 | 7.3 | 2.7 | 4.3 |
| 2017 | 7.1 | 16.4 | 19.4 | 7.4 | 3.0 | 4.6 |
| 2018 | 7.0 | 17.3 | 20.8 | 6.8 | 3.1 | 4.9 |
| 2019 | 7.2 | 17.5 | 20.0 | 7.0 | 2.9 | 4.9 |
| 2020 | 6.9 | 16.1 | 18.1 | 7.0 | 2.8 | 4.7 |
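The imbalance read off Table 1.1 can be checked with a few lines of arithmetic. The sketch below transcribes the table and divides the volume in the dominant direction by the volume in the opposite direction for each route and year. The names `TRADE`, `ROUTES`, and `imbalance_ratios` are illustrative choices and do not come from the source.

```python
# Trade volumes in million TEUs, transcribed from Table 1.1.
TRADE = {
    2014: {"NEM-EA": 6.3, "EA-NEM": 15.5, "EA-NA": 16.2, "NA-EA": 7.0, "NA-NEM": 2.8, "NEM-NA": 3.9},
    2015: {"NEM-EA": 6.4, "EA-NEM": 15.0, "EA-NA": 17.4, "NA-EA": 6.9, "NA-NEM": 2.7, "NEM-NA": 4.1},
    2016: {"NEM-EA": 6.8, "EA-NEM": 15.3, "EA-NA": 18.2, "NA-EA": 7.3, "NA-NEM": 2.7, "NEM-NA": 4.3},
    2017: {"NEM-EA": 7.1, "EA-NEM": 16.4, "EA-NA": 19.4, "NA-EA": 7.4, "NA-NEM": 3.0, "NEM-NA": 4.6},
    2018: {"NEM-EA": 7.0, "EA-NEM": 17.3, "EA-NA": 20.8, "NA-EA": 6.8, "NA-NEM": 3.1, "NEM-NA": 4.9},
    2019: {"NEM-EA": 7.2, "EA-NEM": 17.5, "EA-NA": 20.0, "NA-EA": 7.0, "NA-NEM": 2.9, "NEM-NA": 4.9},
    2020: {"NEM-EA": 6.9, "EA-NEM": 16.1, "EA-NA": 18.1, "NA-EA": 7.0, "NA-NEM": 2.8, "NEM-NA": 4.7},
}

# (dominant direction, opposite direction) for each route.
ROUTES = {
    "Asia-Europe": ("EA-NEM", "NEM-EA"),
    "Trans-Pacific": ("EA-NA", "NA-EA"),
    "Transatlantic": ("NEM-NA", "NA-NEM"),
}


def imbalance_ratios(trade=TRADE, routes=ROUTES):
    """Return {route: {year: dominant volume / opposite volume}}."""
    return {
        route: {year: row[head] / row[back] for year, row in trade.items()}
        for route, (head, back) in routes.items()
    }


if __name__ == "__main__":
    for route, by_year in imbalance_ratios().items():
        cells = ", ".join(f"{year}: {ratio:.2f}" for year, ratio in sorted(by_year.items()))
        print(f"{route:14s} {cells}")
```

A ratio above 2 on a route means that, in volume terms, fewer than half of the containers arriving laden can leave laden again on the return leg.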

1.2 ECR: Reasons and Characteristics

The main factors that affect ECR include trade imbalance, dynamic operations, uncertainties, seasonality, types of containers, lack of visibility in hinterland transport, container manufacturing and leasing costs, and ocean carriers' operational and strategic practices (Song, 2021; Song & Carter, 2009; Song & Dong, 2015).


• Containerized cargo trade imbalance is the fundamental reason for ECR in the international context. Table 1.1 demonstrates that container ports in Europe and North America tend to accumulate empty containers. In fact, more than 50% of those empty containers will be repositioned back to East Asia for future reuse.

• Dynamic operation is a natural characteristic of any freight transportation because cargoes and vehicles travel over space and time to meet transport demands. When a laden container becomes an empty container at the destination point, it is unlikely that it can be used to meet the next transport demand near the destination point at that particular time. Normally, empty containers are transported to a port or depot for storage, where they wait for the next local customer demand or are repositioned to other ports internationally. This causes both regional and global ECR.

• Uncertainty is another natural characteristic of freight transportation. It represents the unpredictable factors in the system, e.g., equipment breakdown, waiting time, bad weather, and demand variation. In particular, demand uncertainty is probably the most important type of uncertainty in transportation systems. Customer demands cannot be predicted accurately in container shipping, partly because of fierce competition and the lack of commitment between shipping companies and shippers. To buffer against demand uncertainty, shipping companies have to maintain certain safety stocks of empty containers at ports and depots and allocate and redistribute empty containers appropriately over the entire shipping network.

• Seasonality differs from uncertainty because it is a known demand variation over time, e.g., Christmas, Easter, and Chinese New Year. To accommodate a surge of largely expected demand, empty containers have to be repositioned into the relevant areas before the peak season. Some empirical studies have examined the impact of seasonality on the flow of empty containers, e.g., in the Baltic Sea Region (Wolff et al., 2011) and at Turkish terminals (Basarici & Satir, 2019).

• Different types of containers can affect ECR. Although over 80% of the container fleet consists of dry containers, there are other types of containers that are suitable for carrying specific types of commodities, e.g., reefer containers and tank containers. A shortage of one type of container cannot be offset by a surplus of other types. For example, reefer containers are mostly required by countries in Africa for exporting perishable food produce. On the other hand, African countries import a lot of manufactured goods using standard dry containers. The mismatch of container types necessitates ECR. Containers also differ in their dimensions, e.g., 20-ft, 40-ft, and high-cube. Shippers may prefer one dimension to others. For example, Scotland imports a lot of goods using high-cube containers but needs standard 20- or 40-ft containers to export whisky to other countries. Containers further differ in ownership. Different shipping companies are not willing to share their container fleets because they are competitors.

• The lack of visibility in hinterland transport arises because shipping companies may only offer port-to-port service or subcontract the inland transport to other companies. As a result, shipping companies have no information about the status of the containers during inland transportation, which prevents them from managing their container fleets effectively.


• Shipping companies and leasing companies can purchase new containers from manufacturers in the deficit area if the steel price is sufficiently low. This may reduce the need to reposition empty containers. On the other hand, higher manufacturing or leasing costs would encourage more empty repositioning activities.

• Shipping companies' strategies and operational practices affect empty container movements, e.g., fleet sizing, container leasing, canvassing strategy, laden container routing, and slow steaming. Ineffective practices incur unnecessary empty container movements. It is not unusual for empty containers to be re-repositioned in order to give up slots to high-priority laden containers because of the vessel capacity constraint. Shipping companies may purposely price the trade demand to reduce the degree of imbalance or recover the cost of ECR (Chen et al., 2016). In the recent decade, almost all shipping companies have adopted slow steaming practices. However, slow steaming implies that a larger container fleet or more efficient ECR is needed because more containers are tied up with cargo at sea. The point is that some strategies may have a side effect on the ECR problem.

ECR is similar to other empty equipment repositioning problems in that empty equipment is allocated over space and over time to satisfy uncertain external demands. However, ECR has a number of important characteristics that differ from other types of empty equipment repositioning problems:

• First, ECR exists not only at the global/international level but also at the regional/local level. More importantly, these two levels are closely related and interact with each other. Other equipment repositioning problems are often limited to regional or local levels.

• Second, empty container movements can be in a single unit (e.g., via trucks) or in bulk (e.g., via vessels or trains). The transport routes may be fixed and regular (e.g., rail services and liner shipping services) or flexible and on demand (e.g., trucks). Other equipment repositioning problems often have a single transport mode with more specific features.

• Third, in ECR systems, empty containers are physically transferred to and from external customers. Note that individual customers are difficult to model explicitly because of their dispersed locations. Empty containers may exit and reenter the system in a random way. As a result, the system is not closed in terms of the container fleet. In other equipment repositioning systems, the equipment in the fleet is kept within the transportation system.

• Fourth, ECR systems face more and higher degrees of uncertainty than other empty equipment repositioning problems. Firstly, there is supply uncertainty and demand uncertainty regarding empty containers entering and exiting the system. This is because when a container (either empty or laden) is dispatched to a shipper, the time of its return is often beyond the control of the shipping company. Secondly, the demand for laden container movements within the transport system is uncertain because of the randomness of customers' requests. Thirdly, the transit time of laden or empty containers is subject to uncertainty due to factors such as equipment breakdown, human errors, port congestion, labor strikes, and bad weather.

• Fifth, leasing is an important decision that is closely related to ECR, since roughly 50% of the container fleet in the world is owned by container lessors. For other empty equipment repositioning problems, leasing is often not available or is on a negligible scale.

In addition, solving global ECR problems may require the collaboration of many stakeholders in the shipping supply chain such as ocean carriers, port authorities, terminal operators, freight forwarders, rail operators, truckers, depot operators, and shippers (Song, 2021).

1.3 Modeling ECR Logistics

There are a few review papers on ECR models. For example, Braekers et al. (2011) classified ECR models into three planning levels, i.e., strategic, tactical, and operational. Khakbaz and Bhattacharjya (2014) classified the maritime ECR literature according to subjects such as engineering, management, transport, and logistics. Song and Dong (2015) classified ECR models according to the modeling techniques into two broad categories: network flow models and inventory control models.

From the discussion of general empty equipment repositioning problems and the characteristics of ECR problems, it is believed that ECR models should cover features such as the imbalance of transport demands, dynamic operations, uncertainty, and leasing activity. The imbalanced transport demand can be easily incorporated through input data, and the leasing activity can also be incorporated relatively easily. Dynamic operations imply that we are making sequential decisions over time, and uncertainty implies that we have to anticipate the impact of future unpredictable factors. We treat the uncertainty as stochasticity that can be represented by random variables over time or by a stochastic process. For this type of optimization problem, stochastic dynamic programming or multi-stage stochastic programming is an appropriate modeling approach.

Stochastic programming is often solved using scenario-based methods. However, there are some concerns about these methods. First, scenario-based methods often select a rather limited number of scenarios, which is not sufficient to represent the randomness of the system, especially with continuous random variables. Second, stochastic programming does not yield a policy to control empty containers. Instead, the solutions are in the format of scenario trees, which may not be directly applicable because of the random events in the system. Third, the scenario trees in multi-stage stochastic programming suffer from their own curse of dimensionality, which is even worse than that of traditional stochastic dynamic programs because the entire history of the scenario trees has to be captured. Fourth, there are no well-developed solution methods to solve general multi-stage stochastic programming models (Powell, 2014). In addition, the underlying logic of stochastic programming models is often hidden from the operators who actually manage the container fleet (Du & Hall, 1997).
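The third concern can be made concrete with a small counting exercise. The sketch below assumes a full, non-recombining scenario tree with the same number of branches at every stage and compares its node count with the number of (state, stage) pairs a dynamic program would evaluate for a single discretized inventory state. The function names and the example numbers are illustrative assumptions, not figures from the source.

```python
def scenario_tree_nodes(branches: int, stages: int) -> int:
    """Count nodes in a full scenario tree with a root and `stages` further stages."""
    # Stage t holds branches**t nodes, one for each distinct history of outcomes.
    return sum(branches ** t for t in range(stages + 1))


def dp_state_stage_pairs(num_states: int, stages: int) -> int:
    """Count the (state, stage) pairs evaluated by backward induction."""
    return num_states * stages


if __name__ == "__main__":
    # Hypothetical setting: 3 demand outcomes per period, 10 periods,
    # and 101 possible inventory levels (0 to 100 containers).
    print("scenario tree nodes:", scenario_tree_nodes(3, 10))
    print("DP state-stage pairs:", dp_state_stage_pairs(101, 10))
```

The tree grows geometrically in the number of stages because every node encodes a full history, whereas the dynamic program only needs the current state. The comparison is less favorable to dynamic programming when the state itself is high-dimensional, for instance with many depots.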


This book mainly adopts the stochastic dynamic programming approach to tackle ECR problems, for the following reasons. First, stochastic dynamic programming offers an algorithmic strategy to characterize the optimal policy, which naturally captures sequential decision-making in anticipation of uncertainties over time (Powell, 2014). Second, we attempt to establish the structural properties of the optimal ECR policies in relatively simple transportation systems and then use these structural properties to construct threshold-type ECR policies for more complicated transportation systems. Third, threshold-type ECR policies use rules that resemble the (s, S) or (s, Q) policy in inventory control theory to manage empty containers. Such policies have the advantages of being decentralized, easy to understand, and easy to operate, of responding quickly to random events, and of requiring minimal online computation and communication (Du & Hall, 1997). Fourth, several sophisticated techniques (such as approximate dynamic programming) can be applied to tackle the curse of dimensionality when solving high-dimensional stochastic dynamic programming models.

In this book, the scope of ECR problems is limited to regional or local inland-depot systems in which empty containers can be repositioned either in a single unit or in bulk quantity. We mainly take the inventory control perspective to address ECR logistics in regional transportation systems. The rationale for taking the inventory control perspective to manage empty containers can be justified by industrial practices. For example, Song and Dong (2008) stated that some European shipping lines tend to hold empty containers in European ports for up to one month in order to match demands before repositioning them empty to Far East ports. On the other hand, some Asian shipping lines tend to reposition empty containers from European ports to Far East ports as soon as they are available. In this case, the length of time that empty containers are kept in European ports can be regarded as container inventory information. Epstein et al. (2012) developed an inventory model to determine the safety stock of empty containers required at each location for a large shipping company. Significant economic benefit was gained.

Another example is the concept of the container available index (CAI), which was proposed by the company Xchange (https://container-xchange.com). Xchange works with over 500 shipping companies. The CAI is defined as the ratio between the surplus and the shortage of containers per location, using the demand and stock information from the past two years. It takes a value between 0 and 1. A CAI value of 1 means that the location has a sufficient surplus of empty containers. A CAI value of 0 means that the location has a shortage of empty containers. Therefore, the CAI can be interpreted as inventory information about empty containers at a specific location that triggers ECR decisions.
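To illustrate how little online computation a threshold-type rule needs, the sketch below implements one plausible reading of a two-threshold policy for a single depot: reposition empties in when stock falls below a lower threshold, reposition out when it exceeds an upper threshold, and do nothing in between. The function names, the thresholds, the uniform net-flow distribution, and the leasing rule are all hypothetical and are not the policies derived later in the book.

```python
import random


def threshold_decision(inventory: int, lower: int, upper: int) -> int:
    """Return the repositioning quantity under a two-threshold rule.

    Positive values reposition empties in, negative values reposition
    empties out, and zero leaves the stock unchanged.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if inventory < lower:
        return lower - inventory   # raise stock to the lower threshold
    if inventory > upper:
        return upper - inventory   # send the excess away
    return 0


def simulate(periods: int, lower: int, upper: int, start: int = 0, seed: int = 0):
    """Simulate one depot and return (actions, total leased, final inventory)."""
    rng = random.Random(seed)
    inventory, leased, actions = start, 0, []
    for _ in range(periods):
        action = threshold_decision(inventory, lower, upper)
        actions.append(action)
        inventory += action
        # Hypothetical net customer flow: returned empties minus demand for empties.
        inventory += rng.randint(-5, 5)
        if inventory < 0:
            # Assumption: any shortage is covered by leasing containers.
            leased += -inventory
            inventory = 0
    return actions, leased, inventory


if __name__ == "__main__":
    actions, leased, final = simulate(periods=20, lower=5, upper=20)
    print("actions:", actions)
    print("leased:", leased, "final inventory:", final)
```

Each decision only compares the local inventory with two numbers, which is why such rules can be operated in a decentralized way.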

1.4 Structure of the Book

This book consists of seven chapters. In this chapter, an introduction to general empty equipment logistics is provided. This includes a range of popular empty equipment management problems such as empty freight vehicle redistribution, empty passenger vehicle redistribution, empty bike repositioning, empty container chassis repositioning, and ECR. Regarding ECR, its scale and significance are explained from the global transport context and from the regional context, and the main determinants causing ECR are described. The commonalities and unique characteristics of ECR are discussed in comparison with other empty equipment repositioning problems. The modeling techniques for ECR and the scope of this book are then explained and justified. A flowchart is provided to illustrate the structure of the book.

In Chap. 2, we consider the ECR problem in a single depot facing random demand and supply. Two situations are investigated. Firstly, we consider the discrete-time sequential ECR decision-making situation. We formulate the problem as a discrete-time stochastic dynamic programming model. We prove that the optimal ECR policy can be characterized by two threshold parameters at each period in the form of (s, S) inventory control. Secondly, we consider the continuous-time continuous-state sequential ECR decision-making situation. Initially, it is assumed that the demand process follows a two-state Markov process. The problem is formulated as an optimal control problem using stochastic dynamic programming. A fluid-flow model is used to characterize the underlying dynamics and stochasticity of the system. The qualitative structural properties of the value functions and the optimal control policies are established. The Hamilton–Jacobi–Bellman (HJB) equations are solved analytically, which leads to closed forms of the value functions. The fluid-flow model is then extended to situations with a multi-state Markov demand process. Numerical examples are given to illustrate the results.

In Chap. 3, we consider a two-depot system in which each depot faces independent supply and demand of empty containers from shippers. The inventory of empty containers is reviewed periodically. The purpose is to seek the optimal ECR policy between two depots over a multi-period planning horizon to minimize the total expected cost consisting of empty container transport costs, inventory holding costs, and container leasing costs. The problem is formulated as a stochastic dynamic programming model. The local properties of the value function, such as the first and second derivatives on a region-wise basis, are analyzed. The region-wise properties of the value function are used to establish the structural characteristics of the optimal ECR policy over multiple time periods. Specifically, the entire state space is divided into three control regions by two monotonic switching curves. The asymptotic behaviors of the switching curves are analyzed. The structural properties of the optimal ECR policy and the asymptotic behaviors of the switching curves are then used to construct simple, near-optimal, and easy-to-operate policies. Numerical examples are provided to demonstrate the analytical results.

In Chap. 4, we address the ECR problem in a two-depot shuttle service system over an infinite time horizon with the focus on deriving optimal stationary ECR policies. The inventory of empty containers is reviewed continuously. Empty containers are required at each depot to meet random customer demands that drive laden container movements between the two depots. Customer demands must be satisfied either by owned empty containers or by leasing from lessors. The system is based on continuous review and discrete state, where the system state represents the inventory level of containers at the two depots. An event-driven model is formulated, in which the ECR decisions are made at each event when the system state changes. Under the assumption of a Poisson arrival process of laden containers and exponentially distributed empty container transfer times, the continuous-time Markov decision process is converted into an equivalent discrete-time Markov decision process by using the uniformization technique. It is shown that the optimal ECR policy is a threshold policy, characterized by two control parameters, in both the discounted cost and the long-run average cost cases. The closed form of the optimal discounted cost function is obtained by using the characteristic equation method. The closed form of the optimal long-run average cost function is obtained by calculating the stationary distribution under the threshold control policy. The models are extended to the cases with external supply and demand of empty containers at both depots, where empty containers may exit and enter the two-depot shuttle system randomly.

In Chap. 5, we consider the ECR problem in a hub-and-spoke transportation system over an infinite time horizon. Similar to the methodology in Chap. 4, we take the perspective of continuous review and discrete state to formulate an event-driven Markov decision model. The empty repositioning decisions are made at each epoch when the system state changes. To overcome the computational complexity of the stochastic dynamic programming model, a dynamic decomposition procedure is presented, whose computational complexity is linear in the number of spokes and which can be calculated offline. The requirement for online calculation and data communication is very minor. The structure of the dynamic decomposition policy is analyzed, and it is shown that the dynamic decomposition policy has the same asymptotic behaviors as the optimal ECR policy. The proposed dynamic decomposition procedure can be applied to both discounted cost and long-run average cost cases. Numerical experiments demonstrate the effectiveness of the dynamic decomposition policy and its robustness against the assumptions on the distribution types of the laden container arrivals and the empty container transfer times. The model is then extended to the cases with external supply and demand of empty containers at all depots, where empty containers may exit and enter the system randomly.

In Chap. 6, we first consider the optimal ECR problems for general inland transportation systems with multiple interconnected depots over multiple time periods. On the one hand, there are laden and empty container flows between depots. On the other hand, each depot faces external supply and demand of empty containers, which means that empty containers may enter or exit the system at each depot. A periodic review decision-making scheme is assumed. Three optimization models are formulated for three different systems: (i) a multi-depot transportation system without transfer seaports; (ii) a multi-depot transportation system with transfer seaports; (iii) an intermodal multi-depot transportation system with transfer seaports. To solve the optimization problems, a range of optimization methods are presented. These include the approximate dynamic programming method, simulation methods, metaheuristic optimization methods, stochastic approximation methods, perturbation analysis methods, and ordinal optimization methods. The relative advantages and disadvantages of these methods are explained.


Fig. 1.1 Flowchart of chapter relationships (boxes for Chapter 1 through Chapter 7)

In Chap. 7, we summarize the main findings and highlight the managerial insights to assist dynamic ECR decision making in regional container transportation systems subject to uncertainty. The limitations and further research opportunities are discussed. We conclude this chapter with a flowchart in Fig. 1.1 to explain the relationships between these chapters.

References

American Shipper (2014). Port of Long Beach to acquire emergency chassis fleet, freightwaves, October 15, 2014.
Babicheva, T., Burghout, W., Andreasson, I., & Faul, N. (2019). Empty vehicle redistribution and fleet size in autonomous taxi systems. IET Intelligent Transport Systems, 13(4), 677–682.
Barabonkov, D., D’Alonzo, S., Pierre, J., Kondor, D., Zhang, X., & Tien, M. A. (2020). Simulating and evaluating rebalancing strategies for dockless bike-sharing systems. arXiv:2004.11565v1. https://arxiv.org/pdf/2004.11565.pdf
Basarici, A., & Satir, T. (2019). Empty container movements arising from cargo seasonality: Turkish terminals. Maritime Business Review, 4(3), 238–255.
Braekers, K., Janssens, G. H., & Caris, A. (2011). Challenges in managing empty container movements at multiple planning levels. Transport Reviews, 31(6), 681–708.
Chen, R. Y., Dong, J. X., & Lee, C. Y. (2016). Pricing in a shipping market with waste shipments and empty container repositioning. Transportation Research Part B, 85, 32–55.
Dejax, P., & Crainic, T. (1987). A review of empty flows and fleet management models in freight transportation. Transportation Science, 21(4), 227–247.


Dekker, R., Heide, S., Asperen, E., & Ypsilantis, P. (2013). A chassis exchange terminal to reduce truck congestion at container terminals. Flexible Services and Manufacturing Journal, 25(4), 528–543.
Du, Y., & Hall, R. (1997). Fleet sizing and empty equipment redistribution for center-terminal transportation networks. Management Science, 42(2), 145–157.
Epstein, R., Neely, A., Weintraub, A., Valenzuela, F., Hurtado, S., González, G., Beiza, A., Naveas, M., Infante, F., Alarcón, F., Angulo, G., Berner, C., Catalán, J., González, C., & Yung, D. (2012). A strategic empty container logistics optimization in a major shipping company. Interfaces, 42(1), 5–16.
Espegren, H. M., Kristianslund, J., Andersson, H., & Fagerholt, K. (2016). The static bicycle repositioning problem: Literature survey and new formulation. In A. Paias, M. Ruthmair & S. Voß (Eds.), Computational Logistics. ICCL 2016. Lecture Notes in Computer Science, Vol. 9855. Springer, Cham.
Fatnassi, E., Chebbi, O., & Chaouachi, J. (2017). Dealing with the empty vehicle movements in personal rapid transit system with batteries constraints in a dynamic context. Journal of Advanced Transportation, 2017, 8512728.
Ghosh, S., Varakantham, P., Adulyasak, Y., & Jaillet, P. (2017). Dynamic repositioning to reduce lost demand in bike sharing systems. Journal of Artificial Intelligence Research, 58, 387–430.
Huang, D., Chen, X., Liu, Z., Lyu, C., Wang, S., & Chen, X. (2020). A static bike repositioning model in a hub-and-spoke network framework. Transportation Research Part E, 141, 102031.
Hungerlander, P., & Steininger, S. (2019). Fleet sizing and empty freight car allocation. In B. Fortz & M. Labbe (Eds.), Operations research proceedings 2018. Springer, Cham.
Joborn, M., Crainic, T. G., Gendreau, M., Holmberg, K., & Lundgren, J. T. (2004). Economies of scale in empty freight car distribution in scheduled railways. Transportation Science, 38(2), 121–134.
Jordan, W. C., & Turnquist, M. A. (1983). A stochastic dynamic model for railroad car distribution. Transportation Science, 17, 123–145.
Khakbaz, H., & Bhattacharjya, J. (2014). Maritime empty container repositioning: A problem review. International Journal of Strategic Decision Sciences, 5(1), 1–23.
Lee, C. Y., & Song, D. P. (2017). Ocean container transport in global supply chains: Overview and research opportunities. Transportation Research Part B, 95, 442–474.
Lees-Miller, J. D. (2011). Empty vehicle redistribution for personal rapid transit, PhD thesis, University of Bristol.
Lees-Miller, J. D., & Wilson, R. E. (2012). Proactive empty vehicle redistribution for personal rapid transit and taxis. Transportation Planning and Technology, 35(1), 17–30.
Legros, B. (2019). Dynamic repositioning strategy in a bike-sharing system; how to prioritize and how to rebalance a bike station. European Journal of Operational Research, 272, 740–753.
Mongelluzzo, B. (2019). LA-LB ports say off-dock chassis storage yards essential. Journal of Commerce.
Ng, M. W. (2021). Strategies for chassis dislocation management at container ports: Repositioning and yard consolidation. Transportation Research Part C, 124, 102782.
Ng, M. W., & Talley, W. K. (2017). Chassis inventory management at U.S. container ports: Modelling and case study. International Journal of Production Research, 55(18), 5394–5404.
Powell, W. B. (1986). A stochastic model of the dynamic vehicle allocation problem. Transportation Science, 20, 117–129.
Powell, W. B. (1987). An operational planning model for the dynamic vehicle allocation problem with uncertain demands. Transportation Research Part B, 21, 217–232.
Powell, W. B. (2014). Clearing the jungle of stochastic optimization. In Tutorials in operations research (pp. 109–137). INFORMS.
Shui, C. S., & Szeto, W. Y. (2018). Dynamic green bike repositioning problem: A hybrid rolling horizon artificial bee colony algorithm approach. Transportation Research Part D, 60, 119–136.
Song, D. (2021). Container logistics and maritime transport. Routledge.


Song, D. P., & Carter, J. (2009). Empty container repositioning in shipping industry. Maritime Policy & Management, 36(4), 291–307.
Song, D. P., & Dong, J. X. (2008). Empty container management in cyclic shipping routes. Maritime Economics & Logistics, 10(4), 335–361.
Song, D. P., & Dong, J. X. (2015). Empty container repositioning. In C. Y. Lee & Q. Meng (Eds.), Handbook of ocean container transport logistics: Making global supply chain effective (pp. 163–208). Springer.
Song, D. P., Zhang, J., Carter, J., Field, T., Marshall, M., Polak, J., Schumacher, K., Sinha-Ray, P., & Woods, J. (2005). On cost-efficiency of the global container shipping network. Maritime Policy & Management, 32(1), 15–30.
Todd, S., & Waters, W. (2021). Border issues continue to hinder UK-EU freight flows, Lloyds Loading List, Tuesday, March 23, 2021.
TRAC Intermodal (2020). TRAC intermodal releases case study on port of NY/NJ chassis pool model success during record port growth, PR Newswire. December 4, 2020.
UNCTD. (2020). Review of maritime transport. United Nations Publication.
Vallez, C. M., Castro, M., & Contreras, D. (2021). Challenges and opportunities in dock-based bike-sharing rebalancing: A systematic review. Sustainability, 13, 1829.
Vogel, P. (2016). Service network design of bike sharing systems: Analysis and optimization. Springer.
Wolff, J., Herz, N., & Flamig, H. (2011). Report on empty container management in the Baltic Sea region: Experiences and solutions from a multi-actor perspective, Hamburg University of Technology, The Baltic Sea region programme 2007–2013.
Zhang, J., Meng, M., Wong, Y. D., Ieromonachou, P., & Wang, D. Z. W. (2021). A data-driven dynamic repositioning model in bicycle-sharing systems. International Journal of Production Economics, 231, 107909.

Chapter 2

Optimal ECR Policy in a Single-Depot System

Abstract This chapter considers the ECR problem in a single depot, such as an inland terminal or a seaport, facing random demand and supply. We seek the optimal dynamic ECR policy over a planning horizon to minimize the expected cost consisting of container repositioning costs, inventory holding costs, and container leasing costs. Two situations are investigated. First, we consider the discrete-time sequential ECR decision-making situation. We formulate the problem as a discrete-time stochastic dynamic programming model. We prove that the optimal ECR policy can be characterized by two threshold parameters at each period in the form of (s, S) inventory control. Second, we consider the continuous-time continuous-state sequential ECR decision-making situation. Under the assumption of a two-state Markov demand process, we derive the closed-form solution to the optimal ECR problem. We use a fluid-flow model to characterize the underlying dynamics and stochasticity of the system. The qualitative structural properties of the value functions and the optimal control policies are established. The Hamilton–Jacobi–Bellman (HJB) equations are solved analytically, which leads to closed forms of the value functions. The fluid-flow model is then extended to situations with a multiple-state Markov demand process. Numerical examples are given to illustrate the results.

2.1 Introduction

In container logistics chains, empty containers are stored at various nodes in the global transport network. These nodes represent warehouses, distribution centers, inland depots, intermodal terminals, dry ports, and seaports. Each node can be regarded as a control system, in which the inventory level is the output, the repositioning in/out decision is the control input, and the random demand and supply is a perturbation. We use the term depot to represent a node in the container logistics chains. A depot is a key component in container logistics chains, where both laden and empty containers move in and out over time. Functionally, a depot can be regarded as a temporary storage point where containers flow in and flow out. It should be noted that laden container movements are largely determined

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 D.-P. Song and J. Dong, Modelling Empty Container Repositioning Logistics, https://doi.org/10.1007/978-3-030-93383-8_2


exogenously by customer demands, whereas empty container movements are determined endogenously by transport companies. The uncertainty of laden container movements (including their uncertain return as empty containers after use) and the imbalance of laden container flows necessitate the appropriate management of empty container flows in order to improve the performance of the transport system.

This chapter consists of two major parts dealing with two scenarios. The first part investigates the discrete-time sequential ECR decision-making situation. We formulate the ECR problem in a single depot facing uncertain demand and supply over multiple planning periods as a discrete-time stochastic dynamic programming model. We prove that the optimal ECR policy can be characterized by two threshold parameters at each period, like an (s, S) inventory control. The threshold values at the final period can be obtained in closed form.

The second part investigates the continuous-time continuous-state sequential ECR decision-making situation. Under the assumption that the single depot faces a two-state Markov demand process, we obtain the closed-form solution to the optimal ECR problem (Song & Zhang, 2010). We use a fluid-flow model based on stochastic dynamic programming to characterize the underlying dynamics and stochasticity of the system. The qualitative structural properties of the value functions and the optimal control policies are established. The Hamilton–Jacobi–Bellman (HJB) equations are solved analytically, which leads to closed forms of the value functions. The fluid-flow model is then extended to situations with a multiple-state Markov demand process.

The rest of this chapter is organized as follows. In Sect. 2.2, a discrete-time stochastic dynamic programming model is formulated and analyzed for the single-depot ECR problem with random demand and supply over multiple periods. In Sect. 2.3, a fluid-flow model based on continuous-time dynamic programming is developed to model the single-depot ECR problem facing a Markov demand process. The closed forms of the value functions are obtained. In Sect. 2.4, we provide a summary and notes in relation to the literature.

2.2 A Discrete Stochastic Dynamic Programming Model

Consider a single depot or port that repositions empty containers in or out over a dynamic multi-period planning horizon, facing uncertain customer demand and empty container supply as shown in Fig. 2.1. We make the following assumptions: (i) there is a single type of container under consideration, i.e., the twenty-foot equivalent unit (TEU); a forty-foot container can be regarded as two TEUs; (ii) all customer demands must be satisfied; if owned containers are insufficient, extra empty containers can be leased from lessors to meet demands; (iii) laden containers become empty and reusable immediately after arrival at the depot.

Fig. 2.1 A two-depot system (diagram labels: Supply, Depot, Empty container decision, Demand)

The following notations are introduced.

$n$: a discrete decision period;
$N$: the length of the planning horizon;
$x_{n-1}$: the inventory level of empty containers at the depot at the beginning of period $n$, which can be negative (indicating leased containers);
$u_n$: the number of empty containers repositioned out of the depot in period $n$, which is a decision variable; a negative $u_n$ indicates the number of containers repositioned into the depot;
$Z_I$: the random supply of containers into the depot in a period;
$Z_O$: the random demand of containers out of the depot in a period;
$z$: the random variable representing the net number of containers into the depot in a period, defined as $z := Z_I - Z_O$;
$z_n$: the net number of containers into the depot in period $n$, which is a sample of $z$;
$f(\cdot)$: the probability density function (pdf) of the random variable $z$;
$F(\cdot)$: the cumulative distribution function (cdf) of the random variable $z$;
$C_h$: the container holding cost per unit per period;
$C_l$: the container leasing cost per unit per period;
$C_{e,1}$: the cost of repositioning out one empty container;
$C_{e,2}$: the cost of repositioning in one empty container.

Assumption 2.1 $C_{e,1} < C_h$ and $C_{e,2} < C_l$.

Assumption 2.2 The pdf of the random variable $z$ is continuous and satisfies $f(\cdot) > 0$.

Assumption 2.1 indicates that the transportation cost of repositioning out one container is less than the unit holding cost, and the cost of repositioning in one container is less than the unit leasing cost. This is reasonable and makes the problem physically nontrivial; otherwise, there would be no need to reposition empty containers in or out. Assumption 2.2 makes our later analysis rigorous. The normal distribution satisfies Assumption 2.2. Moreover, the main results, such as the structural properties of the optimal policy, are preserved for most general distributions.

The system state, i.e., the inventory level at the depot, in period $n$ is determined by


$$x_n = x_{n-1} + z_n - u_n.$$

The problem is to find the optimal dynamic ECR policy that minimizes the following finite-horizon cost function with the initial state $x_0$:

$$\sum_{n=1}^{N} \alpha^{n}\, E\left[ C_{e,1} u_n^{+} + C_{e,2} u_n^{-} + C_h x_n^{+} + C_l x_n^{-} \,\middle|\, x_0 \right],$$

where $x^{+} = \max\{0, x\}$ and $x^{-} = \max\{-x, 0\}$; $\alpha$ is a discount factor ($0 < \alpha \le 1$). Let $V_n(x_{n-1})$ be the expected discounted cost from period $n$ to $N$. The backward Bellman optimality equation is given by

$$V_n(x_{n-1}) = \min_{u_n} \left\{ C_{e,1} u_n^{+} + C_{e,2} u_n^{-} + C_h E x_n^{+} + C_l E x_n^{-} + \alpha E V_{n+1}(x_n) \right\},$$

where $V_{N+1}(x) \equiv 0$ for all $x$. To simplify the narrative, let

$$L_n(x_{n-1}, u_n) := C_h E x_n^{+} + C_l E x_n^{-} + \alpha E V_{n+1}(x_n),$$

$$G_n(x_{n-1}, u_n) := C_{e,1} u_n^{+} + C_{e,2} u_n^{-} + L_n(x_{n-1}, u_n).$$

Hence, the problem is to seek the solution $\{u_n \mid 1 \le n \le N\}$ to the Bellman equation $V_n(x_{n-1}) = \min_{u_n} \{G_n(x_{n-1}, u_n)\}$ from period 1 to period $N$. With a slight abuse of notation and to simplify the exposition further, we drop the subscript $n$ in the system state and the control decision. The Bellman optimality equation is then given by

$$V_n(x) = \min_{u} \{G_n(x, u)\}, \tag{2.1}$$

$$G_n(x, u) = C_{e,1} u^{+} + C_{e,2} u^{-} + L_n(x, u), \tag{2.2}$$

$$L_n(x, u) = C_h E(x + z - u)^{+} + C_l E(x + z - u)^{-} + \alpha E V_{n+1}(x + z - u). \tag{2.3}$$

The above Bellman optimality equations can be solved numerically using traditional value iteration or policy iteration algorithms. However, the ECR policies obtained in this way are implicit and not easy to understand from the manager's perspective. In the following, we establish the explicit structural properties of the optimal ECR policy. The main technique is backward induction. We first investigate the final period's value function $V_N(x)$ and the structural properties of the optimal control, and then move backwards to the other periods.
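To make the recursion (2.1)–(2.3) concrete, the following is a minimal numerical sketch of backward induction on an integer grid. It is only an illustration under stated assumptions: the cost values, the discrete pmf of z (which, being discrete, does not satisfy Assumption 2.2), the horizon, and the truncation of the post-decision inventory to [-T, T] are all hypothetical choices, not values from the text.

```python
# Hypothetical parameters chosen only for illustration (not from the text).
C_H, C_L = 1.0, 3.0          # holding / leasing cost per unit per period
C_E1, C_E2 = 0.37, 0.53      # reposition-out / reposition-in cost per unit
ALPHA = 0.95                 # discount factor
N = 4                        # planning horizon
PMF = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}   # assumed pmf of net inflow z
T = 12                       # post-decision inventory x - u is kept in [-T, T]

Z_MAX = max(abs(z) for z in PMF)
STATES = range(-T - Z_MAX, T + Z_MAX + 1)   # pre-decision inventory levels x
TARGETS = range(-T, T + 1)                  # post-decision levels t = x - u


def solve():
    """Backward induction for (2.1)-(2.3).

    Returns ({n: (lowest idle x, highest idle x)}, {n: {x: u*}}).
    """
    v_next = {x: 0.0 for x in STATES}       # V_{N+1} = 0
    thresholds, policies = {}, {}
    for n in range(N, 0, -1):
        # L_n depends on (x, u) only through the post-decision level t = x - u.
        big_l = {
            t: sum(p * (C_H * max(t + z, 0) + C_L * max(-(t + z), 0)
                        + ALPHA * v_next[t + z]) for z, p in PMF.items())
            for t in TARGETS
        }
        v_now, policy = {}, {}
        for x in STATES:
            best_u, best_cost = None, float("inf")
            # Try small |u| first so that ties are broken toward doing nothing.
            for t in sorted(TARGETS, key=lambda t: abs(x - t)):
                u = x - t
                cost = C_E1 * max(u, 0) + C_E2 * max(-u, 0) + big_l[t]
                if cost < best_cost - 1e-12:
                    best_u, best_cost = u, cost
            v_now[x], policy[x] = best_cost, best_u
        idle = [x for x in STATES if policy[x] == 0]
        thresholds[n] = (min(idle), max(idle))
        policies[n] = policy
        v_next = v_now
    return thresholds, policies


if __name__ == "__main__":
    thresholds, _ = solve()
    for n in sorted(thresholds):
        print(n, thresholds[n])
```

The grid solution returns a policy table rather than a formula, which is exactly the implicitness the text refers to; the band of states with u* = 0 that it reports is the numerical counterpart of the threshold structure derived analytically in the rest of the section.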


2.2.1 The Structural Properties of the Value Function and the Optimal Control at Period N

From $V_{N+1}(x) \equiv 0$ for all $x$, we have

$$L_N(x, u) = C_h \int_{u-x}^{+\infty} (x + z - u) f(z)\,dz + C_l \int_{-\infty}^{u-x} -(x + z - u) f(z)\,dz,$$

$$\partial L_N(x, u)/\partial u = (C_h + C_l) F(u - x) - C_h,$$

$$\partial^2 L_N(x, u)/\partial u^2 = (C_h + C_l) f(u - x) \ge 0.$$

Therefore, $\partial L_N(x, u)/\partial u$ is differentiable and monotonic increasing in $u$. From (2.2), it is clear that $\partial G_N(x, u)/\partial u$ is monotonic increasing in $u$, but not continuous at $u = 0$. In order to obtain the optimal control $u_N^*(x)$, we define two switching curves $D_N(x)$ and $U_N(x)$ as follows:

$$D_N(x) = \min\{u \mid \partial L_N(x, u)/\partial u \ge -C_{e,1}\},$$

$$U_N(x) = \max\{u \mid \partial L_N(x, u)/\partial u \le C_{e,2}\}.$$

Because $\partial L_N(x, u)/\partial u$ is monotonic increasing in $u$, it is obvious that $D_N(x) < U_N(x)$. Moreover, from Assumption 2.2, we have $\partial^2 L_N(x, D_N(x))/\partial u^2 > 0$ and $\partial^2 L_N(x, U_N(x))/\partial u^2 > 0$. It follows that

$$D_N(x) = \left(\partial L_N/\partial u\right)^{-1}(x, -C_{e,1}) = x + F^{-1}\left(\frac{C_h - C_{e,1}}{C_h + C_l}\right), \tag{2.4}$$

$$U_N(x) = \left(\partial L_N/\partial u\right)^{-1}(x, C_{e,2}) = x + F^{-1}\left(\frac{C_h + C_{e,2}}{C_h + C_l}\right), \tag{2.5}$$

where $(\partial L_N/\partial u)^{-1}(x, c)$ denotes the value of $u$ at which $\partial L_N(x, u)/\partial u = c$. From the monotonicity of $\partial L_N(x, u)/\partial u$ and $\partial G_N(x, u)/\partial u$ in $u$, and the definitions of $D_N(x)$ and $U_N(x)$, the optimal control $u_N^*(x)$ at $n = N$ can be described as follows:

$$u_N^*(x) = \begin{cases} D_N(x), & 0 < D_N(x), \\ 0, & D_N(x) \le 0 \le U_N(x), \\ U_N(x), & U_N(x) < 0. \end{cases} \tag{2.6}$$
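The expression for the first derivative of L_N can be checked with the Leibniz integral rule. The short derivation below is a sketch that uses only the definitions of this section; the boundary terms vanish because the integrand x + z - u is zero at z = u - x.

```latex
\begin{aligned}
\frac{\partial}{\partial u}\, C_h \int_{u-x}^{+\infty} (x+z-u) f(z)\,dz
  &= -C_h \int_{u-x}^{+\infty} f(z)\,dz = -C_h \bigl(1 - F(u-x)\bigr), \\
\frac{\partial}{\partial u}\, C_l \int_{-\infty}^{u-x} (u-x-z) f(z)\,dz
  &= C_l \int_{-\infty}^{u-x} f(z)\,dz = C_l\, F(u-x), \\
\frac{\partial L_N(x,u)}{\partial u}
  &= (C_h + C_l)\, F(u-x) - C_h .
\end{aligned}
```

Setting this derivative equal to -C_{e,1} gives F(u - x) = (C_h - C_{e,1})/(C_h + C_l), and setting it equal to C_{e,2} gives F(u - x) = (C_h + C_{e,2})/(C_h + C_l), which are the arguments of the inverse cdf in (2.4) and (2.5). Both ratios lie strictly between 0 and 1 under Assumption 2.1.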

Note that $D_N(x)$ and $U_N(x)$ are linearly increasing in $x$ by (2.4) and (2.5). Let $S$ and $s$ be two threshold values that satisfy $D_N(S) = 0$ and $U_N(s) = 0$ respectively. Then we have

$$S = -F^{-1}\left(\frac{C_h - C_{e,1}}{C_h + C_l}\right); \quad s = -F^{-1}\left(\frac{C_h + C_{e,2}}{C_h + C_l}\right). \tag{2.7}$$

It is clear that $s < S$ because $F(x)$ is monotonic increasing. Then, the optimal control $u_N^*(x)$ takes the form of the conventional (s, S) inventory control structure:

$$u_N^*(x) = \begin{cases} D_N(x), & x > S, \\ 0, & s \le x \le S, \\ U_N(x), & x < s. \end{cases} \tag{2.8}$$
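As an illustration of (2.7) and (2.8), the sketch below evaluates the two thresholds when the net inflow z is normally distributed, which satisfies Assumption 2.2. The cost values and the normal parameters in the usage lines are hypothetical. The function optimal_control uses the fact that, by (2.4), (2.5) and (2.7), D_N(x) = x - S and U_N(x) = x - s.

```python
from statistics import NormalDist


def final_period_thresholds(c_h, c_l, c_e1, c_e2, net_inflow):
    """Thresholds (s, S) of Eq. (2.7); net_inflow is the distribution of z."""
    big_s = -net_inflow.inv_cdf((c_h - c_e1) / (c_h + c_l))
    small_s = -net_inflow.inv_cdf((c_h + c_e2) / (c_h + c_l))
    return small_s, big_s


def optimal_control(x, small_s, big_s):
    """Eq. (2.8): positive values reposition out, negative values reposition in."""
    if x > big_s:
        return x - big_s       # D_N(x) = x - S > 0
    if x < small_s:
        return x - small_s     # U_N(x) = x - s < 0
    return 0.0


if __name__ == "__main__":
    # Hypothetical data: z ~ Normal(0, 10^2), C_h = 1, C_l = 3, C_e1 = 0.4, C_e2 = 0.6.
    z = NormalDist(mu=0.0, sigma=10.0)
    s, S = final_period_thresholds(1.0, 3.0, 0.4, 0.6, z)
    for x in (-5.0, 5.0, 25.0):
        print(x, optimal_control(x, s, S))
```

In words, the final-period policy repositions out down to S when the inventory exceeds S, repositions in up to s when the inventory is below s, and does nothing in between.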

Lemma 2.1 (i) $D_N(x)$ and $U_N(x)$ are differentiable and monotonic increasing; (ii) $L_N(x, u)$, $G_N(x, u)$ and $V_N(x)$ are convex; $\partial L_N(x, u)/\partial u$ is differentiable in $(x, u)$ and strictly monotonic increasing in $u$; $\partial V_N(x)/\partial x$ is monotonic increasing in $x$.

Proof (i) In fact, $D_N(x)$ and $U_N(x)$ are linear in $x$ as shown in (2.4) and (2.5). Hence, assertion (i) is true.

(ii) $L_N(x, u)$ is obviously convex and twice continuously differentiable. Note that $G_N(x, u) = C_{e,1} u^{+} + C_{e,2} u^{-} + L_N(x, u)$, and operations such as nonnegative weighted sum and pointwise maximum preserve convexity. It follows that $G_N(x, u)$ is convex. From $V_N(x) = G_N(x, u_N^*(x))$ and (2.8), we have:

• When $x > S$, $V_N(x) = G_N(x, D_N(x)) = C_{e,1} D_N(x) + L_N(x, D_N(x))$;
• When $s \le x \le S$, $V_N(x) = G_N(x, 0) = L_N(x, 0)$;
• When $x < s$, $V_N(x) = G_N(x, U_N(x)) = -C_{e,2} U_N(x) + L_N(x, U_N(x))$.

First, $V_N(x)$ is convex in each of the three intervals $x > S$, $s \le x \le S$, and $x < s$. Second, the first derivative of $V_N(x)$ is continuous and increasing at the two boundary points $s$ and $S$. It follows that $\partial V_N(x)/\partial x$ is monotonic increasing in $x$. Hence, $V_N(x)$ is convex. This completes the proof.

2.2.2 The Structural Properties of the Value Function and the Optimal Control at Period n

Equation (2.8) characterizes the optimal ECR policy at $n = N$ by two threshold values given in closed form in (2.7). For multiple-period decision-making problems, an interesting and challenging issue is whether the optimal decisions at each period can be characterized and whether they share similar structural properties. This section shows that the optimal ECR decisions at each period have the same structural properties as those in the final period.


Proposition 2.1 For any $n \in \{1, \ldots, N\}$, we have

(i) $L_n(x, u)$, $G_n(x, u)$ and $V_n(x)$ are convex;

(ii) $\partial L_n(x, u)/\partial u$ is differentiable in $(x, u)$ and strictly monotonic increasing in $u$; $\partial V_n(x)/\partial x$ is monotonic increasing in $x$;

(iii) Define $D_n(x)$ and $U_n(x)$ as below; then $D_n(x) \le U_n(x)$ and they are differentiable and increasing in $x$:

$$D_n(x) = \min\{u \mid \partial L_n(x, u)/\partial u \ge -C_{e,1}\},$$

$$U_n(x) = \max\{u \mid \partial L_n(x, u)/\partial u \le C_{e,2}\};$$

(iv) The optimal policy $u_n^*(x)$ is given by

$$u_n^*(x) = \begin{cases} D_n(x), & 0 < D_n(x), \\ 0, & D_n(x) \le 0 \le U_n(x), \\ U_n(x), & U_n(x) < 0. \end{cases}$$

Proof We use backward induction to prove all the assertions in Proposition 2.1. We know that assertions (i)–(iv) are true for $n = N$ from Lemma 2.1. Suppose all the assertions hold for $n + 1$. We want to show that they are also true for $n$. To simplify notation, define $y := x + z - u$ in the rest of this proof.

Assertion (i): Because operations such as non-negative weighted sum, pointwise maximum, and expectation preserve convexity, $L_n(x, u)$ is convex by the induction hypotheses. Since $G_n(x, u) = C_{e,1} u^{+} + C_{e,2} u^{-} + L_n(x, u)$, it is convex for the same reason. The convexity of $V_n(x)$ follows from assertion (ii).

Assertion (ii): Note that $L_n(x, u) = L_N(x, u) + \alpha E V_{n+1}(y)$ from (2.3). The differentiability of $\partial L_n(x, u)/\partial u$ is assured because it is a sum of differentiable functions by the induction hypotheses. Moreover, $\partial L_n(x, u)/\partial u$ is strictly monotonic increasing in $u$ because of the hypotheses and the strictly increasing property of $\partial L_N(x, u)/\partial u$. The monotonic increasing property of $\partial V_n(x)/\partial x$ in $x$ will be proved at the end of assertion (iv).

Assertion (iii): This can be obtained from the definitions, assertion (ii), and the Implicit Function Theorem.

Assertion (iv): Note that $D_n(x) \le U_n(x)$ and both are increasing in $x$. Define $S_n$ and $s_n$ as two values that satisfy $D_n(S_n) = 0$ and $U_n(s_n) = 0$ respectively. Consider the three cases below.

Case 1: $x > S_n$. This implies that $0 < D_n(x)$. We first check the impact of the control $u$ on $G_n(x, u)$. Note that when $u < D_n(x)$, we have $\partial L_n(x, u)/\partial u < -C_{e,1}$ by the definition of $D_n(x)$. It follows that:
If $u < 0 < D_n(x)$, then $\partial G_n(x, u)/\partial u = -C_{e,2} + \partial L_n(x, u)/\partial u < -C_{e,2} - C_{e,1} < 0$.
If $0 < u < D_n(x)$, then $\partial G_n(x, u)/\partial u = C_{e,1} + \partial L_n(x, u)/\partial u < 0$.
If $u > D_n(x)$, then $\partial G_n(x, u)/\partial u = C_{e,1} + \partial L_n(x, u)/\partial u \ge 0$.
These results imply that $G_n(x, u)$ is decreasing when $u < D_n(x)$ and increasing when $u > D_n(x)$. Therefore, $u_n^*(x) = D_n(x)$.

Case 2: $s_n \le x \le S_n$. This implies that $D_n(x) \le 0 \le U_n(x)$. We observe that:
If $u < 0$, then $\partial G_n(x, u)/\partial u = -C_{e,2} + \partial L_n(x, u)/\partial u \le 0$.
If $0 < u$, then $\partial G_n(x, u)/\partial u = C_{e,1} + \partial L_n(x, u)/\partial u > 0$.
Therefore, $\arg\min_u \{G_n(x, u)\} = 0$, i.e., $u_n^*(x) = 0$.

Case 3: $x < s_n$. This implies that $U_n(x) < 0$. We have:
If $u < U_n(x)$, then $\partial G_n(x, u)/\partial u = -C_{e,2} + \partial L_n(x, u)/\partial u \le 0$.
If $U_n(x) < u < 0$, then $\partial G_n(x, u)/\partial u = -C_{e,2} + \partial L_n(x, u)/\partial u > 0$.
If $u > 0$, then $\partial G_n(x, u)/\partial u = C_{e,1} + \partial L_n(x, u)/\partial u > 0$.
Therefore, $\arg\min_u \{G_n(x, u)\} = U_n(x)$, i.e., $u_n^*(x) = U_n(x)$.

In summary, the optimal policy in period $n$ can be characterized by two threshold values dividing the state space into three intervals as follows, which implies that assertion (iv) is true:

$$u_n^*(x) = \begin{cases} D_n(x), & x > S_n, \\ 0, & s_n \le x \le S_n, \\ U_n(x), & x < s_n. \end{cases}$$

In addition, the optimal value function is given by $V_n(x) = G_n(x, u_n^*(x))$ as follows:

$$V_n(x) = \begin{cases} C_{e,1} D_n(x) + L_n(x, D_n(x)), & x > S_n, \\ L_n(x, 0), & s_n \le x \le S_n, \\ -C_{e,2} U_n(x) + L_n(x, U_n(x)), & x < s_n. \end{cases}$$

Note that $\partial L_n(x, D_n(x))/\partial u = -C_{e,1}$ and $\partial L_n(x, U_n(x))/\partial u = C_{e,2}$. It is clear that $V_n(x)$ is differentiable and that the first derivative of $V_n(x)$ is continuous and increasing in $x$. Hence, $V_n(x)$ is convex. This completes the induction proof for all the assertions in Proposition 2.1.

It should be pointed out that although we have established the explicit threshold structure of the optimal ECR policy, it is difficult to obtain closed forms of the objective functions and the threshold parameters for all planning periods (except the final period $N$). In the next section, we apply an alternative technique with the aim of obtaining the closed-form solution of the optimal ECR policy.

2.3 A Fluid-Flow Model Based on Continuous-Time Dynamic Programming

This section models the container flows as continuous-time continuous-state processes. Although containers are discrete units by nature, their movements through a depot (intermodal terminal or seaport) are often in large volumes and may be approximated as a piecewise continuous flow. The main advantage of the fluid-flow model is that it enables us to apply advanced mathematical techniques in exploring the qualitative structural properties of the value functions and deriving closed-form solutions to the Hamilton–Jacobi–Bellman (HJB) equations. There is a rich literature on the successful application of fluid-flow models in stochastic manufacturing systems (e.g., Gershwin, 1994; Sethi, 2019; Sethi & Zhang, 1994). This section is mainly based on Song and Zhang (2010).

Let $x(t) \in \mathbb{R}$ denote the number of containers (i.e., the inventory level) at time $t \ge 0$. Its derivative with respect to time, denoted by $x'(t)$, is given by

$$x'(t) = u_2(t) - u_1(t) - d(t), \quad x(0) = x, \tag{2.9}$$

where $u_1(t)$ is the reposition-out rate and $u_2(t)$ is the reposition-in rate. Both control variables $u_1(t)$ and $u_2(t)$ belong to the control set $\Gamma = [0, 1]$. Here $d(t)$ is the rate of net flow-out demand; a negative $d(t)$ should be understood as the rate of net flow-in demand. We first consider the simple case in which $d(t) \in \{d_1, d_2\}$ is a two-state Markov chain with $-1 < d_1 < 0$ and $0 < d_2 < 1$. Its generator is given by the matrix

$$Q = \begin{pmatrix} -\lambda & \lambda \\ \mu & -\mu \end{pmatrix},$$

where $\lambda > 0$ and $\mu > 0$. Here $d(t)$ represents the uncertain market environment, which is assumed to be a Markov jump process. Such a treatment is common in the literature for characterizing exogenous customer demands (e.g., Chan et al., 2008; Tan, 2002). The cost function under a given inventory of empty containers and given control actions is

$$g(x, u_1, u_2) = C_h x^{+} + C_l x^{-} + C_{e,1} u_1 + C_{e,2} u_2,$$

where $C_h$ denotes the container holding cost rate, $C_l$ denotes the container leasing cost rate, $C_{e,1}$ and $C_{e,2}$ denote the cost rates of reposition-out and reposition-in respectively, and $x^{+} = \max(x, 0)$ and $x^{-} = \max(-x, 0)$. Physically, a positive $x$ represents the inventory level of empty containers at the depot and a negative $x$ represents the number of leased containers. Given the initial states $x(0) = x \in \mathbb{R}$ and $d(0) = d \in \{d_1, d_2\}$, the cost function of the control process $U(t) = (u_1(t), u_2(t))$ is given as follows:

$$J(x, d, U(\cdot)) = E \int_0^{\infty} e^{-\rho t} g(x(t), u_1(t), u_2(t))\,dt, \tag{2.10}$$

where $\rho > 0$ is the discount rate. We consider admissible controls specified as follows (Song & Zhang, 2010):
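To illustrate the demand model, the sketch below samples a path of the two-state Markov jump process d(t): the process stays in d1 for an exponentially distributed time with rate λ, then in d2 for an exponentially distributed time with rate μ, and so on, as implied by the generator Q. The function names, the seed, and the numerical values are hypothetical and serve only as an example.

```python
import random


def simulate_demand(lam, mu, d1, d2, horizon, seed=0, start=1):
    """Sample a path of the two-state demand process d(t) on [0, horizon].

    Returns a list of (start_time, demand_rate) segments. The process leaves
    d1 at rate lam and leaves d2 at rate mu, matching the generator Q.
    """
    rng = random.Random(seed)
    t, state, path = 0.0, start, []
    while t < horizon:
        path.append((t, d1 if state == 1 else d2))
        rate = lam if state == 1 else mu
        t += rng.expovariate(rate)          # exponential holding time
        state = 2 if state == 1 else 1      # jump to the other state
    return path


def time_fraction_in_d1(path, horizon, d1):
    """Fraction of [0, horizon] during which d(t) = d1."""
    total = 0.0
    for k, (t0, d) in enumerate(path):
        t1 = path[k + 1][0] if k + 1 < len(path) else horizon
        if d == d1:
            total += min(t1, horizon) - t0
    return total / horizon


if __name__ == "__main__":
    # Hypothetical rates and demand levels.
    path = simulate_demand(lam=0.5, mu=1.0, d1=-0.5, d2=0.5, horizon=20000.0, seed=1)
    print(time_fraction_in_d1(path, 20000.0, -0.5))   # long-run value is mu / (lam + mu)
```

Over a long horizon the fraction of time spent in d1 approaches μ/(λ + μ), the stationary probability of that state for this generator.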


Definition 2.1 A control process $U(\cdot) = \{U(t) = (u_1(t), u_2(t)) \in \mathbb{R}^2 : t \ge 0\}$ is called admissible if: (i) $U(t)$ is adapted to the filtration $\sigma\{d(s) : s \le t\}$, the $\sigma$-field generated by $d(\cdot)$; (ii) $U(t) \in \Gamma \times \Gamma$ for $t \ge 0$. Let $\mathcal{A}$ denote the set of admissible control processes.

Definition 2.2 A function $U(x, d) = (u_1(x, d), u_2(x, d))$ is called an admissible feedback control, or simply a feedback control, if (i) for any given initial $(x, d)$, the equation

$$x'(t) = u_2(x(t), d(t)) - u_1(x(t), d(t)) - d(t), \quad x(0) = x, \ d(0) = d$$

has a unique solution; and (ii) $U(t) = U(x(t), d(t))$ is admissible.

Let $V_i(x)$ denote the value functions with the initial state $x(0) = x$ and initial demand $d = d_i$ for $i = 1, 2$. That is,

$$V_i(x) = \inf_{U(\cdot) \in \mathcal{A}} J(x, d_i, U(\cdot)), \quad i = 1, 2. \tag{2.11}$$

Note that the inventory cost and the leasing cost are generally higher than the empty repositioning costs. Intuitively, the leasing cost should be greater than the repositioning cost, while the inventory cost may not be. However, in reality, holding an inventory of containers at one depot often implies that there are shortages of containers at other depots, which incurs leasing costs at those depots. Therefore, it is reasonable to assume a higher inventory cost reflecting the possible leasing cost in other places. We impose the following assumption throughout this chapter, which indicates that both the holding and leasing costs are higher than linear combinations of the repositioning costs.

(A1): $C_h > \rho C_{e,1} + \lambda (C_{e,1} + C_{e,2})$; $C_l > \rho C_{e,2} + \mu (C_{e,1} + C_{e,2})$.

2.3.1 Structural Properties of the Optimal Policy

In this section, we examine the structural properties of the value functions and then establish the structure of the optimal control. First, since $g(\cdot)$ is jointly convex, it follows that the value functions $V_i(x)$, $i = 1, 2$, are convex in $x$ (Sethi & Zhang, 1994). Moreover, it can be shown that $V_i(x)$ is locally Lipschitz in $x$ and that there exists a constant $K$ such that $|V_i(x)| \le K(1 + |x|)$. To simplify the narrative, let $v_i(x)$ represent functions that possess all the properties of the value functions $V_i(x)$. We use these properties to find analytical expressions for $v_i(x)$ and will show later that they are indeed equal to the value functions $V_i(x)$. The associated HJB equations should have the form:


$$\begin{cases} \rho v_1(x) = \min\limits_{u_1, u_2 \in \Gamma} \{(u_2 - u_1 - d_1) v_1'(x) + g(x, u_1, u_2) + \lambda (v_2(x) - v_1(x))\}, \\ \rho v_2(x) = \min\limits_{u_1, u_2 \in \Gamma} \{(u_2 - u_1 - d_2) v_2'(x) + g(x, u_1, u_2) + \mu (v_1(x) - v_2(x))\}. \end{cases} \tag{2.12}$$

We will show that $v_i(x)$, $i = 1, 2$, are the unique strictly convex and continuously differentiable solutions to the HJB equations. The convexity of $v_i(x)$ implies the monotonicity of $v_i'(x)$. The optimal control $U^*(x, d) = (u_1^*(x, d), u_2^*(x, d))$ is expected to have the following form:

$$u_1^*(x, d_i) = \begin{cases} 0 & \text{if } v_i'(x) < C_{e,1}, \\ x_{1,i}^* & \text{if } v_i'(x) = C_{e,1}, \\ 1 & \text{if } v_i'(x) > C_{e,1}; \end{cases} \qquad u_2^*(x, d_i) = \begin{cases} 0 & \text{if } v_i'(x) > -C_{e,2}, \\ x_{2,i}^* & \text{if } v_i'(x) = -C_{e,2}, \\ 1 & \text{if } v_i'(x) < -C_{e,2}, \end{cases} \tag{2.13}$$

where $x_{1,i}^*, x_{2,i}^* \in \Gamma$ for $i = 1, 2$ will be determined so that the corresponding differential Eq. (2.9) has a solution. Let $z_1$ and $z_2$ denote the numbers such that $v_1'(z_1) = C_{e,1}$ and $v_1'(z_2) = -C_{e,2}$ respectively. The monotonicity of $v_1'(x)$ implies that $z_1 > z_2$. Similarly, let $x_1$ and $x_2$ denote the numbers such that $v_2'(x_1) = C_{e,1}$ and $v_2'(x_2) = -C_{e,2}$ respectively. The monotonicity of $v_2'(x)$ implies that $x_1 > x_2$. Using the four threshold numbers $z_1$, $z_2$, $x_1$ and $x_2$, the feedback control $U^*(x, d)$ defined in (2.13) can be written as

$$u_1^*(x, d_1) = \begin{cases} 0 & \text{if } x < z_1, \\ x_{1,1}^* & \text{if } x = z_1, \\ 1 & \text{if } x > z_1; \end{cases} \qquad u_2^*(x, d_1) = \begin{cases} 0 & \text{if } x > z_2, \\ x_{2,1}^* & \text{if } x = z_2, \\ 1 & \text{if } x < z_2; \end{cases}$$

$$u_1^*(x, d_2) = \begin{cases} 0 & \text{if } x < x_1, \\ x_{1,2}^* & \text{if } x = x_1, \\ 1 & \text{if } x > x_1; \end{cases} \qquad u_2^*(x, d_2) = \begin{cases} 0 & \text{if } x > x_2, \\ x_{2,2}^* & \text{if } x = x_2, \\ 1 & \text{if } x < x_2. \end{cases} \tag{2.14}$$

Lemma 2.2 (Song & Zhang, 2010) The threshold numbers satisfy $z_1 = x_2 = 0$.

From Lemma 2.2 and the structure of the optimal control, it can be seen that $x = 0$ is an attraction point. The existence of a solution to (2.9) under the feedback control policy $U^*(x, d)$ only requires $x_{1,1}^* = -d_1$ and $x_{2,2}^* = d_2$ to avoid the so-called chattering control. One may take $x_{2,1}^*$ and $x_{1,2}^*$ to be any numbers in $[0, 1]$. We take $x_{2,1}^* = 1$ (and $x_{1,2}^* = 1$) so that $x'(t)$ at $x = z_2$ (and $x = x_1$) is maximized (minimized). Therefore, the corresponding feedback control $U^*(x, d)$ can be written as:


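$$u_1^*(x, d_1) = \begin{cases} 0 & \text{if } x < 0, \\ -d_1 & \text{if } x = 0, \\ 1 & \text{if } x > 0; \end{cases} \qquad u_2^*(x, d_1) = \begin{cases} 0 & \text{if } x > z_2, \\ 1 & \text{if } x = z_2, \\ 1 & \text{if } x < z_2; \end{cases}$$

$$u_1^*(x, d_2) = \begin{cases} 0 & \text{if } x < x_1, \\ 1 & \text{if } x = x_1, \\ 1 & \text{if } x > x_1; \end{cases} \qquad u_2^*(x, d_2) = \begin{cases} 0 & \text{if } x > 0, \\ d_2 & \text{if } x = 0, \\ 1 & \text{if } x < 0. \end{cases} \tag{2.15}$$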
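The feedback control (2.15) can be written as a small function of the inventory level and the demand state. The sketch below is illustrative only: the threshold values z2 < 0 < x1 passed in the usage lines are hypothetical placeholders rather than computed optima, and the function names are not from the text. It also evaluates the resulting drift x'(t) from (2.9), which is zero at x = 0 in both demand states, negative for x > 0 and positive for x < 0.

```python
def feedback_control(x, state, d1, d2, z2, x1):
    """Feedback control (u1, u2) of Eq. (2.15).

    state is 1 when d(t) = d1 and 2 when d(t) = d2; z2 < 0 < x1 are the
    two nonzero thresholds (supplied by the caller in this sketch).
    """
    if state == 1:
        u1 = 0.0 if x < 0 else (-d1 if x == 0 else 1.0)
        u2 = 0.0 if x > z2 else 1.0
    else:
        u1 = 0.0 if x < x1 else 1.0
        u2 = 0.0 if x > 0 else (d2 if x == 0 else 1.0)
    return u1, u2


def drift(x, state, d1, d2, z2, x1):
    """Right-hand side of Eq. (2.9) under the feedback control."""
    u1, u2 = feedback_control(x, state, d1, d2, z2, x1)
    d = d1 if state == 1 else d2
    return u2 - u1 - d


if __name__ == "__main__":
    # Hypothetical demand rates and thresholds.
    params = dict(d1=-0.5, d2=0.5, z2=-3.0, x1=4.0)
    for x in (-4.0, -1.0, 0.0, 2.0, 5.0):
        print(x, drift(x, 1, **params), drift(x, 2, **params))
```

Because the drift always points toward zero and vanishes there, the inventory is driven to x = 0 and held there, which is the sense in which x = 0 is an attraction point.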

We will show that $U^*(x, d)$ defined in (2.15) is indeed optimal. In addition, the points $z_2 < 0$ and $x_1 > 0$, together with $x = 0$, naturally divide the x-axis into four intervals. The differential equations corresponding to (2.12) on each of these intervals can be specified as follows.

On the interval $x \in (-\infty, z_2)$:

$$\begin{cases} (\rho + \lambda) v_1 = (1 - d_1) v_1' + g(x, 0, 1) + \lambda v_2, \\ (\rho + \mu) v_2 = (1 - d_2) v_2' + g(x, 0, 1) + \mu v_1. \end{cases} \tag{2.16}$$

On the interval $x \in (z_2, 0)$:

$$\begin{cases} (\rho + \lambda) v_1 = -d_1 v_1' + g(x, 0, 0) + \lambda v_2, \\ (\rho + \mu) v_2 = (1 - d_2) v_2' + g(x, 0, 1) + \mu v_1. \end{cases} \tag{2.17}$$

On the interval $x \in (0, x_1)$:

$$\begin{cases} (\rho + \lambda) v_1 = -(1 + d_1) v_1' + g(x, 1, 0) + \lambda v_2, \\ (\rho + \mu) v_2 = -d_2 v_2' + g(x, 0, 0) + \mu v_1. \end{cases} \tag{2.18}$$

On the interval $x \in (x_1, \infty)$:

$$\begin{cases} (\rho + \lambda) v_1 = -(1 + d_1) v_1' + g(x, 1, 0) + \lambda v_2, \\ (\rho + \mu) v_2 = -(1 + d_2) v_2' + g(x, 1, 0) + \mu v_1. \end{cases} \tag{2.19}$$

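Furthermore, on each of these intervals, the convexity of $v_i(x)$, $i = 1, 2$, is equivalent to the nonnegativity of $v_i''(x)$, $i = 1, 2$, which in turn implies the following inequalities obtained by differentiating both sides of (2.16)–(2.19):

$$\begin{pmatrix} \rho + \lambda & -\lambda \\ -\mu & \rho + \mu \end{pmatrix} \begin{pmatrix} v_1'(x) \\ v_2'(x) \end{pmatrix} \ge -\begin{pmatrix} C_l \\ C_l \end{pmatrix}, \quad \text{for } x < 0, \tag{2.20}$$

$$\begin{pmatrix} \rho + \lambda & -\lambda \\ -\mu & \rho + \mu \end{pmatrix} \begin{pmatrix} v_1'(x) \\ v_2'(x) \end{pmatrix} \le \begin{pmatrix} C_h \\ C_h \end{pmatrix}, \quad \text{for } x > 0. \tag{2.21}$$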
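As a worked instance of this differentiation step, consider the first equation of (2.17) on the interval (z_2, 0), where g(x, 0, 0) = -C_l x because x < 0. The lines below are a sketch of the argument for that one row; the remaining rows of (2.20) and (2.21) follow in the same way from the other equations in (2.16)–(2.19).

```latex
\begin{aligned}
(\rho+\lambda)\, v_1(x) &= -d_1 v_1'(x) - C_l x + \lambda v_2(x)
  && \text{(first equation of (2.17), } x<0\text{)} \\
(\rho+\lambda)\, v_1'(x) &= -d_1 v_1''(x) - C_l + \lambda v_2'(x)
  && \text{(differentiate in } x\text{)} \\
(\rho+\lambda)\, v_1'(x) - \lambda v_2'(x) &= -d_1 v_1''(x) - C_l \;\ge\; -C_l
  && \text{(since } -d_1>0 \text{ and } v_1''(x)\ge 0\text{)}
\end{aligned}
```

For x > 0 the same computation applied to the first equation of (2.18), with g(x, 1, 0) = C_h x + C_{e,1}, gives (ρ + λ) v_1'(x) - λ v_2'(x) = -(1 + d_1) v_1''(x) + C_h ≤ C_h, because 1 + d_1 > 0.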


2.3.2 Solving the Hamilton–Jacobi–Bellman Equations

In this section, we solve the Hamilton–Jacobi–Bellman equations analytically. The solution consists of two main steps. First, we derive the closed form of the value functions under the feedback control policy. Second, we provide a verification theorem to show that the obtained value functions are indeed the optimal value functions and that the feedback control policy is indeed optimal.

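Closed-Forms of the Value Functions

We first introduce the following result regarding the calculation of the matrix exponential.

Lemma 2.3 Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ be a real $2 \times 2$ matrix with distinct real eigenvalues $\gamma_1$ and $\gamma_2$. Then, the matrix exponential can be represented as follows:

$$e^{Ax} = \frac{1}{\gamma_2 - \gamma_1} \left[ \begin{pmatrix} \gamma_2 - a & -b \\ -c & \gamma_2 - d \end{pmatrix} e^{\gamma_1 x} + \begin{pmatrix} a - \gamma_1 & b \\ c & d - \gamma_1 \end{pmatrix} e^{\gamma_2 x} \right].$$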
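Lemma 2.3 can be checked numerically. The sketch below implements the closed form for a 2 × 2 matrix with distinct real eigenvalues and compares it with a truncated power series of the matrix exponential. The function names and the test matrix are hypothetical, and the power series is used here only as an independent reference, not as a method from the text.

```python
import math


def expm_lemma(a, b, c, d, x):
    """e^{Ax} for A = [[a, b], [c, d]] using the closed form of Lemma 2.3."""
    half_trace = (a + d) / 2.0
    disc = half_trace ** 2 - (a * d - b * c)
    if disc <= 0:
        raise ValueError("Lemma 2.3 needs distinct real eigenvalues")
    g1, g2 = half_trace - math.sqrt(disc), half_trace + math.sqrt(disc)
    e1, e2 = math.exp(g1 * x), math.exp(g2 * x)
    k = 1.0 / (g2 - g1)
    return [[k * ((g2 - a) * e1 + (a - g1) * e2), k * (-b * e1 + b * e2)],
            [k * (-c * e1 + c * e2), k * ((g2 - d) * e1 + (d - g1) * e2)]]


def expm_series(a, b, c, d, x, terms=60):
    """Reference value: truncated power series sum_k (Ax)^k / k!."""
    m = [[a * x, b * x], [c * x, d * x]]
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        # term <- term * (Ax) / k, so that term equals (Ax)^k / k!
        term = [[sum(term[i][r] * m[r][j] for r in range(2)) / k
                 for j in range(2)] for i in range(2)]
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result


if __name__ == "__main__":
    # Hypothetical test matrix with eigenvalues -2.2 and -0.2.
    print(expm_lemma(-1.2, 1.0, 1.0, -1.2, 0.7))
    print(expm_series(-1.2, 1.0, 1.0, -1.2, 0.7))
```

The closed form is equivalent to writing e^{Ax} = [(γ2 I - A) e^{γ1 x} + (A - γ1 I) e^{γ2 x}] / (γ2 - γ1), which reduces to the identity matrix at x = 0.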

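Let $K_0$ denote the column vector $[v_1(0), v_2(0)]^T$, where $A^T$ represents the transpose of the vector $A$. Likewise, let $K_1 = [v_1(x_1), v_2(x_1)]^T$ and $K_2 = [v_1(z_2), v_2(z_2)]^T$. Recall that

$$v_1'(0) = C_{e,1}, \quad v_1'(z_2) = -C_{e,2}, \quad v_2'(x_1) = C_{e,1}, \quad \text{and} \quad v_2'(0) = -C_{e,2}. \tag{2.22}$$

Plugging the values of $v_1'(0)$ and $v_2'(0)$ into the first equation in (2.17) and the second equation in (2.18) respectively, and noting that $g(0, 0, 0) = 0$, we have

$$\begin{cases} (\rho + \lambda) v_1(0) = -d_1 C_{e,1} + g(0, 0, 0) + \lambda v_2(0), \\ (\rho + \mu) v_2(0) = d_2 C_{e,2} + g(0, 0, 0) + \mu v_1(0). \end{cases}$$

Solving for $K_0 = [v_1(0), v_2(0)]^T$, we have

$$K_0 = \begin{pmatrix} v_1(0) \\ v_2(0) \end{pmatrix} = \begin{pmatrix} \rho + \lambda & -\lambda \\ -\mu & \rho + \mu \end{pmatrix}^{-1} \begin{pmatrix} -d_1 C_{e,1} \\ d_2 C_{e,2} \end{pmatrix} = \frac{1}{\rho(\rho + \lambda + \mu)} \begin{pmatrix} -(\rho + \mu) d_1 C_{e,1} + \lambda d_2 C_{e,2} \\ -\mu d_1 C_{e,1} + (\rho + \lambda) d_2 C_{e,2} \end{pmatrix}.$$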

The middle two equations in (2.22) can be written in terms of $K_1$ and $K_2$ in connection with the second equation in (2.18) and the first equation in (2.17):

$$\begin{cases} (-\mu, \rho + \mu) K_1 = -d_2 C_{e,1} + C_h x_1, \\ (\rho + \lambda, -\lambda) K_2 = d_1 C_{e,2} - C_l z_2. \end{cases} \tag{2.23}$$

The above equations can be used to determine the values of $x_1$ and $z_2$.

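Existence and Calculation of $x_1$

To determine the value of $x_1$, we solve the differential Eq. (2.18) on $[0, \infty)$ originating at $x = 0$. Let

$$A_1 = \begin{pmatrix} -\dfrac{\rho + \lambda}{1 + d_1} & \dfrac{\lambda}{1 + d_1} \\[2mm] \dfrac{\mu}{d_2} & -\dfrac{\rho + \mu}{d_2} \end{pmatrix}, \quad \text{and} \quad F_{A_1}(x) = \begin{pmatrix} \dfrac{C_h x + C_{e,1}}{1 + d_1} \\[2mm] \dfrac{C_h x}{d_2} \end{pmatrix}.$$

Then, the differential equations in (2.18) can be written as

$$\begin{pmatrix} v_1'(x) \\ v_2'(x) \end{pmatrix} = A_1 \begin{pmatrix} v_1(x) \\ v_2(x) \end{pmatrix} + F_{A_1}(x).$$

Its solution is given by

$$\begin{pmatrix} v_1(x) \\ v_2(x) \end{pmatrix} = e^{A_1 x} K_0 + \int_0^x e^{A_1 (x - y)} F_{A_1}(y)\,dy = e^{A_1 x} K_0 + \left[ A_1^{-1} e^{A_1 x} - A_1^{-1} \right] \cdot \begin{pmatrix} -\dfrac{C_h}{\rho} + \dfrac{C_{e,1}}{1 + d_1} \\[2mm] -\dfrac{C_h}{\rho} \end{pmatrix} + \begin{pmatrix} \dfrac{C_h x}{\rho} \\[2mm] \dfrac{C_h x}{\rho} \end{pmatrix}, \tag{2.24}$$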

where the matrix exponential can be calculated using Lemma 2.3. For $x \ge 0$, in relation to (2.23), let

$$\varphi_{A_1}(x) = (-\mu, \rho + \mu) \begin{pmatrix} v_1(x) \\ v_2(x) \end{pmatrix} + d_2 C_{e,1} - C_h x. \tag{2.25}$$

Showing the existence of $x_1$ (i.e., $v_2'(x_1) = C_{e,1}$) is equivalent to showing $\varphi_{A_1}(x_1) = 0$. It is easy to check that the matrix $A_1$ has two distinct negative eigenvalues. Using (2.24) and Assumption (A1), we have

$$\varphi_{A_1}(0) = (-\mu, \rho + \mu) K_0 + d_2 C_{e,1} = d_2 (C_{e,1} + C_{e,2}) > 0,$$

$$\varphi_{A_1}(\infty) = -d_2 (C_h - \rho C_{e,1})/\rho < 0.$$

Hence, the continuity of $\varphi_{A_1}(x)$ implies the existence of $x_1 > 0$ such that $\varphi_{A_1}(x_1) = 0$. Moreover, its derivative is given by

$$\varphi_{A_1}'(x) = (-\mu, \rho + \mu) \begin{pmatrix} v_1'(x) \\ v_2'(x) \end{pmatrix} - C_h.$$

Using the second inequality in (2.21), we have $\varphi_{A_1}'(x) \le 0$. This leads to the monotonicity of $\varphi_{A_1}(x)$. The points satisfying $\varphi_{A_1}(x_1) = 0$ consist of either a singleton or an interval. In the latter case, we take $x_1$ to be the lower end point of the interval. Actually, we will impose conditions to guarantee the strict convexity of the value functions. In this case, $\varphi_{A_1}(x)$ is strictly decreasing, which gives the uniqueness of $x_1$.
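The calculation of x_1 can be sketched numerically by combining the closed form of K_0, the solution (2.24), and the function φ_{A1} in (2.25), and then locating its zero by bisection. The parameter values below are hypothetical (chosen so that (A1) holds for C_h), the helper names are not from the text, and bisection is simply one convenient root-finding choice that relies on the sign change between φ_{A1}(0) and φ_{A1}(∞).

```python
import math

# Hypothetical parameter values; C_H satisfies (A1) but none are from the text.
RHO, LAM, MU = 0.1, 0.5, 0.5
D1, D2 = -0.5, 0.5
C_H, C_E1, C_E2 = 2.0, 1.0, 1.0

A1 = [[-(RHO + LAM) / (1 + D1), LAM / (1 + D1)],
      [MU / D2, -(RHO + MU) / D2]]


def mat_vec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def expm2(m, x):
    """e^{Mx} for a 2x2 matrix with distinct real eigenvalues (Lemma 2.3)."""
    (a, b), (c, d) = m
    mean = (a + d) / 2.0
    root = math.sqrt(mean * mean - (a * d - b * c))
    g1, g2 = mean - root, mean + root
    e1, e2 = math.exp(g1 * x), math.exp(g2 * x)
    k = 1.0 / (g2 - g1)
    return [[k * ((g2 - a) * e1 + (a - g1) * e2), k * b * (e2 - e1)],
            [k * c * (e2 - e1), k * ((g2 - d) * e1 + (d - g1) * e2)]]


def k0():
    """Closed form of K0 = (v1(0), v2(0))."""
    det = RHO * (RHO + LAM + MU)
    return [(-(RHO + MU) * D1 * C_E1 + LAM * D2 * C_E2) / det,
            (-MU * D1 * C_E1 + (RHO + LAM) * D2 * C_E2) / det]


def value_functions(x):
    """(v1(x), v2(x)) for x >= 0 from Eq. (2.24)."""
    det = A1[0][0] * A1[1][1] - A1[0][1] * A1[1][0]
    a_inv = [[A1[1][1] / det, -A1[0][1] / det],
             [-A1[1][0] / det, A1[0][0] / det]]
    e = expm2(A1, x)
    const = [-C_H / RHO + C_E1 / (1 + D1), -C_H / RHO]
    w = mat_vec(a_inv, const)      # A1^{-1} c
    homog = mat_vec(e, k0())       # e^{A1 x} K0
    ew = mat_vec(e, w)             # e^{A1 x} A1^{-1} c (the two matrices commute)
    return [homog[i] + ew[i] - w[i] + C_H * x / RHO for i in range(2)]


def phi_a1(x):
    """Eq. (2.25)."""
    v1, v2 = value_functions(x)
    return -MU * v1 + (RHO + MU) * v2 + D2 * C_E1 - C_H * x


def find_x1(hi=200.0, tol=1e-10):
    """Bisection on [0, hi], using phi_a1(0) > 0 > phi_a1(hi)."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi_a1(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(find_x1())
```

The two end-point values φ_{A1}(0) = d_2 (C_{e,1} + C_{e,2}) and φ_{A1}(∞) = -d_2 (C_h - ρ C_{e,1})/ρ give a quick sanity check on any implementation before the root is trusted.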

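Existence and Calculation of $z_2$

We next determine the value of $z_2$. First, we solve the differential Eq. (2.17) on $(-\infty, 0]$ originating at $x = 0$. Let

$$B_1 = \begin{pmatrix} -\dfrac{\rho + \lambda}{d_1} & \dfrac{\lambda}{d_1} \\[2mm] -\dfrac{\mu}{1 - d_2} & \dfrac{\rho + \mu}{1 - d_2} \end{pmatrix}, \quad \text{and} \quad F_{B_1}(x) = \begin{pmatrix} -\dfrac{C_l x}{d_1} \\[2mm] \dfrac{C_l x - C_{e,2}}{1 - d_2} \end{pmatrix}.$$

Then, the differential equations in (2.17) can be written as

$$\begin{pmatrix} v_1'(x) \\ v_2'(x) \end{pmatrix} = B_1 \begin{pmatrix} v_1(x) \\ v_2(x) \end{pmatrix} + F_{B_1}(x).$$

Its solution is given by

$$\begin{pmatrix} v_1(x) \\ v_2(x) \end{pmatrix} = e^{B_1 x} K_0 + \int_0^x e^{B_1 (x - y)} F_{B_1}(y)\,dy = e^{B_1 x} K_0 + \left[ B_1^{-1} e^{B_1 x} - B_1^{-1} \right] \cdot \begin{pmatrix} \dfrac{C_l}{\rho} \\[2mm] \dfrac{C_l}{\rho} - \dfrac{C_{e,2}}{1 - d_2} \end{pmatrix} - \begin{pmatrix} \dfrac{C_l x}{\rho} \\[2mm] \dfrac{C_l x}{\rho} \end{pmatrix}. \tag{2.26}$$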

For $x \le 0$, in relation to (2.23), let

$$\varphi_{B_1}(x) = (\rho + \lambda, -\lambda) \begin{pmatrix} v_1(x) \\ v_2(x) \end{pmatrix} - d_1 C_{e,2} + C_l x. \tag{2.27}$$

Showing the existence of $z_2$ (i.e., $v_1'(z_2) = -C_{e,2}$) is equivalent to showing $\varphi_{B_1}(z_2) = 0$. It is easy to check that the matrix $B_1$ has two distinct positive eigenvalues. Using (2.26) and Assumption (A1), we have

$$\varphi_{B_1}(0) = (\rho + \lambda, -\lambda) K_0 - d_1 C_{e,2} = -d_1 (C_{e,1} + C_{e,2}) > 0,$$

$$\varphi_{B_1}(-\infty) = d_1 (C_l - \rho C_{e,2})/\rho < 0.$$

Hence, the continuity of $\varphi_{B_1}(x)$ implies the existence of $z_2 < 0$ such that $\varphi_{B_1}(z_2) = 0$. Moreover, its derivative is given by

$$\varphi_{B_1}'(x) = (\rho + \lambda, -\lambda) \begin{pmatrix} v_1'(x) \\ v_2'(x) \end{pmatrix} + C_l.$$

Because of the convexity condition given in (2.20), we have $\varphi_{B_1}'(x) \ge 0$. This leads to the monotonicity of $\varphi_{B_1}(x)$. We will impose conditions to ensure the strict convexity of the value functions, which will, in turn, lead to the strict monotonicity of $\varphi_{B_1}(x)$. As a result, the uniqueness of $z_2$ can be guaranteed.

By examining the convexity conditions on the four intervals $(-\infty, z_2)$, $(z_2, 0)$, $(0, x_1)$, and $(x_1, \infty)$, a set of sufficient conditions can be established to ensure the strict convexity of the value functions. For the details of these sufficient conditions, see Song and Zhang (2010).

A Verification Theorem

In this section, we give a verification theorem to show that the solutions $v_i(x)$, $i = 1, 2$, of Eq. (2.12) are indeed the optimal value functions and that the feedback control $U^*(x, d)$ is optimal.

Theorem 2.1 Under Assumption (A1) and the sufficient conditions that ensure the strict convexity of the value functions, we have

$$v_i(x) = V_i(x) = \inf_{U(\cdot) \in A} J(x, d_i, U(\cdot)), \quad i = 1, 2.$$

The feedback control $U^*(x, d)$ given in (2.15) is optimal.

Proof We first show that the solution $(v_1(x), v_2(x))^T$ obtained is continuously differentiable on $\mathbb{R}$. It suffices to show the continuous differentiability of $(v_1(x), v_2(x))^T$ at $x = z_2$, $x = 0$, and $x = x_1$. Note that the initial value selection at $x = z_2$, $x = 0$, and $x = x_1$ guarantees the continuity of $(v_1(x), v_2(x))^T$ at these points, i.e.,

$$\begin{pmatrix} v_1(z_2^-) \\ v_2(z_2^-) \end{pmatrix} = \begin{pmatrix} v_1(z_2^+) \\ v_2(z_2^+) \end{pmatrix}, \quad \begin{pmatrix} v_1(0^-) \\ v_2(0^-) \end{pmatrix} = \begin{pmatrix} v_1(0^+) \\ v_2(0^+) \end{pmatrix}, \quad \begin{pmatrix} v_1(x_1^-) \\ v_2(x_1^-) \end{pmatrix} = \begin{pmatrix} v_1(x_1^+) \\ v_2(x_1^+) \end{pmatrix}.$$

Note that $v_1'(z_2^+) = -C_{e,2}$ and

$$(\rho + \lambda, -\lambda)\begin{pmatrix} v_1(z_2^+) \\ v_2(z_2^+) \end{pmatrix} = -d_1(-C_{e,2}) - C_l z_2,$$

$$(\rho + \lambda, -\lambda)\begin{pmatrix} v_1(z_2^-) \\ v_2(z_2^-) \end{pmatrix} = (1 - d_1)v_1'(z_2^-) - C_l z_2 + C_{e,2}.$$

The choice of

$$\begin{pmatrix} v_1(z_2^+) \\ v_2(z_2^+) \end{pmatrix} = K_2 = \begin{pmatrix} v_1(z_2^-) \\ v_2(z_2^-) \end{pmatrix}$$

yields

$$-d_1(-C_{e,2}) - C_l z_2 = (1 - d_1)v_1'(z_2^-) - C_l z_2 + C_{e,2}.$$

This implies that $v_1'(z_2^-) = -C_{e,2} = v_1'(z_2^+)$. Similarly, we can show the continuous differentiability at all other points. Hence, $(v_1(x), v_2(x))^T$ is a classical solution to the HJB Eq. (2.12). Moreover, $(v_1(x), v_2(x))^T$ is continuously differentiable and strictly convex. In addition, it is easy to check from the explicit solutions that $(v_1(x), v_2(x))^T$ has at most linear growth, i.e., $|v_i(x)| \le K(1 + |x|)$, $i = 1, 2$, for some constant $K$.

Finally, we show the optimality of $U^*(x, d) = (u_1^*(x, d), u_2^*(x, d))$. Using Dynkin's formula (cf. Yin & Zhang, 1998), for each fixed $T > 0$ and any admissible $(u_1(t), u_2(t))$, we have

$$E\,e^{-\rho T} v_{d(T)}(x(T)) = v_i(x) + E\int_0^T e^{-\rho t}\Big(-\rho v_{d(t)}(x(t)) + v_{d(t)}'(x(t))\,(u_2(t) - u_1(t) - d(t)) + Qv_{\cdot}(x(t))(d(t))\Big)\,dt,$$

where

$$Qv_{\cdot}(x)(d) = \begin{cases} \lambda\,(v_2(x) - v_1(x)) & \text{if } d = d_1, \\ \mu\,(v_1(x) - v_2(x)) & \text{if } d = d_2. \end{cases}$$

Recall that $(v_1(x), v_2(x))^T$ satisfies the HJB Eq. (2.12), which becomes an inequality when the minimum over $(u_1, u_2)$ is dropped. It follows that

$$E\int_0^T e^{-\rho t}\Big(-\rho v_{d(t)}(x(t)) + v_{d(t)}'(x(t))\,(u_2(t) - u_1(t) - d(t)) + Qv_{\cdot}(x(t))(d(t))\Big)\,dt \ge -E\int_0^T e^{-\rho t} g(x(t), u_1(t), u_2(t))\,dt.$$

Sending $T$ towards $\infty$ and using the linear growth condition, we have

$$v_i(x) \le E\int_0^\infty e^{-\rho t} g(x(t), u_1(t), u_2(t))\,dt = J(x, d_i, U(\cdot))$$

for all admissible $U(\cdot) = (u_1(\cdot), u_2(\cdot))$. The equality holds when $(u_1(t), u_2(t)) = (u_1^*(x(t)), u_2^*(x(t)))$. Therefore, $U^*(x, d)$ is optimal. This completes the proof. □

2.3.3 Extension to More General Cases

In this section, we extend the fluid-flow model to more general situations with multiple demand states and uncertain repositioning capacities. The purpose is to establish the general structure of the control policies and to validate numerically the applicability of the fluid-flow models. Let $\alpha_1(t) \ge 0$ and $\alpha_2(t) \ge 0$ denote the maximum available capacity for repositioning empty containers out and in, respectively. The equation for $x(t)$ becomes

$$x'(t) = \alpha_2(t)u_2(t) - \alpha_1(t)u_1(t) - d(t), \quad x(0) = x. \tag{2.28}$$

Here $u_1(t) \in [0, 1]$ and $\alpha_1(t)u_1(t)$ represents the repositioning-out rate. Similarly, $u_2(t) \in [0, 1]$ and $\alpha_2(t)u_2(t)$ represents the repositioning-in rate. In addition, we allow $d(t)$ to take values from a finite set $\{d_1, d_2, \ldots, d_l\}$. In this section, we consider the case in which $(\alpha_1(t), \alpha_2(t), d(t))$ is a finite-state Markov chain with generator $Q$. Let

$$g(x, \alpha_1, \alpha_2, u_1, u_2) = C_h x^+ + C_l x^- + C_{e,1}\alpha_1 u_1 + C_{e,2}\alpha_2 u_2.$$

Given the initial states $x(0) = x \in \mathbb{R}$, $\alpha_1(0) = \alpha_1$, $\alpha_2(0) = \alpha_2$, $d(0) = d$, and $U(\cdot) = (u_1(\cdot), u_2(\cdot))$, the corresponding cost function is given as follows:

$$J(x, \alpha_1, \alpha_2, d, U(\cdot)) = E\int_0^\infty e^{-\rho t} g(x(t), \alpha_1(t), \alpha_2(t), u_1(t), u_2(t))\,dt. \tag{2.29}$$

Define the corresponding set of admissible controls $A$ in a similar way to that in Sect. 2.3. The value functions are then given as follows, with the initial state $x(0) = x$, $\alpha_1(0) = \alpha_1$, $\alpha_2(0) = \alpha_2$, $d(0) = d$:

$$v(x, \alpha_1, \alpha_2, d) = \inf_{U(\cdot) \in A} J(x, \alpha_1, \alpha_2, d, U(\cdot)). \tag{2.30}$$

The associated HJB equations should have the form

$$\rho v(x, \alpha_1, \alpha_2, d) = \min_{u_1, u_2 \in [0, 1]} \Big\{ (\alpha_2 u_2 - \alpha_1 u_1 - d)\,v'(x, \alpha_1, \alpha_2, d) + g(x, \alpha_1, \alpha_2, u_1, u_2) + Qv(x, \cdot, \cdot, \cdot)(\alpha_1, \alpha_2, d) \Big\}. \tag{2.31}$$

Here $Qv(x, \cdot, \cdot, \cdot)(\alpha_1, \alpha_2, d)$ is the coupling term corresponding to the generator of the Markov chain. It can be shown that the value function is the unique viscosity solution to the HJB equation (Yin & Zhang, 1998). In addition, the optimal control $U^*(x, \alpha_1, \alpha_2, d) = (u_1^*(x, \alpha_1, \alpha_2, d), u_2^*(x, \alpha_1, \alpha_2, d))$ should have the following form:

$$u_1^*(x, \alpha_1, \alpha_2, d) = \begin{cases} 0 & \text{if } v'(x, \alpha_1, \alpha_2, d) < C_{e,1}, \\ x^*_{1,\alpha_1,\alpha_2,d} & \text{if } v'(x, \alpha_1, \alpha_2, d) = C_{e,1}, \\ 1 & \text{if } v'(x, \alpha_1, \alpha_2, d) > C_{e,1}, \end{cases} \tag{2.32}$$

$$u_2^*(x, \alpha_1, \alpha_2, d) = \begin{cases} 0 & \text{if } v'(x, \alpha_1, \alpha_2, d) > -C_{e,2}, \\ x^*_{2,\alpha_1,\alpha_2,d} & \text{if } v'(x, \alpha_1, \alpha_2, d) = -C_{e,2}, \\ 1 & \text{if } v'(x, \alpha_1, \alpha_2, d) < -C_{e,2}. \end{cases} \tag{2.33}$$

Here $x^*_{1,\alpha_1,\alpha_2,d}$ and $x^*_{2,\alpha_1,\alpha_2,d}$ are the threshold levels to be determined.
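The switching rule in (2.32) and (2.33) follows because the right-hand side of (2.31) is linear in each control: the terms involving u1 are α1 u1 (Ce,1 − v') and those involving u2 are α2 u2 (Ce,2 + v'). The following minimal Python sketch illustrates that rule. The function name and the `singular_u1`/`singular_u2` arguments (the values used when v' sits exactly on a switching level) are illustrative assumptions, not part of the model.

```python
def bang_bang_control(v_prime, ce1, ce2, singular_u1=0.0, singular_u2=0.0):
    """Return (u1, u2) minimising the control-dependent part of the HJB right-hand side.

    v_prime : derivative of the value function at the current state
    ce1     : unit repositioning-out cost C_{e,1}
    ce2     : unit repositioning-in cost C_{e,2}
    singular_u1, singular_u2 : hypothetical values used when v_prime equals
        a switching level (the text leaves these to be determined)
    """
    # u1 multiplies alpha1 * (ce1 - v_prime): switch on when this is negative.
    if v_prime > ce1:
        u1 = 1.0
    elif v_prime < ce1:
        u1 = 0.0
    else:
        u1 = singular_u1

    # u2 multiplies alpha2 * (ce2 + v_prime): switch on when this is negative.
    if v_prime < -ce2:
        u2 = 1.0
    elif v_prime > -ce2:
        u2 = 0.0
    else:
        u2 = singular_u2

    return u1, u2
```

For example, with Ce,1 = 1 and Ce,2 = 2, a steeply negative slope such as v' = −5 gives (u1, u2) = (0, 1), i.e., reposition in at full capacity.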

2.3.4 Numerical Examples

This section gives numerical examples to demonstrate the results. It is divided into four subsections: two-demand-state examples, three-demand-state examples, four-demand-state examples, and discussions.

Two-Demand-State Cases

We consider examples with the following specifications as the base scenario: $C_{e,1} = 1$, $C_{e,2} = 2$, $C_h = 2$, $C_l = 10$, $d_1 = -0.5$, $d_2 = 0.5$, $\rho = 0.01$, $\lambda = 0.3$, $\mu = 0.3$. With this setting, the depot has a balanced demand flow in the long-run statistical sense. In the short term, however, the container flows are imbalanced and dictated by the Markov jump process. Moreover, the empty repositioning-in cost is twice the empty repositioning-out cost, and the leasing cost is five times the inventory cost. We use $\phi_{A_1}(x)$ defined in (2.25) to identify $x_1$ and $\phi_{B_1}(x)$ defined in (2.27) to identify $z_2$. The results are $(z_2, x_1) = (-0.1386, 0.5650)$. The threshold value $z_2$ is closer to 0 than $x_1$, which is consistent with intuition since the leasing cost is higher than the inventory cost. For all the examples in this section, the sufficient conditions for convexity in Song and Zhang (2010) are satisfied. In addition, we have not found cases in which the sufficient conditions are violated after examining a large set of parameters in reasonable ranges.

Next, we examine the sensitivity of $(z_2, x_1)$ to the input parameters. In each group, one parameter is varied and the others are kept the same as in the base scenario. In Group 1, $\lambda$ varies from 0.1 to 0.5. In Group 2, $\mu$ varies from 0.1 to 0.5. In Group 3, $d_1$ varies from −0.1 to −0.9. In Group 4, $d_2$ varies from 0.1 to 0.9. In Group 5, $\rho$ varies from 0.01 to 0.3. The results are shown in Table 2.1.

For Group 1, intuitively, a larger $\lambda$ reduces the average time that $d(t)$ stays in state $d_1$, which creates a larger overall flow-out demand. In this case, $(z_2, x_1)$ increases to meet the increasing flow-out demand. This is confirmed in Table 2.1. In Group 1, $\mu = 0.3$ is fixed; therefore, $\lambda = 0.1$ and 0.2 represent the cases in which the depot is a surplus depot (i.e., it has more flow in than flow out in the long term), and $\lambda = 0.4$ and 0.5 represent the cases in which the depot is a deficit depot. For Group 2, likewise, increasing $\mu$ reduces the overall flow-out demand, which leads to smaller $(z_2, x_1)$. In Group 2, the first two cases are deficit depots and the last two cases are surplus depots. In Groups 3 and 4, increasing either $d_1$ or $d_2$ has a similar effect, i.e., it increases the average flow-out demand, which leads to larger $(z_2, x_1)$. In Group 3, the first two cases are deficit depots and the last two cases are surplus depots. In Group 4, the first two cases are surplus depots and the last two cases are deficit depots. In Group 5, the discount factor is varied.
Note that a smaller discount factor indicates a higher weight on long-term future costs and less weight on near-term costs, whereas larger $x_1$ and $z_2$ imply fewer repositioning-out activities and more repositioning-in activities. Therefore, when the discount factor is larger, we tend to focus more on short-term costs, and it is reasonable to have smaller $x_1$ and $z_2$ so that the more expensive repositioning-in activities can be reduced.

Table 2.1 The optimal threshold values (z2, x1) under different scenarios

| Group 1 | λ | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
|---|---|---|---|---|---|---|
| | z2 | −0.146 | −0.1421 | −0.1386 | −0.1353 | −0.1322 |
| | x1 | 0.5496 | 0.5574 | 0.565 | 0.5724 | 0.5796 |
| Group 2 | μ | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
| | z2 | −0.1383 | −0.1385 | −0.1386 | −0.1387 | −0.1389 |
| | x1 | 0.671 | 0.612 | 0.565 | 0.5264 | 0.494 |
| Group 3 | d1 | −0.1 | −0.3 | −0.5 | −0.7 | −0.9 |
| | z2 | −0.0285 | −0.0843 | −0.1386 | −0.1915 | −0.2433 |
| | x1 | 0.5849 | 0.5773 | 0.565 | 0.542 | 0.4808 |
| Group 4 | d2 | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 |
| | z2 | −0.1407 | −0.14 | −0.1386 | −0.1357 | −0.125 |
| | x1 | 0.121 | 0.3496 | 0.565 | 0.7706 | 0.9686 |
| Group 5 | ρ | 0.01 | 0.05 | 0.1 | 0.2 | 0.3 |
| | z2 | −0.1386 | −0.1389 | −0.1393 | −0.1402 | −0.141 |
| | x1 | 0.565 | 0.5639 | 0.5628 | 0.5612 | 0.5606 |

The above examples are illustrative. They can easily be scaled up to represent realistic situations because of the linearity of the state equation and the cost function. For example, if we have $-1000 < d_1 < 0$, $0 < d_2 < 1000$, and $|u_i| \le 1000$, then the optimal threshold values for the first example will be $(z_2, x_1) = (-138.6, 565.0)$. The above numerical examples demonstrate how the safety inventory level should be computed and changed in response to changes in the system parameters.
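The surplus, balanced, and deficit labels used for Groups 1 to 4 can be checked from the long-run average demand of the two-state chain. The sketch below assumes, as the discussion of Group 1 suggests, that λ is the rate of leaving state d1 and μ is the rate of leaving state d2, so the stationary probabilities are μ/(λ + μ) and λ/(λ + μ). The function names are illustrative.

```python
def long_run_demand(lam, mu, d1, d2):
    """Stationary mean of d(t) for a two-state chain.

    Assumes lam is the jump rate d1 -> d2 and mu is the jump rate d2 -> d1.
    """
    pi1 = mu / (lam + mu)   # long-run fraction of time in state d1
    pi2 = lam / (lam + mu)  # long-run fraction of time in state d2
    return pi1 * d1 + pi2 * d2


def classify_depot(lam, mu, d1, d2, tol=1e-12):
    """Label the depot by the sign of its long-run net flow-out demand."""
    mean_d = long_run_demand(lam, mu, d1, d2)
    if mean_d > tol:
        return "deficit"    # more flow out than flow in
    if mean_d < -tol:
        return "surplus"    # more flow in than flow out
    return "balanced"


if __name__ == "__main__":
    # Group 1 of Table 2.1: lambda varies, other parameters at base values.
    for lam in (0.1, 0.2, 0.3, 0.4, 0.5):
        print(lam, classify_depot(lam, 0.3, -0.5, 0.5))
```

Under these assumptions the base scenario has a long-run mean demand of zero, and the Group 1 labels agree with those given in the text.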

Three-Demand-State Cases

Assume that the demand process is a three-state Markov chain and that both the repositioning-in and repositioning-out capacities are two-state Markov chains. Let the cost coefficients be the same as those in the base scenario and the discount factor be 0.5. It is assumed that the transition rate between any two demand states is 0.3, the transition rate between the two repositioning-in capacity states is 0.5, and the transition rate between the two repositioning-out capacity states is 0.5. Kushner's approximation scheme (Kushner & Dupuis, 1992; Yan & Zhang, 1997) is used to solve the HJB equations given in (2.31). The discretization step size is set as Δx = 0.01, and the state space for the inventory of empty containers is limited to [−10, 10]. The maximum number of iterations in the value iteration algorithm is 10,000. The algorithm terminates early if the difference between the value functions in consecutive iterations is less than 0.0001. Three cases are examined:

• Case 1: d(t) ∈ {−0.6, 0.0, 0.6}, α1(t) ∈ {1.0, 0.5}, α2(t) ∈ {1.0, 0.5}. This represents a balanced depot;
• Case 2: d(t) ∈ {−0.6, −0.3, 0.3}, α1(t) ∈ {1.0, 0.5}, α2(t) ∈ {1.0, 0.5}. This represents a surplus depot;
• Case 3: d(t) ∈ {−0.3, 0.3, 0.6}, α1(t) ∈ {1.0, 0.5}, α2(t) ∈ {1.0, 0.5}. This represents a deficit depot.

It is found that in all three cases the optimal empty repositioning policy has a threshold structure. For any given state (α1, α2, d) of the coupling Markov chain, the repositioning-in and repositioning-out decisions are determined by two threshold parameters (x2(α1, α2, d), x1(α1, α2, d)). Namely, u1 = 0 and u2 = 1 if x < x2(α1, α2, d); u1 = 0 and u2 = 0 if x2(α1, α2, d) < x < x1(α1, α2, d); u1 = 1 and u2 = 0 if x > x1(α1, α2, d). The results are given in Table 2.2.
Table 2.2 Threshold values (x2, x1) at (α1, α2, d) in cases with three demand states

Case 1

| (α1, α2) | d1 = −0.6 | d2 = 0.0 | d3 = 0.6 |
|---|---|---|---|
| (1.0, 1.0) | (−0.15, 0.00) | (0.00, 0.00) | (0.01, 0.60) |
| (1.0, 0.5) | (−0.13, 0.02) | (0.00, 0.03) | (0.08, 0.62) |
| (0.5, 1.0) | (−0.16, −0.01) | (0.00, 0.00) | (0.01, 0.57) |
| (0.5, 0.5) | (−0.14, −0.01) | (0.00, 0.03) | (0.08, 0.59) |

Case 2

| (α1, α2) | d1 = −0.6 | d2 = −0.3 | d3 = 0.3 |
|---|---|---|---|
| (1.0, 1.0) | (−0.16, 0.00) | (−0.09, 0.00) | (0.00, 0.29) |
| (1.0, 0.5) | (−0.16, 0.00) | (−0.08, 0.00) | (0.01, 0.29) |
| (0.5, 1.0) | (−0.17, −0.01) | (−0.09, 0.00) | (0.00, 0.27) |
| (0.5, 0.5) | (−0.16, −0.01) | (−0.08, 0.00) | (0.01, 0.27) |

Case 3

| (α1, α2) | d1 = −0.3 | d2 = 0.3 | d3 = 0.6 |
|---|---|---|---|
| (1.0, 1.0) | (−0.08, 0.06) | (0.00, 0.41) | (0.01, 0.68) |
| (1.0, 0.5) | (−0.06, 0.09) | (0.01, 0.41) | (0.09, 0.69) |
| (0.5, 1.0) | (−0.08, 0.06) | (0.00, 0.40) | (0.01, 0.66) |
| (0.5, 0.5) | (−0.06, 0.09) | (0.01, 0.41) | (0.09, 0.68) |

It can be observed in Table 2.2 that:

(i) When the demand state d(t) is negative (i.e., flow-in demand is more than flow-out demand), we tend to keep lower inventory levels and reposition out empty containers; when the demand state d(t) is positive (i.e., flow-in demand is less than flow-out demand), we tend to keep higher inventory levels and reposition in empty containers.

(ii) The available capacity for empty repositioning does have an impact on the repositioning decisions. For example, when d3 = 0.6 in Case 1, the lower threshold values are significantly different for α2 = 1.0 and α2 = 0.5; the latter requires a higher level of minimum inventory because of the smaller repositioning-in capacity.

(iii) The cost structure also affects the empty repositioning decisions. In our examples, the leasing cost is much higher than the inventory cost and the repositioning-in cost is higher than the repositioning-out cost. This leads to the asymmetrical results in Table 2.2 (e.g., in Case 1, d(t) = −0.6 and d(t) = 0.6). Namely, we prefer to maintain relatively high levels of inventory to avoid leasing and repositioning in empties.

(iv) For the surplus depot (Case 2), we tend to keep lower inventory levels than for the deficit depot (Case 3), e.g., at states d(t) = −0.3 and 0.3. For the balanced depot (Case 1) with d(t) = 0, both threshold values are zero or very close to zero, which means that little repositioning action is required.
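The kind of computation behind these thresholds can be sketched as a value iteration on an upwind finite-difference discretization of (2.31). The code below is a simplified illustration in the spirit of the Markov chain approximation approach, not the authors' implementation: it assumes fixed capacities α1 = α2 = 1, restricts the controls to the three bang-bang pairs (0, 0), (1, 0), and (0, 1), uses a coarse grid, and holds the value constant beyond the grid boundary. The function name and all default values are illustrative.

```python
import numpy as np


def solve_hjb(d_states, Q, ce1=1.0, ce2=2.0, ch=2.0, cl=10.0, rho=0.5,
              alpha1=1.0, alpha2=1.0, h=0.1, x_max=3.0,
              max_iter=10_000, tol=1e-4):
    """Value iteration for a discretised HJB with Markov-modulated demand.

    Returns the grid x, values v[k, i] and policy[k, i], where policy holds
    an index into `controls` (0: idle, 1: reposition out, 2: reposition in).
    """
    Q = np.asarray(Q, dtype=float)
    x = np.arange(-x_max, x_max + h / 2, h)
    hold = ch * np.maximum(x, 0.0) + cl * np.maximum(-x, 0.0)
    controls = [(0, 0), (1, 0), (0, 1)]
    v = np.zeros((len(d_states), x.size))
    policy = np.zeros_like(v, dtype=int)

    for _ in range(max_iter):
        # Neighbouring grid values; the boundary value is repeated at the edges.
        up = np.concatenate([v[:, 1:], v[:, -1:]], axis=1)
        down = np.concatenate([v[:, :1], v[:, :-1]], axis=1)
        v_new = np.empty_like(v)
        for k, d in enumerate(d_states):
            out_rate = -Q[k, k]
            jump = Q[k] @ v - Q[k, k] * v[k]  # sum over j != k of q_kj * v_j
            cand = []
            for u1, u2 in controls:
                b = alpha2 * u2 - alpha1 * u1 - d  # drift of x under (u1, u2)
                g = hold + ce1 * alpha1 * u1 + ce2 * alpha2 * u2
                num = g + (max(b, 0.0) / h) * up[k] + (max(-b, 0.0) / h) * down[k] + jump
                cand.append(num / (rho + abs(b) / h + out_rate))
            cand = np.array(cand)
            policy[k] = cand.argmin(axis=0)
            v_new[k] = cand.min(axis=0)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    return x, v, policy


if __name__ == "__main__":
    Q = [[-0.3, 0.3], [0.3, -0.3]]  # two demand states, switching rate 0.3
    x, v, policy = solve_hjb([-0.5, 0.5], Q)
    for k, d in enumerate([-0.5, 0.5]):
        idle = x[policy[k] == 0]
        if idle.size:
            print(f"d = {d}: idle band roughly [{idle.min():.1f}, {idle.max():.1f}]")
```

The thresholds read off such a coarse grid are only indicative; reproducing the tabulated values would require the finer step size and the full capacity chain described above.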

Four-Demand-State Cases

Assume that the demand process is a four-state Markov chain, the repositioning-out capacity is a two-state Markov chain, and the repositioning-in capacity is fixed. Two cases are examined:

• Case 1: d(t) ∈ {−0.6, −0.2, 0.2, 0.6}, α1(t) ∈ {1.0, 0.5}, α2(t) = 0. This represents a balanced depot with no capacity to reposition in empties;
• Case 2: d(t) ∈ {−0.6, −0.4, −0.2, 0.2}, α1(t) ∈ {1.0, 0.5}, α2(t) = 0. This represents a heavy surplus depot with no capacity to reposition in empties.

Again, it is found that in both cases the optimal empty repositioning policy has a threshold structure. The results are given in Table 2.3.

Table 2.3 Threshold values (x2, x1) at (α1, α2, d) in cases with four demand states

Case 1

| (α1, α2) | d1 = −0.6 | d2 = −0.2 | d3 = 0.2 | d4 = 0.6 |
|---|---|---|---|---|
| (1.0, 0.0) | (−10, 0.56) | (−10, 0.57) | (−10, 0.72) | (−10, 1.20) |
| (0.5, 0.0) | (−10, 0.51) | (−10, 0.56) | (−10, 0.71) | (−10, 1.17) |

Case 2

| (α1, α2) | d1 = −0.6 | d2 = −0.4 | d3 = −0.2 | d4 = 0.2 |
|---|---|---|---|---|
| (1.0, 0.0) | (−10, 0.07) | (−10, 0.09) | (−10, 0.09) | (−10, 0.33) |
| (0.5, 0.0) | (−10, 0.03) | (−10, 0.08) | (−10, 0.08) | (−10, 0.31) |

Because the repositioning-in capacity is zero and the numerical computation is limited to the state space [−10, 10], the lower threshold values are −10 in both cases. This implies that no repositioning-in action will be taken. Although Case 1 has balanced demand in the long term, fairly high inventory levels are required for all states of d(t). In Case 2, the inventory levels are fairly low because it is a surplus depot.

Discussions

The numerical examples reveal that if the uncertainty of customer demands and repositioning capacities is described by a finite-state Markov chain (α1(t), α2(t), d(t)), the optimal empty repositioning policy can be characterized by a set of threshold parameter pairs. Each parameter pair corresponds to one state of the coupling Markov chain. More specifically, for any given state (α1, α2, d), there exists a pair of threshold parameters (x2(α1, α2, d), x1(α1, α2, d)). The repositioning-out (u1) and repositioning-in (u2) decisions are determined by:

• u1 = 0 and u2 = 1, if x < x2(α1, α2, d);
• u1 = 0 and u2 = 0, if x2(α1, α2, d) < x < x1(α1, α2, d);
• u1 = 1 and u2 = 0, if x > x1(α1, α2, d).

This confirms that the structural properties of the optimal empty repositioning policy obtained from the simple two-state situation do carry over to more general systems with multiple demand states and multiple repositioning capacity states. It should be pointed out that Assumption (A1) ensures that x2 = z1 = 0 in the two-state situation. If condition (A1) does not hold, there is no guarantee that any threshold value will be zero. However, the two-threshold structure is still preserved. This has been confirmed by additional numerical examples.

Our model focuses on operational-level decisions involving repositioning empty containers in and out. The threshold structure simplifies repositioning decision making. The threshold values can serve as safety stock levels. In container logistics management, setting safety stocks of empty containers at seaports or inland depots is an important tactical decision. Our model may be used as a broad-brush planning tool to set safety stocks when the demand flow and repositioning capacity change dynamically (e.g., seasonally).
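The threshold rule summarized above can be written as a simple lookup. The sketch below uses the Case 1 thresholds from Table 2.2 for full capacities (α1, α2) = (1.0, 1.0); the dictionary layout and function name are illustrative, and the behaviour exactly at a threshold (treated here as "do nothing") is an assumption, since the rule above is stated with strict inequalities.

```python
# Thresholds (x2, x1) keyed by (alpha1, alpha2, d); values from Table 2.2, Case 1.
THRESHOLDS = {
    (1.0, 1.0, -0.6): (-0.15, 0.00),
    (1.0, 1.0, 0.0): (0.00, 0.00),
    (1.0, 1.0, 0.6): (0.01, 0.60),
}


def threshold_policy(x, x2, x1):
    """Return (u1, u2) for inventory level x and thresholds x2 <= x1."""
    if x < x2:
        return (0, 1)   # below the lower threshold: reposition in
    if x > x1:
        return (1, 0)   # above the upper threshold: reposition out
    return (0, 0)       # between (or at) the thresholds: no repositioning


if __name__ == "__main__":
    state = (1.0, 1.0, 0.6)
    for x in (-1.0, 0.3, 1.0):
        print(x, threshold_policy(x, *THRESHOLDS[state]))
```

Read this way, the pair (x2, x1) acts as a state-dependent safety stock band: the depot is topped up below x2 and drained above x1.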


The proposed model has some limitations. Firstly, the fluid model treats the repositioning quantity as a flow. In reality, only a limited number of trips are available each day for repositioning empty containers. Nevertheless, the use of continuous flow may be justified in the following respects: (i) containers are often moved into (and out of) a depot via multiple transport modes using different types of vehicles, e.g., deep-sea vessels, short-sea vessels, feeders, trains, and trucks. Deep-sea voyages may be limited, while feeders, trains, and trucks access seaports or depots much more frequently. The total volume of daily container movements through most depots is large and may be approximated as a flow with varying capacities; (ii) the flow model is a reasonable approximation for tactical planning such as setting safety stocks at seaports or intermodal terminals. Secondly, in this chapter the leasing decisions are implicit and characterized by the cost structure, which follows the same treatment as Li et al. (2004, 2007) and Song (2007). In reality, leasing decisions may be classified into long-term leasing and on-spot leasing, and different strategies for leasing and off-leasing decisions could be adopted. Therefore, a more sophisticated model is required to model specific leasing decisions. Nevertheless, the commonly accepted relationships, such as that meeting demand should have the highest priority and that the leasing cost is higher than the repositioning cost, are generally captured in our model.

2.4 Summary and Notes

This chapter seeks the explicit structure and closed form of the optimal ECR policy in a single-depot system facing uncertain demand and supply. Firstly, we consider the discrete-time sequential ECR situation. We formulate the problem as a discrete-time stochastic dynamic programming model. It is shown that the optimal ECR policy can be characterized by two threshold parameters at each period in the form of (s, S) inventory control. In particular, the threshold parameters at the final period can be obtained in closed form.

Secondly, we consider the continuous-time, continuous-state sequential ECR situation. The random demand is modeled as a two-state Markov chain. By treating the flow of containers as a continuous fluid, a continuous-time stochastic dynamic programming approach is used to solve the optimal control problem. The associated Hamilton–Jacobi–Bellman (HJB) equations are used to characterize the value functions. The optimal policies are given in terms of four threshold levels. Closed-form solutions of the value functions are obtained, which can then be used to obtain these threshold levels easily. The extension of the model to more general systems with multiple demand states and uncertain repositioning capacities is also addressed. Numerical examples are given to illustrate the models and results.

In the literature, Li et al. (2004) considered the ECR problem in a single depot. They modeled the system as a discrete-time periodic review inventory control problem. However, they were not able to obtain closed-form solutions, and their calculation of the threshold levels of the optimal policies is based on a value iteration algorithm, which is computationally demanding. Song and Zhang (2011) investigated the optimal ECR policy in a port with random demands and delays in container repositioning, and illustrated the structural properties of the optimal ECR policy. Zhang et al. (2014) formulated the ECR problem in a single port as a periodic review inventory control problem over a finite horizon with stochastic import and export of empty containers. The objective is to minimize the total operating cost, including container holding cost, stockout cost, importing cost, and exporting cost. The optimal policy at each period is characterized by a pair of threshold levels, and a polynomial-time algorithm is developed to determine the two thresholds. Legros et al. (2019) took the consignee's viewpoint to investigate how to manage empty containers at their location via a time-based policy, which is essentially a single-depot system. It is shown that the optimal policy has a threshold-type control structure and that the policy structure depends on whether container cleaning costs are considered.

It is believed that investigating the structural properties of the optimal control policies in single-depot systems could offer useful insight for designing control policies for more complex systems, e.g., how to set appropriate safety stock levels dynamically at multiple depots in two-depot systems or hub-and-spoke systems. Moreover, large-scale dynamic control problems arising in practice can often be decomposed into smaller and simpler subproblems, which are more tractable and whose solutions can be combined to construct good control policies for the original large system (Sethi & Zhang, 1994; Song, 2013).

References

Chan, F. T. S., Wang, Z., Zhang, J., & Wadhwa, S. (2008). Two-level hedging point control of a manufacturing system with multiple product-types and uncertain demands. International Journal of Production Research, 46(12), 3259–3295.
Gershwin, S. B. (1994). Manufacturing systems engineering. Prentice-Hall.
Kushner, H. J., & Dupuis, P. G. (1992). Numerical methods for stochastic control problems in continuous time. Springer-Verlag.
Legros, B., Bouchery, Y., & Fransoo, J. (2019). A time-based policy for empty container management by consignees. Production and Operations Management, 28(6), 1503–1527.
Li, J. A., Liu, K., Stephen, C. S., & Lai, K. K. (2004). Empty container management in a port with long-run average criterion. Mathematical and Computer Modelling, 40, 85–100.
Li, J. A., Leung, S. C. H., Wu, Y., & Liu, K. (2007). Allocation of empty containers between multi-ports. European Journal of Operational Research, 182, 400–412.
Sethi, S. P. (2019). Optimal control theory: Applications to management science and economics (3rd ed.). Springer.
Sethi, S., & Zhang, Q. (1994). Hierarchical decision making in stochastic manufacturing systems. Birkhauser.
Song, D. P. (2007). Characterizing optimal empty container reposition policy in periodic-review shuttle service systems. Journal of the Operational Research Society, 58(1), 122–133.
Song, D. P. (2013). Optimal control and optimization in stochastic supply chain systems. Springer.
Song, D. P., & Zhang, Q. (2010). A fluid flow model for empty container repositioning policy with a single port and stochastic demand. SIAM Journal on Control and Optimization, 48(5), 3623–3642.
Song, D. P., & Zhang, Q. (2011). Optimal inventory control for empty containers in a port with random demands and repositioning delays. In K. Cullinane (Ed.), International handbook of maritime economics, Chap. 14 (pp. 301–321). Edward Elgar Publishing.
Tan, B. (2002). Production control of a pull system with production and demand uncertainty. IEEE Transactions on Automatic Control, 47(5), 779–783.
Yan, H. M., & Zhang, Q. (1997). A numerical method in optimal production and setup scheduling of stochastic manufacturing systems. IEEE Transactions on Automatic Control, 42(10), 1452–1455.
Yin, G., & Zhang, Q. (1998). Continuous-time Markov chain and application: A singular perturbation approach. Springer-Verlag.
Zhang, B., Ng, C. T., & Cheng, T. C. E. (2014). Multi-period empty container repositioning with stochastic demand and lost sales. Journal of the Operational Research Society, 65(2), 302–319.

Chapter 3

Optimal ECR Policy in Two-Depot System: Periodic Review

Abstract This chapter considers two depots that are facing independent supply and demand of empty containers from shippers. We seek the optimal ECR policy between two depots over a multi-period planning horizon to minimize the total expected cost consisting of empty container transferring costs between two depots, and inventory holding costs and container leasing costs at both depots. We formulate the problem as a stochastic dynamic programming model. The local properties of the value function such as the first and the second derivatives on a region-wise basis are analyzed. The region-wise properties of the value function enable us to establish the structural characteristics of the optimal ECR policy over multiple time periods. Specifically, the entire state space is divided into three control regions by two monotonic switching curves. The asymptotic behaviors of the switching curves are analyzed. The structural properties of the optimal ECR policy and the asymptotic behaviors of the switching curves are then used to construct simple near-optimal and easy-to-operate policies. Numerical examples are provided to demonstrate the analytical results.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 D.-P. Song and J. Dong, Modelling Empty Container Repositioning Logistics, https://doi.org/10.1007/978-3-030-93383-8_3

3.1 Introduction

This chapter considers a transport company operating two independent depots, which are located in the same hinterland region. The two depots are independent in the sense that they face separate external supplies and demands of empty containers. However, they are related in the sense that empty containers can be transferred between them by incurring a transportation cost. Specifically, each depot receives an independent random supply of empty containers returned by consignees and faces independent random demand from consignors who require empty containers. Both random supply and demand are regarded as exogenous variables. Customer demands in the current period must be satisfied using owned or leased containers, where the leasing cost is incurred on a time basis. The transport company needs to make dynamic decisions about whether, and in what quantity, to reposition empty containers from one depot to the other over multiple periods so that customer demands can be better satisfied and the total expected cost can be minimized. The discrete-time periodic review mechanism is used. The transport time between the two depots is one period.

The research methodology and technique used in this chapter are based on Ng et al. (2012). However, container leasing is considered instead of demand backlogging. As a result, the structure of the optimal ECR policy becomes much simpler and easier to interpret.

The rest of the chapter is organized as follows. In Sect. 3.2, a discrete-time stochastic dynamic programming model is formulated. In Sect. 3.3, we establish the structural properties of the optimal policies analytically. In Sect. 3.4, the structural properties are utilized to construct simple near-optimal policies, which are easier to implement in practice. In Sect. 3.5, numerical examples are provided. A summary and notes are given in Sect. 3.6.

3.2 A Discrete Stochastic Dynamic Programming Model

Consider a transport company operating two depots and managing empty container transfer between them over a dynamic multi-period planning horizon. The two depots have their own supply and demand for empty containers, as shown in Fig. 3.1. For simplicity, we assume a single type of container, i.e., the twenty-foot equivalent unit (TEU). The following notation is introduced:

• $n$: a discrete decision period;
• $N$: the length of the planning horizon;
• $x_{i,n-1}$: the inventory level of empty containers at depot i at the beginning of period n, which can be negative (indicating leased containers);
• $u_n$: the number of empty containers repositioned from depot 1 to depot 2 in period n, which is a decision variable; a negative $u_n$ indicates the number of containers repositioned from depot 2 to depot 1;
• $ZI_i$: the random supply of containers into depot i in a period;
• $ZO_i$: the random demand of containers out of depot i in a period;
• $z_i$: the random variable representing the net number of containers into depot i in a period, defined as $z_i := ZI_i - ZO_i$;
• $f_i(\cdot)$: the probability density function (pdf) of the random variable $z_i$;
• $F_i(\cdot)$: the cumulative distribution function (cdf) of the random variable $z_i$;
• $z_{i,n}$: the net number of containers into depot i in period n, which is a sample of $z_i$;
• $c_{ij}$: the transportation cost of repositioning an empty container from depot i to depot j;
• $h_i$: the on-hand inventory holding cost of an empty container per period at depot i;
• $b_i$: the leasing cost of an empty container per period at depot i.

[Fig. 3.1 A two-depot system: Depot 1 and Depot 2 each have their own supply and demand, and empty containers are transferred between the two depots.]

Assumption 3.1 $c_{12} < h_1 + b_2$ and $c_{21} < h_2 + b_1$.

Assumption 3.2 The pdf of the random variable $z_i$ is continuous and satisfies $f_i(\cdot) > 0$ for $i = 1, 2$.

Assumption 3.1 indicates that the transportation cost of repositioning one container from one depot to the other is less than the sum of the unit holding cost at the current depot and the unit leasing cost at the other depot. This is reasonable and makes the problem nontrivial. Otherwise, there is no need to reposition empty containers between the two depots. Assumption 3.2 makes our later analysis rigorous. The normal distribution satisfies Assumption 3.2. However, the main results, such as the structural properties of the optimal policy and the development of near-optimal policies, are preserved for more general distributions (e.g., discrete uniform distributions, as illustrated in the numerical example).

It is assumed that the transport lead time between the two depots is one period. The system state, i.e., the inventory levels at the two depots, in period n is determined by

$$x_{1,n} = x_{1,n-1} + z_{1,n} - u_n, \quad x_{2,n} = x_{2,n-1} + z_{2,n} + u_n.$$

The problem is to find the optimal dynamic ECR policy $\{u_n \mid 1 \le n \le N\}$ that minimizes the following finite-horizon cost function with the initial state $(x_{1,0}, x_{2,0})$:

$$\sum_{n=1}^{N} \alpha^n E\left[ c_{12}u_n^+ + c_{21}u_n^- + h_1 x_{1,n}^+ + b_1 x_{1,n}^- + h_2 x_{2,n}^+ + b_2 x_{2,n}^- \,\middle|\, x_{1,0}, x_{2,0} \right],$$

where $\alpha$ is a discount factor ($0 < \alpha \le 1$), $x^+ = \max\{0, x\}$, and $x^- = \max\{0, -x\}$. Let $V_n(x_{1,n-1}, x_{2,n-1})$ be the expected discounted cost from period n to N. The backward Bellman optimality equation is given by

$$V_n(x_{1,n-1}, x_{2,n-1}) = \min_{u_n}\left\{ c_{12}u_n^+ + c_{21}u_n^- + h_1 E x_{1,n}^+ + b_1 E x_{1,n}^- + h_2 E x_{2,n}^+ + b_2 E x_{2,n}^- + \alpha E V_{n+1}(x_{1,n}, x_{2,n}) \right\},$$

where $V_{N+1}(x_1, x_2) \equiv 0$ for all $(x_1, x_2)$. To simplify the exposition, let

$$L_n(x_{1,n-1}, x_{2,n-1}, u_n) := h_1 E(x_{1,n})^{+} + b_1 E(x_{1,n})^{-} + h_2 E(x_{2,n})^{+} + b_2 E(x_{2,n})^{-} + \alpha E V_{n+1}(x_{1,n}, x_{2,n}),$$

$$G_n(x_{1,n-1}, x_{2,n-1}, u_n) := c_{12}u_n^{+} + c_{21}u_n^{-} + L_n(x_{1,n-1}, x_{2,n-1}, u_n).$$

Therefore, the problem is to seek the solution $\{u_n \mid 1 \le n \le N\}$ to the Bellman equation $V_n(x_{1,n-1}, x_{2,n-1}) = \min_{u_n} G_n(x_{1,n-1}, x_{2,n-1}, u_n)$ from period 1 to period N. With a slight abuse of notation and to simplify the exposition further, we drop the subscript n in the system state and control decision, and denote $x := (x_1, x_2)$. The Bellman optimality equation is given by

$$V_n(x) = \min_{u} G_n(x, u), \tag{3.1}$$

$$G_n(x, u) = c_{12}u^{+} + c_{21}u^{-} + L_n(x, u), \tag{3.2}$$

$$L_n(x, u) = h_1 E(x_1 + z_1 - u)^{+} + b_1 E(x_1 + z_1 - u)^{-} + h_2 E(x_2 + z_2 + u)^{+} + b_2 E(x_2 + z_2 + u)^{-} + \alpha E V_{n+1}(x_1 + z_1 - u, x_2 + z_2 + u). \tag{3.3}$$

In the following, we will establish the explicit structural properties of the optimal control policy so that it can be better understood from the manager's perspective. The main technique is the backward induction approach.
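The recursion (3.1)–(3.3) can be illustrated numerically. The following is a minimal sketch, not the procedure used in this chapter: it assumes integer inventories truncated to [-K, K] (the truncation is an assumption of the sketch), discrete uniform net inflows, and hypothetical cost values. The names `solve`, `clip`, `K`, `U_MAX` and `Z_VALUES` are illustrative only.

```python
# Backward induction for V_n(x) = min_u G_n(x, u) on a truncated integer grid.
# All parameter values are hypothetical; the truncation to [-K, K] is an
# assumption of this sketch and is not part of the model in the text.
H1, B1 = 1.0, 4.0        # unit holding / leasing cost at depot 1
H2, B2 = 1.0, 4.0        # unit holding / leasing cost at depot 2
C12, C21 = 1.5, 1.5      # unit repositioning costs
ALPHA = 0.95             # discount factor
N_PERIODS = 3            # planning horizon N
K = 8                    # inventory grid is {-K, ..., K} for each depot
U_MAX = 6                # controls u in {-U_MAX, ..., U_MAX}
Z_VALUES = (-2, -1, 0, 1, 2)   # discrete uniform z_1 and z_2


def clip(v):
    """Project an inventory level back onto the truncated grid."""
    return max(-K, min(K, v))


def holding_and_leasing(y1, y2):
    """h1*y1^+ + b1*y1^- + h2*y2^+ + b2*y2^- for realised inventories."""
    return (H1 * max(y1, 0) + B1 * max(-y1, 0)
            + H2 * max(y2, 0) + B2 * max(-y2, 0))


def solve():
    """Return (V_1, policy) where policy[n][(x1, x2)] is the minimising u."""
    states = [(x1, x2) for x1 in range(-K, K + 1) for x2 in range(-K, K + 1)]
    # Try controls in order of increasing |u| so that ties favour less movement.
    controls = sorted(range(-U_MAX, U_MAX + 1), key=abs)
    prob = 1.0 / len(Z_VALUES) ** 2
    v_next = {s: 0.0 for s in states}          # V_{N+1} = 0
    policy = {}
    for n in range(N_PERIODS, 0, -1):
        v_now, u_now = {}, {}
        for (x1, x2) in states:
            best_cost, best_u = None, None
            for u in controls:
                cost = C12 * max(u, 0) + C21 * max(-u, 0)
                for z1 in Z_VALUES:
                    for z2 in Z_VALUES:
                        y1, y2 = x1 + z1 - u, x2 + z2 + u
                        cost += prob * (holding_and_leasing(y1, y2)
                                        + ALPHA * v_next[(clip(y1), clip(y2))])
                if best_cost is None or cost < best_cost - 1e-12:
                    best_cost, best_u = cost, u
            v_now[(x1, x2)], u_now[(x1, x2)] = best_cost, best_u
        policy[n] = u_now
        v_next = v_now
    return v_next, policy
```

With these hypothetical numbers, the final-period decision at state (6, -6) moves containers from depot 1 to depot 2, and no container is moved at state (0, 0).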

3.3 Optimal ECR Policy and Its Structural Properties

In this section, we first explore and establish the structural properties of the final period value function $V_N(x)$ and the final period optimal control, then show that the properties can be carried over to other decision periods.

3.3.1 The Properties of the Value Function at Period N

From $V_{N+1}(x) \equiv 0$ for all x, we have

$$
\begin{aligned}
L_N(x, u) ={}& h_1 \int_{u - x_1}^{+\infty} (x_1 + z_1 - u) f_1(z_1)\,dz_1 + b_1 \int_{-\infty}^{u - x_1} -(x_1 + z_1 - u) f_1(z_1)\,dz_1 \\
&+ h_2 \int_{-u - x_2}^{+\infty} (x_2 + z_2 + u) f_2(z_2)\,dz_2 + b_2 \int_{-\infty}^{-u - x_2} -(x_2 + z_2 + u) f_2(z_2)\,dz_2,
\end{aligned}
$$

$$\partial L_N(x, u)/\partial u = (h_1 + b_1)F_1(u - x_1) - h_1 + h_2 - (h_2 + b_2)F_2(-u - x_2),$$

$$\partial^2 L_N(x, u)/\partial u\,\partial u = (h_1 + b_1) f_1(u - x_1) + (h_2 + b_2) f_2(-u - x_2) \ge 0.$$

Therefore, $\partial L_N(x, u)/\partial u$ is differentiable and monotonic increasing in u. From (3.2), it is clear that $\partial G_N(x, u)/\partial u$ is monotonic increasing in u, but not continuous at u = 0. In order to obtain the optimal control $u_N^*(x)$, we define two switching surfaces $D_N(x)$ and $U_N(x)$ as follows:

$$D_N(x) = \min\{u \mid \partial L_N(x, u)/\partial u \ge -c_{12}\}, \qquad U_N(x) = \max\{u \mid \partial L_N(x, u)/\partial u \le c_{21}\}.$$

It is clear that $D_N(x) \le U_N(x)$. Moreover, from Assumption 3.2, we have $\partial^2 L_N(x, D_N(x))/\partial u\,\partial u > 0$ and $\partial^2 L_N(x, U_N(x))/\partial u\,\partial u > 0$, so that

$$D_N(x) = \left(\partial L_N/\partial u\right)^{-1}(x, -c_{12}) \quad\text{and}\quad U_N(x) = \left(\partial L_N/\partial u\right)^{-1}(x, c_{21}),$$

where the inverse is taken with respect to u.

The above results are also true for other common probability distributions (e.g., uniform and gamma distributions) under an additional condition, namely that neither $c_{21}$ nor $-c_{12}$ equals $-h_1 + h_2$ or $b_1 - b_2$. Note that for any given x, $\partial L_N(x, u)/\partial u$ is monotonic increasing from $-h_1 - b_2$ to $h_2 + b_1$. For common probability distributions, $\partial L_N(x, u)/\partial u$ may be flat at no more than four values, i.e., $-h_1 - b_2$, $-h_1 + h_2$, $b_1 - b_2$, and $h_2 + b_1$. Since $c_{21}$ and $-c_{12}$ do not equal any of these four numbers, $D_N(x)$ and $U_N(x)$ are unique, and the above results and the results in Lemma 3.1 hold.

From the monotonicity of $\partial L_N(x, u)/\partial u$ and $\partial G_N(x, u)/\partial u$ in u, and the definitions of $D_N(x)$ and $U_N(x)$, the optimal control $u_N^*(x)$ is given by: $u_N^*(x) = D_N(x)$ if $D_N(x) > 0$; $u_N^*(x) = 0$ if $D_N(x) \le 0 \le U_N(x)$; and $u_N^*(x) = U_N(x)$ if $U_N(x) < 0$. More specifically, the optimal control at n = N can be described in three regions as follows:


$$
u_N^*(x) =
\begin{cases}
D_N(x), & \text{(I)} = \{x : D_N(x) > 0\} \\
0, & \text{(II)} = \{x : D_N(x) \le 0 \le U_N(x)\} \\
U_N(x), & \text{(III)} = \{x : U_N(x) < 0\}
\end{cases}
\tag{3.4}
$$
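The three-region rule (3.4) can be evaluated numerically once the distributions of $z_1$ and $z_2$ are fixed. The sketch below assumes normally distributed $z_i$ (which satisfy Assumption 3.2) and hypothetical cost values; it finds $D_N(x)$ and $U_N(x)$ by bisection, which is valid because $\partial L_N(x, u)/\partial u$ is increasing in u. The function names and the bisection bracket are illustrative choices, not part of the text.

```python
import math

# Hypothetical parameters (not taken from the text).
H1, B1, H2, B2 = 1.0, 5.0, 1.0, 5.0
C12, C21 = 2.0, 2.0
MU1, SIG1 = 0.0, 3.0     # z_1 ~ Normal(MU1, SIG1^2), an assumption
MU2, SIG2 = 0.0, 3.0     # z_2 ~ Normal(MU2, SIG2^2), an assumption


def normal_cdf(z, mu, sigma):
    return 0.5 * (1.0 + math.erf((z - mu) / (sigma * math.sqrt(2.0))))


def dL_du(x1, x2, u):
    """(h1+b1) F1(u-x1) - h1 + h2 - (h2+b2) F2(-u-x2), the final-period derivative."""
    return ((H1 + B1) * normal_cdf(u - x1, MU1, SIG1) - H1 + H2
            - (H2 + B2) * normal_cdf(-u - x2, MU2, SIG2))


def solve_for_u(x1, x2, target, lo=-1e3, hi=1e3, iterations=200):
    """Bisection on u; relies on dL_du being increasing in u."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if dL_du(x1, x2, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def switching_surfaces(x1, x2):
    """Return (D_N(x), U_N(x))."""
    return solve_for_u(x1, x2, -C12), solve_for_u(x1, x2, C21)


def optimal_control(x1, x2):
    """Three-region rule (3.4) for the final period."""
    d, u_upper = switching_surfaces(x1, x2)
    if d > 0:
        return d            # region (I): move containers from depot 1 to depot 2
    if u_upper < 0:
        return u_upper      # region (III): move containers from depot 2 to depot 1
    return 0.0              # region (II): no repositioning
```

For example, `optimal_control(10.0, -10.0)` is positive (surplus at depot 1, shortage at depot 2), while `optimal_control(0.0, 0.0)` returns zero under these parameters.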

Lemma 3.1
(i) $D_N(x)$ and $U_N(x)$ are continuously differentiable;
(ii) The optimal value function $V_N(x)$ is continuous in the entire defined domain.

Proof
(i) Note that $\partial L_N(x, u)/\partial u$ is continuously differentiable in (x, u), $\partial^2 L_N(x, D_N(x))/\partial u\,\partial u > 0$, and $\partial^2 L_N(x, U_N(x))/\partial u\,\partial u > 0$. From the Implicit Function Theorem (Douglass, 1996, p. 415), assertion (i) is true.
(ii) Note that $V_N(x) = G_N(x, u_N^*(x))$. From (3.4), we have $V_N(x) = L_N(x, 0)$ in region (II); $V_N(x) = c_{12} D_N(x) + L_N(x, D_N(x))$ in region (I); and $V_N(x) = -c_{21} U_N(x) + L_N(x, U_N(x))$ in region (III). Clearly, $V_N(x)$ is continuous within each of the control regions (I)–(III). Checking all the boundaries, it is clear that $V_N(x)$ is also continuous on them due to the continuity of $L_N(x, u)$, $D_N(x)$, and $U_N(x)$. This completes the proof.

3.3.2 The Structural Properties of the Value Function and the Optimal Control at Period n

Equation (3.4) characterizes the optimal control at n = N by three control regions. For multiple-period decision-making problems, we would like to know whether the optimal decisions at each period can be characterized and whether they share similar structural properties. This section will prove that the optimal ECR decisions at each period have the same structural properties as those in the final period.

Proposition 3.1 For any $n \in \{1, \ldots, N\}$, we have
(i) $L_n(x, u)$ and $G_n(x, u)$ are twice continuously differentiable except on a finite number of boundary surfaces;
(ii) $\partial L_n(x, u)/\partial u$ is continuous in (x, u) and strictly monotonic increasing in u;
(iii) The optimal policy $u_n^*(x)$ can be characterized by two switching surfaces, defined by $D_n(x) = \min\{u \mid \partial L_n(x, u)/\partial u \ge -c_{12}\}$ and $U_n(x) = \max\{u \mid \partial L_n(x, u)/\partial u \le c_{21}\}$;
(iv) $D_n(x)$ and $U_n(x)$ are continuous in x and differentiable except on a finite number of boundary curves;
(v) $\partial^2 L_n(x, u)/\partial x_1 \partial x_1 \ge 0$, $\partial^2 L_n(x, u)/\partial x_2 \partial x_2 \ge 0$, $\partial^2 L_n(x, u)/\partial u\,\partial u > 0$, $\partial^2 L_n(x, u)/\partial u\,\partial x_1 \le 0$, $\partial^2 L_n(x, u)/\partial u\,\partial x_2 \ge 0$, $\partial^2 G_n(x, u)/\partial x_1 \partial x_1 \ge 0$, $\partial^2 G_n(x, u)/\partial x_2 \partial x_2 \ge 0$, and $\partial^2 G_n(x, u)/\partial u\,\partial u > 0$ at differentiable points;
(vi) $\partial^2 L_n(x, u)/\partial x_1 \partial x_1 \cdot \partial^2 L_n(x, u)/\partial u\,\partial u \ge [\partial^2 L_n(x, u)/\partial u\,\partial x_1]^2$, $\partial^2 L_n(x, u)/\partial x_2 \partial x_2 \cdot \partial^2 L_n(x, u)/\partial u\,\partial u \ge [\partial^2 L_n(x, u)/\partial u\,\partial x_2]^2$, and $\partial^2 L_n(x, u)/\partial x_1 \partial x_1 \cdot \partial^2 L_n(x, u)/\partial x_2 \partial x_2 \ge [\partial^2 L_n(x, u)/\partial x_1 \partial x_2]^2$ at differentiable points;
(vii) $V_n(x)$ is twice differentiable except on a finite number of boundary curves, with $\partial^2 V_n(x)/\partial x_1 \partial x_1 \ge 0$, $\partial^2 V_n(x)/\partial x_2 \partial x_2 \ge 0$, $\partial^2 V_n(x)/\partial x_2 \partial x_1 - \partial^2 V_n(x)/\partial x_1 \partial x_1 \le 0$, $\partial^2 V_n(x)/\partial x_2 \partial x_2 - \partial^2 V_n(x)/\partial x_1 \partial x_2 \ge 0$, and $\partial^2 V_n(x)/\partial x_1 \partial x_1 \cdot \partial^2 V_n(x)/\partial x_2 \partial x_2 \ge [\partial^2 V_n(x)/\partial x_1 \partial x_2]^2$ at differentiable points.

Proof We use the backward induction approach to prove all the assertions in Proposition 3.1. It is easy to check that assertions (i)–(vi) are true for n = N and assertion (vii) is true for n = N + 1 due to $V_{N+1}(x) \equiv 0$ for all x and Assumption 3.2. Suppose all the assertions hold for n + 1. We want to show that they are also true for n. To simplify notation, define a vector $y := (x_1 + z_1 - u, x_2 + z_2 + u)$ in the rest of this proof.

Assertion (i): From the induction hypotheses and their definitions, $L_n(x, u)$ and $G_n(x, u)$ are twice continuously differentiable except on a finite number of boundary surfaces.

Assertion (ii): Note that $L_n(x, u) = L_N(x, u) + \alpha E V_{n+1}(y)$ from (3.3). The continuity of $\partial L_n(x, u)/\partial u$ is assured because it is the sum of a continuous function $\partial L_N(x, u)/\partial u$ and the integral of a piecewise differentiable function. Note that

$$\partial^2 L_n(x, u)/\partial u\,\partial u = \partial^2 L_N(x, u)/\partial u\,\partial u + \alpha E[\partial^2 V_{n+1}(y)/\partial x_1 \partial x_1 - \partial^2 V_{n+1}(y)/\partial x_1 \partial x_2 - \partial^2 V_{n+1}(y)/\partial x_2 \partial x_1 + \partial^2 V_{n+1}(y)/\partial x_2 \partial x_2].$$

The induction hypothesis (v) and $\partial^2 L_N(x, u)/\partial u\,\partial u > 0$ lead to $\partial^2 L_n(x, u)/\partial u\,\partial u > 0$. This implies that $\partial L_n(x, u)/\partial u$ is strictly monotonic increasing in u.

Assertion (iii): Note that $D_n(x) \le U_n(x)$ from their definitions. Consider the three cases below.

Case 1: $0 < D_n(x)$. We first check the impact of the control u on $G_n(x, u)$. If $u < 0$, then $\partial G_n(x, u)/\partial u = -c_{21} + \partial L_n(x, u)/\partial u < -c_{21} - c_{12} < 0$. If $0 < u < D_n(x)$, then $\partial G_n(x, u)/\partial u = c_{12} + \partial L_n(x, u)/\partial u < 0$. If $u > D_n(x)$, then $\partial G_n(x, u)/\partial u = c_{12} + \partial L_n(x, u)/\partial u \ge 0$. The above results imply that $G_n(x, u)$ is decreasing when $u < D_n(x)$ and increasing when $u > D_n(x)$. Therefore, we have $u_n^*(x) = \arg\min_u G_n(x, u) = D_n(x)$. It yields $V_n(x) = G_n(x, D_n(x)) = c_{12} D_n(x) + L_n(x, D_n(x))$ in Case 1.

Case 2: $D_n(x) \le 0 \le U_n(x)$. We can observe that if $u < 0$, then $\partial G_n(x_1, x_2, u)/\partial u = -c_{21} + \partial L_n(x_1, x_2, u)/\partial u \le 0$; and if $0 < u$, then $\partial G_n(x_1, x_2, u)/\partial u = c_{12} + \partial L_n(x_1, x_2, u)/\partial u > 0$. Therefore, $\arg\min_u \{G_n(x_1, x_2, u)\} = 0$. We have $u_n^*(x) = 0$ and $V_n(x) = G_n(x, 0) = L_n(x, 0)$ in Case 2.

Case 3: $U_n(x) < 0$. If $u < U_n(x_1, x_2)$, then $\partial G_n(x_1, x_2, u)/\partial u = -c_{21} + \partial L_n(x_1, x_2, u)/\partial u \le 0$. If $U_n(x_1, x_2) < u < 0$, then $\partial G_n(x_1, x_2, u)/\partial u = -c_{21} + \partial L_n(x_1, x_2, u)/\partial u > 0$. If $u > 0$, then $\partial G_n(x_1, x_2, u)/\partial u = c_{12} + \partial L_n(x_1, x_2, u)/\partial u > 0$. Therefore, $\arg\min_u \{G_n(x_1, x_2, u)\} = U_n(x)$ and $u_n^*(x) = U_n(x)$. It follows that $V_n(x) = G_n(x, U_n(x)) = -c_{21} U_n(x) + L_n(x, U_n(x))$ in Case 3.

In summary, the optimal policy in period n can be characterized by two switching surfaces dividing the state space into the three regions below, which implies that assertion (iii) is true:

$$
u_n^*(x) =
\begin{cases}
D_n(x), & \text{(I)} = \{x : D_n(x) > 0\} \\
0, & \text{(II)} = \{x : D_n(x) \le 0 \le U_n(x)\} \\
U_n(x), & \text{(III)} = \{x : U_n(x) < 0\}
\end{cases}
$$

In addition, the optimal value function is given by $V_n(x) = G_n(x, u_n^*(x))$ and can be described in three regions (its continuity at the boundaries is easy to check):

$$
V_n(x) =
\begin{cases}
c_{12} D_n(x) + L_n(x, D_n(x)), & x \in \text{(I)} \\
L_n(x, 0), & x \in \text{(II)} \\
-c_{21} U_n(x) + L_n(x, U_n(x)), & x \in \text{(III)}
\end{cases}
$$

Assertion (iv): Because $\partial L_n(x, u)/\partial u$ is continuous in (x, u) and strictly monotonic increasing in u, $D_n(x)$ and $U_n(x)$ can be written as

$$D_n(x) = \left(\partial L_n/\partial u\right)^{-1}(x, -c_{12}) \quad\text{and}\quad U_n(x) = \left(\partial L_n/\partial u\right)^{-1}(x, c_{21}),$$

where the inverse is taken with respect to u.

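They are continuous in x due to the continuity of $\partial L_n(x, u)/\partial u$. By induction hypotheses (i) and (v), we know $L_n(x, u)$ is twice differentiable and $\partial^2 L_n(x, u)/\partial u\,\partial u > 0$ except on a finite number of boundary surfaces. From the Implicit Function Theorem, it follows that $D_n(x)$ and $U_n(x)$ are differentiable except on a finite number of boundary curves.

Assertion (v): $\partial^2 G_n(x, u)/\partial\cdot\,\partial\cdot = \partial^2 L_n(x, u)/\partial\cdot\,\partial\cdot$ except on the non-differentiable surface u = 0. We only need to prove the assertions for $\partial^2 L_n(x, u)/\partial\cdot\,\partial\cdot$. Note that $L_n(x, u) = L_N(x, u) + \alpha E V_{n+1}(y)$, so that

$$
\begin{aligned}
\partial^2 L_n(x, u)/\partial x_1 \partial x_1 &= \partial^2 L_N(x, u)/\partial x_1 \partial x_1 + \alpha E\,\partial^2 V_{n+1}(y)/\partial x_1 \partial x_1, \\
\partial^2 L_n(x, u)/\partial x_2 \partial x_2 &= \partial^2 L_N(x, u)/\partial x_2 \partial x_2 + \alpha E\,\partial^2 V_{n+1}(y)/\partial x_2 \partial x_2, \\
\partial^2 L_n(x, u)/\partial u\,\partial u &= \partial^2 L_N(x, u)/\partial u\,\partial u + \alpha E[\partial^2 V_{n+1}(y)/\partial x_1 \partial x_1 - \partial^2 V_{n+1}(y)/\partial x_1 \partial x_2 - \partial^2 V_{n+1}(y)/\partial x_2 \partial x_1 + \partial^2 V_{n+1}(y)/\partial x_2 \partial x_2], \\
\partial^2 L_n(x, u)/\partial u\,\partial x_1 &= \partial^2 L_N(x, u)/\partial u\,\partial x_1 + \alpha E[-\partial^2 V_{n+1}(y)/\partial x_1 \partial x_1 + \partial^2 V_{n+1}(y)/\partial x_2 \partial x_1], \\
\partial^2 L_n(x, u)/\partial u\,\partial x_2 &= \partial^2 L_N(x, u)/\partial u\,\partial x_2 + \alpha E[-\partial^2 V_{n+1}(y)/\partial x_1 \partial x_2 + \partial^2 V_{n+1}(y)/\partial x_2 \partial x_2].
\end{aligned}
$$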

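From the induction hypotheses of assertion (vii), assertion (v) is true.

Assertion (vi): Note that, with all derivatives of $L_n$ and $L_N$ evaluated at (x, u) and those of $V_{n+1}$ evaluated at y,

$$
\begin{aligned}
&\partial^2 L_n/\partial x_1 \partial x_1 \cdot \partial^2 L_n/\partial u\,\partial u - [\partial^2 L_n/\partial x_1 \partial u]^2 \\
&= [\partial^2 L_N/\partial x_1 \partial x_1 + \alpha E\,\partial^2 V_{n+1}/\partial x_1 \partial x_1] \cdot [\partial^2 L_N/\partial u\,\partial u + \alpha E(\partial^2 V_{n+1}/\partial x_1 \partial x_1 - 2\,\partial^2 V_{n+1}/\partial x_1 \partial x_2 + \partial^2 V_{n+1}/\partial x_2 \partial x_2)] \\
&\quad - [\partial^2 L_N/\partial x_1 \partial u + \alpha E(\partial^2 V_{n+1}/\partial x_1 \partial x_2 - \partial^2 V_{n+1}/\partial x_1 \partial x_1)]^2 \\
&= [\partial^2 L_N/\partial x_1 \partial x_1 \cdot \partial^2 L_N/\partial u\,\partial u - (\partial^2 L_N/\partial x_1 \partial u)^2] \\
&\quad + [\partial^2 L_N/\partial x_1 \partial x_1 \cdot \alpha E\,\partial^2 V_{n+1}/\partial x_2 \partial x_2 + \partial^2 L_N/\partial u\,\partial x_2 \cdot \alpha E\,\partial^2 V_{n+1}/\partial x_1 \partial x_1] \\
&\quad + [\alpha E\,\partial^2 V_{n+1}/\partial x_1 \partial x_1 \cdot \alpha E\,\partial^2 V_{n+1}/\partial x_2 \partial x_2 - \alpha E\,\partial^2 V_{n+1}/\partial x_1 \partial x_2 \cdot \alpha E\,\partial^2 V_{n+1}/\partial x_1 \partial x_2] \\
&\ge 0.
\end{aligned}
$$

The second equality utilizes the facts that $\partial^2 L_N(x, u)/\partial x_1 \partial x_1 = -\partial^2 L_N(x, u)/\partial x_1 \partial u$ and $\partial^2 L_N(x, u)/\partial u\,\partial u = \partial^2 L_N(x, u)/\partial u\,\partial x_2 - \partial^2 L_N(x, u)/\partial u\,\partial x_1$; the last inequality is due to the induction hypothesis and $E\xi^2 \ge (E\xi)^2$. Similarly, we can prove the other inequalities (using the fact that $\partial^2 L_N(x, u)/\partial x_1 \partial x_2 = 0$). Thus, assertion (vi) is true.

Assertion (vii): From the definitions we have $\partial L_n(x, D_n(x))/\partial u = -c_{12}$ and $\partial L_n(x, U_n(x))/\partial u = c_{21}$. Then, corresponding to the three regions of $u_n^*(x)$ in the proof of Assertion (iii), we can obtain the first and second partial derivatives of $V_n(x)$ in each region as follows: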


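$$
\partial V_n(x)/\partial x_1 =
\begin{cases}
\partial L_n(x, D_n(x))/\partial x_1, & x \in \text{(I)} \\
\partial L_n(x, 0)/\partial x_1, & x \in \text{(II)} \\
\partial L_n(x, U_n(x))/\partial x_1, & x \in \text{(III)}
\end{cases}
\qquad
\partial V_n(x)/\partial x_2 =
\begin{cases}
\partial L_n(x, D_n(x))/\partial x_2, & x \in \text{(I)} \\
\partial L_n(x, 0)/\partial x_2, & x \in \text{(II)} \\
\partial L_n(x, U_n(x))/\partial x_2, & x \in \text{(III)}
\end{cases}
$$

The second derivatives of the value function $V_n(x)$ in each region are given in Table 3.1. This indicates that $V_n(x)$ is twice differentiable except on the boundary curves between the three regions.

Table 3.1 The second derivatives of the value function

Control region (I):
  $\partial^2 V_n(x)/\partial x_1 \partial x_1 = \partial^2 L_n(x, D_n(x))/\partial x_1 \partial x_1 + \partial^2 L_n(x, D_n(x))/\partial x_1 \partial u \cdot \partial D_n(x)/\partial x_1$
  $\partial^2 V_n(x)/\partial x_2 \partial x_2 = \partial^2 L_n(x, D_n(x))/\partial x_2 \partial x_2 + \partial^2 L_n(x, D_n(x))/\partial x_2 \partial u \cdot \partial D_n(x)/\partial x_2$
  $\partial^2 V_n(x)/\partial x_2 \partial x_1 = \partial^2 L_n(x, D_n(x))/\partial x_2 \partial x_1 + \partial^2 L_n(x, D_n(x))/\partial x_2 \partial u \cdot \partial D_n(x)/\partial x_1$

Control region (II):
  $\partial^2 V_n(x)/\partial x_1 \partial x_1 = \partial^2 L_n(x, 0)/\partial x_1 \partial x_1$
  $\partial^2 V_n(x)/\partial x_2 \partial x_2 = \partial^2 L_n(x, 0)/\partial x_2 \partial x_2$
  $\partial^2 V_n(x)/\partial x_2 \partial x_1 = \partial^2 L_n(x, 0)/\partial x_2 \partial x_1$

Control region (III):
  $\partial^2 V_n(x)/\partial x_1 \partial x_1 = \partial^2 L_n(x, U_n(x))/\partial x_1 \partial x_1 + \partial^2 L_n(x, U_n(x))/\partial x_1 \partial u \cdot \partial U_n(x)/\partial x_1$
  $\partial^2 V_n(x)/\partial x_2 \partial x_2 = \partial^2 L_n(x, U_n(x))/\partial x_2 \partial x_2 + \partial^2 L_n(x, U_n(x))/\partial x_2 \partial u \cdot \partial U_n(x)/\partial x_2$
  $\partial^2 V_n(x)/\partial x_2 \partial x_1 = \partial^2 L_n(x, U_n(x))/\partial x_2 \partial x_1 + \partial^2 L_n(x, U_n(x))/\partial x_2 \partial u \cdot \partial U_n(x)/\partial x_1$

To show $\partial^2 V_n(x)/\partial x_1 \partial x_1 \ge 0$, we check all three regions. In region (I), note that

$$\partial D_n(x)/\partial x_1 = -\frac{\partial^2 L_n(x, D_n(x))/\partial u\,\partial x_1}{\partial^2 L_n(x, D_n(x))/\partial u\,\partial u}.$$

It follows that

$$\partial^2 V_n(x)/\partial x_1 \partial x_1 = \partial^2 L_n(x, D_n(x))/\partial x_1 \partial x_1 + \partial^2 L_n(x, D_n(x))/\partial x_1 \partial u \cdot \partial D_n(x)/\partial x_1 \ge 0.$$

The last inequality is from assertion (vi). For the other regions, the results can be similarly proved from assertions (v)–(vii). Applying similar arguments, we can prove that $\partial^2 V_n(x)/\partial x_2 \partial x_2 \ge 0$ in the three regions.

To show $\partial^2 V_n(x)/\partial x_2 \partial x_1 - \partial^2 V_n(x)/\partial x_1 \partial x_1 \le 0$, we check all three regions one by one. Define

$$T_n(x) := \partial V_n(x)/\partial x_2 - \partial V_n(x)/\partial x_1.$$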


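In region (I), we have $u_n^*(x) = D_n(x)$ and $V_n(x) = G_n(x, D_n(x)) = c_{12} D_n(x) + L_n(x, D_n(x))$, which yields

$$
\begin{aligned}
T_n(x) &= c_{12} \cdot \partial D_n(x)/\partial x_2 - c_{12} \cdot \partial D_n(x)/\partial x_1 + \partial L_n(x, D_n(x))/\partial x_2 + \partial L_n(x, D_n(x))/\partial u \cdot \partial D_n(x)/\partial x_2 \\
&\quad - \partial L_n(x, D_n(x))/\partial x_1 - \partial L_n(x, D_n(x))/\partial u \cdot \partial D_n(x)/\partial x_1 \\
&= \partial L_n(x, D_n(x))/\partial x_2 - \partial L_n(x, D_n(x))/\partial x_1 \\
&= \partial L_n(x, D_n(x))/\partial u \\
&= -c_{12}.
\end{aligned}
$$

The second and the fourth equalities utilize the result $\partial L_n(x, D_n(x))/\partial u = -c_{12}$. The third equality is from the definition of $L_n(x, u)$ in (3.3). Thus, $\partial^2 V_n(x)/\partial x_2 \partial x_1 - \partial^2 V_n(x)/\partial x_1 \partial x_1 = 0$. In fact, we actually have $\partial^2 V_n(x)/\partial x_2 \partial x_1 = \partial^2 V_n(x)/\partial x_1 \partial x_1 = \partial^2 V_n(x)/\partial x_2 \partial x_2 = 0$ in this region.

In region (II), we have $u_n^*(x) = 0$ and $V_n(x) = G_n(x, 0) = L_n(x, 0)$. Thus, $T_n(x) = \partial L_n(x, 0)/\partial x_2 - \partial L_n(x, 0)/\partial x_1 = \partial L_n(x, 0)/\partial u$. The second equality is from the definition of $L_n(x, u)$ in (3.3). It follows that

$$\partial^2 V_n(x)/\partial x_2 \partial x_1 - \partial^2 V_n(x)/\partial x_1 \partial x_1 = \partial^2 L_N(x, u)/\partial u\,\partial x_1 + \alpha E\,\partial^2 V_{n+1}(y)/\partial x_2 \partial x_1 - \alpha E\,\partial^2 V_{n+1}(y)/\partial x_1 \partial x_1$$

when u = 0. This is no greater than 0 due to $\partial^2 L_N(x, u)/\partial u\,\partial x_1 \le 0$ and the induction hypothesis.

Similarly, we can prove that $T_n(x) = c_{21}$ in region (III). In fact, we actually have $\partial^2 V_n(x)/\partial x_2 \partial x_1 = \partial^2 V_n(x)/\partial x_1 \partial x_1 = \partial^2 V_n(x)/\partial x_2 \partial x_2 = 0$ in this region. Therefore, $\partial^2 V_n(x)/\partial x_2 \partial x_1 - \partial^2 V_n(x)/\partial x_1 \partial x_1 \le 0$ holds for all three regions. Similarly, we can prove $\partial^2 V_n(x)/\partial x_2 \partial x_2 - \partial^2 V_n(x)/\partial x_1 \partial x_2 \ge 0$.

Finally, we show $\partial^2 V_n(x)/\partial x_1 \partial x_1 \cdot \partial^2 V_n(x)/\partial x_2 \partial x_2 \ge [\partial^2 V_n(x)/\partial x_1 \partial x_2]^2$. For regions (I) and (III), the above proof has already shown $\partial^2 V_n(x)/\partial x_2 \partial x_1 = \partial^2 V_n(x)/\partial x_1 \partial x_1 = \partial^2 V_n(x)/\partial x_2 \partial x_2 = 0$. For region (II), we have $V_n(x) = L_n(x, 0)$, and assertion (vi) leads to the result. Therefore, $\partial^2 V_n(x)/\partial x_1 \partial x_1 \cdot \partial^2 V_n(x)/\partial x_2 \partial x_2 \ge [\partial^2 V_n(x)/\partial x_1 \partial x_2]^2$. This completes the induction proof for all the assertions in Proposition 3.1.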
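The region-wise values of $T_n(x)$ derived in this proof can be checked numerically for the final period. The sketch below assumes $z_1, z_2 \sim$ Normal(0, SIGMA²) and hypothetical cost values, evaluates $V_N(x) = G_N(x, u_N^*(x))$ using the closed form $E(m + \sigma Z)^+ = m\Phi(m/\sigma) + \sigma\varphi(m/\sigma)$ for a standard normal Z, and approximates $T_N(x) = \partial V_N/\partial x_2 - \partial V_N/\partial x_1$ by central differences. All names and numbers are illustrative.

```python
import math

# Hypothetical parameters (not taken from the text).
H1, B1, H2, B2 = 1.0, 5.0, 1.0, 5.0
C12, C21 = 2.0, 2.0
SIGMA = 3.0          # z_1, z_2 ~ Normal(0, SIGMA^2), an assumption


def phi(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def Phi(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def expected_positive_part(m):
    """E[(m + SIGMA*Z)^+] for a standard normal Z."""
    t = m / SIGMA
    return m * Phi(t) + SIGMA * phi(t)


def L_N(x1, x2, u):
    m1, m2 = x1 - u, x2 + u
    p1, p2 = expected_positive_part(m1), expected_positive_part(m2)
    # E[y^-] = E[y^+] - E[y]
    return H1 * p1 + B1 * (p1 - m1) + H2 * p2 + B2 * (p2 - m2)


def dL_du(x1, x2, u):
    return ((H1 + B1) * Phi((u - x1) / SIGMA) - H1 + H2
            - (H2 + B2) * Phi((-u - x2) / SIGMA))


def root_in_u(x1, x2, target, lo=-1e3, hi=1e3):
    """Bisection on u; dL_du is increasing in u."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if dL_du(x1, x2, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def V_N(x1, x2):
    d, u_upper = root_in_u(x1, x2, -C12), root_in_u(x1, x2, C21)
    u = d if d > 0 else (u_upper if u_upper < 0 else 0.0)
    return C12 * max(u, 0.0) + C21 * max(-u, 0.0) + L_N(x1, x2, u)


def T_N(x1, x2, eps=1e-3):
    """Central-difference estimate of dV_N/dx2 - dV_N/dx1."""
    dv_dx2 = (V_N(x1, x2 + eps) - V_N(x1, x2 - eps)) / (2.0 * eps)
    dv_dx1 = (V_N(x1 + eps, x2) - V_N(x1 - eps, x2)) / (2.0 * eps)
    return dv_dx2 - dv_dx1
```

Under these assumptions, `T_N(10.0, -10.0)` is close to `-C12` (region (I)), `T_N(-10.0, 10.0)` is close to `C21` (region (III)), and `T_N(0.0, 0.0)` agrees with `dL_du(0.0, 0.0, 0.0)` (region (II)).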
It is worth noting that a more general assumption for the probability density functions $f_1(\cdot)$ and $f_2(\cdot)$ is that they are always positive within an interval but take zero values outside this interval, e.g., the uniform distribution and gamma distribution. Under such an assumption, Proposition 3.1(ii) should be changed to: $\partial L_n(x, u)/\partial u$ is continuous in $(x_1, x_2, u)$ and monotonic increasing in u. In Proposition 3.1(v), $\partial^2 L_n(x, u)/\partial u\,\partial u > 0$ should be replaced by $\partial^2 L_n(x, u)/\partial u\,\partial u \ge 0$. The proof is similar, but an extra condition is required, i.e., $c_{21}$ and $-c_{12}$ do not equal $-h_1 + h_2$ or $b_1 - b_2$. Take Proposition 3.1(iv) as an example. It is easy to show that $\partial L_N(x, u)/\partial u$ has an inverse function except at no more than four points (i.e., $-h_1 - b_2$, $-h_1 + h_2$, $b_1 - b_2$, $h_2 + b_1$). Therefore, if $c_{21}$ and $-c_{12}$ do not take these four values, then $D_N(x)$ and $U_N(x)$ are differentiable except on a finite number of boundary curves. For n < N, note that

$$\partial L_n(x, u)/\partial u = \partial L_N(x, u)/\partial u + \alpha E[-\partial V_{n+1}(x_1 + z_1 - u, x_2 + z_2 + u)/\partial x_1 + \partial V_{n+1}(x_1 + z_1 - u, x_2 + z_2 + u)/\partial x_2].$$

Both terms on the right-hand side of the above equation are monotonic increasing in u. Therefore, $\partial L_n(x, u)/\partial u$ also has an inverse function except at no more than four points (i.e., $-h_1 - b_2$, $-h_1 + h_2$, $b_1 - b_2$, $h_2 + b_1$). The differentiability of $D_n(x)$ and $U_n(x)$ can be similarly established if $c_{21}$ and $-c_{12}$ do not take these four values.

Definition 3.1 A transition function $T_n(x)$ is defined as follows:

$$T_n(x) := -\partial V_n(x)/\partial x_1 + \partial V_n(x)/\partial x_2.$$

Physically, this function can be interpreted as the difference in cost increasing rates between the directions $x_2$ and $x_1$.

Proposition 3.2 The transition function $T_n(x)$ has the following properties:
(i) $T_n(x) = -c_{12}$ in region (I), $\{(x_1, x_2) \mid 0 < D_n(x)\}$;
(ii) $T_n(x) = c_{21}$ in region (III), $\{(x_1, x_2) \mid 0 < -U_n(x)\}$;
(iii) $T_n(x) = \partial L_n(x, 0)/\partial u$ in region (II), $\{(x_1, x_2) \mid D_n(x) \le 0 \le U_n(x)\}$.

Proof The results have been proved in the proof of Proposition 3.1(vii).

Physically, Proposition 3.2(i) indicates that when the system state is located in region (I), we could reduce cost by $c_{12}$ if we transport one empty container from depot 1 to depot 2. On the other hand, Proposition 3.2(ii) indicates that when the system state is located in region (III), we could reduce cost by $c_{21}$ if we reposition one empty container from depot 2 to depot 1.

Corollary 3.1 $\partial L_n(x, u)/\partial u$ is decreasing in $x_1$ and increasing in $x_2$.

Proof Proposition 3.1(v) yields $\partial^2 L_n(x, u)/\partial u\,\partial x_1 \le 0$ and $\partial^2 L_n(x, u)/\partial u\,\partial x_2 \ge 0$. Proposition 3.1(ii) indicates that $\partial L_n(x, u)/\partial u$ is continuous in (x, u). Therefore, $\partial L_n(x, u)/\partial u$ is decreasing in $x_1$ and increasing in $x_2$. This completes the proof.

Proposition 3.3 (i) $\partial D_n(x)/\partial x_1 \ge 0$ and $\partial D_n(x)/\partial x_2 \le 0$; (ii) $\partial U_n(x)/\partial x_1 \ge 0$ and $\partial U_n(x)/\partial x_2 \le 0$.

Proof Note that $\partial L_n(x, D_n(x))/\partial u = -c_{12}$ from its definition and Corollary 3.1. Taking its derivative with respect to $x_1$ and $x_2$, respectively, yields

$$\partial^2 L_n(x, D_n(x))/\partial u\,\partial x_1 + \partial^2 L_n(x, D_n(x))/\partial u\,\partial u \cdot \partial D_n(x)/\partial x_1 = 0,$$

$$\partial^2 L_n(x, D_n(x))/\partial u\,\partial x_2 + \partial^2 L_n(x, D_n(x))/\partial u\,\partial u \cdot \partial D_n(x)/\partial x_2 = 0.$$

From Proposition 3.1(v), assertion (i) is true. Similar arguments apply to $U_n(x)$. This completes the proof.

Note that $D_n(x)$ and $U_n(x)$ are continuous in x; together with Proposition 3.3, it follows that $D_n(x)$ and $U_n(x)$ are increasing in $x_1$ and decreasing in $x_2$. In the $(x_1, x_2)$ plane, the state space is divided into three control regions by two switching curves, $D_n(x) = 0$ and $U_n(x) = 0$, where $D_n(x) \le U_n(x)$. Next, we examine the detailed structural properties of these switching curves.


Proposition 3.4 The switching curves $D_n(x) = 0$ and $U_n(x) = 0$ have the following properties:
(i) The curves $D_n(x) = 0$ and $U_n(x) = 0$ in the $(x_1, x_2)$ plane are monotonic increasing in $x_1$, i.e., $dx_2/dx_1 \ge 0$ for both curves;
(ii) The curve $U_n(x) = 0$ is located above the curve $D_n(x) = 0$ in the $(x_1, x_2)$ plane;
(iii) The set $\{x \mid D_n(x) > 0\}$ is located below the curve $D_n(x) = 0$; the set $\{x \mid U_n(x) < 0\}$ is located above the curve $U_n(x) = 0$; while the set $\{x \mid D_n(x) < 0 < U_n(x)\}$ is located between the two switching curves.

Proof For the switching curve $D_n(x) = 0$, taking its derivative with respect to $x_1$, we have $\partial D_n(x)/\partial x_1 + \partial D_n(x)/\partial x_2 \cdot dx_2/dx_1 = 0$. It follows that

$$dx_2/dx_1 = -\frac{\partial^2 L_n(x, D_n(x))/\partial u\,\partial x_1}{\partial^2 L_n(x, D_n(x))/\partial u\,\partial x_2} \ge 0.$$

The last inequality is due to Proposition 3.1(v). Therefore, as $x_1$ increases, $x_2$ is increasing along $D_n(x) = 0$. Similar arguments apply to the switching curve $U_n(x) = 0$.

For assertion (ii), note that the switching curve $D_n(x) = 0$ is defined by $\partial L_n(x, 0)/\partial u = -c_{12}$ and the switching curve $U_n(x) = 0$ is defined by $\partial L_n(x, 0)/\partial u = c_{21}$. From Corollary 3.1, $\partial L_n(x, 0)/\partial u$ is increasing in $x_2$. It follows that the switching curve $U_n(x) = 0$ is located above $D_n(x) = 0$ in the $(x_1, x_2)$ plane.

For assertion (iii), consider any pair $(x_1, x_2)$ such that $D_n(x) > 0$ and $\partial L_n(x, D_n(x))/\partial u = -c_{12}$. As $\partial L_n(x, u)/\partial u$ is increasing in u and $x_2$, and decreasing in $x_1$ from Proposition 3.1 and Corollary 3.1, the pair $(x_1, x_2)$ must be located below the curve $D_n(x) = 0$. Similar arguments apply to the remaining assertions. This completes the proof.
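The geometry in Proposition 3.4 can be visualised for the final period by tracing the two switching curves point by point. The sketch below assumes $z_1, z_2 \sim$ Normal(0, SIGMA²) and hypothetical cost values; for a given $x_1$ it solves $\partial L_N(x, 0)/\partial u = \text{target}$ for $x_2$ by bisection, using the fact that this derivative is increasing in $x_2$. The function `curve_x2` and the search interval are illustrative assumptions.

```python
import math

# Hypothetical parameters (not taken from the text).
H1, B1, H2, B2 = 1.0, 5.0, 1.0, 5.0
C12, C21 = 2.0, 2.0
SIGMA = 3.0          # z_1, z_2 ~ Normal(0, SIGMA^2), an assumption


def Phi(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def dL_du_at_zero(x1, x2):
    """Final-period derivative dL_N(x, u)/du evaluated at u = 0."""
    return ((H1 + B1) * Phi(-x1 / SIGMA) - H1 + H2
            - (H2 + B2) * Phi(-x2 / SIGMA))


def curve_x2(x1, target, lo=-100.0, hi=100.0):
    """x2 such that dL_N((x1, x2), 0)/du = target, or None if it is not bracketed.

    target = -C12 traces the curve D_N(x) = 0; target = C21 traces U_N(x) = 0.
    """
    if not (dL_du_at_zero(x1, lo) < target < dL_du_at_zero(x1, hi)):
        return None
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if dL_du_at_zero(x1, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Points on D_N(x) = 0 for a few values of x1.
D_CURVE = [(x1, curve_x2(x1, -C12)) for x1 in (0.0, 2.0, 4.0, 8.0, 12.0)]
```

With these numbers the traced $x_2$ values increase with $x_1$, and at $x_1 = 0$ the point on $U_N(x) = 0$ lies above the point on $D_N(x) = 0$, in line with assertions (i) and (ii).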

3.3.3 Asymptotic Structural Properties of the Optimal Control Policy

In the remainder of this section, we aim at establishing the asymptotic structural properties of the optimal control policy by examining the asymptotic behaviors of the switching curves in the $(x_1, x_2)$ plane and the switching surfaces. We also derive the relationships between such asymptotic behaviors and the system cost parameters.

Lemma 3.2 $\partial L_n(x, 0)/\partial u$ and $T_n(x)$ are bounded.

Proof From Proposition 3.2, we have $\min(-c_{12}, \inf\{\partial L_n(x, 0)/\partial u\}) \le T_n(x) \le \max(c_{21}, \sup\{\partial L_n(x, 0)/\partial u\})$. From Assumption 3.1,

$$\partial L_N(x, u)/\partial u = (h_1 + b_1)F_1(u - x_1) - h_1 + h_2 - (h_2 + b_2)F_2(-u - x_2),$$

$$\partial L_n(x, u)/\partial u = \partial L_N(x, u)/\partial u + \alpha E T_{n+1}(x_1 + z_1 - u, x_2 + z_2 + u).$$

It follows that $-h_1 - b_2 < \partial L_N(x, 0)/\partial u < b_1 + h_2$; $-h_1 - b_2 < T_N(x_1, x_2) < b_1 + h_2$; and

$$-h_1 - b_2 + \alpha(-h_1 - b_2) + \cdots + \alpha^{N-n}(-h_1 - b_2) < \partial L_n(x, 0)/\partial u < (b_1 + h_2) + \alpha(b_1 + h_2) + \cdots + \alpha^{N-n}(b_1 + h_2).$$

This completes the proof.

Lemma 3.3
(i) $\partial L_n(+\infty, -\infty, 0)/\partial u = -h_1 - b_2 - \alpha c_{12}$ and $\partial L_n(-\infty, +\infty, 0)/\partial u = b_1 + h_2 + \alpha c_{21}$ for $n < N$;
(ii) $T_n(+\infty, -\infty) = -c_{12}$ and $T_n(-\infty, +\infty) = c_{21}$.

Proof Use the backward induction approach. For n = N, $\partial L_N(+\infty, -\infty, 0)/\partial u = -h_1 - b_2$; $\partial L_N(-\infty, +\infty, 0)/\partial u = b_1 + h_2$; $T_N(-\infty, +\infty) = c_{21}$; and $T_N(+\infty, -\infty) = -c_{12}$. From $\partial L_n(x_1, x_2, 0)/\partial u = \partial L_N(x_1, x_2, 0)/\partial u + \alpha E T_{n+1}(x_1 + z_1, x_2 + z_2)$, we have $\partial L_{N-1}(+\infty, -\infty, 0)/\partial u = -h_1 - b_2 - \alpha c_{12}$ and $\partial L_{N-1}(-\infty, +\infty, 0)/\partial u = b_1 + h_2 + \alpha c_{21}$. Now suppose the assertions hold for any k > n; we want to show they are also true for n.

First, we want to show that $E T_{n+1}(x_1 + z_1, x_2 + z_2)$ converges to $T_{n+1}(\infty, -\infty)$ as $x_1$ tends to positive infinity and $x_2$ tends to negative infinity. From Lemma 3.2, there exists a finite positive number K such that $|\partial L_{n+1}(x_1, x_2, 0)/\partial u| < K$ and $|T_{n+1}(x_1, x_2)| < K$. For any given positive number $\varepsilon$, let M be a sufficiently large positive number such that $K \cdot F_1(-M) < \varepsilon$, $K(1 - F_1(M)) < \varepsilon$, $K \cdot F_2(-M) < \varepsilon$, and $K(1 - F_2(M)) < \varepsilon$; and let N be a sufficiently large positive number such that $|T_{n+1}(x_1 + z_1, x_2 + z_2) - T_{n+1}(\infty, -\infty)| < \varepsilon$ for any $-M < z_1 < M$, $-M < z_2 < M$, $x_1 > N$, and $x_2 < -N$ from the hypotheses. It follows that

$$
\begin{aligned}
|E T_{n+1}(x_1 + z_1, x_2 + z_2) - T_{n+1}(\infty, -\infty)| <{}& K \cdot F_1(-M) + K(1 - F_1(M)) + K \cdot F_2(-M) + K(1 - F_2(M)) \\
&+ \left| \int_{-M}^{M}\!\!\int_{-M}^{M} [T_{n+1}(x_1 + z_1, x_2 + z_2) - T_{n+1}(\infty, -\infty)] f_1(z_1) f_2(z_2)\,dz_1\,dz_2 \right| \\
&+ K \cdot F_1(-M) + K(1 - F_1(M)) + K \cdot F_2(-M) + K(1 - F_2(M)) < 9\varepsilon.
\end{aligned}
$$

Thus,

$$\lim_{x_1 \to +\infty,\; x_2 \to -\infty} E T_{n+1}(x_1 + z_1, x_2 + z_2) = T_{n+1}(\infty, -\infty) = -c_{12}.$$

The last equality is from the induction hypothesis. Similar arguments apply to the convergence of $E T_{n+1}(x_1 + z_1, x_2 + z_2)$ to $T_{n+1}(-\infty, \infty)$.

Second, the above results directly lead to assertion (i), i.e., $\partial L_n(+\infty, -\infty, 0)/\partial u = -h_1 - b_2 - \alpha c_{12}$ and $\partial L_n(-\infty, +\infty, 0)/\partial u = b_1 + h_2 + \alpha c_{21}$.


Third, note that ∂ L n (+∞, −∞, 0)/∂u < −(1 + α)c12 and ∂ L n (−∞, +∞, 0)/∂u > (1 + α)c21 . These two inequalities imply that there exists a solution of (x 1 , x 2 ) to the equation ∂ L n (x1 , x2 , 0)/∂u = −c12 due to the continuity of ∂ L n (x1 , x2 , 0)/∂u. Select a pair (x 1d , x 2d ) that satisfies the inequality ∂ L n (x1d , x2d , 0)/∂u < −c12 . Note that ∂ L n (x1 , x2 , u)/∂u is decreasing in x 1 and increasing in x 2 from Corollary 3.1. We have ∂ L n (x1 , x2 , 0)/∂u < −c12 for any x 1 > x 1d and x 2 < x 2d . From the monotonic increasing property of ∂ L n (x1 , x2 , u)/∂u with respect to u, the solution to ∂ L n (x1 , x2 , u)/∂u = −c12 for x 1 > x 1d and x 2 < x 2d must be greater than 0. That is, Dn (x 1 , x 2 ) > 0 for x 1 > x 1d and x 2 < x 2d . Therefore, the region {(x 1 , x 2 ) | 0 < Dn (x 1 , x 2 )} includes {(x 1 , x 2 ) | x 1 > x 1d and x 2 < x 2d }. This implies that the region {(x 1 , x 2 ) | 0 < Dn (x 1 , x 2 ), x 1 > 0} includes {(x 1 , x 2 ) | x 1 > x 1d and x 2 < x 2d , x 1 > 0}. It follows that T n (x 1 , x 2 ) = –c12 is in that region by Proposition 3.2, which implies that Tn (+∞, −∞) = −c12 . Using a similar argument, we can  show that Tn (−∞, +∞) = c21 . This completes the induction proof.

Lemma 3.4 If $-c_{12} < \sum_{k=0}^{N-n} \alpha^k (h_2 - h_1) < c_{21}$, then
(i) for a sufficiently large positive number $x_1$, the solution of $x_2$ to $\partial L_n(x_1, x_2, 0)/\partial u = -c_{12}$ exists; and for a sufficiently large positive number $x_2$, the solution of $x_1$ to $\partial L_n(x_1, x_2, 0)/\partial u = c_{21}$ exists;
(ii) $-c_{12} < \partial L_n(+\infty, +\infty, 0)/\partial u < c_{21}$;
(iii) $\lim_{x_1 \to +\infty,\; x_2 \to +\infty} E T_n(x_1 + z_1, x_2 + z_2) = T_n(+\infty, +\infty) = \partial L_n(+\infty, +\infty, 0)/\partial u$.

Proof Use the backward induction approach. The assertions are obviously true for n = N. Suppose they are true for all k > n. We want to show that they also hold for n. The induction hypothesis (i) ensures that $(\infty, \infty)$ belongs to the region $\{(x_1, x_2) \mid D_{n+1}(x_1, x_2) \le 0 \le U_{n+1}(x_1, x_2)\}$. This implies that $T_{n+1}(x_1, x_2) = \partial L_{n+1}(x_1, x_2, 0)/\partial u$ as $(x_1, x_2)$ tends to $(\infty, \infty)$ from Proposition 3.2. From the induction hypothesis (iii) for all k > n, we have

$$
\begin{aligned}
\partial L_n(+\infty, +\infty, 0)/\partial u &= \partial L_N(+\infty, +\infty, 0)/\partial u + \alpha E T_{n+1}(\infty, \infty) \\
&= \partial L_N(+\infty, +\infty, 0)/\partial u + \alpha[\partial L_N(+\infty, +\infty, 0)/\partial u + \alpha E T_{n+2}(\infty, \infty)] \\
&= (h_2 - h_1) + \alpha(h_2 - h_1) + \cdots + \alpha^{N-n}(h_2 - h_1) = \sum_{k=0}^{N-n} \alpha^k (h_2 - h_1).
\end{aligned}
$$

It follows that $-c_{12} < \partial L_n(+\infty, +\infty, 0)/\partial u$ if $-c_{12} < \sum_{k=0}^{N-n} \alpha^k (h_2 - h_1)$, and $\partial L_n(+\infty, +\infty, 0)/\partial u < c_{21}$ if $\sum_{k=0}^{N-n} \alpha^k (h_2 - h_1) < c_{21}$. Thus assertion (ii) is true. Moreover, from Lemma 3.3, we have $\partial L_n(-\infty, +\infty, 0)/\partial u > (1 + \alpha)c_{21}$ and $\partial L_n(+\infty, -\infty, 0)/\partial u < -(1 + \alpha)c_{12}$. Therefore, the continuity of $\partial L_n(x_1, x_2, 0)/\partial u$ yields assertion (i).


For assertion (iii), from Lemma 3.2, there exists a finite positive number K such that $|\partial L_n(x_1, x_2, 0)/\partial u| < K$ and $|T_n(x_1, x_2)| < K$. For any given positive number $\varepsilon$, let M be a sufficiently large positive number such that $K \cdot F_1(-M) < \varepsilon$, $K(1 - F_1(M)) < \varepsilon$, $K \cdot F_2(-M) < \varepsilon$, and $K(1 - F_2(M)) < \varepsilon$; and let N be a sufficiently large positive number such that $|T_n(x_1 + z_1, x_2 + z_2) - \partial L_n(\infty, \infty, 0)/\partial u| < \varepsilon$ for any $-M < z_1 < M$, $-M < z_2 < M$, $x_1 > N$, and $x_2 > N$, due to the fact that $T_n(x_1, x_2) = \partial L_n(x_1, x_2, 0)/\partial u$ as $(x_1, x_2)$ tends to $(\infty, \infty)$ according to assertion (i) and Proposition 3.2. It follows that

$$
\begin{aligned}
|E T_n(x_1 + z_1, x_2 + z_2) - \partial L_n(\infty, \infty, 0)/\partial u| <{}& K \cdot F_1(-M) + K(1 - F_1(M)) + K \cdot F_2(-M) + K(1 - F_2(M)) \\
&+ \left| \int_{-M}^{M}\!\!\int_{-M}^{M} [T_n(x_1 + z_1, x_2 + z_2) - \partial L_n(\infty, \infty, 0)/\partial u] f_1(z_1) f_2(z_2)\,dz_1\,dz_2 \right| \\
&+ K \cdot F_1(-M) + K(1 - F_1(M)) + K \cdot F_2(-M) + K(1 - F_2(M)) < 9\varepsilon.
\end{aligned}
$$

Therefore, we have

$$\lim_{x_1 \to +\infty,\; x_2 \to +\infty} E T_n(x_1 + z_1, x_2 + z_2) = \partial L_n(+\infty, +\infty, 0)/\partial u.$$

Denote this limit as $T_n(\infty, \infty)$. This completes the induction proof.

Lemma 3.5 If $-c_{12} < \sum_{k=0}^{N-n} \alpha^k (b_1 - b_2) < c_{21}$, then
(i) for a sufficiently large negative number $x_2$, the solution of $x_1$ to $\partial L_n(x_1, x_2, 0)/\partial u = -c_{12}$ exists; and for a sufficiently large negative number $x_1$, the solution of $x_2$ to $\partial L_n(x_1, x_2, 0)/\partial u = c_{21}$ exists;
(ii) $-c_{12} < \partial L_n(-\infty, -\infty, 0)/\partial u < c_{21}$;
(iii) $\lim_{x_1 \to -\infty,\; x_2 \to -\infty} E T_n(x_1 + z_1, x_2 + z_2) = T_n(-\infty, -\infty) = \partial L_n(-\infty, -\infty, 0)/\partial u$.

Proof This can be proved in the same way as Lemma 3.4.

Proposition 3.5 The switching curves $D_n(x) = 0$ and $U_n(x) = 0$ have the following asymptotic behaviors:
(i) If $-c_{12} < \sum_{k=0}^{N-n} \alpha^k (h_2 - h_1) < c_{21}$, then the curve $D_n(x) = 0$ converges to a finite $x_2$ as $x_1$ tends to $+\infty$; the curve $U_n(x) = 0$ converges to a finite $x_1$ as $x_2$ tends to $+\infty$;
(ii) If $-c_{12} < \sum_{k=0}^{N-n} \alpha^k (b_1 - b_2) < c_{21}$, then the curve $D_n(x) = 0$ converges to a finite $x_1$ as $x_2$ tends to $-\infty$; the curve $U_n(x) = 0$ converges to a finite $x_2$ as $x_1$ tends to $-\infty$.

Proof Lemma 3.4 ensures that for a sufficiently large x 1 , the solution of x 2 to ∂ L n (x1 , x2 , 0)/∂u = −c12 exists. In other words, a solution of x 2 to Dn (x) =


0 exists for any sufficiently large $x_1$. Suppose $x_2$ increases to positive infinity as $x_1$ tends to positive infinity along the switching curve $D_n(x) = 0$. This leads to $\partial L_n(+\infty, +\infty, 0)/\partial u = -c_{12}$, which contradicts Lemma 3.4(ii), i.e., $-c_{12} < \partial L_n(+\infty, +\infty, 0)/\partial u < c_{21}$. Thus, the assertion is true. Suppose $x_1$ tends to negative infinity as $x_2$ tends to negative infinity along the switching curve $D_n(x) = 0$. Lemma 3.5 indicates that for a sufficiently large negative number $x_2$, the solution of $x_1$ to $\partial L_n(x_1, x_2, 0)/\partial u = -c_{12}$ exists. It follows that $\partial L_n(-\infty, -\infty, 0)/\partial u = -c_{12}$, which contradicts Lemma 3.5(ii), i.e., $-c_{12} < \partial L_n(-\infty, -\infty, 0)/\partial u < c_{21}$. Similarly, we can prove the assertions for the switching curve $U_n(x) = 0$. This completes the proof.

Proposition 3.6 Assume that the switching curve $D_n(x) = 0$ converges to $A_{n,2}$ as $x_1$ tends to positive infinity for any n and converges to $A_{n,1}$ as $x_2$ tends to negative infinity for any n. Further assume that the switching curve $U_n(x) = 0$ converges to $B_{n,1}$ as $x_2$ tends to positive infinity for any n and converges to $B_{n,2}$ as $x_1$ tends to negative infinity for any n. Then these asymptotic points can be determined recursively as follows:

$$c_{12} - h_1 + h_2 - (h_2 + b_2)F_2(-A_{n,2}) - \alpha c_{12} F_2(A_{n+1,2} - A_{n,2}) + \alpha \int_{A_{n+1,2} - A_{n,2}}^{\infty} \partial L_{n+1}(\infty, A_{n,2} + z_2, 0)/\partial u \; f_2(z_2)\,dz_2 = 0,$$

$$c_{12} - b_2 - h_1 + (h_1 + b_1)F_1(-A_{n,1}) - \alpha c_{12}\left[1 - F_1(A_{n+1,1} - A_{n,1})\right] + \alpha \int_{-\infty}^{A_{n+1,1} - A_{n,1}} \partial L_{n+1}(A_{n,1} + z_1, -\infty, 0)/\partial u \; f_1(z_1)\,dz_1 = 0,$$

$$-c_{21} + (h_1 + b_1)F_1(-B_{n,1}) - h_1 + h_2 + \alpha c_{21} F_1(B_{n-1,1} - B_{n,1}) + \alpha \int_{-\infty}^{B_{n-1,1} - B_{n,1}} \partial L_{n+1}(B_{n,1} + z_1, \infty, 0)/\partial u \; f_1(z_1)\,dz_1 = 0,$$

$$-c_{21} + b_1 + h_2 - (h_2 + b_2)F_2(-B_{n,2}) + \alpha c_{21}\left[1 - F_2(B_{n-1,2} - B_{n,2})\right] + \alpha \int_{B_{n-1,2} - B_{n,2}}^{\infty} \partial L_{n+1}(-\infty, B_{n,2} + z_2, 0)/\partial u \; f_2(z_2)\,dz_2 = 0,$$

where

$$
\begin{aligned}
&\partial L_N(\infty, A_{N,2}, 0)/\partial u = -h_1 + h_2 - (h_2 + b_2)F_2(-A_{N,2}), \\
&\partial L_n(\infty, A_{n,2}, 0)/\partial u = \partial L_N(\infty, A_{n,2}, 0)/\partial u + \alpha E T_{n+1}(\infty, A_{n,2} + z_2), \\
&T_{n+1}(\infty, A_{n,2} + z_2) = -c_{12} \ \text{if } z_2 < A_{n+1,2} - A_{n,2}; \\
&T_{n+1}(\infty, A_{n,2} + z_2) = \partial L_{n+1}(\infty, A_{n,2} + z_2, 0)/\partial u \ \text{if } z_2 > A_{n+1,2} - A_{n,2}, \\[4pt]
&\partial L_N(A_{N,1}, -\infty, 0)/\partial u = (h_1 + b_1)F_1(-A_{N,1}) - h_1 - b_2, \\
&\partial L_n(A_{n,1}, -\infty, 0)/\partial u = \partial L_N(A_{n,1}, -\infty, 0)/\partial u + \alpha E T_{n+1}(A_{n,1} + z_1, -\infty), \\
&T_{n+1}(A_{n,1} + z_1, -\infty) = -c_{12} \ \text{if } z_1 > A_{n+1,1} - A_{n,1}; \\
&T_{n+1}(A_{n,1} + z_1, -\infty) = \partial L_{n+1}(A_{n,1} + z_1, -\infty, 0)/\partial u \ \text{if } z_1 < A_{n+1,1} - A_{n,1}, \\[4pt]
&\partial L_N(B_{1,1}, \infty, 0)/\partial u = (h_1 + b_1)F_1(-B_{1,1}) - h_1 + h_2, \\
&\partial L_n(B_{n,1}, \infty, 0)/\partial u = \partial L_N(B_{n,1}, \infty, 0)/\partial u + \alpha E T_{n+1}(B_{n,1} + z_1, \infty), \\
&T_{n+1}(B_{n,1} + z_1, \infty) = c_{21} \ \text{if } z_1 < B_{n-1,1} - B_{n,1}, \\
&T_{n+1}(B_{n,1} + z_1, \infty) = \partial L_{n+1}(B_{n,1} + z_1, \infty, 0)/\partial u \ \text{if } z_1 > B_{n-1,1} - B_{n,1}, \\[4pt]
&\partial L_N(-\infty, B_{1,2}, 0)/\partial u = b_1 + h_2 - (h_2 + b_2)F_2(-B_{1,2}), \\
&\partial L_n(-\infty, B_{n,2}, 0)/\partial u = \partial L_N(-\infty, B_{n,2}, 0)/\partial u + \alpha E T_{n+1}(-\infty, B_{n,2} + z_2), \\
&T_{n+1}(-\infty, B_{n,2} + z_2) = c_{21} \ \text{if } z_2 > B_{n-1,2} - B_{n,2}, \\
&T_{n+1}(-\infty, B_{n,2} + z_2) = \partial L_{n+1}(-\infty, B_{n,2} + z_2, 0)/\partial u \ \text{if } z_2 < B_{n-1,2} - B_{n,2}.
\end{aligned}
$$

Proof Note that $A_{n,2}$ is determined by $\partial L_n(\infty, A_{n,2}, 0)/\partial u = -c_{12}$ and the properties of $T_n(\infty, x_2)$ in Proposition 3.2. It is easy to prove the recursive equations for $A_{n,2}$. Similar arguments apply to the other assertions. This completes the proof.
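For the final period, the asymptotic points follow directly from the closed form of $\partial L_N(x, 0)/\partial u$ given above, by setting it equal to $-c_{12}$ (for the curve $D_N(x) = 0$) or $c_{21}$ (for the curve $U_N(x) = 0$) and inverting the relevant distribution function. The sketch below is a hedged illustration for n = N only, assuming normally distributed $z_1$ and $z_2$ and hypothetical cost values; the function name and the dictionary keys are illustrative, and the recursion for n < N is not implemented.

```python
from statistics import NormalDist


def final_period_asymptotes(h1, b1, h2, b2, c12, c21, z1, z2):
    """Asymptotic points of D_N(x) = 0 and U_N(x) = 0 for the final period.

    z1 and z2 are statistics.NormalDist objects (an assumption of this sketch).
    Each point solves dL_N/du = -c12 or c21 with one coordinate sent to +/- infinity.
    inv_cdf raises StatisticsError if a ratio falls outside (0, 1), i.e. if the
    corresponding asymptote does not exist for the chosen parameters.
    """
    return {
        # D_N = 0 as x1 -> +inf:  -h1 + h2 - (h2 + b2) F2(-A) = -c12
        "A_N2": -z2.inv_cdf((c12 - h1 + h2) / (h2 + b2)),
        # D_N = 0 as x2 -> -inf:  (h1 + b1) F1(-A) - h1 - b2 = -c12
        "A_N1": -z1.inv_cdf((h1 + b2 - c12) / (h1 + b1)),
        # U_N = 0 as x2 -> +inf:  (h1 + b1) F1(-B) - h1 + h2 = c21
        "B_N1": -z1.inv_cdf((c21 + h1 - h2) / (h1 + b1)),
        # U_N = 0 as x1 -> -inf:  b1 + h2 - (h2 + b2) F2(-B) = c21
        "B_N2": -z2.inv_cdf((b1 + h2 - c21) / (h2 + b2)),
    }


# Hypothetical example values (not taken from the text).
EXAMPLE = final_period_asymptotes(
    h1=1.0, b1=5.0, h2=1.0, b2=5.0, c12=2.0, c21=2.0,
    z1=NormalDist(0.0, 3.0), z2=NormalDist(0.0, 3.0),
)
```

With these symmetric example values, the four points are equal in magnitude, roughly 1.29, with `A_N2` and `B_N1` positive and `A_N1` and `B_N2` negative.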

Proposition 3.7 If $-c_{12} < \sum_{k=0}^{N-n} \alpha^k (h_2 - h_1) < c_{21}$, then the switching surfaces $D_n(x)$ and $U_n(x)$ have the following asymptotic behaviors:

(i) for any fixed $x_2$, the value of $D_n(x)$ converges to a finite number as $x_1$ tends to positive infinity and $|\partial D_n(x_1, x_2)/\partial x_2| < 1$;
(ii) for any fixed $x_1$, the value of $U_n(x)$ converges to a finite number as $x_2$ tends to positive infinity and $|\partial U_n(x_1, x_2)/\partial x_1| < 1$.

Proof From $\partial L_n(x_1, x_2, D_n(x))/\partial u = -c_{12}$, we have

\[
\frac{\partial D_n(x_1, x_2)}{\partial x_2} = -\frac{\partial^2 L_n(x_1, x_2, D_n(x))/\partial u\,\partial x_2}{\partial^2 L_n(x_1, x_2, D_n(x))/\partial u^2}.
\]

Note that

\[
\frac{\partial^2 L_n(x_1, x_2, u)}{\partial u^2} = \frac{\partial^2 L_N(x_1, x_2, u)}{\partial u^2} + \alpha E\left[\frac{\partial^2 V_{n+1}}{\partial x_1^2} - 2\frac{\partial^2 V_{n+1}}{\partial x_1 \partial x_2} + \frac{\partial^2 V_{n+1}}{\partial x_2^2}\right],
\]
\[
\frac{\partial^2 L_n(x_1, x_2, u)}{\partial u\,\partial x_2} = \frac{\partial^2 L_N(x_1, x_2, u)}{\partial u\,\partial x_2} + \alpha E\left[-\frac{\partial^2 V_{n+1}}{\partial x_1 \partial x_2} + \frac{\partial^2 V_{n+1}}{\partial x_2^2}\right],
\]

where all derivatives of $V_{n+1}$ are evaluated at $(x_1 + z_1 - u,\; x_2 + z_2 + u)$. Proposition 3.1(v) yields $\partial^2 L_n(x_1, x_2, u)/\partial u\,\partial x_2 \ge 0$. From $\partial^2 L_N(x_1, x_2, u)/\partial u^2 > \partial^2 L_N(x_1, x_2, u)/\partial u\,\partial x_2$ and Proposition 3.1(vii), we have $0 < \partial^2 L_n(x_1, x_2, D_n(x))/\partial u\,\partial x_2 < \partial^2 L_n(x_1, x_2, D_n(x))/\partial u^2$. Thus $0 > \partial D_n(x_1, x_2)/\partial x_2 > -1$.

For any $x_2 \ge A_{n,2}$ (defined in Proposition 3.6), we have $D_n(x) \le 0$ due to the monotonic increasing property of the curve $D_n(x) = 0$ in $x_1$. Therefore, it is bounded above. For any $x_2 < A_{n,2}$ with $D_n(x) > 0$, we have $|D_n(x_1, A_{n,2}) - D_n(x_1, x_2)| < A_{n,2} - x_2$ due to $|\partial D_n(x_1, x_2)/\partial x_2| < 1$. Let $x_1$ tend to positive infinity. It follows that

\[
\left|D_n(+\infty, A_{n,2}) - \lim_{x_1 \to +\infty} D_n(x_1, x_2)\right| = \lim_{x_1 \to +\infty} D_n(x_1, x_2) < A_{n,2} - x_2.
\]

Thus, $D_n(x_1, x_2)$ converges to a finite number as $x_1$ tends to positive infinity since it is increasing in $x_1$ and bounded above. Similarly, we can prove the asymptotic behavior of $U_n(x)$. This completes the proof.

Physically, the above proposition indicates that for any fixed $x_2$, the number of empty containers that should be transferred from depot 1 to depot 2 increases as $x_1$ increases, but at a lower rate than $x_1$ itself. Similarly, for any fixed $x_1$, the number of empty containers that should be transferred from depot 2 to depot 1 increases as $x_2$ increases, but at a lower rate than $x_2$.

Proposition 3.8 (i)

If $h_2 - h_1 < -c_{12}$, then for a sufficiently large positive number $x_2$, the solution $x_1$ to $\partial L_n(x_1, x_2, 0)/\partial u = -c_{12}$ exists; the curve $D_n(x) = 0$ converges to a finite $x_1$ as $x_2$ tends to $+\infty$; and the curve $U_n(x) = 0$ converges to a finite $x_1$ as $x_2$ tends to $+\infty$;

(ii) If $h_2 - h_1 > c_{21}$, then for a sufficiently large positive number $x_1$, the solution $x_2$ to $\partial L_n(x_1, x_2, 0)/\partial u = c_{21}$ exists; the curve $U_n(x) = 0$ converges to a finite $x_2$ as $x_1$ tends to $+\infty$; and the curve $D_n(x) = 0$ converges to a finite $x_2$ as $x_1$ tends to $+\infty$.

Proof We only prove assertion (i), by induction (assertion (ii) can be proved similarly). For $n = N$, note that $\partial L_N(\infty, \infty, 0)/\partial u = -h_1 + h_2 < -c_{12}$ and $\partial L_N(-\infty, +\infty, 0)/\partial u > (1 + \alpha)c_{21}$. It follows that $D_N(x_1, \infty) = 0$ has a finite solution for $x_1$. In other words, the curve $D_N(x_1, x_2) = 0$ converges to a finite $x_1$ as $x_2$ tends to $+\infty$. Moreover, the result also implies that $D_N(\infty, \infty) > 0$, so $T_N(\infty, \infty) = -c_{12}$ by Proposition 3.2.

Suppose assertion (i) holds for $k > n$. We want to show that it is true for $n$. The induction hypothesis ensures that $(\infty, \infty)$ belongs to the region $\{(x_1, x_2) \mid D_{n+1}(x_1, x_2) > 0;\ x_1 > 0\}$. Proposition 3.2 yields that $T_{n+1}(x_1, x_2) = -c_{12}$ as $(x_1, x_2)$ tends to $(\infty, \infty)$. It follows that

\[
\frac{\partial L_n(\infty, \infty, 0)}{\partial u} = \frac{\partial L_N(\infty, \infty, 0)}{\partial u} + \alpha E\,T_{n+1}(\infty, \infty) = (h_2 - h_1) + \alpha(-c_{12}) < -c_{12} + \alpha(-c_{12}) < -c_{12}.
\]

From Lemma 3.3, we have $\partial L_n(-\infty, +\infty, 0)/\partial u > (1 + \alpha)c_{21}$. Therefore, for a sufficiently large positive number $x_2$, the solution $x_1$ to $\partial L_n(x_1, x_2, 0)/\partial u = -c_{12}$ exists. In addition, the switching curve $D_n(x) = 0$ converges to a finite $x_1$ as $x_2$ tends to $+\infty$; otherwise it would lead to $\partial L_n(\infty, \infty, 0)/\partial u = -c_{12}$, which is a contradiction. Since the curve $U_n(x) = 0$ lies above the curve $D_n(x) = 0$, it is bounded on the right and converges increasingly to a finite $x_1$ as $x_2$ tends to $+\infty$. This completes the proof.

Proposition 3.9

(i) If $\sum_{k=0}^{N-n} \alpha^k (h_2 - h_1) < -c_{12} < h_2 - h_1$, then there exists an integer $l$ ($n < l < N$) such that the switching curve $D_j(x) = 0$ converges to a finite $x_2$ as $x_1$ tends to $+\infty$ for $l \le j \le N$, while the switching curve $D_j(x) = 0$ converges to a finite $x_1$ as $x_2$ tends to $+\infty$ for $n \le j < l$;
(ii) If $h_2 - h_1 < c_{21} < \sum_{k=0}^{N-n} \alpha^k (h_2 - h_1)$, then there exists an integer $m$ ($n < m < N$) such that the switching curve $U_j(x) = 0$ converges to a finite $x_1$ as $x_2$ tends to $+\infty$ for $m \le j \le N$, while the switching curve $U_j(x) = 0$ converges to a finite $x_2$ as $x_1$ tends to $+\infty$ for $n \le j < m$.

Proof The condition in assertion (i) implies that $h_2 - h_1 < 0$ and that there exists an integer $l$ ($n < l < N$) such that

\[
\sum_{k=0}^{N-l+1} \alpha^k (h_2 - h_1) < -c_{12}
\]

… $(1 + \alpha)c_{21}$. Therefore, for a sufficiently large positive number $x_2$, the solution $x_1$ to $\partial L_j(x_1, x_2, 0)/\partial u = -c_{12}$ exists. Thus, the switching curve $D_j(x) = 0$ converges to a finite $x_1$ as $x_2$ tends to positive infinity when $j = l - 1$. In addition, from Proposition 3.2, we have $T_j(\infty, \infty) = -c_{12}$ since $(\infty, \infty)$ belongs to the region $\{(x_1, x_2) \mid D_j(x_1, x_2) > 0;\ x_1 > 0\}$. For $j = l - 2$, we have

\[
\frac{\partial L_{l-2}(\infty, \infty, 0)}{\partial u} = \frac{\partial L_N(\infty, \infty, 0)}{\partial u} + \alpha E\,T_{l-1}(\infty, \infty) = (h_2 - h_1) + \alpha(-c_{12}) < \frac{-c_{12}}{\sum_{k=0}^{N-l+1} \alpha^k} - \alpha c_{12} < -c_{12}.
\]

Note that $\partial L_n(-\infty, +\infty, 0)/\partial u > (1 + \alpha)c_{21}$ from Lemma 3.3. Hence assertion (i) is true for $j = l - 2$. Following the induction approach, we can prove that the switching curve $D_j(x) = 0$ converges to a finite $x_1$ as $x_2$ tends to positive infinity for all $n \le j < l$. Similarly, we can prove assertion (ii). This completes the proof.

If there exists an integer $l$ ($n \le l \le N$) such that $\sum_{k=0}^{N-l} \alpha^k (h_2 - h_1) = -c_{12}$, then the switching curve $D_j(x) = 0$ increases to positive infinity as $x_1$ (or $x_2$) tends to positive infinity for $j = l$. This is due to the fact that $\partial L_n(x_1, x_2, 0)/\partial u$ is decreasing in $x_1$ and increasing in $x_2$ from Lemma 3.3, and to the continuity of $\partial L_n(x_1, x_2, 0)/\partial u$. For $j \ne l$, the asymptotic behavior can be determined based on Proposition 3.9.

Fig. 3.2 The control regions of the optimal control policy in the $(x_1, x_2)$ plane. [Figure: the switching curves $U_n(x) = 0$ and $D_n(x) = 0$ separate region (III), where $u_n^*(x) = U_n(x)$, region (II), where $u_n^*(x) = 0$, and region (I), where $u_n^*(x) = D_n(x)$; horizontal axis $x_1$, vertical axis $x_2$.]

To provide an intuitive view of the analytical results, we illustrate the structural properties of the optimal policy $u_n^*(x)$ in the $(x_1, x_2)$ plane in Fig. 3.2, in which the monotonic switching curves $D_n(x) = 0$ and $U_n(x) = 0$ divide the entire state space into three control regions (I), (II) and (III). In addition, the values of $D_n(x)$ and $U_n(x)$ are increasing in $x_1$ and decreasing in $x_2$. Physically, the optimal control action in each region is to bring the inventory level to the nearest boundary of the region in which $u_n^*(x) = 0$. For example, in region (I), the optimal action is to bring the inventory level at depot 1 to the boundary $D_n(x) = 0$. In region (III), empty containers are transferred from depot 2 to depot 1 to bring the system state to the boundary $U_n(x) = 0$. More specifically, we summarize the optimal control policy, together with the asymptotic properties of the switching curves in relation to the system parameters, in the following proposition.

Proposition 3.10 The optimal control policy has the following asymptotic behaviors depending on the system parameters, as shown in Figs. 3.3, 3.4 and 3.5, in which the ECR actions in the area $\{(x_1, x_2) \mid x_1 < 0 \text{ and } x_2 < 0\}$ are ignored because they are less important, since neither depot has an empty container available in that region.

Case A: If $-c_{12} < \sum_{k=0}^{N-n} \alpha^k (h_2 - h_1) < c_{21}$, then $u_n^*(x)$ is shown in Fig. 3.3;
Case B: If $h_2 - h_1 < -c_{12}$, then $u_n^*(x)$ is shown in Fig. 3.4;
Case C: If $h_2 - h_1 > c_{21}$, then $u_n^*(x)$ is shown in Fig. 3.5;
Case D: If $\sum_{k=0}^{N-n} \alpha^k (h_2 - h_1) < -c_{12} < h_2 - h_1$, then there exists an integer $l$ ($n < l < N$) such that the optimal policy has the structure in Fig. 3.3 for periods $l \le k \le N$ and the structure in Fig. 3.4 for periods $n \le k < l$;
Case E: If $h_2 - h_1 < c_{21} < \sum_{k=0}^{N-n} \alpha^k (h_2 - h_1)$, then there exists an integer $m$ ($n < m < N$) such that the optimal policy has the structure in Fig. 3.3 for periods $m \le k \le N$ and the structure in Fig. 3.5 for periods $n \le k < m$.

Fig. 3.3 The optimal control policy in the $(x_1, x_2)$ plane with asymptotic properties in Case A. [Figure: regions (I), (II) and (III) separated by the curves $D_n(x) = 0$ and $U_n(x) = 0$, with the asymptotes $A_{n,2}$ and $B_{n,1}$ marked.]

Fig. 3.4 The optimal control policy in the $(x_1, x_2)$ plane with asymptotic properties in Case B. [Figure: regions (I), (II) and (III) separated by the curves $D_n(x) = 0$ and $U_n(x) = 0$, with the asymptotes $A_{n,2}$ and $B_{n,1}$ marked.]

Fig. 3.5 The optimal control policy in the $(x_1, x_2)$ plane with asymptotic properties in Case C. [Figure: regions (I), (II) and (III) separated by the curves $D_n(x) = 0$ and $U_n(x) = 0$, with the asymptotes $A_{n,2}$ and $B_{n,1}$ marked.]

The structural properties of the optimal ECR policy shown in Proposition 3.10 provide managerial insights into decision making on empty container repositioning between the two depots. For example, they give intuitive and qualitative guidance on when empty containers should be repositioned between the two depots, in which direction and in what quantity, how the system state affects the repositioning decisions, and how the cost parameters affect the repositioning decisions.
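The case conditions of Proposition 3.10 are simple inequalities, so they can be checked mechanically. The following is a minimal sketch rather than part of the original analysis: the function names `discounted_gap` and `classify_case` are hypothetical, and parameter combinations that fall exactly on a boundary (which the proposition does not cover) return `None`.

```python
def discounted_gap(h1, h2, alpha, N, n):
    """Return sum_{k=0}^{N-n} alpha^k * (h2 - h1), the quantity compared with -c12 and c21."""
    return (h2 - h1) * sum(alpha ** k for k in range(N - n + 1))


def classify_case(h1, h2, c12, c21, alpha, N, n):
    """Return 'A'..'E' following the conditions of Proposition 3.10 at period n.

    Returns None when the parameters sit exactly on a boundary between cases.
    """
    d = h2 - h1
    s = discounted_gap(h1, h2, alpha, N, n)
    if d < -c12:
        return "B"
    if d > c21:
        return "C"
    if -c12 < s < c21:
        return "A"
    if s < -c12 < d:
        return "D"
    if d < c21 < s:
        return "E"
    return None


# Symmetric holding costs (the parameter values of Example 3.1 in Sect. 3.5).
print(classify_case(h1=1, h2=1, c12=2, c21=2, alpha=0.6, N=14, n=1))
```

With equal holding costs the discounted gap is zero, so the call above prints `A`. Cases B and C are tested first because they depend only on $h_2 - h_1$ and hold for every period, whereas Cases A, D and E depend on the period $n$ through the discounted sum.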

3.4 Near-Optimal Threshold-Type Policy

In practice, the optimal ECR policy obtained in the previous section may be too complicated and difficult to implement because the switching curves are not obvious. The structural properties of the optimal policy established above, such as monotonicity, asymptotic behaviors, and region-feedback format, enable us to construct easy-to-operate, easy-to-implement, and near-optimal policies. One simple approximation is to use straight lines to replace the switching curves so that the control regions can be characterized by a few threshold values.

Proposition 3.11 A near-optimal policy can be constructed based on the structural properties of the optimal control policy, as shown in Figs. 3.6, 3.7 and 3.8. In each region, the number of empty containers to be repositioned is the minimum quantity that brings the inventory level to the nearest boundary. Essentially, the proposed near-optimal policy resembles the order-up-to-point policies in traditional inventory control practice.

Case A: if $-c_{12} < \sum_{k=0}^{N-n} \alpha^k (h_2 - h_1) < c_{21}$, then the near-optimal policy is given in Fig. 3.6;
Case B: if $h_2 - h_1 < -c_{12}$, then the near-optimal policy is given in Fig. 3.7;
Case C: if $h_2 - h_1 > c_{21}$, then the near-optimal policy is given in Fig. 3.8;
Cases D and E can be similarly approximated by combinations of the above three cases in different decision periods. Alternatively, the policy can be further simplified by using its stationary control policy, which takes the form of either Case B or Case C.

Fig. 3.6 Near-optimal control policy in the $(x_1, x_2)$ plane in Case A. [Figure: regions (I), (II) and (III) with the threshold points $(A_1, A_2)$, $(B_1, B_2)$, $C_1$ and $C_2$ marked.]

Fig. 3.7 Near-optimal control policy in the $(x_1, x_2)$ plane in Case B. [Figure: regions (I), (II) and (III) with the threshold points $(A_1, A_2)$, $(B_1, B_2)$, $C_1$ and $C_2$ marked.]

Fig. 3.8 Near-optimal control policy in the $(x_1, x_2)$ plane in Case C. [Figure: regions (I), (II) and (III) with the threshold points $(A_1, A_2)$, $(B_1, B_2)$, $C_1$ and $C_2$ marked.]

Based on Proposition 3.11, we can see that the near-optimal policy is determined by four points with six parameters (i.e., $A_1$, $A_2$, $B_1$, $B_2$, $C_1$, $C_2$). The advantages of this policy are:

• It has an explicit form and is much easier to implement than the optimal policy;
• It is constructed based on the structural properties of the optimal policy, so it is close to the optimal policy if the six parameters are appropriately selected;
• It is easy to understand and operate from the managers’ viewpoint because it does not require extensive data and calculations; and
• Some parameters in the above near-optimal policy may be computed analytically or numerically, e.g., from Proposition 3.6 about the asymptotic behaviors of the switching curves.


3.5 Numerical Examples

This section provides two numerical examples to verify the analytical results. We apply the value iteration algorithm to calculate the optimal value function and the optimal ECR policy at each period. In our case, the iteration number represents the number of periods. In the experiments, the ECR volume is constrained by the number of empty containers available at the depot. In other words, we do not allow a container to be leased at one depot and then repositioned to the other depot.

Example 3.1 Consider a scenario with two symmetric depots. Let the system parameters take the following values: $h_1 = h_2 = 1$, $b_1 = b_2 = 10$, $c_{12} = c_{21} = 2$, and $\alpha = 0.6$. The leasing cost is much higher than the inventory holding and transport costs, whereas the transport cost is twice the inventory holding cost. Such relative relationships between these cost coefficients are reasonable in practice. We take the planning horizon as $N = 14$, which covers two weeks if one period represents one day. We assume that both container supply and demand follow the same discrete uniform distribution, i.e., $ZI_i \sim U[0, 4]$ and $ZO_i \sim U[0, 4]$, at both depots. Therefore, statistically, each depot has a balanced container supply and demand. However, due to the stochasticity in both supply and demand, each depot may face a short-term imbalance in container supply and demand, which necessitates empty container repositioning between the two depots.

After $N$ iterations, the optimal cost in state $(0, 0)$ is given by $V_1(0, 0) = 59.32$. The optimal ECR policy after $N$ iterations is partly displayed in Fig. 3.9, in which the solid curves represent the switching curves $D_1(x_1, x_2) = 0$ and $U_1(x_1, x_2) = 0$, which, together with the two axes, divide the state space into three control regions. The numbers in Fig. 3.9 represent the values of the optimal policy $u_1^*(x_1, x_2)$ in the corresponding system states, where positive numbers indicate the quantities of empty containers repositioned from depot 1 to depot 2 and negative numbers indicate the quantities of empty containers repositioned from depot 2 to depot 1. For example, at the system states $(x_1, x_2) = (0, 0)$ and $(1, 0)$, there is no need to reposition any empty containers between the two depots; at the system states $(x_1, x_2) = (2, 0)$, $(3, 0)$ and $(4, 0)$, the transport company should reposition 1 empty container from depot 1 to depot 2; at the system states $(x_1, x_2) = (5, 0)$, $(6, 0)$, $(7, 0)$ and $(8, 0)$, the transport company should reposition 2 empty containers from depot 1 to depot 2. At the system state $(x_1, x_2) = (-1, 1)$, the transport company should reposition 1 empty container from depot 2 to depot 1. The logic of the ECR policy is intuitive: when depot 1 has a larger inventory of empty containers, we tend to reposition more empty containers to depot 2. Evidently, the optimal ECR policy has the structural properties shown in Sect. 3.3. Note that $-c_{12} < \sum_{k=0}^{N-n} \alpha^k (h_2 - h_1) < c_{21}$ holds in this symmetric scenario. The control structure verifies the result in Fig. 3.3.

Example 3.2 Consider a scenario with two asymmetric depots. Let $h_1 = 2.5$ and let all other parameters remain the same as in Example 3.1. This represents the scenario

that depot 1 has a much higher unit inventory holding cost than depot 2, which may be interpreted as depot 1 having less storage space. The two depots are therefore asymmetric in terms of cost structure. After $N = 14$ iterations, the optimal cost in state $(0, 0)$ is given by $V_1(0, 0) = 62.04$. The optimal control policy after $N$ iterations is partly displayed in Fig. 3.10, in which the solid curves represent the switching curves $D_1(x_1, x_2) = 0$ and $U_1(x_1, x_2) = 0$. The numbers in Fig. 3.10 represent the values of the optimal policy $u_1^*(x_1, x_2)$ in the corresponding system states. For example, at the system states $(x_1, x_2) = (0, 0)$ and $(1, 0)$, there is no need to reposition any empty containers between the two depots; at the system states $(x_1, x_2) = (2, 0)$ and $(3, 0)$, the transport company should transfer 1 empty container from depot 1 to depot 2; at the system states $(x_1, x_2) = (4, 0)$ and $(5, 0)$, the transport company should transfer 2 empty containers from depot 1 to depot 2; at the system state $(x_1, x_2) = (6, 0)$, the transport company should transfer 3 empty containers from depot 1 to depot 2. Comparing these results with those in Example 3.1, it can be seen that more empty containers should be repositioned to depot 2 at the same system state. This is due to the relatively high unit inventory holding cost at depot 1. Again, the optimal ECR policy has the structural properties shown in Sect. 3.3. Note that here $h_2 - h_1 = -1.5 > -c_{12} = -2$, while $\sum_{k=0}^{N-1} \alpha^k (h_2 - h_1) \approx -3.75 < -c_{12}$, so at period 1 this asymmetric scenario satisfies the condition of Case D in Proposition 3.10 rather than $h_2 - h_1 < -c_{12}$. The control structure verifies the result in Fig. 3.4.

Fig. 3.9 The optimal control policy in the $(x_1, x_2)$ plane for the symmetric scenario
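The backward value iteration used in these examples can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the one-period cost (holding cost $h_i$ on positive and leasing cost $b_i$ on negative end-of-period inventory, plus transport cost on the repositioned quantity) is inferred from the derivative formulas in Proposition 3.6; the net supply at each depot is taken as $z_i = ZI_i - ZO_i$; the state space is truncated to $[-M, M]^2$ with values clipped at the edges; and the names `net_supply_pmf` and `value_iteration` are hypothetical. Because of these assumptions the sketch need not reproduce the reported costs exactly.

```python
import numpy as np


def net_supply_pmf(max_flow=4):
    """Support and PMF of z = ZI - ZO for independent ZI, ZO ~ discrete U[0, max_flow]."""
    k = max_flow + 1
    uniform = np.full(k, 1.0 / k)
    # ZI - ZO has the same law as ZI + (max_flow - ZO) - max_flow.
    pmf = np.convolve(uniform, uniform)
    return np.arange(-max_flow, max_flow + 1), pmf


def value_iteration(h=(1.0, 1.0), b=(10.0, 10.0), c12=2.0, c21=2.0,
                    alpha=0.6, N=14, M=10):
    """Backward recursion over periods N, ..., 1 on the truncated grid [-M, M]^2.

    Returns (V, U): the period-1 value function and policy, indexed by
    [x1 + M, x2 + M]; u > 0 moves containers from depot 1 to depot 2.
    """
    zs, pz = net_supply_pmf()
    xs = np.arange(-M, M + 1)

    def idx(y):
        # Clip post-event inventories to the grid (truncation assumption).
        return np.clip(y, -M, M) + M

    def stage(y, hold, lease):
        # Holding cost on owned stock, leasing cost on leased (negative) stock.
        return hold * np.maximum(y, 0) + lease * np.maximum(-y, 0)

    V_next = np.zeros((2 * M + 1, 2 * M + 1))  # no cost beyond period N
    for _ in range(N):
        V = np.empty_like(V_next)
        U = np.zeros(V_next.shape, dtype=int)
        for i, x1 in enumerate(xs):
            for j, x2 in enumerate(xs):
                best_cost, best_u = np.inf, 0
                # Only owned empties may be moved: -max(x2, 0) <= u <= max(x1, 0).
                for u in range(-max(x2, 0), max(x1, 0) + 1):
                    y1 = x1 - u + zs
                    y2 = x2 + u + zs
                    cost = c12 * max(u, 0) + c21 * max(-u, 0)
                    cost += pz @ stage(y1, h[0], b[0]) + pz @ stage(y2, h[1], b[1])
                    cost += alpha * (pz @ V_next[np.ix_(idx(y1), idx(y2))] @ pz)
                    if cost < best_cost - 1e-12:
                        best_cost, best_u = cost, u
                V[i, j], U[i, j] = best_cost, best_u
        V_next = V
    return V_next, U
```

Calling `value_iteration()` uses the parameter values of Example 3.1, and passing `h=(2.5, 1.0)` gives those of Example 3.2. The entry `V[M, M]` is then the sketch's estimate of $V_1(0, 0)$, and the row `U[:, M]` lists the repositioning quantities along $x_2 = 0$, which can be compared qualitatively with the monotone pattern described in the examples.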


Fig. 3.10 The optimal control policy in the (x 1 , x 2 ) plane for the asymmetric scenario

For Example 3.1, if a stationary near-optimal policy of the form in Fig. 3.6 with control parameters $A_1 = 7$, $A_2 = 3$, $B_1 = 3$, $B_2 = 7$, $C_1 = -1$, and $C_2 = -1$ is used to manage empty container transfers between the two depots in all periods, the incurred cost is $V_1(0, 0) = 59.88$, which is 0.9% above the optimal solution. For Example 3.2, using a stationary near-optimal policy of the form in Fig. 3.7 with control parameters $A_1 = 4$, $A_2 = 2$, $B_1 = 1$, $B_2 = 3$, $C_1 = -2$, and $C_2 = -1$, the incurred cost is $V_1(0, 0) = 63.23$, which is 1.9% above the optimal solution. These results demonstrate the effectiveness of the proposed near-optimal policies.
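The quoted percentages are relative gaps between the cost of the threshold policy and the optimal cost. A short check, using only the four cost figures reported above (the helper name `relative_gap` is hypothetical), is:

```python
def relative_gap(policy_cost, optimal_cost):
    """Relative cost increase of a policy over the optimal cost."""
    return (policy_cost - optimal_cost) / optimal_cost


gaps = {
    "Example 3.1": relative_gap(59.88, 59.32),
    "Example 3.2": relative_gap(63.23, 62.04),
}
for name, gap in gaps.items():
    print(f"{name}: {100 * gap:.1f}% above optimal")
```

This prints 0.9% for Example 3.1 and 1.9% for Example 3.2, in agreement with the figures in the text.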

3.6 Summary and Notes

This chapter investigates the optimal ECR policy between two independent depots over a multi-period planning horizon. The two depots face independent random container supply and random customer demand. Customer demands must be fulfilled using leased containers if needed. The problem is to determine the optimal empty container transfers between the two depots by minimizing the total expected cost, including empty container transport costs, inventory holding costs, and container leasing costs. It is shown that the optimal ECR policy can be characterized by two monotonic switching curves that divide the state space into three control regions. The structural properties of the optimal ECR control hold for each decision period. Based on the structural properties, simple, near-optimal and easy-to-operate ECR policies have been constructed.

The problem under consideration is related to dynamic inventory control, which is commonly formulated as an optimal control problem with stochastic dynamics. There is a rich literature on the optimal control of production and inventory systems under uncertainty using continuous-time fluid-flow models (e.g., Gershwin, 1994; Sethi & Zhang, 1994; Song & Zhang, 2010), discrete-time periodic review models (Ng et al., 2012; Song, 2007; Song et al., 2010; Zhang et al., 2014), and continuous-time discrete event models (e.g., Feng & Yan, 2000; Arruda & do Val, 2008; Song, 2009, 2013). In particular, our problem is related to the literature on the optimal control of inventory transfers/transshipments between stocking locations (e.g., Agrawal et al., 2004; Das, 1975; Hu et al., 2008; Yang & Qin, 2007). For example, Paterson et al. (2011) provided a comprehensive literature review of inventory models with lateral transshipments between two or multiple locations. However, our problem differs from the traditional inventory transshipment problem in the following aspects: (i) we consider that both depots face random supply and random demand with leasing activities, which essentially makes the inventory state space unbounded in both positive and negative directions; (ii) we consider dynamic decision making over multiple periods, while the traditional inventory transshipment problem often considers the single-period case (Ng et al., 2012).

Further research could be done in several directions: (i) we consider a finite planning horizon with a discounted cost; an extension could address an infinite planning horizon with a long-run average cost, in which case the optimal stationary policy should be investigated; (ii) a natural extension of the two-depot case is ECR over multiple independent depots, which will be addressed in Chap. 6; (iii) there may be two transport companies that operate the two depots independently, and coordination among different companies by joint repositioning and/or sharing of empty containers under a revenue sharing mechanism is an interesting topic (Xie et al., 2017); (iv) some depots may be closely connected, e.g., depots in a transport corridor, or an intermodal terminal and a seaport, where the demands predominantly flow between these two depots themselves (Song et al., 2010). This type of transport system can be regarded as a shuttle service. Its ECR problems will be discussed in later chapters.

References

Agrawal, V., Chao, X., & Seshadri, S. (2004). Dynamic balancing of inventory in supply chains. European Journal of Operational Research, 159, 296–317.
Arruda, E. F., & do Val, J. B. R. (2008). Stability and optimality of a multiproduct production and storage system under demand uncertainty. European Journal of Operational Research, 188(2), 406–427.
Das, C. (1975). Supply and redistribution rules for two-location inventory systems: One period analysis. Management Science, 21(7), 765–776.
Douglass, S. A. (1996). Introduction to mathematical analysis. Addison-Wesley.
Feng, Y. Y., & Yan, H. M. (2000). Optimal production control in a discrete manufacturing system with unreliable machines and random demands. IEEE Transactions on Automatic Control, 45(12), 2280–2296.
Gershwin, S. B. (1994). Manufacturing systems engineering. Prentice-Hall.
Hu, X., Duenyas, I., & Kapuscinski, R. (2008). Optimal joint inventory and transshipment control under uncertain capacity. Operations Research, 56(4), 881–897.
Ng, C. T., Song, D. P., & Cheng, T. C. E. (2012). Optimal policy for inventory transfer between two depots with backlogging. IEEE Transactions on Automatic Control, 57(12), 3247–3252.
Paterson, C., Kiesmüller, G., Teunter, R., & Glazebrook, K. (2011). Inventory models with lateral transshipments: A review. European Journal of Operational Research, 210(2), 125–136.
Sethi, S., & Zhang, Q. (1994). Hierarchical decision making in stochastic manufacturing systems. Birkhauser.
Song, D. P. (2007). Characterizing optimal empty container reposition policy in periodic-review shuttle service systems. Journal of the Operational Research Society, 58(1), 122–133.
Song, D. P. (2009). Optimal integrated ordering and production policy in a supply chain with stochastic lead-time, processing-time, and demand. IEEE Transactions on Automatic Control, 54(9), 2027–2041.
Song, D. P. (2013). Optimal control and optimization in stochastic supply chain systems. Springer.
Song, D. P., Dong, J. X., & Roe, M. (2010). Container dispatching policies in two-terminal shipping services with uncertain demands. International Journal of Shipping & Transport Logistics, 2(1), 44–58.
Song, D. P., & Zhang, Q. (2010). A fluid flow model for empty container repositioning policy with a single port and stochastic demand. SIAM Journal on Control and Optimization, 48(5), 3623–3642.
Xie, Y., Liang, X., Ma, L., & Yan, H. (2017). Empty container management and coordination in intermodal transport. European Journal of Operational Research, 257(1), 223–232.
Yang, J., & Qin, Z. (2007). Capacitated production control with virtual lateral transshipments. Operations Research, 55(6), 1104–1119.
Zhang, B., Ng, C. T., & Cheng, T. C. E. (2014). Multi-period empty container repositioning with stochastic demand and lost sales. Journal of the Operational Research Society, 65(2), 302–319.

Chapter 4

Optimal ECR Policy in Two-Depot Shuttle Systems: Continuous Review

Abstract This chapter addresses the ECR problem in a two-depot shuttle service system over an infinite time horizon, with a focus on deriving optimal stationary ECR policies. Empty containers are required at each depot to meet random customer demands, which drive laden container movements between the two depots. Customer demands must be satisfied either by owned empty containers or by containers leased from lessors. The system is based on continuous review and a discrete state, where the system state represents the inventory levels of containers at the two depots. An event-driven model is formulated, in which the ECR decisions are made at each event when the system state changes. Under the assumption of a Poisson arrival process of laden containers and exponentially distributed empty container transfer times, we convert the continuous-time Markov decision process into an equivalent discrete-time Markov decision process by using the uniformization technique. It is shown that the optimal ECR policy is a threshold policy, characterized by two control parameters, in both the discounted cost and the long-run average cost cases. The closed form of the optimal discounted cost function is derived by using the characteristic equation method. The closed form of the optimal long-run average cost function is derived by calculating the stationary distribution under the threshold control policy. Numerical examples are provided to demonstrate the analytical results and to test the sensitivity of the model to the distribution assumptions. Finally, the models are extended to cases with random supply and demand of empty containers at both depots, where empty containers may exit and enter the two-depot shuttle system randomly.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 D.-P. Song and J. Dong, Modelling Empty Container Repositioning Logistics, https://doi.org/10.1007/978-3-030-93383-8_4

4.1 Introduction

Transport systems constantly face the challenges of dynamic operations, various uncertainties, and imbalanced demands. This chapter focuses on a two-depot transport system. Such systems represent container transportation between two inland depots or between an inland depot and a seaport, where there are frequent small-amount containerized cargo flows. For example, Maritime Transport Company owns the biggest container road fleet in the UK. It operates a number of transport depots that are located in most of the UK’s major and regional ports, rail terminals, and inland strategic locations (www.maritimetransport.com). The container flows between these depots are frequent, using trucks that carry one or two containers per trip.

In the two-depot systems under consideration, empty containers are required at each depot to meet random customer demands. A customer demand is defined as an exogenous request to transport containerized cargo from one depot to the other. A fulfilled demand is regarded as a laden (loaded) container. After the laden container arrives at a depot, it is unpacked and becomes empty. The empty container is then ready for reuse or repositioning. It is assumed that customer demands can always be satisfied, either by the empty containers available at the current depot or by leasing an empty container from lessors. The two-depot system is illustrated in Fig. 4.1.

Fig. 4.1 A two-depot system. [Figure: Depot 1 and Depot 2, each linked to lessors by empty container flows, with laden and empty container flows between the two depots.]

Due to the demand imbalance and other factors, it is often necessary to reposition empty containers between the two depots to better meet customer demands and avoid expensive leasing costs. We treat the empty container inventories at the two depots as the system state. The purpose is to determine the optimal empty container repositioning decision on each occasion when the system state changes. The system is subject to two types of uncertainty. One is the random arrival of laden containers, and the other is the random transfer time of the repositioned empty containers. Clearly, the arrival of either a laden or an empty container changes the system state, which may trigger an empty container repositioning action. We assume a continuous review scheme, which means that information on the empty container inventory levels at both depots is available at any time.

In this chapter, we will investigate two cases of the ECR problem: the discounted cost case (Song & Earl, 2008) and the long-run average cost case (Song, 2005). The emphasis is on establishing analytical results for the optimal cost functions and deriving the closed form of the optimal ECR policy in such stochastic dynamic systems.

4.2 Problem Formulation

75

4.2 Problem Formulation The methodology can be described as follows. Stochastic dynamic programming is applied to formulate the problem. Structural properties of the optimal empty repositioning policy are established, which can be utilized to derive the explicit form of the optimal cost function. Finally, we can obtain the explicit form of the optimal ECR policy. The solution techniques have been successfully applied to optimal production and inventory control problems (e.g., Feng & Yan, 2000; Sethi & Zhang, 1994; Song, 2013). However, our transport system differs from production and inventory systems. This is because manufactured goods will normally disappear from the system after satisfying customer demands, whereas containers are reusable assets and particularly leased containers are required to be returned to their original depots because of the imbalance of the laden container movements. We make the following assumptions. Assumption 4.1 When a customer demand arrives at a depot and there is no owned empty container available, an empty container can be leased from a local lessor immediately to satisfy the demand. Leased containers are charged on a time basis, which implies that they must be off-leased at the original depot where they were leased. Assumption 4.2 The laden container arrival process is a Poisson process that is exogenous and not controllable. Assumption 4.3 The empty container repositioning times follow independent exponential distributions; but there is a choice of different repositioning modes for empty containers corresponding to different speeds of transport between two depots. Assumption 4.1 implies that customer demands will always be satisfied. Note that nearly 50% of shipping containers are owned by lessors, the leasing assumption is reasonable. 
The time-based charging for leased containers is a common practice, and the requirement of returning the leased container to its origin depot is due to the fact that the transport costs in two directions are often different due to demand imbalance. Assumption 4.2 can be justified as follows. Since customer demands are always satisfied, the laden container arrival process is a combined stochastic process of the customer demand occurrence and the laden container transportation time. The customer demand occurrence is commonly assumed to be a Poisson process (Deb, 1978; Deb & Schmidt, 1987; Du & Hall, 1997), and the laden container transportation time is relatively reliable (Cheung & Powell, 1996; Du & Hall, 1997). Therefore, the combined process may be approximated by a Poisson process. Nevertheless, in the numerical experiments, the Poisson process assumption will be relaxed to test the robustness of the model. In Assumption 4.3, the assumption of the exponentially distributed empty container repositioning times is proposed as a plausible simplification, which will make the model analytically tractable. In general, the empty repositioning time includes cleaning time, repair time, congestion time, waiting time, and transportation time. These could vary significantly due to their relatively low


priority compared to laden containers. We define the empty container repositioning time to be the period from the time when a repositioning decision is made to the time when the empty container reaches its destination. The uncertainty of waiting time and congestion time partly justifies the assumption that empty container repositioning times are exponentially distributed. Moreover, in the numerical experiments, we will demonstrate that the analytical results based on the exponential assumption for repositioning times are largely valid for a range of other probability distributions including deterministic, uniform, and normal distributions. In practice, empty container repositioning may be done in several modes with various durations and costs, e.g., truck, rail, and barge. This justifies the different speeds of empty container transfer.

Define the notation as follows:

$N$: the fleet size, which represents the total number of owned containers.
$\mu_{ij}$: the laden container arrival rate from depot i to depot j in the Poisson arrival process.
$\lambda_{ij}(t)$: the control parameter at time t for repositioning an empty container from depot i to depot j. The empty repositioning time is exponentially distributed with mean $1/\lambda_{ij}$. The parameter $\lambda_{ij}(t)$ takes a value from a finite set $\{0, \ldots, \bar{\lambda}_{ij}\}$, where $\bar{\lambda}_{ij}$ is the maximum element in the set.
$x_i(t)$: the cumulative number of containers at depot i (including containers on the way to the other depot) at time t.
$q_{12}$ and $q_{21}$: the cost of empty container repositioning from depot 1 to 2 and from depot 2 to 1, respectively.
$c_i^+$: the routine inventory or maintenance costs of containers associated with depot i per container per unit time.
$c_i^-$: the leasing cost of containers that are leased from depot i per container per unit time.

The control parameter $\lambda_{ij}$ gives the options of different empty container repositioning modes (which may be interpreted as speeds), including "do not reposition" (when $\lambda_{ij} = 0$). The empty container repositioning cost is assumed to be proportional to the control parameter $\lambda_{ij}$. Specifically, if $\lambda_{ij}$ takes the value 0, the decision "do not transport an empty container from depot i to depot j" is made and no repositioning cost is incurred. On the other hand, if $\lambda_{ij}$ takes the value $\bar{\lambda}_{ij}$, the decision is "use the fastest transport mode to transfer an empty container from depot i to depot j". Physically, the parameter $1/\lambda_{ij}$ represents the average empty container transfer time, with larger $\lambda_{ij}$ corresponding to faster transport modes. It is reasonable to assume that a faster mode will incur a higher cost. The relation between the speed of a repositioning mode and its cost is not simple and will depend on several economic factors. We assume that the cost per unit time is proportional to the parameter $\lambda_{ij}$ (Choong et al., 2002). The coefficients of proportionality are denoted by $q_{12} \ge 0$ and $q_{21} \ge 0$ for empty container repositioning from depot 1 to 2 and from depot 2 to 1, respectively; that is, the repositioning costs are expressed as $q_{ij}\lambda_{ij}$. The values of the coefficients $q_{ij}$ should reflect the fact that repositioning an empty container is often cheaper than leasing an empty container.


Note that $x_i(t)$ represents the inventory level of empty containers at depot i. We use the vector $x(t) := (x_1(t), x_2(t))$ to denote the system state. Changes in $x(t)$ indicate changes in system state. If a laden or empty container arrives at depot 1, then $x_1(t)$ increases by one and $x_2(t)$ decreases by one. Similarly, if a container reaches depot 2, then $x_2(t)$ increases by one and $x_1(t)$ decreases by one. It should be noted here that $x_i(t)$ can be negative, because the laden container arrival is random and beyond control. For example, if all owned containers have already arrived at depot 1, then one more laden container arrival at depot 1 means that a container has been leased at depot 2, which results in a negative value of container inventory at depot 2. Specifically, when $x_i(t)$ is positive, it represents the number of containers at depot i (including containers on the way to the other depot) at time t. When $x_i(t)$ is negative, $-x_i(t)$ represents the number of containers that were leased from depot i and are currently staying at the other depot (or on the way back to depot i) at time t. The values $x_1(t)$ and $x_2(t)$ always change in opposite directions by one unit with $x_1(t) + x_2(t) \equiv N$, where N represents the owned container fleet size. As a result, $x_2(t) = N - x_1(t)$. A leased container will be charged on a time basis until it is returned. There is no constraint on the leasing time. Due to the relatively high leasing cost, leased containers have a higher priority than owned containers for transfer, either to meet a demand or to be repositioned, because this reduces the leasing duration and thus the cost. The cost model for container leasing and ownership is defined as follows. If $x_i(t) > 0$, then $x_i(t)$ is the number of containers at depot i.
All these containers will incur routine inventory or maintenance costs, charged at the rate $c_i^+$ associated with depot i per container (for both owned and leased) per unit time. We do not distinguish the maintenance cost for a container en route from i to j from the maintenance cost of a container actually held in depot i. If $x_i(t) < 0$, then $-x_i(t)$ represents the number of containers leased from depot i, charged at $c_i^-$ per container per unit time. Therefore, the container maintenance and leasing cost function per unit time at time t with state variable $x(t)$ can be written as

$$g(x(t)) = c_1^+ x_1^+(t) + c_1^- x_1^-(t) + c_2^+ x_2^+(t) + c_2^- x_2^-(t) \tag{4.1}$$

where $x_i^+(t) = \max(x_i(t), 0)$ and $x_i^-(t) = \max(-x_i(t), 0)$ for $i = 1, 2$.

Let $X = \{x = (x_1, x_2) \mid x_2 = N - x_1 \text{ and } x_1 \in \mathbb{Z}\}$ be the system state space. Let $\Omega = \{u := (\lambda_{12}(t), \lambda_{21}(t)) \mid \lambda_{12}(t) \in \{0, \ldots, \bar{\lambda}_{12}\},\ \lambda_{21}(t) \in \{0, \ldots, \bar{\lambda}_{21}\},\ t \in (0, \infty)\}$ be the admissible control set consisting of all stationary control policies (i.e., state feedback policies). The discounted-cost optimization problem is to find the optimal control policy $u \in \Omega$ that minimizes the infinite-horizon expected discounted cost starting from an initial state x:

$$J(x) = \min_{u \in \Omega} E\left[\int_0^\infty e^{-\beta t}\left[q_{12}\lambda_{12}(t) + q_{21}\lambda_{21}(t) + g(x(t))\right]dt \,\middle|\, x(0) = x\right] \tag{4.2}$$


where 0 < β < 1 is a discount factor. The contribution of future costs to the cost function is discounted exponentially. This guarantees the cost function to be finite for an infinite time horizon.
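As a small illustration of the cost rate in (4.1), the following Python sketch evaluates g for a state given by x1 alone (using x2 = N − x1). The function name and argument names are illustrative choices, not notation from the text.

```python
def holding_leasing_cost(x1, N, c1_plus, c1_minus, c2_plus, c2_minus):
    """Cost rate g(x) of Eq. (4.1) for the state x = (x1, N - x1).

    Positive inventory is charged at the holding rate c_i^+ and
    negative inventory (leased containers) at the leasing rate c_i^-.
    """
    x2 = N - x1  # the two inventories always sum to the fleet size N
    return (c1_plus * max(x1, 0) + c1_minus * max(-x1, 0)
            + c2_plus * max(x2, 0) + c2_minus * max(-x2, 0))
```

For example, with N = 4, holding rates of 1 and leasing rates of 20, the state x1 = −1 means one container has been leased from depot 1 while five containers sit at depot 2, so the cost rate is 20 + 5 = 25.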

4.3 Convert into Discrete-Time Markov Decision Process

The system is event driven and there are two types of events: the arrivals of laden containers and the arrivals of repositioned empty containers. Both events change the system state. Due to the memoryless properties of the Poisson process and the exponential distribution, an empty container whose repositioning between depots is interrupted by an event is statistically equivalent to one whose repositioning restarts. For example, suppose an empty container has been dispatched from depot 1 to depot 2, but before it reaches depot 2 another event occurs, say a laden container arrives at depot 2. The remaining time until that empty container is repositioned follows the same exponential distribution due to the memoryless property of the exponential distribution. This means that statistically the empty container is effectively re-dispatched from depot 1. This Markovian property greatly simplifies the description of the evolution of the dynamic system. Specifically, the system state transition map of this continuous-time Markov chain model is illustrated in Fig. 4.2.

Following the standard uniformization technique (Bertsekas, 1987; Puterman, 1994; Song, 2013), the continuous-time Markov chain model of state transitions can be transformed into an equivalent discrete-time Markov chain model with $\nu = \mu_{12} + \mu_{21} + \bar{\lambda}_{12} + \bar{\lambda}_{21}$ as the uniform transition rate. This uniformization process is explained as follows. Under an admissible control policy $u \in \Omega$, the one-step transition probability function Prob(y | x, u) at each state $x \in X$ is given by

$$\mathrm{Prob}(y \mid x, u) = \begin{cases} (\mu_{21} + \lambda_{21})/\nu, & \text{if } y = (x_1 + 1, x_2 - 1); \\ (\mu_{12} + \lambda_{12})/\nu, & \text{if } y = (x_1 - 1, x_2 + 1); \\ (\bar{\lambda}_{12} + \bar{\lambda}_{21} - \lambda_{12} - \lambda_{21})/\nu, & \text{if } y = x; \\ 0, & \text{otherwise,} \end{cases}$$

where $\lambda_{12}$ and $\lambda_{21}$ are the repositioning rates selected by u in state x.

[Fig. 4.2 The system state transition map in two-depot system: transitions among the states $(x_1 - 1, x_2 + 1)$, $(x_1, x_2)$ and $(x_1 + 1, x_2 - 1)$, drawn on axes $x_1$ and $x_2$.]

To simplify the narrative, let

$$H(x(t), u(t)) = q_{12}\lambda_{12}(t) + q_{21}\lambda_{21}(t) + g(x(t)).$$

Let $0 = t_0 < t_1 < \ldots < t_k < \ldots$ be the potential state transition epochs, and $x_k := x(t_k)$ be the destination state of the kth transition. If $u_k = u(t_k)$ denotes the empty container repositioning rates at time $t_k$, i.e., the control decision of the kth transition, it follows that $x(t) = x_k$ and $u(t) = u_k$ if $t \in [t_k, t_{k+1})$. To compute the cost function for a given initial condition $x(0) = x$ under the control policy $u(t)$, we have

$$E\int_0^\infty e^{-\beta t} H(x(t), u(t))\,dt = E\sum_{k=0}^{\infty}\int_{t_k}^{t_{k+1}} e^{-\beta t} H(x_k, u_k)\,dt = E\sum_{k=0}^{\infty}\frac{1}{\beta}\, e^{-\beta t_k}\left(1 - e^{-\beta(t_{k+1} - t_k)}\right) H(x_k, u_k).$$

Note that $t_k = \sum_{j=1}^{k}(t_j - t_{j-1})$. The random variables $(t_j - t_{j-1})$ are independent for any $j > 0$ and follow the same exponential distribution with the "uniform transition rate" $\nu$. Due to the independence of the three terms on the right-hand side of the above equation, and exchanging the mathematical expectation with the sum operator, the cost function can be further simplified using

$$E\,e^{-\beta t_k} = \left(\int_0^\infty e^{-\beta\tau}\,\nu e^{-\nu\tau}\,d\tau\right)^{k} = \left(\frac{\nu}{\beta + \nu}\right)^{k} \quad\text{and}\quad E\left(1 - e^{-\beta(t_{k+1} - t_k)}\right) = \frac{\beta}{\beta + \nu}.$$

Hence, we have

$$E\int_0^\infty e^{-\beta t} H(x(t), u(t))\,dt = \frac{1}{\beta + \nu}\, E\sum_{k=0}^{\infty}\left(\frac{\nu}{\beta + \nu}\right)^{k} H(x_k, u_k).$$
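The uniformization step can be sketched in a few lines of Python. The sketch below builds the one-step transition probabilities of the uniformized chain for given repositioning rates and evaluates the weight that the kth stage cost receives in the discrete-time sum. The function names are illustrative assumptions; the chosen rates `lam12` and `lam21` are expected to lie between 0 and their maxima.

```python
def uniformized_transition_probs(x1, N, lam12, lam21,
                                 mu12, mu21, lam12_max, lam21_max):
    """One-step transition probabilities of the uniformized chain.

    Returns a dict mapping the next state (y1, y2) to its probability.
    """
    nu = mu12 + mu21 + lam12_max + lam21_max  # uniform transition rate
    x2 = N - x1
    return {
        (x1 + 1, x2 - 1): (mu21 + lam21) / nu,  # arrival at depot 1
        (x1 - 1, x2 + 1): (mu12 + lam12) / nu,  # arrival at depot 2
        # fictitious self-transition that absorbs the unused rate
        (x1, x2): (lam12_max + lam21_max - lam12 - lam21) / nu,
    }


def discounted_stage_weight(k, beta, nu):
    """Weight of H(x_k, u_k) in the discrete-time cost:
    (nu / (beta + nu))**k / (beta + nu)."""
    return (nu / (beta + nu)) ** k / (beta + nu)
```

Summing `discounted_stage_weight(k, beta, nu)` over all k gives 1/β, which is the total discounted length of the time horizon, as one would expect from the continuous-time integral.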


Therefore, the problem is transformed into a discrete-time Markov chain problem with non-negative unbounded cost per step and a countably infinite state space. To further simplify the narrative, let $Dx := (x_1 - 1, x_2 + 1)$ and $Ax := (x_1 + 1, x_2 - 1)$. Following the stochastic dynamic programming theory (Puterman, 1994, Chap. 11), the Bellman optimality equation for the minimum expected discounted cost J(x) in (4.2) can be rewritten as

$$J(x) = (\beta + \nu)^{-1}\min_{\lambda_{12},\lambda_{21}}\Big\{g(x) + q_{12}\lambda_{12} + q_{21}\lambda_{21} + \mu_{12}J(Dx) + \mu_{21}J(Ax) + \lambda_{12}J(Dx) + \lambda_{21}J(Ax) + \left(\bar{\lambda}_{12} + \bar{\lambda}_{21} - \lambda_{12} - \lambda_{21}\right)J(x)\Big\} \tag{4.3}$$

Equation (4.3) is linear in $\lambda_{12}$ and $\lambda_{21}$ and can thus be simplified as

$$J(x) = (\beta + \nu)^{-1}\Big[g(x) + \mu_{12}J(Dx) + \mu_{21}J(Ax) + \bar{\lambda}_{12}\min\{J(Dx) + q_{12}, J(x)\} + \bar{\lambda}_{21}\min\{J(Ax) + q_{21}, J(x)\}\Big] \tag{4.4}$$

We have now converted the continuous-time Markov chain model into the equivalent discrete-time Markov chain model, which is more tractable.

4.4 Solve the Discounted Cost Case

This section assumes that the cost is discounted over time. We first establish the structural properties of the optimal cost function and show that the optimal stationary ECR policy is of threshold control type. Then, we derive the closed-form objective function and obtain the optimal threshold values of the optimal ECR policy (Song & Earl, 2008).

4.4.1 Optimal ECR Policy and Its Structural Properties

Equation (4.4) implies that the optimal ECR policy is of bang-bang type, i.e., the control parameter $\lambda_{ij}$ only takes two values, 0 or $\bar{\lambda}_{ij}$. This result is expressed as follows.

Proposition 4.1 For any $x \in X$ the following stationary policy is optimal:

$$\lambda_{12}^* = \begin{cases} \bar{\lambda}_{12} & \text{if } J(Dx) + q_{12} < J(x) \\ 0 & \text{if } J(Dx) + q_{12} \ge J(x) \end{cases}; \qquad \lambda_{21}^* = \begin{cases} \bar{\lambda}_{21} & \text{if } J(Ax) + q_{21} < J(x) \\ 0 & \text{if } J(Ax) + q_{21} \ge J(x) \end{cases}$$


The optimal policy given in Proposition 4.1 is implicit and not straightforward to implement because the value function J(x) is unknown. The following value iteration algorithm can calculate the optimal cost function numerically.

Proposition 4.2 For any $x \in X$, let $J_0(x) = 0$ and

$$J_{k+1}(x) = (\beta + \nu)^{-1}\Big[g(x) + \mu_{12}J_k(Dx) + \mu_{21}J_k(Ax) + \bar{\lambda}_{12}\min\{J_k(Dx) + q_{12}, J_k(x)\} + \bar{\lambda}_{21}\min\{J_k(Ax) + q_{21}, J_k(x)\}\Big] \tag{4.5}$$

then

$$\lim_{k \to +\infty} J_k(x) = J(x) \tag{4.6}$$
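A minimal Python sketch of the iteration (4.5) is given below. Because the state space is infinite, the sketch truncates x1 to the range [−pad, N + pad] and clamps transitions at the two edges; this truncation rule, the stopping tolerance, and all function and parameter names are assumptions made for illustration only.

```python
def value_iteration(N, beta, mu12, mu21, lam12_max, lam21_max, q12, q21,
                    c1_plus, c1_minus, c2_plus, c2_minus,
                    pad=30, tol=1e-9, max_iter=100000):
    """Iterate Eq. (4.5) on x1 in [-pad, N + pad].

    Returns a dict {x1: J(x1, N - x1)}.
    """
    nu = mu12 + mu21 + lam12_max + lam21_max  # uniform transition rate
    lo, hi = -pad, N + pad

    def g(x1):
        x2 = N - x1
        return (c1_plus * max(x1, 0) + c1_minus * max(-x1, 0)
                + c2_plus * max(x2, 0) + c2_minus * max(-x2, 0))

    J = {x1: 0.0 for x1 in range(lo, hi + 1)}  # J_0 = 0
    for _ in range(max_iter):
        J_new = {}
        for x1 in range(lo, hi + 1):
            jd = J[max(x1 - 1, lo)]  # J_k(Dx), clamped at the lower edge
            ja = J[min(x1 + 1, hi)]  # J_k(Ax), clamped at the upper edge
            jx = J[x1]
            total = (g(x1) + mu12 * jd + mu21 * ja
                     + lam12_max * min(jd + q12, jx)
                     + lam21_max * min(ja + q21, jx))
            J_new[x1] = total / (beta + nu)
        diff = max(abs(J_new[x1] - J[x1]) for x1 in J)
        J = J_new
        if diff < tol:
            break
    return J


def bang_bang_controls(J, x1, q12, q21, lam12_max, lam21_max):
    """Proposition 4.1 applied to a computed J (interior states only)."""
    lam12 = lam12_max if J[x1 - 1] + q12 < J[x1] else 0.0
    lam21 = lam21_max if J[x1 + 1] + q21 < J[x1] else 0.0
    return lam12, lam21
```

The padding should be large enough that states near the truncation edges are practically never visited from the states of interest; otherwise the clamping distorts the values.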

The existence of an optimal stationary policy achieving the minimum in (4.2) and the convergence of $J_k(x)$ to the optimal cost function J(x) follow from the fact that only a finite number of controls is considered at each system state (Bertsekas, 1987). In order to find the explicit form of the optimal cost function in (4.4), we first establish characteristics of the optimal cost function such as convexity, monotonicity, and asymptotic behavior. We can then characterize the control structure of the optimal stationary control policy, which can be used to solve the Bellman optimality Eq. (4.4) under this policy. Throughout this chapter, the term increasing refers to non-decreasing, and decreasing refers to non-increasing.

Lemma 4.1 Define a function f(x) as increasing in D if $f(x) \le f(Dx)$. Then the cost function has the following monotonic properties:
(i) J(Dx) − J(x) is increasing in D and decreasing in A.
(ii) J(Ax) − J(x) is increasing in A and decreasing in D.

Proof We only need to show that J(Dx) − J(x) is increasing in D, since ADx = DAx = x. By Proposition 4.2, it suffices to show by induction on k, using the iteration (4.5) and the limit (4.6), that $J_k(Dx) - J_k(x)$ is increasing in D for all k. Since $J_0(x) = 0$ for any $x \in X$, we have $J_0(Dx) - J_0(x) = 0$ and the assertion holds for k = 0. Suppose the assertion holds for k. Then,

$$\begin{aligned} J_{k+1}(Dx) - J_{k+1}(x) = (\beta + \nu)^{-1}\Big[ & (g(Dx) - g(x)) + \mu_{12}\left(J_k(D^2x) - J_k(Dx)\right) + \mu_{21}\left(J_k(ADx) - J_k(Ax)\right) \\ & + \bar{\lambda}_{12}\left(\min\{J_k(D^2x) + q_{12}, J_k(Dx)\} - \min\{J_k(Dx) + q_{12}, J_k(x)\}\right) \\ & + \bar{\lambda}_{21}\left(\min\{J_k(ADx) + q_{21}, J_k(Dx)\} - \min\{J_k(Ax) + q_{21}, J_k(x)\}\right)\Big] \end{aligned} \tag{4.7}$$

We need to show every term on the right-hand-side (RHS) of (4.7) is increasing in D. The first three terms of RHS in (4.7) are increasing in D by the definition of g


and the induction hypothesis. The fourth term of the RHS in (4.7), excluding constants, can be rewritten as

$$\min\{J_k(D^2x) + q_{12} - J_k(Dx), 0\} + \max\{-q_{12}, J_k(Dx) - J_k(x)\} \tag{4.8}$$

Since min and max preserve monotonicity, (4.8) is increasing in D by the induction hypothesis. Similarly, from ADx = DAx = x, the fifth term of the RHS in (4.7) can be rewritten as

$$\min\{q_{21}, J_k(Dx) - J_k(x)\} + \max\{J_k(DAx) - J_k(Ax) - q_{21}, 0\} \tag{4.9}$$

which is increasing in D. Thus, $J_{k+1}(Dx) - J_{k+1}(x)$ is increasing in D. This completes the induction proof.

Lemma 4.2 The cost function has the following asymptotic behavior.
(a) $\lim_{n \to +\infty}\left[J(D^{n+1}x) - J(D^n x)\right] = (c_1^- + c_2^+)/\beta$;
(b) $\lim_{n \to +\infty}\left[J(A^{n-1}x) - J(A^n x)\right] = -(c_1^+ + c_2^-)/\beta$.

Proof From Lemma 4.1, $J(D^{n+1}x) - J(D^n x)$ is increasing in n. Using Proposition 4.2 and induction on n, $\lim_{n \to +\infty}\left[J(D^{n+1}x) - J(D^n x)\right] \le (c_1^- + c_2^+)/\beta$. That means $J(D^{n+1}x) - J(D^n x)$ is bounded and converges to a finite number. From the Bellman Eq. (4.4), it is easy to show that the limit is $(c_1^- + c_2^+)/\beta$. A similar argument proves assertion (b). Note that as n increases the number of leased containers increases, and $J(D^{n+1}x) - J(D^n x)$ approaches the discounted cost of a single leased container, that is, $(c_1^- + c_2^+)/\beta$.

Lemma 4.3 The cost function has the following convex and asymptotic behavior.
(a) J(Dx) is convex in D;
(b) $\lim_{n \to +\infty} J(D^n x) = +\infty$ and $\lim_{n \to +\infty} J(A^n x) = +\infty$;
(c) $-(c_1^+ + c_2^-)/\beta \le \lim_{n \to +\infty} J(D^n x)/n \le (c_1^- + c_2^+)/\beta$;
(d) $-(c_1^- + c_2^+)/\beta \le \lim_{n \to +\infty} J(A^n x)/n \le (c_1^+ + c_2^-)/\beta$.

Proof (a)–(d) can be derived from Lemmas 4.1 and 4.2. Note that (b)–(d) are intuitive, since as n increases the number of leased containers increases linearly with n.

Definition 4.1 Define two thresholds as follows:

$$h_1^* := \min\{x_1 \mid J(Ax) - J(x) \ge -q_{21}\}, \qquad h_2^* := \max\{x_1 \mid J(Dx) - J(x) \ge -q_{12}\}$$

For any x such that $x_1 < h_1^*$, we have $J(Ax) - J(x) < -q_{21}$, that is, $J(x) - J(Ax) > q_{21}$. From Lemma 4.1, it follows that $J(Dx) - J(x) \ge J(x) - J(Ax) \ge q_{21} \ge -q_{12}$; thus, $x_1 < h_2^*$ and $h_1^* \le h_2^*$.


Proposition 4.3 The optimal control policy is of threshold type with
(a) $\lambda_{12} = 0$ if $x_1 \le h_2^*$ and $\lambda_{12} = \bar{\lambda}_{12}$ if $x_1 > h_2^*$;
(b) $\lambda_{21} = 0$ if $x_1 \ge h_1^*$ and $\lambda_{21} = \bar{\lambda}_{21}$ if $x_1 < h_1^*$.

Proof This follows immediately from Proposition 4.1, Lemma 4.2, and the definitions of $h_1^*$ and $h_2^*$.

Corollary 4.1 (a) If $(c_1^+ + c_2^-)/\beta \le q_{21}$, the optimal policy for $\lambda_{21}$ is $\lambda_{21} \equiv 0$. (b) If $(c_1^- + c_2^+)/\beta \le q_{12}$, the optimal policy for $\lambda_{12}$ is $\lambda_{12} \equiv 0$.

Proof From Lemma 4.2 and the definitions of $h_1^*$ and $h_2^*$, $h_1^* = -\infty$ if $(c_1^- + c_2^+)/\beta \le q_{12}$ and $h_2^* = +\infty$ if $(c_1^+ + c_2^-)/\beta \le q_{21}$.

Physically, Corollary 4.1 implies that if the unit cost (i.e., $q_{12}$) for transferring an empty container from depot 1 to depot 2 is more than $(c_1^- + c_2^+)/\beta$, then it is better not to transfer an empty container. This is intuitively true because in this case, the cost of repositioning an empty container will be more than the cost of leasing a container. However, in reality, we should have $(c_1^+ + c_2^-)/\beta > q_{21}$ and $(c_1^- + c_2^+)/\beta > q_{12}$, which represents the general practical case that repositioning empty containers is less expensive than continuing to lease containers in one depot while storing inventories in the other. Hence, it is reasonable to assume that $0 \le h_1^* \le h_2^* \le N$.

Now consider the system under control by a general threshold policy, i.e., replacing $h_1^*$ and $h_2^*$ in Proposition 4.3 by variables $h_1$ and $h_2$ respectively, where $0 \le h_1 \le h_2 \le N$. Then, Eq. (4.3) for the corresponding discounted cost function $J^h(x)$ under the policy $h = (h_1, h_2)$ takes the form

$$J^h(x) = (\beta + \nu)^{-1}\left[g(x) + \mu_{12}J^h(Dx) + \mu_{21}J^h(Ax) + \bar{\lambda}_{12}J^h(x) + \bar{\lambda}_{21}J^h(Ax) + \bar{\lambda}_{21}q_{21}\right], \quad \text{if } x_1 < h_1; \tag{4.10}$$

$$J^h(x) = (\beta + \nu)^{-1}\left[g(x) + \mu_{12}J^h(Dx) + \mu_{21}J^h(Ax) + \bar{\lambda}_{12}J^h(x) + \bar{\lambda}_{21}J^h(x)\right], \quad \text{if } h_1 \le x_1 \le h_2; \tag{4.11}$$

$$J^h(x) = (\beta + \nu)^{-1}\left[g(x) + \mu_{12}J^h(Dx) + \mu_{21}J^h(Ax) + \bar{\lambda}_{12}J^h(Dx) + \bar{\lambda}_{21}J^h(x) + \bar{\lambda}_{12}q_{12}\right], \quad \text{if } x_1 > h_2. \tag{4.12}$$

The threshold structure of the optimal ECR policy is given in Proposition 4.3. The remaining issue is to find the optimal threshold values $h_1^*$ and $h_2^*$. This is done by developing an explicit expression for the cost function $J^h(x)$.
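Before turning to the closed form, the cost of a given threshold policy can also be approximated numerically. The following Python sketch applies the threshold rule of Proposition 4.3 and iterates Eqs. (4.10)–(4.12) to a fixed point. As an assumption made for illustration, the state space is truncated to x1 in [−pad, N + pad] with transitions clamped at the edges; the function names are likewise hypothetical.

```python
def threshold_controls(x1, h1, h2, lam12_max, lam21_max):
    """Threshold rule: reposition 1->2 above h2 and 2->1 below h1."""
    lam12 = lam12_max if x1 > h2 else 0.0
    lam21 = lam21_max if x1 < h1 else 0.0
    return lam12, lam21


def evaluate_threshold_policy(h1, h2, N, beta, mu12, mu21,
                              lam12_max, lam21_max, q12, q21,
                              c1_plus, c1_minus, c2_plus, c2_minus,
                              pad=30, tol=1e-9, max_iter=100000):
    """Fixed-point iteration of Eqs. (4.10)-(4.12) on a truncated grid.

    Returns a dict {x1: J^h(x1, N - x1)}.
    """
    nu = mu12 + mu21 + lam12_max + lam21_max  # uniform transition rate
    lo, hi = -pad, N + pad
    J = {x1: 0.0 for x1 in range(lo, hi + 1)}
    for _ in range(max_iter):
        J_new = {}
        for x1 in range(lo, hi + 1):
            x2 = N - x1
            g = (c1_plus * max(x1, 0) + c1_minus * max(-x1, 0)
                 + c2_plus * max(x2, 0) + c2_minus * max(-x2, 0))
            lam12, lam21 = threshold_controls(x1, h1, h2,
                                              lam12_max, lam21_max)
            jd = J[max(x1 - 1, lo)]  # J^h(Dx), clamped at the lower edge
            ja = J[min(x1 + 1, hi)]  # J^h(Ax), clamped at the upper edge
            total = (g + q12 * lam12 + q21 * lam21
                     + (mu12 + lam12) * jd + (mu21 + lam21) * ja
                     + (lam12_max + lam21_max - lam12 - lam21) * J[x1])
            J_new[x1] = total / (beta + nu)
        diff = max(abs(J_new[x1] - J[x1]) for x1 in J)
        J = J_new
        if diff < tol:
            break
    return J
```

Such a numerical evaluation is slower than a closed form but offers an independent check on it for small examples.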


4.4.2 Closed-Form Objective Function and Optimal Threshold Values

Equations (4.10)–(4.12) together are sufficient to solve for the cost function $J^h(x)$ using the characteristic equation method (Feng & Yan, 2000). It consists of three steps. First, we solve the homogeneous equation and obtain the characteristic solution. Second, we derive a particular solution. Third, we construct the general solutions.

Proposition 4.4 The cost function $J^h(x)$ under a threshold policy $(h_1, h_2)$ with $0 \le h_1 \le h_2 \le N$ has the following closed form.

If $x_1 < 0$,

$$J^h(x) = w_1 r_1^{x_1} - \frac{c_1^- + c_2^+}{\beta}x_1 + \frac{q_{21}\bar{\lambda}_{21} + c_2^+ N - (\mu_{21} + \bar{\lambda}_{21} - \mu_{12})(c_1^- + c_2^+)/\beta}{\beta} \tag{4.13}$$

If $0 \le x_1 < h_1$,

$$J^h(x) = w_2 r_1^{x_1} + w_3 r_2^{x_1} + \frac{c_1^+ - c_2^+}{\beta}x_1 + \frac{q_{21}\bar{\lambda}_{21} + c_2^+ N + (\mu_{21} + \bar{\lambda}_{21} - \mu_{12})(c_1^+ - c_2^+)/\beta}{\beta} \tag{4.14}$$

If $h_1 \le x_1 \le h_2$,

$$J^h(x) = w_4 r_3^{x_1} + w_5 r_4^{x_1} + \frac{c_1^+ - c_2^+}{\beta}x_1 + \frac{c_2^+ N + (\mu_{21} - \mu_{12})(c_1^+ - c_2^+)/\beta}{\beta} \tag{4.15}$$

If $h_2 < x_1 \le N$,

$$J^h(x) = w_6 r_5^{x_1} + w_7 r_6^{x_1} + \frac{c_1^+ - c_2^+}{\beta}x_1 + \frac{q_{12}\bar{\lambda}_{12} + c_2^+ N + (\mu_{21} - \bar{\lambda}_{12} - \mu_{12})(c_1^+ - c_2^+)/\beta}{\beta} \tag{4.16}$$

If $x_1 > N$,

$$J^h(x) = w_8 r_6^{x_1} + \frac{c_1^+ + c_2^-}{\beta}x_1 + \frac{q_{12}\bar{\lambda}_{12} - c_2^- N + (\mu_{21} - \bar{\lambda}_{12} - \mu_{12})(c_1^+ + c_2^-)/\beta}{\beta} \tag{4.17}$$

where $r_1, \ldots, r_6$ are the roots of the characteristic Eqs. (4.21)–(4.23) and $w_1, \ldots, w_8$ are determined by Eqs. (4.44)–(4.51) in the proof.

Proof Equations (4.10)–(4.12) can be solved by the characteristic equation method. This consists of the following steps: (i) solving the homogeneous equations to obtain the characteristic solution; (ii) finding a particular solution to (4.10)–(4.12); (iii) constructing the general solutions to (4.10)–(4.12) and finding the undetermined coefficients based on boundary conditions. This solution method is general.

Step 1. Homogeneous equation and characteristic solution

Removing the holding and leasing costs and the empty repositioning cost in (4.10)–(4.12) results in the homogeneous equations

$$J^h(x) = (\beta + \nu)^{-1}\left[\mu_{12}J^h(Dx) + \mu_{21}J^h(Ax) + \bar{\lambda}_{12}J^h(x) + \bar{\lambda}_{21}J^h(Ax)\right] \tag{4.18}$$

$$J^h(x) = (\beta + \nu)^{-1}\left[\mu_{12}J^h(Dx) + \mu_{21}J^h(Ax) + \bar{\lambda}_{12}J^h(x) + \bar{\lambda}_{21}J^h(x)\right] \tag{4.19}$$

$$J^h(x) = (\beta + \nu)^{-1}\left[\mu_{12}J^h(Dx) + \mu_{21}J^h(Ax) + \bar{\lambda}_{12}J^h(Dx) + \bar{\lambda}_{21}J^h(x)\right] \tag{4.20}$$

The characteristic solution to this system of simultaneous difference equations can be written in terms of the operator A, defined by $A J^h(x) = J^h(Ax)$. Using this operator, and noting that D is the inverse of A because ADx = x, the system (4.18)–(4.20) becomes

$$\left[(\mu_{21} + \bar{\lambda}_{21})A^2 - (\beta + \mu_{12} + \mu_{21} + \bar{\lambda}_{21})A + \mu_{12}\right]J^h(x) = 0 \tag{4.21}$$

$$\left[\mu_{21}A^2 - (\beta + \mu_{12} + \mu_{21})A + \mu_{12}\right]J^h(x) = 0 \tag{4.22}$$

$$\left[\mu_{21}A^2 - (\beta + \mu_{12} + \mu_{21} + \bar{\lambda}_{12})A + \mu_{12} + \bar{\lambda}_{12}\right]J^h(x) = 0 \tag{4.23}$$

It can be shown that each of the characteristic Eqs. (4.21)–(4.23) has two solutions: one lies between 0 and 1 and the other is greater than 1. However, for Eq. (4.21) and $x_1 < 0$, the solution that lies between 0 and 1 cannot be used to construct the solution, because it would make the cost function grow exponentially as $x_1 \to -\infty$, which contradicts the linear increase of leasing cost and inventory cost in (4.10). By the same argument, the solution to the characteristic Eq. (4.23) that is greater than 1 cannot be used to construct the solution to the homogeneous equation for $x_1 > N$.
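The root structure described above is easy to check numerically. The Python sketch below solves the three quadratics with the standard quadratic formula and returns the roots in the order used in the text (the larger root first for each equation); the function names are illustrative assumptions.

```python
import math


def quadratic_roots(a, b, c):
    """Roots of a*z**2 - b*z + c = 0, returned as (larger, smaller)."""
    s = math.sqrt(b * b - 4.0 * a * c)
    return (b + s) / (2.0 * a), (b - s) / (2.0 * a)


def characteristic_roots(beta, mu12, mu21, lam12_max, lam21_max):
    """Roots (r1, ..., r6) of the characteristic Eqs. (4.21)-(4.23)."""
    r1, r2 = quadratic_roots(mu21 + lam21_max,
                             beta + mu12 + mu21 + lam21_max, mu12)
    r3, r4 = quadratic_roots(mu21, beta + mu12 + mu21, mu12)
    r5, r6 = quadratic_roots(mu21, beta + mu12 + mu21 + lam12_max,
                             mu12 + lam12_max)
    return r1, r2, r3, r4, r5, r6
```

Each quadratic has the form p(z) = a z² − (β + a + c) z + c with a, c > 0, so p(0) = c > 0 and p(1) = −β < 0, which is why one root falls in (0, 1) and the other exceeds 1. For instance, with β = 0.9 and all rates equal to 1, Eq. (4.22) reads z² − 2.9 z + 1 = 0, whose roots are 2.5 and 0.4.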


Now let $r_1 > 1$ and $r_2 < 1$ be the solutions to the characteristic Eq. (4.21), $r_3 > 1$ and $r_4 < 1$ be the solutions to the characteristic Eq. (4.22), and $r_5 > 1$ and $r_6 < 1$ be the solutions to the characteristic Eq. (4.23). The homogeneous solution to the system (4.10)–(4.12) can be specified using the undetermined coefficients method:

$$J^h(x) = w_1 r_1^{x_1}, \quad \text{if } x_1 < 0; \tag{4.24}$$

$$J^h(x) = w_2 r_1^{x_1} + w_3 r_2^{x_1}, \quad \text{if } 0 \le x_1 < h_1; \tag{4.25}$$

$$J^h(x) = w_4 r_3^{x_1} + w_5 r_4^{x_1}, \quad \text{if } h_1 \le x_1 \le h_2; \tag{4.26}$$

$$J^h(x) = w_6 r_5^{x_1} + w_7 r_6^{x_1}, \quad \text{if } h_2 < x_1 \le N; \tag{4.27}$$

$$J^h(x) = w_8 r_6^{x_1}, \quad \text{if } x_1 > N. \tag{4.28}$$

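Step 2. Finding a particular solution

From the format of Eqs. (4.10)–(4.12), a linear form of $J^h(x)$ appears to satisfy all equations. We assume that a particular $J^h(x)$ has the following form:

$$J^h(x) = a + b x_1 \tag{4.29}$$

Substituting this particular cost function into (4.10)–(4.12) gives the following conditions on a and b in each region.

If $x_1 < 0$,

$$-\beta b = c_1^- + c_2^+, \qquad -\beta a + b(\mu_{21} + \bar{\lambda}_{21} - \mu_{12}) = -q_{21}\bar{\lambda}_{21} - c_2^+ N \tag{4.30}$$

If $0 \le x_1 < h_1$,

$$-\beta b = -c_1^+ + c_2^+, \qquad -\beta a + b(\mu_{21} + \bar{\lambda}_{21} - \mu_{12}) = -q_{21}\bar{\lambda}_{21} - c_2^+ N \tag{4.31}$$

If $h_1 \le x_1 \le h_2$,

$$-\beta b = -c_1^+ + c_2^+, \qquad -\beta a + b(\mu_{21} - \mu_{12}) = -c_2^+ N \tag{4.32}$$

If $h_2 < x_1 \le N$,

$$-\beta b = -c_1^+ + c_2^+, \qquad -\beta a + b(\mu_{21} - \mu_{12} - \bar{\lambda}_{12}) = -\bar{\lambda}_{12}q_{12} - c_2^+ N \tag{4.33}$$

If $x_1 > N$,

$$-\beta b = -c_1^+ - c_2^-, \qquad -\beta a + b(\mu_{21} - \mu_{12} - \bar{\lambda}_{12}) = -\bar{\lambda}_{12}q_{12} + c_2^- N \tag{4.34}$$

Equations (4.29)–(4.34) together present a particular solution to the system of (4.10)–(4.12).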
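The conditions (4.30)–(4.34) can be solved for a and b region by region. The Python sketch below does this by writing the cost rate in each region as g = slope · x1 + g0 and then solving βb = slope and βa = g0 + (repositioning cost) + b · drift. The region labels and function name are illustrative assumptions.

```python
def particular_solution(region, N, beta, mu12, mu21, lam12_max, lam21_max,
                        q12, q21, c1_plus, c1_minus, c2_plus, c2_minus):
    """Coefficients (a, b) of J(x) = a + b*x1 from Eqs. (4.30)-(4.34).

    region is one of: "below_zero" (x1 < 0), "low" (0 <= x1 < h1),
    "middle" (h1 <= x1 <= h2), "high" (h2 < x1 <= N), "above_N" (x1 > N).
    """
    drift_low = mu21 + lam21_max - mu12    # repositioning 2 -> 1 active
    drift_high = mu21 - mu12 - lam12_max   # repositioning 1 -> 2 active
    # (slope of g in x1, intercept of g, repositioning cost, drift)
    table = {
        "below_zero": (-(c1_minus + c2_plus), c2_plus * N,
                       q21 * lam21_max, drift_low),
        "low": (c1_plus - c2_plus, c2_plus * N,
                q21 * lam21_max, drift_low),
        "middle": (c1_plus - c2_plus, c2_plus * N, 0.0, mu21 - mu12),
        "high": (c1_plus - c2_plus, c2_plus * N,
                 q12 * lam12_max, drift_high),
        "above_N": (c1_plus + c2_minus, -c2_minus * N,
                    q12 * lam12_max, drift_high),
    }
    slope, intercept, repositioning, drift = table[region]
    b = slope / beta
    a = (intercept + repositioning + b * drift) / beta
    return a, b
```

With symmetric holding costs the slope vanishes in the three middle regions, so the particular solution there is a constant; for example, with N = 4, β = 0.9 and unit holding costs, the "middle" region gives a = 4/0.9 and b = 0.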


Step 3. Constructing general solutions

By combining the solution to the homogeneous system and the particular solution, we can construct a general solution to the system (4.10)–(4.12): it is the sum of a general homogeneous solution and a particular solution, as given in (4.13)–(4.17). The undetermined coefficients $w_1, \ldots, w_8$ in (4.13)–(4.17) can be determined by the boundary conditions at states $x_1 = -1, 0, h_1 - 1, h_1, h_2, h_2 + 1, N$ and $N + 1$. To simplify the narrative, we introduce the following notation (with a slight abuse of notation), each defined for $x_1 \in \mathbb{Z}$:

$J_1^h(x)$ = RHS of (4.13); $J_2^h(x)$ = RHS of (4.14); $J_3^h(x)$ = RHS of (4.15); $J_4^h(x)$ = RHS of (4.16); $J_5^h(x)$ = RHS of (4.17).

Check the boundary conditions. Note that x is determined by the value of $x_1$. To simplify the narrative, we sometimes use the value of $x_1$ in place of x. At the state $x_1 = -1$, Eq. (4.10) yields

$$(\mu_{21} + \bar{\lambda}_{21})J_2^h(0) - (\beta + \mu_{12} + \mu_{21} + \bar{\lambda}_{21})J_1^h(-1) + \mu_{12}J_1^h(-2) = -g(-1) - \bar{\lambda}_{21}q_{21} \tag{4.35}$$

From the definition of $J_1^h(x)$, the above equation also holds if $J_2^h(0)$ is replaced by $J_1^h(0)$. Therefore, we have

$$J_2^h(0) = J_1^h(0) \tag{4.36}$$

At the state $x_1 = 0$, Eq. (4.10) yields

$$(\mu_{21} + \bar{\lambda}_{21})J_2^h(1) - (\beta + \mu_{12} + \mu_{21} + \bar{\lambda}_{21})J_2^h(0) + \mu_{12}J_1^h(-1) = -g(0) - \bar{\lambda}_{21}q_{21}$$

From the definition of $J_2^h(x)$, the above equation also holds if $J_1^h(-1)$ is replaced by $J_2^h(-1)$. It follows that

$$J_1^h(-1) = J_2^h(-1) \tag{4.37}$$


With similar arguments at states $x_1 = h_1 - 1, h_1, h_2, h_2 + 1, N$ and $N + 1$, we have

$$J_3^h(h_1) = J_2^h(h_1) \tag{4.38}$$

$$J_2^h(h_1 - 1) = J_3^h(h_1 - 1) \tag{4.39}$$

$$J_4^h(h_2 + 1) = J_3^h(h_2 + 1) \tag{4.40}$$

$$J_3^h(h_2) = J_4^h(h_2) \tag{4.41}$$

$$J_5^h(N + 1) = J_4^h(N + 1) \tag{4.42}$$

$$J_4^h(N) = J_5^h(N) \tag{4.43}$$

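Equations (4.36)–(4.43) can be rewritten as

$$w_1 - w_2 - w_3 = (c_1^+ + c_1^-)(\mu_{21} + \bar{\lambda}_{21} - \mu_{12})/\beta^2;$$

$$r_1^{-1}w_1 - r_1^{-1}w_2 - r_2^{-1}w_3 = (c_1^+ + c_1^-)(\mu_{21} + \bar{\lambda}_{21} - \mu_{12} - \beta)/\beta^2;$$

$$r_1^{h_1}w_2 + r_2^{h_1}w_3 - r_3^{h_1}w_4 - r_4^{h_1}w_5 = -\bar{\lambda}_{21}(c_1^+ - c_2^+ + \beta q_{21})/\beta^2;$$

$$r_1^{h_1 - 1}w_2 + r_2^{h_1 - 1}w_3 - r_3^{h_1 - 1}w_4 - r_4^{h_1 - 1}w_5 = -\bar{\lambda}_{21}(c_1^+ - c_2^+ + \beta q_{21})/\beta^2;$$

$$r_3^{h_2 + 1}w_4 + r_4^{h_2 + 1}w_5 - r_5^{h_2 + 1}w_6 - r_6^{h_2 + 1}w_7 = -\bar{\lambda}_{12}(c_1^+ - c_2^+ - \beta q_{12})/\beta^2;$$

$$r_3^{h_2}w_4 + r_4^{h_2}w_5 - r_5^{h_2}w_6 - r_6^{h_2}w_7 = -\bar{\lambda}_{12}(c_1^+ - c_2^+ - \beta q_{12})/\beta^2;$$

$$r_5^{N + 1}w_6 + r_6^{N + 1}w_7 - r_6^{N + 1}w_8 = (c_2^+ + c_2^-)(\mu_{21} - \bar{\lambda}_{12} - \mu_{12} + \beta)/\beta^2;$$

$$r_5^{N}w_6 + r_6^{N}w_7 - r_6^{N}w_8 = (c_2^+ + c_2^-)(\mu_{21} - \bar{\lambda}_{12} - \mu_{12})/\beta^2.$$

The above eight equations uniquely determine the unknown variables $w_1$–$w_8$. We have

$$w_3 = \frac{r_2(c_1^+ + c_1^-)\left[(r_1 - 1)(\mu_{21} + \bar{\lambda}_{21} - \mu_{12}) - \beta r_1\right]}{(r_2 - r_1)\beta^2} \tag{4.44}$$

$$w_6 = \frac{(c_2^+ + c_2^-)\left[(r_6 - 1)(\mu_{21} - \bar{\lambda}_{12} - \mu_{12}) - \beta\right]}{r_5^{N}(r_6 - r_5)\beta^2} \tag{4.45}$$

$$w_4 = \frac{r_4^{h_2 - h_1 + 1}(r_6 - r_4)\left[r_2^{h_1 - 1}(r_1 - r_2)w_3 + \bar{\lambda}_{21}(c_1^+ - c_2^+ + \beta q_{21})(r_1 - 1)/\beta^2\right] - (r_1 - r_4)\left[r_5^{h_2}(r_6 - r_5)w_6 - \bar{\lambda}_{12}(c_1^+ - c_2^+ - \beta q_{12})(r_6 - 1)/\beta^2\right]}{r_3^{h_1 - 1}r_4^{h_2 - h_1 + 1}(r_1 - r_3)(r_6 - r_4) - r_3^{h_2}(r_6 - r_3)(r_1 - r_4)} \tag{4.46}$$

$$w_5 = \frac{r_3^{h_2 - h_1 + 1}(r_6 - r_3)\left[r_2^{h_1 - 1}(r_1 - r_2)w_3 + \bar{\lambda}_{21}(c_1^+ - c_2^+ + \beta q_{21})(r_1 - 1)/\beta^2\right] - (r_1 - r_3)\left[r_5^{h_2}(r_6 - r_5)w_6 - \bar{\lambda}_{12}(c_1^+ - c_2^+ - \beta q_{12})(r_6 - 1)/\beta^2\right]}{r_3^{h_2 - h_1 + 1}r_4^{h_1 - 1}(r_6 - r_3)(r_1 - r_4) - r_4^{h_2}(r_1 - r_3)(r_6 - r_4)} \tag{4.47}$$

$$w_2 = r_1^{-h_1}\left[-r_2^{h_1}w_3 + r_3^{h_1}w_4 + r_4^{h_1}w_5 - \bar{\lambda}_{21}(c_1^+ - c_2^+ + \beta q_{21})/\beta^2\right] \tag{4.48}$$

$$w_7 = r_6^{-h_2}\left[r_3^{h_2}w_4 + r_4^{h_2}w_5 - r_5^{h_2}w_6 + \bar{\lambda}_{12}(c_1^+ - c_2^+ - \beta q_{12})/\beta^2\right] \tag{4.49}$$

$$w_1 = w_2 + w_3 + (c_1^+ + c_1^-)(\mu_{21} + \bar{\lambda}_{21} - \mu_{12})/\beta^2 \tag{4.50}$$

$$w_8 = r_6^{-N}\left[r_5^{N}w_6 + r_6^{N}w_7 - (c_2^+ + c_2^-)(\mu_{21} - \bar{\lambda}_{12} - \mu_{12})/\beta^2\right] \tag{4.51}$$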

Therefore, given a threshold policy with $0 \le h_1 \le h_2 \le N$, the cost function is explicitly given in (4.13)–(4.17). This completes the proof.

Now we turn to finding the optimal threshold values of $(h_1, h_2)$. Although the cost function $J^h(x)$ depends on the initial state x, the optimal threshold values do not depend on the initial state. In other words, for any given x, the optimal threshold values always achieve the minimum of $J^h(x)$. Thus, we have the following result.

Proposition 4.5 For a given initial state x, the optimal thresholds $h^* = (h_1^*, h_2^*)$ are determined by

$$J(x) = J^{h^*}(x) = \min_h J^h(x)$$

where J(x) is given in (4.2) and $J^h(x)$ is given in Proposition 4.4.

In fact, to find $h_1^*$ and $h_2^*$, we can simply take x = 0. Using the closed form of the cost function under the threshold control policy given in Proposition 4.4, the following search procedure can be used to find $h^* = (h_1^*, h_2^*)$:

(i) For each $h_1 \in [0, N]$;
(ii) For each $h_2 \in [h_1, N]$;
(iii) Evaluate the cost function $J^h(0)$ under $h = (h_1, h_2)$ using (4.14) or (4.15) and record the best one up to now;
(iv) Repeat (i)–(iii) for all combinations and return the optimal parameters $h_1^*$ and $h_2^*$.
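The search procedure (i)–(iv) amounts to an exhaustive search over the (N + 1)(N + 2)/2 admissible threshold pairs. A minimal Python sketch follows; the argument `cost_at_origin` is a hypothetical callable that returns $J^h(0)$ for a pair (h1, h2), for instance an implementation of the closed form in Proposition 4.4.

```python
def search_thresholds(N, cost_at_origin):
    """Exhaustive search over all pairs with 0 <= h1 <= h2 <= N.

    cost_at_origin(h1, h2) must return the cost J^h(0) of the pair.
    Returns (h1_best, h2_best, best_cost).
    """
    best = None
    for h1 in range(0, N + 1):           # step (i)
        for h2 in range(h1, N + 1):      # step (ii)
            cost = cost_at_origin(h1, h2)  # step (iii)
            if best is None or cost < best[0]:
                best = (cost, h1, h2)
    return best[1], best[2], best[0]     # step (iv)
```

As a usage illustration with a made-up cost surface, `search_thresholds(4, lambda h1, h2: (h1 - 1) ** 2 + (h2 - 3) ** 2)` returns the pair (1, 3) with cost 0. Ties are resolved in favor of the first pair encountered.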

Note that the initial state x(0) = x represents the initial fleet distribution over the two depots. The optimal initial fleet distribution can be determined by minimizing the cost function J(x) over x.

Proposition 4.6 For a given fleet size N, the optimal initial fleet distribution $x^*$ can be determined by

$$x^* = \arg\min_x J(x) = \arg\min_x\{x \mid J(x) > J(Dx)\}$$

For any given fleet size N, the explicit cost functions and Propositions 4.5 and 4.6 determine the optimal thresholds $h_1^*$ and $h_2^*$ and the optimal initial fleet distribution $x^*$.

4.4.3 Numerical Experiments

The numerical experiments are organized into three sections. In Section "Verify Analytical Results via Value Iteration and Simulation", the analytical results are verified by the dynamic programming value iteration algorithm and stochastic discrete-event simulation. In Section "Robustness Against Distribution Type of Empty Container Transfer Time", the robustness of the model with respect to the distribution type of empty container transfer time is evaluated. In Section "Sensitivity to the Laden Container Arrival Process", the sensitivity of the model to the distribution of the laden container arrival process is investigated. The system parameters are set as follows: $\beta = 0.9$, $\bar{\lambda}_{12} = \bar{\lambda}_{21} = 1$, $c_1^+ = c_2^+ = 1$, $q_{12} = q_{21} = 1$. The container fleet size N varies over a range so that the effect of fleet size can be evaluated. The following three cases are examined.

Case A. $\mu_{12} = \mu_{21} = 1$ and $c_1^- = c_2^- = 20$ represent a scenario with balanced laden container arrivals and symmetrical container leasing costs.

Case B. $\mu_{12} = 2$, $\mu_{21} = 1$ and $c_1^- = c_2^- = 20$ represent a scenario with imbalanced laden container arrivals and symmetrical container leasing costs. The laden container arrival rate at depot 2 is twice that at depot 1.

Case C. $\mu_{12} = 2$, $\mu_{21} = 1$, $c_1^- = 40$ and $c_2^- = 20$ represent a scenario with imbalanced laden container arrivals and asymmetrical container leasing costs. The container leasing cost at depot 1 is twice that at depot 2, which reflects the fact that there are higher trade demands from depot 1 to depot 2.


Verify Analytical Results via Value Iteration and Simulation

For the exponential distribution of ECR time, the optimal threshold values $(h_1^*, h_2^*)$, the optimal initial states ($x_1^*$), and the minimum cost values $J(x_1^*) := J(x_1^*, N - x_1^*)$ for all three cases with different fleet sizes N can be obtained analytically from the results of Sect. 4.4. Numerically, the value iteration algorithm can be applied to compute the optimal policy and the optimal cost in Propositions 4.1 and 4.2, by appropriately limiting the state space. A stochastic discrete-event simulation model is developed and applied to evaluate ECR policies by averaging the performance over multiple samples. In our experiments, 3000 replicates are used to estimate the average cost. Given the threshold structure of the empty repositioning policy, the simulation model can also be used to estimate the optimal values of the threshold parameters $h_1$ and $h_2$. This can be done by embedding the simulation into a search procedure. However, it should be pointed out that this simulation-based search process may be time consuming when the state space becomes large.

Table 4.1 compares the results across the analytic, value iteration, and simulation models. All three methods produced the same optimal values for the threshold-type policies $(h_1^*, h_2^*)$ and the optimal initial states ($x_1^*$), which are given in the second, third, and fourth columns of Table 4.1. The minimum cost values $J(x_1^*)$ under the analytical method, the minimum cost $J^*(x_1^*)$ from the value iteration method, and the average cost under the optimal threshold policy obtained by simulation are given in the last three columns of Table 4.1, respectively. We use $J^*(\cdot)$ to denote the minimum cost from the dynamic programming value iteration method because its optimality is guaranteed if the number of iterations and the state space are sufficiently large.

Table 4.1 Optimal threshold values, initial states, and costs by analytical, value iteration, and simulation methods (an asterisk marks the minimum cost in each case)

Case    N   h1*  h2*  x1*  Analytical J(x1*)  Value iteration J*(x1*)  Simulation (average cost)
Case A  3   1    2    1    6.76               6.76                     6.81
        4   1    3    2    6.35*              6.35*                    6.36*
        5   2    3    2    6.83               6.83                     6.85
        6   2    4    3    7.38               7.38                     7.37
Case B  3   2    3    2    10.66              10.66                    10.65
        4   3    3    3    9.07               9.07                     9.19
        5   3    4    4    8.63*              8.63*                    8.75*
        6   4    5    5    8.87               8.87                     8.95
Case C  4   3    4    3    11.83              11.83                    11.77
        5   4    4    4    10.21              10.21                    10.31
        6   4    5    5    9.76*              9.76*                    9.85*
        7   5    6    6    9.99               9.99                     10.06

From Table 4.1, it can be seen that the value iteration algorithm produced the same optimal cost as the analytical method. The costs obtained by simulation are very close to those of the other two methods. The entries marked with an asterisk in Table 4.1 indicate the minimum cost in each case over the container fleet sizes considered. They reveal that all three methods yield the same optimal fleet size, i.e., $N^*$ is equal to 4, 5, and 6 for Cases A, B, and C respectively. This verifies the accuracy of the analytical method.

Several managerial insights can be drawn from Table 4.1: (i) for each case, as the container fleet size N increases, both threshold values $h_1^*$ and $h_2^*$ are increasing, which indicates that there is a decreasing need to reposition empty containers to depot 1. With the same fleet size, the optimal threshold values are increasing from Case A to Case B to Case C. This is due to the fact that in Case B more containers are required to meet higher demands at depot 1, and in Case C a higher leasing cost at depot 1 works in favor of keeping more empty containers at depot 1; (ii) the optimal initial number of containers at depot 1, i.e., $x_1^*$, is half of the fleet size in Case A (rounded down when N is odd) because the system is symmetric. However, in Cases B and C, the value of $x_1^*$ is higher than half of the fleet size due to the higher demand and higher leasing cost at depot 1; (iii) for each case, the optimal cost appears to be U-shaped with respect to the fleet size N. This indicates that the optimal container fleet size can easily be found as a relatively long-term decision.

Robustness Against Distribution Type of Empty Container Transfer Time

The analytical results are derived based on the assumption that the transportation time of empty containers is exponentially distributed. In this section, we test the robustness of the model against the randomness of the empty repositioning time. Three other distribution types of ECR times are tested. The first is a deterministic repositioning time at $1/\lambda_{ij}$. The second is a uniform distribution in the interval $[0, 2/\lambda_{ij}]$. The third is a normal distribution $N(\mu, \sigma^2)$ with $\mu = 1/\lambda_{ij}$, $\sigma = 0.2\mu$, left-truncated at zero. All three types of distributions have the same or similar mean value at $1/\lambda_{ij}$.

We examine the three cases (A, B, and C) under non-exponential distributions of empty repositioning times. Firstly, for each distribution, the threshold control parameters $h_1$ and $h_2$ are optimized using the simulation-based search procedure, and the optimal cost $J(x_1^*)$ for this derived policy is calculated using the value of $x_1^*$ given in Table 4.1. In other words, $J(x_1^*)$ is the derived cost corresponding to the optimal threshold values found by the simulation-based search procedure. Secondly, the analytically derived threshold control policies that are given in Table 4.1 are applied to the non-exponential distributions of empty repositioning time, and their performances are evaluated by the simulation model. The costs corresponding to the threshold values obtained by the analytical method (i.e., columns 2 and 3 in Table 4.1) are not explicitly given in Table 4.2. Instead, their percentage costs above $J(x_1^*)$ are given in columns “%AN” in Table 4.2.
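The three test distributions, plus the exponential baseline, can be sketched as samplers that share the mean $1/\lambda_{ij}$. This is a minimal illustration only: the function names `sample_ecr_time` and `sample_mean` are hypothetical, and rejection sampling is an assumed way of implementing the left truncation at zero (with $\sigma = 0.2\mu$ it almost never rejects).

```python
import random


def sample_ecr_time(kind, rate, rng):
    """Draw one empty-repositioning time whose mean is (approximately) 1/rate."""
    mean = 1.0 / rate
    if kind == "exponential":      # baseline assumed by the analytical model
        return rng.expovariate(rate)
    if kind == "deterministic":    # constant time 1/rate
        return mean
    if kind == "uniform":          # uniform on [0, 2/rate]
        return rng.uniform(0.0, 2.0 * mean)
    if kind == "normal":           # N(mean, (0.2*mean)^2), left-truncated at zero
        sigma = 0.2 * mean
        while True:                # rejection sampling (assumed implementation)
            t = rng.gauss(mean, sigma)
            if t >= 0.0:
                return t
    raise ValueError(f"unknown distribution type: {kind}")


def sample_mean(kind, rate, n=20000, seed=1):
    """Monte Carlo estimate of the mean repositioning time."""
    rng = random.Random(seed)
    return sum(sample_ecr_time(kind, rate, rng) for _ in range(n)) / n
```

Swapping `kind` inside a simulation loop changes only the variability of the repositioning time, which is the quantity this robustness test isolates.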

Table 4.2 Comparison of results across three distributions (deterministic, uniform, and normal)

| Case | $N$ | Det $J(x_1^*)$ | %AN | Uni $J(x_1^*)$ | %AN | Norm $J(x_1^*)$ | %AN |
|---|---|---|---|---|---|---|---|
| Case A | 3 | 7.63 | 0.00 | 7.20 | 0.00 | 7.65 | 0.00 |
| Case A | 4 | 6.76 | 0.00 | 6.59 | 0.00 | 6.81 | 0.00 |
| Case A | 5 | 7.06 | 0.25 | 6.94 | 0.00 | 7.16 | 0.03 |
| Case A | 6 | 7.50 | 0.01 | 7.46 | 0.00 | 7.48 | 0.99 |
| Case B | 3 | 11.62 | 0.00 | 11.18 | 0.00 | 11.71 | 0.00 |
| Case B | 4 | 9.61 | 3.34 | 9.45 | 3.54 | 9.71 | 1.63 |
| Case B | 5 | 9.00 | 2.09 | 8.85 | 2.40 | 9.15 | 1.27 |
| Case B | 6 | 9.21 | 0.79 | 9.11 | 1.06 | 9.26 | 0.81 |
| Case C | 4 | 12.24 | 0.00 | 12.10 | 0.00 | 12.46 | 0.00 |
| Case C | 5 | 10.44 | 3.06 | 10.40 | 3.15 | 10.56 | 1.54 |
| Case C | 6 | 9.95 | 1.57 | 9.86 | 2.03 | 10.14 | 1.07 |
| Case C | 7 | 10.22 | 0.83 | 10.14 | 1.17 | 10.28 | 0.67 |

The bold values refer to the optimal solution for each case

From Table 4.2, it can be seen that the derived policy performs effectively. In most cases, its costs are no more than 1% above the optimal cost. Among all the experimented cases, the worst performance is 3.54% above the optimal cost. The optimal fleet size is 4 for Case A, 5 for Case B, and 6 for Case C across all three new distributions, which is the same as that in Table 4.1. This indicates that the optimal fleet size obtained from the exponential distribution is indeed robust against the distribution type of the empty repositioning time. If we operate at the optimal container fleet size, the value of “%AN” ranges from 0 to 2.40%. Overall, Table 4.2 shows that the derived ECR policy and fleet size are optimal or near-optimal and quite robust against the distribution type of the repositioning times. The advantage of our model over a pure simulation model is that it provides an analytical method to determine the threshold values.

Sensitivity to the Laden Container Arrival Process

When deriving the analytic threshold results, the laden container arrival process is assumed to be a Poisson process. Namely, the inter-arrival times between laden containers are exponentially distributed. We now test the sensitivity of the model to this assumption.


Suppose that the inter-arrival times between laden containers are uniformly distributed, i.e., $U(0, 2/\mu_{ij})$. In the long term, this assumption yields the laden container arrival rate $\mu_{ij}$. The ECR time takes three types of distribution as in Section “Robustness Against Distribution Type of Empty Container Transfer Time”, i.e., deterministic, uniform, and normal distributions. Similar to Section “Robustness Against Distribution Type of Empty Container Transfer Time”, we first optimize the threshold parameters $h_1$ and $h_2$ using the simulation-based optimization method and obtain the optimal cost $J(x_1^*)$. Secondly, we apply the derived threshold control policies that are given in Table 4.1 to three cases with different distribution types of empty container transfer time and evaluate their performances. The percentage of the cost above $J(x_1^*)$ is given in columns “%AN” in Table 4.3.

From Table 4.3, the derived policy appears to perform reasonably well when the laden container inter-arrival times are uniformly distributed. In most cases, its performance is within 3% above the optimal cost. The optimal fleet size in Table 4.3 is 3 for Case A, 4 for Case B, and 5 for Case C, which is slightly lower than those in Table 4.2. This indicates that the optimal fleet size based on exponential laden container inter-arrival times may be slightly overestimated if the actual inter-arrival times are uniformly distributed. This may be explained by the fact that the uniform distribution has lower variance than the exponential distribution.

Table 4.3 Comparing the costs of simulation-based optimal threshold policy with the costs of analytical threshold policy under uniform distribution for laden container arrival times

| Case | $N$ | Det $J(x_1^*)$ | %AN | Uni $J(x_1^*)$ | %AN | Norm $J(x_1^*)$ | %AN |
|---|---|---|---|---|---|---|---|
| Case A | 3 | 4.93 | 0.00 | 4.83 | 0.65 | 4.96 | 0.31 |
| Case A | 4 | 5.03 | 0.00 | 4.99 | 0.00 | 5.00 | 0.00 |
| Case A | 5 | 5.90 | 3.18 | 5.85 | 4.23 | 5.85 | 4.69 |
| Case A | 6 | 6.79 | 1.20 | 6.79 | 1.18 | 6.79 | 1.16 |
| Case B | 3 | 6.96 | 0.00 | 7.11 | 0.00 | 7.07 | 0.65 |
| Case B | 4 | 6.27 | 4.27 | 6.38 | 5.64 | 6.32 | 4.68 |
| Case B | 5 | 6.58 | 2.65 | 6.63 | 2.38 | 6.58 | 2.37 |
| Case B | 6 | 7.34 | 3.03 | 7.34 | 2.90 | 7.38 | 2.41 |
| Case C | 4 | 7.02 | 0.00 | 7.30 | 0.00 | 7.21 | 0.00 |
| Case C | 5 | 6.90 | 4.54 | 7.06 | 5.47 | 6.93 | 5.79 |
| Case C | 6 | 7.49 | 2.27 | 7.54 | 2.30 | 7.54 | 1.70 |
| Case C | 7 | 8.36 | 3.15 | 8.37 | 2.74 | 8.36 | 2.96 |

The bold values refer to the optimal solution for each case
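The variance argument used to explain the smaller fleet sizes can be made explicit with a short worked comparison. Here $T$ denotes a laden container inter-arrival time (a symbol introduced only for this illustration), and both distributions share the mean $1/\mu_{ij}$:

```latex
\begin{align*}
T \sim \mathrm{Exp}(\mu_{ij}):\quad
  & E[T] = \frac{1}{\mu_{ij}}, \qquad
    \operatorname{Var}(T) = \frac{1}{\mu_{ij}^{2}}, \\
T \sim U\!\left(0, \tfrac{2}{\mu_{ij}}\right):\quad
  & E[T] = \frac{1}{\mu_{ij}}, \qquad
    \operatorname{Var}(T) = \frac{(2/\mu_{ij})^{2}}{12} = \frac{1}{3\mu_{ij}^{2}}.
\end{align*}
```

The uniform case therefore has one third of the variance of the exponential case at the same arrival rate, which is consistent with needing slightly fewer containers as a buffer against demand variability.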


4.5 Solve the Long-Run Average Cost Case

One limitation of the discounted cost model is that its performance depends on the initial container distribution over the two depots. In practice, objective functions determined by the steady state (i.e., independent of the initial state) are often of more interest. Such objective functions include the long-run average cost and the service level (which may be defined as the fraction of demands served by owned containers). This section deals with the long-run average cost model. First, the continuous-time Markov decision problem is converted into an equivalent discrete-time Markov decision problem as in Sect. 4.3. Second, the stationary distribution of the induced dynamic system under a threshold control policy is derived. Third, we show that the threshold control policy is indeed optimal for the long-run average cost model (Song, 2005).

Following the discussions in Sect. 4.2, the long-run average cost optimization problem is to find the optimal control policy $u \in \Omega$ to minimize the following cost:

\[ V(u) = \liminf_{T \to \infty} \frac{1}{T} E \int_0^T \left[ q_{12}\lambda_{12}(x(t)) + q_{21}\lambda_{21}(x(t)) + g(x(t)) \right] dt \]

From the uniformization in Sect. 4.3, we have obtained the converted discrete-time Markov chain. As the system state $x$ is actually determined by a single integer $x_1$ because $x_2 = N - x_1$, we will use a scalar $k$ ($= x_1$) to represent the system state, following the usual convention for an integer index. Under the uniformized discrete-time Markov chain, the step cost incurred at state $k$ with control policy $u = (\lambda_{12}(k), \lambda_{21}(k))$ is given by

\[ G(k, u) = g(k) + q_{12}\lambda_{12}(k) + q_{21}\lambda_{21}(k) \]

Note that in terms of the stationary control policy set $\Omega$, the original continuous-time Markov chain problem is the same as the uniformized discrete-time Markov chain problem. Therefore, the problem is equivalent to finding an optimal stationary policy $u \in \Omega$ to minimize

\[ V(u) = \liminf_{n \to \infty} \frac{1}{n} E \sum_{i=1}^{n} G(k_i, u) \tag{4.52} \]

where $k_i$ denotes the state of the uniformized chain at step $i$.

Clearly, the induced Markov chain under any stationary policy $u \in \Omega$ is irreducible and homogeneous. We say that the system is stable if the steady-state distribution exists for some $u \in \Omega$. In the previous sections, we have shown that the optimal ECR policy is of threshold type for the discounted cost case. For the long-run average cost case, we are also interested in investigating the optimality of the threshold-type control policies. Define a subset of $\Omega$ that consists of all threshold control policies characterized by two integers $l$ and $h$, where $l \le h$.


Definition 4.2 A threshold control policy $u_{l,h}$ states that: $\lambda_{21}(k) = 0$ if $k \ge l$ and $\lambda_{21}(k) = \lambda_{21}$ if $k < l$; $\lambda_{12}(k) = \lambda_{12}$ if $k \ge h$ and $\lambda_{12}(k) = 0$ if $k < h$, where $k$ represents the system state, i.e., $x = (k, N - k)$.
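Definition 4.2 translates directly into a small state-feedback function. The sketch below is illustrative only; the name `threshold_control` and the argument names `lam12_max` and `lam21_max` (standing for the rates $\lambda_{12}$ and $\lambda_{21}$ that are switched on) are assumptions made for this example.

```python
def threshold_control(k, l, h, lam12_max, lam21_max):
    """Return (lambda12(k), lambda21(k)) under the threshold policy u_{l,h}.

    k is the number of containers associated with depot 1 (x = (k, N - k)).
    """
    if l > h:
        raise ValueError("a threshold policy requires l <= h")
    lam21 = lam21_max if k < l else 0.0    # reposition 2 -> 1 only below l
    lam12 = lam12_max if k >= h else 0.0   # reposition 1 -> 2 only at or above h
    return lam12, lam21
```

For states with $l \le k < h$ neither repositioning flow is active, so the policy never moves empties in both directions at once.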

4.5.1 Stationary Distribution Under Threshold Control Policy

Under a threshold control policy $u_{l,h}$, the induced dynamic system forms an irreducible, aperiodic, and homogeneous Markov chain. Aperiodicity follows from the fact that the self-transition probability $\mathrm{Prob}\{x \mid x\} > 0$ for $x \in X$ (cf. Sect. 4.3). Let $\pi_k = \mathrm{Prob}\{\text{state } k\}$ be the limiting state probability of the Markov chain. From probability theory, the limiting state probability in an irreducible aperiodic homogeneous Markov chain always exists and is independent of the initial state probability distribution. In addition, either $\pi_k \equiv 0$, in which case all states are transient or null recurrent; or $\pi_k > 0$, in which case all states are positive recurrent and $\{\pi_k\}$ forms a stationary distribution (i.e., steady-state probability distribution).

The following aims to derive the stationary distribution, which will then be utilized to calculate the threshold values. The stationary distribution may be found by solving a set of flow balance equations (based on the transition map under the threshold control policy) and a normalization condition. The flow balance equations are given as follows:

\[ (\mu_{21} + \lambda_{21})\pi_{k-1} + \mu_{12}\pi_{k+1} = (\mu_{12} + \mu_{21} + \lambda_{21})\pi_k, \quad \text{if } k < l; \tag{4.53} \]

\[ (\mu_{21} + \lambda_{21})\pi_{k-1} + \mu_{12}\pi_{k+1} = (\mu_{12} + \mu_{21})\pi_k, \quad \text{if } k = l; \tag{4.54} \]

\[ \mu_{21}\pi_{k-1} + \mu_{12}\pi_{k+1} = (\mu_{12} + \mu_{21})\pi_k, \quad \text{if } l < k < h - 1; \tag{4.55} \]

\[ \mu_{21}\pi_{k-1} + (\mu_{12} + \lambda_{12})\pi_{k+1} = (\mu_{12} + \mu_{21})\pi_k, \quad \text{if } k = h - 1; \tag{4.56} \]

\[ \mu_{21}\pi_{k-1} + (\mu_{12} + \lambda_{12})\pi_{k+1} = (\mu_{12} + \mu_{21} + \lambda_{12})\pi_k, \quad \text{if } k > h - 1. \tag{4.57} \]

The normalization condition is:

\[ \sum_{k=-\infty}^{\infty} \pi_k = 1. \tag{4.58} \]
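Before solving these equations analytically, they can be checked numerically. The sketch below is not the derivation used in this chapter: it swaps in the standard birth-death cut equations $\pi_{k+1}\,\text{down}(k+1) = \pi_k\,\text{up}(k)$ on a truncated range of states, then evaluates the residual of the flow balance equations. The function names and the truncation width `span` are assumptions.

```python
def up_rate(k, l, mu21, lam21):
    """Rate of moving from k to k + 1 (laden 2 -> 1, plus empties if k < l)."""
    return mu21 + (lam21 if k < l else 0.0)


def down_rate(k, h, mu12, lam12):
    """Rate of moving from k to k - 1 (laden 1 -> 2, plus empties if k >= h)."""
    return mu12 + (lam12 if k >= h else 0.0)


def stationary_by_recursion(l, h, mu12, mu21, lam12, lam21, span=200):
    """Approximate {pi_k} on [l - span, h + span] via the cut equations."""
    weights = {l: 1.0}
    for k in range(l, h + span):               # sweep upwards from l
        weights[k + 1] = (weights[k] * up_rate(k, l, mu21, lam21)
                          / down_rate(k + 1, h, mu12, lam12))
    for k in range(l, l - span, -1):           # sweep downwards from l
        weights[k - 1] = (weights[k] * down_rate(k, h, mu12, lam12)
                          / up_rate(k - 1, l, mu21, lam21))
    total = sum(weights.values())              # truncated normalization (4.58)
    return {k: w / total for k, w in weights.items()}


def balance_residual(pi, k, l, h, mu12, mu21, lam12, lam21):
    """Inflow minus outflow at state k; zero when flow balance holds."""
    inflow = (up_rate(k - 1, l, mu21, lam21) * pi[k - 1]
              + down_rate(k + 1, h, mu12, lam12) * pi[k + 1])
    outflow = (up_rate(k, l, mu21, lam21) + down_rate(k, h, mu12, lam12)) * pi[k]
    return inflow - outflow
```

The truncation is harmless only when both tails decay geometrically, which is the stability condition established in Proposition 4.7.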


The linear recursive Eq. (4.53) contains an infinite number of variables $\{\pi_k : -\infty < k < l\}$. Define a forward shifting operator $A$: $A\pi_k = \pi_{k+1}$. The recursive Eq. (4.53) can be rewritten as

\[ \left[ \mu_{12}A^2 - (\mu_{12} + \mu_{21} + \lambda_{21})A + (\mu_{21} + \lambda_{21}) \right] \pi_{k-1} = 0, \quad \text{for } k < l. \tag{4.59} \]

The two roots of the characteristic equation of (4.59) are 1 and $(\mu_{21} + \lambda_{21})/\mu_{12}$. A general solution of (4.53) is therefore

\[ \pi_k = a_1\rho_{12}^{-k} + b_1, \quad \text{for } k < l; \tag{4.60} \]

where $\rho_{12} := \mu_{12}/(\mu_{21} + \lambda_{21})$, and $a_1$ and $b_1$ are undetermined coefficients. With the same arguments, from (4.55) and (4.57), we have

\[ \pi_k = a_2\rho^{k} + b_2, \quad \text{for } l < k < h - 1; \tag{4.61} \]

\[ \pi_k = a_3\rho_{21}^{k} + b_3, \quad \text{for } k > h - 1; \tag{4.62} \]

where $\rho := \mu_{21}/\mu_{12}$ and $\rho_{21} := \mu_{21}/(\mu_{12} + \lambda_{12})$. In Eq. (4.61), we can assume $b_2 = 0$ if $\rho = 1$. The undetermined constants $a_1$, $b_1$, $a_2$, $b_2$, $a_3$, and $b_3$ can be determined by the boundary conditions and the normalization condition.

Proposition 4.7 The sufficient and necessary conditions for the existence of the stationary distribution $\{\pi_k\}$ under a threshold control policy $u_{l,h}$ are: $\rho_{12} < 1$ and $\rho_{21} < 1$. Moreover, the stationary distribution $\{\pi_k\}$ is given by

\[ \pi_k = \begin{cases} a_1\rho_{12}^{-k}, & k \le l \\ a_2\rho^{k}, & l \le k \le h - 1 \\ a_3\rho_{21}^{k}, & k \ge h - 1 \end{cases} \]

where

\[ a_1 = a_2 \cdot \left( \frac{\mu_{21}}{\mu_{21} + \lambda_{21}} \right)^{l}; \qquad a_3 = a_2 \cdot \left( \frac{\mu_{12} + \lambda_{12}}{\mu_{12}} \right)^{h-1}; \]

\[ a_2 = \frac{1}{\rho^{l}} \cdot \left[ \frac{\rho_{12}}{1 - \rho_{12}} + \frac{\rho^{h-l-1}\rho_{21}}{1 - \rho_{21}} + \sum_{k=0}^{h-l-1} \rho^{k} \right]^{-1}. \]

Proof We follow the standard arguments in Karlin and Taylor (1981). The normalization condition requires $b_1 = b_3 = 0$. From (4.53) and (4.55), the boundary condition at state $k = l$ is

\[ \pi_l = a_1\rho_{12}^{-l} = a_2\rho^{l} + b_2 \tag{4.63} \]


From (4.55) and (4.57), the boundary condition at state $k = h - 1$ yields

\[ \pi_{h-1} = a_3\rho_{21}^{h-1} = a_2\rho^{h-1} + b_2 \tag{4.64} \]

Equation (4.54) yields the boundary condition at state $k = l$:

\[ a_1\rho_{12}^{-l} = a_2\rho^{l} + b_2/\rho \tag{4.65} \]

Comparing (4.63) with (4.65), we have $b_2 = 0$. Substituting (4.60)–(4.62) into the normalization condition (4.58), it follows that a positive solution for $a_2$ exists if and only if $\rho_{12} < 1$ and $\rho_{21} < 1$, and that it can be expressed in the form given in Proposition 4.7. This completes the proof. □

The sufficient and necessary conditions given in Proposition 4.7 ensure the stability of the dynamic system in the long term. They can be interpreted as follows: for each depot, the laden container arrival rate should be less than the sum of the laden container departure rate and the maximum ECR rate. In the remainder of the chapter, we assume that $\rho_{12} < 1$ and $\rho_{21} < 1$. Therefore, the induced irreducible, aperiodic, and homogeneous Markov chain is positive recurrent and also ergodic.

Proposition 4.8 The long-run average cost under a threshold control policy $u_{l,h}$ has the following closed form:

\[ V(u_{l,h}) = a_1 q_{21}\lambda_{21} \frac{\rho_{12}^{-l+1}}{1 - \rho_{12}} + a_3 q_{12}\lambda_{12} \frac{\rho_{21}^{h}}{1 - \rho_{21}} + \sum_{k=-\infty}^{\infty} g(k)\pi_k \tag{4.66} \]

Proof From ergodic theory, the cost function (4.52) can be rewritten as

\[ V(u_{l,h}) = \sum_{k} G(k)\pi_k = \sum_{k} \left( g(k) + q_{21}\lambda_{21}(k) + q_{12}\lambda_{12}(k) \right)\pi_k = \sum_{k=-\infty}^{l-1} q_{21}\lambda_{21}\pi_k + \sum_{k=h}^{\infty} q_{12}\lambda_{12}\pi_k + \sum_{k=-\infty}^{\infty} g(k)\pi_k \]

Using Proposition 4.7, Eq. (4.66) can be derived. □

Proposition 4.9 The optimal threshold control $u_{l^*,h^*}$ can be determined by $u_{l^*,h^*} = \arg\min_{l,h} V(u_{l,h})$, where $V(u_{l,h})$ is given in Proposition 4.8.

Proposition 4.10 Under a threshold control policy $u_{l,h}$, the service level, which is defined as the fraction of demands served by the owned container fleet, can be calculated by $\sum_{k=0}^{N} \pi_k$.
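Propositions 4.7 to 4.10 can be turned into a short numerical routine. The following is a sketch under stated assumptions rather than the authors' implementation: it restricts attention to $l < h$, takes $g(k)$ to be the holding and leasing cost $c^+ k^+ + c^- k^- + c^+ (N-k)^+ + c^- (N-k)^-$ with hypothetical default values, truncates the infinite sum in (4.66) at a width `span`, and searches a finite window of thresholds. All function names are illustrative.

```python
def closed_form(l, h, mu12, mu21, lam12, lam21):
    """Coefficients and pmf of Proposition 4.7 (assumes l < h and stability)."""
    rho12 = mu12 / (mu21 + lam21)
    rho = mu21 / mu12
    rho21 = mu21 / (mu12 + lam12)
    if not (l < h and rho12 < 1.0 and rho21 < 1.0):
        raise ValueError("requires l < h, rho12 < 1 and rho21 < 1")
    bracket = (rho12 / (1.0 - rho12)
               + rho ** (h - l - 1) * rho21 / (1.0 - rho21)
               + sum(rho ** k for k in range(h - l)))
    a2 = 1.0 / (rho ** l * bracket)
    a1 = a2 * (mu21 / (mu21 + lam21)) ** l
    a3 = a2 * ((mu12 + lam12) / mu12) ** (h - 1)

    def pi(k):
        if k <= l:
            return a1 * rho12 ** (-k)
        if k <= h - 1:
            return a2 * rho ** k
        return a3 * rho21 ** k

    return a1, a3, rho12, rho21, pi


def holding_leasing_cost(k, N, c_plus=1.0, c_minus=10.0):
    """Assumed form of g(k) with x1 = k and x2 = N - k."""
    x1, x2 = k, N - k
    return (c_plus * max(x1, 0) + c_minus * max(-x1, 0)
            + c_plus * max(x2, 0) + c_minus * max(-x2, 0))


def average_cost(l, h, N, mu12, mu21, lam12, lam21, q12, q21, span=300):
    """Eq. (4.66), with the inventory sum truncated to [l - span, h + span]."""
    a1, a3, rho12, rho21, pi = closed_form(l, h, mu12, mu21, lam12, lam21)
    ecr = (a1 * q21 * lam21 * rho12 ** (-l + 1) / (1.0 - rho12)
           + a3 * q12 * lam12 * rho21 ** h / (1.0 - rho21))
    inventory = sum(holding_leasing_cost(k, N) * pi(k)
                    for k in range(l - span, h + span + 1))
    return ecr + inventory


def service_level(l, h, N, mu12, mu21, lam12, lam21):
    """Proposition 4.10: probability mass on states 0..N."""
    pi = closed_form(l, h, mu12, mu21, lam12, lam21)[4]
    return sum(pi(k) for k in range(N + 1))


def best_thresholds(N, mu12, mu21, lam12, lam21, q12, q21, lo=-3, hi=8):
    """Proposition 4.9 as a brute-force search over lo <= l < h <= hi."""
    return min((average_cost(l, h, N, mu12, mu21, lam12, lam21, q12, q21), l, h)
               for l in range(lo, hi) for h in range(l + 1, hi + 1))
```

The results depend on the assumed form of $g(k)$ and on the search window, so this sketch is not claimed to reproduce the tabulated values in the numerical examples.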


4.5.2 Optimality of the Threshold Control Policy

In this section, we show that the optimal threshold control given in Proposition 4.9 is actually an optimal policy among all stationary control policies. This can be done by applying the average cost optimality theory developed in Sennott (1999) to our problem. It has been shown that the induced Markov chain under a threshold control is positive recurrent and covers the entire state space. It then follows from Corollary 7.5.10 in Sennott (1999) that the following average cost optimality equation holds:

\[ w(k) + \frac{V^*}{\nu} = \min_{u \in \Omega} \left\{ \frac{1}{\nu} G(k, u) + \sum_{j} p_{kj} w(j) \right\} = \left[ g(k) + \mu_{12} w(k-1) + \mu_{21} w(k+1) + \lambda_{12} \min\{w(k-1) + q_{12}, w(k)\} + \lambda_{21} \min\{w(k+1) + q_{21}, w(k)\} \right] / \nu \tag{4.67} \]

where $V^*$ is the optimal average cost, $w(k)$ is a finite function, $p_{kj}$ is the state transition probability from state $k$ to state $j$ after uniformization, and $\nu$ is the uniformization transition rate. The following result is immediate.

Proposition 4.11 The optimal stationary control policy $u = (\lambda_{12}(k), \lambda_{21}(k))$ can be stated by: $\lambda_{12}(k) = \lambda_{12}$ if $w(k-1) - w(k) < -q_{12}$, and $\lambda_{12}(k) = 0$ otherwise; $\lambda_{21}(k) = \lambda_{21}$ if $w(k+1) - w(k) < -q_{21}$, and $\lambda_{21}(k) = 0$ otherwise.

Proposition 4.12 $w(k+1) - w(k)$ is non-decreasing in $k$.

Proof Replace the average cost problem with a discounted cost one for the same Markov chain. Let $V_\beta(k)$ denote the discounted cost function with discount factor $\beta$. Since $V_\beta(k+1) - V_\beta(k)$ is non-decreasing in $k$ by Lemma 4.1, and $w(k) = \lim_{\beta \to 0} \left[ V_\beta(k) - V_\beta(0) \right]$ by Sennott (1999), the assertion is true. □

Proposition 4.13 The optimal stationary policy is a threshold control $u_{l^*,h^*}$ with $l^* \le h^*$.

Proof Define two threshold values as follows: $l^* := \min\{k \mid w(k+1) - w(k) \ge -q_{21}\}$ and $h^* := \min\{k \mid w(k) - w(k-1) > q_{12}\}$. From Propositions 4.11 and 4.12, the optimal stationary policy can be described by: $\lambda_{21}(k) = \lambda_{21}$ if $k < l^*$ and $\lambda_{21}(k) = 0$ if $k \ge l^*$; $\lambda_{12}(k) = \lambda_{12}$ if $k \ge h^*$ and $\lambda_{12}(k) = 0$ if $k < h^*$. Therefore, the optimal stationary policy is of threshold control type. In addition, from Proposition 4.12 and the definitions of $l^*$ and $h^*$, we have: $w(h^* + 1) - w(h^*) \ge w(h^*) - w(h^* - 1) > q_{12} \ge -q_{21}$. It follows that $l^* \le h^*$. This completes the proof. □

In fact, the result $l^* \le h^*$ for the optimal threshold control is intuitive, because it would not be beneficial to reposition empty containers simultaneously from depot 1 to depot 2 and from depot 2 to depot 1 in our setting.
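The threshold definitions in the proof of Proposition 4.13 can be illustrated with a few lines of code. In this sketch the function `w` is a hypothetical convex stand-in for the relative value function (the chapter does not give it in closed form), and the search is limited to a finite window `[k_min, k_max]` that is assumed to contain both thresholds.

```python
def thresholds_from_w(w, q12, q21, k_min, k_max):
    """Return (l*, h*) from a relative value function w.

    l* = min{k : w(k+1) - w(k) >= -q21}
    h* = min{k : w(k) - w(k-1) >  q12}
    """
    window = range(k_min, k_max + 1)
    l_star = next(k for k in window if w(k + 1) - w(k) >= -q21)
    h_star = next(k for k in window if w(k) - w(k - 1) > q12)
    return l_star, h_star


def illustrative_w(k):
    """Hypothetical convex function used only to exercise the definitions."""
    return float(k * k)
```

Because the increments of a convex `w` are non-decreasing, each condition switches from false to true exactly once, which is why a single pair of integers describes the whole policy.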

4.5.3 Numerical Examples

This section presents numerical examples to demonstrate the results. Consider a two-depot shuttle system with the following parameter setting: $\lambda_{12} = \lambda_{21} = 1$, $c_1^+ = c_2^+ = 1$, $c_1^- = c_2^- = 10$, $q_{12} = q_{21} = 1$, $\mu_{12} = 0.5$, and $\mu_{21} = 1$. From Proposition 4.8, the long-run average costs under different threshold control policies $u_{l,h}$ are calculated and given in Table 4.4, where the fleet size $N$ equals 4. The cases with $l > h$ are ignored (because they are not optimal by Proposition 4.13). It can be seen from Table 4.4 that the optimal threshold values are $l^* = 1$, $h^* = 2$. The result is in agreement with the intuition that fewer containers should be stored in depot 1 because the demand rate from depot 1 to depot 2 is less than that from depot 2 to depot 1 and all other parameters are the same for the two depots.

Now vary the fleet size from 1 to 5. The optimal threshold values, optimal average costs, and corresponding service levels (defined in Proposition 4.10) are given in Table 4.5. It can be seen that, with the given system parameter settings, the optimal fleet size is 3 (the minimum average cost in Table 4.5), which is shown in bold font in Table 4.5. Moreover, the service level increases as the container fleet size increases. The point is that the model is able to obtain the optimal threshold control policy and the service level analytically, and can also answer the fleet sizing problem, which is another important decision problem in transportation research.

Table 4.4 Average costs under different threshold control policies $u_{l,h}$

| | $l = -1$ | $l = 0$ | $l = 1$ | $l = 2$ |
|---|---|---|---|---|
| $h = -1$ | 3.4994 | | | |
| $h = 0$ | 2.4512 | 2.0122 | | |
| $h = 1$ | 1.5107 | 1.1837 | 1.2942 | |
| $h = 2$ | 1.1825 | 1.0189 | 1.0152 | 1.2492 |
| $h = 3$ | 1.1041 | 1.0257 | 1.0248 | 1.0937 |
| $h = 4$ | 1.2301 | 1.1950 | 1.2014 | 1.2454 |

The bold values refer to the optimal solution for each case


Table 4.5 Optimal threshold values, average costs, and service levels with varying fleet sizes

| $N$ | $l^*$ | $h^*$ | $J^*$ | Service level (%) |
|---|---|---|---|---|
| 1 | 0 | 1 | 1.0840 | 75.00 |
| 2 | 0 | 1 | 0.9801 | 83.57 |
| 3 | 0 | 2 | 0.9312 | 90.88 |
| 4 | 1 | 2 | 1.0152 | 95.93 |
| 5 | 1 | 3 | 1.1125 | 97.38 |

4.6 Extension to Cases with External Supply and Demand

Containers are not necessarily consolidated and unpacked at depots. When laden containers are dispatched to customers and empty containers are required from depots to the customers’ warehouses for consolidation, the system will face the additional uncertainty of external supply and demand of empty containers at each depot. This is because it is often unknown when the laden containers will become empty and be returned, and when a new request for empty containers will be received. As a result, the container fleet size in the two-depot system will vary. This section extends the proposed models to the cases with external supply and demand of empty containers. The following additional notations are introduced.

$a_i$: the empty container arrival rate from external customers to depot $i$ in the Poisson arrival process.

$d_i$: the empty container request rate by external customers from depot $i$ in the Poisson arrival process.

The discounted-cost optimal ECR problem can be formulated like (4.1) and (4.2), with the inventory and leasing cost

\[ g(x(t)) = c_1^+ x_1^+(t) + c_1^- x_1^-(t) + c_2^+ x_2^+(t) + c_2^- x_2^-(t) \]

Let $X = \{x = (x_1, x_2) \mid x_1, x_2 \in Z\}$ be the system state space. Let $\Omega = \{u := (\lambda_{12}(t), \lambda_{21}(t)) \mid \lambda_{12}(t) \in \{0, \ldots, \lambda_{12}\}, \lambda_{21}(t) \in \{0, \ldots, \lambda_{21}\}, t \in (0, \infty)\}$ be the admissible control set consisting of all stationary control policies (i.e., state feedback policies). The optimization problem is to find the optimal control policy $u \in \Omega$ to minimize the infinite-horizon expected discounted cost starting from an initial state $x$:

\[ J(x) = \min_{u} \left[ E\left( \int_0^{\infty} e^{-\beta t} \left[ q_{12}\lambda_{12}(t) + q_{21}\lambda_{21}(t) + g(x(t)) \right] dt \,\Big|\, x(0) = x \right) \right] \]

where $0 < \beta < 1$ is a discount factor.


Fig. 4.3 The system state transition map in two-depot system with external flows (from state $(x_1, x_2)$ to the neighbouring states $(x_1 + 1, x_2)$, $(x_1 - 1, x_2)$, $(x_1, x_2 + 1)$, $(x_1, x_2 - 1)$, $(x_1 - 1, x_2 + 1)$, and $(x_1 + 1, x_2 - 1)$)

Note that we now have four additional events representing the empty container supply and demand arrivals at both depots. The system state transition map of this continuous-time Markov chain model with external flows is illustrated in Fig. 4.3.

The continuous-time Markov chain model in Fig. 4.3 can be transformed into an equivalent discrete-time Markov chain model by using the uniformization technique with the uniform transition rate $\nu = \mu_{12} + \mu_{21} + \lambda_{12} + \lambda_{21} + a_1 + a_2 + d_1 + d_2$. Then, under an admissible control policy $u \in \Omega$, the one-step transition probability function $\mathrm{Prob}(y \mid x, u)$ at each state $x \in X$ is given as follows:

$\mathrm{Prob}(y \mid x, u) = (\mu_{21} + \lambda_{21}(x))/\nu$, if $y = (x_1 + 1, x_2 - 1)$;
$\mathrm{Prob}(y \mid x, u) = (\mu_{12} + \lambda_{12}(x))/\nu$, if $y = (x_1 - 1, x_2 + 1)$;
$\mathrm{Prob}(y \mid x, u) = a_1/\nu$, if $y = (x_1 + 1, x_2)$;
$\mathrm{Prob}(y \mid x, u) = d_1/\nu$, if $y = (x_1 - 1, x_2)$;
$\mathrm{Prob}(y \mid x, u) = a_2/\nu$, if $y = (x_1, x_2 + 1)$;
$\mathrm{Prob}(y \mid x, u) = d_2/\nu$, if $y = (x_1, x_2 - 1)$;
$\mathrm{Prob}(y \mid x, u) = (\lambda_{12} + \lambda_{21} - \lambda_{12}(x) - \lambda_{21}(x))/\nu$, if $y = x$;
$\mathrm{Prob}(y \mid x, u) = 0$, otherwise.

To simplify the notation, define $A_1 x := (x_1 + 1, x_2)$, $A_2 x := (x_1, x_2 + 1)$, $D_1 x := (x_1 - 1, x_2)$, $D_2 x := (x_1, x_2 - 1)$. Following the stochastic dynamic programming theory, the Bellman optimality equation for the minimum expected discounted cost $J(x)$ can be written as

\[ J(x) = (\beta + \nu)^{-1} \min_{\lambda_{12}(x), \lambda_{21}(x)} \left[ g(x) + q_{12}\lambda_{12}(x) + q_{21}\lambda_{21}(x) + a_1 J(A_1 x) + a_2 J(A_2 x) + d_1 J(D_1 x) + d_2 J(D_2 x) + \mu_{12} J(D_1 A_2 x) + \mu_{21} J(A_1 D_2 x) + \lambda_{12}(x) J(D_1 A_2 x) + \lambda_{21}(x) J(A_1 D_2 x) + \left( \lambda_{12} + \lambda_{21} - \lambda_{12}(x) - \lambda_{21}(x) \right) J(x) \right] \]

The above equation can be further simplified as

\[ J(x) = (\beta + \nu)^{-1} \left[ g(x) + a_1 J(A_1 x) + a_2 J(A_2 x) + d_1 J(D_1 x) + d_2 J(D_2 x) + \mu_{12} J(D_1 A_2 x) + \mu_{21} J(A_1 D_2 x) + \lambda_{12} \min\{J(D_1 A_2 x) + q_{12}, J(x)\} + \lambda_{21} \min\{J(A_1 D_2 x) + q_{21}, J(x)\} \right] \]

We have now derived the equivalent discrete-time Markov chain model. This model is tractable at least numerically. It should be pointed out that the owned container fleet size is no longer fixed, because owned containers can exit and enter the two-depot shuttle system randomly.
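To illustrate the numerical tractability of this model, the simplified Bellman equation can be iterated on a truncated state space. The sketch below makes several assumptions that are not part of the chapter: the state space is cut to $[-M, M]^2$, transitions that would leave it are clamped to the boundary, and the cost coefficients and parameter dictionary keys are hypothetical. Because the right-hand side is a contraction with modulus $\nu/(\beta + \nu)$, repeated sweeps converge.

```python
import itertools


def inventory_cost(x1, x2, c_plus=1.0, c_minus=10.0):
    """g(x) = c+ x1+ + c- x1- + c+ x2+ + c- x2- (coefficients are assumed)."""
    return (c_plus * max(x1, 0) + c_minus * max(-x1, 0)
            + c_plus * max(x2, 0) + c_minus * max(-x2, 0))


def bellman_sweep(J, p, M):
    """One application of the simplified Bellman operator on [-M, M]^2."""
    v = (p["mu12"] + p["mu21"] + p["lam12"] + p["lam21"]
         + p["a1"] + p["a2"] + p["d1"] + p["d2"])   # uniformization rate

    def clamp(z):
        return max(-M, min(M, z))                   # assumed boundary handling

    new_J = {}
    for (x1, x2), stay in J.items():
        def at(dx1, dx2):
            return J[(clamp(x1 + dx1), clamp(x2 + dx2))]

        move12 = at(-1, +1)                         # J(D1 A2 x)
        move21 = at(+1, -1)                         # J(A1 D2 x)
        total = (inventory_cost(x1, x2)
                 + p["a1"] * at(+1, 0) + p["a2"] * at(0, +1)
                 + p["d1"] * at(-1, 0) + p["d2"] * at(0, -1)
                 + p["mu12"] * move12 + p["mu21"] * move21
                 + p["lam12"] * min(move12 + p["q12"], stay)
                 + p["lam21"] * min(move21 + p["q21"], stay))
        new_J[(x1, x2)] = total / (p["beta"] + v)
    return new_J


def value_iteration(p, M=6, sweeps=400):
    """Iterate from J = 0; returns an approximate fixed point."""
    J = {x: 0.0 for x in itertools.product(range(-M, M + 1), repeat=2)}
    for _ in range(sweeps):
        J = bellman_sweep(J, p, M)
    return J
```

At a converged `J`, the repositioning decision at state `x` can be read off by checking which argument attains each `min`, in the same way as in the fixed-fleet case.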

4.7 Summary and Notes

This chapter addresses the ECR problem in a two-depot service system. An event-driven model is formulated based on continuous review and discrete state. The ECR decisions are made at each event when the system state changes. We take the perspective of the infinite time horizon to model the dynamic system with the emphasis on seeking the optimal stationary ECR policy. Both the discounted cost version and the long-run average cost version are investigated. The closed forms of the optimal cost functions and the optimal ECR policies are obtained. It is shown that the optimal ECR policies are threshold policies, characterized by two control parameters, in both the discounted cost and the long-run average cost cases. This provides theoretical support for the use of simple threshold types of control policies in practice because of their optimality in certain service systems. Another advantage of the models in this chapter is that they enable us to find the optimal threshold values analytically. Although the analytical results are derived based on the assumption that empty container transfer times are exponentially distributed and the laden container arrivals are Poisson processes, numerical experiments show that the model is fairly robust against variations in the distributions of empty container repositioning time and against variations in the laden container arrival processes.

The models can be extended to more complicated transport systems such as hub-and-spoke systems (Song & Carter, 2008), which will be discussed in the next chapter.

References

Bertsekas, D. P. (1987). Dynamic programming: Deterministic and stochastic models. Prentice-Hall.
Cheung, R. K., & Powell, W. B. (1996). An algorithm for multistage dynamic networks with random arc capacities, with an application to dynamic fleet management. Operations Research, 44(6), 951–963.
Choong, S. T., Cole, M. H., & Kutanoglu, E. (2002). Empty container management for intermodal transportation networks. Transportation Research Part E, 38(6), 423–438.
Deb, R. K. (1978). Optimal dispatching of a finite capacity shuttle. Management Science, 24(13), 1362–1372.
Deb, R. K., & Schmidt, C. P. (1987). Optimal average cost policies for the two-terminal shuttle. Management Science, 33(5), 662–669.
Du, Y. F., & Hall, R. (1997). Fleet sizing and empty equipment redistribution for center-terminal transportation networks. Management Science, 43(2), 145–157.
Feng, Y. Y., & Yan, H. M. (2000). Optimal production control in a discrete manufacturing system with unreliable machines and random demands. IEEE Transactions on Automatic Control, 45(12), 2280–2296.
Karlin, S., & Taylor, H. M. (1981). A second course in stochastic processes. Academic Press.
Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. Wiley.
Sennott, L. I. (1999). Stochastic dynamic programming and the control of queueing systems. Wiley.
Sethi, S., & Zhang, Q. (1994). Hierarchical decision making in stochastic manufacturing systems. Birkhauser.
Song, D. P. (2005). Optimal threshold control of empty vehicle redistribution in two depot service systems. IEEE Transactions on Automatic Control, 50(1), 87–90.
Song, D. P. (2013). Optimal control and optimization in stochastic supply chain systems. Springer.
Song, D. P., & Carter, J. (2008). Optimal empty vehicle redistribution for hub-and-spoke transportation systems. Naval Research Logistics, 55(2), 156–171.
Song, D. P., & Earl, C. F. (2008). Optimal empty vehicle repositioning and fleet-sizing for two-depot service systems. European Journal of Operational Research, 185(2), 760–777.

Chapter 5

Optimal and Near-Optimal ECR Policies in Hub-and-Spoke Systems: Continuous Review

Abstract This chapter considers the ECR problem in a hub-and-spoke transportation system over an infinite time horizon. Similar to the methodology in Chap. 4, we take the perspective of continuous review and discrete state to formulate an event-driven Markov decision model. The empty repositioning decisions are made at each epoch when the system state changes. To overcome the computational complexity of the stochastic dynamic programming model, we propose a dynamic decomposition procedure, whose computational complexity is linear in the number of spokes and which can be calculated offline. The requirement for online calculation and data communication is very low. We analyze the structures of the dynamic decomposition policy and show that the dynamic decomposition policy has the same asymptotic behaviors as the optimal ECR policy. The proposed dynamic decomposition procedure can be applied to both discounted cost and long-run average cost cases. Numerical experiments demonstrate the effectiveness of the dynamic decomposition policy and its robustness against the assumption of the distribution types in terms of the laden container arrivals and the empty container transfer times. The model is then extended to the cases with external supply and demand of empty containers at all depots, where empty containers may exit and enter the system randomly.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 D.-P. Song and J. Dong, Modelling Empty Container Repositioning Logistics, https://doi.org/10.1007/978-3-030-93383-8_5

5.1 Introduction

In the last chapter, we investigated the ECR problem in a two-depot service system over an infinite time horizon. Under certain assumptions, we proved that the optimal state feedback policy is of threshold type characterized by two control parameters, and an explicit form of the optimal cost function was established. This chapter extends the above work to a hub-and-spoke system that is similar to Du and Hall (1997). A hub-and-spoke distribution network resembles the structure of a bicycle wheel, where the center of the wheel is the hub and each spoke represents a direction of delivery or pickup. Hub-and-spoke transportation systems have attracted a lot of attention in inland logistics management. Du and Hall (1997) used a single-value threshold control policy to redistribute empties in a hub-and-spoke network with random demands and deterministic travel times. Their model is based on the assumptions that the number of terminals is large and terminal stockout probabilities are low.

Transportation service systems are discrete event dynamic systems (Cassandras & Lafortune, 1999) and their dynamics can be represented by the state changes over time. We focus on the state feedback control policies. The system state is represented by the inventory level of empty containers at each depot. The ECR decisions are made at each epoch when the system state changes. It is assumed that customer demands can always be satisfied by either owned containers or leased containers if there are no owned empty containers available at a depot. Our purpose is to design effective ECR policies for hub-and-spoke systems subject to random demands and random repositioning times in dynamic situations. This chapter is based on Song and Carter (2008).

Hub-and-spoke systems are quite common in hinterland container transportation. Typically, a major seaport can be regarded as a hub depot, and associated inland depots can be regarded as spoke depots. For example, in the UK, Felixstowe port is a hub depot and it is closely linked to multiple inland depots such as Birch Coppice, Birmingham, Ditton, Doncaster, Hams Hall, Manchester, and Leeds. The unique feature of such systems is that there are significant container flows between the hub and the spokes, whereas there is little container flow between spokes. This reflects the phenomenon that importers and exporters often perform consolidation or unpacking at inland depots, and the laden and empty containers mainly move between inland depots and seaport depots because goods are imported or exported through seaports.

The chapter is organized as follows. First, the problem is formulated as an event-driven model and converted into a discrete-time Markov decision process. Second, stochastic dynamic programming is applied to seek the optimal feedback control policy for empty container repositioning. Third, a dynamic decomposition procedure with linear computational complexity is proposed to yield a near-optimal policy. This is done by decomposing the hub-and-spoke system into a series of two-depot subsystems where analytical results are available. The structural properties of the resulting control policy are then addressed. Fourth, numerical examples are given to demonstrate the results. We then extend the model to the cases with external supply and demand of empty containers at all depots, where empty containers can exit and enter the system randomly. Finally, a summary and a brief note are provided.

5.2 Problem Formulation and Uniformization

Consider a hub-and-spoke system, which consists of a hub (i.e., a center depot) and n spokes (i.e., terminal depots) connected to the hub. A fleet of containers is owned by the transport company to satisfy exogenous transportation demands. A transportation demand is defined as a requirement of moving goods from the original depot to the destination depot, i.e., from the hub to one spoke or from one spoke to the hub. There is no direct container flow (either laden or empty) between any two spokes.


Fig. 5.1 A hub-and-spoke transportation system (a hub linked to spokes 1, 2, …, i, …, n by laden and empty container flows)

Figure 5.1 illustrates the container flows in the hub-and-spoke system, where solid lines indicate laden container flows and dashed lines indicate empty container flows. Define the notation as follows:

$N$: the fleet size, which represents the total number of owned containers.
$n$: the number of spokes in the hub-and-spoke system.
$d_{ij}$: the laden container arrival rate from depot $i$ to depot $j$ in the Poisson arrival process. Here index $i = 0$ indicates the hub, and index $i \in \{1, 2, \ldots, n\}$ represents spoke $i$.
$\lambda_{ij}(t)$: the control parameter at time $t$ for repositioning an empty container from depot $i$ to depot $j$. The empty repositioning time is exponentially distributed with mean $1/\lambda_{ij}$. The parameter $\lambda_{ij}(t)$ takes a value from a finite set, $\{0, \ldots, r_{ij}\}$, where $r_{ij}$ is the maximum element in the set.
$x_i(t)$: the cumulative number of the containers at depot $i$ (including containers on the way to the other depot) at time $t$.
$q_{ij}$: the cost of empty container repositioning from depot $i$ to $j$.
$c_i^+$: the routine inventory or maintenance costs of containers associated with depot $i$ per container per unit time.
$c_i^-$: the leasing cost of containers that are leased from depot $i$ per container per unit time.

It is assumed that when a laden container reaches the destination depot, it will be unloaded and unpacked, and become empty and available for reusing immediately. Empty containers can either be stored as inventory to satisfy a future demand at the current depot or be repositioned to a connected depot. If there is no empty container available when a demand arrives, an empty container will be leased from lessors immediately at the original depot by paying the leasing cost. The leasing cost is assumed to be proportional to the leasing time (which means it will be charged until a container is returned). The leased container must be returned to the original depot where it was leased. We assume that a container leased from one spoke will not be used to fulfill a demand from the hub to another spoke. This is due to the assumption that we can always lease another container at the hub to avoid the incurred delay to return the leased container to the original spoke.


There are three major sources of uncertainty: the occurrence of customer demand (i.e., a request for an empty container) at each depot, the transportation time of a laden container, and the transportation time of an empty container. Since customer demands can always be satisfied, the laden container arrival process is a combination of the customer demand process and the laden container transportation time. It is assumed that the laden container arrival process from the hub to spoke $i$ (from spoke $i$ to the hub) forms an independent homogeneous Poisson flow with a constant rate $d_{0i}$ ($d_{i0}$). This simplifies the problem formulation because we need not consider demand occurrence and laden container transportation time separately. The time to move an empty container from the hub to spoke $i$ (from spoke $i$ to the hub) is exponentially distributed with rate $\lambda_{0i}$ ($\lambda_{i0}$). The rate $\lambda_{ij}$ is a controllable variable subject to $\lambda_{ij} \in [0, r_{ij}]$. The empty repositioning cost is proportional to the selected rate $\lambda_{ij}$.

Let $x_i(t)$ denote the accumulative number of containers at spoke $i$ from time 0 to time $t$ for $i = 1, 2, \ldots, n$. Let $x_0(t)$ denote the accumulative number of containers at the hub from time 0 to time $t$, excluding those containers leased from spokes. Define $x_i^+ := \max(x_i, 0)$ and $x_i^- := \max(-x_i, 0)$. Clearly, $x_i^+(t)$ represents the number of containers stored at spoke $i$ at time $t$, and $x_i^-(t)$ represents the number of containers leased from spoke $i$ and stored at the hub at time $t$. Since there are $N$ owned containers in the system, we have $x_0(t) = N - x_1^+(t) - x_2^+(t) - \cdots - x_n^+(t)$. More specifically, $x_0^+(t)$ represents the number of owned containers at the hub and $x_0^-(t)$ represents the number of containers that are leased from the hub and stored at all spokes at time $t$. The system state can therefore be described by a vector $x := (x_0, x_1, x_2, \ldots, x_n)$. Although $x_0$ is not an independent variable, it is convenient to keep it in the notation.

The evolution of the system state is driven by two types of events, i.e., the arrivals of laden containers and the arrivals of repositioned empty containers. Due to the memoryless property of the Poisson process and the exponential distribution, a container in transit whose journey is interrupted by an event is statistically equivalent to one restarting from its depot of departure. The resulting state transitions are described by two operators $T_{0i}$ and $T_{i0}$, defined as follows:

• $T_{0i}x = y := (y_0, y_1, y_2, \ldots, y_n)$ with $y_i = x_i + 1$, $y_0 = N - \sum_{j=1}^{n} y_j^+$, and $y_j = x_j$ for $j \neq 0, i$.
• $T_{i0}x = y := (y_0, y_1, y_2, \ldots, y_n)$ with $y_i = x_i - 1$, $y_0 = N - \sum_{j=1}^{n} y_j^+$, and $y_j = x_j$ for $j \neq 0, i$.

The system state transition map is partly illustrated in Fig. 5.2, where the transition rates $\lambda_{0i}$ and $\lambda_{i0}$ are controllable decisions and depend on the current state $x$.

Fig. 5.2 The system state transition map (from state $x$, transitions to $T_{0i}x$ at rate $d_{0i} + \lambda_{0i}$ and to $T_{i0}x$ at rate $d_{i0} + \lambda_{i0}$, for $i = 1, \ldots, n$)

Let $X = \{x \mid x_0 = N - x_1^+ - x_2^+ - \cdots - x_n^+ \text{ and } x_i \in \mathbb{Z} \text{ for } i = 1, 2, \ldots, n\}$ be the system state space. Let $\lambda_{ij}(t)$ be the empty repositioning rate at time $t$. We only consider state feedback control policies (i.e., stationary policies). Therefore, $\lambda_{ij}(t)$ should be understood as $\lambda_{ij}(x(t))$, where $x(t)$ represents the current system state at time $t$. Let $\Omega = \{u(t) := (\lambda_{01}(t), \lambda_{02}(t), \ldots, \lambda_{0n}(t), \lambda_{10}(t), \lambda_{20}(t), \ldots, \lambda_{n0}(t)) \mid \lambda_{0i}(t) \in [0, r_{0i}] \text{ and } \lambda_{i0}(t) \in [0, r_{i0}] \text{ for } t \in (0, \infty),\ i = 1, 2, \ldots, n\}$ be the admissible control set. The problem is to find the optimal feedback control policy $u \in \Omega$ to minimize the following infinite-horizon expected discounted cost:

$$J(x) = \min_{u \in \Omega} \left[ E_u\!\left( \int_0^\infty e^{-\beta t}\, G(x(t), u(t))\, dt \,\Big|\, x(0) = x \right) \right] \tag{5.1}$$

where $0 < \beta < 1$ is a discount factor, $x$ is the initial system state, and $G(x(t), u(t))$ represents the container holding costs, leasing costs and empty container repositioning costs, which may be defined as

$$G(x(t), u(t)) = g(x(t)) + \sum_{i=1}^{n} \big( q_{0i}\lambda_{0i}(t) + q_{i0}\lambda_{i0}(t) \big)$$

where $q_{0i} \ge 0$ and $q_{i0} \ge 0$ are cost constants for empty repositioning rates and $g(x(t))$ represents the container holding and leasing costs, defined by

$$g(x(t)) := \sum_{i=1}^{n} \big( c_i^+ x_i^+(t) + c_i^- x_i^-(t) \big) + \sum_{i=1}^{n} c_0^+ x_i^-(t) + c_0^+ x_0^+(t) + c_0^- x_0^-(t) \tag{5.2}$$

where $c_i^+$ and $c_i^-$ are the holding cost and leasing cost per container per unit time at depot $i$. The holding costs per unit for an owned container and for a leased container are assumed to be the same at the same depot. The first term on the right-hand side of (5.2) represents the container holding and leasing costs at all spokes, the second term represents the holding costs for containers leased from a spoke and stored at the hub, and the third term represents the container holding and leasing costs at the hub.

Similar to Chap. 4, by the uniformization technique (Bertsekas, 1987; Puterman, 1994), the continuous-time Markov chain problem (5.1) can be transformed into an equivalent discrete-time problem. Define the uniform transition rate as $v = \sum_{i=1}^{n} (d_{i0} + d_{0i} + r_{i0} + r_{0i})$. Under an admissible control policy $u \in \Omega$, the one-step transition probability $P(y \mid x, u)$ is given as follows:


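$$P(T_{0i}x \mid x, u) = (\lambda_{0i} + d_{0i})/v, \qquad P(T_{i0}x \mid x, u) = (\lambda_{i0} + d_{i0})/v,$$

$$P(x \mid x, u) = \Big( v - \sum_{i=1}^{n} (d_{0i} + d_{i0} + \lambda_{0i} + \lambda_{i0}) \Big)/v,$$

$$P(y \mid x, u) = 0, \text{ otherwise.}$$

Let $0 = t_0 < t_1 < \cdots < t_k < \cdots$ be the state transition epochs, and $x_k := x(t_k)$ be the destination state of the $k$th transition. Let $u_k := u(t_k)$ denote the empty redistribution rates at time $t_k$. Computing the cost function for the given initial condition $x(0) = x$ and control policy $u(t)$, we have

$$E \int_0^\infty e^{-\beta t}\, G(x(t), u(t))\, dt = \frac{1}{\beta + v}\, E \sum_{k=0}^{\infty} \left( \frac{v}{\beta + v} \right)^{k} G(x_k, u_k)$$

Therefore, the problem is transformed into a discrete-time Markov chain problem with non-negative unbounded cost per step and a countably infinite state space. Following stochastic dynamic programming theory, the Bellman optimality equation is

$$\begin{aligned} J(x) = \frac{1}{\beta + v} \min_{\lambda_{i0}, \lambda_{0i}} \Big[\, & g(x) + \sum_{i=1}^{n} (q_{0i}\lambda_{0i} + q_{i0}\lambda_{i0}) + \sum_{i=1}^{n} d_{0i} J(T_{0i}x) + \sum_{i=1}^{n} d_{i0} J(T_{i0}x) \\ & + \sum_{i=1}^{n} \lambda_{0i} J(T_{0i}x) + \sum_{i=1}^{n} \lambda_{i0} J(T_{i0}x) + \sum_{i=1}^{n} (r_{0i} + r_{i0} - \lambda_{0i} - \lambda_{i0}) J(x) \Big] \end{aligned} \tag{5.3}$$

where $J(x)$ is defined in (5.1). Because the expression inside the minimum is linear in each rate $\lambda_{0i}$ and $\lambda_{i0}$, the minimum is attained at either zero or the upper bound of each rate, and the above equation can be simplified as

$$\begin{aligned} J(x) = \frac{1}{\beta + v} \Big[\, & g(x) + \sum_{i=1}^{n} d_{0i} J(T_{0i}x) + \sum_{i=1}^{n} d_{i0} J(T_{i0}x) \\ & + \sum_{i=1}^{n} r_{0i} \min\big(J(T_{0i}x) + q_{0i},\, J(x)\big) + \sum_{i=1}^{n} r_{i0} \min\big(J(T_{i0}x) + q_{i0},\, J(x)\big) \Big] \end{aligned} \tag{5.4}$$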
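The building blocks of the formulation (the state operators $T_{0i}$ and $T_{i0}$, the cost $g(x)$ in (5.2), and the uniformized transition probabilities) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the 0-based spoke index, and the choice to store only the spoke levels $(x_1, \ldots, x_n)$ and derive $x_0$ on demand are assumptions made here, not part of the original model description.

```python
from typing import Dict, Sequence, Tuple

State = Tuple[int, ...]  # spoke levels (x_1, ..., x_n); x_0 is derived from them


def hub_level(spokes: State, fleet: int) -> int:
    """x_0 = N - sum_i x_i^+ ."""
    return fleet - sum(max(x, 0) for x in spokes)


def arrive_at_spoke(spokes: State, i: int) -> State:
    """Operator T_{0i}: one container arrives at spoke i (0-based index)."""
    return spokes[:i] + (spokes[i] + 1,) + spokes[i + 1:]


def arrive_at_hub(spokes: State, i: int) -> State:
    """Operator T_{i0}: one container from spoke i arrives at the hub."""
    return spokes[:i] + (spokes[i] - 1,) + spokes[i + 1:]


def holding_leasing_cost(spokes: State, fleet: int,
                         c_plus: Sequence[float], c_minus: Sequence[float],
                         c0_plus: float, c0_minus: float) -> float:
    """g(x) as in (5.2)."""
    x0 = hub_level(spokes, fleet)
    cost = c0_plus * max(x0, 0) + c0_minus * max(-x0, 0)
    for x, cp, cm in zip(spokes, c_plus, c_minus):
        # spoke holding + spoke leasing + hub holding of containers leased from this spoke
        cost += cp * max(x, 0) + cm * max(-x, 0) + c0_plus * max(-x, 0)
    return cost


def transition_probabilities(spokes: State,
                             lam0: Sequence[float], lam_in: Sequence[float],
                             d0: Sequence[float], d_in: Sequence[float],
                             r0: Sequence[float], r_in: Sequence[float]) -> Dict[State, float]:
    """One-step probabilities P(y | x, u) after uniformization with rate v."""
    v = sum(d0) + sum(d_in) + sum(r0) + sum(r_in)
    probs: Dict[State, float] = {}
    stay = v
    for i in range(len(spokes)):
        up = lam0[i] + d0[i]        # rate of moving to T_{0i} x
        down = lam_in[i] + d_in[i]  # rate of moving to T_{i0} x
        probs[arrive_at_spoke(spokes, i)] = up / v
        probs[arrive_at_hub(spokes, i)] = down / v
        stay -= up + down
    probs[spokes] = stay / v        # fictitious self-transition
    return probs
```

For instance, with the parameters of a two-spoke system with $N = 4$, $c_i^+ = 1$, $c_i^- = 10$, $c_0^+ = 1$ and $c_0^- = 20$, calling `holding_leasing_cost((2, 1), 4, (1, 1), (10, 10), 1, 20)` evaluates $g$ at the state with two containers at spoke 1, one at spoke 2 and one at the hub.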


5.3 Optimal ECR Policy

The existence of a state feedback control policy to achieve the minimum in (5.1) follows from the fact that only finitely many controls are considered at each state (Bertsekas, 1987; Puterman, 1994). Equation (5.4) implies that the optimal policy can be described in terms of the optimal cost function.

Proposition 5.1 The optimal feedback control policy is given by:

$$\lambda_{0i}^* = \begin{cases} r_{0i} & \text{if } J(T_{0i}x) + q_{0i} < J(x) \\ 0 & \text{if } J(T_{0i}x) + q_{0i} \ge J(x) \end{cases} \quad \text{for } i = 1, 2, \ldots, n;$$

$$\lambda_{i0}^* = \begin{cases} r_{i0} & \text{if } J(T_{i0}x) + q_{i0} < J(x) \\ 0 & \text{if } J(T_{i0}x) + q_{i0} \ge J(x) \end{cases} \quad \text{for } i = 1, 2, \ldots, n.$$

Physically, the quantity $r_{0i}(J(T_{0i}x) + q_{0i} - J(x))$ can be interpreted as the additional cost incurred when an empty container is transferred from the hub to spoke $i$ at speed $r_{0i}$. Therefore, we would not transfer any empty container to spoke $i$ if the additional cost is positive. Similarly, if $J(T_{i0}x) + q_{i0} - J(x)$ is positive, we do not transfer any empty container from spoke $i$ to the hub.

The optimal feedback control policy given in Proposition 5.1 is implicit. Its implementation in practice depends on whether we know the optimal cost function $J(x)$ or the additional cost incurred when an empty container is dispatched from hub/spoke to spoke/hub at any state. Numerical methods such as the value iteration algorithm given below are often used to approximate the optimal cost function.

Proposition 5.2 Let $J_0(x) = 0$ for any $x \in X$, and

$$\begin{aligned} J_{k+1}(x) = (\beta + v)^{-1} \Big[\, & g(x) + \sum_{i=1}^{n} d_{0i} J_k(T_{0i}x) + \sum_{i=1}^{n} d_{i0} J_k(T_{i0}x) \\ & + \sum_{i=1}^{n} r_{0i} \min\big(J_k(T_{0i}x) + q_{0i},\, J_k(x)\big) + \sum_{i=1}^{n} r_{i0} \min\big(J_k(T_{i0}x) + q_{i0},\, J_k(x)\big) \Big] \end{aligned} \tag{5.5}$$

for $k \ge 0$ and $x \in X$, where $J_k(x)$ is the $k$-stage cost function for state $x$. Then

$$\lim_{k \to +\infty} J_k(x) = J(x), \text{ for } x \in X \tag{5.6}$$

where $J(x)$ is defined in (5.1).

The convergence of the $k$-stage policy and cost function to the infinite-horizon optimal policy and cost again follows from the fact that a finite number of controls is taken at each state. However, the numerical method in Proposition 5.2 becomes computationally expensive when the number of spokes (i.e., $n$) is large. An alternative is to find near-optimal control policies with affordable computational effort, which will be discussed in the next section.
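The recursion (5.5) and the policy rule of Proposition 5.1 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the state space is truncated to $|x_i| \le$ `bound`, transitions that would leave the grid are clamped at the boundary (an assumption introduced here to keep the example finite), and all function and parameter names are hypothetical.

```python
import itertools


def clamp_move(x, i, step, bound):
    """Apply T_{0i} (step=+1) or T_{i0} (step=-1), clamped to the truncated grid."""
    xi = min(bound, max(-bound, x[i] + step))
    return x[:i] + (xi,) + x[i + 1:]


def value_iteration(fleet, d0, d_in, r0, r_in, q0, q_in,
                    c_plus, c_minus, c0_plus, c0_minus,
                    beta, bound, iterations):
    """Iterate (5.5) from J_0 = 0 on the grid |x_i| <= bound."""
    n = len(d0)
    v = sum(d0) + sum(d_in) + sum(r0) + sum(r_in)  # uniform transition rate
    states = list(itertools.product(range(-bound, bound + 1), repeat=n))

    def g(x):
        # holding and leasing cost (5.2); x holds the spoke levels only
        x0 = fleet - sum(max(xi, 0) for xi in x)
        cost = c0_plus * max(x0, 0) + c0_minus * max(-x0, 0)
        for xi, cp, cm in zip(x, c_plus, c_minus):
            cost += cp * max(xi, 0) + (cm + c0_plus) * max(-xi, 0)
        return cost

    J = {x: 0.0 for x in states}
    for _ in range(iterations):
        J_next = {}
        for x in states:
            total = g(x)
            for i in range(n):
                up = J[clamp_move(x, i, +1, bound)]
                down = J[clamp_move(x, i, -1, bound)]
                total += d0[i] * up + d_in[i] * down
                total += r0[i] * min(up + q0[i], J[x])
                total += r_in[i] * min(down + q_in[i], J[x])
            J_next[x] = total / (beta + v)
        J = J_next
    return J


def greedy_controls(J, x, r0, r_in, q0, q_in, bound):
    """Proposition 5.1 applied to an approximate cost function J."""
    controls = []
    for i in range(len(x)):
        up = J[clamp_move(x, i, +1, bound)]
        down = J[clamp_move(x, i, -1, bound)]
        lam0 = r0[i] if up + q0[i] < J[x] else 0
        lam_in = r_in[i] if down + q_in[i] < J[x] else 0
        controls.append((lam0, lam_in))
    return controls
```

The number of grid states grows as $(2 \cdot \text{bound} + 1)^n$, which is the source of the computational burden mentioned above. Because the iteration starts from $J_0 = 0$ with non-negative costs, the successive iterates are non-decreasing in $k$, which gives a simple sanity check on an implementation.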

5.4 Suboptimal Policy Using a Dynamic Decomposition Procedure

This section presents a dynamic decomposition procedure that can produce a near-optimal policy with linear computational complexity in terms of the number of spokes. The procedure is based on the analytical results of two-depot systems in Chap. 4 and further discussions are given below.

5.4.1 Optimal Feedback Control for Two-Depot Systems

If $n = 1$, then the system in Fig. 5.1 becomes a two-depot shuttle system. For such a simple system, we have established the explicit forms of the optimal cost function and the optimal feedback control policy (Chap. 4). In particular, we have shown that the optimal policy has a simple threshold structure characterized by two parameters, and can be obtained with little computational effort (Song & Earl, 2008).

Let $J_{l,h}(x_1)$ be the cost function in the two-depot system under a threshold policy with threshold values $l$ and $h$, and initial state $(x_1, N - x_1)$. From Chap. 4, we know that $J_{l,h}(x_1)$ can be derived analytically. Since we have the analytical expression of the cost function $J_{l,h}(x_1)$, we can easily obtain the optimal threshold values for a given container fleet size. Note that the container fleet size affects the values of the optimal threshold parameters as shown in Chap. 4. We denote the optimal threshold values as $l^*(N)$ and $h^*(N)$ to represent their dependence on the fleet size $N$. Moreover, the optimal threshold values are independent of the initial system state. Therefore, they can be obtained by minimizing the cost function at the initial state $x_1 = 0$, i.e.,

$$(l^*(N), h^*(N)) = \arg\min_{l \le h} J_{l,h}(0).$$

Proposition 5.3 In a two-depot shuttle system, the optimal threshold values $l^*(N)$ and $h^*(N)$ are increasing in the fleet size $N$. Moreover, $0 \le l^*(N) \le h^*(N) \le N$.

Proof With a slight abuse of notation, let $J^{N}(x_1)$ and $J^{N+1}(x_1)$ be the optimal cost functions in a two-depot shuttle system with the container fleet size being $N$ and $N + 1$, respectively. Using induction on the stage $k$ in Proposition 5.2, we can prove

$$J^{N+1}(x_1 + 1) - J^{N+1}(x_1) + q_{01} \le J^{N}(x_1 + 1) - J^{N}(x_1) + q_{01}.$$


From Lemma 4.1 in Chap. 4, we know that $J^{N}(x_1 + 1) - J^{N}(x_1)$ is increasing in $x_1$, and $l^*(N)$ can be defined as $l^*(N) := \max\{x_1 \mid J^{N}(x_1 + 1) - J^{N}(x_1) + q_{01} \le 0\}$. It follows that $l^*(N) \le l^*(N + 1)$. Similarly, we can prove $h^*(N) \le h^*(N + 1)$. The relationships and the ranges of $l^*(N)$ and $h^*(N)$ are intuitive and can be proved rigorously by induction. This completes the proof. □

The monotonic property of $l^*(N)$ and $h^*(N)$ is intuitive. The larger the container fleet size, the higher the inventory threshold levels that can be maintained and the less empty container movement is required.
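The minimization $(l^*(N), h^*(N)) = \arg\min_{l \le h} J_{l,h}(0)$ can be illustrated numerically. The closed-form expression of $J_{l,h}$ from Chap. 4 is not reproduced here, so the sketch below substitutes a plainly different technique: it evaluates each threshold policy by successive approximation of the uniformized cost recursion on a truncated grid, and then searches over $0 \le l \le h \le N$ (the range given in Proposition 5.3). The truncation bound, the clamping at the boundary, and all names are assumptions of this example.

```python
def threshold_policy_cost(l, h, fleet, d01, d10, r01, r10, q01, q10,
                          c1_plus, c1_minus, c0_plus, c0_minus,
                          beta, bound=20, iterations=300):
    """Approximate J_{l,h}(x_1) on |x_1| <= bound for a two-depot system."""
    v = d01 + d10 + r01 + r10  # uniform transition rate for n = 1
    levels = range(-bound, bound + 1)

    def g(x1):
        x0 = fleet - max(x1, 0)  # hub level x_0 = N - x_1^+
        return (c1_plus * max(x1, 0) + (c1_minus + c0_plus) * max(-x1, 0)
                + c0_plus * max(x0, 0) + c0_minus * max(-x0, 0))

    J = {x1: 0.0 for x1 in levels}
    for _ in range(iterations):
        J_next = {}
        for x1 in levels:
            lam01 = r01 if x1 < l else 0.0  # hub -> spoke below threshold l
            lam10 = r10 if x1 > h else 0.0  # spoke -> hub above threshold h
            up = J[min(bound, x1 + 1)]      # clamped at the grid boundary
            down = J[max(-bound, x1 - 1)]
            total = (g(x1) + q01 * lam01 + q10 * lam10
                     + (d01 + lam01) * up + (d10 + lam10) * down
                     + (r01 + r10 - lam01 - lam10) * J[x1])
            J_next[x1] = total / (beta + v)
        J = J_next
    return J


def best_thresholds(fleet, **params):
    """Grid search for the pair (l, h), 0 <= l <= h <= N, minimizing J_{l,h}(0)."""
    candidates = [(l, h) for l in range(fleet + 1) for h in range(l, fleet + 1)]
    return min(candidates,
               key=lambda lh: threshold_policy_cost(lh[0], lh[1], fleet, **params)[0])
```

Repeating `best_thresholds(K, ...)` for $K = 0, 1, \ldots, N$ would produce a lookup table of thresholds per fleet size; with the analytical expression from Chap. 4 this search would be cheaper still.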

5.4.2 Dynamic Decomposition Procedure

This section presents a dynamic decomposition procedure, which decomposes the hub-and-spoke system into $n$ two-depot shuttle subsystems based on the current system state and then uses the analytical results for two-depot subsystems to construct an ECR control policy for the original hub-and-spoke system. The decomposition is designed in a dynamic way to reflect closely the real-time interaction between the hub and the spokes.

Suppose the hub-and-spoke system is at a system state $x = (x_0, x_1, x_2, \ldots, x_n)$. The decisions to reposition empty containers between a specific spoke $i$ and the hub can be approximated by those of a two-depot system consisting of the spoke and the hub with state $(x_i, x_0^{(i)})$, where $x_0^{(i)}$ represents the hub state in the subsystem. In other words, the decisions for each spoke are determined separately. Figure 5.3 shows the set of decomposed two-depot subsystems.

Fig. 5.3 Decomposition into a set of two-depot subsystems (subsystem $i$ consists of spoke $i$ in state $x_i$ and a copy of the hub in state $x_0^{(i)}$)

Since the optimal ECR policies for individual two-depot systems can be explicitly obtained with little computational effort by the results in Chap. 4, these individual policies can be combined to form a complete ECR policy for the original hub-and-spoke system at the state $x$. However, the hub state in the decomposed individual two-depot subsystems should be carefully designed to reflect the effects of other spokes on the hub. Note that $x_0 = N - x_1^+ - x_2^+ - \cdots - x_n^+$, which implies $x_0 \in (-\infty, N]$. Clearly, positive $x_0$ means that some owned empty containers are stored in the hub, while negative $x_0$ means that some containers have been leased from the hub and stored in some spokes. In addition, negative $x_i$ implies that some containers have been leased from spoke $i$ and stored at the hub, because of the assumption that a container leased from one spoke cannot be used for another spoke and must be returned to the original depot. By considering the current state information of $x_i$ and $x_0$, we propose a method below to determine the hub states $x_0^{(i)}$ for all the decomposed subsystems. The details of the dynamic decomposition procedure are given as follows.

Algorithm 5.1 A dynamic decomposition procedure to construct a suboptimal ECR policy:

Step 1. For each two-depot subsystem consisting of spoke $i$ and the hub with parameters $d_{0i}$, $d_{i0}$, $r_{0i}$, $r_{i0}$, $q_{0i}$, $q_{i0}$, $c_i^+$, $c_i^-$, $c_0^+$, $c_0^-$, calculate the optimal threshold values for varying fleet size $K$, i.e., $l_i^*(K)$ and $h_i^*(K)$ for $K = 0, 1, \ldots, N$ and $i = 1, 2, \ldots, n$.

Step 2. For any given state $x = (x_0, x_1, x_2, \ldots, x_n)$ in the original system, determine the hub states $x_0^{(1)}, x_0^{(2)}, \ldots, x_0^{(n)}$ in the decomposed two-depot subsystems, based on the local information at the spoke and the information at the hub:
(i) If $x_i \le 0$ and $x_0 \ge 0$, then $x_0^{(i)} = x_0 - x_i$.
(ii) If $x_i \le 0$ and $x_0 < 0$, then $x_0^{(i)} = -x_i$.
(iii) If $x_i > 0$ and $x_0 + x_i \ge 0$, then $x_0^{(i)} = x_0$.
(iv) If $x_i > 0$ and $x_0 + x_i < 0$, then $x_0^{(i)} = -x_i$.
This gives the dynamic fleet size in the subsystem, denoted by $K_i(x)$, i.e., $K_i(x) := x_i + x_0^{(i)}$.

Step 3. For any given state in the original system, $x = (x_0, x_1, x_2, \ldots, x_n)$, determine the decision variables by (for $i = 1, 2, \ldots, n$)

$$\lambda_{0i} = \begin{cases} r_{0i} & \text{if } x_i < l_i^*(K_i(x)) \\ 0 & \text{if } x_i \ge l_i^*(K_i(x)) \end{cases}; \qquad \lambda_{i0} = \begin{cases} r_{i0} & \text{if } x_i > h_i^*(K_i(x)) \\ 0 & \text{if } x_i \le h_i^*(K_i(x)) \end{cases}$$

The resulting ECR policy from Algorithm 5.1 is called the dynamic decomposition policy. The rationale of the algorithm can be explained as follows. In Step 2, Case (i) represents the situation that spoke $i$ has a shortage of empty containers while the hub has inventories of both containers leased from spoke $i$ and owned empty containers. We therefore set the decomposed hub state to be the sum of the owned containers and the containers leased from spoke $i$. Case (ii) represents the situation that both spoke $i$ and the hub are short of empty containers. Obviously, the containers leased from the hub must be stored in other spokes and have nothing to do with spoke $i$.


Hence, we set $x_0^{(i)} = -x_i$. Case (iii) represents the situation that spoke $i$ has positive inventories and the current number of containers leased from the hub does not exceed the number stored in spoke $i$. Case (iv) represents the situation that spoke $i$ has positive inventories and there are some containers leased from the hub that are stored in other spokes. By ignoring those containers that are leased from the hub and stored in other spokes, we set $x_0^{(i)} = -x_i$.

Note that Step 1 is an offline calculation, which is a simple parameter optimization of an analytical function. The computational complexity of Step 1 is linear in the number of spokes. Steps 2 and 3 are performed dynamically (i.e., online) because $K_i(x)$ depends on $x$, which changes over time, and the decisions for each decomposed two-depot subsystem depend on $K_i(x)$. However, since $l_i^*(K)$ and $h_i^*(K)$ have been computed offline and $K_i(x)$ can be calculated straightforwardly, there is little computational effort involved in Steps 2 and 3. Therefore, Algorithm 5.1 is computationally efficient. Moreover, since $K_i(x)$ only depends on the information at the local spoke and the hub, the online data communication requirement is very low.
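The online part of Algorithm 5.1 (Steps 2 and 3) amounts to a few comparisons per spoke, as the following sketch suggests. The function names and the 0-based spoke index are assumptions of this example, and the threshold tables `l_star[i][K]` and `h_star[i][K]` are taken as given inputs standing for the offline results of Step 1. Repositioning to the hub is triggered when $x_i$ exceeds $h_i^*$, following the threshold convention stated in Remark 5.3.

```python
def hub_state_for_spoke(x0, xi):
    """Step 2 of Algorithm 5.1: hub state x_0^(i) seen by the subsystem of spoke i."""
    if xi <= 0:
        return x0 - xi if x0 >= 0 else -xi   # cases (i) and (ii)
    return x0 if x0 + xi >= 0 else -xi       # cases (iii) and (iv)


def subsystem_fleet_size(spokes, fleet, i):
    """Dynamic fleet size K_i(x) = x_i + x_0^(i)."""
    x0 = fleet - sum(max(x, 0) for x in spokes)  # x_0 = N - sum_j x_j^+
    return spokes[i] + hub_state_for_spoke(x0, spokes[i])


def ddp_controls(spokes, fleet, l_star, h_star, r0, r_in):
    """Step 3: repositioning rates (lambda_0i, lambda_i0) for every spoke.

    l_star[i][K] and h_star[i][K] are precomputed thresholds for spoke i
    and subsystem fleet size K = 0..N (Step 1, done offline).
    """
    controls = []
    for i, xi in enumerate(spokes):
        k = subsystem_fleet_size(spokes, fleet, i)
        lam0 = r0[i] if xi < l_star[i][k] else 0      # hub -> spoke i
        lam_in = r_in[i] if xi > h_star[i][k] else 0  # spoke i -> hub
        controls.append((lam0, lam_in))
    return controls
```

Each call touches only $x_i$ and $x_0$ for a given spoke, which reflects the low online communication requirement noted above. In every one of the four cases the resulting $K_i(x)$ lies between 0 and $N$, so the lookup into the precomputed tables is always valid.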

5.4.3 Structural Properties of Dynamic Decomposition Policy

This section examines some structural properties of the dynamic decomposition policy and discusses its sub-optimality.

Proposition 5.4 In the dynamic decomposition policy, if there exists $j \ne i$ such that $x_j \ge N$, then $l_i^*(K_i(x)) \equiv l_i^*(0) = 0$ and $h_i^*(K_i(x)) \equiv h_i^*(0) = 0$.

Proof Note that $x_0 = N - x_1^+ - x_2^+ - \cdots - x_n^+$. Therefore, $x_j \ge N$ implies that $x_0 \le 0$. Consider the following cases:
(a) If $x_i \le 0$, then $K_i(x) = x_0^{(i)} + x_i = 0$ from Step 2 (ii) in Algorithm 5.1.
(b) If $x_i > 0$, then $x_0 + x_i = N - x_1^+ - \cdots - x_{i-1}^+ - x_{i+1}^+ - \cdots - x_n^+ \le 0$ due to $x_j \ge N$. Thus, $K_i(x) = x_0^{(i)} + x_i = 0$ from Step 2 (iv) in Algorithm 5.1.
From (a) and (b), we always have $K_i(x) = 0$. Therefore, the assertion is true by Proposition 5.3. □

Proposition 5.5 In the dynamic decomposition policy, if $x_j \le 0$ for all $j \ne i$, then $l_i^*(K_i(x)) \equiv l_i^*(N)$ and $h_i^*(K_i(x)) \equiv h_i^*(N)$.

Proof Since $x_j \le 0$ for all $j \ne i$, we have $x_0 = N - x_i^+$. Consider:
(a) If $x_i \le 0$, then $K_i(x) = x_0^{(i)} + x_i = N$ from Step 2 (i) in Algorithm 5.1.
(b) If $x_i > 0$, then $x_0 = N - x_i$ and $K_i(x) = x_0^{(i)} + x_i = N$ from Step 2 (iii) in Algorithm 5.1.
From (a) and (b), we have $K_i(x) = N$ and the assertion is true by Proposition 5.3. □


Fig. 5.4 Control structure of dynamic decomposition policy in a two-spoke system (left panel: spoke 1; right panel: spoke 2; horizontal axis $x_1$, vertical axis $x_2$; regions labelled "Hub to Spoke", "0" and "Spoke to Hub")

Proposition 5.6 In the dynamic decomposition policy, $0 = l_i^*(0) \le l_i^*(K_i(x)) \le l_i^*(N)$ and $0 = h_i^*(0) \le h_i^*(K_i(x)) \le h_i^*(N) \le N$.

Proof Similar to the proof of Proposition 5.5, we can show that $0 \le K_i(x) \le N$. The assertion is true by Proposition 5.3. □

To give an intuitive visualization of the structural properties in Propositions 5.4–5.6, consider a two-spoke-one-hub system. The control structure of the dynamic decomposition policy can then be illustrated in the state space, see Fig. 5.4, where the horizontal axis is $x_1$ (the inventory position at spoke 1) and the vertical axis is $x_2$ (the inventory position at spoke 2). The entire state space is divided into three control regions by two switching curves. The dynamic decomposition policy for spoke 1 is illustrated in the left-hand figure as follows: reposition an empty container from the hub to spoke 1 if the current system state is located in the left region; do not reposition any empty container between the hub and spoke 1 if the current system state is located in the middle region (the area between the two switching curves, marked with 0); and reposition an empty container from spoke 1 to the hub if the current system state is in the right region. The control structure of the ECR policy for spoke 2 is illustrated in the right-hand figure and can be explained similarly.

It is interesting to examine the asymptotic behaviors of the optimal ECR policy. We present the following remarks based on observations.

Remark 5.1 As $x_j$ increases, the optimal ECR policy between the hub and spoke $i$ ($i \ne j$) tends to be a threshold policy with $l_i^*(0) = h_i^*(0) = 0$.

This observation can be explained as follows. As $x_j$ tends to positive infinity, a large number of containers must have been leased at the hub since the owned container fleet size is fixed, and all of the owned containers must have been absorbed by spoke $j$. By assumption, a container leased from spoke $i$ cannot be dispatched to other spokes and must return to the origin spoke $i$. Therefore, the empty container repositioning between spoke $i$ and the hub can be approximated by a standalone two-depot shuttle system with zero fleet size.

Remark 5.2 As $x_j$ decreases to be sufficiently negative for all spokes $j \ne i$, the optimal ECR policy between the hub and spoke $i$ tends to be a threshold policy with $l_i^*(N) \le h_i^*(N)$, where $l_i^*(N)$ and $h_i^*(N)$ are given in Proposition 5.3.


Note that as $x_j$ tends to negative infinity for all spokes $j \ne i$, the hub has stored enough leased containers from every spoke $j$ ($j \ne i$). Therefore, all of the owned containers are available for use by spoke $i$. The empty container movement between spoke $i$ and the hub can be approximated by a two-depot system with a fleet size $N$. Comparing the above remarks with Propositions 5.4 and 5.5 reveals that the optimal ECR policy and the dynamic decomposition policy appear to have the same asymptotic behaviors. Moreover, Remarks 5.1 and 5.2 lead to the following observation.

Remark 5.3 Define a threshold control policy for the hub-and-spoke system as follows:
(a) $\lambda_{0i} = 0$ if $x_i \ge l_i^*$ and $\lambda_{0i} = r_{0i}$ if $x_i < l_i^*$;
(b) $\lambda_{i0} = 0$ if $x_i \le h_i^*$ and $\lambda_{i0} = r_{i0}$ if $x_i > h_i^*$;
where the threshold values $l_i^*$ and $h_i^*$ only depend on $i$. Then no such threshold policy can be optimal if $h_i^*(N) > 0$.

The sub-optimality of the dynamic decomposition policy can be justified in three aspects. Firstly, the fleet size in the decomposed subsystems generally reflects the actual relationships between the spokes and the hub. Secondly, because the decomposition procedure is done dynamically, it automatically updates the system states in the decomposed subsystems according to the actual state information. Thirdly, the optimal ECR policy and the dynamic decomposition policy have the same asymptotic structures. Numerical examples in the next section show that the proposed dynamic decomposition policy performs very close to the optimal ECR policy and that the two policies indeed have the same asymptotic behaviors.

5.5 Numerical Examples

This section gives numerical examples to illustrate the effectiveness of the dynamic decomposition approach by comparing it with the optimal ECR policy obtained from the value iteration algorithm, and then examines the robustness of the results to deviations from the assumptions. In Sect. 5.5.1, the structural properties of the optimal policy (OP) and the dynamic decomposition policy (DDP) in a two-spoke-one-hub system are examined. In Sect. 5.5.2, the performances of the DDP and the optimal policy in a three-spoke-one-hub system are compared in a range of cases. In Sect. 5.5.3, we relax the assumptions on empty container transport times and laden container arrival processes and examine the effectiveness of the DDP against a heuristic ECR policy in various scenarios.

In Sects. 5.5.1 and 5.5.2, the value iteration algorithm (Bertsekas, 1976) is used to compute the expected costs for the OP and the DDP. To perform the algorithm numerically, the state space has to be limited to a finite region (e.g., $-30 < x_i < 30$ for $i > 0$). The iterative procedure is terminated when the number of iterations reaches 1000 or there are no further changes in costs.

5.5.1 Structural Properties of ECR Policies in Two-Spoke-One-Hub Systems

This section examines a two-spoke-one-hub system. The detailed control structures of the dynamic decomposition policy and the optimal policy are visualized in the $(x_1, x_2)$ two-dimensional state space.

Example 5.1 Consider a two-spoke-one-hub system with parameters: $\beta = 0.9$, $r_{0i} = r_{i0} = 5$, $d_{0i} = 1$, $d_{i0} = 2$, $q_{0i} = q_{i0} = 1$, $c_i^+ = 1$, $c_i^- = 10$ for $i = 1$ and 2, $c_0^+ = 1$, $c_0^- = 20$, and $N = 4$.

The control structures of the optimal policy and the dynamic decomposition policy are partially displayed in the state space $(x_1, x_2)$ in Figs. 5.5 and 5.6, respectively. In Figs. 5.5 and 5.6, the letter "O" indicates that no empty container should be repositioned, the letter "H" indicates that an empty container should be repositioned to the hub, and the letter "S" indicates that an empty container should be repositioned to the spoke at the corresponding system state.

Fig. 5.5 The control structures of the optimal policy: (a) optimal policy for Spoke 1; (b) optimal policy for Spoke 2

Fig. 5.6 The control structures of the dynamic decomposition policy (panels (a) and (b), for Spoke 1 and Spoke 2)

From Figs. 5.5 and 5.6, it can be seen that the optimal ECR policy and the dynamic decomposition policy have similar control regions and the same asymptotic behaviors. The cost under the optimal policy is $J^*(2, 1) = 9.70$, while the cost under the dynamic decomposition policy is $J(2, 1) = 9.80$, which indicates that the cost incurred under the dynamic decomposition policy is only about 1% above the optimal cost.

5.5.2 Comparing DDP with the Optimal Policies in Three-Spoke-One-Hub Systems

This section examines a three-spoke-one-hub system with seven different cases. We compare the performances of the optimal policy (OP) and the dynamic decomposition policy (DDP). Since the state space is only three-dimensional, the value iteration algorithm can still be used to calculate the optimal cost.

Example 5.2 Consider a three-spoke-one-hub system. Let $\beta = 0.9$, $q_{0i} = q_{i0} = 1$, $c_i^+ = 1$, $r_{0i} = r_{i0}$ for $i = 1, 2, 3$, $c_0^+ = 1$, $c_0^- = 20$, $N = 10$, and let the other system parameters be set as in Table 5.1, using Case A as the reference point. Seven cases are studied:

Case A: balanced and same demand arrival rate.
Case B: no demands from the hub to spokes.
Case C: no demands from spokes to the hub.
Case D: balanced and different demand arrival rates.
Case E: imbalanced and different demand arrival rates.
Case F: balanced and same demand arrival rate with different repositioning speed capacity.
Case G: balanced and same demand arrival rate with different leasing cost units.

Table 5.1 System parameter settings different from the base Case A

Case     Parameter setting
Case A   Base case setting
Case B   d_01 = d_02 = d_03 = 0
Case C   d_10 = d_20 = d_30 = 0
Case D   d_02 = 3; d_03 = 4; d_20 = 3; d_30 = 4
Case E   d_02 = 3; d_03 = 4; d_10 = 4; d_20 = 3
Case F   r_01 = 3; r_02 = 5; r_03 = 7
Case G   c_2^- = 20; c_3^- = 30

Let Case A be the base case, used as a reference point, with the following parameter settings: $d_{01} = d_{02} = d_{03} = 2$; $d_{10} = d_{20} = d_{30} = 2$; $r_{01} = r_{02} = r_{03} = 5$; $c_1^- = c_2^- = c_3^- = 10$. The costs and the optimal initial states (because the cost function depends on the initial state) for the above seven cases under the optimal policy (OP) and the dynamic decomposition policy (DDP) are given in Table 5.2. It can be seen from Table 5.2 that the dynamic decomposition policy is very close to the optimal policy, with a cost only 0.00–1.46% above the optimal cost (the worst case is Case E). The optimal initial empty container distributions are also the same for the dynamic decomposition policy and the optimal policy. The performances of the two policies are the same in Case B, where there are no demands from the hub to spokes.

Table 5.2 Costs and optimal initial states for seven cases with fixed fleet size (percentages give the gap of the DDP cost above the OP cost)

Case  Policy  J(x*(0))        x1*(0)  x2*(0)  x3*(0)
A     OP      14.53           2       2       2
A     DDP     14.59 (0.41%)   2       2       2
B     OP      15.43           3       3       4
B     DDP     15.43 (0.00%)   3       3       4
C     OP      13.71           0       0       0
C     DDP     13.85 (1.02%)   0       0       0
D     OP      17.12           2       2       2
D     DDP     17.25 (0.76%)   2       2       2
E     OP      18.48           4       2       1
E     DDP     18.75 (1.46%)   4       2       1
F     OP      14.63           2       2       2
F     DDP     14.72 (0.62%)   2       2       2
G     OP      15.31           2       2       3
G     DDP     15.48 (1.11%)   2       2       3


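Table 5.3 Costs and optimal initial states for Case A with varying fleet sizes

| $N$ | Policy | $J(x_0^*(0))$ | $x_1^*(0)$ | $x_2^*(0)$ | $x_3^*(0)$ |
|---|---|---|---|---|---|
| 4 | OP | 15.17 | 1 | 1 | 1 |
| 4 | DDP | 15.31 (0.92%) | 1 | 1 | 1 |
| 8 | OP | 13.92 | 1 | 2 | 2 |
| 8 | DDP | 14.03 (0.79%) | 1 | 2 | 2 |
| 10 | OP | 14.53 | 2 | 2 | 2 |
| 10 | DDP | 14.59 (0.41%) | 2 | 2 | 2 |
| 12 | OP | 15.88 | 2 | 3 | 3 |
| 12 | DDP | 15.93 (0.31%) | 2 | 3 | 3 |
| 16 | OP | 19.13 | 3 | 3 | 4 |
| 16 | DDP | 19.14 (0.05%) | 3 | 3 | 4 |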

Container fleet sizing is an important issue that is closely related to ECR. We therefore vary the container fleet size for Case A and examine the performance of the two control policies. The results are given in Table 5.3. The costs under the dynamic decomposition policy are again very close to the optimal costs (within a gap of 0.05–0.92%) for all fleet sizes tested. Table 5.3 also reveals that, among the tested values, the optimal fleet size for Case A is $N = 8$, which can be obtained from either the optimal policy or the dynamic decomposition policy.
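The percentage gaps and the best fleet size reported for Table 5.3 can be re-derived with a few lines of arithmetic. The sketch below is only an illustrative check: the cost figures are copied from Table 5.3, while the names `TABLE_5_3`, `relative_gap_percent` and `best_fleet_size` are hypothetical helpers introduced here, not part of the original study.

```python
# Costs copied from Table 5.3 (Case A): fleet size N -> (OP cost, DDP cost).
TABLE_5_3 = {
    4: (15.17, 15.31),
    8: (13.92, 14.03),
    10: (14.53, 14.59),
    12: (15.88, 15.93),
    16: (19.13, 19.14),
}


def relative_gap_percent(optimal_cost, policy_cost):
    """Percentage by which a policy's cost exceeds the optimal cost."""
    return 100.0 * (policy_cost - optimal_cost) / optimal_cost


def best_fleet_size(table, policy_index):
    """Fleet size with the lowest cost; policy_index 0 = OP, 1 = DDP."""
    return min(table, key=lambda n: table[n][policy_index])


# Gap of DDP above OP for each tested fleet size, rounded as in the table.
gaps = {n: round(relative_gap_percent(op, ddp), 2)
        for n, (op, ddp) in TABLE_5_3.items()}
```

Under either policy the minimum cost among the tested fleet sizes occurs at the same value of `N`, which is the sense in which the fleet size can be chosen from the DDP costs alone.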

5.5.3 Comparing DDP with a Heuristic Policy in Many-Spoke-One-Hub Systems

This section tests the dynamic decomposition policy (DDP) in more realistic settings. We consider larger systems in terms of the number of spokes (i.e., $n$) and the container fleet size (i.e., $N$). Due to the curse of dimensionality, it is difficult to compute the optimal costs for systems with $n > 3$ using the value iteration algorithm. Moreover, if a system does not satisfy the exponential distribution assumptions, the optimal ECR policy appears to be intractable. Therefore, we compare the DDP with a heuristic repositioning policy and use simulation to evaluate their performance. The heuristic repositioning policy (HRP) is defined as follows (Song & Carter, 2008).

(i) Set the initial container inventory level at spoke $i$ to $x_i(0) := \lfloor N \cdot d_{i0} / \sum_j (d_{j0} + d_{0j}) \rfloor$, and $x_0(0) := N - \sum_j x_j(0)$, where $\lfloor \cdot \rfloor$ denotes the greatest integer that is not greater than its argument. The heuristic policy then states that, between each spoke $i$ and the hub:
(ii) If $d_{i0} = d_{0i}$, then $\lambda_{i0} = \lambda_{0i} = 0$.
(iii) If $d_{i0} > d_{0i}$, then $\lambda_{i0} = 0$; $\lambda_{0i} = r_{0i}$ if $x_i < x_i(0)$, and $\lambda_{0i} = 0$ otherwise.
(iv) If $d_{i0} < d_{0i}$, then $\lambda_{0i} = 0$; $\lambda_{i0} = r_{i0}$ if $x_i > x_i(0)$, and $\lambda_{i0} = 0$ otherwise.
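Rules (i) to (iv) above can be written as two small functions. This is a minimal sketch, assuming the demand and capacity rates are supplied as plain numbers; the function and argument names (`hrp_initial_allocation`, `hrp_rates`, `target_i`, and so on) are hypothetical and chosen only for illustration.

```python
import math


def hrp_initial_allocation(fleet_size, d_spoke_to_hub, d_hub_to_spoke):
    """Step (i): return (x_0(0), [x_1(0), ..., x_n(0)]).

    d_spoke_to_hub[k] is d_{i0} and d_hub_to_spoke[k] is d_{0i} for spoke i = k + 1.
    """
    total_demand = sum(d_spoke_to_hub) + sum(d_hub_to_spoke)
    spokes = [math.floor(fleet_size * d / total_demand) for d in d_spoke_to_hub]
    # The hub keeps whatever is left of the fleet.
    return fleet_size - sum(spokes), spokes


def hrp_rates(x_i, target_i, d_i0, d_0i, r_i0, r_0i):
    """Steps (ii)-(iv): return (lambda_0i, lambda_i0) for one spoke.

    target_i is the threshold x_i(0) computed in step (i).
    """
    if d_i0 > d_0i:
        # Spoke i sends out more laden containers than it receives:
        # refill it from the hub while it is below its threshold.
        return (r_0i if x_i < target_i else 0.0, 0.0)
    if d_i0 < d_0i:
        # Spoke i accumulates containers: send the excess back to the hub.
        return (0.0, r_i0 if x_i > target_i else 0.0)
    # Balanced demand: no repositioning.
    return (0.0, 0.0)
```

For instance, with a fleet of 10 containers and all six demand rates equal to 2, each spoke receives floor(10 · 2 / 12) = 1 container and the hub keeps the remaining 7.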

The logic of the HRP can be explained as follows. The owned containers are initially allocated to the depots in proportion to the customer demands at the depots (i.e., $d_{i0}$). This is reasonable since a high-demand depot should keep a larger number of empty containers to buffer against its higher uncertainty. Subsequently, if the demand rates in the two directions are the same, which implies that the demands are balanced in the long term, no empty container will be repositioned. Otherwise, the repositioning decision between each spoke $i$ and the hub is triggered by a single threshold value $x_i(0)$. This policy is similar to the one proposed by Du and Hall (1997).

In the experiments, the number of spokes varies from 5 to 20 and the number of containers varies from 30 to 200. The assumptions on travel times are also relaxed. For the empty container repositioning times, three probability distributions are tested. The first is the Exponential distribution that has been assumed in the model development. The second is a Uniform distribution on the interval $[0, 2/r_{ij}]$. The third is a Normal distribution $N(\mu, \sigma^2)$ with $\mu = 1/r_{ij}$ and $\sigma = 0.2\mu$, left-truncated at zero. The Normal distribution may be more appropriate to model the uncertainty in container transfer time due to various factors such as traffic and weather.

For the inter-arrival times of laden containers, we also test three types of probability distributions. The first is the Exponential distribution. The second is a Uniform distribution on the interval $[0, 2/d_{ij}]$. The third is a Normal distribution $N(\mu, \sigma^2)$ with $\mu = 1/d_{ij}$ and $\sigma = 0.2\mu$, left-truncated at zero. These settings give rise to nine different combinations of empty repositioning time and laden container inter-arrival time distributions (empty_dis, laden_dis), namely (Exp, Exp), (Exp, Uni), (Exp, Norm), (Uni, Exp), (Uni, Uni), (Uni, Norm), (Norm, Exp), (Norm, Uni) and (Norm, Norm).
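The three distribution types share the same mean (the reciprocal of the rate), so they can be sampled through one helper. The sketch below is an assumption-laden illustration rather than the authors' simulator: the function `sample_interval` is hypothetical, and left-truncation at zero is implemented here by simply redrawing negative values, which is one common way to sample a truncated Normal.

```python
import itertools
import random

DISTRIBUTIONS = ("Exp", "Uni", "Norm")

# The nine (empty_dis, laden_dis) combinations used in the experiments.
COMBINATIONS = list(itertools.product(DISTRIBUTIONS, DISTRIBUTIONS))


def sample_interval(dist, rate, rng):
    """Draw one inter-event time whose untruncated mean is 1 / rate."""
    mean = 1.0 / rate
    if dist == "Exp":
        return rng.expovariate(rate)
    if dist == "Uni":
        return rng.uniform(0.0, 2.0 * mean)
    if dist == "Norm":
        # Assumed truncation scheme: redraw until the sample is non-negative.
        # With sigma = 0.2 * mean, zero lies five standard deviations below
        # the mean, so redraws are extremely rare.
        while True:
            value = rng.gauss(mean, 0.2 * mean)
            if value >= 0.0:
                return value
    raise ValueError(f"unknown distribution type: {dist!r}")


def sample_mean(dist, rate, count, seed=0):
    """Average of `count` draws, using a seeded generator for repeatability."""
    rng = random.Random(seed)
    return sum(sample_interval(dist, rate, rng) for _ in range(count)) / count
```

Because all three types have mean $1/r_{ij}$ (or $1/d_{ij}$), differences in simulated cost across combinations reflect the shape and variance of the distributions rather than a change in average load.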
To evaluate the performance of the DDP and the HRP, we develop a simulation model to compare them. The performance measure is estimated by averaging over 10,000 samples. We take three levels of the number of spokes (i.e., $n$): 5 spokes, 10 spokes, and 20 spokes. For a given number of spokes, we consider three levels of container fleet size (i.e., $N$): 30, 50, and 100 for 5 and 10 spokes, and 50, 100, and 200 for 20 spokes. For each pair $(n, N)$, the nine combinations of distributions for empty container transfer times and laden container inter-arrival times are tested. A scenario is therefore characterized by $n$, $N$, and a combination of distributions, i.e., ($n$, $N$, empty_dis, laden_dis). For example, the scenario ($n$, $N$, empty_dis, laden_dis) = (5, 30, Norm, Exp) represents a five-spoke-one-hub system with container fleet size 30, Normal distribution of empty transfer times, and Exponential distribution of laden container inter-arrival times.

For a given scenario, the parameters $d_{ij}$ and $r_{ij}$ are randomly generated from the interval (0, 10). The empty repositioning costs $q_{i0}$ and $q_{0i}$, and the container inventory cost $c_i^+$, are randomly generated from the interval (0, 10), while the container leasing cost $c_i^-$ is randomly taken from the interval (10, 30). This reflects the fact that the leasing costs are greater than the repositioning cost and the inventory cost. For each scenario, 10 sets of the system parameters (i.e., $d_{ij}$, $r_{ij}$, $q_{0i}$, $q_{i0}$, $c_i^+$, and $c_i^-$) are randomly generated in order to account for the variability of the parameter combinations. After a set of parameters is generated, the simulation is run with 1000 instances to estimate the expected costs for the dynamic decomposition policy and the heuristic repositioning policy. The costs given in Tables 5.4, 5.5 and 5.6 are the average costs over those 10 settings. In all cases, the initial state is determined by the heuristic repositioning policy and fixed for both policies. The HRP columns give the percentage by which the costs under HRP exceed those under DDP.

Table 5.4 Costs under DDP and HRP for five-spokes-one-hub systems ($n = 5$)

| (empty_dis, laden_dis) | $N = 30$: DDP | $N = 30$: HRP (%) | $N = 50$: DDP | $N = 50$: HRP (%) | $N = 100$: DDP | $N = 100$: HRP (%) |
|---|---|---|---|---|---|---|
| (Exp, Exp) | 183 | 14.0 | 206 | 21.0 | 343 | 23.9 |
| (Exp, Uni) | 156 | 18.0 | 188 | 26.5 | 335 | 26.7 |
| (Exp, Norm) | 143 | 21.8 | 179 | 30.6 | 331 | 28.3 |
| (Uni, Exp) | 186 | 13.7 | 209 | 21.0 | 346 | 23.7 |
| (Uni, Uni) | 158 | 18.4 | 189 | 27.3 | 337 | 26.8 |
| (Uni, Norm) | 144 | 23.4 | 180 | 32.3 | 333 | 28.8 |
| (Norm, Exp) | 187 | 13.7 | 209 | 21.2 | 347 | 23.7 |
| (Norm, Uni) | 158 | 19.0 | 189 | 28.1 | 338 | 27.0 |
| (Norm, Norm) | 144 | 25.2 | 180 | 33.6 | 334 | 29.2 |

Table 5.5 Costs under DDP and HRP for ten-spokes-one-hub systems ($n = 10$)

| (empty_dis, laden_dis) | $N = 30$: DDP | $N = 30$: HRP (%) | $N = 50$: DDP | $N = 50$: HRP (%) | $N = 100$: DDP | $N = 100$: HRP (%) |
|---|---|---|---|---|---|---|
| (Exp, Exp) | 394 | 6.7 | 366 | 12.0 | 414 | 20.9 |
| (Exp, Uni) | 323 | 7.6 | 312 | 15.5 | 380 | 26.1 |
| (Exp, Norm) | 284 | 9.2 | 284 | 19.5 | 364 | 29.9 |
| (Uni, Exp) | 402 | 6.3 | 372 | 12.0 | 418 | 21.2 |
| (Uni, Uni) | 327 | 8.2 | 316 | 16.1 | 383 | 27.1 |
| (Uni, Norm) | 287 | 10.4 | 286 | 21.4 | 366 | 31.7 |
| (Norm, Exp) | 403 | 6.4 | 373 | 12.2 | 420 | 21.5 |
| (Norm, Uni) | 328 | 8.6 | 316 | 16.8 | 384 | 27.9 |
| (Norm, Norm) | 287 | 11.6 | 286 | 23.0 | 367 | 32.7 |

Table 5.6 Costs under DDP and HRP for twenty-spokes-one-hub systems ($n = 20$)

| (empty_dis, laden_dis) | $N = 50$: DDP | $N = 50$: HRP (%) | $N = 100$: DDP | $N = 100$: HRP (%) | $N = 200$: DDP | $N = 200$: HRP (%) |
|---|---|---|---|---|---|---|
| (Exp, Exp) | 750 | 7.8 | 672 | 15.9 | 797 | 21.3 |
| (Exp, Uni) | 596 | 7.5 | 570 | 17.5 | 735 | 24.8 |
| (Exp, Norm) | 507 | 6.6 | 516 | 19.6 | 704 | 26.6 |
| (Uni, Exp) | 770 | 7.5 | 687 | 15.8 | 810 | 21.5 |
| (Uni, Uni) | 610 | 7.9 | 580 | 18.2 | 745 | 25.2 |
| (Uni, Norm) | 515 | 7.5 | 524 | 21.1 | 713 | 27.5 |
| (Norm, Exp) | 777 | 7.1 | 691 | 15.8 | 813 | 21.5 |
| (Norm, Uni) | 612 | 7.9 | 582 | 18.7 | 747 | 25.5 |
| (Norm, Norm) | 516 | 8.1 | 524 | 22.1 | 714 | 28.1 |

It can be seen from Tables 5.4, 5.5 and 5.6 that DDP performs significantly better than HRP in all scenarios. This is particularly true when the ratio of the fleet size to the number of spokes (i.e., $N/n$) is large: the cost saving is always greater than 20% when $N/n = 10$. For the scenarios with relatively smaller $N/n$ ratios, the cost savings achieved by DDP over HRP are of the order of 6–11%. This relationship may be explained by the fact that when owned containers are in short supply at all depots, there is not much that an ECR policy can improve, because the operator has to lease containers from lessors to meet customer demands. On the other hand, if the operator has a sufficiently large container fleet, the performance can be improved significantly by repositioning empty containers efficiently.

It has also been observed that the dynamic decomposition policy is reasonably robust with respect to the distribution assumptions on empty container transfer times and laden container arrivals. Although the absolute costs may vary significantly for different distribution types, the relative performance of the DDP and the HRP is at a similar level for different combinations of distribution types for fixed $n$ and $N$. Interestingly, when the inter-arrival times of laden containers follow Normal distributions, the cost saving achieved by DDP over HRP is generally slightly higher than in the other combinations.

Comparing the costs at the three levels of fleet size in Table 5.5 (i.e., $n = 10$) and Table 5.6 (i.e., $n = 20$), it can be seen that the optimal fleet size takes the middle level when the inter-arrival times of laden containers follow Exponential distributions, while the optimal fleet size tends towards the lower level when the inter-arrival times of laden containers follow Uniform or Normal distributions. This may be explained by the fact that, in our experiments, the Exponential distribution has a larger variance than the Uniform and Normal distributions, and therefore the system requires a larger fleet size to buffer against the higher degree of uncertainty in demands. Tables 5.4, 5.5 and 5.6 also reveal that fleet sizing is more sensitive to the distribution type for laden container arrivals than to the distribution type for empty container transfer times.

The above observations may be explained by the fact that ECR is an indirect activity driven by uncertain and imbalanced demand arrivals. More reliable customer demands lead to more efficient empty repositioning and require a smaller number of owned containers.


5.6 Extension to Cases with External Supply and Demand

Suppose each depot in the hub-and-spoke system faces additional uncertainty in the external supply of and demand for empty containers. This implies that the container fleet size in the system varies over time due to the random entry and exit of empty containers. This section extends the model to cases with external supply and demand of empty containers. We introduce the following additional notation.

$a_i$: the empty container arrival rate from external customers to depot $i$ (a Poisson arrival process), where $i = 0, 1, \ldots, n$ and $i = 0$ represents the hub depot.
$d_i$: the empty container request rate by external customers from depot $i$ (a Poisson arrival process), where $i = 0, 1, \ldots, n$ and $i = 0$ represents the hub depot.

The discounted-cost optimal ECR problem can be formulated similarly to (5.1), i.e.,

$$g(x(t)) := \sum_{i=1}^{n} \left( c_i^+ x_i^+(t) + c_i^- x_i^-(t) \right) + \sum_{i=1}^{n} c_0^+ x_i^-(t) + c_0^+ x_0^+(t) + c_0^- x_0^-(t)$$

$$G(x(t), u(t)) = g(x(t)) + \sum_{i=1}^{n} \left( q_{0i} \lambda_{0i}(t) + q_{i0} \lambda_{i0}(t) \right)$$

$$J(x) = \min_{u} \left[ E_u \left( \int_0^{\infty} e^{-\beta t} G(x(t), u(t)) \, dt \;\middle|\; x(0) = x \right) \right]$$

where $0 < \beta < 1$ is a discount factor, and $G(x(t), u(t))$ represents the container holding costs, leasing costs and empty container repositioning costs. The system state is described by a vector $x := (x_0, x_1, x_2, \ldots, x_n)$. It should be noted that $x_0$ is now an independent variable because the container fleet size in the system is no longer fixed. Let $X = \{x \mid x_i \in Z \text{ for } i = 0, 1, 2, \ldots, n\}$ be the system state space. Let $\Omega = \{u(t) := (\lambda_{01}(t), \lambda_{02}(t), \ldots, \lambda_{0n}(t), \lambda_{10}(t), \lambda_{20}(t), \ldots, \lambda_{n0}(t)) \mid \lambda_{0i}(t) \in [0, r_{0i}] \text{ and } \lambda_{i0}(t) \in [0, r_{i0}] \text{ for } t \in (0, \infty),\ i = 1, 2, \ldots, n\}$ be the admissible control set. The problem is to find the optimal feedback control policy $u \in \Omega$ that minimizes the infinite-horizon expected discounted cost $J(x)$.

Apart from the arrival events of laden containers and the arrival events of repositioned empty containers, there are two additional types of events, i.e., external empty container supply and demand at each depot in the system. To simplify the narrative, define the following operators:

• $A_i x = y := (y_0, y_1, y_2, \ldots, y_n)$ with $y_i = x_i + 1$, and $y_j = x_j$ for $j \neq i$;
• $D_i x = y := (y_0, y_1, y_2, \ldots, y_n)$ with $y_i = x_i - 1$, and $y_j = x_j$ for $j \neq i$;
• $T_{0i} x = D_0 A_i x$;
• $T_{i0} x = D_i A_0 x$.


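Fig. 5.7 The system state transition map in hub-and-spoke system with external flows (figure not reproduced; it shows the transitions from state $x$ to $A_i x$, $D_i x$, $A_0 x$, $D_0 x$, $T_{0i} x$ and $T_{i0} x$, labelled with the rates $a_i$, $d_i$, $a_0$, $d_0$, $d_{0i}$, $\lambda_{0i}$, $d_{i0}$ and $\lambda_{i0}$)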

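The system state transition map is illustrated in Fig. 5.7, where the transition rates $\lambda_{0i}$ and $\lambda_{i0}$ are controllable decisions that depend on the current state $x$, and all other transition rates are exogenously given. By the uniformization technique, the continuous-time Markov chain problem can be transformed into an equivalent discrete-time problem with the uniform transition rate $v = a_0 + d_0 + \sum_{i=1}^{n} (a_i + d_i + d_{i0} + d_{0i} + r_{i0} + r_{0i})$. Under an admissible control policy $u \in \Omega$, the one-step transition probability $P(y \mid x, u)$ is given as follows:

$$P(A_i x \mid x, u) = a_i / v, \quad \text{for } i = 0, 1, 2, \ldots, n.$$

$$P(D_i x \mid x, u) = d_i / v, \quad \text{for } i = 0, 1, 2, \ldots, n.$$

$$P(T_{0i} x \mid x, u) = (\lambda_{0i} + d_{0i}) / v, \quad \text{for } i = 1, 2, \ldots, n.$$

$$P(T_{i0} x \mid x, u) = (\lambda_{i0} + d_{i0}) / v, \quad \text{for } i = 1, 2, \ldots, n.$$

$$P(x \mid x, u) = \left[ v - \sum_{i=0}^{n} (a_i + d_i) - \sum_{i=1}^{n} (d_{0i} + d_{i0} + \lambda_{0i} + \lambda_{i0}) \right] / v = \frac{1}{v} \sum_{i=1}^{n} (r_{0i} + r_{i0} - \lambda_{0i} - \lambda_{i0}).$$

$$P(y \mid x, u) = 0, \quad \text{otherwise}.$$

The self-transition probability is the mass remaining after all other events are accounted for, so that the probabilities sum to one; it agrees with the coefficient of $J(x)$ in the optimality equation below.

Following the stochastic dynamic programming theory, the Bellman optimality equation is

$$J(x) = \frac{1}{\beta + v} \min_{\lambda_{i0}, \lambda_{0i}} \Bigg\{ g(x) + \sum_{i=1}^{n} (q_{0i} \lambda_{0i} + q_{i0} \lambda_{i0}) + \sum_{i=1}^{n} d_{0i} J(T_{0i} x) + \sum_{i=1}^{n} d_{i0} J(T_{i0} x) + \sum_{i=1}^{n} \lambda_{0i} J(T_{0i} x) + \sum_{i=1}^{n} \lambda_{i0} J(T_{i0} x) + \sum_{i=0}^{n} \left( a_i J(A_i x) + d_i J(D_i x) \right) + \sum_{i=1}^{n} (r_{0i} + r_{i0} - \lambda_{0i} - \lambda_{i0}) J(x) \Bigg\}$$

Since the expression inside the braces is linear in each $\lambda_{0i} \in [0, r_{0i}]$ and $\lambda_{i0} \in [0, r_{i0}]$, the minimum is attained at an end point of each interval, and the above equation can be simplified as

$$J(x) = \frac{1}{\beta + v} \Bigg\{ g(x) + \sum_{i=1}^{n} \left( d_{0i} J(T_{0i} x) + d_{i0} J(T_{i0} x) \right) + \sum_{i=0}^{n} \left( a_i J(A_i x) + d_i J(D_i x) \right) + \sum_{i=1}^{n} r_{0i} \min\left( J(T_{0i} x) + q_{0i},\, J(x) \right) + \sum_{i=1}^{n} r_{i0} \min\left( J(T_{i0} x) + q_{i0},\, J(x) \right) \Bigg\}$$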

Now we have obtained the equivalent discrete-time Markov chain model. This model is more tractable than the continuous-time Markov chain model. For example, the value iteration algorithm could be applied to calculate the value function numerically, which then yields the optimal ECR policy. To circumvent the curse of dimensionality, approximate dynamic programming could be used, which will be introduced in the next chapter.
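As a sanity check on the uniformized model, the one-step transition probabilities can be computed and verified to sum to one. The sketch below is illustrative only: the function name and argument layout are hypothetical, and the self-transition probability is taken as the remaining mass $\sum_{i=1}^{n}(r_{0i} + r_{i0} - \lambda_{0i} - \lambda_{i0})/v$, which matches the coefficient of $J(x)$ in the Bellman optimality equation.

```python
def uniformized_probabilities(a, d, d0i, di0, r0i, ri0, lam0i, lami0):
    """One-step transition probabilities of the uniformized chain.

    a, d      : external arrival / request rates for depots 0..n (length n + 1)
    d0i, di0  : laden demand rates hub->spoke and spoke->hub (length n)
    r0i, ri0  : maximum repositioning rates (length n)
    lam0i, lami0 : chosen repositioning rates, 0 <= lambda <= r (length n)
    Returns a dict mapping an event label to its probability.
    """
    n = len(d0i)
    for k in range(n):
        if not (0 <= lam0i[k] <= r0i[k] and 0 <= lami0[k] <= ri0[k]):
            raise ValueError("repositioning rate outside its admissible range")

    # Uniform transition rate v.
    v = sum(a) + sum(d) + sum(d0i) + sum(di0) + sum(r0i) + sum(ri0)

    probs = {}
    for i in range(n + 1):
        probs[("A", i)] = a[i] / v          # external supply at depot i
        probs[("D", i)] = d[i] / v          # external demand at depot i
    for k in range(n):
        i = k + 1
        probs[("T0i", i)] = (lam0i[k] + d0i[k]) / v   # move hub -> spoke i
        probs[("Ti0", i)] = (lami0[k] + di0[k]) / v   # move spoke i -> hub
    # Fictitious self-transition absorbing the unused repositioning capacity.
    probs[("stay",)] = sum(
        r0i[k] + ri0[k] - lam0i[k] - lami0[k] for k in range(n)
    ) / v
    return probs
```

A quick check with two spokes, all external rates equal to 1, laden demand rates equal to 2, capacities equal to 5 and only $\lambda_{01} = 5$ active gives $v = 34$ and a self-transition probability of $15/34$.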

5.7 Summary and Notes

This chapter considers the ECR problem in a hub-and-spoke transportation system. Similar to the methodology in Chap. 4, we take the perspective of continuous review and discrete state to formulate an event-driven Markov decision model. The empty repositioning decisions are made at each epoch when the system state changes. To overcome the computational complexity of the stochastic dynamic programming model, we propose a dynamic decomposition procedure, whose computational complexity is linear in the number of spokes and which can be calculated offline. The requirement for online calculation and data communication is very low. We analyze the structure of the dynamic decomposition policy and show that it has the same asymptotic behavior as the optimal ECR policy. We also confirm that the commonly used threshold control policy is usually not optimal in hub-and-spoke systems. The proposed dynamic decomposition procedure can be applied to both discounted cost and long-run average cost cases. Numerical experiments demonstrate the effectiveness of the dynamic decomposition policy and its robustness against the assumptions on the distribution types of the laden container arrivals and the empty container transfer times. We also discuss the extension of the model to cases with external supply and demand of empty containers, where empty containers may enter and exit the system randomly.


A further development of the model would be to consider multi-hub systems. Many countries (e.g., the UK, France) have several major container ports, each of which can be regarded as a hub depot. The inland depots are connected to most of these seaports. However, import and export containers are often associated with specific liner shipping services, which have fixed port sequences. This means that import containers from one seaport are likely to be moved back to the same seaport for exporting or repositioning. In that sense, the system can be regarded as a set of independent many-spokes-one-hub systems. Nevertheless, more complicated transportation systems require further research (Hall & Zhong, 2002).

Methodologically, the assumption of exponentially distributed transportation times greatly reduces the computational complexity but may not represent realistic situations accurately. In the literature, dynamic programming formulations of fleet management problems usually assume deterministic transportation times. This creates computational difficulty because individual containers must be tracked over multiple time periods, which leads to a much larger state space in the models (Godfrey & Powell, 2002; Topaloglu & Powell, 2006). An alternative is to apply approximate dynamic programming methods, which overcome the curse of dimensionality at the expense of the accuracy of the cost function (Powell, 2011; Sutton & Barto, 2018). The approximate dynamic programming method will be discussed in the next chapter.

References

Bertsekas, D. P. (1976). Dynamic programming and stochastic control. Academic Press.
Bertsekas, D. P. (1987). Dynamic programming: Deterministic and stochastic models. Prentice-Hall.
Cassandras, C. G., & Lafortune, S. (1999). Introduction to discrete event systems. Kluwer.
Du, Y. F., & Hall, R. W. (1997). Fleet sizing and empty equipment redistribution for center-terminal transportation networks. Management Science, 43(2), 145–157.
Godfrey, G., & Powell, W. B. (2002). An adaptive dynamic programming algorithm for single-period fleet management problems II: Multiperiod travel times. Transportation Science, 36(1), 40–54.
Hall, R. W., & Zhong, H. S. (2002). Decentralized inventory control policies for equipment management in a many-to-many network. Transportation Research Part A, 36, 849–865.
Powell, W. B. (2011). Approximate dynamic programming. Wiley.
Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. Wiley.
Song, D. P., & Carter, J. (2008). Optimal empty vehicle redistribution for hub-and-spoke transportation systems. Naval Research Logistics, 55(2), 156–171.
Song, D. P., & Earl, C. F. (2008). Optimal empty vehicle repositioning and fleet-sizing for two-depot service systems. European Journal of Operational Research, 185(2), 760–777.
Sutton, R., & Barto, A. (2018). Reinforcement learning (2nd ed.). The MIT Press.
Topaloglu, H., & Powell, W. B. (2006). Dynamic-programming approximations for stochastic time-staged integer multicommodity-flow problems. INFORMS Journal on Computing, 18(1), 31–42.

Chapter 6

Optimal ECR in General Inland Transportation Systems with Uncertainty: Periodic Review

Abstract This chapter consists of two parts. In the first part, we consider the optimal ECR problems for general inland transportation systems with multiple interconnected depots over multiple time periods. On the one hand, there are laden and empty container flows between depots. On the other hand, each depot is facing external supply and demand of empty containers, which means empty containers may enter or exit the system at each depot. Three stochastic dynamic programming models are formulated based on periodic review mechanisms including (i) a multi-depot transportation system without transfer seaports; (ii) a multi-depot transportation system with transfer seaports; (iii) an intermodal multi-depot transportation system with transfer seaports. In the second part, facing the challenge of dynamic decision making in stochastic systems, various optimization methods are introduced to solve the optimization problems. Specifically, we discuss the applications of approximate dynamic programming methods, simulation methods, metaheuristic optimization methods, stochastic approximation methods, perturbation analysis methods, and ordinal optimization methods. Their relative advantages and disadvantages are explained. Finally, a summary and a note are provided.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. D.-P. Song and J. Dong, Modelling Empty Container Repositioning Logistics, https://doi.org/10.1007/978-3-030-93383-8_6

6.1 Introduction

General inland container transportation systems consist of multiple inland depots and multiple seaport depots, and these depots are interconnected. Both laden containers and empty containers are transported between these depots. Inland depots usually incur lower storage fees than port depots because their storage capacity is less congested, while ports are closer to seaborne transport networks and better placed for international empty repositioning by vessel. Empty containers can also be moved from import customers to inland depots, and from inland depots to export customers in response to their transportation requests. Each depot (either inland depot or port depot) interfaces with local customers, who may request empty containers to meet demands and/or return empty containers after fulfilling the demands. Such demand and supply of empty containers at each depot are often imbalanced, external, and uncertain. Intuitively, import-dominant depots (called surplus depots) tend to accumulate empty containers, whereas export-dominant depots (called deficit depots) tend to face container shortages. Clearly, it is important to ensure that there are adequate empty containers at deficit depots to satisfy customer demands, and it is also necessary to avoid excessive accumulation of empty containers at surplus depots. The inland ECR problem concerns the optimal transportation of laden and empty containers in the inland transportation system over multiple periods in order to better serve uncertain customer demands and minimize the expected total operating costs.

This chapter addresses three types of inland transportation systems. In the first model, all the depots are assumed to be homogeneous in the sense that we do not distinguish between inland depots and port depots. In addition to the external local demand and supply of empty containers faced by each depot, there are transport demands between each pair of depots. In the second model, inland depots and port depots are treated differently and the focus is on international trade demands. Ports play two roles: a depot role, similar to that of an inland depot, serving local demand and supply, and a transfer point role, interfacing with overseas countries via seaborne transport. In the third model, an intermodal transportation system is considered, where inland depots and seaports are assumed to be connected by multiple transport modes. The transport capacity of each transport mode is also considered explicitly.

The following assumptions are made to facilitate the formulation of the inland ECR problems in general inland transportation systems with uncertainty.

Assumption 6.1 Uncertainties are described by random variables. These random variables are independent for different depots over different periods. The probability distributions of the random variables are known.

Assumption 6.2 Customer demands must be satisfied at each period. If there are inadequate owned containers, additional containers can be leased from lessors in a complementary way. There is no capacity restriction on the leased empty containers at any depot. A leased container is charged on a time basis until it is returned.

Assumption 6.3 The decisions on laden and empty container movements are made at the beginning of each period in a sequential and dynamic way. That is, decisions at the current period are determined after the realization of the random variables at earlier periods and before the realization of the random variables at later periods.

Assumption 6.4 Laden and empty container transportation between any two depots takes one time period. Containers are dispatched from the origin depot at the beginning of each period and reach the destination depot at the end of the same period.

Assumption 6.5 A single type of container is considered, e.g., the standard twenty-foot equivalent unit (TEU).

Assumption 6.6 The damage, repairs, and maintenance of containers are not considered explicitly. They may be consolidated into the periodic time and inventory costs.

Assumption 6.7 When multiple transport modes are considered, each transport mode is subject to capacity constraints on the sum of laden and empty container movements.

In the following sections, we formulate the ECR optimization problems in the three cases respectively, and then present a number of solution methods to tackle the optimization problems.

6.2 ECR in Inland Transportation Systems

Consider a transportation system consisting of $N$ depots, where empty containers are repositioned between the depots over a dynamic multi-period planning horizon subject to uncertain laden container movements. Moreover, each depot has its own external supply of and demand for empty containers, and its net supply is denoted by a random variable $z_i$. The laden container movements between any two depots are denoted by random variables $d_{ij}$. Empty containers can be repositioned between any two depots. Let $x_i$ be the cumulative inventory level of empty containers at depot $i$. A negative $x_i$ represents the number of leased containers at depot $i$. For simplicity, we assume a single type of container. The activities involved in the dynamic system include: (i) empty container demand and supply at each depot; (ii) laden container transportation between any two depots; (iii) empty container repositioning between any two depots; (iv) leasing containers at a depot when needed. The system is illustrated in Fig. 6.1.

Fig. 6.1 A general inland transportation system (figure not reproduced; it shows depots with inventories $x_1, x_2, \ldots, x_i, \ldots, x_N$, external net supplies $z_1, z_2, \ldots, z_i, \ldots, z_N$, and laden flows $d_{1N}, d_{N1}$ and empty flows $u_{1N}, u_{N1}$ between depots)

The following notation is introduced.

$t$: a discrete time period with $t = 1, 2, \ldots, T$, where $T$ denotes the planning horizon;
$N$: the number of depots in the transportation system;
$d_{ij,t}$: the random variable representing the demand in containers from depot $i$ to depot $j$ at the beginning of period $t$, which must be satisfied at period $t$ by either owned containers or leased containers;
$z_{i,t}$: the random variable representing the net number of containers into depot $i$ at the beginning of period $t$, which could be negative if external demand is greater than external supply. If positive, they are available for use at period $t$;
$x_{i,t}$: the inventory level of empty containers at depot $i$ at the beginning of period $t$, which can be negative (representing leased containers); it includes the shipments received, the external demand and supply, and the repositioned empty containers that occurred in period $t - 1$;
$u_{ij,t}$: the number of empty containers repositioned from depot $i$ to depot $j$ in period $t$, which is a non-negative decision variable;
$c_{ij}^d$: the unit transportation cost of a laden container from depot $i$ to depot $j$;
$c_{ij}^e$: the unit transportation cost of an empty container from depot $i$ to depot $j$;
$c_i^h$: the unit inventory holding cost of an empty container per period at depot $i$;
$c_i^l$: the unit leasing cost of an empty container per period at depot $i$.

It is assumed that the transport lead time between any two depots is one period. For example, if we take one day as one period, any two depots in the UK can be reached within one day. The sequence of activities at each depot in each period is assumed to be as follows:

(i) at the beginning of period $t$, depot $i$ has the inventory level $x_{i,t}$; depot $i$ satisfies the transportation demands from depot $i$ to other depots (leasing empty containers if needed), and meanwhile determines the number of empty containers to be repositioned to other depots;
(ii) the inventory cost, leasing costs, and empty repositioning costs for period $t$ are calculated;
(iii) at the end of period $t$, depot $i$ receives the net supply of empty containers from external sources in period $t$, the shipments of laden containers from other depots to depot $i$ in period $t$, and the empty containers repositioned to depot $i$ from other depots in period $t$;
(iv) the inventory state at depot $i$ at the beginning of period $t + 1$ is updated.

This process is illustrated in Fig. 6.2. The system state, i.e., the inventory level at depot $i$ over the periods, is updated as follows, with outflows $d_{ij,t} + u_{ij,t}$ leaving depot $i$ and inflows $d_{ji,t} + u_{ji,t}$ arriving at depot $i$:

$$x_{i,t+1} = x_{i,t} - \sum_{j} \left( d_{ij,t} + u_{ij,t} \right) + z_{i,t} + \sum_{j} \left( d_{ji,t} + u_{ji,t} \right) \qquad (6.1)$$

$$\sum_{j} u_{ij,t} \le x_{i,t}^{+}, \quad u_{ij,t} \ge 0. \qquad (6.2)$$

Fig. 6.2 The event sequence to update state at a depot (figure not reproduced; it shows a timeline from $t$ to $t + 1$ on which the state $x_{i,t}$ is followed by the outflows $d_{ij,t}, u_{ij,t}$, then the arrivals $z_{i,t}$ and $d_{ji,t}, u_{ji,t}$, giving $x_{i,t+1}$)


To simplify the narrative, let $x_t := (x_{1,t}, x_{2,t}, \ldots, x_{N,t})$ and $u_t := \{u_{ij,t} \mid 1 \le i, j \le N\}$. The problem is to find the optimal ECR policy $\{u_t \mid 1 \le t \le T\}$ that minimizes the following finite horizon cost function with the initial state $x_0 := (x_{1,0}, x_{2,0}, \ldots, x_{N,0})$.

$$\sum_{t=1}^{T} \gamma^{t} E \left[ \sum_{i=1}^{N} \sum_{j=1}^{N} c_{ij}^{d} d_{ij,t} + \sum_{i=1}^{N} \sum_{j=1}^{N} c_{ij}^{e} u_{ij,t} + \sum_{i=1}^{N} c_i^{h} \left( x_{i,t} - \sum_{j=1}^{N} \left( d_{ij,t} + u_{ij,t} \right) \right)^{+} + \sum_{i=1}^{N} c_i^{l} \left( \sum_{j=1}^{N} \left( d_{ij,t} + u_{ij,t} \right) - x_{i,t} \right)^{+} \right] \qquad (6.3)$$

where $\gamma$ is a discount factor, and $x^+ = \max\{0, x\}$. Let $V_t(x_t)$ be the expected discounted cost from period $t$ to $T$. The backward Bellman optimality equation is given by

$$V_t(x_t) = \min_{u_t} \left\{ G_t(x_t, u_t) + \gamma E V_{t+1}(x_{t+1}) \right\} \qquad (6.4)$$

where $V_{T+1}(x) \equiv 0$ for all $x$, and $G_t(x_t, u_t)$ is defined by

$$G_t(x_t, u_t) = \sum_{i=1}^{N} \sum_{j=1}^{N} c_{ij}^{e} u_{ij,t} + E \sum_{i=1}^{N} \sum_{j=1}^{N} c_{ij}^{d} d_{ij,t} + E \sum_{i=1}^{N} c_i^{h} \left( x_{i,t} - \sum_{j=1}^{N} \left( d_{ij,t} + u_{ij,t} \right) \right)^{+} + E \sum_{i=1}^{N} c_i^{l} \left( \sum_{j=1}^{N} \left( d_{ij,t} + u_{ij,t} \right) - x_{i,t} \right)^{+} \qquad (6.5)$$

Due to the curse of dimensionality, the above Bellman optimality equations are difficult to solve exactly. One alternative is to use an approximate dynamic programming method. Another alternative is to restrict the control policy $\{u_t\}$ to threshold-type control policies, based on the results in previous chapters, and then use simulation-based metaheuristics or perturbation analysis to optimize the threshold control parameters (Lee et al., 2012).
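The state update (6.1), the feasibility condition (6.2) and the one-period cost inside (6.5) are straightforward to evaluate for a single realization of the random variables, which is the building block of any simulation-based solution method. The sketch below is an illustration under stated assumptions: the function names are hypothetical, matrices are plain nested lists indexed `[i][j]`, and the outflow in (6.1) is read as $d_{ij,t} + u_{ij,t}$ so that it agrees with the holding and leasing terms in (6.5).

```python
def positive_part(value):
    """x^+ = max{0, x}."""
    return max(0, value)


def is_feasible(x, u):
    """Constraint (6.2): sum_j u_ij <= x_i^+ and u_ij >= 0 for every depot i."""
    return all(
        min(row) >= 0 and sum(row) <= positive_part(x_i)
        for x_i, row in zip(x, u)
    )


def next_state(x, d, u, z):
    """State update (6.1) for one realization of demand d and net supply z."""
    n = len(x)
    new_x = []
    for i in range(n):
        outflow = sum(d[i][j] + u[i][j] for j in range(n))  # leaves depot i
        inflow = sum(d[j][i] + u[j][i] for j in range(n))   # arrives at depot i
        new_x.append(x[i] - outflow + z[i] + inflow)
    return new_x


def period_cost(x, d, u, c_d, c_e, c_h, c_l):
    """Realized one-period cost: the expression inside the expectation in (6.5)."""
    n = len(x)
    cost = 0
    for i in range(n):
        outflow = sum(d[i][j] + u[i][j] for j in range(n))
        cost += sum(c_d[i][j] * d[i][j] + c_e[i][j] * u[i][j] for j in range(n))
        cost += c_h[i] * positive_part(x[i] - outflow)   # holding owned empties
        cost += c_l[i] * positive_part(outflow - x[i])   # leasing the shortfall
    return cost
```

Averaging `period_cost` over many sampled realizations of $d_{ij,t}$ gives a Monte Carlo estimate of $G_t(x_t, u_t)$, while `next_state` generates the sample path needed to evaluate a given policy over the horizon.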


6.3 ECR in Inland Transportation Systems with Transfer Ports The model in this section assumes that all depots have similar functions and laden containers are essentially terminated at the destination depots. This may not represent international containerized cargos appropriately because their journeys do not stop at (or originated from) the regional depots. Specifically, export containers are required to be moved to seaports and then transferred onto vessels to continue their journey to overseas countries. Import containers from overseas countries are unloaded at seaports and then transported to inland depots or customers. The point is that those laden containers are not terminated at seaports and will not become empty containers for reuse immediately. On the other hand, there may be empty containers to be repositioned from overseas countries to seaports if the port is a deficit port or from seaports to overseas countries if the port is a surplus port. Moreover, there may be no customer demands between inland depots directly when we focus on international trade. Therefore, from the international trade’s perspective, port depots and inland depots have different functions. Ports perform an additional transfer function to interface with seaborne transport on top of normal depot’s functions, which incurs transfer costs. Consider an inland transportation network including a set of inland depots (denoted by D) and a set of seaport depots (denoted as P). Each depot has its own local external supply and demand for empty containers, and its net supply of local empty containers is denoted by a random variable. Each depot has a random export demand to overseas countries via a seaport and a random import demand from overseas to the depot via a seaport. Empty containers can be repositioned between any two depots. 
The activities involved in this dynamic transportation system include: (i) local empty container demand and supply at each depot (both inland depots and seaport depots); (ii) international empty container demand and supply at each port depot; (iii) export laden container transportation from inland depots via seaports to overseas countries; (iv) import laden container transportation from overseas countries via seaports to inland depots; (v) container transfer at port between inland transport and seaborne transport; (vi) empty container repositioning between any two depots; (vii) leasing containers at a depot when needed.

The following notation is introduced.

$t$: a discrete time period with $t = 1, 2, \ldots, T$, where $T$ denotes the planning horizon;
$D$: the set of inland depots in the transportation system;
$P$: the set of seaport depots in the transportation system;
$d_{ip,t}$: the random variable representing the export demand from inland depot $i$ ($i \in D$) to overseas countries transferred via seaport $p$ ($p \in P$) at the beginning of period $t$, which must be satisfied in period $t$ by either owned containers or leased containers;
$d_{p0,t}$: the random variable representing the export demand originating from seaport $p$ ($p \in P$) to overseas countries at the beginning of period $t$, which must be satisfied in period $t$ by either owned containers or leased containers;
$d_{pi,t}$: the random variable representing the import demand from overseas countries to inland depot $i$ ($i \in D$) transferred via seaport $p$ ($p \in P$) in period $t$;
$d_{0p,t}$: the random variable representing the import demand from overseas countries to seaport $p$ ($p \in P$) in period $t$;
$z_{i,t}$: the random variable representing the net number of local empty containers into depot $i$ ($i \in D \cup P$) at the beginning of period $t$, which can be negative if the external local demand is greater than the external local supply;
$z_{0p,t}$: the random variable representing the net number of international empty containers into seaport $p$ ($p \in P$) from overseas countries via seaborne transport at the beginning of period $t$, which is positive if the seaport is a deficit port and negative if the seaport is a surplus port;
$x_{i,t}$: the inventory level of empty containers at depot $i$ at the beginning of period $t$; a negative $x_{i,t}$ represents the number of leased containers;
$u_{ij,t}$: the number of empty containers repositioned from depot $i$ ($i \in D \cup P$) to depot $j$ ($j \in D \cup P$) in period $t$, which is a non-negative decision variable;
$c^{d}_{ij}$: the unit transportation cost of a laden container from depot $i$ to depot $j$;
$c^{e}_{ij}$: the unit transportation cost of an empty container from depot $i$ to depot $j$;
$c^{h}_{i}$: the unit inventory holding cost of an empty container per period at depot $i$;
$c^{l}_{i}$: the unit leasing cost of an empty container per period at depot $i$;
$c^{t}_{p}$: the unit transfer cost of containers interfacing with seaborne transport at port $p$.

The system is illustrated in Fig. 6.3.

Fig. 6.3 An inland transportation system with transfer seaports

Fig. 6.4 The event sequence to update state at an inland depot

It is assumed that the transport lead time between any two depots (inland depots or seaport depots) is one period. The sequence of activities at an inland depot in each period is: (i) at the beginning of period $t$, inland depot $i$ has the inventory level $x_{i,t}$; it then satisfies the demands from depot $i$ to overseas countries via seaport $p$ (leasing empty containers if needed) and, at the same time, determines the empty containers to be repositioned to other depots; (ii) the inventory costs, leasing costs, inland laden container transport costs, and inland empty repositioning costs for period $t$ are calculated; (iii) at the end of period $t$, the depot receives the net supply of empty containers from local external sources, the laden containers shipped from overseas countries via ports, and the empty containers repositioned to it from other depots in period $t$; (iv) the depot updates its inventory state at the beginning of period $t + 1$. This process is illustrated in Fig. 6.4.
The sequence of activities at a seaport in each period is as follows: (i) at the beginning of period $t$, seaport $p$ has the inventory level $x_{p,t}$; it satisfies the export demands originating from seaport $p$ to overseas countries (leasing empty containers if needed) and, at the same time, determines the empty containers to be repositioned to other depots; it also transfers laden containers from overseas to inland depots; (ii) the inventory costs, leasing costs, inland laden container transport costs, inland empty repositioning costs, and container transfer costs between the port and seaborne transport for period $t$ are calculated; (iii) at the end of period $t$, seaport $p$ receives the net supply of empty containers from local external sources, the net supply of empty containers from overseas external sources, the laden containers shipped from overseas countries to seaport $p$, and the empty containers repositioned to seaport $p$ from other depots in period $t$; it also transfers the export laden containers received from inland depots to seaborne transport for overseas destinations; (iv) the inventory state at seaport $p$ is updated at the beginning of period $t + 1$. This process is shown in Fig. 6.5.

Fig. 6.5 The event sequence to update state at a seaport

The evolution of the system state at inland depot $i$ ($i \in D$) and seaport $p$ ($p \in P$) can be described as follows:

$$x_{i,t+1} = x_{i,t} - \sum_{p \in P} d_{ip,t} - \sum_{j \in D \cup P} u_{ij,t} + z_{i,t} + \sum_{p \in P} d_{pi,t} + \sum_{j \in D \cup P} u_{ji,t} \tag{6.6}$$

$$x_{p,t+1} = x_{p,t} - d_{p0,t} - \sum_{j \in D \cup P} u_{pj,t} + z_{p,t} + z_{0p,t} + d_{0p,t} + \sum_{j \in D \cup P} u_{jp,t} \tag{6.7}$$

$$\sum_{j \in D \cup P} u_{ij,t} \le x_{i,t}^{+}, \quad u_{ij,t} \ge 0, \tag{6.8}$$

where $x^{+} := \max\{x, 0\}$.
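The state equations (6.6)–(6.8) translate directly into code. The following Python sketch is only an illustration: the function names and the list-based argument layout are assumptions made here, not part of the model in the text.

```python
def step_inland(x, exports, imports, u_out, u_in, z):
    """Eq. (6.6): next-period empty inventory at an inland depot.

    exports: laden export demands d_{ip,t}, one entry per seaport p
    imports: laden import flows d_{pi,t}, one entry per seaport p
    u_out / u_in: empties repositioned out of / into the depot
    z: net local supply of empties z_{i,t} (may be negative)
    """
    return x - sum(exports) - sum(u_out) + z + sum(imports) + sum(u_in)


def step_seaport(x, d_p0, d_0p, u_out, u_in, z_local, z_overseas):
    """Eq. (6.7): next-period empty inventory at a seaport."""
    return x - d_p0 - sum(u_out) + z_local + z_overseas + d_0p + sum(u_in)


def repositioning_feasible(x, u_out):
    """Constraint (6.8): only owned empties (x^+ = max(x, 0)) can be moved out."""
    return all(u >= 0 for u in u_out) and sum(u_out) <= max(x, 0)


# Illustrative numbers only.
print(step_inland(10, exports=[3, 2], imports=[4], u_out=[1], u_in=[2], z=-1))
print(step_seaport(20, d_p0=5, d_0p=6, u_out=[3], u_in=[1], z_local=2, z_overseas=-4))
```

Note that a depot with negative inventory (leased containers) cannot reposition any empties out, which is exactly what the positive part in (6.8) enforces.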

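To simplify the narrative, let $x_t := \{x_{i,t}, x_{p,t} \mid i \in D, p \in P\}$ and $u_t := \{u_{ij,t} \mid i, j \in D \cup P\}$. The problem is to find the optimal ECR policy $\{u_t \mid 1 \le t \le T\}$ that minimizes the following finite-horizon cost function with the initial state $x_0$:

$$
\begin{aligned}
\sum_{t=1}^{T} \gamma^{t} E \Bigg[ & \sum_{i \in D \cup P} \sum_{j \in D \cup P} c^{e}_{ij} u_{ij,t} + \sum_{i \in D} \sum_{p \in P} \left( c^{d}_{ip} d_{ip,t} + c^{d}_{pi} d_{pi,t} \right) \\
& + \sum_{i \in D} c^{h}_{i} \Big( x_{i,t} - \sum_{p \in P} d_{ip,t} - \sum_{j \in D \cup P} u_{ij,t} \Big)^{+} \\
& + \sum_{i \in D} c^{l}_{i} \Big( \sum_{p \in P} d_{ip,t} + \sum_{j \in D \cup P} u_{ij,t} - x_{i,t} \Big)^{+} \\
& + \sum_{p \in P} c^{h}_{p} \Big( x_{p,t} - d_{p0,t} - \sum_{j \in D \cup P} u_{pj,t} \Big)^{+} \\
& + \sum_{p \in P} c^{l}_{p} \Big( d_{p0,t} + \sum_{j \in D \cup P} u_{pj,t} - x_{p,t} \Big)^{+} \\
& + \sum_{p \in P} c^{t}_{p} \Big( d_{p0,t} + d_{0p,t} + \left| z_{0p,t} \right| + \sum_{i \in D} \left( d_{pi,t} + d_{ip,t} \right) \Big) \Bigg]
\end{aligned}
\tag{6.9}
$$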

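where $\gamma$ is a discount factor. Let $V_t(x_t)$ be the minimum expected discounted cost from period $t$ to $T$. The backward Bellman optimality equation is given by

$$V_t(x_t) = \min_{u_t} \left\{ G_t(x_t, u_t) + \gamma E\, V_{t+1}(x_{t+1}) \right\} \tag{6.10}$$

where $V_{T+1}(x) \equiv 0$ for all $x$, and $G_t(x_t, u_t)$ is defined by

$$
\begin{aligned}
G_t(x_t, u_t) ={} & \sum_{i \in D \cup P} \sum_{j \in D \cup P} c^{e}_{ij} u_{ij,t} + E \sum_{i \in D} \sum_{p \in P} \left( c^{d}_{ip} d_{ip,t} + c^{d}_{pi} d_{pi,t} \right) \\
& + E \sum_{i \in D} c^{h}_{i} \Big( x_{i,t} - \sum_{p \in P} d_{ip,t} - \sum_{j \in D \cup P} u_{ij,t} \Big)^{+} \\
& + E \sum_{i \in D} c^{l}_{i} \Big( \sum_{p \in P} d_{ip,t} + \sum_{j \in D \cup P} u_{ij,t} - x_{i,t} \Big)^{+} \\
& + E \sum_{p \in P} c^{h}_{p} \Big( x_{p,t} - d_{p0,t} - \sum_{j \in D \cup P} u_{pj,t} \Big)^{+} \\
& + E \sum_{p \in P} c^{l}_{p} \Big( d_{p0,t} + \sum_{j \in D \cup P} u_{pj,t} - x_{p,t} \Big)^{+} \\
& + E \sum_{p \in P} c^{t}_{p} \Big( d_{p0,t} + d_{0p,t} + \left| z_{0p,t} \right| + \sum_{i \in D} \left( d_{pi,t} + d_{ip,t} \right) \Big)
\end{aligned}
\tag{6.11}
$$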
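To make the cost structure concrete, the sketch below evaluates the one-period cost inside the expectation of (6.11) for a single realization of the random demands. It is a minimal illustration, assuming dictionary-based inputs; the function name, argument names, and the numerical data are hypothetical, and the transfer term uses the absolute value of $z_{0p,t}$ as an assumption about how net overseas empties are charged.

```python
def pos(v):
    """Positive part v^+ = max(v, 0)."""
    return max(v, 0.0)


def one_period_cost(D, P, x, u, d_exp, d_imp, d_p0, d_0p, z0,
                    c_e, c_d, c_h, c_l, c_t):
    """Realized one-period cost for inland depots D and seaports P.

    x[i]: inventory; u[(i, j)]: empties moved from i to j;
    d_exp[(i, p)]: export laden demand; d_imp[(p, i)]: import laden demand;
    d_p0[p], d_0p[p]: port export/import demand; z0[p]: net overseas empties.
    """
    nodes = D + P
    out = {i: sum(u.get((i, j), 0) for j in nodes) for i in nodes}
    total = sum(c_e[i, j] * q for (i, j), q in u.items())       # empty moves
    for i in D:
        exports = sum(d_exp.get((i, p), 0) for p in P)
        total += sum(c_d[i, p] * d_exp.get((i, p), 0)
                     + c_d[p, i] * d_imp.get((p, i), 0) for p in P)  # laden moves
        total += c_h[i] * pos(x[i] - exports - out[i])           # holding
        total += c_l[i] * pos(exports + out[i] - x[i])           # leasing
    for p in P:
        total += c_h[p] * pos(x[p] - d_p0[p] - out[p])
        total += c_l[p] * pos(d_p0[p] + out[p] - x[p])
        through = sum(d_imp.get((p, i), 0) + d_exp.get((i, p), 0) for i in D)
        total += c_t[p] * (d_p0[p] + d_0p[p] + abs(z0[p]) + through)  # transfer
    return total


# Hypothetical one-depot, one-port instance.
cost = one_period_cost(
    D=["a"], P=["p"], x={"a": 10, "p": 5}, u={("a", "p"): 2},
    d_exp={("a", "p"): 3}, d_imp={("p", "a"): 4},
    d_p0={"p": 6}, d_0p={"p": 1}, z0={"p": -2},
    c_e={("a", "p"): 1.0}, c_d={("a", "p"): 2.0, ("p", "a"): 2.0},
    c_h={"a": 0.5, "p": 0.5}, c_l={"a": 3.0, "p": 3.0}, c_t={"p": 1.0},
)
print(cost)
```

Averaging this quantity over many sampled demand realizations gives a Monte Carlo estimate of $G_t(x_t, u_t)$.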

6.4 ECR in Intermodal Transportation Systems

One of the unique advantages of containerization is its intermodal nature: trucks, trains, vessels, and handling equipment have been purposely designed to carry the standard container. Intermodal transport is very common in inland container transport chains (Zhao et al., 2018). This section extends the models in the previous sections to intermodal situations, where multiple transport modes and/or carriers are available to carry out the transport tasks.

Consider a regional intermodal transportation network including a set of inland depots (denoted by $D$) and a set of seaport depots (denoted by $P$). The depots are connected in multiple ways, representing different transport modes or different carriers, each with its own transport costs and carrying capacity. Compared to the models in the previous sections, we have to make additional decisions on how to split the laden and empty container movements over multiple capacitated transport modes.

The activities involved in the dynamic system include: (i) local empty container demand and supply at each depot; (ii) international empty container demand and supply at each port; (iii) export laden container transportation from inland depots to overseas countries via port depots through different transport modes; (iv) import laden container transportation from overseas countries via ports to inland depots through different transport modes; (v) container transfer at port between land transport and seaborne transport; (vi) empty container repositioning between any two depots through different transport modes; (vii) leasing containers at a depot when needed.

The laden and empty container flows can be illustrated as in Fig. 6.3; however, the links between depots should now be interpreted as multiple transport modes (or carriers). The additional notation and the notation modified from the previous section are given below.

$K$: the set of transport modes (or carriers) between any two depots;
$d_{ij,t}$: the random demand from depot $i$ to depot $j$ at the beginning of period $t$. We let $d_{ij,t}$ be zero if depots $i$ and $j$ are both inland depots or both seaport depots, which simplifies the presentation of the model;
$y_{ijk,t}$: the number of laden containers transported from depot $i$ through transport mode $k$ to depot $j$ at the beginning of period $t$, which is a decision variable;
$u_{ijk,t}$: the number of empty containers repositioned from depot $i$ ($i \in D \cup P$) to depot $j$ ($j \in D \cup P$) through transport mode $k$ in period $t$, which is a decision variable;
$Cap_{ijk}$: the carrying capacity from depot $i$ to depot $j$ through transport mode $k$;
$c^{d}_{ijk}$: the unit transportation cost of a laden container from depot $i$ to depot $j$ through transport mode $k$;
$c^{e}_{ijk}$: the unit transportation cost of an empty container from depot $i$ to depot $j$ through transport mode $k$.

It is assumed that the transport lead time between any two depots (inland depots or seaports) is one period. The sequence of activities at each inland depot in each period $t$ is: (i) at the beginning of period $t$, inland depot $i$ has the inventory level $x_{i,t}$; it satisfies the demands from depot $i$ to overseas countries via seaport $p$ (leasing empty containers if needed) through each transport mode and, at the same time, determines the empty containers to be repositioned to other depots through each transport mode; (ii) the inventory costs, leasing costs, laden container transport costs, and empty repositioning costs for period $t$ are calculated; (iii) at the end of period $t$, inland depot $i$ receives the net supply of empty containers from local external sources, the laden containers shipped from overseas countries via port $p$ to depot $i$ through all transport modes, and the empty containers repositioned to depot $i$ from other depots through all transport modes in period $t$; (iv) the inventory state at inland depot $i$ is updated at the beginning of period $t + 1$. This process is illustrated in Fig. 6.6.

Fig. 6.6 The event sequence to update state at an inland depot with multiple modes

The sequence of activities at each seaport in each period is as follows: (i) at the beginning of period $t$, seaport $p$ has the inventory level $x_{p,t}$; it satisfies the export demands from seaport $p$ to overseas countries (leasing empty containers if needed) and, at the same time, determines the empty containers to be repositioned to other depots through each transport mode; it also transfers import laden containers to inland depots via different transport modes; (ii) the inventory costs, leasing costs, laden container transport costs, empty repositioning costs, and container transfer costs between the port and seaborne transport for period $t$ are calculated; (iii) at the end of period $t$, seaport $p$ receives the net supply of empty containers from local external sources, the net supply of empty containers from overseas external sources, the laden containers shipped from overseas countries to seaport $p$, and the empty containers repositioned to seaport $p$ from other depots through all transport modes in period $t$; it also transfers the export laden containers received from inland depots via different transport modes to seaborne transport for overseas destinations; (iv) the inventory state at seaport $p$ is updated at the beginning of period $t + 1$. This process is illustrated in Fig. 6.7.

Fig. 6.7 The event sequence to update state at a seaport with multiple modes

The evolution of the system state at inland depot $i$ ($i \in D$) and seaport $p$ ($p \in P$) can be described as follows:

$$x_{i,t+1} = x_{i,t} - \sum_{p \in P,\, k \in K} y_{ipk,t} - \sum_{j \in D \cup P,\, k \in K} u_{ijk,t} + z_{i,t} + \sum_{p \in P,\, k \in K} y_{pik,t} + \sum_{j \in D \cup P,\, k \in K} u_{jik,t} \tag{6.12}$$

$$x_{p,t+1} = x_{p,t} - d_{p0,t} - \sum_{j \in D \cup P,\, k \in K} u_{pjk,t} + z_{p,t} + z_{0p,t} + d_{0p,t} + \sum_{j \in D \cup P,\, k \in K} u_{jpk,t} \tag{6.13}$$

$$\sum_{j \in D \cup P,\, k \in K} u_{ijk,t} \le x_{i,t}^{+}, \quad u_{ijk,t} \ge 0, \tag{6.14}$$

$$\sum_{k \in K} y_{ijk,t} = d_{ij,t}, \quad \text{for any } i, j \text{ and } t, \tag{6.15}$$

$$y_{ijk,t} + u_{ijk,t} \le Cap_{ijk}, \quad \text{for any } i, j, k, \text{ and } t. \tag{6.16}$$
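Constraints (6.15) and (6.16) can be illustrated on a single link between two depots. The sketch below is a simple illustration, not the optimal modal-split policy of the model: it allocates the laden demand on one link to the modes in order of increasing unit cost and then checks (6.15)–(6.16) for a given empty flow. The function names and the example data are hypothetical.

```python
def split_over_modes(demand, capacity, unit_cost):
    """Allocate the laden demand on one link to modes, cheapest mode first.

    capacity[k] plays the role of Cap_{ijk}; unit_cost[k] of c^d_{ijk}.
    Returns y[k], the laden containers assigned to mode k.
    """
    remaining = demand
    y = {k: 0 for k in capacity}
    for k in sorted(capacity, key=lambda m: unit_cost[m]):
        y[k] = min(remaining, capacity[k])
        remaining -= y[k]
    if remaining > 0:
        raise ValueError("demand exceeds the total capacity on this link")
    return y


def satisfies_constraints(y, u, demand, capacity):
    """Check (6.15), the capacity constraint (6.16), and u >= 0 on one link."""
    return (sum(y.values()) == demand
            and all(v >= 0 for v in u.values())
            and all(y[k] + u.get(k, 0) <= capacity[k] for k in capacity))


# Hypothetical link served by rail and truck.
cap = {"rail": 50, "truck": 40}
y = split_over_modes(70, cap, {"rail": 1.0, "truck": 2.5})
print(y)
print(satisfies_constraints(y, {"truck": 10}, 70, cap))
```

Because laden and empty containers share the capacity of each mode, the residual capacity left after the laden allocation bounds the empty flow $u_{ijk,t}$ on that mode.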

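To simplify the narrative, let $x_t := \{x_{i,t}, x_{p,t} \mid i \in D, p \in P\}$ and $u_t := \{(y_{ijk,t}, u_{ijk,t}) \mid i, j \in D \cup P, k \in K\}$. The problem is to find the optimal modal split and ECR policy $\{u_t \mid 1 \le t \le T\}$ that minimizes the following finite-horizon cost function with the initial state $x_0$:

$$
\begin{aligned}
\sum_{t=1}^{T} \gamma^{t} E \Bigg[ & \sum_{i \in D \cup P} \sum_{j \in D \cup P} \sum_{k \in K} c^{e}_{ijk} u_{ijk,t} + \sum_{i \in D} \sum_{p \in P} \sum_{k \in K} \left( c^{d}_{ipk} y_{ipk,t} + c^{d}_{pik} y_{pik,t} \right) \\
& + \sum_{i \in D} c^{h}_{i} \Big( x_{i,t} - \sum_{p \in P} d_{ip,t} - \sum_{j \in D \cup P} \sum_{k \in K} u_{ijk,t} \Big)^{+} \\
& + \sum_{i \in D} c^{l}_{i} \Big( \sum_{p \in P} d_{ip,t} + \sum_{j \in D \cup P} \sum_{k \in K} u_{ijk,t} - x_{i,t} \Big)^{+} \\
& + \sum_{p \in P} c^{h}_{p} \Big( x_{p,t} - d_{p0,t} - \sum_{j \in D \cup P} \sum_{k \in K} u_{pjk,t} \Big)^{+} \\
& + \sum_{p \in P} c^{l}_{p} \Big( d_{p0,t} + \sum_{j \in D \cup P} \sum_{k \in K} u_{pjk,t} - x_{p,t} \Big)^{+} \\
& + \sum_{p \in P} c^{t}_{p} \Big( d_{p0,t} + d_{0p,t} + \left| z_{0p,t} \right| + \sum_{i \in D} \left( d_{pi,t} + d_{ip,t} \right) \Big) \Bigg]
\end{aligned}
\tag{6.17}
$$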

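Let $V_t(x_t)$ be the minimum expected discounted cost from period $t$ to $T$. The backward Bellman optimality equation is given by

$$V_t(x_t) = \min_{u_t} \left\{ G_t(x_t, u_t) + \gamma E\, V_{t+1}(x_{t+1}) \right\} \tag{6.18}$$

where $V_{T+1}(x) \equiv 0$ for all $x$, and $G_t(x_t, u_t)$ is defined by

$$
\begin{aligned}
G_t(x_t, u_t) ={} & \sum_{i \in D \cup P} \sum_{j \in D \cup P} \sum_{k \in K} c^{e}_{ijk} u_{ijk,t} + E \sum_{i \in D} \sum_{p \in P} \sum_{k \in K} \left( c^{d}_{ipk} y_{ipk,t} + c^{d}_{pik} y_{pik,t} \right) \\
& + E \sum_{i \in D} c^{h}_{i} \Big( x_{i,t} - \sum_{p \in P} d_{ip,t} - \sum_{j \in D \cup P} \sum_{k \in K} u_{ijk,t} \Big)^{+} \\
& + E \sum_{i \in D} c^{l}_{i} \Big( \sum_{p \in P} d_{ip,t} + \sum_{j \in D \cup P} \sum_{k \in K} u_{ijk,t} - x_{i,t} \Big)^{+} \\
& + E \sum_{p \in P} c^{h}_{p} \Big( x_{p,t} - d_{p0,t} - \sum_{j \in D \cup P} \sum_{k \in K} u_{pjk,t} \Big)^{+} \\
& + E \sum_{p \in P} c^{l}_{p} \Big( d_{p0,t} + \sum_{j \in D \cup P} \sum_{k \in K} u_{pjk,t} - x_{p,t} \Big)^{+} \\
& + E \sum_{p \in P} c^{t}_{p} \Big( d_{p0,t} + d_{0p,t} + \left| z_{0p,t} \right| + \sum_{i \in D} \left( d_{pi,t} + d_{ip,t} \right) \Big)
\end{aligned}
\tag{6.19}
$$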

In the following sections, we discuss how to solve the dynamic decision-making problems formulated in Sects. 6.2–6.4.

6.5 Approximate Dynamic Programming Method

The key characteristic of the optimization problems in the previous sections is that decisions are made over time in the presence of uncertainties that are realized sequentially. This type of problem is predominantly tackled using stochastic dynamic programming. The main idea is to make the current decisions by accounting for the expected value of the future states to which the current actions might lead. The foundation of stochastic dynamic programming is the Bellman optimality equation, also called the Hamilton–Jacobi–Bellman equation, especially in continuous-time, continuous-state situations (Powell, 2009).

In solving the Bellman optimality equations presented in the previous sections, we encounter three curses of dimensionality. First, the state space has as many dimensions as there are depots. Second, the outcome space is determined by the high-dimensional vector of random variables over which the expectation is taken. Third, the action space is associated with the high-dimensional vector of ECR decisions. To overcome the curses of dimensionality, one promising solution method is approximate dynamic programming (Powell, 2011). Similar methods have also been developed, e.g., neuro-dynamic programming (Bertsekas & Tsitsiklis, 1996), which takes the control theory perspective, and reinforcement learning (Sutton & Barto, 1998), which takes the artificial intelligence perspective (Powell, 2009). This section presents the approximate dynamic programming method for solving the stochastic dynamic programming models.

6.5.1 Generalized Stochastic Dynamic Programming Model

We first standardize the problems in the previous sections to facilitate the narrative for their generalization.

The dynamic decision-making problem under uncertainty can be described as follows. When the system is in state $x_t$ at time $t$, we take a set of actions $u_t$ and then observe the realization of the random variables (uncertainties) that occur during the period from $t$ to $t + 1$, which is represented by a vector $\xi_t$. This takes the system into a new state $x_{t+1}$. The state space is denoted by $X$, i.e., $x_t \in X$. We use $u = \{u_t\}$ to denote the policy (or decision function) for making decisions over time. The actions $u_t$ are assumed to take the feedback control form, depending on the current system state $x_t$. A transition function $T(\cdot)$ describes how the system evolves from the current state $x_t$ at time $t$ to the next state $x_{t+1}$ at time $t + 1$. When the state space is discrete, the same dynamics can be expressed as a one-step transition matrix, which gives the probability of moving from the current state to each possible next state after taking the current actions. More specifically, the dynamic system is given by

$$u_t = u(x_t) \tag{6.20}$$

$$x_{t+1} = T(x_t, u_t, \xi_t) \tag{6.21}$$

The one-step cost function is denoted by $g(x_t, u_t, \xi_t)$, which depends on the current state, the current actions, and the realization of the random variables from the current period to the next period. It is assumed that the random vectors $\xi_t$ are independent across periods. The problem is to find the optimal policy $u$ that minimizes the total expected cost over the planning horizon from period 1 to period $T$,

$$\min_{u} \sum_{t=1}^{T} \gamma^{t} E\, g(x_t, u_t, \xi_t) \tag{6.22}$$

Here $\gamma$ is a discount factor. Let $G_t(x_t, u_t) = E\, g(x_t, u_t, \xi_t)$. Define $V_t(x_t)$ as the cost-to-go value function, which gives the expected cost of being in state $x_t$ at time $t$ and following an optimal policy from period $t$ to period $T$. The Bellman optimality equation is given by

$$V_t(x_t) = \min_{u_t} \left\{ G_t(x_t, u_t) + \gamma E\left[ V_{t+1}(x_{t+1}) \mid x_t \right] \right\} \tag{6.23}$$

In standard stochastic dynamic programming theory, the backward value iteration algorithm is commonly used to calculate $V_t(x)$ once $V_{t+1}(x)$ is known for all $x \in X$. If the state space $X$ is discrete and the state variable is scalar, it is relatively straightforward to perform the value iteration algorithm and calculate $V_t(x)$ exactly. However, if the state is a vector, the number of states grows exponentially with the number of dimensions. This phenomenon is called the curse of dimensionality in solving dynamic programming problems.
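For a scalar state, the backward recursion (6.23) is easy to carry out exactly. The following Python sketch shows it on a small single-depot toy problem. Everything about the toy problem (the state range, the action set, the demand distribution, and the cost coefficients) is a hypothetical assumption chosen for illustration and is not taken from the models above.

```python
def backward_induction(T, states, actions, pmf, g, trans, gamma=0.95):
    """Solve V_t(x) = min_u E[g(x, u, xi) + gamma * V_{t+1}(T(x, u, xi))]
    backward from t = T to t = 1, with V_{T+1} = 0."""
    states, actions = list(states), list(actions)
    V = {T + 1: {x: 0.0 for x in states}}
    policy = {}
    for t in range(T, 0, -1):
        V[t], policy[t] = {}, {}
        for x in states:
            best_u, best_val = None, float("inf")
            for u in actions:
                val = sum(p * (g(x, u, xi) + gamma * V[t + 1][trans(x, u, xi)])
                          for xi, p in pmf.items())
                if val < best_val:
                    best_u, best_val = u, val
            V[t][x], policy[t][x] = best_val, best_u
    return V, policy


# Hypothetical single-depot toy data (assumptions, not from the text).
X_MIN, X_MAX = -5, 10
STATES = range(X_MIN, X_MAX + 1)      # inventory; negative means leased
ACTIONS = range(0, 4)                 # empties repositioned into the depot
PMF = {0: 0.3, 1: 0.4, 2: 0.3}        # distribution of net demand xi
C_E, C_H, C_L = 1.0, 0.5, 4.0         # repositioning, holding, leasing costs


def g(x, u, xi):
    """One-step cost: repositioning plus holding or leasing."""
    left = x + u - xi
    return C_E * u + C_H * max(left, 0) + C_L * max(-left, 0)


def trans(x, u, xi):
    """Transition function, clipped to the finite state range."""
    return min(max(x + u - xi, X_MIN), X_MAX)


V, policy = backward_induction(5, STATES, ACTIONS, PMF, g, trans)
print(V[1][0], policy[1][0])
```

The triple loop over states, actions, and outcomes is what becomes intractable when each of them is a vector: with $|D|$ depots and $m$ inventory levels per depot, the outer loop alone visits $m^{|D|}$ states.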


6.5.2 Approximate Dynamic Programming Algorithm

The main idea of approximate dynamic programming is to replace the true value function $V_t(x_t)$ with some form of statistical approximation, which circumvents the difficulty of calculating $V_t(x)$ exactly. Another important difference is that, instead of working backward through time as in a standard value iteration algorithm, approximate dynamic programming works forward in time. We start with a given state $x_0$ and follow a particular sample path forward. We do this iteratively over a number of iterations (sample paths). In this process, we use the approximate value function from the previous iteration to make decisions along the current sample path (Powell, 2011).

Suppose we have completed iteration $n - 1$ and start iteration $n$. After iteration $n - 1$, we have an approximate value function $\bar{V}_t^{n-1}(x_t)$. At iteration $n$, a sample path $\omega^n$ is generated. Then, the evolution of the state and the actions is given by

$$x_{t+1} = T\left( x_t^n, u_t, \xi_t \right) \tag{6.24}$$

$$u_t^n = \arg\min_{u_t} \left\{ G_t(x_t^n, u_t) + \gamma E\left[ \bar{V}_{t+1}^{n-1}(x_{t+1}) \mid x_t^n \right] \right\} \tag{6.25}$$

Define a sample estimate of the value of being in state $x_t^n$ as follows,

$$\hat{v}_t^n = G_t\left( x_t^n, u_t^n \right) + \gamma E\left[ \bar{V}_{t+1}^{n-1}(x_{t+1}) \mid x_t^n \right] \tag{6.26}$$

There are different ways to approximate a value function. The simplest approximation is known as a lookup table. For each discrete state $x$, we have an estimate $\bar{V}_t(x)$, which gives the value of being in state $x$. Using a lookup-table representation, the estimate can be updated as follows,

$$\bar{V}_t^{n}(x_t^n) = (1 - \alpha_{n-1}) \bar{V}_t^{n-1}(x_t^n) + \alpha_{n-1} \hat{v}_t^n \tag{6.27}$$

Here $\alpha_{n-1}$ is the step size at iteration $n - 1$. A generic approximate dynamic programming algorithm using a lookup-table representation is summarized in Algorithm 6.1 (Powell, 2011).

Algorithm 6.1 A generic approximate dynamic programming algorithm

Step 0. Initialization:
Step 0a. Initialize $\bar{V}_t^{0}(x_t)$ for all states $x_t \in X$.
Step 0b. Select an initial state $x_0^1$ for $n = 1$.
Step 0c. Set the first iteration $n = 1$.
Step 1. Generate a sample path $\omega^n$ for the $n$th iteration.
Step 2. For $t = 0, 1, 2, \ldots, T$, do the following sub-steps:
Step 2a. For the state $x_t^n$, solve

$$u_t^n = \arg\min_{u_t} \left\{ G_t(x_t^n, u_t) + \gamma E\left[ \bar{V}_{t+1}^{n-1}(x_{t+1}) \mid x_t^n \right] \right\}$$

$$\hat{v}_t^n = G_t\left( x_t^n, u_t^n \right) + \gamma E\left[ \bar{V}_{t+1}^{n-1}(x_{t+1}) \mid x_t^n \right]$$

Step 2b. Update $\bar{V}_t^{n-1}(x_t)$ with $\bar{V}_t^{n}(x_t)$ using

$$
\bar{V}_t^{n}(x_t) =
\begin{cases}
(1 - \alpha_{n-1}) \bar{V}_t^{n-1}(x_t^n) + \alpha_{n-1} \hat{v}_t^n & \text{if } x_t = x_t^n \\
\bar{V}_t^{n-1}(x_t) & \text{if } x_t \ne x_t^n
\end{cases}
$$

Step 2c. Update the system state by

$$x_{t+1}^n = T\left( x_t^n, u_t^n, \xi_t(\omega^n) \right)$$

Step 3. Let $n = n + 1$. Go to Step 1 until the number of iterations reaches the prespecified maximum.

Note that Algorithm 6.1 steps forward in time and does not need to enumerate all the states in the state space. However, a few points have to be clarified. First, the algorithm approximates the value of being in any state that has been visited, and the number of states that might be visited can still be very large. Second, the algorithm only updates the values of the states that are actually visited, so there is a balance to strike between exploration and exploitation in the learning process. Third, the multidimensional random vector may require special treatment, because taking the expectation over its multivariate distribution can be computationally expensive. Fourth, solving Eq. (6.25) may be challenging when the actions form a vector (Powell, 2009).

Algorithm 6.1 essentially uses a flat representation of the state space to overcome the curse of dimensionality with respect to the state variables. However, when the state is a multidimensional vector, it is impractical to list all possible combinations of the state variables in a single flat list. One treatment that has been widely used for multidimensional variables is simply to treat the vector $x_t$ as continuous. Note that $x_{i,t}$ is the number of containers at depot $i$ in period $t$. Suppose the set of depots in the system is denoted by $D$. We might then approximate the value function using

$$\bar{V}_t(x_t \mid \theta) = \sum_{i \in D} \left( \theta_i^{+} x_{i,t}^{+} + \theta_i^{-} x_{i,t}^{-} \right) \tag{6.28}$$

where $x_{i,t}^{+}$ and $x_{i,t}^{-}$ denote the positive part (containers held) and the negative part (containers leased) of $x_{i,t}$. This simple approximation assumes that the value function is piecewise linear in the number of containers. The parameters $\theta_i^{+}$ and $\theta_i^{-}$ capture the marginal values of holding and leasing containers at depot $i$. We now have a value function that covers the entire state space with just $2|D|$ parameters. However, it should be pointed out that this linear-type approximation architecture may not provide a good approximation. Nevertheless, it illustrates the basic strategy for overcoming the curse of dimensionality with respect to the state variables.
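Returning to the lookup-table version, Algorithm 6.1 can be sketched in a few lines of Python. The sketch below is a minimal illustration under stated assumptions: it reuses a hypothetical single-depot toy problem (the state range, demand distribution, and cost coefficients are invented for illustration), indexes periods from 1 to $T$, uses a harmonic step size $\alpha = 1/n$, and samples $\xi_t$ on the fly rather than generating the whole sample path in advance.

```python
import random
from collections import defaultdict


def adp_lookup_table(T, actions, pmf, g, trans, x0,
                     n_iter=500, gamma=0.95, seed=0):
    """Forward-pass ADP with a lookup-table value function approximation."""
    rng = random.Random(seed)
    actions = list(actions)
    outcomes, probs = zip(*pmf.items())
    # V[t][x] is the current estimate; unvisited states default to 0 (Step 0a).
    V = [defaultdict(float) for _ in range(T + 2)]

    def q_value(t, x, u):
        # Expected one-step cost plus discounted approximate cost-to-go.
        return sum(p * (g(x, u, xi) + gamma * V[t + 1][trans(x, u, xi)])
                   for xi, p in pmf.items())

    for n in range(1, n_iter + 1):
        alpha = 1.0 / n                      # assumed harmonic step size
        x = x0
        for t in range(1, T + 1):
            u = min(actions, key=lambda a: q_value(t, x, a))   # Step 2a
            v_hat = q_value(t, x, u)
            V[t][x] = (1 - alpha) * V[t][x] + alpha * v_hat    # Step 2b
            xi = rng.choices(outcomes, weights=probs)[0]       # sampled outcome
            x = trans(x, u, xi)                                # Step 2c
    return V


# Hypothetical single-depot toy data (assumptions, not from the text).
X_MIN, X_MAX = -5, 10
ACTIONS = range(0, 4)
PMF = {0: 0.3, 1: 0.4, 2: 0.3}
C_E, C_H, C_L = 1.0, 0.5, 4.0


def g(x, u, xi):
    left = x + u - xi
    return C_E * u + C_H * max(left, 0) + C_L * max(-left, 0)


def trans(x, u, xi):
    return min(max(x + u - xi, X_MIN), X_MAX)


V_bar = adp_lookup_table(5, ACTIONS, PMF, g, trans, x0=0)
print(len(V_bar[1]), "state(s) visited in period 1")
```

Only the states reached along the sample paths are ever updated, which is the exploration issue raised in the second point above: states that are never visited keep their initial estimate.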


6.6 Simulation Methods and Parameterized Policies

A simulation is the imitation of the operation of a real-world system over time. It is usually based on an abstract or mathematical model that describes the characteristics, behaviors, and relationships of the real system. A simulation study includes the following key steps: (i) model conceptualization, to understand the entities involved in the system and the relationships among them; (ii) data collection and representation, which are essential to run the simulation; (iii) model translation, to convert the model into a computer program; (iv) verification and validation, to check whether the program works properly and whether the computerized system accurately represents the real system (Garrido, 1998).

Discrete-event simulation has been very successful in modeling complex dynamic systems subject to uncertainty. Recent software developments have successfully applied discrete-event simulation to offer great flexibility in modeling stochastic dynamic systems and evaluating the performance of a given policy, e.g., Arena (www.opttek.com), Simul8 (www.simul8.com), Witness (www.lanner.com) (Fu, 1994). For the problems formulated in Sects. 6.2–6.4, the simulation clock (i.e., the virtual time used in the simulated model) advances periodically with a fixed time increment. The simulation runs from period 1 to period $T$, and the events that occur in each period are executed during that period. An event is defined as an instantaneous activity that changes the state of the system, e.g., a container departure or a container arrival. Taking the model in Sect. 6.2 as an example, the flow chart of the simulation model is illustrated in Fig. 6.8. The main purpose of the simulation model is to simulate the real system and evaluate the performance of a given ECR policy by averaging over multiple samples. The law of large numbers ensures that the estimated performance measure will be accurate if the number of samples is sufficiently large.

Fig. 6.8 The flow chart of simulation model for stochastic dynamic systems
[Flow chart: input system parameters and ECR repositioning policy; set sample = 1; initialize system states; for period t = 1 to T and for each depot, handle the activities (receive external empty supply, receive and meet customer demands, lease empty containers, dispatch laden/empty containers, receive laden/empty containers, calculate incurred costs, update system states, generate new events for next period); check whether to terminate samples (if no, run the next sample; if yes, output performance measures).]

A policy is often characterized by a set of rules that determine the actual empty container movements in dynamic situations in response to external stochasticity. In the previous chapters, it has been shown that threshold-type control policies can be optimal in simple systems such as single-depot or two-depot systems. They also perform well in hub-and-spoke systems. Therefore, it is reasonable to extend threshold-type control policies to general inland transport networks. In fact, a number of studies have already applied threshold-type inventory-based control policies in managing empty containers, e.g., Lee et al. (2012) and Dang et al. (2013).

Suppose we adopt an ECR policy that is characterized by a set of threshold parameters $\{(s_i, S_i) \mid i \in D\}$, where $(s_i, S_i)$ represents the minimum and maximum inventory levels at depot $i$. Under such an ECR policy, the dynamic empty container repositioning decisions can be described as follows.

Algorithm 6.2 Dynamic decisions of empty containers under the $(s_i, S_i)$ threshold control policy

Step 0. At period $t$.
Step 1. Identify the set of surplus depots by $D_s = \{i \mid x_{i,t} > S_i\}$, and estimate the maximum volume of empty containers to be repositioned out of the surplus depots by $x_{i,t} - S_i$ for $i \in D_s$.
Step 2. Identify the set of deficit depots by $D_d = \{i \mid x_{i,t} < s_i\}$, and estimate the maximum volume of empty containers to be repositioned into the deficit depots by $s_i - x_{i,t}$ for $i \in D_d$.
Step 3. Determine the actual empty container flows from surplus depots to deficit depots in period $t$ by minimizing the total repositioning costs.

Clearly, for a given $(s_i, S_i)$ threshold control policy, the simulation model can easily estimate its performance. However, the performance of the threshold policy depends strongly on the values of the control parameters $(s_i, S_i)$ for $i \in D$. Hence, the remaining question is how to find the best values of the control parameters in the threshold policies. In deterministic situations, commonly used approaches to optimizing parameters in complex systems include metaheuristic search and gradient search. These approaches can also be adapted to stochastic situations. The next few sections explain the methods to optimize the control parameters in the ECR policies.
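Steps 1–3 of Algorithm 6.2 can be sketched as follows. This is an illustration under a stated simplification: Step 3 calls for minimizing the total repositioning cost, which is a transportation problem, whereas the sketch substitutes a greedy cheapest-link heuristic so that it needs no solver. The function name, depot labels, and data are hypothetical.

```python
def threshold_repositioning(x, s, S, cost):
    """One period of the (s_i, S_i) threshold policy.

    x[i]: current inventory; s[i], S[i]: lower and upper thresholds;
    cost[(i, j)]: unit empty repositioning cost from i to j.
    Returns flows[(i, j)], the empties moved from surplus i to deficit j.
    """
    # Step 1: surplus depots and the volume available to move out.
    surplus = {i: x[i] - S[i] for i in x if x[i] > S[i]}
    # Step 2: deficit depots and the volume needed to reach s_i.
    deficit = {j: s[j] - x[j] for j in x if x[j] < s[j]}
    # Step 3 (greedy stand-in for the min-cost problem): cheapest links first.
    flows = {}
    for _, i, j in sorted((cost[i, j], i, j) for i in surplus for j in deficit):
        q = min(surplus[i], deficit[j])
        if q > 0:
            flows[i, j] = q
            surplus[i] -= q
            deficit[j] -= q
    return flows


# Hypothetical four-depot example with common thresholds (4, 8).
x = {"A": 12, "B": 1, "C": 3, "D": 20}
s = {i: 4 for i in x}
S = {i: 8 for i in x}
cost = {("A", "B"): 1.0, ("A", "C"): 5.0, ("D", "B"): 2.0, ("D", "C"): 1.0}
print(threshold_repositioning(x, s, S, cost))
```

The greedy rule is not guaranteed to reach the minimum cost in general; replacing it with a linear-programming solution of the transportation problem would implement Step 3 exactly.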

6.7 Metaheuristic Optimization Methods A metaheuristic is a high-level problem-independent algorithmic framework that provides a set of guidelines or strategies to develop heuristic optimization algorithms that often employ stochastic search mechanisms (Sorensen & Glover, 2013). Here the heuristics refer to partial search algorithms. Metaheuristics are always heuristic in nature and therefore cannot guarantee the optimality of the solution. Nevertheless, metaheuristics are able to find a solution that is satisfactory in terms of solution quality at an acceptable computing time. Metaheuristics usually have the following features (Blum & Roli, 2003): • Provide guidelines and strategies that guide the search process. • Be designed to explore the search space efficiently and exploit the search results effectively. • Be flexible to incorporate other techniques ranging from simple local search procedures to complex learning processes. • Be problem-independent for wide application. • Aim to find near-optimal solutions at acceptable computational time. Metaheuristics may be classified into two broad categories according to the number of candidate solutions used in the iterative search process: single solutionbased searches and population-based searches. Single solution-based approaches search for better solutions by modifying and improving a single candidate solution in each iteration. Typical examples of single solution-based metaheuristics include simulated annealing (Kirkpatrick et al., 1983) and tabu search (Glover, 1989). Population-based approaches search for better solutions by maintaining and improving multiple candidate solutions simultaneously in each iteration. Typical examples of population-based metaheuristics include genetic algorithms (Goldberg, 1989), scatter search (Glover, 1977), ant colony optimization algorithms (Dorigo et al., 1996), particle swarm optimization algorithm (Eberhart & Kennedy, 1995), and artificial bee colony algorithms (Karaboga, 2010). 
There are also hybrid metaheuristics that combine different types of traditional metaheuristics. Metaheuristics have been widely used to tackle optimization problems, especially difficult combinatorial optimization problems. In fact, commercial software (e.g., Matlab) has also implemented metaheuristic optimization algorithms such as genetic algorithms.

The combination of simulation and metaheuristics has been regarded as a promising research area. The procedure of simulation-based optimization includes two key steps: (i) generate candidate solutions by the search engine; (ii) evaluate solutions by the stochastic simulation model, as illustrated in Fig. 6.9. Here the simulation model represents the stochastic dynamic system in Fig. 6.8.

Fig. 6.9 Simulation-based metaheuristic optimization (figure: a loop in which the metaheuristic optimization block passes ECR policies (solutions) to the stochastic simulation block, which returns performance estimates)

It should be noted that optimizing a vector of variables (control parameters) is difficult even in deterministic situations when little is known about the structure of the objective function (Banks et al., 2000; Fu, 2002). The stochastic nature adds further complications. Note that stochastic simulation can only offer a performance estimate for a given solution rather than an exact evaluation. This implies that it is impossible to conclude with certainty whether one solution is better than another. In theory, this problem may be overcome by making a sufficiently large number of replications so that, by the law of large numbers, the performance estimate of a specific solution no longer varies materially. However, in practice, this would dramatically increase the computational time if we want to explore a large number of candidate solutions (Fu, 2002). The implication is that simply extending metaheuristics to optimization problems in stochastic situations may be computationally difficult. In later sections, we will introduce other techniques to reduce the computation times for optimization in stochastic situations.
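The two-step loop described above (generate candidate solutions, then evaluate them by replicated simulation) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this book: the single-depot cost model in `simulate_cost`, its cost coefficients, the integer bounds on the thresholds, and the simulated annealing settings are all hypothetical choices made for the example.

```python
import math
import random


def simulate_cost(s, S, rng, periods=200):
    """One replication of a toy single-depot system under an (s, S) rule.

    All dynamics and cost coefficients here are hypothetical.
    """
    x, total = S, 0.0
    for _ in range(periods):
        if x < s:                     # reposition empties in, up to s
            total += 2.0 * (s - x)
            x = s
        elif x > S:                   # reposition empties out, down to S
            total += 2.0 * (x - S)
            x = S
        x += rng.randint(-6, 5)       # random net change (supply minus demand)
        if x < 0:                     # shortage covered by leasing
            total += 10.0 * (-x)
            x = 0
        total += 1.0 * x              # holding cost
    return total / periods


def estimate_cost(policy, rng, replications=10):
    """Step (ii): performance estimate averaged over several replications."""
    s, S = policy
    return sum(simulate_cost(s, S, rng) for _ in range(replications)) / replications


def neighbour(policy, rng, upper=40):
    """Step (i): generate a candidate near the current policy, keeping s <= S."""
    s, S = policy
    s = min(max(s + rng.randint(-2, 2), 0), upper)
    S = min(max(S + rng.randint(-2, 2), 0), upper)
    return (min(s, S), max(s, S))


def anneal(initial, seed=0, iterations=200, temp0=5.0, cooling=0.98):
    """Simulated annealing driven by noisy simulation estimates."""
    rng = random.Random(seed)
    current, current_cost = initial, estimate_cost(initial, rng)
    best, best_cost = current, current_cost
    temp = temp0
    for _ in range(iterations):
        cand = neighbour(current, rng)
        cand_cost = estimate_cost(cand, rng)
        delta = cand_cost - current_cost
        # Accept improvements always, deteriorations with a shrinking probability.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= cooling
    return best, best_cost


if __name__ == "__main__":
    policy, cost = anneal((0, 40))
    print("best (s, S) found:", policy, "estimated cost per period:", round(cost, 2))
```

Because every evaluation averages only a limited number of replications, the returned `best_cost` is itself a noisy estimate. Raising `replications` reduces that noise but multiplies the run time by the same factor, which is the computational difficulty discussed above.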

6.8 Stochastic Approximation Methods

The stochastic approximation method can be regarded as the extension of the gradient search method in deterministic optimization to stochastic situations (Qi & Song, 2012; Rubinstein, 1986). It is particularly suitable for optimizing a set of real variables. In the $(s_i, S_i)$ threshold-type ECR policy, the control parameters can be regarded as continuous real variables. Therefore, the stochastic approximation algorithm can be applied to our problems. Suppose there are N depots in the system. Let $s := (s_1, s_2, \ldots, s_N, S_1, S_2, \ldots, S_N)^T$ be a vector of control parameters to be optimized, and J(s) be the objective function under the control policy s. More specifically, let ω denote the sample process to represent the stochasticity of the dynamic system and L(s, ω) be the sample objective function. We have: $J(s) = E[L(s, \omega)]$. The stochastic approximation method takes the following iterative form:

$$s_{k+1} = \Pi\left(s_k - \gamma_k \cdot \nabla J_k\right) \tag{6.29}$$


6 Optimal ECR in General Inland Transportation Systems …

where $\Pi$ denotes a projection function that maps the iterate back into the constraint set of the control variables when the iteration leads to a point outside the set, $s_k$ is the parameter vector at the beginning of iteration k, $\gamma_k$ is a step size multiplier, and $\nabla J_k$ is an estimator of the gradient $\nabla J(s_k)$, i.e., an estimate of the gradient of the objective function with respect to the decision variables, which is defined as

$$\nabla J(s_k) := \left(\partial J(s_k)/\partial s_1, \ldots, \partial J(s_k)/\partial s_N, \partial J(s_k)/\partial S_1, \ldots, \partial J(s_k)/\partial S_N\right)^T \tag{6.30}$$

The multiplier $\gamma_k$ is a positive sequence of step sizes satisfying the following conditions: (i) it decreases to zero; (ii) the sum of the sequence $\{\gamma_k\}$ is infinite; and (iii) the sum of its squares is bounded. Typically, the harmonic sequence 1/k satisfies all of the above conditions for $\gamma_k$.

When $\nabla J_k$ is an unbiased estimator of the gradient $\nabla J(s_k)$, the above stochastic approximation procedure (6.29) is called a Robbins–Monro (RM) algorithm. When a finite difference estimator is used, it is called a Kiefer–Wolfowitz (KW) algorithm (Rubinstein, 1986). The RM algorithm generally has a faster convergence rate than the KW algorithm (Kleinman et al., 1999), but it requires direct measurements of the gradients, which are less straightforward to obtain. The KW-type algorithm is much simpler. Take the parameter $s_i$ as an example: by running two simulations of the dynamic system, one under the nominal value $s_i$ and one under the perturbed value $s_i + \Delta$, we can calculate the finite difference ratio to obtain the sensitivity estimate of the objective function with respect to the parameter $s_i$. In order to obtain the gradient estimator, we have to do this for each control parameter, which means at least N + 1 simulation runs are required. Moreover, to approximate the objective function appropriately, multiple sample objective functions have to be evaluated and averaged.
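The KW-type iteration can be sketched as follows. This is a minimal sketch under stated assumptions: the sample objective `sample_cost` is a hypothetical quadratic with additive noise (so that the true minimizer is known), the box constraints, gain, and perturbation size are illustrative, and the same random draw is reused for the nominal and perturbed runs within an iteration (common random numbers). Each iteration uses one nominal run plus one perturbed run per parameter.

```python
import random


def sample_cost(s, omega, targets):
    """Hypothetical sample objective L(s, omega); its mean is sum((s_i - t_i)^2)."""
    return sum((si - ti) ** 2 for si, ti in zip(s, targets)) + omega * sum(s)


def project(s, lower, upper):
    """Projection onto a box constraint set (the role of Pi in (6.29))."""
    return [min(max(si, lower), upper) for si in s]


def kiefer_wolfowitz(s0, targets, iterations=2000, delta=0.01, gain=0.5,
                     lower=0.0, upper=10.0, seed=0):
    rng = random.Random(seed)
    s = list(s0)
    runs = 0                                   # number of simulation runs used
    for k in range(1, iterations + 1):
        omega = rng.gauss(0.0, 1.0)            # one sample path per iteration
        base = sample_cost(s, omega, targets)  # nominal run
        runs += 1
        grad = []
        for i in range(len(s)):                # one perturbed run per parameter
            perturbed = list(s)
            perturbed[i] += delta
            grad.append((sample_cost(perturbed, omega, targets) - base) / delta)
            runs += 1
        step = gain / k                        # harmonic step size sequence
        s = project([si - step * gi for si, gi in zip(s, grad)], lower, upper)
    return s, runs


if __name__ == "__main__":
    s_final, runs = kiefer_wolfowitz([8.0, 1.0, 5.0], [3.0, 6.0, 2.0])
    print("estimated minimizer:", [round(v, 3) for v in s_final])
    print("simulation runs used:", runs)
```

With three parameters, each iteration costs four runs, so the run count grows linearly with the number of control parameters. In a real ECR model each call to `sample_cost` would be a full simulation, usually averaged over several samples.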
In our problem, the KW-type algorithm may need modifications similar to those in Xu and Song (2021) because the objective function may not be differentiable everywhere. Instead of using the finite difference estimator directly, we can estimate the right-side finite difference and the left-side finite difference simultaneously via simulation. If both sides’ finite differences are positive, the corresponding element in $\nabla J_k$ is set to zero. The rationale for this repair of the gradient estimator is that changing the corresponding parameter in either direction will not reduce the objective function. If both sides’ finite differences are negative, the corresponding element in $\nabla J_k$ is set so as to move the control parameter toward the steeper descending side. The rationale for this adjustment is based on the greedy strategy. The KW-type stochastic approximation algorithm requires at least N + 1 simulation experiments in order to obtain the N-dimensional gradient vector, which is time consuming considering the need for many iterations over many samples. More efficient stochastic approximation algorithms could be developed, e.g., perturbation analysis methods (Cassandras & Lafortune, 2008; Glasserman, 1991; Ho & Cao, 1991), which will be discussed in the next section.
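One possible reading of the two-sided repair rule is sketched below. The function name and signature are hypothetical, the finite differences are taken as the change in estimated cost when moving from $s_i$ to $s_i - \Delta$ or $s_i + \Delta$, and the mixed-sign case (not specified in the text) is handled with a central difference as an assumption. The returned value is used in the update $s_i \leftarrow s_i - \gamma_k \cdot g_i$.

```python
def repaired_gradient_element(cost_left, cost_mid, cost_right, delta):
    """Gradient element from left-side and right-side finite differences.

    cost_left, cost_mid, cost_right are estimated objective values at
    s_i - delta, s_i, and s_i + delta respectively (hypothetical interface).
    """
    d_left = cost_left - cost_mid      # change when moving to s_i - delta
    d_right = cost_right - cost_mid    # change when moving to s_i + delta
    if d_left > 0 and d_right > 0:
        # Neither direction reduces the objective: do not move.
        return 0.0
    if d_left < 0 and d_right < 0:
        # Both directions descend: follow the steeper one (greedy choice).
        if d_right <= d_left:
            return d_right / delta     # negative value, so s_i increases
        return -d_left / delta         # positive value, so s_i decreases
    # Assumption: in all remaining cases fall back to a central difference.
    return (cost_right - cost_left) / (2.0 * delta)


if __name__ == "__main__":
    print(repaired_gradient_element(5.0, 3.0, 4.0, 1.0))   # local minimum
    print(repaired_gradient_element(1.0, 3.0, 2.0, 1.0))   # left side steeper
    print(repaired_gradient_element(4.0, 3.0, 2.0, 1.0))   # ordinary descent
```

The rule needs three cost estimates per parameter rather than two, so it trades extra simulation effort for robustness at kinks of the objective function.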


6.9 Perturbation Analysis Methods

Perturbation analysis (PA) can be defined as a technique that utilizes a single sample path or experiment of the dynamic system to construct a perturbed sample path with slight changes in parameters. In essence, a single sample path (called the nominal path) plus appropriate mathematical analysis of the sample path is able to yield much more information, including gradients (Ho, 1987). In other words, a single simulation run can estimate the gradient of the sample objective function with respect to the control parameter vector. As a result, the number of simulation runs can be reduced from N + 1 to 1 compared to the finite difference estimator used in the KW algorithm.

Among various PA algorithms, infinitesimal perturbation analysis (IPA) is the most efficient one in obtaining gradient estimates. It also has the advantages of simplicity and ease of implementation. Another advantage of IPA is that it estimates the gradients directly, rather than through finite differences. Hence, it has superior variance properties (Ho & Cao, 1991), which is very important in stochastic systems. In fact, IPA has been one of the most attractive tools for data-driven control and optimization, especially in stochastic dynamic systems where modeling random aspects of a process is prohibitively difficult (Kleinman et al., 1999; Marbach & Tsitsiklis, 2001; Song et al., 2001; Song & Sun, 1998; Wardi et al., 2018).

Note that J(s) = E[L(s, ω)]. The purpose of IPA is to obtain the gradient ∇L(s, ω) from observable data on the nominal path (i.e., the sample path with a specific realization of the stochastic process ω). However, what we are actually interested in is ∇J(s) = ∇E[L(s, ω)]. Clearly, we need the following condition to ensure that the IPA gradient estimate is an unbiased estimator of the gradient of the objective function:

$$\nabla E[L(s, \omega)] = E[\nabla L(s, \omega)] \tag{6.31}$$

In general, if the sample objective function L(s, ω) has a jump at a point s, the interchangeability of mathematical expectation and differentiation in (6.31) would not hold at that point. More specifically, if a parameter change at s may cause a change in the order of events in the stochastic dynamic system, the sample objective function may have a jump (discontinuity) at s, and then the interchangeability of expectation and differentiation in (6.31) may hold only if the probability of such jumps in [s, s + Δs] is of the order o(Δs) (Wardi et al., 2018). Another intuitive interpretation of (6.31) is from the mathematical perspective. The interchangeability of differentiation with respect to s and integration (expectation) with respect to ω is roughly equivalent to the continuity of the function L(s, ω) in s with probability one (in ω). This view provides a practical way of judging whether the IPA gradient estimate is unbiased (Wardi et al., 2018). In our context, the nominal path refers to the sample path generated by the simulation model under the control parameters in vector s. Let s′ be a perturbed vector whose elements are the same as those in vector s except that one element has been changed from its value s to s + Δs. Hence, the perturbed path refers to the sample path theoretically constructed from the nominal path under the control parameters in vector s′.


Namely, the perturbed path can be regarded as the sample path generated using the same model and the same random seeds as the nominal path. Let the perturbation amount of the control parameter (Δs) be sufficiently small in the sense that the sets of surplus depots and deficit depots are the same in both the nominal and perturbed paths in every period (similar to Lee et al., 2012). The important step of IPA is to construct the perturbed path from the nominal path. This involves the analysis of how the perturbation Δs affects the trajectories of the system states and the control actions. In other words, we need to identify how the perturbed path deviates from the nominal path. This can be done by analyzing the changes of the states and actions (i.e., $x_{i,t}$ and $u_{ij,t}$) for all depots over the time periods. Such changes should be identified as perturbation generation and propagation rules, which can then be used to calculate the gradient estimator (Song et al., 2001). The application of IPA to optimize the control parameters in the threshold-type ECR policies is illustrated in Fig. 6.10. In the literature, Lee et al. (2012) have applied an IPA-based gradient estimation technique to optimize the container fleet size and empty container inventory levels in a multi-port transportation system.

Fig. 6.10 Perturbation analysis-based stochastic approximation optimization (figure: flowchart. Input the system parameters and the ECR repositioning policy, with iteration = 1 and sample = 1. Simulation and perturbation analysis: simulate the nominal path, calculate the sample objective function, construct the perturbed path, and obtain the sample IPA gradient estimate. Repeat until the samples terminate, then average the sample objective functions and the sample IPA gradient estimates and update the control parameters in the ECR policy. Repeat until the iterations terminate, then output the performance measures.)
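The idea of perturbation generation and propagation can be illustrated on a deliberately small example. The sketch below is not the multi-depot rule set referred to above: it assumes a hypothetical single depot with one threshold s, uniform random demand and supply, and illustrative cost coefficients. A perturbation is generated whenever the threshold binds, propagates while stock is left over, and is absorbed when a stockout occurs. One run returns both the sample cost and its derivative with respect to s.

```python
import random


def simulate_with_ipa(s, seed, periods=50, c=1.0, h=1.0, b=4.0, x0=0.0):
    """Simulate the nominal path and accumulate dL/ds along it (toy model)."""
    rng = random.Random(seed)
    x, dx = x0, 0.0              # inventory and its derivative w.r.t. s
    cost, dcost = 0.0, 0.0       # sample objective and its IPA derivative
    for _ in range(periods):
        demand = rng.uniform(0.0, 20.0)
        supply = rng.uniform(0.0, 20.0)
        if x < s:
            # Threshold binds: reposition in up to s (perturbation generation).
            y, dy = s, 1.0
        else:
            # Threshold does not bind: any earlier perturbation carries over.
            y, dy = x, dx
        cost += c * (y - x)
        dcost += c * (dy - dx)
        if y > demand:
            # Leftover stock: the perturbation propagates to the next period.
            cost += h * (y - demand)
            dcost += h * dy
            x, dx = y - demand + supply, dy
        else:
            # Stockout covered by leasing: the perturbation is absorbed.
            cost += b * (demand - y)
            dcost -= b * dy
            x, dx = supply, 0.0
    return cost, dcost


if __name__ == "__main__":
    step = 1e-6
    for seed in range(3):
        cost, ipa = simulate_with_ipa(10.0, seed)
        cost_up, _ = simulate_with_ipa(10.0 + step, seed)
        print(seed, round(ipa, 4), round((cost_up - cost) / step, 4))
```

In this toy model the sample cost is continuous and piecewise linear in s, so the single-run IPA derivative agrees with a finite difference computed from a second run with the same seed, provided the perturbation is small enough not to change the order of events. The averaged IPA derivatives could then replace the finite difference estimator in the iteration (6.29).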


6.10 Ordinal Optimization Methods

Ordinal optimization (OO) is another approach to speed up the search process and save computational time. The approach was first proposed by Ho et al. (1992). There are two tenets underpinning the ordinal optimization method. The first tenet is “order is easier than value”, which states that it is much easier to estimate the relative order of two solutions than to estimate their value difference. Specifically, it is easier to determine whether or not alternative A is better than B than to determine the value difference between A and B (Ho et al., 2007). The second tenet is “nothing but the best is very costly” or “goal softening”, which states that instead of looking for the best solution, it settles for a solution that is good enough in the statistical sense.

In the deterministic setting, the comparison of different solutions in a simulation-based search algorithm is essentially combined with the evaluation process by calculating their performance difference (Fu, 2002). In the stochastic setting, the performance of a solution is obtained by averaging over a large number of sample objective functions. This offers an opportunity to apply the first tenet of the OO approach to better integrate the comparison and evaluation during the search process. Specifically, we can use a very small number of samples to evaluate the performances of the candidate solutions and then roughly compare their ordinal ranking so that a subset can be selected from the entire candidate set. This subset is likely to include the best solution in the original entire set. The rationale can be explained by the following result (Ho et al., 2007). If a solution $s_1$ is better than $s_2$, i.e., $J(s_1) = E[L(s_1, \omega)] < J(s_2) = E[L(s_2, \omega)]$, then

$$\mathrm{Prob}\{L(s_1, \omega) < L(s_2, \omega)\} \ge \mathrm{Prob}\{L(s_1, \omega) > L(s_2, \omega)\} \tag{6.32}$$

The above inequality indicates that the truly better solution has a larger chance of being better in the samples.

The second tenet of the OO approach is to retreat from “seeking the best solution” to a softer goal of being “good enough”, e.g., settling for any solution in the top-g choices. Let $\Theta$ denote the entire set of the candidate solutions and assume the solutions have been ranked from the best to the worst. Let G denote the set of good enough solutions. In other words, any solution in G is satisfactory. The top-g percent solutions in $\Theta$ are given by

$$g = \frac{|G|}{|\Theta|} \cdot 100\% \tag{6.33}$$

where | . | represents the size of the set. Suppose we randomly pick k solutions from the entire set $\Theta$. The probability that none of these selected solutions is in G will be $(1 - g)^k$. Hence, the probability that at least one of the k solutions is in G is given by $1 - (1 - g)^k$. Let the required level of this probability be denoted as $P_{sat}$, namely $\mathrm{Prob}\{|\Theta_k \cap G| \ge 1\} \ge P_{sat}$, where $\Theta_k$ denotes the set of the k selected solutions. Then, we can estimate the value of k to ensure that $1 - (1 - g)^k \ge P_{sat}$. Taking logarithms and noting that $\ln(1 - g) < 0$, it follows that

$$k \ge \frac{\ln(1 - P_{sat})}{\ln(1 - g)} \tag{6.34}$$
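The bound (6.34) is straightforward to evaluate. The short sketch below computes the smallest integer k that satisfies it; the function name `required_candidates` is a hypothetical choice for this illustration, and g is passed as a fraction (0.001 for the top 0.1%).

```python
import math


def required_candidates(p_sat, g):
    """Smallest integer k with 1 - (1 - g)**k >= p_sat, from (6.34)."""
    if not (0.0 < p_sat < 1.0 and 0.0 < g < 1.0):
        raise ValueError("p_sat and g must lie strictly between 0 and 1")
    # log1p(-g) equals ln(1 - g) and stays accurate for very small g.
    return math.ceil(math.log(1.0 - p_sat) / math.log1p(-g))


if __name__ == "__main__":
    for g in (0.001, 0.0001):
        print(f"top {g:.2%} with P_sat = 99%: k >= {required_candidates(0.99, g)}")
```

Note that the size of the solution space does not enter the calculation; only the target fraction g and the required probability do.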

From (6.34), we can easily calculate the required values of k (the number of candidate solutions) for different combinations of $P_{sat}$ and g. For example, if we want at least one of the k solutions to be in the top g = 0.1% of the solution space with a probability of no less than $P_{sat}$ = 99%, then k only needs to be at least 4603. If we want at least one of the k solutions to be in the top 0.01% of the solution space with a probability of no less than 99%, then k can take any value not less than 46,050.

A few points should be emphasized: (i) the required value of k given in (6.34) is independent of the size of the entire solution space Θ. This indicates that (6.34) is widely applicable; (ii) we simply assume that the k candidate solutions are randomly picked from the entire solution space Θ. If a more appropriate selection mechanism is used, we could have a more favorable result, e.g., either reducing the value of k, or increasing the quality of the solution (i.e., decreasing g), or increasing the probability of finding the desirable solution (i.e., $P_{sat}$); (iii) the top-g solutions do not reveal the difference between the selected solutions and the best solution. A solution is good only in the statistical sense. In summary, the spirit of the OO approach is to seek good enough solutions with high probability instead of seeking the best solution for sure (Ho et al., 2007).

The application of the OO approach together with simulation for stochastic systems is illustrated in Fig. 6.11. In Fig. 6.11, the search-based optimization and the stochastic simulation are better integrated with the help of ordinal optimization, e.g., to achieve more efficient evaluation, comparison, and search processes by using appropriate ranking and selection strategies, and probability and statistical analysis (Fu, 2002). The OO approach can be combined with other methods. For example, Song et al. (2006) developed an OO-based elite genetic algorithm, which integrates OO with the elite genetic algorithm.

Fig. 6.11 Ordinal optimization with simulation for stochastic system (figure: search-based optimization passes ECR policies (solutions) to the stochastic simulation, which returns performance estimates; ordinal optimization sits between the two blocks)
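The first tenet, rough ordinal screening with very few samples, can be sketched as follows. Everything in this example is hypothetical: the candidates are the integers 0 to 100, the true cost is a simple quadratic known only for checking purposes, the simulation noise is Gaussian, and each candidate is evaluated with just two replications. The sketch then checks how many of the truly good enough solutions survive the screening.

```python
import random


def true_cost(s):
    """Hypothetical true objective J(s); unknown to the optimizer in practice."""
    return (s - 50.3) ** 2


def sample_cost(s, rng, noise_sd=20.0):
    """One noisy simulation estimate L(s, omega) of the true cost."""
    return true_cost(s) + rng.gauss(0.0, noise_sd)


def screen(candidates, rng, replications=2, keep=10):
    """Rank candidates by a rough sample average and keep the top few."""
    rough = {}
    for s in candidates:
        rough[s] = sum(sample_cost(s, rng) for _ in range(replications)) / replications
    return sorted(candidates, key=lambda s: rough[s])[:keep]


if __name__ == "__main__":
    candidates = list(range(101))
    # Good enough set G: the true top 10 solutions (about the top 10%).
    good_enough = set(sorted(candidates, key=true_cost)[:10])
    selected = screen(candidates, random.Random(0))
    overlap = good_enough.intersection(selected)
    print("selected subset:", sorted(selected))
    print("good enough solutions retained:", len(overlap))
```

The rough estimates are far too noisy to report reliable cost values, yet the ranking they induce is usually enough to retain several good enough solutions. The selected subset can then be evaluated more carefully with a larger number of replications.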


6.11 Summary and Notes

This chapter consists of two parts. In the first part, we consider general inland transportation systems with multiple interconnected depots. On the one hand, there are laden and empty container flows between depots. On the other hand, each depot faces external supply and demand of empty containers, which means empty containers may enter or exit the system at each depot. Three dynamic programming models are formulated based on a periodic review mechanism. First, an inland transportation system with a set of homogeneous depots is considered, where inland depots and port depots are treated in the same way. Second, an inland transportation system with transfer ports is formulated, where ports play two roles: a depot role similar to an inland depot to serve local demand and supply, and a transfer point role that interfaces with overseas countries via seaborne transport. Third, an intermodal transportation system is considered, where inland depots and seaports are assumed to be connected by multiple transport modes or carriers that are constrained by carrying capacity.

In the second part of this chapter, we present several methods to tackle the optimization problems for stochastic dynamic systems. Firstly, the approximate dynamic programming method is introduced. Approximate dynamic programming is able to overcome the challenge of the curses of dimensionality (Powell, 2009, 2011). In this regard, Lam et al. (2007) have applied the approximate dynamic programming method to a two-port transport system by utilizing a linear approximation architecture. In the literature, Bertsekas and Tsitsiklis (1996) took the control theory perspective and proposed neuro-dynamic programming; Sutton and Barto (1998) took the artificial intelligence perspective and developed reinforcement learning. These two notions are similar to the concept of stochastic dynamic programming.

Secondly, simulation methods are discussed. Simulation is a powerful tool for modeling and evaluating complex dynamic systems subject to uncertainty (Fu, 2015). It has been a cornerstone for many optimization methods. It is especially suitable for optimizing parameterized control policies. Threshold-type ECR control policies have been shown to be optimal in simple inland transportation systems, e.g., single-depot or two-depot systems. Yun et al. (2011) extended the (s, S)-type threshold control policy to reposition empty containers over multiple time periods in an inland transportation system, using a simulation-based optimization tool to optimize the control parameters. It is worth noting that, due to the stochasticity of the system, a large number of simulation runs is required to evaluate the performance of an ECR policy properly.

Thirdly, metaheuristic optimization methods are introduced to optimize the parameterized ECR policies. Metaheuristics have the advantage of a problem-independent algorithmic framework for wide application (Blum & Roli, 2003; Sorensen & Glover, 2013). Although in the literature metaheuristics are predominantly applied to deterministic optimization problems, their extension to stochastic systems is relatively straightforward. Dong and Song (2009) applied genetic algorithms to optimize a parameterized ECR policy and container fleet size in a liner shipping system. Dang et al. (2013) combined a genetic algorithm with simulation to address the ECR problem in an inland transportation system with transfer ports. The ECR policies are characterized by a set of parameters and are evaluated by a simulation model. It should be noted that the computational effort is multiplicative in the number of iterations in the metaheuristic search procedure and the number of replications required in each iteration to evaluate the performance in stochastic systems. As a result, the computational burden of metaheuristic optimization for stochastic systems is extremely high.

Fourthly, the stochastic approximation method is discussed, which is a gradient-based search method suitable for stochastic system optimization (Rubinstein, 1986). The key step of stochastic approximation is the estimation of the gradient of the objective function with respect to the control parameters. The finite difference estimator is simple and easy to use, but not efficient, because it requires multiple runs of the simulation to evaluate the sensitivity of the objective function to individual parameters and the gradient estimate is not necessarily unbiased.

Fifthly, the perturbation analysis method makes full use of a single sample path of the dynamic system to estimate the gradient of the objective function with respect to the control parameters (Cassandras & Lafortune, 2008; Glasserman, 1991; Ho & Cao, 1991). In particular, infinitesimal perturbation analysis has proved to be very efficient, and the unbiasedness of the gradient estimator can be established rigorously under certain conditions. Lee et al. (2012) have applied infinitesimal perturbation analysis techniques to optimize threshold-type ECR policies in a multi-port transport system.

Sixthly, the ordinal optimization method is another approach to speed up the search process and save computational time (Ho et al., 2007). It is based on two key ideas: ordinal comparison and goal softening. The first idea is to simplify the ranking and selection of good solutions, and the second idea is to soften the optimization goal. The emphasis is on seeking good enough solutions with high probability. Utilizing the ordinal optimization method, the search-based optimization and the stochastic simulation could be better integrated, e.g., to achieve more efficient evaluation, comparison, and search processes by using appropriate ranking and selection strategies, and probability and statistical analysis. Moreover, it is possible to integrate ordinal optimization with metaheuristics to improve search efficiency (Song et al., 2006).

References

Banks, J., Carson, J. S., Nelson, B. L., & Nicol, D. M. (2000). Discrete event systems simulation (3rd ed.). Prentice Hall.
Bertsekas, D., & Tsitsiklis, J. (1996). Neuro-dynamic programming. Athena Scientific.
Blum, C., & Roli, A. (2003). Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM Computing Surveys, 35(3), 268–308.
Cassandras, C. G., & Lafortune, S. (2008). Introduction to discrete event systems (2nd ed.). Springer.
Dang, Q. V., Nielsen, I., & Yun, W. Y. (2013). Replenishment policies for empty containers in an inland multi-depot system. Maritime Economics & Logistics, 15, 120–149.


Dong, J. X., & Song, D. P. (2009). Container fleet sizing and empty repositioning in liner shipping systems. Transportation Research Part E, 45(6), 860–877.
Dorigo, M., Maniezzo, V., & Colorni, A. (1996). The ant system: Optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man and Cybernetics Part-B, 26, 29–41.
Eberhart, R., & Kennedy, J. (1995, October 4–6). A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science (pp. 39–43), Nagoya.
Fu, M. C. (1994). Optimization via simulation: A review. Annals of Operations Research, 53(1), 199–247.
Fu, M. C. (2002). Optimization for simulation: Theory vs. practice. INFORMS Journal on Computing, 14(3), 192–215.
Fu, M. C. (2015). Handbook of simulation optimization. Springer.
Garrido, J. M. (1998). Practical process simulation: Using object-oriented techniques and C++. Artech House.
Glasserman, P. (1991). Gradient estimation via perturbation analysis. Kluwer Academic Publication.
Glover, F. (1977). Heuristics for integer programming using surrogate constraints. Decision Sciences, 8(1), 156–166.
Glover, F. (1989). Tabu search—Part I. ORSA Journal on Computing, 1(3), 190–206.
Goldberg, D. E. (1989). Genetic algorithms in search, optimization and machine learning. Addison-Wesley.
Ho, Y. C. (1987). Performance evaluation and perturbation analysis of discrete systems: Perspective and open problems. IEEE Transactions on Automatic Control, 32, 563–572.
Ho, Y. C., & Cao, X. R. (1991). Perturbation analysis of discrete event dynamic systems. Kluwer Academic Publication.
Ho, Y. C., Sreenivas, R., & Vakili, P. (1992). Ordinal optimization of DEDS. Discrete Event Dynamic Systems: Theory and Applications, 2, 61–88.
Ho, Y. C., Zhao, Q. C., & Jia, Q. S. (2007). Ordinal optimization: Soft optimization for hard problems. Springer.
Karaboga, D. (2010). Artificial bee colony algorithm. Scholarpedia, 5(3), 6915.
Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220(4598), 671–679.
Kleinman, N. L., Spall, J. C., & Naiman, D. Q. (1999). Simulation-based optimization with stochastic approximation using common random numbers. Management Science, 45(11), 1570–1578.
Lam, S. W., Lee, L. H., & Tang, L. C. (2007). An approximate dynamic programming approach for the empty container allocation problem. Transportation Research Part C, 15(4), 265–277.
Lee, L. H., Chew, E. P., & Luo, Y. (2012). Empty container management in multi-port system with inventory-based control. International Journal on Advances in Systems and Measurements, 5, 164–177.
Marbach, P., & Tsitsiklis, J. N. (2001). Simulation-based optimization of Markov reward processes. IEEE Transactions on Automatic Control, 46(2), 191–209.
Powell, W. B. (2009). What you should know about approximate dynamic programming. Naval Research Logistics, 56(3), 239–249.
Powell, W. B. (2011). Approximate dynamic programming: Solving the curses of dimensionality (2nd ed.). Wiley.
Qi, X., & Song, D.-P. (2012). Minimizing fuel emissions by optimizing vessel schedules in liner shipping with uncertain port times. Transportation Research Part E, 48(4), 863–880.
Rubinstein, R. Y. (1986). Monte Carlo optimization. Wiley.
Song, D. P., Hicks, C., & Earl, C. F. (2001). Setting planned job release times in stochastic assembly systems with resource constraints. International Journal of Production Research, 39(6), 1289–1301.


Song, D. P., Hicks, C., & Earl, C. F. (2006). An ordinal optimization based evolution strategy to schedule complex make-to-order products. International Journal of Production Research, 44(22), 4877–4895.
Song, D. P., & Sun, Y. X. (1998). Gradient estimate for parameter design of threshold controllers in a failure-prone production line. International Journal of Systems Science, 29(1), 21–32.
Sorensen, K., & Glover, F. (2013). Metaheuristics. In S. I. Gass & M. Fu (Eds.), Encyclopedia of operations research and management science (pp. 960–970). Springer.
Sutton, R., & Barto, A. (1998). Reinforcement learning. The MIT Press.
Wardi, Y., Cassandras, C. G., & Cao, X. R. (2018). Perturbation analysis: A framework for data-driven control and optimization of discrete event and hybrid systems. Annual Reviews in Control, 45, 267–280.
Xu, W., & Song, D. P. (2021). Integrated optimization for production capacity, raw material ordering and production planning under time and quantity uncertainties based on two case studies. Operational Research (in press).
Yun, W. Y., Lee, Y. M., & Choi, Y. S. (2011). Optimal inventory control of empty containers in inland transportation system. International Journal of Production Economics, 133(1), 451–457.
Zhao, Y., Xue, Q., & Zhang, X. (2018). Stochastic empty container repositioning problem with CO2 emission considerations for an intermodal transportation system. Sustainability, 10, 4211.

Chapter 7

Conclusions

Abstract This chapter first summarizes the main findings from the previous chapters and highlights the managerial insights to assist ECR decision making. Then, the limitations and further research opportunities are discussed.

7.1 Conclusions and Managerial Insight for ECR

Empty equipment logistics management is a common issue in practice. It arises whenever an operator needs to manage a fleet of reusable equipment over space and time. Typical examples include empty freight vehicle redistribution, empty passenger vehicle redistribution, empty bike repositioning, empty container chassis repositioning, and ECR. The commonality of various empty equipment logistics problems is the need to allocate the fleet of empty equipment over space and over time to satisfy external uncertain demands, in which dynamic operations and stochasticity are two key characteristics. Due to this commonality, the models and the managerial insights generated from ECR could be extended to other empty equipment repositioning problems. Nevertheless, ECR has a few characteristics that differ from other types of empty equipment repositioning problems, e.g., (i) it exists at two levels: global ECR and regional ECR, and these two levels are related as they are parts of the entire container shipping supply chain; (ii) empty containers may be repositioned in a single unit or in bulk, and they can be moved using different transport modes; (iii) empty containers may exit the system and reenter the system in a random way because customers may hold the containers as a warehouse for an uncertain period of time; (iv) there is a high level of uncertainty from a variety of sources such as demand, supply, transit, and breakdown; (v) up to 50% of the world container fleet is owned by container lessors. This implies that leasing is an important action in ECR. In inland transport networks, the movements of empty containers account for over 40% of total transported containers (Boile et al., 2008; Braekers et al., 2011). This indicates the severity of the regional ECR problems. The main factors that cause ECR include the imbalanced demand, dynamic operations, various uncertainties, seasonality, container types, lack of visibility in the hinterland transport, container
The main factors that cause ECR include the imbalanced demand, dynamic operations, various uncertainties, seasonality, container types, lack of visibility in the hinterland transport, container

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 D.-P. Song and J. Dong, Modelling Empty Container Repositioning Logistics, https://doi.org/10.1007/978-3-030-93383-8_7


manufacturing and leasing costs, and ocean carriers’ operational and strategic practices, e.g., blank sailing and slow steaming (Song & Dong, 2015). To model the ECR problem at the regional level, it is desirable to incorporate the most important features such as demand imbalance, dynamic operations, uncertainty, and leasing activity. In particular, dynamic operations imply the need for sequential decision making on empty container management when new information becomes available over time. Uncertainty implies that our current decisions have to anticipate the impact of future unpredictable factors. Using random variables to represent various uncertainties in the system, it is believed that stochastic dynamic programming is probably the most appropriate modeling approach to the ECR problems for optimizing the sequential decisions under uncertainty.

We have taken the inventory control perspective to address the ECR logistics in regional transportation systems, explicitly considering features such as demand imbalance over space, dynamic operations over time, multiple uncertainties, and leasing activity. The adoption of the inventory control perspective to manage empty containers is supported by industrial practices (e.g., the concept of the container availability index at https://container-xchange.com). Our emphasis is on seeking effective dynamic ECR policies in anticipation of future uncertainty.

In Chap. 2, the ECR problem in a single-depot system is considered. Two situations are investigated in detail: the discrete-time sequential ECR decision-making situation, and the continuous-time continuous-state sequential ECR decision-making situation. A discrete-time dynamic programming model and a fluid-flow model are presented to describe the underlying dynamics and stochasticity of the two systems, respectively.
The main findings include: (i) in the discrete-time situation, it is shown that the optimal ECR policy can be characterized by two threshold parameters at each period in the form of (s, S) inventory control structure; and the closed-forms of two threshold parameters in the final period are obtained; (ii) in the continuoustime continuous-state situation, we obtain the closed-form solution of the objective function to the optimal ECR problem under two-state Markov demand process; the explicit structure of the optimal ECR policy is characterized by two threshold values; the fluid-flow model is then extended to more general cases with multiple-state Markov demand process; numerical examples verify the analytical results. In Chap. 3, the ECR problem in a two-depot system under periodic review scheme is considered, in which each depot faces independent supply and demand of empty containers. Only empty containers are moving between two depots to meet external random demands. The main findings are: (i) the optimal ECR policy is shown to have a region-wise structure, which is characterized by two switching curves in the inventory state space; (ii) the structural properties of the two switching curves such as monotonicity and asymptotic behaviors are established; (iii) simple threshold-type ECR policies are constructed based on the structural properties of two switching curves; the proposed ECR policies are near-optimal and easy-to-operate; (iv) numerical examples illustrate the analytical results and demonstrate the effectiveness of the proposed threshold-type policy. In Chap. 4, the ECR problem in a two-depot shuttle service system under continuous review scheme is considered, in which containers are transported in a single


unit. Both laden and empty containers are moving between the two depots. Based on continuous review and the discrete state of the inventory levels at the two depots, an event-driven model is formulated. Under the assumption of a Poisson arrival process of laden containers and exponentially distributed empty container transfer times, the continuous-time Markov decision process is converted into an equivalent discrete-time Markov decision process by using the uniformization technique. The main findings are: (i) it is shown that the optimal ECR policy is a threshold policy, characterized by two control parameters, in both the discounted cost and the long-run average cost cases; (ii) the closed-form solution to the optimal discounted cost function is obtained by using the characteristic equation method; (iii) the closed-form solution to the optimal long-run average cost function is obtained by calculating the stationary distribution under the threshold control policy; (iv) numerical examples illustrate the effectiveness of the models and analytical results; (v) the model is extended to the cases with external supply and demand of empty containers at both depots, where empty containers may exit and enter the two-depot shuttle system randomly.

In Chap. 5, the ECR problem in a hub-and-spoke transportation system under a continuous review scheme is considered, which is an extension of the system in Chap. 4. Both laden and empty containers are moving on a single-unit basis between the hub depot and each of the spoke depots. Under the assumption of a Poisson arrival process of laden containers and exponentially distributed empty container transfer times, a discrete-time Markov decision process is formulated using the uniformization technique. The main findings are: (i) a dynamic decomposition procedure is presented, which leads to a dynamic decomposition ECR policy;
its computational complexity is linear in the number of spokes, and the policy can be calculated offline; (ii) the dynamic decomposition ECR policy has the same asymptotic behaviors as the optimal ECR policy; (iii) the proposed dynamic decomposition procedure can be applied to both discounted cost and long-run average cost cases; (iv) numerical examples demonstrate the effectiveness of the dynamic decomposition ECR policy and its robustness against the assumptions of the distribution types; (v) the model is extended to the cases with external supply and demand of empty containers at all depots, where empty containers may exit and enter the system randomly.

In Chap. 6, the ECR problems in general inland transportation systems with multiple interconnected depots under periodic review schemes are considered. Both laden and empty containers are moving between depots. Moreover, each depot faces external supply and demand of empty containers, which means empty containers may enter or exit the system at each depot. The main results are: (i) three ECR optimization models are formulated. First, an inland transportation system with a set of homogeneous depots is considered, where inland depots and port depots are treated in the same way. Second, an inland transportation system with transfer ports is examined, where ports play two roles: a depot role similar to an inland depot to serve local demand and supply, and a transfer point role that interfaces with overseas countries via seaborne transport. Third, an intermodal transportation system is examined, where inland depots and seaport depots are assumed to be connected with multiple transport modes (or carriers) that are constrained by finite capacity; (ii) to overcome


the complexity of solving the stochastic dynamic models, a range of solution methods are presented. The structural properties of the optimal ECR policies in previous chapters could be utilized to construct parameterized ECR policies for complex systems. Optimization methods can then be used to seek the optimal control parameters. Specifically, we discuss the applications of approximate dynamic programming methods, simulation methods, metaheuristic optimization methods, stochastic approximation methods, perturbation analysis methods, and ordinal optimization methods. Their relative advantages and disadvantages are explained.

In summary, this book mainly applies the stochastic dynamic programming method to model the ECR problems for optimizing the decision-making process in the presence of uncertainties that are realized over time. For relatively simple transportation systems, the structural properties of the optimal ECR policies have been explicitly established. In Chaps. 2 and 4, we obtained closed-form solutions to the optimal ECR problems. The structural properties of the optimal ECR policies can be utilized to construct threshold-type ECR policies for more complicated transportation systems. The threshold-type ECR policies resemble the (s, S) or (s, Q) policy in inventory control theory. They have the advantages of decentralized control, being easy to understand and operate, responding dynamically to random events, and requiring minimal online computation and communication. The rule-based nature of the threshold-type ECR policies makes them robust to uncertainty and suitable for real-time control of empty containers. Although the focus of this book is on regional inland ECR problems, the models and results of the ECR policies could be generalized to some global seaborne ECR problems (e.g., Dong & Song, 2009; Song, 2007a).
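The threshold-type rule summarized above can be illustrated with a small simulation. The sketch below is not the model of any chapter: it assumes a single depot, integer supply and demand drawn uniformly from 0 to 6 in each period, and hypothetical cost parameters (`move_cost`, `hold_cost`, `lease_cost`). The function names `threshold_action`, `simulate`, and `grid_search` are introduced here for illustration only. The two-threshold rule is taken to mean: reposition empties in up to a lower level when stock falls below it, reposition out down to an upper level when stock exceeds it, and otherwise do nothing.

```python
import random


def threshold_action(inventory, lower, upper):
    """Empties to move in (+) or out (-) under an assumed two-threshold rule."""
    if inventory < lower:
        return lower - inventory   # reposition in, up to the lower threshold
    if inventory > upper:
        return upper - inventory   # reposition out, down to the upper threshold
    return 0                       # inside the band: do nothing


def simulate(periods, lower, upper, start=10, seed=0,
             move_cost=1.0, hold_cost=0.2, lease_cost=5.0):
    """Average cost per period of the rule under hypothetical costs and demand."""
    rng = random.Random(seed)      # fixed seed: every policy sees the same sample path
    inventory = start
    total_cost = 0.0
    for _ in range(periods):
        move = threshold_action(inventory, lower, upper)
        inventory += move
        total_cost += move_cost * abs(move)
        supply = rng.randint(0, 6)   # empties returned to the depot (assumed)
        demand = rng.randint(0, 6)   # empties requested by shippers (assumed)
        inventory += supply - demand
        if inventory < 0:
            # Unmet demand is covered by leasing at a penalty cost.
            total_cost += lease_cost * (-inventory)
            inventory = 0
        total_cost += hold_cost * inventory
    return total_cost / periods


def grid_search(periods=500, max_level=20):
    """Exhaustive search over threshold pairs with lower <= upper."""
    candidates = (
        (simulate(periods, lo, hi), lo, hi)
        for lo in range(max_level + 1)
        for hi in range(lo, max_level + 1)
    )
    return min(candidates)


if __name__ == "__main__":
    avg_cost, lo, hi = grid_search()
    print(f"best thresholds: lower={lo}, upper={hi}, average cost={avg_cost:.2f}")
```

The exhaustive search merely stands in for the simulation-based and metaheuristic methods mentioned above; it is workable here only because the policy has two parameters. Reusing the same seed for every candidate compares all threshold pairs on one common demand sample path.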

7.2 Limitations and Further Research

This book relies on a number of assumptions to model and solve the ECR problems, which limit its scope. Further research opportunities include the relaxation of these assumptions and the investigation of other interesting issues and topics that are closely related to ECR. Several directions are given below.

Global ECR and regional ECR: The end-to-end container shipping supply chain covers both global seaborne transport and regional inland transport. Ideally, regional ECR problems should be integrated with the global ECR problems. This book is limited to regional ECR problems. Although there have been some studies related to ECR incorporating global seaborne transport and regional inland transport (e.g., Dong & Song, 2012a; Epstein et al., 2012; Kuzmicz & Pesch, 2019), more research is required. One of the challenges is that global transport and regional transport have different planning horizons and may face uncertainties of a different nature, which are difficult to represent appropriately.

Multiple periods of lead times: In Chaps. 4 and 5, the transportation times between depots are assumed to be random with exponential distributions; in Chaps. 3 and 6, the transportation time is assumed to be one period. In practice, the transportation time may follow other types of distributions and may span multiple time periods.


There have been some studies on the optimization of the threshold-type inventory control policies for ECR in inland transportation systems with multiple periods of lead times (e.g., Dang et al., 2013; Yun et al., 2011). However, the optimality of threshold-type ECR policies in those systems requires further investigation.

Multiple types of containers with partial substitution: We have assumed a single type of container in the models. This is reasonable since shipping containers are fairly standard and different types of containers may not necessarily be substitutable (because different types of containers carry different types of commodities, and shippers have preferences for certain types of containers). However, shippers are sometimes willing to accept different types of containers, e.g., two 20-foot containers instead of one 40-foot container when the 40-foot container is not available (Chang et al., 2008). In addition, multiple types of containers add another layer of decision, namely which type of containers should be repositioned. Therefore, it is worthwhile to investigate the ECR problems with multiple types of containers considering shippers’ behaviors of partial container substitution (Olivo et al., 2013).

Multiple stops in a single trip: If the transport vehicle can carry more than one container, a single ECR trip may have multiple stops, which requires an additional routing decision (Funke & Kopfer, 2016). The routing decision introduces a network effect into the ECR problem, which becomes more challenging and deserves more research.

Empty vehicle redistribution and ECR: Containers are carried by moving vehicles over inland transport networks. After the delivery of containers, the vehicle might have an empty journey.
On the one hand, there is an ECR problem, which is under the management of container owners; on the other hand, there is an empty vehicle (or rail-car) redistribution problem, which is under the management of inland carriers. These two types of problems are interwoven. However, no research has been done to model the ECR and empty vehicle redistribution together.

Optimality of inventory-based policies: The optimality of the ECR policies and the structure of the optimal ECR policies have only been established for single-depot or two-depot systems in this book. Further research could be conducted to investigate the structural properties of the optimal ECR policies for more complicated inland transportation systems. In addition, more sophisticated inventory-based policies could be developed and evaluated by incorporating the concepts of Kanban and echelon base stock in traditional inventory control theory (Lee & Song, 2017; Song, 2021a).

Joint optimization with other decisions: The inland ECR problem is highly related to other management decisions, e.g., inland-depot locations, container fleet sizing, leasing terms, and laden container pricing. One example is to determine an optimal set of inland depots to be opened; related studies include Chen et al. (2016), Dong and Song (2012b), Lu et al. (2020), and Mittal et al. (2013). Most of the above decisions are made at the strategic planning level, whereas ECR is usually at the operational or real-time level. Therefore, a hierarchical planning framework is probably needed. The leasing decisions could be more explicitly modeled, e.g., the restriction on the amount of leased empty containers at depots, the


constraints on off-leasing activities, and the contracting between shipping company and container lessors.

Detention and maintenance: Containers may be damaged during the transferring and handling processes. The ECR problems will involve the movements of both non-damaged containers and damaged containers (Hjortnaes et al., 2017). There may be additional decisions on where to perform the maintenance for the damaged containers. When laden containers reach customers, it is not unusual for customers to hold the containers longer than the agreed period of time. This phenomenon is called detention. How to set the free detention time and how to price the detention charge for over-holding have a great impact on the return and management of empty containers (Yu et al., 2018). Over-holding is particularly common for tank containers because customers may not have the relevant facilities to store the liquid and so use the tank container as temporary storage. A recent study of the world’s top 20 container ports showed that both demurrage and detention charges rose by 104% in 2021 compared with 2020. On average, demurrage and detention charges exceeded $1200 per container across container types after two weeks in 2021. This is a substantial burden on shippers on top of sky-rocketing freight rates (Waters, 2021).

Environmental impact and sustainability: Empty container movements have not only economic implications but also environmental and social impacts, e.g., CO2 emissions, road and port congestion, pollution, and accidents (Bernat et al., 2016; Zhao et al., 2018). With the increasing concern about the sustainability of transport and logistics, the traditional externality of environmental and social impacts could be incorporated within the ECR optimization models. Efficient ECR could play an important role in achieving the International Maritime Organization’s decarbonization targets (Song, 2021b).

Vertical integration and system analysis: Regional ECR is a complex problem because it involves a range of stakeholders that interact with each other but have conflicting objectives (Gusaha et al., 2019). On the one hand, it is reasonable to take the system perspective and use an agent-based model to simulate the logistics processes and the interactions of the complex system. As a result, different stakeholders’ behaviors and their interactions could be more appropriately modeled. On the other hand, vertical integration along the container shipping supply chain would achieve the best performance from the entire supply chain perspective. However, this is challenging because container shipping is still rather fragmented although much effort has been committed towards vertical integration (Song, 2021b). Nevertheless, with the development of digitalization and artificial intelligence such as machine learning, there is increasing room and opportunity for vertical integration and for achieving win–win situations.

Horizontal integration and cross channel management: ECR is an industry-wide problem. Almost all shipping companies have to face this logistics challenge. It has been recognized that horizontal integration across different container owners could better utilize the container fleets. For example, neutral online platforms have been proposed to facilitate empty container exchanges between shipping companies so that one company’s empty container can be used to match another shipping company’s


demand. Research has shown the economic benefits of container sharing and collaboration between carriers (e.g., Song, 2007b; Sterzik et al., 2015; Uddin & Huynh, 2020). However, one challenge is that these companies are often competitors and may not be willing to share the container fleets, which are regarded as critical assets. The conservative behavior and the risk-averse attitude of shipping companies are among the main barriers preventing such integration.

Impact of shipping companies’ strategy and ports’ policies: In the first half of 2020, most shipping companies adopted the blank sailing strategy in response to the decreasing demand due to COVID-19, which left many empty containers in European and North American ports. When trade demand suddenly picked up in the second half of 2020, almost all container ports in China suffered a severe shortage of empty containers. Moreover, some European ports have adopted a policy of refusing empty containers entering the ports due to high yard density and congestion, which further complicates the ECR problem along the container shipping supply chain (Song, 2021b).

Handling disruptive events such as pandemics: Regular uncertainty may be described by random variables based on historical data. Disruptive events are often one-off events without historical data and are difficult to model using probability distributions. Disruptive events such as the occurrence of COVID-19, the Suez Canal blockage by a large containership in March 2021, and the partial closedown of Shenzhen port in June 2021 have evidenced the serious ripple effects of delays and congestion along global supply chains. How to tackle the ECR problems under disruptive events deserves more research.

References

Bernat, N. S., Schulte, F., Vos, S., & Bose, J. (2016). Empty container management at ports considering pollution, repair options, and street-turns. Mathematical Problems in Engineering, 2016, 3847163.
Boile, M., Theofanis, S., Baveja, A., & Mittal, N. (2008). Regional repositioning of empty containers: Case for inland depots. Transportation Research Record, 2066(1), 31–40.
Braekers, K., Janssens, G. H., & Caris, A. (2011). Challenges in managing empty container movements at multiple planning levels. Transport Review, 31(6), 681–708.
Chang, H., Jula, H., Chassiakos, A., & Ioannou, P. (2008). A heuristic solution for the empty container substitution problem. Transportation Research Part E, 44(2), 203–216.
Chen, R. Y., Dong, J. X., & Lee, C. Y. (2016). Pricing in a shipping market with waste shipments and empty container repositioning. Transportation Research Part B, 85, 32–55.
Dang, Q. V., Nielsen, I. E., & Yun, W. Y. (2013). Replenishment policies for empty containers in an inland multi-depot system. Maritime Economics & Logistics, 15(1), 120–149.
Dong, J. X., & Song, D. P. (2009). Simulation-based optimization for container fleet sizing and empty repositioning in liner shipping systems. Transportation Research Part E, 45(6), 860–877.
Dong, J. X., & Song, D. P. (2012a). Quantifying the impact of inland transport times on container fleet sizing in liner shipping services with uncertainties. OR Spectrum, 34(1), 155–180.
Dong, J. X., & Song, D. P. (2012b). Lease term optimization in container shipping systems. International Journal of Logistics Research and Applications, 15(2), 87–107.
Epstein, R., Neely, A., Weintraub, A., Valenzuela, F., Hurtado, S., González, G., Beiza, A., Naveas, M., Infante, F., Alarcón, F., Angulo, G., Berner, C., Catalán, J., González, C., & Yung, D. (2012). A strategic empty container logistics optimization in a major shipping company. Interfaces, 42(1), 5–16.
Funke, J., & Kopfer, H. (2016). A model for a multi-size inland container transportation problem. Transportation Research Part E, 89, 70–85.
Gusaha, L., Cameron-Rogersa, R., & Thompson, R. G. (2019). A systems analysis of empty container logistics: A case study of Melbourne, Australia. Transportation Research Procedia, 39, 92–103.
Hjortnaes, T., Wiegmans, B., Negenborn, R. R., Zuidwijk, A., & Klijnhout, R. (2017). Minimizing cost of empty container repositioning in port hinterlands, while taking repair operations into account. Journal of Transport Geography, 58, 209–219.
Kuzmicz, K. A., & Pesch, E. (2019). Approaches to empty container repositioning problems in the context of Eurasian intermodal transportation. Omega, 85, 194–213.
Lee, C. Y., & Song, D. P. (2017). Ocean container transport in global supply chains: Overview and research opportunities. Transportation Research Part B, 95, 442–474.
Lu, T., Lee, C. Y., & Lee, L. H. (2020). Coordinating pricing and empty container repositioning in two-depot shipping systems. Transportation Science, 54(6), 1439–1731.
Mittal, N., Boile, M., Baveja, A., & Theofanis, S. (2013). Determining optimal inland-empty-container depot locations under stochastic demand. Research in Transportation Economics, 42(1), 50–60.
Olivo, A., Di Francesco, M., & Zuddas, P. (2013). An optimization model for the inland repositioning of empty containers. Maritime Economics & Logistics, 15, 309–331.
Song, D. P. (2007a). Characterizing optimal empty container reposition policy in periodic-review shuttle service systems. Journal of the Operational Research Society, 58(1), 122–133.
Song, D. P. (2007b, June 24–28). Analysis of a collaborative strategy in container fleet management. In The 11th World Conference on Transport Research, University of California, Berkeley.
Song, D. P. (2021a). Container logistics and maritime transport. Routledge.
Song, D. P. (2021b). A literature review, container shipping supply chain: Planning problems and research opportunities. Logistics, 5, 41.
Song, D. P., & Dong, J. X. (2015). Empty container repositioning. In C. Y. Lee & Q. Meng (Eds.), Handbook of ocean container transport logistics: Making global supply chain effective (pp. 163–208). Springer.
Sterzik, S., Kopfer, H., & Yun, W. Y. (2015). Reducing hinterland transportation costs through container sharing. Flexible Services and Manufacturing Journal, 27, 382–402.
Uddin, M., & Huynh, N. (2020). Model for collaboration among carriers to reduce empty container truck trips. Information, 11(8), 377.
Waters, W. (2021, June 29). Demurrage and detention charges double in a year. Lloyds Loading List, Tuesday.
Yu, M., Fransoo, J. C., & Lee, Y. C. (2018). Detention decisions for empty containers in the hinterland transportation system. Transportation Research Part B, 110, 188–208.
Yun, W. Y., Lee, Y. M., & Choi, Y. S. (2011). Optimal inventory control of empty containers in inland transportation system. International Journal of Production Economics, 133(1), 451–457.
Zhao, Y., Xue, Q., & Zhang, X. (2018). Stochastic empty container repositioning problem with CO2 emission considerations for an intermodal transportation system. Sustainability, 10, 4211.