Software Engineering Research and Practice [1 ed.] 9781683924463, 9781601324894

This book contains the proceedings of the 2018 International Conference on Software Engineering Research and Practice (SERP'18).


PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING RESEARCH & PRACTICE

SERP'18
Editors: Hamid R. Arabnia, Leonidas Deligiannidis, Fernando G. Tinetti

Publication of the 2018 World Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE’18) July 30 - August 02, 2018 | Las Vegas, Nevada, USA https://americancse.org/events/csce2018

Copyright © 2018 CSREA Press

This volume contains papers presented at the 2018 International Conference on Software Engineering Research & Practice. Their inclusion in this publication does not necessarily constitute endorsements by editors or by the publisher.

Copyright and Reprint Permission Copying without a fee is permitted provided that the copies are not made or distributed for direct commercial advantage, and credit to source is given. Abstracting is permitted with credit to the source. Please contact the publisher for other copying, reprint, or republication permission.

American Council on Science and Education (ACSE)

Copyright © 2018 CSREA Press
ISBN: 1-60132-489-8
Printed in the United States of America
https://americancse.org/events/csce2018/proceedings

Foreword

It gives us great pleasure to introduce this collection of papers to be presented at the 2018 International Conference on Software Engineering Research & Practice (SERP'18), July 30 – August 2, 2018, at Luxor Hotel (a property of MGM Resorts International), Las Vegas, USA. An important mission of the World Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE (a federated congress with which this conference is affiliated), includes "Providing a unique platform for a diverse community of constituents composed of scholars, researchers, developers, educators, and practitioners. The Congress makes concerted effort to reach out to participants affiliated with diverse entities (such as: universities, institutions, corporations, government agencies, and research centers/labs) from all over the world. The congress also attempts to connect participants from institutions that have teaching as their main mission with those who are affiliated with institutions that have research as their main mission. The congress uses a quota system to achieve its institution and geography diversity objectives." By any definition of diversity, this congress is among the most diverse scientific meetings in the USA. We are proud to report that this federated congress has authors and participants from 67 different nations, representing a variety of personal and scientific experiences that arise from differences in culture and values. As can be seen below, the program committee of this conference, as well as the program committees of all other tracks of the federated congress, are as diverse as its authors and participants.

The program committee would like to thank all those who submitted papers for consideration. About 50% of the submissions were from outside the United States. Each submitted paper was peer-reviewed by two experts in the field for originality, significance, clarity, impact, and soundness. In cases of contradictory recommendations, a member of the conference program committee was charged to make the final decision; often, this involved seeking help from additional referees. In addition, papers whose authors included a member of the conference program committee were evaluated using the double-blind review process. One exception to the above evaluation process was for papers that were submitted directly to chairs/organizers of pre-approved sessions/workshops; in these cases, the chairs/organizers were responsible for the evaluation of such submissions. The overall paper acceptance rate for regular papers was 19%; 23% of the remaining papers were accepted as poster papers (at the time of this writing, we had not yet received the acceptance rate for a couple of individual tracks).

We are very grateful to the many colleagues who offered their services in organizing the conference. In particular, we would like to thank the members of the Program Committee of SERP'18, members of the congress Steering Committee, and members of the committees of federated congress tracks that have topics within the scope of SERP. Many individuals listed below will be requested after the conference to provide their expertise and services for selecting papers for publication (extended versions) in journal special issues as well as for publication in a set of research books (to be prepared for publishers including Springer, Elsevier, BMC journals, and others).

• Prof. Nizar Al-Holou (Congress Steering Committee); Professor and Chair, Electrical and Computer Engineering Department; Vice Chair, IEEE/SEM-Computer Chapter; University of Detroit Mercy, Detroit, Michigan, USA
• Prof. Hamid R. Arabnia (Congress Steering Committee); Graduate Program Director (PhD, MS, MAMS); The University of Georgia, USA; Editor-in-Chief, Journal of Supercomputing (Springer); Fellow, Center of Excellence in Terrorism, Resilience, Intelligence & Organized Crime Research (CENTRIC)
• Dr. Travis Atkison; Director, Digital Forensics and Control Systems Security Lab, Department of Computer Science, College of Engineering, The University of Alabama, Tuscaloosa, Alabama, USA
• Prof. Dr. Juan-Vicente Capella-Hernandez; Universitat Politecnica de Valencia (UPV), Department of Computer Engineering (DISCA), Valencia, Spain
• Prof. Kevin Daimi (Congress Steering Committee); Director, Computer Science and Software Engineering Programs, Department of Mathematics, Computer Science and Software Engineering, University of Detroit Mercy, Detroit, Michigan, USA
• Prof. Zhangisina Gulnur Davletzhanovna; Vice-rector of the Science, Central-Asian University, Almaty, Republic of Kazakhstan; Vice President of International Academy of Informatization, Almaty, Republic of Kazakhstan


• Prof. Leonidas Deligiannidis (Congress Steering Committee); Department of Computer Information Systems, Wentworth Institute of Technology, Boston, Massachusetts, USA; Visiting Professor, MIT, USA
• Prof. Mary Mehrnoosh Eshaghian-Wilner (Congress Steering Committee); Professor of Engineering Practice, University of Southern California, California, USA; Adjunct Professor, Electrical Engineering, University of California Los Angeles (UCLA), Los Angeles, California, USA
• Dr. Sam Hosseini; Chief Knowledge Officer, Synrgix Corporation, Palo Alto, California, USA; and Senior Software Development Manager, Routing Business Unit, Juniper Networks, San Jose, California, USA
• Prof. Byung-Gyu Kim (Congress Steering Committee); Multimedia Processing Communications Lab (MPCL), Department of Computer Science and Engineering, College of Engineering, SunMoon University, South Korea
• Prof. Louie Lolong Lacatan; Chairperson, Computer Engineering Department, College of Engineering, Adamson University, Manila, Philippines; Senior Member, International Association of Computer Science and Information Technology (IACSIT), Singapore; Member, International Association of Online Engineering (IAOE), Austria
• Dr. Vitus S. W. Lam; Senior IT Manager, Information Technology Services, The University of Hong Kong, Kennedy Town, Hong Kong; Chartered Member of The British Computer Society, UK; Former Vice Chairman of the British Computer Society (Hong Kong Section); Chartered Engineer & Fellow of the Institution of Analysts and Programmers
• Dr. Andrew Marsh (Congress Steering Committee); CEO, HoIP Telecom Ltd (Healthcare over Internet Protocol), UK; Secretary General of World Academy of BioMedical Sciences and Technologies (WABT), a UNESCO NGO, The United Nations
• Prof. Dr., Eng. Robert Ehimen Okonigene (Congress Steering Committee); Department of Electrical & Electronics Engineering, Faculty of Engineering and Technology, Ambrose Alli University, Nigeria
• Prof. James J. (Jong Hyuk) Park (Congress Steering Committee); Department of Computer Science and Engineering (DCSE), SeoulTech, Korea; President, FTRA; EiC, HCIS (Springer), JoC, IJITCC; Head of DCSE, SeoulTech, Korea
• Prof. Dr. R. Ponalagusamy; Department of Mathematics, National Institute of Technology, India
• Prof. Abd-El-Kader Sahraoui; Toulouse University and LAAS CNRS, Toulouse, France
• Dr. Akash Singh (Congress Steering Committee); IBM Corporation, Sacramento, California, USA; Chartered Scientist, Science Council, UK; Fellow, British Computer Society; Member, Senior IEEE, AACR, AAAS, and AAAI
• Chiranjibi Sitaula; Head, Department of Computer Science and IT, Ambition College, Kathmandu, Nepal
• Ashu M. G. Solo (Publicity); Fellow of British Computer Society; Principal/R&D Engineer, Maverick Technologies America Inc.
• Prof. Fernando G. Tinetti (Congress Steering Committee); School of Computer Science, Universidad Nacional de La Plata, La Plata, Argentina; also at Comision Investigaciones Cientificas de la Prov. de Bs. As., Argentina
• Prof. Hahanov Vladimir (Congress Steering Committee); Vice Rector and Dean of the Computer Engineering Faculty, Kharkov National University of Radio Electronics, Ukraine; Professor, Design Automation Department, Computer Engineering Faculty, Kharkov National University of Radio Electronics, Ukraine; IEEE Computer Society Golden Core Member
• Varun Vohra; Certified Information Security Manager (CISM); Certified Information Systems Auditor (CISA); Associate Director (IT Audit), Merck, New Jersey, USA
• Dr. Haoxiang Harry Wang (CSCE); Cornell University, Ithaca, New York, USA; Founder and Director, GoPerception Laboratory, New York, USA
• Prof. Shiuh-Jeng Wang (Congress Steering Committee); Director of Information Cryptology and Construction Laboratory (ICCL) and Director of Chinese Cryptology and Information Security Association (CCISA); Department of Information Management, Central Police University, Taoyuan, Taiwan; Guest Editor, IEEE Journal on Selected Areas in Communications
• Prof. Layne T. Watson (Congress Steering Committee); Fellow of IEEE; Fellow of The National Institute of Aerospace; Professor of Computer Science, Mathematics, and Aerospace and Ocean Engineering, Virginia Polytechnic Institute & State University, Blacksburg, Virginia, USA
• Prof. Jane You (Congress Steering Committee); Associate Head, Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong

We would like to extend our appreciation to the referees, the members of the program committees of individual sessions, tracks, and workshops; their names do not appear in this document; they are listed on the web sites of individual tracks.

As Sponsors-at-large, partners, and/or organizers, each of the following (separated by semicolons) provided help for at least one track of the Congress: Computer Science Research, Education, and Applications Press (CSREA); US Chapter of World Academy of Science; American Council on Science & Education & Federated Research Council (http://www.americancse.org/). In addition, a number of university faculty members and their staff (names appear on the cover of the set of proceedings), several publishers of computer science and computer engineering books and journals, chapters and/or task forces of computer science associations/organizations from three regions, and developers of high-performance machines and systems provided significant help in organizing the conference as well as providing some resources. We are grateful to them all. We express our gratitude to keynote, invited, and individual conference/track and tutorial speakers - the list of speakers appears on the conference web site. We would also like to thank the following: UCMSS (Universal Conference Management Systems & Support, California, USA) for managing all aspects of the conference; Dr. Tim Field of APC for coordinating and managing the printing of the proceedings; and the staff of the Luxor Hotel (Convention department) at Las Vegas for the professional service they provided. Last but not least, we would like to thank the Co-Editors of SERP'18: Prof. Hamid R. Arabnia, Prof. Leonidas Deligiannidis, and Prof. Fernando G. Tinetti. We present the proceedings of SERP'18.

Steering Committee, 2018 http://americancse.org/

Contents

SESSION: MODEL DRIVEN DEVELOPMENT + HUMAN COMPUTER INTERACTION + USABILITY STUDIES + MOBILE APPLICATIONS

On the Scalability of Concurrency Coverage Criteria for Model-based Testing of Autonomous Robotic Systems
Mahmoud Abdelgawad, Sterling McLeod, Anneliese Andrews, Jing Xiao ... 3

User Modelling in Search for People with Autism
Esha Massand, Keith Leonard Mannock ... 10

Model Driven Development for Android Apps
Yoonsik Cheon, Aditi Barua ... 17

Development of Smart Mobility Application based on Microservice Architecture
Masatoshi Nagao, Takahiro Ando, Kenji Hisazumi, Akira Fukuda ... 23

User Interaction Metrics for Hybrid Mobile Applications
Thomas G. Hastings, Kristen R. Walcott ... 30

A GUI Interfaces for a Multiple Unmanned Autonomous Robotic System
Ahmed Barnawi, Abdullah Al-Barakati, Omar Alhubaiti ... 36

M-Commerce Apps Usability: The Case of Mobile Hotel Booking Apps
Dominique Matlock, Andrea Rendel, Briana Heath, Samar Swaid ... 42

Static Taint Analysis Tools to Detect Information Flows
Dan Boxler, Kristen R. Walcott ... 46

SESSION: TESTING METHODS, AND SECURITY ISSUES + REVERSE ENGINEERING

Using Deep Packet Inspection to Detect Mobile Application Privacy Threats
Gordon Brown, Kristen R. Walcott ... 55

Industrial Practices in Security Vulnerability Management for IoT Systems - an Interview Study
Martin Host, Jonathan Sonnerup, Martin Hell, Thomas Olsson ... 61

A Review of Popular Reverse Engineering Tools from a Novice Perspective
Amanda Lee, Abigail Payne, Travis Atkison ... 68

An Approach to Analyze Power Consumption on Apps for Android OS Based on Software Reverse Engineering
Eduardo Caballero Espinosa, Abigail Payne, Travis Atkison ... 75

TTExTS: A Dynamic Framework to Reverse UML Sequence Diagrams From Execution Traces
Xin Zhao, Abigail Payne, Travis Atkison ... 82

Integrating Software Testing Standard ISO/IEC/IEEE 29119 to Agile Development
Ning Chen, Ethan W. Chen, Ian S. Chen ... 89

SESSION: AGILE DEVELOPMENT AND MANAGEMENT

Conformance to Medical Device Software Development Requirements with XP and Scrum Implementation
Ozden Ozcan-Top, Fergal McCaffery ... 99

Seven Principles of Undergraduate Capstone Project Management
Xiaocong Fan ... 106

The role of Product Owner from the practitioner's perspective: An exploratory study
Gerardo Matturro, Felipe Cordoves, Martin Solari ... 113

Requirements Mining in Agile Software Development
Micheal Tuape ... 119

SESSION: PROGRAMMING ISSUES AND ALGORITHMS + SOFTWARE ARCHITECTURES AND SOFTWARE ENGINEERING

Improving Programming Language Transformation
Steven O'Hara ... 129

Statistical Analysis of Prediction Accuracy Measure Distributions for Effort and Duration of Software Projects
Cuauhtemoc Lopez-Martin ... 136

Software Architecture of an ISR Asset Planning Application
Wilmuth Muller, Frank Reinert, Roland Rodenbeck, Jennifer Sander ... 142

Improved Technique for Implementing Gradual Type in Racket Language
Asiful Islam, Jingpeng Tang, Sun Wenhui, Qianwen Bi ... 148

Progress in Requirement Modeling and Analysis in Software Development
A. Keshav Bharadwaj, V. K. Agrawal ... 156

SESSION: SOFTWARE ENGINEERING AND APPLICATIONS + TOOLS

Effectiveness of Code Refactoring Techniques for Energy Consumption in a Mobile Environment
Osama Barack, LiGuo Huang ... 165

Discovery of Things In the Internet of Things
Naseem Ibrahim ... 172

Publishing Things in the Internet of Things
Naseem Ibrahim ... 179

WBAS: A Priority-based Web Browsing Automation System for Real Browsers and Pages
Kenji Hisazumi, Taiyo Nakazaki, Yuki Ishikawa, Takeshi Kamiyama, Akira Fukuda ... 186

Clone Detection Using Scope Trees
Mubarek Mohammed, Jim Fawcett ... 193

Solar and Home Battery Based Spot Market for Saudi Arabia Using Consortium and Negotiation
Fathe Jeribi, Sungchul Hong ... 201

Measuring Cloud Service APIs Quality and Usability
Abdullah Bokhary, Jeff Tian ... 208

Risk Priority Numbers for Various Categories of Alzheimer Disease
Chandrasekhar Putcha, Brian Sloboda, Vish Putcha ... 215

A Web-based Application for Sales Predication
Mao Zheng, Song Chen, Yuxin Liu ... 218

SESSION: POSTER PAPERS AND EXTENDED ABSTRACTS

Parser Development Using Customized TypeScript Compiler
Kazuaki Maeda ... 225

SESSION: LATE BREAKING PAPERS

Patterns for Systematic Derivation of Loop Specifications
Aditi Barua, Yoonsik Cheon ... 229

A Practical Methodology for DO-178C Data and Control Coupling Objective Compliance
Thiago Maia, Marilia Souza ... 236

Probabilistic Study of Beam Columns
Bhupendra Singh Rana, Sanjay Kumar Gupta, Chandrasekhar Putcha ... 241


SESSION MODEL DRIVEN DEVELOPMENT + HUMAN COMPUTER INTERACTION + USABILITY STUDIES + MOBILE APPLICATIONS Chair(s) TBA


On the Scalability of Concurrency Coverage Criteria for Model-based Testing of Autonomous Robotic Systems

M. Abdelgawad1, S. McLeod2, A. Andrews1, and J. Xiao2
1 Department of Computer Science, University of Denver, Denver, CO 80208, USA
2 Department of Computer Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA

Abstract— Model-based Testing (MBT) can play a key role in testing autonomous robotic systems. MBT leverages graph-based behavioral models, such as Communicating Extended Finite State Machines (CEFSMs), to describe the behavior of environment objects that autonomous robots interact with (e.g., moving obstacles). These environment objects interact with each other concurrently and independently. Nevertheless, autonomous robots perceive the interaction between environment objects as stimuli. MBT uses behavioral models to generate test paths that represent such stimuli. These test paths are used for testing autonomous robots. However, the potentially large number of possible executions requires a practical solution (i.e., scalable and practical concurrency coverage criteria). This paper explores the scalability of a number of concurrency coverage criteria. It also applies them to a case study.

Keywords: Model-based testing, autonomous robotic systems, concurrency coverage criteria, scalability

1. Introduction
Autonomous robotic systems perform physical tasks independently, sometimes with very limited external intervention [1]. Autonomous robotic systems exist in various applications such as space exploration, rescue and military missions, mobile manipulation, medical robotics, unmanned vehicles, intelligent manufacturing, and social-service robotics. Such systems can be classified as critical due to the environments in which they are designed to operate; a system failure in such environments causes physical damage and sometimes threatens human lives. These environments consist of dynamic objects (e.g., moving obstacles) that interact with each other concurrently and independently. For example, a warehouse robot avoids mobile obstacles (e.g., workers) and moves from one location to another carrying heavy material. Nevertheless, autonomous robots perceive such interaction as world stimuli [1]. One challenge is how to test an autonomous robotic system operating in such an environment without risking physical damage or human lives. Model-based Testing (MBT) can play a key role in testing autonomous robots interacting with environment objects. MBT leverages graph-based behavioral models, such as Communicating Extended Finite State Machines (CEFSMs), to describe the behavior

of environment objects. MBT then uses these models to generate test scenarios (i.e., test paths) that represent the world stimuli, which are used for testing autonomous robotic software [2]. The behavior of environment objects needs to be modeled as concurrent processes. Insofar as these environment objects are unpredictable and independent, they lack synchronization with each other. Consequently, there are few if any restrictions on possible parallel executions, leading to a large number of possible concurrent executions. This requires a practical solution (i.e., scalable MBT coverage criteria) to handle such a large number of concurrent executions. This paper explores the scalability of existing coverage criteria (i.e., concurrency coverage criteria) such as All Possible Execution Serialized Sequence (APESS) [2]. It also introduces new concurrency coverage criteria: All Possible Execution Serialized Sequence for every n Transitions (APESSnT), Ordered Random Execution Serialized Sequence (O-RESS), and Non-Ordered Random Execution Serialized Sequence (NO-RESS). It then compares the scalability of these concurrency coverage criteria and applies them to a case study, a robotic motion planning system (namely Real-time Adaptive Motion Planning (RAMP)). Lastly, we discuss the issue of concurrency coverage criteria for model-based testing of autonomous robotic systems. Structure of the Paper. Section 2 briefly summarizes MBT and concurrency coverage criteria. Section 3 explains how concurrency coverage criteria work and introduces new ones. It also quantitatively compares the scalability of the concurrency coverage criteria. In Section 4, we apply the concurrency coverage criteria to a real case study (RAMP) and discuss the results. Section 5 draws conclusions.

2. Background
2.1 Model-based Testing (MBT)
MBT is a software testing technique that uses various models to generate testcases [3]. It includes three key elements: models that describe software behavior, criteria that guide the test-generation algorithms, and tools that support the execution of testcases. Li and Wong [4] present an MBT approach to generate behavioral testcases. They use CEFSMs to model behaviors and events of a system under test (SUT). Events with


variables are used to model data, while the events' interaction channels are used to model communication. Testcases are then generated based on a combination of behaviors, data, and communications. Lill et al. [5] use Coloured Petri Nets (CPN) to build a test model for testing a factory robot that carries a load from one place to another. The authors define a hierarchy of coverage criteria for CPN to generate test cases, such as Color-based, Event-based, and State-based coverage criteria. They show that CPN is practical for testing autonomous robots. Andrews et al. [2] propose an MBT approach for testing autonomous robotic systems. They use CEFSM and Petri Nets to model the interaction between robots and environment objects for test path generation. They apply the MBT approach to various types of autonomous robotic applications including an autonomous ground vehicle (AGV), a tour-guide robot, and a search-and-rescue robot. The results show that MBT is efficient, in terms of cost, for testing autonomous robotic systems. The results also illustrate that the formal models, CEFSM and Petri Nets, are practical for test path generation.

2.2 Concurrency Coverage Criteria
Unlike testing sequential programs, testing concurrent programs requires special types of coverage criteria (i.e., concurrency coverage criteria) that handle the potentially large number of interleavings of a set of sequential test paths that are executed in parallel. Yang and Chung [6] propose Rendezvous Coverage Criteria (RCC). The authors use a flow graph and a rendezvous graph to model the synchronization of execution tasks. The RCC include Rendezvous Node coverage (RN), Rendezvous Branch coverage (RB), Rendezvous Route coverage (RR), and Concurrent Route coverage (CR). The authors apply the RCC to a small Ada program. RCC are effective for testing concurrent programs whose tasks show dependencies. Taylor et al. [7] define a set of concurrency coverage criteria for testing concurrent programs. This set includes All-Possible-Rendezvous (APR), All-CC-States (ACCS), All-Edges-Between-CC-States (AEBCCS), All-Proper-CC-Histories (APCCH), and All-Concurrency-Paths (ACP). The authors build a subsumption hierarchy for these criteria. They also apply them to a small example, i.e., the dining philosophers. Although this work does not provide experimental results, it highlights the importance of concurrency coverage criteria for testing concurrent programs. Andrews et al. [2] propose a concurrency coverage criterion (All Possible Execution Serialized Sequence (APESS)) that generates all possible serializations of concurrent sequential test paths for testing autonomous robotic systems.

3. Approach
3.1 Testing Approach and Example
The environment objects that an autonomous robotic system deals with interact with each other independently and concurrently. Thus, they must be modeled as concurrent processes. This paper adopts the CEFSMs from [8] due to the model's support for concurrent processes, which is necessary for testing autonomous robotic systems. A CEFSM is defined as a tuple Q = (S, s0, E, P, T, A, M, V, C) [9], where S is a finite set of states, s0 is the initial state, E is a set of events, P is a set of predicates, T is a set of transition functions T: S × P × E → S × A × M, A is a set of actions, M is a set of communicating messages, V is a set of variables, and C is the set of input/output communication channels.
Definition 1 (Process): A process represents a CEFSMi that contains a set of states Si = {si1, si2, ..., six} and a set of transitions Ti = {ti1, ti2, ..., tiy}, with x > 0 and y > 0. The transitions in Ti occur sequentially to move Pi from a state sij to another state sik.
Test paths that represent the execution of concurrent processes are generated in three steps:
i) Covering each process separately through sequential test paths. Coverage criteria, such as Node Coverage (NC), Edge Coverage (EC), and Prime Path Coverage (PPC) [10], can be used to generate sets of sequential test paths.
ii) Selecting every sequential test path from a set to be interleaved with every sequential test path from the other sets. The result of this step is various selections of sequential test paths.
iii) Interleaving the sequential test paths of a selection with each other. This step is repeatedly applied to all selections. Concurrency coverage criteria are required to govern the interleaving of sequential test paths.
For illustration purposes, we construct a simple example that includes two processes, Process #1 and Process #2, modeled as CEFSMs (Fig. 1). Process #1 consists of four transitions (t11, t12, t13, t14) and three states (s11, s12, s13). Process #2 consists of four transitions (t21, t22, t23, t24) and two states (s21, s22). The dotted arrows represent the communication channels between these two processes.

Fig. 1: CEFSM model consisting of two processes
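To make the example concrete, Process #1 of Fig. 1 can be written down as plain data. This is an illustrative sketch only (the dictionary layout is an assumption, not notation from the paper); the transition endpoints follow the test paths given in Section 3.2, and Process #2 would be encoded the same way.

# Illustrative encoding of Process #1 from Fig. 1 (not from the paper).
# Each transition is (name, source_state, target_state); events, guards, actions,
# and channels from the full CEFSM tuple (S, s0, E, P, T, A, M, V, C) are omitted.
process1 = {
    "states": ["s11", "s12", "s13"],
    "initial": "s11",
    "transitions": [
        ("t11", "s11", "s12"),
        ("t12", "s12", "s13"),
        ("t13", "s13", "s12"),
        ("t14", "s13", "s11"),
    ],
}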


3.2 Sequential Test Path Generation
To cover Process #1 in Fig. 1, the EC criterion [10] generates two test paths {([s11, s12], [s12, s13], [s13, s12]), ([s11, s12], [s12, s13], [s13, s11])}. A test path is considered a sequence of transitions that moves a process to an expected state. Thus, these two test paths can simply be written as {(t11, t12, t13), (t11, t12, t14)}. Process #2 is covered by two test paths, {(t21, t22), (t23, t24)}. Note that these two processes could each be covered by a single test path; we covered them with two test paths each to demonstrate the concurrency coverage criteria.
Definition 2 (Sequential test path): For a given process Pi, a sequential test path stpij consists of at least one transition, starts at some initial state sim ∈ Si0, and ends at some final state sin ∈ Sif, where Si0 is the set of initial states and Sif is the set of final states for Pi.
Definition 3 (Set of sequential test paths): For a given process Pi, a set of sequential test paths STPi = {stpi1, stpi2, ..., stpij}, with j >= 1, covers Pi based on a certain coverage criterion.
In our example, we use EC [10], which covers the transitions (i.e., edges) of a process Pi. As a result, we generate two sets of sequential test paths (STP1 and STP2) of two paths each, STP1 = {stp11, stp12} and STP2 = {stp21, stp22}, where stp11 = (t11, t12, t13), stp12 = (t11, t12, t14), stp21 = (t21, t22), and stp22 = (t23, t24).


3.3 Selection of Sequential Test Paths
The processes P1, P2, ..., Pm, where m is the number of environment objects, represent these objects and interact concurrently with each other. Hence, the sets of sequential test paths STP1, STP2, ..., STPm are considered parallel. However, we do not know which stpai ∈ STPa will interact with which stpbj ∈ STPb at a given time, due to the non-deterministic execution of these processes. Therefore, every sequential test path stpxi ∈ STPx is selected to be interleaved with every sequential test path stpyi ∈ STPy, x ≠ y. The selection operation repeatedly continues until it covers all possible selections for all sequential test paths. Algorithm 1 outlines the selection operation.

Algorithm 1 Generate Selections
1: let P be a set of processes, (sizeof(P) > 1), and each process Pi consists of a set of sequential test paths, STPi = {stpi1, stpi2, ..., stpiki};
2: let S be an empty set of sets for selections;
3: for each STP ∈ P do
4:   let tmp be an empty set of sets;
5:   for each x ∈ S do
6:     for each stpiki ∈ STP do
7:       tmp ← add(x);
8:       tmp.currentSet ← stpiki;
9:     end for
10:  end for
11:  Swap(S, tmp);
12: end for
13: return S;  ▷ all possible selections

Definition 4 (Selection): Given sets of sequential test paths {STP1, STP2, ..., STPm}, a selection, Si ∈ S, includes one sequential test path, stpij, from each set, STPi, such that \bigcap_{i=1}^{k} S_i = \emptyset.

The Cartesian product of the sequential test path sets describes all possible selections. Hence, the number of selections for m parallel processes is |selections| = \prod_{i=1}^{m} |STP_i|. For our example, there are four possible selections: selection1(stp11, stp21), selection2(stp11, stp22), selection3(stp12, stp21), and selection4(stp12, stp22).

3.4 Using Concurrency Coverage Criteria
For each selection, the sequential test paths must be interleaved with each other to represent the interaction between the concurrent processes. The interleaving algorithm interleaves the sequential test paths based on concurrency coverage criteria. The result of this step is a set of interleaved paths, called interaction test paths.
Definition 5 (Interaction test path): Given a set of sequential test paths (stpij, ..., stpxy) that work in parallel, an interaction test path, ITPi, is a possible serialization of the transitions of these sequential test paths.
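Since Algorithm 1 and Definition 4 amount to forming the Cartesian product of the path sets, a compact equivalent can be sketched as follows. This is an illustrative reimplementation under that reading, not the authors' tool; the variable names simply mirror the notation above.

from itertools import product

def generate_selections(path_sets):
    # Return every selection: one sequential test path drawn from each set,
    # i.e., the Cartesian product of the sets (the effect of Algorithm 1).
    return list(product(*path_sets))

STP1 = [("t11", "t12", "t13"), ("t11", "t12", "t14")]  # stp11, stp12
STP2 = [("t21", "t22"), ("t23", "t24")]                # stp21, stp22

selections = generate_selections([STP1, STP2])
print(len(selections))  # 4 selections, matching |STP1| * |STP2| = 2 * 2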

Fig. 2: Concurrency Coverage Criteria on Selection1

Concurrency coverage criteria differ in how many possible serialization forms need to be achieved. We consider four concurrency coverage criteria (one existing and three new ones) for interleaving: All Possible Execution Serialized Sequence (APESS) [2], All Possible Execution Serialized Sequence for n Transitions (APESSnT), Ordered Random Execution Serialized Sequence (O-RESS), and Non-Ordered Random Execution Serialized Sequence (NO-RESS).

3.4.1 All Possible Execution Serialized Sequence (APESS)

APESS [2] is satisfied when all transitions of a sequential test path interleave with all transitions of the other sequential test paths (see Fig. 2-(a); the loop arrow indicates that the order of sequential test paths must be changed to generate more possible serializations). APESS is illustrated in detail in Algorithm 2. First, for each selection, the order of the sequential test paths is formed as all permutations. For each order, the sequential test paths are sequentially concatenated as the first serialization, ITP1. Then, the algorithm shifts the last transition of the first sequential test path one step backward to generate the second serialization, ITP2. It keeps shifting this transition backward until it reaches the end of the last sequential test path. In each step, a new interaction test path, ITPi, is generated and added to the APESS serialization list. Next, the algorithm recursively shifts the last two transitions of the first sequential test path backward until they reach the end of the last sequential test path, and so on. It recursively keeps shifting backward until the first sequential test path crosses the others and becomes the last sequential test path. This recursive routine serializes only one order of sequential test paths; thus it starts over again for the other orders until it serializes all sequential test paths of a selection.

Algorithm 2 All Possible Execution Serialized Sequences
1: let S be a set of selections; each selection Si consists of a set of sequential test paths, {stpij, ..., stpmn};
2: let APESS be an empty set to store serializations;
3: for each (s ∈ S) do
4:   orders ← Permutations(s);
5:   for each (r ∈ orders) do
6:     serial ← concatenate(r);
7:     APESS ← add(serial);
8:     SHIFTBACK(APESS&, serial, len(pi.firstpath), len(pi.firstpath));
9:   end for
10: end for
11: return APESS;  ▷ all possible execution serialized sequences
12: procedure SHIFTBACK(APESS&, serial, start, no_t)  ▷ start is the index where shift-backward starts; no_t is how many transitions will be shifted
13:   if ((start + no_t) < (serial.size() − 1)) then
14:     if (no_t == 0) then
15:       return;
16:     else if (no_t == 1) then
17:       for (i = start; i < (serial.size() − 1); i++) do
18:         Swap(serial[i], serial[i + 1]);
19:         APESS ← add(serial);
20:       end for
21:     else
22:       for (i = start; i < (serial.size() − 1); i++) do
23:         for (j = start; j > (start − no_t); j−−) do
24:           Swap(serial[j], serial[j + 1]);
25:         end for
26:         APESS ← add(serial);
27:         start++;
28:         SHIFTBACK(APESS&, serial, start, (no_t − 1));
29:       end for
30:     end if
31:   end if
32: end procedure

Definition 6 (APESS criterion): Given an ordered set of sequential test paths, (stpab, ..., stpxy), as test requirements,

APESS is satisfied when every transition, tij ∈ stpab, serializes at least once with every transition of the other sequential test paths, (..., stpxy). For selection1(stp11, stp21) illustrated in Fig. 2-(a), the APESS algorithm generates 10 ITPs as APESS(selection1) = {(t11, t12, t13, t21, t22), (t11, t12, t21, t13, t22), (t11, t21, t12, t13, t22), (t21, t11, t12, t13, t22), (t21, t22, t11, t12, t13), (t21, t11, t22, t12, t13), (t21, t11, t12, t22, t13), (t11, t21, t22, t12, t13), (t11, t21, t12, t22, t13), (t11, t12, t21, t22, t13)}.
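The ten ITPs above are exactly the serializations that preserve the internal order of stp11 and stp21. As an independent illustration, the plain recursive enumeration below is not the shift-back procedure of Algorithm 2, but it yields the same set and count for this example.

def interleavings(paths):
    # Yield every serialization that keeps the internal order of each path.
    if all(len(p) == 0 for p in paths):
        yield ()
        return
    for i, p in enumerate(paths):
        if p:
            remaining = paths[:i] + [p[1:]] + paths[i + 1:]
            for tail in interleavings(remaining):
                yield (p[0],) + tail

stp11 = ["t11", "t12", "t13"]
stp21 = ["t21", "t22"]
itps = list(interleavings([stp11, stp21]))
print(len(itps))  # 10, matching (3 + 2)! / (3! * 2!)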

In general, the number of interaction test paths generated by APESS is
|APESS| = \sum_{i=1}^{s} \frac{\big(\sum_{j=1}^{s_i} |stp_{ij}|\big)!}{\prod_{j=1}^{s_i} (|stp_{ij}|)!}  [2],
where s denotes the number of selections, s_i is the number of sequential test paths for selection i, and |stp_{ij}| denotes the number of transitions of each sequential test path. APESS is practically infeasible, especially when the number of processes is large and each of the processes is covered by many sequential test paths. Section 3.5 uses it as an upper bound when comparing concurrency coverage criteria.
Definition 7 (APESSnT criterion): Given an ordered set of sequential test paths, (stpab, ..., stpxy), as test requirements, and n, a number of transitions, APESSnT is satisfied when every n transitions from stpab serialize at least once with every n transitions of the other sequential test paths, (..., stpxy).
Similar to APESS, APESSnT visits all selections; for each selection, the order of sequential test paths is formed as all permutations. It reduces the number of serializations by keeping every n transitions together without interleaving (in APESS, n = 1). Otherwise, it is identical to APESS. For selection1(stp11, stp21) illustrated in Fig. 2-(b), assuming n = 2, the APESS2T algorithm generates 3 ITPs as APESS2T(selection1) = {(t11, t12, t13, t21, t22), (t21, t22, t11, t12, t13), (t11, t12, t21, t22, t13)}.
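One way to read APESSnT is as the same interleaving idea applied to blocks of n consecutive transitions. The sketch below is an interpretation, not the authors' implementation (block boundaries are assumed to be taken from the front of each path), but it reproduces the three APESS2T serializations listed above.

def interleavings(paths):
    # Order-preserving interleavings (same helper as in the previous sketch).
    if all(len(p) == 0 for p in paths):
        yield ()
        return
    for i, p in enumerate(paths):
        if p:
            remaining = paths[:i] + [p[1:]] + paths[i + 1:]
            for tail in interleavings(remaining):
                yield (p[0],) + tail

def chunk(path, n):
    # Split a path into consecutive blocks of at most n transitions.
    return [tuple(path[i:i + n]) for i in range(0, len(path), n)]

def apess_nt(paths, n):
    # Interleave n-transition blocks instead of single transitions (APESSnT view).
    block_lists = [chunk(p, n) for p in paths]
    for serialization in interleavings(block_lists):
        # Flatten the blocks back into a single sequence of transitions.
        yield tuple(t for block in serialization for t in block)

itps_2t = list(apess_nt([["t11", "t12", "t13"], ["t21", "t22"]], 2))
print(len(itps_2t))  # 3 serializations, as in the APESS2T example above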

The number of interaction test paths generated by APESSnT is
|APESSnT| = \sum_{i=1}^{s} \frac{\big(\sum_{j=1}^{s_i} \mathrm{trunc}(|stp_{ij}|/n)\big)!}{\prod_{j=1}^{s_i} \mathrm{trunc}(|stp_{ij}|/n)!} + \sum_{i=1}^{s} \mathrm{mod}(|stp_{ij}|/n),
where s denotes the number of selections, s_i is the number of sequential test paths for a selection, |stp_{ij}| is the number of transitions of each sequential test path, and n refers to the number of transitions that are shifted together (n > 1).
Definition 8 (O-RESS criterion): Given a set of ordered sequential test paths, (stpmn, ..., stpxy), as test requirements, O-RESS is satisfied when a random number of transitions from a sequential test path interleave with another random number of transitions of the other sequential test paths.
O-RESS is illustrated in detail in Algorithm 3. For selection1(stp11, stp21) illustrated in Fig. 2-(c), O-RESS generates 2 interaction test paths as O-RESS(selection1) = {(t11, t22, t13), (t21, t12)}. The number of ITPs generated by O-RESS is \sum_{i=1}^{n} (|selection_i|)!, where n is the number of selections and |selection_i| is the size of selection i.


Algorithm 3 Ordered Random Execution Serialized Sequence
1: let S be a set of selections; each selection Si consists of a set of sequential test paths, {stpij, ..., stpmn};
2: let O_EPCSS be an empty set of serializations;
3: for each (s ∈ S) do
4:   orders ← Permutations(s);
5:   for each (r ∈ orders) do
6:     for (i = 0; i < r.size(); i++) do
7:       sub_sequence = copy(r[i], 1, random(1, size(r[i]) − 1));
8:       serial ← add(sub_sequence);
9:     end for
10:    O_EPCSS ← add(serial);
11:  end for
12: end for
13: return O_EPCSS;

Definition 9 (NO-RESS criterion): Given a set, either ordered or unordered, of sequential test paths, (stpmn, ..., stpxy), as test requirements, NO-RESS is satisfied when a random number of transitions from a sequential test path interleave with another random number of transitions of the other sequential test paths, and the order of the paths is disregarded.
In general, NO-RESS generates only one interaction test path per selection. It is considered a weak coverage criterion but is counted as a lower bound when comparing concurrency coverage criteria. NO-RESS is illustrated in detail in Algorithm 4.

Algorithm 4 None-Ordered Random Execution Serialized Sequence
1: let S be a set of selections; each selection Si consists of a set of sequential test paths, {stpij, ..., stpmn};
2: let NO_EPCSS be an empty set of serializations;
3: for each (s ∈ S) do
4:   for (i = 0; i < s.size(); i++) do
5:     sub_sequence = copy(s[i], 1, Random(1, size(s[i]) − 1));
6:     serial ← add(sub_sequence);
7:   end for
8:   NO_EPCSS ← add(serial);
9: end for
10: return NO_EPCSS;

As shown in line 5, the algorithm cuts a random number of transitions (i.e., a subsequence) from each sequential test path without considering the test paths' order. It then combines these subsequences into one serialization. In line 6, the algorithm adds the new serialization before looping to another selection. For selection1(stp11, stp21) illustrated in Fig. 2-(a), NO-RESS generates 1 interaction test path as NO-RESS(selection1) = {(t11, t22, t13), (t21, t12)}. The number of interaction test paths generated by NO-RESS equals the number of selections, |NO-RESS| = |selections|, since it generates only one serialization for each selection. This small example does not show how scalable these concurrency coverage criteria are. A tool that automatically serializes a large number of processes, sequential test paths, and transitions is required.
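Algorithms 3 and 4 both keep only a randomly sized piece of each sequential test path and concatenate the pieces. The sketch below is one reading of that idea (the prefix-style cut and the helper name are assumptions made for illustration); it is meant only to show why the ITP count collapses to one serialization per selection for NO-RESS.

import random

def random_cut_serialization(selection, rng=random):
    # Concatenate a random-length prefix of each sequential test path in the
    # selection, producing a single interaction test path (NO-RESS-style sketch).
    serialization = []
    for path in selection:
        cut = rng.randint(1, max(1, len(path) - 1))  # keep at least one transition
        serialization.extend(path[:cut])
    return tuple(serialization)

selection1 = (("t11", "t12", "t13"), ("t21", "t22"))
print(random_cut_serialization(selection1))  # e.g. ('t11', 't12', 't21'); one ITP per selection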

3.5 Quantitative Analysis
We built a tool that automatically generates serializations, i.e., interaction test paths (ITPs). We use it to compare the scalability of the concurrency coverage criteria (APESS, APESSnT, O-RESS, and NO-RESS). When we add more processes, sequential test paths, and transitions, the tool generates exponentially more interaction test paths. Hence, we measure scalability by the number of interaction test paths generated. The tool takes the following inputs:
• A set of processes P that describe environment objects.
• For each Pi, a set of sequential test paths, STPi = {stpi1, stpi2, ..., stpiki}, that cover Pi based on Edge Coverage (EC) [10].
• For each stpxy ∈ STPi, a sequence of transitions, {tx1, tx2, ..., txy}, that describes the behaviors of the environment object Pi.
Based on these inputs, the tool generates interaction test paths for APESS, APESS2T, O-RESS, and NO-RESS.

Input (#P, #stp, #t) and output (number of ITPs) per criterion:
#P  #stp  #t  |  APESS         |  APESS2T       |  O-RESS        |  NO-RESS
 3   2    3   |  13440         |  56            |  24            |  8
 5   3    4   |  7.42463E+13   |  27556200      |  1215          |  243
 8   4    5   |  1.24358E+36   |  5.35623E+15   |  524288        |  65536
10   2    6   |  2.27588E+56   |  4.49208E+27   |  10240         |  1024
13   3    8   |  2.206E+112    |  1.46716E+56   |  20726199      |  1594323
15   5    8   |  1.6871E+140   |  5.02986E+71   |  4.57764E+11   |  30517578125
Table 1: Scalability of concurrency coverage criteria.

The results show that the concurrency coverage criteria are unscalable for a large number of processes. As shown in Table 1, we first start with 3 processes (#P = 3), each of which consists of 2 sequential test paths (#stp = 2), and each sequential test path includes 3 transitions (#t = 3). Then, we add more processes, sequential test paths, and transitions. The inputs are increased up to 15 processes (#P = 15), 5 sequential test paths for each process (#stp = 5), and 8 transitions for each sequential test path (#t = 8). Table 1 compares the scalability of APESS, APESS2T, O-RESS, and NO-RESS. The results show that APESS and APESS2T are unscalable; they scale up exponentially with the number of transitions, the number of sequential test paths, and the number of processes. If any of these values is increased, the number of generated interaction test paths increases drastically. This means that APESS and APESS2T are practically infeasible. Nevertheless, O-RESS and NO-RESS are practically feasible for a small number of environment objects (e.g., a robot simultaneously interacting with 5 environment objects). O-RESS and NO-RESS increase only when the number of processes is increased (e.g., NO-RESS generates 1024 ITPs for 10 processes that are represented by 2 stps each, where each stp consists of 6 transitions). This is reasonable because O-RESS and NO-RESS take neither the number of sequential test paths nor the number of transitions into account. They depend only on the number of selections to generate interaction test paths.
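For the homogeneous inputs used in Table 1 (every process has the same number of paths and every path the same number of transitions), the APESS and NO-RESS columns follow directly from the counting formulas above. The short check below is an independent calculation, not the authors' tool; it reproduces the first row.

from math import factorial

def apess_count(num_procs, paths_per_proc, trans_per_path):
    # |APESS| for homogeneous inputs: number of selections times the
    # multinomial count of order-preserving interleavings per selection.
    selections = paths_per_proc ** num_procs
    total = num_procs * trans_per_path
    per_selection = factorial(total) // (factorial(trans_per_path) ** num_procs)
    return selections * per_selection

def no_ress_count(num_procs, paths_per_proc):
    # |NO-RESS| equals the number of selections.
    return paths_per_proc ** num_procs

print(apess_count(3, 2, 3))  # 13440, the first APESS entry in Table 1
print(no_ress_count(3, 2))   # 8, the first NO-RESS entry in Table 1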

4. Case Study
4.1 Real-time Adaptive Motion Planning
RAMP is a state-of-the-art robot motion planning framework [11]. It enables a robot to operate in an environment with moving obstacles that have unpredictable and arbitrary motions. RAMP performs real-time simultaneous planning and execution of robot motion to achieve its goal while avoiding moving obstacles. As shown in Fig. 3, this paper exercises the RAMP system interacting with three obstacles that move unpredictably, concurrently, and independently. The SUT is the RAMP framework that generates motion for the robot to execute. The obstacles (Obstacle#1, Obstacle#2, and Obstacle#3) move with random speed and random direction for a random duration. The SUT is expected to drive the robot to its goal while avoiding the obstacles. The motions of these obstacles (i.e., the obstacles' behaviors) are world stimuli that the RAMP system perceives as inputs.

Fig. 3: RAMP system interacting with three obstacles

As shown in Fig. 4, the obstacles' behaviors are modeled with a CEFSM model. Each obstacle is modeled as a process. We assume that the obstacles have the same behaviors: move and stop. The move includes linear velocity and angular velocity; these velocities control the obstacle's speed and direction. The stop reports the position where the obstacle is stopped. Note that variables, such as position, linear velocity, and angular velocity, are used for test-data generation, which is used to transform interaction test paths into executable testcases (see [12] for detail). This paper emphasizes the scalability of concurrency coverage criteria and does not include the transformation of interaction test paths into executable testcases.

Fig. 4: CEFSM Model of Three Obstacles

Transitions of the obstacles' CEFSM model are shown in Table 2.
Obstacle#1 transitions: t11 = (−/[]/−/move(linear_vel, angular_vel)), t12 = (−/[]/−/stop(pose))
Obstacle#2 transitions: t21 = (−/[]/−/move(linear_vel, angular_vel)), t22 = (−/[]/−/stop(pose))
Obstacle#3 transitions: t31 = (−/[]/−/move(linear_vel, angular_vel)), t32 = (−/[]/−/stop(pose))
Table 2: Transitions of the obstacle CEFSM model

We use Edge Coverage (EC) [10] to generate sequential test paths. Each obstacle is covered by one sequential test path. Table 3 illustrates these sequential test paths.
Obstacle#1: stp11 (stopped --t11--> moving --t12--> stopped)
Obstacle#2: stp21 (stopped --t21--> moving --t22--> stopped)
Obstacle#3: stp31 (stopped --t31--> moving --t32--> stopped)
Table 3: Sequential test paths for three obstacles

The generated sequential test paths together produce only one selection, since |selections| = 1 × 1 × 1 (see Section 3.3). This selection is selection1 = {stp11, stp21, stp31}. Applying APESS to this selection generates 90 ITPs:
|APESS(selection1)| = (2+2+2)! / (2! × 2! × 2!) = 90
APESS2T generates 6 ITPs and O-RESS generates 3 ITPs, while NO-RESS generates only 1 ITP for selection1. Table 4 shows the 6 ITPs generated by APESS2T. The application of these concurrency coverage criteria to only three obstacles is not enough to show how scalable these coverage criteria are.

(t11, t12, t21, t22, t31, t32), (t11, t12, t31, t32, t21, t22), (t21, t22, t11, t12, t31, t32), (t21, t22, t31, t32, t11, t12), (t31, t32, t11, t12, t21, t22), (t31, t32, t21, t22, t11, t12)
Table 4: Six ITPs generated by APESS2T

4.2 Discussion
This paper assesses the scalability of four concurrency coverage criteria (APESS, APESS2T, O-RESS, and NO-RESS). We apply them to a case study. The result shows that the concurrency coverage criteria are unscalable for a
large number of environment objects. The number of interaction test paths generated by APESS is practically infeasible; 90 interaction test paths for a small model like the RAMP obstacles is simply too large. The other coverage criteria generate a feasible number of interaction test paths for a small number of environment objects (e.g., a maximum of 5 objects). O-RESS and NO-RESS cover a random number of transitions from each sequential test path. One can argue that O-RESS and NO-RESS are weak coverage criteria because we cannot guarantee that the environment objects would interact with each other in this way. This is true; however, these interactions are still possible. Therefore, any combinatorial technique to interleave sequential test paths with each other is possible. For example, concatenating all sequential test paths together as one long interaction test path is also possible. This interaction test path semantically means that the environment objects line up to interact with the robot one by one. This is probably a rare occurrence but nevertheless possible. The generated interaction test paths are abstract. They require test-data (i.e., possible values that input parameters can take on) to be transformed into executable testcases. This transformation of ITPs into executable testcases increases the number of testcases by the size of the test-data set. If N is the size of the test-data set used to transform the 6 ITPs shown in Table 4, the result is 6 × N executable testcases. Any member of the input-space partitioning coverage criteria from [10], such as Each Choice, Pair-Wise, T-Wise, and Base Choice coverage, can be used to generate test-data. Many controlled execution schemes (i.e., testing automation frameworks such as GoogleTest [13]) can be used to enforce coverage of the feasible interaction test paths. More detail about testcase scripting and execution for testing the RAMP system using GoogleTest can be found in [12].
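The 6 × N expansion described above is simply a cross product of the abstract ITPs in Table 4 with concrete test-data tuples. In the sketch below the velocity values are invented for illustration and N = 3; this is not the transformation procedure from [12].

from itertools import product

itps = [
    ("t11", "t12", "t21", "t22", "t31", "t32"),
    ("t11", "t12", "t31", "t32", "t21", "t22"),
    ("t21", "t22", "t11", "t12", "t31", "t32"),
    ("t21", "t22", "t31", "t32", "t11", "t12"),
    ("t31", "t32", "t11", "t12", "t21", "t22"),
    ("t31", "t32", "t21", "t22", "t11", "t12"),
]

# Hypothetical (linear_vel, angular_vel) values for the moving obstacles; N = 3.
test_data = [(0.2, 0.0), (0.5, 0.3), (1.0, -0.4)]

# Each executable testcase pairs one abstract ITP with one concrete data tuple.
testcases = list(product(itps, test_data))
print(len(testcases))  # 6 * 3 = 18 executable testcases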

5. Conclusion
This paper presents a challenge related to the scalability of concurrency coverage criteria for MBT of autonomous robotic systems. It is a challenge because concurrency coverage criteria have to be scalable and practical. This paper evaluates the scalability of four concurrency coverage criteria (APESS [2], APESSnT, O-RESS, and NO-RESS); APESSnT, O-RESS, and NO-RESS were introduced in this paper. First, the concurrency coverage criteria were explained using a small example. This paper then compares the scalability of these concurrency coverage criteria for large numbers of processes, sequential test paths, and transitions. We applied the concurrency coverage criteria to a case study, the Real-time Adaptive Motion Planning (RAMP) system, assuming that RAMP interacts with three moving obstacles. The results show that the number of interaction test paths generated by these concurrency coverage criteria grows exponentially. The number of interaction test paths generated by APESS is too large


and practically infeasible. On the other hand, even when the number of processes, sequential test paths, and transitions is increased, O-RESS and NO-RESS generate a practical number of interaction test paths. It can be argued that O-RESS and NO-RESS are weak coverage criteria because they do not cover many possibilities. This is true; however, for testing concurrent processes, it remains difficult to trade off covering as many process interactions as possible (i.e., strong coverage criteria) against keeping the number of possible interactions practically feasible (i.e., the size of the test suite). Nevertheless, introducing any new concurrency coverage criteria that are practically feasible would be purposeful as long as the environment objects are described as concurrent and independent. Future work will introduce new practical concurrency coverage criteria for testing autonomous robotic systems. It will also discuss the effectiveness of the generated interaction test paths.

References
[1] H. Cheng, Autonomous Intelligent Vehicles: Theory, Algorithms, and Implementation, 1st ed. New York: Springer-Verlag, 2011.
[2] A. Andrews, M. Abdelgawad, and A. Gario, "Towards world model-based test generation in autonomous systems," in Proceedings of the 3rd International Conference on Model-Driven Engineering and Software Development (MODELSWARD 2015). SCITEPRESS Digital Library, 2015, pp. 165–176.
[3] A. Dias-Neto, R. Subramanyan, M. Vieira, and G. H. Travassos, "A survey on model-based testing approaches: A systematic review," in Proceedings of the 1st ACM International Workshop on Empirical Assessment of Software Engineering Languages and Technologies (ASE). ACM, 2007, pp. 31–36.
[4] J. Li and W. Wong, "Automatic test generation from communicating extended finite state machine (CEFSM)-based models," in Proceedings of the 5th IEEE International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC 2002), 2002, pp. 181–185.
[5] R. Lill and F. Saglietti, "Testing the cooperation of autonomous robotic agents," in Proceedings of the 9th International Conference on Software Engineering and Applications (ICSOFT-EA), Aug 2014, pp. 287–296.
[6] R. Yang and C.-G. Chung, "A path analysis approach to concurrent program testing," in Proceedings of the 9th Annual International Phoenix Conference on Computers and Communications, Mar 1990, pp. 425–432.
[7] R. N. Taylor, D. L. Levine, and C. D. Kelly, "Structural testing of concurrent programs," IEEE Transactions on Software Engineering, vol. 18, no. 3, pp. 206–215, Mar. 1992.
[8] M. Gouda, E. Manning, and Y. Yu, "On the progress of communication between two finite state machines," Information and Control, vol. 63, no. 3, pp. 200–216, 1984.
[9] D. Brand and P. Zafiropulo, "On communicating finite-state machines," Journal of the ACM, vol. 30, no. 2, pp. 323–342, Apr. 1983.
[10] P. Ammann and J. Offutt, Introduction to Software Testing, 2nd ed. New York, NY: Cambridge University Press, 2017.
[11] J. Vannoy and J. Xiao, "Real-time adaptive motion planning (RAMP) of mobile manipulators in dynamic environments with unforeseen changes," IEEE Transactions on Robotics, vol. 24, no. 5, pp. 1199–1212, Oct. 2008.
[12] M. Abdelgawad, S. McLeod, A. Andrews, and J. Xiao, "Model-based testing of a real-time adaptive motion planning system," Advanced Robotics, vol. 31, no. 22, pp. 1159–1176, 2017.
[13] Google, "Google C++ Testing Framework (GoogleTest)," 2013. [Online]. Available: https://code.google.com/p/googletest/


User Modelling in Search for People with Autism

Esha Massand and Keith Leonard Mannock
Birkbeck Knowledge Lab, Department of Computer Science and Information Systems, Birkbeck, University of London, London, WC1E 7HX, UK
Email: [email protected] (Contact Author)

Keywords—Human Computer Interaction and Usability Engineering, Usability Engineering, Domain Modeling and Metamodeling, Web-Based Tools, Applications and Environment, Case Studies and Emerging Technologies

Abstract—We present the architecture and prototype implementation of JELLIBEANS, a web search tool for assisting users with Autistic Spectrum Disorder (ASD)1. The system models user interactions within the search process, utilising a user profile and integrating insights from the core features of the Autistic condition. The system has an integrated infra-red motion-controlled user interface component, utilising gesture and hand movement data to enhance the interactive search process. The work provides insights into how search can be improved for users on the Autistic spectrum and includes an analysis of experiments carried out with the system.

1 JELLIBEANS are a rainbow of colours, different sizes and shades, and the name represents the difference in style of processing of individuals with ASD.

I. INTRODUCTION
According to the "Diagnostic and Statistical Manual of Mental Disorders" [1], Autism is characterised by persistent and early deficits in reciprocal social interaction. Autism is the most common neuro-developmental condition (1 in 68 children meet criteria for the Autism Spectrum). It is well known that interaction with computers is prominent in this group, and that individuals with Autism are more engaged when using technology that is receptive and interactive (e.g., games, responsive consoles, motion-controlled devices, etc.) when compared to technology that is not [2]. Research has also shown that people with Autism are less context-sensitive [3]. Generally speaking, individuals with Autism prefer, and are more likely to engage in, an item-specific, or detailed, processing style. There is good reason to postulate, then, that their web-search queries are formed very differently from those of the typical population, more likely consisting of single-order associations [4]. In addition to a detailed, item-specific processing style, individuals with Autism are also less likely to engage in a contextual processing style. For example, in a search for the word "Guitar", a contextual style of processing would imply an awareness that the word is related to "Piano", but also that both words are hierarchically related to "Instrument". Working memory is the aspect of the human memory system that assists with the transient holding and processing of information for updating, learning, and comprehension. In day-to-day life, a major factor determining the effectiveness

of working memory, and the quality of information that an individual can recall (e.g., enter into the web search), is the number of cues that are available at the time. For example, if presented with a list of fifty items to remember, you are more likely to remember that you were presented with the word "Zebra" if you are told that you have been presented with an animal, than if no cue of "animal" was given at all. This type of cued recall can be used to bolster recall of information, and is a technique that we have applied in this work to enable individuals with Autism to formulate better search queries in their browser.

In Section II we discuss some of the problems people on the autistic spectrum have with existing search engines. In the following section we discuss the process we undertook to complete this work. In Section IV we present the architecture of the system and the main characteristics of the implementation. In Section V we provide an analysis of some of the experiments we have carried out with JELLIBEANS. Finally, we provide some conclusions and indications of further work.

II. WHAT SHOULD SEARCH OFFER PEOPLE WITH AUTISM?

The Internet is one of the largest resources of information available to us. Search engines allow users to collate hundreds of links on a single topic, using only a few words or phrases. The typical user sorts the returned results into "relevant" or "irrelevant" categories, (mentally) flexibly shifting between one result and the next, to determine the relevance of each page returned by the search engine. Search allows the user to assimilate the information on the page into their knowledge and is an important learning tool. For people with Autism the requirement is no different; however, we suggest that current search is inadequate for individuals with Autism. A large body of research has shown that "mental shifts" are a known area of weakness for people with Autism [5]. The information is therefore harder to assimilate or learn, and judging the relevance of each document "on the fly", and speedily, becomes near impossible. One successful psychological technique to increase the assimilation of information for individuals with Autism is to present the information in smaller amounts which are clearer, and therefore easier, to process [6]. This technique avoids overwhelming individuals, and although less information is presented as a whole, whatever is presented to the user becomes "digestible"


and ultimately, the information as a whole becomes more understandable.

Search queries usually fall into one of three broad categories, with several stages [7]:

• Do-type queries, which characterise transactions between the user and the search engine. For example, when the user wants to do something such as "buy a plane ticket", "listen to a song", or "download a screensaver".
• Know-type queries, which are informational in nature, usually covering a broad topic. For example, the "name of a band", a "restaurant in London", "trucks", or "Colorado".
• Go-type queries, which are navigational in nature. For example, searching for a particular home page on the web, "YouTube", or "American Airlines".

After identifying the information need, the user must formulate a search query. Once the query has been entered into a search engine, the user must browse through the results. The whole process can be repeated if the user is not satisfied.

III. OUR RESEARCH PROCESS

One aim of our work was to build a tool that synthesised several leading search engines. The tool presents the top results of search queries to ensure that the best possible results are retrieved and presented to the user and are not skewed towards any one particular search engine profile. The first stage was therefore to build the combination search engine and test the results on a user group with Autism. The goal of this part of the research was to understand whether the synthesis of the search engines enhanced the search experience (in other words, were users happy with the combination search), or whether it introduced redundancies or oddities in the results.

In programmatic terms, our system implements a research-guided user model. The development of this user model consisted of iterations of the research-development-test-evaluate lifecycle. We conducted several research surveys to identify differences in user queries and analysed search behaviour patterns from people with, and without, Autism. Using these data we built a set of features into search to guide the user through formulating a complete search query. Development and evaluation included the implementation and testing of the features, and further implementation and revision of the software, resulting in an enhanced search process and improved precision measurements for the users' intended search query.

JELLIBEANS integrates a motion-controlled user interface (UI) using the LEAP motion controller (Figure 1). The interface is highly dynamic, as opposed to the static interfaces which are common in search engines, and can hold users' attention for longer periods of time. In the past, users with Autism have struggled to maintain their attention, which is required to sift through the large number of search results. In line with the attention difficulties individuals with Autism face, JELLIBEANS has been designed to reduce the amount of text on the search results page, presenting only three results per page.
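As a minimal illustration of this taxonomy, the three query types can be represented directly in code and paired with a guiding question. This is a sketch only; JELLIBEANS itself is written in Kotlin (see Section IV), and the prompt strings below are assumptions rather than the actual wording used by the tool.

// Illustrative sketch of the Do/Know/Go query taxonomy; prompt wording is assumed.
enum QueryType {
    DO("What do you want to do on the web?"),
    KNOW("What would you like to know about?"),
    GO("Which site or page do you want to go to?");

    private final String prompt;

    QueryType(String prompt) {
        this.prompt = prompt;
    }

    String prompt() {
        return prompt;
    }
}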


Figure 1: LEAP motion controlled user interface hardware

A. Parametric Search Design for Web Applications

Search engines, like Google, apply strong search reduction techniques to navigation of the web. For example, one common way this reduction pattern is implemented is by assuming the behaviour of the current user is similar to the behaviour of other users in similar situations [8]. This is often also seen in recommendation engines, e.g., Amazon. As we have noted so far, users with Autism behave in different ways from typical users when navigating search. Users with Autism do not use the same key phrases when looking for documents with several attributes, i.e., queries that would be best formulated using several iterations of search, or multiple search parameters. This leads to an ineffective search, one that requires users to sift through results which are in large part irrelevant, and a poor user experience.

Parametric search queries allow users to specify parameters for their search in an increasingly logical and structured way. As an example, consider the experience of searching for flights to a particular holiday destination. This requires a high cognitive "load", remembering and manipulating arrival, departure, destination, and so on. For typical users, parametric search is more structured and, in some circumstances, appears more natural than a free keyword search. We can apply this idea to web search for people with Autism, by asking them to enter criteria that can be applied to subgroups of search queries, therefore reducing the cognitive load. Parametric search can assist the user with capturing the search parameters that are useful for a query, but it does not ultimately reduce the number of search results returned; a large result set is still entirely possible. So, although this tackles one aim of our research, further refinement of the search results is also necessary.

B. Motion controllers

Individuals with Autism have poor attention [9]. The static two-dimensional interface of many current search engines


is unlikely to maintain adequate (sustained) interest levels. Individuals (and especially teenagers) with Autism spend a substantial amount of their time using computers, the web, portable devices, or game consoles [10], as they find these more stimulating and attention-grabbing. For these individuals, computer-based technologies provide a stable, consistent learning environment that can be customised [11]. Furthermore, motion recognition devices can be programmed to make consistent responses to environmental triggers. These controlled and interactive environments have shown promise for improving social communication skills and reducing repetitive behaviours [12].

IV. SYSTEM ARCHITECTURE AND IMPLEMENTATION

JELLIBEANS is a prototype tool that assists users with Autism in searching and navigating the web. As search can be compartmentalised into "Do", "Know", and "Go" type queries, this provides a natural way to tackle the issue of simplifying the search process for people with Autism. Users first identify the type of search they would like to carry out. Then JELLIBEANS re-phrases a question to guide the user through their informational demand. Each stage of the search process is better tailored towards providing a concrete and unambiguous experience for the user, reducing the amount of text on the page. When using JELLIBEANS, users are assisted through the process of formulating a search query, much like a parametric search engine form. Concretely phrased questions are aided by suggestions that appear laterally on the page. The user can add these suggestions into their search query without the need to type them in, by using natural hand movements tracked by the motion controller. The UI component makes the experience of searching the web more interactive and receptive to user input. The suggestions also serve as cues for the user, and simplify the search process for them. Each of these features was tested in one-on-one interviews and with a group of users with Autism. (Autism diagnostic status was determined using the Autism Quotient Score (AQS), which, strictly speaking, is not a diagnostic assessment tool [13]; however, the measure has a 79% sensitivity to Autism if the user scores above 32.)

The prototype was integrated into a web browser and within a motion-controlled environment. Users of the system first complete a screening tool for Autism and save their score (which is stored in a database), so that they can log in and out of JELLIBEANS without having to complete the Autism Quotient each time they wish to use the tool.

A client-server software architecture was used for JELLIBEANS. The presentation layer (PL) consists of the user interface and motion controller. The PL has access to the application layer (AL), which controls the functionality of the system by performing detailed processing. This includes management of JELLIBEANS within the web-application framework. The static aspects of the website are handled by the JBake [14] framework, together with several other presentation frameworks (to be discussed later). This includes a rewrite

engine, which modifies the URLs so that they are more informative/relevant for the user. This has the advantage that it also decouples the HTML files from the URL that is presented to the user. The system is implemented using the Kotlin programming language, a more concise derivative of Java which also has an enhanced feature set. The presentation, application, and data store layers were maintained as three distinct modules. The advantage of this type of design is that it allows each of the tiers to be modified, upgraded, or replaced independently of one another. For example, if JELLIBEANS were to be implemented with another motion controller (e.g., the Oculus Rift), it would only impact the presentation tier (Figure 2), although VR headsets can be problematic for people with ASD.
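As an illustration of the kind of rule such a rewrite engine applies, the sketch below maps a readable, user-facing URL onto an internal results page with query parameters. The rule, paths, and parameter names are invented for this example; they are not taken from the JBake configuration or the JELLIBEANS source.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative URL rewriting, e.g. /know/restaurants-in-london ->
// /results.html?type=know&q=restaurants-in-london (hypothetical paths).
class UrlRewriter {
    private static final Pattern RULE = Pattern.compile("^/(do|know|go|social)/([a-z0-9-]+)$");

    static String rewrite(String path) {
        Matcher m = RULE.matcher(path);
        if (m.matches()) {
            return "/results.html?type=" + m.group(1) + "&q=" + m.group(2);
        }
        return path; // leave unrecognised URLs untouched
    }
}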

Figure 2: JELLIBEANS architecture

JELLIBEANS implements a combination search; it synthesises the results from three of the largest and most popular engines: Google, Bing, and Yahoo [15]. JELLIBEANS implements a user model within its search process. The user model employs a parametric-like search that accurately quantifies the user's search query by decomposing it into smaller sub-queries, until the user has exhausted the information they can provide to complete their search. The system also offers suggestions to guide the user towards forming succinct, accurate, and complete search queries. The user model utilises a rule-based learning system to record the user's search history/process and to make appropriate suggestions. Once signed in, the user completes the Autism Quotient and receives feedback regarding their autism symptoms and a score, the Autism Quotient Score (AQS). (The Autism Quotient [13] is a psychological assessment and validated screening tool for autism spectrum disorders.) Users can save their responses for each question to a local file for their reference. As stated earlier, the web application is integrated with a motion-controlled UI, the LEAP controller (see Figure 3), giving the application a receptive and interactive user interface. JELLIBEANS prioritises and displays the three highest-precision search results for users.
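The sketch below shows one way such a combination could be realised: ranked results from the individual engines are interleaved rank by rank, duplicates are dropped, and only the top few are kept for display. The class name and the merging policy are assumptions made for illustration; they are not the actual Kotlin implementation.

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative combination search: interleave ranked result URLs from several
// engines, remove duplicates, and keep only the top 'limit' entries.
class CombinationSearch {
    static List<String> merge(List<List<String>> rankedResultsPerEngine, int limit) {
        Set<String> merged = new LinkedHashSet<>();
        int rank = 0;
        boolean addedAny = true;
        while (merged.size() < limit && addedAny) {
            addedAny = false;
            for (List<String> engineResults : rankedResultsPerEngine) {
                if (rank < engineResults.size()) {
                    merged.add(engineResults.get(rank)); // set semantics drops duplicates
                    addedAny = true;
                }
            }
            rank++;
        }
        List<String> top = new ArrayList<>(merged);
        return top.subList(0, Math.min(limit, top.size()));
    }
}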


Figure 3: User forming a JELLIBEANS search query with LEAP using both hands

The recall of the search engine is sacrificed in line with the demands of the user model, to have less textual information on the page [16].

A. Building a User Model of Autism

To identify the features of the user model to build, we ran a study to collect example search queries for a set of informational needs. Data was gathered from thirty-seven participants. The participants were asked to give examples of search queries they would use, for example, to identify the name of a song they had heard (given the lyrics), or the name of a breed of dog they had seen (given a picture of the dog). All participants were also asked to indicate which search engine they used most often and to complete the Autism Spectrum Quotient (discussed earlier). Participants were divided into two groups: low AQS (scores below 30, indicating a low number of Autism traits) and high AQS (scores of 30 and above, indicating a high number of Autism traits). There were thirty low scorers and seven high scorers.

B. Differences in Search Queries Between Users With and Without Autistic-like Traits

We now discuss a qualitative analysis of the search query strings from both low and high AQS scorers. In both groups, Google was the preferred search engine by far, with all participants reporting that they used Google as a first choice. No one in the current sample used Yahoo or Bing. The low AQS responses were analysed together to establish a baseline answer. This was generated using a frequency criterion of 40%, i.e., if 12 or more out of 30 respondents generated the same portion of a query string given an informational need, it was included in the model. If two responses were equally common, both were reported in the model. Data was discarded when a response indicated that the participant would do an image search, as this was not the aim of the survey. There is insufficient space to discuss the findings here but, suffice it to say, there was more variation in the responses gathered from the group with Autistic-like traits.

C. Transforming the User Query

Given the set of observations in the data reported above, the aims were to "transform" queries made by individuals in the

high AQS group to queries more similar to those of the low AQS group. The search engines already handle some of the observations from high AQS queries. For example, the use of pronouns ("I" and "You") is already taken care of with the use of stop words [16]. The aim of the prototype is therefore to address issues that result in the search query string being misleading and returning different results from low AQS search queries.

The rule engine is a concrete and operationalised framework for a theoretically-grounded, stereotyped user model of autism within search. JELLIBEANS works to address multiple observations made during the data collection stage, so the order in which the observations are tackled in the user model is not linear or consecutive by any means. As a general rule, changes take the form of additional questions presented to the user that aim to structure the individual's search query in a logical manner, so that key search terms are not dropped. Structured query formation helps the user produce less idiosyncratic search queries.

JELLIBEANS prompts the user to categorise their search query into one of three possible types of queries: a "Do", "Know", or "Go" query. We also integrate a fourth subtype of query, the "Social" query. "Social" queries, as the name suggests, are for when the search is about something social: a person, or group of people. Social queries have additional functionality beyond the other query types, to reflect the additional difficulties individuals with Autism have when formulating social search queries on the web. Each type of query is associated with a different colour. "Do" is blue, "Know" is yellow, "Go" is green, and "Social" is red. Once the user has selected the type of query, this colour will be prominent on the page throughout their search, to serve as a visual reminder of the task. For individuals with Autism, this technique works to reduce the working memory load as it serves as a goal-directed cue. Two things we know are difficult for people with Autism are maintenance of information in working memory, and goal-directed tasks involving a high demand on executive functioning. (Executive functions, also known as the cognitive control and supervisory attentional system, is an umbrella term for the management, regulation, and control of cognitive processes, including working memory, reasoning, task flexibility, and problem solving, as well as planning and execution [17].)

To address the observation that user queries were often submitted when incomplete, suggestions as well as drop-down menus have been implemented on each search page. Users can click on suggestions, and select from the menus, if they want the keyword to be added to their search. As individuals with Autism characteristically have an uneven cognitive profile, where verbal ability is usually lower than performance skill [1], the cues assist users who have particular word-finding difficulties. Once the user has requested that their search results be returned, the system collates all the information the user has entered into the search boxes, as well as any suggestions and options from drop-down menus that were selected.
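The collation step can be pictured as the sketch below: the non-empty parametric inputs and any selected suggestions are gathered and joined conjunctively, as described in the next paragraph. The class and method names are invented for this illustration and do not come from the actual (Kotlin) implementation.

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative query assembly: gather the parametric inputs and selected
// suggestions, drop empty fields, quote multi-word terms, and join with AND.
class QueryBuilder {
    static String build(List<String> fieldValues, List<String> selectedSuggestions) {
        return Stream.concat(fieldValues.stream(), selectedSuggestions.stream())
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(s -> s.contains(" ") ? "\"" + s + "\"" : s)
                .collect(Collectors.joining(" AND "));
    }
}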


To achieve good precision and specificity, we have implemented conjunctive boolean operators between each input of the parametric search. Not surprisingly this results in a drop in recall but, as only the top three results were to be presented, this was in fact advantageous. The results are then presented to the user, with the exact phrase that was searched for. This means that the user can clearly see their final search query; rather than being a "black-box" system, JELLIBEANS also aims to train the user, in the long term, in specific ways to improve their personal search query strings.

D. Integration with the Motion-Controlled Interface

JELLIBEANS is integrated with the LEAP motion controller hardware (Figure 4), which polls frames at a constant rate to keep the timing of movement accurate; this is important for the user experience and requires minimal calibration. It is non-invasive and provides a low sensory experience (sensory stimulation is not tolerated well by individuals with Autism). The cognitive demand needed to operate the LEAP is low and, given that web search is a highly cognitive task, this is of real benefit for the user. The LEAP recognises and tracks hands, fingers, finger-like tools, positions, motions, and gestures using infrared light and optical sensors along the Cartesian coordinate system. The controller has a 150-degree field of view, and operates in a range of 1 inch to 2 feet.
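The glue between the motion controller and the query form might look like the following sketch. The listener interface is purely hypothetical (it is not the LEAP SDK API); it only illustrates how a recognised gesture over an on-screen suggestion could add that keyword to the query being built.

// Hypothetical glue code, not the LEAP SDK: a recognised swipe over an
// on-screen suggestion adds that keyword to the query under construction.
interface GestureListener {
    void onSwipeOverSuggestion(String suggestion);
}

class SuggestionPicker implements GestureListener {
    private final StringBuilder query = new StringBuilder();

    @Override
    public void onSwipeOverSuggestion(String suggestion) {
        if (query.length() > 0) {
            query.append(" AND ");
        }
        query.append(suggestion);
    }

    String currentQuery() {
        return query.toString();
    }
}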

Figure 4: The LEAP controller, with 150-degree field of view [18]

Each of our senses operates with a different lag time. Hearing has the fastest sense-to-cognition/understanding and, perhaps surprisingly, sight the slowest. If the device interferes with the processing of a sense, it will confuse the combinatorial configuration of the senses, leading to misunderstandings of meaning and a worse user experience. The LEAP does not have the best cognitive lag time compared to other devices such as the Oculus Rift. However, for our system, the browser refresh rate would likely be the biggest bottleneck for performance, rather than the motion controller chosen.

V. TESTING AND CRITICAL EVALUATION

We considered the evaluation of the prototype under three main headings: 1) the evaluation of the combined search engine, 2) the user interface via the LEAP controller, and, most crucially, 3) user testing. Due to lack of available space we only consider point (3) in this paper, and we discuss only the results obtained from a focus group consisting of five high AQS scorers drawn from the original study group; additional studies have

been carried out on the whole group, including one-to-one interviews. The five participants in the focus group were asked to reconstruct the same search queries they had been asked to construct in the first research study, using the search engine they identified as their usual choice (all participants said they would usually use Google), and also using JELLIBEANS. Participants were then asked to comment on:

1) Whether they found a result they would be satisfied with using both search tools, JELLIBEANS only, Google only, or neither.
2) Which search tool gave them the most relevant results overall, and how many were relevant (focusing only on the top three links for comparison purposes, as JELLIBEANS only returns three results).

A categorical analysis was conducted to validate the search efficacy of JELLIBEANS. The graph (Figure 5) demonstrates that JELLIBEANS has a precision rate (30%) at least as good as that of the other search tools (25%), which indicates that JELLIBEANS has successfully reduced the number of results the participant has to sift through without reducing the precision of the search tool within those items (Table I).
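For reference, precision here is used in the standard information-retrieval sense [16], computed over the links actually compared (only the top three, since JELLIBEANS returns three results):

\mathrm{precision} = \frac{|\{\text{relevant results}\} \cap \{\text{retrieved results}\}|}{|\{\text{retrieved results}\}|}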

Figure 5: The Y axis represents the number of search queries correctly retrieved, and the X axis shows the search tool (JELLIBEANS only, Google only, both, or neither)

Table I: No loss in precision with JELLIBEANS

Participants were also asked to count the number of results that they felt were relevant to the search query. These data are presented in Figure 6. The results show that the number was comparable (albeit variable) between JELLIBEANS and the usual search tool for the participant. There was a significant difference in the number of words used to form search queries when comparing JELLIBEANS to other search tools. Significantly more words were used to form search queries using JELLIBEANS (mean number of words in JELLIBEANS = 6.55; mean number of words in other search tools = 3.53). A paired-samples t-test was run to confirm this observation (t = -7.79, df = 39, p < 0.001, see Table II). This


Figure 6: Number of queries considered relevant for users

is of interest because it indicates that participants used more words in their search query, which in turn narrows down the scope of their search. This is useful for increasing the precision scores of the JELLIBEANS search tool.

During the focus group one user with high-functioning Autism described how the direct questions, e.g., "What do you want to do on the web?", with suggestions such as "Shop", "Watch", "Listen", etc., were useful when he was "at a total blank". He disclosed that often his need for perfectionism and sameness could have him staring at a blank screen until he came up with the "perfect" query. However, the suggestions helped him to get started, and so his experience was that the queries were completed faster than usual (sometimes he never completed a search with the conventional approach). Two users, however, had reservations about the "missing" results from their search and commented that, for them, three results were too few and perhaps five or six results would have been a better choice. It must be said that these individuals were amongst the lowest AQS scorers, suggesting that the amount of text individuals with Autism find optimal is a function of the severity of Autism traits in that individual.

VI. CONCLUSIONS AND FUTURE WORK

Table II: Result of t-test comparing number of words used to form search queries

There was also more repetition in the words used to form search queries in JELLIBEANS. This is probably due to the nature of the search process using this tool: there are more search inputs to complete, and drop-down input boxes (to select "keywords") which are added to the search. To remove this confound from the analysis, keywords that were repeated were dropped from any calculations. This suggests that individuals are defining more accurate search queries, and using more keywords, in JELLIBEANS.

A. Observations

Four out of five participants agreed that one of the major positives of JELLIBEANS was that, unlike Google, users did not receive back pages of mostly irrelevant results. JELLIBEANS held their interest for longer because of the reduced number of results that users had to sift through (as shown by analysis of the video recordings of the search sessions). Testing also revealed that the parametric search style meant that they did not need to try and remember all the keywords to search for; rather, the cues helped them remember points they would have missed otherwise. A common comment (3 out of 5 users) concerned the time/reward payoff from completing a detailed search form. Users agreed that, although they would spend more time formulating their search query, they were likely to save time that would usually be spent sifting through search results. These participants indicated that they would use JELLIBEANS over other search engines if use on other queries continued to show such benefit. All users commented positively on the reduced number of search results presented. The amount of textual information that was returned on the results page is significantly reduced in JELLIBEANS.

In this paper we have presented the architecture and prototype implementation of JELLIBEANS, a web search tool for assisting users on the Autistic spectrum. Several unexpected challenges and successes occurred during the course of the work. A key point is that users observed that, with JELLIBEANS, the time/reward payoff over a traditional search engine is really only apparent when the search query is complex, for example, if users are searching for something they have yet to identify, i.e., something ambiguous. (To overcome this, we added a "Search in Google" option, so that if users know what they are searching for, they have the option to run the search outside of JELLIBEANS.) The users found the motion-controlled interface, and the parametric interface, easier to use than a conventional "blank page". They liked the "conversational" aspects of the interface and the guided and suggestive nature of the search process. Overall, the user experiences reflect those of the focus group reported in this paper: in general, the use of JELLIBEANS enhanced their search and sometimes made a search possible when they would simply have "given up". Additional work needs to be carried out to extend the interface and validate these initial findings, but certainly these are promising results that are worth investigating further.

There are many avenues to consider for the future of JELLIBEANS, for example, modifications to the aesthetic appeal, user needs, and functionality of the application. A future aesthetic consideration is to improve the visual design of the website itself. In the current version BOOTSTRAP was used, which helped considerably with visual aesthetics; however, this could be further refined along with considerations of functionality. A future functional direction for JELLIBEANS would be to use an API or library for word frequencies in written English, which could be integrated with a thesaurus. This type of API requires significant research into the natural language processing of written (rather than spoken) English text. These frequency data could be used to replace infrequent words in queries formed by individuals with Autism


with more frequent words, providing alternative suggestions to the user. Individuals with Autism have increased comorbidity with Attention Deficit Hyperactivity Disorder (ADHD) and colour blindness, and so these can be taken into account for future iterations of JELLIBEANS. For example, the colours used for JELLIBEANS are in the RGB colour scheme, but these could be replaced with gradients or patterns so that individuals with colour blindness can also benefit. One of the more interesting developments is the use of Augmented Reality [19], which we have started to incorporate into the next version of JELLIBEANS, again utilising the LEAP tool.

REFERENCES

[1] Centers for Disease Control and Prevention, "Prevalence of autism spectrum disorders – Autism and Developmental Disabilities Monitoring Network, 14 sites, United States, 2008", Morbidity and Mortality Weekly Report: Surveillance Summaries, vol. 61, no. 3, pp. 1–19, 2012.
[2] F. Garzotto, M. Valoriani, and L. Bartoli, "Touchless motion-based interaction for therapy of autistic children", Intelligent Systems Reference Library, vol. 68, pp. 471–494, 2014. DOI: 10.1007/978-3-642-54816-1_23.
[3] L. Mottron, J. A. Burack, J. E. Stauder, and P. Robaey, "Perceptual processing among high-functioning persons with autism", Journal of Child Psychology and Psychiatry and Allied Disciplines, vol. 40, no. 2, pp. 203–211, 1999. DOI: 10.1017/S0021963098003333.
[4] L. Waterhouse and C. Gillberg, "Why autism must be taken apart", Journal of Autism and Developmental Disorders, vol. 44, no. 7, pp. 1788–1792, 2014. DOI: 10.1007/s10803-013-2030-5.
[5] M. Elsabbagh, A. Volein, K. Holmboe, L. Tucker, G. Csibra, S. Baron-Cohen, P. Bolton, T. Charman, G. Baird, and M. H. Johnson, "Visual orienting in the early broader autism phenotype: Disengagement and facilitation", Journal of Child Psychology and Psychiatry and Allied Disciplines, vol. 50, no. 5, pp. 637–642, 2009. DOI: 10.1111/j.1469-7610.2008.02051.x.
[6] K. Plaisted, M. O'Riordan, and S. Baron-Cohen, "Enhanced discrimination of novel, highly similar stimuli by adults with autism during a perceptual learning task", Journal of Child Psychology and Psychiatry and Allied Disciplines, vol. 39, no. 5, pp. 765–775, 1998. DOI: 10.1017/S0021963098002601.
[7] J. Y. Kim, M. Cramer, J. Teevan, and D. Lagun, "Understanding how people interact with web search results that change in real-time using implicit feedback", in Proceedings of the 22nd ACM International Conference on Information & Knowledge Management (CIKM '13), 2013, pp. 2321–2326. DOI: 10.1145/2505515.2505663.
[8] D. J. Weitzner, "Google, profiling, and privacy", IEEE Internet Computing, vol. 11, no. 6, pp. 95–97, 2007. DOI: 10.1109/MIC.2007.129.
[9] B. Morgan, M. Maybery, and K. Durkin, "Weak central coherence, poor joint attention, and low verbal ability: Independent deficits in early autism", Developmental Psychology, vol. 39, no. 4, pp. 646–656, 2003. DOI: 10.1037/0012-1649.39.4.646.
[10] H. C. Shane and P. D. Albert, "Electronic screen media for persons with autism spectrum disorders: Results of a survey", Journal of Autism and Developmental Disorders, vol. 38, no. 8, pp. 1499–1508, 2008. DOI: 10.1007/s10803-007-0527-5.
[11] D. Moore, P. McGrath, and J. Thorpe, "Computer-aided learning for people with autism – a framework for research and development", Innovations in Education & Training International, vol. 37, no. 3, pp. 218–228, 2000. DOI: 10.1080/13558000050138452.
[12] M. O. Mazurek, P. T. Shattuck, M. Wagner, and B. P. Cooper, "Prevalence and correlates of screen-based media use among youths with autism spectrum disorders", Journal of Autism and Developmental Disorders, vol. 42, no. 8, pp. 1757–1767, 2012. DOI: 10.1007/s10803-011-1413-8.
[13] S. Baron-Cohen, S. Wheelwright, R. Skinner, J. Martin, and E. Clubley, "The Autism-Spectrum Quotient (AQ): Evidence from Asperger syndrome/high-functioning autism, males and females, scientists and mathematicians", Journal of Autism and Developmental Disorders, vol. 31, no. 1, pp. 5–17, 2001. DOI: 10.1023/A:1005653411471.
[14] JBake, a Java-based, open source, static site/blog generator for developers and designers.
[15] comScore, "comScore Releases February 2016 U.S. Desktop Search Engine Rankings", 2016.
[16] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology behind Search, 2011.
[17] Executive function.
[18] L. E. Potter, J. Araullo, and L. Carter, "The Leap Motion controller", in Proceedings of the 25th Australian Computer-Human Interaction Conference (OzCHI '13), 2013, pp. 175–178. DOI: 10.1145/2541016.2541072.
[19] M. Billinghurst, A. Clark, and G. Lee, "A survey of augmented reality", Foundations and Trends in Human–Computer Interaction, vol. 8, no. 2-3, pp. 73–272, 2015. DOI: 10.1561/1100000049.


Model Driven Development for Android Apps

Yoonsik Cheon (1) and Aditi Barua (2)
(1) Department of Computer Science, The University of Texas at El Paso, El Paso, TX, U.S.A.
(2) Business Administration, The University of Texas at El Paso, El Paso, TX, U.S.A.
{ycheon,abarua}@utep.edu

Abstract - In model driven development (MDD), an application is built as a series of transformations on models, including eventual code generation. Will the key ideas of MDD work for developing Android apps? In this paper, we perform an experiment developing an Android app by following the MDD approach. We use the Object Constraint Language (OCL) as the notation for creating precise models. Our findings are mixed in that there is a great opportunity for generating a significant amount of both platform-neutral and Android-specific code automatically, but there is a potential concern about the memory efficiency of the generated code. We also identified several shortcomings of OCL in writing precise and complete specifications for UML models.

Keywords: class invariant, model-driven development, pre and post-conditions, Android, OCL.

1 Introduction

Model driven development (MDD) relies on the use of models as the basis for software development and shifts the focus of development from writing code to building models. The key idea of MDD, especially the Model Driven Architecture (MDA) [5][16], is to construct applications as a series of transformations on models, including automatic code generation from the models. It is also suggested to construct models at several different abstraction levels, such as computation independent models (CIM), platform independent models (PIM), and platform specific models (PSM). The underlying assumption of MDD, however, is the existence of an appropriate model, a representation that is sufficiently general to capture the semantics of many different domains and yet precise enough to support eventual transformation into code. A formal notation like the Object Constraint Language (OCL) [20] of the Unified Modeling Language (UML) [17] can play an important role in building such a precise model. OCL is a textual, declarative notation to specify constraints or rules that apply to models expressed in UML diagrams.

Android is one of the most popular mobile platforms with its own operating system, libraries, and application programming interfaces (APIs). Despite the use of Java, the Android application domain is sufficiently narrow, with several interesting characteristics such as XML-based UI, event-based reactive apps, life cycles of apps, and the single active app. However, one key difference is that Android devices are resource-constrained in storage capacity and battery lifetime, and thus memory efficiency is an important quality factor for Android apps [15]. Android apps are also relatively small and not as complex as typical enterprise applications. Thus, it is natural to ask whether the ideas of MDD are applicable to the development of Android apps and whether the promised benefits of MDD, such as productivity, can be obtained even without using MDD-specific tools. There are no well-known commercial-quality MDD tools for Android.

Figure 1. Sudoku puzzle

In this paper, we perform an experiment to answer the above questions. Our experiment is a small case study of applying the key ideas of MDD to the development of an Android app. The app we develop is for playing Sudoku puzzles, one of the running examples in [14]. Sudoku is a logic-based, number placement puzzle for a single player. The objective of the game is to fill a 9×9 grid with numbers so that each column, row, and 3×3 sub-grid that composes the grid contains all of the numbers from one to nine. Thus, the same number cannot appear more than once in the same column, row, or sub-grid. Figure 1 shows a partially solved Sudoku puzzle; a gray square represents a number that is given and thus cannot be changed. A game starts with a partially filled grid that ideally has a single solution.

Our case study consists of three main steps: (a) creating a precise specification model, (b) creating a design model, and (c) generating Android Java code from the design model. For both specification and design models, we use UML and OCL to create static models (class diagrams) as well as dynamic models (state machines). We generate functioning code manually but systematically from the UML models and accompanying OCL constraints.


The rest of this paper is structured to reflect the case study steps. In Section 2, we present related work on MDD and Android application development. In Section 3 we create a specification model of the Sudoku app that describes precisely what the app has to do. In Section 4, we transform the specification model to a design model by incorporating design decisions; we extend existing classes and add new PIM/PSM classes. In Section 5, we generate functioning Android Java code manually from our design models. In Section 6, we share our findings and lessons learned from the case study, and conclude this paper.

2 Related Work

Several researchers have applied MDD approaches to the development of mobile applications to reduce technical complexity of the development as well as the development costs. A survey and evaluation of different techniques and methods are found in [18]. Some proposed their own graphical notations to allow developers to use pre-defined blocks or models in order to customize the UI design as well as app functionalities [1]. Others used the standard modeling notation such as UML [2] [4] [19]. Unlike previous work on the use of MDD for mobile apps, the purpose of our study is not to propose new languages, techniques, methods, or toolsets. Our main objective is to study a practical application of the key components of MDD -- precise models and code generation -- in developing Android apps using the standard modeling notation UML/OCL. Another objective is to study the suitability of OCL in creating precise models that can be used as the basis of MDD.


Figure 2. Class diagram

3 System Specification Model

In this section, we create a specification model for the Sudoku app. A specification model describes precisely what the app has to do, and our specification model consists of two UML diagrams, a class diagram and a state machine diagram. The class diagram models the entities and the static structure of our app, while the state machine diagram specifies the dynamic behavior of the app by constraining the allowed sequences of operation calls. Figure 2 shows the static model. A Sudoku game consists of a 9×9 grid with numbers, called a

board. A 3×3 sub-grid of a board is called a box, and each cell of the grid is called a square. The association between Board and Square is derived from the Board-Box and the Box-Square associations. A complete OCL specification of the model is found in [7]. Below we show a few representative specifications.

The init() operation of the Game class comes up with a partially filled grid of numbers. The partially filled board should have a solution -- preferably one -- and this is expressed by a newly-introduced operation isSolvable(). A board is solvable if there exists a solved version, a board with all the empty squares filled with non-conflicting numbers (see the specification of the Board class below). The where clause in the postcondition is our own extension to OCL to present a specification in a more structured fashion.

context Game::init()
post: isSolvable() and 0 < filled and filled < n
where n = board.size * board.size,
      filled = board.squares->select(hasNumber())->size()

context Game
def: isSolvable(): Boolean =
  Board.allInstances()->exists(size = self.board.size and isSolved() and
    squares->forAll(let s = self.board.at(x, y) in
      s.hasNumber() implies hasNumber() and number = s.number))

A board consists of a set of squares subdivided into boxes. The size attribute denotes the width and height of a board. The first invariant below constrains the size to be a square number, and the second invariant, together with the invariant of the Square class, asserts that each box is uniquely identified by its row and column indexes. The specification of the Board::isSolved() operation states that a board is solved if each of its squares has a consistent or non-conflicting number (see the specification of the Square class).

context Board
inv: size >= 9 and Sequence{1..size}->exists(i | i * i = size)
inv: boxes->isUnique(Tuple{col = x, row = y})

context Board::isSolved(): Boolean
body: squares->forAll(hasNumber() and isConsistent())

A square of a board is uniquely identified by a pair of column (x) and row (y) indexes. A square may have a number between 1 and Board::size, inclusive; an empty square is denoted by the number 0. All these are specified as class invariants [7]. Two interesting operations of the Square class are permittedNumbers() and isConsistent(), specified below.

context Square::permittedNumbers(): Set(Integer)
body: result = all - b - h - v
where all = Sequence{1..board.size}->asSet(),
      b = box.squares->excluding(self)->collect(number),
      h = board.row(y)->excluding(self)->collect(number),
      v = board.column(x)->excluding(self)->collect(number)


context Square::isConsistent(): Boolean
body: b and h and v
where b = box.squares->select(s | s <> self and hasNumber() and number = self.number)->size() = 0,
      h = board.row(y)->select(s | s <> self and hasNumber() and number = self.number)->size() = 0,
      v = board.column(x)->select(s | s <> self and hasNumber() and number = self.number)->size() = 0


The permittedNumbers() operation returns the set of all non-conflicting or allowed numbers for a square. A number is allowed in a square if it doesn't appear in any other square in the same box, column, or row. The isConsistent() operation determines if a square is consistent. A square is consistent if its number doesn't appear in any other square in the same box, column, or row. An empty square is consistent, as can be inferred from the specification.

Figure 3 shows a dynamic model of the game. It is a protocol state machine for the Game class and specifies the allowed sequences of operation calls.

Figure 3. Protocol state machine (states Filling and Solved; clear(x,y) ≡ board.at(x,y).setNumber(0); fill(x,y,n) ≡ board.at(x,y).setNumber(n); A ≡ given = board.filledSquares(); P ≡ not isSolved() and given->excludes(board.at(x,y)); Q ≡ not isSolved() and given->excludes(board.at(x,y)) and board.at(x,y).hasNumber())

An initial board configuration -- a partially filled grid of numbers that has a solution -- is created by the init() operation. Numbers are filled in or removed from the board until the board becomes solved. The guards P and Q on the fill(x,y,n) and clear(x,y) transitions prevent the fixed numbers given by the initial configuration from being replaced or removed. The composite transition init() allows one to start a new game in any state.

4 Design Model

In this section, we create a design model for our app that conforms to the specification model created in the previous section. Our design model consists of a UI design (see Figure 4), an architecture, and detailed designs of classes and algorithms. We use the model-view-controller (MVC) pattern, where the model (business logic) is separated from the view (UI) and the control. This is a good style for Android apps, as the UI of an app can be defined in XML. The design model is comprised of all classes from the specification model, with some extensions, as well as several new classes. Although we don't make a clear distinction between PIM and PSM, certain extensions to the specification classes and newly introduced classes are PIM while others are PSM in that they are Android-specific extensions or classes.

Figure 4 shows the layout of the UI, consisting of several buttons and a custom view. A custom view is used to display the current board state in 2-D graphics and to select a square. We can associate UI events such as button clicks and screen touches with game operations, e.g., Game::init() with the "New" button. With these bindings, the protocol state machine in the previous section can be turned into a state machine describing the allowed sequences of user interactions.

Figure 4. UI design

Figure 5. Design class diagram

The class diagram in Figure 5 shows part of our design model. Our design approach is to refine the specification model to a PSM that can be directly translated to source code. New classes and interfaces are introduced, and existing specification classes are extended with new attributes, associations, and operations. In Android, an activity is an app component responsible for a single UI screen. The MainActivity class is the main entry point to our app. The BoardView class is a custom view class to display a board,


and the BoardModel class is a subclass of Board to contain view-dependent board data, e.g., the current selection of a square to be displayed differently. The Observer design pattern [11] is used to notify changes in BoardView and Board to interested observers such as MainActivity. One of our platform-specific design decisions is to use the Singleton design pattern [11] for the Game class. This is to save and restore the game state when the activity is destroyed and later recreated. On Android, an activity may be paused, stopped, and destroyed at any time by the operating system; e.g., it is destroyed when the screen orientation changes. When the main activity is (re)created, the single instance of the Game class is retrieved to obtain its previous state. The OCL specification of the design model is found in [7]. Specifying Android framework classes such as activity and view classes is challenging because there is no formal model for the framework itself. Our approach is to write a loose specification by focusing only on the key state changes or interactions among objects. For example, the onDraw() framework operation of the BoardView class that displays the current state of the board is specified loosely as follows.

context BoardView::onDraw(c: Canvas)
post: self^drawLines(c) and board.squares->forAll(s | s.hasNumber() implies self^drawSquare(c, s))

The postcondition is written using the OCL message expression; the hasSent (^) operator specifies that a specified message should be sent during the execution of the operation. Drawing a 2-D representation of a board involves drawing the horizontal and vertical lines of the grid (drawLines) and drawing the numbers for the squares (drawSquare). The specification is loose in that it only constrains that appropriate drawing messages be sent during the execution; it doesn't say anything about its effect or state change.

For operations whose constraints cannot be translated directly to executable code, we design algorithms for them. One such operation is the Game::init() operation responsible for creating an initial board configuration. It creates a partially filled grid of numbers. We design one possible algorithm for the operation and specify it by drawing a behavioral state machine (see Figure 6). The algorithm consists of two steps: creating a solved grid of numbers and then repeatedly removing numbers that, upon removal, are likely to produce only a single solution. To determine the likelihood of one solution, a candidate number is removed from the board and then the board is solved again. If the solver fills the candidate square with the same number, it is more likely to have one solution; otherwise, the candidate is rejected and a new one is attempted. The key to our algorithm is a solver that finds a solution for an empty or partially filled board. We design a solver based on backtracking (see Figure 7). A backtracking algorithm constructs a solution by making a succession of choices. If there is no choice available, it retraces backwards through the choices made, undoing their effects, until an alternative choice is found. The complete specification of the solver, along with its operations and algorithms, is found in [7].

Figure 6. Algorithm for creating a new game

Figure 7. Design of a backtracking solver

Figure 8. GUI
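To make the backtracking idea concrete, the following simplified, self-contained Java sketch solves a plain 9×9 int grid (0 denotes an empty cell). It is written for this description only; the code generated from the models in this paper operates on the Board and Square classes of the design model instead.

// Simplified backtracking solver over a plain 9x9 grid (0 = empty). It shows
// the choose/undo/try-alternative cycle of Figure 7 in a self-contained form.
class SimpleSolver {
    static boolean solve(int[][] g) {
        for (int r = 0; r < 9; r++) {
            for (int c = 0; c < 9; c++) {
                if (g[r][c] == 0) {
                    for (int n = 1; n <= 9; n++) {
                        if (permitted(g, r, c, n)) {
                            g[r][c] = n;              // make a choice
                            if (solve(g)) return true;
                            g[r][c] = 0;              // undo and try an alternative
                        }
                    }
                    return false;                     // no number fits this square
                }
            }
        }
        return true;                                  // no empty square left: solved
    }

    private static boolean permitted(int[][] g, int r, int c, int n) {
        for (int i = 0; i < 9; i++) {
            if (g[r][i] == n || g[i][c] == n) return false;   // row and column
        }
        int br = (r / 3) * 3, bc = (c / 3) * 3;               // top-left of the 3x3 box
        for (int i = br; i < br + 3; i++) {
            for (int j = bc; j < bc + 3; j++) {
                if (g[i][j] == n) return false;               // box
            }
        }
        return true;
    }
}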

5 Implementation

In this section, we translate our design model to executable Android Java code, including both the UI and the functional core. The key to our approach is to generate code incrementally on an as-needed basis. We first generate skeletal code from the class diagram, considering only the structures such as classes and their attributes. We then translate operations to Java methods, considering one operation at a time. We focus on core operations that are explicitly specified in the model. The idea is to introduce additional structural elements (e.g.,


fields for associations) and behavior such as helper methods (e.g., getters and setters for fields and association ends) on an as-needed basis. This will let us generate minimal code with minimal effort. The UI of an Android app is defined declaratively in an XML file called a layout. We create our layout XML file (see Figure 8) using the Layout Editor of Android Studio, a visual editor for creating layouts by dragging widgets.

We next translate our detailed design -- consisting of a class diagram, OCL specifications, and behavioral state machines -- to working Java code. While the class diagram is used to generate the basic structural elements of an implementation, it is necessary to use OCL constraints and dynamic models like state machines to generate the detailed functional behavior of a system. Operations of classes are translated to methods based on either their postconditions or behavioral state machines. A behavioral state machine can be automatically translated to functional code if it is written in the style used in this paper. When an operation doesn't have an algorithm specified as a behavioral state machine, we use its postcondition to generate its functional code. Its precondition can also be used to generate a runtime check to determine whether the assumption of the operation is met or not [6]. When the postcondition is written in an explicit style in which the expected changes in the values of features are specified definitely, it can be used to derive code systematically [7]. The new features of the Java 8 language and APIs, such as lambda expressions and the Stream API, are of great help in systematically deriving concise code from constraints [12]. The OCL collection iterators such as forAll can be translated to the corresponding Java stream operations such as allMatch, and their argument expressions to Java lambda expressions. For example, shown below is one possible translation of the Square::isConsistent() operation specified in Section 3.

public boolean isConsistent() {
    return isConsistent(box.squares())
        && isConsistent(board.row(y))
        && isConsistent(board.column(x));
}

private boolean isConsistent(Collection<Square> sqs) {
    return sqs.stream().filter(s -> s != this
        && s.hasNumber() && s.number() == number()).count() == 0;
}

The select iterator of OCL is nicely translated to the corresponding stream method, filter, and its argument expression becomes a lambda expression in Java. The resulting code thus has a structure similar to that of the OCL constraint. It should be noted, however, that generating code for update operations is more involved than for query operations, as one needs to figure out the features whose values have to be changed [7]. It is often necessary to address platform-specific restrictions or constraints during the implementation.
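In the same style, and again only as a plausible hand translation rather than code taken verbatim from the implementation, the forAll iterator in the Board::isSolved() specification of Section 3 maps to the allMatch stream operation (assuming the squares association is realised as a Collection<Square> field):

// context Board::isSolved(): Boolean
//   body: squares->forAll(hasNumber() and isConsistent())
public boolean isSolved() {
    return squares.stream().allMatch(s -> s.hasNumber() && s.isConsistent());
}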


In our app implementation, we generated all nine classes from the design model, with a total of 30 fields and 74 methods (not counting constructors), and the Java source code consists of 1279 lines. Among the 30 fields, more than half (57%) are generated from associations. This is generally a good sign for an object-oriented program. It indicates that there exists a significant structure in the program. It may also mean that the functionalities of the program are well distributed among the constituent classes. One interesting result is that only 39% of the methods are from the operations explicitly specified in the model. The rest are from attributes, associations, and common subexpressions of operation specifications (helpers). The percentage will decrease further with getters and setters for such attributes as size, x, and y of the Box and Square classes that are currently translated to final fields. This means that one needs to focus on specifying and designing only the interesting core operations, which account for less than 40% of the operations; the rest of the operations -- secondary, uninteresting, or helper operations, which are not even specified as operations in the model -- can be generated automatically. In sum, a significant amount of code can be generated automatically.

6 Discussion and Conclusion

Our findings from the case study are mixed. There are of course obvious benefits of creating precise models. As shown in the previous section, a significant amount of code can be generated automatically from the models. Besides code that can be generated from constraints and behavioral state machines, operations can be derived on an as-needed basis from the static structures expressed in class diagrams, e.g., operations for accessing attributes and traversing associations. In our app, for example, more than 60% of the operations are these and other helper operations mechanically generated from the models. It is also possible to generate platform-specific code automatically from models, e.g., code to preserve the state of an app by marking its attributes using a custom stereotype [7]. Automatic code generation allows one to focus on specifying and designing only important and interesting operations; one does not have to consider or even express in the model uninteresting detailed design or implementation decisions such as the visibility of features and the navigability of associations. The process of creating a precise model provides an opportunity for refactoring the model. It lets one review and evaluate the current model and thus refine and improve it. This happens naturally as part of formulating and writing constraints. A precise model also facilitates evolution of an application. For example, we made several enhancements to our app along with options to enable and disable the new features [7]. Our models provided good guidance in developing the enhancements by allowing us to identify the required changes along with their impact in the models and the source code. The preciseness of the model helped us to come up with an extended design with minimal change in order to support the enhancements.


Figure 9. Pattern of garbage collection events

One potential problem with automatically generated code, however, is its performance, especially memory efficiency. Since Android devices are resource-constrained in storage capacity and battery lifetime, performance is always a concern in developing an app. It is even said that identifying an app's performance bottlenecks and addressing them is critical to the success of the app [15]. A performance problem hardly shows up in the model and is revealed only after the model is transformed to code and then executed. In fact, we encountered such a problem in one of our operations whose constraint was directly translated to Java 8 code. The generated method caused frequent garbage collections, suspending all threads several times and eventually making the app abort (see Figure 9). Since Java 8 stream operations cause significant memory overheads by creating hidden objects [9], we initially thought that the problem was caused by the use of the Java 8 Stream API. The cause of the problem, however, turned out to be a faulty specification. A more fundamental concern is that specifications are generally written with clarity in mind, not efficiency. Can the code generated from such specifications be efficient on Android? We learned that developing a good model is an iterative process and that manual coding can be a complementary technique to find and fix problems in a model.

Is OCL a good notation for constructing a precise model for MDD? OCL can be an effective notation for writing constraints involving a single state, e.g., invariants, values of derived attributes and associations, query operations, and preconditions of update operations. However, it lacks expressiveness for writing complete specifications of state changes as well as for being precise about the required interactions of operations [7]. What is missing is a built-in language construct for specifying the so-called frame axiom that essentially says "and nothing else changes" [3] [13]. Android apps are becoming highly interactive and more complex reactive systems; e.g., 43% of our source code lines are for Android framework-related UI and control classes. Although OCL message expressions can provide some help in specifying the required collaboration among objects, it would be challenging to specify the rich interactions possible on the Android platform abstractly and at the same time in sufficient detail to generate efficient code. We identified several possible extensions to OCL to make it more expressive as well as practical, including local-scoped definitions, a quoting mechanism to refer to other constraints, an informal description to escape from formality, specification-only features to write abstract constraints, and a lambda-like notation to constrain callback operations [7].

Is MDD a practical approach for developing Android apps? The key components of MDD such as precise models and code generation can certainly be incorporated into the development of Android apps. However, one needs to consider the effort as well as the skills needed to create precise models. Thus, it may not be such an attractive approach for developing typical Android apps like our Sudoku app. However, it may be possible to reap the benefits of MDD for certain types of apps, such as health-related apps (e.g., [8]), that require high assurance in meeting functional correctness or satisfying appropriate safety or regulatory requirements.

References
[1] F. Balagtas-Fernandez and H. Hussmann, Model-Driven Development of Mobile Applications, International Conference on Automated Software Engineering, pages 509-512, September 15-19, 2008.
[2] H. Eenouda, et al., Modeling and Code Generation of Android Applications Using Acceleo, International Journal of Software Engineering and Its Applications, 10(3):83-94, 2016.
[3] A. Borgida, et al., "... And nothing else changes": the frame problem in procedure specifications, 15th International Conference on Software Engineering, pages 303-314, Baltimore, MD, May 1993.
[4] G. Botturi, et al., Model-driven design for the development of multi-platform smartphone applications, Specification & Design Languages, Paris, France, September 24-26, 2013.
[5] A. Brown, Model driven architecture: principles and practice, Software and System Modeling, 3(4):314-327, December 2004.
[6] Y. Cheon, et al., Checking Design Constraints at Run-time Using OCL and AspectJ, International Journal of Software Engineering, 2(3):5-28, December 2009.
[7] Y. Cheon and A. Barua, Sudokup App: Model-Driven Development of Android Apps Using OCL, Technical Report 17-91, Department of Computer Science, University of Texas at El Paso, El Paso, TX, November 2017.
[8] Y. Cheon, R. Romero, and J. Garcia, HifoCap: An Android App for Wearable Health Devices, 8th International Conference on Applications of Digital Information and Web Technologies, volume 295 of Frontiers in Artificial Intelligence and Applications, pages 178-192, 2017.
[9] University of Texas at El Paso, El Paso, TX, September 2017.
[10] R. B. France, et al., Model-driven development using UML 2.0: promises and pitfalls, IEEE Computer, 39(2):59-66, February 2006.
[11] E. Gamma, et al., Design Patterns, Addison-Wesley, 1994.
[12] J. Gosling, et al., The Java Language Specification, Java SE 8 Edition, February 2015. Available from: https://docs.oracle.com/javase/specs/jls/se8/html/index.html.
[13] P. Kosiuczenko, Specification of invariability in OCL, Software & Systems Modeling, 12(2):415-434, 2013.
[14] K. Lano, Model-Driven Software Development with UML and Java, Course Technology, 2009.
[15] M. Linares-Vasquez, et al., How developers detect and fix performance bottlenecks in Android apps, IEEE International Conference on Software Maintenance and Evolution, pages 352-361, September 2015.
[16] T. O. Meservy and K. D. Fenstermacher, Transforming software development: an MDA road map, IEEE Computer, 38(9):52-58, September 2005.
[17] J. Rumbaugh, I. Jacobson, and G. Booch, The Unified Modeling Language Reference Manual, 2nd ed., Addison-Wesley, 2004.
[18] E. Umuhoza and M. Brambilla, Model Driven Development Approaches for Mobile Applications: A Survey, International Conference on Mobile Web and Information Systems, LNCS volume 9847, pages 93-107, 2016.
[19] S. Vaupel, et al., Model-Driven Development of Mobile Applications Allowing Role-Driven Variants, Model-Driven Engineering Languages and Systems, LNCS volume 8767, pages 1-17, 2014.
[20] J. Warmer and A. Kleppe, The Object Constraint Language: Getting Your Models Ready for MDA, 2nd ed., Addison-Wesley, 2003.


Development of Smart Mobility Application based on Microservice Architecture
Masatoshi Nagao1, Takahiro Ando1, Kenji Hisazumi2, Akira Fukuda1
1 Graduate School / Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
2 System LSI Research Center, Kyushu University, Fukuoka, Japan

Abstract— This study proposes the use of microservice architecture for smart mobility applications. For application development in the field of intelligent transport systems, it is necessary to consider a system that is simple, sustainable, and operable over the long term. Using the microservice architecture, evolutionary and flexible applications can be developed easily from services that already exist. This study introduces an overview of microservice architecture and discusses its usefulness for the development of smart mobility applications, based on the bus timetable recommendation application developed herein using this architecture.

Fig. 1: Difference between monolithic and microservice architecture

Keywords: Smart Mobility, Intelligent Transport Systems, Microservice Architecture, Application, Mixed Reality

1. Introduction
A smart mobility society is one that realizes smooth and comfortable mobility with intelligent transport systems (ITS). As ITS develops, the realization of such a society is increasingly expected. Applications and systems related to smart mobility, such as destination recommendation guides and driving safety support systems, are being actively developed. However, much of this research is based on the perspective of civil works and city planning, and research from the viewpoint of an information platform is limited [1]. For the development of smart mobility applications, confirming the appropriate architecture in advance, including its effect on the life cycle of the system, is essential. Some applications related to ITS use augmented reality (AR) technology as the user interface. Recently, a related technology called mixed reality (MR) has been advocated by Microsoft. MR uses information about objects in the real world and displays life-like virtual objects. Utilizing this characteristic, MR can be used as the user interface for ITS-related applications. This study is based on the idea that creating a new implementation of a function can be avoided by taking advantage of existing services that already provide the same function.

2. Related Technology
2.1 Microservice Architecture [2]
Microservice architecture is a system configuration that combines multiple small-scale services (microservices) to realize a new application. Each microservice provides a different function, and the connection between services is realized by a lightweight communication method. The difference between a monolithic architecture and a microservice architecture is shown in Fig. 1. Lewis and Fowler describe the following nine characteristics of the microservice architecture [2].
• Componentization via Services: Major functions are not provided as libraries but are decomposed into services that can be individually managed.
• Organized around Business Capabilities: Build developer teams with a wide range of skills that individually construct and operate services.
• Products not Projects: Increase value as a business by developing and operating continuously.
• Smart Endpoints and Dumb Pipes: Communicate between services using a simple and lightweight protocol.
• Decentralized Governance: Design using the technology that is appropriate for each service; there is no need to standardize on a single platform.


• Decentralized Data Management: Manage a database that is appropriate for each service instead of creating one integrated database.
• Infrastructure Automation: Realize cooperation between services and quality improvement of applications by automating application tests.
• Design for Failure: Since multiple services are connected in the microservice architecture, the influence of a service failure must always be considered.
• Evolutionary Design: Construct a design that can be updated per service to ensure appropriate generalization.

2.2 Mixed Reality
MR is a technology wherein virtual information and objects are projected onto the real world through a head-mounted display, and the information in the real and virtual worlds interacts. The concept of MR is shown in Fig. 2. There are two types of MR devices: holographic and immersive. With a holographic MR device, the real world is seen through a transmissive lens, and an AR-like experience is achieved by displaying objects on the lens. An immersive MR device recognizes the real world through a camera, and a VR-like experience is achieved by projecting an environment composed of virtual objects onto the field of view. In brief, MR includes the concepts of both AR and VR. Typical immersive devices include the Acer Windows Mixed Reality Headset AH101 [3] and the Samsung Odyssey [4]; typical holographic devices include the Microsoft HoloLens [5] and the Meta2 [6]. In this study, we use the Microsoft HoloLens, which can operate independently (without connecting to another personal computer).

Fig. 2: The concept of Mixed Reality

3. Development of Smart Mobility Application Using Microservice Architecture

Various kinds of data are used in the development of smart mobility applications. For example, a destination recommendation application requires map data, congestion information, weather information, transportation fee data, and so on. Moreover, because such data is frequently updated, the application must be flexibly adaptable to different situations in order to provide accurate data. Furthermore, developers need to consider application usability at all times. In summary, a smart mobility application needs to satisfy the following items.
• Easily deal with many types of data
• Improve usability
• Adapt to frequently updated data
In this study, we consider that using microservice architecture as a development method is most suitable to satisfy these items. To satisfy the first requirement, each kind of data can be divided into a service and managed independently in the microservice architecture. By assigning the work to the service side, developers do not need to manage the data themselves when dealing with many types of data. To satisfy the second requirement, the concept of Design for Failure (Section 2.1) of the microservice architecture can be applied. Applications can be developed such that even when a certain service fails, they remain available except for certain functions. In other words, the scenario wherein the entire application cannot be used owing to the failure of a certain service can be avoided. Therefore, using the microservice architecture improves usability. Finally, since each type of data is divided into services, it can be easily updated; the service manager only has to manage the data to ensure continuous operation of the service. In summary, the microservice architecture is suitable for smart mobility applications. Furthermore, using services that already exist, constructing a new service is easy.

4. Overview of the Bus Timetable Recommendation Application
In this study, we developed a smart mobility application using MR by applying microservice architecture. In this section, we describe the outline of the bus timetable recommendation application developed herein and explain the element technology used.

4.1 Application Behavior
Figure 3 shows the outline of the bus timetable recommendation application. This application uses two independent timetable services and performs the following operation.


Fig. 3: Outline of the bus timetable recommendation application

Table 1: Operation of REST API
Name      Detail
POST      Create data
GET       Get data
PUT       Update data
DELETE    Delete data

1) When HoloLens recognizes the prespecified AR marker, it sends a data request to the corresponding timetable services.
2) Based on the request, the timetable services convert their data into the JavaScript Object Notation (JSON) format and return it.
3) HoloLens displays the recommended time in the MR space based on the received JSON data.
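To make this flow concrete, the sketch below shows how a client could request a timetable and choose the recommended (next) departure by comparing it with the current time. The actual client runs in Unity/C# on HoloLens (Section 6.1); this JavaScript sketch, with an assumed endpoint URL and an assumed "departures" field, only illustrates the logic.

// Illustrative sketch only -- the endpoint path and data layout are assumptions.
async function recommendNextBus(serviceUrl) {
  const res = await fetch(serviceUrl);                  // e.g. "http://localhost:3000/timetable"
  if (!res.ok) throw new Error("service returned HTTP " + res.status);
  const timetable = await res.json();                   // data returned in the JSON format
  const now = new Date();
  return timetable.departures                           // keep only buses that have not passed
    .map(t => new Date(t))
    .filter(t => t > now)
    .sort((a, b) => a - b)[0];                          // the earliest remaining departure
}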

4.2 Element Technology
Vuforia [7]
Vuforia is a software development kit for mobile devices that enables the development of AR applications. It can register any image as an AR marker. The behavior when a marker is recognized is described in Unity (Section 6.1).

REST API
Representational state transfer (REST) application programming interface (API) is a type of web service design model. Its specifications are as follows.
• Individual resources are referred to by a uniform resource locator (URL) and are communicated with using the hypertext transfer protocol (HTTP) methods. The four operations of the REST API are shown in Table 1.
• The result is notified by the HTTP status code. The meaning of each code range is briefly given below.
1) (100 range) Information: The request is accepted and processing is continuing.
2) (200 range) Success: The request was accepted and processing is completed.
3) (300 range) Redirection: Additional processing is required to complete the request.
4) (400 range) Client Error: Processing could not be completed because there was an error in the request from the client.
5) (500 range) Server Error: The server could not complete processing the request.

Fig. 4: Outline of the application system

JSON
JSON is a type of data interchange format and is described as name-value pairs in the {key : value} format. This format is characterized by high readability and lightweight data. In the application developed in this study, we launch simple timetable services that return data in the JSON format. These services are realized using a tool called "JSON Server," which can easily launch a RESTful mock server.
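As a hypothetical illustration of such a mock service, JSON Server can serve a single db.json file whose top-level keys become REST resources. The timetable fields below are invented for this sketch and are not the data used by the actual services.

db.json (assumed contents):
{
  "timetable": [
    { "id": 1, "route": "Bus A", "destination": "Main Station", "departure": "08:15" },
    { "id": 2, "route": "Bus A", "destination": "Main Station", "departure": "08:45" }
  ]
}

$ json-server --watch db.json --port 3000

Assuming JSON Server's usual conventions, a GET request to /timetable then returns the array above in the JSON format, and faulty requests are answered with the HTTP status codes listed earlier.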

5. System of the Bus Timetable Recommendation Application
This section describes the design and specifications of the application outlined in the previous section.

5.1 Overall View of the System
Figure 4 outlines the entire application, including the main modules that make up the system. It expresses the name, attributes, and functions of each module and represents their relationships by lines. Each module is explained below.
Display panel: The "Display panel" and the "Service communication part of the bus A(B)" are connected. This module plays a role in displaying information on a panel that can be seen on HoloLens.



Service communication part of the bus A(B): This module connects to a service that provides a bus timetable and is responsible for acquiring and updating the specified timetable. Communication with the service is performed using the REST API, and if there is no problem in the response from the service, the timetable data is transmitted in the JSON format.
Bus A(B) timetable service: This is an independent small-scale service that provides bus timetables. It analyzes the received request and transmits the specified timetable data in the JSON format to the client if the request is correct. If there are faults in the request, it notifies the client of request errors via the HTTP status code.
Vuforia: On the external site, it registers the markers that are recognized with Unity and downloads the dataset for UnityEditor. In the library, it selects the marker included in the dataset and plays the role of making the "Display panel" appear when the HoloLens recognizes the marker.

Fig. 5: Outline of the application flow

5.2 Application Procedure
The state machine diagram representing the state transitions of the application is shown in Fig. 5. It shows the operation when the application is executed on HoloLens and the operation of the two independent timetable services. Each state is described below.

Behavior on HoloLens
Marker recognition standby: This state covers the operation from activation of the application to recognition of the marker in the view of HoloLens. When HoloLens recognizes the marker, the application shifts to the "Communicating" state.
Communicating: This state covers the operation until the timetables are acquired from the timetable services. A request is sent to each service, and the application remains in the "Waiting for reception" state until a response is received from the service. If timetable data acquisition from a service is successful, the recommended time is decided by comparing the timetable data with the current time; the application then exits the "Communicating" state. If data acquisition fails, a warning indicating that communication with the service has failed is displayed and the application exits this state. In this state, the communication with the two services is performed asynchronously. After each communication ends and the recommended data (or error) display is completed, the application shifts to the "Displaying" state.
Displaying: This state displays the recommended time. When the recommended-time data spans more than one page, it is displayed over multiple pages. In addition, the window displayed on HoloLens has a function to follow the line of sight, and in the "Displaying" state the eye-tracking process is always performed. This state has two events that shift to other states. One is the event wherein the erase button is pressed and the display window is deleted; the application then shifts to the "Marker recognition standby" state. The other is the event wherein the update button is pressed and the application shifts to the "Communicating" state to update the recommended time.

Bus A(B) timetable service
Waiting: When the server is started, the service shifts to the "Waiting" state. Upon receipt of a request from the client, it transitions to the "Analyzing request" state.
Analyzing request: In this state, the service analyzes requests from clients. If a request is in the correct format, the specified data is sent to the client; otherwise an error code is sent. Subsequently, the service returns to the "Waiting" state, waiting for the next request from a client.
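The asynchronous, failure-isolated communication with the two services described above can be sketched as follows. The actual client is implemented on HoloLens in Unity/C#; this JavaScript sketch, with assumed service URLs, only illustrates the pattern in which each request succeeds or fails independently.

async function loadTimetables() {
  const urls = [
    "http://localhost:3000/timetable",    // Bus A timetable service (assumed URL)
    "http://localhost:3001/timetable"     // Bus B timetable service (assumed URL)
  ];
  // Each request settles on its own, so a failure of one service never blocks the other.
  const results = await Promise.allSettled(urls.map(u =>
    fetch(u).then(r => {
      if (!r.ok) throw new Error("HTTP " + r.status);   // error code sent by the service
      return r.json();
    })
  ));
  return results.map((r, i) => r.status === "fulfilled"
    ? { service: i, timetable: r.value }            // recommend and display this timetable
    : { service: i, error: r.reason.message });     // show an error for this service only
}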


Fig. 6: Overall view of the user interface
Fig. 7: Bus recommendation time display on HoloLens

5.3 User Interface
The timetable displayed on HoloLens is shown in Fig. 6. Figure 7 shows a screenshot of the timetable in Fig. 6 actually displayed on HoloLens. The functions shown in this figure are explained below.
1) Current time: Displays the current time.
2) Destination: Displays the bus company name and destination.
3) Update button: Communicates again with the bus timetable server and updates the information.
4) Erase button: Erases the information on the display panel.
5) Bus status: Displays the current status of the bus, such as "Awaiting" or "Passed."
6) Bus recommended time: Recommends and displays the times of buses that have not yet passed, based on the current time.
7) Page manipulation: Displays the next (previous) page.
8) Page number: Displays the number of the page currently displayed.
9) Gaze tracking: The timetable board moves following the line of sight on the HoloLens.

6. Implementation and Evaluation
This section describes the development environment and procedures involved in developing the bus timetable recommendation application using the microservice architecture. Moreover, it evaluates whether applying the microservice architecture is suitable for the development of smart mobility applications.

6.1 Implementation Environment
When developing the bus timetable recommendation application, we create objects and describe behaviors in Unity. The development procedure of the application is as follows.
1) Create objects and describe behavior.
2) Convert to code for the universal windows platform (UWP) and debug.
3) Deploy the application to the Microsoft HoloLens.

Unity [8]
Unity is used to create and operate the objects projected by the Microsoft HoloLens. It is a game engine supporting multiple platforms and is developed by Unity Technologies. It is often used when creating games that handle 2D/3D models.

Visual Studio [9]
After creating the application and converting it to code for UWP in Unity, we used Visual Studio to deploy the application to the Microsoft HoloLens. Visual Studio is an integrated development environment provided by Microsoft for developing software.

6.2 Method and Result of the Evaluation
We qualitatively evaluated how well the nine characteristics of the microservice architecture described in Section 2.1 are satisfied by the application developed in this study. Based on this, we discuss how well the microservice architecture suits smart mobility applications. The evaluation results are shown in Table 2, and we explain the reason for the evaluation of each item below.

Componentization via Services

For implementing the bus timetable recommendation application, we use two independent services that provide bus timetables. The application acquires bus timetables from each service and displays the recommended time for each timetable by comparing it with the current time. The main function of the application can thus be implemented without using a library, so we evaluate this parameter as "being satisfied (✓)."


Table 2: Evaluation result of the developed application
Componentization via Services            ✓
Organized around Business Capabilities   –
Products not Projects                    –
Smart Endpoints and Dumb Pipes           ✓
Decentralized Governance                 –
Decentralized Data Management            ✓
Infrastructure Automation                –
Design for Failure                       ✓
Evolutionary Design                      ✓
(✓ = being satisfied, – = unconfirmed)

Fig. 8: Influence of one service not working on the application

Organized around Business Capabilities
In this study, we developed a prototype of the application. However, because it was developed individually, we did not organize a team to construct and operate the services. Therefore, we evaluate this parameter as "unconfirmed (–)." Note that because the functions of the application are decomposed into services, it is realistic to organize teams with various abilities for each service.

Products not Projects
For the same reason as in the previous item, we could not verify this item. Therefore, we evaluate it as "unconfirmed." However, we think that it is sufficiently possible to continuously circulate the life cycle of each service.

Smart Endpoints and Dumb Pipes
The communication between the services of the application is implemented using the REST API via HTTP methods, using only the four operations shown in Table 1 with resources specified by URL. Because extra functions are reduced as much as possible and only the necessary functions exist, we evaluate this parameter as "being satisfied."

Decentralized Governance
This application has two microservices that provide timetables. Because the application is a prototype of a smart mobility application using the microservice architecture, both services use JSON Server, a tool for launching a simple server; the application is therefore not implemented on multiple platforms. However, regardless of the internal implementation of a service, data communication can be done without problems as long as the interface is decided. Thus, although we think that it is sufficiently possible to satisfy this item, since we did not actually implement it, we evaluate it as "unconfirmed."

Decentralized Data Management
Because each timetable service holds its own timetable data, a database suitable for each service can be used for data management. Therefore, we evaluate this parameter as "being satisfied."

Infrastructure Automation
Because automated tests of the developed application are not implemented, we evaluate this parameter as "unconfirmed." For an implementation, we think that a simple data server should be launched for testing and communicated with repeatedly via various requests.

Design for Failure
This application has two microservices that provide timetables. The application is constructed such that even when one microservice is unavailable, the other service can still be used. Figure 8 shows the execution of the application on HoloLens: even if one service cannot be used, the application can display the information of the other service. In other words, a system that considers the influence of service failure has been constructed. For these reasons, we evaluate this parameter as "being satisfied."

Evolutionary Design
The functions used in the application are divided into service units. Because the services are connected only by simple communication using the REST API, each service can evolve independently. For this reason, we evaluate this parameter as "being satisfied."

General Review
Based on the above evaluation results, no item is concluded to be "not suitable." Therefore, we conclude that the microservice architecture is well suited to developing smart mobility applications. Furthermore, it is possible to fully satisfy the "unconfirmed" parameters depending on the implementation.

7. Related Work
Considerable research has been conducted on microservice architecture. Reference [10] discusses how to divide a system into microservices and refers to the communication patterns between services. Our application adopts the asynchronous communication recommended in that research as the communication method for an Internet of Things application.


Reference [11] describes the experience of moving a monolithic software configuration to a microservice architecture. Using this architecture can enable a system to expand flexibly; however, moving to it requires considerable labor. In conclusion, the decision to shift to a microservice architecture should not be made lightly. In reference [12], a smart mobility application that supports driving by projecting virtual objects into the real world is developed. An examination of driving comfort when using AR (or MR) shows overall good evaluation results and indicates that AR (or MR) is useful in the field of traffic engineering.


8. Conclusions
In this study, we proposed the use of microservice architecture for smart mobility applications. Using this architecture, the functions of an application can be divided into services, so that each function can be easily managed and updated according to the life cycle of each service. We developed the bus timetable recommendation application as a smart mobility application based on the microservice architecture. The results of examining whether the characteristics of the microservice architecture are satisfied by this application show that it is well suited to developing smart mobility applications.
As future work, we plan to develop a large-scale project. The application developed in this research was a small-scale prototype, and we could not examine its behavior under long-term operation. Therefore, a large-scale development is important to confirm its long-term operability. In addition, we will have to consider the communication of large-capacity data. In the current system, because communication between services is performed by a simple method, only communication with lightweight data is assumed. However, large-capacity data such as map data will be required when developing smart mobility applications. We will continue to examine whether this is the most suitable method for the development of smart mobility applications.
Acknowledgment. This work is partially supported by JSPS KAKENHI Grant Number JP15H05708.

References
[1] A. Fukuda, K. Hisazumi, S. Ishida, T. Mine, T. Nakanishi, H. Furusho, S. Tagashira, Y. Arakawa, K. Kaneko, and W. Kong, "Towards sustainable information infrastructure platform for smart mobility - project overview," in 2016 5th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI), July 2016, pp. 211-214.
[2] J. Lewis and M. Fowler, "Microservices," https://martinfowler.com/articles/microservices.html, March 2014.
[3] Acer, "Acer Windows Mixed Reality Headset AH101," https://www.acer.com/ns/ja/JP/smart/ah101/, 2017.
[4] SAMSUNG, "Samsung Odyssey," https://www.samsung.com/us/computing/hmd/windows-mixed-reality/xe800zaa-hc1us-xe800zaahc1us/, 2017.
[5] Microsoft, "Microsoft HoloLens," https://www.microsoft.com/ja-jp/hololens, 2016.
[6] Meta, "Meta2," http://www.metavision.com/, 2017.
[7] Vuforia, "Vuforia 7," https://www.vuforia.com/, 2017.
[8] U. Technologies, "Unity 2017.2," https://unity3d.com/jp, 2017.
[9] Microsoft, "Microsoft Visual Studio 2017," https://www.microsoft.com/ja-jp/dev/default.aspx, 2017.
[10] D. Namiot and M. Sneps-Sneppe, "On micro-services architecture," International Journal of Open Information Technologies, vol. 2, pp. 24-27, September 2014.
[11] A. Balalaie, A. Heydarnoori, and P. Jamshidi, "Migrating to cloud-native architectures using microservices: An experience report," in Advances in Service-Oriented and Cloud Computing, A. Celesti and P. Leitner, Eds. Cham: Springer International Publishing, 2016, pp. 201-215.
[12] G. Moussa, E. Radwan, and K. Hussain, "Augmented reality vehicle system: Left-turn maneuver study," Transportation Research Part C: Emerging Technologies, vol. 21, no. 1, pp. 1-16, 2012.


User Interaction Metrics for Hybrid Mobile Applications
Thomas G. Hastings

Kristen R. Walcott

College of Engineering and Applied Science University of Colorado at Colorado Springs Colorado Springs, Colorado 80918 Email: [email protected]

College of Engineering and Applied Science University of Colorado at Colorado Springs Colorado Springs, Colorado 80918 Email: [email protected]

Abstract—Understanding user behavior and interactions in mobile applications is critical for developers to understand where to spend limited resources when adding, updating, and testing features, but current tools do not do a good job of providing actionable insights. User behavior insights can provide value to the developer when it is time to code and implement new features. Google Analytics and New Relic provide user insights, but they fall short when it comes to identifying user interactions and behaviors as they pertain to individual features of mobile applications. We have developed a framework with middleware that provides user interaction insights to hybrid mobile applications using time-series analysis, along with an empirical study to showcase the value of the framework.
Index Terms—Mobile Application Testing, Software Engineering, Time Series Analysis, Metrics

I. Introduction
Mobile software engineers have a finite amount of time and resources while working through the software development life cycle. How does an engineer decide where she should budget more time and resources to get the most return for her investment? Instrumentation within a hybrid mobile application can provide metrics based on user interaction. These metrics can be turned into actionable insights for the developer to better understand how users use the application. Based on these insights and this understanding, the developer can then focus on the popular aspects of the application and increase her return on investment.
Hybrid mobile applications are applications written in web languages such as JavaScript, HTML 5, and CSS, unlike native applications, which are written in a programming language specific to the mobile platform. Hybrid mobile applications are written once and distributed across multiple platforms. We talk more about hybrid mobile applications in the background section of the paper.
According to a survey by JavaWorld, software engineers spend more than half of their time on tasks unrelated to software development [8]. The engineers are spending time on necessary tasks related to administration, waiting for code to build, and waiting for tests to execute. This is an interesting statistic and one that highlights how necessary effective resource management is. The survey also says that software engineers spend about 10% of their time waiting for tests to complete.
A developer who has insight into and understands her users will be able to target key functionality. This targeted approach will allow the developer to maximize her time on the areas of the application that are used the most. In addition, this approach

will also enable the developer to write meaningful tests in the areas of code that are used the most. There really is not a good way in hybrid mobile application development to collect user interaction data at the function level. There are plenty of tools that are decent at collecting hardware statistics from mobile devices; we discuss a few of these tools in the related work section of the paper. There are also a couple of tools that offer some insight into user interaction, but none that offer the fidelity of information that our framework provides. These tools are also discussed in the related work section. It is hard to tell why there are no good ways of collecting hybrid mobile application metrics.
First, we needed to see if developers cared about user metrics and identifying user behaviors. We also wanted to know if developers would like to be able to collect information on function usage and frequency. In order to solve this issue, we conducted a developer survey. Based on this survey and the amount of interest we gathered from developers, we decided it would be worthwhile to find a novel way to collect hybrid mobile application metrics. In order to evaluate our research we came up with three criteria to evaluate against:
1) Do developers want metrics that can produce insights into user behavior and feature usage?
2) Does the application we created identify user behaviors?
3) Does the application identify the frequency of functions that are used?
Through this research we have contributed:
• An open-source software project that allows researchers to stand up a web service where developers can push data, using a RESTful web service, to persistent storage utilizing relational databases or time-series databases.
• A novel Node framework that collects user interaction metrics and pushes the data to a web service for insight analysis.
• A developer study where we surveyed a number of developers to gauge the usefulness of collecting metrics using time-series analysis.
• A discussion of insights that could be gathered using our tool and approach.
II. Background
Our research and framework are built around two emerging platforms. Throughout this paper we discuss mobile hybrid applications and time series analysis.


A. Mobile Hybrid Applications
Hybrid mobile applications have taken the market by storm. A few of the benefits of hybrid mobile applications are portability, cheaper origination costs, and faster speed to market. Companies such as Facebook, Adobe, and Apache have released frameworks and platforms to foster the development of hybrid mobile applications. Companies such as Tesla, Facebook, Walmart, Instagram, Skype, and Airbnb, to name a few, are using JavaScript-based applications [4].
B. Time Series Analysis
Time series analysis is not a new idea in the world of analytics. Analysts on Wall Street have been using time-series analysis for decades to measure stock values. Time series analysis has begun to become more prominent in the world of software engineering because more and more tools are taking advantage of it. There are now databases created just to manage time series data. James Blackburn explains time series analysis well: "time series analysis is the collection of data at specific intervals over a period of time, with the purpose of identifying trends, cycles, and variances to aid in the forecasting of a future event. Data is any observed outcome that is measurable. Unlike in statistical sampling, in time series analysis, data must be measured over time at consistent intervals to identify patterns that form trends, cycles, and seasonal variances. Measurements at random intervals lose the ability to predict future events" [2]. We use time series analysis in our research to identify user behavior trends as we collect user interaction metrics. The time series allows us to identify usage trends of functionality within the mobile application.
III. Needs Assessment
We could see the usefulness of collecting metrics inside of hybrid mobile applications. We could also see the benefits of understanding user behavior inside of an application. What we did not know was whether other developers could see the value in such things. We realized that we needed to gather more information and get input from other sources. We put together 20 survey questions and asked developers at a local company what they thought. Based on the survey responses, which we discuss further in the developer survey section, we realized that we were not alone: other developers understood the benefits of collecting such metrics. We then set out to create an application which would help developers collect metrics and identify user trends and behaviors, which we describe in the application section.
IV. The Application
Based on the amount of interest we received from our developer survey, we set out to create an application which would allow us to collect metrics which provide insights into user behavior and function usage. Here is a high-level overview of how the system works. There are three core components to the tool that we developed. The first component is the Node middleware module.

Fig. 1. Architecture Example

The second component is the web service, which provides an API for the middleware to send data to. The last component is the metric analytics and visualization. This component offers developers a graphical user interface to interact with the data that the middleware sent to the web service. We used an open-source tool called Grafana to provide the time-series analysis dashboard for the UI. Figure 1 shows an example of the architecture. The mobile device running the Node middleware module sends metrics to the API. The API then takes the information, parses it, and sends it to the database. The developer then connects to the Grafana front-end and is able to pull the metrics from the database. Below are the steps that a typical developer would take to begin tracking user interactions within his mobile application (a configuration sketch follows the list):
1) The developer must set up the Grafana frontend and the MySQL backend. The developer will create a table in the database called metrics.
2) The developer then must configure the web service which provides the RESTful API. He will need to provide database connection information to the web service.
3) The developer will then include the Node framework module in his mobile application. The module will need to be configured to point to the appropriate endpoint for the RESTful API. The developer will also need to set the increment at which the module pushes metrics to the web service.
4) The developer then places the collection function inside of the application's functions for which he wants to collect user interaction data.
5) The Node module function sends an API call when a user interacts with a function which is being tracked.
6) The developer logs in to the web service, selects his application, and can query the database to see which functions are used the most.
7) Lastly, the developer takes the data back to development to target testing efforts on popular features and functions.
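The sketch below illustrates steps 3) and 4). The collect call matches the middleware usage shown later in Listing 1; the configure call and its option names are assumptions made for illustration, since the module's configuration API is not spelled out in this paper.

import collect, { configure } from 'hybrid-metrics';    // configure() is hypothetical

configure({
  endpoint: 'https://metrics.example.com/api/metrics',  // step 3: RESTful API endpoint (assumed URL)
  pushIntervalMinutes: 5                                 // step 3: how often metrics are pushed
});

function viewTimeline(userId) {
  collect('view_timeline');    // step 4: record each user interaction with this function
  // ... build and return the timeline view for userId ...
}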



Fig. 2. Use case diagram of a user viewing his own timeline.

A. Middleware
The middleware is written in Node.js. A developer is able to include it in their project using the Node Package Manager. Once the middleware is installed, the developer can include it anywhere in his project. The middleware needs minimal setup other than the API endpoint to which it pushes metrics.

import React from 'react';
import Display from './Display';
import collect from 'hybrid-metrics';
import './App.css';

class App extends React.Component {
  view_timeline() {
    // record the user interaction, then render the timeline
    collect('view_timeline');
    const timeline = Display.Timeline(user_id);
    return { timeline };
  }
}

export default App;

Listing 1. Hybrid-Metrics.js Example

Figure 2 shows a use case diagram of a user using an application and changing the background image. The new image information and the save-new-background-image action are sent to the web service. Listing 1 shows an example of what the middleware looks like when implemented in an application. In this example the middleware is used to collect user interaction information when the user views his own timeline. The framework allows the developer to identify a class, method, or function. The framework then collects the information and stores it in an object. The middleware runs on a timed cycle, and at the end of the cycle period the metrics are transmitted to the API endpoint. In addition to the name of the function that was hit, the number of times the function was used is also collected.
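A minimal sketch of this collection cycle, under the assumption of a plain in-memory counter, is shown below: collect() increments a per-function counter, and a timer flushes the counters to the API endpoint at a fixed interval. The endpoint URL and payload field names are illustrative; the five-minute interval follows the case study.

const counts = {};                                    // function name -> hits since the last push

function collect(name) {
  counts[name] = (counts[name] || 0) + 1;
}

setInterval(async () => {
  for (const [name, count] of Object.entries(counts)) {
    await fetch('https://metrics.example.com/api/metrics', {   // assumed endpoint
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ name, count })           // the two fields required by the web service
    });
    delete counts[name];                              // reset the counter after pushing
  }
}, 5 * 60 * 1000);                                    // push every five minutes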

B. Web Service
The web service is a Node.js application built using express.js. The web service provides an API that the middleware Node.js module communicates with. We chose Node.js because it provides a lightweight web server with the required functionality. In addition, it is written in JavaScript, which made it easier to transition between development of the web service and of the Node module.
1) Application Programming Interface: The API which the web service provides utilizes RESTful endpoints. When the middleware makes a POST call to the web service, the web service parses the request. There are a couple of required fields for the request to be valid. The first required field is the name of the function that was called. The second is the number of times the function was called since the last time metrics were pushed to the API.
2) Database Backend: To handle the developer and application management aspects of the project we chose the MySQL relational database. The MySQL database stores all of the information about developers and applications. It was an easy choice to pick a relational database for this purpose because the data is relational. We picked MySQL because it is well tested and we were familiar with it. Additionally, we also use MySQL as the database backend for metric collection. MySQL provides timestamps for each database entry. This timestamp allows us to compare values over a set period of time. Other databases provide this functionality as well, but because we were already using MySQL for application management we decided to use it for this purpose too.
C. Metric Analytics and Visualization
The developer is able to view all collected metrics through a graphical user interface and can query the database for different analytics. We showcase the metric analytics and visualization during the case study. Grafana provides graphs and time-series analysis UI elements such as bar charts, line charts, dots, and heat maps. Grafana helps make the information that comes from the middleware more understandable and actionable for the developer. Grafana is also open source and integrates with a number of different data providers and databases, which makes it well suited for future work.
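A rough sketch of the endpoint described in this section (not the authors' actual implementation) is shown below: an express.js route validates the two required fields and inserts a row into the metrics table, and MySQL's per-row timestamp is what Grafana later charts with an ordinary SQL query. The route path, table and column names, and the mysql2 client library are assumptions.

const express = require('express');
const mysql = require('mysql2/promise');              // assumed MySQL client library

const app = express();
app.use(express.json());
const db = mysql.createPool({ host: 'localhost', user: 'metrics', database: 'metrics' });

app.post('/api/metrics', async (req, res) => {
  const { name, count } = req.body;                    // fields pushed by the middleware
  if (!name || count === undefined) {
    return res.status(400).send('name and count are required');
  }
  await db.query('INSERT INTO metrics (name, count) VALUES (?, ?)', [name, count]);
  res.sendStatus(200);
});

app.listen(8080);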


Fig. 3. Metrics Shows New Feature Adaptation

V. Developer Survey
We surveyed 34 developers at local software engineering companies. The majority of the software engineers surveyed work on Department of Defense contracts. We asked 20 questions which focused on how the developers felt about collecting user metrics, how important they felt understanding user behavior was, and whether they had concerns about privacy while collecting metrics. Overall, the findings are conclusive. Developers care about collecting metrics, they would prioritize maintenance projects based on function usage, and they aren't overly concerned with privacy issues while collecting the metrics. The questions are included in the first appendix.
A. Survey Bias
The survey pool may have biased views towards the survey and the questions. The developers surveyed all have backgrounds as Department of Defense contractors. Most of the developers who responded work at one company.
B. Findings
Out of the 34 developers surveyed we found the following:
• 61.8 percent of developers think user metric collection is important
• 76.5 percent believe it is important to know how often features of an application are used
• 88 percent of developers want to maximize time when maintaining existing applications
• 83.2 percent want to know how users use an application
• 79.4 percent would prioritize work based on user behavior
• 85.3 percent of developers are not concerned about user privacy while collecting usage metrics
This survey answers our first question: 83.2 percent of developers surveyed would find user behavior metrics useful, and 79.4 percent would prioritize work based on the user behavior metrics.
VI. Case Study
In order to showcase the benefits of the framework and middleware, we conducted a simulated case study using a custom Python script which sent simulated metrics at set intervals to the API service.
A. Study Operations

from random import randint

user_features = []                   # weighted pool of features to draw from
user_feature_weights = {
    "add-image": 1,                  # creator functions are weighted low
    "add-post": 1,
    "view-timeline": 24,             # consumer functions are weighted high
    "view-friends-timeline": 24,
}
for feature, weight in user_feature_weights.items():
    for x in xrange(randint(0, weight)):
        user_features.append(feature)

Listing 2. sim.py Example

The study used a simple mobile application which was modeled after a social media application with four core functions: view-timeline, view-friends-timeline, add-post, and add-image. After collecting metrics for a period of time, we introduced a new creator function called add-video. Based on the data gathered from the empirical study, we are able to conclude that the framework and middleware provide insights that help developers maximize their resources and monitor user adaptation for new features.
The features were broken out into two categories, creator and consumer. Creators make up a small percentage, around 3 percent, of users compared to consumers. Based on this information, our simulation weighted the consumer functions over the creator functions with a ratio of 1 to 25. We made 200 simulated users and loop over the users every second. These users are then randomly assigned a function out of the user_features array, which we populated based on the weights of each function. After they are assigned a random feature, they are then assigned a random number of usages ranging from 0 to 50. We provide a usage number because the middleware collects metrics and usage numbers for five minutes before pushing the metrics to the API. As the simulation loops, the time to post to the API is randomized as well; we randomize the time to post to better model user behavior, since we did not want all of the API requests to come in at precisely one-second intervals.
B. Results and Analysis
We were able to collect metrics from the case study and identify user trends. We were able to see user behavior metrics in the form of function usage. The consumer functions, view-timeline and view-friends-timeline, were the most popular features of our simulated application. The creator functions were used, but not as heavily. Figure 3 shows an example of the metrics UI where we highlight the point at which we add a new feature to the application. The new feature, add-video, had high user adoption, and we can see that from the metrics collected. This answers the second evaluation question: we are able to identify user behavior from the application. Figure 4 shows an example of the Grafana UI where we are able to tell how often a function is used over a period of time. Each dot on the figure represents a user's actions over a period


Fig. 4. Frequency of Usage

of time. Looking at this example, we can see that view-friends-timeline and view-timeline are used more frequently than add-image. This makes sense because consumer functions are used more often than creator functions. This figure answers the third evaluation question: we are able to identify how frequently functions are used. The case study helped us answer two of the evaluation questions. It showed us that the application is able to identify user behaviors. It also showed that we are able to identify the frequency of function usage.
VII. Discussion
The application provided some interesting insights. The first insight was that we were able to identify user adaptation of a new feature. Figure 3 shows a very clear delineation of when a new feature came online. We can also see that it was a popular feature based on the amount of usage it received. This information could be very valuable to a developer to understand the scope of feature adaptation. We were also able to identify the frequency of function usage across a period of time. This information could potentially help a developer know where to focus testing and development, forecast resources, and identify possible bugs.
The responses from the survey were interesting. We have attached the survey results for all of the questions to this paper. The most surprising response for us was that developers are not concerned about user privacy while collecting metrics. In today's age of information sensitivity and privacy concerns, we would have thought more developers would be concerned. We were happy that the survey data reinforced what we believed: developers want to know how their applications are being used, and they would prioritize work based on such metrics.
VIII. Threats to Validity
The first threat to the validity of the case study is that it was a simulation. We made a number of assumptions throughout the simulation as well. The first assumption we made was that creator functions would make up about 3 percent of the functions called. This assumption was made based on research that was taken from social media web site statistics. We

could not find good statistical information regarding mobile application usage and the difference between creators and consumers. The second assumption we made was that mobile devices would always have internet connectivity and that content would be uploaded every minute. We made this assumption to keep our simulation simple. This assumption did not impact any of the functionality of the middleware, but it may have skewed the way the metrics present in the time-series analysis graphs. The third assumption we made was that the add-video function would be popular. We made this assumption for the sake of the simulation, to show what a popular feature would look like had one been added to an existing application that was already monitored using our middleware.
IX. Related Work
Wu et al. [5] wrote about a testing service for Android using crowdsourcing. They used a web client to get mobile applications in front of users. The primary focus of the crowdsourced testing was user interaction with the UI, using metrics from clicks, long clicks, typing, scrolling, and pinching. Wu et al. even have a replay feature to see how the user interacted with the UI.
Ferreira et al. [3] have developed a framework titled AWARE. Their research focuses on mobile context for capturing, inferring, and generating context on mobile devices. They accomplish this by collecting details of sensor data and uploading the data to a cloud. Their research is similar to ours in that they have a client-server framework and are collecting user metrics and pushing the data to a server. Their research is focused on understanding what a user is doing based on sensor data and then using that information to execute some intent in the application.
There are also tools that are excellent at collecting hardware metrics from mobile applications. One such tool is Application Metrics from IBM, which provides insights into the operating environment, CPU utilization, memory usage, and database queries [6].
New Relic is an application performance monitoring and management tool. New Relic provides a suite of tools to help developers monitor and improve their applications. The suite offers tools that measure response times, throughput, and error rates. In addition, New Relic can be used to monitor thread profiles, transaction metrics, and key business transactions. New Relic is primarily used to monitor web applications, but it can be used to monitor applications built using PhoneGap, which utilize HTML 5 web views for graphical user interfaces. The research presented in this paper and the user interaction metrics collected from the framework are different from New Relic because the metrics collected from the mobile applications are at the code level. We can track and understand user interactions at the function level. New Relic sits on top of an existing application, but it does not tie in to the code the way our framework does. Since our framework integrates at such


a low level we are able to collect metrics that are specific to user interactions [1]. Google Analytics is another application that provides insights into performance and application management. Google Analytics is a powerful analytics platform that developers use to gather metrics such as the number of users using an application, what actions the users are taking, to measure in-app payments, and to visualize user navigation paths [7]. Google Analytics, like New Relic, was first developed for use with web sites and web applications. Google now offers Analytics for use with mobile applications. Google Analytics is similar to our framework in that you can glean information about user activity from the platform. Our framework is different from Google Analytics because we are able to gather the data that is passed from one function to another. This allows us to look directly at the user's interaction with the application to better understand how the application is utilized.
X. CONCLUSION
It is important for developers to understand how their applications are being used so that they can prioritize development, testing, and maintenance of new features. We have presented an approach to collect user interaction metrics from hybrid mobile applications which gives insights that will help developers understand user interactions. Through this research we have contributed an open-source middleware, a RESTful web API service, a developer survey, and a discussion of our approach. Overall, we were able to answer all of our evaluation questions. We were also able to, through the survey, show that such a tool which provides these metrics would be useful to developers.
XI. FUTURE WORK
The framework along with the time series analysis can be used to measure and identify usage trends in any web-based application. Some future work examples that we would like to explore would be identifying bugs based on user interactions. We could do this several ways and possibly build off of existing research which leverages time series analysis, such as Coker et al.'s [9] research, where they developed a method using time series analysis to prioritize bugs. Although our research does not touch on prioritization or on identifying bugs, our framework could be used to identify usage trends that may highlight bugs within the application. Another area would be to take measurements of the impact of the middleware on real-world software examples. It would be interesting to measure the code growth and the resource utilization of an existing application with before and after metrics. An additional area of opportunity would be to use the API service with standard mobile applications. The middleware would be reasonably easy to develop for native Android or iOS applications. The web service needs to be hardened against malicious users. It also needs to support multiple developers in separate Grafana UIs. One avenue we could pursue to address both issues is

to implement a token-based approach for the web UI endpoint. If the UI endpoint receives metrics, it will first look for a developer token. If the token is valid, it will tag the metrics with the developer's ID and post the information to the database. This would protect the system against malicious users who may want to provide false information, by keeping that false information out. This would also allow the system to support multiple users: we could query first by developer ID and then by features or function names. The case study used a constant simulation, which is inconsistent with normal user behavior. In future work we would like to find a better way to model user behavior. There are a couple of different ways to do this. First, we would like to deploy a simple application that users are able to interact with. This approach will give us the most realistic data set. Second, we could use a Poisson distribution to model user behavior if we are not able to get wide enough usage from our mobile application.
REFERENCES
[1] New Relic APM. n.d. New Relic Application Performance Monitoring Features and Tools. https://newrelic.com/applicationmonitoring/features
[2] James Blackburn. n.d. Time Series Analysis and Its Applications. http://study.com/academy/lesson/time-series-analysis-itsapplications.html
[3] Denzil Ferreira, Vassilis Kostakos, and Anind K. Dey. 2015. AWARE: mobile context instrumentation framework. (April 2015). https://doi.org/10.3389/fict.2015.00006
[4] Facebook. n.d. Showcase: React Native. https://facebook.github.io/react-native/showcase.html
[5] Guoquan Wu, Yuzhong Cao, Wei Chen, Jun Wei, Hua Zhong, and Tao Huang. 2017. AppCheck: A Crowdsourced Testing Service for Android Applications. IEEE (November 2017), 253-260. https://doi.org/10.1109/ICWS.2017.40
[6] IBM Inc. n.d. Application Metrics for Node.js. https://developer.ibm.com/node/monitoring-post-mortem/applicationmetrics-node-js/
[7] Google Inc. n.d. Google Analytics for Mobile Apps. https://developers.google.com/analytics/solutions/mobile
[8] Paul Krill. 2013. Software engineers spend lots of time not building software. https://www.javaworld.com/article/2078758/core-java/software-engineers-spend-lots-of-time-not-building-software.html
[9] Zack Coker, Kostadin Damevski, Claire Le Goues, Nicholas A. Kraft, David Shepherd, and Lori Pollock. 2017. Behavior Metrics for Prioritizing Investigations of Exceptions. IEEE (September 2017), 554-563. https://doi.org/10.1109/ICSME.2017.62


GUI Interfaces for a Multiple Unmanned Autonomous Robotic System
Ahmed Barnawi, Abdullah Al-Barakati, Omar Alhubaiti
Faculty of Computing and IT, King Abdulaziz University, Jeddah, Saudi Arabia

Abstract - The Multiple Unmanned Autonomous Vehicle Experimental Testbed (MUAVET) is a state-of-the-art testbed that is being developed to facilitate research on engineering multi-agent robotic applications. Such a testbed is essential to ensure the quality of the application and satisfactory service delivery. In this paper, details of the testbed's GUI interface are presented as designed and implemented. The system is made up of multiple autonomous and heterogeneous robotic agents performing a cooperative mission, where each agent is assigned a specific part of the mission proportional to its functions or capabilities. The system operator is involved throughout the mission, i.e., to design the mission, to monitor the mission, and to control the mission in response to events evolving during execution. These privileges require tight control over the system, realized through the Graphical User Interface (GUI) that interacts with the agents via a set of Application Programming Interfaces (APIs). In our testbed both the GUI and the API are developed from scratch, with the main design objective that the system's software be based on modularity, reusability, interoperability and configurability. Important aspects of the system design are highlighted and a performance evaluation of core components is also presented.
Keywords: Unmanned Aerial Vehicles, UAV Controller, Embedded Systems, Multilayered architecture

1 Introduction

The Multiple Unmanned Autonomous Vehicle Experimental Testbed (MUAVET) is an experimental testbed for a novel multi-robotic system that supports communication and task distribution in the real conditions of an open-air area. The verification scenario targets a setup of multiple autonomous vehicles (AVs), including unmanned aerial vehicles (UAVs), ground vehicles (UGVs) or marine vehicles (UMVs), generally denoted as UxVs, performing coordinated tasks such as a search scenario for objects of interest over a given area (Figure 1). The testbed contributes a wide range of benefits, especially for application development. Applications of such a multi-robotic system could include surveillance or any special-purpose automated application based on mobile robots. Those applications require, besides onboard image and data processing, path planning and mobile robot traversal.

In fact, the literature doesn’t provide much detailed open design of similar platforms. It is usually the case that the underlined designs of these platforms are being tagged to robotic vendor SDK’s and licensed software libraries. In this paper, we summarize our findings on the design and implementation of the system’s API and GUI interfaces. In other related activities [1][2], we have introduced our contribution with respect to the traversal path planning for a search and find application. Modularity is defined as the degree to which a system's components may be separated and/or combined. In software engineering, modularity implies that software is divided into different modules (components) based on functionality [3]. Each module has its inputs and outputs clearly defined, and software should be compartmentalized such that it allows for components to be developed independently with least amount of major design decisions impacting other modules in the system [4]. Reusability is another important design concept when it comes to complex software solutions development. In [5], it is defined as "…the use of existing assets in some form within the software product development process; these assets are products and by-products of the software development life cycle and include code, software components, test suites, designs and documentation. The work in this project included various work packages such as the design and implementation of GUI. As design objectives, the intended interfaces have to ensure modular and reusable design of the software components and/or modules to support general-purpose use of the testbed and to minimize the customization effort needed to develop different application thus maximizing the ROI benefits of such a complex system. The structure of this paper is as follows; in Section II, related work research motivation are introduced. In Section III, an overview on the MAUVET system is introduced. In Section IV, the system software architecture is presented. Section V, the GUI interface design and modules are detailed. In Section VI, we discuss the performance evaluation of the main components of the GUI’s Code Generation Engine (CGE). In section VIII, the conclusions and future work in this project are put down.

2 Related work

The MASON project [6] provides a simulation and visualization library for Java that is able to handle numerous agents even in the most demanding situations (such as swarm


tactics). It has a modular design and modules can be added and or removed easily. The paper showcases separation of GUI and model and it mentions that this is what hampered some previous systems (TeamBots, SWARM, Ascape, and RePast). For that, this software is platform independent. Design goals include speed, ease of modification, and platform independence. There are three layers in this architecture (utility, model and visualization). Utility layer concerns with general purpose, reusable components, the model layer models agents and events and their attributes in various classes. Last layer is the GUI. On another hand, the work in [7] presents just the ground control station based on Robot Operating System (ROS). The design goals include using as much open components as possible, handling heterogeneous UxVs, developing a GUI monitoring and controlling component, and utilizing automatic flight path generation. This software can be run in Hardware in Loop (HIL) type simulation, or real experiments mode, which is our concern. This system uses different components from different parties and developed protocols for communication between every two components. GUI wise, design goals include ability to modify object properties midflight, real-time mission monitoring, and adaptability to varying number of UAVs. Multiple components are used here as well for path planning, map visualization etc. This software is an example of how using multiple off-the-shelf or configurable software would look like. Pros include short development time and a powerful visual capability with lots of features. Cons include having to develop custom protocols to facilitate interoperability between various programs used. Another related conurbation is documented in [8], this paper presents details of architecture of software used, a modular system handling different UAVs, sensors, controllers and missions. This system also uses many off-the-shelf components (both hardware and software) to handle micro commands (flight related, GPS, low level sensors, … etc) but not in GUI. The system eliminates Ground-Control stations, so UAVs have on-board PCs with custom OS (QNX, blackberry – used for autonomous driving) handling planning and collaboration between UAVs.The architecture of the system is multilayered by complexity (e.g. low level on controller, Trajectory tracking, Trajectory design, Planning and task allocation). Also it worth mentioning that the system is task based with allocation and collision handling by QNX scheduling providing sub-optimalness-greedy. GUI wise, the system proposes three different GUIs; Piccolo's for mission monitoring, team Control and a Process Monitoring GUI. In fact this system demonstrates how software can be developed to allow single user to control many UAVs as opposed to most current solutions of high user/UxV ratio without the need for a robust or dynamic GUI. To conclude on this point, our developed GUI interface features similarity to the work presented in [6] in the sense that the underlying system can run independent of GUI, though in that case, the GUI depends on underlying control system and is not modifiable. In [7] and [add reference third paper], however, they use off-the-shelf components for various


parts of the GUI and system, while ours uses a custom-built GUI. We also identify GUI functional modules similar to those in [8]. Finally, our GUI was built to interact with multiple drones in real time like [7], with the advantage that our system deals with heterogeneous robotic agents.

3 MUAVET System Overview

The MUAVET project is an experimental testbed of novel multi robotic system that supports communication and task distribution in real conditions of open-air area. The varication scenario targets a setup of multiple autonomous vehicles (AV) including unmanned aerial vehicles (UAV), ground vehicles (UGV) or marine vehicles, performing a coordinated tasks such as search scenario for objects of interest over a given area. The designed system infrastructure should be applicable to all types of robots, but UAVs are considered as the primary actor in the platform. UAVs (or other robots) with unique functionalities may be incorporated in the system. Figure 1 shows the basic system components and the interfaces among them. Over an open air area, the robots are deployed. The UAV agents are “physically” connected to the Base Station (BS) via the access point in a star topology, nonetheless, the UAV agents are logically interconnected to simulate UAV to UAV link. It is noticed that different UAV’s are introduced including ground and marine robotic vehicles. This system is operated and managed via Graphical User Interface (GUI). The MUAVET Agent subsystem consists of onboard PC, Flight controller, GPS receiver, camera, Wi-Fi Communication Module and sensors. Functionalities of the UAV Control software is provided through a low level standard TCP/IP socket interface. A C++ application interface (API) is built above the socket interface in form of C++ functions encapsulated by MUAVET Interface class. The application code can exploit UAV control functionalities simply by calling function from this API. Both these interfaces are available on the server as well as on the UAV onboard computer, so it is up to user to decide strategy of controlling the agent i.e. centralized controlled via BS or autonomous control via onboard PC.


Fig. 1. MUAVET


4 Software Architecture

The basic software architecture of the MUAVET system consists of a central server providing a central point of communication for all connected entities, namely user applications (including the user interface) and robots. The MUAVET system server is meant to act as a central node connecting user applications with the control software running onboard the robots/UAVs. The basic block architecture of the server is depicted in Fig. 2, with the TCP user interface on the left side and the robot UDP interface on the other side. The connection between user applications and the server is realized by a TCP socket interface, whereas the connection between the server and the agents is realized by the datagram-based UDP protocol.

Fig. 2. MUAVET Server's communication software components

The MUAVET server software is written in C++. The server software is designed to have minimal dependencies on external libraries. The communication protocol between the server and the robots is designed according to the following requirements:
1. Temporary loss of connection should not affect the system functionality.
2. The latency in message delivery to the agent should be minimal.
3. In case of temporary loss of connection, the messages to the base station should not be delayed.
In order to fulfill these requirements, packet discard is allowed on the lowest communication level between the agent and the base station, in case packet delivery fails. This makes UDP an optimal protocol. An additional advantage of UDP is its datagram-oriented communication, which is probably the simplest form for message processing. For secure message delivery, an additional mechanism is needed to deliver mission-critical messages, i.e., the messages carrying robot commands (sent to the robots from the base station). This is achieved by the robot sending an acknowledgment message for each received command. The secure delivery is managed by the server: if no acknowledgment is received for a specific command within the specified timeout, the command message is resent. However, in case of a longer loss of communication with the robot, this approach would lead to buffering of sequentially sent user commands, thus violating the requirement for minimal latency. This problem is solved by defining the rule that a newly sent command overrides the previous command.
Implementation of the TCP interface is encapsulated by the CMuavetTcpServer class, with the public interface shown in Fig. 3. There are methods to start the server (open the server TCP port), stop the server, and transfer messages. The processDatagramFromClient method processes messages sent by a client, handling system messages (e.g., user authentication) directly and passing robot control messages forward. The methods processDatagramFromRobot and sendDatagramToRobot are used for incoming and outgoing communication with robots, respectively. These methods are declared virtual, which means their implementations differ between the server and the onboard application. The basic implementation of processDatagramFromRobot checks the destination (client) ID of the message and re-sends the message to the specified client or broadcasts it to all connected clients.

Fig. 3. CMuavetTcpServer Class interface

The CRobotCommunication class in the server implements UDP communication with the robots, using the communication protocol architecture described above to manage connections between the agents and the server. A single UDP socket is used for the communication with all connected robots. A single thread handles this socket, validates all received datagrams and assigns them to the connected robots by source IP address and port. Before any other communication between server and robot may be established, an exchange of CONNECT messages is required in order to identify both sides.
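The server itself is written in C++ and its code is not published in the paper; the following Java sketch merely illustrates the delivery rules described above (acknowledgment with timeout-based resend, and a newer command overriding the pending one). All class, method, and constant names, including the timeout value, are hypothetical.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model of the command-delivery rule; not the authors' C++ implementation.
public class CommandDelivery {
    private static final long ACK_TIMEOUT_MS = 500;   // assumed timeout, not taken from the paper

    private static class Pending {
        final String command;
        long lastSentAt;
        Pending(String command, long lastSentAt) { this.command = command; this.lastSentAt = lastSentAt; }
    }

    // At most one pending (unacknowledged) command per robot: a newer command
    // overwrites the previous one, so stale commands are never buffered.
    private final Map<Integer, Pending> pendingByRobot = new ConcurrentHashMap<>();

    public void send(int robotId, String command) {
        pendingByRobot.put(robotId, new Pending(command, System.currentTimeMillis()));
        transmit(robotId, command);
    }

    public void onAck(int robotId, String command) {
        // Drop the pending entry only if the acknowledgment matches the latest command.
        pendingByRobot.computeIfPresent(robotId,
                (id, p) -> p.command.equals(command) ? null : p);
    }

    // Called periodically: resend any command whose acknowledgment is overdue.
    public void resendOverdue() {
        long now = System.currentTimeMillis();
        pendingByRobot.forEach((robotId, p) -> {
            if (now - p.lastSentAt > ACK_TIMEOUT_MS) {
                p.lastSentAt = now;
                transmit(robotId, p.command);
            }
        });
    }

    private void transmit(int robotId, String command) {
        System.out.printf("UDP -> robot %d: %s%n", robotId, command);  // stand-in for the datagram send
    }
}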

5 GUI Interface Design and Modules

The developed GUI interface is a modular, multi-component system; the intention is that, with minimal customization, this GUI interface can fit the desired application. System modularity is ensured as the four main modules, namely Mission Design, Manual Control, Command Generator Engine (CGE), and Mission Monitoring, are each intended for a specific set of related functions. These functions (manifested over multiple reusable software


components) are independent from each other. Each component has its input and output clearly defined. The modules are in turn interconnected in the sense that the output of one module is fed to another as needed, for operator convenience. It is important to stress here, as a design objective, that our system is essentially a customized solution with configurability in mind; that is, system modifications or updates are enabled without prohibitive effort. For instance, if we wanted to utilize the testbed's GUI for a certain application, the customization task may be limited to the few software components associated with the desired functionalities while other components remain as is.

5.1 Mission Design Module

In this module, the initial parameters of the search operation are set; the output of this module is the trajectory waypoints for each agent participating in the mission. The input of this module can be the searched-area boundaries, the number of drones, and their basic capabilities such as battery level. Figure 4 shows the data flow within the module and the I/O with the trajectory algorithm server. The Parameter Manager Function is responsible for gathering inputs of different natures, such as:
1. Search area boundaries, based on the user selecting the boundaries of the target area from the map interface. The interface uses the Google Maps API and is written in JavaScript with a back-end in Java.
2. Drone parameters such as max speed, battery charge and camera parameters.
3. Mission parameters such as wind speed, number of targets, etc.
The Trajectory Manager Function acts as a client to communicate with the Algorithm Server using a JSON library that unifies object communication among different programming languages. This function also exports the trajectories to other system components.


Fig. 4. Data flow in the Mission Design Module
In the mission design module, multiple components can be changed to suit the required use without much effort. A different


mapping solution can be used while needing only a small change to the JavaScript wrapper. This module utilizes JSON, which is interoperable between many systems/languages, which means communication with external algorithm servers of any kind will be the same. Also, due to the nature of the data needed from such servers (trajectory points with little other information), our system can be easily adapted to use any other algorithm with little to no modification.
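As an illustration of the exchange described above, the sketch below builds a hypothetical trajectory request and reads back a hypothetical response. The paper does not publish the MUAVET message schema, so every field name is an assumption, as is the use of the org.json library; only the general shape (mission parameters in, per-agent waypoint lists out) follows the text.

import org.json.JSONArray;
import org.json.JSONObject;

// Hypothetical request/response shapes for the Trajectory Manager <-> Algorithm Server exchange.
public class TrajectoryClientSketch {
    public static void main(String[] args) {
        // A request the Parameter Manager might assemble from the GUI inputs.
        JSONObject request = new JSONObject()
                .put("droneCount", 3)
                .put("maxSpeedMps", 12.0)
                .put("areaBoundary", new JSONArray()
                        .put(new JSONArray().put(21.490).put(39.240))
                        .put(new JSONArray().put(21.510).put(39.260)));

        // A response could carry one waypoint list (lat, lon, alt) per agent.
        JSONObject response = new JSONObject()
                .put("trajectories", new JSONArray()
                        .put(new JSONObject()
                                .put("agentId", 1)
                                .put("waypoints", new JSONArray()
                                        .put(new JSONArray().put(21.492).put(39.241).put(50.0)))));

        // The Trajectory Manager would export each agent's waypoints to the other modules.
        JSONArray trajectories = response.getJSONArray("trajectories");
        for (int i = 0; i < trajectories.length(); i++) {
            JSONObject t = trajectories.getJSONObject(i);
            System.out.println("agent " + t.getInt("agentId")
                    + " has " + t.getJSONArray("waypoints").length() + " waypoint(s)");
        }
        System.out.println("request: " + request);
    }
}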

5.2 Mission Control Module

This module consists of two main functions: (1) viewers to track the drones' locations on maps, their speeds, etc., and (2) a Manual/Event Controller to manually control drones and offer event handling. The Viewers Function utilizes Java Swing's threaded GUI application elements, mainly SwingWorker, which starts a new task and executes it intermittently along with the main GUI thread; a sketch of this pattern is given below. Multiple workers are used to handle sending and receiving of commands, tracking drone stats, and handling console outputs from both the BS and the API client. The Manual/Event Control Function has the following components:
1. A log extractor, to read mission logs and interpret various parameters for display on the GUI.
2. An "Add to CMD queue" component, which sends manual user commands to a command (CMD) queue. Commands are enqueued here and executed at the CGE Module.
3. An Event Manager component that is responsible for event detection (from the mission log and as designed in the Mission Design module). An event may occur while the mission is executing and further action is required by the system.
Through the interface, users can send basic commands to UAVs like 'Take-Off', 'Land', 'Go-To' a specific point, or to follow a given traversal trajectory (a series of points to travel through).
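The following is a minimal sketch of the SwingWorker pattern mentioned above: a background worker polls drone status and publishes updates to the GUI thread. The SwingWorker API itself is standard Java Swing; the status format, polling source, and class names are hypothetical stand-ins, not the paper's code.

import java.util.List;
import javax.swing.JLabel;
import javax.swing.SwingWorker;

// One of several workers that could run alongside the main GUI thread.
public class DroneStatusWorker extends SwingWorker<Void, String> {
    private final JLabel statusLabel;

    public DroneStatusWorker(JLabel statusLabel) {
        this.statusLabel = statusLabel;
    }

    @Override
    protected Void doInBackground() throws Exception {
        while (!isCancelled()) {
            String status = pollBaseStation();   // e.g. read the mission log or API client
            publish(status);                     // hand the update to process() on the EDT
            Thread.sleep(1000);
        }
        return null;
    }

    @Override
    protected void process(List<String> chunks) {
        // Runs on the Event Dispatch Thread; only the latest update matters here.
        statusLabel.setText(chunks.get(chunks.size() - 1));
    }

    private String pollBaseStation() {
        return "UAV-1: lat=21.49 lon=39.24 speed=8.2 m/s";  // placeholder value
    }
}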

5.3 Command Generator Engine (CGE) Module

The CGE generates API code for the user's commands to control the drone's features. It also checks the generated code for errors prior to sending it out, to minimize erroneous traffic between the GUI and the base station or the drone. Figure 5 illustrates the CGE's different functions. Execution in this module flows through three consecutive functions, as shown in Figure 5. First, the Command Generator Function reads the points generated by the Mission Design module and writes sets of drone commands. To handle the expected large volume of mission-generated data, the Quick Input software component utilizes Java's New I/O (NIO) library to read points and write instructions; a sketch of this step is given below. The performance of this function is discussed in the following section.
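As an illustration of the Command Generator step, the sketch below memory-maps a binary waypoint file and emits one command per point. It assumes the file stores latitude, longitude and altitude as consecutive doubles (as in the performance test described in Section 6); the FlyTo command syntax is likewise an assumption, not the actual MUAVET API.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Reads (lat, lon, alt) triples with Java NIO and turns them into drone commands.
public class CommandGeneratorSketch {
    public static List<String> generate(Path waypointFile) throws IOException {
        List<String> commands = new ArrayList<>();
        try (FileChannel channel = FileChannel.open(waypointFile, StandardOpenOption.READ)) {
            // Memory-mapped read, as the paper does with Java NIO for large point sets.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            while (buffer.remaining() >= 3 * Double.BYTES) {
                double lat = buffer.getDouble();
                double lon = buffer.getDouble();
                double alt = buffer.getDouble();
                commands.add(String.format("FlyTo %.6f %.6f %.1f", lat, lon, alt));
            }
        }
        return commands;
    }
}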


The Command Checker Function checks the code generated by the first function. It is also responsible for checking manual commands generated by the Mission Control Module. Commands are checked for conformity to the state diagram shown below (Figure 6); that is, each instruction will transition the drone from one state to another, and possibly then to another state. A drone must be in a specific state in order for a given instruction to be able to execute. For example, a drone cannot fly to a destination when it is not armed or hovering midair after having taken off. A sketch of this check is given below.
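The following sketch shows one way such a check can be implemented as a transition table walked over the command sequence. The state and command names only approximate the paper's Figure 6 and its example (FlyTo, FlyTrajectory, Land, Disarm, HOLDING POSITION); the real rules file may differ.

import java.util.EnumMap;
import java.util.Map;

// Minimal transition-table check: every command must be legal in the current state.
public class CommandCheckerSketch {
    enum State { DISARMED, ARMED, TAKING_OFF, HOLDING_POSITION, FLYING, LANDING }

    private static final Map<State, Map<String, State>> TRANSITIONS = new EnumMap<>(State.class);
    static {
        TRANSITIONS.put(State.DISARMED, Map.of("Arm", State.ARMED));
        TRANSITIONS.put(State.ARMED, Map.of("TakeOff", State.TAKING_OFF, "Disarm", State.DISARMED));
        TRANSITIONS.put(State.TAKING_OFF, Map.of("Hold", State.HOLDING_POSITION));
        TRANSITIONS.put(State.HOLDING_POSITION, Map.of(
                "FlyTo", State.FLYING, "FlyTrajectory", State.FLYING, "Land", State.LANDING));
        TRANSITIONS.put(State.FLYING, Map.of("Hold", State.HOLDING_POSITION, "Land", State.LANDING));
        TRANSITIONS.put(State.LANDING, Map.of("Disarm", State.DISARMED));
    }

    /** Returns true only if the whole sequence is reachable from the start state. */
    public static boolean check(Iterable<String> commands) {
        State current = State.DISARMED;
        for (String cmd : commands) {
            State next = TRANSITIONS.getOrDefault(current, Map.of()).get(cmd);
            if (next == null) {
                System.out.println("Sequence Wrong: '" + cmd + "' not allowed in state " + current);
                return false;
            }
            current = next;
        }
        return true;
    }
}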


Fig. 5. Data flow in the CGE Module
After that comes the Command Handler Function, which is responsible for sending and receiving commands and stats to and from the drones and updating stats in the mission logs for the GUIs to view. Part of the event handling is done here: events are logged in the mission log, then the Mission Control Module detects them and sends the corresponding handling commands to the command queue, from which this module sends them to the drones. The Mission Log component is where feedback coming from the drones and the base station is written. Other parts of the GUI system, like modules 2 and 4, access this log to retrieve relevant information. The same command parser component from the Command Checker Function is used in the Command Handler.

Fig. 6. System states diagram.

5.4 Mission Monitor Module
This module is assigned with displaying and presenting mission parameters using the information extracted from the data retrieved from the agents. This module's functions utilize many functions from module 2, such as most of the Viewers Function components (text and map), while adding a video feed handler component to view footage and targets captured and detected by the drones. Some of the data presented via the GUI interface in this module is retrieved from the mission log, while other values are calculated, such as the percentage of mission progress. This is based on the number of waypoints traversed relative to all waypoints in a mission. A more accurate way of calculating this would be based on the distance traversed: the total traversal distance is obtained by adding the distances between every pair of waypoints into a cumulative array, which gives each traversed waypoint a different weight when calculating the percentage. The time-elapsed calculation builds upon the previous calculation by extrapolating the time consumed; for example, if the first 50 meters or 20 waypoints took 3 minutes to traverse, then the remaining 150 meters will take approximately 9 minutes to complete. This measurement will get more accurate as the mission progresses. A sketch of these calculations is given below.
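The sketch below implements the distance-weighted progress and remaining-time extrapolation just described. The waypoint format (latitude/longitude pairs) and the haversine distance are assumptions; the extrapolation itself matches the paper's example, where 50 m of 200 m covered in 3 minutes yields an estimated 9 minutes remaining.

// Distance-weighted mission progress and remaining-time estimate.
public class MissionProgressSketch {
    /** Cumulative distance (in meters) from the start to each waypoint. */
    public static double[] cumulativeDistances(double[][] waypoints) {
        double[] cumulative = new double[waypoints.length];
        for (int i = 1; i < waypoints.length; i++) {
            cumulative[i] = cumulative[i - 1] + haversineMeters(waypoints[i - 1], waypoints[i]);
        }
        return cumulative;
    }

    /** Progress weighted by distance covered rather than by waypoint count. */
    public static double progress(double[] cumulative, int lastReachedIndex) {
        double total = cumulative[cumulative.length - 1];
        return total == 0 ? 0.0 : cumulative[lastReachedIndex] / total;
    }

    /** Extrapolates remaining time from the time spent so far. */
    public static double estimatedRemainingSeconds(double elapsedSeconds, double progress) {
        return progress <= 0 ? Double.POSITIVE_INFINITY
                             : elapsedSeconds * (1.0 - progress) / progress;
    }

    private static double haversineMeters(double[] a, double[] b) {
        double earthRadius = 6_371_000.0;
        double dLat = Math.toRadians(b[0] - a[0]);
        double dLon = Math.toRadians(b[1] - a[1]);
        double h = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(a[0])) * Math.cos(Math.toRadians(b[0]))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * earthRadius * Math.asin(Math.sqrt(h));
    }
}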

6 CGE Performance Evaluation

In this section, we address the performance evaluation of the Command Generator component, as it plays a central role in the GUI design. The performance of this module is clearly a function of the available processing power, in particular the performance of the code generator. The performance of this component will be even more critical in case the commands controlling the drones are meant to be generated on board a robotic agent (master) to command other agents (slaves). Our strategy for performance evaluation is to calculate the processing overhead by running the code generator components on the base station server operated in two modes (full power and power saver). For this testing run, a random trajectory point generator was used to write the trajectory coordinates that this component uses to generate commands. Three double numbers were randomly generated for each point (latitude, longitude, and altitude) and written in binary using Java NIO's (New I/O) FileChannel, which utilizes memory-mapped buffers that mirror into a file for efficient, near hardware-speed I/O. The times taken to generate the random sequence, read it back, and generate the trajectory-defining code were measured for a number of points ranging from 6,500 to 650,000, in increments of 1% for a total of 100 measuring points, in both power modes (full power at 3.4 GHz and power saver at 0.88 GHz). Results show a 9-fold increase in time when the number of points increases by a factor of 10 (at full power) and an approximately 3x increase when going from full power to power saver. The graph in Figure 7 shows a linear relationship between the number of points and the time needed to generate the instructions to construct a trajectory; the relationship is linear in both modes of operation. Another aspect of the developed CGE to examine is the validity of the Command Checker algorithm, to make sure it produces correct outputs. To test this component, we generated many files containing sets of command sequences,


some sequences are correct and some are not. By a correct sequence, we mean that each command takes the drone to a state that is reachable from the drone's current state, as defined in the rules file based on the state diagram shown in Figure 6. An incorrect sequence of commands will inevitably violate this rule, and the algorithm should catch this for further action, as explained above. Causes for a file being incorrect could be command deletion, insertion, replacement, or any other malfunction resulting from software instability. For all tested sets of command sequences, the algorithm was able to distinguish correct from incorrect. Figure 8 shows the output from two testing runs, one involving a correct command sequence and the other an incorrect one, in which the 'land' command in the 9th line is replaced with the 'disarm' command. At the 9th instruction, the state transition shows that the drone is in the "HOLDING POSITION" state (i.e., state 5 in Figure 8), which implies that only three commands are expected to transition the drone to the next state according to the rules; those commands are FlyTrajectory, FlyTo, and Land. With the incorrect sequence, however, the 9th command is Disarm; therefore, a violation is detected, the next state is undetermined, and the error message "Sequence Wrong" is generated.

Fig. 7. Command Generator Performance Analysis Chart

Fig. 8. Command Checker Experiment Sample

7 Conclusions and Future Work

In this paper, we have presented a detailed design of our developed system's API and GUI based on the MUAVET platform. This platform contains a BS that connects multiple robotic agents to perform some user application. In this work, we have implemented more than two dozen API commands to control the agents' movement and the retrieval of onboard data from the sensors or the GPS receiver. We have shown that, to ensure successful message delivery from the robot to the base station using a UDP connection, a higher-level acknowledgment mechanism is implemented. The GUI design, on the other hand, ensures system modularity to facilitate configurability and reduce the customization effort. We have presented our detailed design and the dependencies between the GUI's modules. We also detailed the design of some of the modules' functions and presented some performance evaluation results to appreciate some of the system's tradeoffs. Future work will include running experiments to further investigate the system's performance under some extreme scenarios.

8 References

[1] Ahmed Barnawi, Abdullah Al-Barakati, "Design and Implantation of a Search and Find Application on a Heterogeneous Robotic Platform", Journal of Engineering Technology, 2017, pp. 235-239.
[2] Ahmed Barnawi, Seyed M Buhari, Marwan Alharbi, Fuad Bajaber and Abdullah Al-Barakati, "A Traversal Algorithm of an Advanced Search and Find System (ASAS) Over a Heterogeneous Robotic Platform", submitted to International Journal of Advanced Robotic Systems, 2017.
[3] https://en.wikipedia.org/wiki/Modularity
[4] Parnas, David Lorge. "On the criteria to be used in decomposing systems into modules." Communications of the ACM 15.12 (1972): 1053-1058.
[5] Lombard Hill Group (October 22, 2014). "What is Software Reuse". http://www.lombardhill.com. Lombard Hill Group. Retrieved 22 October 2014.
[6] Luke, S., Cioffi-Revilla, C., Panait, L., Sullivan, K. and Balan, G., 2005. Mason: A multiagent simulation environment. Simulation, 81(7), pp. 517-527.
[7] Del Arco, J.C., Alejo, D., Arrue, B.C., Cobano, J.A., Heredia, G. and Ollero, A., 2015, June. Multi-UAV ground control station for gliding aircraft. In Control and Automation (MED), 2015 23rd Mediterranean Conference on (pp. 36-43). IEEE.
[8] Tisdale, J., Ryan, A., Zennaro, M., Xiao, X., Caveney, D., Rathinam, S., Hedrick, J.K. and Sengupta, R., 2006, October. The software architecture of the Berkeley UAV platform. In the proceedings of the IEEE International Conference on Control Applications, 2006 (pp. 1420-1425). IEEE.


M-Commerce Apps Usability: The Case of Mobile Hotel Booking Apps
D. Matlock, A. Rendell, B. Heath, and S. Swaid, Ph.D.¹
¹ Division of Natural and Physical Sciences, Philander Smith College, Little Rock, AR

Abstract - Today, mobile users have a powerful portal to the web in their pockets. They use their mobile devices to book a hotel room, open a bank account, buy a new pair of shoes and search for entertainment events nearby. Unsurprisingly, software engineering now integrates usability testing in the development cycle of mobile apps. About one third of people use their mobile devices to book a hotel room. However, a study by Google found that 54% of travelers indicate that usability and limitations of mobile apps are the biggest reason to abandon hotel mobile apps. This paper presents a study to identify the usability heuristics to use when evaluating the usability of hotel booking mobile apps. An integrated approach is used, considering the general usability heuristics of Nielsen and the industry guidelines developed by Google Android and Apple. Based on empirical experimentation, the study resulted in 13 heuristics: "Status", "Matching-Real-World", "Control", "Error-Prevention", "Recognition", "Flexibility-and-Efficient-Use", "Design", "Diagnose-and-Recover", "Help", "Performance", "Information-and-Visual-Hierarchy", "Natural-Interaction", and "Dynamic-Engagement". Four usability experts tested the usability of the VirginHotels and ExtendedStay mobile apps using the developed heuristics. The study highlights usability deficiencies and provides recommendations to improve the usability of hotel booking mobile apps.
Keywords: Usability, Mobile, Apps, Heuristics, M-Commerce

1 Introduction

In parallel to the growth of ownership and use of mobile devices, mobile e-commerce is also growing. By 2020, it is estimated to have 189 billion US dollars via apps stores and in-app advertising [16]. The explosion of mobile apps is seen in just about every industry such as retail, media, travel, education, healthcare, finance and social. In 2017, consumers downloaded 178.1 billion mobile apps to their connected devices. In 2022, this figure is projected to grow to 258.2 billion app downloads [16] with social networking apps, mcommerce apps and mobile games among the most downloadable apps [16].

Mobile e-commerce, or m-commerce, as termed here, is defined as all activities related to a potential commercial transaction conducted through communications networks that interface with wireless or mobile devices [20]. Enabled by mobile devices, m-commerce possibilities range from browsing, searching, to purchasing items or services, and conducing online banking transactions. Users would download mobile apps to be able to conduct any of the activities related to m-commerce. Mobile applications are software programs developed for use on personal wireless devices such as smartphone which have different platforms (e.g., iOS, Android). Recently, and due to research on user experience and engagement, usability in mobile apps software engineering is integrated in the software development lifecycle. Moreover, software developers’ community of mobile apps has developed a list of guidelines to ensure smooth user experience, regardless of platform use or mobile application type. However, the research on usability is fragmented, incomprehensive, far lesser focus on real user context, and lacks clarity on what rules usability experts to apply when evaluating mobile apps usability [11]. Moreover, as heuristics evaluation becomes more popular in usability studies, it may miss domain specific problems. This study develops a set of usability heuristics to identify usability deficiencies in hotel booking mobile apps.

2 Literature Review

Usability of a system is a key factor that contributes to the effectiveness, efficiency and satisfaction in which specified users achieve specified goals when using the product [9]. Likewise, usability of mobile apps contributes to positive user experience, where users achieve their goals easily and quickly. Industry reports show that users expect mobile pages to load in two seconds and search results to be accurate with great functionality and varying features. Usability and usability evaluation have received increased attention in e-commerce and m-commerce due to the high cost of customer acquisition and retention [23]. One way to usability evaluation is expert-based techniques such as cognitive walkthrough and heuristics inspection. An advantage of such expert-based methods is that they can


identify more usability problems in short time, making heuristics evaluation a cost-effective method allowing more than 60% usability deficiencies to be identified [12]. One set of usability heuristics that has been widely adopted is the one developed by Nielsen [12], including rules of : Visibility of system status, Match between system and real word; User control, Consistency, Error prevention, Recognition, Flexibility and efficiency of use; Aesthetics and minimal design, Help user diagnose and recover errors, and Help and documentation. The process of usability heuristics evaluation starts with engaging a small set of evaluators as many as five to examine the interface or software and judge its compliance with usability heuristic. Next, evaluators aggregate the usability evaluation results in a corresponding list of usability deficiencies and its corresponding violated heuristics. Then, usability deficiencies are ranked in terms of severity from zero, where zero is considered ‘cosmetic’ issue, to four, a ‘usability catastrophe’ that is imperative to fix. Considerable number of research studies applied usability heuristics to evaluate medical devices, online learning systems, scientific software, websites, and virtual reality systems [1][5] [6] [7][14][17][19].

3. The Study
Heuristic evaluation is applied in a step-wise approach with the help of a small number of usability evaluators [12] to identify the largest number of deficiencies. Each evaluator inspects the mobile apps individually, several times, tests the various dialogue elements and compares them with the list of recognized usability principles. Then, after each evaluator completes the usability evaluation task, the evaluators meet to communicate their findings, aggregate results, rank deficiencies and generate written reports. As defining usability heuristics is challenging with emerging information technology [20], an integrative approach is applied by considering Nielsen's heuristics [12], the Google Android design guidelines [2] and the Apple human interface guidelines [4]. In this study we focused on mobile apps for hotel booking and we selected two apps: the VirginHotels.com app [22] and the StayConnected.com app [15]. The VirginHotels app, Lucy, is a personal assistant app that helps fulfill requests for services and amenities, functions as the room thermostat, streams personal content and more. The app, Lucy, "will put guests in the captain's chair. The technology will be smart and intuitive, and light the way to a more immersive experience within the hotel" [10]. On the other hand, the ExtendedStay app is similar in functionality to the website of ExtendedStayAmerica.com (ESA). The app provides features for booking a room, exploring hotels, finding reservations, and keeping track of previous and future booking transactions. A total of four usability evaluators inspected the usability of the m-commerce apps [18]. The usability evaluators had training in human-computer interaction theories, user-centered design and usability. Nevertheless, a training session on usability heuristics, and Nielsen's heuristics in particular, was provided


in the form of research experience for two semesters, focusing on apps for retailing, hotel booking, online banking, social gaming and social networking, among others. All evaluators downloaded the two apps on smartphones operated with Android and iOS. Each evaluator inspected the elements of each app against Nielsen's heuristics. Then, the evaluators conducted the correlational stage, "to identify the characteristics that the usability heuristics for specific applications should have" [21], based on Nielsen's heuristics [12] and the usability guidelines provided by Android [2] and the Apple developer design team [4]. Finally, the evaluators met to discuss and compile the list of heuristics that fit hotel booking mobile apps (see Table 1). The study revealed a number of usability violations of the 13 usability heuristics suggested by the study. In an experimental stage, the evaluators conducted a usability inspection of the VirginHotels (Lucy) mobile app and the ESA mobile app. Results revealed that both apps have usability problems that varied in terms of severity. For example, the VirginHotels app was found a couple of times showing a blank screen at the on-boarding stage. Also, the evaluators found that, in spite of the vivid pictures the app employs, the app does not show dots or similar indicators to inform users of the total number of slides or pictures in the gallery. Lucy violates the "System Visibility" rule: when users check for room availability, no progress bar is displayed to inform them that the app is working to find rooms based on the entered criteria. In the case of ExtendedStay America, several usability issues were identified as well. None was of high severity, but the issues found can cause users to abandon the app due to frustration. For example, at the reservation stage, after selecting the room, it is unclear how to advance to making the reservation. In some cases when a "Special Note" is associated with a room, clicking "No Thanks" does not work and causes the screen to flash; the user has to accept, and then go back to change the selected room. This violates the "User Control" rule. Unlike the ExtendedStay website, where users can share promotions they get via social media, the app does not fully support dynamic engagement (Table 2).

Table 1. Selected Usability Heuristics for Mobile Apps
Usability Rule: Definition
Information-and-Visual-Hierarchy: The app presents information and visual objects in a hierarchical fashion based on user functions of searching, sorting, filtering, zooming and swiping.
Natural-Interaction: The app utilizes the hardware and software capabilities for in-app gestures such as tap, swipe and pinch.
Dynamic-Engagement: The app enables users to engage in two-way communication with other apps, services (e.g., location-based services, widgets) and users.

4. Conclusion and Implications
Mobile apps are changing the landscape of e-commerce in general and the lodging industry in particular. Mobile apps will be the main channel for hotel booking, but also a necessary strategy for strengthening customer relationships,


customer loyalty, and brand loyalty [3]. M-commerce usability is one of the biggest challenges in user onboarding and loyalty and the key factor that affects the quality of mobile apps. This exploratory study helped in developing usability heuristics to evaluate mobile hotel booking apps using a set of 13 rules: "Visibility", "Matching-Real-World", "User-Control", "Error-Prevention", "Recognition", "Flexibility-and-Efficient-Use", "Minimal-Design", "Diagnose-and-Recover", "Help", "Performance", "Information-and-Visual-Hierarchy", "Natural-Interaction", and "Dynamic-Engagement".

Table 2. Selected Usability Deficiencies (for the Lucy, VirginHotels app and the ExtendedStay America app)

The usability evaluation conducted in this study suggests a number of recommendations for mobile app usability designers and user experience researchers. Users of mobile apps expect to complete their transactions with minimal effort. Therefore, designers of mobile apps should provide the shortest path from launch to booking via straightforward navigation, a step-wise directed path and clear call-to-action buttons. Also, mobile apps should provide search engines with filtering, sorting and zooming features to support user search. Users should be able to recognize rather than recall, through visible information and features that are easy to access. Users of mobile apps should be engaged dynamically by utilizing location-based services such as GPS, weather forecasts, and nearby attractions. Given the advances in augmented reality technologies, mobile apps should apply augmented reality to open exciting opportunities for their users. To avoid user irritation, it is suggested that mobile apps display a number of options for payment methods with easy-to-complete forms. Due to the limited features of mobile phones, such as small screen size, it is important for content to be displayed as visual objects to support a visual experience of the explored destination.

Acknowledgement This study is funded by NSF HBCU-UP Implementation Grant # 1238895.

References

[1] Al-Sumait and Al-Osaimi. A.: Usability Heuristics Evaluation for Child ELearning Applications. Journal of Software, 2010, 5(6), 654-661. [2] Android: Google Android Design Guidelines. Retrieved from: https://developer.android.com/design/index.html, on May 30, 2018. [3] Anuar, J., Musa, M., and Khalid, K. 2014. Smartphones applications adoption benefits using mobile hotel reservation systems among 3 to 5-star city Hotels in Malaysia. Procedia-Social and Behavioral Sciences, 13, 552-557. [4] Apple.: Human Interface Guidelines. Retrieved from: https://developer.apple.com/design/, on May 30, 2018. [5] Baker,K., Greenberg, S., and Gutwin, C.: Empirical Development of a Heuristics Evaluation Methodology for Shared Workspace Groupware, Proc. of the ACM CSCW ‘02, New Orleans, Louisiana, USA, November, pp. 96-105, 2002 [6] Bertin, E., Gabrielli, S., Kimani, S.: Appropriating and assessing heuristics for mobile computing. Proceedings of the Working Conference on Advanced Visual Interfaces (AVI '06); May 2006; New York, NY, USA. ACM; pp. 119–126. [7] Desurvire, H.,Caplan, D.,and Toth, J.: Using heuristics to improve the playability of games, ” CHI Conference, 2004. Vienna, Austria, April 2004. [8] Göransson, B., Gulliksen, J. & Boivie. I.: The Usability Design Process – Integrating User-Centered Systems Design in the Software Development Process. Software Process Improvement and Practice, 2003, 8, 111-131. [9] ISO 9241-11.: Guidelines for Specifying and Measuring usability”, 1998. [10] InternetRetailing. 2015. Virgin Hotels will empower guests with Lucy, all-seeing app. Retrieved from; https://internetretailing.net/themes/themes/virgin-hotels-willempower-guests-with-lucy-all-seeing-app-12651. [11] Kjeldskov, J. and Paay, J. 2012. A longitudinal Review of Mobile HCI research methods. Proceedings of the 14th international conference on Human-computer interaction with mobile devices services, 69-78. [12] Nielsen, J. : Heuristic evaluation. In: Nielsen J, Mack RL, editors. Usability Inspection Methods. New York, NY, USA: John Wiley & Sons; 1994. [13] Seffah, A., Donyaee, M., Kline, R. & Padda, H.: Usability measurement and metrics: A consolidated model. Software Quality Journal, 2005, 14, 159–178. [14] Somervell, J., Wahid, S., and McCrickard, D.: Usability Heuristics for Large Screen Information Exhibits, Proc. of Human-Computer Interaction (Interact ’03) Zurigo, Svizzera, 2003, 904-907 [15] Stay Connected America.com


[16] Statista. :Mobile commerce in the United States Statistics & Facts. 2018. Retrieved from: https://www.statista.com/topics/1185/mobile-commerce/ [17] Sutcliff A., and Gault. B.: Heuristics Evaluation of Virtual Reality Applications. Interacting with Computers , 2004, 16, 381-849. [18] Swaid, S. and Suid, T. 2018. Usability Heuristics for M-Commerce Apps. Proceedings of 2018 AHFE, Orlando, FL. [19] Swaid, S., Maat, M., Krishnan, H., Ghoshal, D., and Ramakrishnan,. L.: Usability Heuristics of Data Analyses and Visualization Tools, 8th International Conference on Applied Human Factors and Ergonomics (AHFE 2017) , Los Angeles, California, USA 17-21 July, 2017 [20] Tarasewich, P., Nickerson, R., Warkentin, M. 2011. Issues in Mobile Ecommerce. Communications of the Association for Information Systems, 8, 41-64. [21] Rusu, C. Roncagliolo, S., Rusu, V., Collazos, C. 2011.A Methodology to Establish Usability Heuristics. The Fourth International Conference on Advances in ComputerHuman Interactions, 60-62. [22] VirginHotels.com [23] Chan, S., Fang, X., Brzezinski, J., Zhou, Y., Xu, S., and Lam, J. 2002. Usability For mobile commerce across multiple form factors, Journal of Electronic Commerce Research 3(3), 187–199.


Static Taint Analysis Tools to Detect Information Flows
Dan Boxler

Kristen R. Walcott

University of Colorado, Colorado Springs, Colorado Springs, CO 80918, Email: [email protected]

University of Colorado, Colorado Springs, Colorado Springs, CO 80918, Email: [email protected]

Abstract—In today’s society, the smartphone has become a pervasive source of private and confidential information and has become a quintessential target for malicious intent. Smartphone users are bombarded with a multitude of third party applications that either have little to no regard to private information or aim to gather that information maliciously. To combat this, there have been numerous tools developed to analyze mobile applications to detect information leaks that could lead to privacy concerns. One popular category of these tools are static taint analysis tools that track the flow of data throughout an application. In this work we perform an objective comparison of several Android taint analysis tools that detect information flows in Android applications. We test the tools FlowDroid, IccTA, and DroidSafe on both the android taint analysis benchmarking suite DROIDBENCH as well as a random subset of applications acquired from F-Droid and measure these tools in terms of effectiveness and efficiency. Keywords: Taint Analysis, Information Flows, Static Analysis, Mobile Development

I. I NTRODUCTION As mobile devices become increasingly popular, there has been a shift in development for these platforms. Devices such as smart phones and tablets have taken over the forefront of mobile development over laptops and netbooks. There are several mobile operating systems out there but the most prominent in terms of market share is by far the Android operating system developed by Google [3]. With its increasing popularity, Android has become a more attractive target for malicious developers that wish to obtain private information from users. This could be anything from location data, contact information, passwords, or even user activity. Software applications on Android can be acquired from virtually anywhere and there are several marketplaces where users can find and download applications. Unlike other mobile operating systems such as iOS, there is no approval process that an application needs to go through to be uploaded to a marketplace such as Google Play. Thus, there is no barrier to entry for a developer to create an application that has malicious intent. Android attempts to address this issue by requiring application developers to declare what restricted resources they intend on using in their applications in a manifest that the user sees and must agree to prior to downloading the application. This helps in informing the user on what restricted resources the application has access to, but it does little to inform them how the resources are being used.

Furthermore, there is a theme of many applications to request more permissions than is actually needed for proper execution of the application [10]. A large number developers ask for more permissions and violate the least permission principle in favor of not needing to request additional privileges on future updates. Most marketplaces such as Google Play rely on user regulation to filter out unsavory applications based on reviews and reports of malicious behavior. A common trend in Android malware are privacy leaks. A privacy leak occurs when a malicious application sends private information from a device to an external destination without the user’s consent or knowledge. This can be anything from contact information, location details, passwords, or any data that could compromise the user’s privacy. This information can be used by malicious users ranging from anything from targeted advertising to identity theft. There are many static analysis tools that have been developed to detect for information flows that may lead to privacy leaks in Android applications. Static analysis is a testing technique that involves inspecting code without execution. Tools will analyze all potential paths in the program looking for specific vulnerabilities or defects. Static taint analysis involves tainting a piece of information from where it originates so that it can be found during subsequent analysis. Android static taint analysis tools analyze data flow from an information source where private information is generated or stored to an information sink where the data leaves the device to a potentially malicious destination. These results can then be manually analyzed to determine if the flow is indeed malicious or not. Alternatively, there are also malware detection tools that can assess the results for malicious intent. While there are many tools out there that intend to detect information flows in Android applications, there exists no objective comparison for such tools. It is necessary for a comparison to exist if there is to be a marketable application for these tools in industry. For example, if there was a screening process for applications to be added to a marketplace in that they need to not leak any private info, a static analysis tools could be used to determine if an application is safe for users. A tool of that nature would need to both fulfill the requirements of determining privacy leaks as well as do so in a cost efficient manner. To address this issue, we present an empirical analysis of three of the most state of the art tools for


static analysis of information flow detection. In this analysis we will focus on effectiveness and efficiency. Effectiveness will be the tools ability to detect data flows in a Android application and efficiency will be measured by the total run time for a tool to run its analysis on an application and the amount of resources it consumes in the process. This will be useful in determining the cost of running each tool. Our contributions are as follows: • Design a methodology for testing static analysis tools for mobile applications • Develop a third-party test suite from various Android marketplaces and compare results of the chosen static analysis tools. • Objectively compare various Android static analysis tools using pre-established benchmarking suite DROIDBENCH in terms of efficiency and effectiveness. II. C HALLENGES IN M ALICIOUS B EHAVIOR D ETECTION As users are becoming more dependent on mobile devices throughout their everyday lives, they are being heavily relied upon to store personal data. This data has many privacy concerns when it is used in mobile applications with or without the user’s knowledge. The user essentially has to trust that the application is using his/her data in a secure and appropriate way. The caveat is that many applications appropriately use this information and their behavior is difficult to distinguish compared to that of a malicious application. For example, if there is an application that provides the user a list of nearby businesses that correspond with a search query, it would need to obtain the location of the device using the GPS sensor and provide that to a server to provide results. This would be a justified use of a restricted resource and should not be flagged as malicious behavior. Alternatively, if the application also provided the location data to another source such as an advertisement database or another 3rd party for targeted advertising, this should be flagged as malicious. As an extension, an alarming percentage of users do not exhibit the comprehension or attentiveness to make sound decisions on whether a particular application needs the permissions it has requested [11]. If a user has downloaded an application from an Android marketplace, there is a strong likelihood thatthe user will not be able to determine whether or not to cancel the installation based on the permissions and possibility of malicious execution. There are several tools that attempt to detect such potential malicious behaviors using static analysis. However, there currently exists no comparison between them, which leaves a user at a loss when it comes to choosing the correct tool to use. Furthermore, there exists no precedent of how a new tool in this realm should be evaluated. While there exists a benchmark of hand-tailored applications that exhibit the most common information flows, originally proposed by the creators of FlowDroid [6], there is not formal comparison that puts the benchmark to use. The aim of this paper is to analyze

various tools and determine the effectiveness and efficiency of finding information flows in Android applications.
Fig. 1: Taint Analysis Example
III. INFORMATION AND TAINT ANALYSIS
This section discusses the various aspects involved in the detection of information flows and potential privacy leaks, as well as the tools that are used for our analysis and comparison.
A. Information Flow
An information flow tracks the data stored inside an object x and then transferred to another object y. As we are detecting information flows in Android applications, we define an information flow as the transfer of information from an information source, such as accessing a GPS location API, to an information sink, such as the application writing to a socket. In general, this refers to information being obtained by an application and then sent outside of the application, either to another application or to the Internet. Sensitive data acquired by an application through an API can be leaked maliciously through an information flow to an information sink. Without a valid information flow, sensitive information cannot leave the application or pose a risk for users.
B. Taint Analysis
Taint analysis is used to track information flows from an information source to a sink. Data derived from an information source is considered tainted, while all other data is considered untainted. An analysis tool tracks this data and how it propagates throughout the program, because tainted data can also influence other information through assignments or other operations. For example, the left-hand operand of an assignment statement will be tainted if and only if one or more of the right-hand operands were previously tainted. Taint can be revoked under circumstances such as a new expression or assignment. Figure 1 shows an example of how taint analysis works.
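The propagation illustrated in Figure 1 can be written out as a short Python-flavoured sketch (ours, not taken from the paper; get_location and the leaked list merely stand in for the GPS API call and the network socket):

    def get_location():                  # stands in for a GPS API call (the source)
        return "39.74,-104.99"

    leaked = []                          # stands in for a network socket (the sink)

    x = get_location()                   # x is tainted at the source
    y = x                                # the taint propagates to y through assignment
    z = x                                # ...and to z
    z = "untainted constant"             # z is overwritten, so its taint is revoked
    leaked.append(y)                     # tainted data reaches the sink: reported as a flow
    leaked.append(z)                     # untainted data: no flow is reported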


A variable x is tainted because it is assigned information from a location API call. This taint is then propagated to variables y and z due to assignment operations with x as a right-hand operand. Variable z is then untainted by performing a new operation, which effectively erases the tainted information from the variable. Lastly, the data in variables y and z is written to a socket to a location on the Internet. Only the writing of the data within y represents an information flow, since it still contains tainted information. As such, a taint analysis tool will report this flow to the user, as it potentially represents a privacy leak in the application. There are two contrasting techniques that can be used for taint analysis: static taint analysis and dynamic taint analysis. Three tools are known to support these analyses: FlowDroid, IccTA, and DroidSafe. Static taint analysis is done outside the operating environment and is run using a tool on a computer. Static analysis tools are able to provide better code coverage compared to dynamic analysis tools, but suffer in that they are unable to access the runtime information of the intended environment, since they run on a different platform. Dynamic taint analysis is run while a program is executing. It is often run on the same hardware and while the program is actually being run in the intended environment. Since it runs during program execution, it has access to all of the runtime information and can determine flows that are inherently too complex for static analysis tools. The downside of dynamic analysis tools is that they are only able to find flows that are actually executed. Therefore, they often have less code coverage than static analysis tools.
FlowDroid: FlowDroid [6] is an extension of the Soot framework [22] that is able to precisely detect information flows in Android applications. The Soot framework is a compiler extension for Java applications. It takes as input either Java source code or Java bytecode and outputs Java bytecode. It was originally developed as an optimizer for Java programs. The main features of Soot are the intermediate representations of the Java code that it produces, in particular Jimple, which is a typed three-address code. In addition, Soot also provides call-graph and pointer information, which are essential to developing a static analysis architecture. FlowDroid extracts the contents of an Android .apk file and parses the Android manifest, XML, and .dex files that contain the executable code. Using the tool SuSi [5], the sources and sinks of the application are identified and the callback and lifecycle methods are determined. From this, a dummy main method is generated. This is used to generate an interprocedural control flow graph, which starts at sources where data is tainted and is then followed to sinks to determine flows in the application.
IccTA: Inter-Component Communication Taint Analysis (IccTA) [18] focuses specifically on detecting privacy leaks between components within Android. IccTA uses Dexpler [7] to convert Dalvik executables into Jimple, the representation used internally by the Soot [16] framework, which is popular for analyzing Android applications. They then utilize Epicc [18] and IC3 [20] to extract the inter-component

communication (ICC) links from an application and combine that with the Android intents that are associated with them. They then modify the Jimple representation to connect the components that are involved in the ICC so that they can be analyzed with data-flow analysis. With this enhanced model of the application, the process employs an altered version of FlowDroid to detect flows.
DroidSafe: DroidSafe [13] is another system for detecting information leaks in Android applications. DroidSafe utilizes the Android Open Source Project [1] as a precise model of the Android environment and uses a stubbing technique to add additional parts of the Android runtime that are important for its analysis. This model is then used with highly scalable static analysis techniques to detect information leaks in the target Android application.
IV. EXPERIMENT SETUP
This section describes the experiment setup and design for our evaluation of the Android static analysis tools.
A. Experimental Design and Metrics
The experiments were executed on a GNU/Linux workstation with kernel version 4.2.0-16 and an Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz with 16 GB of RAM.
1) Test Suites: Two test suites are used to benchmark the various Android static taint analysis tools: DROIDBENCH and F-Droid.
DROIDBENCH is a suite of Android applications originally developed by the creators of FlowDroid with the intent of creating a Java-based benchmark suite with the same aim as SecuriBench [19], which was designed for Java-based web applications. In our experiments, we utilize DROIDBENCH version 2.0, which contains 119 Android applications that were specifically written with the intent of testing Android taint analysis tools. While this suite of applications can be used with both static and dynamic taint analysis tools, many of the applications address the shortcomings of static analysis tools, such as field sensitivity and object sensitivity. There are also applications that focus on Android-specific issues such as asynchronous callbacks and user interface interactions. DROIDBENCH has been used in the development of many Android taint analysis tools, including [6], [8], [13], [17], [21]. We consider thirteen classes of applications within the DROIDBENCH suite: Aliasing, Android specific, Arrays and lists, Callbacks, Emulator detection, Field and object sensitivity, General Java, Implicit flows, Inter-application communication, Inter-component communication, Life cycle, Reflection, and Threading. In addition to being specifically developed to test taint analysis tools, each application includes annotations that denote how many information flows exist in the application. This information is used in the evaluation of the tools in our experiments to determine a tool's effectiveness.
F-Droid [2] is a repository of Free and Open Source Software (FOSS) Android applications. F-Droid was founded in 2010 by Ciaran Gultnieks and now hosts over 1,800


applications available for download on Android devices. All of these applications have their source code freely available to the public. Of these, 50 randomly chosen applications were used in our experiments to compare the static taint analysis tools on larger Android applications.
2) Evaluation Metrics: Each static taint analysis tool is evaluated in terms of efficiency and effectiveness. Efficiency is measured based on the execution time of each tool. The other measure is effectiveness, which differs based on which application suite we are considering. In the DROIDBENCH application suite, there is a predetermined number of information flows that a given application will exhibit. Knowing this, we can perform several computations to calculate the effectiveness of a tool. In order to calculate these metrics, it is first necessary to obtain the true positives Tp, true negatives Tn, false positives Fp, and false negatives Fn as identified by each of the tools. The first measure is precision, which is the ratio of correctly identified information flows over the total number of identified flows, and can be modeled by the following: precision =

Tp / (Tp + Fp)

The next measure is recall, which is the ratio of correctly identified information flows to the total number of information flows that actually exist in the application suite. Recall can be modeled by the following: recall =

Tp / (Tp + Fn)

The third measure is accuracy, which is defined as the ratio of true results to the total number of cases analyzed and can be modeled by the following: accuracy =

(Tp + Tn) / (Tp + Fp + Tn + Fn)

The last measure is the F-measure, which combines precision and recall. It provides a weighted average between precision and recall. In this work we employ the balanced F-score (F1), which is the harmonic mean of precision and recall and is modeled by: F1 = 2 ∗

(precision ∗ recall) / (precision + recall)

In terms of the F-Droid application suite, since we do not know the number of information flows that actually exist in the applications, we are limited to only comparing the static analysis tools based on the raw number of flows reported.
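For reference, the four measures above can be computed directly from the DROIDBENCH counts; the small helper below is our own illustration, not part of the paper's tooling, and the example counts are made up:

    def scores(tp, fp, tn, fn):
        """Precision, recall, accuracy and balanced F-score from raw counts."""
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        accuracy = (tp + tn) / (tp + fp + tn + fn)
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, accuracy, f1

    # Example: a tool that reports 70 true flows and 25 spurious ones, correctly
    # flags 20 flow-free applications, and misses 15 real flows.
    print(scores(tp=70, fp=25, tn=20, fn=15))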


Fig. 2: FlowDroid - DROIDBENCH

B. Experiment Implementation
Three Android static taint analysis tools were chosen for this work: FlowDroid, Inter-Component Communication Taint Analysis (IccTA), and DroidSafe. The three tools are written in Java, are capable of running on an Android .apk file, and focus on solving the same problem of detecting information flows in Android applications. They also all require the Android platform to be imported in order to analyze the applications. For consistency, the Java Virtual Machine (JVM) used to run each tool is allocated 16 gigabytes of RAM.
In order to perform the comparison, a harness was written in Python 3. This harness sets up the necessary environment for each tool and runs it on an input set of Android applications, timing the execution of each run. It then parses the output of a run and extracts the number of information flows reported by the tool. For the DROIDBENCH application suite, these reported results are subsequently compared with the actual number of information flows that exist in the application, as reported by the authors. This is used to determine whether or not the tool successfully discovered all flows and whether there were false positives or false negatives reported. All executions were terminated if they took longer than 2 hours, as some applications caused several of the tools in question to reach a state in which they would never finish. The timing results are useful for comparing the tools in terms of cost to an organization that wishes to utilize them. They may not be helpful to a developer who wishes to analyze a personal application, but for a company such as Google that wishes to utilize such a tool on a large scale to screen uploaded Android applications, the time it takes to run a taint analysis correlates strongly with the cost of operation; the longer each application takes, the larger the cost at scale. The effectiveness measure, that is, in how many applications a tool successfully detects an information flow, is also very important for the comparison: the more effective a tool is, the more worth it demonstrates in both research and industry. The results are stored in JavaScript Object Notation (JSON) format for use in analysis and graph generation, giving a clear comparison between the taint analysis tools. Graphs are generated using the Python modules NumPy and Matplotlib.
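A simplified sketch of such a driver loop follows (our illustration; the command lines and the output-parsing pattern are placeholders, not the real interfaces of FlowDroid, IccTA, or DroidSafe):

    import json, re, subprocess, time

    TOOLS = {
        "FlowDroid": ["java", "-Xmx16g", "-jar", "flowdroid.jar"],   # placeholder invocation
        "IccTA":     ["java", "-Xmx16g", "-jar", "iccta.jar"],
        "DroidSafe": ["java", "-Xmx16g", "-jar", "droidsafe.jar"],
    }
    TIMEOUT = 2 * 60 * 60          # two-hour cap, as in the experiments

    def run_tool(name, cmd, apk):
        start = time.time()
        try:
            proc = subprocess.run(cmd + [apk], capture_output=True, text=True, timeout=TIMEOUT)
            flows = len(re.findall(r"Found a flow", proc.stdout))    # placeholder output pattern
            status = "ok"
        except subprocess.TimeoutExpired:
            flows, status = None, "timeout"
        return {"tool": name, "apk": apk, "seconds": round(time.time() - start, 2),
                "flows": flows, "status": status}

    results = [run_tool(n, c, apk) for n, c in TOOLS.items() for apk in ["Demo.apk"]]
    with open("results.json", "w") as fh:
        json.dump(results, fh, indent=2)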

V. EVALUATION
Figures 2, 3, and 4 display the results of the tools being executed against the DROIDBENCH application suite. Each bar of the graphs represents the execution time of a tool against a particular application of the test suite. It should be noted that, due to the large differences in execution times between the various tools, the scaling is not the same between each of


Fig. 3: IccTA - DROIDBENCH
Fig. 5: Statistics for DROIDBENCH

Fig. 4: DroidSafe - DROIDBENCH
the graphs. The color of the bars also indicates a measure of success in the execution. A green bar indicates that the tool was able to successfully identify all information flows in the application. A yellow bar indicates that the tool identified more flows in the application than were intended, leading to one or more false positives in the results. A red bar indicates that the tool was unable to identify one or more information flows that existed in the test application.
Figure 2 illustrates the results of FlowDroid on the DROIDBENCH test suite. It reveals that the execution time of running FlowDroid over the application suite ranged from 0.12 seconds to 4.27 seconds. It also highlights that for many applications, such as those in the inter-component communication category, the number of flows was overestimated, leading to false positives. It is also evident that FlowDroid was unable to successfully discover flows in the implicit flow and reflection categories of applications.
Figure 3 highlights the results of IccTA on the DROIDBENCH test suite. The execution time of running IccTA on the DROIDBENCH suite ranged from 15.26 seconds to 288.80 seconds. This is substantially higher than that of FlowDroid and also exhibits the largest range of execution times of any of the static taint analysis tools. By observing the chart, we can see that there are many similarities in the results between IccTA and FlowDroid. This can be somewhat expected, since IccTA uses a modified version of FlowDroid in its implementation. However, it is surprising that IccTA had many false positives in the inter-component communication applications, such as ActivityCommunication2.apk, since IccTA aims at addressing flows that cross multiple components.
Figure 4 shows the results of running DroidSafe on the DROIDBENCH test suite. The execution time for DroidSafe on the application suite ranged from 63.54 seconds to 196.94 seconds. The execution times for DroidSafe were much more

predictable than those of FlowDroid and IccTA. DroidSafe also had much better results on the inter-component communication applications and was able to correctly identify most of the intended flows in those applications. DroidSafe, along with the other tools, was unable to identify the information flows in the implicit flow category. This may need to be a focus for future research in the area.
Figure 5 breaks down the results for the three Android static taint analysis tools when run against the DROIDBENCH application suite. True positives represent information flows that are explicitly declared in the various applications and were successfully reported by the tools. True negatives represent the number of applications that contained no information flows and were reported as such. False positives represent flows that were reported by a tool but did not actually exist in the application. False negatives represent cases where a static analysis tool reports that there were no flows present in the application when flows were declared. Using these statistics, we were able to derive the precision, recall, accuracy, and F-score for each tool based on the DROIDBENCH benchmark suite. Precision, the fraction of reported flows that are indeed existing flows, is 71.2%, 68.5%, and 68.1% for FlowDroid, IccTA, and DroidSafe, respectively. Despite reporting the fewest actual information flows, FlowDroid was in fact the most precise tool, albeit by a small margin; IccTA and DroidSafe also had very similar precision results. Recall, the fraction of actual flows that were reported, is 60% for FlowDroid, 63.5% for IccTA, and 87.1% for DroidSafe. DroidSafe had by far the highest recall of any of the tools, which is directly correlated with the fact that it reported a significantly larger number of information flows. In terms of accuracy, FlowDroid and IccTA had similar results with 53.4% and 53.3%, respectively. DroidSafe, on the other hand, had a much higher accuracy of 64.5%. This theme also holds for the F-score, a weighted average of precision and recall, where DroidSafe had a much higher value of 0.76 compared to FlowDroid and IccTA (0.65 and 0.66).
Figure 6 displays the execution statistics for each of the taint analysis tools when run against the DROIDBENCH applications. FlowDroid was by far the fastest tool with an average


Fig. 6: Execution time for DROIDBENCH

Fig. 7: Results for FDROID

execution time of only 1.55 seconds and a maximum of 4.27 seconds. IccTA and DroidSafe had very similar average execution times, 151.03 seconds and 151.80 seconds respectively, but DroidSafe's completion times were much more consistent. IccTA was very inconsistent in terms of execution times, with some applications taking only 15-16 seconds while others took close to 5 minutes, whereas DroidSafe never took longer than 3.5 minutes.
Figure 7 presents the results from running all three tools on the F-Droid application suite. The average execution time only takes into consideration the runs of each tool that did not result in a timeout or a failure. This experiment shows that FlowDroid once again executed significantly faster than the other two tools. IccTA and DroidSafe took a similar amount of time to complete, although IccTA had a considerably lower number of failed executions. Furthermore, all of the failures that IccTA encountered were due to exceeding the allotted amount of time, while most of the failures in DroidSafe were attributed to running out of memory. It should also be noted that although DroidSafe only successfully analyzed 10 of the 50 Android applications, it reported 384 information flows, an exceedingly high number compared to the other two tools, which had a much greater number of successful executions. Many of these flows were reported for one application, Email Pop-up, for which DroidSafe reported 237 flows.
VI. DISCUSSION
When comparing the various static analysis tools in terms of execution time, we look only at the time that it takes for the tools to finish their analysis, regardless of whether or not a flow was detected. This is useful with regard to the cost for an organization of integrating a static taint analysis tool into its screening process for application submissions. The longer a tool takes, the more resources it consumes and the more cost it induces on an organization. From the results, we see that FlowDroid performs significantly quicker than the other two tools. IccTA and DroidSafe were very similar in terms of execution time, although the time taken by IccTA is much more variable than that of DroidSafe. As such, it will be much cheaper for an organization to use FlowDroid than the other two tools. As for an individual developer or a customer that wants to use one of these tools to test an application,

the amount of time that it takes to execute may not be of concern. However, when looking at tools such as DroidSafe, we were unable to successfully execute the tool on 80% of the applications in the F-Droid application suite, which represent real-world Android applications. If the average developer or customer does not have access to powerful enough hardware to even run the tool, its value for potential users is diminished.
In terms of flow detection, we look not only at the raw number of flows detected but also at how accurately and precisely the tools detect the information flows of an application. On the DROIDBENCH application suite, FlowDroid and IccTA performed very similarly in successfully reporting the information flows. DroidSafe, on the other hand, performed much better on the benchmarking suite. DroidSafe was able to detect the highest number of true flows and had the best accuracy and F-score measures. However, DroidSafe also produced the highest number of false positives. This leads us to question the high number of flows detected in the F-Droid suite by DroidSafe; there is a high likelihood that a large portion of the 384 flows detected could be false positives.
VII. RELATED WORK
AppIntent [23] is an automated tool that aims to detect privacy leakage by creating a sequence of user interface manipulations that will initiate a sequence of events causing a transmission of sensitive data on an Android device. These events are then analyzed to derive the intent of the operations leading up to the sensitive data transmission and to determine whether the user intended for the data to be transmitted. This is done by using static taint analysis to extract all possible inputs that lead to paths that could result in data transmission of a restricted resource. Using these inputs, the authors then automate the application execution step by step to generate user interface manipulations that will lead to a transmission of sensitive data. These results are then highlighted and displayed to a human analyst, who determines whether the transmission was indeed intended by the events that preceded it.
AndroidLeaks [12] is another static analysis tool designed to identify potential privacy leaks in Android applications. This tool uses data flow analysis to determine if sensitive information has reached a sink, that is, a program exit point that leads to the Internet. AndroidLeaks leverages WALA (Watson Libraries for Analysis) [4].
ScanDal [15] is another automated tool that attempts to detect privacy leaks in Android applications. ScanDal uses a strict static analysis approach that uses Dalvik bytecode as its input. The benefit of this is that ScanDal can be run on any application, as it does not need the source code. Many other approaches must use tools such as DED [9] to decompile the Dalvik bytecode into Java source code, which presents many problems, as the decompiled code may not be complete and fails in many instances. ScanDal translates the Dalvik bytecode into a subset of Dalvik instructions called Dalvik Core, an intermediate language. The translated Dalvik Core code is then analyzed, and if there is a value that is created at


an information source, it is analyzed to see if it flows out of an information sink; if so, it is considered a privacy leak.
ApkCombiner [17] is an additional tool that allows for the detection of information leaks that can occur between two running applications on an Android device. ApkCombiner essentially takes multiple Android applications and combines them into a single .apk file that can then be used with other static taint analysis tools that focus on inter-component communication to detect privacy leaks. This abstracts the inter-application communication away from existing tools and extends their functionality without alteration.
Other tools such as AsDroid [14] detect stealthy behaviors in Android applications by using the text in the user interface and comparing it to the program behavior to determine whether there are contradictions between the proposed action of the application and what really happens behind the scenes. AsDroid was able to detect stealthy behaviors such as SMS messages, HTTP connections, and phone calls. While AsDroid was successful in detecting stealthy behaviors, it could not indicate the intent behind the behaviors, just that they were not consistent with the user interface.
While there is a lot of work in the realm of privacy leak and information flow detection for Android devices, there has been no prior work comparing these tools for their efficiency and effectiveness. Our work lays the ground for investigating the worth of various analysis tools and will hopefully spur more defined measures of comparison for these important tools.
VIII. CONCLUSION AND FUTURE WORK
Privacy leak detection has steadily become an important issue for mobile applications. As mobile platforms continue to be a target for malicious users seeking the private information of others, research will be necessary to detect and remove these threats. Static taint analysis tools can offer an automated way to detect potential threats in Android applications. This paper compares some of the state-of-the-art tools in the area to determine their effectiveness as well as their efficiency, which can play an important role when looking to deploy such a tool on a large scale. The results show that the effectiveness of a tool comes at a cost in efficiency. While DroidSafe performed the best in terms of accuracy in the detection of information flows, it suffered in terms of execution time and resource consumption. FlowDroid and IccTA, on the other hand, ran more efficiently but at the cost of missing more potential leaks in applications. In future work, we will increase the number of applications and investigate more taint analysis tools. We will also investigate other sources of new application suites, such as Contagio Mini Dump and Malware DB, that contain known Android malware. We will also pursue Android dynamic analysis tools to compare both approaches to taint analysis on the Android platform.
REFERENCES

[3] Smartphone os market share, 2015 q2. http://www.idc.com/prodserv/ smartphone-os-market-share.jsp. [4] Watson libraries for analysis, wala. http://wala.sourceforge.net/wiki/ index.php/Main_Page. [5] S. Arzt, S. Rasthofer, and E. Bodden. Susi: A tool for the fully automated classification and categorization of android sources and sinks. University of Darmstadt, Tech. Rep. TUDCS-2013-0114, 2013. [6] S. Arzt, S. Rasthofer, C. Fritz, E. Bodden, A. Bartel, J. Klein, Y. Le Traon, D. Octeau, and P. McDaniel. Flowdroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps. Acm Sigplan Notices, 49(6):259–269, 2014. [7] A. Bartel, J. Klein, Y. Le Traon, and M. Monperrus. Dexpler: converting android dalvik bytecode to jimple for static analysis with soot. In Proceedings of the ACM SIGPLAN International Workshop on State of the Art in Java Program analysis, pages 27–38. ACM, 2012. [8] Y. Cao, Y. Fratantonio, A. Bianchi, M. Egele, C. Kruegel, G. Vigna, and Y. Chen. Edgeminer: Automatically detecting implicit control flow transitions through the android framework. In NDSS, 2015. [9] W. Enck, D. Octeau, P. McDaniel, and S. Chaudhuri. A study of android application security. In USENIX security symposium, volume 2, page 2, 2011. [10] A. P. Felt, E. Chin, S. Hanna, D. Song, and D. Wagner. Android permissions demystified. In Proceedings of the 18th ACM conference on Computer and communications security, pages 627–638. ACM, 2011. [11] A. P. Felt, E. Ha, S. Egelman, A. Haney, E. Chin, and D. Wagner. Android permissions: User attention, comprehension, and behavior. In Proceedings of the eighth symposium on usable privacy and security, page 3. ACM, 2012. [12] C. Gibler, J. Crussell, J. Erickson, and H. Chen. Androidleaks: automatically detecting potential privacy leaks in android applications on a large scale. In International Conference on Trust and Trustworthy Computing, pages 291–307. Springer, 2012. [13] M. I. Gordon, D. Kim, J. H. Perkins, L. Gilham, N. Nguyen, and M. C. Rinard. Information flow analysis of android applications in droidsafe. In NDSS. Citeseer, 2015. [14] J. Huang, X. Zhang, L. Tan, P. Wang, and B. Liang. Asdroid: Detecting stealthy behaviors in android applications by user interface and program behavior contradiction. In Proceedings of the 36th International Conference on Software Engineering, pages 1036–1046. ACM, 2014. [15] J. Kim, Y. Yoon, K. Yi, J. Shin, and S. Center. Scandal: Static analyzer for detecting privacy leaks in android applications. MoST, 12, 2012. [16] P. Lam, E. Bodden, O. Lhoták, and L. Hendren. The soot framework for java program analysis: a retrospective. In Cetus Users and Compiler Infastructure Workshop (CETUS 2011), volume 15, page 35, 2011. [17] L. Li, A. Bartel, T. F. Bissyandé, J. Klein, and Y. Le Traon. Apkcombiner: combining multiple android apps to support inter-app analysis. In IFIP International Information Security Conference, pages 513–527. Springer, 2015. [18] L. Li, A. Bartel, T. F. Bissyandé, J. Klein, Y. Le Traon, S. Arzt, S. Rasthofer, E. Bodden, D. Octeau, and P. McDaniel. Iccta: Detecting inter-component privacy leaks in android apps. In Proceedings of the 37th International Conference on Software Engineering-Volume 1, pages 280–291. IEEE Press, 2015. [19] B. Livshits. Stanford securibench. http://suif.stanford.edu/livshits/ securibench, 2005. [20] D. Octeau, D. Luchaup, M. Dering, S. Jha, and P. McDaniel. Composite constant propagation: Application to android inter-component communication analysis. 
In Proceedings of the 37th International Conference on Software Engineering-Volume 1, pages 77–88. IEEE Press, 2015. [21] F. Shen, N. Vishnubhotla, C. Todarka, M. Arora, B. Dhandapani, E. J. Lehner, S. Y. Ko, and L. Ziarek. Information flows as a permission mechanism. In Proceedings of the 29th ACM/IEEE international conference on Automated software engineering, pages 515–526. ACM, 2014. [22] R. Vallée-Rai, P. Co, E. Gagnon, L. Hendren, P. Lam, and V. Sundaresan. Soot-a java bytecode optimization framework. In Proceedings of the 1999 conference of the Centre for Advanced Studies on Collaborative research, page 13. IBM Press, 1999. [23] Z. Yang, M. Yang, Y. Zhang, G. Gu, P. Ning, and X. S. Wang. Appintent: Analyzing sensitive data transmission in android for privacy leakage detection. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, pages 1043–1054. ACM, 2013.

[1] Android open source project. https://source.android.com. [2] F-droid. https://f-droid.org/.


SESSION
TESTING METHODS, AND SECURITY ISSUES + REVERSE ENGINEERING
Chair(s): TBA


Using Deep Packet Inspection to Detect Mobile Application Privacy Threats
Gordon Brown

Kristen R. Walcott

University of Colorado, Colorado Springs
Colorado Springs, CO 80918
Email: [email protected]

University of Colorado, Colorado Springs
Colorado Springs, CO 80918
Email: [email protected]

Abstract—Modern mobile and embedded devices hold increasing amounts of sensitive data, with little visibility into where that data is sent. Existing solutions tend to be platform-specific, targeting only Android or iOS. We present a technique and develop a tool that can detect exfiltration of private data in a platform-independent way by utilizing deep packet inspection. We examine the techniques and patterns common to sensitive personal data attacks and evaluate our tool's effectiveness in detecting the exfiltration of sensitive data by known mobile malware. We learn that the majority of the known-malicious applications in our study were unable to exfiltrate contact information; only one known-malicious application under test was able to connect to a remote server and send contact information, which raises awareness in the software assurance community.
Keywords: Mobile Privacy, Security, Information Assurance, Malicious Applications

I. INTRODUCTION
As mobile and connected device usage continues to grow, the type and amount of data collected, either by actively malicious applications or by "free" advertising-supported applications, is a growing concern in the minds of users. Users are generally more concerned about some types of data than others, but most are especially uncomfortable with sharing data such as text messages, contact lists, and calendar information. Having this information stolen can lead to serious consequences. For example, if a user's text messages are stolen and that user engages in mobile banking, the stolen text messages could be used to obtain access to that user's bank account. Stolen contact information could lead to spear phishing attacks against friends or coworkers. Fueled by this concern, there is a growing body of research into ways to detect applications which transmit user data to undesirable destinations, but there are still severe limitations on the techniques available. There is a wealth of research into analyzing mobile applications, both in the realm of static analysis, such as ScanDal [11], and dynamic on-device analysis, such as TaintDroid [8]. However, these approaches are specific to each platform - a separate framework is required for the analysis of applications on other platforms. An example is PiOS [7] for Apple iOS applications. This leads to a lack of analysis tools for less popular and more recent platforms, such as Windows Phone or Blackberry, or

devices which are completely proprietary, such as Internet of Things (hereafter "IoT") devices. The lack of tools applicable to IoT devices is particularly concerning - an ever-increasing number of household objects are available in Internet-connected form, from thermostats [22] to cookware [13]. Because the manufacturers of these devices generally do not distribute copies of their software, even in binary form, they are particularly difficult to analyze for security or privacy problems.
Analyzing the network traffic from mobile applications or connected devices can reveal with certainty what data is being transmitted and where it is being sent, in a platform-independent way. A tool designed to automatically inspect network traffic can warn the user when potentially private data is sent to an unknown destination. Many of the data categories users are most concerned about sharing (again, per Ferreira et al.), including "Messages & Calls", "Contacts", and "Browser", are most likely to include somewhat structured data such as telephone numbers, email addresses, and URLs in close proximity to timestamps, durations, text blobs, or audio files. This type of data is most likely to be easily recognized by an automatic traffic inspection tool. Research into using network analysis techniques to detect privacy leaks has been very limited to date, particularly with respect to end user privacy protection. Network-based analysis allows platform-independent data protection in a way that is sorely needed in this age of proprietary platforms, both in the mobile and embedded realms.
In this paper, our main contributions are:
• A technique for detecting exfiltration of private data over a local wireless network using deep packet inspection.
• A tool which implements the above technique.
• An analysis of several mobile applications that demonstrates the effectiveness of the deep packet inspection technique.
II. IDENTIFICATION OF PRIVATE DATA
There are two main components to identifying exfiltration of private data: acquiring the transmissions that may contain private data, and identifying private data in those transmissions.
A. Network Traffic Interception
Automatic analysis of data transmitted over a network, also known as Deep Packet Inspection, is used in existing systems


to detect exfiltration of sensitive data from high-security networks [21], such as government networks. Fundamentally, inspecting traffic to detect transmission of private user data is not very different; however, the specific techniques involved do vary.
In order to analyze network traffic, the traffic must be observed. For reasons of efficiency and privacy, networks typically do not transmit, or route traffic to, devices which are not necessary to get the traffic to its intended destination. There are several ways to force traffic to be directed through a specific device so that it may be analyzed. Some of these include network tapping, ARP spoofing, or the use of proxy servers.
Network tap: Network tapping involves inserting a device directly into the networking flow, similar to a phone tap. This approach requires specialized and expensive hardware, and is most often used by systems which passively monitor network traffic, such as NETSCOUT's TruView network performance analysis systems.
ARP Spoofing: ARP, or Address Resolution Protocol, is used by computer networks to identify which device on a local network traffic should be sent to, given an IP address. ARP is susceptible to an attack which allows a single device to claim all IP addresses, forcing all traffic to flow through the attacking device before being forwarded on to other devices. This attack can be detected by security systems [1] on high-security networks, but has been used with success in traffic inspection devices targeted at home networks, such as Disney's parental control system Circle [4].
Proxy server: Many devices include support for configuring a proxy server or Virtual Private Network (VPN), which automatically forwards all traffic through that server. This is often done to allow access into or out of a highly secured network. However, this requires configuration on each device to be monitored, and many IoT devices do not have the ability to use proxy servers.
All of these techniques could be termed "man-in-the-middle attacks," as they seek to allow a party which is not the intended recipient of network traffic to inspect or modify it en route. However, each of these techniques is also used for non-malicious purposes, such as network troubleshooting, parental control [4], or advertisement blocking [5], because these things require some level of analysis of network traffic without being the intended recipient of that traffic.
B. Identifying Private Data
The categories users are least comfortable sharing with third parties are "Messages & Calls" and "Contacts" [9]. Both of these categories include the contact information of others: there would be relatively little value in stealing messages or call records without knowing who they were sent to or from, and "Contacts" is composed entirely of this data.
The most common contact information stored on mobile devices are telephone numbers and email addresses. Both of these types of information are highly structured - telephone numbers taking the form of 7 to 11 digits, depending on the inclusion of area and country code, with some optional delimiters (i.e. the parentheses which typically surround the area code, or the dash separating the first three and last four digits). Email addresses take the form of "{individual}@{domain}.{tld}", where "{individual}", "{domain}", and "{tld}" consist of one or more typically alphanumeric characters, but may include periods or Unicode characters. Such highly structured data is recognizable efficiently through the use of relatively simple regular expressions. Regular expressions can be converted to deterministic automata [2], which can be run very quickly. More complex data, such as calendar events or documents (which users are also mostly uncomfortable with sharing, per Ferreira et al.), are typically more complex and require more complicated heuristic techniques.
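As a rough illustration of this idea (ours, not the paper's; the tool's actual telephone-number pattern appears later in Fig. 1), such patterns can be applied to decoded payloads with Python's re module:

    import re
    from urllib.parse import unquote

    # Hypothetical patterns for the two structured data types described above.
    PHONE = re.compile(r"\+?\d?[\s./-]?(\(\d{3}\)|\d{3})[\s./-]?\d{3}[\s./-]?\d{4}")
    EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

    payload = unquote("uid=7&contact=%2B685-555-1234&mail=alice%40example.org")
    for m in PHONE.finditer(payload):
        print("possible telephone number:", m.group(0))
    for m in EMAIL.finditer(payload):
        print("possible email address:", m.group(0))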

III. IMPLEMENTATION
Given the challenges, we design and implement a tool which intercepts and inspects network traffic for the purpose of detecting the exfiltration of private data via a local Internet-connected network. We investigate the effectiveness of this technique using the tool described.
A. Traffic Capture
Data collection is accomplished through the use of the ARP spoofing technique described above, taking inspiration from the man-in-the-middle attack suite Ettercap [15]. To accomplish this, forged ARP packets are broadcast by the program, claiming to own the addresses of both the device under test and the Internet gateway. Sending ARP packets without requests is allowed by the ARP specification [17], and these unrequested ARP packets are known as "ARP Announcements" or informally as "Gratuitous ARPs". These announcements are sent every 1.5 seconds in our implementation, as well as whenever other ARP packets are detected on the network. This ensures the program misses little or no traffic from the device under test. After being captured for analysis, the traffic is forwarded to its intended destination using the built-in Linux IP forwarding feature, the same way Linux-based routers operate. This is possible because the operating system running our program does not receive the forged ARP packets, and thus uses the unaltered IP address to Ethernet address pairings. In addition to the ARP spoofing mode of capturing traffic to analyze, the tool can also be run on saved packet captures produced by other tools, such as Wireshark or Ettercap.
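A minimal sketch of such an announcement loop, assuming Scapy is available (this is our illustration, not the authors' implementation; the IP and MAC addresses are placeholders, and IP forwarding must be enabled separately on the capturing host):

    import time
    from scapy.all import ARP, Ether, sendp

    def announce(claimed_ip, victim_ip, victim_mac, iface="wlan0"):
        # op=2 is an ARP reply ("is-at"); hwsrc defaults to our interface's MAC,
        # so the victim learns to send traffic for claimed_ip to this machine.
        pkt = Ether(dst=victim_mac) / ARP(op=2, psrc=claimed_ip, pdst=victim_ip, hwdst=victim_mac)
        sendp(pkt, iface=iface, verbose=False)

    DEVICE_IP, DEVICE_MAC = "192.168.1.50", "aa:bb:cc:dd:ee:01"    # device under test
    GATEWAY_IP, GATEWAY_MAC = "192.168.1.1", "aa:bb:cc:dd:ee:02"   # Internet gateway

    while True:
        announce(GATEWAY_IP, DEVICE_IP, DEVICE_MAC)    # tell the device we are the gateway
        announce(DEVICE_IP, GATEWAY_IP, GATEWAY_MAC)   # tell the gateway we are the device
        time.sleep(1.5)                                # the 1.5-second interval described above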


TABLE I: MALWARE TESTED
Malware Name    Contact Exfiltration (Manually Detected)    Contact Exfiltration (Automatically Detected)
Beita           No (1)                                      No (1)
DougaLeaker     No (1)                                      No (1)
FakeDaum        Yes                                         Yes
Godwon          No (1)                                      No (1)
Godwon2         No (1)                                      No (1)
Loozfon         No (1)                                      No (1)
Scipiex         No (1)                                      No (1)
Simhosy         No (1)                                      No (1)
(1) As noted in the text, this malware was unable to send private data because the server it attempts to send to has been taken offline.

TABLE II: APPLICATIONS TESTED
Application Name    Contact Exfiltration (Manually Detected)    Contact Exfiltration (Automatically Detected)
Popular Music       No                                          No
Funny Ringtone      No                                          No
Sexy Girlfriend     No                                          No
Fake Call           No                                          No
Uber                No                                          No
Facebook            No                                          No

B. Traffic Analysis
We postulate that HTTP traffic is the most likely to be used to communicate private information to a hostile server, due to its ease of use in mobile applications and servers. Thus, we focus on the analysis of HTTP traffic.
Collected traffic is divided into "flows" based on source and destination IP address, as well as source and destination TCP port numbers. This allows us to reassemble HTTP transactions, which are then searched based on patterns likely to occur when transmitting private data. For example, a regular expression such as the one in Figure 1 can recognize telephone numbers. If a telephone number is detected in outgoing traffic, then the telephone number can be extracted, marked as suspicious data, and presented to the user for inspection. Email addresses can be similarly extracted and reported.
Fig. 1: A Regular Expression for Finding Telephone Numbers: \+?\d?([\s-.\/]?)((\(\d{3}\)?)|(\d{3}))([\s-.\/]?)(\d{3})([\s-.\/]?)(\d{4})
Additional pieces of data detected include timestamps in several human-readable formats (such as those specified in ISO8601 [10]), which are suspicious when in proximity to telephone numbers or email addresses, as this may indicate transmission of text messages or emails. GPS coordinates are also recognized through similar means, and are likely to represent location information.
The marked transmissions can then be presented to the user for closer examination, along with the destination of the transmission. This is done by printing the information to the screen, as seen in Figure 2, as well as by exporting the packets in the flow to a standard packet capture ("pcap") file, for analysis in common packet analysis tools such as Wireshark. This allows the user to determine if private information has been leaked, and if so, which information it was, as well as where it was going.
Fig. 2: Sample Output:
Suspicious data sent to 103.30.7.178: +685-555-1234
Suspicious data sent to 103.30.7.178: +555-555-3348
Suspicious data sent to 103.30.7.178: +723-555-2424
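The flow reconstruction and pattern search could look roughly like the following (our sketch, not the authors' code; PHONE is the compiled pattern from the earlier sketch, and the capture file name is a placeholder):

    from collections import defaultdict
    from scapy.all import rdpcap, IP, TCP, Raw

    flows = defaultdict(bytes)
    for pkt in rdpcap("capture.pcap"):                   # the tool also accepts saved captures
        if IP in pkt and TCP in pkt and Raw in pkt:
            key = (pkt[IP].src, pkt[IP].dst, pkt[TCP].sport, pkt[TCP].dport)
            flows[key] += bytes(pkt[Raw].load)           # naive reassembly, ignoring retransmissions

    for (src, dst, sport, dport), data in flows.items():
        text = data.decode("utf-8", errors="ignore")
        for m in PHONE.finditer(text):
            print(f"Suspicious data sent to {dst}: {m.group(0)}")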
IV. EVALUATION
We show that inspection of network traffic is an effective means of detecting personal data exfiltration by malicious mobile applications and of notifying the user of what information was stolen and where it was sent.
Evaluation was performed by installing known-malicious mobile applications collected from Contagio Mobile [16], a repository of Android and iOS malware, and running these applications while connected to a private, password-protected wireless network on which the network analysis tool was also running. Several well-reviewed applications were run as well, in order to seek false positives. Further, multiple lesser-known applications from the Google Play Store were installed and run to attempt to find data exfiltration in live applications. Finally, some traffic constructed to show signs of data exfiltration (using HTTP POST to send a list of phone numbers and email addresses) and traffic recorded from filling out a web form were analyzed. All traffic analyzed was saved, unmodified, for manual inspection in order to confirm results.
A. Evaluation Environment and Setup
Running on a private wireless network with no other devices connected ensures that there is no accidental violation of privacy by collecting the traffic of individuals not part of the experiment - all traffic collected is explicitly sent either from or to the device running the application under test or the device running the network analysis tool, both of which are owned by the author, who has, of course, given his full consent. Additionally, using a private network with only the network analysis tool and the device under test connected minimizes the threat of network traffic not associated with the device


under test producing false positives or otherwise interfering with results. The device used to test the various applications is a Samsung Galaxy Nexus running Android 4.3. An Android device was selected for the test because Android devices are commonly used, contain a significant amount of sensitive data, and often run third-party applications. However, as noted in the introduction, the results should hold across all platforms. The Android device was prepared for testing by entering a number of false contact list entries, text messages, and emails that are easily recognizable. The device was run without a connection to a mobile network in order to avoid text messages or calls being sent to "premium" numbers which charge money, a common tactic in mobile malware. This also forces all network traffic over the local WiFi network, which allows the capture of all network traffic.
B. Traffic Capture and Analysis
Each application, malicious and not, was installed, tested, and uninstalled before installing the next application, in order to prevent previously installed applications from tainting the results for each application. Each malicious application is opened and used for a short period for its normal use (which varies by application) while traffic is captured. Traffic capture for each application is done twice, once using the tool we present and once using an application available from the Google Play Store, "Packet Capture" [20], which uses the "Proxy Server" approach described above, with the proxy server running locally on the Android device in question. This application uses Android's built-in VPN facility and ensures that all traffic will be compared. Finally, we constructed some network traffic to simulate an application sending private data over the network with a simple HTTP POST, as well as submitting private data into a web form.
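One possible way to construct such artificial exfiltration traffic is shown below (our illustration; the endpoint URL and the contact values are placeholders, and the paper does not specify how its constructed traffic was produced):

    import requests

    fake_contacts = [
        {"name": "Test Contact A", "phone": "+685-555-1234", "email": "a@example.org"},
        {"name": "Test Contact B", "phone": "+555-555-3348", "email": "b@example.org"},
    ]
    # A plain, unencrypted HTTP POST that the packet inspector should flag.
    requests.post("http://192.0.2.10/upload", data={"contacts": repr(fake_contacts)})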

C. Results
The traffic captured from each application is inspected by hand, using the open-source tool Wireshark [3], to determine if there is any data being exfiltrated. This is compared to the results produced by our tool, which prints any suspicious data to the screen.
Several malicious applications known to exfiltrate contact list information, acquired from Contagio, were installed sequentially on the device under test, being sure to thoroughly remove each application before installing the next. However, the majority of the known-malicious applications were unable to actually exfiltrate contact information, as the remote server they attempt to connect to has been taken offline. Only one known-malicious application, FakeDaum, was able to connect to a remote server and send contact information. This data exfiltration was detected and reported as intended, and the exact output of the program can be seen in Figure 2. The phone numbers shown in that figure are from contact list entries on the Android device running the malware. The results of running this malware can be seen in Table I.
Additionally, several applications from the Google Play Store were tested: both well-reviewed applications and applications with few reviews that had permissions allowing the exfiltration of contact information. However, none of the applications tested attempted to exfiltrate any contact information that was visible through either manual inspection or automatic detection. The results of the analysis of the traffic from these applications can be seen in Table II.
Our network analysis tool correctly identified all of the "artificial" cases described above. The first case of constructed traffic shows that detection is not unique to the "FakeDaum" application, and the latter case of a web form demonstrates a known false positive.
D. Discussion of Results
Our tool correctly identified all traffic which was either constructed to contain the transmission of sensitive private data or contained the transmission of sensitive data which could be identified manually. This shows the effectiveness of detecting exfiltration of private data through interception and analysis of network traffic, with a minimum of false positives.
Despite the issues with the known-malicious applications being unable to connect to their associated remote servers, we can still ascertain some useful information. From the connection requests, all of the above applications were attempting to connect to TCP port 80, which indicates that they are using unencrypted HTTP connections rather than encrypted HTTPS connections, which would likely use TCP port 443. This implies that analysis of traffic to those servers, had they been present, would have revealed any sensitive data being transmitted, given the success with constructed data transmission as well as with the traffic from FakeDaum.
V. RELATED WORK
Existing systems for data exfiltration detection tend to focus on detecting transmission of a subset of a known corpus of documents [14]. This is most useful in corporate or governmental environments, where there exist many sensitive documents, such as confidential business plans or classified information. A typical end user may have some such documents, such as past tax returns, but is also concerned about data which is typically much more easily accessible to a malicious mobile application, such as contact information. Such information does not fit the mold required for existing data exfiltration detection techniques, but is likely to fit more general signatures. For example, most telephone numbers will be seven, ten, or eleven digits depending on the inclusion of area and country code. Our approach, using these general signatures, allows for increased flexibility over traditional data exfiltration detection techniques.
Additionally, existing systems for detecting data exfiltration assume a highly competent adversary [21], which effectively utilizes encryption and/or data obfuscation techniques to hide data theft and avoid detection. This is appropriate for systems used to protect high-value information such as government secrets, but given that malicious mobile applications are stealing relatively low-value data and thus are likely to aim to acquire


as much as possible with the least effort, in conjunction with the low rate of cryptography use (and especially secure cryptography use) in mobile applications overall [6], these assumptions are unlikely to be valid for this threat model. By designing for assumptions appropriate to home users, we can vastly reduce the amount of processing power required as well as increase the accuracy of exfiltration detection.
There has been some very limited research that is very similar to the proposed solution - AntMonitor [12] in particular. AntMonitor uses a VPN-based method of traffic collection, which has the advantage of working both on wireless Internet and on cellular networks. However, this prevents it from working with Internet of Things devices. Additionally, it relies on information gathered from an application running on the device under test to determine what strings to look for - our system monitors traffic based on patterns, and thus is independent of the device under test.
Similar to AntMonitor, Meddle [18] also uses a proxy/VPN approach to traffic capture, and searches for some of the same information that our analysis tool does, that is, PII. However, our analyzer notifies the user in real time of transmission of sensitive information, and, as discussed, our tool's use of ARP poisoning rather than a proxy-based approach allows usage with a wider range of devices.
The open-source tool "ngrep" [19] (network grep) allows the usage of regular expressions on network traffic, and has existed for some time. This is similar to the technique our tool uses to detect suspicious transmissions, but we also include a component to intercept network traffic through the use of ARP poisoning.
VI. THREATS TO VALIDITY
The primary threat to this approach is the use of encryption by malicious devices to obfuscate the data they are exfiltrating. The most obvious case is the use of Transport Layer Security (TLS) or Secure Sockets Layer (SSL), collectively known informally as "HTTPS". However, the use of these technologies is not ideal for the developer of a malicious application - HTTP clients typically check the public keys of SSL/TLS against centralized repositories known as Certificate Authorities to ensure the public key transmitted by a server is one known to be associated with the owner of the server. If it is not, this signals a likely man-in-the-middle attack, and most HTTP clients will refuse to connect. Public keys can be "revoked" by Certificate Authorities if the key is known to no longer be trustworthy - for example, if the associated private key was stolen. A certificate authority could revoke the public key of a malicious server owner, who could then no longer receive data from their malicious applications, as the HTTP client would refuse to connect.
The other case is the use of encryption other than HTTPS, such as AES, before the data is transmitted. This case is more plausible, but still unlikely at this time. Even among non-malicious applications, the rate of encryption use among Android applications is very low, and even those applications which do use it rarely use it correctly [6].
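To illustrate the certificate check described above (our sketch, not part of the paper): a client that validates the server certificate will abort the handshake when an interceptor cannot present a certificate chaining to a trusted authority for the requested host name; the host name used here is only an example:

    import socket
    import ssl

    ctx = ssl.create_default_context()      # CERT_REQUIRED plus host-name checking
    with socket.create_connection(("example.com", 443)) as raw:
        try:
            with ctx.wrap_socket(raw, server_hostname="example.com") as tls:
                print("certificate accepted for", tls.getpeercert()["subject"])
        except ssl.SSLCertVerificationError as err:
            print("possible man-in-the-middle, refusing to send data:", err)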

We have collected network traffic through multiple applications and capture methods to ensure that our method of traffic collection did not negatively influence the results. Additionally, by manually inspecting all captured traffic, we have attempted to verify that our tool correctly identified all cases of data exfiltration.
This method also has inevitable false positives: some innocuous data will undoubtedly look like telephone numbers, email addresses, or locations. Additionally, the user filling out a web form is indistinguishable, from a network traffic perspective, from a malicious application sending sensitive data to a remote server. In order to alleviate these false positives, more work is needed to add "whitelists" of known-good servers, as discussed in the Future Work section.
VII. FUTURE WORK
In the future, we will extend our tool to recognize more types of personal data, including location data, calendar event information, and unique identifiers such as the IMEI number. Further, we will add a way for the user to import lists of information likely to be stolen, such as contact lists, in order to reduce false positives. Additionally, traffic to a remote server could be blocked if suspicious data is being sent to it, until the user has had a chance to review the data and the server in question. Approved servers could be saved in order to reduce future inconvenience to the user.
VIII. CONCLUSION
We have presented a method for using deep packet inspection to detect private data exfiltration in a platform-independent, general way which does not require the user to input personal data into the tool, extending beyond traditional data exfiltration detection approaches. We have demonstrated the effectiveness of this tool in detecting data exfiltration by malicious mobile applications, which can be put to use by end users to aid in detecting data exfiltration early, across multiple platforms.
REFERENCES
[1] C. L. Abad and R. I. Bonilla. An analysis on the schemes for detecting and preventing arp cache poisoning attacks. In Distributed Computing Systems Workshops, 2007. ICDCSW'07. 27th International Conference on, pages 60–60. IEEE, 2007. [2] G. Berry and R. Sethi. From regular expressions to deterministic automata. Theoretical computer science, 48:117–126, 1986. [3] G. Combs. Wireshark. https://www.wireshark.org/, 1998-2016. [4] T. W. D. Company. Circle with disney - internet. reimagined. parental controls and filtering. https://meetcircle.com/, 2015-2016. [5] P. Developers. Privoxy. http://www.privoxy.org/, 2001-2016. [6] M. Egele, D. Brumley, Y. Fratantonio, and C. Kruegel. An empirical study of cryptographic misuse in android applications. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, pages 73–84. ACM, 2013. [7] M. Egele, C. Kruegel, E. Kirda, and G. Vigna. Pios: Detecting privacy leaks in ios applications. In NDSS, 2011. [8] W. Enck, P. Gilbert, S. Han, V. Tendulkar, B.-G. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth. Taintdroid: an information-flow tracking system for realtime privacy monitoring on smartphones. ACM Transactions on Computer Systems (TOCS), 32(2):5, 2014.


[9] D. Ferreira, V. Kostakos, A. R. Beresford, J. Lindqvist, and A. K. Dey. Securacy: An empirical investigation of Android applications' network usage, privacy and security. In Proceedings of the 8th ACM Conference on Security & Privacy in Wireless and Mobile Networks, page 11. ACM, 2015.
[10] ISO. 8601: Data elements and interchange formats – information interchange – representation of dates and times. Technical report.
[11] J. Kim, Y. Yoon, K. Yi, J. Shin, and S. Center. ScanDal: Static analyzer for detecting privacy leaks in Android applications. MoST, 12, 2012.
[12] A. Le, J. Varmarken, S. Langhoff, A. Shuba, M. Gjoka, and A. Markopoulou. AntMonitor: A system for monitoring from mobile devices. In Proceedings of the 2015 ACM SIGCOMM Workshop on Crowdsourcing and Crowdsharing of Big (Internet) Data, pages 15–20. ACM, 2015.
[13] B. Liu, J. Li, Y. Liang, Z. Huang, X. Yang, et al. Smart pan provided with single temperature-sensing probe, and method for frying food, Oct. 8 2015. WO Patent App. PCT/CN2014/080,738.
[14] Y. Liu, C. Corbett, K. Chiang, R. Archibald, B. Mukherjee, and D. Ghosal. SIDD: A framework for detecting sensitive data exfiltration by an insider attack. In System Sciences, 2009. HICSS'09. 42nd Hawaii International Conference on, pages 1–10. IEEE, 2009.
[15] A. Ornaghi, M. Valleri, E. Escobar, and E. Milam. Ettercap: A man-in-the-middle attack suite. https://ettercap.github.io/ettercap/, 2004-2016.
[16] M. Parkour. Contagio mobile mini malware dump, 2011-2016.
[17] D. C. Plummer. Ethernet address resolution protocol: Or converting network protocol addresses to 48.bit Ethernet address for transmission on Ethernet hardware. STD 37, RFC Editor, November 1982. http://www.rfc-editor.org/rfc/rfc826.txt.
[18] A. Rao, A. M. Kakhki, A. Razaghpanah, A. Li, D. Choffnes, A. Legout, A. Mislove, and P. Gill. Meddle: Enabling transparency and control for mobile internet traffic.
[19] J. Ritter. ngrep: network grep, 2006.
[20] G. Shirts. Packet capture, 2015-2016.
[21] G. J. Silowash, T. Lewellen, D. L. Costa, and T. B. Lewellen. Detecting and preventing data exfiltration through encrypted web sessions via traffic inspection. 2013.
[22] S. Skafdrup, T. H. Sorensen, and H. Guld. Wireless thermostat with dial and display, May 4 2010. US Patent D614,976.


Industrial Practices in Security Vulnerability Management for IoT Systems – an Interview Study

Martin Höst1, Jonathan Sönnerup2, Martin Hell2, Thomas Olsson3
1 Dept. of Computer Science, Lund University, Sweden, [email protected]
2 Dept. of Electrical and Information Technology, Lund University, Sweden, (jonathan.sonnerup, martin.hell)@eit.lth.se
3 RISE SICS AB, Lund, Sweden, [email protected]

Abstract— The area of Internet of Things (IoT) is growing and it affects a large number of users, which means that security is important. Many parts of IoT systems are built with Open Source Software, for which information about security vulnerabilities is publicly available. It is important to update the software when vulnerabilities are detected, but it is unclear to what extent this is done in industry today. This study presents an investigation of industrial companies in the area of IoT to understand current procedures and challenges with respect to security updates. The research is conducted as an interview study with qualitative data analysis. It is found that few companies have formalized processes for this type of security updates, and there is a need to support both producers and integrators of IoT components.

Keywords: IoT, security, vulnerability, open source

Contribution type: Regular Research Paper

1. Introduction The area of Internet of Things (IoT) is becoming a “disruptive innovation that will radically change business processes within and across sectors” [3]. In the current process of transforming traditional product offerings to service offerings, the need for IoT services will also be important. IoT systems are important in a wide spectrum of domains, such as in the medical domain, in “smart homes”, in supervision and control of industrial processes, meaning that many people are, or will be, dependent on the services of IoT systems. These systems and their components often include private data and sensitive information, and can be used to control and monitor critical infrastructure, as well as to protect physical assets, meaning that security is a crucial quality attribute. Embedded software systems represent an important part of IoT systems. These embedded systems typically include open source software (OSS) components. For OSS components, vulnerabilities are continuously published in publicly accessible databases. For many IoT systems this means that there are known security vulnerabilities, and “exploits”, i.e., descriptions of how to use the vulnerabilities in attacks.

For most products, it is important to update them after they have been taken into use. However, it is unclear to what extent security updates are carried out in a structured way. Since many of the systems are new, there is a risk that too much of the focus is on demonstrating innovative features rather than handling security pro-actively. Motivated by this, the presented research investigates how companies developing IoT systems monitor published OSS vulnerabilities, how they decide which updates to make, and what main observations can be made based on this.

2. Background

An IoT system typically consists of a perception layer with physical entities, a network layer together with a middleware layer, an application layer, and a business layer [7]. Selonen and Taivalsaari [12] present an architecture of an IoT system as involving sensors, gateways, cloud solutions, and interfaces. That is, a typical IoT system includes sensors and actuators, gateways to transport data, data in the cloud, and user interfaces, e.g., connected through the cloud.

Development of IoT systems typically involves integration of sub-systems into complete systems, as in any system development. This involves integration of both components from other companies and OSS components. OSS components can be included as-is, or companies may be part of the OSS community and involved in the development [5]. That is, there is a chain of producers and consumers of components/systems, i.e., forming a "value chain" of different company types. In this value chain, component developers develop individual components, such as sensors and gateways, and integrators integrate components and develop sub-systems or complete systems that can be used by users. There are also supporting companies, not involved in the development of a specific system, but supporting the development. Examples are consultancy companies with specific competence in IoT and development tool developers.

When securing IoT, there are many aspects that need to be taken into account. Access control to the device must be handled to avoid unauthorized access. If access or administration through a web interface is offered, web attacks such as SQL injection and cross-site scripting (XSS) must


be protected against. If the device itself must access remote services, it must store credentials, private keys, or handle trusted certificates securely. The network traffic itself must be encrypted and authenticated in order to avoid man-in-the-middle or replay attacks. A challenge with security is that even with robust security controls in place, and following best practice, newly discovered software vulnerabilities can render a device insecure and easily exploitable. It is thus important to implement methods that efficiently identify and evaluate the need for patching when new vulnerabilities are disclosed. There are documented examples of malware using known software vulnerabilities to infect IoT devices [14].

New software vulnerabilities are reported in a large variety of databases. Large multi-vendor databases for vulnerabilities include, e.g., Bugtraq and NVD. Bugtraq is a computer security mailing list, while NVD, maintained by NIST, provides a more structured way of describing a certain vulnerability. This includes, e.g., criticality given as a CVSS score, which software and versions are affected, and links to further information. In addition to the multi-vendor databases, some software vendors maintain their own vulnerability information. While the vulnerabilities often overlap between databases, different information is also published in different sources [8], [10]. This makes it difficult to accurately assess the impact of a certain vulnerability. Using responsible disclosure, the vulnerability is typically assigned a new CVE identifier (Common Vulnerabilities and Exposures), but the vendor is given some time to develop and issue a patch for the vulnerability before it is made public.

There are efforts that aim to standardize different aspects of identifying and evaluating vulnerabilities. NIST maintains SCAP [16], which is a protocol for automatically managing vulnerabilities and for measuring and evaluating systems and their configuration. Using SCAP helps in monitoring systems and their configuration in order to quickly respond to new vulnerabilities. It consists of several components, e.g., the Common Platform Enumeration (CPE) for assigning names to IT products, the Common Configuration Enumeration (CCE) for describing system configuration, and the Open Vulnerability and Assessment Language (OVAL), which is a language for assessing the state of a machine and reporting the results of the assessment. Also, CVE and CVSS as described above are components of SCAP. In addition to vulnerability information, detailed information on exploits can be found in ExploitDB, while Metasploit provides a framework for easily using developed exploits. The presence of exploits is a factor to consider when evaluating vulnerabilities.
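To make the database structure above concrete, the sketch below retrieves CVE records for a given component from NVD and prints their CVSS scores. It is an illustration only; the endpoint, query parameter, and JSON field names reflect our assumptions about NVD's REST API (version 2.0) rather than anything prescribed by the interviewed companies, and a production monitor would add paging, rate limiting, and CPE-based matching.

```python
import requests  # assumed third-party dependency

NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"  # assumed NVD API 2.0 endpoint

def fetch_cves(keyword, max_results=20):
    """Return (CVE id, CVSS v3.1 base score) pairs for CVEs matching a keyword."""
    resp = requests.get(
        NVD_URL,
        params={"keywordSearch": keyword, "resultsPerPage": max_results},
        timeout=30,
    )
    resp.raise_for_status()
    results = []
    for item in resp.json().get("vulnerabilities", []):
        cve = item.get("cve", {})
        # cvssMetricV31 is only present when a v3.1 score has been assigned.
        metrics = cve.get("metrics", {}).get("cvssMetricV31", [])
        score = metrics[0]["cvssData"]["baseScore"] if metrics else None
        results.append((cve.get("id"), score))
    return results

if __name__ == "__main__":
    # Example: list OpenSSL-related CVEs and their severity scores.
    for cve_id, score in fetch_cves("openssl"):
        print(f"{cve_id}: CVSS {score}")
```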

3. Related research The degree to which security is seen as important by practitioners is different in different domains, as described by [2]. They found that legacy systems enhanced by IoT solutions were often highly critical for society, which could slow down the process of transforming them to an IoT

architecture, and they also found that system availability in general is more important than confidentiality of data. They also saw that practitioners value security differently in different domains. For example, the electricity sector is more "conservative" and there is a tradition of maintaining safety. However, based on the study, it can be concluded that security is important for many IoT services and products.

A recent systematic mapping study by Mohammed et al. [9] indicates that research is heavily weighted towards handling security issues in the development phase as opposed to other parts of the process, especially the operational phase. To some extent, the research presented in this paper addresses this by investigating the state of practice for IoT companies, and by studying several development phases.

Telang and Wattal [15] analyzed the impact of security announcements on firm stock price. They suggest that even though the direct liability for a pure software company is lower than that of, for example, a medical equipment company, which typically has clearer liability clauses in its contracts, the software companies suffer more in terms of stock price. Furthermore, they state that there is a relationship between the announcement of a vulnerability and a patch to address it: the longer it takes, the more negative the impact on the stock price. In this study, we interview companies in the IoT domain to understand their challenges in identifying vulnerabilities and addressing them. Telang and Wattal point out that smaller companies see a larger negative impact on stock price. In IoT in general, and for the companies in this study, many are small, and hence this emphasizes the need for rapid vulnerability handling in IoT.

Software companies are getting better and faster at addressing security vulnerabilities. However, hackers are also getting faster [13]. In a similar study, Arora et al. [1] observe that the patch time is faster once a vulnerability is announced. Furthermore, open source providers are quicker in addressing vulnerabilities, and having an authority such as CERT increases the likelihood that software providers address the vulnerabilities. We complement these studies by analyzing a number of case companies in detail and how they use, e.g., vulnerability databases such as CVE.

Hafiz and Fang [4] report that there is a preference for public announcement of vulnerabilities rather than reporting them privately to the software providers, at least when it comes to experienced reporters. However, they also find that software providers are more likely to address a vulnerability if it is reported directly to them. This indicates that the software providers struggle to analyze public vulnerability databases and need strategies and tools for this. In our study, we focus on lead-times in general for the case companies, based on internal and private information, as opposed to Hafiz and Fang, who rely on publicly available information [4].


4. Research Methodology

The following research questions have been investigated in the research:
1) How are vulnerabilities for OSS identified and evaluated in companies developing IoT systems?
2) When a company has decided to update to address a vulnerability, how is the update work planned and synchronized with other development and maintenance work for the product?

The research was carried out as an interview-based survey with companies in the area of IoT, i.e., a qualitative survey (e.g. [6], [17]). This type of survey is, as opposed to a traditional quantitative survey, based on qualitative analysis and not on statistical significance. That is, the sampling in the survey is based on diversity, purpose, and saturation, instead of probability-based sampling. The analysis relies on coding and finding patterns instead of counting frequencies and applying statistical tests. Qualitative surveys are appropriate for studying diversity, opinions, relations, etc., which is the reason this approach was chosen for the research objective of this paper.

The research was conducted in three main phases:
1) In the first phase of the research, the overall goal was decided, see above.
2) In the second phase, empirical data was collected through interviews in two rounds.
3) When data had been collected, an analysis phase could start.
The phases were not simply conducted in sequence but were instead iterated. For example, the formulated objective was somewhat adjusted based on the findings in the interviews, and the interview questions were adjusted based on the findings from analysis, especially between the two interview rounds. Below, the phases are further described.

Interview questions were formulated based on the research questions. There were questions about the company and their business, about how they identify and evaluate vulnerabilities, and about how they update and distribute new software versions. The interviews were conducted in a semi-structured way (e.g. [11]), in two interview rounds, round 1 and round 2. This means that the researcher adhered to the list of interview questions during the interview, but the questions were not given to the interviewee literally and exactly in the order in which they were listed. Every interview lasted for about one hour. In some cases it was obvious that some questions were not applicable, and in those cases the questions were left out. The interview approach allowed the researcher to follow up answers with new questions, which in some cases dealt with areas that were not included in the prepared questions. This allowed for learning, and helped in formulating questions for the second interview round. All interviews in the first interview round were carried out by at least two researchers.


One of the researchers took the lead and was responsible for the communication with the interviewee, and the other researcher asked follow-up questions. All interviews were audio-recorded and transcribed. The findings from interview round 1 were summarized informally by the authors at a meeting with the objective to get a first understanding of the interview data. In this way it was possible to update the interview questions before the remaining interviews in round 2 were conducted. The interview questions were refined based on the results and the discussions during the interviews. Mostly this was done by adding detailed follow-up questions. Interview round 2 was carried out in a similar way as interview round 1. However, some interviews here were carried out by only one researcher.

In the analysis, first a set of codes was derived, then all transcripts were coded. After that, a number of subject areas were decided, and for each area a number of relevant codes were identified; the subject areas were then summarized. The first set of codes was defined before the coding began, and the coding list was then iteratively extended during the coding. Examples of codes are "identification method" and "information in identification". In total 31 codes were defined. Defining the codes is an illustrating example of a research step that was iterated in the research. Coding was first carried out by one researcher per interview. After that a few interviews were coded by a second researcher, and it was found that it was hard to achieve a perfect match in how each text segment was mapped to a specific code. It was therefore decided that each interview should be coded by two researchers.

The following subject areas were defined:
• Identification of vulnerabilities
• Evaluation of vulnerabilities
• How to decide and deploy
• Challenges and improvements
For each area a set of relevant codes was defined, which served as a guide in summarizing the area. The resulting texts are presented in Section 5.2. When quotes are given they have, in most cases, been translated from Swedish to English.
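As an illustration of the grouping step just described, the sketch below shows one way coded transcript segments could be collected per subject area. The code names beyond the two mentioned above, the area-to-code mapping, and the example segments are hypothetical and introduced only for exposition; they are not the study's actual coding scheme or data.

```python
from collections import defaultdict

# Hypothetical mapping from subject areas to (a subset of) the 31 codes.
SUBJECT_AREAS = {
    "Identification of vulnerabilities": {"identification method", "information in identification"},
    "Evaluation of vulnerabilities": {"evaluation criteria"},      # assumed code name
    "How to decide and deploy": {"release planning"},              # assumed code name
    "Challenges and improvements": {"improvement suggestion"},     # assumed code name
}

# Coded segments as (interview, code, excerpt) tuples; the excerpts are invented examples.
segments = [
    ("IoT Platform Developer", "identification method", "we try to keep everything updated"),
    ("Security Product Developer", "evaluation criteria", "severe issues are raised with the product owner"),
]

def group_by_area(coded_segments):
    """Collect coded segments under each subject area via the area-to-code mapping."""
    grouped = defaultdict(list)
    for interview, code, text in coded_segments:
        for area, codes in SUBJECT_AREAS.items():
            if code in codes:
                grouped[area].append((interview, text))
    return grouped

for area, items in group_by_area(segments).items():
    print(area, "->", len(items), "segment(s)")
```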

5. Results

5.1 Involved Companies

In this section, the involved companies are presented based on the interview results. We use descriptive names instead of their real names in order to preserve anonymity. In Table 1, all interviewed companies are listed and summarized. The value chain column denotes where the company's products sit in the value chain according to Section 2. The third column ("# part.") lists the number of interviewees that participated in the interview. The age is an assessment of the total age of the company, regardless of the time they have been working with IoT. The term "startup" means that the


Table 1: Company overview (Under Value Chain, CD denotes Component Developer, I denotes Integrator, and S denotes Supporting organization)

Company                       Value chain   # part.   Age       IoT exp
IoT Platform Developer        CD, I         2         Startup   High
Gateway Consultant            CD, S         1         Medium    High
Consumer IoT Developer        CD, I         3         Old       High
IoT Device Developer          CD            1         Old       High
Software Tool Developer       S             1         Old       Medium
Security Product Developer    CD, I         1         Old       Medium
Advanced IoT Product Dev.     CD            2         Old       High
Product Integrator            I             1         Old       Low

company is so new that they have not yet distributed their product commercially. IoT exp denotes our assessment of the company's experience and maturity in the area of IoT. The first four cases were interviewed in interview round 1, and the other cases in interview round 2.

IoT Platform Developer develops a platform for connecting different IoT devices to each other, and they also develop specific sensors for IoT systems. The company is rather new, and can be characterized as a small company and a startup. They are currently selling a component (at sensor level) that they have developed, and they are now developing the platform. They are both a component developer and an integrator. At the interview, a main architect with strategic responsibilities at the company and a developer participated.

Gateway Consultant has a long history of developing embedded systems for customers, covering a full range of devices. Lately, a new focus has been on so-called industrial-grade devices with high quality and a lifetime of more than 10 years. They help customers with security-related questions, as this becomes more and more important. They both give support in specific areas, like security, and take on development assignments. A senior expert with regional responsibility for customers was interviewed.

Consumer IoT Developer is new to developing IoT products. The company is rather old and they are used to providing services to other companies and selling products with a low amount of software that are not connected to the Internet. They are now approaching a new market segment, the consumer segment, with a new product in their domain, which is a typical IoT product. The lead developer, one more developer, and their main strategically responsible person were interviewed.

IoT Device Developer is a large company developing IoT products. It is an old company with significant experience of developing wireless communication systems. They provide products with a base unit on which customers may apply their own applications and integrate into their own systems. The interview was conducted with the CTO.

Software Tool Developer is developing a software tool to integrate and visualize information from other tools, where

security is one aspect. Their tool integrates a wide range of tools and services and visualizes available information from the tools, updating the information in real time. The interview was conducted with the CEO.

Security Product Developer has for about 15 years developed products with a focus on high-security applications. The company is currently broadening their focus to offer high-security products also to a more general range of potential customers. The interview was conducted with a senior technical expert.

Advanced IoT Product Developer develops hardware and software for advanced IoT products. The products are sold to many different domains, where they are integrated into larger plants and installations. The interviews were conducted with the company's software security responsible and a system architect.

Product Integrator has a long history of providing services to a global market, and has now started to complement their services with services based on IoT. They are new to IoT, but they see the services as important, e.g., since they believe that there may be new competitors with services based on IoT in the future. The interview was conducted with one person responsible for technical decisions.

5.2 Results for the Defined Subject Areas

5.2.1 Identification of Vulnerabilities

The interviewed companies have somewhat different approaches to identification of vulnerabilities. Small component-developing companies with no explicitly high requirements on security (Consumer IoT Developer, the Software Tool Developer, and the IoT Platform Developer) focus on keeping everything as updated as possible at all times. They see no possibility to read security reports, but the importance of being aware of potential security problems is recognized. In their experience, there is too much information available to keep track of it. The Gateway Consultant, which has many different customers with different needs, sees the same tendency among their customers. Also the Product Integrator, who integrates systems, spends little effort on identification. To a large extent the problem of identifying new vulnerabilities is left to the product manufacturer. The information is not processed much further but relayed as information to customers to update the product. While they do not attempt to identify vulnerabilities themselves, they see a need to put clearer requirements on the product developers to have well-defined vulnerability and security policies in place.

The larger component-developing companies do work with identification of vulnerabilities. The Advanced IoT Product Developer distributes the responsibility for vulnerability identification among the developers. Every software package has an owner within the development organization. Each developer is responsible for at least one package and the owner is responsible for the quality of that software package, which includes security:


"We have distributed the responsibility. Every module has a designated owner who is responsible for quality. This includes discovery of vulnerabilities." (Advanced IoT Product Developer)

There is however no defined process for how to identify vulnerabilities. The importance of security is well recognized, resulting in many vulnerabilities being identified shortly after disclosure. The company's support organization also has an external entry point for reporting of vulnerabilities.

The IoT Device Developer has development teams located in several different countries. Locally, there is no defined process for identifying vulnerabilities. There is a reliance on the fact that some employees are interested in security, and the developers responsible for security are assumed to identify new vulnerabilities.

The Security Product Developer has a high focus on security. They monitor both new CVEs and the web pages of the software developers/manufacturers. They also follow new results presented at conferences on cryptology and security. Developers have been assigned specific sources to audit, and when something comes up, a meeting is arranged to discuss the new vulnerability. This group will then report to the product owner, who in turn decides on how to react.

"Currently we have a designated responsibility among developers such that specific individuals have a number of sources to monitor." (Security Product Developer)

They observe that security-aware customers have recently started to place more detailed requirements on security intelligence, resulting in the need for more stringent methods for vulnerability identification. Their experience from these cases is that, due to the very broad audit, most of the information is irrelevant. It is estimated that about 5% of the information found in general sources is somewhat relevant, while almost all information found on developers' own web pages is relevant. The 5% can easily be cut in half, but after that, identifying which issues need to be further scrutinized is much more costly, since it is necessary to take configuration and operation into account.

5.2.2 Evaluation of Vulnerabilities

As with the identification, the organizations have somewhat different approaches. All three small companies, Consumer IoT Developer, Software Tool Developer, and the IoT Platform Developer, lack the resources and competence to be able to evaluate most of the vulnerabilities. The Gateway Consultant, which has customers, has the possibility to evaluate their customers' products, set up test environments, and examine exploits to simulate attack events, but this is not done frequently. Nor does the Product Integrator perform any extensive evaluation. Instead they rely, as with the identification, on the recommendations from the companies developing the products. They usually follow the


recommendation. They do not have the required resources or in-depth competence:

"We are unfortunately not mature in the evaluation process to question the manufacturer, how this affects us in both a large scale and small." (Product Integrator)

Advanced IoT Product Developer does not have a well-defined process for evaluating vulnerabilities. Instead, it is up to each maintainer of a specific package to either perform the evaluation or distribute it to another part of the company. It is usually the case that a maintainer discusses the issue with other developers and/or managers. Since the majority of the developers are security conscious, most of the identified vulnerabilities are assumed to be correctly evaluated and, if necessary, patched.

The IoT Device Developer has a dedicated group focusing on security and vulnerabilities. Since most of the software modules are based on OSS, the security group can analyze the source code when evaluating a vulnerability. They use information from previously conducted evaluations when assessing new vulnerabilities, but the process is not formally defined. They are quite confident in the decisions they make due to discussions between the developers, but most of the decisions are based on 'gut feeling'.

The Security Product Developer is currently defining a process for evaluation of vulnerabilities where they perform market analysis for the products based on key components and information regarding the vulnerabilities. When a vulnerability is identified, the evaluation is performed by a developer with experience regarding the vulnerable software. If the vulnerability is considered severe, the issue is raised with the group and the project manager. In the gray zones, the developer goes with the 'gut feeling'.

5.2.3 How to Decide and Deploy

After vulnerabilities have been identified and evaluated, the next step is to decide when and how to deploy the changes to products. This is typically carried out in relation to other deployment processes in the company. The large and mature companies in the study, i.e., Advanced IoT Product Developer and Security Product Developer, have experience of working with evaluation of security vulnerabilities and distributing new versions. The companies typically have regular releases with planned dates for when new releases should be distributed. When a security update has been decided, it can either be planned into one of the next releases or, if it is crucial, planned into a new dedicated release especially for that vulnerability. While the first evaluation of the vulnerability, with the purpose of deciding if it is a threat to the actual product, focuses on technical aspects such as configuration and Internet access, there are other aspects of importance in this evaluation, such as the number of components that are installed and in use, and the types and


importance of customers. It is not common to roll out a patch before a planned release. The companies state that they do manage to do an update quickly, but it is not probable that it will be done according to a formalized process. Instead, this is seen as special circumstances, which require people to take extra decisions in order to solve the problems.

When it comes to the less mature companies, it is clear that they rely on non-formal processes for making these kinds of changes. Companies have different roles that take decisions on when and how to roll out changes to customers and users. The larger and mature companies refer to "product owners" with the responsibility for the product. That role can decide how to make changes and contact affected parties. In smaller companies and in less mature companies it is not certain that they have that kind of role defined. In those cases, some companies formally or informally relied on the support department for making these kinds of decisions. These departments have contact with the customers, and their view on the change is seen as important.

It was found for all types of companies that there is, to some extent, a tendency to postpone updates to the next planned release instead of making a specific release for a vulnerability. It is a trade-off between the risk of waiting and the cost and effort of making changes in the field. This is, for example, stated as:

"It can take long time to update, and typically they are reluctant to update, they 'need to check'." (Gateway Consultant)

The rather complicated structure of the value chain, with component developers, integrators, etc., makes it difficult to distribute updates. For example, a component developer, such as the Advanced IoT Product Developer, develops components that are distributed to integrators. They may then deliver the system to a customer that is then in charge of the system. This means that it is not only the developing company that decides if an update actually is made in the field.

5.2.4 Challenges and Improvements

The main challenge, regardless of company size and place in the value chain, seems to be that there is too much information regarding vulnerabilities. Around 7,000 new CVEs annually, combined with the often large number of software components used in the products, make it very time-consuming to keep track of everything. The larger companies can still do it to some extent since they have the resources, but for the smaller ones it is infeasible. Interestingly, Security Product Developer suggests using the great amount of information to their advantage by rotating the responsibility for the identification part among everyone in the development organization. This can be used for educating the developers and giving them insight into vulnerability information and its sources. The same strategy

is, however, not considered appropriate in the evaluation phase, since evaluation requires more in-depth knowledge of the software component or the product in general. In order to improve their ability to identify and evaluate vulnerabilities, the smaller companies mention automation and automatic filtering of information. This could reduce the amount of information and make it more manageable. Having someone who is explicitly responsible for security is also mentioned by the small companies as a priority once more resources are available.
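The kind of automatic filtering the smaller companies ask for can be illustrated with a minimal sketch: a local inventory of the OSS components used in a product is matched against a downloaded vulnerability feed, and only entries above a chosen severity threshold are surfaced. The feed layout, field names, threshold, and all record contents below are assumptions made for the example, not a recommendation derived from the interviews.

```python
# Hypothetical component inventory for one product: component name -> deployed version.
INVENTORY = {"openssl": "1.0.2k", "busybox": "1.27.2", "dropbear": "2017.75"}

# Hypothetical, locally stored vulnerability records (e.g., extracted from an NVD dump).
FEED = [
    {"cve": "CVE-XXXX-0001", "component": "openssl", "versions": ["1.0.2k"], "cvss": 7.5},
    {"cve": "CVE-XXXX-0002", "component": "busybox", "versions": ["1.30.0"], "cvss": 9.8},
    {"cve": "CVE-XXXX-0003", "component": "dropbear", "versions": ["2017.75"], "cvss": 3.1},
]

def relevant_vulnerabilities(inventory, feed, min_cvss=7.0):
    """Yield records that affect a deployed component version and exceed the CVSS threshold."""
    for record in feed:
        deployed = inventory.get(record["component"])
        if deployed in record["versions"] and record["cvss"] >= min_cvss:
            yield record

for hit in relevant_vulnerabilities(INVENTORY, FEED):
    print(f'{hit["cve"]}: {hit["component"]} {INVENTORY[hit["component"]]} (CVSS {hit["cvss"]})')
```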

6. Discussion

Our interviews indicate that the extent to which vulnerabilities are monitored depends on company size, place in the value chain, and business segment. We have also identified an increased awareness of the need for security. For example, Product Integrator would like to improve their ability to put requirements on the manufacturers, and the small companies Consumer IoT Developer, the Software Tool Developer, and the IoT Platform Developer rely heavily on suppliers but without clear security requirements on them, i.e., placing large trust in their suppliers. It is probably natural, and a general problem, for integrators of products to follow recommendations and assessments of the developers of subsystems without doing thorough investigations themselves. We have also seen that Security Product Developer has experienced these increased requirements from customers. These findings lead to Observation 1: There is an increased demand for improved vulnerability management throughout the value chain. Investing in increased knowledge about vulnerability management will allow organizations to be more efficient and agile when new vulnerabilities emerge.

For the two manufacturers that do have described processes for identifying and evaluating vulnerabilities, i.e. the Advanced IoT Product Developer and the Security Product Developer, we can identify two different approaches. For the Advanced IoT Product Developer, the identifier and evaluator is almost always the same person, i.e., the software package maintainer. For the Security Product Developer, the identification process is more general and developers do not have responsibility for specific packages. Instead, when a vulnerability is identified, the most suitable person, typically the software package maintainer, is given the task to evaluate it. This many-to-many relationship between potential points of vulnerabilities and information sources seems to be of primary concern for the smaller companies. Based on this, we formulate Observation 2: The roles and processes for identifying and evaluating new vulnerabilities differ between companies. The responsibility can be focused on both code parts and on information sources.

Concerning how updates are implemented and distributed, there seems to be a tendency to postpone updates until the


next planned release. When a decision is taken to actually distribute an update earlier, it is given high priority and the tasks can be fulfilled, even if informal processes must be used. In many cases it was found that companies decided to postpone an update until the next planned release, instead of making a specific release for the security patch. This could probably be explained either by a low perceived risk of being attacked or a low perceived effect of being attacked. Based on the interviews, we believe that it is the former. A complicating factor for decisions to deploy an extra release is the relationships in the value chain. As mentioned by the Advanced IoT Product Developer, it is not always the case that the integrators will update. Hence, for larger, more complex systems, there seems to be a reluctance to update. However, the smaller companies, who are also integrating supplier software (cf. Table 1), have a much more optimistic approach and update suppliers' software without a thorough analysis. This leads to Observation 3: The decision to update, or apply an update, depends not only on the severity of the vulnerability, but also on organizational factors, which can complicate the process and leave devices vulnerable for an extended period of time.

7. Conclusions

Though many are aware that they need to quickly adapt to the increased importance of addressing security vulnerabilities in their products, the pace is not in line with the development in terms of features and connectivity. A recurring problem is that there is too much information and it is very difficult to process it efficiently. The companies simply cannot afford to invest enough resources to review the complete flow of published vulnerability information. The pace of change in the IoT domain and the priority on innovation and new features probably mean that the security processes get little attention and are hence informal and ad hoc.

Three observations can be made from organizations developing IoT products. Firstly, there is an increased demand for improved vulnerability management throughout the value chain, and therefore probably a need to invest in knowledge, mainly on how to identify vulnerabilities and their impact, and on how to formulate requirements on identifying vulnerabilities when integrating components. Secondly, the roles and processes for identifying and evaluating new vulnerabilities differ between companies, based either on the source of information or on the software architecture. Thirdly, the decision to update, or apply an update, depends on more than the severity of the vulnerability, such as organizational factors.

The results that have been identified are based on a limited set of interviews. However, based on the identified underlying complexity, with a fast pace of innovation in companies, complex product architectures, and a fast pace of published security vulnerabilities, we believe that it is possible to generalize the observations to IoT development


as such. In further research it would be possible to study more cases, e.g. in other types of IoT companies, to improve the generalizability of the results.

Acknowledgments

This work was supported by the Swedish Governmental Agency for Innovation Systems (Vinnova), grant 201600603.

References

[1] Ashish Arora, Ramayya Krishnan, Rahul Telang, and Yubao Yang. An empirical analysis of software vendors' patch release behavior: impact of vulnerability disclosure. Information Systems Research, 21(1):115–132, 2010.
[2] Mikael Asplund and Simin Nadjm-Tehrani. Attitudes and perceptions of IoT security in critical societal services. IEEE Access, 4:2130–2138, 2016.
[3] European Commission. Definition of a research and innovation policy leveraging cloud computing and IoT combination. Technical report, European Commission, Directorate-General for Communications Networks, Content and Technology, 2014. DOI: 10.2759/38400.
[4] Munawar Hafiz and Ming Fang. Game of detections: how are security vulnerabilities discovered in the wild? Empirical Software Engineering, 21(5):1920–1959, 2016.
[5] Martin Höst, Alma Orucevic-Alagic, and Per Runeson. Usage of open source in commercial software product development - findings from a focus group meeting. In Proc. International Conference on Product Focused Software Development and Process Improvement (PROFES), pages 143–155, 2011.
[6] Harrie Jansen. The logic of qualitative survey research and its position in the field of social research methods. Forum: Qualitative Research, 11, 2010.
[7] Rafiullah Khan, Sarmad Ullah Khan, Rifaqat Zaheer, and Shahid Khan. Future internet: The internet of things architecture, possible applications and key challenges. In Proc. International Conference on Frontiers of Information Technology, 2012.
[8] Fabio Massacci and Viet Hung Nguyen. Which is the right source for vulnerability studies? An empirical analysis on Mozilla Firefox. In Proceedings of the 6th International Workshop on Security Measurements and Metrics, pages 4:1–4:8, 2010.
[9] Nabil M. Mohammed, Mahmood Niazi, Mohammad Alshayeb, and Sajjad Mahmood. Exploring software security approaches in software development lifecycle: A systematic mapping study. Computer Standards and Interfaces, 50:107–115, 2017.
[10] A. Ozment. Vulnerability discovery & software security. PhD thesis, University of Cambridge, UK, 2007.
[11] Per Runeson, Martin Höst, Austen Rainer, and Björn Regnell. Case Study Research in Software Engineering - Guidelines and Examples. Wiley, 2012.
[12] Petri Selonen and Antero Taivalsaari. Kiuas – IoT cloud environment for enabling the programmable world. In Proc. Euromicro Conference on Software Engineering and Advanced Applications (SEAA), 2016.
[13] Muhammad Shahzad, Muhammad Zubair Shafiq, and Alex X. Liu. A large scale exploratory analysis of software vulnerability life cycles. In Proc. International Conference on Software Engineering (ICSE), pages 771–781, 2012.
[14] Symantec. IoT devices being increasingly used for DDoS attacks. Technical report, Symantec Security Response, 2016.
[15] Rahul Telang and Sunil Wattal. An empirical analysis of the impact of software vulnerability announcements on firm stock price. IEEE Transactions on Software Engineering, 33(8):544–557, 2007.
[16] David Waltermire, Stephen Quinn, Karen Scarfone, and Adam Halbardier. The technical specification for the security content automation protocol (SCAP): SCAP version 1.2. Technical report, 2011. Special Publication 800-126.
[17] Robert S. Weiss. Learning From Strangers: The Art and Method of Qualitative Interview Studies. The Free Press, 1994.


A Review of Popular Reverse Engineering Tools from a Novice Perspective

A. Lee, A. Payne, T. Atkison
Computer Science Department, University of Alabama, Tuscaloosa, AL, USA

Abstract— Many tools are available to help reverse engineers get a deeper look at a software artifact. For a novice, these tools might be formidable and difficult to understand. Additionally, no survey has been done of reverse engineering tools for several years. In this paper, tools a newcomer might find themselves choosing between are discussed from a novice's perspective, so as to assist future novices in choosing the proper reverse engineering tool for their task. The tools were compared on usability, functionality, availability, and durability, among other aspects. Overall, IDA Pro was found to be the most user friendly and capable of the tools.

Keywords: Reverse Engineering Tools, Survey, Software Engineering.

1. Introduction

Software systems, particularly older systems, often require maintenance. However, the people maintaining the system are not always the ones who designed it. Furthermore, systems often lack full documentation, and sometimes the code may be the only 'documentation' that exists [1]. Documentation may have been lost, pieces of the system may never have been documented, or the system may be behaving in unexpected ways which simple debugging cannot explain [2]. In such cases, reverse engineering is often used. Reverse engineering is the process of analyzing software to identify its processes and their relationships with one another, and to create a representation of these processes and relationships [3], [4]. As the size and complexity of programs continues to grow, reverse engineering tools have become more important than ever as a method of understanding source code and binary files that would be difficult to comprehend without assistance [5].

Experts in reverse engineering have their own tools, and know how to use them [6]. However, novices are faced with an array of unknown tools, each of which has unique abilities and disjoint properties even when they fall under the same tool category [7]. This study looks at understanding some of the most popular reverse engineering tools in use today from the perspective of a novice reverse engineer, in order to more easily understand what modern reverse engineering tools lack, and what they succeed at helping novice users comprehend.

To properly evaluate these tools, a set of guidelines was developed to compare dissimilar tools across a few standard points. These guidelines are not meant to be comprehensive, but are meant to capture the points a newcomer to the tool would notice immediately or upon their first usage of the tool, as a measure of novice user satisfaction. By reviewing the tools from a novice's perspective, differences between the tools and how these relate to user satisfaction can be found, and suggestions can be made to help make new and current tools more user friendly to novice users.

The paper is organized as follows. Section 2 discusses the background. Section 3 holds the methodology. Section 4 contains results. Section 5 holds the discussion. Section 6 discusses the threats to validity. Section 7 concludes the paper.

2. Background

Reverse engineering has been assisted by tools almost since its inception, as tools greatly help a developer understand code, whether it be source code or binary [1]. Tools allow users to visualize code, examine items that would otherwise not be human-readable, and generate an understanding of the code; in some cases, the tools even attempt to reverse or alter the code themselves [2]–[4]. Reverse engineering saves companies money over engineering a new system from scratch; therefore, it is not surprising that it has long been supported by tools. If reverse engineering required many man-hours of manual work, then it would not be a money-saving technique [3]. Instead, companies would simply develop new tools whenever technology changed.

This is not to imply that reverse engineering, even with tools, is a simple task. Specialized knowledge is required to understand the output from most reverse engineering tools. Additionally, tools can often only semi-automate the process of reverse engineering, requiring the developer to do some, or even most, of the work manually [1], [2], [7].

Reverse engineering can help in many different scenarios. The best-known scenario is that in which reverse engineering is used to rediscover or translate legacy software, particularly in cases where the software is complex or large [1]. However, reverse engineering can be used in any phase of the development stage [3], [4]. For example, if a newcomer entered a poorly-documented software project under development, they could use reverse engineering tools to obtain a basic understanding of the project. Defect detection can also be accomplished through reverse engineering techniques; an engineer can use the tools or techniques to play the part of an attacker searching the


compiled code for defects, and hence detect them before the software hits the market [3], [4].

There was a boom in reverse engineering tools in the late 1990s due to the oncoming threat of Y2K. For a time, there was a real need for reverse engineering tools to fix the date issue in legacy software so that the software would keep running past Y2K [1], [4]. Many popular tools, such as Rigi, a visual reverse engineering tool, died out shortly after the event [7], [8]. However, new reverse engineering tools have risen up to fill the gap left by the defunct older tools. Since Y2K, reverse engineering has helped adapt old code to new technology, including the Web, and has saved industry millions, if not billions, of dollars over creating new applications every time new technology appears [1], [4]. Modern tools include program analyzers, architecture and design recoverers, and visualizers, and have been used to reverse engineer UML diagrams [9], visualize web tools [10], and translate programs across languages, among other applications [4].

Some of the most popular tools used in modern reverse engineering are dynamic debuggers and disassemblers such as IDA Pro [6], which allow a user to step through processes one line at a time as well as recreate the assembly code based on the bytecode. IDA Pro also allows a user to visualize a program with a graph. As visualization is a vital first step for system maintainers with no knowledge of the code, this is an important attribute [2]. Visualization can also help novice engineers begin to grasp the program's shape. Other tools, such as process explorers, allow the user to look inside running code and see how the code is linked together from a different level of abstraction. These can assist in the process of redocumenting and rediscovering the design of the systems, important steps in reverse engineering [3].

This paper focuses on tools that use binary files as input, rather than reverse engineering from source code. While some of the tools surveyed can be used on source code, the focus is on their capabilities with binary executable files.

Previous surveys on reverse engineering tools are outdated [7], [10], [11], as most of them took place around or before the turn of the century. Additionally, no surveys have looked at tool evaluation from the angle of a novice reverse engineer. By capturing a new angle, we hope to add new, rich data to the study of reverse engineering tools, as well as assist newcomers in making the right choice of tool to use.

3. Methodology

This paper examines five tools from a novice's perspective. All tools are examined on a 32-bit Windows 7 virtual machine. The tools were chosen based on a few heuristics. Many novices may not wish to risk money on an unknown tool; therefore, for a tool to be considered for evaluation, it had to be free. Second, the tool had to be available. The list of tools from previous surveys included many academic tools which were


either no longer available for download, or whose websites were dead and no longer functioning. Naturally, a tool that could not be located could not be evaluated. Third, the tool had to run on the system in question. Fourth, the tool had to be easily discoverable. All tools were discovered through a Google search for 'reverse engineering tools.' While this might seem like a strange step, it was included as a proxy measure to discover the 'most popular' tools. We cannot say with certainty that the tools surveyed are the most popular, but we can say with certainty that they are the most easily discovered, as they appear on the first page of the Google search either as a link to their websites or in lists of reverse engineering tools. In a study of novice reverse engineers, 'most easily discoverable' may be a truer representation of their choice of tool than 'most popular,' as they may not have colleagues to suggest tools to them and must rely on the output of search engines or other self-discovery methods. The final tools used in the study are as follows:

• IDA Pro 5.0 is a commercial dynamic debugger and disassembler. The version used in this study is the free online version. While the current version of IDA Pro contains functions not included in the free version, this study is comparing free software; therefore, the other features are outside the scope of the paper [12].

• OllyDbg is an x86 dynamic debugger. At the moment, it can only decompile 32-bit code, but a 64-bit version is under active development [13].

• Immunity Debugger is a fork of OllyDbg from the 1.0 version, developed by a commercial team but released for free online [14]. It is a dynamic debugger that primarily focuses on exploit creation and defect detection, but can also be used for reverse engineering.

• CFF Explorer is a process explorer that is capable of more than just process exploring. Its primary purpose is to allow users to see how their software is connected with other libraries and functions, but it also includes many peripheral features [15].

• SysInternals is a suite of small tools that, when used together, can thoroughly examine a piece of code or a whole system. The tool suite was created by Microsoft developers to help test Microsoft software. It includes a rudimentary process explorer, a system crashing script, a background monitor, an autostart entry point manager, a file permissions checker, and many other utilities. Both 64-bit and 32-bit versions of the various tools are included in the package. Each tool offers its own insight into the system, though the tools do not work with one another or communicate [16].

Once the tools were chosen, they were investigated using a simple guideline. Previous guidelines were identified and examined to determine common metrics used in evaluating reverse engineering tools [7], [10], [11], [17]. The guidelines include questions on the tool’s content, usability, and interface, and can be seen in condensed form in Table 1. Durability of


the tool was also tested by attempting to make the tool crash. The purpose of this study was to compare the tools, but it was not expected that the study would find a best tool. Previous studies have failed to find a 'best tool' because of the heterogeneity of reverse engineering tools, as well as the vast variety of tasks that face reverse engineers, for which any one tool is unlikely to be all-encompassing [1], [7]. As such, the primary research questions for this paper were:

RQ1: How do some of the most commonly used reverse engineering tools compare to one another?

RQ2: Are the tools accessible to a novice user?

4. Results 4.1 Tool Functionality Of all the tools examined, IDA Pro offered the most functionality. Beyond its basic abilities, it offers multiple graph views of the code, including call flow graphs; allows users to view subgraphs of specific functions or calls; allows users to edit and print graphs; offers a notepad; allows users to comment on the code; has a calculator; allows users to edit the code, edit values used by IDA, and convert objects in the assembly code; and many other functions. Users can search for text, code, and many other types of items. Many of these functions were available both from the toolbars as well as the drop-down menus. In particular, the graphs included both a graph view that could be used actively in debugging, while an option on a toolbar allowed the user to view alternate graphs. Some of the functionality, such as the calculator, seemed unnecessary. Of all the tools, it requires the most memory on startup, and is the slowest to analyze the code. However, it has the most capabilities of all the tools presented. While not as functional as IDA Pro, Immunity Debugger and OllyDbg both offer dynamic debugging and disassembly as well. Immunity Debugger includes code graphs and Python scripting ability and therefore is more flexible than OllyDbg; however, OllyDbg also has some features that Immunity Debugger does not have, such as the ability to launch some file formats that Immunity Debugger cannot read. While they are not designed for static analysis, both provide a disassembler and can disassemble files, though the files are not as well annotated as IDA Pro’s files. Aside from the scripter and the minor differences mentioned above, OllyDbg and Immunity Debugger are nearly identical in terms of functionality. Both are relatively slim tools that complete what they are asked to do. However, their search feature is somewhat lacking. It can only find the single next occurrence of a command, which must be an exact match with the input to the search function. Interestingly, while Immunity Debugger advertises being lightweight, it actually uses more memory than OllyDbg. Both applications can analyze the code

None of the debuggers could attach to themselves. OllyDbg and Immunity hid this by not listing themselves in the list of attachable processes; IDA Pro listed itself, but gave a no-permissions error if the user tried to attach to it, even with elevated privileges. Attaching the debuggers to one another was a surefire way to crash them all. Beyond that, Immunity Debugger and OllyDbg crash if the process they attach to is closed, or if they are closed without detaching from a process. OllyDbg did not offer a detach button, and the user had to know the correct order of steps (close the window of the process being analyzed, then close the debugger window), which was not described anywhere, in order not to crash the tool. IDA Pro was more robust, recovering from the closing process and giving the user a series of menu boxes to choose what to do with the process if it was closed without detaching. While the menu boxes were confusing and often led to loops wherein IDA Pro would not close, they at least prevented the tool from crashing.

Immunity Debugger revealed the most information about the processes that were available to attach to when the attach window was opened. IDA only listed the process ID in decimal and the service name. OllyDbg reported the process ID in hex, the name, the window name, and the path. Immunity Debugger listed the process ID in decimal, the name, a service column that reported the services the process offered, a listening column that reported any open ports, the window name, and the path. In this way, the information to support any method of attaching to the process was revealed. However, it might be overwhelming for newcomers, as the information is not necessary to use the tool.

OllyDbg did not provide a command-line interface or a scripting language, while both of the other debuggers did. IDA Pro provides an internal programming language, and Immunity Debugger provides built-in support for Python scripting. While a novice might not be interested in scripting or command-line interfaces, these could become important features as they move toward mastery of the tool. However, OllyDbg's pure GUI interface simplifies the tool for novices by removing the unnecessary and potentially confusing scripting.

CFF Explorer is a powerful tool as well: it includes a process and windows viewer, a PE dump, a PE utility viewer, a PE rebuilder, and a dependency walker, among many other functionalities. Some of them go beyond the scope of a process explorer. For example, while it is not a dynamic debugger, it does have a "quick and dirty" disassembler on one of the tabs, as well as a memory dump, a hex editor, a dependency walker, and various other tools. Almost every value it displayed could be edited by double-clicking on the item, then typing the edits. As the tools are not linked to one another, it sometimes felt like a tool suite with an integrated GUI. CFF Explorer also includes a script-entering mode. Unlike the dynamic debuggers, it was capable of attaching to itself, and could even allow a user to edit its own values without crashing.


Table 1. A selection of the guidelines and the results found for each tool.

IDA Pro
  User guide: Full user guide, developer support, more behind paywall
  GUI: Multipaned interface; swaps to multi-window when debugging
  Tool type: Dynamic debugger and disassembler
  Ease of use: Easy
  Speed: Longest load time of all tools
  Static or dynamic: Both
  Date of last release: 3/23/2006 (free), 08/08/2016 (paid)
  Languages and formats: Many (more in paid)
  Developer: Commercial
  Platform: Windows 32/64-bit; Mac OS 64/86; Linux 32/64
  Memory usage (at startup): 16,512
  Extensible: Yes
  Active development: Yes
  Search: Many

Immunity Debugger
  User guide: Full user guide, developer support, and a forum
  GUI: Multipaned interface
  Tool type: Dynamic debugger (can disassemble)
  Ease of use: Medium
  Speed: Fastest of the dynamic debuggers
  Static or dynamic: Both (more dynamic)
  Date of last release: 9/15/2008
  Languages and formats: Many
  Developer: Commercial freeware
  Platform: Windows 32-bit
  Memory usage (at startup): 9,208
  Extensible: Yes
  Active development: Yes
  Search: Limited

OllyDbg
  User guide: Quick start guide and developer support
  GUI: Multipaned interface
  Tool type: Dynamic debugger (can disassemble)
  Ease of use: Medium
  Speed: Fast
  Static or dynamic: Both (more dynamic)
  Date of last release: 9/27/2013
  Languages and formats: Many
  Developer: Single developer
  Platform: Windows 32-bit (64-bit in development)
  Memory usage (at startup): 6,412
  Extensible: Yes
  Active development: Yes
  Search: Limited

CFF Explorer
  User guide: Developer support
  GUI: Tabbed interface; multi-window interface
  Tool type: Process explorer
  Ease of use: Easy
  Speed: Very fast
  Static or dynamic: Static
  Date of last release: 11/18/2012
  Languages and formats: PE
  Developer: Single developer
  Platform: Windows 64/86, full support for 32
  Memory usage (at startup): 1,460
  Extensible: Yes
  Active development: No
  Search: No

SysInternals
  User guide: Behind paywall; some videos and whitepapers available
  GUI: Depends
  Tool type: Tool suite
  Ease of use: Hard
  Speed: Immediate-fast
  Static or dynamic: Depends
  Date of last release: 2/17/2017
  Languages and formats: Takes no input
  Developer: Commercial freeware
  Platform: Separate versions of each tool for Windows 32- and 64-bit
  Memory usage (at startup): Varies
  Extensible: No
  Active development: Yes
  Search: No

The way the functionalities are presented makes it feel as though CFF Explorer is bloated, though it is hard to say that it is. It offers a more complete static analysis than the dynamic debuggers do, and each functionality is clearly delineated from the others through the tab setup. Additionally, it is much more lightweight than any of the debuggers. Of all the tools with an overarching GUI, it uses the least memory at startup. This is likely because it only displays what it has been asked to display. For example, if the item it is analyzing is a small C application without many dependencies or external objects, it only displays the disassembler tab and the overview tab, but does not include any tabs for dependencies the file does not have. On the other hand, if it is analyzing a large application with many dependencies, it will launch all its tabs. It does not disassemble an artifact until the user presses a button to ask it to disassemble; most other tabs also work on this principle. Thus, it does not waste runtime loading all the possible functionalities until the user asks for those specific functionalities.

Each tool in SysInternals has its own role. The whole suite is made up of small, precise tools, which do not do any more than necessary. SysInternals separates each tool into its own item so that the user has no bloat whatsoever and no unnecessary functionality included when all they wanted was one particular tool. On the flip side, if the user wants a wide-angle or whole-picture experience, SysInternals is not going to present that without additional effort on the user's part. All the tools could be used individually to get an idea of the system or program, but there is no overview available within the tool suite. Overall, SysInternals was similar to CFF Explorer in terms of functionality, but there was no integrated GUI.

4.2 Tool Accessibility to Novices
Overall, IDA Pro was the most intuitive tool in the study, which might be expected, as it was a commercially developed tool. All the options were clearly labeled with descriptive names. The interface was set up so that even though there were many windows, clicking one item highlighted its location across several windows and made it easier to understand. When it comes time to close the analysis, IDA Pro auto-detaches when a user closes the window, which makes the whole process of detaching less confusing and prevents the generation of an unnecessary error. The graph view it provides also gives a wide overview of the whole system and makes it easier to quickly grasp a general understanding of how the whole program is wired. Its static disassembly feature revealed the most information of any disassembler in the study, including some auto-generated comments, and presented the information in the clearest format. The instructions were color-coded for clarity, with values tagged according to the names associated with them.

IDA Pro's search function was perhaps the most questionable part of the software from a usability standpoint; IDA Pro provides several different search functions instead of a single search, as would be more natural. While it is understandable why these different searches are provided, as searching certain areas can take a very long time, it was confusing at first as to which search did what and how the searches were supposed to be used, potentially causing novices to waste time on the search function.


However, IDA Pro has a free and readily available user guide, so any user can easily find help. Additionally, while the multiple searches may be confusing at first, they are preferable to the very limited searching capabilities of Immunity Debugger and OllyDbg.

Immunity Debugger's and OllyDbg's interfaces were very similar, as Immunity Debugger is a fork of OllyDbg from the 1.0 version [15]. The most immediately visible difference between the two is that Immunity Debugger is rendered in black with neon green text, making it hard to read, while OllyDbg is in khaki with black text, making it a little more readable. Aside from that, they are nearly identical. Both tools have very simple toolbars with letters to identify each function. The letters were sometimes intuitive (i.e., B for Breakpoints), and sometimes not (i.e., K for Call stack). However, between the two, the letters all refer to the same items. The consistency made it easy to switch between the two; however, as they each offer more or less the same functionality, it is hard to imagine a situation where a user would have to swap between them. Neither shows hover text to help explain what the letters mean, but both display alt-text in a border at the bottom of the screen. As this is not a usual place to display alt-text, it may frustrate a novice confused by the letter-based toolbar options.

Immunity Debugger offers a few different ways to attach to a program, but unless the user knows beforehand which menu option equates to which method, it is confusing. It is not immediately obvious what the differences are between choosing the attach option and using the other methods. This may be because Immunity Debugger is intended for higher-level users who would be interested in different methods of attaching in order to use either its Python scripting capabilities or command-line navigation, giving the user finer-grained control than a GUI does. OllyDbg has only one method of attaching, which is easier for a newcomer to navigate. However, it does not offer Python scripting as Immunity Debugger does.

Both OllyDbg and Immunity Debugger had relatively limited annotations in the assembly code they generated. OllyDbg did not color-code its assembly code, while Immunity Debugger did, a helpful move that allows novices to quickly grasp a system. In Immunity Debugger, a graph view was available, but not integrated. IDA Pro included annotations, color-coded the assembly code, and provided a graph view integrated into the initial view to give the user an easy way to immediately recognize the code flow patterns.

Immunity Debugger had a detach button; IDA Pro offered users the option to detach from or terminate the process. OllyDbg offers no detach options in its GUI. This was a serious issue that affected user satisfaction; it was not immediately clear how to properly detach from a process without either crashing the process, crashing OllyDbg, or both. Unlike IDA, closing the tool or database without detaching from the process generated an error message box which did not assist the process of detaching in any way and could lead to the program crashing if handled wrong, and closing the OllyDbg window immediately crashed both OllyDbg and the process it was attached to. With OllyDbg, the only 'proper' way to detach without crashing anything is to close the process OllyDbg has attached to, though this sometimes generates errors as well. Immunity Debugger also gave some of these same errors, but the detach feature was available and the errors were easier to resolve. Overall, both OllyDbg and Immunity Debugger were easy to crash, while IDA was more difficult to crash.

Immunity Debugger has an available user guide and an active forum staffed by developers as well as users. OllyDbg has a quick-start guide with a short table of often-used functions, as well as a list of available plugins. Unfortunately, both OllyDbg's and Immunity Debugger's in-application Help functions are outdated and erroneously report that there is no help, which may leave a novice user at a dead end. On the other hand, IDA Pro has a user guide and a working in-application help function, as well as paid service packages.

Overall, the dynamic debuggers had the same type of interface: a large window with many smaller windows within it, one for each type of interesting output, which could be positioned around the larger window or layered over one another, with toolbars and menus above. The interface felt cluttered, and individual windows sometimes got lost in the clutter. However, given the amount of information needed to debug from assembly, it was actually helpful to have all the information on the screen despite the clutter. It allows several different items to be paneled on the screen beside one another at the same time, so the user can walk through how one step makes multiple alterations throughout the stack, names, logs, and other areas. IDA Pro's large background window becomes transparent when it attaches to a process, in order to allow the user to see and manipulate the process it is attached to, which was a useful function that the other two debuggers did not offer. With Immunity and OllyDbg, the large window stayed prominent on the screen, and the user had to juggle between the application's window and the debugger's window.

CFF Explorer was very intuitive to navigate. There was only one method to open a file, and once a file was opened, CFF Explorer read out all the relevant data. The initial screen, before any data was loaded, was very simple, with a button to load a file and an empty pane where the file would be loaded; unlike the debuggers, a user did not have to go hunting to find out how to load a file. Menus were simple, and only a few were available. The main window held a tabbed interface with multi-leveled file viewers and other data on the tabs; sometimes, navigating through the folders displayed in the file viewers could become annoying if they did not hold anything interesting. Additionally, flipping away from a tab reset that tab; if a user was six levels into a file map and moved to a new tab to look at a different kind of data, they would come back to find themselves back on the first level again. However, CFF Explorer also offers a multi-windowed interface where each window is persistent, so if a user anticipates swapping between tabs, they can open this view and see the items side-by-side without having to worry about resetting each page.


Navigating to this view was simple; it took the press of a single prominently displayed button. There was no user guide for the tool, but there were user guides for the scripting language the tool uses, for the extension repository, and for writing an extension. Overall, CFF Explorer was very intuitive as a tool, though the information it displayed was sometimes opaque without the specialized knowledge to understand it. Additionally, the author was not able to crash CFF Explorer, suggesting that it is very robust.

SysInternals was inconsistent in terms of usability. Some tools had easy-to-use GUIs; other tools ran invisibly. The tools were not intuitively named, nor were they labeled in any way based on the interface they used, be it command-line, GUI, or other. In terms of overall intuitive comprehension, SysInternals ranked the lowest. The tools in the suite that did offer GUIs were fairly intuitive to use; however, they all had different styles of interfaces with different designs and standards, so knowing one tool's interface would not help you navigate another tool's interface. Some tools executed silently and were meant to be caught by a program, or popped up command-line output. There was no how-to or README file to help explain the tools, and no user guide available. A book is available, but it is outdated and additionally hidden behind a paywall, which placed it beyond the scope of this study. The toolset does have some associated white papers and videos, but none of those is a user guide. The user would have to read all the papers and watch all the videos just in the hope of gleaning data relevant to using the tools. The author could not crash these tools either, but these tools did succeed at crashing the author's virtual machine.

5. Discussion
Of the dynamic debuggers, IDA stands out as the one with the most functionality and the best usability. It is the most intuitive for a novice reverse engineer. While it does take longer to load the code and is relatively heavyweight compared to either of the other two, it makes up for this by offering more readable and better annotated assembly code and a clearer layout. However, the other two tools are slimmer and faster at loading and analyzing code. If a user wants a full analysis of a tool, including both static and dynamic analysis, IDA Pro is the better choice, but if they are worried about memory usage and load times, then Immunity or OllyDbg may be the better tool.

Because of its more limited functionality (one way to attach, no scripting), OllyDbg is the better of OllyDbg and Immunity for newcomers; however, a novice who wants to eventually expand into more detailed analysis may prefer to start with Immunity Debugger instead. Additionally, while all three debuggers allow users to write new extensions, Immunity Debugger also allows users to execute Python scripts, which makes it more flexible than the other debuggers. However, the Python scripts may not be immediately transparent to a novice, so in terms of novice developers, OllyDbg may be the better choice, as it is simpler and relies only on its GUI for functionality.


IDA Pro has its own internal programming language, giving it some of the same flexibility as Immunity Debugger. As a user would have to learn that programming language, Immunity Debugger would be a better choice for novices who want to write scripts for the tools. If the user already knows Python, they can jump immediately into scripting without having to learn a new language. Immunity Debugger also includes some prewritten methods for Python scripting, making the process more accessible overall.

CFF Explorer is the most lightweight of all the comprehensive tools; if the user does not need a debugger, it provides functionalities the other tools do not provide and gives a clear view of the target process's dependencies, among other insights. Additionally, it is simple and intuitive to use, easily navigated by a novice, and robust. SysInternals, while lightweight, is best used by an experienced user who knows what they are looking for in a system. Its uses cannot be easily discerned through a point-and-click approach, and as the user manual is behind a paywall, it is also difficult to look up how to use it properly. The majority of the SysInternals suite is almost impossible to comprehend without foreknowledge or extensive research.

Novice reverse engineers could be assisted by providing tools with intuitive user interfaces, or by reworking existing tools to have more intuitive interfaces. A simple change would be to improve OllyDbg and Immunity Debugger by changing the letters in the toolbar to descriptive titles or images. In OllyDbg's case, it could further improve by providing color-coding as Immunity Debugger does. Immunity Debugger could present one method of attaching in its menus, and abstract the other methods to a second menu level or to the scripting language. CFF Explorer is largely intuitive; however, it would be preferable if it did not reset the tabs when the user clicked to a new tab, or if it warned users about this quirk and directed them to the multi-window interface. IDA Pro presents many toolbars and windows at startup, which can give the user an impression of clutter. If it presented fewer toolbars on startup, perhaps only the ones that included the most vital information, but included the option to enable more toolbars later on, it would give less of an impression of clutter while still allowing users quick access to all the functionality they wanted. The SysInternals tools that did have GUIs were reasonably comprehensible. If all the tools for which a GUI is reasonable had GUIs, and all the GUIs were based on the same format, then SysInternals would be more accessible to a novice.

The tools do not have to be simple to be accessible to a novice. Although IDA offered the most complexity of any dynamic debugger and disassembly tool, it was also the most usable of those tools thanks to its intuitive user interface. By presenting a user interface that was human-readable and well-annotated, IDA stood out as an easy tool to use, regardless of its advanced functionality.


Clearly marking advanced functionality, or making the simple functionality more reachable than the advanced, such as placing the simple functionality on a toolbar and the advanced functionality in menus, could assist a novice user in immediately determining which functions are of use to them. By providing better user interfaces overall, the tools will become more usable for both novices and experts. As several of the tools also offer command-line scripting, adding a more intuitive GUI should not prevent the expert user from using the tool the way they are familiar with; experts can continue to use the command line and other scripting interfaces, while the GUI can be used by novices.

6. Threats to Validity
Internal Validity. Only one novice evaluated the tools. Therefore, the results may differ with a different novice. The guidelines were developed to give an overview of the system from a novice perspective; however, there are doubtless functionalities that were left out. To the best of the author's ability, the guidelines capture the core functionalities of the tools.

The selection of tools may not truly reflect the most popular or the most easily discovered tools. However, it seems reasonable that a novice would search for tools on a search engine, so it is likely the tools are at least somewhat discoverable for novices.

External Validity. As the tools were chosen and evaluated by a single investigator who also developed the guidelines, the results are not generalizable. However, this paper presents the baseline for an investigation into modern tools, which other authors can use going forward to conduct their own investigations. Additionally, most of the chosen measures, such as what platform the tool runs on and whether or not it has a search feature, are essentially quantitative and would not change regardless of investigator. Only the qualitative elements of the guidelines would change with the investigator.

7. Conclusions
Overall, IDA Pro's free version is the best freeware available for novice reverse engineers. CFF Explorer, while not a dynamic debugger, offers an intuitive UI and a simple interface suitable for newcomers as well. Immunity Debugger and OllyDbg are slim tools that might be confusing to a novice, but nonetheless offer functionality and user interfaces through which the user can discover their features. SysInternals is best used by an expert, and may be overwhelming for a newcomer. Each tool has its role in reverse engineering; novices, however, are best off sticking to tools with a clear, intuitive GUI. Future tools that want to be accessible to novices should focus on simple and intuitive GUIs with descriptive labels, while not removing functionality from the tools.

References
[1] H. A. Müller, J. H. Jahnke, D. B. Smith, M.-A. Storey, S. R. Tilley, and K. Wong, "Reverse engineering: A roadmap," in Proceedings of the Conference on the Future of Software Engineering. ACM, 2000, pp. 47–60.
[2] A. von Mayrhauser and A. M. Vans, "From code understanding needs to reverse engineering tool capabilities," in Proceedings of the Sixth International Workshop on Computer-Aided Software Engineering (CASE'93). IEEE, 1993, pp. 230–239.
[3] E. J. Chikofsky and J. H. Cross, "Reverse engineering and design recovery: A taxonomy," IEEE Software, vol. 7, no. 1, pp. 13–17, 1990.
[4] G. Canfora and M. Di Penta, "New frontiers of reverse engineering," in 2007 Future of Software Engineering. IEEE Computer Society, 2007, pp. 326–341.
[5] R. Ferenc, Á. Beszédes, M. Tarkiainen, and T. Gyimóthy, "Columbus: reverse engineering tool and schema for C++," in Proceedings of the International Conference on Software Maintenance. IEEE, 2002, pp. 172–181.
[6] M. Hafiz and M. Fang, "Game of detections: how are security vulnerabilities discovered in the wild?" Empirical Software Engineering, vol. 21, no. 5, pp. 1920–1959, 2016.
[7] B. Bellay and H. Gall, "A comparison of four reverse engineering tools," in Proceedings of the Fourth Working Conference on Reverse Engineering. IEEE, 1997, pp. 2–11.
[8] H. Müller. Rigi: a visual tool for understanding legacy systems. [Online]. Available: www.rigi.cs.uvic.ca
[9] Y.-G. Guéhéneuc, "A reverse engineering tool for precise class diagrams," in Proceedings of the 2004 Conference of the Centre for Advanced Studies on Collaborative Research. IBM Press, 2004, pp. 28–41.
[10] M.-A. D. Storey, S. E. Sim, and K. Wong, "A collaborative demonstration of reverse engineering tools," ACM SIGAPP Applied Computing Review, vol. 10, no. 1, pp. 18–25, 2002.
[11] S. Tilley and S. Huang, "Evaluating the reverse engineering capabilities of web tools for understanding site content and structure: A case study," in Proceedings of the 23rd International Conference on Software Engineering (ICSE 2001). IEEE, 2001, pp. 514–523.
[12] Hex-Rays. IDA Pro. [Online]. Available: https://www.hex-rays.com/products/ida/index.shtml
[13] O. Yuschuk. OllyDbg. [Online]. Available: http://www.ollydbg.de/
[14] Immunity. Immunity Debugger. [Online]. Available: https://www.immunityinc.com/products/debugger/
[15] D. Pistelli. NTCore. [Online]. Available: www.ntcore.com/exsuite.php
[16] Microsoft. Windows Sysinternals. [Online]. Available: https://technet.microsoft.com/en-us/sysinternals/bb545021.aspx
[17] G. C. Gannod and B. H. Cheng, "A framework for classifying and comparing software reverse engineering and design recovery techniques," in Proceedings of the Sixth Working Conference on Reverse Engineering. IEEE, 1999, pp. 77–88.


An Approach to Analyze Power Consumption on Apps for Android OS Based on Software Reverse Engineering

E. Caballero-Espinosa, A. Payne, T. Atkison
Computer Science Department, University of Alabama, Tuscaloosa, AL, USA

Abstract— Managing power consumption efficiently in mobile devices is a challenge for software engineers who design and develop operating systems or applications for this market. Studies have been conducted to offer tools and techniques aimed at accomplishing this goal by implementing simulation, apps, and sophisticated power meters. However, they do not provide a practical methodology for students, practitioners or newcomers who want to analyze the source code of apps and reduce power consumption. In this paper, a new approach is presented that applies Software Reverse Engineering (SRE) to optimize the source code of mobile applications on Android OS and increase battery duration. The results demonstrate that the proposed lightweight methodology reduces the power consumption of a mobile application by up to 20%.

Keywords: Apps, Mobile Devices, Power Consumption Analysis, Software Reverse Engineering, Source Code Analysis.

1. Introduction
Handheld devices, such as smartphones and tablets, have become important tools for human beings, not only for communication but also to support basic daily activities, including organizing lists of tasks, geolocation, and reading news from the Internet. However, when the apps that support these activities are executed by users, the instructions in the source code generate processes that must be attended to by the processor. This constant interaction generates a degree of power consumption. Therefore, techniques and methods are needed to minimize power consumption on mobile devices.

The research presented in this paper will attempt to answer the following question: Can a flexible method be suggested to debug the source code of any Android mobile app by means of a Software Reverse Engineering (SRE) approach to aid in reducing power consumption? This effort is important to the SRE field because it has been identified as an emerging goal [1]. Therefore, a methodology based on SRE will be proposed in this research to analyze the source code of a mobile application and identify potential changes that could contribute to reducing power consumption. In addition, this analysis is crucial in the context of mobile devices and sensor networks [2], [3]. Moreover, the results will offer hints to recommend changes for future maintenance tasks, making power consumption performance more efficient. In addition, this study will provide additional information to the current comprehension of mobile applications, which is not accessible by using other methods.

The remainder of this paper is structured as follows. In Section 2, a succinct background about SRE will be presented. In Section 3, the methodology used in this research will be described. In Section 4, the results will be presented. In Section 5, the limitations and threats to validity will be discussed, as well as the implications for practitioners and researchers, with suggestions for future work. Finally, conclusions will be drawn in Section 6.

2. Background
2.1 SRE and Legal Aspects
Chikofsky and Cross [4] define SRE as the process of analyzing a subject system to identify the system's components and their inter-relationships and to create representations of the system in another form or at a higher level of abstraction. In other words, SRE is a process of examination, not a process of change or replication. Furthermore, SRE inverts the traditional software life cycle, taking place at the design stage. Therefore, when a software engineer performs reverse engineering, software artifacts like source code, executables, project documentation, or test cases are the source of information [1], [4].

Despite the fact that researchers and practitioners have applied SRE to analyze software products for different purposes, such as teaching and fixing bugs, it is very important to keep in mind some practical legal aspects before undertaking an SRE project. Behrens and Levary [5] recommend straightforward steps in order to avoid legal conflicts like Sega Enterprises Ltd. v. Accolade, Inc. First, the attorney should advise the client to obtain an authorized copy of the software. The second step indicates that attorneys should advise their clients to be sure that reverse engineering is the only mechanism to get information from the software under evaluation. The third step indicates that engineers should be advised to reverse engineer only the portions of the original program needed to decipher the precise functional elements required for the new program. The next step suggests that software developers should be sure to divide their reverse engineering efforts between two groups of engineers, one group to reverse engineer the program, the other to develop the new software. Finally, software developers should conduct research on the product to be reverse engineered to ensure that patent law does not provide protection


for the particular process or function that is to be reverse engineered and used in a new program.

2.2 Android Applications and Power Consumption Analysis
The usage of mobile devices such as smartphones and tablets has increased dramatically in the past decade [6]. They have also become indispensable personal gadgets that support activities in almost every aspect of our lives [7]. Because of the fast advancement of mobile technologies and platforms, as well as the enthusiasm of individual developers, a wide range of mobile applications, commonly named apps, has been created to serve and make daily life more convenient. These apps are distributed as special files, called Application Package Files (APK), with the .apk file extension [8].

Ongkosit and Takada [9] state that responsiveness has been an important quality factor in Android apps, since it directly affects the user experience. However, Couto et al. [10] argue that the Android ecosystem was designed to support many different mobile (and non-mobile) devices, ranging from smart-watches to TVs. As a result, a power consumption model for Android needs to consider all the main hardware components and their different states (for example, CPU frequency, percentage of use, etc.).

Previous studies have made important contributions to the analysis of power consumption in mobile-system devices and Android OS by using a range of methods and tools. They include simulating a variety of environments or test scenarios simultaneously [11], the Arduino Duemilanove board [12], apps like PowerTutor [13], commercial devices like the Power Monitor [14], and web-based platforms for measurement and analysis of power consumption on multiple mobile devices [15].

Furthermore, other researchers have analyzed the power consumption of apps based on the instructions executed. Couto et al. [16] applied API-based software for monitoring and analyzing power consumption for the Android ecosystem. Hao et al. [17] combined program analysis and per-instruction energy modeling. However, they suggest neither a SRE method nor insights about code optimization to reduce the power consumption of an app.

3. Methodology
In this section, the experimental setup and how the study was conducted will be discussed.

3.1 Context
The focus of this research is on developing and providing a methodology for power consumption analysis based on SRE. This proposal is an option for those app developers who cannot afford to acquire instruments, whether expensive ones like the Monsoon Power Monitor [18] or more affordable ones, for tracking the power consumption of their apps in order to make changes to their source code.

3.2 Ethics
To put into practice the concepts and legal aspects of SRE, an app developer was contacted in order to get an authorized copy of an app's source code. The source code was the subject of this research. In this case, the developer was informed about the goal of the study, the techniques that would be applied, and what sections of the code would be used in the results section. After the explanation, the developer agreed to share the app's source code.

3.3 Study Object and Subsystem

Fig. 1. Welcome screen and tutorial.

The source code used was from a music-player prototype that works by gesture recognition (Figure 1), which was developed in Android Studio 2.3.1. This programming environment is an Integrated Development Environment (IDE) for Android app development, based on IntelliJ IDEA [19]. The source code contains 350 lines, and three classes comprise this mobile application: MainActivity, MainInstructions, and POP. This app allows users to manage music files and execute the main functions. They include play (shaking), forward (passing the right hand in front of the phone), backward (passing the left hand in front of the phone), stop (double tap) and volume control (swiping the screen up or down). In the case of the MainActivity class, only the first three functions will be evaluated, because the actions to activate volume control and stop require swiping the screen or continuous taps. Consequently, sophisticated instruments are needed to measure these power consumption cases.
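The prototype's source is not reproduced in the paper; purely to make the gesture-to-action wiring described above concrete, the sketch below shows how a shake-to-play handler of this kind might look in the MainActivity class (the prototype does register a SensorEventListener, as the inspection warnings reported later suggest). The threshold value, field names, and use of MediaPlayer are assumptions, not the prototype's actual code.

    import android.app.Activity;
    import android.hardware.Sensor;
    import android.hardware.SensorEvent;
    import android.hardware.SensorEventListener;
    import android.media.MediaPlayer;

    public class MainActivity extends Activity implements SensorEventListener {
        private static final float SHAKE_THRESHOLD = 15.0f; // m/s^2, hypothetical value
        private MediaPlayer mediaPlayer;                     // assumed playback object

        @Override
        public void onSensorChanged(SensorEvent event) {
            float x = event.values[0], y = event.values[1], z = event.values[2];
            double magnitude = Math.sqrt(x * x + y * y + z * z);
            // Shake gesture: start playback, mirroring the "play (shaking)" function above.
            if (magnitude > SHAKE_THRESHOLD && mediaPlayer != null && !mediaPlayer.isPlaying()) {
                mediaPlayer.start();
            }
        }

        @Override
        public void onAccuracyChanged(Sensor sensor, int accuracy) {
            // No action needed for this sketch.
        }
    }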


Table 1. Components of the Samsung Galaxy Express Prime.
Processor: Quad-core, 1300 MHz, ARM Cortex-A7, 28 nm
LCD Display: Super AMOLED
Wi-Fi: IEEE 802.11 b/g/n
GPS: A-GPS
Cellular: Cricket Mobile USA: GSM/UMTS/LTE/HSPA
Audio: Built-in microphone and speaker
Battery: Internal rechargeable Li-ion, 2600 mAh
Operating System: Android 6.0 Marshmallow

Murmuria et al. [20] note that subsystems are each dependent on various factors that determine their power consumption, including the amount of time they are kept alive but, more importantly, the state of the hardware and other environmental factors. Table 1, based on Zhang et al. [21], details the Samsung Galaxy Express Prime device used for monitoring the power consumption of the study object.

3.4 Power Consumption Measurement
Currently, most power measurements are based on hardware meters used to confirm power side channels and obtain specialized measurements on mobile phones. Nonetheless, these instruments are still not applicable for this study because they are apparently not a standard auxiliary accessory for smartphones [18]. However, Android is an operating system that makes power statistics accessible. In general, instantaneous power figures can be calculated based on the voltage and current readings of the BMU (Battery Monitoring Unit). For this research, PowerTutor was used to obtain measurements before and after possible changes in the source code. This app has been widely used [21] and is freely available on Google Play (Figure 2). In addition, it provides accurate estimates of energy consumption in real time for the hardware components, including the CPU and LCD display, as well as the GPS, Wi-Fi, audio and mobile network interfaces. For apps, it calculates how much power each app uses by simulating the hardware state as though the app were running alone [21].
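PowerTutor is the measurement tool used in this study; the snippet below is only an illustration of how the BMU voltage and current readings mentioned above can be obtained directly through Android's standard BatteryManager API (API level 21 or later), should a reader want raw readings without a third-party app.

    import android.content.Context;
    import android.content.Intent;
    import android.content.IntentFilter;
    import android.os.BatteryManager;

    public final class InstantPowerReader {
        /** Returns a rough instantaneous power estimate in milliwatts, or -1 if unavailable. */
        public static double readInstantPowerMilliwatts(Context context) {
            BatteryManager bm = (BatteryManager) context.getSystemService(Context.BATTERY_SERVICE);
            // Instantaneous battery current in microamperes (sign convention varies by device).
            long currentMicroA = bm.getLongProperty(BatteryManager.BATTERY_PROPERTY_CURRENT_NOW);
            // Battery voltage in millivolts, from the sticky ACTION_BATTERY_CHANGED broadcast.
            Intent status = context.registerReceiver(null, new IntentFilter(Intent.ACTION_BATTERY_CHANGED));
            int voltageMilliV = (status == null) ? -1 : status.getIntExtra(BatteryManager.EXTRA_VOLTAGE, -1);
            if (voltageMilliV <= 0) {
                return -1;
            }
            // P = V * I: (mV / 1000) volts times (uA / 1000) milliamperes gives milliwatts.
            return (voltageMilliV / 1000.0) * (Math.abs(currentMicroA) / 1000.0);
        }
    }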

Fig. 2. PowerTutor interface.

3.5 Optimizing the Performance Based on Reverse Engineering
Regarding source code, Couto et al. [10] recommend monitoring application methods, since they are the logical code unit used by programmers to structure the functionality of their applications. Nonetheless, classes will still be monitored here, since the app prototype only contains three main classes.

In order to optimize the source code, Binh et al. [22] propose reverse engineering and the substitution of equivalent expressions as an improved local optimization method for high-level source code. They also highlight that the technique is generic because it is independent of the processor. In addition, by using this proposal, the app only needs to calculate an equivalent expression once, and the execution time decreases. They also propose the use of a directed acyclic graph (DAG). The complete optimization process of a source code by replacing equivalent expressions and using a DAG is as follows.

3.5.1 Identifying and replacing equivalent expressions
This stage is based on mathematical properties. The first step is to scrutinize each logical unit to determine expressions and subexpressions. After that, the equivalent expressions should be identified. Finally, the equivalent expressions should be replaced by a representative one. Table 2 illustrates this step.

Table 2. Replacing redundant expressions.
Original code: x = (a + b * c) * 10; z = a + c * b + d / e
After replacement: aux = a + b * c; x = aux * 10; z = aux + d / e

3.5.2 Constructing a DAG
A DAG for each logic unit can be described as follows: leaf nodes are labeled with unique identifiers like variables or constants; internal nodes are labeled with operation symbols; arcs represent relationships between operations and operands; and internal nodes represent computed values held by the identifiers of these nodes (see Figure 3).

When common subexpressions, dead code, and unused variables are removed, this action may improve the app performance and minimize power consumption levels. Figure 4 illustrates the procedure adapted to this case study.
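As a concrete illustration of the replacement shown in Table 2 and of the dead-code removal mentioned above, the following self-contained Java sketch contrasts a fragment before and after optimization; the variable names are hypothetical and do not come from the study object.

    /** Illustrative only: mirrors the replacement in Table 2; all names are hypothetical. */
    public final class EquivalentExpressionDemo {
        // Before optimization: (a + b * c) is evaluated twice and 'unused' is dead code.
        static double[] before(double a, double b, double c, double d, double e) {
            double x = (a + b * c) * 10;
            double z = (a + b * c) + d / e;
            double unused = a * b; // never read afterwards; a candidate for removal
            return new double[] { x, z };
        }

        // After optimization: the shared subexpression is computed once and the dead assignment is gone.
        static double[] after(double a, double b, double c, double d, double e) {
            double aux = a + b * c;
            return new double[] { aux * 10, aux + d / e };
        }

        public static void main(String[] args) {
            System.out.println(java.util.Arrays.toString(before(1, 2, 3, 4, 5)));
            System.out.println(java.util.Arrays.toString(after(1, 2, 3, 4, 5)));
        }
    }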


Fig. 3. DAG for an equivalent expression in a logic unit.

Fig. 4. Source-code optimization process using equivalent-expression replacement.

3.6 Getting started
With the research question in mind, the construction of the pilot case study was designed and a protocol created. Then, a developer was contacted to get the source code of an app following SRE legal practices. It is important to indicate that only the app, its description and the programming environment were requested for this experiment; software requirements or specific features were not asked for, to avoid threats to validity. After setting up the programming environment (Android Studio), the app was imported and its execution verified according to the descriptions provided by the developer. The subsystem was then set up by enabling its development options, like USB debugging, and a demo of the app was installed for the experiments (Figure 5). The joule, the amount of energy equivalent to one watt applied for one second, was chosen as the unit for the power consumption measurements from PowerTutor.

For the source code analysis, a flexible design was created to allow both engineers and developers to optimize the source code using reverse engineering techniques like replacing equivalent expressions and constructing DAGs. For this first approach, an app was investigated because it is more feasible to get and evaluate its source code than that of robust software.

Fig. 5. App demo installed.

Table 3. Study object classes.
MainActivity: Contains the action of each gesture.
MainInstructions: Makes available the welcome screen and graphical tutorial.
POP: Makes available the "i" (help) button.

4. Results
The analysis identified three main classes as logic units in the source code. Two of them control basic functions, such as the help button and the welcome screen that also shows a concise tutorial. However, MainActivity, the main class, contains the instructions for each gesture and other actions. Table 3 shows the description of each class.

As for power consumption, PowerTutor was used to obtain measurements during the execution of the app before and after modifications to the source code in order to track power consumption for this study object. However, PowerTutor provides cumulative measurements according to the processes executed by each app; as a result, this information is not shown per instruction or line of code executed. In this case, the power consumption of each class or function in the source code is estimated by subtracting the measurement recorded before an action from the measurement recorded after it. An action is defined as a tap or gesture that activates the instructions in the class.

$PC_{action} = m_{n+1} - m_{n-1}$    (1)

For Equation 1, let $PC_{action}$ be the power consumption estimated for each action. Then, $m$ is the measurement obtained from PowerTutor, and $n$ is the number of taps required to generate one event on the app. For example, the first tap activates the welcome screen and generates a measurement of 75 mJ. The next action is playing the song by shaking the device, which generates a cumulative measurement of 185 mJ. Applying Equation 1 with $n$ equal to two, the measurement for the welcome screen (MainInstructions class) is 70 mJ and 110 mJ for playing the song (gesture recognition in the MainActivity class). Table 4 shows the first set of measurements before modifications to the source code.

Table 4. First set of power-consumption measurements.
Classes            Power Consumption (mJ)
MainActivity       1671
MainInstructions   88
POP                78

Analysis of the source code was performed using the Code Inspector tool available on the Android Studio toolbar (Figure 6). After running the tool, the inspection results displayed a number of warnings without significant impact on the execution of the study object. Table 5 shows the results of this inspection. However, each warning was evaluated to find common subexpressions, dead code, or unused variables that could affect the power consumption.

Fig. 6. Code analyzer option on Android Studio.

After analysis, clusters 1 and 5 were determined to not require changes because these warnings pinpoint phrases or variables written in Spanish. In addition, the warnings about declaration suggested some variables could be private; nonetheless, no changes were made since they do not represent a technical error in Java. The same was determined for cluster 6. Regarding cluster 4, an unused import found during the code inspection was deleted. For cluster 7, both control-flow warnings were simplified. Because the source code did not have common mathematical subexpressions, this reverse-engineering technique could not be applied.

Upon finishing the analysis and optimization process, the study object was compiled and tested on the subsystem again. Table 6 shows the second set of power-consumption measurements estimated by PowerTutor. The new measurements indicate a decrease of power consumption at runtime.

Regarding the MainActivity class, the three actions under evaluation also presented improvements. Table 7 shows the results.

5. Discussions
In this section, the limitations and validity of the results will be discussed, as well as the implications for practitioners and researchers.

5.1 Validity and Limitations
To increase the credibility, an app developer was contacted to obtain the study object. Particular requirements or features were not requested. Moreover, PowerTutor was used to obtain accurate power consumption measures. However, one limitation regarding this tool is the cumulative format in which it reports the power consumption measurements for each app. Because of that, a simple equation was defined to estimate the power consumption of each class or function in the source code based on the actions that generate an event. Therefore, it could pose threats to validity.

This research is a small-scale case study that involves one app. Therefore, the results are not generalizable to other apps, since developers could have taken into account one or more contexts, used different programming languages, and written thousands of lines of code. In fact, the study design was a deliberate decision because it addresses a need identified in a previous publication [1] and has not been investigated from a SRE perspective. Furthermore, the existing literature about power consumption analysis on both devices and apps did not cover the topic applying SRE. This study also contributes to the "green computing" perspective by finding alternative methods to reduce the utilization of power resources.

5.2 Implications for Practitioners
This study was designed to provide a flexible methodology based on SRE. The results show that identifying and adjusting issues like unused variables, dead code or equivalent expressions can reduce power consumption. The following steps are a guideline to replicate our approach for other mobile applications for Android OS. Figure 7 shows the flowchart of this guideline.

1) Read the practical legal aspects of SRE.
2) Contact the app developer and apply the practical legal aspects of SRE.
3) Set up the programming language environment on a computer and the subsystem for app tests on your mobile phone.
4) Identify logic units of interest in the source code, like methods or classes, and create a register (see Table 6).
5) Download PowerTutor on the subsystem.


6) Take measurements of the app at runtime on the subsystem and apply Equation 1 if needed.
7) Create a table to register the measurements obtained from PowerTutor.
8) Optimize the source code by applying Section 3.5, described in the methodology section.
9) Set up the subsystem with the new version of the app after the source code modifications.
10) Take the second set of measurements and compare them against the first set.

Fig. 7. Guideline to apply power consumption analysis based on SRE.

Table 5. Results of the source-code inspection.
1. Cluster: Internationalization; Type: Text View; Quantity: 6; Example: String literal in 'setText' cannot be translated. Use Android resources instead.
2. Cluster: Declaration; Type: Declaration access; Quantity: 8; Example: They can be private.
3. Cluster: Imports; Type: Unused import; Quantity: 1; Example: Unused import 'import android.view.keyEvent;'
4. Cluster: Probable bugs; Type: Unused assignment; Quantity: 1; Example: Variable 'listOfSensorOnDevice' is never used.
5. Cluster: Spelling; Type: Typo; Quantity: 10; Example: Words written in Spanish, like 'Detenido' (stopped).
6. Cluster: Verbose or redundant code constructs; Type: Redundant type cast; Quantity: 2; Example: Casting 'this' to 'SensorEventListener' is redundant.
7. Cluster: Control flow; Type: If statements can be simplified; Quantity: 2; Example: if (prueba==true) {...

Table 6. Second set of power-consumption measurements.
Classes            Power Consumption (mJ), Before / After
MainActivity       167 / 122
MainInstructions   88 / 70
POP                78 / 73

Table 7. Measurements on MainActivity functions.
Functions    Power Consumption (mJ), Before / After
Play         69 / 48
Forward      47 / 38
Backward     51 / 36

5.3 Implications for Research
Although this case study is a small-scale design, it provides interesting results and a flexible methodology to analyze the power consumption of apps following SRE concepts and techniques. In this research, measurements of power consumption were presented for logic units in source code. Future research could estimate the power consumption of the most commonly implemented logic units from the most popular IDEs used to develop apps, such as Android Studio. From these results, it would be possible to generate metrics about power patterns and power consumption for each logic unit. It would also provide elements for a robust security analysis, and allow developers to create programming strategies to prevent power side-channel attacks [18].

Furthermore, the methodology could be applied to analyze the source code of robust apps from different contexts and use software to automate this task. It will also include experiments on diverse mobile devices and processors. In this case, CodeSurfer and Bauhaus could be acquired for this purpose, because the source code of such apps comprises thousands of lines of code and analyzing it is a time-consuming task for humans.

6. Conclusions
The results of research conducted to study power consumption on mobile applications were presented based on source code analysis following SRE techniques. From these results, a preliminary methodology was built, explaining how to apply the practical legal aspects of SRE and perform a power consumption analysis using PowerTutor. These results answered the research question of this study. Furthermore, the results provide a complete description of how a practitioner can apply SRE to evaluate source code and apply changes to make an app more efficient in terms of CPU use, power consumption and lines of code. As far as we know, this is the first study to address this topic in SRE. While this methodology may not be complete, this approach can be modified and enriched by including other application contexts, sizes of source code, programming environments, and affordable SRE tools, not only for researchers but also for practitioners. Therefore, the proposed approach provides a well-founded starting point to keep studying this topic from a SRE perspective.

References
[1] G. Canfora, M. Di Penta, and L. Cerulo, "Achievements and challenges in software reverse engineering," Communications of the ACM, vol. 54, no. 4, pp. 142–151, 2011.
[2] K. Sohrabi, J. Gao, V. Ailawadhi, and G. J. Pottie, "Protocols for self-organization of a wireless sensor network," IEEE Personal Communications, vol. 7, no. 5, pp. 16–27, 2000.
[3] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "Wireless sensor networks: a survey," Computer Networks, vol. 38, no. 4, pp. 393–422, 2002.


[4] E. J. Chikofsky and J. H. Cross, "Reverse engineering and design recovery: A taxonomy," IEEE Software, vol. 7, no. 1, pp. 13–17, 1990.
[5] B. C. Behrens and R. R. Levary, "Practical legal aspects of software reverse engineering," Communications of the ACM, vol. 41, no. 2, pp. 27–29, 1998.
[6] R. M. Frey, R. Xu, and A. Ilic, "Mobile app adoption in different life stages: An empirical analysis," Pervasive and Mobile Computing, vol. 40, pp. 512–527, 2017.
[7] H. Cao and M. Lin, "Mining smartphone data for app usage prediction and recommendations: A survey," Pervasive and Mobile Computing, vol. 37, pp. 1–22, 2017.
[8] A. Tongaonkar, S. Dai, A. Nucci, and D. Song, "Understanding mobile app usage patterns using in-app advertisements," in International Conference on Passive and Active Network Measurement. Springer, 2013, pp. 63–72.
[9] T. Ongkosit and S. Takada, "Responsiveness analysis tool for Android application," in Proceedings of the 2nd International Workshop on Software Development Lifecycle for Mobile. ACM, 2014, pp. 1–4.
[10] M. Couto, T. Carção, J. Cunha, J. P. Fernandes, and J. Saraiva, "Detecting anomalous energy consumption in Android applications," in Brazilian Symposium on Programming Languages. Springer, 2014, pp. 77–91.
[11] U. Bareth, "Simulating power consumption of location tracking algorithms to improve energy-efficiency of smartphones," in 2012 IEEE 36th Annual Computer Software and Applications Conference (COMPSAC). IEEE, 2012, pp. 613–622.
[12] R. Trestian, A.-N. Moldovan, O. Ormond, and G.-M. Muntean, "Energy consumption analysis of video streaming to Android mobile devices," in 2012 IEEE Network Operations and Management Symposium (NOMS). IEEE, 2012, pp. 444–452.
[13] A. Vieira, D. Debastiani, and C. Mattos, "An analysis of power and performance of applications for mobile devices with Android OS," in South Symposium on Microelectronics, 2012.
[14] M. Y. Malik, "Power consumption analysis of a modern smartphone," arXiv preprint arXiv:1212.1896, 2012.
[15] S. A. Carvalho, R. N. Lima, D. C. Cunha, and A. G. Silva-Filho, "A hardware and software web-based environment for energy consumption analysis in mobile devices," in 2016 IEEE 15th International Symposium on Network Computing and Applications (NCA). IEEE, 2016, pp. 242–245.
[16] M. Couto, J. Cunha, J. P. Fernandes, R. Pereira, and J. Saraiva, "GreenDroid: A tool for analysing power consumption in the Android ecosystem," in 2015 IEEE 13th International Scientific Conference on Informatics. IEEE, 2015, pp. 73–78.
[17] S. Hao, D. Li, W. G. Halfond, and R. Govindan, "Estimating mobile application energy consumption using program analysis," in Proceedings of the 2013 International Conference on Software Engineering. IEEE Press, 2013, pp. 92–101.
[18] L. Yan, Y. Guo, X. Chen, and H. Mei, "A study on power side channels on mobile devices," in Proceedings of the 7th Asia-Pacific Symposium on Internetware. ACM, 2015, pp. 30–38.
[19] Android. Android developer resources. [Online]. Available: https://developer.android.com/studio/intro/index.html
[20] R. Murmuria, J. Medsger, A. Stavrou, and J. M. Voas, "Mobile application and device power usage measurements," in 2012 IEEE Sixth International Conference on Software Security and Reliability (SERE). IEEE, 2012, pp. 147–156.
[21] L. Zhang, B. Tiwana, Z. Qian, Z. Wang, R. P. Dick, Z. M. Mao, and L. Yang, "Accurate online power estimation and automatic battery behavior based power model generation for smartphones," in Proceedings of the Eighth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis. ACM, 2010, pp. 105–114.
[22] N. N. Binh, P. Van Huong, and B. N. Hai, "A new approach to embedded software optimization based on reverse engineering," IEICE Transactions on Information and Systems, vol. 98, no. 6, pp. 1166–1175, 2015.


TTExTS: A Dynamic Framework to Reverse UML Sequence Diagrams From Execution Traces

X. Zhao, A. Payne, T. Atkison
Computer Science Department, University of Alabama, Tuscaloosa, AL, USA

Abstract— Most approaches for reversing UML diagrams are static and highly dependent on the source code. Static analysis is time consuming and error prone. This paper extracts UML sequence diagrams in a dynamic fashion, reversing them from the running application rather than from the source code. To achieve this, a dynamic framework for reversing UML sequence diagrams, called TTExTS, is proposed. TTExTS contains three subsystems: trace collection, trace transformation and sequence diagram generation. Results show that TTExTS can reverse UML sequence diagrams from system execution in reasonable time, with less than 2 seconds of processing time. The diagrams produced are of high quality, which will allow stakeholders to delve deeper into the behavioral aspects of applications without manually analyzing the code.

Keywords: Reverse Engineering, Model Driven Engineering, UML Sequence Diagram, Execution Trace.

1. Introduction

Unified Modeling Language (UML) is a standard modeling language for object-oriented software design and analysis. There are several advantages to adopting UML during software system implementation. For example, many diagrams generated during UML design, such as UML class diagrams, UML use-case diagrams, and UML sequence diagrams, can be employed as the basis for system testing.

UML sequence diagrams describe interactions between different objects in chronological order. They are highly readable, and UML specifications are largely sequence-diagram centric. Another important feature of UML sequence diagrams is that they have a strong connection with reverse engineering: when there are discrepancies between UML sequence diagrams and system code, reverse engineering can be used as a tool to retrieve more accurate diagrams. Figure 1 shows a simple UML sequence diagram for a user login use-case.

Fig. 1 A SIMPLE UML SEQUENCE DIAGRAM SHOWING USER LOGIN CASES

There are two ways to perform the retrieval: static and dynamic. A static approach refers to the extraction of the diagram from system code, while a dynamic approach performs the process during system execution at runtime. Both approaches have advantages and disadvantages. In the static approach, developers need access to the system code, which raises several issues when the code is unavailable. Besides, when the system is large, analyzing the code is a time-consuming process. In this paper, we are interested in a dynamic reverse fashion for the following reasons:

• Dynamic analysis is closer to the nature of a sequence diagram - a sequence diagram represents activities with interaction; the diagram, in fact, is "dynamic".
• It removes the restriction imposed on the reversing process by legacy systems. In dynamic reversing, sequence diagrams can be extracted from system execution, and the details behind the system execution are not needed.
• Dynamic approaches save retrieval time and reduce the possibility of bugs and errors compared with static analysis, which is mainly done manually.

Similar to previous efforts (e.g. [1] [2] [3]), this work is steered by the core issues in dynamic reversing of sequence diagrams. These issues are realized in the following questions:

• How will execution traces be obtained?
• How are the collected traces processed?
• How can UML sequence diagrams be generated from these traces?

An execution trace, according to the definition provided by Oracle1, "collects information about what is happening in your program and displays it". Though there is no unified definition of a trace or of what traces should look like, an execution trace has to include three items: caller, method, and callee. That is to say, one execution trace needs to clarify which object invokes which method on which target. Tewfik et al. [4] defined an execution trace as a triplet of caller, method, and callee, which reflects the essence of execution traces.

In this paper, the above three questions are solved with different approaches. For trace collection and sequence diagram generation, existing tools - InTrace and WebSequenceDiagrams - are used. For trace processing, an approach using trace transformation and trace merging is proposed. These two steps form a bridge between trace collection and UML sequence diagram generation. To justify the framework, a prototype based on Java was implemented. An empirical case study shows that TTExTS can extract UML sequence diagrams from system execution.

The rest of the paper develops as follows: Section 2 introduces the background and related work of this study, including approaches that handle execution traces for sequence diagrams, different approaches for reversing UML sequence diagrams, and some existing reverse engineering tools. Section 3 gives an overview of the proposed framework. Sections 4 through 6 discuss the main steps in TTExTS. In Section 7, a case study is presented to validate the framework. Finally, in Sections 8 and 9, discussions and conclusions are drawn.

1 https://docs.oracle.com/cd/E19205-01/819-5257/blafk/index.html


2. Related Work

Execution traces are an important intermediate artifact in our approach. In this section, some basic background on execution traces, and how execution traces are obtained and analyzed for the reverse engineering of sequence diagrams in the existing literature, is explored. Second, since some methods extract execution traces in static or hybrid ways, these static/hybrid methods are discussed after execution trace extraction. Finally, some existing tools for reverse engineering of sequence diagrams are presented.

2.1 Execution Traces for Sequence Diagrams

An execution trace is an important aspect of dynamic program comprehension. In Tewfik's work [4], the authors defined an execution trace as a sequence of method invocations. In a method invocation, a caller invokes a callee through certain method(s). Execution traces can be applied in many areas, such as system understanding [5], performance analysis [6] [7], debugging and testing [8], and reverse engineering [4]. There are also efforts on how to extract execution traces at system runtime [9]. For this research, the focus is on dynamically retrieving execution traces for the reverse engineering of sequence diagrams. The next subsection discusses some static and hybrid methods for reversing sequence diagrams.

Alalfi proposed a dynamic approach [10] for sequence diagrams from execution traces. Though this approach is dynamic, it only works for web applications; the authors did not generalize the framework to other fields. Similarly, Zaidman et al. [11] analyzed the connection between user and server and collected the traces during the communication. However, this method only applies to Ajax-based applications. In Ishio's work [12], the authors proposed a quite novel idea to achieve this goal - it divides a long execution trace into several pieces and combines these pieces to generate the features in the sequence diagrams. However, this idea needs empirical evaluation of its performance, and the authors did not conduct experiments on the tool. The same issue also appears in another work [13]. Though Ziadi et al. [4] provided an evaluation of their tool, the evaluation lacks comparison; one cannot determine its performance in contrast with other current state-of-the-art tools. In Ng's work [14], the authors proposed MoDec, a dynamic approach to reversing sequence diagrams. Though the authors evaluated the tool and verified that it outperformed other tools, the method depends heavily on design motifs, which are provided by design patterns; it is hard to finish the reverse when the design pattern is unclear. COSMOPEN, a tool discussed in another work [15], also suffers from a high dependency on specific situations: it needs the correct capture of the stacks stored during execution.

2.2 Static and Hybrid Reverse Engineering of Sequence Diagrams

Compared with execution-trace-based reverse engineering of sequence diagrams, some works have attempted to employ static or hybrid (static and dynamic) approaches. Static approaches need support from the source code. In Imazeki's study [16], the authors managed to extract sequence diagrams by analyzing source code. However, the approach is only applicable to Web applications. Likewise, Roubtsov [17] and Serebrenik [18] investigated this issue using the idea of interceptors. The introduction of interceptors can be helpful for reversing because it clearly separates core functions in JavaBeans. However, it is only applicable to Enterprise JavaBeans applications and lacks generalizability. Wang et al. [19] proposed a static method for sequence diagram reverse engineering based on Object Constraint Language (OCL) mapping. This method applies more meta-modeling ideas, which brings more consistency, completeness, and higher-level abstraction. However, the authors did not provide a practical tool, making the idea only theoretically feasible.

Recently, there is a trend that more and more methodologies for reversing UML sequence diagrams adopt a hybrid approach, which combines both static and dynamic analysis. The motivation for the hybrid approach is that execution traces are sometimes voluminous and unordered [20]; it is more efficient if only meaningful classes and methods are analyzed during execution. This needs the support of source code comprehension, which is a static aspect of system understanding. The basic idea behind these works is to reduce the workload


of execution traces by applying control information with the help of static code analysis [21] [22] [23]. Other works focused on narrower aspects of the reversing process. In Myers's investigation [24], the authors studied how loops in a program can be better analyzed and transformed for the reverse engineering of sequence diagrams. Hybrid reverse approaches have not been fully studied by researchers, because they need both the support of source code and the support of application execution. However, this seems to be an interesting direction for future study.

Fig. 2 OVERVIEW OF TTExTS

2.3 Existing Tools

There are several tools for reversing sequence diagrams (tools for reversing class diagrams, such as MoDisco and ModelMaker, are not discussed here). Among these tools, some depend on source code and adopt static analysis, such as RSA and MagicDraw. Other tools, such as MaintainJ1, JSonde, UML2 Trace Interaction View, javaCallTracer2 and CodeLogic3, employ dynamic approaches by analyzing execution traces. There are still no popular or well-known tools that reverse sequence diagrams in a hybrid fashion. Compared with purely static or purely dynamic approaches, hybrid reverse engineering is still immature and needs more development. Besides, in practical situations, hybrid reverse engineering may introduce much more analysis time, due to the time consumed by both source code analysis and application execution. However, this may be an interesting direction for future research.

Among all the tools, MaintainJ is popular because of its ease of access and good performance. Tools like JSonde are outdated and no longer available through the official website. Even if some tools are still available on the Internet, they have not been updated for many years (for example, the latest version of UML2 Trace Interaction View, the tool developed by IBM, dates back to 2009). CodeLogic is a good tool for reversing sequence diagrams, but it is not free (the free version has limited features). Though MaintainJ is also a licensed tool, its free version provides enough functions for users.

1 http://maintainj.com/
2 https://sourceforge.net/projects/javacalltracer/
3 http://www.codelogicinc.com/

3. TTExTS Overview

TTExTS stands for "Textual Transformations from Execution Traces to Sequence Diagram." As the name implies, the core part of TTExTS is the transformation of traces. Figure 2 shows the overview of this framework. From Figure 2, the three steps in TTExTS can be seen: (1) Trace Collection; (2) Trace Transformation; (3) Diagram Generation. The framework starts with system execution and ends with UML sequence diagram generation. In Step 1, the framework collects execution traces during system execution at runtime; the output is a set of execution traces. In Step 2, these traces are transformed and merged so that they are recognizable for Step 3; this step acts as a bridge connecting Step 1 and Step 3. Finally, in Step 3, TTExTS takes the merged traces as input and produces a UML sequence diagram as the final result, finishing the whole process. Sections 4 to 6 discuss these three steps in detail.
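As a rough illustration of this three-step pipeline, the sketch below shows one way the stages could be wired together. The class and method names are hypothetical and do not come from the authors' prototype; the transformation step is left as a placeholder that Section 5 fills in.

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wiring of the three TTExTS steps; names are illustrative only.
public class TtextsPipeline {

    // Step 2 placeholder: a real implementation would apply the textual
    // transformations described in Section 5.
    static List<String> transform(List<String> rawTraces) {
        return new ArrayList<>(rawTraces);
    }

    public static void main(String[] args) throws Exception {
        // Step 1: read the traces that the tracing tool exported to a text file.
        List<String> rawTraces = Files.readAllLines(Path.of(args[0]));

        // Step 2: textual transformation and merging.
        List<String> diagramText = transform(rawTraces);

        // Step 3: persist the textual description that the diagram generator consumes.
        Files.write(Path.of("diagram.txt"), diagramText);
    }
}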

4. Trace Collection

There are several trace collection approaches and tools [25] [26]. In TTExTS, InTrace4 is employed to collect system execution traces. InTrace is a tool for tracing different classes (such as system classes, library classes, and self-defined classes) and collecting execution traces during system execution. The reasons InTrace was chosen are as follows:

• InTrace is integrated into Eclipse IDE. It is easy to access and free to use.

4 http://mchr3k.github.io/org.intrace/


• Traces collected by InTrace contain all the information needed for further analysis.
• InTrace provides filter functions with include and exclude settings. Traces useful for analysis can be kept, and unneeded traces can be eliminated.

In order to better explain the execution traces collected by InTrace, the execution of the following code snippet is introduced as an example:

public class Test {
    public static class A {
        public void m1() {
            B b = new B();
            b.m2();
        }
    }

    public static class B {
        public void m2() { /* TO DO */ }
    }

    public static void main(String[] args) {
        A a = new A();
        a.m1();
    }
}

Figure 3 shows the results obtained from InTrace after executing the code snippet above. As seen, InTrace collected all the execution traces during system execution. The signature of one execution trace collected from InTrace is:

[Timestamp] : [Thread ID] : Class being called : Method being called : Entrance/Exit Point

From the signature, the following information is obtained: (1) when the method is called in the system, and the intervals between different executions, by subtracting timestamps; (2) which thread is being executed (for multi-threaded systems); (3) which class, and which method in this class, is being executed at which place in the source code. There are many forms of execution traces. InTrace may not provide the most information compared with other tools, but it already provides enough information for the analysis done in this work. Readers can refer to its website for more information about the meaning of each token in an execution trace.

Moreover, InTrace also provides filter functions with exclude and include options. To see how class A and class B from Figure 3 interact with each other, other traces can be filtered out by adding filter conditions, such as excluding instructions, execution traces for object initiation (the <init> traces), and main-function traces with the keyword "main". After filtering, the simplified execution traces for the code snippet above are shown in Figure 4.
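As a rough illustration of how such a line can be consumed downstream, the sketch below splits a trace line of the form shown above into its parts. The " : " separator, the example timestamp, and the entry/exit markers are assumptions made here for illustration, not InTrace's documented output format.

// Minimal sketch: split one trace line of the assumed form
//   [Timestamp] : [Thread ID] : Class : Method : Entrance/Exit
// The " : " separator and the field order are assumptions, not InTrace's
// documented format.
public class TraceLine {
    final String timestamp, threadId, className, method, point;

    TraceLine(String rawLine) {
        String[] parts = rawLine.split(" : ", 5);
        timestamp = parts[0];
        threadId  = parts[1];
        className = parts[2];
        method    = parts[3];
        point     = parts[4];   // entry ("{") or exit ("}") marker, for example
    }

    public static void main(String[] args) {
        TraceLine t = new TraceLine("[10:42:03.117] : [1] : Test$A : m1 : {");
        System.out.println(t.className + "." + t.method + " " + t.point);
    }
}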

Fig. 3 EXECUTION TRACES OBTAINED BY INTRACE FOR SAMPLE CODE SNIPPET

Fig. 4 SIMPLIFIED EXECUTION TRACES FOR FIGURE 3

5. Trace Transformation

Trace transformation acts as a connector in TTExTS. As the name TTExTS implies, this framework processes execution traces mainly through textual transformation. The reasons for adopting textual transformations are:



• In the Trace Collection phase, execution traces collected by InTrace can be stored in .txt files easily. InTrace provides an interface for users to store the traces in a textual format. The contents of these .txt files are the same as Figure 3 shows.
• In the Sequence Diagram Generation phase, the tool used needs textual content as input. The textual input conforms to specific rules, and each rule corresponds to a certain structure in UML sequence diagrams.

However, the output of InTrace cannot be used directly as input for WebSequenceDiagrams because there are format discrepancies. In order to fill this gap, the trace transformation step was proposed.

5.1 Valid Textual Input for Diagram Generation

In WebSequenceDiagrams, there are several valid inputs for generating sequence diagrams. Step 2 addresses the question: what should execution traces look like after transformation to make them valid for WebSequenceDiagrams? In the context of this paper, only three situations are considered: normal message passing, message passing with a loop structure, and message passing with an alternative structure. Normal message passing is the simplest and most common situation. From Figure 1, one can see that the most important


components in UML sequence diagrams are objects and the methods passed as messages. In WebSequenceDiagrams, a basic passing message has the signature:

Caller -> Callee : method

In the sequence diagram shown in Figure 1, the first message - the user attempts to log in - can be expressed as:

user -> loginPage : login()

Besides basic expressions for normal message passing, two other important features of UML sequence diagrams are of interest to this research: "alt" and "opt". "Alt" means alternative. It is similar to the "if...else..." structure in a high-level programming language. In WebSequenceDiagrams, the expression for an alt structure is:

alt condition 1
    A -> B: method1
else condition 2
    A -> B: method2
end

"Opt" means optional. It is similar to "alt", but "opt" only provides the "if..." situation with no "else...". The expression format for "opt" is:

opt condition
    A -> B: method
end

To conclude, the goal of this process is to transform execution traces from the format shown in Figure 4 into the format shown in the above expressions.
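A minimal sketch of this textual transformation is shown below. It assumes trace lines have already been reduced to the simplified form of Figure 4 and parsed into caller, callee, and method fields; that field layout is an assumption, and the loop and alternative handling of Sections 5.2 and 5.3 is omitted here.

import java.util.ArrayList;
import java.util.List;

// Sketch: turn simplified (caller, callee, method) trace entries into
// WebSequenceDiagrams message lines of the form "Caller -> Callee : method".
// The input layout is an assumption; loop and alt handling is omitted.
public class TraceToDiagramText {

    record Invocation(String caller, String callee, String method) {}

    static List<String> toDiagramLines(List<Invocation> trace) {
        List<String> lines = new ArrayList<>();
        for (Invocation inv : trace) {
            lines.add(inv.caller() + " -> " + inv.callee() + " : " + inv.method() + "()");
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Invocation> trace = List.of(new Invocation("Test$A", "Test$B", "m2"));
        toDiagramLines(trace).forEach(System.out::println);   // Test$A -> Test$B : m2()
    }
}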

5.2 Loop Structure Identification

In a trace, there may exist loop structures. The signature of a loop in execution traces is that one execution trace, or several execution traces bound together as a group, repeats at least twice contiguously among all the execution traces. In this paper, loop structures are identified through the Longest Repeated Substring (LRS). The implementation for finding the LRS is based on suffix trees [27]. For the execution traces of one system execution, the LRS is first found. If the traces contain an LRS, it is then determined whether the occurrences of this LRS are contiguous at all the positions where it appears. If so, the number of occurrences is recorded and the LRS is transformed into a loop structure. Finally, these LRS occurrences are replaced with an empty substring. The above process is iterated to find the second-longest repeated string, the third-longest repeated string, and so on. The search stops when there is no LRS left in the execution traces or all the execution traces have been replaced by empty substrings.
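The sketch below illustrates the repeated-group idea on a list of trace lines. For brevity it uses a simple cubic scan for an immediately repeated contiguous block rather than the suffix-tree LRS construction cited above, so it is an approximation of the described algorithm, not the authors' implementation.

import java.util.ArrayList;
import java.util.List;

// Sketch: collapse immediately repeated contiguous blocks of diagram lines into a
// WebSequenceDiagrams-style loop block. A brute-force scan stands in for the
// suffix-tree LRS approach cited in the paper; treat it as an illustration only.
public class LoopCollapser {

    static List<String> collapseOnce(List<String> lines) {
        int n = lines.size();
        for (int len = n / 2; len >= 1; len--) {            // try longer blocks first
            for (int start = 0; start + 2 * len <= n; start++) {
                int repeats = 1;
                while (start + (repeats + 1) * len <= n
                        && lines.subList(start, start + len)
                                .equals(lines.subList(start + repeats * len,
                                                      start + (repeats + 1) * len))) {
                    repeats++;
                }
                if (repeats >= 2) {
                    List<String> out = new ArrayList<>(lines.subList(0, start));
                    out.add("loop " + repeats + " times");
                    out.addAll(lines.subList(start, start + len));
                    out.add("end");
                    out.addAll(lines.subList(start + repeats * len, n));
                    return out;
                }
            }
        }
        return lines;                                        // nothing to collapse
    }

    public static void main(String[] args) {
        List<String> lines = List.of("A -> B : m2()", "A -> B : m2()", "A -> B : m2()");
        collapseOnce(lines).forEach(System.out::println);
    }
}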

5.3 Alternative Structure Identification

If alternative structures are to be identified, at least two execution trace files are needed. When execution traces from different trace files are merged, the common traces must first be found according to the order in which they appear. These common traces are then fixed, and the differing traces are wrapped in an "alt" structure. If one of the branches is empty, the structure can be refined into an "opt" in the result.
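Below is a small sketch of this merge step under a simplifying assumption: the two runs only diverge in one contiguous region. The branch labels are placeholders; a real implementation would align common traces across arbitrarily many trace files.

import java.util.ArrayList;
import java.util.List;

// Sketch: merge two runs of diagram lines into an alt/opt block. Assumes the runs
// share a common prefix and suffix and differ only in one contiguous region.
public class AltMerger {

    static List<String> merge(List<String> run1, List<String> run2) {
        int prefix = 0;
        while (prefix < run1.size() && prefix < run2.size()
                && run1.get(prefix).equals(run2.get(prefix))) prefix++;
        int suffix = 0;
        while (suffix < run1.size() - prefix && suffix < run2.size() - prefix
                && run1.get(run1.size() - 1 - suffix).equals(run2.get(run2.size() - 1 - suffix))) suffix++;

        List<String> mid1 = run1.subList(prefix, run1.size() - suffix);
        List<String> mid2 = run2.subList(prefix, run2.size() - suffix);

        List<String> out = new ArrayList<>(run1.subList(0, prefix));
        if (!mid1.isEmpty() && !mid2.isEmpty()) {            // both branches observed: alt
            out.add("alt condition 1");
            out.addAll(mid1);
            out.add("else condition 2");
            out.addAll(mid2);
            out.add("end");
        } else if (!mid1.isEmpty() || !mid2.isEmpty()) {     // only one branch observed: opt
            out.add("opt condition");
            out.addAll(mid1.isEmpty() ? mid2 : mid1);
            out.add("end");
        }
        out.addAll(run1.subList(run1.size() - suffix, run1.size()));
        return out;
    }

    public static void main(String[] args) {
        List<String> r1 = List.of("u -> s : login()", "s -> db : check()", "s -> u : ok()");
        List<String> r2 = List.of("u -> s : login()", "s -> u : ok()");
        merge(r1, r2).forEach(System.out::println);
    }
}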

6. Diagram Generation

In this section, the last step in the proposed framework, UML sequence diagram generation, is discussed. In this step, another external tool, WebSequenceDiagrams1, is deployed. WebSequenceDiagrams is an online tool that generates UML sequence diagrams. This tool was chosen for the following reasons:

• Like InTrace, WebSequenceDiagrams is easy to access and free to use. Besides, it provides plug-ins and APIs, which makes it convenient for users to import the tool as an external resource.
• The syntax for generating sequence diagrams is very simple. It is straightforward, and users can grasp the grammar in a short time.
• The product produced by WebSequenceDiagrams is high quality, and the generation time is short.

Besides these advantages, another important reason why WebSequenceDiagrams was chosen is that its input is text-based, which is consistent with the first tool chosen for TTExTS. The textual nature of both trace collection and the input used for diagram generation makes WebSequenceDiagrams a good choice for this work.

1 https://www.websequencediagrams.com/

Fig. 5 EXECUTION TRACES OBTAINED BY INTRACE FOR CODE SNIPPET ABOVE


By employing the existing APIs provided by WebSequenceDiagrams, the diagram generation function can be embedded in TTExTS. In this step, the input is the execution traces after transformation and merging, and the output is a UML sequence diagram.
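As an illustration of what such an embedding could look like, the sketch below posts diagram text to the WebSequenceDiagrams web service using Java's standard HTTP client. The endpoint, form parameters, and response handling shown here are assumptions about that service's public API rather than a verified integration; the service's current documentation should be consulted before use.

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Sketch only: the endpoint and form fields below are assumptions about the
// WebSequenceDiagrams service, not a verified API contract.
public class DiagramGenerator {
    public static void main(String[] args) throws Exception {
        String diagramText = "user -> loginPage : login()";
        String form = "apiVersion=1&style=default&message="
                + URLEncoder.encode(diagramText, StandardCharsets.UTF_8);

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://www.websequencediagrams.com/index.php"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The service is expected to return JSON naming the generated image;
        // printing the raw body keeps the sketch free of extra dependencies.
        System.out.println(response.body());
    }
}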

7. Case Study

In order to test the feasibility of the proposed framework, a case study was conducted. The case selected for this study is a simple Java program (two classes interacting with each other through four different methods) with a structure like this:

if Situation 1
    A
else Situation 2
    B
    C
    D

By checking the processing time, TTExTS was found to spend ~700 ms on trace collection, ~70 ms on trace transformation, ~80 ms on trace merging, and ~300 ms on generation of the UML sequence diagram. The result of this case study is shown in Figure 5. The result shows that the proposed framework can accomplish the reversing of UML sequence diagrams from system execution. The processing time is reasonable and the diagram quality is high. However, the size of the test case in this study is limited; it only proves the feasibility of the proposed framework. In order to mitigate this weakness, the framework is currently undergoing testing on more complicated systems.
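The paper does not list the test program itself; the snippet below is a hypothetical program with the same shape - two classes, four methods, and one if/else - of the kind that would exercise the alternative identification when run once per branch. It is not the authors' actual test case.

// Hypothetical stand-in for the case-study program: two classes interacting
// through four methods, with one if/else deciding which interaction happens.
public class CaseStudy {
    static class Client {
        void run(boolean situationOne, Service service) {
            if (situationOne) {
                service.a();
            } else {
                service.b();
                service.c();
                service.d();
            }
        }
    }

    static class Service {
        void a() { }
        void b() { }
        void c() { }
        void d() { }
    }

    public static void main(String[] args) {
        // Run with and without an argument so traces from both branches exist.
        new Client().run(args.length > 0, new Service());
    }
}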


8. Discussion

8.1 Limitations

In this section, some limitations associated with the major steps in this study are discussed.

8.1.1 Tool Selection

The external tools selected for this study, InTrace and WebSequenceDiagrams, proved able to accomplish the task of reversing UML sequence diagrams. However, better tools may exist. The principal criteria for tool selection in this study were ease of access and being free to use; these criteria may sacrifice performance and quality. The case study conducted for the proposed framework shows that the trace collection process occupies over 50% of the total processing time. However, this limitation is hard to mitigate in the context of this study because TTExTS needs the support of these external tools.

8.1.2 System Testing

This is the main threat to the validity of this work. In this paper, TTExTS is evaluated by case studies of limited size. In more realistic scenarios, the cases may be much more complicated. The test cases in this study only show the feasibility of the prototype. To mitigate this, more complicated system tests are currently being carried out.

8.1.3 Performance Evaluation

In this paper, although it was shown that TTExTS can accomplish the reversing of UML sequence diagrams dynamically, no comparisons were made between the proposed prototype and other existing tools, such as MaintainJ and JavaCallTracer. Therefore, the performance of the TTExTS framework cannot yet be evaluated. To mitigate this limitation, performance comparisons with similar state-of-the-art techniques are planned for future work.

8.2 Future Work

Future work will span three directions - system improvement, system evaluation, and system integration.

8.2.1 System improvement

System improvement consists of two aspects: system testing and algorithm optimization. In this paper, testing was done on TTExTS, but only with cases of limited size. In the future, more complicated test cases will be employed to test the proposed framework. Additionally, algorithm optimization will be explored. In the current prototype, string iterations are widely used in the implementation. Though this achieves the processing goals, it may suffer from low efficiency. In the future, better algorithms will be investigated to optimize the prototype.

8.2.2 System evaluation

A very necessary piece of future work is the evaluation of the proposed system against similar tools (such as MaintainJ and JavaCallTracer) when testing the same programs. Comparisons between different tools could provide a better understanding of the advantages and weaknesses of the proposed framework. Comparisons may include indicators such as processing time, product quality, etc.

8.2.3 System integration

In order to be more accessible to developers, it is intended that the proposed system will be integrated into a plug-in for the Eclipse IDE in the future. Plug-ins make it convenient for users to employ external tools in their own systems, especially for beginners. Besides, Eclipse provides a powerful plug-in


management system which allows users to download and use different kinds of plug-ins easily and for free.

9. Conclusion

In this paper, a framework for the dynamic reversing of UML sequence diagrams was proposed. By adopting existing tools, InTrace and WebSequenceDiagrams, a novel approach to extracting UML sequence diagrams from system execution was built. The case study shows that the proposed framework can accomplish the reversing process in acceptable time and provide high-quality sequence diagrams as final results.

References

[1] L. C. Briand, Y. Labiche, and Y. Miao, “Towards the reverse engineering of uml sequence diagrams.” in wcre, vol. 3, 2003, p. 57.
[2] R. Oechsle and T. Schmitt, “Javavis: Automatic program visualization with object and sequence diagrams using the java debug interface (jdi),” in Software Visualization. Springer, 2002, pp. 176–190.
[3] D. Lo, S. Maoz, and S.-C. Khoo, “Mining modal scenario-based specifications from execution traces of reactive systems,” in Proceedings of the twenty-second IEEE/ACM international conference on Automated software engineering. ACM, 2007, pp. 465–468.

[13] S. Munakata, T. Ishio, and K. Inoue, “Ogan: Visualizing object interaction scenarios based on dynamic interaction context,” in Program Comprehension, 2009. ICPC’09. IEEE 17th International Conference on. IEEE, 2009, pp. 283–284. [14] J. K.-Y. Ng, Y.-G. Gu´eh´eneuc, and G. Antoniol, “Identification of behavioural and creational design motifs through dynamic analysis,” Journal of Software Maintenance and Evolution: Research and Practice, vol. 22, no. 8, pp. 597–627, 2010. [15] F. Taiani, M.-O. Killijian, and J.-C. Fabre, “Cosmopen: dynamic reverse engineering on a budget. how cheap observation techniques can be used to reconstruct complex multi-level behaviour,” Software: Practice and Experience, vol. 39, no. 18, pp. 1467–1514, 2009. [16] Y. Imazeki and S. Takada, “Reverse engineering of sequence diagrams from framework based web applications,” in 13th IASTED International Conference on Software 13th IASTED International Conference on Software Engineering and Applications, SEA 2009, 2009. [17] A. Serebrenik, S. Roubtsov, E. Roubtsova, and M. van den Brand, “Reverse engineering sequence diagrams for enterprise javabeans with business method interceptors,” in Reverse Engineering, 2009. WCRE’09. 16th Working Conference on. IEEE, 2009, pp. 269–273. [18] S. Roubtsov, A. Serebrenik, A. Mazoyer, M. van den Brand, and E. Roubtsova, “I2sd: reverse engineering sequence diagrams enterprise java beans from with interceptors,” IET software, vol. 7, no. 3, pp. 150– 166, 2013.

[4] T. Ziadi, M. A. A. da Silva, L. M. Hillah, and M. Ziane, “A fully dynamic approach to the reverse engineering of uml sequence diagrams,” in Engineering of Complex Computer Systems (ICECCS), 2011 16th IEEE International Conference on. IEEE, 2011, pp. 107–116.

[19] F. Wang, H. Ke, and J. Liu, “Towards the reverse engineer of uml2. 0 sequence diagram for procedure blueprint,” in Software Engineering, 2009. WCSE’09. WRI World Congress on, vol. 3. IEEE, 2009, pp. 118–122.

[5] B. Cornelissen, A. Zaidman, D. Holten, L. Moonen, A. van Deursen, and J. J. van Wijk, “Execution trace analysis through massive sequence and circular bundle views,” Journal of Systems and Software, vol. 81, no. 12, pp. 2252–2268, 2008.

[20] S. P. Reiss and M. Renieris, “Encoding program executions,” in proceedings of the 23rd International Conference on Software Engineering. IEEE Computer Society, 2001, pp. 221–230.

[6] D. C. I. Charles, K. A. Shields, and I. Preston Pengra Briggs, “Parallelism performance analysis based on execution trace information,” Nov. 1 2005, uS Patent 6,961,925. [7] F. Doray and M. Dagenais, “Diagnosing performance variations by comparing multi-level execution traces,” IEEE Transactions on Parallel and Distributed Systems, 2016. [8] D. Amalfitano, A. R. Fasolino, and P. Tramontana, “Rich internet application testing using execution trace data,” in Software Testing, Verification, and Validation Workshops (ICSTW), 2010 Third International Conference on. IEEE, 2010, pp. 274–283. [9] B. Cornelissen, A. Zaidman, A. Van Deursen, L. Moonen, and R. Koschke, “A systematic survey of program comprehension through dynamic analysis,” IEEE Transactions on Software Engineering, vol. 35, no. 5, pp. 684–702, 2009. [10] M. H. Alalfi, J. R. Cordy, and T. R. Dean, “Automated reverse engineering of uml sequence diagrams for dynamic web applications,” in Software Testing, Verification and Validation Workshops, 2009. ICSTW’09. International Conference on. IEEE, 2009, pp. 287–294. [11] A. Zaidman, N. Matthijssen, M.-A. Storey, and A. Van Deursen, “Understanding ajax applications by connecting client and server-side execution traces,” Empirical Software Engineering, vol. 18, no. 2, pp. 181–218, 2013. [12] T. Ishio, Y. Watanabe, and K. Inoue, “Amida: A sequence diagram extraction toolkit supporting automatic phase detection,” in Companion of the 30th international conference on Software engineering. ACM, 2008, pp. 969–970.

[21] Y. Labiche, B. Kolbah, and H. Mehrfard, “Combining static and dynamic analyses to reverse-engineer scenario diagrams,” in Software Maintenance (ICSM), 2013 29th IEEE International Conference on. IEEE, 2013, pp. 130–139. [22] P. Dugerdil and J. Repond, “Automatic generation of abstract views for legacy software comprehension,” in Proceedings of the 3rd India software engineering conference. ACM, 2010, pp. 23–32. [23] A. Hamou-Lhadj and T. Lethbridge, “Summarizing the content of large traces to facilitate the understanding of the behaviour of a software system,” in Program Comprehension, 2006. ICPC 2006. 14th IEEE International Conference on. IEEE, 2006, pp. 181–190. [24] D. Myers, M.-A. Storey, and M. Salois, “Utilizing debug information to compact loops in large program traces,” in Software Maintenance and Reengineering (CSMR), 2010 14th European Conference on. IEEE, 2010, pp. 41–50. [25] A. Vasudevan, N. Qu, and A. Perrig, “Xtrec: Secure real-time execution trace recording on commodity platforms,” in System Sciences (HICSS), 2011 44th Hawaii International Conference on. IEEE, 2011, pp. 1–10. [26] M. Xu, V. Malyugin, J. Sheldon, G. Venkitachalam, B. Weissman et al., “Retrace: Collecting execution trace with virtual machine deterministic replay,” in In Proceedings of the 3rd Annual Workshop on Modeling, Benchmarking and Simulation, MoBS. Citeseer, 2007. [27] E. M. McCreight, “A space-economical suffix tree construction algorithm,” Journal of the ACM (JACM), vol. 23, no. 2, pp. 262–272, 1976.


Integrating Software Testing Standard ISO/IEC/IEEE 29119 to Agile Development

Ning Chen1, Ethan W. Chen1 and Ian S. Chen2
1 Department of Computer Science, California State University, Fullerton, California, USA
2 Raytheon Company, Tucson, Arizona, USA

Abstract - The IEEE standard 29119 on Software and Systems Engineering - Software Testing, which replaces the older IEEE Std 829 and other standards, is designed with the needs of the Agile process in mind. It provides an explanation of Agile projects and some suggestions on integrating the standard into an Agile process. Nevertheless, integrating the standard into Agile is still not straightforward and may need further elaboration. This paper addresses the following issues: the need for a testing standard, the mechanism that integrates the testing standard into Agile, and how to tailor conformance to a proper level using the maturity levels of the Test Maturity Model Integration (TMMi). The paper concludes with a suggested tailored conformance plan that is suitable for a typical Agile development.

Keywords: IEEE Std 29119, Software Testing, conformance, Agile, TMMi

1 Introduction

Testing is an integral part of the software development process. ISO/IEC/IEEE 29119 is a relatively new standard for software testing, with the most recent part published in 2016 [1]. As a new international standard, IEEE 29119 becomes the benchmark against which testing processes are measured. Organizations are therefore influenced to adhere to the standard to some degree and to implement IEEE 29119 conformance within their existing software development process. Agile methodology is one of the most popular and widely used software development philosophies. Per a recent 2017 survey, 94% of respondents claimed that their organizations practiced Agile [2]. As a result, many companies will face, or are currently facing, the task of implementing IEEE 29119 within their Agile process. One of the most notable standards that IEEE 29119 replaces is IEEE 829-2008, which describes test documentation standards [3]. The old standard, IEEE 829, is deeply rooted in the traditional Waterfall development lifecycle, but is not entirely at odds with implementation in Agile [4]. Though IEEE 29119 was written with more consideration for Agile than IEEE 829, it does not clearly describe how it can be implemented in Agile. The question then arises of whether, and how, IEEE 29119 can be easily integrated into Agile.

Though IEEE 29119 does not necessarily conflict with Agile, it is possible that full conformance to the standard, as defined in IEEE 29119, may be too cumbersome for proper Agile methodology. This produces a conundrum: extensive adherence to testing standards is required to be fully compliant with IEEE 29119, yet doing so may violate Agile principles. At the same time, it would not be sensible to completely forgo testing standards in Agile. Some industry surveys estimate that testing comprises up to 30 to 50 percent of the total cost of development [5]. As a result, some testing standard is desired in order to properly exhibit the results of, and to justify, the expense invested in testing. Though IEEE 29119 provides some leeway for this problem with the option of tailored conformance, the implementation of tailored conformance is arguably too loose. Tailored conformance in IEEE 29119 is defined to be completely dependent on a specific organization's and/or project's needs, and thus loses its utility as a standard. Since Agile holds significant market share in the community, a standard adoption of IEEE 29119 within Agile is needed to maintain parity between different organizations. If no Agile standard of IEEE 29119 is available, implementations of IEEE 29119 tailored conformance across multiple organizations may be too radically different to be properly compared. This paper starts with an analysis of IEEE 29119 and Agile methodology. We then explore the need for a testing standard, propose an integration of IEEE 29119 within Agile, and describe IEEE 29119 conformance maturity levels related to the Test Maturity Model Integration (TMMi).

2 What is IEEE Std 29119?

IEEE 29119 is an internationally agreed set of standards with the purpose of supporting software testing. IEEE 29119 consists of five different parts:

1. IEEE 29119-1: Concepts & Definitions
2. IEEE 29119-2: Test Processes
3. IEEE 29119-3: Test Documentation
4. IEEE 29119-4: Test Techniques
5. IEEE 29119-5: Keyword Driven Testing

IEEE 29119 is intended to replace the following existing standards for software testing:


o IEEE 829 Test Documentation
o IEEE 1008 Unit Testing
o BS 7925-1 Vocabulary of Terms in Software Testing
o BS 7925-2 Software Component Testing Standard

IEEE 29119-1: Concepts & Definitions – This document correlates to BS 7925-1 and serves as a glossary of definitions of testing terms and concepts used in the overall IEEE 29119 standard.

IEEE 29119-2: Test Processes – This document describes software testing processes at multiple levels. The processes are meant to be generic, so that they can be implemented in any organization, for any kind of software testing, and for any type of software development life cycle model. Most notably, IEEE 29119-2 focuses on a risk-based approach to testing with the primary goal of risk mitigation. The standard test process model described is multi-layered, with a top-level organizational test process, a middle-level test management process, and a bottom level of dynamic test processes [Figure 1].

IEEE 29119-3: Test Documentation – This part of IEEE 29119 provides templates and examples of test documentation. These work products are aligned with the test processes described in IEEE 29119-2 and cover the organizational level, test management level, and dynamic test level. Examples and suggestions are provided for both Agile and traditional software development life cycle models.

IEEE 29119-4: Test Techniques – A description of software test design techniques that align with the test processes described in IEEE 29119-2 is given. The document describes how to derive test conditions, test coverage items, and test cases.

IEEE 29119-5: Keyword Driven Testing – The most recently published portion of IEEE 29119 describes the usage of keyword-driven testing as a modular testing strategy for use with test automation frameworks.

IEEE Std 29119 lays out a set of requirements to be met in order to be considered adherent to the standard. Though IEEE 29119 provides examples of what test documentation might look like in Agile or traditional software development lifecycle models (found in IEEE 29119-3), it does not provide specific instruction on how IEEE 29119 should be implemented. Regarding test process, IEEE 29119-2 sets standards for what forms of test processes should be defined at multiple levels [Figure 1], but gives no specifics on integration into various software development lifecycle models. Similarly, IEEE 29119-3 describes what work products, in the form of documentation, should be produced throughout the testing process, but provides minimal guidance on how this should be implemented. IEEE 29119-4 describes various test techniques that can be used, but does not specify which are universally needed; rather, the user is left to pick and choose which techniques are appropriate and suitable for their situation. Lastly, IEEE 29119-5 defines a modular approach to describing test cases through the Keyword-Driven Testing framework, which lends itself to test automation and is a specific implementation of the test process model described in IEEE 29119-2. Similarly to the rest of IEEE 29119, the testing framework itself is described in detail, but the specific integration of the framework into different types of software development lifecycle models is not.

Figure 1. IEEE 29119 Multi-layer Test Process Model

3 What is Agile Development?

Agile development is a broad term encompassing a set of practices that promote the values espoused in the Agile Manifesto [6]. From the Agile Manifesto [Figure 2] are derived the Twelve Principles of Agile that make up the Agile philosophy. In total, Agile is meant to enhance the ability to respond effectively to the constant change present in the software development process. Agile advocates continuous, small, incremental releases of working software. Most importantly, Agile describes the methods, or the "how", in software development (e.g. Scrum, Kanban).

Figure 2. The Agile Manifesto

3.1 Software Testing in Agile Development

In Agile Development, software testing is performed differently in comparison to traditional testing methodology like Waterfall Model. Instead of testing performed as a separate and distinct phase after coding, Agile software testing is performed in concurrence with coding. Furthermore, testing is integrated into the development team as well as relevant stakeholders and the distinction between developers and testers is blurred. In Agile methods, there is no true definitive or standard testing strategy. Though Test Driven Development (TDD) is


an extremely popular method of software testing in Agile, it is not, strictly speaking, required in order to adhere to the Agile Manifesto. The first line of the Agile Manifesto describes this succinctly: "Individuals and interactions over processes and tools". Agile does not technically prescribe any specific testing processes, only that the software development process remains responsive to change. The literature has expanded on this topic and produced recommendations and trends on best practices for implementing testing in Agile [7] [8].

3.2 The Role and Need of Testing Standard in Agile Development

Agile, in a strict interpretation, does not prescribe or require any sort of testing standard. This is not to say that a testing standard would not be valuable in an Agile environment. A standard provides common ground upon which the community can agree on what practices are necessary to say that the overall testing process is sound.

Individuals and interactions over processes and tools - Though the Agile Manifesto does prefer individuals and interactions over processes, that does not mean that testing processes are not needed. In the end, having standardized testing processes and organization provides structure to the overall testing effort. Utilizing standard processes does not run contrary to Agile as long as the process remains flexible and does not become so rigid that responsiveness to change is affected, thus violating Agile philosophy.

Working software over comprehensive documentation - Though Agile does explicitly value "Working software over comprehensive documentation", the total absence of any testing documentation is not desirable. Some documentation is desired in order to have records that can demonstrate testing was adequately performed. At the same time, excessive documentation would run contrary to Agile. Agile organizations must therefore individually determine what level of documentation is needed to satisfy their own internal documentation requirements while remaining Agile. With such a broad specification, a testing standard is useful here.

Customer collaboration over contract negotiation - Though this particular part of the Agile Manifesto does not explicitly involve testing, customer collaboration can be an important part of the Agile testing process. For example, Acceptance Test Driven Development, where customer engagement is key to testing, can be implemented in Agile, though it is not necessarily a standard.

Responding to change over following a plan - Similarly, while more value is placed on the capability to respond to change, this does not mean that a plan has no value. A test plan does provide value in an Agile environment as long as it does not hinder responsiveness to change.

Agile philosophy describes how testing should be implemented: it should be continuous and integrated into the development lifecycle. What Agile does not specify is what processes should be carried out or what work products should be produced throughout the testing process. In that sense, testing standards do not conflict with Agile development and can instead be complementary to Agile. The question then becomes how testing standards, in particular IEEE 29119, should be implemented in Agile environments.

4 How to Integrate?

Testing in Agile development aims at different goals than traditional life cycle process models (e.g., Waterfall) [7]. There are no firm requirements at the beginning and typically no solid, fixed architecture plan either. Furthermore, one of the important goals of each iteration (e.g., Sprint) is to have "working software". The term "working software" implies the existence of requirements and a process for verifying those requirements, even if they are very informal and not in the form of formal documentation. This opens the idea that if the development activity can be iterative and incremental, then the forming of test plans, test documentation, and test tool policy can also be iterative and incremental [4] [9]. We present an effort in forming plans, documentation, and policy in a Scrum setting as shown below.

4.1 Sprints in Scrum

We briefly describe the Scrum process first. Scrum development is done in time-boxed efforts called sprints. As the business requirements evolve incrementally, the Product Backlog (e.g., user stories) forms. Each sprint aims to reduce some of the Product Backlog in one to four weeks. Although the business requirements typically focus on the features and functionality of the software product, it is possible to treat the need for processes, documentation, techniques, and an automation-related keyword-driven testing framework as part of the Product Backlog and to allocate special sprints, for example Sprint Zero or other sprints, for this purpose.

4.2 Processes, Documentation, Techniques and Keyword Driven Framework

Part 1 of 29119 describes the concepts and definitions and is informative. Parts 2 to 5 involve the conformance clauses. Part 2 is related to test processes and has three major sections: Organizational Test Process, Test Management Processes, and Dynamic Test Processes. An easy interpretation of those processes is that we have some steps of doing things (i.e., activities) for some purpose. If we record the way of doing things and their outcomes, documentation typically is a natural artifact. Documentation logically leads to Part 3 of 29119. Part 3 requires three sub-sections: Organizational Test Process Documentation, Test Management Processes Documentation, and Dynamic Test Processes Documentation. Part 4 of 29119 describes the test techniques (e.g., how one executes the tests). Part 5 of 29119 is titled Keyword Driven Testing. It is designed to pave the groundwork for test automation. Perhaps naming it "Test Automation Framework Based on Keywords" would be more intuitive and make it more Agile friendly, since most Agile projects, if


not all, involve test automation. We decide to list relevant section names of Parts 2 to 5 below just to show the first impression that most people get – quite massive and almost for certain anti-Agile!

Processes (29119 Part 2)

Organizational Test Process
  Develop Organizational Test Specification (OT1)
  Monitor and Control Use of Organizational Test Specification (OT2)
  Update Organizational Test Specification (OT3)

Test Management Processes
  Test planning
    Understand Context (TP1)
    Organize Test Plan Development (TP2)
    Identify and Analyze Risks (TP3)
    Identify Risk Mitigation Approaches (TP4)
    Design Test Strategy (TP5)
    Determine Staffing and Scheduling (TP6)
    Record Test Plan (TP7)
    Gain Consensus on Test Plan (TP8)
    Communicate Test Plan and Make Available (TP9)
  Test monitoring and control
    Set-Up (TMC1)
    Monitor (TMC2)
    Control (TMC3)
    Report (TMC4)
  Test completion
    Archive Test Assets (TC1)
    Clean Up Test Environment (TC2)
    Identify Lessons Learned (TC3)
    Report Test Completion (TC4)

Dynamic Test Processes
  Test design & implementation
    Identify Feature Sets (TD1)
    Derive Test Conditions (TD2)
    Derive Test Coverage Items (TD3)
    Derive Test Cases (TD4)
    Assemble Test Sets (TD5)
    Derive Test Procedures (TD6)
  Test environment set-up & maintenance
    Establish Test Environment (ES1)
    Maintain Test Environment (ES2)
  Test execution
    Execute Test Procedure(s) (TE1)
    Compare Test Results (TE2)
    Record Test Execution (TE3)
  Test incident reporting
    Analyze Test Result(s) (IR1)
    Create/Update Incident Report (IR2)

Documentation (29119 Part 3)

Organizational Test Process Documentation
  Test Policy
    Document specific information
      Overview
      Unique identification of document
      Issuing organization
      Approval authority
      Change history
    Introduction
      Scope
      References
      Glossary
    Test policy statements
      Objectives of testing
      Test process
      Test organization structure
      Tester training
      Tester ethics
      Standards
      Other relevant policies
      Measuring the value of testing
      Test asset archiving and reuse
      Test process improvement
  Organizational Test Strategy
    Document specific information
      Overview
      Unique identification of document
      Issuing organization
      Approval authority
      Change history
    Introduction
      Scope
      References
      Glossary
    Project-wide organizational test strategy statements
      Generic risk management
      Test selection and prioritization
      Test documentation and reporting
      Test automation and tools
      Configuration management of test work products
      Incident management
      Test sub-processes
    Test sub-process-specific organizational test strategy statements
      Entry and exit criteria
      Test completion criteria
      Test documentation and reporting
      Degree of independence
      Test design techniques
      Test environment
      Metrics to be collected
      Retesting and regression testing

Test Management Processes Documentation
  Test Plan
    Overview
    Document specific information (unique id, issuing organization…)
    Introduction (scope, references, glossary…)
    Context of the testing (projects/test sub-processes, test items, test scope, assumptions and constraints…)
    Testing communication
    Risk register (product risks, project risks, …)
    Test strategy (test sub-processes, test deliverables, …)
    Testing activities and estimates
    Staffing (roles, activities, and responsibilities, hiring needs…)
  Test Status Report
    Overview
    Document specific information (unique id, issuing organization…)
    Scope
    Test status (reporting period, progress against test plan, …)
  Test Completion Report
    Overview
    Document specific information (unique id, issuing organization…)
    Introduction (scope, references…)
    Tests performed (summary, deviations…)

Dynamic Test Processes Documentation
  Test design specification
  Test case specification
  Test procedure specification
  Test data requirements
  Test environment requirements
  Test data readiness report
  Test environment readiness report
  Actual results
  Test results
  Test execution log
  Test incident reporting


Test Techniques (29119 Part 4)

"An organization could choose to conform to one technique, such as boundary value analysis. In this scenario, the organization would only be required to provide evidence that they have met the requirements of that one technique…"
  Identify Feature Sets (TD1)
  Derive Test Conditions (TD2)
  Derive Test Coverage Items (TD3)
  Derive Test Cases (TD4)

Keyword Driven Testing Framework (29119 Part 5)
  Basic attributes
    General attributes
      (a) There shall be documentation recorded describing each keyword.
      (b) There shall be documentation recorded for the parameters of each keyword.
      …
  Dedicated keyword-driven editor (tool)
    (a) Within the keyword-driven editor, non-composite keywords shall be displayed with their associated actions.
    (b) For keywords which have been defined with lower level keywords, the user shall be able to access this definition within the keyword-driven editor.
    …
  Decomposer and data sequencer
    (a) The decomposer shall be able to process parameters, including assuring that the parameters associated with the higher-level keywords are decomposed and associated with the lower-level keywords.
    …
  Manual test assistant (tool)
    (a) The manual test assistant shall support manual test execution based on the defined test cases.
    …
  Tool bridge
    (a) The tool bridge shall provide the test execution engine with the appropriate execution code to execute the test cases.
    …
  Test execution engine
    (a) Keywords that do not express conditions or loops within a test case shall be executed sequentially starting with the first keyword.
    …
  Keyword library
    (a) The keyword library shall support the definition of keywords that includes the basic attributes of name, description and parameters.
    …
  Script repository
    (a) The script repository shall support the storage of keyword execution code.
    …

4.3 Is It Really Anti-Agile?

The first impression may not always be correct. Is Std 29119 really anti-Agile? Let's take a look at the Processes part (29119 Part 2). Part 2 has three sub-parts (Organizational Test Process, Test Management Processes and Dynamic Test Processes). The sub-part of Test Management Processes is divided further into test planning, test monitoring and control, and test completion. The sub-part of Dynamic Test Processes is divided further into test design & implementation, test environment set-up & maintenance, test execution, and test incident reporting. At this point a novice user of 29119 is probably overwhelmed by a plethora of similar terms. For convenience, Part 2 of 29119 provides abbreviations for all required sections: OT1 to OT3, TP1 to TP9, TMC1 to TMC4, TD1 to TD6, ES1 to ES2, TE1 to TE3 and IR1 to IR2. To make the "process" seem even heavier, one may soon realize that the Dynamic Test Processes need to be repeated multiple times, for example once each for unit, integration, system and acceptance testing. This means at least four repetitions of (TD1 to TD6, ES1 to ES2, TE1 to TE3 and IR1 to IR2). We think the abbreviations actually provide a hint that points to a possible light-weight usage of the standard - we can view them as a "pre-flight checklist." It may take a competent pilot years of training to become familiar with the pre-flight checklist; nevertheless, the actual checking of the pre-flight checklist by a competent pilot only takes a few minutes. We can also use Agile development itself as an analogy. Although it takes four years of training to produce a competent software developer (for most professionals who hold a B.S. degree in Computer Science), it only takes a short sprint for those professionals to produce a small but working


software product. By the same logic, it is hard not to accept that a well-trained tester can quickly finish the "pre-flight checklist" of the test processes. We envision that Part 2 of 29119 can be realized in the form of a couple of pages of tables in Sprint Zero. Their contents can be incrementally filled in and improved in the subsequent sprints.

4.4 How About Part 3 of 29119 (Documentation)?

Part 3 of 29119 requires the following three sub-parts: Organizational Test Process Documentation, Test Management Processes Documentation, and Dynamic Test Processes Documentation. How do we integrate Part 3 into Agile development? Before answering that question, it is probably apparent that those three sub-parts of documentation match exactly the three sub-parts of Part 2 of 29119 (Processes). This observation provides a hint on how to integrate Part 3 of 29119 into Agile development: Part 3 (Documentation) is just a natural "result" of Part 2 (Processes). For example, when one executes the "pre-flight checklist" of TP1 to TP9 (Test Planning under Test Management Processes), a natural result will be a Test Plan (one sub-part of the Test Management Processes Documentation of Part 3 of 29119). We believe that it is highly feasible to have a second set of tables (just a couple of pages) that match the "pre-flight checklist/process" for briefly recording the results of the activities of the "pre-flight checking" (processes).

4.5 How About Parts 4 (Test Technique) and 5 (Keyword Driven Testing) of 29119?

Parts 4 and 5 of 29119 cover Test Techniques and Keyword Driven Testing. For Agile development it is relatively easy to conform to the requirements of Parts 4 and 5. The reason is that any Agile development project involves at least one typical test technique (for example, Boundary Value Testing), and this almost automatically fulfills the requirements of Part 4. Furthermore, most Agile projects, if not all, employ automated test tools (proprietary or open source) that most likely fit into the Keyword-Driven Testing Framework as outlined in Part 5 of 29119.
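To make this concrete, the sketch below shows what a tiny keyword-driven test with a boundary-value flavor could look like in plain Java. The keyword names, the age rule under test, and the 18/120 boundaries are all invented for illustration; they are not taken from IEEE 29119 or from any specific tool.

import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative keyword-driven test: a keyword library maps keyword names to
// executable actions, and a test case is just a sequence of (keyword, parameter)
// steps. The rule under test (valid age is 18..120) is hypothetical.
public class KeywordDrivenExample {

    static boolean acceptAge(int age) {            // the "system under test"
        return age >= 18 && age <= 120;
    }

    // Keyword library: name -> action taking one string parameter.
    static final Map<String, Function<String, Boolean>> KEYWORDS = Map.of(
            "expectAccepted", p -> acceptAge(Integer.parseInt(p)),
            "expectRejected", p -> !acceptAge(Integer.parseInt(p)));

    record Step(String keyword, String parameter) {}

    public static void main(String[] args) {
        // Boundary value analysis around 18 and 120, written as keyword steps.
        List<Step> testCase = List.of(
                new Step("expectRejected", "17"),
                new Step("expectAccepted", "18"),
                new Step("expectAccepted", "120"),
                new Step("expectRejected", "121"));

        for (Step step : testCase) {
            boolean passed = KEYWORDS.get(step.keyword()).apply(step.parameter());
            System.out.println(step.keyword() + "(" + step.parameter() + ") -> "
                    + (passed ? "PASS" : "FAIL"));
        }
    }
}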

5 Conformance

Even for readers who accept our argument that the heavy processes and documentation can be "not-that-heavy-at-all", a lingering question remains: is there a proper level of tailored conformance, since full conformance may not be desirable? Most professionals may respond that it depends! We envision that another line of research, the Test Maturity Model Integration (TMMi) [10], may play a role in answering this question.

5.1 What is TMMi?

TMMi [10], same as its older sister CMMi (Capability Maturity Model Integration), has five levels of maturity: Initial, Managed, Defined, Measured, and Optimized, as depicted in Fig. 3.

Fig. 3 TMMi Model [10]

5.2 Why Add TMMi in Integrating 29119 to Agile Development?

At its core, TMMi is not a standard: it does not impose rules, nor does it demand "proof of compliance." TMMi's main goal is to encourage "changes in testing" that give value to the business objectives [11]. IEEE 29119, in contrast, is indeed a standard and does demand proof of compliance. Obviously, there are no fundamental conflicts between TMMi and 29119. Furthermore, TMMi does not conflict with Agile development, simply because, like 29119, it focuses on the what problem while Agile mainly aims at the how issue. One may still question the need to bring TMMi into the picture. What is the main value that TMMi brings to the integration of 29119 and Agile development? We think the staged maturity model gives discrete, clear, and reachable targets that encourage changes in testing to increase business value. These discrete, clear, and reachable targets also encourage Agile development to seek conformance to Std 29119 in tailored, incremental steps, which fits naturally into the Agile mentality.

6 Integration Example

In this section we propose an integration example (tailored conformance) of 29119 and Agile development that targets TMMi Level 2 (Managed). TMMi Level 2 Managed targets five process areas (PAs) and their corresponding goals (called Specific Goals, SP), as below:

Test Policy and Strategy (Establish a Test Policy, Establish a Test Strategy, and Establish Test Performance Indicators)

Test Planning (Perform Risk Assessment, Establish a Test Approach, Establish Test Estimates, Develop a Test Plan, Obtain Commitment to the Test Plan)

Test Monitoring and Control (Monitor Test Progress Against Plan, Monitor Product Quality Against Plan and Expectations, and Manage Corrective Actions to Closure)

Test Design and Execution (Perform Test Analysis and Design using Test Design Techniques, Perform Test Implementation, Perform Test Execution, and Manage Test Incidents to Closure)

Test Environment (Develop Test Environment Requirements, Perform Test Environment Implementation, and Manage and Control Test Environments)


6.1 Tailored Conformance at TMMi Level 2 (Managed)

We map the TMMi Level 2 (Managed) process areas (PAs) and their specific goals (SPs) to Part 2 of 29119 in the table below:

  TMMi Level 2 (Managed) PA   | Specific Goal (SP)                                                                 | Part 2 of 29119 (Processes)
  Test Policy and Strategy    | Establish a Test Policy                                                            | TP2
                              | Establish a Test Strategy                                                          | TP5
                              | Establish Test Performance Indicators                                             | TP5
  Test Planning               | Perform Risk Assessment                                                            | TP3, TP4, TMC1
                              | Establish a Test Approach                                                          | TMC1
                              | Establish Test Estimates                                                           | TP6
                              | Develop a Test Plan                                                                | TP2, TP7
                              | Obtain Commitment to the Test Plan                                                 | TP8, TP9
  Test Monitoring and Control | Monitor Test Progress Against Plan                                                 | TMC2, TMC4
                              | Monitor Product Quality Against Plan and Expectations                              | TMC1, TMC2, TMC4
                              | Manage Corrective Actions to Closure                                               | TMC3
  Test Design and Execution   | Perform Test Analysis and Design using Test Design Techniques                      | TD1, TD2, TD3
                              | Perform Test Implementation                                                        | TD4, TD5
                              | Perform Test Execution, and Manage Test Incidents to Closure                       | TD6
  Test Environment            | Develop Test Environment Requirements                                              | ES1
                              | Perform Test Environment Implementation, and Manage and Control Test Environments  | ES2

Part 3 of 29119 can be fulfilled by the above corresponding processes. Parts 4 and 5 of 29119, as discussed before, are almost always met by a typical Agile Development project. We envision that this tailored conformance example may serve as a test template for most typical Agile projects.
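As a rough illustration of how such a template could be carried into a team's Sprint Zero, the sketch below encodes the mapping above as a simple lookup structure. The structure, field names and the idea of tracking per-goal status are our own assumptions for illustration, not something prescribed by TMMi or 29119.

```python
# Sketch of the TMMi Level 2 tailored-conformance template as a lookup table.
# The PA/SP/process mapping mirrors the table above; the "status" tracking is
# an illustrative assumption for how a team might fill the template per sprint.

TMMI_L2_TEMPLATE = {
    "Test Policy and Strategy": {
        "Establish a Test Policy": ["TP2"],
        "Establish a Test Strategy": ["TP5"],
        "Establish Test Performance Indicators": ["TP5"],
    },
    "Test Planning": {
        "Perform Risk Assessment": ["TP3", "TP4", "TMC1"],
        "Establish a Test Approach": ["TMC1"],
        "Establish Test Estimates": ["TP6"],
        "Develop a Test Plan": ["TP2", "TP7"],
        "Obtain Commitment to the Test Plan": ["TP8", "TP9"],
    },
    "Test Monitoring and Control": {
        "Monitor Test Progress Against Plan": ["TMC2", "TMC4"],
        "Monitor Product Quality Against Plan and Expectations": ["TMC1", "TMC2", "TMC4"],
        "Manage Corrective Actions to Closure": ["TMC3"],
    },
    "Test Design and Execution": {
        "Perform Test Analysis and Design using Test Design Techniques": ["TD1", "TD2", "TD3"],
        "Perform Test Implementation": ["TD4", "TD5"],
        "Perform Test Execution, and Manage Test Incidents to Closure": ["TD6"],
    },
    "Test Environment": {
        "Develop Test Environment Requirements": ["ES1"],
        "Perform Test Environment Implementation, and Manage and Control Test Environments": ["ES2"],
    },
}

def sprint_zero_checklist(template):
    """Flatten the template into (PA, SP, 29119 processes, status) rows."""
    return [
        {"pa": pa, "sp": sp, "processes": procs, "status": "not started"}
        for pa, goals in template.items()
        for sp, procs in goals.items()
    ]

if __name__ == "__main__":
    rows = sprint_zero_checklist(TMMI_L2_TEMPLATE)
    print(len(rows), "specific goals to track")  # 16
```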

7 Conclusion

The IEEE standard 29119 on Software and Systems Engineering - Software Testing has five major parts: Concepts and Definitions, Test Processes, Test Documentation, Test Techniques and Keyword-Driven Testing. It was designed to suit the needs of different life cycle models, for example Agile, Evolutionary and Sequential (i.e., the waterfall model). The life cycle model can be viewed as an answer to the how and when questions (e.g., how and when to perform certain desirable activities). The 29119 standard can be viewed as an answer to the what question (e.g., what activities need to be done so that a project does not drift too far from the norm of the community). It is certainly reasonable to argue that a standard may not be perfect, especially when the standard urges, if not dictates, everyone in the community to meet its requirements on what needs to be done. The answer to the question of what needs to be done is always debatable, and any answer will have room for improvement. On the other hand, it is hard to argue that no standard is needed in a community in which a common platform is required so that stakeholders have common ground for conducting, for example, discussion, brainstorming, and evaluation. Our premise is that in Agile, although we value other characteristics more, we still need some processes, plans, documentation, and tools (i.e., they really cannot be zero). If our premise is valid, the next logical question is "What is the proper amount of processes, tools, documentation, and plans suitable for Agile development?" At this point, one may suspect that there is no single answer: it depends highly on the agile project at hand, and/or the organization, the client, the budget, and so on. In this research we propose to bring another line of research, TMMi (Test Maturity Model Integration), into the picture. TMMi recommends five maturity levels: Initial, Managed, Defined, Measured and Optimized. The proper amount of processes, plans, documentation and tools in an Agile setting can be reasonably gauged by the maturity level as recommended by TMMi. The paper concludes with an example of tailored conformance to IEEE 29119 that targets TMMi maturity level 2 (i.e., Managed) and is likely suitable for most Agile projects.

8 References

[1] IEEE Std 29119, Part 1, Part 2, Part 3, Part 4 and Part 5. Available from http://ieee.org
[2] "11th Annual State of Agile Report". VersionOne, 2017.
[3] IEEE 829-2008, IEEE Standard for Software Test Documentation.
[4] Chen, N. "IEEE Std 829-2008 and Agile Process - Can They Work Together?", 2013 International Conference on Software Engineering Research and Practice, 2013.
[5] Burnstein, Ilene. "Practical Software Testing". NY: Springer-Verlag, 2003.
[6] Agile Manifesto. Retrieved from http://agilemanifesto.org/
[7] van den Broek, R. et al. "Integrating Testing into Agile Software Development Process". 2nd International Conference on Model-Driven Engineering and Software Development, 2014.
[8] Murphy, B. et al. "Have Agile Techniques been the Silver Bullet for Software Development at Microsoft?", ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2013.
[9] Jakobsen, C.R. and Sutherland, J. "Scrum and CMMI: Going from Good to Great". In Agile Conference, 2009, pages 333-337. IEEE, 2009.
[10] TMMi Reference Model. Retrieved from http://tmmi.org
[11] TMMi in the Agile World, ver. 1, TMMi Foundation. Retrieved from http://tmmi.org/


SESSION AGILE DEVELOPMENT AND MANAGEMENT Chair(s) TBA


Conformance to Medical Device Software Development Requirements with XP and Scrum Implementation 1

Ö. Özcan-Top1 and F. McCaffery1,2 Regulated Software Research Centre, Dundalk Institute of Technology & Lero, Dundalk, Ireland 2 STATSports Group, Newry, Ireland

Abstract – A key challenge for medical device software development companies is to maintain conformance to the strict regulatory requirements imposed by the safety-critical nature of the domain while also achieving efficiency in software development. Agile software development methods provide promising solutions to overcome the efficiency issues and the challenges of traditional software development approaches. Even though Agile practices are welcomed in the medical domain, their suitability for conformance to the regulatory requirements is still questioned by the medical industry. In our previous work, we investigated to what extent the regulatory requirements defined in MDevSPICE® (the software process assessment framework for medical device software development) are met by a Scrum implementation and what additional practices have to be performed to ensure safety and regulatory compliance in the medical domain. In this paper, we extend that research to include the XP method and provide a comprehensive, quantitative analysis of its suitability for medical device software development. Keywords: MDevSPICE®, Scrum, XP, Safety Critical Domain, Medical Domain, Agile Software Development

1

Introduction

Due to the potential risk of Medical Devices (MDs) harming patients, strict regulations need to be in place during development to ensure the safety of these devices. Depending on the region in which an MD is to be marketed, different standards or guidance have to be followed. In the US, the Food and Drug Administration (FDA) issues the regulation through a series of official channels, including the Code of Federal Regulations (CFR) Title 21, Chapter I, Subchapter H, Part 820 [1]. In the EU, the corresponding regulation is outlined in the general Medical Device Directive (MDD) 93/42/EEC [2], the Active Implantable Medical Device Directive (AIMDD) 90/385/EEC [3], and the In-vitro Diagnostic (IVD) Medical Device Directive 98/79/EC [4], all three of which have been amended by 2007/47/EC. The focus of this study is MD software, which is often an integral part of an overall medical device. IEC 62304:2006 [5] is the main medical device software development (MDSD) standard for establishing the safety of medical device software; it defines processes, activities and tasks for

development so that the software does not cause any unacceptable risks. Whether medical device software is marketed in the USA or the EU, the challenges associated with its development remain the same. Some of them are listed below:
a) Adherence to a large number of regulatory requirements specified in various international standards [6];
b) Establishing a full traceability schema from stakeholder requirements to code [7, 8];
c) Performing changes to process artefacts (requirements, code, documents) in a traceable way [9, 10];
d) Being able to embrace change during development;
e) Producing development evidence for audit purposes consistently and continuously, and managing the documentation process effectively so that it is not overwhelming;
f) Ensuring reliability, safety and correctness of products;
g) Improving the quality of products and the productivity of teams;
h) The necessity, in some cases, of performing clinical software validation manually [10].
In relation to challenge (a), the MDevSPICE® framework [11], which was previously developed by our research group (RSRC), assists companies in efficiently preparing for the demanding and costly regulatory audits, as it combines requirements from a wide number of medical software development and software engineering standards. To overcome the challenges listed from (b) to (h), the use of Agile Software Development (ASD) practices in combination with traditional software development practices could provide significant improvements. In one of our previous studies [12], we evaluated one of the most preferred agile methods, Scrum, to understand the level of regulatory compliance when it is fully implemented as described in the Scrum Guide™ [13]. We performed the evaluation by mapping the Scrum roles and events to MDevSPICE® base practices. The mapping results indicated that a Scrum implementation would provide full or partial coverage in only five MDevSPICE® processes: Project Planning; Project Assessment and Control; Stakeholder Requirements Definition; System Requirements Analysis and


Software Requirements Analysis. The significance of that study was that, for the first time, the degree of compliance to the MDSD requirements for all of these processes was provided in a quantitative way. In this study, we aim to extend this evaluation by including eXtreme Programming (XP) [14], an agile method which evolved in response to the issues caused by the long development cycles of plan-driven methods. Unlike Scrum, it focuses more on the technical side of software development. Considering that both XP and Scrum are used in organizations, an evaluation of their combination provides organizations with a more comprehensive perspective on their compliance with medical requirements. The second purpose of this research is to reveal the additional practices that have to be performed to ensure compliance when a combination of XP and Scrum is implemented. The rest of the paper is structured as follows: In Section 2, we provide the background for this research, which includes brief descriptions of MDevSPICE®, Scrum and XP. In Section 3, we describe the research methodology. In Section 4, we present the XP mapping analysis in detail and discuss the additional practices that have to be considered; additionally, we provide a summary of the Scrum mapping analysis which we published previously in [12]. Finally, in Section 5, we provide conclusions for this research.

2 2.1

Background MDevSPICE®

The challenge that medical software development companies face when they want to market a device is in the adherence to a large number of regulatory requirements specified in various international standards, which can often be overwhelming. In order to help companies better prepare for the demanding and costly regulatory audits, we previously developed the MDevSPICE® framework [11]. MDevSPICE® is an integrated framework of medical device software development best practices. MDevSPICE® was based upon the ISO/IEC 15504/SPICE [15] (now the ISO/IEC 33000 series) process assessment standard and includes requirements from a wide number of medical software development and software engineering standards, some of which were mentioned in the introduction section. Requirements from different standards and guidance are reflected in the process reference model (PRM), which describes a set of processes in a structured manner through a process name, process purpose, process outcomes and base practices. A base practice, which is one of the main components of this study, is an activity that addresses the purpose of a particular process. Figure 1 shows all the processes of MDevSPICE®.

Figure 1 MDevSPICE® processes

2.2

Scrum

Scrum is mainly a management model for software development and was developed by Schwaber and Sutherland [13]. Although the use of technical practices is strongly supported by the creators of the model, Scrum does not prescribe any specific technical practices for implementation. The fundamental idea behind Scrum is to apply process control theory to software development to achieve flexibility, adaptability and productivity [16]. It relies on a set of values, principles and practices which can be adopted based on specific conditions. Scrum values providing frequent feedback, embracing and leveraging variability, being adaptive, balancing upfront and just-in-time work, continuous learning, value-centric delivery and employing sufficient ceremony [17]. It offers effective solutions by providing specific roles, artifacts, activities and rules. A Scrum Team consists of a Product Owner, a Scrum Master, and the Development Team. Scrum Teams are self-organizing and cross-functional so that they can accomplish their work by themselves, rather than being directed by others outside the team and without depending on others who are not part of the team [13]. Scrum also defines special, timeboxed events that create regularity and minimize the need for other meetings.

2.3

eXtreme Programming (XP)

XP, developed by Kent Beck in 1999, provides a collection of software engineering practices [14]. Even though the practices are not novel, XP brings them together to facilitate change and to produce higher-quality software at a sustainable pace. XP is defined by values, principles and roles. Some of the fundamental practices of XP are the planning game, small releases, metaphor, simple design, continuous unit testing, refactoring, pair programming, collective code ownership, continuous integration, a 40-hour week, on-site customer and coding standards. XP does not provide much support for software project management activities [16]. The details of the XP practices are provided in the mapping section below.


3

Research Approach

We applied the same research approach that we followed in [12]. The purpose of this research is to reveal to what extent the regulatory requirements defined in MDevSPICE® are met when the widely adopted agile software development methods XP and Scrum are implemented. We defined the following research questions in relation to this purpose:
RQ1: Which processes of MDevSPICE® are covered by an implementation of XP and Scrum?
RQ2: Which base practices of MDevSPICE® are covered by an implementation of Scrum or XP?
RQ3: What additional practices regarding the processes identified need to be performed in order to fully achieve regulatory compliance in the medical domain?
The research steps were:
1. Listing the XP and Scrum practices/events/roles and their descriptions.
2. Mapping the MDevSPICE® base practices to the XP and Scrum practices/events/roles.
3. Identifying which processes were affected by the mapping.
4. Calculating the coverage ratio and deciding which MDevSPICE® base practices need to be included for those processes to reach a fully-achieved level.


Abrahamsson et al. provide a comparison of different agile software development methods in [16] and specify which phases of the software development life cycle are supported by these methods. Based on this research, Scrum covers the project management, requirements specification, integration test and system test stages/activities. XP, on the other hand, presents solutions for the requirements specification, design, code, unit test, integration test and system test stages/activities.

For the mapping we performed, instead of initially selecting the processes mentioned above and then checking their coverage within MDevSPICE®, we performed the mapping the other way around. We first listed the XP and Scrum practices and then mapped them to the MDevSPICE® base practices. With this approach we were able to identify which MDevSPICE® processes are satisfied through adopting XP and Scrum implementations.

Limitations of the Research

Given the available descriptions of XP, we regard it as a descriptive method in which the practices are described at a high level of abstraction. Compared to XP, Scrum could be taken as a prescriptive method, with descriptions of how the Scrum events are to be performed and how the artifacts are to be developed. However, neither method is described at the practice level of detail provided by MDevSPICE®. The mappings of the methods were limited to the information in the following resources: The Scrum Guide™ by Ken Schwaber and Jeff Sutherland [13] and the book Extreme Programming Explained: Embrace Change by Kent Beck [14]. Although the twenty-two XP practices are described to the same level of detail in the XP book [14], they show significant differences in terms of their characteristics, for example the "sit together" vs. "continuous integration" or "customer involvement" vs. "incremental deployment" practices. As the definitions of the practices are limited, we needed to make some assumptions during the mapping.

4 The Systematic Mapping Process

In [12], we provided a very detailed analysis of the mapping between Scrum's activities and roles and MDevSPICE®'s processes and base practices. Due to space limitations, here we summarize the analysis of the Scrum mapping and detail the XP mapping. For both the XP and Scrum mappings and the coverage evaluation, the MDevSPICE® Class B requirements were taken into account. Due to the descriptive characteristics of the methods mentioned above, we assumed that process artifacts such as project plans or project monitoring reports would be developed during an XP and Scrum implementation, as the evidence for audits needs to be collected. Although it is very likely that some base practices would be performed during software development with Scrum or XP, we could not rate them as 100% covered, as they might not be performed at the level of detail required by MDevSPICE®. The coverage ratio (CR) of a process is calculated as:

CR = (sum of the achieved base practices in a process) / (total number of base practices in the process)    (1)

In this evaluation, the base practices (BPs) are considered either partially or fully achieved. For the calculation of the sum of the achieved base practices in a process, partially achieved (PA) practices are weighted by 0.5, while fully achieved (FA) BPs are weighted by 1.
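As a small illustration of how equation (1) is applied (a sketch under our own naming assumptions, not code from the study), the snippet below computes the CR of a process from a list of base-practice ratings, reproducing, for example, the PRO.1 value of 3.5/11 reported later:

```python
# Sketch of the coverage ratio (CR) calculation from equation (1).
# Ratings: "FA" (fully achieved) weighs 1.0, "PA" (partially achieved) weighs 0.5,
# anything else (not achieved) weighs 0. Names and structure are illustrative.

WEIGHTS = {"FA": 1.0, "PA": 0.5}

def coverage_ratio(ratings, total_base_practices):
    """CR = sum of achieved base practices / total number of base practices."""
    achieved = sum(WEIGHTS.get(r, 0.0) for r in ratings)
    return achieved / total_base_practices

if __name__ == "__main__":
    # PRO.1 Project Planning under XP: BP4, BP5, BP6 fully achieved, BP8 partially
    # achieved, out of 11 base practices in total.
    pro1 = coverage_ratio(["FA", "FA", "FA", "PA"], total_base_practices=11)
    print(round(pro1, 3))  # 0.318
```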

4.1 XP Mapping

The XP method [14] is described in terms of roles, values, principles and practices. The roles in an XP team are testers, interaction designers, architects, project managers, product managers, executives, technical writers, users, programmers and human resources. The roles are suggested to have a flexible structure rather than being fixed and rigid.

The values, which are "communication, simplicity, feedback, courage, respect, safety, security, predictability, and quality-of-life", shape the team's behavior and the development environment, but they do not provide concrete guidance for software development. The principles play a bridging role between the values and the practices. Some of the XP principles are "humanity, economics, mutual benefit, self-similarity, improvement and diversity". As can be seen, they are also at a very abstract level and do not provide direct advice for software development. The XP practices, which are the main component of this mapping, are presented in two categories: the primary practices and the corollary practices. According to Beck [14], the primary practices aim at immediate improvement, while the corollary practices are difficult to apply without first mastering the primary practices.


In Table 1 and Table 2, the mapping between the primary and corollary practices of XP and the processes and base practices of MDevSPICE® is provided (RQ1-RQ2). Due to space limitations, the XP practice descriptions cannot be provided in the tables below. The bold text in the second column of Table 1 and Table 2 shows the mapped processes; the other text in the same column cell refers to the mapped base practices (BPs).

verification Test First Programming / Continuous Testing

Table 1 Mapping of the XP Primary Practices and the MDevSPICE® Processes and Base Practices

Primary Practices

MDevSPICE® Processes and Base Practices

Weekly Cycle

PRO.1 Project Planning PRO.1.BP4: Define and maintain estimates for project attributes PRO.1.BP5: Define project activities and tasks PRO.2 Project Assessment and Control PRO.2.BP3: Report progress of the project PRO.2.BP4: Perform project review PRO.2.BP5: Act to correct deviations. DEV.1 Software Requirements Analysis DEV.1.BP2: Prioritize requirements. DEV.1.BP7: Baseline and communicate software requirements. PRO.2 Project Assessment and Control PRO.2.BP3: Report progress of the project PRO.2.BP4: Perform project review PRO.2.BP5: Act to correct deviations. PRO.1 Project Planning PRO.1.BP6: Define needs for experience, knowledge and skills. No Corresponding Practice

Quarterly Cycle Whole Team Informative Workspace Energized Work Sit Together Pairing and Personal Space Pair Programming Slack

Ten Minute Build

Continuous Integration

Incremental Design Story

DEV.1 Software Requirements Analysis DEV.1.BP1: Define and document all software requirements. Table 2 Mapping of the XP Corollary Practices and the MDevSPICE® Processes and Base Practices

Corollary Practices

MDevSPICE® Processes and Base Practices

Real Customer Involvement

PRO.1 Project Planning PRO.1.BP6: Define needs for experience, knowledge and skills. SUP.4 Software Release SUP.4.BP2: Define the software product for release SUP.4.BP3: Assemble product for release. SUP.4.BP5: Deliver the release to the acquirer and obtain a confirmation of release. PRO.1 Project Planning PRO.1.BP6: Define needs for experience, knowledge and skills. PRO.1 Project Planning PRO.1.BP6: Define needs for experience, knowledge and skills. SUP.8 Software Problem Resolution SUP.8.BP1: Identify and record each problem in a problem report. SUP.8.BP2: Provide initial support to reported problems and classify problems. SUP.8.BP3: Investigate and identify the cause of the problem. SUP.8.BP4: Assess the problem to determine solution and document the outcome of the assessment. SUP.8.BP7: Implement problem resolution. DEV.4 Software Unit Implementation and Verification DEV.4.BP1: Implement the software units.

Incremental Deployment

No Corresponding Practice No Corresponding Practice PRO.1 Project Planning PRO.1.BP6: Define needs for experience, knowledge and skills. DEV.4 Software Unit Implementation and Verification DEV.4.BP1: Implement the software units. PRO1.Project Planning PRO.1.BP8: Define project schedule. PRO.2 Project Assessment and Control PRO.2.BP5: Act to correct deviations. DEV.5 Software Integration and Integration Testing DEV.5.BP1: Integrate software units into software items. DEV.5.BP2: Verify that software integration follows integration strategy. DEV.5.BP3: Develop tests for integrated software items. DEV.4 Software Unit Implementation and Verification DEV.4.BP4: Verify software units. DEV.5 Software Integration and Integration Testing DEV.5.BP1: Integrate software units into software items. SUP.4 Software Release SUP.4.BP1: Ensure the completeness of software

DEV.4 Software Unit Implementation and Verification DEV.4.BP4: Verify software units. DEV.5 Software Integration and Integration Testing DEV.5.BP3: Develop tests for integrated software items. DEV.5.BP4: Test integrated software items in accordance with the integration plan and document the results. SUP.4 Software Release SUP.4.BP1: Ensure the completeness of software verification Excluded from the analysis, as the definition of this process was not clear

Team Continuity Shrinking Teams Root-Cause Analysis

Shared Code/ Collective Code Ownership Code and Tests Contradicts with MDevSPICE® Single Code Base

DEV.5 Software Integration and Integration Testing DEV.5.BP1: Integrate software units into software items. DEV.5.BP2: Verify that software integration


follows integration strategy.

According to the mappings shown in Table 1 and Table 2, the XP method, when implemented fully, is related to seven processes of MDevSPICE®. Table 3 shows these processes and the coverage ratio of each process.

Table 3 Coverage Ratios of the Mapped MDevSPICE® Processes from XP Perspective

  Mapped MDevSPICE® Processes                             | CR
  1. PRO.1 Project Planning                               | 0.32
  2. PRO.2 Project Assessment and Control                 | 0.50
  3. DEV.1 Software Requirements Analysis                 | 0.22
  4. DEV.4 Software Unit Implementation and Verification  | 0.375
  5. DEV.5 Software Integration and Integration Testing   | 0.80
  6. SUP.4 Software Release                               | 0.43
  7. SUP.8 Software Problem Resolution                    | 0.80

Below, we discuss why these processes in Table 3 did not have a full coverage ratio and what additional practices are required in order to achieve compliance to the medical requirements (RQ3). XP practices are shown in italics and underlined not to be confused with the MDevSPICE® base practices. The mapping illustrated that some of the primary XP practices which are Informative Workspace, Energized Work and Sit Together do not have specific correspondence at the MDevSPICE® side. The practice called Code and Test favours maintaining only the code and the tests as permanent artifacts and generating other documents from the code and tests when necessary. It is suggested to rely on social mechanisms to keep alive important historical parts of the project. However, this approach will not be acceptable in a safety critical software project for traceability and auditory reasons. We excluded the Incremental Design practice of the mapping, as the definition provided in [14] was not clear to associate the practice either with the Architectural Design or Software Detailed Design processes. Below, we discuss the mapped processes and practices in terms of coverage analysis. #1 PRO.1 Project Planning Process (CR of PRO.1 = 3.5 BP / 11 BP = 0.318) The sixth base practice of PRO.1, Define needs for experience, knowledge and skills, could be achieved with the implementation of Whole Team, Real Customer Involvement, Team Continuity and Shrinking Teams practices. The strong emphasis on the team structure of XP could be deduced with these four practices. PRO.1.BP8: Define project schedule base practice requires determining the sequence and schedule of performance of activities within the project. Slack is a practice which suggests having flexibility on project schedule by allowing tasks to be added, changed or dropped and discusses the commitments in terms of honesty with the stakeholders. However, this BP is assumed to be partially achieved (PA), as it is not enough just by itself to establish a project schedule for


a medical device software project. The Weekly Cycle practice of XP advises customers to decide stories to be implemented for the following week, breaking the stories into tasks and team members sign up for tasks and estimate them. Therefore, PRO.1.BP4: Define and maintain estimates for project attributes and PRO.1.BP5: Define project activities and tasks BPs may be achieved with proper implementation of the Weekly Cycle practice. Based on this analysis, of the XP implementation, three BPs of PRO.1 (BP4-BP5-BP6) are fully achieved and one BP (BP8) is partially achieved. Additionally, for the PRO.1 process to be fully achieved, the project scope, the project life cycle model, the need for experience, knowledge and skills, and major project interfaces have to be defined with a project plan, including all this information being established and implemented. #2 PRO.2 Project Assessment and Control (CR of PRO.2 = (3 BP / 6 BP =0.50) BPs, PRO.2.BP3: Report progress of the project, PRO.2.BP4: Perform project review and PRO.2.BP5: Act to correct deviations may be achieved through adopting the Weekly Cycle and Quarterly Cycle primary practices of XP. The Weekly cycle practice suggests reviewing progress to date, including monitoring how the actual progress for the previous week matches expected progress. PRO.2.BP3 is fully achieved with this practice. The Quarterly Cycle practice suggests planning work on a quarterly basis from a broader perspective. During this planning activity it is suggested to identify bottlenecks, especially those controlled outside the team, initiate repairs, plan the theme or themes for the quarter and select a quarter's worth of stories to address those themes. Use of a combination of Weekly Cycle, Quarterly Cycle and Slack practices would be sufficient to enable PRO.2.BP4 and PRO.2.BP5 to be fully achieved. Although these practices provide a good structure to monitor project activities and take corrective actions, the medical domain needs special focus on monitoring project attributes such as scope, budget, cost, resources; project interfaces and recording project experiences and data to be available for future projects and process improvement. #3 DEV.1 Software Requirements Analysis Process: (CR of DEV.1 = (2 BP / 9 BP = 0.22) Beck introduces a new form of requirements in XP: Stories which are described using a short name and a graphical description on an index card. They are a place holder to initiate discussions around the requirements. With this definition of Stories in XP, we could say it is not a suitable format for a regulated domain. However, adaptations could be performed on stories to extend their usage. Briefly, in relation to DEV.1.BP1: Define and document all software requirements BP; all functional, performance, security and usability requirements including hardware and software requirements (for third party software as well), software system inputs and outputs, interfaces between the software system and other systems; software-driven alarms, and warnings need to be defined. With the Weekly Cycle practice, defined software requirements could be prioritized. Stories


and Weekly Cycle provide a means to communicate about software requirements. However, the DEV.1.BP7: Baseline and communicate software requirements BP cannot be considered fully achieved with the implementation of these two practices, as software requirements have to be baselined. Based on this analysis, with an XP implementation two BPs of DEV.1 (BP1 and BP7) are partially achieved and one BP (BP2) is fully achieved. Additionally, for the DEV.1 process to be fully achieved, the impact of the requirements on the operating environment needs to be determined, and risk control measures in the software requirements need to be established and maintained. It is also essential that consistency is achieved between system and software requirements. In the safety-critical domain, this consistency is supported by establishing and maintaining bilateral traceability between the project artefacts. However, XP neither makes a distinction between requirement types nor suggests establishing traceability.

#4 DEV.4 Software Unit Implementation and Verification Process (CR of DEV.4 = 1.5 BP / 4 BP = 0.375)
The Pair Programming, Test First Programming and Shared Code practices of XP were mapped to the DEV.4.BP1: Implement the software units and DEV.4.BP4: Verify software units BPs. Additionally, the Continuous Integration practice was also mapped to DEV.4.BP4. As part of the DEV.4.BP4 BP, each software unit implementation needs to be verified and the verification results have to be documented. Because of this documentation requirement, DEV.4.BP4 was considered to be partially achieved (PA). For the DEV.4 process to be fully achieved, software unit verification procedures need to be established, acceptance criteria need to be defined for each software unit, and it has to be ensured that the software units meet those criteria. From an FDA perspective, source code should be evaluated to verify its compliance with specified coding guidelines, and additional code inspections need to be performed.

#5 DEV.5 Software Integration and Integration Testing Process (CR of DEV.5 = 4 BP / 5 BP = 0.80)
The purpose of the Ten-Minute Build practice of XP is to automatically build the whole system and run all of the tests in ten minutes. In combination with Continuous Integration and Single Code Base, these practices were mapped to the first three BPs of the DEV.5 process. With the Test First Programming practice, DEV.5.BP4: Test integrated software items in accordance with the integration plan and document the results can also be achieved. Additionally, MDevSPICE® emphasizes developing regression tests and providing evidence regarding the tests performed; no information regarding regression tests was found in XP.

#6 SUP.4 Software Release Process (CR of SUP.4 = 3 BP / 7 BP = 0.43)
The Continuous Integration and Test First Programming practices could be used to achieve SUP.4.BP1, which requires ensuring that the detected residual anomalies have been evaluated to ensure that they do not contribute to an unacceptable risk of a hazard. It is also essential that these anomalies are recorded and traced. SUP.4.BP2 requires defining the products associated with the release and documenting the version of the released software. SUP.4.BP3 requires preparing and assembling the deliverable product and establishing the baseline for the product, including user documentation, designs and the product itself. SUP.4.BP5 requires delivering the release to the acquirer and obtaining confirmation of the release. XP suggests the Incremental Deployment practice, and SUP.4.BP2, SUP.4.BP3 and SUP.4.BP5 are highly related to this practice. However, we could only assume that SUP.4.BP5 is fully achieved and the others partially achieved, because of the emphasis on delivering documentation and baselines. In addition to the above, for the SUP.4 process to be fully achieved, procedures need to be established to ensure that the released software product can reliably be delivered to the point of use without corruption or unauthorized change, and it must be ensured that all software development activities and tasks, together with their associated documentation, have been completed.

#7 SUP.8 Software Problem Resolution Process (CR of SUP.8 = 4 BP / 5 BP = 0.80)
The purpose of the software problem resolution process is to ensure that all discovered problems (bugs, defects) are identified, analyzed, managed and controlled to resolution. The Test First Programming and Continuous Integration practices are mainly related to the detection and resolution of problems. The Root-Cause Analysis practice offers a good procedure for problem resolution. It suggests writing automated system-level tests that demonstrate the defect and the desired behavior of the system, writing unit tests with the smallest possible scope that also reproduce the defect, and fixing the system so that the unit tests pass. It is also suggested that once the defect is resolved, the team identify why the defect was created and was not caught in the first place, and initiate the necessary changes to prevent this kind of defect in the future. With these practices/procedures, the SUP.8.BP1: Identify and record each problem in a problem report, SUP.8.BP2: Provide initial support to reported problems and classify problems, SUP.8.BP3: Investigate and identify the cause of the problem and SUP.8.BP7: Implement problem resolution BPs are fully achieved, whereas the SUP.8.BP4: Assess the problem to determine solution and document the outcome of the assessment BP is partially achieved. Additionally, for the SUP.8 process to be fully achieved, problem reports need to include a statement of criticality and potentially adverse events, a problem's relevance to safety needs to be evaluated, the outcome of the evaluation needs to be documented, relevant parties need to be informed of the existence of the problem, and records of problem reports, problem resolutions and their verification need to be maintained.

4.2

Summary of the Scrum Mapping

With the same approach described and followed above, we evaluated Scrum to learn how it meets the regulatory requirements defined in MDevSPICE®. Table 4 below shows these processes and the coverage ratio of each process.


Table 4 CRs of Mapped MDevSPICE® Processes from Scrum Perspective

  Mapped MDevSPICE® Processes                     | CR
  1. PRO.1 Project Planning                       | 1
  2. PRO.2 Project Assessment and Control         | 0.9
  3. ENG.1 Stakeholder Requirements Definition    | 0.55
  4. ENG.2 System Requirements Analysis           | 0.71
  5. DEV.1 Software Requirements Analysis         | 0.33

The details of this evaluation were provided in the publication "How Does Scrum Conform to the Regulatory Requirements Defined in MDevSPICE®?" [12].

5 Conclusions

In this paper, we analyzed how well the medical device software development requirements are met by an implementation of XP and provided a very detailed evaluation. For this evaluation, we listed the XP practices, mapped them to the MDevSPICE® base practices, and calculated the coverage ratios for the associated MDevSPICE® processes. The purpose of providing these ratios is to give readers and practitioners an indication of how much is achieved with an XP implementation and how much more needs to be done from a regulatory perspective. Implementing the XP and Scrum practices in a medical device software organization may provide partial or full achievement in nine MDevSPICE® processes. Above, we described the additional practices that need to be performed for conformance to the medical regulations. However, the framework defines 14 more processes that must be addressed. This shows that mainstream agile software development methods can be implemented in an MDSD organization, but they cannot be a complete solution just by themselves. It can be deduced that tailoring is essential for agile practices within the medical device software domain. The mapping has also illustrated that XP's support for project management processes/practices is very limited; therefore, Scrum could be a good complement to XP for the planning and assessment practices in MDevSPICE®. Moreover, even though technical practices are provided by XP, it was shown that they too are not sufficient to meet the needs of the medical requirements. We have captured associations in nine MDevSPICE® processes; in future research we will extend the mapping process to include other ASD methods in order to provide coverage of a complete software development life cycle.

Acknowledgement. This research is supported by Science Foundation Ireland under a co-funding initiative by the Irish Government and European Regional Development Fund through Lero - the Irish Software Research Centre (http://www.lero.ie) grant 13/RC/2094. This research is also partially supported by the EU Ambient Assisted Living project - Maestro.

6 References

[1] FDA. Chapter I - Food and Drug Administration, Department of Health and Human Services, Subchapter H - Medical Devices, Part 820 - Quality System Regulation. Available: http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?CFRPart=820
[2] Directive 93/42/EEC of the European Parliament and of the Council concerning medical devices, 1993.
[3] Council Directive 90/385/EEC on active implantable medical devices (AIMDD), 1990.
[4] Directive 98/79/EC of the European Parliament and of the Council of 27 October 1998 on in vitro diagnostic medical devices, 1998.
[5] IEC 62304:2006, Medical Device Software - Software Life-Cycle Processes, 2006.
[6] F. McCaffery, M. Lepmets, K. Trektere, Ö. Özcan-Top, and M. Pikkarainen, "Agile Medical Device Software Development: Introducing Agile Practices into MDevSPICE®," 2016.
[7] B. Fitzgerald, K.-J. Stol, R. O'Sullivan, and D. O'Brien, "Scaling agile methods to regulated environments: An industry case study," in 2013 35th International Conference on Software Engineering (ICSE), 2013, pp. 863-872: IEEE.
[8] G. Regan, F. McCaffery, K. McDaid, and D. Flood, "Medical device standards' requirements for traceability during the software development lifecycle and implementation of a traceability assessment model," Computer Standards & Interfaces, vol. 36, no. 1, pp. 3-9, 2013.
[9] M. McHugh, O. Cawley, F. McCaffery, I. Richardson, and X. Wang, "An agile v-model for medical device software development to overcome the challenges with plan-driven software development lifecycles," in 2013 5th International Workshop on Software Engineering in Health Care (SEHC), 2013, pp. 12-19: IEEE.
[10] P. A. Rottier and V. Rodrigues, "Agile development in a medical device company," in Agile 2008 Conference, 2008, pp. 218-223: IEEE.
[11] M. Lepmets, F. McCaffery, and P. Clarke, "Development and benefits of MDevSPICE®, the medical device software process assessment framework," Journal of Software: Evolution and Process, vol. 28, no. 9, pp. 800-816, 2016.
[12] Ö. Özcan-Top and F. McCaffery, "How Does Scrum Conform to the Regulatory Requirements Defined in MDevSPICE®?," in International Conference on Software Process Improvement and Capability Determination, 2017, pp. 257-268: Springer.
[13] J. Sutherland and K. Schwaber, "The Scrum Guide," The Definitive Guide to Scrum: The Rules of the Game, Scrum.org, 2013.
[14] K. Beck, Extreme Programming Explained: Embrace Change. Addison-Wesley Professional, 2000.
[15] ISO/IEC 15504-5:2012, Information technology - Process assessment - Part 5: An exemplar software life cycle process assessment model, 2012.
[16] P. Abrahamsson, O. Salo, J. Ronkainen, and J. Warsta, "Agile software development methods: Review and analysis," VTT, Finland, 2002.
[17] K. S. Rubin, Essential Scrum: A Practical Guide to the Most Popular Agile Process. Addison-Wesley Professional, 2012.


Seven Principles of Undergraduate Capstone Project Management Xiaocong Fan School of Engineering, Behrend College Pennsylvania State University Erie, PA 16563 Email: [email protected]

Abstract—The Agile development approach has been increasingly adopted in undergraduate capstone courses for software engineering as well as other engineering programs. Our capstone course series is designed to help students grow in three aspects: software engineering practices, teamwork, and project management. Bringing all three aspects together, in this study we focus on the agile management of team-based capstone projects. In particular, we propose to distinguish three levels of capstone project management: base-level management, meta-level management, and cross-team management. We then refine the three levels of management into seven principles which are critical for senior students to learn in managing their capstone projects. The seven principles have been implemented in a web-based platform, and a survey was conducted to evaluate the effectiveness of the platform in helping students practice the seven principles. The survey confirmed that students gained quite positive experiences in all seven aspects. Keywords—Capstone projects, Teamwork, Agile development, Sprint review, Project traceability.

I. I NTRODUCTION The Agile software development approach has gained popularity and become a standard practice in software industry [1], [2]. It advocates adaptive planning, early delivery, and continual improvement. To minimize the amount of up-front planning and design, it breaks the development process into short time frames (a.k.a. iterations or sprints), each of which typically lasts from one to four weeks. In each sprint, the development team work closely in all software activities including planning, analysis, design, coding, and testing, with the goal to demonstrate a working product to the stakeholders at the end of the sprint. It favors an incremental development strategy in the sense that each successive release of the product is usable, and each builds upon the previous one by adding new user-visible functionality. Many undergraduate engineering programs require senior students to take capstone design courses, where students have the opportunity to work on capstone projects with realistic requirements that are typically proposed by industry sponsors [3]–[8]. In the Department of Computer Science and Software Engineering at Penn State Behrend, our capstone program spans two semesters, requiring senior students to form teams to design and implement systems to meet requirements and constraints originated from industrial customers. Upon the completion of a project, our goal is not just to ensure that

students can deliver a final product to the sponsor, but more importantly, to help them gain essential skills and project development experience that can launch them into successful future careers. To achieve such a goal, our capstone course series is designed to help students grow in three aspects: 1) Software Engineering Practices: As documented in the literature, the Agile development approach has been increasingly adopted in undergraduate capstone courses for software engineering as well as other engineering programs such as computer engineering and electrical engineering [2], [9]–[11]. In order to bridge the gap between campus training and industry practice, we have advised our students to adopt many agile practices [12] in the development of their capstone projects since 2015. 2) Teamwork: Capstone projects are team-based in many undergraduate engineering programs [13]–[20]. At Penn State Behrend, we have implemented a student-centered collaboration model, where throughout the two-semester long process, a student team need to work closely with multiple persons playing different roles including industry mentor, faculty advisor, and the course instructor. The industry mentor of a team serves as a real customer of the system under development, clarifying user requirements and sending change requests after review of an intermediate system. The faculty advisor closely monitors the progress of a team on their projects, reviews the team’s project work, and provides advice for improvement. The course instructor oversees all capstone projects to ensure that engineering standards and principles are consistently applied. Obviously, the success of a capstone project largely relies on the intentional integration of multi-party collaboration, and we hope our students can learn how to teamwork with people playing various roles. 3) Project Management: Effective project management is hard, especially for undergraduate students who still have little working experience of managing a team-based project across two semesters. In our program curriculum, there are a few courses which do require multiple students to work as a team on a course project. However, due to the small scale and short timespan of course projects, students cannot be really exposed to the complexity of project management until they come to the capstone projects.


TABLE I. SEVEN PRINCIPLES FOR UNDERGRADUATE CAPSTONE PROJECT MANAGEMENT

  # | Principle                                | Description
  1 | Effective Planning and Scheduling        | Students should know how to plan and schedule their project tasks effectively.
  2 | Effective Communication with Customers   | Students have to manage their communication with customers effectively, or fail.
  3 | Traceability in Design Artifacts         | Students should take the responsibility of managing the traceability among their design artifacts.
  4 | Student Teamwork Awareness               | Students should form the habit of actively building teamwork awareness among teammates.
  5 | Share Progress with Stakeholders         | Students should regularly share project progress with their faculty advisor and industry mentor.
  6 | Share Task Challenges                    | Students should be trained to proactively share the challenges and difficulties they are facing with teammates and the faculty advisor.
  7 | Collect Timely Feedbacks                 | Students should regularly and frequently work with their project stakeholders to collect feedbacks on the system under development.

Via the capstone projects, we hope our students can learn project management skills (sometimes the hard way) to better prepare themselves for working on larger-scale projects after graduation. Bringing all the three aspects together, in this study we focus on the agile management of team-based capstone projects, sharing the guiding principles and practices implemented in our capstone program. The rest of this paper is organized as follows. In Section II, we give the seven principles for undergraduate capstone project management. In Section III, we describe how the seven principles are implemented. In Section IV, we report responses from students regarding the effectiveness of our implementation, and Section V concludes the paper.

II. SEVEN PRINCIPLES FOR UNDERGRADUATE CAPSTONE PROJECT MANAGEMENT

For team-based capstone projects, we distinguish three levels of project management.
• Base-level management: First of all, students should learn how to manage their own projects. In particular, a student team should be responsible for activities such as risk management, requirements management, change management, project planning and scheduling, and project monitoring and reviewing. All of these need to be conducted frequently by the students themselves.
• Meta-level management: The faculty advisor of a student team is not directly responsible for managing the team's project. However, student success does rely on the advisor's guidance. In order to keep students on the right track, an advisor ought to provide higher-level project monitoring and reviewing regularly.
• Cross-team management: Cross-team project management is enforced by the course instructor, who oversees all capstone projects to ensure that engineering standards and principles are consistently applied. This offers another level of quality control over the capstone projects.
By reflecting on over 20 years of practice with undergraduate capstone projects, we have recently distilled our experience into seven important principles for capstone project management, as listed in Table I.
1) Effective Planning and Scheduling: Students should know how to plan and schedule their project tasks effectively. To be effective, it is critical for a team to have a mutual agreement on a to-do task list and task assignments (or ownership), a reasonable time commitment to each task, and a clear understanding of inter-task dependencies.
2) Effective Communication with Customers: Students need to communicate with relevant stakeholders to identify, elicit, and prioritize system requirements and constraints. Project customers such as the industry mentor are involved only on an as-needed basis, and their time commitment to the sponsored project is oftentimes limited. Students have to manage their communication with customers effectively, or fail.
3) Traceability in Design Artifacts: Many project artifacts are generated throughout the development process. Although teams are not expected to produce hundreds of pages of documentation as in the waterfall approach, a well-run Agile project should produce the right amount of documentation in each sprint, including high-level analysis and design artifacts. In our practice, students are instructed to document their design artifacts such as user requirements, system requirements, use cases, and test cases. Students should take the responsibility of managing the traceability among their design artifacts.
4) Student Teamwork Awareness: In addition to the capstone projects, students on campus often have to take on other course work and activities. It is actually the norm that each student may work only a couple of hours every week on the project [21], [22]. All these factors bring up a common pitfall in undergraduate capstone teams: lack of teamwork awareness. We can often observe quite a few students who do not know exactly where they or their teammates are in the development process. Key characteristics of high-performing human teams include progress awareness, task awareness and proactive communication [23], [24]. One key learning objective of our capstone program is to help students form the habit of actively building teamwork awareness among teammates.
5) Share Progress with Stakeholders: In addition to building teamwork awareness among peer students, teams should also regularly share project progress with their faculty advisor and industry mentor. This helps the stakeholders offer timely advice pertinent to the students' current progress.


6) Share Task Challenges: According to our experience, successful student teams are those who proactively share the challenges or difficulties they are confronting and seek guidance from their teammates and faculty advisor. Chances are that others have struggled with, and then resolved, the same challenges. Hence, students should be trained to proactively share their challenges, taking advantage of collective intelligence.
7) Collect Timely Feedbacks: One of the best practices offered by the agile development approach is that, at the end of each sprint, a student team can demonstrate the working software to the project stakeholders and collect their feedback about the system under development. Without timely feedback, the team could run into difficulties in planning the subsequent sprints.
The three levels of project management described at the beginning of this section are reified in the seven principles quite well. While base-level management concerns all seven principles, Principles 5, 6, and 7 also apply to meta-level management. As far as cross-team management is concerned, the course instructor, with access to all capstone projects, can disseminate the best engineering practices, identify the most appropriate teamwork behavior and encourage its adoption in the whole class.

III. IMPLEMENTATION OF SEVEN PRINCIPLES

CapStone is a web-based platform that has been developed and deployed at the Behrend College, Pennsylvania State University to facilitate multi-party collaboration during the development process of capstone projects [25]. To promote student success, we have now implemented new features in CapStone to support the seven principles of capstone project management. Below we describe how each of the seven principles is implemented.

A. Principle 1: Effective Planning and Scheduling

First of all, at the cross-team management level, we have split the two-semester development process into ten agile sprints: one inception sprint and nine construction sprints. The inception sprint (Sprint 0) lasts for one month, from mid-September until mid-October, and each construction sprint lasts about two weeks. In industry, each project may have a different sprint schedule and the length of a sprint may be flexible. However, we have to enforce a fixed sprint schedule for all capstone projects on campus, for two reasons. First, students are still novice Agile software developers and few have the ability to be a scrum master. Second, the curriculum requires that all student teams demonstrate tangible progress regularly (four times along the way they give presentations and submit technical reports). An enforced sprint schedule helps to keep all teams on the right track.
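As a rough sketch of what such a fixed, cross-team sprint calendar could look like in code (the dates and field names are our own illustrative assumptions; the paper does not publish CapStone's internals), the snippet below generates Sprint 0 plus nine two-week construction sprints:

```python
# Sketch of a fixed capstone sprint calendar: one month-long inception sprint
# (Sprint 0) followed by nine two-week construction sprints. Dates are
# illustrative assumptions, not the actual schedule used in the course.

from datetime import date, timedelta

def build_sprint_schedule(inception_start=date(2018, 9, 15),
                          inception_days=30,
                          construction_sprints=9,
                          sprint_days=14):
    sprints = [{"name": "Sprint 0 (inception)",
                "start": inception_start,
                "end": inception_start + timedelta(days=inception_days)}]
    start = sprints[0]["end"]
    for i in range(1, construction_sprints + 1):
        end = start + timedelta(days=sprint_days)
        sprints.append({"name": f"Sprint {i}", "start": start, "end": end})
        start = end
    return sprints

if __name__ == "__main__":
    for s in build_sprint_schedule():
        print(f'{s["name"]}: {s["start"]} -> {s["end"]}')
```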

create team tasks. Once a task has been created, each student can drag and drop it into his/her own tasking panel to schedule a task instance. A task created in a sprint is supposed to be completed by the end of that sprint; if it is not, it is carried over to the next sprint. It is interesting to note that students tend to plan big tasks in the first few sprints and soon start to realize that big tasks can take more than one sprint to complete. Gradually, most students learn the lesson and split big tasks to make them manageable and completable within one sprint.

B. Principle 2: Effective Communication with Customers

Because industry mentors are fully engaged in their own jobs, we advise our student teams to contact their industry mentors only on an as-needed basis. We also instruct students to be organized whenever they talk to their industry mentors. For instance, they should plan ahead by preparing a list of questions or discussion points before a meeting, clearly indicate when an answer is expected if it is not provided during the meeting, and end a meeting with a short summary and, if possible, the next meeting time set up. To facilitate effective communication, after each meeting with their industry mentor, the team should use CapStone to document the meeting minutes and share them with the mentor. Students are reluctant to spend time writing meeting minutes in the beginning, but many of them come to appreciate having such a formal communication record as soon as they run into a situation where ambiguities need to be resolved or misunderstandings need to be clarified. Oftentimes the meeting minutes are the best place to help resolve an ambiguity or clarify a misunderstanding.

Most communication between a student team and their industry mentor is about requirements engineering. This includes requirements elicitation during the inception sprint and requirements refinement at the end of each construction sprint. During the inception sprint, each team needs to work with their industry mentor to elicit an initial set of user requirements, which are then transformed into system requirements. The hope is to have a few high-priority system features and constraints identified in the early stage of development so that students can start their development activities. At the end of a construction sprint, each team demonstrates the system under development to their industry mentor and collects feedback (change requests) afterward. The collection of feedback is critical for the team to refine existing requirements and discover new ones for subsequent iterations/sprints.

C. Principle 3: Traceability in Design Artifacts

CapStone has several CASE tools for documenting design artifacts and managing traceability. In particular, CapStone allows students to document user requirements, system requirements, use cases, test cases, test case executions, and change requests. CapStone helps to manage the following traceability information:
• refinement relation between user requirements and system requirements;


Fig. 1. TaskBoard for Building Student Teamwork Awareness.

• connection between functional user/system requirements and use cases;
• connection between user/system requirements and test cases;
• connection between use cases and test cases;
• connection between a sprint task and outstanding change requests;
• connection between a sprint task and system requirements.

The traceability information helps a team quickly assess the potential impacts of changes and take appropriate actions to respond to them. Some traceability information also helps a team justify their design or implementation. For instance, one user requirement may be refined into several system requirements. The process of establishing such a refinement relation can invite a team to think deeply about whether a user requirement has been engineered appropriately (neither over-engineered nor partially engineered). As another example, when a task is created, it can be linked to outstanding change request(s) and system requirements. Indeed, one should suspect, and discard, any task that does not directly address an outstanding change request or a system requirement.
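To illustrate how such trace links support change impact analysis, the following sketch (in Python) stores links as source-to-target pairs and walks them transitively from a changed artifact. The artifact identifiers, the link set, and the impacted_by helper are illustrative assumptions for this paper, not CapStone's actual data model or API.

# Minimal sketch of traceability-driven impact analysis (illustrative only;
# artifact IDs and link types are hypothetical, not CapStone's real schema).
from collections import defaultdict, deque

trace_links = [
    ("UR-1", "SR-1"), ("UR-1", "SR-2"),   # user requirement refined into system requirements
    ("SR-1", "UC-1"), ("SR-2", "UC-2"),   # system requirements realized by use cases
    ("UC-1", "TC-1"), ("UC-2", "TC-2"),   # use cases verified by test cases
    ("SR-2", "TASK-7"),                   # a sprint task addresses a system requirement
]

def impacted_by(artifact):
    """Return every downstream artifact reachable from the changed one."""
    children = defaultdict(list)
    for src, dst in trace_links:
        children[src].append(dst)
    seen, queue = set(), deque([artifact])
    while queue:
        current = queue.popleft()
        for nxt in children[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(impacted_by("UR-1"))  # {'SR-1', 'SR-2', 'UC-1', 'UC-2', 'TC-1', 'TC-2', 'TASK-7'}

A query of this kind is what lets a team answer "if this user requirement changes, which use cases, test cases, and sprint tasks are affected?" without inspecting every artifact by hand.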

D. Principle 4: Student Teamwork Awareness

On the CapStone platform, a student team can use the TaskBoard to develop and maintain shared teamwork awareness. A screenshot of the TaskBoard is shown in Figure 1. The TaskBoard offers the following features:



• It displays tasks and activities relevant to the current sprint. It also allows a student to navigate back and check records in an earlier sprint.
• It allows a team to create and manage a to-do task list, where each task is color-coded by its priority and is linked to the relevant change requests and/or system requirements.
• It helps a team manage a done task list, where each task is listed with its student contributor(s).
• It allows each student to manage his/her own tasking panel. Specifically, a student can schedule a list of task instances, manage records of working on a task, and adjust the status of a task as time passes. Each task instance may go through states such as 'todo', 'inprogress', 'stuck', 'canceled', and 'done'. A task assigned to more than one student is not moved to the done list until the last doer has changed its status to 'done'.
• Task assignments, task progress, and tasking effort (total tasking hours) are shared among teammates. By clicking on the icon of a teammate, a student can view the detailed tasking records of that teammate.
• It also allows students to share messages (such as encouragement or challenges) among teammates.
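The task instances and status rules described above can be pictured with a small sketch; the class and method names below are hypothetical, not CapStone's implementation, but the states and the done rule follow the description in the text.

# Illustrative sketch of per-student task instances and the rule that a shared
# task is done only when every assignee has marked his/her instance 'done'.
STATES = {"todo", "inprogress", "stuck", "canceled", "done"}

class TeamTask:
    def __init__(self, title, assignees):
        self.title = title
        # One task instance (with its own status) per assigned student.
        self.instances = {student: "todo" for student in assignees}

    def set_status(self, student, status):
        if status not in STATES:
            raise ValueError("unknown status: " + status)
        self.instances[student] = status

    def is_done(self):
        # The task reaches the done list only after the last doer is 'done'.
        return all(status == "done" for status in self.instances.values())

task = TeamTask("Implement login page", ["alice", "bob"])
task.set_status("alice", "done")
task.set_status("bob", "stuck")   # raises a visible 'call-for-help' signal
print(task.is_done())             # False until bob also reaches 'done'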


Fig. 2. Student’s tasking trend along the agile process.

E. Principle 5: Share Progress with Stakeholders

To support meta-level management, the CapStone platform also allows the faculty advisor of a student team to keep track of the team's progress. Via the progress monitor panel, the advisor can see
• a list of ongoing tasks and tasks completed in the current sprint;
• the total tasking hours of each student; and
• how much time each student has worked on each task assigned to him/her.

By clicking on the icon of a student, the advisor can also get details about the student's effort and performance:
• the tasks completed in the current sprint by the student alone;
• the tasks completed in the current sprint together with other teammates;
• a tasking overview of the student (one example is given in Fig. 2), showing the tasking trend along the development process; and
• tasking details (one example is given in Fig. 3), showing what tasks the student has worked on in each sprint and for how long.

The CapStone platform also supports cross-team management, offering the same features to the course instructor to keep track of each team's progress.

F. Principle 6: Share Task Challenges

In the TaskBoard shown in Fig. 1, when a student gets stuck on a task, he/she can always change the status of the task instance to 'stuck' and attach a note to explain the challenge he/she is confronting. In so doing, the student has raised a 'call-for-help' signal to others. The signal can be caught by a teammate when he/she opens the TaskBoard (typically multiple times every day), or by the faculty advisor when he/she uses CapStone to review students' progress (at least once a week). Whoever catches the 'stuck' signal may be able to provide guidance or pointers, which can help the student resolve issues more quickly.
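Both the progress view of Principle 5 and the 'call-for-help' signal of Principle 6 reduce to simple queries over the tasking records. The sketch below is illustrative; the record fields are assumptions, not CapStone's real data model.

# Hedged sketch: what an advisor-facing summary might compute from raw
# tasking records (field names are illustrative, not CapStone's schema).
from collections import defaultdict

tasking_records = [
    {"student": "alice", "sprint": 3, "task": "REST API",   "hours": 4.5, "status": "done"},
    {"student": "alice", "sprint": 3, "task": "Unit tests", "hours": 2.0, "status": "inprogress"},
    {"student": "bob",   "sprint": 3, "task": "REST API",   "hours": 3.0, "status": "stuck"},
]

def sprint_summary(records, sprint):
    hours = defaultdict(float)
    stuck = []
    for r in records:
        if r["sprint"] != sprint:
            continue
        hours[r["student"]] += r["hours"]            # Principle 5: effort per student
        if r["status"] == "stuck":
            stuck.append((r["student"], r["task"]))  # Principle 6: call-for-help signals
    return dict(hours), stuck

print(sprint_summary(tasking_records, 3))
# ({'alice': 6.5, 'bob': 3.0}, [('bob', 'REST API')])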

Fig. 3. Student’s tasking record in detail.

G. Principle 7: Collect Timely Feedback

The CapStone platform allows students to regularly collect two types of feedback from stakeholders: grading feedback and sprint reviews. Twice per semester, each student team needs to submit a project report and give a technical presentation to the faculty. For each presentation and report submission, the advisor is requested to complete an online evaluation form on CapStone. The online feedback approach offers at least two benefits. First, the feedback becomes immediately available to the students in a combined form. Second, it allows the advisor to easily track whether his/her feedback has been considered in later versions of the students' work (system/reports).

At the end of each construction sprint, a sprint review is conducted to gather feedback from the project stakeholders on the system under development. Each team should conduct a sprint review with their faculty advisor and industry mentor, together or separately. Additionally, to encourage cross-team learning, we have coined the term "agile buddy" as a complementary approach to collecting user feedback. Say there are three teams T1, T2, and T3. Each team has one student designated as that team's agile buddy (or ambassador); let us refer to them as A1, A2, and A3, respectively. A1 is responsible for reviewing the system-in-progress of T2, A2 is responsible for reviewing T3's system, and A3 is responsible for reviewing T1's system. In so doing, each team can acquire an outside student's opinion on their system. Furthermore, each team, say T1, has a chance to collect information from two other teams: A1 can bring back team T2's progress and practices, and T1 can get to know T3's progress and practices from A3. This not only helps to disseminate best practices, but also drives teams that have lagged behind their schedules to speed up.
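The agile buddy rotation generalizes to any number of teams as a simple cyclic assignment; the sketch below is illustrative (team names are just examples) and mirrors the three-team case described above.

# Sketch of the 'agile buddy' rotation: the buddy from team i reviews the
# system of team i+1, wrapping around at the end of the list.
def buddy_assignments(teams):
    """Map each team to the team whose system its buddy reviews."""
    return {teams[i]: teams[(i + 1) % len(teams)] for i in range(len(teams))}

print(buddy_assignments(["T1", "T2", "T3"]))
# {'T1': 'T2', 'T2': 'T3', 'T3': 'T1'} -> A1 reviews T2, A2 reviews T3, A3 reviews T1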


Fig. 4. Student Responses to Survey Questions.

After experiencing the system features, or simply watching a demonstration by the students, a stakeholder is asked to submit feedback via the CapStone platform. Specifically, each feedback item can be labeled as one of three action types: corrective, perfective, or progressive. A corrective feedback concerns detected bugs or defects that should be fixed; a perfective feedback describes what could be changed to offer better system performance or a better user experience; a progressive feedback suggests to the team what new features may be developed next.

IV. EFFECTIVENESS OF OUR IMPLEMENTATION

Students in the class of 2018 started their capstone projects in August 2017. There were 44 students in the class, of whom 24 were Software Engineering students and the rest were Computer Science students. In total, 15 teams were formed, each consisting of both CS and SE majors. The seven principles have been fully implemented in the CapStone platform and made available to the students since November 2017. A survey was conducted in April 2018, when the students had intensively used the project management features for about six months. All students except four responded.

Figure 4 plots student responses to the seven survey questions, where students were asked to rate the system's effectiveness at six levels (with the lowest level 1 indicating 'not helpful' and the highest level 6 indicating 'extremely helpful'). At the end of the survey, students were also asked to categorize their own teamwork habit into one of three categories: (1) prefer working alone, (2) prefer teamwork with those of my own level, and (3) prefer teamwork with those with superior skills. Thus, in our analysis, student responses are grouped according to their teamwork habit. There are four curves in each of Fig. 4(a–g), plotting respectively the responses from

students in the groups of 'work alone', 'with own level', 'with superior skills', and the whole class.

Fig. 4(a) plots the responses to how well CapStone helped with project planning and scheduling. Except for the 'work alone' group, the other two groups and the whole class each have their mode at level 5 (very much helpful). Overall, 87.5% of students had positive experiences. Of the students who preferred to work alone, 5 out of 8 rated CapStone as very much useful.

Fig. 4(b) plots the responses to how well CapStone helped you manage and share meeting notes with the industry mentor. Fig. 4(f) plots the responses to how well CapStone helped you share your own task challenges with teammates and the faculty advisor, and provide guidance or pointers to help resolve your teammates' challenges. In both cases, the majority of students rated CapStone as only a bit helpful. When asked, most students preferred to use other means (social media) to manage their communication with mentors and share ideas with teammates.

Fig. 4(c) plots the responses to how well CapStone helped you manage traceability in design artifacts. While the responses from the three groups are mixed, the overall class rating confirmed that CapStone is quite or very useful.

Fig. 4(d) plots the responses to how well CapStone helped you keep track of teammates' task assignments, tasking records, and progress. It is interesting to note that more than half of the students deemed CapStone very helpful or better. It is also interesting to see that the opinions of the students who preferred to work with those with superior skills are split between two extremes: 6 liked it strongly and 6 disliked it. Some students did complain, asking why they should keep track of teammates who have virtually nothing to track (those who never or rarely contribute to the project) or who are reluctant to report progress (lack of teamwork spirit).

Fig. 4(e) plots the responses to how well CapStone helped you share project progress with the faculty advisor. Overall, 70%


of students rated CapStone at the 'quite helpful' or a higher level, and only 2 students claimed they did not benefit from it.

Fig. 4(g) plots the responses to how well CapStone helped you collect timely feedback from stakeholders. Overall, 80% of students rated CapStone at the 'quite helpful' or a higher level, and only 4 students reported a non-positive experience.

V. CONCLUSION

The agile development approach has been increasingly adopted in undergraduate capstone courses in software engineering as well as other engineering programs. In order to bridge the gap between campus training and industry practice, we have instructed our students to adopt the agile approach in the development of their capstone projects since 2015. Our goal is not just to ensure that students can deliver a final product to the sponsor upon project completion, but, more importantly, to help them gain essential skills and project development experience that can launch them into successful future careers. To achieve such a goal, our capstone course series is designed to help students grow in three aspects: software engineering practices, teamwork, and project management.

Bringing these three aspects together, in this study we have focused on the agile management of team-based capstone projects. In particular, we proposed to distinguish three levels of capstone project management: base-level management, meta-level management, and cross-team management. After reflecting on over 20 years of practice in undergraduate capstone projects, we have recently distilled them into seven important principles for capstone project management. The seven principles have been implemented in CapStone, a web-based application where the responsibilities of different roles are meshed into one common platform such that all parties can effectively orchestrate their activities.

A survey was conducted recently to evaluate the effectiveness of CapStone in helping students practice the seven principles. The survey confirmed that students have gained quite positive experiences in all seven aspects. The study also suggests some areas for improvement. For example, many students preferred to use other means, such as social media, to manage their communication with mentors and share ideas with teammates. It would be more beneficial if CapStone could exchange information with popular social media tools.

REFERENCES

[1] D. A. Umphress, T. D. Hendrix, and J. H. Cross, "Software process in the classroom: the capstone project experience," IEEE Software, vol. 19, no. 5, pp. 78–81, Sep 2002.
[2] D. Rover, C. Ullerich, R. Scheel, J. Wegter, and C. Whipple, "Advantages of agile methodologies for software and product development in a capstone design project," in Frontiers in Education Conference (FIE), 2014 IEEE, Oct 2014, pp. 1–9.
[3] J. L. Ray, "Industry-academic partnerships for successful capstone projects," in Frontiers in Education, 2003. FIE 2003 33rd Annual, vol. 3, Nov 2003, pp. S2B–24–9.
[4] R. M. Ford and W. C. Lasher, "Processes for ensuring quality capstone design projects," in Frontiers in Education, 2004. FIE 2004. 34th Annual, Oct 2004, pp. S2G–12–17, vol. 3.

[5] D. Ingalsbe and J. Godbey, "Project-oriented capstone course: Integrating curriculum assessment utilizing industry partner and student input," in Proceedings of the ASEE Annual Conference and Exposition, Austin, 2005.
[6] J. V. Tocco and D. D. Carpenter, "Re-engineering the capstone: Melding an industry oriented framework and the BOK2," in Proceedings of the ASEE Annual Conference and Exposition, Vancouver, BC, Canada, 2011.
[7] R. K. Stanfill and A. Rigby, "The professional guide: A resource for preparing capstone design students to function effectively on industry-sponsored project teams," in Proceedings of the ASEE Annual Conference and Exhibition, Indianapolis, 2014, p. 22.
[8] C. Steinlicht and B. G. Garry, "Capstone project challenges: How industry sponsored projects offer new learning experiences," in Proceedings of the ASEE Annual Conference and Exposition, Indianapolis, 2014, p. 11.
[9] J. C. R. Stansbury, M. Towhidnejad, and M. Dop, "Agile methodologies for hardware/software teams for a capstone design course: lessons learned," in Proc. ASEE Annual Conference, 2011.
[10] V. Mahnic, "A capstone course on agile software development using Scrum," IEEE Transactions on Education, vol. 55, no. 1, pp. 99–106, Feb 2012.
[11] M. Grimheden, "Increasing student responsibility in design projects with agile methods," in Proc. ASEE Annual Conference, 2013.
[12] X. Fan, "Capstone: A cloud-based platform for multi-party collaboration on capstone projects," in 2016 IEEE Frontiers in Education Conference (FIE), 2016, pp. 1–9.
[13] P. M. Griffin, S. O. Griffin, and D. C. Llewellyn, "The impact of group size and project duration on capstone design," Journal of Engineering Education, pp. 185–193, 2004.
[14] K. F. Li, A. Zielinski, and F. Gebali, "Capstone team design projects in engineering curriculum: Content and management," in Teaching, Assessment and Learning for Engineering (TALE), 2012 IEEE International Conference on, Aug 2012, pp. T1C-1–T1C-6.
[15] J. W. Jones and M. Mezo, "Capstone = team teaching + team learning + industry," in Proceedings of the ASEE Annual Conference and Exposition, Indianapolis, 2014, p. 7.
[16] C. W. Ferguson and P. A. Sanger, "Facilitating student professional readiness through industry sponsored senior capstone projects," in Proceedings of the ASEE Annual Conference and Exposition, Vancouver, BC, Canada, 2011.
[17] D. Knudson and A. Radermacher, "Updating CS capstone projects to incorporate new agile methodologies used in industry," in Software Engineering Education and Training (CSEE&T), 2011 24th IEEE-CS Conference on, May 2011, pp. 444–448.
[18] V. Isomöttönen and T. Kärkkäinen, "The value of a real customer in a capstone project," in Software Engineering Education and Training, 2008. CSEET '08. IEEE 21st Conference on, April 2008, pp. 85–92.
[19] J. Vanhanen, T. O. A. Lehtinen, and C. Lassenius, "Teaching real-world software engineering through a capstone project course with industrial customers," in Software Engineering Education based on Real-World Experiences (EduRex), 2012 First International Workshop on, June 2012, pp. 29–32.
[20] A. Rusu and M. Swenson, "An industry-academia team-teaching case study for software engineering capstone courses," in Frontiers in Education Conference, 2008. FIE 2008. 38th Annual, Oct 2008, pp. F4C-18–F4C-23.
[21] M. Kropp and A. Meier, "Teaching agile software development at university level: Values, management, and craftsmanship," in 2013 26th International Conference on Software Engineering Education and Training (CSEE&T), 2013, pp. 179–188.
[22] D. Kumar and I. Govender, "Bringing agile practice to the classroom: Student voices of third-year major project implementation," African Journal of Science, Technology, Innovation and Development, vol. 8, no. 5-6, pp. 423–428, 2016.
[23] M. Borrego, J. Karlin, L. McNair, and K. Beddoes, "Team effectiveness theory from industrial and organizational psychology applied to engineering student project teams: A research review," Journal of Engineering Education, vol. 102, no. 4, pp. 472–512, 2013.
[24] X. Fan, J. Yen, and R. A. Volz, "A theoretical framework on proactive information exchange in agent teamwork," Artificial Intelligence, vol. 169, pp. 23–97, 2005.
[25] X. Fan, "Capstone: A cloud-based platform for multi-party collaboration on capstone projects," in Proceedings of the IEEE Frontiers in Education Conference (FIE), Oct 2016, pp. 1–9.


The role of Product Owner from the practitioner's perspective. An exploratory study

Gerardo Matturro
Departamento de Ingeniería de Software, Universidad ORT Uruguay
Cuareim 1451, Montevideo, Uruguay
[email protected]

Felipe Cordovés
Departamento de Ingeniería de Software, Universidad ORT Uruguay
Cuareim 1451, Montevideo, Uruguay
[email protected]

Martín Solari
Departamento de Ingeniería de Software, Universidad ORT Uruguay
Cuareim 1451, Montevideo, Uruguay
[email protected]

Abstract—The role of Product Owner (PO) is an essential component of the Scrum methodology. This role is well defined in the literature in terms of its responsibilities and the skills that the person who performs it should have. However, in the field, the understanding of the role and its responsibilities differs considerably among organizations and is rarely in perfect compliance with the official Scrum methodology. This article reports an exploratory study of how this role is performed in industrial practice. Through semi-structured interviews conducted with a group of POs, we investigated the procedure and criteria for choosing the person for the role, the ceremonies in which they participate, and the soft skills desirable in a good PO. Findings reveal that: a) the recommendation that the PO should be a representative of the business area that will benefit from the solution is not fulfilled; b) the ceremonies that the POs always attend are sprint planning and sprint review; and c) the most valued soft skills in a PO are communication, teamwork, and customer orientation.

Keywords—Scrum, agile, product owner, soft skills

Regular Research Paper

I. INTRODUCTION

In the Agile Scrum methodology, the Product Owner (hereinafter, PO) is one of the three defined roles, with well-established characteristics and responsibilities. Even so, this role is often assimilated to other, more traditional roles in software engineering, such as Product Manager, Business Analyst, or even Project Manager. According to Pichler, since the PO is a new role, individuals often need time and support to transition into it and to acquire the necessary skills. According to this author, in addition, a common challenge in organizations is finding employees with the necessary breadth and depth of knowledge, experience, and personal skills needed to do the job well [1]. On the other hand, Sverrisdottir and colleagues mention that the understanding of the role and its responsibilities is quite different among organizations, and rarely in perfect compliance with the official Scrum methodology [2]. The purpose of this article is to report the process followed and the results obtained from an exploratory study aimed at understanding how some aspects of the PO role are actually performed in industrial practice.


This study presents a synthesis of the characterization of the role of the PO obtained from the literature accepted by the community, in order to establish similarities and differences with what is observed in the field.

The rest of this article is organized as follows. Section II presents a review of the main concepts and definitions related to the role of the PO, and some works related to the research topic. Section III explains the methodological design of the study, where the research questions and the data collection and preparation procedures are presented. Section IV shows the main results obtained, while Section V presents the discussion of these results. Section VI addresses the threats to the validity of this study, and Section VII presents the conclusions and outlines future work.

II. BACKGROUND

In this section we present a review of the main conceptual elements related to the aspects of the role of PO that are the focus of this work, as obtained from the literature.

A. Definition of the role

The role of the PO in a Scrum team is well documented in the technical literature, which is largely consistent in terms of its functions and its main responsibilities. Schwaber and Sutherland describe the PO as responsible for maximizing the value of the product resulting from the work of the development team, and as the sole person responsible for managing the backlog of the product under development [3]. Layton, on the other hand, characterizes this role by indicating that its primary job is to take care of the business side of the project. The PO is responsible for maximizing the product value by delivering the return on investment (ROI) to the organization. The PO is only one person, not a committee, and is a full-time, dedicated member of the Scrum team. They do not literally own the product, but they take ownership of the business-side duties, representing the stakeholders and customers [4].

B. The election of the PO

In Rubin's opinion [5], the role of the PO is a combination of authority and responsibilities that have historically been found in several traditional roles. In its most comprehensive expression, the PO incorporates elements of the roles of


product manager, product marketer, project manager, business analyst, and acceptance tester. To this author, as well as to Larman and Vodde [6], exactly who should be the PO depends on the type of development effort and the specific organization, and they distinguish three main situations: internal development, commercial development, and outsourced (or subcontracted) development:
• Internal development: the PO must be an empowered person from the business area benefiting from the solution under development.
• Commercial development: the PO must be a member of the organization acting as the voice of the actual customers and users. Frequently, this person comes from the areas of product management or product marketing.
• Outsourced development: the PO must be a person from the company that is paying for the solution (the one receiving the system under development) and receiving the benefits.

Moreira, on the other hand, considers that, in general, anyone who has played a role in which he or she has worked with clients/users to gather their needs and then worked with the teams to create products or solutions that meet those needs can become an effective PO [7].

C. Responsibilities of the PO

In the literature about Scrum there are multiple activities in which the PO must participate, and the responsibilities assigned to the person who performs this role are well stated. Rubin mentions the following responsibilities: participate in planning (product-, release-, and sprint-planning activities), groom the product backlog, define acceptance criteria for each element of the product backlog as well as verify that these criteria are met, and collaborate with the development team and with the stakeholders of the project [8]. Green considers that among the most critical duties of a PO are clarifying the needs of the customer, developing a backlog of product features to work on, and writing clear user stories that communicate the full expectations for the product. The PO is also responsible for being in regular communication with the customer, to make sure the stories they are writing and the backlog they are grooming meet current expectations [9]. Layton also addresses the responsibilities of a PO and mentions, among others, the following: setting the goals and vision for the product, including writing the vision statement; creating and maintaining the product roadmap, which is a broad view of the scope of the product, and the initial product backlog; setting release and sprint goals; determining which product backlog items go into the next sprint; and being available throughout the day to work directly with development team members, thereby increasing efficiency through clear and immediate communication [10].

D. Personal Characteristics of the PO

In relation to the personal characteristics of whoever plays the role of PO, Rubin says that, although there are many characteristics that should be sought in a good PO, these can be grouped into four categories: domain skills, personal skills, decision making, and accountability [8]. Resnick, Bjork, and de la Maza mention the following four characteristics of a good PO: deep knowledge of the domain, strong verbal and written communication skills, presence, and empowerment [11]. To Pichler, the main characteristics that a good PO should have are the following: visionary and doer, leader and team player, communicator and negotiator, empowered and committed, and available and qualified [1].

The characteristics and abilities mentioned by the preceding authors, except for the skills or deep knowledge of the domain, are often mentioned in the literature as examples of "soft skills". In a study reported by Matturro and colleagues, communication skills, customer orientation, and teamwork are the three soft skills that appear as the most valued for a PO, from the perspectives of the Scrum Masters, team members, and POs interviewed [12].

III. METHODOLOGICAL DESIGN

The methodological design of this study consisted of the definition of the research questions, the selection of the key informants, the elaboration of the data collection instruments, their application in semi-structured personal interviews, the critical reading and coding of the answers obtained, and the analysis and interpretation of the resulting data.

A. Research Questions

The role of PO is key in a Scrum team, and the choice of the right person has a relevant impact on the way the team works and on the correct application of the methodology. Therefore, in relation to the choice of the PO, the first research question posed was:
• RQ1: What are the procedures and criteria used to select the person who will act as PO, and who participates in their election?

Of the multiple responsibilities associated with the role of PO outlined in the previous section, for this study the interest was particularly focused on the degree of participation of the PO in the different Scrum ceremonies in the field. Thus, the second research question posed was:
• RQ2: In which Scrum ceremonies does the PO usually participate, and with what frequency?

Finally, regarding the personal characteristics desired in a good PO, this study focused on the so-called soft skills and, on this, the third research question posed was:
• RQ3: What are the soft skills considered most valuable in a PO?


B. Selection of key informants

To contact key informants who could give their vision on the aspects of the role of PO that are the focus of this work, nine software development companies based in Montevideo were contacted, on the basis that they were using Scrum as a methodology to manage their development projects. Four companies finally agreed to participate in the study, facilitating contacts with members of the organization or with external people who are performing or have performed the role of PO in a software development project.

C. Questionnaire for the interview

To conduct the interviews with the selected key informants in an efficient way, a questionnaire was prepared, organized in the following sections: demographic questions (confidential), description of the project in which you participate or have participated as a PO, description of the project team, questions about the choice of the PO, questions about training, questions about your participation in the project, questions about your relationship with users and with the team, questions about the most valued soft skills in a PO, and questions about difficulties found and lessons learned while performing the role.

D. Data collection

Data collection was carried out through semi-structured interviews with key informants, following the guidelines established in the questionnaires. The process of data collection was as follows:
• Interviews were coordinated with the key informants of the participating companies, paying special attention that they met the criterion of performing or having performed the role of PO.
• During each interview, the questionnaire was used to ask about the aspects to be surveyed, and clarification or expansion was requested when necessary.
• Each interview was recorded, with the authorization of the interviewee, to have a later reference from which to extract the answers to the questions asked.

E. Coding of responses

After each interview, we transcribed what was stated by the interviewee, while verifying whether the answers obtained provided the necessary information to answer the research questions posed. If any inconsistency between answers or the absence of an adequate response was detected, the corresponding key informant was interviewed again to complete the necessary information.

The procedure to extract and classify the answers was based on the process of open coding proposed in [13]. Open coding is a technique to perform the analysis of answers in interviews with open questions. The transcripts were coded after knowing all the answers of the respondents. The codes are


labels used to identify categories; that is, they describe a segment of text where, in this case, the texts correspond to the transcripts of what was stated in the interviews. For example, in relation to choosing the PO for a new project, one interviewee stated: "I was a developer and when this project came out they chose me, I think because in previous projects I defined user stories and did some tasks that would be a product owner's." In this case, the code assigned as the main reason for election was "Experience in the role". On the other hand, another interviewee said: "I had worked on a project for a bank; so I think that, as per business unit, it must have seemed like I could take advantage of that financial knowledge for this new project." In this case, the assigned code was "Domain knowledge".

IV. RESULTS

In summary, six key informants from the four participating companies, who had performed or were performing the PO role at the time of the survey, were interviewed. The interviews had an average duration of 50 minutes and, when there was a need to re-interview, these second interviews averaged 25 minutes. With the statements obtained in the interviews, the following sub-sections present the answers to the research questions posed for this study.

A. About choosing the PO

In relation to the procedures for choosing the PO (research question RQ1), the interviewees found it difficult to answer in a concrete way, since none of them said they had participated in a formal process beyond a personal interview in their own company. However, they highlighted that, in their personal cases, people with previous experience in the performance of the role (three of six cases) or with knowledge of the business (two of six cases) were sought, these being the main reasons for the appointment. On the other hand, from the interviews it is observed that, in the cases of subcontracted projects, the recommendation that the PO be someone who belongs to the customer's business area that will benefit from the developed product, as discussed in Section II, is not usually followed.

With respect to those involved in choosing the PO, in all cases it was mentioned that a regular participant is the manager or head of the area in which the PO candidate works. In addition, both in the cases of internal development projects and in half of the outsourced ones, "the client" (understood as the business area that will benefit from the developed product) also has a direct participation in the election of the PO.

Table I shows a summary of the answers obtained from the interviewees regarding the aspects investigated related to the election of the PO of a project.


TABLE I. ASPECTS RELATED TO THE ELECTION OF THE PO

Interviewee   Project type    Belongs to business area   Main reason of choice    Customer participation
Andrés        Subcontracted   No                         Experience in the role   No
Juan          Subcontracted   No                         Experience in the role   No
Julieta       Subcontracted   No                         Domain knowledge         Yes
Mateo         Subcontracted   No                         Personal request         Yes
Nicolás       Internal        Yes                        Experience in the role   Yes
Martín        Internal        Yes                        Domain knowledge         Yes

B. About the Responsibilities of the PO

The research question RQ2 refers to the participation of POs in the various Scrum ceremonies throughout the execution of a project. According to the answers given by the interviewees regarding the ceremonies in which they participate and the frequency with which they attend, there are two ceremonies in which they are "always" present: the sprint planning meetings and the sprint review meetings. A third type of meeting, the product backlog grooming meeting, appears with "almost always" attendance. In general, the PO is in charge of guiding these three ceremonies, especially the sprint planning meeting which, according to the literature, turns out to be the most important ceremony for the PO and for the project itself, since it is the instance where the primary artifact managed by the PO (the product backlog) is used and where the team and the PO negotiate which work will be pulled into the next sprint. Table II shows the degree of participation of the interviewed POs in the different Scrum ceremonies.

TABLE II. DEGREE OF PARTICIPATION OF POS IN SCRUM CEREMONIES

Ceremony               Never   Rarely   Sometimes   Many times   Always
Daily stand-up           2       1         1            2           -
Sprint planning          -       -         -            -           6
Sprint review            -       -         -            -           6
Sprint retrospective     2       1         1            2           -
Grooming                 -       -         1            5           -

Regarding the backlog grooming meetings, although they are not defined in the canonical methodological proposal as a typical Scrum ceremony, in practice the POs usually hold them when the environment requires it, either in response to the team's need for greater clarity in user stories and their estimates, or when the customer requires them to create new user stories or refine existing ones.

C. About the characteristics of the PO

Regarding the personal characteristics valued in those who play the role of PO, the research question RQ3 points particularly to the so-called soft skills. From the answers of the interviewees, the unanimity of opinions shows that communication skills, teamwork, and customer orientation are the three soft skills that a good PO should exhibit. Table III shows the soft skills that were mentioned by at least half of the interviewees.

TABLE III. SOFT SKILLS MOST VALUED IN POS

Soft skills                                                                                    Mentions
Communication skills, Teamwork, Customer orientation                                              6
Commitment and responsibility, Planning skills, Leadership, Analytical and problem solving        5
Interpersonal skills, Initiative and proactivity, Motivation                                       4
Results orientation                                                                                3
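The mention counts in Table III are simple tallies over the coded interviews. The sketch below (in Python) illustrates that tally with invented example codes, not the study's actual transcripts, counting each skill at most once per interviewee.

# Illustrative tally of coded soft-skill mentions across interviewees.
from collections import Counter

coded_interviews = {
    "Interviewee 1": {"Communication skills", "Teamwork", "Leadership"},
    "Interviewee 2": {"Communication skills", "Customer orientation"},
    # ... one set of codes per interviewee
}

mentions = Counter()
for codes in coded_interviews.values():
    mentions.update(codes)   # each skill counted once per interviewee

for skill, count in mentions.most_common():
    print(skill, "- mentioned by", count, "interviewee(s)")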

Communication skills, both oral and written, are fundamental in a PO because his/her role involves dialogue and permanent contact with both stakeholders and the team, as well as, for example, writing the user stories that feed the project's product backlog.

V. DISCUSSION

Although this is an exploratory study of a small group of scenarios, the results obtained establish certain points for a more in-depth discussion about how the role of the PO is performed in practice.

Requirements engineering is one of the critical areas in software development, and the Scrum methodology proposes the role of PO as a bridge to solve many of the challenges related to requirements and to the interaction with project stakeholders. Given the importance of the role of PO in the context of the Scrum methodology, it is striking that organizations do not have a formal mechanism to select the person who has to perform the role. On the other hand, there is no agreement on the main reason for the selection of the PO. Most of the answers point to the experience of the person in the role and, to a lesser extent, to knowledge of the business. This leads us to hypothesize that the assignment of the PO may be more influenced by internal or hierarchical reasons (the project manager assigns it out of trust in the person's performance).

In an agile context, product specification and validation is not done all at once, but in small cycles and with continuous face-to-face interaction between the PO and the team. It could be said that the software requirements arise from a conversation and continuous discovery of the whole team,


facilitated by the PO, rather than only from instances of formal validation.

The results of the study show a consensus that the PO always participates in the sprint planning and sprint review ceremonies. However, in the team's daily interaction ceremony, the daily stand-up meeting, we observed less participation of this role.

Regarding the sprint retrospective meeting, where the team reflects on its work and proposes improvements, we also observe relatively low participation of the PO. These results raise the hypothesis that the PO is seen as a linking figure, but not effectively part of the core of the team. This could be a substantial difference with respect to how the role should be performed in the Scrum methodology, where the PO collaborates continuously with the team and is an integral part of its daily activity.

With respect to the soft skills required of the PO, clear results emerge from the study that prioritize communication skills, teamwork, and customer orientation. These results confirm key aspects mentioned in the literature on the role of the PO in Scrum, as explained in Section II. We also observe that the interviewees explicitly identify these skills as necessary (must have) in the profile of the person who plays the role. This emphasis on the person's skill profile could be an indirect mechanism to minimize the impact of other shortcomings in the selection process. For example, the lack of participation in the daily ceremonies can be compensated by strong communication skills, or less specific knowledge of the business could be improved by customer orientation skills.

VI. THREATS TO VALIDITY

According to Merriam and Tisdell, because qualitative research is based on assumptions about reality different from those of quantitative research, the standards for rigor in qualitative research necessarily differ from those of quantitative research [14]. Hernández Sampieri and colleagues mention that the main authors on the subject have formulated a series of criteria to establish a certain "parallel" with quantitative reliability, validity, and objectivity, which have been accepted by the majority of researchers (although they recognize that they are rejected by others). These substitute criteria are called dependability, credibility, and transferability [13].

The dependability criterion is defined as the degree to which different researchers who collect similar data in the field and perform the same analyses generate equivalent results. Hernández Sampieri and colleagues, citing Franklin and Ballau, mention two types of dependability: a) internal (the degree to which different researchers, at least two, generate similar categories with the same data) and b) external (the degree to which different researchers generate similar categories in the same environment and period, but each one collecting his or her own data). In order to comply with the criterion of internal dependability, the analysis of the collected data was done independently by two researchers, in order to identify and discuss the differences that might arise and arrive at unified results.

The criterion of external dependability could not be verified because there are no other researchers (beyond the three authors) who have collected data from the same interviewees and in similar circumstances (same environment and period).

The criterion of credibility refers to whether the researcher has captured the full and deep meaning of the experiences of the participants, particularly those related to the research problem. In this sense, having recorded the interviews and being able to replay the audio has allowed us to listen again to what was stated by each interviewee, in order to ensure that we did not omit anything important. In addition, efforts were made to avoid investigator biases such as ignoring or minimizing data that do not support the researchers' beliefs and conclusions.

Finally, the criterion of transferability (or applicability of the results) does not refer to generalizing the results to a broader population, since this is not a purpose of a qualitative study, but to whether part of those results, or their essence, can be applied in other contexts. In relation to this criterion, of the six key informants interviewed, four are or were POs of outsourced projects, so these results, in principle, cannot be applied to POs of commercial or internal development projects.

VII. CONCLUSIONS AND FUTURE WORK

An exploratory study was carried out on the role of the PO in industrial practice. First, characteristics considered key to the performance of the role were collected from the literature accepted by the Scrum community. Second, we collected information through semi-structured interviews with six key informants who played the role of PO in different companies. The transcripts of the interviews were coded using the open coding technique to answer the research questions.

Based on the results obtained in this exploratory study, there are some differences between the canonical proposal of the Scrum methodology regarding the performance of the role of the PO and its responsibilities, and what happens in industrial practice. The methodological recommendation that the PO be a representative of the business area that will benefit from the product under development is not fulfilled, at least in the cases of the interviewees. Business knowledge is not used as a selection criterion for the PO in most cases; instead, experience in the role is frequently mentioned as the main criterion. On the other hand, since the PO is a member of the team, he or she should participate in all the ceremonies. However, the evidence obtained indicates that there is unanimity of participation only in the planning and review of each sprint, leaving, at the other end, the daily stand-up meeting and the retrospective meetings with sporadic participation.

Regarding the skills sought in the person who plays the role, we asked specifically about soft skills. There is a significant


consensus on communication skills, teamwork, and customer orientation as required in the person who plays the role of PO. This suggests that one way to compensate for the aforementioned shortcomings in business knowledge and participation may be the explicit search for people with these soft skills.

These findings raise new questions, which will be the subject of research in future work. In the case of POs of outsourced projects who do not belong to the business area that will benefit from the development, it is interesting to find out how they acquire the knowledge of the business needed to act effectively as "the voice of the customer". As for the ceremonies in which there is little participation on the part of the POs, it would be interesting to know the reasons why; for example, whether they do not find value in these ceremonies, whether it is due to lack of time because the POs are participating in other projects simultaneously, or some other reason.

REFERENCES

[1] R. Pichler, Agile Product Management with Scrum: Creating Products that Customers Love. Boston, MA: Pearson Education, 2010.
[2] H. Sverrisdottir, H. Ingason, and H. Jonasson, "The role of the product owner in Scrum: Comparison between theory and practices," Procedia Soc. Behav. Sci., vol. 119, pp. 257–267, 2014.
[3] K. Schwaber and J. Sutherland, The Scrum Guide. Scrum Inc., 2017.
[4] M. C. Layton, Scrum for Dummies. Hoboken, NJ: John Wiley & Sons, 2015.
[5] K. Rubin, Essential Scrum: A Practical Guide to the Most Popular Agile Process. Upper Saddle River, NJ: Pearson, 2013.
[6] C. Larman and B. Vodde, Large-Scale Scrum: More with LeSS. Boston, MA: Addison-Wesley, 2017.
[7] M. Moreira, Being Agile: Your Roadmap to Successful Adoption of Agile. New York, NY: Apress, 2013.
[8] K. Rubin, Essential Scrum: A Practical Guide to the Most Popular Agile Process. Upper Saddle River, NJ: Pearson, 2013.
[9] D. Green, Scrum: Novice to Ninja. Collingwood: SitePoint, 2016.
[10] M. C. Layton, Scrum for Dummies. Hoboken, NJ: John Wiley & Sons, 2015.
[11] S. Resnick, A. Bjork, and M. de la Maza, Professional Scrum with Team Foundation Server 2010. Indianapolis, IN: Wiley Publishing, 2011.
[12] G. Matturro, C. Fontán, and F. Raschetti, "Soft skills in Scrum teams: A survey of the most valued to have by Product Owners and Scrum Masters," in Proc. 27th International Conference on Software Engineering and Knowledge Engineering (SEKE), Pittsburgh, 2015, pp. 42–45.
[13] R. Hernández Sampieri, C. Fernández, and P. Baptista, Metodología de la investigación, 6th ed. México: McGraw Hill, 2014.
[14] S. B. Merriam and E. J. Tisdell, Qualitative Research: A Guide to Design and Implementation, 4th ed. Hoboken, NJ: John Wiley & Sons, 2016.


Requirements mining in agile software development

Micheal Tuape, MSc.
College of Computing and Information Sciences
Makerere University, Kampala, Uganda
[email protected]

Abstract - One of the most compelling characteristics of the agile approach is its ability to reduce the cost of change throughout the software process. The question, however, remains whether this is really possible, given that change is expensive and, particularly if it is uncontrolled or poorly managed, can derail the entire project. This work introduces a consolidated approach that concentrates on requirements and related information in order to limit and mitigate the effects of change in an agile process, based on the challenges identified as associated with agile development.

Key words: Agile software development, requirements mining, system requirements, user requirements

1. Introduction

Contrasting with traditional development methods, agile methods are marked by extensive collaboration [1][2]. Although claimed to be beneficial, a large section of the software development community is still unfamiliar with the role of Requirements Engineering (RE) practices in agile methods [3]. In agile methodology, the client works closely with the development team to respond appropriately to change and to constantly validate the product being delivered [4][5]. The process is dynamic and open to changes in areas that can be identified at any given moment. Agile methods also reduce risks in global software development (GSD) and diminish the need for

coordination efforts, which results in an increase in productivity [6]. It is also evidenced in the literature that projects that adopt agile methods exhibit higher productivity [7], less rework [8], and more efficient defect-fixing rates [9].

The FDA General Principles of Software Validation (GPSV) [10] require manufacturers to explicitly document requirements prior to implementation and test procedures [8]. This appears to be a barrier to adopting agile practices, as one of the fundamental principles of the Agile Manifesto [11] is "working software over comprehensive documentation". Combined with this, another central principle of agile software development is that requirements are fluid, and changes in requirements can be straightforwardly accommodated and even welcomed throughout a development project [9]. Without fully refining requirements prior to the beginning of a project, traceability can be difficult to achieve, yet traceability between requirements and all stages of development is required by the FDA [13]. Moreover, users of agile practice experience problems with how to add traceability to agile methods such as Scrum and XP [8]. Some organizations, customers, and regulations require that tracing information be saved during development [10]. This traceability information includes records of code changes, test results, implemented requirements, and so on. A number of researchers acknowledge that traditional traceability is often heavyweight and creates a lot of overhead [5][7][11].

In traditional software development the processes are longer (the time from the initial problem description to the final product) and the entire process involves several teams [10]. This is in contrast to the short and


flexible ways in which agile development methods are structured to operate. This leaves us with a question that many researchers ask [1][6][18]; it is necessary to resolve the dilemma of whether an agile team needs traceability, or whether it is only an avoidable overhead for the project.

2. Agile development

The term agile was coined by Kent Beck and 16 others [12], signatories to the Agile Manifesto, in which they made the proclamation that: "We are uncovering better ways of developing software by doing it and helping others do it [9]. Through this work we have come to value:
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
That is, while there is value in the items on the right, we value the items on the left more" [12].

Agile methodologies describe a set of processes in which both the requirements and the delivered solution evolve incrementally through a series of short iterations. The approach advocates adaptive planning, evolutionary development, early delivery, and continual improvement, and it encourages rapid and flexible response to change. It is also premised on 12 guiding principles on which agile software development should be based:

1. Customer satisfaction by early and continuous delivery of valuable software
2. Welcome changing requirements, even in late development
3. Working software is delivered frequently (weeks rather than months)
4. Close, daily cooperation between business people and developers
5. Projects are built around motivated individuals, who should be trusted
6. Face-to-face conversation is the best form of communication (co-location)
7. Working software is the primary measure of progress
8. Sustainable development, able to maintain a constant pace
9. Continuous attention to technical excellence and good design
10. Simplicity (the art of maximizing the amount of work not done) is essential
11. Best architectures, requirements, and designs emerge from self-organizing teams
12. Regularly, the team reflects on how to become more effective, and adjusts accordingly

In light of the above principles, and this being a highlight of the Agile Manifesto, it is important to note that RE practices such as observations, interviews, customer involvement, requirements prioritization, modeling and documentation, workshops, and strong team collaboration in iteration-based agile development have also been suggested for use, as in traditional methods [3][1]. For example, if agile methods are used to build a healthcare device, then the approval process for that device will require a demonstration that the device is safe [7]. In the United States of America, for example, the Food and Drug Administration (FDA) [10], which is responsible for approving such devices, specifically requires traceability between requirements, design, code, and test cases [20].

Agile traceability goals are similar to those found in traditional development projects [3][14], but differ in the way in which they are practiced and achieved. Traceability needs in agile projects are not very different from those of traditional project environments in numerous ways [14][15]. Brad Appleton, in his work, outlines the following reasons for tracing in an agile development project [16]:
1. Change impact analysis – to evaluate how a proposed change will influence the existing system, in order to accommodate tasks such as communication, team coordination, and effort estimation.
2. Process compliance – to guarantee that any procedural processes such as reviews and tests have been conducted.
3. Product conformance – to guarantee that the delivered product meets the customers' needs, i.e. realizes the requirements. This is regularly referred to as requirements validation.
4. Project accountability – to provide assurance that the solution does not include gold-plating (i.e. excess functionality), and that all changes match a requested feature.
5. Baseline reproducibility – to support configuration of baselines, so that different versions can be reproduced.
6. Organizational learning – to document challenges and work done in order to transfer knowledge to new team members and other projects.

It is important to note that in traditional software development the same practices are adopted; the only difference is that in the latter they are internalized in depth for the avoidance of any doubt, as a rule of thumb and as good practice in project management. The same reasons outlined above also apply in agile development.


2.1 Challenges in agile development practices

Agile practice has grown to great prominence, especially in North America, Europe and Asia [3], and has greatly improved the practice of software development [2]. However, it continues to face challenges, and, as [3] puts it, this could explain why the growth and prominence of agile software development practice has been experienced largely in North America and Europe, and to a lesser extent in Asia.

Figure 1 Authorship geographic distribution of the selected studies on agile software development practice. [3]

The recurrent meetings breed informality among stakeholders, which aids the evolution of the requirements but also produces a frequency of communication that depends on the availability and willingness of team members [3]. Failure to identify customer representatives can lead to disagreement and differing viewpoints on a range of issues. A focus on business value alone by the customer is a problem [7][1]; also, leaving the prioritization of requirements entirely to the customer should not be acceptable, as [3] states, because there can be other factors to consider; a case in point is that the software architecture might not be scalable [1]. With prototyping, some researchers think there is no wrong requirement other than those waiting to be discovered, as [18] asserts. Prototyping promotes quicker feedback and raises customer expectations of the software being produced. However, this seems to be problematic in the case of an exceptional increase in client requirements [3][6]. In addition, the development of multiple high-fidelity prototypes can be prohibitively expensive and very often leads to conflict and cancellation of the project due to the unwillingness of the customer to pay for the unplanned expenses [3].

3. Requirement mining

As a remedy to the risk of failing in some of the principles proposed by the agilists in the agile manifesto, and to reduce project management risk, we propose the requirement mining process as an intervention in the original agile process, as follows. First, a domain expert is proposed to work for the client in place of the surrogate, who may not understand software development, on the assumption that the domain expert understands both software development and the client's business. Secondly, the appointed domain expert works with the development team and


generates user requirements and system requirements. System requirements are characteristics that the components of a product must possess for the product to satisfy its user requirements. They describe the characteristics of the product's internal components, or more generally of the system to be developed, and they act as constraints on the design of these components. User requirements, by contrast, are externally visible characteristics to be possessed by a product, and specifying them is relatively straightforward: they are expressions of what a stakeholder wants to perceive about the product externally, such as what it will do, what it will cost, what it will weigh, and what its interfaces will be. Thirdly, the information generated through the requirements mining process gives us precise and substantial information for test case generation, anomaly detection and traceability, through the alignment of user requirements to system requirements.

Figure 2. Proposed requirements mining in agile software development and how it interacts with the original agile development processes (elements of the figure: assign domain expert; proposed requirements mining; traceability, test case generation and anomaly detection; plan; design; code; test; refactoring; release)

When system requirements are said to fulfill user requirements, this means that if a change is introduced we are able to track the cause and the point of change.
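For illustration only, a minimal Java sketch of how a mined user requirement could be aligned with the system requirements that realize it is shown below. The class and identifier names are hypothetical and are not part of the proposal's implementation; the sketch merely shows how such alignment lets a change on either side be traced to its counterpart and reused for test case generation.

import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of user-to-system requirement alignment.
class Requirement {
    final String id;
    final String text;
    Requirement(String id, String text) { this.id = id; this.text = text; }
}

class TraceLink {
    final Requirement userRequirement;           // externally visible need
    final List<Requirement> systemRequirements;  // design constraints that realize it
    TraceLink(Requirement userRequirement) {
        this.userRequirement = userRequirement;
        this.systemRequirements = new ArrayList<>();
    }
    void realizedBy(Requirement systemRequirement) {
        systemRequirements.add(systemRequirement);
    }
}

public class TraceabilityExample {
    public static void main(String[] args) {
        Requirement ur = new Requirement("UR-1", "The user can export a monthly report");
        TraceLink link = new TraceLink(ur);
        link.realizedBy(new Requirement("SR-1.1", "Report generator produces PDF output"));
        link.realizedBy(new Requirement("SR-1.2", "Export completes within 5 seconds"));
        // If SR-1.2 changes, the link identifies UR-1 as the point of impact,
        // and the same information can seed test case generation.
        System.out.println(link.userRequirement.id + " is realized by "
                + link.systemRequirements.size() + " system requirements");
    }
}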

The context and content of user requirements (what external stakeholders can perceive) and of system requirements (constraints on the characteristics of design components) are so radically different that it is, in general, impossible for stakeholders (users, buyers, maintainers and so on) to review the cross references between user and product requirements and to agree that these links are correct. This proposed requirements mining is cognizant of the principles proposed by the proponents of agile; in the same spirit, we consider that a knowledgeable person acting as a domain expert will save the entire process substantial time and extra cost by mining the requirements and related information, which will then be used for traceability, test case generation and anomaly detection. It is important to note that most of the problems experienced in agile development can be settled by these three activities, and the solutions coined to settle these defects (see Figure 2) fit within traceability, test case generation and anomaly detection.

4. Related work

Among the highlighted problems, system quality, both internal and external (i.e. maintainability, testability, usability, and security [4]), is considered a major challenge for agile methods [5] and can be the reason for massive lapses and rework. Novel, lightweight artifacts such as agile use cases, agile loose cases and agile choose cases for NFR modeling in agile processes have been proposed by [17][18]. Acceptance test-driven development (ATDD) has been proposed [6]; it combines features of both agile and traditional RE practices, pairing detailed documentation with iterative, frequent communication. The use of surrogate customers to play the role of a proxy customer (for example, product owners) has also been suggested [7]. Other organizations implement the practice of an "onsite developer" by moving a developer representative to the customer site [5]. We also see the adoption of refactoring [5] as a measure to alleviate the changes that transpire in the later stages, which would otherwise add to the cost and become cumbersome to deal with; refactoring is an ongoing activity among agile teams, which keeps changing the code. Agile methods deal with the validation of requirements through continuous prioritization [7][19]; this is done so that the user story is demonstrated to the product owner and customer representatives at the end of the iteration.


5. Discussion

In this work there is a deliberate focus on settling the gaps in agile development. Although [1][2][3][4][5][6][17][18] and [19] have come up with different solutions, those solutions are spread throughout the agile software development process. This proposal consolidates all these activities as the responsibility of the domain expert, who is introduced in place of the surrogate customer (see Figure 2). Among other challenges, this proposal mitigates the problems associated with conflict and disagreement [3], and this work draws attention to the fact that well-aligned requirements reduce the likelihood of refactoring and of struggling with the difficulty of change management at any stage of the software development. Although considerable solutions have been coined to address the challenges associated with agile methods [3], much more emphasis needs to be put into making requirements extremely clear and, most especially, rich enough to avoid rework, particularly rework that arises out of changes to the client's needs. Even if continuous change of requirements is tolerated in the agile manifesto [12][19] and [20], it is paramount to note that at a certain point time and costs may change due to changing requirements, and hence conflict, which will affect the ultimate project [4]. The same information collected for requirements is very important for traceability, which is often regarded as a burdensome, time-wasting and unnecessary process by most agilists in practice [7]. Yet in some situations during agile practice traceability is increasingly adopted, especially while building safety-critical, larger, and distributed software development projects; it is often in these environments that the benefits of traceability outweigh its costs [2][16] and [20].

6. Conclusion and future work

This paper reviews the practices in agile software development, identifies their challenges and, without compromising the principles of agile practice, suggests an intervention that anchors into the agile process (Figure 2). In this intervention, emphasis is put on limiting the effects of the challenges identified [3] and on using the data and activities herein to manage any eventualities that arise out of change.

6.1. Limitations


Because a change can require a modification to the architectural design of the software, the design and construction of three new components, modifications to another five components, the design of new tests, and so on, costs and time will escalate. The work in this proposal fits best with the agile methodology as a whole; however, it may not fit large enterprise projects, as it is somewhat complicated to manage change in large projects.

6.2. Future work

There is a need for more studies, especially in relation to the effect of change at any time in the development process; more studies should be undertaken to understand the relationship between continuous change, cost, acceptance, and the cost of maintenance of enterprise software developed under agile methodologies. This proposal seeks to mitigate the effects of change and, since change is expensive, particularly if it is uncontrolled or poorly managed, much more effort should be put towards understanding agility and change in enterprise projects.

References

[1] Dingsøyr, Torgeir, and Casper Lassenius. "Emerging themes in agile software development: Introduction to the special section on continuous value delivery." Information and Software Technology 77 (2016): 56-60.
[2] Cleland-Huang, Jane. "Traceability in agile projects." Software and Systems Traceability. Springer, London, 2012. 265-275.
[3] Inayat, I., et al. "A systematic literature review on agile requirements engineering practices and challenges." Computers in Human Behavior (2014). http://dx.doi.org/10.1016/j.chb.2014.10.046 Accessed February 2018.
[4] Ernst, N. A., Borgida, A., Jureta, I. J., & Mylopoulos, J. "Agile requirements engineering via paraconsistent reasoning." Information Systems (2013) 1-17. http://dx.doi.org/10.1016/j.is.2013.05.008.
[5] Ramesh, B., Baskerville, R., & Cao, L. "Agile requirements engineering practices and challenges: an empirical study." Information Systems Journal 20(5) (2010) 449-480. http://dx.doi.org/10.1111/j.1365-2575.2007.00259.x.
[6] Berry, D. M. "The Inevitable Pain of Software Development, Including of Extreme Programming, Caused by Requirements Volatility." University of Waterloo. Proceedings of the International Workshop on Time Constrained Requirements Engineering (2002) 1-11.
[7] Mc Hugh, Martin, Fergal McCaffery, and Valentine Casey. "Barriers to using agile software development practices within the medical device industry." (2017).
[8] Vogel, D. "Agile Methods: Most are not ready for prime time in medical device software design and development." DesignFax Online, 2006.
[9] Boehm, B. and R. Turner. Balancing Agility and Discipline: A Guide for the Perplexed. Addison-Wesley, 2003.
[10] FDA. General Principles of Software Validation: Final Guidance for Industry and FDA Staff. Center for Devices and Radiological Health, 2002.
[11] Fowler, M. and J. Highsmith. "The Agile Manifesto." Software Development: The Lifecycle starts here, 2001.
[12] Beck, K., et al. "Manifesto for Agile Software Development." www.agilemanifesto.org Accessed February 2018.
[13] Weiguo, L. and F. Xiaomin. "Software Development Practice for FDA-Compliant Medical Devices." International Joint Conference on Computational Sciences and Optimization, Sanya, Hainan, 2009, pp. 388-390.
[14] Gotel, O., Finkelstein, A. "An analysis of the requirements traceability problem." Proceedings of the International Conference on Requirements Engineering (RE), IEEE Computer Society, Colorado Springs, Colorado (1994) 94-102.
[15] Pinheiro, F. A. C. "Requirements traceability." In: Sampaio do Prado Leite, J. C., Doorn, J. H. (eds.), Perspectives on Software Requirements, vol. 753, pp. 93-113. Springer, Berlin (2003).
[16] Appleton, B. "ACME Blog: Traceability and TRUST-ability." http://bradapp.blogspot.com/2005/03/traceability-and-trust-ability.html (2005). Accessed March 2018.
[17] Farid, W. M., & Mitropoulos, F. J. "NORMATIC: A visual tool for modeling Non-Functional Requirements in agile processes." 2012 Proceedings of IEEE Southeastcon (2012a) 1-8. http://dx.doi.org/10.1109/SECon.2012.6196989.
[18] Farid, W. M., & Mitropoulos, F. J. "Novel lightweight engineering artifacts for modeling non-functional requirements in agile processes." 2012 Proceedings of IEEE Southeastcon (2012b) 1-7. http://dx.doi.org/10.1109/SECon.2012.6196988.
[19] Gupta, Piyansh. "Applying Agile Lean to Global Software Development." (2017).
[20] Cleland-Huang, J., Berenbach, B., Clark, S., Settimi, R., Romanova, E. "Best practices for automated traceability." IEEE Computer 40(6), 27-35 (2007). ISSN: 0018-9162.


SESSION PROGRAMMING ISSUES AND ALGORITHMS + SOFTWARE ARCHITECTURES AND SOFTWARE ENGINEERING Chair(s) TBA


Improving Programming Language Transformation Steven A. O’Hara Eagle Legacy Modernization, LLC, Austin, Texas, USA

Abstract - Modernizing legacy computer programs is challenging. This paper introduces a generalized framework for language transformation with three main elements. First, target languages are shielded from the transformation process by a collection of interfaces such as “create an if statement.” Transforming from Delphi is the same whether the target is Python, Java, C# or some other language that implements the interfaces. Second, the framework ensures synchronization between source language grammars and transformation tools, so changes to a grammar cannot be made without adjusting the impacted tools. This allows large scale transformation projects where both the grammars and the tools are under concurrent development. Third, source code generation is accomplished by adding output formatting annotations to the target language grammar. Keywords: Legacy modernization, programming language transformation.

1. Introduction

By some estimates, the global number of computer programming languages is about 565 [1], or about 1,500 including variations [2]. Sources that list the most popular programming languages, such as [3], do not include the most pervasive business programming language, COBOL. Many of these programming languages have been in use for many decades, such as COBOL, Fortran, Natural, RPG, PL/I and others. They suffer from a shortage of skilled software developers, and many applications were built without internet security in mind, potentially exposing vulnerabilities. New languages are introduced on a regular basis and existing languages continue to evolve, but legacy languages rarely become extinct. This proliferation of languages and their variations is not handled well by existing tools, such as [4] that tend to focus on a few specific language pairs, such as COBOL to Java [5]. “There are still hundreds of billions of lines of COBOL code in use today by banks, insurance companies and other organizations, and COBOL is still used somewhere in a large proportion of all business transactions.” [6]

“Some 23 of the world's top 25 retailers, 92 of the top 100 banks, and the 10 largest insurers all entrust core operations to Cobol programs running on IBM mainframes.” [7] This paper deals with three issues, and introduces a new framework to facilitate more efficient, effective software modernization. The first issue is choosing a target language. We recognize that transformation does not have to utilize all the features of each target language. In fact, a relatively small set of features must be provided, such as “add a method to a class,” and that these features are generally shared by the target languages. When considering legacy language transformation, the choice of target programming languages typically includes Java, C#, Python and a few others. Transformation into legacy languages like COBOL is specifically not a part of this effort. There are many efforts to introduce a common intermediate language definition, such as [8], to span multiple source languages. This paper only considers an intermediate interface for target languages. There are also many point-to-point translation systems. Stratego/XT [9], for example, supports strategic term rewriting to perform transformations. Likewise, RascalMPL [10] has tools for term rewriting. Our contention is that the expressiveness of rules is well-suited for small transformation projects, but does not scale up to thousands of programs in dozens of languages. To achieve that scale, we believe that the transformation logic is best expressed in a classic programming language like Java. The second issue is synchronization. In our experience with legacy modernization, grammars are continually being enhanced. Rarely are perfect grammars available that encompass all existing source code. It is normal that small adjustments have to be made to accommodate certain language patterns. By utilizing the work of [11], we are able to represent source grammars in Java code, rather than text files containing Backus-Naur Form (BNF) grammars or similar. Likewise, the transformation tools are also written in Java, within the same environment. With this approach, changes to the grammar will be detected directly by the compiler or, more commonly, the Integrated Development Environment (IDE). The third issue is target language generation. After creating an in-memory representation of the new program, it must be converted to a text file. In our framework, we add Java anno-


tations (such as @NOSPACE, @INDENT, etc.) to the target Program Grammar to control word and line spacing, indentation, etc. Other systems, such as Ekeko/X [12], also use annotations, but we use them mainly for output generation, not for transformation logic. Our experimental results involve three sets of language transformation pairs, all using the same framework:

- BNF to a Program Grammar (in Java)
- COBOL to Python
- Delphi to C#

No claim is being made that the transformation process is complete for any of these. Transformation is a complex process and this paper discusses many of those difficulties, and how to help alleviate some of the burden.

1. Transformation Framework

Token List means zero or more occurrences. The Program Grammar in Figure 1 is equivalent to the BNF version shown in Figure 2. We have used these Program Grammars to parse many thousands of programs, and millions of lines, written in COBOL, RPG, Natural, PL/I, C, Java, C#, Python, JavaScript, HTML, CSS and others. All Program Grammars will be open-sourced so they can be used and managed by the community (see Section 4.1).

public class COBOL_IfStatement extends TokenSequence {
    public COBOL_Keyword IF = new COBOL_Keyword("IF");
    public COBOL_Expression condition;
    public @OPT COBOL_Keyword THEN = new COBOL_Keyword("THEN");
    public TokenList thenActions;
    public @OPT COBOL_Else elseClause;
    public @OPT COBOL_Keyword ENDIF = new COBOL_Keyword("END-IF");

    public static class COBOL_Else extends TokenSequence {
        public COBOL_Keyword ELSE = new COBOL_Keyword("ELSE");
        public TokenList elseActions;
    }
}

Figure 1. COBOL "IF" Statement Program Grammar

A traditional transformation process is described in detail in [13]. Here are the steps for our process:

- Using a Program Grammar, parse the source program, producing a Program Semantic Tree (PST). A PST is similar to an Abstract Syntax Tree (AST), but includes some semantic information about each token, such as a two-way cross reference between variable definitions and their references.
- Read the source program PST and generate a second PST in the target language using transformation logic, such as Sections 2.1, 2.2, and 2.2.3. This process is only aware of the source language; all references to the target language are hidden through interfaces.
- Using a Program Grammar for the target language, read the new PST and generate a source file.

COBOL_IfStatement ::= "IF" condition ["THEN"] thenActions [elseClause] ["ENDIF"];
condition ::= COBOL_Expression;
thenActions ::= COBOL_Statement*;
elseClause ::= "ELSE" elseActions;
elseActions ::= COBOL_Statement*;

Figure 2. COBOL "IF" Statement in BNF

The Parser and the Program Generator are language-independent. The Transformation Engine depends only on the source language, not the target language. The main benefit of our framework is that the transformation process does not have access to the target language, just the interface layer. There is also just one grammar for each language, whether used for source parsing or target generation.

1.1. Source Program Grammars

Figure 1 shows a sample Program Grammar for the "IF" statement in COBOL. Additional details are available in [11]. The fields in a Token Sequence must appear in order, although fields may be optional, as denoted by the @OPT annotation.

1.2. Transformation Logic

Figure 3 shows the logic used to transform one variation of the PERFORM verb in COBOL. The logic is specific to COBOL, but independent of the variations of COBOL (e.g., some versions use fixed columns for labels and comments; some are free-format).

if (token instanceof COBOL_PerformVarying) {
    COBOL_PerformVarying varying = (COBOL_PerformVarying) token;

    // Collect all the inline statements
    ArrayList statements = new ArrayList();
    for (COBOL_Statement statement : inline.statements) {
        AbstractToken oldStatement = statement.getWhich();
        Stmt newStatement = trans.transformStatement(oldStatement);
        statements.add(newStatement);
    }
    Stmt action = trans._target.createStatementBlock(statements);

    String loopVar = trans.getFullVariableName(varying.id);
    Expr initVal = trans.transformExpression(varying.from);
    Expr incrVal = trans.transformExpression(varying.by);
    Expr until = trans.transformExpression(varying.until.cond);
    Expr term = trans._target._createExpression.createNot(until);

    Stmt forStatement = trans._target.createForStatement(loopVar,
        initVal, term, action, incrVal, varying);
    return forStatement;
}

Figure 3. Transforming "PERFORM" from COBOL
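By analogy with Figure 3, a hypothetical sketch of how the IF statement from Figure 1 might be handled through the same interface layer is shown below. This is an illustration only, not code from the framework: the method name createIfStatement and the exact signatures are assumptions (the abstract mentions a "create an if statement" interface, but its actual form is not shown in this paper), and the field names are taken from the grammar in Figure 1.

// Hypothetical sketch only; createIfStatement and its signature are assumed.
if (token instanceof COBOL_IfStatement) {
    COBOL_IfStatement ifStmt = (COBOL_IfStatement) token;

    // Transform the condition using the COBOL-specific engine.
    Expr condition = trans.transformExpression(ifStmt.condition);

    // Transform the THEN branch into a target-language statement block.
    ArrayList thenStmts = new ArrayList();
    for (COBOL_Statement statement : ifStmt.thenActions) {
        thenStmts.add(trans.transformStatement(statement.getWhich()));
    }
    Stmt thenBlock = trans._target.createStatementBlock(thenStmts);

    // Transform the optional ELSE branch the same way.
    Stmt elseBlock = null;
    if (ifStmt.elseClause != null) {
        ArrayList elseStmts = new ArrayList();
        for (COBOL_Statement statement : ifStmt.elseClause.elseActions) {
            elseStmts.add(trans.transformStatement(statement.getWhich()));
        }
        elseBlock = trans._target.createStatementBlock(elseStmts);
    }

    // The transformation logic never names the target language; it only
    // calls the interface layer, as with createForStatement in Figure 3.
    return trans._target.createIfStatement(condition, thenBlock, elseBlock);
}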

1 Java annotations are a form of syntactic metadata that can be added to source code.


The Expr (and Stmt) terms are Java generic data types2 that represent Java_Expression, CSharp_Expression, etc. depending on the target language. The trans variable is specific to COBOL, while trans._target is specific to the target interface layer.

1.3. Target Interface Layer

For a language to be a transformation target in this framework, there must be classes set up that implement the following six interfaces3: Transform Target (the main class), Create Program, Create Class, Create Method, Create Statement, and Create Expression. The total count of methods that must be implemented for each language is approximately sixty, as of this writing. Although every method must be implemented for each target language, the implementations are often very similar.

public interface Create_Class {
    public enum CLASS_QUALIFIERS { NONE, OVERRIDES }
    public AbstractComment addClassComment(Cls cls, String comm);
    public void addClassData(Cls cls, Stmt dataStmt);
    public void addMethod(Cls cls, Meth method);
    public void addConstructor(Cls cls, String className, AbstractExpression[] args);
    public Cls addInnerClass(PRIVACY privacy, Cls cls, String className, CLASS_QUALIFIERS qual);
    public void setClassExtends(Cls cls, String extendsClass);
    public Cls addInnerDataClass(Cls cls, String name, TYPES typ);
}

Figure 4. Target Interface for Create Class

The interface methods shown in Figure 4 (slightly condensed) must be implemented for each target language. The use of Java generics on the first two lines is crucial for this framework, because they retain the strong typing needed for scalability. With them, the Java target transformation code can be strongly typed, which avoids risky type-casts.

1.4. Target Program Grammars

The source Program Grammars and target Program Grammars are actually the same grammar, just used in different ways. The only significant difference is that the formatting annotations are only used when generating program output. In Figure 5, the Token Chooser indicates that any one field or class instance can be used to satisfy that grammar rule. In this example, a statement could be just a semicolon (with a @CURIOUS warning), some data, an inner class, an enumeration, a statement block in braces, or one of the specified C# statements.

public class CSharp_Statement extends TokenChooser {
    public @CURIOUS("Extra semicolon") PunctuationSemicolon semi;
    public CSharp_Data data;
    public CSharp_Class myclass;
    public CSharp_Enum enumeration;

    public static class CSharp_StmtBlock extends TokenSequence {
        public @INDENT PunctuationLeftBrace leftBrace;
        public @OPT TokenList statements;
        public @OUTDENT PunctuationRightBrace rightBrace;
    }

    public CSharp_BreakStatement breakStatement;
    public CSharp_ContinueStatement continueStatement;
    public CSharp_CheckedStatement checkedStatement;
    // (etc. for each statement type)
}

Figure 5. C# Statement Program Grammar

The use of @INDENT means to indent after rendering this token, while @OUTDENT means to un-indent before rendering this token. These annotations help make the output program file more readable. Additional annotations include @NOSPACE to prevent spaces before a token, and @NEWLINE to force a new line before a token.

1.5. Macro Pre-Processing

Many programming languages support macros and/or include files. Although compilers require access to all of the included files, our system continues processing even when some of those include files are not available. In C/C++ this is exemplified by the #define macro capability. Our system can pre-process the source file(s) and expand macros. Because macros often depend on environment settings, macro expansion also allows project-specific controls to decide which macros to expand and what initial environment settings to use. Other languages that utilize this pre-processor include PL/I, PHP, CMD (DOS), Delphi, and COBOL. The COBOL copy books are especially interesting with the REPLACING clause.
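To make the idea of environment-driven macro expansion concrete, the following is a minimal, self-contained Java sketch of a #define-style expander. It is not the framework's actual pre-processor; the class name, API and whole-word replacement policy are assumptions chosen only for exposition.

import java.util.LinkedHashMap;
import java.util.Map;

// Minimal illustrative macro expander: replaces known macro names with their
// definitions, where the set of definitions can come from project settings.
public class SimpleMacroExpander {
    private final Map<String, String> definitions = new LinkedHashMap<>();

    public void define(String name, String replacement) {
        definitions.put(name, replacement);
    }

    public String expand(String sourceLine) {
        String result = sourceLine;
        for (Map.Entry<String, String> macro : definitions.entrySet()) {
            // Whole-word replacement only, so MAX_SIZE does not match MAX_SIZE_2.
            result = result.replaceAll("\\b" + macro.getKey() + "\\b", macro.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        SimpleMacroExpander expander = new SimpleMacroExpander();
        expander.define("MAX_SIZE", "1024");  // e.g., from a #define or an environment setting
        System.out.println(expander.expand("char buffer[MAX_SIZE];"));
        // prints: char buffer[1024];
    }
}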

2 Generics extend Java's type system to allow a type or method to operate on objects of various types while providing compile-time type safety.

1.6. Comments

Many transformation systems either omit comments before transformation, or require comments as part of the grammar, in every place where they are used. In our system, we take a hybrid approach. If the Program Grammar includes comments, they are kept in the in-memory representation of the program. Otherwise, comments are discarded with a brief notification.

2. Experiments

Three distinct transformations are detailed in this paper, showing a range of capabilities. The point of this discussion is to describe the framework, not to claim that transformation from languages like COBOL or Delphi is fully implemented.

3 An interface in Java is an abstract type that is used to specify a behavior that classes must implement.


2.1. BNF to Program Grammar To jump-start a new Program Grammar, it is often useful to use an existing BNF-like grammar. Program Grammars do not represent terminal nodes (such as strings, numbers, punctuation, comments, etc.) in the grammar itself, rather, they use Java code that can be shared with grammars for other programming languages. For example, string literals and numbers are generally similar between languages. Furthermore, implementing terminal nodes (consider the BNF rules for a floating-point number) in a traditional grammar is often complicated and error-prone.

2.1.2. The BNF Program Grammar Figure 7 shows a section of the Program Grammar for BNF. It is used to read the EBNF grammar in Figure 6. 2.1.3. Generated Program Grammar for EBNF Figure 8 shows a section of the generated Program Grammar. All spaces and newlines are exactly as output from the transformation framework. public static class BNF_Group_2 extends TokenChooser { public BNF_Literal literal; public static class BNF_Sequence_1 extends TokenSequence { public BNF_EbnfIdentifier ebnfIdentifier; public @OPT BNF_EbnfModifier ebnfModifier; } public BNF_EbnfOptional ebnfOptional; public BNF_EbnfGroup ebnfGroup;

The following example parses a BNF grammar, transforms it to a Program Grammar (in Java), and uses that generated Program Grammar to re-parse the original grammar. 2.1.1. Source EBNF Program Extended BNF (EBNF) has several variations. In this case, the asterisk * means zero or more of the previous item, a plus sign + means one or more. The vertical bar | means to choose one, square brackets [ and ] mean the item is optional. The ::= marker is used to define a rule, with semicolon ; indicating the end of a rule. ` ebnf-program ::= ebnf-rule*; ebnf-rule ::= ebnf-identifier '::=' ebnf-alternation ';'; ebnf-alternation ::= ebnf-expression ( '|' ebnf-expression )*; ebnf-expression ::= ( literal | ebnf-identifier [ebnf-modifier] | ebnf-optional | ebnf-group )+; ebnf-group ::= '(' ebnf-alternation ')' [ebnf-modifier]; ebnf-optional ::= '[' ebnf-alternation ']' [ebnf-modifier]; ebnf-modifier ::= '+' | '*';

Figure 6. Source EBNF Grammar for EBNF The two terminal nodes (literal and ebnf-identifier) in Figure 6 are defined in Java, and are outside the scope of this document. public static class BNF_ExpressionTerm extends TokenChooser { public BNF_Literal literal; public static class BNF_Rulename extends TokenSequence { public BNF_Rule_Reference ref; public @NOSPACE @OPT BNF_PunctuationChoice starOrPlus = new BNF_PunctuationChoice("*", "+"); } public static class BNF_Group extends TokenSequence { public PunctuationLeftParen leftParen; public @NOSPACE BNF_Expression expression; public @NOSPACE PunctuationRightParen rightParen; public @NOSPACE @OPT BNF_PunctuationChoice starOrPlus = new BNF_PunctuationChoice("*", "+"); } public static class BNF_Optional extends TokenSequence { public PunctuationLeftBracket leftBracket; public @NOSPACE BNF_Expression expression; public @NOSPACE PunctuationRightBracket rightBracket; } }

Figure 7. BNF Program Grammar Snippet

}

Figure 8. Generated Program Grammar Snippet Class names are auto-generated in many cases because BNF grammars are not as descriptive as Program Grammars. Somewhat recursively, the generated Program Grammar is used to parse the original BNF grammar again. The code snippet in Figure 8 contains almost all of the ebnf-expression rule (lines 4-5) in Figure 6. It does not include the + quantifier. 2.1.4. Re-Parsing the Original EBNF Grammar The generated Program Grammar was used to re-parse the original EBNF grammar from Figure 6. Seq SLn SC ELn EC ==== ==== ==== ==== ==== 1 1 1 8 0 2 1 1 8 0 3 1 1 8 0 4 1 1 2 0 5 1 1 1 12 6 1 14 1 16 7 1 18 1 27 8 1 18 1 27 9 1 18 1 27 10 1 18 1 27 11 1 18 1 26 12 1 27 1 27 13 1 28 1 28

Token Type ==================== Grammar_EBNF_bnf BNF_EbnfProgram TokenList BNF_EbnfRule BNF_EbnfIdentifier BNF_Punct BNF_EbnfAlternation BNF_EbnfExpression TokenList BNF_Sequence_1 BNF_EbnfIdentifier BNF_Punct BNF_Punct

Text =================

ebnf-program ::=

ebnf-rule * ;

Figure 9. Parse Log from EBNF Re-Parse Figure 9 shows a partial log from re-parsing the original EBNF grammar using the generated Program Grammar in Figure 8.

2.2. COBOL to Python (or Java or C#)

This example shows conversion of a COBOL program to Python. The exact same process is used for conversion to Java and C#, and the output is exactly the same.

2.2.1. Source COBOL Program

The COBOL program in Figure 10 prints 0-0-0 to 9-9-9 twice, using two variations on the PERFORM verb, like a mileage odometer.

WORKING-STORAGE SECTION.
01 Dial.
   02 Hundreds PIC 99 VALUE ZEROS.
   02 Tens     PIC 99 VALUE ZEROS.
   02 Units    PIC 99 VALUE ZEROS.
PROCEDURE DIVISION.
Begin.
   DISPLAY "Using an out-of-line Perform".
   DISPLAY "Start mileage counter simulation 1".
   PERFORM CountMileage
      VARYING Hundreds FROM 0 BY 1 UNTIL Hundreds > 9
      AFTER Tens FROM 0 BY 1 UNTIL Tens > 9
      AFTER Units FROM 0 BY 1 UNTIL Units > 9
   DISPLAY "End of mileage counter simulation 1."

Figure 10. Excerpt from MileageCount.CBL

There are two paragraphs, called Begin and CountMileage, with the former calling the latter. The primary focus of this experiment is converting two different forms of the PERFORM verb.

2.2.2. COBOL Transformation

The inline version of PERFORM is shown in Figure 11; the version that calls a separate paragraph was shown in Figure 3.

if (token instanceof COBOL_PerformUntil) {
    COBOL_PerformUntil until = (COBOL_PerformUntil) token;

    // Collect all the inline statements
    ArrayList actions = new ArrayList();
    for (COBOL_Statement statement : inline.statements) {
        Stmt newStatement = trans.transformStatement(statement);
        actions.add(newStatement);
    }
    Expr termCond = trans.transformExpression(until.condition);
    Expr notTerm = trans._target.createNot(termCond);

    return trans._target.createDoStatement(actions, notTerm);
}

Figure 11. Transformation Logic for inline PERFORM

There are many additional variations on the PERFORM verb in COBOL, which are not part of this experiment.

2.2.3. Generated Python Program

The generated Python program is shown in Figure 12. It was generated by parsing the input COBOL program, creating an in-memory Program Semantic Tree. Then the COBOL transformation engine was used to create an instance of a target PST in memory. The COBOL transformation engine was not aware of the target language and was also able to produce both Java and C# programs. The target PST was written to a text file, using the Java annotations in the Program Grammar to control formatting.

import sys

class MileageCount:
    def Begin(self):
        print "Using an out-of-line Perform"
        print "Start mileage counter simulation 1"
        for Dial.Hundreds in range(0, (9) + 1, 1):
            for Dial.Tens in range(0, (9) + 1, 1):
                for Dial.Units in range(0, (9) + 1, 1):
                    self.CountMilage()
        print "End of mileage counter simulation 1."
        sys.exit(0)

    def CountMilage(self):
        Disp.Hunds = Dial.Hundreds
        Disp.Tens = Dial.Tens
        Disp.Units = Dial.Units
        print '{}{}{}{}{}'.format(
            Disp.Hunds, "-", Disp.Tens, "-", Disp.Units)

Figure 12. Excerpt of Generated MileageCount.py

2.3. Delphi to C# (or Java or Python)

Delphi (originally Pascal) programs are relatively easy to transform because the language was designed to be simple and efficient [14].

2.3.1. Source Delphi Program

This sample program was written in Pascal before it was replaced by Delphi. The program prints the first few prime numbers.

Program Prime;
{$I IsPrime.p}
Var
   I : Integer;
   N : Integer;
Begin
   Write('Enter N -->');
   ReadLn(N);
   For I := 1 to N do Begin
      If IsPrime(I) then Begin
         WriteLn(I, ' is prime')
      End
   End
End.

Figure 13. Source Prime.pas

Note the {$I IsPrime.p} in Figure 13. Our framework has a full macro pre-processor to handle this, as described in Section 1.5.

Function IsPrime(N : Integer) : Boolean;
Var
   K : Integer;
Begin
   If N < 2 Then
      IsPrime := False
   Else If N = 2 Then
      IsPrime := True
   Else If N mod 2 = 0 Then
      IsPrime := False
   Else Begin
      K := 3;
      While K * K



This category indicates that the refactoring technique improved only the performance and maintained the same level of energy efficiency. The refactoring technique in this category can be used when the performance is more important than the energy efficiency in heavy and linear computations.
3) Category 3 (newly added to the categories) is when Speedup = 1 and Powerup < 1. This category indicates that the refactoring technique improved only the energy efficiency and maintained the same level of performance. The refactoring technique in this category can be used when energy efficiency is more important than performance, in order to make mobile device batteries last for a longer time.
4) Category 4 is when Powerup > 1, Speedup > 1, and Speedup > Powerup. This category indicates that the refactoring technique slightly improved the performance whereas the level of energy efficiency was decreased. The refactoring technique in this category can be used when performance is needed while the amount of the energy required is not high, such as parallel computations.
5) Category 5 is when Powerup < 1, Speedup < 1, and Speedup > Powerup. This category indicates that the refactoring technique reduced the performance to consume less energy. The refactoring technique in this category can be used when adjusting the performance to achieve better energy efficiency.
6) Category 6 is when Powerup > 1, Speedup > 1, and Speedup < Powerup. This category indicates that the refactoring technique improved the performance; however, the increase in the energy consumption exceeded the improvement in the performance. The refactoring technique in this category can be used when using parallel computations with parallel devices where more energy is consumed, but the performance level does not increase.
7) Category 7 is when Powerup < 1, Speedup < 1, and Speedup < Powerup. This category indicates that the refactoring technique decreased the performance and increased the energy efficiency; however, the more computations the software runs, the more energy will be consumed, which leads to decreased energy efficiency.
8) Category 8 is when Powerup = 1, Speedup < 1. This category indicates that the refactoring technique decreased the average of the performance, and the energy efficiency did not improve.
9) Category 9 (newly added to the categories) is when Speedup = 1 and Powerup > 1. This category indicates that the refactoring technique maintained the same level of performance and reduced the energy efficiency, such as when a mobile device performs heavy computation and the battery charge is consumed faster.
10) Category 10 is when Powerup > 1, Speedup < 1. This category indicates that the refactoring technique increased the energy consumption and decreased the performance. The refactoring technique is needed only when the requirement to make the application software code maintainable is more important than the performance and energy consumption.

Table 2: Samsung Galaxy S5 Specifications
  Processors: Qualcomm Snapdragon 801 Processor, 2.5GHz Quad-core Krait 400; Adreno 330 GPU
  Memory:     RAM: 2GB
  Storage:    Internal: 32GB
  Platform:   Android 6.0 Marshmallow, TouchWiz UI

Table 3: LG Nexus 5X Specifications
  Processors: Qualcomm Snapdragon 808 Processor, 1.8GHz Hexa-core 64-bit; Adreno 418 GPU
  Memory:     RAM: 2GB
  Storage:    Internal: 32GB
  Platform:   Android 6.0 Marshmallow

4. Experiment Setup and Execution

Two versions of the application software code (non-refactored and refactored) are compared for each code refactoring technique by using the GPS-UP metrics, in order to show the improvement or decline in performance and energy efficiency.

4.1 Experiment Execution Setup

The scenario of the experiment contains seven steps:
1) Develop Fowler's sample code in Android Studio to create a mobile application.
2) Run the application on the Android mobile platform.
3) Measure the average of the energy consumption and performance of 10 runs.
4) Apply a code refactoring technique to the code.
5) Run the application again.
6) Measure the average of the energy consumption and performance of 10 runs.
7) Apply the results to the GPS-UP metrics to evaluate and categorize the code refactoring technique.

4.2 Experiment Environment Android Studio was used to develop an application that runs the sample code in Fowler’s studies to measure the code refactoring techniques. The application was installed on a Samsung Galaxy S5 smartphone and an LG Nexus 5X smartphone. The specifications of each mobile device are listed in Table 2 and Table 3, respectively.


4.3 Energy Measurement Tool

PowerTutor [12] is the application used to measure energy consumption in our study. This application was developed for Android mobile devices to measure and display the power consumed by the device components and by any running application. In addition, PowerTutor enables software developers to monitor changes in software energy efficiency while modifying the software design or code. PowerTutor estimates power consumption to within 5%.

4.4 Results

An Android mobile application was developed from Fowler's sample and installed on each mobile device. The energy consumption and execution time of the mobile application were measured ten times before each refactoring technique was applied and ten times after it was applied. The average, median, and variance of each set of ten runs were calculated and analyzed; there were no extreme scores, and the average was found to be the best value to apply to the GPS-UP metrics. Then, Greenup, Speedup, and Powerup were calculated in order to categorize each refactoring technique. Table 4 illustrates the results for the Samsung Galaxy S5, and Table 5 illustrates the results for the LG Nexus 5X. In addition, the two tables list the total number of mobile application lines of code and the number of lines of code that are changed when each code refactoring technique is applied. In both result sets, there are 9 refactoring techniques in the green area and 12 refactoring techniques in the red area. In the Section 5 discussion, we discuss each category we observed in the experiment.
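For clarity, the calculation applied to each pair of averages can be sketched as follows. This assumes the usual GPS-UP definitions (Greenup = EBR/EAR, Speedup = TBR/TAR, Powerup = Speedup/Greenup), which are consistent with the values reported in Tables 4 and 5, but readers should consult [2] for the authoritative definitions.

// Illustrative sketch of the GPS-UP calculation, assuming
// Greenup = EBR/EAR, Speedup = TBR/TAR and Powerup = Speedup/Greenup
// (consistent with Tables 4 and 5; see [2] for the formal definitions).
public class GpsUpMetrics {
    public static void main(String[] args) {
        double tbr = 62.36, ebr = 11.37;   // averages before refactoring (Inline Method, Table 4)
        double tar = 59.46, ear = 9.91;    // averages after refactoring

        double speedup = tbr / tar;          // > 1 means the refactored code is faster
        double greenup = ebr / ear;          // > 1 means the refactored code uses less energy
        double powerup = speedup / greenup;  // < 1 means average power draw decreased

        System.out.printf("Greenup=%.2f Speedup=%.2f Powerup=%.2f%n",
                greenup, speedup, powerup);  // Greenup=1.15 Speedup=1.05 Powerup=0.91 (Category 1)
    }
}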

5. Discussion

Following is an explanation of the technical reasons behind the positive or negative improvement for each refactoring technique shown in Table 4 and Table 5.

5.1 Green Categories

The refactoring techniques that fell in the green area improved performance, energy efficiency, or both. The Inline Method technique replaced the method call with its body, which eliminates the fetch-decode-execute cycle needed to call the method. The Move Method technique reduced the cost of the queries between two classes by moving the method to the class that has more features in common with it. The Inline Class technique deleted the class that is not needed very often and moved its work to another class. The Remove Parameter technique deleted the parameter that is no longer needed in the method. As a result, these last two refactoring techniques eliminated extra load in memory. The Pull Up Field technique moved the field that the two subclasses had into the upper class, which reduced the cost of duplicate parameters.
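As a small, hypothetical Java illustration of the Inline Method refactoring discussed above (the study's own sample code follows Fowler [1] and is not reproduced here), the call to a trivial helper is replaced by the helper's body:

// Before Inline Method: the helper only wraps a simple expression.
class OrderBefore {
    int quantity;
    int basePrice() { return quantity * 10; }
    boolean isExpensive() { return moreThanHundred(); }
    private boolean moreThanHundred() { return basePrice() > 100; }
}

// After Inline Method: the call is replaced by the method body,
// removing one method call on every evaluation.
class OrderAfter {
    int quantity;
    int basePrice() { return quantity * 10; }
    boolean isExpensive() { return basePrice() > 100; }
}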


The Pull Up Method technique improved only the energy efficiency and maintained the same level of performance. This refactoring technique moved the identical method that the two subclasses had into the upper class, which reduced energy consumption by allocating one method in memory instead of two duplicate methods. The Inline Temp technique improved the performance more than the energy efficiency; deleting temporary variables reduced the time spent fetching the unnecessary temporary variable from the main and cache memories. As Speedup is greater than Powerup, the Inline Temp technique is still considered a positive improvement to the energy efficiency. The following refactoring techniques did not improve performance and maintained the same level of energy efficiency, which is still considered a positive result. The Remove Assignments to Parameters technique assigned a value to a local variable instead of a parameter, which improved maintainability and readability. The Replace Type Code with State Strategy technique changed the behavior of the class by adding subclasses for the type of object, which made the code easier to update. In addition, because there was no change in performance and energy efficiency, these refactoring techniques are still considered green categories.

5.2 Red Categories

The refactoring techniques in the red area consumed more energy, although several of them improved performance. When a method is long and the Extract Method technique cannot be applied to the code, the Replace Method with Method Object technique is the solution: it turns the method into an object with its own attributes in order to split the long method into short methods, which makes the code better organized and faster. However, the code consumed more energy because of the extra computation and the allocation of the extra attributes and methods. The Move Field technique moved the field to the class that uses it more than its original class; therefore, the technique improved performance only by reducing unnecessary queries to the field from two classes. The Extract Class technique split a class that did the work of two classes, which improved performance; however, this technique did not improve energy efficiency because of the increase in the number of classes in memory. The Replace Array with Object technique changed an array that had different types of elements into an object, and the different types of elements into the object's attributes, which made the code more understandable and faster at the cost of consuming more energy to access the object's attributes instead of the array's elements. The Decompose Conditional technique replaced each part of a complicated if-then-else condition with a method, which is more readable and understandable; however, calling the created methods costs more energy.


Table 4: GPS-UP Metrics Categories for 21 Code Refactoring Techniques for the Samsung Galaxy S5, Where (s) Is Second(s) and (j) Is Joule(s)

Code refactoring technique | TBR(s) | EBR(j) | TAR(s) | EAR(j) | Greenup | Speedup | Powerup | Category | Total LOC | LOC changed
Inline Method | 62.36 | 11.37 | 59.46 | 9.91 | 1.15 | 1.05 | 0.91 | C1 | 264 | 4
Move Method | 61.43 | 11.06 | 58.47 | 8.80 | 1.26 | 1.05 | 0.84 | C1 | 288 | 16
Inline Class | 63.38 | 12.78 | 58.50 | 9.56 | 1.34 | 1.08 | 0.81 | C1 | 298 | 5
Remove Parameter | 60.67 | 10.00 | 58.50 | 9.62 | 1.04 | 1.04 | 1.00 | C2 | 360 | 6
Pull Up Field | 60.59 | 10.71 | 60.45 | 10.50 | 1.02 | 1.00 | 0.98 | C3 | 354 | 10
Pull Up Method | 60.61 | 10.55 | 60.50 | 10.40 | 1.01 | 1.00 | 0.99 | C3 | 354 | 5
Inline Temp | 60.55 | 10.84 | 58.51 | 10.63 | 1.02 | 1.03 | 1.01 | C4 | 264 | 2
Remove Assignments to Parameters | 61.44 | 10.97 | 59.45 | 10.69 | 1.03 | 1.03 | 1.01 | C4 | 272 | 3
Replace Type Code with State Strategy | 62.60 | 10.46 | 60.43 | 10.39 | 1.01 | 1.04 | 1.03 | C4 | 320 | 20
Replace Method with Method Object | 60.65 | 10.69 | 59.44 | 11.86 | 0.90 | 1.02 | 1.13 | C6 | 274 | 15
Move Field | 61.72 | 10.73 | 61.30 | 11.70 | 0.92 | 1.01 | 1.10 | C6 | 288 | 10
Replace Array with Object | 62.41 | 11.04 | 59.98 | 11.74 | 0.94 | 1.04 | 1.11 | C6 | 306 | 19
Extract Class | 60.28 | 12.28 | 62.84 | 12.39 | 0.99 | 0.96 | 0.97 | C7 | 298 | 20
Replace Data Value with Object | 62.00 | 11.44 | 63.12 | 11.63 | 0.98 | 0.98 | 1.00 | C8 | 296 | 10
Decompose Conditional | 61.13 | 11.13 | 60.94 | 12.12 | 0.92 | 1.00 | 1.09 | C9 | 335 | 6
Replace Temp with Query | 62.37 | 11.30 | 62.59 | 11.70 | 0.97 | 1.00 | 1.03 | C9 | 265 | 7
Split Temporary Variable | 60.16 | 10.68 | 60.38 | 12.09 | 0.88 | 1.00 | 1.13 | C9 | 270 | 4
Extract Method | 61.14 | 11.10 | 62.13 | 11.84 | 0.94 | 0.98 | 1.05 | C10 | 263 | 7
Self Encapsulate Field | 61.42 | 12.54 | 63.10 | 14.18 | 0.88 | 0.97 | 1.10 | C10 | 293 | 3
Replace Conditional with Polymorphism | 63.21 | 10.17 | 64.13 | 12.91 | 0.79 | 0.99 | 1.25 | C10 | 340 | 17
Add Parameter | 59.15 | 9.85 | 61.02 | 11.10 | 0.89 | 0.97 | 1.09 | C10 | 354 | 6

(TBR/EBR: average time and energy of 10 runs before refactoring; TAR/EAR: average time and energy of 10 runs after refactoring.)

Table 5: GPS-UP Metrics Categories for 21 Code Refactoring Techniques for the LG Nexus 5X, Where (s) Is Second(s) and (j) Is Joule(s)

Code refactoring technique | TBR(s) | EBR(j) | TAR(s) | EAR(j) | Greenup | Speedup | Powerup | Category | Total LOC | LOC changed
Inline Method | 67.68 | 13.20 | 66.54 | 12.70 | 1.04 | 1.02 | 0.98 | C1 | 264 | 4
Move Method | 67.81 | 12.39 | 63.94 | 10.56 | 1.17 | 1.06 | 0.90 | C1 | 288 | 16
Inline Class | 66.18 | 13.18 | 62.71 | 11.03 | 1.19 | 1.06 | 0.88 | C1 | 298 | 5
Remove Parameter | 62.67 | 13.22 | 61.41 | 12.82 | 1.03 | 1.02 | 0.99 | C1 | 360 | 6
Pull Up Field | 65.82 | 12.41 | 64.63 | 11.63 | 1.07 | 1.02 | 0.95 | C1 | 354 | 10
Pull Up Method | 65.50 | 13.27 | 65.37 | 12.39 | 1.07 | 1.00 | 0.94 | C3 | 354 | 5
Inline Temp | 66.85 | 13.17 | 63.99 | 12.96 | 1.02 | 1.04 | 1.03 | C4 | 264 | 2
Remove Assignments to Parameters | 66.78 | 13.13 | 68.57 | 12.12 | 1.08 | 0.97 | 0.90 | C5 | 272 | 3
Replace Type Code with State Strategy | 66.67 | 13.76 | 68.14 | 12.64 | 1.09 | 0.98 | 0.90 | C5 | 320 | 20
Replace Method with Method Object | 68.28 | 13.35 | 66.30 | 14.69 | 0.91 | 1.03 | 1.13 | C6 | 274 | 15
Move Field | 64.30 | 13.73 | 62.80 | 14.42 | 0.95 | 1.02 | 1.08 | C6 | 288 | 10
Extract Class | 64.79 | 13.78 | 63.49 | 14.24 | 0.97 | 1.02 | 1.05 | C6 | 298 | 20
Replace Array with Object | 66.77 | 13.29 | 64.35 | 13.88 | 0.96 | 1.04 | 1.08 | C6 | 306 | 19
Decompose Conditional | 66.17 | 12.96 | 66.24 | 14.40 | 0.90 | 1.00 | 1.11 | C9 | 335 | 6
Extract Method | 64.51 | 12.74 | 68.13 | 14.82 | 0.86 | 0.95 | 1.10 | C10 | 263 | 7
Replace Temp with Query | 66.36 | 14.33 | 68.11 | 15.28 | 0.94 | 0.97 | 1.04 | C10 | 265 | 7
Split Temporary Variable | 67.36 | 13.12 | 68.87 | 14.39 | 0.91 | 0.98 | 1.07 | C10 | 270 | 4
Replace Data Value with Object | 63.65 | 12.17 | 66.25 | 14.10 | 0.86 | 0.96 | 1.11 | C10 | 296 | 10
Self Encapsulate Field | 64.46 | 12.62 | 66.91 | 14.21 | 0.89 | 0.96 | 1.08 | C10 | 293 | 3
Replace Conditional with Polymorphism | 65.55 | 13.42 | 66.89 | 14.55 | 0.92 | 0.98 | 1.06 | C10 | 340 | 17
Add Parameter | 63.49 | 12.51 | 64.98 | 13.20 | 0.95 | 0.98 | 1.03 | C10 | 354 | 6

(TBR/EBR: average time and energy of 10 runs before refactoring; TAR/EAR: average time and energy of 10 runs after refactoring.)


The Extract Method technique extracted part of a long method into a new, separate method. As a result, the refactoring technique downgraded the performance and energy efficiency because of the call to the new method; however, the technique is still beneficial because the new method can be called by other methods. The Replace Temp with Query technique replaced the temporary variable and its references with a query to a method, which makes the CPU execute the method every time the temporary variable is referenced; the positive side is that the created query and its method can be used by other methods. The Split Temporary Variable technique replaced a temporary variable that was assigned two values with two temporary variables, which made the code more understandable; the price was sacrificing performance and energy by loading two variables into memory instead of one. The Self Encapsulate Field technique added a Get method to access the field instead of accessing the field directly, in order to achieve encapsulation; however, the Set and Get methods are extra code, which leads to consuming more time and energy to execute. The Replace Data Value with Object technique replaced a data attribute that needed more data or behavior with an object, which means creating a new class for the object; creating a new class made the CPU and RAM slower and consumed more energy. The Replace Conditional with Polymorphism technique replaced the condition that chooses between different tracks and different objects with a subclass and a method for each object; similarly, adding classes leads to the same result for performance and energy efficiency. The last refactoring technique, Add Parameter, added a parameter to a method that needed more information, which means more energy is needed to access this parameter from the RAM. These refactoring techniques did not improve performance or energy efficiency despite being useful for achieving maintainability.
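For example, a hypothetical Java illustration of the Self Encapsulate Field refactoring mentioned above shows where the extra accessor calls come from:

// Before Self Encapsulate Field: the fields are read directly.
class RangeBefore {
    private int low, high;
    boolean includes(int value) { return value >= low && value <= high; }
}

// After Self Encapsulate Field: every access goes through an accessor,
// which is the extra call to which the measurements attribute the overhead.
class RangeAfter {
    private int low, high;
    int getLow() { return low; }
    int getHigh() { return high; }
    void setLow(int low) { this.low = low; }
    void setHigh(int high) { this.high = high; }
    boolean includes(int value) { return value >= getLow() && value <= getHigh(); }
}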

6. Threats to Validity The main threat to the validity of our results is the possibility of getting different results when testing refactoring techniques in different mobile operating systems and devices and using different energy measurement tools; moreover, it will prevent us from generalizing our experiment results. Thus, testing these techniques in different environments will minimize this threat. Another threat to validity is that we tested only Fowler’s refactoring techniques in Fowler’s sample code; therefore, we need to test these refactoring techniques in real-world mobile applications in order to get additional experiment results and better insights.

7. Conclusions and Future Work

This paper presents a study evaluating code refactoring techniques in a mobile software system by using GPS-UP


metrics. We developed an Android application that runs the sample code using Fowler’s studies and applied 21 code refactoring techniques to the mobile application software. We then applied the results to GPS-UP metrics to categorize the code refactoring technique into one of ten categories. The results showed the interrelationship between performance and energy efficiency for each code refactoring technique. Our work helps application software developers to explore the trade-off between performance and energy efficiency for these refactoring techniques. In future work, we will evaluate additional code refactoring techniques in open source mobile applications running in different mobile software systems.

References [1] M. Fowler, Refactoring: Improving the Design of Existing Code. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999. [2] S. Abdulsalam, Z. Zong, Q. Gu, and M. Qiu, “Using the greenup, powerup, and speedup metrics to evaluate software energy efficiency,” in 2015 Sixth International Green and Sustainable Computing Conference (IGSC), Dec 2015, pp. 1–8. [3] G. Metri, W. Shi, and M. Brockmeyer, “Energy-efficiency comparison of mobile platforms and applications: A quantitative approach,” in Proceedings of the 16th International Workshop on Mobile Computing Systems and Applications, ser. HotMobile ’15. New York, NY, USA: ACM, 2015, pp. 39–44. [Online]. Available: http://doi.acm.org/10.1145/2699343.2699358 [4] J. Medellin, O. Barack, Q. Zhang, and L. Huang, “Enabling migration decisions through policy services in soa mobile cloud computing systems,” in Proceedings of the International Conference on Grid Computing and Applications (GCA). The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), 2015, p. 23. [5] K. Fekete, Ã. Pelle, and K. Csorba, “Energy efficient code optimization in mobile environment,” in 2014 IEEE 36th International Telecommunications Energy Conference (INTELEC), Sept 2014, pp. 1–6. [6] R. Mittal, A. Kansal, and R. Chandra, “Empowering developers to estimate app energy consumption,” in Proceedings of the 18th annual international conference on Mobile computing and networking. ACM, 2012, pp. 317–328. [7] J. J. Park, J.-E. Hong, and S.-H. Lee, “Investigation for software power consumption of code refactoring techniques.” in SEKE, 2014, pp. 717– 722. [8] W. G. da Silva, L. Brisolara, U. B. Corrêa, and L. Carro, “Evaluation of the impact of code refactoring on embedded software efficiency,” in Proceedings of the 1st Workshop de Sistemas Embarcados, 2010, pp. 145–150. [9] C. Bunse, H. HÃ˝upfner, E. Mansour, and S. Roychoudhury, “Exploring the energy consumption of data sorting algorithms in embedded and mobile environments,” in 2009 Tenth International Conference on Mobile Data Management: Systems, Services and Middleware, May 2009, pp. 600–607. [10] C. Comito and D. Talia, “Evaluating and predicting energy consumption of data mining algorithms on mobile devices,” in Data Science and Advanced Analytics (DSAA), 2015. 36678 2015. IEEE International Conference on. IEEE, 2015, pp. 1–8. [11] S. Hasan, Z. King, M. Hafiz, M. Sayagh, B. Adams, and A. Hindle, “Energy profiles of java collections classes,” in Proceedings of the 38th International Conference on Software Engineering. ACM, 2016, pp. 225–236. [12] Z. Yang, “Powertutor-a power monitor for android-based mobile platforms,” EECS, University of Michigan, retrieved September, vol. 2, p. 19, 2012.


Discovery of Things in the Internet of Things Naseem Ibrahim CSSE, School of Engineering, The Behrend College, The Pennsylvania State University Erie, PA, USA [email protected]

Abstract—The Internet of Things (IoT) has added connectivity to objects that were traditionally not connected. Billions of new connected Things are expected to join the IoT in the next few years. Things have traditionally been thought of as private property: a Thing owner communicates with the Thing to obtain a service or a functionality. This should change. Things should be thought of as public service or functionality providers, and any client should be able to communicate with any Thing that provides a desired service or functionality. To enable this, Thing owners should be able to publish their Things, and Thing requesters should be able to discover these Things. Current IoT architectures do not support this. Hence, our research has introduced a context-aware SOA architecture for the publication and discovery of Things. This paper is concerned with the discovery of Things. It introduces an architecture for the specification of client queries, together with an XML language for expressing such queries. Keywords—IoT; SOA; Context; Architecture; XML

I. INTRODUCTION
The Internet of Things is the network of physical objects that are connected and able to exchange data. These objects were traditionally not connected, such as vehicles, home appliances and other items embedded with electronics, software, sensors, or actuators. The popularity of the Internet of Things has increased significantly in the last few years, and recent studies [1] suggest that billions of new smart Things are expected to take hold in the next few years. Each Thing usually provides a functionality or a service. The traditional way of looking at the Internet of Things is that a Thing owner can remotely connect with his/her Thing to obtain a functionality or a service. This traditional philosophy is lacking. A Thing is a physical object that can provide a functionality or a service, which can be anything from a physical action to a sensor reading. A Thing should be seen as a provider: you should not have to own a Thing to be able to use its services or functionality. To enable this new philosophy of looking at Things in the Internet of Things, a Thing owner should make his Thing public and a Thing requester should be able to discover the Thing's service or functionality. That process should also consider the massive increase in the number of Things, which adds complexity to finding Things that match the requester's requirements.

Many platforms exist in the literature of the Internet of Things that support the communication between Thing owners and their Things. Such platforms cannot support the publication and discovery of Things. Hence, our research is focused on introducing a new architecture for the publication and discovery of Things in the Internet of Things. In [2], we proposed the use of a Service-oriented Architecture (SOA) [3], an architectural model in which service is a first-class element for providing functionality, for the publication and discovery of Things. Traditional Service-oriented architectures are not rich enough to represent the changing properties of Things. Hence, we proposed the inclusion of context to achieve context-awareness. Context [4] is the set of conditions that are used to describe a product or a service. Context awareness [5] is the ability to detect and respond to changes in context. A novel context-aware SOA-based architecture for the publication and discovery of Things was introduced in [2]. This paper is concerned with the discovery part of the architecture. To support the discovery process, we introduce a new architecture for the specification of requester requirements. This architecture is formalized to support the formal verification of the requirements specified in the requester query. To support portability and implementation, we introduce an XML-based language for the specification of queries called Thing Query Language (TQL). The rest of this paper is structured as follows. Section II introduces background information, including a brief introduction to the Internet of Things, SOA, and context. Section III briefly reviews our novel architecture for the publication and discovery of Things in the Internet of Things. Section IV introduces, informally and formally, the architecture of the context-aware queries that are used to discover Things. Section V introduces the Thing Query Language. Section VI provides a brief overview of related work. Finally, Section VII provides some concluding remarks and a brief discussion of future work.
II. BACKGROUND
The Internet of Things (IoT) [6] is the network of physical objects that are connected and able to exchange data. These objects were traditionally not connected, such as vehicles, home appliances and other items embedded with electronics,


software, sensors, and actuators. These objects gather data and then communicate it to another object that can analyze the data and make decisions. The idea of the Internet of Things is a massive, fundamental shift. Numerous possibilities for new products and services are introduced by these intelligent Things. The main issue with current IoT architectures is that they look at Things as private property. An organization or an individual owns a Thing and uses the functionalities provided by the Thing to achieve a task or to make a decision. Basically, only the owner of the Thing can make any use of the Thing. We believe this is counterproductive; Things can be service providers to anyone. Any client should be able to find a Thing that provides a service that the client needs. This service can be an actual physical task, information needed for a decision, or a cyber task. Thing owners should be able to publish the services provided by their Things, and clients should be able to find Things that provide the required services. But current architectures for the implementation of IoT do not support this philosophy. Service-oriented Computing (SOC) [7] [8] [9] is a computing paradigm that uses service as the fundamental element for application development processes. An architectural model of SOC in which service is a first-class element is called Service-oriented Architecture [3] [10]. A service is a well-defined task that does not depend on the state of other services. SOA consists of three main modules: the service provider, the service requester and the service registry. The two main activities in SOA are service publication and service discovery. Service publication refers to service providers defining service contracts and publishing them through available service registries. Service discovery refers to the process of finding services that have been previously published and that meet the requirements of a service requester [11]. Typically, service discovery includes service query, service matching, and service ranking. Service requesters define their requirements as service queries. Service matching refers to the process of matching the service requester requirements, as defined in the service query, with the published services. Service ranking is the process of ordering the matched services according to the degree to which they meet the requester requirements. Our research investigates the publication and discovery of Things using SOA. Combining SOA with the Internet of Things allows users to request and receive functionalities and services from smart objects in a seamless manner. Context has been defined [4] as the information used to characterize the situation of an entity. This entity can be a person, a place, or an object. With the emergence of IoT, context [12] has become even more important. Some questions to ask when determining context are who will be using it, why, when, where, and what. It also consists of more specific aspects such as [13] location, time, temperature, and even users' emotions and preferences; these can


be classified as scalar, vector, or abstract values. Selecting certain factors, such as platform, will eliminate a lot of context-of-use issues; however, when considering IoT, most cases will not be platform dependent. Smart phones being context aware is another interesting concept. Cell phones being completely context aware would require sensors and RFID tags. This would allow phones to be aware of locations and other information [13], which will limit the amount of interaction between the user and the device. Context can also be used in terms of service contracts, which are responsible for the fulfillment of agreed-upon conditions. As the number of Things and users increases, it will become difficult to pair a Thing with a user. It will become even more difficult to find a smart object that will meet the user's requirements and context. Because of this, the process will become time consuming and less efficient. In order to find the Thing that will provide the user with the most relevant service, the matching process between the user's requirements and available Things must consider context. Our research aims to solve the problem of scalable matching of available Things with the use of context-awareness. The result will be a Context-aware Service-oriented Architecture that will allow users to be paired with Things based on their context and requirements. This architecture is discussed in the next section.
III. THING PUBLICATION AND DISCOVERY ARCHITECTURE
This section introduces a novel SOA-based architecture for the publication and discovery of Things in the Internet of Things. The architecture was designed to meet seven criteria that we believe are necessary for the architecture to achieve its goals. These seven criteria are: • Context: Context provides useful information that can be interpreted. This needs to come from both the Thing and the client. The context information of the client and the Thing can be used to get the best matching between the client's requirements and available Things. • Dynamic context: The context information of clients and Things is not static; it can change dynamically. The architecture should support dynamic change to best match clients and Things. • Scalability: As the number of Things increases drastically, it will be important that the architecture can adapt and scale with this increase. • Discovery: In order for a client to find Things, the architecture must support the discovery of Things. • Search: Clients should be able to search for the Things that best match their needs. • Ranking: The search process will be far too time consuming with the increased number of Things being published. To really make this architecture useful, the architecture should rank the results of the search according to the context of the Thing and the client. This ranking


Figure 1. SOA-based IoT Architecture

will decrease the amount of time the user spends searching for the service they desire. This ranking will also provide the user with the most relevant service based on their context and needs. • Formality: The architecture should be formal. Formalism will enable the validation and verification of the functions provided by the architecture.
Figure 1 illustrates our newly introduced SOA-based architecture for the publication and discovery of Things in IoT. This architecture satisfies the seven criteria discussed above. Below is a discussion of the elements of the architecture. Registry: The registry is the entity responsible for enabling the publication, discovery, matching and ranking of Things. It will allow the Thing owner to publish his Things, and users to discover these Things. The Registry consists of four main elements: • Search: This element is responsible for the discovery of Things. It will enable clients to find Things that provide the services they are requesting. The discovery will be based on the client context and the context of published Things. • Rank: This element is responsible for ranking the Things that meet the client's requirements. It is very common that the registry will find multiple Things that meet the client's requirements. It is also common that none of these Things provides an exact match to the client's requirements. The ranking element will run ranking algorithms that rank the matching Things according to the client's context and priorities. • Publish: This element is responsible for storing the information provided by Thing owners. The information provided by the providers will be structured according to the Thing structure. • Requestor Reviews: To ensure the correctness of the information contained in the registry, the registry will also allow Thing users to review Things after using them. The reviews completed by users will be managed by this element. A review can include how the service performed, the correctness of the description, and any experiences the client has had with the Thing owner. Requester: This entity represents the client that is trying to discover a Thing that provides a required service. The requester will directly interact with the registry by sending queries, which are discussed in the next section. After discovering a Thing, the requester can use the Thing by directly interacting with it. The requester provides his context information to the registry to get the best match. Hence, the requester interacts with the following entity. • Context of User: This entity is responsible for collecting and structuring the context of the user. This entity is dynamic. It monitors the user context and updates the context information accordingly. The context information provided by this entity will be used by the client when communicating with the registry. Owner: This entity represents the owner of the Thing. The owner of the Thing will publish the Thing information in the registry to be available for discovery. The Thing owner will also enable the communication between clients and their Things. The information published by the Thing owner will be structured according to the Thing Architecture. • Thing Architecture: This entity defines the structure that the Thing owner should use to describe his/her Thing. The Thing architecture is formally defined. Hence, a Thing description can be formally verified by the Registry.
IV. THING QUERY
The previous section briefly introduced the architecture for the publication and discovery of Things. In this architecture, the Thing owner publishes the Thing description in the registry. The Thing requester then queries the Registry looking for Things that satisfy a required functionality. This section introduces the structure of the Thing query informally and formally.
A. Informal description of Query
To ensure consistency and correctness, a Thing query will be structured according to the structure discussed in this section. This subsection introduces an informal description of the Thing query structure. Figure 2 illustrates the architecture for a Thing query. A Thing query consists of three main elements: non-functional requirements, functional requirement, and requester context. Following is a discussion of the main elements of the Thing query. Functional Requirement: This element specifies the requester's required functionality. The functional requirement



– Performance: This will display the required performance information. – Availability: This will describe the maximum acceptable down time that is required by the Thing requester. It can also include the required operating hours. Requester Context: This entity will describe the requester’s context. This will be both static and dynamic context. The point of having context will be to aid in pairing a requestor with the proper Thing based on their context information.

Figure 2. Query Architecture

will be used by the registry to match the requester's required functionality with available Things. A functional requirement is defined in terms of requirements and results. • Requirements: This entity describes the preconditions that the Thing requester guarantees to satisfy before requesting a Thing functionality. • Results: This entity describes the postconditions that the requester requires to be satisfied. Non-Functional Requirements: This element specifies the non-functional properties that the Thing requester is requiring. It is important to emphasize that for each of these requirements the Thing requester can specify a priority. A priority is a natural number between 1 and 5, where 1 indicates a low priority and 5 indicates a high priority. The priority will enable the Registry to rank the matched Things according to the matched functionality and non-functional properties. The non-functional requirements element is divided into the following entities. • Required Context: This entity describes any conditions or rules the requester has with respect to the Thing context. This will be both static and dynamic context. This will also enable the matching of the required context with the Thing's published context. • Cost: This entity states the maximum accepted cost of a Thing functionality. This can be either a one-time use cost or a subscription cost. • Constraints: This entity contains the non-functional guarantees that the requester is requesting. It includes the following. – Interoperability: This will define the required platforms or environments. – Security: This will describe the required security mechanisms to secure data and communication. – Scalability: This discusses the required Thing ability to scale to a large number of requests. It can contain the maximum number of requests it can handle per unit of time.

B. Formal description of Query
A query is formally defined using a model-based specification notation. The context information is written in the notation introduced by Wan [14]. Below we give the formal representation of the Thing query. Let C denote the set of all such logical expressions. X ∈ C is a constraint. The following notation is used in our definition: • T denotes the set of all data types, including abstract data types. • Dt ∈ T means Dt is a datatype. • v : Dt denotes that v is either a constant or a variable of type Dt. • Xv is a constraint on v. If v is a constant then Xv is true. • Vq denotes the set of values of data type q. • x :: ∆ denotes a logical expression x ∈ C defined over the set of parameters ∆. A parameter is a 3-tuple defining a data type, a variable of that type, and a constraint on the values assumed by the variable. We denote the set of data parameters as Λ = {λ = (Dt, v, Xv) | Dt ∈ T, v : Dt, Xv ∈ C}. 1) Functional Requirement: The functional requirement section encapsulates the function Requirements and Results. A request includes a single functionality. Definition 1: A functional requirement is a 2-tuple fˆ = ⟨reˆ, otˆ⟩, where the requirements reˆ and the results otˆ are data constraints. That is, reˆ :: z, z ⊆ Λ and otˆ :: z, z ⊆ Λ. 2) Requester Context: This information is formally specified, as defined in [14], using dimensions and tags along the dimensions. In our research, we have been using the five dimensions where, when, what, who, and why. Assume that the service provider has invented a finite set DIM = {X1, X2, . . . , Xn} of dimensions, and associated with each dimension Xi a type τi. Following the formal aspects of context developed by Wan [14], we define a context c as an aggregation of ordered pairs (Xj, vj), where Xj ∈ DIM and vj ∈ τj. Definition 2: Context information cˆ is formalized using the notation in [14]: Let τ : DIM → I, where DIM = {X1, X2, ..., Xn} is a finite set of dimensions and I = {a1, a2, ..., an} is a set of types. The function τ associates


a dimension to a type. Let τ(Xi) = ai, ai ∈ I. We write cˆ as an aggregation of ordered pairs (Xj, vj), where Xj ∈ DIM and vj ∈ τ(Xj). 3) Nonfunctional Requirements: The non-functional requirements section encapsulates the following: Requested Context: A requested context is a set of context rules, where a context rule is a situation which might be true in some contexts and false in some others. For example, the situation warm = Temp > 25 ∧ humid > 50 is true only in contexts where the temperature is greater than 25 degrees and the humidity is greater than 50. Definition 3: A requested context is formalized as rˆ, where rˆ = {x | x ∈ C}. Cost: This element specifies the maximum acceptable cost for requesting a functionality from a Thing. Definition 4: The cost pˆ is defined as a 3-tuple pˆ = ⟨a, cu, un⟩, where a : N is the price amount defined as a natural number, cu : cType is the currency tied to a currency type cType, and un : uType is the unit for which the pricing is valid. As an example, pˆ = (25, $, hour) denotes a pricing of 25$/hour. Constraints: This element represents the trustworthiness and quality requirements. Each requirement can be mapped to a priority. Definition 5: A constraint is a rule, expressed as a logical expression in C. A rule may imply another; however, no two rules can conflict. We write conˆ = {y | y ∈ C} to represent the set of constraints. Putting these definitions together we arrive at a formal definition for the non-functional requirements. Definition 6: Non-functional requirements is a 4-tuple nfˆ = ⟨rˆ, pˆ, conˆ, Ξ⟩, where rˆ is the required context, pˆ is the required cost of the Thing, conˆ is the set of constraints, and Ξ : (x ∈ {1, 2, 3, 4, 5}) → (y ∈ {cˆ, rˆ, pˆ, conˆ}) is a function that assigns priorities to the elements of the query. Finally, the formal definition of a query is: Definition 7: A query is a 3-tuple q = ⟨fˆ, nfˆ, cˆ⟩, where fˆ is the functional requirement, nfˆ is the non-functional requirements, and cˆ is the requester context.
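For illustration only (the values below are invented for this example and are not taken from our prototype), a requester looking for an outdoor temperature reading could issue the query q = ⟨fˆ, nfˆ, cˆ⟩, where fˆ = ⟨reˆ, otˆ⟩ with reˆ defined over {(string, location, location ≠ null)} and otˆ defined over {(Real, temp, −50 ≤ temp ≤ 60)}; nfˆ = ⟨rˆ, pˆ, conˆ, Ξ⟩ with rˆ = {outdoor = Thing.location.type = "outdoor"}, pˆ = (2, $, request), conˆ = {responseTime < 5 seconds}, and Ξ assigning the highest priority to the cost, Ξ(5) = pˆ, and a lower one to the constraints, Ξ(2) = conˆ; and cˆ = {(where, Erie), (when, 10:00), (what, temperature reading), (who, requester-17), (why, irrigation decision)}.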






Below is a detailed discussion of the XML schema for each of these components.
A. Functional Requirements
The required function definition includes the requirements and results. Requirements and results are logical conditions that can evaluate to either true or false.
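As an illustration, a minimal TQL fragment for the functional requirement could take the following shape; the element names and values are ours and only mirror the structure described above, not a normative schema:

<functionalRequirement>
  <requirements>
    <!-- precondition the requester guarantees to satisfy -->
    <condition>location is supplied with the request</condition>
  </requirements>
  <results>
    <!-- postcondition the requester expects to hold -->
    <condition>temperature is reported in degrees Celsius</condition>
  </results>
</functionalRequirement>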











B. Requester Context

V. THING QUERY LANGUAGE

The contextual information of the Thing requester is listed in this section. It is defined according to the five dimensions discussed earlier. Below is the XML schema for defining the requester context.
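For example, a requester-context fragment built on the five dimensions might be sketched as follows (again an illustrative sketch with element names of our choosing):

<requesterContext>
  <dimension name="where">Erie, PA</dimension>
  <dimension name="when">2018-07-30T10:00</dimension>
  <dimension name="what">temperature reading</dimension>
  <dimension name="who">requester-17</dimension>
  <dimension name="why">irrigation decision</dimension>
</requesterContext>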

A Thing requester specifies the Thing requirements according to the architecture we introduced in the previous section. The Thing query has been formally defined to support formal verification. To support portability, in this section we introduce an XML-based language called Thing Query Language (TQL). Following is a discussion of the structure of TQL. TQL represents a one-to-one mapping to the query structure presented in Figure 2. The three main elements of a query are functional requirements, non-functional requirements, and requester context. Below is the XML schema for the main elements of a query.
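A top-level TQL query therefore wraps the three elements named above. The skeleton below is only an illustrative sketch (our element names, not a normative schema):

<query>
  <functionalRequirement> ... </functionalRequirement>
  <nonFunctionalRequirements> ... </nonFunctionalRequirements>
  <requesterContext> ... </requesterContext>
</query>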












C. Non-functional requirements The non-functional requirements are listed in this section. It includes required context, cost and constraints. The XML schema for the elements of the non-functional requirements is shown below.
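Following the structure above, the non-functional requirements element can be sketched as a wrapper for its three parts (illustrative element names only):

<nonFunctionalRequirements>
  <requiredContext> ... </requiredContext>
  <cost> ... </cost>
  <constraints> ... </constraints>
</nonFunctionalRequirements>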






1) Required Context: The required context lists any conditions or rules the requester will have with respect to the Thing context. Each rule is associated with a priority. Below is the XML schema for defining required context.
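A sketch of such a rule list, with the priority attached to each rule as described above (element and attribute names, and the rule syntax, are illustrative):

<requiredContext>
  <rule priority="4">Thing.location.type = "outdoor"</rule>
  <rule priority="2">Thing.owner.rating > 3</rule>
</requiredContext>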





2) Cost: Cost information lists the maximum acceptable cost from the requester’s point of view. A cost is also associated with a priority. Below is the XML schema for specifying the non-functional requirement cost.
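Mirroring Definition 4, a cost entry would carry an amount, a currency and a pricing unit, plus the associated priority; the following is only a sketch with names of our choosing:

<cost priority="5">
  <amount>25</amount>
  <currency>USD</currency>
  <unit>hour</unit>
</cost>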







3) Constraints: This section lists the trustworthiness and quality guarantees that a requester is requiring from a Thing. It includes interoperability, security, scalability, performance, and availability guarantees. Each required guarantee is associated with a priority. Below is the XML schema for specifying constraints.
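A constraints fragment covering the five guarantee categories might be sketched as follows (illustrative element names and values):

<constraints>
  <interoperability priority="2">Android; HTTP/REST</interoperability>
  <security priority="5">TLS for all communication</security>
  <scalability priority="1">100 requests per minute</scalability>
  <performance priority="3">response within 5 seconds</performance>
  <availability priority="4">downtime below 1 hour per month</availability>
</constraints>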
































VI. RELATED WORK
Our research introduced an SOA-based architecture for the publication and discovery of Things in IoT. This section is concerned with reviewing related approaches that use SOA for IoT. Context, scalability, dynamic context, discovery, search, rank, and formality were the criteria used to review the related approaches. Each of these criteria is crucial for any such architecture, as discussed in Section III. In [15], the authors aim to integrate business systems with their manufacturing and logistics processes. This architecture does consider discovery, search, and ranking of Things. This is achieved by using a service catalogue and ranking services based on the program needs during runtime. It also uses web services, which allows it to be scalable. However, this architecture does not consider context at all. In [8], the authors look at the interaction between IoT and the environment. This is caused by the environment creating events, rather than receiving input from the user. This architecture does allow discovery and searchability of Things using an IoT Browser but does not have a way to rank them. The browser also allows users to view statistics and certain context of sensors and devices based on current


events. The events themselves are viewable and dynamic in nature. This approach provides some formalism. However, this research does not consider scalability. In [9], the authors consider IoT Things as resources and build an architecture for that. Like other approaches, this one does consider discovery by using a service broker, but there is no mention of being able to search or rank these services. Besides discovery, this approach also allows for scalability in its architecture by using the Web of Things. This approach also does not consider any type of context. In [16], the authors introduce a Discovery-Driven SOA. The architecture allows for the discovery of Things, applications, and middleware. However, there is no mention of being able to rank Things, and only a brief mention of being able to search for Things using a query. By using SOA for services, middleware, computing, storage, and applications, there is support for scalability in this architecture. There was no consideration of context. In [17], the authors are heavily concerned with dynamic context-awareness through context-aware applications. This architecture, however, does not allow for any scalability or discovery due to its one-to-one behavior. Table I summarizes our study of the related work. A common theme in all of these architectures is that either some type of context awareness with little to no discovery support is provided, or discovery is provided without context. None of these architectures supports all the comparison criteria. On the other hand, our architecture satisfies all of these criteria.

Table I. RELATED WORK

Criterion         [15]   [8]    [9]    [16]   [17]
Context           No     Some   No     No     Yes
Scalability       Yes    No     Yes    No     No
Dynamic Context   No     Yes    No     No     Yes
Discovery         Yes    Yes    Yes    Yes    No
Search            Yes    Yes    No     Some   No
Rank              Some   No     No     No     No
Formal            No     Some   No     No     No

VII. CONCLUSION
We have introduced an SOA-based architecture for the publication and discovery of Things in IoT. This paper has introduced an architecture for specifying requester queries. This architecture has been formally defined using set theory. A prototype has also been implemented. The prototype simulates the behavior of the architecture. In this prototype, Thing owners were able to publish information about their weather sensors. Clients were also able to create queries using the query structure and our XML language. The registry responded with a list of available Things that match their requirements. Our future work includes a

complete implementation of the provision architecture. It will also include the use of cloud computing to implement the registry. This will enable our registry to be scalable using the cloud resources. R EFERENCES [1] P. Ray, “A survey on internet of things architectures,” Journal of King Saud University - Computer and Information Sciences, 2016. [2] N. Ibrahim and B. Brandon, “Service-oriented architecture for the internet of things,” in The 4th Annual Conference on Computational Science and Computational Intelligence (CSCI’17), Dec 2017, pp. 1004–1009. [3] T. Erl, SOA Principles of Service Design. Upper Saddle River, NJ, USA: Prentice Hall PTR, 2007. [4] A. K. Dey, “Understanding and using context,” Personal Ubiquitous Comput., vol. 5, no. 1, pp. 4–7, 2001. [5] N. Ibrahim, “A service-oriented architecture for the provision and ranking of student-oriented courses,” ICST Trans. eEducation e-Learning, vol. 3, no. 9, p. e2, 2016. [6] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, “Internet of things (iot): A vision, architectural elements, and future directions,” Future Gener. Comput. Syst., vol. 29, no. 7, pp. 1645–1660, Sep. 2013. [7] D. Georgakopoulos and M. P. Papazoglou, Service-Oriented Computing. The MIT Press, 2008. [8] L. Lan, B. Wang, L. Zhang, R. Shi, and F. Li, “An eventdriven service-oriented architecture for the internet of things service execution,” International Journal of Online Engineering, vol. 11, no. 2, pp. 4–8, 2015. [9] V. Vujovic, M. Maksimovic, D. Kosmajac, and B. Perisic, “Resource: A connection between internet of things and resource-oriented architecture,” in Smart SysTech 2015; European Conference on Smart Objects, Systems and Technologies, July 2015, pp. 1–7. [10] N. Ibrahim, V. Alagar, and M. Mohammad, “A contextdependent service model,” EAI Endorsed Trans. Contextaware Syst. & Appl., vol. 1, no. 2, p. e3, 2014. [11] M. P. Papazoglou, Web Services: Principles and Technology, 1st ed. Prentice Hall, 2008. [12] D. Shamonsky, “Internet of things: Context of use just became more important,” https://www.ics.com/blog/internet-thingscontext-use-just-became-more-important, January 2015. [13] L. Yu, Y. Yang, Y. Gu, X. Liang, S. Luo, and F. Tung, “Applying context-awareness to service-oriented architecture,” in 2009 IEEE International Conference on e-Business Engineering, Oct 2009, pp. 397–402. [14] K. Wan, “Lucx: Lucid enriched with context,” Phd Thesis, Concordia University, Montreal, Canada, January 2006. [15] P. Spiess, S. Karnouskos, D. Guinard, D. Savio, O. Baecker, L. M. S. d. Souza, and V. Trifa, “Soa-based integration of the internet of things in enterprise services,” in Proceedings of the 2009 IEEE International Conference on Web Services, ser. ICWS ’09. Washington, DC, USA: IEEE Computer Society, 2009, pp. 968–975. [16] D. Georgakopoulos, P. P. Jayaraman, M. Zhang, and R. Ranjan, “Discovery-driven service oriented iot architecture,” in 2015 IEEE Conference on Collaboration and Internet Computing (CIC), Oct 2015, pp. 142–149. [17] C. M. Olsson and O. Henfridsson, Designing Context-Aware Interaction: An Action Research Study. Boston, MA: Springer US, 2005, pp. 233–247.


Publishing Things in the Internet of Things Naseem Ibrahim CSSE, School of Engineering, The Behrend College, The Pennsylvania State University Erie, PA, USA [email protected]

Abstract—The popularity of the Internet of Things has increased dramatically in the last few years. That popularity has been accompanied by a large number of platforms that support the communication between Things and their owners. The major issue with available platforms is that they view Things as private property: only the Thing owner can communicate with the Thing to obtain the service or functionality that it provides. But this view is short-sighted. A Thing should be viewed as a service or functionality provider. The Thing owner should be able to publish Thing information and make it public. On the other hand, Thing requesters should be able to discover available Things; communication between the Thing and the requester is then performed. Our research is concerned with introducing an architecture that supports the publication and discovery of Things in the Internet of Things. This architecture will enable Thing owners to publish Thing specifications and Thing requesters to find Things that best match their needs. This paper is concerned with the publication aspect of our architecture. It introduces the Thing architecture informally and formally. It also introduces an XML-based language for the specification of Things. Keywords—Internet of Things; SOA; Context; Architecture

I. INTRODUCTION
The popularity of the Internet of Things has increased significantly in the last few years. Smart Things are prevailing for both business and personal use. From cell phones, smart homes, smart sensors and more [1], nearly every object can be made intelligent and relay information and data to a user at a moment's notice. The Internet of Things is adding connectivity to objects that were traditionally not connected. Recent studies [2] suggest that billions of new smart Things are expected to take hold in the next few years. Each Thing usually provides a functionality or a service. The traditional way of looking at the Internet of Things is that a Thing owner can remotely connect with his/her Thing to obtain a functionality or a service. We believe that this traditional way of looking at Things is counterproductive. A Thing should be looked at as a service provider. This service can be anything, from a physical action to a sensor reading. If a Thing is a service provider, then why should you have to own the Thing to be able to use its service? We believe that you do not have to own a Thing to obtain the Thing's service. To realize this new way of looking at Things, a Thing owner should make his Thing public and a Thing requester

should be able to discover these Things. The literature of the Internet of Things is rich with platforms that support the communication between Thing owners and their Things. Such architectures cannot support the public publication and discovery of Things. Hence, our research is focused on introducing a new architecture for the publication and discovery of Things in the Internet of Things. Our research [3] proposed the use of a Service-oriented Architecture (SOA) [4], an architectural model in which service is a first-class element for providing functionality, for the publication and discovery of Things. Traditional Service-oriented architectures are not rich enough to represent the changing properties of Things. Hence, we proposed the inclusion of context to achieve context-awareness. Context [5] is the set of conditions that are used to describe a product or a service. Context awareness [6] is the ability to detect and respond to changes in context. In [3], we gave a general introduction to a context-aware SOA-based architecture for the publication and discovery of Things. This paper is concerned with the publication part of the architecture. To support the publication process, we introduce a new architecture for the specification of Things. This architecture is formalized to support the formal verification of the claims made in the Thing description. To support portability and implementation, we introduce an XML-based language for the specification of Things called Thing Specification Language (TSL). The rest of this paper is structured as follows. Section II introduces background information, including a brief introduction to the Internet of Things, SOA, and context. Section III briefly reviews our novel architecture for the publication and discovery of Things in the Internet of Things. Section IV introduces, informally and formally, the architecture of the context-aware Things that is used to publish Things. Section V introduces the Thing Specification Language. Section VI provides a brief overview of related work. Finally, Section VII provides some concluding remarks and a brief discussion of future work.
II. BACKGROUND
The Internet of Things (IoT) [1] is the connection of physical objects that collect and transfer data from one


object to another. These objects are embedded with software and sensors. These objects can be virtually anything. Most of today's IoT applications [2] are cloud-based. These cloud applications [4] combine the characteristics and functionality of desktop and web applications by offering portability and usability. These applications [2] are key in receiving the data, interpreting it, and transmitting it to create useful information and decisions. The idea of the Internet of Things is a massive, fundamental shift. Numerous possibilities for new products and services are introduced by these intelligent Things. The Internet of Things is expected to change the day-to-day operations of virtually everything. The main issue with current IoT architectures is that they look at Things as private property. An organization or an individual owns a Thing and uses the functionalities provided by the Thing to achieve a task or to make a decision. Basically, only the owner of the Thing can make any use of the Thing. Things can be service providers to anyone. Any client should be able to find a Thing that provides a service that the client needs. This service can be an actual physical task, information needed for a decision, or a cyber task. Thing owners should be able to publish the services provided by their Things, and clients should be able to find Things that provide the required services. But current architectures for the implementation of IoT do not support this philosophy. Service-oriented Computing (SOC) [7] is a computing paradigm that uses service as the fundamental element for application development processes. An architectural model of SOC in which service is a first-class element is called Service-oriented Architecture [4]. Service-oriented architecture (SOA) [8] is the collection, communication, and sharing of services. It can involve sharing data or coordinating activity when connected. A service is a well-defined task that does not depend on the state of other services. SOA consists of three main modules: the service provider, the service requester and the service registry. The three main activities in SOA are service publication, service discovery and service provision. In the literature, these terms are used in the following sense. Service publication refers to service providers defining service contracts and publishing them through available service registries. Service discovery refers to the process of finding services that have been previously published and that meet the requirements of a service requester [9]. Typically, service discovery includes service query, service matching, and service ranking. Service requesters define their requirements as service queries. Service matching refers to the process of matching the service requester requirements, as defined in the service query, with the published services. Service ranking is the process of ordering the matched services according to the degree to which they meet the requester requirements. The ranking will enable the service requester to select the most relevant service from the list of candidate

services. Service provision refers to the process of executing a selected service. The execution may include some form of interaction between the service requester and the service provider. Our research investigates the use of SOA for the publication, discovery, and provision of the services provided by Things in the Internet of Things. Context has been defined [5] as the information used to characterize the situation of an entity. This entity can be a person, a place, or an object. Context is an important concept to consider when designing and implementing software. With the emergence of IoT, context [10] has become even more important. Some questions to ask when determining context are who will be using it, why, when, where, and what. It also consists of more specific aspects such as [11] location, time, temperature, and even users' emotions and preferences; these can be classified as scalar, vector, or abstract values. Selecting certain factors, such as platform, will eliminate a lot of context-of-use issues; however, when considering IoT, most cases will not be platform dependent. Smart devices being context aware is another interesting concept. Cell phones being completely context aware would require sensors and RFID tags. This would allow phones to be aware of locations and other information [11], which will limit the amount of interaction between the user and the device. Context can also be used in terms of service contracts, which are responsible for the fulfillment of agreed-upon conditions. Things are rich, context-aware entities. Our research extends the traditional definition of services with context, which will enable the specification and publication of the context information of Things.
III. PUBLICATION ARCHITECTURE
This section reviews our novel SOA-based architecture that was introduced in [3]. This architecture supports the publication and discovery of Things in the Internet of Things. The main goal of this architecture is to support the publication and discovery of trustworthy context-dependent Things. To achieve this goal, the architecture should consider the following criteria: • Context: Things are context-aware in nature. The consideration of context information in publication and discovery is essential to find the best match between the requester requirements and the available Things. • Dynamic context: The context information of requesters and Things is not static; it can change dynamically. The architecture should support dynamic change to best match clients and Things. • Scalability: As the number of Things increases drastically, it will be important that the architecture can adapt and scale with this increase. • Discovery: In order for a client to find Things, the architecture must support the discovery of Things. Discovering Things can be achieved by discovering the services provided by the Things.


Figure 1. SOA-based IoT Architecture

Search: After discovering the services provided by Things, the client should be able to search for the Things that best match his/her needs. • Ranking: The search process will be time consuming with the increased number of Things being published. To really make this architecture useful, the architecture should rank the results of the search according to the context of the Thing and the client. This ranking will decrease the amount of time the user spends searching for the service they desire. This ranking will also provide the user with the most relevant Things based on their context and needs. • Formality: The architecture should be formal. Formalism will enable the validation and verification of the functions provided by the architecture.
Figure 1 illustrates our newly introduced SOA-based architecture for the publication and discovery of Things in the Internet of Things. This architecture satisfies the seven criteria discussed above. Below is a brief review of the elements of the architecture. Registry: The registry is the entity that will be responsible for the publication, discovery, matching and ranking of Things. It will allow the Thing owner to publish his Things, and users to discover these Things. The Registry consists of four main elements: • Search: This element is responsible for the discovery of Things. It will enable clients to find Things that provide the services they are requesting. The discovery will be based on the client context and the context of published Things. • Rank: This element is responsible for ranking the Things that meet the client's requirements. It is very common that the registry will find multiple Things that meet the client's requirements. It is also common that


none of these Things provides an exact match to the client's requirements. Hence, a ranking is necessary. The ranking element will run ranking algorithms that rank Things according to the client's context and priorities. • Publish: This element is responsible for storing the information provided by the owners of Things. The information provided by the providers will be structured according to the Thing structure discussed in the next section. • Requestor Reviews: To ensure the correctness of the information contained in the registry, the registry will also allow Thing users to review Things after using them. The reviews completed by users will be managed by this element. A review can include how the service performed, the correctness of the description, and any experiences the client has had with the Thing owner. Requester: This entity represents the client that is trying to discover a Thing that provides a required functionality or service. The requester will directly interact with the registry. After discovering a Thing, the requester can use the Thing by directly interacting with it. The requester provides his context information to the registry to get the best match. Hence, the requester interacts with the following entity. • Context of User: This entity is responsible for collecting and structuring the context of the user. This entity is dynamic. It monitors the user context and updates the context information accordingly. The context information provided by this entity will be used by the client when communicating with the registry. Owner: This entity represents the owner of the Thing. The owner of the Thing will publish the Thing information in the registry to be available for discovery. The Thing owner will also enable the communication between clients and their Things. The information published by the Thing owner will be structured according to the Thing Architecture. • Thing Architecture: This entity defines the structure that the Thing owner should use to describe his Things. This is discussed in detail in the next section.
IV. THING PUBLICATION
The previous section discussed the architecture for the publication, discovery, and provision of Things. In this architecture, the Thing owner will publish the Thing information in the registry. This section introduces the structure of the Thing information that will be published. It introduces the informal and formal description of a Thing. A. Informal description of Things To ensure consistency and correctness, the information of the Thing will be structured according to the structure discussed in this section. Figure 2 illustrates the architecture


Figure 2. IoT Thing Architecture

for a Thing that can be published and discovered using our SOA-based architecture. A Thing consists of two main elements: non-functional and functional properties. Following is a discussion of the main elements of the Thing architecture. Functional Properties: This element defines the behavior and functionality provided by a Thing. It consists of the following entities. • Functionality: This element includes the following entities. – Outcomes of Service: This entity states the postconditions that are guaranteed by the Thing to be true after execution. It can also include the technologies used by the Thing. – Requirements: This entity describes the preconditions that should be satisfied by the Thing user before execution. • Example Data: This entity shows what the output will look like and how it will be formatted. Non-Functional Properties: This element specifies the non-functional properties that constrain the functionality provided by the Thing. This element is divided into the following entities. • Context of Thing: This entity describes all of the Thing's context; the context used will depend on the Thing. This will be both static and dynamic context. The point of having context will be to aid in pairing a requestor with the proper service based on their context. • Cost: This entity states the cost of requesting a functionality from a Thing. This can be either for a one-time use or a subscription-based service. • Constraints: This entity contains the non-functional guarantees that the Thing can provide. It includes the following. – Interoperability: This will define whether the Thing has the ability to work with other systems or products (the service will use or be used by other services).



It can also include information related to platform support. Security: This will describe the security mechanism used to secure data, and communication. Scalability: This discuss the Thing ability to scale to a large number of requests. It can contain the maximum number of requests it can handle per unit of time. Performance: This will display the performance information such as: How long the process takes? and How accurate is the data being received? Availability: This will describe the maximum amount of down time that is guaranteed by the Thing provider. It can also include the operating hours. Accessibility: This can include information about user location constraints, type of users allowed and Thing location.

B. Formal description of Things A Thing is formally defined using a model-based specification notation. The context information encapsulated in the non-functional part of the Thing is written in the notation introduced by Wan [12]. The formal definition will enable the verification of the claims made in the Thing description. Below we give the formal representation of the Thing elements. Let C denote the set of all such logical expressions. X ∈ C is a constraint. The following notation is used in our definition: • T denotes the set of all data types, including abstract data types. • Dt ∈ T means Dt is a datatype. • v : Dt denotes that v is either constant or variable of type Dt. • Xv is a constraint on v. If v is a constant then Xv is true. • Vq denotes the set of values of data type q. • x :: ∆ denotes a logical expression x ∈ C defined over the set of parameters ∆. A parameter is a 3tuple, defining a data type, a variable of that type, and a constraint on the values assumed by the variable. We denote the set of data parameters as Λ = {λ = (Dt, v, Xv )|Dt ∈ T, v : Dt, Xv ∈ C}. 1) Functional Properties: The functional properties section encapsulate the Functionality and Example data. A Thing can provide a single functionality. This functionality is defined to include the function Requirements and Outcome of Service. Definition 1: A Thing functional properties is a 2-tuple f p = hf, ri, where f is the functionality, and r is the function example data which represents the result format. The example data is defined as r = hm, qi, where m : string is the result identification name and q = {x|x ∈ Λ} is the


set of parameters. A functionality is a 2-tuple f = ⟨re, ot⟩, where the requirements re and the outcomes ot are data constraints. That is, re :: z, z ⊆ Λ and ot :: z, z ⊆ Λ. 2) Non-functional properties: The non-functional properties section encapsulates the following: Context: Context information and context rules are formally specified as part of the context of a Thing. These two parts provide the context-awareness ability of Things. Context information is formally specified, as defined in [12], using dimensions and tags along the dimensions. In our research, we have been using the five dimensions where, when, what, who, and why. Assume that the service provider has invented a finite set DIM = {X1, X2, . . . , Xn} of dimensions, and associated with each dimension Xi a type τi. Following the formal aspects of context developed by Wan [12], we define a context c as an aggregation of ordered pairs (Xj, vj), where Xj ∈ DIM and vj ∈ τj. A context rule is a situation which might be true in some contexts and false in some others. For example, the situation warm = temp > 30 ∧ humid > 75 is true only in contexts where the temperature is greater than 30 degrees and the humidity is greater than 75. Definition 2: A context is formalized as a 2-tuple β = ⟨r, c⟩, where r ∈ C is built over the contextual information c. Context information is formalized using the notation in [12]: Let τ : DIM → I, where DIM = {X1, X2, ..., Xn} is a finite set of dimensions and I = {a1, a2, ..., an} is a set of types. The function τ associates a dimension to a type. Let τ(Xi) = ai, ai ∈ I. We write c as an aggregation of ordered pairs (Xj, vj), where Xj ∈ DIM and vj ∈ τ(Xj). Cost: This element specifies the cost for requesting a functionality from a Thing. Definition 3: The service cost p is defined as a 3-tuple p = ⟨a, cu, un⟩, where a : N is the price amount defined as a natural number, cu : cType is the currency tied to a currency type cType, and un : uType is the unit for which the pricing is valid. As an example, p = (100, $, hour) denotes a pricing of 100$/hour. Constraints: This element represents the trustworthiness and quality properties that the Thing provider guarantees to be met before, during and after the provision of the functionality of the Thing. Definition 4: A constraint is a rule, expressed as a logical expression in C. A rule may imply another; however, no two rules can conflict. We write con = {y | y ∈ C} to represent the set of constraints. Putting these definitions together we arrive at a formal definition for the Thing non-functional properties. Definition 5: Non-functional properties is a 3-tuple nf = ⟨β, p, con⟩, where β is the context of the Thing, p is the cost of the Thing, and con is the set of constraints. Finally, the formal definition of a Thing is: Definition 6: A Thing is a 2-tuple Th = ⟨fp, nf⟩, where fp is the functional properties of the Thing, and nf is the


non-functional properties.
V. THING SPECIFICATION LANGUAGE
A Thing provider specifies the Thing specification using the architecture we introduced in the previous section. The Thing description has been formally defined to support formal verification. To support portability, in this section we introduce an XML-based language called Thing Specification Language (TSL). Following is a discussion of the structure of TSL. TSL represents a one-to-one mapping to the Thing structure presented in Figure 2. The two main elements of a Thing are functional properties and non-functional properties. Below is the XML schema for the main elements of a Thing.
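As an illustration, a top-level TSL description wraps the two property groups; the skeleton below is only a sketch with element names of our choosing rather than a normative schema:

<thing>
  <functionalProperties> ... </functionalProperties>
  <nonFunctionalProperties> ... </nonFunctionalProperties>
</thing>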






Below is a detailed discussion of the XML schema for each of these components.
A. Functional Properties
The functional properties definition includes the Functionality and Example Data. The Functionality part defines the function requirements and outcomes. Each Example Data entry is defined with an ID and a parameter list. Below is the XML schema for presenting functionality.
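A sketch of such a fragment, for a hypothetical temperature-sensing Thing (illustrative element names and values only):

<functionalProperties>
  <functionality>
    <requirements>
      <!-- precondition the Thing user must satisfy -->
      <condition>request carries a valid access token</condition>
    </requirements>
    <outcomesOfService>
      <!-- postcondition guaranteed after execution -->
      <condition>current temperature is returned in degrees Celsius</condition>
    </outcomesOfService>
  </functionality>
  <exampleData id="reading1">
    <parameter name="temperature" type="Real">21.5</parameter>
  </exampleData>
</functionalProperties>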
























B. Non-functional Properties
The non-functional properties associated with the Thing are listed in this section. They include context, cost and constraints. The XML schema for the structure of the non-functional properties is shown below.
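Structurally, this is a wrapper for the three parts named above; a minimal sketch (our element names):

<nonFunctionalProperties>
  <contextOfThing> ... </contextOfThing>
  <cost> ... </cost>
  <constraints> ... </constraints>
</nonFunctionalProperties>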






1) Cost: Cost information, which can itself be a complex property expressing different prices for different amounts of purchase, is a non-functional property. Below is the XML schema for specifying the non-functional property cost.







2) Context of Thing: The context part is divided into context info and context rules. The contextual information of the Thing provider is specified in the context info section. The situation, or context rule, that should be true for Thing delivery is specified in the context rules section. Below is the XML schema for defining context.















3) Constraints: This section lists the guarantees that a Thing owner can provide to the requesters of the Thing functionalities. It includes interoperability, security, scalability, performance, availability and accessibility constraints. Below is the XML schema for specifying constraints.









VI. RELATED WORK

This research has introduced an SOA-based architecture for the publication and discovery of Things in the Internet of Things. This section reviews related approaches that use SOA for IoT. Context, scalability, dynamic context, discovery, search, rank, and formality were the criteria used to review the related approaches. Each of these criteria is crucial for any architecture, as discussed in Section III.

In [13], the authors aim to integrate business systems with their manufacturing and logistics processes. This architecture does consider discovery, search, and ranking of Things, which is achieved by using a service catalogue and ranking services based on the program's needs at runtime. It also uses web services, which allows it to be scalable. However, this architecture does not consider context at all.

In [14], the authors look at the interaction between IoT and the environment, driven by the environment creating events rather than receiving input from the user. This architecture allows discovery and searchability of Things using an IoT Browser, but it does not have a way to rank them. The browser also allows users to view statistics and certain context of sensors and devices based on current events. The events themselves are viewable and dynamic in nature. This approach is formal, and does show some coding


and algorithms. However, this research does not consider scalability.

In [15], the authors consider IoT Things as resources and build an architecture around that view. Like other approaches, this one does consider discovery, by using a service broker, but there is no mention of being able to search or rank these services. Besides discovery, this approach also allows for scalability in its architecture by using the Web of Things. It does not consider any type of context.

In [16], the authors introduce a Discovery-Driven SOA. The architecture allows for the discovery of Things, applications, and middleware. However, there is no mention of ranking Things, and only a brief mention of being able to search for Things using a query. By using SOA for services, middleware, computing, storage, and applications, there is support for scalability in this architecture. There was no consideration of context.

In [17], the authors are heavily concerned with dynamic context-awareness through context-aware applications. This architecture, however, does not allow for any scalability. It also does not allow for any discovery, due to its one-to-one behavior.

Table I summarizes our study of the related work. A common theme in all of these architectures is that either some type of context awareness is provided with little to no discovery, or discovery is provided without context. None of these architectures supports all the comparison criteria. Our architecture, on the other hand, satisfies all of them.

Table I. RELATED WORK

Criterion          [13]   [14]   [15]   [16]   [17]
Context            No     Some   No     No     Yes
Scalability        Yes    No     Yes    No     No
Dynamic Context    No     Yes    No     No     Yes
Discovery          Yes    Yes    Yes    Yes    No
Search             Yes    Yes    No     Some   No
Rank               Some   No     No     No     No
Formal             No     Some   No     No     No

VII. CONCLUSION

We have introduced an SOA-based architecture for the publication and discovery of Things in IoT, together with an architecture for Thing description. The architecture is formally defined using set theory. A prototype has also been implemented; it simulates the behavior of the registry. In this prototype, Thing owners were able to publish information about their sensors using our XML language, and clients were able to search for sensors and request live weather information. Our future work includes a complete implementation of the publication and discovery architecture. It will also include the use of


cloud computing to implement the registry. This will enable our registry to be scalable using cloud resources.

REFERENCES

[1] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, “Internet of things (iot): A vision, architectural elements, and future directions,” Future Gener. Comput. Syst., vol. 29, no. 7, pp. 1645–1660, Sep. 2013.
[2] P. Ray, “A survey on internet of things architectures,” Journal of King Saud University - Computer and Information Sciences, 2016.
[3] N. Ibrahim and B. Brandon, “Service-oriented architecture for the internet of things,” in The 4th Annual Conference on Computational Science and Computational Intelligence (CSCI’17), Dec 2017, pp. 1004–1009.
[4] T. Erl, SOA Principles of Service Design. Upper Saddle River, NJ, USA: Prentice Hall PTR, 2007.
[5] A. K. Dey, “Understanding and using context,” Personal Ubiquitous Comput., vol. 5, no. 1, pp. 4–7, 2001.
[6] N. Ibrahim, “A service-oriented architecture for the provision and ranking of student-oriented courses,” ICST Trans. e-Education e-Learning, vol. 3, no. 9, p. e2, 2016.
[7] D. Georgakopoulos and M. P. Papazoglou, Service-Oriented Computing. The MIT Press, 2008.
[8] N. Ibrahim, V. Alagar, and M. Mohammad, “A context-dependent service model,” EAI Endorsed Trans. Context-aware Syst. & Appl., vol. 1, no. 2, p. e3, 2014.
[9] M. P. Papazoglou, Web Services: Principles and Technology, 1st ed. Prentice Hall, 2008.
[10] D. Shamonsky, “Internet of things: Context of use just became more important,” https://www.ics.com/blog/internet-things-context-use-just-became-more-important, January 2015.
[11] L. Yu, Y. Yang, Y. Gu, X. Liang, S. Luo, and F. Tung, “Applying context-awareness to service-oriented architecture,” in 2009 IEEE International Conference on e-Business Engineering, Oct 2009, pp. 397–402.
[12] K. Wan, “Lucx: Lucid enriched with context,” PhD thesis, Concordia University, Montreal, Canada, January 2006.
[13] P. Spiess, S. Karnouskos, D. Guinard, D. Savio, O. Baecker, L. M. S. d. Souza, and V. Trifa, “Soa-based integration of the internet of things in enterprise services,” in Proceedings of the 2009 IEEE International Conference on Web Services, ser. ICWS ’09. Washington, DC, USA: IEEE Computer Society, 2009, pp. 968–975.
[14] L. Lan, B. Wang, L. Zhang, R. Shi, and F. Li, “An event-driven service-oriented architecture for the internet of things service execution,” International Journal of Online Engineering, vol. 11, no. 2, pp. 4–8, 2015.
[15] V. Vujovic, M. Maksimovic, D. Kosmajac, and B. Perisic, “Resource: A connection between internet of things and resource-oriented architecture,” in Smart SysTech 2015; European Conference on Smart Objects, Systems and Technologies, July 2015, pp. 1–7.
[16] D. Georgakopoulos, P. P. Jayaraman, M. Zhang, and R. Ranjan, “Discovery-driven service oriented iot architecture,” in 2015 IEEE Conference on Collaboration and Internet Computing (CIC), Oct 2015, pp. 142–149.
[17] C. M. Olsson and O. Henfridsson, Designing Context-Aware Interaction: An Action Research Study. Boston, MA: Springer US, 2005, pp. 233–247.


WBAS: A Priority-based Web Browsing Automation System for Real Browsers and Pages

Kenji Hisazumi System LSI Research Center Kyushu University Fukuoka, Japan [email protected]

Taiyo Nakazaki Electrical Engineering and Computer Science Kyushu University Fukuoka, Japan

Takeshi Kamiyama Information Science and Electrical Engineering Kyushu University Fukuoka, Japan

Akira Fukuda Information Science and Electrical Engineering Kyushu University Fukuoka, Japan

Abstract—Measuring the characteristics of Web browsers plays an important role in developing web pages and browsers, because we browse web pages extensively in daily life. Some studies measure the features of Web browsers for each Web page by giving input during browsing, but manual input requires time and effort. An automated input method was therefore proposed in previous research, but it has problems such as not being able to cover all possible inputs on a Web page. This paper proposes a priority-based automatic input generation system to address this challenge. The system gives as many of the inputs contained in the target Web page as possible, in left-to-right order on the page.

I. INTRODUCTION

Suppose that we want to measure the characteristics of both a mobile Web browser and a site used by end users. For example, we may want to identify the correlation between the libraries used by each site and its power usage. It would be a valuable contribution for both end users and Web developers to know which libraries tend to consume substantial energy in actual usage. To realize such applications, it is necessary to browse the site and measure the power while automatically giving mouse clicks and key inputs on the real browser. For this kind of purpose there are crawling tools such as PhantomJS [1] and browser-automation frameworks such as Selenium [2]. They provide extensive functionality for automated traversal of a site. However, we believe that these tools are inappropriate for measuring the characteristics of both the browser and the sites, because they do not perform all the functions that real browsers do, such as rendering and display. Therefore, in this research, we propose the Web Browsing Automation System (WBAS), which automatically performs browsing while controlling an actual browser externally. WBAS simulates human user inputs as much as possible. This input method enables us to gather natural information.

Yuki Ishikawa Information Science and Electrical Engineering Kyushu University Fukuoka, Japan

The contributions of this paper are as follows:
• Proposes an automated traversal process that gathers the needed information and generates mouse- and key-related input events while traversing, using a Web browser that is normally used by end users.
• Demonstrates a prototype implementation of WBAS using Google Chrome for personal computers.
• Evaluates the performance of WBAS in terms of site coverage, coverage of event types, and execution time.

This paper is organized as follows: Section II discusses the requirements for WBAS, which is used to measure characteristics while it drives a real browser. Section III presents its design, and Section IV describes an implementation of WBAS based on that design using Google Chrome. In Section V, we present a preliminary evaluation of the proposed system. We discuss related work in Section VI and conclude the paper in Section VII.

II. THE REQUIREMENTS

This section discusses the requirements for the Web browsing automation system (WBAS). WBAS is a system that feeds inputs to actual web pages of websites in a real Web browser, in order to measure the characteristics of both the web browser and the website. Here we clarify the supposed environment of the system as follows (Fig. 1). We use an actual Web browser as a part of WBAS to measure the behavior of all the browser components involved from page request to display. It uses only externally observable information from a target website to cover as many pages as possible. As discussed in the motivational example above, we cannot assume that it is possible to acquire internal information such as the structure of the site, the information that can be input and its position, or the expected response at the time of input.


III. DESIGN OF THE WBAS

This section discusses the design of WBAS. WBAS should traverse as many pages as possible in the target site. We itemize how it explores pages as follows:

1. Set the top page of the website as the target page.
2. For each page:
   A) Identify input components from the target web page.
   B) Identify input coordinates for each component obtained in A).
   C) Filter and sort the input components according to their priority.
   D) Execute the inputs.
   E) Record the URLs found on the target page, such as the links included in the page.
3. Recursively execute step 2 with each URL recorded in step 2 as the target. The target URLs are prioritized, and the prioritized method avoids exploring the same page twice.
4. The traversal finishes when there is no recorded URL left.

Fig. 1 System Overview


We discuss these in order in this section.


A. Identify Input Components from a Target Web Page

The system should identify all the inputs we want to give the target page. The system obtains the document object model (DOM) built from the given HTML document. The DOM contains links, defined by anchor tags, and event listeners which handle and respond to inputs. We treat these links and event listeners as the input components to which the system should give input.


Fig. 2 Statistics of Used Events

To satisfy these constraints, the system should have the following functions:

• Controlling a real Web browser to perform web browsing and measure characteristics.
• Identifying all the targets to be input, and their positions, from the given source code such as HTML [3] and JavaScript [4].
• Detecting changes when something is input into the web browser, to judge whether inputs are reflected correctly; for instance, the browser displays new information according to the input.
• Traversing the pages of the site comprehensively without internal information.

To generate an input, we obtain the coordinates where the input component is located and give that location the input corresponding to the event. We could also give the input directly to the event listener stored in the DOM, but we do not take this approach: a Web page sometimes has unclickable or invisible elements, so executing the event listener directly would differ from actual user behavior.

We discuss the types of events to be input next. To identify these, we investigated the event listeners acquired from the top pages of 19 websites, selected from the most-accessed Japanese and English sites listed by Alexa [5]. Fig. 2 shows the result. The figure shows that many Web sites contain not only mouse events such as click and mousedown, but also keyboard events such as keyup and keydown. The system should therefore handle both mouse and key events.

B. Identify Coordinates of Inputs

The system gives inputs at the coordinates where the input component is located, so we need to identify those coordinates. Each component in the DOM has x and y coordinates, as well as a height and width. Because the component has extent, we have to decide the location where the system


inputs. We input the event at the center of the component. We can calculate the center (x0, y0) of a component as follows:

x0 = x + w/2
y0 = y + h/2

where (x, y) are the coordinates of the upper-left corner, w is the width, and h is the height of the element.

C. Filter and Sort Input Components According to Priority

The system performs filtering and sorting according to the priority of the input components. The filtering ensures that a component is not given the same input twice. The prioritization simulates the order in which a human user is likely to provide input. Here, we discuss how to prioritize the input components. The priority is defined as the degree to which a component should be input first. We expect the system to simulate user behavior while browsing and providing input to the Web page, in order to measure natural characteristics of the sites. Buscher et al. [6] clarify which part of a page users attend to first. They analyzed the gaze of 15 subjects while they browsed Web pages. As a result, they found that users gaze closer to the left side of the screen and tend to pay attention to the left side first. Therefore, the system assigns higher priority to events arranged toward the left side and generates the inputs in priority order, so that the event listeners in the Web page fire in the natural order of human gaze. Based on this idea, the system sorts the input components according to x0 of their center coordinates (x0, y0), in ascending order of x0. The coordinates are absolute coordinates with the origin at the upper-left corner of the Web page. The browser, however, displays only part of the page; it cannot display the whole page at once. To generate inputs located at coordinates outside the screen, the system has to scroll the screen and display the hidden components corresponding to the input. To do this, we divide the screen containing the input components by the height of the display. We call each divided screen a strip (Fig. 3). The system sorts strips according to their y coordinates in ascending order, and prioritizes according to x0 of each component within the strip.

Fig. 3 A Page is Divided into some Strips
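The following is a minimal Python sketch of this strip-based prioritization (our own illustration, not part of WBAS; the dictionary-based component representation is an assumption):

# Illustrative sketch of the strip-based input prioritization described above.

def center(comp):
    """Center point (x0, y0) of a component given x, y, width, and height."""
    return (comp["x"] + comp["w"] / 2, comp["y"] + comp["h"] / 2)

def prioritize(components, display_height):
    """Group components into strips of one display height, order strips
    top-to-bottom, and order components left-to-right within each strip."""
    strips = {}
    for comp in components:
        x0, y0 = center(comp)
        strips.setdefault(int(y0 // display_height), []).append((x0, comp))
    ordered = []
    for strip_index in sorted(strips):                                  # strips by y
        for _, comp in sorted(strips[strip_index], key=lambda t: t[0]):  # by x0
            ordered.append(comp)
    return ordered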

D. Execute Inputs

The system generates inputs based on the priorities discussed above. First, it deals with the top strip, generating inputs to the input components in order of their x coordinates. After it completes the inputs for all target components in the strip, it scrolls the screen down by the height of the display and performs the inputs for the next strip, while it can still scroll down. The inputs may cause state changes, such as transitions through links. When an input causes a transition, the system records the URL of the transition and goes back to the original target page to continue its traversal.

E. Record the URLs on the Target Page

While executing inputs, the system detects and collects URLs on the target page for recursive browsing. We employ the two URL-detection methods used by VeriWeb [7], as follows:

• Record the URLs stored in the href attribute of anchor tags, which represent transition destinations from the target page.
• Record the URLs transitioned to by event execution.

IV. THE IMPLEMENTATION

This section describes how we implement the method described in the design. The implementation employs Google Chrome [8] as the web browser. Google Chrome is developed on top of Chromium, an open-source browser, and has the top market share in the world. We describe, in order, the development environment for the browser, the method of acquiring the event listeners, and the method of generating the inputs explained in the previous section.

A. Development Environment

We use chrome-remote-interface [9], a package that runs on node.js [10] and can use the API of the Chrome Developer Tools [11], as our environment. The Chrome Developer Tools are web-development tools included in the Google Chrome browser. Using the Chrome Developer Tools, it is possible to browse the HTML of the currently displayed web page, acquire the DOM, and edit it. The Chrome Developer Tools also have an API; by calling that API from the console, one can search for specific elements in the web page, generate inputs, and get event listeners. Node.js is a JavaScript environment that runs on Chrome's V8 JavaScript engine, and chrome-remote-interface is a node.js module which connects to the Chrome Developer Tools.


B. Launch the WBAS

When the user launches WBAS, it launches a web browser and opens a Web page given by the user. We can then observe the status of the Web browser.

C. Obtain Event Listeners

We obtain event listeners using getEventListeners(), provided in the API of the Chrome Developer Tools. The function takes a selector as an argument and returns all types of event listeners contained in the elements specified by the selector on the target page. We can acquire every event listener on the page by calling the function with the root element of the DOM, which is translated from the target HTML, as the selector. Unfortunately, this does not always determine the component which handles the events, such as a text field, and we need to identify the component to obtain its coordinates. Therefore, we identify the component from the argument of the event listener. After obtaining the component information from the argument, the system has all the needed information: the type of the event listener and its coordinates.

D. Generate Inputs

The system generates five types of raw events which can be produced with APIs of the Chrome Developer Tools. We combine these five raw events to generate the inputs handled by HTML5 and JavaScript code, simulating inputs from the user. We describe these in order.

1) Generate Raw Events

The system needs to send a mouse event or key event to specific coordinates to execute an event listener. The dispatchMouseEvent() function in the API can generate a mouse event at specified coordinates. This API generates four types of mouse event and takes as arguments the coordinates (x, y) at which the event occurs. The system uses three of these event types: mousedown, mouseup, and mousemove. mousedown pushes the mouse button at the specified coordinates, mouseup releases the button at the specified coordinates, and mousemove moves the mouse from the coordinates. The dispatchKeyEvent() function in the API generates a key-related event. This API produces four kinds of key event: keydown, keyup, rawkeydown, and char. It takes a key code corresponding to a specific key as an argument. We employ two types of key event, keydown and keyup: keydown is the event fired when the user presses a key, and keyup is the event fired when the user releases a key.

2) Generate Inputs Handled by HTML5 and JavaScript

The system generates 11 types of inputs handled by HTML5 and JavaScript by combining these five types of raw events using the APIs discussed above. Table 1 shows how each type of input is produced using raw events. We can simulate a click input by generating mousedown and mouseup consecutively, even though there is no way to generate a click event directly. For mouseout, it is

necessary to move the mouse cursor away from the target element, so we generate a mousemove from the center of the element to a point outside it. We use the target element's width + 0.1 px as the outside coordinate. focus and blur are events that occur when an element gains and loses focus, respectively, and they are generated when entering characters into an input form. We simulate a focus event with the same inputs as a click, and a blur event by clicking outside the element's coordinates after clicking it.

Table 1. The Event and the Actual Input Events

Event        Actual Input Events
click        mousedown → mouseup
mousedown    mousedown
mouseup      mouseup
mousemove    mousemove to x + 0.1 (px)
mouseover    mousemove to x + 0.1 (px)
mouseout     mousemove to x + 0.1 + width (px)
keydown      click → keydown
keyup        click → keyup
keypress     click → keypress
focus        click
blur         click → click outside the element
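As an illustration of this composition (a sketch of ours, not the authors' implementation), the mapping in Table 1 can be expressed as follows; send_raw() is a placeholder for whatever call actually dispatches a raw event to the browser:

# Sketch of composing raw browser events into the higher-level inputs of Table 1.

def send_raw(event_type, x, y, key=None):
    print(f"raw {event_type} at ({x}, {y}) key={key}")   # placeholder action

def simulate(event, x, y, width=0.0):
    """Generate the raw-event sequence for one HTML5/JavaScript input event."""
    if event == "click":
        send_raw("mousedown", x, y)
        send_raw("mouseup", x, y)
    elif event in ("mousemove", "mouseover"):
        send_raw("mousemove", x + 0.1, y)
    elif event == "mouseout":
        send_raw("mousemove", x + 0.1 + width, y)         # move off the element
    elif event in ("keydown", "keyup", "keypress"):
        simulate("click", x, y)                           # give focus first
        send_raw(event, x, y, key="a")                    # tentative key, as in Sec. V
    elif event == "focus":
        simulate("click", x, y)
    elif event == "blur":
        simulate("click", x, y)
        simulate("click", x + width + 1, y)               # then click outside it
    else:                                                 # mousedown, mouseup
        send_raw(event, x, y)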

V. EVALUATION

In this section, we evaluate the proposed WBAS. The evaluation aims to measure the versatility, coverage, and execution time of the system. We run WBAS on commonly browsed websites to measure versatility. We also measure coverage by giving the input events shown in Table 1 to real websites and confirming that the given inputs are reflected. Finally, we measure the execution time of WBAS.

A. Evaluation Procedure

This section describes the evaluation procedure of WBAS. We employ 19 Japanese and English websites from the world access ranking Top 50 sites by Alexa [5]. The reason for excluding sites that are not in English or Japanese is that we cannot verify the results regarding readability. We measure the coverage for each of the 19 target sites. To limit execution time we bound the number of pages to traverse: the system accesses just five pages reachable from the top page of each target site. If fewer than 100 events have been executed after accessing five pages, the evaluation continues until 100 events are executed. We calculate the execution coverage Ce as follows:

Ce = (Echangedom / Einput) × 100

where Einput is the number of actually input events and Echangedom is the number of events which change the DOM. We assume that an event is reflected on the Web page if giving the input changes the DOM. The system detects this change by comparing the DOM before and after each input. We just ignored the target elements


Next, Fig. 6 shows the execution coverage for each event shown in Table 1 across all 19 sites as a box plot.


Fig. 4 The coverage for each Web sites



Fig. 5 Statistics of Used Events

On the other hand, the execution coverage of Reddit, Twitter, and Facebook is lower than that of the other Web pages. The reason is that when an input is given on these pages, new components are displayed, and subsequent inputs are not reflected until that screen is closed. Accordingly, when new components appear, it is necessary to interrupt the input and take countermeasures, such as closing the screen, to restore the original state.


Fig. 4 shows that the variance of the coverage is relatively small for Pornhub, Wordpress, Twitch, and Imgur. Comparing Fig. 4 and Fig. 5, we found that the variation is small on sites where the proportion of click events is high. On these pages, the execution coverage is high because many of the inputs cause page transitions when they are given.


Next, we show and discuss the results of the evaluation. Fig. 4 shows the execution coverage when traversing the 19 Web sites as a box plot for each target site. Fig. 5 shows the proportion of events included in each site.


B. Results

1) Coverage


We measure the execution time between opening a target page and completing the execution of all event inputs. The raw execution time includes the load time of the page, which varies for each page and condition; we measure the load time separately to remove its effect from the execution time. The execution time also includes a 0.5-second waiting time after inputting each event, to wait for the DOM to change. The system waits one second after inputting click and mousedown events, because these events can cause a page transition, which needs more time to stabilize. We measure the time while inputting 100 events in order to measure the time fairly across sites.


which are located at (0, 0) or at negative coordinates, because there is no change in the DOM when we try to input events to such elements. For key-related event listeners such as keydown, keyup, and keypress, we tentatively give the letter ‘a’ as the parameter.


Fig. 6 The coverage for each type of event

keydown, keypress, and blur show higher execution coverage than the other events. Input forms typically have keydown and keypress handlers, and these easily reflect their behavior in the DOM, so the coverage can be high. The coverage of the blur event is also high; blur is an event listener that fires when a form loses focus after having been focused once, and the system can handle it properly. The coverage of mouseup is low. mouseup is an event that fires when the mouse button is released after being pressed; it is often incorporated together with mousedown in the same element, but it is sometimes used alone. When mouseup is used alone, the DOM changes only instantaneously, and the change is not observable by the system. The keyup coverage is also low. keyup is an event that fires at the moment a key is released and is typically used for displaying search candidates in a text field. The system cannot input the keyup event properly in the case of newly appearing


elements that overlap the target due to another event's execution. This is a problem caused by the system inputting events continuously to the target site. To avoid this problem, we could give priority to event listeners included in the same element, or perform an action that returns the page to the same state as before the input.

2) Execution Time

Fig. 7 shows the execution times until 100 events are executed for each Web site. The net execution time is indicated in blue, the loading time for page transitions in orange, and the input waiting time in yellow.


Fig. 7 Execution Time

We tried to shorten the standby time within the net execution time, but the coverage decreased because the system could no longer detect changes of the DOM correctly; the standby time mentioned above is therefore needed for the system to work properly. The net execution time, which excludes the load and input waiting times, becomes longer as the number of pages increases. Especially for Office, it takes a lot of time to analyze even pages with no event listeners. To reduce the analysis time, we need tuning, such as filtering, to avoid this analysis. Also, if we want to obtain results from the automatic input of multiple Web sites, we can run our system in parallel to reduce execution time; this can hide the loading and standby time.

VI. RELATED WORKS

The work in [12] proposes an automated input generation method which addresses challenges similar to those of our proposed method. However, that method and implementation support only five mouse-related events such as click and mouseup. It is important to support not only mouse events but also key events such as keyup to operate target sites more comprehensively. Dynodroid [13] is an automated input generation system for Android applications. Dynodroid employs an observe-

select-execute cycle. The observer stage analyzes the state of the phone for two types of events, UI events and system events, and specifies what kind of event is expected. The selector stage chooses one event, using a selection algorithm called BiasedRandom, from the expected events specified by the observer. The executor executes the selected event and reflects the resulting state on the phone. Dynodroid repeats this cycle and enables the automated generation of consecutive inputs. Our system also employs this observe-select-execute cycle. We, however, did not employ the BiasedRandom algorithm, for two reasons. First, due to the nature of the Web system, we can identify the places where we should provide input; BiasedRandom suits systems in which it is difficult to identify what type of input can be accepted by the application. Second, we want to simulate human inputs to the Web system, as described in Section II.

VII. CONCLUSION

This paper proposes the Web Browsing Automation System (WBAS), which automatically performs browsing while controlling an actual browser externally. WBAS simulates human user inputs. This input method enables us to gather natural information which reflects real usage patterns of the Web system. We propose automated traversal steps to gather the needed information and to provide the mouse- and key-related events for traversing websites. We implemented a prototype of WBAS using Google Chrome for personal computers, and we demonstrated a performance evaluation. It shows that the coverage varies according to the target site, but we believe that we can exercise the target sites sufficiently. We still have some challenges in WBAS, as follows. The system should be improved regarding coverage; we can identify several causes of low coverage. For example, we currently cannot handle events in newly appearing elements, such as pulldown menus drawn by JavaScript. We also have challenges related to resetting to the original state after firing an event, which can be a cause of lower coverage. In addition, we have to give natural text inputs which take into account the field name, etc. Performance issues are also a main challenge in our system.

REFERENCES

[1] A. Hidayat, “PhantomJS,” http://phantomjs.org, [Accessed May 2018].
[2] Selenium Project, “Selenium,” https://www.seleniumhq.org/, [Accessed May 2018].
[3] HTML, MDN Mozilla Developer Network, https://developer.mozilla.org/en/docs/Web/HTML, [Accessed May 2018].
[4] JavaScript, MDN Mozilla Developer Network, https://developer.mozilla.org/en/docs/Web/JavaScript, [Accessed May 2018].
[5] Alexa, “The top 500 sites on the web,” https://www.alexa.com/topsites, [Accessed May 2018].
[6] G. Buscher, R. Biedert, D. Heinesch, and A. Dengel, “Eye Tracking Analysis of Preferred Reading Regions on the Screen,” Proceedings of CHI '10 Extended Abstracts on Human Factors in Computing Systems, CHI EA '10, ACM, pp. 3307–3312, DOI: 10.1145/1753846.1753976, 2010.
[7] M. Benedikt, J. Freire, and P. Godefroid, “VeriWeb: Automatically Testing Dynamic Web Sites,” Proceedings of the 11th International World Wide Web Conference (WWW 2002), pp. 654–668, 2002.
[8] Google, “Google Chrome,” https://www.google.com/chrome/, [Accessed May 2018].


[9] chrome-remote-interface, https://www.npmjs.com/package/chrome-remote-interface, [Accessed May 2018].
[10] Node.js, https://nodejs.org/, [Accessed May 2018].
[11] Chrome DevTools, https://developers.google.com/web/tools/chrome-devtools/, [Accessed May 2018].
[12] Y. Ishikawa, K. Hisazumi, and A. Fukuda, “An Automated Input Generation Method for Crawling of Web Applications,” Proceedings of

the 2017 International Conference on Software Engineering Research and Practice, pp. 81–87, 2017.
[13] A. Machiry, R. Tahiliani, and M. Naik, “Dynodroid: An Input Generation System for Android Apps,” Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, ACM, pp. 224–234, DOI: 10.1145/2491411.2491450, 2013.


Clone Detection Using Scope Trees

M. Mohammed and J. Fawcett
Department of Computer Science and Electrical Engineering, Syracuse University, Syracuse, USA
[email protected], [email protected]

Abstract - Code clones are exact or nearly exact matching code segments in software source code. Clones make software maintenance difficult. Identification and management of code clones is an important part of software maintenance. This paper proposes a clone detection system that uses a scope tree representation of the structure of source files, using several measures of similarity. The method can be used for detecting clones in C, C++, Java, and C#. An experimental clone detection tool is evaluated based on its analysis of OpenSSL and Notepad++. Using scopes makes the clones an effective guide for code restructuring. Our approach is also compared with other clone detection tools: CCFinder and Deckard. The results are encouraging both in terms of efficiency as well as detection accuracy. In addition, we expect our method to scale well for large systems based on its computational complexity and the fact that much of the analysis is readily implemented with concurrent processing of each file analyzed. Keywords: Clone Detection, Maintenance, Restructuring, Scope Tree, Token.

1. Introduction A code clone is a segment of source code which is the same or similar to another segment in the same or another file. If software has clones, maintenance becomes burdensome [4]. When a change is required for a code segment having corresponding clones in other parts of the source, the change may be needed in all corresponding parts. Therefore, detecting and managing code clones is an essential part of software maintenance. Code may be cloned for several reasons [5]. Quick fixes using copy-and-paste is the most common one. Code cloning has pros and cons [6]. There is extensive research on clone detection. Some of the tools discussed, such as CCFinder, detect clones using methods focused on code segment texts and miss modified but important clones [1]. Others use AST based approaches to detect relevant clones. The drawback of these is that they discourage use in software maintenance process as building the AST itself is a complex process, even using available compiler components, like LLVM. Also, the resulting clones may not be suitable for restructuring.

We propose a clone detection method using scope trees, similar to, but significantly simpler than ASTs, that yields relevant clones in software source code, for multiple files, relatively efficiently. The approach builds scope trees for each file using a light weight parser with tokens stored in blocks for a given scope. Structural matches within a file and among files is found using a structural match algorithm by comparing scope trees. Once a structural match is detected, tokens for corresponding scopes are compared using string comparison for exact or near matchings. An edit distance algorithm [13] is also adapted to detect modified code. The rest of the paper is organized as follows. Part II gives background about clone detection and scope trees. The proposed clone detection method is presented in part III. Details of the major algorithms is provided in part IV. Part V gives information about the implementation of our analysis tool. Part VI describes results obtained by running the clone detection tool with example test source code, and two open source software system codes –to examine clone detection capability and to explore refactoring opportunities. Part VII concludes with a summary of results and planned future work. Part VIII summarizes related clone detection work.

2. Background This part gives a brief overview of clone detection and scope trees. Part 2.1 discusses clones; the concept of scope trees and how they are used in this paper is discussed in the second part, 2.2.

2.1. Code clone basics There is no unique agreed upon single definition of code clone [4]. However, several clone types have been defined [2, 3, 4, 5]: Type 1(exact clones) - these are fragments of code which are identical except for variation in white space and comments. Type 2 (renamed/parameterized clones) – fragments of code which are structurally/syntactically the same except for changes in identifiers, literals, types, layout and comments. Type 3 (near miss clones) – fragments of code that are copied with further modifications such as statement insertions/deletions in addition to changes in identifiers, literals, types and layouts.


Type 4 (semantic clones)– fragments of code which are functionally similar without having text similarity.

similar to McCabe's cyclomatic complexity [10]. However, the computation and its purpose are different.

In this work, we will address efficient detection and management of all the clone types except for type 4. In the next section, we focus on the method we used for clone detection.

3. Proposed method of clone detection

2.2. Scope trees A scope Tree is the representation of source code using a tree structure with each scope being a node in the tree. A scope is usually delimited by open and closed braces. Functions, if, else, switch statements, loops, enums, classes, structs, and namespaces are all scopes. An example is shown in Fig. 1 for pseudo code described in Table 1, File 1.

Table 1 - File 1 pseudocode

Fun( )
{ B0
  if(..)
  { B2 }
  else
  { B3 }
  B4
  for(...)
  { B5
    if(..)
    { B51 }
  }
  B6
}

3.1. Description of clone detection computation

The proposed method uses scope trees to detect clones in software system source code. The process is illustrated in Figure 2. The block diagram shows four major steps; the first three are discussed in the following subsections, and the last one is discussed in Part 4.

3.1.1. Tokenizer

This step takes the path to a set of files to analyze and generates a stream of tokens for each file. The result is given to the next step, the rule-based parser.

This section discusses our clone detection method. First, a general overview is given using block diagrams. The last step, clone detections, shown in the following diagram is explained in detail, in section 4.


Fig. 1. Example file, File 1: simplified source and scope tree representation

The source file, File 1, is represented as scopes having blocks, where a block is a sequence of tokens representing statements in the source file. In the scope tree diagram, the blocks are stored in a list, and the index indicates where a block is found relative to the child scopes. For instance, B0,0 means block B0 is located before all the child scopes, whereas B4,2 means B4 is found next to the second child scope. In addition, scope cyclomatic complexity, henceforth called cyclomatic complexity, is defined based on the scope tree hierarchy: a leaf node has a cyclomatic complexity of one, and an internal node has a cyclomatic complexity equal to the sum of the complexities of its child nodes plus one. This complexity is

Fig. 2. Scope tree-based clone detection block diagram 3.1.2. Rule based parser Rules are specified to extract data from source code. It takes the tokens from the previous step and generates scopes such as namespaces, classes, functions, conditions, loops and even just blocks separated by curly braces. See [11] for details. 3.1.3. Scope tree builder This step builds a scope tree based on the containment of one scope in another. The output of this stage is scope tree,


like Figure 1, for each file, which will be used to check structural similarity in the next stage. Algorithm 1 describes how a scope tree is built. Token block extraction is also done by the same algorithm, which scans tokens, checks for the start and end of a scope, and acts accordingly.

Algorithm 1 - Scope tree builder

BuildScopeTree()
{
  root = CreateNode("file")
  BuildScopeTree(root)
}

BuildScopeTree(current_node)
{
  for each tok of tokenized file
  {
    if (start of scope)
    {
      scope_node = CreateScopeNode()
      current_node.stack.push_back(scope_node)
    }
    else if (end of scope)
    {
      current_node.blocks.push_back(block)
      current_node.stack.pop()
    }
    else
    {
      push token to block
    }
  }
}
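As an informal illustration of the same idea (ours, not the authors' C++ parser), a scope tree can be built in a few lines of Python, treating tokens as strings and using '{' and '}' as scope delimiters:

# Illustrative sketch of scope-tree construction; nodes are plain dictionaries.

def build_scope_tree(tokens):
    root = {"kind": "file", "children": [], "blocks": []}
    stack, block = [root], []
    for tok in tokens:
        if tok == "{":                       # start of a scope
            node = {"kind": "scope", "children": [], "blocks": []}
            stack[-1]["children"].append(node)
            stack[-1]["blocks"].append(list(block)); block.clear()
            stack.append(node)
        elif tok == "}":                     # end of a scope
            stack[-1]["blocks"].append(list(block)); block.clear()
            stack.pop()
        else:                                # ordinary token: extend current block
            block.append(tok)
    root["blocks"].append(block)
    return root

tree = build_scope_tree(["void", "f", "(", ")", "{", "x", "=", "1", ";", "}"])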

In addition, the cyclomatic complexity of each node is computed by assigning the value 1 to leaf nodes and the sum of the cyclomatic complexities of the child nodes plus 1 to internal nodes. To make the structure-matching step efficient, nodes are ordered by their cyclomatic complexity and placed in buckets. The maximum bucket is determined from the cyclomatic complexities of the root nodes, and nodes having the same cyclomatic complexity are stored in the same bucket. For example, the root node of File 1 in Figure 1 is stored in bucket 6, as it has a cyclomatic complexity of 6, and function fun of File 1 is stored in bucket 5, as it has a complexity of 5. Buckets may be combined to detect modified code at the structural level. There is only one scope structure that spans all of the analyzed files. The next part discusses the clone detection step.
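The complexity computation and bucketing can be sketched as follows in Python (an illustration of ours, not the authors' tool; the dictionary-based node representation is an assumption). The example at the end reproduces the complexities quoted above for File 1.

from collections import defaultdict

def complexity(node):
    """Leaf scopes have complexity 1; internal scopes add 1 to the sum of
    their children's complexities."""
    if not node["children"]:
        return 1
    return 1 + sum(complexity(child) for child in node["children"])

def bucketize(roots):
    """Group every scope node (from all analyzed files) by its complexity."""
    buckets = defaultdict(list)
    def visit(node):
        buckets[complexity(node)].append(node)
        for child in node["children"]:
            visit(child)
    for root in roots:
        visit(root)
    return buckets

# Example: the 'fun' scope of File 1 (if, else, and a for with a nested if).
fun = {"children": [{"children": []}, {"children": []},
                    {"children": [{"children": []}]}]}
file1 = {"children": [fun]}
assert complexity(fun) == 5 and complexity(file1) == 6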

4. Clone detection

This is the last step shown in Fig. 2. It detects matches within a file or among different files in the source code. It is a two-stage process: first, structurally similar trees are identified; the output of this stage is then given to the exact or near-exact match step, which uses a relatively precise comparison to detect the clones. The following algorithm handles both cases.


Algorithm 2 – Matching algorithm

matchTrees(node1, node2, isExact)
{
  NodePairQueue.push({node1, node2})
  while (!NodePairQueue.empty())
  {
    node_pair = NodePairQueue.front()
    NodePairQueue.pop()
    if (!isExact)
    {
      match = matchNodes(node_pair.first, node_pair.second)
      if (match < matchThreshold)
        return 0
    }
    else
    {
      // Algorithm 3
      nodeMatch = matchNodesED(node_pair.first, node_pair.second, noOfNodeTokens)
      totalNumOfTokens += noOfNodeTokens
      numOfMatchedTreeTokens += nodeMatch
    }
    for each child pair (child1, child2) of node_pair.first and node_pair.second
    {
      NodePairQueue.push({child1, child2})
    }
  }
  if (!isExact)
    return match
  else
  {
    if (totalNumOfTokens == 0)
      return 1.0
    else
      return numOfMatchedTreeTokens / totalNumOfTokens
  }
}

4.1. Structural matching

At this stage, scope trees having the same cyclomatic complexity, or a difference of 1 or 2, are retrieved from their buckets for comparison. Similarity checking is done using several criteria, such as the number of blocks, number of children, number of tokens, number of identifiers, and type of node. These are parameters whose values we can change before clone detection starts. Two trees are structurally similar if their corresponding nodes are similar.
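A minimal sketch of such a structural check, under the assumption that each node carries pre-computed summary metrics, might look like this (our illustration, not the authors' code):

# Sketch of the first-stage structural similarity test.

def nodes_similar(n1, n2, tolerance=0):
    """Nodes match if they are the same kind and their metrics agree."""
    if n1["kind"] != n2["kind"]:
        return False
    for key in ("num_blocks", "num_children", "num_tokens", "num_identifiers"):
        if abs(n1[key] - n2[key]) > tolerance:
            return False
    return True

def trees_similar(t1, t2, tolerance=0):
    """Two scope trees are structurally similar if corresponding nodes match."""
    if not nodes_similar(t1, t2, tolerance):
        return False
    if len(t1["children"]) != len(t2["children"]):
        return False
    return all(trees_similar(c1, c2, tolerance)
               for c1, c2 in zip(t1["children"], t2["children"]))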

4.2. Exact or near exact match

At this stage the structurally similar nodes are filtered with additional matching criteria. The blocks hold sequences of tokens; by setting a certain level of accuracy, say 0.90, comparison is done between the structurally similar nodes. This approach can detect modified code. To detect modified code, we also use an edit distance algorithm applied


to lists of tokens. Algorithm 3 describes the edit-distance-based algorithm.

Algorithm 3 – Modified edit distance algorithm

matchNodeEd(node1, node2)
  for each block in node1
    for each token in block
      tokenList1.push_back(token)
  for each block in node2
    for each token in block
      tokenList2.push_back(token)
  // EditDistanceStrLst is an edit distance algorithm for lists of strings.
  editDist = EditDistanceStrLst(tokenList1, tokenList2)
  noOfTokens = max(len(tokenList1), len(tokenList2))
  approxMatch = noOfTokens – editDist
  return {approxMatch, noOfTokens}
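For concreteness, the following is a small Python sketch of this token-level edit-distance matching (our own illustration; the authors' tool is written in C++):

# Sketch of edit-distance matching over token lists.

def edit_distance(a, b):
    """Classic Levenshtein distance between two token lists."""
    dp = list(range(len(b) + 1))
    for i, ta in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, tb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ta != tb))  # substitution
    return dp[len(b)]

def match_nodes(tokens1, tokens2):
    """Return (approximate match count, token count) as in Algorithm 3."""
    n = max(len(tokens1), len(tokens2))
    return n - edit_distance(tokens1, tokens2), n

# Example: two near-identical statements differing in one identifier.
print(match_nodes(["int", "i", "=", "left", ";"],
                  ["int", "j", "=", "left", ";"]))   # -> (4, 5)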

5. Implementation As a proof of concept, a tool has been implemented using the C++ programming language. The tokenizer and rule-based parser are general purpose modules developed for source code analysis and have been used in previous research and class room projects. Some tuning is made to fit this work. The other components, Scope builder and Clone detection are implemented for this research.

6. Results and discussion We have carried out experiments on test code prepared for clone detection purposes by copy-and-pasting and systematically modifying files and fragments of code. First, we used c source code, implementing quick sort, to detect clones using our tool and two other clone detection tools: CCFinder [1] and Deckard [15]. For real case studies, we used Notepad++ [12] and OpenSSL [14]. Our tool also successfully detects clones in benchmark code specified in [2].

6.1. Test code for clone detection To evaluate the clone detection capability of our clone detection, and two other tools: CCFinder and Deckard, we prepared test code examples by modifying an implementation of the quick sort algorithm. The algorithm is stored in a file (qsort_1a1.c [1a1]). Various modifications are done to this file. The important function for clone detection is the “partition” function. Most of the modifications, 1a2 to 1a9, are based on this function; and these are for function level clone detection even though each function is stored separately in its respective file. The rest, 1a10 and 1a11, are prepared for file level clone detection. The original partition function from 1a1 is replaced with a different implementation in 1a10 leaving the rest of the functions as they are; whereas, in 1a11,

a different algorithm, merge sort algorithm is implemented leaving only two small functions, “display” and “main” the same. Here are the descriptions of each test file used for clone detection: i. qsort_1a1.c (1a1)– quick sort implementation. ii. partition_1a2.c (1a2)- copy-and-pasted partition function from original (1a1) with only identifier changes including function name and parameters. iii. partion_1a3.c (1a3)– If block, line 6-9, is added to the original (1a1) “partition” function. iv. partion_1a4.c (1a4) – else block, line 12-13, added to the original(1a1) “partition” function. v. partition_1a5.c (1a5)– several comments are added to the original(1a1) “partition” function. vi. Partition_1a6.c (1a6) - If block, line 6-9, is added to 1a2’s “part” function before the outer while loop. vii. Partition_1a7.c (1a7) - else condition, line 16-17, is added to 1a6's “part” function. viii. Partition_1a8.c(1a8) - comments are added at several places to 1a7. ix. Partition_1a9.c (1a9) – the empty else statement in 1a8 is modified by adding two statements. x. Qsort_mutated_1a1(1a10) – 1a1 copy-pasted with a completely different partition function. However, the rest of the functions are the same. xi. `Merge-sort_1a1.c (1a11) – this is a different algorithm, merge sort. However, two of the functions are the same: “display” and “main.” The changes made in 1a2 to 1a5 are relative to the original file; whereas, the changes in 1a6 to 1a9 are cumulative, with 1a9 modified the most. We ran CCFinder[1], Deckard[15], and our tool with the above test files as input. To compare the clones detected, refer to figure 3, that shows the distinct parts of the code that will be used in the clone detection result. The block code insertions are shown with gray blocks. It shows most of the partition function modifications. The content is actually 1a9’s, which is a result of all the modifications listed in 1a2 to 1a9. One can use figure 3, to make sense of most of the clone detection result discussions. The last two files, 1a10 and 1a11, are not shown in this result. Please see the link in reference [16] for these and the individual files for 1a1 – 1a9. Table 2 reports the clones detected by the three tools. As can be seen in Table 2, all three tools detected the same clone for 1a2 vs 1a1, 1a3 vs 1a1, and 1a5 vs 1a1. They detected whole partition function, outer while loop and whole partition function respectively.


Fig. 3 (excerpt): size_t part(int *data, int left, int right) { int i = left + 1, j = right; int pivot = data[left]; ... }

method. These syntaxes should cover most API programming languages that are used for integration. The result was 1,486 questions. However, 227 questions were linked to more than one method. After some investigation, we found that more than 190 of these questions address more than one method on purpose; therefore, we kept these records duplicated, since they genuinely address more than one method. The second way to link discussions and methods is to search in the answers, considering that developers may not know the name of the method and are looking for a functionality that is offered by the service. The result is 474 new discussions in which the calling-protocol syntax appears only in the answers. In total, we have 1,960 discussions that addressed YouTube APIs. Github is a suitable code repository for YouTube since it is the official resource of open-source projects according to



Fig. 3: Data summary for all metrics

YouTube itself. It has more than 50K source code files that use YouTube APIs, from different domains; therefore, it can represent a good sample of systems or applications that use YouTube APIs. The number of YouTube method-calling syntaxes appearing in the Github code repository is used to calculate the operational profile for each method. The variables we use for the analysis are the number of questions for each method and the total number of viewers for each method. We divide each one by the operational profile of its method. The results are our overall usability metrics for questions and viewers: the questions score and the viewers score for each method.
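As a sketch of how these scores are derived (our own illustration; the method names and counts below are hypothetical):

# Illustrative computation of the overall usability metrics described above.
# 'usage_counts' is the number of call-syntax occurrences found in the corpus.

def operational_profile(usage_counts):
    total = sum(usage_counts.values())
    return {m: c / total for m, c in usage_counts.items()}

def overall_scores(questions, viewers, usage_counts):
    """questions/viewers: per-method counts from Q&A; returns per-method scores."""
    profile = operational_profile(usage_counts)
    return {m: {"questions_score": questions[m] / profile[m],
                "viewers_score": viewers[m] / profile[m]}
            for m in usage_counts}

# Hypothetical example with two methods:
print(overall_scores({"videos.list": 40, "search.list": 10},
                     {"videos.list": 5000, "search.list": 900},
                     {"videos.list": 300, "search.list": 100}))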

4.3 Data Summary and Analysis First we summarize the metrics in Figure 3 in box plots and give the median and mean for each metric in Table 2. Figure 3 also shows the first and third quartile for the distribution, and outliers outside the wiskers. We also performed correlation analysis for our APIs metrics and overall usability metrics. Table 3 contains seven APIs metrics we used in our study, and two overall usability metrics questions score and viewers score. There are positive high correlation 0.99 between number of arguments and the number of words. This is expected, since more arguments in the method needs more explanation. Also, the number of arguments correlates positively with number of examples at 0.81. This shows that documentation offers more examples when a method has more arguments. Moreover, there is a correlation 0.84 between words and examples. In fact, this is a transitive relation through arguments. In other words, since the number of words is correlated with the number of arguments and the number of the arguments is correlated with the number of examples, then the number of the words is correlated with the number of examples.

10.403 n=228

43.632 n=1475

15.882 n=27

109.52 n=230

Fig. 4: TBM relating cloud APIs metrics to questions score

Finally, there is high correlation of 0.84 between the two overall usability metrics we used, questions score and viewers score. This shows that the method with more questions has more viewers. However, the correlation between API metrics and overall usability metrics are relatively low, with the correlation between questions score and returns at 0.72 and all others below 0.38.

4.4 Predictive Analysis with Tree-base Models It would be interesting to examine the relationship between our APIs metrics and overall usability metrics. In particular, we would like to establish predication relationships between our API metrics as predictor variables and our overall usability metrics as response variables. Because of the low correlation between these two groups of variables noted above and in Table 3, we explored the use of Tree-base Models (TBMs). TBM finds predictive relationship between predictor and response variables. In our case, we would like to know what are the metrics of APIs driving overall usability metrics. There are two response variables that we want to study, and we need to know how they get influenced by APIs metrics. First we used all seven APIs metrics plus seven language flags to predict questions score in Figure 4. The result shows that methods that returning more that 138 variables driving most discussions. After investigation, we found out that one method out of 50 is returning XML file with 182 variables, and there are 257 out of 1960 discussions that addressed this method. We investigated the usage or the operation profile of this method which might plays a role on increasing questions score. We found out that this method has 2% operation profile which is relatively low. This means that methods or APIs with large numbers of return variables and low usage can also make a high impact on quality and overall


Table 2: Median and mean for all metrics

Metric            Median     Mean
Words             886        1151
Arguments         11.00      13.08
Returns           8.00       31.66
Examples          6.000      6.401
SLOC              630.0      489.6
Comment density   36.50      56.57
Class size        3.000      3.617
Questions score   46.67      47.12
Viewers score     52451.77   60652.34

Table 3: Correlation table between variables (columns in the same order as rows)

                  Words   Args    Returns Example SLOC    Comm    Class   Q.Score V.Score
Words             1.000
Arguments         0.995   1.000
Returns          -0.235  -0.305   1.000
Example           0.844   0.814   0.050   1.000
SLOC              0.536   0.497   0.222   0.804   1.000
Comments         -0.085  -0.042  -0.163  -0.144  -0.218   1.000
Class size       -0.556  -0.587   0.536  -0.274   0.197  -0.180   1.000
Questions Score  -0.092  -0.119   0.724   0.209   0.341  -0.010   0.426   1.000
Viewer Score     -0.159  -0.159   0.383   0.171   0.258   0.207   0.387   0.841   1.000

The second metric that played an important role in developers' questions is the number of arguments, together with the size of the examples, where three arguments is the threshold linked to an increased number of questions.

We then applied a TBM with all metrics as predictors and the viewers score as the response variable. The result is shown in Figure 5. The size of the documentation is the major driver for viewers: documentation with more than 611 words drives 1561 of the 1960 discussions toward a higher viewers score. We investigated these methods and found that nine methods have more than 611 words; however, the sum of the operational profiles for these methods is 35%, which is relatively high. The number of arguments is the second driver for the viewers score, where more than eight arguments is the threshold that increases the number of viewers. In general, the TBM results indicate that a complicated design can decrease developer comprehension of an API regardless of its usage, and that design metrics such as returns and arguments are more related to the questions score.

Fig. 5: TBM for cloud APIs metrics to viewers score

5. Conclusions

Cloud service APIs have become a major way to integrate, use, and share computing resources. The high demand for cloud services increases the need to ensure their quality and usability. Cloud service API quality needs to be addressed through applicable quality attributes and metrics, and these quality metrics need to be linked to some overall metrics of usability. In this paper, we developed a comprehensive framework to measure and analyze cloud service API quality. First, we identify relevant quality attributes applicable to cloud service APIs. Second, we decompose cloud service APIs into measurable elements. Then we define metrics that quantify these quality attributes using the decomposed elements. Lastly, we measure and analyze cloud service API quality using an existing data source from crowd-sourced Q&A. We applied our framework to YouTube APIs and Stack Overflow to demonstrate its applicability and effectiveness. The results demonstrated the internal links among these metrics and the predictive relationship between the API metrics as a group and the overall usability metrics. The framework will allow cloud API service providers to identify and remedy weaknesses in their design and documentation.

References

[1] P. Mell and T. Grance, “The NIST definition of cloud computing,” 2011.


[2] W. S. Humphrey, Managing the Software Process. Addison-Wesley, 1989.
[3] A. von Mayrhauser, Software Engineering: Methods and Management. San Diego, CA, USA: Academic Press Professional, Inc., 1990.
[4] J. A. McCall, P. K. Richards, and G. F. Walters, “Factors in software quality. Volume I. Concepts and definitions of software quality,” General Electric Co., Tech. Rep., 1977.
[5] S. H. Kan, Metrics and Models in Software Quality Engineering, 2nd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 2002.
[6] ISO, ISO/IEC 25010:2011 Systems and Software Engineering - Systems and Software Quality Requirements and Evaluation (SQuaRE) - System and Software Quality Models. ISO, 2011.
[7] S. Clarke and C. Becker, “Using the cognitive dimensions framework to evaluate the usability of a class library,” in Proceedings of the First Joint Conference of EASE & PPIG (PPIG 15), 2003.
[8] T. Green and A. Blackwell, “Cognitive dimensions of information artefacts: a tutorial,” in BCS HCI Conference, vol. 98, 1998.
[9] B. A. Myers and J. Stylos, “Improving API usability,” Commun. ACM, vol. 59, no. 6, pp. 62–69, May 2016.
[10] M. P. Robillard, “What makes APIs hard to learn? Answers from developers,” IEEE Software, vol. 26, no. 6, pp. 27–34, Nov 2009.
[11] M. Zibran, “What makes APIs difficult to use,” International Journal of Computer Science and Network Security (IJCSNS), vol. 8, no. 4, pp. 255–261, 2008.
[12] H. Washizaki, H. Yamamoto, and Y. Fukazawa, “A metrics suite for measuring reusability of software components,” in Proceedings, 5th International Workshop on Enterprise Networking and Computing in Healthcare Industry, Sept 2003, pp. 211–223.
[13] E. Mosqueira-Rey, D. Alonso-Ríos, V. Moret-Bonillo, I. Fernández-Varela, and D. Álvarez-Estévez, “A systematic approach to API usability: Taxonomy-derived criteria and a case study,” Information and Software Technology, vol. 97, pp. 46–63, 2018.
[14] D. Alonso-Ríos, A. Vázquez-García, E. Mosqueira-Rey, and V. Moret-Bonillo, “Usability: A critical analysis and a taxonomy,” International Journal of Human-Computer Interaction, vol. 26, no. 1, pp. 53–74, 2009.
[15] M. F. Bertoa, J. M. Troya, and A. Vallecillo, “Measuring the usability of software components,” Journal of Systems and Software, vol. 79, no. 3, pp. 427–439, 2006.
[16] J. D. Musa, “Operational profiles in software-reliability engineering,” IEEE Softw., vol. 10, no. 2, pp. 14–32, Mar. 1993.


Risk Priority Numbers for Various Categories of Alzheimer's Disease

Dr. Chandrasekhar Putcha, Fellow, ASCE, Professor¹, Dr. Brian Sloboda, Associate Chair, Center for Management and Entrepreneurship (CME)², Vish Putcha, Sr. Manager Strategy, Walmart Bentonville³

¹Department of Civil and Environmental Engineering, California State University, Fullerton, USA
²School for Advanced Studies (SAS), University of Phoenix, Tempe, USA
³Walmart, Bentonville, USA

Abstract - This paper deals with the application of the principles of FMEA (Failure Mode Effects Analysis) to the medical field, especially the study of Alzheimer's disease. FMEA is a widely used tool wherein all failure modes are individually studied with their effects on the principal components of the system, and the criticality of each component is established so that mitigation steps can be taken to increase safety.

1. Introduction

The topic of FMEA has been widely discussed both in textbooks (Henley and Kumamoto, 1981; Ebeling, 1997) and in many research papers (Cristea and Costaniescu, 2017; Putcha et al., 2004). FMEA principles can be applied to address both hardware and software reliability problems (Strong, 2013; Putcha, 2007). A further extension of FMEA has been carried out by some researchers (Dytczak and Ginda, 2017), wherein the authors developed a method to calculate a risk priority number (RPN). As is well known, there is no cure for Alzheimer's disease (AD), and it is one of the ten main causes of death in the USA. Hence, this is obviously a matter of health concern. It also imposes a large economic burden on individuals and society.

2. Methodology

The basic idea of this paper is to develop Risk Priority Numbers (RPN) based on the method developed by Dytczak and Ginda (2017). This is achieved through identification of the most important causes of registered failure modes, which in turn is achieved through a quantitative description of cause-and-effect relationships, i.e., through a description of the failure modes. A set of three distinct, dimensionless variables constitutes the description. The expression for RPN is given as (Dytczak and Ginda, 2017):

RPN = S · O · D    (1)

where S represents Severity, O represents Occurrence, and D represents Detectability. This equation is a very useful relation and has been applied in this study to Alzheimer's disease. The data has been obtained and reported in the literature (Casanova et al., 2013) through a medical imaging process. In that study, Alzheimer's disease has been classified into the 4 categories listed below (Casanova et al., 2013):

CN - cognitively normal
cMCI - mild cognitive impairment in subjects
ncMCI - MCI subjects who remained stable
AD - Alzheimer's disease

One can see from this classification that there are different stages a person goes through before he/she develops full-blown Alzheimer's disease (AD).

3. Results

The following data has been reported in the literature (Casanova et al., 2013).


Table 1. Demographic data for the 694 ADNI participants, shown across categories of clinical status

                  CN          cMCI        ncMCI       AD
Number            188         153         182         171
Age               75.9(5.0)   75.0(7.0)   75.2(7.6)   75.5(7)
Sex (M/F)         102/86      92/61       136/62      95/76
Educ. (years)     16.1        15.7        15.7        14.9
Hand (R/L)        173/15      142/11      164/18      159/12
BMI               26.2(3.6)   26.4(4.7)   25.4(4.7)   25.6(3.9)
e4 (Y/N)          49/139      103/50      84/98       112/59
GDS               0.8(1.13)   1.5(1.4)    1.6(1.4)    1.7(1.4)
FAQ               0.1(0.4)    5.4(4.7)    2.7(3.7)    13.1(6.8)
MMSE              29.1(1.0)   26.6(1.7)   27.5(1.8)   23.4(2.0)

The variables that have not been defined earlier are as follows:
BMI – Body Mass Index
GDS – Geriatric Depression Scale
FAQ – Functional Assessment Questionnaire
MMSE – Mini-Mental State Exam
ADNI – Alzheimer's Disease Neuroimaging Initiative

Some very interesting points are to be noted from the above table:

• Education also plays an important role in determining whether a person is cognitively normal. For example, a person with 16.1 years of education is classified as CN, while a person with 14.9 years of education is classified as AD (Alzheimer's disease).
• The Geriatric Depression Scale (GDS) score is higher for AD patients than for CN (cognitively normal) people.
• Mini-Mental State Exam (MMSE) scores are lower for AD patients than for CN people. Hence the grading score of abnormality is reversed; that is, this score for AD will be higher than for CN people (see the attached MMSE questionnaire at the end of this paper in Appendix I).
• The FAQ (Functional Assessment Questionnaire) scores are higher for AD patients than for CN people (see the attached FAQ questionnaire at the end of this paper in Appendix II).

The RPN values are calculated using Eq. 1 for all 4 categories (CN, cMCI, ncMCI and AD) and are shown in Table 2. For this, the following equivalence has been established:

Severity (S)      - GDS
Occurrence (O)    - Age
Detectability (D) - MMSE

Table 2: RPN values for all 4 categories of Alzheimer's disease

Group name      CN       cMCI     ncMCI    AD
S               1.13     1.4      1.4      1.4
O               0.2516   0.2486   0.2493   0.2503
D               1.0      1.7      1.8      2.0
RPN = S·O·D     0.2843   0.5916   0.6282   0.7008
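For illustration, the following minimal sketch (not from the original paper; it simply assumes plain multiplication per Eq. 1) reproduces the RPN values in Table 2 from the S, O and D inputs.

```python
# Minimal sketch: RPN = S * O * D (Eq. 1) for the four clinical categories.
groups = {
    #            S      O       D
    "CN":    (1.13, 0.2516, 1.0),
    "cMCI":  (1.4,  0.2486, 1.7),
    "ncMCI": (1.4,  0.2493, 1.8),
    "AD":    (1.4,  0.2503, 2.0),
}

for name, (s, o, d) in groups.items():
    rpn = s * o * d
    print(f"{name:5s}  RPN = {rpn:.4f}")
# Output is approximately the Table 2 values: CN 0.2843, cMCI 0.5917
# (Table 2 lists 0.5916), ncMCI 0.6282, AD 0.7008.
```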

4. Discussion of Results and Conclusions

As can be seen from the RPN values, the RPN is highest for the AD category and lowest for CN. This is because the risk values go up as Alzheimer's disease progresses in a person. Another point to be noted is that the RPN values for cMCI and ncMCI are not much different. This is because a very precise risk calculation method has been applied to the Alzheimer's disease data. Using these numbers, doctors can establish the way medicines should be given to patients.


5. References

[1] Cristea, G. and Constaninescu, D.M. (2017), ‘A comparative critical study between FMEA and FTA risk analysis methods’, IOP Conference Series: Mathematical Science and Engineering, 252.

[2] Ebeling, C. E. (1997). Reliability and Maintainability Engineering. McGraw-Hill.

[3] Henley, E.J. and Kumamoto, H.H. (1981). Reliability Engineering and Risk Assessment. Prentice Hall.

[4] Putcha, C.S., Mikula, D.F., Dueease, R.A., Jensen, T., Peercy, R.L. and Dang, L. (2004), ‘Risk Methodology for failed hardware’, International Journal of Modeling and Simulation (now Taylor and Francis).

[5] Casanova, R., Hsu, F-C., Sink, K.M., Rapp, S.R., Williamson, J.D. and Resnick, S.M. (2013), ‘Alzheimer’s Disease Risk Assessment Using Large-Scale Machine Learning Methods’, PLOS.

[6] Putcha, C. (2007), ‘Software Reliability – Engineering Point of View’, Proceedings of the 2007 International Conference on Software Engineering Research and Practice (SERP), Las Vegas.

[7] Strong, K. (2013), ‘Using FMEA to improve Software Reliability’, PSNQC 2013 Proceedings.

[8] http://www.marketwatch.com/story/the-10-best10-worst-us-stocks-of-2017-2017-05-01

[9] https://money.usnews.com/funds/mutualfunds/rankings/health

[10] https://money.usnews.com/funds/mutualfunds/rankings/health


A Web-based Application for Sales Prediction

Mao Zheng¹, Song Chen², and Yuxin Liu¹
¹Department of Computer Science, University of Wisconsin-LaCrosse, LaCrosse, WI, USA
²Department of Mathematics and Statistics, University of Wisconsin-LaCrosse, LaCrosse, WI, USA

Abstract - Machine learning is a field of computer science that gives computer systems the ability to "learn" from data without being explicitly programmed. In this project, we are interested in exploring machine learning algorithms to study past sales data in order to make sales predictions for Fastenal Company. Fastenal Company is an American company based in Winona, Minnesota. Through distributing goods used by other businesses, it is the largest fastener distributor in North America. It has over 2,600 branches throughout the US, Canada, Mexico and Europe, along with 13 distribution centers. Currently, some branches have a surplus of items while others do not have enough. Optimizing the inventory in the distribution centers and branches will certainly increase sales and reduce shipping and storage costs for the company. We developed a web-based application for sales predictions. The manager can select a regional distribution center, select a branch that belongs to that distribution center, then select a particular item and choose one of three machine learning algorithms to forecast sales.

Keywords: Machine Learning, Data Science, Web-based Application

1 Introduction

Machine learning is a method of data analysis that automates analytical model building [1]. We are interested in exploring machine learning algorithms to study past sales data in order to make future sales predictions for Fastenal Company. Fastenal Company is the largest fastener distributor in North America. It has over 2,600 branches throughout the US, Canada, Mexico and Europe, along with 13 distribution centers [2]. Optimizing the inventory in the distribution centers and branches will certainly reduce shipping and storage costs for the company. To do so, we are using Fastenal's product consumption table from the period of 2013 to 2016 to predict future product sales as a starting point. In our approach, we used Linear Regression, Autoregressive Integrated Moving Average (ARIMA) and Long Short-Term Memory (LSTM) to forecast product sales in the selected branch or distribution center. To evaluate the forecast results, we used the Coefficient of Determination (R² for short) and the Error Rate, our own proposed approach to measuring the accuracy of our models.

The size of the consumption table is over 9 GB, with over 146 million lines. The three prediction models used in this study have different processing times: Linear Regression is the fastest and LSTM is the slowest; however, LSTM has the best accuracy. Due to the large amount of data and limited computing power, we propose a process flow for using the three models in practice. We run the data through a Linear Regression model first. Based on the model accuracy measurement, we either accept the forecast result or pass the poorly fitting data to the ARIMA model. Similarly, we then either accept the ARIMA model result or pass the poorly fitting data to the LSTM model. This software application is developed in Python and uses the Django framework and MySQL as the backend database.
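The following sketch (not the project's code; the three fit_* stubs and the acceptance thresholds are placeholders) illustrates the shape of this fallback process flow in Python.

```python
# Illustrative sketch of the proposed process flow: try Linear Regression
# first, fall back to ARIMA, then to LSTM; accept the first acceptable fit.
def fit_linear(series, horizon):  return 0.3, [series[-1]] * horizon  # (R^2, forecast) stub
def fit_arima(series, horizon):   return 0.8, [series[-1]] * horizon  # (Error Rate, forecast) stub
def fit_lstm(series, horizon):    return 0.2, [series[-1]] * horizon  # (Error Rate, forecast) stub

def forecast_with_fallback(series, horizon=4):
    r2, forecast = fit_linear(series, horizon)
    if r2 >= 0.5:                  # assumed R^2 acceptance threshold
        return "linear_regression", forecast
    err, forecast = fit_arima(series, horizon)
    if err <= 1.0:                 # Error Rate > 1 is treated as a bad fit (cf. Sec. 4.1)
        return "arima", forecast
    _, forecast = fit_lstm(series, horizon)
    return "lstm", forecast        # last resort: slowest but most accurate model

print(forecast_with_fallback([10, 12, 9, 14, 13]))
```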

2 The Architecture Design

The client-server architecture is used in the sales prediction system, as illustrated in Figure 1. A web browser that connects to the server acts as the client. The architecture uses the Model-View-Controller (MVC) pattern. The controller is responsible for interactions with the client: getting requests from the client and sending responses back. While handling requests from the client, the controller processes some of the information and then passes the request to the model. There are three modules in this model: the Sign-in module, the Prediction module and the Profile module. The first module is responsible for the sign-up, sign-in and sign-out functionalities. The second module is responsible for sales prediction, and the third module is for modifying the user's profile. In this project, the server is a single machine where the project is deployed.

Figure 1 Client-Server Architecture Design Diagram
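As a purely hypothetical sketch (not the project's actual code; paths and view names are invented), the three modules could be wired to the controller layer through Django's URL dispatcher as follows.

```python
# Hypothetical Django routing sketch for the three modules described above.
from django.http import HttpResponse
from django.urls import path

def sign_in(request):  return HttpResponse("sign up / sign in / sign out")  # Sign-in module
def predict(request):  return HttpResponse("sales prediction")              # Prediction module
def profile(request):  return HttpResponse("edit user profile")             # Profile module

urlpatterns = [
    path("signin/", sign_in),
    path("predict/", predict),
    path("profile/", profile),
]
```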


3 Machine Learning Algorithms

In this section, we briefly introduce the three machine learning algorithms used in our project.

3.1 Linear Regression

In statistics, linear regression is a linear approach to modeling the relationship between a dependent variable y and one or more independent variables denoted X. As a machine learning model, linear regression is the simplest form of supervised learning. For example, given a set of points as training data, we can find a linear function that best fits them as a linear model. Using that linear function, we can make a prediction when new points arrive. Linear regression is one possible method to help Fastenal predict sales. However, this model cannot capture seasonal changes along with many other factors. Therefore, in addition to the linear regression model, we needed to use other machine learning models to improve the accuracy of sales predictions.
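For illustration, a minimal sketch (not the project's code; the sales figures are made up) of fitting a linear trend to monthly sales with scikit-learn and forecasting the next month:

```python
# Minimal sketch: linear regression on the month index of a sales series.
import numpy as np
from sklearn.linear_model import LinearRegression

months = np.arange(1, 13).reshape(-1, 1)            # month index as the single feature
sales = np.array([120, 130, 125, 140, 150, 145,
                  160, 170, 165, 180, 190, 185])    # hypothetical monthly sales

model = LinearRegression().fit(months, sales)
next_month = model.predict(np.array([[13]]))
print(f"forecast for month 13: {next_month[0]:.1f}")
```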

3.2 Autoregressive Integrated Moving Average (ARIMA)

As mentioned above, time plays an important role in the data set given by Fastenal. We needed to explore other machine learning models to handle complex changes in the data, such as seasonal changes. Time series models [3] are considered a solution for handling the seasonal demand in the Fastenal problem. A time series is a series of data points indexed in time order. Most commonly, a time series is a sequence taken at successive, equally spaced points in time [2]. By analyzing time series data, we can extract meaningful statistics and other characteristics. Time series analysis focuses on comparing values of a single time series, or of multiple dependent time series, at different points in time. The first time series model we used in this project is ARIMA, an acronym that stands for Autoregressive Integrated Moving Average.
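The following sketch (not the project's code; the data and the (p, d, q) order are illustrative choices) shows the general shape of an ARIMA forecast with statsmodels.

```python
# Minimal sketch: ARIMA fit and a four-month forecast on a monthly sales series.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

sales = pd.Series(
    [120, 130, 125, 140, 150, 145, 160, 170, 165, 180, 190, 185],
    index=pd.period_range("2016-01", periods=12, freq="M"),
)

fit = ARIMA(sales, order=(1, 1, 1)).fit()
print(fit.forecast(steps=4))   # forecast for the next four months
```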

3.3 Long Short-Term Memory (LSTM)

In our project, LSTM is the second time series model. LSTM stands for Long Short-Term Memory. Recurrent neural networks (RNNs for short) are a powerful type of neural network designed to handle sequence dependence [3]. The LSTM model is a type of RNN that can handle non-linear time series prediction problems which feedforward networks can only approach using fixed-size time windows. Compared to ARIMA, the promise of LSTM is that the temporal dependence in the input data can be learned. We used the LSTM model to learn the trend and the seasonal changes of a given data set more accurately.
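As a rough illustration (not the project's code; the window size, layer sizes, epochs and data are assumptions), a small Keras LSTM can be trained to predict the next month's sales from a sliding window of the previous three months:

```python
# Minimal sketch: LSTM trained on 3-month windows of a monthly sales series.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

series = np.array([120, 130, 125, 140, 150, 145,
                   160, 170, 165, 180, 190, 185], dtype="float32")

window = 3
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X.reshape((-1, window, 1))          # (samples, timesteps, features)

model = Sequential([LSTM(16, input_shape=(window, 1)), Dense(1)])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=200, verbose=0)

last_window = series[-window:].reshape((1, window, 1))
print("next-month forecast:", float(model.predict(last_window, verbose=0)[0, 0]))
```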


4 Evaluation of the Prediction Results

The 4-year consumption table has sales data for 48 months. We chose the data from the first 44 months as training data to predict the sales for the next four months. The actual data for the last four months was used to measure the accuracy of the prediction. To evaluate the forecast results, we use the classical Coefficient of Determination (R² for short) and our own proposed Error Rate to measure the accuracy of our prediction models. R² is a measure of the residual error in the model; we can use it as a measure of how well the model fits the historical data set. R² takes on values in (x, 1], where x can be negative, with 1 meaning that the model is a perfect fit to the dataset. When fitting non-linear functions to the data, R² may be negative.

For the linear regression model, R² is the proportion of the variance in the dependent variable that is predictable from the independent variables. Generally speaking, it measures how much better the regression line is than a simple horizontal line through the mean of the data. Therefore, it can be used as an assessment of how well the linear regression model fits the historical data: the better the linear regression fits the data, the closer the R² value is to 1. However, for a non-linear model, R² is often negative, so it is no longer suitable for evaluating the model. Therefore, we propose using the Error Rate to evaluate the time series models. The equation for calculating the Error Rate is as follows:
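(The equation itself did not survive extraction; based on the variable definitions in the next paragraph, a plausible reconstruction, offered as an assumption rather than the authors' exact formula, is:)

$$\text{Error Rate} = \frac{\frac{1}{n}\sum_{i=1}^{n}\bigl(\text{ACTUAL}_i - \text{PREDICTION}_i\bigr)^2}{\text{MEAN}^2}$$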

In the Error Rate equation, n is the number of months in the testing set, ACTUAL is the actual value in the testing set, PREDICTION is the prediction value given by the time series model, and MEAN is the average obtained by dividing the sum of the actual values in the training set by their number. The range of the Error Rate is [0, +∞); the closer it is to 0, the better the time series model fits the data. From the Error Rate value, we can easily see the error between the prediction values and the actual values. Dividing by the square of the MEAN value standardizes the Error Rate so that we can compare Error Rates among different items. Therefore, the Error Rate can help us evaluate the two time series models, ARIMA and LSTM.


4.1 Sample Evaluation Results

Figure 2 and Figure 3 are screenshots of a good fit and a bad fit using linear regression. The R² is 0.579 in Figure 2. This result was better than most of the other linear regression results. We also calculated the Error Rate for the Linear Regression model, although it is not used in the evaluation.

The Error Rate is used to evaluate the ARIMA model, although the R² value was calculated as well. For ARIMA and LSTM, a good fit must have an Error Rate close to 0, and the Error Rate of a bad fit is greater than 1.

Please note that all the data given to us by Fastenal is encrypted; hence the distribution centers, branches and items appear as non-meaningful strings.

Figure 4 Good Fit of ARIMA for the Item >(>!>(& in Branch qpzdt

Figure 5 Bad Fit of ARIMA for the Item ~